Teams: Microsoft Improves Audio and Video Capabilities via Machine Learning
Among other things, users get echo cancellation, which distinguishes between a spoken voice and a voice played back through a speaker.
Microsoft has announced new features for Teams that are designed to improve audio and video capabilities. Among them is echo cancellation, which many users have requested and which the company is implementing using machine learning.
Echo cancellation is part of Microsoft’s effort to filter out unwanted background noise and improve audio quality. The feature is aimed primarily at users who run Teams in rooms with poor acoustics, but it also allows participants to speak and listen at the same time without interruptions.
According to the company, machine learning is used to detect the differences between a user’s voice and the audio played back through a speaker. This helps especially in situations where the distance between the microphone and the speakers is too short, which creates a feedback loop between audio input and output. According to Microsoft, echo cancellation should not limit the ability of multiple people to speak at the same time.
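Microsoft has not published implementation details, but the underlying problem is the classic acoustic echo cancellation task: subtract an estimate of the loudspeaker signal that leaks back into the microphone. The sketch below illustrates this with a conventional NLMS adaptive filter rather than Teams’ ML model; the function name and all parameters are illustrative assumptions.

```python
import numpy as np

def nlms_echo_canceller(mic, far_end, filter_len=256, mu=0.5, eps=1e-6):
    """Illustrative NLMS acoustic echo canceller (not Teams' ML model).

    mic      -- microphone signal (near-end speech plus echo of far_end)
    far_end  -- loudspeaker (far-end) signal that leaks back into the mic
    Returns the microphone signal with the estimated echo removed.
    """
    w = np.zeros(filter_len)                          # adaptive estimate of the echo path
    out = np.zeros(len(mic))
    padded = np.concatenate([np.zeros(filter_len - 1), far_end])
    for n in range(len(mic)):
        x = padded[n:n + filter_len][::-1]            # most recent far-end samples first
        echo_est = w @ x                              # predicted echo at the microphone
        e = mic[n] - echo_est                         # residual = near-end speech
        w += (mu / (x @ x + eps)) * e * x             # normalized LMS update
        out[n] = e
    return out
```

Classic adaptive filters like this tend to struggle during “double talk,” when both sides speak at once, which is exactly the scenario where Microsoft says its ML-based approach should hold up without limiting simultaneous speakers.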
No customer data used to train the ML models
Teams will also use a machine learning model to process recorded audio so that it sounds less reverberant. This should keep people from sounding as if they were talking in a cave. In addition, a model trained on 30,000 hours of speech samples is meant to improve “interruptibility” in natural conversations.
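Microsoft has not described how its dereverberation model works internally. A common pattern for this kind of speech enhancement is to predict a time-frequency mask and multiply it onto the signal’s spectrogram; the sketch below shows that pipeline with a crude hand-written mask standing in for a trained network. `suppress_reverb`, `toy_mask`, and all parameters are illustrative assumptions, not Teams internals.

```python
import numpy as np
from scipy.signal import stft, istft

def toy_mask(mag, decay=0.7, floor=0.1):
    """Crude stand-in for a trained model: estimate the reverberant tail as an
    exponentially decaying envelope of past frames and keep only the excess."""
    tail = np.zeros_like(mag)
    for t in range(1, mag.shape[1]):
        tail[:, t] = decay * np.maximum(tail[:, t - 1], mag[:, t - 1])
    return np.clip((mag - tail) / (mag + 1e-8), floor, 1.0)

def suppress_reverb(audio, sr, predict_mask=toy_mask, nperseg=512):
    """Mask-based dereverberation sketch: score each time-frequency bin
    (direct speech vs. reverberant tail) and attenuate the tail."""
    _, _, spec = stft(audio, fs=sr, nperseg=nperseg)
    mask = predict_mask(np.abs(spec))          # values in [0, 1], same shape as spec
    _, enhanced = istft(spec * mask, fs=sr, nperseg=nperseg)
    return enhanced
```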
Microsoft emphasized that no customer data was used to train the new models. “Instead, we used either publicly available data or crowdsourcing to gather specific scenarios. We also made sure we had a balance of female and male speakers and 74 different languages,” Microsoft wrote in a blog post.
Echo cancellation, improved interruptibility and reverb reduction will initially roll out to Teams users on Windows and Mac devices. The new audio features should also be available for mobile platforms in the future.