Photo: John Boyd. A spectrogram of the sound the car’s microphone picks up when the driver is speaking [left]. A system developed using machine learning lets through only the person’s voice [right].
To better distinguish human speech from other sounds, the researchers are developing speech-enhancement systems that learn to exploit spectral and dynamic characteristics of human speech, such as pitch and timbre.
These systems employ machine-learning methods based on deep neural networks. (Facebook’s AI chief, Yann LeCun, explained deep neural networks for us here.) The networks are trained on massive amounts of noise-contaminated speech data to distinguish and suppress the noise while retaining the clean speech. The systems have millions of parameters that are optimized during training to reduce the difference between the output of the system and the original clean speech.
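As a rough illustration of what "optimized during training" means, the sketch below shrinks the problem to a single parameter: one gain applied to the noisy signal, adjusted by gradient descent until the mean squared difference from the clean speech stops falling. The sample values, the function names `mse` and `train_gain`, and the choice of mean squared error are assumptions made for illustration. The article does not say which error measure Mitsubishi uses, and the real systems tune millions of parameters rather than one.

```python
def mse(output, target):
    """Mean squared difference between a system output and the clean target."""
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(target)


def train_gain(noisy, clean, steps=200, lr=0.1):
    """Fit one gain g by gradient descent so that g * noisy approximates clean."""
    g = 0.0
    n = len(noisy)
    for _ in range(steps):
        # Derivative of the mean squared error with respect to g.
        grad = sum(2 * (g * x - c) * x for x, c in zip(noisy, clean)) / n
        g -= lr * grad
    return g


# Made-up samples (assumption): clean speech plus additive noise.
clean = [1.0, -1.0, 0.5, -0.5]
noise = [0.2, 0.1, -0.3, 0.4]
noisy = [c + v for c, v in zip(clean, noise)]

g = train_gain(noisy, clean)
print(round(g, 3))  # close to 0.978
print(mse(noisy, clean), mse([g * x for x in noisy], clean))
```

The second printed error is smaller than the first, which is all "training" means here: the parameter moves in whatever direction brings the output closer to the clean reference.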
To reconstruct the clean speech, the neural networks construct special time-varying filters on the fly and apply them to the contaminated speech.
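One common way to picture a time-varying filter is as a grid of gains between 0 and 1, one per time frame and frequency band, that is multiplied against the noisy signal's spectrum. The sketch below is an assumption-laden toy, not Mitsubishi's method: it computes the gains from known speech and noise power (an "oracle" that a trained network would have to estimate from the noisy input alone), it assumes speech and noise power simply add, and the names `ratio_mask` and `apply_mask` are hypothetical.

```python
def ratio_mask(speech_power, noise_power, eps=1e-12):
    """Gain per frame and band: the speech share of the total power."""
    return [
        [s / (s + n + eps) for s, n in zip(s_frame, n_frame)]
        for s_frame, n_frame in zip(speech_power, noise_power)
    ]


def apply_mask(mixture_power, mask):
    """Scale every frame of the mixture by that frame's own gains."""
    return [
        [g * x for g, x in zip(g_frame, x_frame)]
        for g_frame, x_frame in zip(mask, mixture_power)
    ]


# Two time frames, three frequency bands each (made-up numbers).
speech_power = [[4.0, 0.0, 1.0], [0.0, 9.0, 1.0]]
noise_power = [[0.0, 4.0, 1.0], [1.0, 0.0, 3.0]]
# Assumption: speech and noise are uncorrelated, so their powers add.
mixture_power = [
    [s + n for s, n in zip(s_frame, n_frame)]
    for s_frame, n_frame in zip(speech_power, noise_power)
]

mask = ratio_mask(speech_power, noise_power)
enhanced = apply_mask(mixture_power, mask)
print(mask)      # the gains differ from the first frame to the second
print(enhanced)  # approximately equal to speech_power
```

Because each frame gets its own row of gains, the filter changes as fast as the frames do, which is what lets it follow speech and noise that shift from moment to moment.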
“The frequency contents of the speech and the noise can be intricately intermingled, and change abruptly,” says Le Roux. “Transient noises may last only tens of milliseconds, while speech changes from one phoneme to another every 100 to 200 milliseconds. So to effectively remove the noise, the filter needs to have a fine frequency resolution and be updated very rapidly.”
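The tension in that requirement can be put into numbers if one assumes the filter is computed on short, overlapping windows of audio, as in a short-time Fourier transform. The article does not confirm that the system works this way, and the sample rate, window length, and step size below are hypothetical values chosen only to show the arithmetic. A longer window gives finer frequency resolution, while a shorter step between windows gives faster updates.

```python
def stft_resolution(sample_rate_hz, window_samples, hop_samples):
    """Return (band spacing in Hz, window length in ms, update interval in ms)."""
    bin_spacing_hz = sample_rate_hz / window_samples
    window_ms = 1000 * window_samples / sample_rate_hz
    update_ms = 1000 * hop_samples / sample_rate_hz
    return bin_spacing_hz, window_ms, update_ms


# Hypothetical settings, not taken from the article:
# 16 kHz audio, 512-sample windows, a new window every 128 samples.
bin_hz, window_ms, update_ms = stft_resolution(16_000, 512, 128)
print(bin_hz, window_ms, update_ms)  # 31.25 32.0 8.0
```

Under these assumed settings the filter would be refreshed every 8 milliseconds, fast enough to react within a transient lasting tens of milliseconds, while still separating frequencies about 31 hertz apart.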
In tests, Le Roux says, the system canceled out 96 percent of the ambient noise, compared with just 78 percent for conventional methods.
This technology differs fundamentally in approach and aim from active noise-cancellation methods such as those in anti-noise headphones, which try to physically remove ambient noise in a user’s environment. Examples of these methods applied in the car are Bose’s engine-noise cancellation and Harman’s road-noise suppression.
Mitsubishi’s goal is to eliminate the noise picked up by the microphone while the user is speaking during telephone calls. Although active noise-cancellation methods could indirectly help with this problem by reducing noise in the cabin, Mitsubishi says such methods can suppress only low-frequency noise.
“We want to make the driver’s speech more clear and intelligible to the person on the other end of the call by cancelling as much noise as possible, not just low-frequency noise,” says Le Roux. “Our technology will also be useful for hands-free command and control situations, such as when using Apple’s Siri or Google’s Voice Search in smart phones, as well as in call centers that use speech recognition to handle common requests.”
Mitsubishi plans to launch the technology in 2018 in its line of automotive navigation and communication devices.