Researchers at the University of Washington have created a prototype system, called ‘Spatial Speech Translation,’ that can translate languages in real time. The headphone component, which builds on the team’s earlier spatial-listening work, is made entirely from off-the-shelf noise-canceling headphones fitted with microphones. The headphones track voices based on the direction the wearer is facing, so the system is not overwhelmed when multiple people speak at once. Audio picked up by the headphones is then fed to a mobile device capable of running neural networks; for these tests, the team used a laptop powered by Apple’s M2 chip. The network handles translation in real time and can play it back to the user in as little as 1-2 seconds, though most testers preferred a delay of 3-4 seconds, which improved translation accuracy.
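The researchers have not published their pipeline here, but the core idea of keeping only voices in the direction the wearer is facing can be illustrated with a toy sketch. This example assumes a simple two-microphone setup in which each voice's bearing is estimated from the time-difference of arrival between the earcups; the names, microphone spacing, and tolerance angle are illustrative, not the paper's actual values.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, in air at room temperature
MIC_SPACING = 0.15      # metres between the two earcup mics (assumed value)

def angle_from_delay(delay_s, spacing=MIC_SPACING):
    """Estimate a source's bearing (radians, 0 = straight ahead) from the
    time-difference of arrival between the left and right microphones."""
    # delay = spacing * sin(angle) / c  =>  angle = asin(c * delay / spacing)
    ratio = max(-1.0, min(1.0, SPEED_OF_SOUND * delay_s / spacing))
    return math.asin(ratio)

def facing_voices(voices, tolerance_rad=math.radians(20)):
    """Keep only voices whose estimated bearing is within +/- tolerance of
    straight ahead, ignoring simultaneous speakers off to the sides."""
    return [v for v in voices
            if abs(angle_from_delay(v["delay"])) <= tolerance_rad]

# Toy usage: one speaker roughly ahead (tiny delay), one well off-axis.
voices = [
    {"name": "ahead", "delay": 0.00002},   # ~2.6 degrees off axis
    {"name": "side",  "delay": 0.00035},   # ~53 degrees off axis
]
print([v["name"] for v in facing_voices(voices)])  # -> ['ahead']
```

In a real system the selected voice would then be passed to the translation network; this sketch only shows the directional filtering step.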
For the researchers’ paper, the network was trained on Spanish, French, and German, but it reportedly has the capacity to be trained on up to 100 languages.
“This is a step toward breaking down the language barriers between cultures,” says Tuochao Chen, a doctoral student and co-author of the research. “So if I’m walking down the street in Mexico, even though I don’t speak Spanish, I can translate all the people’s voices and know who said what.”