A team of researchers at Cornell University, led by assistant professor of information science Cheng Zhang along with François Guimbretière, professor of information science, has developed a new approach to facial tracking: sonar. Their wearable earphone device, or “earable” as it has been coined, uses acoustic signals to map facial changes in real time. The team has named the device EarIO, and it works like this: a speaker on each side of the earphone sends out an acoustic pulse, which bounces off the wearer’s cheeks. Any change in facial expression, such as a smile or a furrowed brow, alters the echo, which is then picked up by a microphone. The echo is analyzed by a deep learning algorithm, which infers the movement of the entire face and displays it on a computer-generated model in real time.
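The pulse-and-echo idea can be illustrated with a small simulation. This is only a sketch of generic echo ranging by cross-correlation, not EarIO's actual signal processing, which the article does not describe. The chirp frequencies, sample rate, delays, and function names are all hypothetical, and the deep learning stage is left out entirely.

```python
import numpy as np

SAMPLE_RATE = 48_000.0  # assumed audio sample rate in Hz (hypothetical)


def make_chirp(n_samples, f0, f1, sample_rate=SAMPLE_RATE):
    """Return a linear frequency sweep from f0 to f1 Hz (the emitted pulse)."""
    t = np.arange(n_samples) / sample_rate
    sweep_rate = (f1 - f0) / (n_samples / sample_rate)
    return np.sin(2 * np.pi * (f0 * t + 0.5 * sweep_rate * t ** 2))


def simulate_echo(chirp, delay, gain, frame_len):
    """Place one attenuated copy of the pulse `delay` samples into a silent frame."""
    frame = np.zeros(frame_len)
    frame[delay:delay + len(chirp)] += gain * chirp
    return frame


def echo_profile(received, chirp):
    """Cross-correlate the microphone frame with the pulse.

    Index i of the result measures how strongly the pulse appears
    at a delay of i samples.
    """
    return np.correlate(received, chirp, mode="valid")


# Hypothetical near-ultrasonic pulse; the real device's band is not given here.
chirp = make_chirp(256, 16_000.0, 20_000.0)

# Two made-up facial states: the reflecting surface sits at slightly
# different distances, so the echo arrives at different delays.
neutral = simulate_echo(chirp, delay=40, gain=0.8, frame_len=1024)
smiling = simulate_echo(chirp, delay=46, gain=0.6, frame_len=1024)

peak_neutral = int(np.argmax(np.abs(echo_profile(neutral, chirp))))
peak_smiling = int(np.argmax(np.abs(echo_profile(smiling, chirp))))

print(peak_neutral, peak_smiling)
```

In this toy setup the correlation peak moves when the simulated echo delay changes. A shift of that kind in the echo is the sort of signal a learned model could map to facial movement.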
Facial tracking is typically done with camera-based systems, including in a previous effort by Zhang’s team. This is not ideal, as such systems are “large, heavy, and energy-hungry, which is a big issue for wearables. Also importantly, they capture a lot of private information,” said Zhang, principal investigator of the Smart Computer Interfaces for Future Interactions (SciFi) Lab. The acoustic route has other advantages as well: the information gathered can be sent to a phone via Bluetooth, which adds a further measure of privacy. Camera-based systems require a Wi-Fi connection to send the images they gather, which can be a security concern. They also use far more energy; the team’s previous camera-based effort consumed 25 times more energy than EarIO.
Field tests of EarIO so far have shown that the device maintains a satisfactory degree of accuracy, even in wind, road noise, and background chatter. The team hopes to improve on this further. As co-author Ruidong Zhang, an information science doctoral student, put it: “The acoustic sensing method that we use is very sensitive. It’s good, because it’s able to track very subtle movements, but it’s also bad because when something changes in the environment, or when your head moves slightly, we also capture that.”