Microsoft VASA-1

For those who wish to participate in video calls without being on camera, Microsoft has already added customizable 3D avatars to Teams. The company isn’t stopping there, however. VASA-1 is a newly announced feature in development at Microsoft that uses an AI model to take a single photo of a person, along with a clip of their speech, and generate a realistic talking avatar that resembles the user. “Our premiere model, VASA-1, is capable of not only producing lip movements that are exquisitely synchronized with the audio, but also capturing a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness,” says Microsoft. “Our method not only delivers high video quality with realistic facial and head dynamics but also supports the online generation of 512×512 videos at up to 40 FPS with negligible starting latency. It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors.” Examples of VASA-1 in motion can be viewed on Microsoft’s post. Note that none of the “people” used as examples are real; they are “non-existing identities generated by StyleGAN2 or DALL·E-3,” which is jarring in and of itself.
