NVIDIA has revealed its new generative AI audio model, Fugatto, which it claims can create “sounds never heard before” from simple text inputs. The announcement video shows examples that echo the showcases given by other models, such as upbeat scores or alien-like soundscapes suited to film. However, a text input like “create a saxophone howling, barking then electronic music with dogs barking” yields results that showcase the range of creative freedom users have when experimenting with Fugatto.
Examples that generate a human voice are also shown, from operatic scat-singing to changing the tone and emphasis of spoken words; Fugatto can even isolate the vocals from a piece of music. NVIDIA has also released a research paper alongside the announcement, detailing its intention to create a model that “reveal[s] meaningful relationships between audio and language.” The paper delves into how the model was built and lists the extensive datasets Fugatto was trained on.
Fugatto is not yet available for public testing, but NVIDIA has launched a website with more samples than are shown in the video above.
See also: NVIDIA ends Intel’s 25-year run on the Dow Jones