Fugatto: World’s Most Flexible Sound Machine

A team of generative AI researchers has created a Swiss Army knife for sound: Fugatto, a model that lets users control audio output using text. While some AI models can compose a song or modify a voice, none have the dexterity of the new offering.

# A Sound Grasp of Audio

The team wanted to create a model that understands and generates sound like humans do. Supporting numerous audio generation and transformation tasks, Fugatto is the first foundational generative AI model that showcases emergent properties – capabilities that arise from the interaction of its various trained abilities – and the ability to combine free-form instructions.

# A Sample Playlist of Use Cases

Music producers could use Fugatto to quickly prototype or edit an idea for a song, trying out different styles, voices, and instruments. They could also add effects and enhance the overall audio quality of an existing track.

An ad agency could apply Fugatto to quickly adapt an existing campaign for multiple regions or situations, applying different accents and emotions to voiceovers.

Language learning tools could be personalized to use any voice a speaker chooses. Imagine an online course spoken in the voice of a family member or friend.

Video game developers could use the model to modify prerecorded assets in their title to fit the changing action as users play the game, or create new assets on the fly from text instructions and optional audio inputs.

# Making a Joyful Noise

One capability the researchers are especially proud of is what they call the “avocado chair”: producing sounds that have never been heard before. For instance, Fugatto can make a trumpet bark or a saxophone meow. Whatever users can describe, the model can create. With fine-tuning and small amounts of singing data, researchers found it could handle tasks it was not pretrained on, like generating a high-quality singing voice from a text prompt.

# Users Get Artistic Controls

Several capabilities add to Fugatto’s novelty. During inference, the model uses a technique called ComposableART to combine instructions that were only seen separately during training. For example, a combination of prompts could ask for text spoken with a sad feeling in a French accent. The model’s ability to interpolate between instructions gives users fine-grained control over text instructions, in this case, the heaviness of the accent or the degree of sorrow.
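
The article does not spell out how ComposableART blends instructions. One common way to get this kind of dial, assumed here purely for illustration, is a weighted form of classifier-free guidance: start from the model’s output with no instruction, then add each instruction’s contribution scaled by a user-chosen weight. The sketch below uses plain Python lists as toy stand-ins for model outputs; the function name `compose_guidance` and all numbers are hypothetical and are not Fugatto’s API.

```python
from typing import List, Sequence


def compose_guidance(
    unconditional: Sequence[float],
    conditionals: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Blend per-instruction outputs with user-chosen weights.

    Computes u + sum_k w_k * (c_k - u), a weighted extension of
    classifier-free guidance. Hypothetical helper, not Fugatto's API.
    """
    if len(conditionals) != len(weights):
        raise ValueError("need exactly one weight per instruction")
    combined = [float(u) for u in unconditional]
    for cond, weight in zip(conditionals, weights):
        if len(cond) != len(unconditional):
            raise ValueError("all outputs must have the same length")
        for i, (c, u) in enumerate(zip(cond, unconditional)):
            # Move toward this instruction's output by `weight`.
            combined[i] += weight * (c - u)
    return combined


# Toy 2-D stand-ins for model outputs (made-up numbers).
no_instruction = [0.0, 0.0]
sad_output = [1.0, 0.0]     # output when prompted for a sad feeling
french_output = [0.0, 2.0]  # output when prompted for a French accent

# Half-strength sorrow, full-strength accent.
blended = compose_guidance(
    no_instruction, [sad_output, french_output], [0.5, 1.0]
)
print(blended)
```

Under this assumption, raising or lowering a single weight changes only that instruction’s influence, which is the kind of control over accent heaviness or degree of sorrow described above.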

# Conclusion

Fugatto could change the way we create and interact with sound. It goes beyond conventional music generation and voice modification: a single model can generate or transform music, voices, and sounds from text and audio prompts, combine instructions it only saw separately during training, and produce sounds that have never been heard before.

# FAQs

Q: What is Fugatto?
A: Fugatto is a generative AI model that can generate or transform any mix of music, voices, and sounds described with prompts using any combination of text and audio files.

Q: What can Fugatto do?
A: Fugatto can create music snippets, add instruments to or remove them from an existing song, change the accent or emotion in a voice, and even let people produce sounds never heard before.

Q: Who is behind Fugatto?
A: A team of generative AI researchers at NVIDIA, including Rafael Valle and Rohan Badlani, among others.

Q: How was Fugatto trained?
A: Fugatto was trained on a bank of NVIDIA DGX systems packing 32 NVIDIA H100 Tensor Core GPUs. The team employed a multifaceted strategy to generate data and instructions that expanded the range of tasks the model could perform.
