Fugatto: World’s Most Flexible Sound Machine

A team of generative AI researchers created a Swiss Army knife for sound, allowing users to control audio output using text. While some AI models can compose a song or modify a voice, none have the dexterity of the new offering.

# A Sound Grasp of Audio

The team wanted to create a model that understands and generates sound like humans do. Supporting numerous audio generation and transformation tasks, Fugatto is the first foundational generative AI model that showcases emergent properties – capabilities that arise from the interaction of its various trained abilities – and the ability to combine free-form instructions.

# A Sample Playlist of Use Cases

For example, music producers could use Fugatto to quickly prototype or edit an idea for a song, trying out different styles, voices, and instruments. They could also add effects and enhance the overall audio quality of an existing track.

An ad agency could apply Fugatto to quickly target an existing campaign for multiple regions or situations, applying different accents and emotions to voiceovers.

Language learning tools could be personalized to use any voice a speaker chooses. Imagine an online course spoken in the voice of any family member or friend.

Video game developers could use the model to modify prerecorded assets in their title to fit the changing action as users play the game. Or, they could create new assets on the fly from text instructions and optional audio inputs.

# Making a Joyful Noise

One capability the team is especially proud of is what it calls the “avocado chair”: combining concepts that do not normally go together to produce sounds never heard before. For instance, Fugatto can make a trumpet bark or a saxophone meow. Whatever users can describe, the model can create. With fine-tuning and small amounts of singing data, researchers found it could also handle tasks it was not pretrained on, like generating a high-quality singing voice from a text prompt.

# Users Get Artistic Controls

Several capabilities add to Fugatto’s novelty. During inference, the model uses a technique called ComposableART to combine instructions that were only seen separately during training. For example, a combination of prompts could ask for text spoken with a sad feeling in a French accent. The model’s ability to interpolate between instructions gives users fine-grained control over how strongly each instruction is applied, in this case the heaviness of the accent or the degree of sorrow.
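To make the idea of combining and interpolating instructions more concrete, here is a toy sketch in Python. It is not Fugatto's code or API, and the article does not describe how ComposableART is implemented. The sketch assumes the combination behaves like a weighted blend of per-instruction predictions around an unconditional baseline, a pattern familiar from classifier-free guidance. The function name `compose_guidance`, the instruction labels, and the three-number vectors are all hypothetical stand-ins.

```python
from typing import Dict, List

Vector = List[float]


def compose_guidance(
    unconditional: Vector,
    conditional: Dict[str, Vector],
    weights: Dict[str, float],
) -> Vector:
    """Blend per-instruction predictions around an unconditional baseline.

    Assumption (not from the article): each instruction pushes the output
    away from the baseline, and its weight scales how hard it pushes.
    """
    combined = list(unconditional)
    for name, prediction in conditional.items():
        weight = weights.get(name, 0.0)  # instructions without a weight are ignored
        for i, (cond, base) in enumerate(zip(prediction, unconditional)):
            combined[i] += weight * (cond - base)
    return combined


# Hypothetical stand-ins for what a model might predict per instruction.
baseline = [0.0, 0.0, 0.0]
predictions = {
    "sad": [1.0, 0.0, 0.0],
    "french_accent": [0.0, 2.0, 0.0],
}

# Same two instructions, different degrees of sorrow.
mild = compose_guidance(baseline, predictions, {"sad": 0.25, "french_accent": 1.0})
strong = compose_guidance(baseline, predictions, {"sad": 1.0, "french_accent": 1.0})
print(mild)
print(strong)
```

In this sketch, changing one weight shifts only that instruction's contribution, which mirrors the kind of control described above: a heavier or lighter accent, more or less sorrow.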

# Conclusion

Fugatto has the potential to change how people create and interact with sound. Its capabilities go beyond traditional music generation and voice modification: from text and audio prompts, users can generate, transform, and combine music, voices, and sounds, including ones never heard before.

# FAQs

Q: What is Fugatto?
A: Fugatto is a generative AI model that can generate or transform any mix of music, voices, and sounds described with prompts using any combination of text and audio files.

Q: What can Fugatto do?
A: Fugatto can create music snippets, remove or add instruments from an existing song, change the accent or emotion in a voice, and even let people produce sounds never heard before.

Q: Who is behind Fugatto?
A: A team of generative AI researchers at NVIDIA, including Rafael Valle and Rohan Badlani, among others.

Q: How was Fugatto trained?
A: Fugatto was trained on a bank of NVIDIA DGX systems packing 32 NVIDIA H100 Tensor Core GPUs. The team employed a multifaceted strategy to generate data and instructions that expanded the range of tasks the model could perform.
