Sesame Releases Base AI Model

Sesame Releases AI Voice Assistant Model for Commercial Use

Introduction
Sesame, an AI company, has released the base model that powers Maya, its impressively realistic voice assistant. The model, called CSM-1B, has 1 billion parameters and is released under an Apache 2.0 license, which permits commercial use with few restrictions.

How it Works
CSM-1B generates "RVQ audio codes" from text and audio inputs. RVQ stands for residual vector quantization, a technique for encoding audio into discrete tokens called codes. It is used in various AI audio technologies, including Google’s SoundStream and Meta’s EnCodec. The model is built on a fine-tuned variant of a model from Meta’s Llama family, paired with an audio "decoder" component.
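To make the idea of residual vector quantization more concrete, here is a minimal toy sketch in Python. It is not Sesame's implementation: the function names (`nearest`, `rvq_encode`, `rvq_decode`) are hypothetical, and the two tiny codebooks are hand-picked for illustration, whereas real audio codecs learn much larger codebooks from data. The sketch only shows the core mechanism: each stage picks the closest codebook entry, subtracts it, and passes the leftover (the residual) to the next stage, so a vector ends up represented by a short list of integer codes.

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    def dist(entry):
        return sum((e - v) ** 2 for e, v in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage encodes the previous stage's residual."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry; what remains is handled by the next stage.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruct an approximation by summing the chosen entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hand-picked toy codebooks (assumption: real codebooks are learned, not written by hand).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],  # stage 1: coarse approximation
    [[0.0, 0.0], [0.5, 0.0]],  # stage 2: refines what stage 1 missed
]

frame = [1.4, 1.0]                      # stand-in for one frame of audio features
codes = rvq_encode(frame, codebooks)    # [1, 1]
approx = rvq_decode(codes, codebooks)   # [1.5, 1.0]
print(codes, approx)
```

In this toy setting the frame is stored as two small integers rather than two floats. Adding more stages would generally bring the reconstruction closer to the original at the cost of more codes per frame.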

Open-Source Model
The model is open source and can produce a variety of voices. However, it has not been fine-tuned on any specific voice and may perform poorly in languages other than English. Sesame has not disclosed the data used to train the model.

Safeguards
The model has no real safeguards to prevent misuse. Sesame relies on an honor system, urging developers and users not to use the model to mimic a person’s voice without their consent, create misleading content, or engage in harmful or malicious activities.

Demo and Concerns
The author of the article tried the demo on Hugging Face and found it easy to generate speech, including on controversial topics. This raises concerns about the potential for misuse and fraud. Consumer Reports recently warned that many popular AI-powered voice cloning tools on the market do not have meaningful safeguards to prevent abuse.

Company Background
Sesame was co-founded by Brendan Iribe, co-creator of Oculus, and has raised an undisclosed amount of capital from Andreessen Horowitz, Spark Capital, and Matrix Partners. The company is prototyping AI glasses designed to be worn all day, equipped with its custom models.

Conclusion
The release of CSM-1B is significant because it makes the base model behind a realistic voice assistant available for commercial use. However, the lack of safeguards and the resulting potential for misuse are concerns that still need to be addressed.

FAQs

Q: What is CSM-1B?
A: CSM-1B is a 1-billion-parameter model that generates "RVQ audio codes" from text and audio inputs.

Q: What is RVQ?
A: RVQ is a technique for encoding audio into discrete tokens called codes.

Q: How does CSM-1B work?
A: CSM-1B uses a fine-tuned variant of Meta’s Llama family and an audio "decoder" component to generate RVQ audio codes.

Q: Is the model open-source?
A: Yes, the model is open-source and available for commercial use under an Apache 2.0 license.

Q: What safeguards does the model have?
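A: The model has no real safeguards to prevent misuse. Sesame instead relies on an honor system, urging developers and users not to mimic a person’s voice without their consent, create misleading content, or engage in harmful or malicious activities.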
