Sesame Releases Base AI Model

Sesame Releases AI Voice Assistant Model for Commercial Use

Introduction
Sesame, the AI company behind the impressively realistic voice assistant Maya, has released the base model that powers it. The model, called CSM-1B, has 1 billion parameters and is released under an Apache 2.0 license, which permits commercial use with few restrictions.

How It Works
CSM-1B generates "RVQ audio codes" from text and audio inputs. RVQ stands for residual vector quantization, a technique for encoding audio into discrete tokens called codes. It is used in various AI audio technologies, including Google’s SoundStream and Meta’s Encodec. The model is built on a fine-tuned variant of a model from Meta’s Llama family, paired with an audio "decoder" component.
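To make the idea of residual vector quantization more concrete, here is a minimal sketch in plain Python. It is not Sesame's code and does not reflect how CSM-1B, SoundStream, or Encodec are implemented; the function names (nearest, rvq_encode, rvq_decode) and the tiny hand-written codebooks are assumptions chosen for illustration. Real codecs learn their codebooks from audio data. The sketch only shows the core loop: each stage picks the closest codebook entry, subtracts it, and passes the leftover (the residual) to the next stage, so a vector ends up represented by one integer code per stage.

```python
def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, entry in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(entry, vec))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage encodes the residual left by the previous one."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next stage only sees what is still unexplained.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruct an approximation by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical two-stage codebooks: a coarse stage followed by a finer one.
# Real systems learn these from data and use far more entries and dimensions.
toy_codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],   # stage 1: coarse
    [[0.0, 0.0], [0.5, 0.0]],   # stage 2: refines the stage-1 residual
]

frame = [1.4, 1.0]                      # stand-in for one frame of audio features
codes = rvq_encode(frame, toy_codebooks)
approx = rvq_decode(codes, toy_codebooks)
print("codes:", codes)
print("reconstruction:", approx)
```

The list of integers returned by rvq_encode is the analogue of the "codes" mentioned above: a compact, discrete representation that a separate decoder can turn back into an approximation of the original signal.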

Open-Source Model
The model is open source and can be used to produce a variety of voices. However, the released base model has not been fine-tuned on any specific voice, and it may perform poorly in languages other than English. Sesame did not disclose the data used to train the model.

Safeguards
The model has no real safeguards to prevent misuse. Sesame relies on an honor system, urging developers and users not to use the model to mimic a person’s voice without their consent, create misleading content, or engage in harmful or malicious activities.

Demo and Concerns
The author of the article tried the demo on Hugging Face and found it easy to generate speech, including on controversial topics. This raises concerns about the potential for misuse and fraud. Consumer Reports recently warned that many popular AI-powered voice cloning tools on the market do not have meaningful safeguards to prevent abuse.

Company Background
Sesame was co-founded by Brendan Iribe, co-creator of Oculus, and has raised an undisclosed amount of capital from Andreessen Horowitz, Spark Capital, and Matrix Partners. The company is prototyping AI glasses designed to be worn all day, equipped with its custom models.

Conclusion
The release of CSM-1B is significant because it makes the base model behind a realistic voice assistant available for commercial use. However, the lack of safeguards and the potential for misuse are concerns that still need to be addressed.

FAQs

Q: What is CSM-1B?
A: CSM-1B is a 1-billion-parameter model that generates "RVQ audio codes" from text and audio inputs.

Q: What is RVQ?
A: RVQ (residual vector quantization) is a technique for encoding audio into discrete tokens called codes.

Q: How does CSM-1B work?
A: CSM-1B uses a fine-tuned variant of a model from Meta’s Llama family, paired with an audio "decoder" component, to generate RVQ audio codes.

Q: Is the model open-source?
A: Yes, the model is open source and available for commercial use under an Apache 2.0 license.

Q: What safeguards does the model have?
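A: The model has no real safeguards to prevent misuse. Sesame relies on an honor system, urging developers and users not to mimic a person’s voice without consent, create misleading content, or engage in harmful activities.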
