Text-to-Speech AI Models Get a Boost with Hume’s Octave
Octave: A Text-to-Speech Large Language Model with Contextual Awareness
On Wednesday, Hume launched Octave, a text-to-speech large language model (LLM) with contextual awareness. According to the company, the model uses this awareness to adjust the tone, rhythm, and timbre of its speech to match the meaning of the words it is reading. For example, an AI-enabled voice can convey a sense of disgust when the sentence it is reading calls for it.
Taking Directions
Beyond understanding the context of the text, the model can also take directions. Users can instruct it to be "calm", "whispering", "disgustful", "angry", and more. Hume says Octave's advantage over a voice actor is that it can take on any voice, or even invent a new one based on a user's description.
Testing the Model
The user interface is easy to navigate, with one text box for Voice, in which you can describe exactly what you want the voice to sound like, and another for Script, in which you enter what you want the model to say. For my first test, I used the detailed pre-made prompts to see how it sounded.
Impressive Results
After I clicked "Generate", Octave produced three voice results, and on first listen I was impressed. Although I wasn’t convinced that the generations captured the "valley girl" sound, the intonations and inflections stood out.
Conclusion
Overall, the model’s strength seems to be reproducing the nuances of human speech in its output. What often gives AI voices away is their monotony, which makes the output boring to listen to. With Octave, you can hear the reader’s emotions, whether frustration, defeat, or tiredness. Words like "ugh" have the length and breathing a human would give them, creating an engaging experience.
How to Access
There are several tiers for accessing the model. If you want to try it out, the free tier comes with a 10,000-character limit (around 10 minutes) and unlimited character voices. Beyond the free tier, there are six additional tiers, ranging from $3 to $900 per month, depending on access needs.
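To put the character limit in perspective, here is a minimal Python sketch that estimates how long a script would run as audio. It assumes the ratio implied by the free tier's figures, roughly 1,000 characters per minute; actual length will vary with the voice and pacing. The function names are made up for illustration and are not part of any Hume tool or API.

```python
# Assumption: the free tier's "10,000 characters is around 10 minutes" ratio,
# i.e. roughly 1,000 characters per minute of audio. Real output will vary.
CHARS_PER_MINUTE = 10_000 / 10


def estimate_minutes(script: str) -> float:
    """Rough audio length in minutes for a script, based on character count."""
    return len(script) / CHARS_PER_MINUTE


def fits_free_tier(script: str, limit: int = 10_000) -> bool:
    """True if the script stays within the free tier's character limit."""
    return len(script) <= limit


# Hypothetical sample script, repeated to give it some length.
script = "Ugh, I can't believe it's Monday again. " * 50
print(f"{len(script)} characters, about {estimate_minutes(script):.1f} minutes")
print("Fits free tier:", fits_free_tier(script))
```

Counting characters this way includes spaces and punctuation, so treat the result as a ballpark rather than a billing figure.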
Frequently Asked Questions
Q: What is the difference between Octave and a voice actor?
A: According to Hume, Octave's advantage over a human voice actor is that it can take on any voice or even invent a new one based on a user's description.
Q: What is the character limit for the free tier?
A: The free tier has a 10,000-character limit (around 10 minutes).
Q: How much does the Business tier cost?
A: The Business tier costs $900 per month for 10,000,000 characters (around 10,000 minutes).

