Google’s Plan to Combine AI Models for a Universal Digital Assistant
DeepMind CEO’s Recent Podcast Appearance
In a recent appearance on Possible, a podcast co-hosted by LinkedIn co-founder Reid Hoffman, Google DeepMind CEO Demis Hassabis said Google plans to eventually combine its Gemini AI models with its Veo video-generating models to improve the former’s understanding of the physical world.
A Vision for a Universal Digital Assistant
“We’ve always built Gemini, our foundation model, to be multimodal from the beginning,” Hassabis said, “and the reason we did that [is because] we have a vision for this idea of a universal digital assistant, an assistant that … actually helps you in the real world.”
The Rise of Omni Models
AI Industry Trends
The AI industry is moving gradually toward “omni” models: models that can understand and synthesize many forms of media. Google’s newest Gemini models can generate audio as well as images and text, while OpenAI’s default model in ChatGPT can natively create images, including Studio Ghibli-style art. Amazon has also announced plans to launch an “any-to-any” model later this year.
Training Data for Omni Models
These omni models require large amounts of training data, including images, video, audio, and text. Hassabis implied that the video data for Veo is coming mostly from YouTube, a platform that Google owns.
YouTube as a Source of Training Data
“Basically, by watching YouTube videos — a lot of YouTube videos — [Veo 2] can figure out, you know, the physics of the world,” Hassabis said.
Conclusion
Google’s plan to combine its Gemini models with its Veo video-generating models is meant to move it closer to the universal digital assistant Hassabis described. By training on video, which Hassabis implied comes mostly from YouTube, Google aims to improve its models’ understanding of the physical world. The move fits the wider industry shift toward omni models that handle many forms of media.
FAQs
Q: What are Gemini AI models?
A: Gemini is Google’s family of multimodal foundation models. The newest Gemini models can generate audio as well as images and text.
Q: What is Veo?
A: Veo is a video-generating model developed by Google.
Q: Where does the training data for Veo come from?
A: Hassabis implied that the video data for Veo comes mostly from YouTube, a platform that Google owns.
Q: What are omni models?
A: Omni models are AI models that can understand and synthesize many forms of media, including images, videos, audio, and text.

