Google Expands AI Capabilities with Text-to-Video Generation
New Model, Veo, Now Available in Private Preview on Vertex AI
Although Google was late to the image generation space, its Imagen models have proven highly competitive, even powering ZDNET’s overall top pick for best image generator. Now the company is expanding into text-to-video generation and making its Veo model available to Google Cloud customers.
What is Veo?
Veo is Google’s most advanced video generation model, capable of creating realistic videos at 24 or 30 fps that adhere to a user’s prompt. The examples Google provided look impressive and keep motion consistent from frame to frame, which is a major challenge for video generators.
How Does Veo Work?
Along with text prompts, the model can also use reference images to create videos that bring pictures to life while remaining consistent in style. Google demonstrated this with two example clips.
What’s New in Vertex AI?
Vertex AI users will also have access to Imagen 3, the company’s most advanced text-to-image generator, which now has a customization feature that enables users to include a reference image, making it easier to create brand assets.
New Features in Imagen 3
Imagen 3 also adds new editing features that make it easier for users to refine generated images: inpainting, which modifies selected parts of an image, and outpainting, which expands the image beyond its original borders.
Potential Use Cases
According to Google, potential customer use cases include generating images or videos for marketing and advertising, such as social media content and assets for blogs and events, and even creating film clips.
Getting Started with Vertex AI
To get started with Vertex AI, visit the Vertex AI webpage, which offers educational materials including tutorials, a glossary, and tips. You can also start a free trial or contact the sales team for more information.
FAQs
Q: What is Veo?
A: Veo is Google’s most advanced video generation model, capable of creating realistic videos at 24 or 30 fps that adhere to a user’s prompt.
Q: How does Veo work?
A: Veo uses text prompts and reference images to create videos that bring pictures to life, remaining consistent in style.
Q: What is Imagen 3?
A: Imagen 3 is Google’s most advanced text-to-image generator, which now has a customization feature that enables users to include a reference image.
Q: What are the potential use cases for Veo and Imagen 3?
A: Potential customer use cases include generating images or videos for marketing and advertising, creating film clips, and more.

