Google’s Veo 2 video generator takes on Sora Turbo

Google Unveils Veo 2: A State-of-the-Art Text-to-Video Generator

Improving on Real-World Physics

Google has launched Veo 2, a text-to-video generator that improves on its predecessor, most notably with a better understanding of real-world physics. This enables the AI to produce videos with more detail and realism. Veo 2 can generate videos at up to 4K resolution and tackles common video-generator problems, including hallucinations such as extra fingers.

Outperforming Competitors

When evaluated by human raters against other leading video models, including Sora Turbo, Kling v1.5, and Meta Movie Gen, Veo 2 was voted best on overall performance and prompt adherence.

Understanding Cinematography Language

Veo 2 also understands cinematography language, such as a specific genre, lens, or angle. For example, if a user says "shallow depth of field," Veo 2 knows to blur out the subject’s background to produce the effect. One of Google’s demo videos was generated from a prompt that specified, "Shot with a 35mm lens on Kodak Portra 400 film."

Availability and Accessibility

The model can be accessed through VideoFX in Google Labs, but access requires joining an early access waitlist. The waitlist form asks for basic information such as age, name, place of residence, relevant work, and how you heard about it. Submissions are reviewed on a rolling basis.

Improvements to Imagen 3

Google also shared that it has improved its Imagen 3 image-generation model to generate "brighter and better composed" images. The improved model can render a wider range of styles and produces images with higher prompt fidelity and richer details and textures.

Availability of Imagen 3

This version of Imagen 3 is rolling out to the public via ImageFX in Google Labs starting today, and unlike VideoFX, it does not require a waitlist. The previous version of Imagen 3 was already very capable, ranking as the best AI image generator in ZDNET’s 2024 roundup.

Introducing Whisk

Lastly, Google unveiled Whisk, a new experiment that is also available in Labs. This tool allows users to create an image, or input their own, and transform it into a new image in the style of a plushie, pin, or sticker. It combines Gemini and Imagen 3: Gemini writes a detailed caption of your image, and that caption is fed into Imagen 3 to create the final product.

Conclusion

Google’s latest advancements in AI-generated content, including Veo 2, Imagen 3, and Whisk, demonstrate the company’s commitment to pushing the boundaries of machine learning and computer vision. These tools have the potential to revolutionize the way we create and interact with visual content.

Frequently Asked Questions

Q: What is Veo 2?
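A: Veo 2 is Google’s text-to-video generator. Compared with its predecessor, it has a better understanding of real-world physics and can produce more detailed, realistic videos at up to 4K resolution.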

Q: How does Veo 2 work?
A: Veo 2 generates videos from text prompts. It also interprets cinematography language in the prompt, such as a specific genre, lens, or angle.

Q: Is Veo 2 available to the public?
A: Veo 2 is available in VideoFX in Google Labs, but access requires joining an early access waitlist. Submissions are reviewed on a rolling basis.

Q: How do I access Imagen 3?
A: Imagen 3 is available in ImageFX in Google Labs, and it does not require a waitlist.
