Multimodal output opens up new possibilities
Having true multimodal output opens up interesting new possibilities in chatbots. For example, Gemini 2.0 Flash can play interactive graphical games or generate stories with consistent illustrations, maintaining character and setting continuity across multiple images. It's far from perfect, but character consistency is a new capability in AI assistants. We tried it out and it was pretty wild, especially when it generated a view of a photo we provided from another angle.
Creating a multi-image story with Gemini 2.0 Flash
Creating a multi-image story with Gemini 2.0 Flash, part 1.
Creating a multi-image story with Gemini 2.0 Flash, part 2. Notice the alternative angle of the original photo.
Creating a multi-image story with Gemini 2.0 Flash, part 3.
Text rendering represents another potential strength of the model
Google claims that internal benchmarks show Gemini 2.0 Flash performs better than "leading competitive models" when generating images containing text, making it potentially suitable for creating content with integrated text. In our experience, the results weren't that exciting, but they were legible.

An example of in-image text rendering generated with Gemini 2.0 Flash.
Conclusion
Despite Gemini 2.0 Flash’s shortcomings so far, the emergence of true multimodal image output feels like a notable moment in AI history because of what it suggests if the technology continues to improve. If you imagine a future, say 10 years from now, where a sufficiently complex AI model could generate any type of media in real time—text, images, audio, video, 3D graphics, 3D-printed physical objects, and interactive experiences—you basically have a holodeck, but without the matter replication.
Frequently Asked Questions
Q: What is Gemini 2.0 Flash?
A: Gemini 2.0 Flash is a Google AI model with true multimodal output: it can generate images natively alongside text within a conversation.
Q: What are the potential uses of Gemini 2.0 Flash?
A: Potential uses include generating illustrated stories with consistent characters and settings, playing interactive graphical games, and creating images that contain legible text.
Q: Is Gemini 2.0 Flash perfect?
A: No. Its character consistency across images is far from perfect, and in our testing its in-image text was legible but unremarkable.
Q: What is the future of Gemini 2.0 Flash?
A: That is uncertain, but if the technology continues to improve, models like it could become powerful tools for generating multimodal content.

