Google DeepMind’s new AI models help robots perform physical tasks, even without training

Introducing Gemini Robotics: A Vision-Language-Action Model

Google DeepMind has launched two new AI models designed to empower robots to perform a wider range of real-world tasks than ever before. The first model, called Gemini Robotics, is a vision-language-action model capable of understanding new situations, even if it hasn’t been trained on them.

How Gemini Robotics Works

Gemini Robotics is built on Gemini 2.0, the latest version of Google’s flagship AI model. According to Carolina Parada, senior director and head of robotics at Google DeepMind, Gemini Robotics "draws from Gemini’s multimodal world understanding and transfers it to the real world by adding physical actions as a new modality."

Advancements in Three Key Areas

The new model makes significant advancements in three key areas that Google DeepMind believes are essential to building helpful robots: generality, interactivity, and dexterity. In addition to its ability to generalize to new scenarios, Gemini Robotics is better at interacting with people and their environment. It’s also capable of performing more precise physical tasks, such as folding a piece of paper or removing a bottle cap.

Gemini Robotics-ER: An Advanced Visual Language Model

Google DeepMind is also launching Gemini Robotics-ER (for embodied reasoning), an advanced visual language model that can "understand our complex and dynamic world." As Parada explains, the model is designed to let roboticists connect it to existing low-level controllers, unlocking new capabilities powered by Gemini Robotics-ER.

Safety Considerations

In terms of safety, Google DeepMind is developing a "layered approach" to ensure the safe operation of its AI models. According to researcher Vikas Sindhwani, the company is training Gemini Robotics-ER models to evaluate whether a potential action is safe to perform in a given scenario. The company is also releasing new benchmarks and frameworks to help further safety research in the AI industry.

Conclusion

Google DeepMind’s latest AI models have the potential to revolutionize the field of robotics. With Gemini Robotics and Gemini Robotics-ER, the company is pushing the boundaries of what is possible with AI-powered robots. As Parada notes, "We’re very focused on building the intelligence that is going to be able to understand the physical world and be able to act on that physical world. We’re very excited to basically leverage this across multiple embodiments and many applications for us."

Frequently Asked Questions

Q: What is Gemini Robotics?
A: Gemini Robotics is a vision-language-action model that can understand new situations, even if it hasn’t been trained on them.

Q: How does Gemini Robotics work?
A: Gemini Robotics is built on Gemini 2.0, the latest version of Google’s flagship AI model, and adds physical actions as a new modality.

Q: What are the key advancements in Gemini Robotics?
A: Gemini Robotics makes significant advancements in generality, interactivity, and dexterity, enabling robots to perform a wider range of real-world tasks.

Q: What is Gemini Robotics-ER?
A: Gemini Robotics-ER is an advanced visual language model that can "understand our complex and dynamic world" and enable roboticists to connect with existing low-level controllers.
