Google Unveils New AI Models for Controlling Robots with Delicate Precision
Introducing Gemini Robotics and Gemini Robotics-ER
On Wednesday, Google DeepMind announced two new AI models designed to control robots: Gemini Robotics and Gemini Robotics-ER. These models aim to help robots of all shapes and sizes understand and interact with the physical world more effectively and delicately, paving the way for applications such as humanoid robot assistants.
What is Embodied AI?
Creating a capable AI model that can pilot robots autonomously through novel scenarios with safety and precision has proven elusive. The industry calls this "embodied AI," a stated moonshot goal of companies like Nvidia, and it remains a holy grail that could one day turn robots into general-purpose laborers in the physical world.
How Do the New AI Models Work?
Google’s new models build upon its Gemini 2.0 large language model foundation, adding capabilities specifically for robotic applications. Gemini Robotics includes what Google calls "vision-language-action" (VLA) abilities, allowing it to process visual information, understand language commands, and generate physical movements. Gemini Robotics-ER, on the other hand, focuses on "embodied reasoning" with enhanced spatial understanding, allowing roboticists to connect it to their existing robot control systems.
Real-World Applications
For example, with Gemini Robotics, you can ask a robot to "pick up the banana and put it in the basket," and it will use a camera view of the scene to recognize the banana, guiding a robotic arm to perform the action successfully. Or you might say, "fold an origami fox," and it will use its knowledge of origami and how to fold paper carefully to perform the task.
Enhanced Generalization
According to DeepMind, the new Gemini Robotics system demonstrates much stronger generalization, or the ability to perform novel tasks that it was not specifically trained to do, compared to its previous AI models. This is crucial because robots that can adapt to new scenarios without specific training could one day work in unpredictable real-world environments.
Safety and Limitations
On the safety front, Google describes a "layered, holistic approach" that retains traditional robot safety measures such as collision avoidance and force limits. The company also describes developing a "Robot Constitution" framework inspired by Isaac Asimov's Three Laws of Robotics and releasing a dataset called "ASIMOV" to help researchers evaluate the safety implications of robotic actions.
Conclusion
Google's new AI models have the potential to reshape the robotics industry by letting robots carry out complex physical tasks and adapt to scenarios they were never explicitly trained on. That combination could lead to far more capable and useful robots. However, it remains to be seen how these systems will perform in unpredictable real-world settings.
Frequently Asked Questions
Q: What are the new AI models for?
A: Gemini Robotics and Gemini Robotics-ER are designed to control robots with delicate precision.
Q: What are the key features of the new AI models?
A: Gemini Robotics includes vision-language-action (VLA) abilities, while Gemini Robotics-ER focuses on embodied reasoning with enhanced spatial understanding.
Q: What are the potential applications of these AI models?
A: These models could lead to the development of humanoid robot assistants and other robots that can perform complex physical tasks.
Q: Are the new AI models safe?
A: Google mentions a "layered, holistic approach" to safety, which includes traditional robot safety measures and a "Robot Constitution" framework inspired by Isaac Asimov’s Three Laws of Robotics.

