Google Unveils New AI Models for Controlling Robots with Delicate Precision
Introducing Gemini Robotics and Gemini Robotics-ER
On Wednesday, Google DeepMind announced two new AI models designed to control robots: Gemini Robotics and Gemini Robotics-ER. These models aim to help robots of all shapes and sizes understand and interact with the physical world more effectively and delicately, paving the way for applications such as humanoid robot assistants.
What is Embodied AI?
Creating a capable AI model that can pilot robots autonomously through novel scenarios with safety and precision has proven elusive. The industry calls this "embodied AI," a stated moonshot goal of companies like Nvidia, and it remains a holy grail that could one day turn robots into general-purpose laborers in the physical world.
How Do the New AI Models Work?
Google’s new models build upon its Gemini 2.0 large language model foundation, adding capabilities specifically for robotic applications. Gemini Robotics includes what Google calls "vision-language-action" (VLA) abilities, allowing it to process visual information, understand language commands, and generate physical movements. Gemini Robotics-ER, on the other hand, focuses on "embodied reasoning" with enhanced spatial understanding, allowing roboticists to connect it to their existing robot control systems.
Real-World Applications
For example, with Gemini Robotics, you can ask a robot to "pick up the banana and put it in the basket," and it will use a camera view of the scene to recognize the banana, guiding a robotic arm to perform the action successfully. Or you might say, "fold an origami fox," and it will use its knowledge of origami and how to fold paper carefully to perform the task.
Enhanced Generalization
According to DeepMind, the new Gemini Robotics system demonstrates much stronger generalization, or the ability to perform novel tasks that it was not specifically trained to do, compared to its previous AI models. This is crucial because robots that can adapt to new scenarios without specific training could one day work in unpredictable real-world environments.
Safety and Limitations
On the safety front, Google describes a "layered, holistic approach" that retains traditional robot safety measures such as collision avoidance and force limits. The company also describes developing a "Robot Constitution" framework inspired by Isaac Asimov's Three Laws of Robotics and releasing a dataset called "ASIMOV" to help researchers evaluate the safety implications of robotic actions.
Conclusion
Google's new AI models have the potential to reshape the robotics industry by letting robots carry out complex physical tasks and adapt to scenarios they were never explicitly trained on. That combination could lead to far more capable and useful robots. However, it remains to be seen how these systems will perform in unpredictable real-world settings.
Frequently Asked Questions
Q: What are the new AI models for?
A: Gemini Robotics and Gemini Robotics-ER are designed to control robots with delicate precision.
Q: What are the key features of the new AI models?
A: Gemini Robotics includes vision-language-action (VLA) abilities, while Gemini Robotics-ER focuses on embodied reasoning with enhanced spatial understanding.
Q: What are the potential applications of these AI models?
A: These models could lead to the development of humanoid robot assistants and other robots that can perform complex physical tasks.
Q: Are the new AI models safe?
A: Google mentions a "layered, holistic approach" to safety, which includes traditional robot safety measures and a "Robot Constitution" framework inspired by Isaac Asimov’s Three Laws of Robotics.

