Breaking Down the Barriers in Robotics: Google DeepMind’s New AI Model
A New Era in Robotics
In science fiction, artificial intelligence often powers advanced, capable, and sometimes even homicidal robots. In reality, however, today’s best AI remains largely confined to the chat window.
A Breakthrough in Fusion of Language, Vision, and Physical Action
Google DeepMind has announced a new version of its Gemini AI model that combines language, vision, and physical action to power a range of more capable, adaptive, and potentially useful robots.
Demonstrations and Capabilities
In a series of demonstration videos, Google DeepMind showcased several robots equipped with the new model, called Gemini Robotics, manipulating items in response to spoken commands. The robots were able to fold paper, hand over vegetables, put a pair of glasses into a case, and complete other tasks. The robots rely on the new model to connect visible items with possible actions to perform the tasks.
Gemini Robotics-ER: A New Model for Embodied Reasoning
Google DeepMind also announced a companion model, called Gemini Robotics-ER, which focuses on visual and spatial understanding. It is intended for other robotics researchers, who can use it to train their own models for controlling robots’ actions.
Controlling Robots with Gemini Robotics-ER
In a video demonstration, Google DeepMind’s researchers used the model to control Apollo, a humanoid robot from the startup Apptronik. The robot conversed with a human and moved letters around a tabletop when instructed to.
The Power of Generalized Understanding
"We’ve been able to bring the world-understanding—the general-concept understanding—of Gemini 2.0 to robotics," said Kanishka Rao, a robotics researcher at Google DeepMind who led the work. "Once the robot model has general-concept understanding, it becomes much more general and useful."
Breaking Down the Barriers
The new model can successfully control different robots across hundreds of specific scenarios that were not included in their training, demonstrating a notable degree of generalization.
Conclusion
Google DeepMind’s new AI model, Gemini, could change the landscape of robotics. By fusing language, vision, and physical action, it can control robots in a more natural and intuitive way. With its generalized understanding, Gemini may overcome the limitations of current AI technology and usher in a new era in robotics.
FAQs
Q: What is the main advantage of Google DeepMind’s new AI model, Gemini?
A: The main advantage is its ability to fuse language, vision, and physical action, allowing it to control robots in a more natural and intuitive way.
Q: What is the purpose of Gemini Robotics-ER?
A: Gemini Robotics-ER is a version of the model that focuses on visual and spatial understanding, designed for other robot researchers to use and train their own models for controlling robots’ actions.
Q: What is the potential impact of Google DeepMind’s new AI model on the field of robotics?
A: The potential impact is significant: by overcoming the limitations of current AI technology, the model could enable far more capable and adaptive robots, transforming the field.