Google DeepMind’s new AI models help robots perform physical tasks, even without training

Introducing Gemini Robotics: A Vision-Language-Action Model

Google DeepMind has launched two new AI models designed to empower robots to perform a wider range of real-world tasks than ever before. The first model, called Gemini Robotics, is a vision-language-action model capable of understanding new situations, even if it hasn’t been trained on them.

How Gemini Robotics Works

Gemini Robotics is built on Gemini 2.0, the latest version of Google’s flagship AI model. According to Carolina Parada, senior director and head of robotics at Google DeepMind, Gemini Robotics "draws from Gemini’s multimodal world understanding and transfers it to the real world by adding physical actions as a new modality."

Advancements in Three Key Areas

The new model makes significant advancements in three key areas that Google DeepMind believes are essential to building helpful robots: generality, interactivity, and dexterity. In addition to its ability to generalize to new scenarios, Gemini Robotics is better at interacting with people and their environment. It’s also capable of performing more precise physical tasks, such as folding a piece of paper or removing a bottle cap.

Gemini Robotics-ER: An Advanced Visual Language Model

Google DeepMind is also launching Gemini Robotics-ER (for embodied reasoning), an advanced visual language model that can "understand our complex and dynamic world." As Parada explains, the model is designed to let roboticists connect it to existing low-level controllers, unlocking new capabilities powered by Gemini Robotics-ER.

Safety Considerations

In terms of safety, Google DeepMind is developing a "layered approach" to ensure the safe operation of its AI models. According to researcher Vikas Sindhwani, the company is training Gemini Robotics-ER models to evaluate whether a potential action is safe to perform in a given scenario. The company is also releasing new benchmarks and frameworks to help further safety research in the AI industry.

Conclusion

Google DeepMind’s latest AI models have the potential to revolutionize the field of robotics. With Gemini Robotics and Gemini Robotics-ER, the company is pushing the boundaries of what is possible with AI-powered robots. As Parada notes, "We’re very focused on building the intelligence that is going to be able to understand the physical world and be able to act on that physical world. We’re very excited to basically leverage this across multiple embodiments and many applications for us."

Frequently Asked Questions

Q: What is Gemini Robotics?
A: Gemini Robotics is a vision-language-action model that can understand new situations, even if it hasn’t been trained on them.

Q: How does Gemini Robotics work?
A: Gemini Robotics is built on Gemini 2.0, the latest version of Google’s flagship AI model, and adds physical actions as a new modality.

Q: What are the key advancements in Gemini Robotics?
A: Gemini Robotics makes significant advancements in generality, interactivity, and dexterity, enabling robots to perform a wider range of real-world tasks.

Q: What is Gemini Robotics-ER?
A: Gemini Robotics-ER is an advanced visual language model that can "understand our complex and dynamic world" and enable roboticists to connect with existing low-level controllers.
