Training General-Purpose Robots
In the classic cartoon “The Jetsons,” Rosie the robotic maid seamlessly switches from vacuuming the house to cooking dinner to taking out the trash. But in real life, training a general-purpose robot remains a major challenge.
Traditional Training Methods
Typically, engineers collect data that are specific to a certain robot and task, which they use to train the robot in a controlled environment. However, gathering these data is costly and time-consuming, and the robot will likely struggle to adapt to environments or tasks it hasn’t seen before.
New Approach to Training Robots
To train better general-purpose robots, MIT researchers developed a versatile technique that combines a huge amount of heterogeneous data from many sources into one system that can teach any robot a wide range of tasks.
Heterogeneous Pretrained Transformers (HPT)
Their method involves aligning data from varied domains, like simulations and real robots, and multiple modalities, including vision sensors and robotic arm position encoders, into a shared “language” that a generative AI model can process.
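To make the idea of a shared "language" concrete, here is a minimal sketch, not the researchers' implementation. It assumes each modality gets its own projection into tokens of a common width; the random matrices stand in for learned encoders. The names `TOKEN_DIM`, `make_projection`, and `to_tokens`, and all of the input sizes, are hypothetical choices made for illustration.

```python
import numpy as np

TOKEN_DIM = 16  # hypothetical shared token width, not a value from the article


def make_projection(input_dim, token_dim, rng):
    """Random linear map standing in for a learned, modality-specific encoder."""
    return rng.normal(scale=1.0 / np.sqrt(input_dim), size=(input_dim, token_dim))


def to_tokens(features, projection):
    """Project a (num_items, input_dim) array into (num_items, token_dim) tokens."""
    return features @ projection


rng = np.random.default_rng(0)

# Hypothetical inputs: 4 image-patch feature vectors of width 32,
# and one reading from a 7-joint arm position encoder.
vision_features = rng.normal(size=(4, 32))
proprio_reading = rng.normal(size=(1, 7))

vision_proj = make_projection(32, TOKEN_DIM, rng)
proprio_proj = make_projection(7, TOKEN_DIM, rng)

# Once projected, both modalities have the same width and can share one sequence.
tokens = np.concatenate(
    [to_tokens(vision_features, vision_proj), to_tokens(proprio_reading, proprio_proj)],
    axis=0,
)
print(tokens.shape)  # (5, 16)
```

The point of the sketch is only that inputs with different shapes (32-wide image features, a 7-wide joint reading) end up as rows of one array that a single model can consume.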
Architecture
The researchers developed a new architecture called Heterogeneous Pretrained Transformers (HPT) that unifies data from these varied modalities and domains. At the center of the architecture is a transformer, a type of machine-learning model, that processes vision and proprioception inputs.
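As a rough illustration of what a transformer does with such inputs, the sketch below runs one single-head self-attention step over a sequence that mixes vision and proprioception tokens. It is a generic textbook attention layer under assumed sizes, not the HPT code. The token counts, the width `dim`, and the random weights are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over (seq_len, dim) tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
dim = 8  # hypothetical token width

# Hypothetical token counts: 4 vision tokens and 2 proprioception tokens.
vision_tokens = rng.normal(size=(4, dim))
proprio_tokens = rng.normal(size=(2, dim))
sequence = np.concatenate([vision_tokens, proprio_tokens], axis=0)

# Random weights standing in for learned query, key, and value projections.
w_q, w_k, w_v = (rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))

mixed, attn = self_attention(sequence, w_q, w_k, w_v)
print(mixed.shape)  # (6, 8)
```

Because every token attends to every other token, each output row can draw on both what the camera sees and where the arm is, which is the sense in which a single model processes both kinds of input.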
Data Collection
To pretrain the transformer, the researchers assembled a massive dataset comprising 52 datasets with more than 200,000 robot trajectories in four categories, including human demonstration videos and simulation.
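One practical question with a pooled dataset like this is how to draw training examples when the sources differ greatly in size. The sketch below shows one simple option: pick a dataset uniformly at random, then pick a trajectory within it. This is an assumption made for illustration, not the sampling scheme the researchers describe. The dataset names, sizes, and the `sample_batch` helper are all hypothetical.

```python
import random
from collections import Counter


def sample_batch(datasets, batch_size, rng):
    """Draw (dataset_name, trajectory) pairs, choosing the dataset uniformly first."""
    names = sorted(datasets)
    batch = []
    for _ in range(batch_size):
        name = rng.choice(names)
        batch.append((name, rng.choice(datasets[name])))
    return batch


# Hypothetical datasets of very different sizes; names and counts are illustrative only.
datasets = {
    "human_video_demo": [f"video_{i}" for i in range(500)],
    "simulation_demo": [f"sim_{i}" for i in range(50)],
    "real_robot_demo": [f"real_{i}" for i in range(5)],
}

rng = random.Random(0)
batch = sample_batch(datasets, 300, rng)

# Count how often each source was drawn from.
counts = Counter(name for name, _ in batch)
print(counts)
```

Choosing the dataset first keeps a five-trajectory source from being drowned out by a five-hundred-trajectory one. Sampling in proportion to dataset size would be the opposite trade-off.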
Results
In tests, HPT improved robot performance by more than 20 percent on simulation and real-world tasks, compared with training from scratch each time. Even when the task was very different from the pretraining data, HPT still improved performance.
Conclusion
The researchers’ new approach to training robots could be faster and less expensive than traditional techniques because it requires far fewer task-specific data. This method has the potential to enable general-purpose robots that can adapt to a wide range of tasks and environments.
FAQs
Q: What is the goal of the researchers’ new approach to training robots?
A: The goal is to develop a versatile technique that combines a huge amount of heterogeneous data from many sources into one system that can teach any robot a wide range of tasks.
Q: What is Heterogeneous Pretrained Transformers (HPT)?
A: HPT is a new architecture that unifies data from varied modalities and domains, including vision sensors and robotic arm position encoders, into a shared “language” that a generative AI model can process.
Q: How much does HPT improve robot performance?
A: HPT improves robot performance by more than 20 percent on simulation and real-world tasks, compared with training from scratch each time. Even when the task is very different from the pretraining data, HPT still improves performance.
Q: What are the potential applications of HPT?
A: The potential applications of HPT include enabling general-purpose robots that can adapt to a wide range of tasks and environments, and improving the efficiency and effectiveness of robot training.