Boosting AI Performance without Clean Labeled Data
The Challenge of Dirty Data
Jonathan Frankle, chief AI scientist at Databricks, has been talking to customers about the key challenges they face in getting AI to work reliably. The problem, Frankle says, is dirty data. "Everybody has some data, and has an idea of what they want to do," but the lack of clean data makes it challenging to fine-tune a model to perform a specific task.
The Solution: Test-time Adaptive Optimization (TAO)
Databricks’ method offers a rare look at some of the key tricks engineers are using to improve the abilities of advanced AI models, especially when good data is hard to come by. It draws on ideas that have helped produce advanced reasoning models, combining reinforcement learning, a way for AI models to improve through practice, with "synthetic," or AI-generated, training data.
How it Works
The Databricks method exploits the fact that, given enough tries, even a weak model can score well on a given task or benchmark. Researchers call this method of boosting a model’s performance "best-of-N". Databricks trained a model to predict which best-of-N result human testers would prefer, based on examples. The Databricks reward model, or DBRM, can then be used to improve the performance of other models without the need for further labeled data.
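The best-of-N idea can be sketched in a few lines: sample N candidate answers, score each with a reward model, and keep the highest-scoring one. The sketch below is illustrative only; `generate()` and `reward_score()` are hypothetical stand-ins, since neither DBRM nor Databricks’ base models are described in code here.

```python
import random

def generate(prompt: str) -> str:
    """Stub standing in for one sample from a base language model."""
    return f"draft {random.randint(0, 999)} answering {prompt!r}"

def reward_score(prompt: str, answer: str) -> float:
    """Stub standing in for a reward model like DBRM: predicts how much
    a human tester would prefer this answer (higher is better)."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    """Sample n candidates and keep the one the reward model scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda ans: reward_score(prompt, ans))

print(best_of_n("Summarize the quarterly sales report"))
```

Even a weak generator benefits: with N tries, only one sample has to be good, and the reward model does the work of finding it.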
The DBRM Process
DBRM is then used to select the best outputs from a given model. This creates synthetic training data for further fine-tuning the model so that it produces a better output the first time. Databricks calls its new approach Test-time Adaptive Optimization (TAO). "This method we’re talking about uses some relatively lightweight reinforcement learning to basically bake the benefits of best-of-N into the model itself," Frankle says.
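The data-generation half of that loop can be sketched as follows: for each unlabeled prompt, pick the reward-model-preferred candidate and record it as a training target, producing a synthetic dataset for supervised fine-tuning or lightweight RL. Again, `generate()` and `reward_score()` are hypothetical stubs, not Databricks APIs.

```python
import random

def generate(prompt: str) -> str:
    """Stub for one sample from the model being improved."""
    return f"response {random.randint(0, 999)} to {prompt}"

def reward_score(prompt: str, answer: str) -> float:
    """Stub for a DBRM-style reward model (higher means more preferred)."""
    return random.random()

def build_synthetic_dataset(prompts: list[str], n: int = 8) -> list[dict]:
    """For each unlabeled prompt, keep the highest-reward candidate
    as the completion, yielding synthetic fine-tuning pairs."""
    dataset = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(n)]
        best = max(candidates, key=lambda ans: reward_score(prompt, ans))
        dataset.append({"prompt": prompt, "completion": best})
    return dataset

pairs = build_synthetic_dataset(["Classify this support ticket",
                                 "Draft a SQL query for monthly revenue"])
```

The resulting prompt/completion pairs would then feed a standard fine-tuning step, which is how the benefits of best-of-N get "baked into" the model so a single pass suffices at inference time.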
Scalability and Future Development
Databricks’ research shows that the TAO method improves as it is scaled up to larger, more capable models. Reinforcement learning and synthetic data are already widely used individually, but combining them to improve language models is a relatively new and technically challenging technique.
Conclusion
Databricks’ new approach to AI model development offers a promising solution to the problem of dirty data. By combining reinforcement learning with synthetic data, the company has developed a method that can boost the performance of AI models without the need for clean labeled data. This could have significant implications for businesses and organizations that struggle to fine-tune their AI models.
FAQs
Q: What is the main challenge in getting AI to work reliably?
A: The main challenge is dirty data, or the lack of clean data, which makes it difficult to fine-tune a model to perform a specific task.
Q: What is Databricks’ solution to this challenge?
A: Databricks’ solution is to combine reinforcement learning with synthetic data to boost the performance of AI models without the need for clean labeled data.
Q: How does the Databricks method work?
A: The method uses a model to predict which best-of-N result human testers would prefer, based on examples. The Databricks reward model, or DBRM, can then be used to improve the performance of other models without the need for further labeled data.
Q: How does the DBRM process work?
A: DBRM is used to select the best outputs from a given model, creating synthetic training data for further fine-tuning the model so that it produces a better output the first time.

