Boosting AI Performance without Clean Labeled Data
The Challenge of Dirty Data
Jonathan Frankle, chief AI scientist at Databricks, has been talking to customers about the key challenges they face in getting AI to work reliably. The problem, Frankle says, is dirty data. "Everybody has some data, and has an idea of what they want to do," but the lack of clean data makes it challenging to fine-tune a model to perform a specific task.
The Solution: Test-time Adaptive Optimization (TAO)
Databricks’ method offers a rare look at some of the key tricks engineers are using to improve the abilities of advanced AI models, especially when good data is hard to come by. It draws on ideas that have helped produce advanced reasoning models, combining reinforcement learning, a way for AI models to improve through practice, with "synthetic," or AI-generated, training data.
How it Works
The Databricks method exploits the fact that, given enough tries, even a weak model can score well on a given task or benchmark. Researchers call this method of boosting a model’s performance "best-of-N". Databricks trained a model to predict which best-of-N result human testers would prefer, based on examples. The Databricks reward model, or DBRM, can then be used to improve the performance of other models without the need for further labeled data.
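The best-of-N idea can be sketched in a few lines: sample N candidate answers, score each with a reward model, and keep the highest-scoring one. The sketch below is illustrative only; `generate()` and `reward_score()` are hypothetical stand-ins, since neither DBRM nor Databricks’ base models are described in code here.

```python
import random

def generate(prompt: str) -> str:
    """Stub standing in for one sample from a base language model."""
    return f"draft {random.randint(0, 999)} answering {prompt!r}"

def reward_score(prompt: str, answer: str) -> float:
    """Stub standing in for a reward model like DBRM: predicts how much
    a human tester would prefer this answer (higher is better)."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    """Sample n candidates and keep the one the reward model scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda ans: reward_score(prompt, ans))

print(best_of_n("Summarize the quarterly sales report"))
```

Even a weak generator benefits: with N tries, only one sample has to be good, and the reward model does the work of finding it.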
The DBRM Process
DBRM is then used to select the best outputs from a given model. This creates synthetic training data for further fine-tuning the model so that it produces a better output the first time. Databricks calls its new approach Test-time Adaptive Optimization (TAO). "This method we’re talking about uses some relatively lightweight reinforcement learning to basically bake the benefits of best-of-N into the model itself," Frankle says.
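The data-generation half of that loop can be sketched as follows: for each unlabeled prompt, pick the reward-model-preferred candidate and record it as a training target, producing a synthetic dataset for supervised fine-tuning or lightweight RL. Again, `generate()` and `reward_score()` are hypothetical stubs, not Databricks APIs.

```python
import random

def generate(prompt: str) -> str:
    """Stub for one sample from the model being improved."""
    return f"response {random.randint(0, 999)} to {prompt}"

def reward_score(prompt: str, answer: str) -> float:
    """Stub for a DBRM-style reward model (higher means more preferred)."""
    return random.random()

def build_synthetic_dataset(prompts: list[str], n: int = 8) -> list[dict]:
    """For each unlabeled prompt, keep the highest-reward candidate
    as the completion, yielding synthetic fine-tuning pairs."""
    dataset = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(n)]
        best = max(candidates, key=lambda ans: reward_score(prompt, ans))
        dataset.append({"prompt": prompt, "completion": best})
    return dataset

pairs = build_synthetic_dataset(["Classify this support ticket",
                                 "Draft a SQL query for monthly revenue"])
```

The resulting prompt/completion pairs would then feed a standard fine-tuning step, which is how the benefits of best-of-N get "baked into" the model so a single pass suffices at inference time.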
Scalability and Future Development
Databricks’ research shows that the TAO method improves as it is scaled up to larger, more capable models. Reinforcement learning and synthetic data are already widely used individually, but combining them to improve language models is a relatively new and technically challenging technique.
Conclusion
Databricks’ new approach to AI model development offers a promising solution to the problem of dirty data. By combining reinforcement learning with synthetic data, the company has developed a method that can boost the performance of AI models without the need for clean labeled data. This could have significant implications for businesses and organizations that struggle to fine-tune their AI models.
FAQs
Q: What is the main challenge in getting AI to work reliably?
A: The main challenge is dirty data, or the lack of clean data, which makes it difficult to fine-tune a model to perform a specific task.
Q: What is Databricks’ solution to this challenge?
A: Databricks’ solution is to combine reinforcement learning with synthetic data to boost the performance of AI models without the need for clean labeled data.
Q: How does the Databricks method work?
A: The method uses a model to predict which best-of-N result human testers would prefer, based on examples. The Databricks reward model, or DBRM, can then be used to improve the performance of other models without the need for further labeled data.
Q: How does the DBRM process work?
A: DBRM is used to select the best outputs from a given model, creating synthetic training data for further fine-tuning the model so that it produces a better output the first time.

