Employing Tricks During Inference to Improve AI Accuracy
One of the big trends in artificial intelligence in the past year has been the employment of various tricks during inference — the act of making predictions — to dramatically improve the accuracy of those predictions.
Chain-of-Thought and Breakthroughs in Accuracy
For example, chain-of-thought — having a large language model (LLM) spell out the logic of an answer in a series of statements — can lead to increased accuracy on benchmark tests. Such "thinking" has apparently led to breakthroughs in accuracy on abstract tests of problem-solving, such as the high score OpenAI's o3 model achieved last month on the ARC-AGI test.
LLMs Fall Short on Practical Tests
However, it turns out that LLMs still fall short on very practical tests, even something as simple as planning a trip. Google DeepMind researchers, led by Kuang-Huei Lee, pointed out in a report last week that Google's Gemini and OpenAI's o1, the companies' best respective models, fail miserably when tested on TravelPlanner, a benchmark test introduced last year by scholars at Fudan University, Penn State, and Meta AI.
Introducing Mind Evolution
Given the weak results of top models, Lee and team propose an advance beyond chain-of-thought and similar approaches that they say is dramatically more accurate on tests such as TravelPlanner. Called "mind evolution," the new approach is a form of searching through possible answers — but with a twist.
How Mind Evolution Works
The authors adopt a genetically inspired algorithm that induces an LLM, such as Gemini 1.5 Flash, to generate multiple answers to a prompt, which are then evaluated for which is most "fit" to answer the question. In the real world, evolution happens via natural selection, where entities are evaluated for "fitness" in their environment. The most fit combine to produce offspring, and occasionally there are beneficial genetic mutations.
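The loop described above can be sketched in a few lines of Python. This is a minimal, self-contained illustration of the genetic pattern (generate, score, select, recombine, mutate), not the paper's actual implementation: the `propose` and `fitness` functions here are numeric stand-ins for what would really be LLM calls and a plan evaluator.

```python
import random

random.seed(0)  # make the sketch deterministic

def propose(prompt, parents=None):
    # Stand-in for an LLM call (e.g., Gemini 1.5 Flash). Here a "candidate
    # answer" is just a number, so the sketch runs without an API.
    if parents:
        base = sum(parents) / len(parents)    # "recombination" of parents
        return base + random.uniform(-1, 1)   # "mutation"
    return random.uniform(0, 10)              # fresh candidate

def fitness(candidate, target=7.0):
    # Stand-in for evaluating how "fit" an answer is; higher is better.
    return -abs(candidate - target)

def mind_evolution_sketch(prompt, population=8, generations=20):
    pool = [propose(prompt) for _ in range(population)]
    for _ in range(generations):
        pool.sort(key=fitness, reverse=True)
        survivors = pool[: population // 2]   # natural selection
        children = [propose(prompt, random.sample(survivors, 2))
                    for _ in range(population - len(survivors))]
        pool = survivors + children           # next generation
    return max(pool, key=fitness)

best = mind_evolution_sketch("plan a 3-day trip")
```

Because the fittest candidates are always carried over, the best answer in the pool can only improve from one generation to the next.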
Evaluating AI Model’s Multiple Answers
The point of such an evolutionary approach is that it’s hard to find good solutions in one stroke, but it’s relatively easy to weed out the bad ones and try again. As they write, "This approach exploits the observation that it is often easier to evaluate the quality of a candidate solution than it is to generate good solutions for a given problem."
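That asymmetry is easy to see concretely. The toy checker below scores a candidate travel plan against two hard constraints (a budget and a required number of days). Writing this checker is trivial; writing a generator that always produces feasible plans is not. The plan structure and constraint values are invented for illustration.

```python
def evaluate_plan(plan, budget=1000, days=3):
    """Toy critic: checking a candidate plan against hard constraints
    is far simpler than generating a good plan from scratch."""
    violations = []
    if sum(item["cost"] for item in plan) > budget:
        violations.append("over budget")
    if len({item["day"] for item in plan}) < days:
        violations.append("missing days")
    return violations  # an empty list means the plan is feasible

good_plan = [
    {"day": 1, "cost": 300},
    {"day": 2, "cost": 250},
    {"day": 3, "cost": 200},
]
bad_plan = [{"day": 1, "cost": 1200}]
```

Here `evaluate_plan(good_plan)` returns an empty list, while `evaluate_plan(bad_plan)` flags both constraint violations.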
Author-Critic Dialogue
The key is how best to evaluate the AI model’s multiple answers. To do so, the authors fall back on a well-established prompting strategy. Instead of just chain-of-thought, they have the model conduct a dialogue of sorts. The LLM is prompted to portray two personas in dialogue, one of which is a critic, and the other, an author. The author proposes solutions, such as a travel plan, and the critic points out where there are flaws.
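The author-critic exchange is, mechanically, just alternating prompts to the same model. The sketch below shows that control flow under one big assumption: `llm` is a placeholder returning canned strings so the loop runs offline; in practice it would call a real model API, and the prompts would be far more detailed.

```python
def llm(prompt):
    # Placeholder for a real model call (e.g., Gemini). Returns canned
    # text so the dialogue loop can run without an API key.
    return "REVISED PLAN" if "Critic feedback" in prompt else "DRAFT PLAN"

def author_critic_loop(task, rounds=2):
    # The Author persona proposes; the Critic persona points out flaws;
    # the Author revises in light of the critique.
    draft = llm(f"You are the Author. Propose a solution to: {task}")
    for _ in range(rounds):
        critique = llm(f"You are the Critic. List flaws in: {draft}")
        draft = llm(f"You are the Author. Critic feedback: {critique}. "
                    f"Revise your solution to: {task}")
    return draft

result = author_critic_loop("plan a 7-day trip to Rome on a budget")
```

The same pattern slots into the evolutionary loop as the fitness step: the critic's output becomes the signal for which candidates survive.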
Results
Gemini 1.5 Flash was tested on multiple planning benchmarks. On TravelPlanner, Gemini with the mind evolution approach soars above the typical 5.6% success rate to reach 95.2%, they relate. And, when they use the more powerful Gemini Pro model, the result is nearly perfect, at 99.9%.
Conclusion
The results show "a clear advantage of an evolutionary strategy" that combines a broad search through possible solutions with the language model's ability to refine those solutions via the author-critic roles.
FAQs
Q: What is mind evolution?
A: Mind evolution is a form of searching through possible answers using a genetically inspired algorithm that induces an LLM to generate multiple answers to a prompt, which are then evaluated for which is most "fit" to answer the question.
Q: How does mind evolution work?
A: Mind evolution works by inducing an LLM to generate multiple answers to a prompt, which are then evaluated for which is most "fit" to answer the question. The process is repeated, with the LLM prompted to improve its output, a kind of recombination and mutation analogous to natural selection.
Q: What are the benefits of mind evolution?
A: The benefits of mind evolution include improved accuracy on practical tests, such as planning a trip, and the ability to evaluate the quality of a candidate solution more easily than generating good solutions for a given problem.
Q: What are the limitations of mind evolution?
A: The main limitation of mind evolution is cost: generating and evaluating many candidate answers per prompt makes the approach far more computationally expensive than a single inference pass.