The Generative AI Revolution: Can We Overcome AI Hallucinations?
The generative AI revolution is remaking businesses’ relationship with computers and customers. Hundreds of billions of dollars are being invested in large language models (LLMs) and agentic AI, and trillions are at stake. But GenAI has a significant problem: the tendency of LLMs to hallucinate. The question is: Is this a fatal flaw, or can we work around it?
Countering AI Hallucinations
Users have come up with various methods to control hallucinations, or at least to counteract some of their negative impacts.
- For starters, you can get better data. AI models are only as good as the data they’re trained on, and many organizations have raised concerns about bias and the quality of their data. While there are no easy fixes for data quality, organizations that dedicate resources to better data management and governance can make a difference.
- Users can also improve the quality of LLM responses by providing better prompts. The field of prompt engineering has emerged to serve this need.
- Instead of relying on a general-purpose LLM, teams can fine-tune open source LLMs on smaller sets of domain- or industry-specific data, which can improve accuracy within that domain or industry.
- Implementing guardrails is another technique. Some organizations use a second, specially crafted AI model to interpret the results of the primary LLM. When a hallucination is detected, the system can tweak the input or the context until the results come back clean (a minimal sketch of such a loop appears after this list).
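To make the guardrail idea concrete, here is a minimal sketch of a verification loop in Python. The functions `call_primary_llm` and `call_verifier_llm` are hypothetical placeholders for whatever model APIs an organization actually uses, and the grounding-oriented prompt also illustrates the prompt-engineering point above; the threshold and retry count are illustrative assumptions, not any vendor's implementation.

```python
# Sketch of a guardrail loop: a second "verifier" model scores the primary
# LLM's answer, and the prompt is tightened and retried until the answer
# passes or the retry budget runs out.

def call_primary_llm(prompt: str, context: str) -> str:
    """Placeholder: send the prompt plus retrieved context to the main LLM."""
    raise NotImplementedError

def call_verifier_llm(answer: str, context: str) -> float:
    """Placeholder: return a 0-1 score for how well `answer` is supported by `context`."""
    raise NotImplementedError

def answer_with_guardrail(question: str, context: str,
                          threshold: float = 0.8, max_retries: int = 3) -> str | None:
    # Grounding-oriented prompt: instruct the model to stick to the supplied context.
    prompt = (
        "Answer using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    for _ in range(max_retries):
        answer = call_primary_llm(prompt, context)
        score = call_verifier_llm(answer, context)
        if score >= threshold:
            return answer  # verifier judges the answer grounded; accept it
        # Otherwise tighten the instructions and try again.
        prompt += "\n\nYour previous answer was not fully supported by the context. Try again."
    return None  # fall back to a refusal or a human after repeated failures
```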
AI Hallucination Rates
When ChatGPT first came out, its hallucination rate was around 15% to 20%. The good news is that hallucination rates appear to be coming down.
- For instance, Vectara’s Hallucination Leaderboard uses the Hughes Hallucination Evaluation Model (HHEM), which scores each output on a 0-to-1 scale according to the odds that it is factually consistent with its source. The leaderboard currently shows several LLMs with hallucination rates below 1%, led by Google’s Gemini-2.0 Flash (a toy rate calculation follows this list).
- However, other hallucination measures don’t show quite the same improvement. The research arm of AIMultiple benchmarked nine LLMs on their ability to recall information from CNN articles. The top-scoring LLM, GPT-4.5 preview, had a 15% hallucination rate, while Google’s Gemini-2.0 Flash came in at 60%.
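For illustration, here is a toy calculation of how a leaderboard-style hallucination rate can be derived from per-output scores on a 0-to-1 scale (1 = fully supported by the source). The scores and the 0.5 cutoff are illustrative assumptions, not Vectara’s actual methodology.

```python
# Toy sketch: the hallucination rate is the fraction of outputs whose
# support score falls below a chosen cutoff.

def hallucination_rate(scores: list[float], cutoff: float = 0.5) -> float:
    """Fraction of outputs flagged as unsupported by their source."""
    flagged = sum(1 for s in scores if s < cutoff)
    return flagged / len(scores)

# Example: 3 of 200 summaries flagged -> 1.5% hallucination rate.
scores = [0.97] * 197 + [0.2, 0.3, 0.4]
print(f"{hallucination_rate(scores):.1%}")  # prints "1.5%"
```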
High-Stakes AI
One company working to make AI usable for some high-stakes use cases is the search company Pearl. The company combines an AI-powered search engine with human expertise in professional services to minimize the odds that a hallucination will reach a user.
- Pearl has taken steps to minimize the hallucination rate in its AI-powered search engine, which Pearl CEO Andy Kurtzig said is 22% more accurate than ChatGPT and Gemini out of the box.
- The company does that by using standard techniques, including multiple models and guardrails. Beyond that, Pearl has contracted with 12,000 experts in fields like medicine, law, auto repair, and pet health who can provide a quick sanity check on AI-generated answers to drive accuracy up further (a sketch of this kind of routing follows this list).
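Below is a hedged sketch of what this kind of human-in-the-loop routing can look like: automated checks accept answers where independent models agree, and anything borderline is escalated to a domain expert. All function names are hypothetical and do not describe Pearl’s actual system.

```python
# Sketch of human-in-the-loop routing: model disagreement is treated as a
# hallucination risk signal, so borderline answers get an expert sanity check.

from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    reviewed_by_expert: bool

def generate_candidates(question: str) -> list[str]:
    """Placeholder: query several different models for independent answers."""
    raise NotImplementedError

def answers_agree(candidates: list[str]) -> bool:
    """Placeholder: check whether the candidate answers are mutually consistent."""
    raise NotImplementedError

def send_to_expert(question: str, draft: str) -> str:
    """Placeholder: queue the draft for a domain expert's review."""
    raise NotImplementedError

def answer_question(question: str) -> Answer:
    candidates = generate_candidates(question)
    if answers_agree(candidates):
        return Answer(text=candidates[0], reviewed_by_expert=False)
    # Models disagree: have a human expert verify or rewrite the draft
    # before it reaches the user.
    verified = send_to_expert(question, candidates[0])
    return Answer(text=verified, reviewed_by_expert=True)
```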
Diminishing AI Returns
The CEO of Anthropic recently made headlines when he claimed that 90% of coding work would be done by AI within months. Kurtzig, who employs 300 developers, doesn’t see that happening anytime soon.
- The real productivity gains are somewhere between 10% and 20%, he said.
- The combination of reasoning models and AI agents is supposed to herald a new era of productivity, along with a 100x increase in inference workloads to occupy all those Nvidia GPUs, according to Nvidia CEO Jensen Huang. However, while reasoning models like DeepSeek can run more efficiently than Gemini or GPT-4.5, Kurtzig doesn’t see them advancing the state of the art.
Conclusion
The tendency of LLMs to hallucinate is a significant problem that the AI community is still trying to overcome. While there are some promising solutions and techniques being developed, it’s clear that there is still much work to be done.
FAQs
- What is hallucination in AI?
Hallucination refers to the phenomenon where AI models generate responses that are not grounded in the input data and are often incorrect or misleading.
- How common is hallucination in AI?
Hallucination is a common problem in AI, especially in large language models (LLMs). Early versions of ChatGPT hallucinated roughly 15% to 20% of the time, though Vectara’s Hallucination Leaderboard now shows several models with rates below 1%.
- Can we overcome hallucination in AI?
Yes, researchers and developers are working on various techniques to reduce hallucination in AI, including improving data quality, writing better prompts, implementing guardrails, and fine-tuning models on specific domains or industries.
- What are the consequences of hallucination in AI?
Hallucination can have serious consequences, including generating incorrect or misleading information, causing harm to individuals or organizations, and damaging the reputation of AI technology.