Reducing Hallucinations in LLM Agents with a Verified Semantic Cache using Amazon Bedrock Knowledge Bases

Reducing Hallucinations in Large Language Models

Large language models (LLMs) excel at generating human-like text but face a critical challenge: hallucination—producing responses that sound convincing but are factually incorrect. While these models are trained on vast amounts of generic data, they often lack the organization-specific context and up-to-date information needed for accurate responses in business settings.

Solution Overview

Our solution implements a verified semantic cache using the Amazon Bedrock Knowledge Bases Retrieve API to reduce hallucinations in LLM responses while simultaneously improving latency and reducing costs. This read-only semantic cache acts as an intelligent intermediary layer between the user and Amazon Bedrock Agents, storing curated and verified question-answer pairs.
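The cache itself is just a set of curated question-answer documents ingested into the knowledge base. The JSON layout below is a hypothetical sketch (field names like `verified_by` and `last_reviewed` are illustrative, not part of any Bedrock schema); the knowledge base only needs text it can embed and search.

```python
import json

# One curated entry for the verified semantic cache. The schema is up to
# you -- these fields are an illustrative assumption, not a Bedrock requirement.
entry = {
    "question": "What date will AWS re:Invent 2024 occur?",
    "answer": "AWS re:Invent 2024 takes place on December 2-6, 2024.",
    "verified_by": "cloud-ops-team",   # hypothetical review metadata
    "last_reviewed": "2024-10-01",
}

# Each entry is serialized to its own small document before being uploaded
# to the data source (for example, an S3 bucket) backing the knowledge base.
doc = json.dumps(entry, indent=2)
print(doc)
```

Keeping one question-answer pair per document helps the Retrieve API return a single, unambiguous best match rather than a chunk containing several unrelated pairs.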

Solution Architecture

The solution architecture shown in the preceding figure consists of the following components and workflow. Assume the question “What date will AWS re:Invent 2024 occur?” is already in the verified semantic cache, stored alongside its verified answer, “AWS re:Invent 2024 takes place on December 2–6, 2024.” Let’s walk through how the solution handles a user’s question.

Step-by-Step Guide

1. Query processing:
a. The user submits the question “When is re:Invent happening this year?”, which is received by the Invoke Agent function.
b. The function checks the semantic cache (Amazon Bedrock Knowledge Bases) using the Retrieve API.
c. Amazon Bedrock Knowledge Bases performs a semantic search and finds a similar question with an 85% similarity score.
2. Response paths:
a. Strong match (similarity score greater than 80%):
i. The Invoke Agent function returns the verified answer “AWS re:Invent 2024 takes place on December 2–6, 2024” directly from the Amazon Bedrock knowledge base, providing a deterministic response.
ii. No LLM invocation is needed, and the response returns in less than 1 second.
b. Partial match (similarity score 60–80%):
i. The Invoke Agent function passes the cached question-answer pair to the LLM agent as a few-shot example, guiding the model toward an accurate, grounded response instead of answering unaided.
c. No match (similarity score less than 60%):
i. Fall back to standard LLM agent processing, making sure user questions receive appropriate responses.
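The three response paths above can be sketched as a small routing function. The thresholds mirror the article; the Retrieve call itself is stubbed out here (in practice you would call the `retrieve` API on the `bedrock-agent-runtime` client and read the top result's relevance score).

```python
# Routing thresholds from the article's response paths.
STRONG_MATCH = 0.80   # above this: return the cached answer verbatim
PARTIAL_MATCH = 0.60  # at or above this: use the cached pair as a few-shot example

def route(similarity_score: float) -> str:
    """Decide how to answer based on the best cache hit's similarity score."""
    if similarity_score > STRONG_MATCH:
        return "verified_answer"    # deterministic, no LLM invocation
    if similarity_score >= PARTIAL_MATCH:
        return "few_shot_prompt"    # cached pair guides the LLM agent
    return "llm_fallback"           # standard LLM agent processing

# The 85% match from the walkthrough lands in the strong-match path:
print(route(0.85))  # verified_answer
print(route(0.70))  # few_shot_prompt
print(route(0.40))  # llm_fallback
```

Keeping the routing in one pure function like this makes the thresholds easy to tune later without touching the retrieval or agent-invocation code.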

Benefits

* Reduced costs: By minimizing unnecessary LLM invocations for frequently answered questions, the solution significantly reduces operational costs at scale.
* Improved accuracy: Curated and verified answers minimize the possibility of hallucinations for known user queries, while few-shot prompting enhances accuracy for similar questions.
* Lower latency: Direct retrieval of cached answers provides near-instant responses for known queries, improving the overall user experience.

Production Readiness Considerations

Before deploying this solution in production, address these key considerations:

* Similarity threshold optimization: Experiment with different thresholds to balance cache hit rates and accuracy. This directly impacts the solution’s effectiveness in preventing hallucinations while maintaining relevance.
* Feedback loop implementation: Create a mechanism to continuously update the verified cache with new, accurate responses. This helps prevent cache staleness and maintains the solution’s integrity as a source of truth for the LLM.
* Cache management and update strategy: Regularly refresh the semantic cache with current, frequently asked questions to maintain relevance and improve hit rates. Implement a systematic process for reviewing, validating, and incorporating new entries to help ensure cache quality and alignment with evolving user needs.
* Ongoing tuning: Adjust similarity thresholds as your dataset evolves. Treat the semantic cache as a dynamic component, requiring continuous optimization for your specific use case.
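Threshold optimization can be approached empirically. The toy sketch below assumes you have logged cache lookups labeled with whether the cached answer was actually correct for the user's question (the scores and labels here are fabricated for illustration); sweeping the strong-match threshold then shows the trade-off between cache hit rate and accuracy.

```python
# Fabricated evaluation data: (similarity score of best cache hit,
# whether the cached answer was correct for that user question).
lookups = [
    (0.95, True), (0.88, True), (0.82, True), (0.81, False),
    (0.74, True), (0.65, False), (0.55, False), (0.40, False),
]

def evaluate(threshold: float):
    """Hit rate and accuracy of the direct-answer path at a given threshold."""
    hits = [correct for score, correct in lookups if score > threshold]
    hit_rate = len(hits) / len(lookups)
    accuracy = sum(hits) / len(hits) if hits else 1.0
    return hit_rate, accuracy

for t in (0.60, 0.80, 0.90):
    hit_rate, accuracy = evaluate(t)
    print(f"threshold={t:.2f}  hit_rate={hit_rate:.2f}  accuracy={accuracy:.2f}")
```

A higher threshold serves fewer questions from the cache but with higher confidence; rerunning this sweep as the cache and traffic evolve is one way to implement the ongoing tuning described above.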

Conclusion

This verified semantic cache approach offers a powerful solution to reduce hallucinations in LLM responses while improving latency and reducing costs. By using Amazon Bedrock Knowledge Bases, you can implement a solution that can efficiently serve curated and verified answers, guide LLM responses with few-shot examples, and gracefully fall back to full LLM processing when needed.

About the Authors

Dheer Toprani, Chaithanya Maisagoni, Rajesh Nedunuri, and Karam Muppidi are the authors of this article.

FAQs

Q: What is the main challenge LLMs face?
A: Hallucination, producing responses that sound convincing but are factually incorrect.

Q: What is the solution to reduce hallucinations in LLMs?
A: A verified semantic cache using Amazon Bedrock Knowledge Bases.

Q: What are the benefits of using this solution?
A: Reduced costs, improved accuracy, and lower latency.
