AI Chatbots Churning Out Inaccurate News Summaries
Major AI chatbots, including OpenAI’s ChatGPT, Microsoft’s Copilot, Google’s Gemini, and Perplexity AI, have been found to produce "significant inaccuracies" and "distortions" when summarizing news stories.
Investigation Methodology
A recent BBC investigation gave the four AI chatbots access to news content from the BBC’s website and asked them to summarize 100 news stories. Journalists with relevant expertise in each subject then rated the quality of every answer.
Findings
The investigation found that:
- 51% of all AI-produced answers had significant issues
- 19% of the AI-generated answers "introduced factual errors, such as incorrect factual statements, numbers, and dates"
- 13% of the quotes attributed to BBC articles were either altered from the original source or not present in the cited article at all
Examples of Errors
Some notable errors highlighted in the report include:
- ChatGPT claiming that Hamas leader Ismail Haniyeh was assassinated in Iran in December 2024, when he was actually killed in July 2024
- Gemini incorrectly stating that the National Health Service (NHS) "advises people not to start vaping and recommends that smokers who want to quit should use other methods" — in fact, the NHS recommends vaping as an aid to quit smoking
- Perplexity misquoting a statement from Liam Payne’s family after his death
- ChatGPT and Copilot both claiming that Rishi Sunak and Nicola Sturgeon were still in office as UK Prime Minister and Scottish First Minister, respectively, after both had stepped down
Comparison of AI Chatbots
The report found that Copilot and Gemini had more inaccuracies and issues overall than OpenAI’s ChatGPT and Perplexity.
Concerns and Recommendations
The investigation concluded that factual inaccuracies were not the only concern about the chatbots’ output; the AI assistants also "struggled to differentiate between opinion and fact, editorialized, and often failed to include essential context."
The report recommends that publishers should have control over whether and how their content is used, and that AI companies should disclose how their assistants process news, along with the scale and scope of the errors and inaccuracies they produce.
Reactions and Responses
Deborah Turness, CEO of BBC News and Current Affairs, responded to the investigation’s findings, stating: "The price of AI’s extraordinary benefits must not be a world where people searching for answers are served distorted, defective content that presents itself as fact."
A spokesperson for OpenAI defended the quality of ChatGPT’s output, saying the company is working with partners to improve the accuracy of in-line citations and to respect publisher preferences in search results.
Conclusion
The investigation highlights the need for AI companies to improve the accuracy and reliability of their chatbots. As AI becomes increasingly prevalent in daily life, it is crucial that the information users receive is accurate and trustworthy.
FAQs
Q: What was the purpose of the investigation?
A: The investigation aimed to assess the accuracy of AI chatbots in summarizing news stories.
Q: Which AI chatbots were tested?
A: OpenAI’s ChatGPT, Microsoft’s Copilot, Google’s Gemini, and Perplexity AI.
Q: What were the findings of the investigation?
A: The investigation found that 51% of AI-produced answers had significant issues, 19% introduced factual errors, and 13% of quotes from BBC articles were altered.
Q: What are the implications of these findings?
A: The findings highlight the need for AI companies to improve the accuracy and reliability of their chatbots to ensure that users receive trustworthy information.