OpenAI’s Latest Development: Improved AI Agents with Web Search Capability
Developers using the Responses API can now access the same models that power ChatGPT Search: GPT-4o search and GPT-4o mini search. These models can browse the web to answer questions and cite sources in their responses.
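For a rough idea of what this looks like in practice, here is a minimal Python sketch of a Responses API request with web search enabled. It assumes the official `openai` Python package, an `OPENAI_API_KEY` environment variable, and the `web_search_preview` tool type name; the helper names `build_search_request` and `run_search` are hypothetical, and the exact model and tool identifiers should be checked against OpenAI's current documentation.

```python
import os


def build_search_request(question: str, model: str = "gpt-4o") -> dict:
    """Assemble keyword arguments for a Responses API call with web search enabled."""
    return {
        "model": model,
        # Assumption: "web_search_preview" is the tool type that turns on web search.
        "tools": [{"type": "web_search_preview"}],
        "input": question,
    }


def run_search(question: str) -> str:
    """Send the request and return the answer text (requires network access and an API key)."""
    from openai import OpenAI  # imported lazily so the builder above works offline

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = client.responses.create(**build_search_request(question))
    return response.output_text
```

With a valid key set, calling `run_search("What did OpenAI announce this week?")` would send the request and return the model's answer as text. Keeping the request builder separate from the network call makes the payload easy to inspect without spending API credits.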
Improved Factual Accuracy
That matters because OpenAI says web search dramatically improves the factual accuracy of its AI models. On OpenAI’s SimpleQA benchmark, which aims to measure confabulation rate, GPT-4o search scored 90 percent and GPT-4o mini search scored 88 percent. Both substantially outperformed the larger GPT-4.5 model without search, which scored 63 percent.
Limitations and Challenges
Despite these improvements, the technology still has significant limitations. Aside from the trouble OpenAI’s Computer-Using Agent (CUA) model has navigating websites properly, the improved search capability doesn’t completely solve the problem of AI confabulations: with a SimpleQA score of 90 percent, GPT-4o search still makes factual mistakes 10 percent of the time.
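The 10 percent figure is simply the complement of the SimpleQA score. The short Python sketch below makes that arithmetic explicit for all three models, using only the scores quoted in this article. It assumes, as the article does, that every answer not scored as correct counts as a mistake; the variable and function names are illustrative.

```python
# SimpleQA scores (percent correct) as reported in this article.
simpleqa_scores = {
    "GPT-4o search": 90,
    "GPT-4o mini search": 88,
    "GPT-4.5 (no search)": 63,
}


def error_rate(score_percent: int) -> int:
    """Share of answers not scored correct, assuming score + errors = 100 percent."""
    return 100 - score_percent


error_rates = {name: error_rate(score) for name, score in simpleqa_scores.items()}

for name, rate in error_rates.items():
    print(f"{name}: {rate}% of answers not correct")
```

By this measure, the search-enabled models cut the error rate from 37 percent for GPT-4.5 without search to 10 and 12 percent, a large improvement that still leaves roughly one answer in ten wrong.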
Open Source Agents SDK and Integrated Systems
Alongside the Responses API, OpenAI released the open-source Agents SDK, which gives developers free tools to integrate models with internal systems, implement safeguards, and monitor agent activities. This toolkit follows OpenAI’s earlier release of Swarm, a framework for orchestrating multiple agents.
Conclusion
The AI agent movement is still in its early days, and things will likely improve rapidly. For now, though, it remains vulnerable to unrealistic claims. Earlier this week, users discovered that Chinese startup Butterfly Effect’s Manus AI agent platform failed to deliver on many of its promises, highlighting the persistent gap between promotional claims and practical functionality in this emerging technology category.
FAQs
Q: What are the benefits of OpenAI’s new AI agents?
A: The new AI agents can browse the web to answer questions and cite sources in their responses, improving factual accuracy.
Q: How accurate are OpenAI’s new AI agents?
A: According to OpenAI’s SimpleQA benchmark, GPT-4o search scored 90 percent, while GPT-4o mini search achieved 88 percent.
Q: What are the limitations of OpenAI’s new AI agents?
A: Despite improvements, the technology still has limitations: OpenAI’s Computer-Using Agent (CUA) model has trouble navigating websites properly, and GPT-4o search still makes factual mistakes 10 percent of the time.
Q: What is the Open Source Agents SDK?
A: The open-source Agents SDK is a free toolkit that helps developers integrate models with internal systems, implement safeguards, and monitor agent activities.

