The Frontier of AI: A Leap in Mathematical Reasoning Capabilities
Benchmarks vs. Real-World Value
On the FrontierMath benchmark by Epoch AI, o3 solved 25.2 percent of problems, while no other model has exceeded 2 percent – suggesting a significant leap in mathematical reasoning capabilities over previous models.
Potential Applications
In principle, applications for a true PhD-level AI model would include analyzing medical research data, supporting climate modeling, and handling routine aspects of research work.
The High Price of Innovation
The high price points reported by The Information, if accurate, suggest that OpenAI believes these systems could provide substantial value to businesses. The publication notes that SoftBank, an OpenAI investor, has committed to spending $3 billion on OpenAI’s agent products this year alone – indicating significant business interest despite the costs.
Financial Pressures
OpenAI faces financial pressures that may influence its premium pricing strategy. The company reportedly lost approximately $5 billion last year covering operational costs and other expenses related to running its services.
A New Era of Pricing
News of OpenAI’s stratospheric pricing plans comes after years of relatively affordable AI services that have conditioned users to expect powerful capabilities at low cost. ChatGPT Plus remains $20 per month and Claude Pro costs $30 monthly – both tiny fractions of these proposed enterprise tiers. Even ChatGPT Pro’s $200/month subscription is small compared to the proposed fees. Whether the performance difference between these tiers will match their thousandfold price difference is an open question.
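To make the "thousandfold" comparison concrete, the short Python sketch below divides a reported top-tier price by each consumer subscription price quoted above. It assumes the $20,000-per-month figure cited later in this article as the top tier; the names monthly_prices, REPORTED_TOP_TIER, and price_multiples are illustrative only.

```python
# Monthly prices in USD as quoted in this article.
monthly_prices = {
    "ChatGPT Plus": 20,
    "Claude Pro": 30,
    "ChatGPT Pro": 200,
}

# Assumption: the reported $20,000/month tier mentioned later in the article.
REPORTED_TOP_TIER = 20_000


def price_multiples(prices, top_tier):
    """Return how many times more the top tier costs than each plan."""
    return {name: top_tier / price for name, price in prices.items()}


multiples = price_multiples(monthly_prices, REPORTED_TOP_TIER)
for name, multiple in multiples.items():
    print(f"{name}: the reported top tier costs about {multiple:,.0f}x as much")
```

Under these figures, the multiple is 1,000 for ChatGPT Plus and 100 for ChatGPT Pro, so the "thousandfold" gap applies to the cheapest consumer plan.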
Confabulations and Accuracy
Despite their benchmark performances, these simulated reasoning models still struggle with confabulations – instances where they generate plausible-sounding but factually incorrect information. This remains a critical concern for research applications where accuracy and reliability are paramount. A $20,000 monthly investment raises questions about whether organizations can trust these systems not to introduce subtle errors into high-stakes research.
Social Media Reactions
In response to the news, several people quipped on social media that companies could hire an actual PhD student for much cheaper. "In case you have forgotten," wrote xAI developer Hieu Pham in a viral tweet, "most PhD students, including the brightest stars who can do way better work than any current LLMs – are not paid $20K / month."
Conclusion
While these systems show strong capabilities on specific benchmarks, the "PhD-level" label remains largely a marketing term. These models can process and synthesize information at impressive speeds, but questions remain about how effectively they can handle the creative thinking, intellectual skepticism, and original research that define actual doctoral-level work.
FAQs
Q: What are the potential applications of a true PhD-level AI model?
A: Analyzing medical research data, supporting climate modeling, and handling routine aspects of research work.
Q: Why are these systems so expensive?
A: The reported prices suggest OpenAI believes these systems could provide substantial value to businesses. The company also faces financial pressures, which may influence its premium pricing strategy.
Q: Are these systems accurate?
A: Despite their strong performance on benchmarks, these systems still struggle with confabulations and may introduce subtle errors into high-stakes research.
Q: Can companies hire actual PhD students for cheaper?
A: Several people on social media have argued so, pointing out that even the brightest PhD students are not paid $20,000 per month.

