OpenAI Unveils o3 and o3-mini Models for Reasoning
On the last day of OpenAI’s 12 days of ‘shipmas,’ the company unveiled its latest models, o3 and o3-mini, which excel at reasoning and even outperform o1 on a series of benchmarks, including math and science.
o3-mini
On Friday, OpenAI released its o3-mini model, the most cost-efficient model in OpenAI’s reasoning series, to the public. Until now, that series has been comprised of o1 and o1-mini. Like its predecessor, the model is particularly strong in science, math, and coding, according to the company.
OpenAI o3-mini is now available in ChatGPT and the API. Pro users will have unlimited access to o3-mini and Plus & Team users will have triple the rate limits (vs o1-mini). Free users can try o3-mini in ChatGPT by selecting the Reason button under the message composer.
When o3-mini is selected, it will use medium reasoning effort, which balances speed and accuracy. While the original o1 model still has broader general knowledge than o3-mini, the new model’s major advantage is its faster speed and higher performance compared to o1-mini.
Benchmark Performance
When comparing the performance of o3-mini to o1-mini, expert testers found that o3-mini delivered more accurate, reasoned-through, and clearer responses than o1-mini. According to the post, they preferred o3-mini responses 56% of the time and observed a 39% reduction in major errors.
Beyond human preference evaluations, in several STEM benchmarks, including the Competition Math (AIME 2024), PhD-level Science Questions (GPQA Diamond), and Competition Code (Codeforces), o3-mini with medium reasoning – which is what ChatGPT users will get by default – outperformed o1-mini.
Also notable is that o3-mini, with high reasoning effort in the benchmarks, came close to o1 performance, sometimes even surpassing it, as seen in the AIME 2024 above and Software Engineering (SWE-bench Verified) benchmarks. The o3-mini model with medium reasoning effort matched o1’s performance in the Codeforces benchmark.
Safety
OpenAI assessed o3-mini’s safety through public release through jailbreak and disallowed content evaluations. The company found that the model significantly surpasses GPT-4o on the evaluations. OpenAI posted the evaluation results below and also launched an o3-mini System Card, a 37-page PDF that includes the detailed results of the evaluations.
How to Access
All subscribers to OpenAI’s paid tiers, including ChatGPT Plus, Team, and Pro, can access OpenAI o3-mini starting today. Plus and Team users now have three times the rate limit, going from 50 messages per day with o1-mini to 150 messages per day. ChatGPT Enterprise access is coming in a week.
Conclusion
OpenAI’s o3 and o3-mini models are significant advancements in AI reasoning capabilities. With o3-mini’s ability to outperform o1 in several benchmarks, it’s an exciting development for those looking for improved performance and accuracy in AI-generated responses.
FAQs
Q: What are the key differences between o3 and o3-mini?
A: o3 is the full version of the model, while o3-mini is the cost-efficient version with medium reasoning effort.
Q: How do I access o3-mini?
A: All subscribers to OpenAI’s paid tiers, including ChatGPT Plus, Team, and Pro, can access OpenAI o3-mini starting today. Plus and Team users now have triple the rate limits (vs o1-mini).
Q: Can free users access o3-mini?
A: Yes, free ChatGPT users can try o3-mini in ChatGPT by selecting the Reason button under the message composer.
Q: What are the safety evaluations of o3-mini?
A: OpenAI assessed o3-mini’s safety through public release through jailbreak and disallowed content evaluations. The company found that the model significantly surpasses GPT-4o on the evaluations.