Date:

OpenAI Unveils o3-Mini Model for Free ChatGPT Users

OpenAI Unveils o3 and o3-mini Models for Reasoning

On the last day of OpenAI’s 12 days of ‘shipmas,’ the company unveiled its latest models, o3 and o3-mini, which excel at reasoning and even outperform o1 on a series of benchmarks, including math and science.

o3-mini

On Friday, OpenAI released its o3-mini model, the most cost-efficient model in OpenAI’s reasoning series, to the public. Until now, that series has been comprised of o1 and o1-mini. Like its predecessor, the model is particularly strong in science, math, and coding, according to the company.

OpenAI o3-mini is now available in ChatGPT and the API. Pro users will have unlimited access to o3-mini and Plus & Team users will have triple the rate limits (vs o1-mini). Free users can try o3-mini in ChatGPT by selecting the Reason button under the message composer.

When o3-mini is selected, it will use medium reasoning effort, which balances speed and accuracy. While the original o1 model still has broader general knowledge than o3-mini, the new model’s major advantage is its faster speed and higher performance compared to o1-mini.

Benchmark Performance

When comparing the performance of o3-mini to o1-mini, expert testers found that o3-mini delivered more accurate, reasoned-through, and clearer responses than o1-mini. According to the post, they preferred o3-mini responses 56% of the time and observed a 39% reduction in major errors.

Beyond human preference evaluations, in several STEM benchmarks, including the Competition Math (AIME 2024), PhD-level Science Questions (GPQA Diamond), and Competition Code (Codeforces), o3-mini with medium reasoning – which is what ChatGPT users will get by default – outperformed o1-mini.

Competition Math benchmark

Also notable is that o3-mini, with high reasoning effort in the benchmarks, came close to o1 performance, sometimes even surpassing it, as seen in the AIME 2024 above and Software Engineering (SWE-bench Verified) benchmarks. The o3-mini model with medium reasoning effort matched o1’s performance in the Codeforces benchmark.

Safety

OpenAI assessed o3-mini’s safety through public release through jailbreak and disallowed content evaluations. The company found that the model significantly surpasses GPT-4o on the evaluations. OpenAI posted the evaluation results below and also launched an o3-mini System Card, a 37-page PDF that includes the detailed results of the evaluations.

How to Access

All subscribers to OpenAI’s paid tiers, including ChatGPT Plus, Team, and Pro, can access OpenAI o3-mini starting today. Plus and Team users now have three times the rate limit, going from 50 messages per day with o1-mini to 150 messages per day. ChatGPT Enterprise access is coming in a week.

Conclusion

OpenAI’s o3 and o3-mini models are significant advancements in AI reasoning capabilities. With o3-mini’s ability to outperform o1 in several benchmarks, it’s an exciting development for those looking for improved performance and accuracy in AI-generated responses.

FAQs

Q: What are the key differences between o3 and o3-mini?

A: o3 is the full version of the model, while o3-mini is the cost-efficient version with medium reasoning effort.

Q: How do I access o3-mini?

A: All subscribers to OpenAI’s paid tiers, including ChatGPT Plus, Team, and Pro, can access OpenAI o3-mini starting today. Plus and Team users now have triple the rate limits (vs o1-mini).

Q: Can free users access o3-mini?

A: Yes, free ChatGPT users can try o3-mini in ChatGPT by selecting the Reason button under the message composer.

Q: What are the safety evaluations of o3-mini?

A: OpenAI assessed o3-mini’s safety through public release through jailbreak and disallowed content evaluations. The company found that the model significantly surpasses GPT-4o on the evaluations.

Latest stories

Read More

Go Module Mirror Served Backdoor to Devs for 3+ Years

A Backdoored Package Lurked in a Google-Run Go Mirror...

TechCrunch Sessions: AI

Step into the Future of AI at TechCrunch Sessions Register...

DeepSeek Serves as a Warning About Big Tech

A.I.'s Sputnik Moment: A Canary in the Coal Mine When...

Adobe Slashes 70% Off Creative Cloud for Education

Adobe Offers 70% Discount on Creative Cloud All Apps...

UK hospitals begin live trial of prostate cancer-detecting AI

Three English Hospitals Launch Clinical Trial of AI Technology...

The Thing: Comic Perfect

Marvel's Fantastic Four: First Steps Trailer Sparks Excitement Over...

Multisensory Marketing: The Future of Brand Engagement

Tapping into our multiple senses is an incredibly effective...

Practice Soft Skills with AI-Powered Guide

Startups and Big Tech Companies Leverage AI for Soft...

LEAVE A REPLY

Please enter your comment!
Please enter your name here