Grok 3: Elon Musk’s AI Model Soars to the Top of Chatbot Leaderboards
Grok 3
On Monday, Elon Musk launched xAI’s latest family of AI models, Grok 3, in a livestream. Grok 3 was trained on 10 times more data than Grok 2, made possible by xAI’s own data center in Memphis, Tennessee, which houses 200,000 GPUs.
"We are excited to present Grok 3, which we think is an order of magnitude more capable than Grok 2," said Musk during the livestream.
The family of models also includes a reasoning model, which builds on Grok 3. Like other reasoning models on the market, including OpenAI’s o1 and o3 models, the Grok 3 Reasoning beta thinks for a bit longer to output higher-quality results.
Performance
The model’s pre-training ended in early January, and training is still ongoing. Even so, Grok 3 has outperformed leading models on three AI benchmarks: AIME ’24, which tests mathematical reasoning; GPQA, which tests proficiency in science, specifically biology, physics, and chemistry; and LCB (LiveCodeBench) Oct-Feb, which tests coding capabilities.
Benchmarks
The model’s performance is also reflected in the following charts, which show Grok 3 outperforming leading models on various benchmarks.
[Insert chart: Grok 3 benchmarks]
Reasoning Model
The Grok 3 Reasoning model and the Grok 3 mini Reasoning model are still in development. According to results xAI shared during the livestream, the betas of both models performed competitively against o3-mini (high), o1, DeepSeek-R1, and Gemini 2.0 Flash Thinking on AIME, GPQA, and LCB.
[Insert chart: Reasoning model Grok 3]
Chatbot Arena
Beyond technical benchmarks, Grok 3 also rose to the top of the rankings on Chatbot Arena, a crowdsourced platform where users chat with two LLMs side by side and vote for the better response without knowing which model produced it.
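Blind side-by-side votes of this kind still have to be turned into a ranking. This article does not describe how Chatbot Arena does that, so the following is only an illustration of one common approach to pairwise comparisons, an Elo-style rating update. The model names, the starting rating of 1000, and the K-factor of 32 are all assumptions made for the example, not values taken from the Arena.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(ratings, winner, loser, k=32.0):
    """Shift rating points from the loser to the winner for one vote."""
    exp_win = expected_score(ratings[winner], ratings[loser])
    delta = k * (1.0 - exp_win)  # a surprising win moves more points
    ratings[winner] += delta
    ratings[loser] -= delta


def rank_from_votes(votes, initial=1000.0, k=32.0):
    """Replay (winner, loser) votes in order and return models sorted by rating."""
    ratings = {}
    for winner, loser in votes:
        ratings.setdefault(winner, initial)
        ratings.setdefault(loser, initial)
        update_ratings(ratings, winner, loser, k)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical votes: each tuple is (preferred model, other model).
    votes = [
        ("model_a", "model_b"),
        ("model_a", "model_c"),
        ("model_b", "model_c"),
    ]
    for name, rating in rank_from_votes(votes):
        print(f"{name}: {rating:.1f}")
```

Because every vote moves the same number of points from the loser to the winner, the total across all models stays constant, and a model climbs the leaderboard only by winning comparisons against the others.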
DeepSearch
To meet the demand for agentic capabilities, xAI also launched DeepSearch, which is similar to OpenAI’s and Google’s deep research features. With DeepSearch, users can ask a question, and Grok will think it through, search the web, output its thinking process as it goes, and then generate a final, robust response with data and tables as necessary. This means you can ask it to research a topic, come back 10 minutes later, and the task will be completed.
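The workflow described here (plan, search, report progress, then write a final answer) can be pictured as a simple loop. The sketch below is a toy illustration only: it does not use any xAI API, every function name is hypothetical, and the search step is a stub that returns placeholder text rather than querying the web.

```python
def plan_queries(question):
    """Hypothetical planning step: turn one question into search queries."""
    return [f"{question} overview", f"{question} recent data"]


def stub_search(query):
    """Stand-in for a real web search; returns canned placeholder text."""
    return f"(placeholder result for '{query}')"


def deep_research(question, search=stub_search, on_step=print):
    """Run each planned query, report progress as it goes, then summarize."""
    findings = []
    for query in plan_queries(question):
        on_step(f"Searching: {query}")  # progress shown to the user
        findings.append((query, search(query)))
    on_step("Writing final answer")
    lines = [f"Question: {question}"]
    lines += [f"- {query}: {result}" for query, result in findings]
    return "\n".join(lines)


if __name__ == "__main__":
    print(deep_research("solid-state batteries"))
```

Passing a different `on_step` callback, such as a list’s `append` method, collects the intermediate steps instead of printing them, which mirrors the idea of showing the reasoning trail alongside the final report.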
Access
Starting today, you can access some of the Grok models in beta. Grok 3 is available on X Premium+, which also gives users the latest features, a higher usage limit, DeepSearch, and advanced reasoning modes, which are activated by clicking the "Think" or "Big Brain" options.
Conclusion
Grok 3 has made a strong debut, outperforming leading models on several benchmarks and rising to the top of the Chatbot Arena rankings. With its reasoning models and agentic features such as DeepSearch, it is a serious competitor to offerings from OpenAI and Google.
Frequently Asked Questions
Q: What is Grok 3?
A: Grok 3 is a family of large language models developed by xAI, trained on 10 times more data than Grok 2.
Q: How does Grok 3 compare to other AI models?
A: Grok 3 has outperformed leading models on various benchmarks, including the AIME ’24, GPQA, and LCB.
Q: What is DeepSearch?
A: DeepSearch is a Grok 3 feature that takes a user’s question, thinks it through, searches the web, and generates a final response with data and tables as needed.
Q: How can I access Grok 3?
A: Grok 3 is available on X Premium+, which grants users access to the latest features, an increased usage limit, DeepSearch access, and advanced reasoning modes.
Q: What is the price of X Premium+?
A: X Premium+ costs $40 per month, up from $22 before the announcement.
Q: What is SuperGrok?
A: SuperGrok is a new subscription tier, similar to ChatGPT Pro, meant for super fans who want the earliest access to the most advanced capabilities. The price is yet to be shared.

