xAI’s Grok 3 is better than expected: Try it for free before you subscribe

Grok 3: Elon Musk’s AI Model Soars to the Top of Chatbot Leaderboards


On Monday, Elon Musk unveiled Grok 3, xAI’s latest family of AI models, in a livestream. According to xAI, Grok 3 was trained on 10 times more data than Grok 2. That scale was made possible by the company’s own data center in Memphis, Tennessee, which houses 200,000 GPUs.

"We are excited to present Grok 3, which we think is an order of magnitude more capable than Grok 2," said Musk during the livestream.

The family also includes a reasoning model built on Grok 3. Like other reasoning models on the market, such as OpenAI’s o1 and o3, the Grok 3 Reasoning beta spends more time working through a problem before it answers, trading speed for higher-quality results.

Performance

Pre-training finished in early January, and although the model is still being trained, Grok 3 has outperformed leading models on several AI benchmarks: AIME ’24, which tests mathematical reasoning; GPQA, which tests graduate-level knowledge of biology, physics, and chemistry; and LCB (LiveCodeBench) Oct-Feb, which tests coding ability.

Benchmarks

The chart below compares Grok 3 with leading models on these benchmarks.

[Insert chart: Grok 3 benchmarks]

Reasoning Model

The Grok 3 Reasoning and Grok 3 mini Reasoning models are still in development, but according to results xAI shared during the livestream, the betas of both models performed competitively against o3-mini (high), o1, DeepSeek-R1, and Gemini 2.0 Flash Thinking on AIME, GPQA, and LCB.

[Insert chart: Reasoning model Grok 3]

Chatbot Arena

Beyond technical benchmarks, Grok 3 also climbed the leaderboard on Chatbot Arena, a crowdsourced platform where users chat with two anonymous LLMs side by side and vote for the better response without knowing which models they are comparing.
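Chatbot Arena turns these anonymous head-to-head votes into a leaderboard. LMSYS, which runs the arena, originally ranked models with Elo ratings and later moved to a Bradley-Terry fit. The minimal Python sketch below shows only the basic online Elo update; the starting rating of 1000, K = 32, and the model names are illustrative assumptions, not necessarily the arena's settings or data.

```python
from collections import defaultdict


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(ratings: dict, model_a: str, model_b: str, score_a: float, k: float = 32.0) -> None:
    """Apply one vote. score_a is 1.0 if A won, 0.0 if B won, 0.5 for a tie."""
    e_a = expected_score(ratings[model_a], ratings[model_b])
    delta = k * (score_a - e_a)
    ratings[model_a] += delta  # A gains exactly what B loses, so the update is zero-sum
    ratings[model_b] -= delta


def rate(votes, initial: float = 1000.0, k: float = 32.0) -> dict:
    """Replay (model_a, model_b, score_a) votes in order and return the final ratings."""
    ratings = defaultdict(lambda: initial)  # unseen models start at the assumed initial rating
    for model_a, model_b, score_a in votes:
        update(ratings, model_a, model_b, score_a, k)
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical blind votes between placeholder models.
    votes = [
        ("model-x", "model-y", 1.0),
        ("model-x", "model-z", 1.0),
        ("model-y", "model-z", 0.5),
    ]
    for name, score in sorted(rate(votes).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.1f}")
```

Because each online update depends on the order in which votes arrive, replaying the same votes in a different order can give slightly different ratings; a batch Bradley-Terry fit over all votes at once avoids that.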

DeepSearch

To meet the demand for agentic capabilities, xAI also launched DeepSearch, a feature similar to OpenAI’s and Google’s deep research tools. With DeepSearch, users ask a question and Grok reasons through it, searches the web, shows its thinking as it goes, and then produces a thorough final response, with data and tables where needed. In practice, you can ask it to research a topic, come back 10 minutes later, and find the task complete.
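xAI has not published how DeepSearch works internally, so the following is only a toy Python sketch of the loop described above: plan queries, search, surface each reasoning step, then write an answer. The `plan`, `search`, and `answer` callables are hypothetical stand-ins supplied by the caller; nothing here is xAI's implementation or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ResearchTrace:
    """Record of the visible 'thinking' steps and the sources gathered."""
    thoughts: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)


def deep_search(
    question: str,
    plan: Callable[[str], list[str]],          # hypothetical: question -> search queries
    search: Callable[[str], list[str]],        # hypothetical: query -> result snippets
    answer: Callable[[str, list[str]], str],   # hypothetical: question + sources -> report
    on_thought: Callable[[str], None] = print, # streams each step as it happens
    max_queries: int = 5,
) -> tuple[str, ResearchTrace]:
    trace = ResearchTrace()

    def think(message: str) -> None:
        trace.thoughts.append(message)
        on_thought(message)  # surface the reasoning while the task is still running

    queries = plan(question)[:max_queries]  # cap how much searching one question can trigger
    think(f"Planned {len(queries)} queries for: {question}")
    for query in queries:
        results = search(query)
        trace.sources.extend(results)
        think(f"Searched {query!r}: {len(results)} results")
    report = answer(question, trace.sources)
    think("Composed final answer")
    return report, trace


if __name__ == "__main__":
    # Stub components standing in for a real planner, web search, and model.
    final, trace = deep_search(
        "What is Grok 3?",
        plan=lambda q: [q, q + " benchmarks"],
        search=lambda q: [f"snippet about {q}"],
        answer=lambda q, srcs: f"Answer to {q!r} citing {len(srcs)} sources",
    )
    print(final)
```

Passing the components in as functions keeps the loop testable with stubs and lets a real search backend or model be swapped in without changing the control flow.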

Access

Starting today, some Grok models are available in beta. Grok 3 is available with an X Premium+ subscription, which also includes the latest features, higher usage limits, DeepSearch, and advanced reasoning modes that you enable by clicking the "Think" or "Big Brain" option.

Conclusion

Grok 3 enters the chatbot race as a strong contender: in the results xAI has shared, it outperforms leading models on math, science, and coding benchmarks, and it has risen to the top of the Chatbot Arena leaderboard. With its reasoning models and the agentic DeepSearch feature, Grok 3 gives xAI a serious answer to the latest offerings from OpenAI and Google.

Frequently Asked Questions

Q: What is Grok 3?
A: Grok 3 is a family of large language models developed by xAI, trained, according to the company, on 10 times more data than Grok 2.

Q: How does Grok 3 compare to other AI models?
A: In results shared by xAI, Grok 3 outperformed leading models on benchmarks including AIME ’24, GPQA, and LCB.

Q: What is DeepSearch?
A: DeepSearch is a Grok 3 research feature: you ask a question, and the model reasons through it, searches the web, and generates a final response that includes data and tables where needed.

Q: How can I access Grok 3?
A: Grok 3 is available with an X Premium+ subscription, which also includes the latest features, higher usage limits, DeepSearch, and advanced reasoning modes.

Q: What is the price of X Premium+?
A: X Premium+ costs $40 per month, up from $22 before the announcement.

Q: What is SuperGrok?
A: SuperGrok is a new subscription tier, similar to ChatGPT Pro, aimed at super fans who want the earliest access to the most advanced capabilities. Pricing has not yet been announced.
