Musk's 'Strongest on Earth' Grok 3 Makes Waves, Tops Arena with Over 1400 Points

Frontier Models · Published: Feb 18, 2025 · David Kowalski · ~12 min read

Author Info

Developer Tools & Agents Editor

15+ years software engineering; maintainer of internal agent-evaluation playbooks

David tests coding agents, IDE integrations, and terminal workflows the way working teams use them. He documents prompts, environment pins, and regression cases so readers can compare tools fairly. When vendors sponsor access, he discloses it and keeps scoring criteria unchanged.

#Coding Agents #IDE Integrations #Developer Productivity #Tool Comparisons

Elon Musk’s xAI has just unveiled Grok 3 in a livestream watched by more than 3 million people.

It debuted at the top of the rankings. According to the official Arena (lmarena.ai) announcement:

Grok 3 is the first model to break the 1400-point mark and ranks first in all categories.

Furthermore, Grok 3 is the first model trained on a cluster of 100,000 (later expanded to 200,000) H100 GPUs.

In his teaser for the release, Musk praised Grok 3 highly, calling it “the smartest AI on Earth.”

Before the official release, AI expert Karpathy gained early access. After playing with it for two hours, he published a long post detailing his impressions.

Karpathy believes that Grok 3’s reasoning capabilities have reached the state of the art (SOTA), with reasoning performance comparable to o1-pro and slightly better than DeepSeek-R1 and Gemini’s reasoning models.

Considering that the xAI team started from scratch about a year ago, achieving such results is remarkable.

Moreover, on classic “hard problems” for large models, such as comparing decimal numbers, Grok 3 answered correctly once reasoning was enabled.

However, some have questioned how long Grok will hold that position. A parody account imitating Nvidia’s Jensen Huang commented that even if Grok 3 is truly the strongest, it will stay on top for a week at most.

With OpenAI also teasing its next-generation GPT plans, another war of words on X between Musk and Sam Altman looks imminent.

Altman also tweeted last night that testing GPT-4.5 gave him a stronger “feel the AGI” sensation than expected.

Netizens in the comments section stirred things up, urging him to beat Musk to the punch and livestream the release of GPT-4.5 in the morning.

Back to business, let’s look at what was discussed during the livestream.

200,000 H100s Trained the Strongest Model

Four people took part in the livestream. Besides Musk, the most prominent were the two researchers of Chinese descent seated in the center, both founding members of xAI.

From left to right:

Jimmy Ba, a 2023 Sloan Research Fellow and assistant professor at the University of Toronto, where he completed both his undergraduate and doctoral studies, the latter under Geoffrey Hinton.
Yuhuai (Tony) Wu, a former postdoctoral researcher at Stanford University, who received his Ph.D. from the University of Toronto.

The person on the far left is Igor Babuschkin, an engineer at xAI.

The four first introduced the training process of Grok 3.

Last year, Musk teased that Grok 3 was being trained on 100,000 H100 GPUs, making it the first model trained on a cluster of that scale.

At the time, netizens called this a “super factory” for neural networks.

At today’s launch event, the team revealed that by day 92 of training, the cluster had expanded to 200,000 GPUs.

With this much compute available, xAI also followed the industry trend and gave Grok 3 chain-of-thought reasoning capabilities.

At a summit in Dubai earlier, Musk proudly declared:

Grok 3 has strong reasoning capabilities and is smarter than all currently known models.

Grok 3 comes in two versions: Full and Mini. Both outperformed non-reasoning models such as GPT-4o and DeepSeek-V3 on mathematics, science, and coding benchmarks.

Additionally, an early version of Grok 3, running under the alias “Chocolate,” topped the LMSYS Chatbot Arena leaderboard (lmarena.ai), becoming the only model to score over 1400.

Building on the base Grok 3 and Mini models, the xAI team also created two reasoning models.

The reasoning model based on Mini (Grok 3 mini Reasoning) is relatively mature, while the one based on the Full version (Grok 3 Reasoning Beta) is still in the Beta stage.

Before presenting the results, the four used Musk’s account to run two live demos with Grok, one about physics and one about gaming. The first prompt read:

Generate code to create a 3D animated chart depicting a launch from Earth to Mars, followed by a return to Earth during the next launch window.

During the generation process, someone joked about when Grok could be installed on SpaceX rockets. Musk responded that it might take another two years.

Musk also stated that if everything goes smoothly, SpaceX plans to send the Optimus robot to Mars via Starship around November 2026, during the next Earth-Mars transfer window.

Back to Grok: after reasoning through Kepler’s laws and translating them into code, it produced a program that renders the requested animation.
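The code Grok generated is not reproduced in this article. As a rough illustration of what "converting Kepler's laws into code" can involve, here is a minimal sketch that estimates the one-way travel time and the spacing between launch windows. It assumes circular, coplanar orbits and a simple Hohmann-style transfer, and every function name and constant is hypothetical rather than taken from the demo.

```python
# Illustrative sketch only: assumes circular, coplanar orbits and a Hohmann-style
# transfer. Names and constants are hypothetical, not from xAI's demo.

EARTH_AU = 1.0    # Earth's orbital radius in astronomical units
MARS_AU = 1.524   # approximate semi-major axis of Mars's orbit in AU
DAYS_PER_YEAR = 365.25


def orbital_period_years(semi_major_axis_au: float) -> float:
    """Kepler's third law for orbits around the Sun: T^2 = a^3 (T in years, a in AU)."""
    return semi_major_axis_au ** 1.5


def hohmann_transfer_days(r1_au: float, r2_au: float) -> float:
    """Travel time along half of the ellipse that touches both circular orbits."""
    transfer_axis = (r1_au + r2_au) / 2.0
    return 0.5 * orbital_period_years(transfer_axis) * DAYS_PER_YEAR


def synodic_period_days(r1_au: float, r2_au: float) -> float:
    """Time between successive identical alignments of the two planets."""
    t1 = orbital_period_years(r1_au)
    t2 = orbital_period_years(r2_au)
    return DAYS_PER_YEAR / abs(1.0 / t1 - 1.0 / t2)


if __name__ == "__main__":
    print(f"One-way transfer: {hohmann_transfer_days(EARTH_AU, MARS_AU):.0f} days")
    print(f"Launch windows repeat every {synodic_period_days(EARTH_AU, MARS_AU):.0f} days")
```

Under these simplifying assumptions the transfer takes roughly 259 days and launch windows recur about every 780 days, which is why a round trip has to wait for "the next launch window." An animation would then sample positions along these orbits frame by frame.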

The second question activated Big Brain mode, allowing the model to use more computing resources for deeper thinking.

The prompt required using the pygame library to design a game that combines Tetris and Bejeweled.

The prompt also noted that the code could be long, that it should be saved in a single file, and that the result should be “insanely great.”

Grok 3 lived up to expectations, successfully combining the two games and describing the features of the hybrid version.

When run, the game retains Tetris’s elimination mechanics but adapts them to Bejeweled’s rule, so that three matching blocks are required to clear.
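The generated game's source is not shown here. To make the hybrid rule concrete, the following sketch shows one plausible way to detect runs of three matching blocks on a Tetris-style grid and then let the remaining blocks fall. The grid representation, the function names, and the exact rules are assumptions for illustration, not Grok 3's output, and the pygame rendering layer is left out.

```python
# Illustrative sketch only: the grid layout, names, and rules are assumptions.
from typing import List, Optional, Set, Tuple

Grid = List[List[Optional[str]]]  # grid[row][col]; None marks an empty cell


def find_matches(grid: Grid, run_length: int = 3) -> Set[Tuple[int, int]]:
    """Return every (row, col) that lies in a horizontal or vertical run of equal colors."""
    rows, cols = len(grid), len(grid[0])
    matched: Set[Tuple[int, int]] = set()
    for dr, dc in ((0, 1), (1, 0)):  # scan rightwards, then downwards
        for r in range(rows):
            for c in range(cols):
                color = grid[r][c]
                if color is None:
                    continue
                run = [(r + i * dr, c + i * dc) for i in range(run_length)]
                end_r, end_c = run[-1]
                if end_r >= rows or end_c >= cols:
                    continue  # the window would leave the board
                if all(grid[rr][cc] == color for rr, cc in run):
                    matched.update(run)  # overlapping windows cover longer runs
    return matched


def clear_and_drop(grid: Grid, matched: Set[Tuple[int, int]]) -> Grid:
    """Remove matched cells, then let the surviving blocks in each column fall down."""
    rows, cols = len(grid), len(grid[0])
    new_grid: Grid = [[None] * cols for _ in range(rows)]
    for c in range(cols):
        survivors = [
            grid[r][c]
            for r in range(rows)
            if grid[r][c] is not None and (r, c) not in matched
        ]
        # Refill the column from the bottom row upwards.
        for offset, color in enumerate(reversed(survivors)):
            new_grid[rows - 1 - offset][c] = color
    return new_grid


if __name__ == "__main__":
    board: Grid = [
        [None, "B", None],
        ["R", "G", None],
        ["R", "R", "R"],
    ]
    hits = find_matches(board)
    print(sorted(hits))
    print(clear_and_drop(board, hits))
```

In a real game loop these two steps would repeat until no matches remain, since falling blocks can create new runs.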

Looking at the benchmark results, both versions achieved impressive scores in mathematics, science, and coding tasks.

Furthermore, when prompted to “think more” (shown as the lighter shaded region above the bars in xAI’s benchmark chart), their performance surpassed DeepSeek-R1 and o3-mini (high).

However, many models are currently showing signs of “overfitting” on benchmarks. So, how does Grok 3 perform in reality?

The team tested it on problems from the AIME 2025 competition. Grok 3 Reasoning Beta and Grok 3 mini Reasoning scored 93 and 90 respectively, outperforming other reasoning models.

In addition to the Grok 3 pre-trained model and the two reasoning models, xAI also released an AI Agent called DeepSearch.

This feature can be seen as xAI’s counterpart to the Deep Research functions recently launched by OpenAI, Google, and others.

In short, DeepSearch scans the internet and X (formerly Twitter) to analyze information and provides summaries to answer questions.

Regarding access, X Premium+ users can experience Grok 3 starting today.

On the standalone app, a SuperGrok subscription is required—$30/month or $300/year.

The Release Process Was Full of Twists; Voice Mode Delayed

Looking back at the entire process of Grok 3’s emergence, it was indeed full of twists and turns.

Last August, during an interview with popular podcaster Lex Fridman, Musk said that Grok 3 was expected to be released by the end of that year.

However, the first test instance wasn’t published until January 19 this year, and the actual release has been delayed until now.

Moreover, just over the weekend before the release, the xAI team was still urgently refining Grok 3.

An xAI employee also shared their experience.

At 11:30 PM that night (3:30 PM Beijing time on Monday, less than 24 hours before the launch), Musk posted online stating he was still pulling an all-nighter to finish his work.

Just an hour and a half before the press conference, Musk suddenly announced that the voice feature originally planned for release would be delayed.

Musk tweeted that the voice mode was still unstable and needed to be postponed by another week.

During the live Q&A session, a viewer asked about the voice mode’s specific release date. The team responded that an early version would go online soon, followed by gradual iterations.

However, Shivon Zilis, an executive at Musk’s Neuralink, had already tried a version called Ara for an hour and posted her impressions earlier that morning, Beijing time.

Zilis described it as one of the most surprising and meaningful moments of her life.

She discussed topics such as biology and quantum entanglement with Ara, and even asked Ara to create quiz questions to test her learning outcomes.

Zilis answered only half of the questions correctly, but Ara patiently explained the rest without dismissing any question as too foolish.

Someone later asked in the comments whether Ara was the voice version, and Zilis confirmed that it was.

Seeking $10 Billion in New Financing and Entering the Gaming Sector

In fact, the timing of the Grok 3 release is itself notable.

Just last Friday, Bloomberg reported that xAI was seeking a new round of financing worth approximately $10 billion, valuing the company at around $75 billion (approximately 545.46 billion RMB).

Existing investors, including Sequoia Capital, Andreessen Horowitz, and Valor Equity Partners, are in talks to participate in this funding round.

Since the deal has not yet been finalized, the release of the new model is likely to influence this financing round.

If these reports are confirmed, xAI’s fundraising speed is indeed astonishing.

At the end of last December, the company had just completed a $6 billion Series C round, valuing it at $51 billion.

In less than two months, the valuation jumped by approximately 47%. Looking further back, from the B to C rounds, the valuation doubled within six months.
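The 47% figure can be checked directly from the two valuations quoted in this article. The short calculation below does this; the helper name `percent_change` is purely illustrative.

```python
# Worked arithmetic using the valuations reported in the article (USD billions).

def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100.0


series_c_valuation = 51.0   # valuation at the Series C round in late December
reported_valuation = 75.0   # valuation in the financing reported by Bloomberg

jump = percent_change(series_c_valuation, reported_valuation)
print(f"Valuation increase: {jump:.1f}%")
```

The result is (75 - 51) / 51, or about 47.1%, which matches the figure quoted above.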

It can be said that xAI, established less than two years ago, has grown into a formidable rival to OpenAI.

With ample funding, xAI announced not only continued model development but also other strategic directions:

Betting on the gaming sector by establishing an AI game studio.

Musk first hinted at this last November, complaining that “too many game studios are controlled by large corporations.”

Now, Musk’s business empire is expanding once again.

One More Thing

A few days before the Grok 3 launch, another dramatic incident sparked heated discussion.

An xAI engineer (now a former employee) publicly posted a comparison of Grok 3’s coding abilities against several competitors.

Although he clearly labeled it as his personal opinion, his decision to rank his own company’s model, Grok 3, fourth (with the top three spots taken by OpenAI models) caused controversy.

The employee later revealed that the company demanded he either delete the post or be fired, claiming the post exposed Grok 3’s existence.

Upon hearing this, the engineer felt it was absurd, noting that everyone already knew about Grok 3, and even shared screenshots of Musk’s previous statements.

Facing what he perceived as xAI’s petty behavior, the engineer decided to quit without hesitation, posting a lengthy explanation.

I will maintain my words and dignity, find another job, or start my own business. See you later.

Regarding this incident, Musk later responded that it was “weird,” but no further actions were reported.

More dramatically, due to a dispute over salary payments, the engineer later publicly posted again, tagging Musk:

Please do the right thing.

However, despite the “breakup,” the engineer, who worked on Grok 3’s voice mode, set aside past grievances and helped promote Grok 3 multiple times.

Moreover, the voice feature whose delay Musk announced today was the work of this engineer’s team. Even after leaving, he remains proud of his contribution to the project.

That said, what do you think of this version of Grok 3? Once the next generation of GPT is released, can Musk maintain his lead?

References

x.com/karpathy/status/1891720635363254772
x.com/shivon/status/1891587630854209768
x.com/hyhieu226/status/1891390812795146746
