Just now, Elon Musk’s xAI unveiled Grok 3 (watched by over 3 million people).
It topped the rankings on debut. The official evaluation from the Arena (lmarena.ai) states:
Grok 3 is the first model to break the 1400-point mark and ranks first in all categories.

Furthermore, Grok 3 is the first model trained on a cluster of 100,000 (later expanded to 200,000) H100 GPUs.
In his teaser for the release, Musk praised Grok 3 highly, calling it “the smartest AI on Earth.”

Before the official release, AI researcher Andrej Karpathy gained early access. After trying it for about two hours, he published a long post detailing his impressions.
Karpathy believes that Grok 3’s reasoning capabilities have reached the state of the art (SOTA), with reasoning performance comparable to o1-pro and slightly better than DeepSeek R1 and Gemini’s reasoning models.
Considering that the xAI team started from scratch only about a year ago, achieving such results is remarkable.

Moreover, on classic “hard problems” for large models, such as comparing the sizes of numbers and decimals, Grok 3 answered correctly once reasoning was enabled.

However, some have questioned how long Grok will hold that position. One parody account of Nvidia’s Jensen Huang commented that even if Grok 3 is truly the strongest, it will stay on top for a week at most.
Coupled with OpenAI teasing its next-generation GPT plans, another Twitter battle between Musk and Sam Altman is about to erupt.

Altman also tweeted last night that testing GPT-4.5 gave him a stronger “feel the AGI” sensation than expected.
Netizens in the comments section stirred things up, urging him to beat Musk to the punch and livestream the release of GPT-4.5 in the morning.

Back to business, let’s look at what was discussed during the livestream.
200,000 H100s Trained the Strongest Model
Four people participated in this livestream. Besides Musk, the most prominent figures were the two Chinese individuals seated in the center; they are founding members of xAI.
From left to right:
- Jimmy Ba, a 2023 Sloan Research Fellow and assistant professor at the University of Toronto, who studied under Geoffrey Hinton and completed both his undergraduate and doctoral studies there.
- Yuhuai (Tony) Wu, a postdoctoral researcher at Stanford University, who received his Ph.D. from the University of Toronto.
The person on the far left is Igor Babuschkin, an engineer at xAI.

The four first introduced the training process of Grok 3.
Last year, Musk teased that Grok 3 was being trained on 100,000 H100 GPUs, making it the first model to reach such a massive training cluster scale.
At the time, netizens called this a “super factory” for neural networks.

At today’s launch event, it was revealed that by day 92 of training, the cluster had expanded to 200,000 GPUs.

With that much compute available, xAI also followed the trend by introducing Chain-of-Thought reasoning capabilities in Grok 3.
At a summit in Dubai earlier, Musk proudly declared:
Grok 3 has strong reasoning capabilities and is smarter than all currently known models.

This release of Grok 3 comes in two versions: Full and Mini. Both outperformed non-reasoning models like GPT-4o and DeepSeek-V3 on benchmarks for mathematics, science, and code.

Additionally, an early version of Grok 3, under the alias “Chocolate,” topped the LMSYS Chatbot Arena leaderboard (lmarena.ai), becoming the only model to score over 1400.

Building on the base Grok 3 and Mini models, the xAI team also created two reasoning models.
The reasoning model based on Mini (Grok 3 mini Reasoning) is relatively mature, while the one based on the Full version (Grok 3 Reasoning Beta) is still in the Beta stage.
Before presenting the results, the four used Musk’s account to run two demos with Grok, one on physics and one on game development. The first prompt:
Generate code to create a 3D animated chart depicting a launch from Earth to Mars, followed by a return to Earth during the next launch window.

During the generation process, someone joked about when Grok could be installed on SpaceX rockets. Musk responded that it might take another two years.
Musk also stated that if everything goes smoothly, SpaceX plans to send the Optimus robot to Mars via Starship around November 2026, during the next Earth-Mars transfer window.
Returning to Grok, after considering Kepler’s laws and converting them into code, it ultimately generated code capable of producing such an animation:
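The code Grok produced is not reproduced here. As an independent, minimal sketch of the kind of calculation Kepler's third law makes possible, the Python below estimates the one-way Earth-to-Mars travel time and the spacing between launch windows. It assumes circular, coplanar orbits and a Hohmann transfer ellipse; those simplifications, the orbital radii, and all function names are assumptions of this sketch, not details from the livestream.

```python
# Assumed mean orbital radii in astronomical units (AU). Circular,
# coplanar orbits are a simplification made for this sketch.
EARTH_ORBIT_AU = 1.0
MARS_ORBIT_AU = 1.524
DAYS_PER_YEAR = 365.25


def orbital_period_years(semi_major_axis_au: float) -> float:
    """Kepler's third law for orbits around the Sun: T^2 = a^3 (years, AU)."""
    return semi_major_axis_au ** 1.5


def hohmann_transfer_days(r_start_au: float, r_end_au: float) -> float:
    """One-way travel time: half the period of the transfer ellipse."""
    transfer_axis = (r_start_au + r_end_au) / 2
    return 0.5 * orbital_period_years(transfer_axis) * DAYS_PER_YEAR


def synodic_period_days(r_inner_au: float, r_outer_au: float) -> float:
    """Time between successive launch windows for the two planets."""
    t_inner = orbital_period_years(r_inner_au)
    t_outer = orbital_period_years(r_outer_au)
    return DAYS_PER_YEAR / (1 / t_inner - 1 / t_outer)


if __name__ == "__main__":
    trip = hohmann_transfer_days(EARTH_ORBIT_AU, MARS_ORBIT_AU)
    window = synodic_period_days(EARTH_ORBIT_AU, MARS_ORBIT_AU)
    print(f"Transfer time: {trip:.0f} days")
    print(f"Window spacing: {window:.0f} days")
```

Under these assumptions the one-way trip comes out at roughly 259 days, and launch windows recur about every 780 days, or a little over two years. A real mission plan would account for elliptical, inclined orbits and would differ from these figures.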

The second question activated Big Brain mode, allowing the model to use more computing resources for deeper thinking.
The prompt required using the pygame library to design a game that combines Tetris and Bejeweled.
The prompt also noted that the code might be long, that it should be kept in a single file, and that the result should be “insanely great.”

Grok 3 lived up to expectations, successfully combining these two games and introducing the features of the hybrid version:

When run, it looks like this: it retains Tetris’s elimination mechanics but adjusts them based on Bejeweled’s traits to require three blocks for a match.

Looking at the benchmark results, both versions achieved impressive scores in mathematics, science, and coding tasks.
Furthermore, when given more compute to “think more” (the lighter shaded area above the bars), their performance surpassed DeepSeek-R1 and o3-mini (high).

However, many models are currently showing signs of “overfitting” on benchmarks. So, how does Grok 3 perform in reality?
The R&D team tested it on questions from the AIME 2025 competition. Grok 3 Reasoning Beta and Grok 3 mini Reasoning scored 93 and 90, respectively, outperforming other reasoning models.

In addition to the Grok 3 pre-trained model and the two reasoning models, xAI also released an AI Agent called DeepSearch.

This feature can be seen as xAI’s counterpart to the Deep Research functions recently launched by OpenAI, Google, and others.
In short, DeepSearch scans the internet and X (formerly Twitter) to analyze information and provides summaries to answer questions.

Regarding access, X Premium+ users can experience Grok 3 starting today.
On the standalone app, a SuperGrok subscription is required—$30/month or $300/year.
The Release Process Was Full of Twists; Voice Mode Delayed
Looking back at the entire process of Grok 3’s emergence, it was indeed full of twists and turns.
Last August, during an interview with popular podcaster Lex Fridman, Musk said that Grok 3 was expected to be released by the end of that year.
However, the first test instance wasn’t published until January 19 this year, and the actual release has been delayed until now.

Moreover, just over the weekend before the release, the xAI team was still urgently refining Grok 3.

An xAI employee also shared their experience of the final push.
At 11:30 PM that night (3:30 PM Beijing time on Monday, less than 24 hours before the launch), Musk posted online stating he was still pulling an all-nighter to finish his work.

Just an hour and a half before the press conference, Musk suddenly announced that the voice feature originally planned for release would be delayed.
Musk tweeted that the voice mode was still unstable and needed to be postponed by another week.

During the live Q&A session, a netizen asked about the specific release date for the voice mode. The team responded that an early version would go online soon, followed by gradual iterations.
However, Shivon Zilis, an executive at Musk’s Neuralink, had already spent an hour with the voice mode, called Ara, and posted her impressions early that morning, Beijing time.
Zilis described it as one of the most surprising and meaningful moments of her life.
She discussed topics such as biology and quantum entanglement with Ara, and even asked Ara to create quiz questions to test her learning outcomes.
Zilis answered only half of the questions correctly, but Ara patiently explained the remaining ones without dismissing any question as too foolish.

Someone later asked in the comments if Ara was a voice version, to which Shivon confirmed that it was.

Seeking $10 Billion in New Financing and Entering the Gaming Sector
In fact, Musk’s timing for releasing Grok 3 is somewhat delicate.
Just last Friday, Bloomberg reported that xAI was seeking a new round of financing worth approximately $10 billion, valuing the company at around $75 billion (approximately 545.46 billion RMB).
Existing investors, including Sequoia Capital, Andreessen Horowitz, and Valor Equity Partners, are in talks to participate in this funding round.
Since the deal has not yet been finalized, the release of the new model is likely to influence this financing round.

If these reports are confirmed, xAI’s fundraising speed is indeed astonishing.
At the end of last December, the company had just completed a $6 billion Series C round, valuing it at $51 billion.
In less than two months, the valuation jumped by approximately 47%. Looking further back, from the B to C rounds, the valuation doubled within six months.
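The 47% figure follows directly from the two valuations quoted above. A quick check in Python, using only those reported numbers (the helper name `percent_change` is just illustrative):

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


SERIES_C_VALUATION_BN = 51  # USD billions, December Series C round
REPORTED_VALUATION_BN = 75  # USD billions, per the Bloomberg report

growth = percent_change(SERIES_C_VALUATION_BN, REPORTED_VALUATION_BN)
print(f"{growth:.1f}%")  # 47.1%
```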
It can be said that xAI, established less than two years ago, has grown into a formidable rival to OpenAI.
With ample funding, xAI announced not only continued model development but also other strategic directions:
Betting on the gaming sector by establishing an AI game studio.

Musk first hinted at this last November, complaining that “too many game studios are controlled by large corporations.”

Now, Musk’s business empire is expanding once again.

One More Thing
A few days before the Grok 3 launch, another dramatic incident sparked heated discussion.
An xAI engineer (now a former employee) publicly posted a comparison of Grok 3’s coding abilities against several competitors.
Although he clearly labeled this as his personal opinion, ranking his own model, Grok 3, fourth (with the top three spots taken by OpenAI models) caused controversy.

The employee later revealed that the company demanded he either delete the post or be fired, claiming the post exposed Grok 3’s existence.
Upon hearing this, the engineer felt it was absurd, noting that everyone already knew about Grok 3, and even shared screenshots of Musk’s previous statements.
Facing what he perceived as xAI’s petty behavior, the engineer decided to quit without hesitation, posting a lengthy explanation.
I will maintain my words and dignity, find another job, or start my own business. See you later.

Regarding this incident, Musk later responded that it was “weird,” but no further actions were reported.

More dramatically, due to a dispute over salary payments, the engineer later publicly posted again, tagging Musk:
Please do the right thing.

However, despite having “broken up,” the engineer who worked on Grok 3’s voice mode still set aside past grievances and helped promote Grok 3 multiple times.
Moreover, the voice feature whose delay Musk announced today was indeed the work of this engineer’s team. Even after leaving, he remains proud of his contribution to the project.

That said, what do you think of this version of Grok 3? Once the next generation of GPT is released, can Musk maintain his lead?
References
- x.com/karpathy/status/1891720635363254772
- x.com/shivon/status/1891587630854209768
- x.com/hyhieu226/status/1891390812795146746