Just Released: GPT-5.5; NVIDIA Engineer Compares Losing It to Amputation

Author Info

Yuki Tanaka

Asia-Pacific AI Markets Reporter

B.A. Economics (University of Tokyo); bilingual EN/JP; former APAC tech wire correspondent

Yuki tracks model launches, cloud partnerships, and industrial policy across East Asia. She sources from company filings, local press briefings, and on-the-ground industry contacts, then contextualizes moves for a global English-speaking audience. She is careful to note translation limits and regional regulatory differences.

#APAC Markets #Cloud Partnerships #Industrial Policy #Cross-Border Launches


GPT-5.5 has just arrived.

Its official positioning describes it as “a new type of intelligence designed for practical work and agents.”

This time, Sam Altman did not personally step forward to say he was “shocked, dizzy, and collapsed upon first experience, feeling like he had witnessed an atomic bomb explosion.” Instead, he invited a group of proxies (early test users) to share their experiences.

One of them is an NVIDIA engineer who briefly lost access to GPT-5.5 after the early testing phase concluded, stating:

Losing access to GPT-5.5 is like having a limb amputated.

Jokes aside, the collaboration between OpenAI and NVIDIA this time is unprecedented.

First, GPT-5.5 was jointly designed with NVIDIA’s GB200 and GB300 NVL72 systems. From training to deployment, the model and hardware have been mutually optimized from their inception.

Second, OpenAI promoted Codex across NVIDIA’s entire company, with Altman sharing an email exchange with Jensen Huang.

Let’s look at the data to see the results of this collaboration.

Compared to the previous version, GPT-5.4, the new model has pulled ahead in three key areas: coding, knowledge work, and scientific research.

According to the Artificial Analysis Intelligence Index comprehensive test results, there are two ways to interpret the findings:

GPT-5.5 achieves the same scores as Claude Opus 4.7 and other models while consuming fewer tokens.

Alternatively, for the same token consumption, GPT-5.5 completes more tasks.

However, what surprised people most was not just the benchmark scores.

In every previous model upgrade, “stronger” and “slower” were almost always bundled together. This is the cost of Scaling Laws: larger models, more parameters, and longer thinking times. Users pay for intelligence but also for latency.

GPT-5.5 has broken this iron law.

In real-world production environments, its per-token latency is comparable to GPT-5.4, and it requires fewer tokens to complete the same tasks.

It is more efficient and more powerful.

(But the price has doubled.)
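The arithmetic behind the claim is straightforward: end-to-end task time is per-token latency multiplied by the number of tokens generated, so equal per-token latency plus fewer tokens means faster tasks. A minimal sketch with invented numbers (nothing below comes from OpenAI):

```python
# Invented numbers, purely to illustrate the latency arithmetic.
def task_seconds(tokens_generated: int, seconds_per_token: float) -> float:
    """End-to-end generation time: token count x per-token latency."""
    return tokens_generated * seconds_per_token

# Assume identical per-token latency; the newer model needs fewer tokens.
old_model = task_seconds(tokens_generated=12_000, seconds_per_token=0.02)
new_model = task_seconds(tokens_generated=8_000, seconds_per_token=0.02)
print(old_model, new_model)  # the same work finishes in two-thirds the time
```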

As of press time, the latest version of Codex already supports GPT-5.5.

The context window has also been upgraded to 400K tokens.

Cheating at Coding

Programming is the area where GPT-5.5 shows the most significant improvement.

With previous-generation models, users had to carefully break down tasks, watch them step by step, and be ready to correct deviations at any time.

GPT-5.5 is different. You feed it a requirement, and it breaks it down, executes it, and checks itself. You only need to review the results.

OpenAI demonstrated a 3D action game generated by GPT-5.5 under Codex, running directly in the browser.

The demo implemented combat systems, enemy encounters, HUD feedback, and environment textures, all generated by the model in TypeScript/Three.js.

In Terminal-Bench 2.0, a hardcore test measuring complex command-line workflows, GPT-5.5 scored 82.7%.

The previous version, GPT-5.4, scored 75.1%, while the current strongest competitor, Claude Opus 4.7, scored 69.4%.

To put it simply: GPT-5.4 still failed on about a quarter of these difficult problems, and Claude Opus 4.7 on nearly a third, while the new model cuts the failure rate to fewer than one in five.
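Those fractions are just the complement of each pass rate, worked out from the scores quoted above:

```python
def failure_rate(pass_pct: float) -> float:
    """A benchmark failure rate is simply 100 minus the pass rate."""
    return round(100.0 - pass_pct, 1)

print(failure_rate(69.4))  # Claude Opus 4.7: 30.6 -> nearly a third
print(failure_rate(75.1))  # GPT-5.4: 24.9 -> about a quarter
print(failure_rate(82.7))  # GPT-5.5: 17.3 -> fewer than one in five
```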

Next up are the voices of early testers:

Dan Shipper, a startup CEO and active AI product developer who was among the early testers, ran an experiment. After his app shipped with a bug, he hired a top-tier engineer to refactor it. The engineer worked hard and eventually delivered a fix.

Then, Shipper rewound the clock: he fed the buggy code into the model to see if it could independently arrive at the same decision as the engineer.

GPT-5.4 couldn’t do it. GPT-5.5 did.

Shipper said this was the first time he felt true “conceptual clarity” from a coding model. It wasn’t just responding; it understood the problem and figured out how to solve it on its own.

More senior engineers are reporting the same thing: GPT-5.5 is significantly stronger in reasoning and autonomy than both GPT-5.4 and Claude Opus 4.7.

It can identify issues in advance and predict testing and review requirements without explicit prompts.

Coding is just the beginning. The same leap in capability is spreading to knowledge work and scientific research.

Beyond Coding

What GPT-5.5 does in Codex goes far beyond writing code: generating documents, organizing spreadsheets, creating presentations.

OpenAI has emphasized multiple times that it understands what you want better than the previous generation.

More importantly, it uses tools on its own and verifies whether its output is correct. You provide a vague idea, and it helps complete the rest.

Here is an interesting data point: over 85% of OpenAI’s own employees use Codex for work every week. (What about the other 15%?)

Let’s look at the evaluation results first.

On the knowledge work benchmark GDPval, GPT-5.5 scored 84.9%, outperforming Claude Opus 4.7 by 4.6 percentage points.

In FrontierMath Tier 4, one of the most difficult math benchmarks currently available, featuring problems from unpublished papers and open questions from top researchers, GPT-5.5 Pro achieved a score of 39.6%. Claude Opus 4.7 scored 22.9%, meaning GPT-5.5 Pro's score is roughly 1.7 times Claude's.

What is truly interesting is how scientists are using it.

Bartosz Naskręcki, an Assistant Professor of Mathematics at Adam Mickiewicz University in Poland, typed a single sentence into Codex. Eleven minutes later, an algebraic geometry visualization application was running.

The application could plot the intersection lines of two quadratic surfaces (marked in red) and transform these intersections into the standard form of Weierstrass curves using the Riemann-Roch theorem. He later expanded it to include more stable singularity visualization features.

One sentence, 11 minutes. In the past, setting up the project framework alone would have taken half a day.

Derya Unutmaz, a Professor of Immunology at the Jackson Laboratory for Genomic Medicine, used GPT-5.5 Pro to analyze a gene expression dataset: 62 samples and nearly 28,000 genes. The result was a complete research report.

He stated that this would have taken his team several months to accomplish.

OpenAI's positioning of GPT-5.5 in scientific research is apt: no longer a one-off answer engine, but something closer to a “research partner.”

Early testers are not just using it for information retrieval. They use it for multi-round paper revisions, pointing out logical flaws item by item, and proposing new analytical approaches. It remembers the entire context of your research, with each conversation building upon the previous one.

GPT-5.5 achieved something significant in the field of mathematics.

Ramsey numbers, one of the core problems in combinatorics.

In layman’s terms, it studies how large a network must be to guarantee that a certain order inevitably emerges.

For example, among six people, there are always three who know each other or three who do not know each other; this is the simplest Ramsey theorem.
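The six-person claim is small enough to check by machine. A brute-force sketch (illustrative only, not the model's proof): two-color every edge of the complete graph K_n and look for a single-colored triangle.

```python
from itertools import combinations, product

def has_mono_triangle(n, coloring):
    """True if some triangle's three edges all carry the same color."""
    return any(
        coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]
        for a, b, c in combinations(range(n), 3)
    )

def ramsey_holds(n):
    """True iff EVERY red/blue coloring of K_n forces a monochromatic triangle."""
    edges = list(combinations(range(n), 2))
    return all(
        has_mono_triangle(n, dict(zip(edges, colors)))
        for colors in product((0, 1), repeat=len(edges))
    )

print(ramsey_holds(5))  # False: a 5-cycle coloring avoids any triangle
print(ramsey_holds(6))  # True: six people always force such a trio
```

Exhausting all 2^15 colorings of K_6 confirms R(3,3) = 6; the hard, open questions concern how such thresholds grow, not this toy case.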

Mathematicians have chipped away at it for decades; in particular, the asymptotic behavior of off-diagonal Ramsey numbers has long remained unresolved.

GPT-5.5 found a new proof path. It did not reproduce known methods but discovered a novel route. Subsequently, this proof was confirmed as correct by Lean, one of the most rigorous formal verification tools in mathematics.

An AI has made an original contribution verified by formal tools in a core field of pure mathematics.

A year ago, this was unimaginable.

The Secret Behind “Stronger but Not Slower”

How is it possible to be “stronger yet faster”?

The answer is not an optimization of any single link in the chain: OpenAI tore down and rebuilt the entire inference system.

As mentioned earlier, GPT-5.5 was jointly designed with NVIDIA’s GB200 and GB300 NVL72 systems, resulting in a significant leap in intelligence levels under equivalent latency conditions.

But there is another story.

OpenAI also set Codex, running on GPT-5.5, loose on its own serving infrastructure: the load-balancing layer that partitions incoming requests across accelerators.

Previously, requests were split into fixed-size chunks distributed to accelerators for processing. However, this fixed chunking strategy was not always optimal under different traffic patterns. Sometimes the chunks were too coarse, sometimes too fine, leading to fluctuating resource utilization.

Codex analyzed weeks of real traffic data and wrote its own adaptive partitioning algorithm, dynamically adjusting the chunking strategy based on actual traffic shapes.
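OpenAI has not published the actual heuristic, so the following is a purely hypothetical sketch of what "adapting the chunk size to observed traffic" could look like: derive the chunk size from a recent request-length percentile instead of a fixed constant.

```python
# Hypothetical sketch: OpenAI has not published the real heuristic.
# All names, bounds, and numbers here are illustrative assumptions.
from statistics import quantiles

def adaptive_chunk_size(recent_request_lengths, lo=128, hi=2048):
    """Pick a chunk size near the recent median request length, clamped
    to [lo, hi], so typical requests are not splintered into fragments
    and a few huge outliers cannot inflate the chunk size."""
    median = quantiles(recent_request_lengths, n=4)[1]  # 2nd quartile
    return max(lo, min(hi, int(median)))

# Mostly short requests plus one long outlier: the median ignores the tail.
traffic = [300, 320, 280, 350, 310, 2900, 295]
print(adaptive_chunk_size(traffic))  # 310, vs. a one-size-fits-all 512
```

Using a robust statistic like the median, rather than the mean, is one plausible way to keep a handful of very long requests from distorting the strategy for the common case.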

Token generation speed increased by over 20%.

The model optimized the infrastructure running itself; AI is making itself run faster.

The overall rebuild of the inference system, combined with the model participating in its own optimization, produced this result.

OpenAI stated that this is “a step toward a new way of getting computers to do work.”

But when models begin to optimize the infrastructure they run on—

How far has this step really gone?

One More Thing

With GPT-5.5, OpenAI expects the pace of future model releases to accelerate.

“We see quite significant progress in the short term and extremely significant progress in the medium term.”

“I think progress over the past few years was unexpectedly slow.”

These words came from Chief Scientist Jakub Pachocki during a press conference call.
