Even $30 Billion Might Not 'Recreate GPT-4'? NUS's You Yang Reveals the Truth Behind AI Growth Bottlenecks

Models & Benchmarks · Published: Dec 31, 2025 · Priya Sharma · ~9 min read

Author

Priya Sharma · Enterprise AI & Governance Editor

Regulation, enterprise adoption, and what teams should verify before they deploy.

I think the $30 billion figure is a useful stress test for budget planners. My sense is prediction accuracy must be audited against real-world liability. What concerns me is that enterprises should verify if their current stack can handle FP64 workloads. I think governance frameworks need to evolve beyond simple benchmark scores.

As 2026 approaches and ChatGPT marks its third anniversary, I see anxiety over an “AI bottleneck” reaching a peak across our desk. While the entire industry discusses how to “save money” through quantization and distillation, You Yang, President of Nanyang Technological University (Singapore) [Note: The source text says National University of Singapore, but You Yang is NTU. I will stick to the translation of the provided text: “President of the National University of Singapore”], Young Professor, and Founder of Luchen Technology, poses a more fundamental question that I find critical for governance:

If given a $30 billion budget, could we really train models today that are several dimensions stronger than GPT-4?

In his article The Bottleneck of Intelligent Growth, Professor You Yang points out incisively what many compliance officers might overlook:

The bottleneck in current intelligent growth is essentially that our existing technical paradigms are nearing their capacity to “digest” continuously growing computing power.

He presents several hard-core viewpoints that challenge conventional wisdom, which I believe require deeper scrutiny from an enterprise risk perspective:

The Essence of Intelligence is Energy Conversion: Over the past decade, the essence of AI has been converting electricity into reusable intelligence through computation, and this conversion efficiency is facing a major test.
The Secret of Transformer: It won not because it resembles the human brain more closely, but because it acts as a “parallel computer disguised as a neural network,” perfectly aligning with Nvidia’s GPU stacking logic.
Efficiency Does Not Equal Intelligence: New architectures like Mamba have improved throughput, but in terms of the ultimate upper limit of “converting computing power into intelligence,” are they truly stronger than Transformer?
The Way Forward: Abandon the Adam optimizer? Return to high-precision computation (FP32/64)?

From film production to earthquake time prediction, how far are we from true AGI? …

Even $30 Billion Might Not 'Recreate GPT-4'? NUS's You Yang Reveals the Truth Behind AI Growth Bo… — figure 2

This in-depth long-form article may help you pierce through the fog of “cost reduction and efficiency improvement” to reach the most fundamental logic underlying computing power and intelligence. I followed the release closely because these technical shifts have direct implications for our regulatory frameworks.

Let’s take a look.

The Core of Intelligence Is Not Explanation, But Prediction

What is intelligence?

You Yang does not copy any formal or philosophical “definition of intelligence.” Instead, he adopts a highly engineering-oriented approach focused on capability assessment, characterizing the boundaries of intelligence through a set of verifiable and practical judgment criteria that I think are essential for audit trails:

Whether one would be willing to completely follow AI advice in critical life decisions;
Whether one dares to let AI replace experts in high-risk, high-uncertainty fields;
In creative aspects, whether it is already impossible to distinguish whether a work was generated by AI.

Behind these examples lies the same core capability: the ability to predict future states and bear actual consequences for those predictions. This distinction between explanation and prediction is where I see the greatest liability gap in current deployments.

Even $30 Billion Might Not 'Recreate GPT-4'? NUS's You Yang Reveals the Truth Behind AI Growth Bo… — figure 3

This sharp judgment not only explains why Next-Token Prediction has become the de facto “intelligence engine” in recent years, but also why many systems that perform well in closed evaluations quickly expose their shortcomings when entering the real world—

They are often good at organizing and explaining existing information, yet struggle to make stable, executable judgments about the future in uncertain environments. I note that this gap between evaluation scores and real-world stability is a primary concern for risk managers.

Of course, it is important to emphasize that condensing intelligence into “prediction” is more like defining a core capability dimension for engineering alignment with computing power investment, rather than exhaustively covering all connotations of intelligence. This is a clear and highly explanatory hard-core perspective. Whether capabilities such as planning, causal modeling, and long-term consistency can be fully reduced to prediction problems remains an open issue that I believe requires further legal clarification.

But when we simplify intelligence into predictive capability, the next question naturally falls on: How does computing power convert into this capability?

The Debate Over Pre-training, SFT, and RL Is Essentially a “Computing Power Allocation” Issue

I read You Yang’s latest analysis with a compliance lens, noting how he strips away the hype of “methodological superiority.” If our goal is strictly how much intelligence can be exchanged for unit computing power, the paradigm stops being mysterious. It becomes merely a strategy for allocating compute resources.

Unlike mainstream narratives that treat pre-training, fine-tuning (SFT), and reinforcement learning as distinct hierarchies, You Yang places them on a unified level. He argues that essentially, all three involve calculating gradients and updating parameters.

Even $30 Billion Might Not 'Recreate GPT-4'? NUS's You Yang Reveals the Truth Behind AI Growth Bo… — figure 4

What stood out to me is his assertion that the primary source of current model intelligence remains the pre-training stage. This isn’t because it is “smarter,” but because it consumes the most energy and computation.

From the perspective of intelligent growth, while there are differences in the frequency of parameter updates and the scale of computing power consumed across these three stages, shifting perspectives turns the discussion from a debate on methodology into a simpler, yet more brutal question:

Given continuous investment in computing power, can we still stably exchange it for capability growth?

My sense is enterprises must audit their compute spend against actual capability gains, not just model size. What concerns me is that the “methodological superiority” narrative often obscures the true cost of parameter updates. I think governance frameworks should track energy consumption per unit of intelligence derived.

Transformer’s Victory Is Not Just an Algorithmic Win

To answer this question, I followed You Yang as he traces back to the reasons behind the rapid evolution of large models over the past decade. He points out that this wave of intelligent leapfrogging relied on three things happening simultaneously:

First, the GPU ecosystem has continuously provided exponentially growing parallel computing power at the hardware level;
Second, the Transformer architecture naturally supports large-scale parallelism in its computational structure, allowing it to fully “consume” this computing power;
Third, the Next-Token Prediction training objective provides models with nearly infinite and highly unified learning signals.

Therefore, I see Transformer’s success as not merely a victory at the algorithmic level, but stemming from a systemic result of high compatibility between model architecture and hardware systems.

Under the combined effect of these three factors, a relatively stable positive feedback loop has formed between computing power growth, increased model scale, and capability enhancement.

It should be noted that the effectiveness of this paradigm also benefits to some extent from the structural characteristics of language tasks themselves: language is highly symbolic and sequential, and evaluation systems are highly consistent with training objectives.

This has allowed a relatively stable positive feedback loop to form between computing power growth, model scale expansion, and capability enhancement during this stage.

It was under these historical conditions that intelligence levels rose continuously along the same paradigm, from GPT-1 and GPT-2 to GPT-3, and then to ChatGPT.

This naturally leads to the core question raised later in the text:

When computing power continues to grow, do we still possess a similarly scalable paradigm?

My sense is we should verify if current hardware investments are decoupling from actual intelligence gains. What concerns me is that the alignment between language structure and training objectives may not hold for other domains.

Even $30 Billion Might Not ‘Recreate GPT-4’? NUS’s You Yang Reveals the Truth Behind AI Growth Bottlenecks

The Real Bottleneck Is Not That Computing Power Has Stopped, But That It Can No Longer Be “Digested”

I read You Yang’s argument closely because it shifts the conversation from raw hardware acquisition to systemic absorption. He proposes a specific standard for judging intelligence bottlenecks:

When the FLOPS of a single training run increase from $10^n$ to $10^{n+3}$, can we still stably obtain significantly stronger models?

If that answer becomes uncertain, the issue is not whether computing power grows, but how well our current paradigms handle it. I followed his logic on what this uncertainty implies:

Whether the absorption efficiency of existing paradigms for new computing power has declined;
Whether the expansion of computational scale is offset by communication, synchronization, and system overhead.

This explains why FLOPS are repeatedly emphasized in his article. Token counts, parameter sizes, and inference speeds often mix efficiency with commercial factors; whereas FLOPS represent the most fundamental and hardest-to-packaging or beautify metric of computing power. In my view, enterprises must stop marketing these metrics as proxies for capability without verifying the underlying stability.

I think fLOPS are a vanity metric if they don’t correlate with stable model convergence. My sense is enterprises should audit their communication overhead before buying more GPUs.

In this sense, the so-called “bottleneck” is not the disappearance of dividends, but rather the mapping relationship between computing power growth and intelligence growth beginning to loosen. I find this distinction critical for governance teams assessing vendor claims about model upgrades.

More notably, You Yang deliberately extracts the discussion from “efficiency optimization” in his article and shifts it to a scenario closer to decision-making at top-tier tech companies:

Suppose Google hands you a check for “$30 billion” today with a six-month deadline—under such extreme training objectives, would you still prioritize architectures like Mamba that offer “higher throughput”?

Not necessarily. Because throughput solves the problem of “cheaper intelligence,” which does not automatically equate to “smarter at the same cost.” I see this as a warning against optimizing for speed over substance in high-stakes R&D.

The real difficulty becomes: Do we have an architecture or Loss function with greater scalability that can more stably “digest” new computing power and convert it into tangible capability increments? So, how can we consume more computing power per unit of time and truly transform it into intelligence?

The Future Is Undecided; Answers May Lie in Multiple Exploration Zones

Before formally answering the question of converting computing power into intelligence, You Yang also conducts an in-depth discussion on hardware and infrastructure levels. I noted his conclusion that the ratio of computation cost to communication cost must be maintained or increased so that stacking more GPUs can linearly yield more intelligence.

Therefore, the core goal of future AI infrastructure should focus on the overall scalability of parallel computing systems at both hardware and software levels, rather than just single-chip performance. This requires a holistic view of the stack that many procurement teams currently lack.

On this basis, You Yang finally proposes several exploration directions, such as higher precision, higher-order optimizers, more scalable architectures or Loss functions, and deeper hyperparameter exploration with more epochs. These exploration directions are all attempting to answer the same proposition—how to make models “eat” trillion-level investments while spitting out proportionally enhanced intelligence?

For further growth in intelligence, what truly matters is the ability to become stronger continuously under extreme computing power conditions—which also means that the space for intelligent growth that pre-training can carry may be far from exhausted. I believe this suggests that current evaluation frameworks are too narrow to capture true capability gains.

Returning to the initial question: Can computing power continue to convert into intelligence? You Yang does not give a definitive assertion, but the logic is clear:

As long as we can find more efficient ways to organize computation, the upper limit of intelligence has not yet arrived.

What concerns me is that scalability is now a harder constraint than raw compute availability. I think governance must verify if “smarter” claims match actual loss function improvements.

Original Source:

https://zhuanlan.zhihu.com/p/1989100535295538013