No new GPUs, no new AI accelerators.
Yet a domestic GPU manufacturer devoted its entire press conference to something remarkably tangible:
It launched the first fully localized, embodied intelligence simulation platform.
Let’s look at the results first.

A robot dog named Xiao Fei slowly walked onto the stage.
Upon reaching the center, Xiao Fei in the simulated world on the screen performed a sideways flip. Immediately after, the physical-world Xiao Fei executed the exact same movement.

It turned around and did it again; the two sets of movements looked copied and pasted.

Xiao Fei’s motion strategy was simple:
100% trained in a simulation environment, then seamlessly transferred to the real physical world.
So, who is this domestic GPU player behind it? And what is this embodied intelligence simulation platform called?
No more suspense.
It is MT Lambda, freshly released by Moore Threads.

The sequence performed by Xiao Fei can be understood as:
This marks the first time a motion control strategy trained on fully domestic hardware has been completely deployed onto domestic edge-side chips, achieving the first real-world verification of Sim-to-Real (simulation to reality).
With this, Moore Threads has become the only GPU company in China to connect the entire chain of “Large Model Training — Simulation — Edge Deployment.”
If the explosion of large models relied on being “fed” massive amounts of internet data, then the rise of embodied intelligence urgently requires an extremely realistic virtual world.
Now, domestic GPUs are starting to build their own worlds.
More Like a Production Line for Physical AI Training
If we break down MT Lambda, it looks more like a production line centered around robot training.
At the top layer are two platforms: MT Lambda-Lab and MT Lambda-Sim.
MT Lambda-Lab focuses on embodied strategy development and training, targeting tasks such as reinforcement learning, imitation learning, and VLA (Vision-Language-Action) models.
For developers, this layer addresses the question of “how to teach agents to act”—specifically, how to train action policies, iterate behaviors, and make models progressively more stable in complex tasks.
MT Lambda-Sim, on the other hand, focuses on high-fidelity physical simulation and rendering, handling scene construction, sensor simulation, data generation, and simulation verification.
It addresses a different question: Can what the robot sees, the objects it encounters, and the feedback after executing actions be as close to the real world as possible?
Together, these two form the main pipeline for embodied intelligence development: Data Synthesis — Strategy Training — Simulation Verification — Edge Deployment.

Why is this pipeline important? Because the real world is too expensive.
During the press conference, Zhang Jianzhong highlighted three major pain points in training a good agent:
- First, there is a lack of large amounts of high-quality data; collecting it manually or via teleoperation is very costly.
- Second, training on physical machines carries high risks and costs; robots or robot dogs cannot be allowed to fall and break repeatedly every day.
- Third, real-world scenarios are often uncontrollable and difficult to generalize; a model that works in the lab may fail when moved to a different environment.
These points highlight the most pressing contradiction in the embodied intelligence industry today: models evolve rapidly, but physical scenario accumulation is slow.
Large models can consume internet data, but robots consume real-world data. A cup slipping off a table, a cloth being grasped by a claw, or a car encountering an unexpected obstacle on a rainy night—these tasks are difficult to fully describe with simple text. They involve lighting, materials, friction, collisions, motion trajectories, and sensor feedback. To truly teach robots how to act, these complex scenarios must be produced at low cost, on a large scale, and in a reproducible manner.
The underlying capabilities of MT Lambda revolve around three engines: physics, rendering, and AI.
First, the physics engine.
MT Lambda integrates open-source backends such as MuJoCo-Warp-MUSA and Newton-MUSA, along with Moore Threads’ self-developed AlphaCore physics engine.
Built on the MUSA architecture, these backends solve in parallel and support high-precision, differentiable physics calculations. Under typical simulation loads, overall simulation throughput can improve by approximately 30 times.
What does this mean?
For robots, the value of a physics engine goes far beyond making things move on screen. When a robotic arm grasps a flexible object, there is force feedback upon fingertip contact; when a quadruped robot lands, different ground materials alter stress and posture; in autonomous driving simulations, the motion relationships between vehicles, pedestrians, and obstacles must adhere to real physical laws. If the simulation is inaccurate, strategies trained within it are likely to fail in reality.
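To make the last point concrete, here is a minimal sketch of how a small physics error turns into a real-world miss. It is not MT Lambda code: the sliding-block task, the function names, and the friction values are all assumptions chosen for illustration. A policy picks the launch speed that stops a block at a target distance under the simulated friction coefficient, and the same speed is then replayed under a different "real" coefficient.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def launch_speed(target_m: float, mu: float) -> float:
    """Speed at which a sliding block stops after target_m meters under friction mu."""
    return math.sqrt(2.0 * mu * G * target_m)


def stopping_distance(v0: float, mu: float) -> float:
    """Distance a block launched at v0 slides before kinetic friction stops it."""
    return v0 ** 2 / (2.0 * mu * G)


def sim_to_real_error(target_m: float, mu_sim: float, mu_real: float) -> float:
    """Overshoot (positive) or undershoot (negative) when a speed tuned in sim runs in reality."""
    v0 = launch_speed(target_m, mu_sim)  # policy derived from the simulator's friction
    return stopping_distance(v0, mu_real) - target_m


# Hypothetical values: the simulator assumes mu = 0.5, the real floor has mu = 0.4.
print(round(sim_to_real_error(1.0, 0.5, 0.4), 3))  # 0.25, i.e. a 25 cm overshoot on a 1 m task
```

In this toy case the real stopping distance scales with the ratio mu_sim / mu_real, so a 20% friction error becomes a 25% position error. Contact-rich tasks compound such errors over many steps.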
Next, the rendering engine.
MT Lambda features the MT Photon engine, combining ray tracing with hybrid rendering capabilities. It also introduces 3DGS (3D Gaussian Splatting) and self-developed AI generative rendering capabilities, used to enhance the realism, detail, and rendering efficiency of simulated environments.
This aspect is particularly critical. Embodied AI must not only calculate actions but also perceive the world. Multimodal inputs from cameras, depth sensors, LiDAR, and tactile sensors all influence how a robot interprets its environment. The more realistic the rendering, the closer synthetic data becomes to real-world data, thereby narrowing the “Sim-to-Real” gap.
During his presentation on the collaboration with Guanglun Intelligence, Zhang Jianzhong mentioned that the MTT S5000 features RT Core ray tracing capabilities, offering nearly a threefold increase in graphics rendering performance. In relevant tests, using the MTT S5000’s hardware-accelerated RT Core for ray tracing resulted in a 2.7x performance boost.

Finally, the AI Engine.
MT Lambda integrates the Torch-MUSA framework, which is deeply adapted for PyTorch, along with acceleration libraries such as muSolver and muFFT. It supports the development and deployment of Vision-Language-Action (VLA) models while incorporating reinforcement learning and imitation learning training paradigms.
In the context of embodied AI, the AI engine corresponds to the training of the robot’s “brain”: it connects vision, language, and action, transforming environmental feedback into subsequent decisions.
Why Can Moore Threads Fit “Compute, Simulation, and Rendering” into a Single Lambda?
This is precisely where the value of a fully functional GPU is amplified. After all, fully functional GPUs are scarce in China.
The demands that embodied AI places on chips extend far beyond AI matrix calculations.
Robot training requires running VLA models, reinforcement learning, and imitation learning—this constitutes AI intelligent computing. It must simulate collisions, friction, dynamics, and complex contacts—this involves scientific computing and physical AI. It needs to generate sufficiently realistic training visuals and sensor data—this is 3D rendering. In the future, it will also involve extensive collection, transmission, generation, and playback of video data, which relies heavily on ultra-high-definition video codec capabilities.
TPUs, NPUs, or certain GPGPU approaches often focus on specific types of AI or general-purpose computing tasks. While they can achieve high efficiency in particular scenarios, embodied AI presents a more complex mix of challenges: it requires training digital brains, constructing physical worlds, and integrating real-world visuals and sensor feedback into the training loop simultaneously.
The underlying reason Moore Threads was able to create MT Lambda as an integrated platform combining physical simulation, rendering, and AI engines lies in its consistent adherence to the fully functional GPU roadmap since its inception.
According to Moore Threads’ definition, fully functional GPUs rely on their proprietary MUSA architecture to support AI computing, graphics rendering, physical simulation, scientific computing, and ultra-high-definition video codec within a single chip.
In other words, MT Lambda is not merely a patchwork of disjointed tools but a platform capability grown from the foundation of a fully functional GPU and the unified MUSA architecture.
For embodied AI, this integration of “compute, simulation, and rendering” aligns perfectly with the actual needs of robot training: running AI models while calculating physical collisions and rendering realistic visuals simultaneously.
In the past, developers might have needed to switch between different hardware and software stacks: one platform for AI training, another for graphics rendering, and a third tool for physical simulation. Data had to be moved across systems, leading to low efficiency, difficult debugging, and accumulated errors.
MT Lambda aims to consolidate these previously disjointed processes onto a single underlying infrastructure. For developers, the ideal state is to spend less time battling low-level adaptations and more time focusing on algorithms, tasks, and scenarios themselves.
Cloud, Edge, and Ecosystem: Closing the Loop
If MT Lambda addresses how to train and simulate, then Moore Threads’ other thread of development focuses on completing the loop across cloud, edge, and ecosystem components.
The Cloud side is powered by the Kuae (KUAE) Intelligent Computing Cluster.
In the era of large models, clusters were primarily understood as training foundations; however, in the embodied AI era, they function like massive robot training grounds. As simulation data scales up, demand expands rapidly:
A single robotic arm trajectory may require generating visuals from multiple camera angles, under various lighting conditions, with different materials and disturbances. Autonomous driving world models may generate vast amounts of test mileage weekly. Humanoid robot training also requires extensive parallel environments for repeated trial and error…
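The multiplication described here can be sketched in a few lines. This is an illustration only: the variation axes, their values, and the trajectory counts are hypothetical and do not come from MT Lambda or any partner dataset. The point is how quickly a modest set of trajectories multiplies once every trajectory is re-rendered under each combination of conditions.

```python
import itertools

# Hypothetical variation axes; the values are illustrative, not from MT Lambda.
CAMERA_VIEWS = ["front", "left", "right", "wrist"]
LIGHTING = ["daylight", "dim", "backlit"]
MATERIALS = ["wood", "metal", "plastic", "fabric", "glass"]
DISTURBANCES = ["none", "table_bump"]


def render_variants():
    """Enumerate every combination of the variation axes for one trajectory."""
    axes = (CAMERA_VIEWS, LIGHTING, MATERIALS, DISTURBANCES)
    return [
        {"camera": c, "lighting": l, "material": m, "disturbance": d}
        for c, l, m, d in itertools.product(*axes)
    ]


def total_frames(num_trajectories: int, frames_per_trajectory: int) -> int:
    """Frames to render when each trajectory is replayed under every variant."""
    return num_trajectories * frames_per_trajectory * len(render_variants())


print(len(render_variants()))    # 4 * 3 * 5 * 2 = 120 variants per trajectory
print(total_frames(1_000, 200))  # 24000000 frames from only 1,000 trajectories
```

Under these assumed numbers, 1,000 trajectories of 200 frames each already require 24 million rendered frames, which is why rendering throughput becomes a cluster-scale concern.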
When data reaches scales of millions or tens of millions of frames, the role of underlying computing power shifts from an accelerator to a production line.
The core acceleration units of Moore Threads’ Kuae Intelligent Computing Cluster include the MTT S5000. Based on the fourth-generation MUSA architecture “Pinghu,” the card delivers up to 1,000 TFLOPS of dense AI compute. It is equipped with 80GB of VRAM and a memory bandwidth of 1.6TB/s, and it supports the full range of precisions from FP8 to FP64. It is also one of the few domestic GPUs that simultaneously support hardware-level ray tracing and AI training/inference.

In the context of embodied AI, these specifications take on clearer meanings: capabilities such as FP8, BF16, and FP16 serve AI training and inference; ray tracing serves high-fidelity rendering; and physical simulation and scientific computing capabilities handle complex dynamic solutions. In other words, embodied AI requires multiple capabilities to collaborate within the same architecture.
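As a rough back-of-the-envelope reading of the MTT S5000 figures quoted above, the sketch below applies a simple roofline model. It assumes decimal units (1 TB = 10^12 bytes), treats the quoted 1,000 TFLOPS as a peak regardless of precision, and ignores caches and real-world utilization, so the outputs are upper bounds and not measured performance.

```python
PEAK_FLOPS = 1000e12      # quoted peak dense AI compute, in FLOP/s
MEM_BANDWIDTH = 1.6e12    # quoted memory bandwidth, in bytes/s
VRAM_BYTES = 80e9         # quoted memory capacity, in bytes

# Arithmetic intensity (FLOPs per byte moved) above which a kernel is compute-bound.
RIDGE_POINT = PEAK_FLOPS / MEM_BANDWIDTH

# Lower bound on the time needed to read the entire memory once.
FULL_SWEEP_SECONDS = VRAM_BYTES / MEM_BANDWIDTH


def attainable_flops(flops_per_byte: float) -> float:
    """Roofline upper bound on throughput for a kernel with the given arithmetic intensity."""
    return min(PEAK_FLOPS, flops_per_byte * MEM_BANDWIDTH)


print(RIDGE_POINT)                    # 625.0 FLOPs per byte
print(round(FULL_SWEEP_SECONDS, 3))   # 0.05 seconds
print(attainable_flops(100.0) / 1e12) # 160.0 TFLOPS for a kernel at 100 FLOPs per byte
```

Under these assumptions, kernels doing fewer than about 625 FLOPs per byte are limited by memory bandwidth and not by compute. Many physics and rendering kernels sit in that regime, which is one reason the bandwidth figure matters alongside the headline TFLOPS.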
The Edge side consists of the Changjiang SoC and E300 AI Module.
While the cloud handles large-scale training and the simulation platform manages trial-and-error and validation, strategies must ultimately run on the robot itself. When acting in the real world, robots cannot always rely entirely on cloud responses. They need to perform perception, decision-making, and control locally, especially for tasks requiring low latency and high reliability. Edge computing power is a necessary component to complete this loop.
Built on the Changjiang SoC, the MTT E300 AI module provides 50 TOPS of local computing power and can be deployed directly on robot terminals to support low-latency, high-reliability real-time responses. In other words, experience trained in the cloud must be transformed into immediate reactions by edge-side modules installed on the robots themselves.
This creates a more complete closed loop: the cloud handles large-scale training and parallel simulation; MT Lambda manages strategy development, data synthesis, and simulation verification; and the E300 AI module is responsible for executing trained results at the robot terminal.
More importantly, Moore Threads’ layout has begun entering real-world ecosystem validation.
For instance, in its collaboration with BAAI (Beijing Academy of Artificial Intelligence), RoboBrain 2.5 completed end-to-end training on an MTT S5000 cluster comprising thousands of cards. Validation results showed that the training loss curve closely matched that of H100 clusters, with a difference of only 0.62%, and it performed better in certain tasks. The cluster achieved over 90% linear scaling efficiency when expanded from 64 to 1,024 cards.
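For readers unfamiliar with the metric, linear scaling efficiency is commonly computed as measured throughput divided by the throughput that perfect linear scaling would give. The sketch below uses that common definition; the article does not state exactly how the figure was measured, and the throughput numbers in the example are hypothetical.

```python
def scaling_efficiency(
    base_cards: int,
    base_throughput: float,
    scaled_cards: int,
    scaled_throughput: float,
) -> float:
    """Measured throughput at scale divided by the ideal, perfectly linear throughput."""
    ideal_throughput = base_throughput * (scaled_cards / base_cards)
    return scaled_throughput / ideal_throughput


# Hypothetical numbers: 64 cards process 100 samples/s, 1,024 cards process 1,450 samples/s.
# Perfect scaling would give 100 * 16 = 1,600 samples/s.
print(scaling_efficiency(64, 100.0, 1024, 1450.0))  # 0.90625, i.e. about 90.6%
```

By this definition, an efficiency above 90% at 16 times the card count means less than a tenth of the added hardware is lost to communication and synchronization overhead.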
The significance of these results lies in validating the feasibility of domestic computing power clusters as the foundation for embodied AI model training.
Another example is the collaboration with Guanglun Intelligence, which focuses more on mass production of simulation data. Relying on Moore Threads’ fully functional GPUs and the Kuae intelligent computing cluster, combined with Guanglun Intelligence’s “solve-measure-generate” integrated simulation platform, the two parties jointly developed a high-confidence simulation data synthesis solution. Guanglun Intelligence’s high-precision GPU-based physics solver has been adapted to the MUSA architecture, supporting high-precision real-time simulation of complex physical processes such as rigid bodies, soft bodies, fluids, and granular materials. In relevant cases, the accuracy of core physical parameter simulations reached over 99%.
The collaboration with Pony.ai extends scenarios into autonomous driving. Based on the MTT S5000 and Kuae intelligent computing cluster, the two parties are advancing the adaptation and verification of world model and vehicle-side model training. Pony.ai’s world model generates over 10 billion kilometers of test data weekly, deriving numerous extreme scenarios. For autonomous driving, long-tail scenarios, extreme hazards, and safety validation are precisely where simulation adds the most value.
Additionally, Moore Threads is working with partners such as Wuyi Vision and RayData Cloud to advance physical AI simulation systems and embodied simulation platforms. Whether it involves 4D Gaussian Splatting (4DGS) model training and inference, synthetic data generation, or the closed loop of task libraries, simulation computing, and virtual verification, these efforts essentially answer one question: Embodied intelligence cannot be developed in isolation by a single company; it requires computing power providers, simulation experts, algorithm developers, and scenario owners to collaborate and build an ecosystem.
This is another key aspect of Moore Threads’ recent launch worth noting.
It has advanced the narrative from “I have a chip” to “I can build an infrastructure system.”
By building platforms upward from the underlying MUSA architecture and fully functional GPUs, connecting downward to edge devices, and expanding the ecosystem horizontally, this approach may not change the industry landscape overnight. However, it has pushed the battlefield for domestic GPUs beyond large model training and inference into physical AI infrastructure.
The Goal Is Domestic Embodied Intelligence Infrastructure
A major contradiction in embodied intelligence today is that models are advancing rapidly, but scenarios are lagging behind.
In the digital world, large models can continuously evolve through massive amounts of text, image, and video data. However, in the physical world, for robots to learn how to open doors, move boxes, grasp flexible objects, or navigate complex intersections, every action carries real-world costs.
Real-machine data collection is expensive; teleoperation is slow; there is a high risk of equipment damage; dangerous scenarios cannot be tested casually; and long-tail situations are difficult to exhaustively cover. Consequently, simulation-generated data and the Sim-to-Real closed loop have become critical infrastructure for moving embodied intelligence from laboratories to industry applications.
This is why “building worlds” has become the core proposition in the competition for embodied intelligence.
The core value of these virtual worlds does not lie in visual appeal for gaming purposes, but in their ability to train robots, verify robot behaviors, and correct errors. These environments must be realistic enough to reflect lighting, materials, collisions, friction, and sensor noise; efficient enough to generate data at scale through parallel processing; and open enough to allow different models, robots, and scenarios to integrate seamlessly.
From this perspective, Moore Threads’ advantage cannot be summarized by a single technical metric. Its “fully functional GPU + MUSA ecosystem” technology roadmap is naturally better suited to the composite needs of embodied intelligence.
Fully functional GPUs provide capabilities across AI computing, graphics rendering, physics simulation, scientific computing, and video encoding/decoding; MUSA offers a unified software ecosystem; MT Lambda integrates physics engines, rendering engines, and AI engines; the Kuae intelligent computing cluster handles large-scale training and simulation; the Changjiang SoC and E300 AI modules bring these capabilities to the edge; and external ecosystem partners supplement data, scenarios, simulation platforms, and industry applications.
The value of this chain lies in the fact that embodied intelligence is essentially a systematic engineering project.
Large model companies may initially compete on building digital brains, but robot companies ultimately face the challenge of how the brain controls the body, how the body understands the environment, and how the environment can be reproduced at low cost. Whoever can build a training world for robots that is sufficiently realistic, controllable, and scalable at lower costs and higher efficiency will have a better chance of moving embodied intelligence from demos to real production lines, roads, homes, and urban spaces.
Of course, building domestic embodied intelligence infrastructure will not happen overnight.
Continuous verification is needed regarding simulation realism, Sim-to-Real transfer effects, developer ecosystem maturity, and large-scale adoption by industrial clients. How far Moore Threads’ solution can go depends on feedback from subsequent real-world projects, more developers, and a wider range of robot platforms.
But at least judging from this launch event, domestic GPUs are entering a new phase.
They are beginning to move beyond the passive narrative of whether they can replace specific chips, actively defining new computing scenarios: the upgraded “Xiaomai” (Little Wheat) unveiled at the event is a digital agent; the somersaulting robot dog “Xiao Fei” (Little Fly) is a physical agent. As AI moves from screens to reality, and as agents evolve from merely speaking to acting, underlying computing power must simultaneously understand models, graphics, and physics.
At the event, Zhang Jianzhong expressed the aspiration that Moore Threads’ products, ranging from the Kuae supercomputing platform to the Changjiang SoC, will empower all intelligent agents.
In the context of embodied AI, this statement can be articulated more concretely: the cloud hosts massive training grounds, simulation environments create virtual worlds, edge devices execute tasks via “cerebellum”-like processing, and ecosystems provide real-world scenarios.
The competition in large models hinges on who can train a more powerful digital brain. In embodied AI, the contest extends to another critical dimension: who can first build a sufficiently realistic training world.
This time, domestic GPUs have begun entering the arena to construct these worlds.