Not Simulation, Not VLA, Not Teleoperation: ItStone ZhiHang Unveils AWE 3.0, a General-Purpose Embodied Large Model “Capable of Practical Work”

Author Info

Amara Okonkwo

Robotics & Embodied AI Editor

M.Eng. Robotics (Imperial College London); former field applications engineer

Amara covers humanoids, industrial automation, and simulation-to-real transfer. She interviews practitioners about safety cases, unit economics, and dataset quality rather than demo videos alone. Her reviews call out what is lab-only versus commercially deployed.

#Embodied AI #Industrial Robotics #Simulation #Safety & Deployment

The countdown has begun for robots to perform production line operations in real-world, complex environments.

“AWE 3.0 empowers ItStone’s A1 robot to claim the first Guinness World Record for embodied intelligence in industrial precision manipulation!” At the launch event for “ItStone ZhiHang Embodied General Large Model AWE 3.0 and Data Solution SenseHub,” Dr. Ding Wenchao, Chief Scientist at ItStone ZhiHang, delivered the keynote and unveiled AWE 3.0, which the company bills as the world’s first general-purpose embodied large model capable of practical work. ItStone also describes it as the industry’s first large model to undergo a Turing test for flexible manipulation, and says it gives robots usable industrial skills for handling tasks in complex physical environments.

AWE 3.0: Comprehensive Upgrade of Five Core Capabilities, Building a “Valuable” Large Model

According to ItStone, AWE 3.0, released today, delivers breakthroughs in cross-scenario transfer and generalization, along with improvements in fluent millimeter-level precision manipulation, perception and control of flexible objects, and stable execution of long-horizon tasks. The company sums up its core capabilities as “stepping out of the lab, landing in real-world applications, and achieving universal generalization.”

AWE 3.0 combines two mature, enhanced capabilities with three new technical breakthroughs. It retains core strengths such as whole-body end-to-end learning and dynamic spatio-temporal reasoning, and adds ItStone’s newly self-developed Omni-Sense Decision (OSD) to remove viewpoint dependency. ItStone says OSD raises task success rates at unseen viewpoints by up to three times, keeping operation stable in complex, changing real-world environments. Drawing on the WIYH dataset, which holds over one million hours of data, and on rich tactile data, AWE 3.0 uses High-Density Tactile Sensing (HTS) to sharpen robotic tactile perception. This enables millimeter-level fine-grained responses and significantly improves generalization. Finally, Latent Action Smoothing (LAS) reduces jitter by over 45% and largely eliminates stuttering, so task execution stays smooth. Together, these let robots handle professional scenarios such as precision assembly and flexible manufacturing.

In short, the emergence of AWE 3.0 has laid a solid technological foundation for the large-scale implementation of embodied intelligence across various industries.

Omni-Sense Decision (OSD): Adapting Calmly to Unseen Scenarios

Traditional robots rely solely on known, fixed viewpoints tied to their own bodies. Once the environment changes, they struggle to execute tasks stably, a long-standing industry bottleneck that ItStone says AWE 3.0 has overcome. With Omni-Sense Decision (OSD), AWE 3.0 makes decisions autonomously based on the state of the world rather than on a particular viewpoint. Even when facing viewpoints it did not encounter during training, it can reason its way to a stable operating strategy. According to the company’s experimental data, OSD improves task performance at unseen viewpoints by up to three times, significantly strengthening generalization in real-world environments. ItStone presents this as the basis for steady operation on industrial production lines and reliable execution of complex tasks.
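The idea of deciding from world state rather than from a particular viewpoint can be illustrated with a small geometric sketch. This is not ItStone’s OSD, whose internals the announcement does not describe. It only shows, under the assumption that each camera’s pose is known, why a world-frame representation is the same no matter which camera observed the object. The function names and the 2D setup are hypothetical, chosen for illustration.

```python
import math

def world_to_camera(point, cam_pose):
    """Express a 2D world-frame point in a camera's frame.

    cam_pose is (x, y, yaw): the camera's position and heading in the world.
    (Hypothetical helper; simulates what a camera at that pose would observe.)
    """
    cx, cy, yaw = cam_pose
    dx, dy = point[0] - cx, point[1] - cy
    c, s = math.cos(yaw), math.sin(yaw)
    # Apply the inverse rotation R^T to the translated point.
    return (c * dx + s * dy, -s * dx + c * dy)

def camera_to_world(obs, cam_pose):
    """Map a camera-frame observation back into the shared world frame."""
    cx, cy, yaw = cam_pose
    c, s = math.cos(yaw), math.sin(yaw)
    # Apply the rotation R, then add the camera position.
    return (cx + c * obs[0] - s * obs[1], cy + s * obs[0] + c * obs[1])

# One object, seen from two very different viewpoints.
target = (1.0, 2.0)
cam_a = (0.0, 0.0, 0.0)          # viewpoint assumed seen in training
cam_b = (3.0, -1.0, math.pi / 2)  # viewpoint assumed unseen in training

obs_a = world_to_camera(target, cam_a)
obs_b = world_to_camera(target, cam_b)

# The raw observations differ, but the recovered world states agree.
state_a = camera_to_world(obs_a, cam_a)
state_b = camera_to_world(obs_b, cam_b)
```

A policy fed `obs_a` and `obs_b` directly would see two different inputs for the same scene, while a policy fed `state_a` and `state_b` would see one. A learned system would have to estimate such a state from images instead of being handed camera poses, which is the hard part this sketch leaves out.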

High-Density Tactile Sensing (HTS): Micro-Level Perception, Millimeter Precision

In scenarios that demand high-precision contact and flexible manipulation, tactile perception and feedback are crucial. Drawing on the WIYH dataset, which has accumulated over one million hours of data, and on rich collected tactile data, ItStone’s High-Density Tactile Sensing (HTS) technology lets AWE 3.0 learn the common tasks and contact features found in daily life and industry, such as grasping, assembly, adjustment, wiping, cleaning, and placement. This significantly improves the robot’s local perception of physical contact and its response to it. As a result, robots can generate millimeter-level fine-grained actions and make sensitive, stable closed-loop adjustments as contact conditions change. ItStone says this allows robots to handle contact-intensive and flexible manipulation tasks such as precision assembly and wire harness insertion, bringing fine manipulation to an industrial standard that is deployable, generalizable, and scalable.

Latent Action Smoothing (LAS): Smooth as Silk, Fine-Tuned Precision

To address the jitter and stuttering common in robotic action execution, AWE 3.0 uses Latent Action Smoothing (LAS). By reusing and optimizing latent action variables in the latent space, LAS produces continuous, smooth transitions both between frames and within frames. ItStone reports that this reduces trajectory jitter by over 45%, markedly improving operational stability and fluidity. Combined with a world-aligned latent action space, it lets robots perform millimeter-level precision operations and recover automatically from near-failure states, which in turn supports more complex long-horizon tasks.
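To make the notion of “reducing trajectory jitter” concrete, here is a minimal sketch that smooths a one-dimensional action sequence and measures jitter before and after. It is not ItStone’s LAS: the announcement does not say how LAS works internally or how the 45% figure was measured. The sketch swaps in a plain exponential moving average, and the jitter metric (mean absolute second difference) and the synthetic data are assumptions made for illustration.

```python
def ema_smooth(seq, alpha=0.3):
    """Exponential moving average: each output blends the new value
    with the previous smoothed value. Smaller alpha smooths more."""
    smoothed = [seq[0]]
    for x in seq[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

def jitter(seq):
    """Mean absolute second difference, a simple proxy for jitter.
    It is zero for any constant-velocity (straight-line) trajectory."""
    diffs = [abs(seq[i + 1] - 2 * seq[i] + seq[i - 1])
             for i in range(1, len(seq) - 1)]
    return sum(diffs) / len(diffs)

# Synthetic stand-in for one action dimension: a steady ramp with an
# alternating +/-0.05 disturbance added at every step.
raw = [0.1 * t + (0.05 if t % 2 else -0.05) for t in range(50)]
smooth = ema_smooth(raw, alpha=0.3)

raw_jitter = jitter(raw)
smooth_jitter = jitter(smooth)
reduction = 1 - smooth_jitter / raw_jitter  # fraction of jitter removed
```

The trade-off in this kind of filter is lag: a smaller `alpha` removes more jitter but makes the output trail the commanded motion, which is one reason a learned, latent-space approach might be preferred over a fixed filter.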

In understanding and predicting spatial and physical laws, that is, knowing not just “what” but “why,” AWE 3.0 inherits the real-time perception and precise prediction of Dynamic Spatio-temporal Reasoning (DSR). ItStone states that its spatial description accuracy exceeds industry benchmarks by 21% and that its inference is 2.21 times faster, enabling robots to understand environments in real time and predict future states through “if… then…” hypothetical reasoning. Whole-Body End-to-End Learning (E2E-WBC) connects perception to coordinated action across the entire body, so robots not only know where objects are and what shape they have but also generate more stable, smoother whole-body actions learned from human data. ItStone describes this as a leap in capability from “seeing” to “doing.”

“What kind of data allows robots to truly ‘understand’ the physical world? We believe that Human-Centric data is the only solution, rather than relying on teleoperation or simulation data to create a VLA model that is top-heavy and unusable in real-world complex environments,” Dr. Ding Wenchao stated at the launch event. He introduced what ItStone describes as its pioneering Human-Centric data collection paradigm, along with the SenseHub data acquisition suite that supplies the high-quality data behind it. SenseHub’s hardware and software are designed together, tightly integrating perception, computation, and transmission into a complete system-level solution for collecting natural, authentic human behavior data. That data in turn feeds the AI running on ItStone’s proprietary hardware, closing the loop between data collection and deployment. ItStone’s self-developed high-precision whole-body motion capture algorithm fuses multi-modal perception data. In complex scenarios, the company says, SenseHub delivers millimeter-level, low-latency real-time tracking and 3D reconstruction of whole-body joints and postures, and transmits all motion capture data stably at high bandwidth and low latency while the wearer moves, without restricting their range of motion.

ItStone also says SenseHub has solved the industry challenge of synchronizing multi-source heterogeneous data, achieving microsecond-level time synchronization across whole-body sensors for the first time. This gives later data fusion and analysis a strict, unified time base. Building on this, ItStone’s one-stop ground-truth data service offers end-to-end tools and services, from data collection and automated labeling to quality assessment and management analytics, significantly speeding up the production of high-quality training datasets. On this basis, ItStone claims to be the world’s only provider of industrial-grade Human-Centric data acquisition solutions.
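A unified time base matters because fusion usually begins by pairing samples from different sensors that were taken at nearly the same instant. The sketch below shows one common way to do that pairing, nearest-timestamp matching within a tolerance. It assumes every sensor already stamps its samples in microseconds on a shared clock. How SenseHub achieves that shared clock is not described in the announcement, and the function name and sample values here are hypothetical.

```python
from bisect import bisect_left

def align_nearest(ref_ts, other_ts, tol_us):
    """For each reference timestamp, return the index of the nearest
    timestamp in other_ts, or None if none lies within tol_us.

    Both lists hold integer microsecond timestamps; other_ts must be sorted.
    """
    matches = []
    for t in ref_ts:
        i = bisect_left(other_ts, t)
        # The nearest sample is either just before or just at/after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(other_ts)]
        best = min(candidates, key=lambda j: abs(other_ts[j] - t), default=None)
        if best is not None and abs(other_ts[best] - t) <= tol_us:
            matches.append(best)
        else:
            matches.append(None)  # no sample close enough; leave unpaired
    return matches

# Hypothetical streams: camera frames as reference, tactile samples to pair.
camera_ts = [1000, 2000, 3000]
tactile_ts = [990, 2004, 3500]

pairs = align_nearest(camera_ts, tactile_ts, tol_us=50)
# pairs == [0, 1, None]: the third frame has no tactile sample within 50 us.
```

The tighter the clocks are synchronized, the smaller `tol_us` can be set without dropping valid pairs, which is why synchronization precision directly limits how finely fast contact events can be fused with vision.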

From technical breakthroughs to industrial use, ItStone positions AWE 3.0 as an across-the-board capability upgrade that will advance the deployment of embodied intelligence in real-world settings such as manufacturing and services, and as the start of large-scale commercialization for embodied intelligent “brains.” “The Guinness World Record is just the starting point; more importantly, it is about helping robots break through more boundaries of capability.”

Source: ItStone ZhiHang

This article is reprinted with authorization from this website. The views expressed are solely those of the original author.