Loading a dishwasher in a home kitchen, or wiping a whiteboard while moving around an office: tasks that humans take for granted are “high-difficulty challenges” for humanoid robots, because they require coordinated motion of joints across the entire body.
Recently, a team from UC Berkeley published a paper titled “Coordinated Humanoid Manipulation with Choice Policies” on arXiv. By combining “modular teaching” with “intelligent action selection,” the team tackles the core problem of whole-body coordination in humanoid robots, a step toward integrating them into real-world human environments.

I. The “Two Major Dilemmas” Keeping Humanoid Robots Out of Daily Life
Humanoid robots have long been hailed as promising tools capable of assisting humans with daily tasks in unstructured environments such as homes and offices. However, two key challenges have prevented them from breaking out of the “laboratory boundary” and achieving practical application:
Challenge 1: Difficulty in Full-Body Coordination; High Cost and Scarcity of Teaching Data
Long-horizon tasks such as loading a dishwasher or wiping a whiteboard while moving require the robot to coordinate its head (to locate targets), hands (to grasp and manipulate), and legs (to move and keep balance) at the same time, much as humans let their eyes guide their hands while keeping their footing steady.
Traditional teleoperation, however, requires operators to control dozens of joints simultaneously. This is extremely demanding and quickly tires operators, and it also makes high-quality demonstration data hard to collect. Without reliable “teacher demonstrations,” robots struggle to learn complex coordinated movements.
Challenge 2: Incompatibility Between Action “Flexibility” and “Response Speed”
Humans often have multiple viable ways to perform the same action (e.g., holding a plate can be done by supporting it with five fingers or gripping the edge with the thumb). This “action diversity” is a key difficulty in enabling robots to imitate humans.
Traditional solutions fall into one of two traps. Some are “too rigid”: behavior cloning, for instance, learns a single fixed action sequence, so the robot fails when a scenario varies even slightly. Others are “too sluggish”: diffusion policies can generate multiple action options, but they compute each action through many repeated denoising steps, and the resulting latency is too high for real-time operation (for example, the robot may miss the right moment to align a plate for insertion).
II. Tackling Both Problems: Solving the Dilemmas with “Modular Teaching + Intelligent Action Selection”
To address these two challenges, the Berkeley team avoided the traditional path of complex all-in-one control and instead proposed a combination of “simplified modular teaching” and “multi-candidate intelligent action selection,” where the whole is greater than the sum of its parts (“1+1>2”):
1. Simplifying “Teaching”: Modular Teleoperation Allows Anyone to Become a “Robot Teacher” in 10 Minutes
The team divided full-body robot control into four “user-friendly” modules, allowing operators to easily control the robot using VR controllers without requiring specialized skills:

① Hand-Eye Coordination Module: The head follows hand movements, ensuring the “eyes” always focus on the operational area.
② Hand Grasping Module: Pulling the trigger enables a “power grasp,” while adjusting the joystick allows for fine-tuning of the thumb position to precisely control grip strength.
③ Arm Tracking Module: The orientation of the VR controller is directly mapped to the robot’s arm; wherever the controller moves, the arm follows.
④ Omni-directional Movement Module: By switching to joystick mode, operators can control the robot’s forward/backward movement, lateral shifting, or turning.
This design significantly lowers the barrier to entry: operators can get started within 10 minutes, tire less, and collect large amounts of high-quality demonstration data quickly. In effect, the robot gains an efficient “personal tutor” and no longer has to imitate blindly.
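To make the module split concrete, here is a minimal sketch of how one VR-controller reading might be routed to either the hand grasping module or the omni-directional movement module. It is not the paper's teleoperation code: the `ControllerState` fields, the deadband, the speed limits, and the sign conventions are all hypothetical choices for illustration, and the arm tracking and head-following modules are omitted.

```python
from dataclasses import dataclass

# Hypothetical controller reading; field names, ranges, and sign
# conventions are illustrative assumptions, not the paper's interface.
@dataclass
class ControllerState:
    trigger: float = 0.0           # 0.0 released .. 1.0 fully pulled
    stick_x: float = 0.0           # -1.0 .. 1.0, right is positive
    stick_y: float = 0.0           # -1.0 .. 1.0, forward is positive
    turn: float = 0.0              # -1.0 .. 1.0, positive turns left (assumed second stick)
    locomotion_mode: bool = False  # True: stick drives the legs; False: stick tunes the thumb

# Assumed limits, chosen only for illustration.
DEADBAND = 0.1
MAX_THUMB_OFFSET = 0.3   # rad
MAX_VX = 0.5             # m/s forward/backward
MAX_VY = 0.3             # m/s lateral
MAX_YAW_RATE = 0.8       # rad/s

def clamp(v: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, v))

def shape_axis(v: float) -> float:
    """Clamp a stick axis to [-1, 1], zero out small deflections, rescale the rest."""
    v = clamp(v, -1.0, 1.0)
    if abs(v) < DEADBAND:
        return 0.0
    sign = 1.0 if v > 0 else -1.0
    return sign * (abs(v) - DEADBAND) / (1.0 - DEADBAND)

def map_controller(state: ControllerState) -> dict:
    """Route one controller reading to hand or locomotion commands."""
    if state.locomotion_mode:
        # Omni-directional movement: robot frame x forward, y left, yaw counter-clockwise.
        return {
            "vx": shape_axis(state.stick_y) * MAX_VX,
            "vy": shape_axis(-state.stick_x) * MAX_VY,  # stick right -> move right (negative y)
            "yaw_rate": shape_axis(state.turn) * MAX_YAW_RATE,
        }
    # Hand grasping: trigger sets overall closure, stick x nudges the thumb.
    return {
        "grasp_closure": clamp(state.trigger, 0.0, 1.0),
        "thumb_offset": shape_axis(state.stick_x) * MAX_THUMB_OFFSET,
    }

print(map_controller(ControllerState(trigger=1.0)))
print(map_controller(ControllerState(stick_y=1.0, locomotion_mode=True)))
```

Putting each input behind a single mode flag keeps the operator's job small: at any moment one stick means one thing, which is the spirit of the modular design described above.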
2. Optimizing “Decision-Making”: Choice Policy Algorithm Enables Robots to “Instantly Select Optimal Actions”
To avoid the drawbacks of traditional solutions, the team designed a mechanism of “multi-candidate action generation + real-time scoring and selection.” The robot generates several feasible action plans at once (e.g., three different postures for holding a plate), uses a trained model to score each one, and immediately picks the highest-scoring option.
This resembles how humans quickly weigh several options before choosing the safest one. It preserves action diversity while keeping responses fast, resolving the core tension between “rigidity” and “sluggishness.”
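The selection step itself is simple once candidates and scores exist. The sketch below assumes a policy that returns K candidate actions together with one score per candidate from a single call, then takes the argmax. `toy_policy` is a hypothetical hand-written stand-in for the trained network, and encoding each “way to hold a plate” as a single grip offset is purely illustrative; how the real scorer is trained is not covered here.

```python
from typing import Callable, List, Sequence, Tuple

Action = List[float]                             # one candidate action, flattened
PolicyOutput = Tuple[List[Action], List[float]]  # (candidates, one score each)

def select_action(policy: Callable[[Sequence[float]], PolicyOutput],
                  observation: Sequence[float]) -> Action:
    """Run the policy once, then return the candidate with the highest score."""
    candidates, scores = policy(observation)
    if not candidates or len(candidates) != len(scores):
        raise ValueError("policy must return one score per candidate")
    best = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best]

def toy_policy(observation: Sequence[float]) -> PolicyOutput:
    """Hypothetical stand-in for a trained network.

    Proposes three fixed 'ways to hold a plate', each encoded as one grip
    offset, and scores each by closeness to the observed plate edge.
    """
    candidates = [[-1.0], [0.0], [0.25]]
    scores = [-abs(c[0] - observation[0]) for c in candidates]
    return candidates, scores

print(select_action(toy_policy, [0.2]))  # picks [0.25]
```

Because all candidates and scores come out of one call, selection adds only an argmax on top of a single forward pass, unlike the iterative sampling of a diffusion policy.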

III. Research Methodology: Bidirectional Synergy Between Algorithms and Hardware, with StarDroid’s Humanoid Robot as Key Support
The success of this research relies on tight integration between algorithmic innovation and hardware performance. StarDroid’s full-size bipedal humanoid robot, the StarDroid STAR1, provided the hardware support needed to implement the algorithms, allowing “modular teaching” and “multi-candidate decision-making” to work effectively:

1. Ultra-High Degrees of Freedom + Precise Control, Adapting to Hand and Arm Module Requirements
StarDroid STAR1 is equipped with two StarDroid XHAND1 hands, each featuring 12 fully active degrees of freedom without passive joints. This means the fingers can perform more refined and flexible movements, perfectly matching the requirements of the “hand grasping module.”
When an operator triggers a “power grasp” via the controller, the fingers adjust their grip with human-like precision, so plates do not break and erasers do not slip. Meanwhile, the bionic arm has a high-rigidity, seven-degree-of-freedom design that responds quickly to “arm tracking” commands, preventing operational errors caused by hardware lag and ensuring that module instructions are executed precisely.
2. Omni-directional Mobility and Stable Balance Support Mobile Manipulation Tasks
Tasks such as erasing a whiteboard while moving, which require simultaneous locomotion and manipulation, place extremely high demands on a robot’s legs. Each leg of the StarDroid STAR1 has six degrees of freedom and supports omni-directional movement (forward/backward, left/right, and turning), which suits the “omni-directional movement module” used in teleoperation.
More importantly, it is equipped with built-in attitude sensors and low-level PD controllers that adjust joint forces in real time. Similar to how humans naturally shift their center of gravity while walking, this allows the robot to maintain stability during movement—forming the core hardware foundation for achieving the deep integration of mobility and manipulation described in the paper.
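The article does not give the structure or gains of STAR1's low-level PD controllers, so the following is only a generic single-joint PD sketch: torque is proportional to position error plus a damping term on velocity error. The gains, inertia, and timestep are made up, and the joint is frictionless with no gravity. With kp = 50 and kd = 10 on a unit inertia, the damping ratio is 10 / (2·√50) ≈ 0.71, so the joint settles without large overshoot.

```python
def pd_torque(q_des: float, q: float, qd_des: float, qd: float,
              kp: float, kd: float) -> float:
    """Classic PD law: stiffness on position error plus damping on velocity error."""
    return kp * (q_des - q) + kd * (qd_des - qd)

def simulate_joint(q0: float, q_des: float, kp: float = 50.0, kd: float = 10.0,
                   inertia: float = 1.0, dt: float = 0.001, steps: int = 3000):
    """Drive one frictionless, gravity-free joint toward q_des with the PD law.

    Uses semi-implicit Euler integration; all parameters are illustrative.
    """
    q, qd = q0, 0.0
    for _ in range(steps):
        tau = pd_torque(q_des, q, 0.0, qd, kp, kd)
        qd += (tau / inertia) * dt   # update velocity first (semi-implicit)
        q += qd * dt                 # then position, using the new velocity
    return q, qd

q_final, qd_final = simulate_joint(q0=0.0, q_des=0.5)
print(f"final angle {q_final:.4f} rad, velocity {qd_final:.4f} rad/s")
```

A PD loop on its own only tracks joint targets; keeping balance also depends on how those targets are chosen from the attitude sensor readings, which this sketch does not model.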
3. Multi-Sensor Fusion Empowers Hand-Eye Coordination Modules
Hand-eye coordination is critical for the success of long-duration tasks, requiring precise visual feedback.
The StarDroid STAR1 carries RGB and depth cameras on its head, enabling it to quickly locate targets (such as dishwasher slots or whiteboard stains) and pass this visual information to the hand manipulation module. The result is “where the eyes look, the hands aim.”
Data from the paper indicate that without hand-eye coordination, dishwasher slots are easily occluded and the robot fails at insertion because it cannot see the slot. With STAR1’s high-definition visual sensors and the flexible rotation of its two-degree-of-freedom head, the slot stays visible throughout, significantly improving success rates.
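Keeping the hand in view with a two-degree-of-freedom head comes down to computing a yaw (pan) and pitch (tilt) angle that point the camera at a target. This is a minimal geometric sketch, not the paper's controller: it assumes the target is already expressed in a frame at the head's rotation center (x forward, y left, z up), ignores the camera's offset from that center, and uses made-up joint limits rather than STAR1's real ones.

```python
import math
from typing import Tuple

# Assumed joint limits for a 2-DoF (yaw, pitch) head; not STAR1's published values.
YAW_LIMIT = math.radians(90)
PITCH_LIMIT = math.radians(45)

def look_at(target: Tuple[float, float, float]) -> Tuple[float, float]:
    """Return (yaw, pitch) in radians that point the head camera at `target`.

    `target` is in a frame centred on the head's rotation axes:
    x forward, y left, z up. Positive yaw turns left, positive pitch tilts up.
    Both angles are clamped to the assumed joint limits.
    """
    x, y, z = target
    yaw = math.atan2(y, x)
    pitch = math.atan2(z, math.hypot(x, y))
    yaw = max(-YAW_LIMIT, min(YAW_LIMIT, yaw))
    pitch = max(-PITCH_LIMIT, min(PITCH_LIMIT, pitch))
    return yaw, pitch

# Example: a hand 0.4 m ahead, 0.1 m to the left, 0.3 m below the head.
yaw, pitch = look_at((0.4, 0.1, -0.3))
print(f"yaw {math.degrees(yaw):.1f} deg, pitch {math.degrees(pitch):.1f} deg")
```

In a hand-eye coordination module, `target` would be the tracked hand position, recomputed every control cycle so the head follows the hand.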
4. High Robustness Design Ensures Smooth Experiment Progression
The dishwasher experiments ran ten consecutive trials to verify stability. The StarDroid STAR1’s 55 actuated degrees of freedom (head: 2 + waist: 3 + arms: 7×2 + legs: 6×2 + hands: 12×2) provide ample motion redundancy. Combined with an interference-resistant hardware design, this minimized problems such as hardware failures and network timeouts and ensured the continuous collection of high-quality demonstration data, a crucial prerequisite for fairly comparing the three algorithms and highlighting the advantages of the Choice Policy.
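As a quick sanity check, the per-part counts quoted above do add up to the stated total of 55, with the two hands alone accounting for 24 of them:

```python
# Per-part actuated degrees of freedom as listed in the text.
dof_breakdown = {
    "head": 2,
    "waist": 3,
    "arms": 7 * 2,    # two 7-DoF arms
    "legs": 6 * 2,    # two 6-DoF legs
    "hands": 12 * 2,  # two 12-DoF XHAND1 hands
}

total = sum(dof_breakdown.values())
print(total)  # 55
```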

IV. Outperforming Traditional Solutions: Hand-Eye Coordination is Key
The team conducted extensive experiments in two real-world scenarios, with results intuitively demonstrating the advantages of the new approach. Hand-eye coordination and the Choice Policy algorithm emerged as decisive factors:
1. Core Task: Dishwasher Loading (10 Consecutive Trials)
This task tests “head-hand synergy,” requiring four steps: sliding plates → grasping → hand-to-hand transfer → insertion into slots. Failure in any step results in overall task failure.
Without Hand-Eye Coordination: All methods failed almost entirely at the “insertion” step, with success rates of only 10%-20%. The core reason was that the slots were occluded, so the robot could not see where to insert the plates.
With Hand-Eye Coordination: Choice Policy stood out—achieving a 100% grasping success rate, 90% hand-to-hand transfer success rate, and 70% insertion success rate. In contrast, traditional “Behavior Cloning” achieved only a 50% insertion success rate, while the “Diffusion Policy,” hindered by high latency, also reached just 50%.
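Because a failure at any step ends the trial, per-stage success rates in a sequential task like this can be reported cumulatively, and the last stage's rate then equals overall task success. The sketch below shows that bookkeeping under the assumption that each trial is recorded as the number of stages it completed in order; the trial outcomes are invented for illustration and are not the paper's data.

```python
from typing import List, Sequence

STAGES = ["slide", "grasp", "hand-to-hand transfer", "insert"]

def cumulative_success(stages_completed: Sequence[int], n_stages: int) -> List[float]:
    """Fraction of trials that got through each stage.

    stages_completed[i] is how many stages trial i finished, in order,
    before its first failure (n_stages means the whole task succeeded).
    Since a trial stops at its first failure, the rates never increase
    from one stage to the next.
    """
    n = len(stages_completed)
    if n == 0:
        raise ValueError("need at least one trial")
    return [sum(1 for c in stages_completed if c > k) / n for k in range(n_stages)]

# Made-up outcomes for 10 trials, for illustration only (not the paper's data).
trials = [4, 4, 4, 2, 3, 4, 1, 4, 4, 3]
for stage, rate in zip(STAGES, cumulative_success(trials, len(STAGES))):
    print(f"{stage:>22}: {rate:.0%}")
```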


2. Advanced Task: Whiteboard Erasing (5 Consecutive Trials)
This is a more complex “move-and-operate” synergy task, requiring the sequence: locate eraser with head → grasp → walk to whiteboard → erase. It demands high levels of whole-body coordination.
Traditional “Behavior Cloning”: Grasping, walking, and erasing success rates were all only 20%. Tasks were frequently interrupted due to balance loss during movement or inaccurate positioning.
Choice Policy: Grasping, walking, and erasing success rates all reached 40%. There is still clear room for improvement, but this doubles the performance of the traditional method and demonstrates the potential of “deep integration of mobility and manipulation.”

3. Three Key Findings
Hand-Eye Coordination is Core to Long-Duration Tasks: Without it, even precise individual hand or leg operations will lead to overall failure due to inaccurate targeting.
The “Scoring Mechanism” of Choice Policy is its Core Advantage: Ablation studies showed that if actions were selected randomly, averaged, or fixed, the maximum insertion success rate was only 30%. However, using a scoring system to select the optimal action achieved 70%, proving the necessity of intelligent selection.
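The ablation compares four ways of turning several candidates into one action. The article does not define the “averaged” and “fixed” baselines precisely, so the sketch assumes “averaged” means the mean of all candidates and “fixed” means always taking the first candidate slot. The toy scenario and its scores are hypothetical; it shows why averaging is risky when candidates represent genuinely different strategies.

```python
import random
from statistics import fmean
from typing import List

def select(strategy: str, candidates: List[float], scores: List[float],
           rng: random.Random) -> float:
    """Collapse several candidate actions into one, using one of four rules."""
    if strategy == "random":
        return rng.choice(candidates)          # ignore scores entirely
    if strategy == "average":
        return fmean(candidates)               # blend all candidates
    if strategy == "fixed":
        return candidates[0]                   # always use the first slot
    if strategy == "scored":
        best = max(range(len(scores)), key=scores.__getitem__)
        return candidates[best]                # trust the learned score
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical situation: the plate can be approached from the left (-1.0)
# or the right (+1.0); the scorer rates the right side higher, e.g. because
# the left is blocked. Both numbers are invented for illustration.
candidates = [-1.0, 1.0]
scores = [0.1, 0.9]
rng = random.Random(0)
for s in ["random", "average", "fixed", "scored"]:
    print(s, select(s, candidates, scores, rng))
```

Averaging lands at 0.0, a point between the two valid approaches that matches neither; the fixed and random rules can commit to the blocked side; only the scored rule consistently picks the approach the scorer prefers.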
Hardware Redundancy is Indispensable: The StarDroid STAR1’s 55 actuated degrees of freedom allow flexible movement adjustments to accommodate different candidate solutions, while its low latency preserves the advantage of “real-time action selection.”

V. Advancing Humanoid Robots from “Laboratories” to “Real Life”
Beyond the algorithmic advance, this research offers four practical benefits for the industrialization of humanoid robots, accelerating their integration into daily life:
1. Reducing “Teaching Costs,” Allowing Ordinary People to Teach Robots Tasks
Modular teleoperation lets non-professionals learn to teach robots within 10 minutes, reducing reliance on expensive specialist engineers and significantly lowering the cost of collecting high-quality demonstration data. Robots will therefore have more “learning material” to train on.
2. Solving Implementation Pain Points, Adapting to Real-World Unstructured Environments
Choice Policy resolves the tension between “stiff movements” and “slow reactions.” Supported by high-degree-of-freedom hardware such as the StarDroid STAR1, robots could operate robustly in complex environments such as homes (loading dishwashers, folding laundry), offices (erasing whiteboards, organizing documents), and warehouses (moving goods), freeing them from reliance on the “ideal scenarios” typical of laboratory settings.
3. Establishing a “Software-Hardware Synergy” Paradigm to Provide Replicable Templates for the Industry
The research shows that combining “modular teleoperation (data collection) + Choice Policy (algorithmic learning) + high-degree-of-freedom hardware (execution)” is feasible, offering a clear technical template for future humanoid robot development.
In particular, the hardware design of the StarDroid STAR1 shows that “many degrees of freedom + precise control + stable locomotion” are key to completing complex tasks, giving hardware manufacturers clear directions for optimization.
4. Enhancing Robustness to Handle Uncertainties in Real-World Environments
In scenarios outside the training distribution, such as unseen plate colors or shifted plate positions, Choice Policy maintains a higher success rate than traditional methods. This suggests that robots can adapt to changes in real-world environments, a core threshold for moving from “laboratory prototypes” to “practical products.”
In the future, with further optimization of this technical framework, the integration of humanoid robots into daily life may be realized sooner than expected: upon returning home from work, you might find that dishes have already been neatly loaded into the dishwasher; entering the office, you would see that residual writing on the whiteboard has already been erased by a robot.

Paper Title:
Coordinated Humanoid Manipulation with Choice Policies
Paper Link: