Mechanical Intelligence-Aware Curriculum Reinforcement Learning for Humanoids with Parallel Actuation
Yusuke Tanaka*, Alvin Zhu*, Quanyou Wang, Yeting Liu, and Dennis Hong
* denotes co-first authorship
Accepted to Humanoids 2025

Overview of BRUCE's five-bar and four-bar linkages, and the RL training results with accurate simulation modeling.
Overview
Reinforcement learning (RL) has enabled advances in humanoid robot locomotion, yet most learning frameworks do not account for the mechanical intelligence embedded in parallel actuation mechanisms, due to limitations in simulators for closed kinematic chains. This omission can lead to inaccurate motion modeling and suboptimal policies, particularly for robots with high actuation complexity. This paper presents general formulations and simulation methods for three types of parallel mechanisms: a differential pulley, a five-bar linkage, and a four-bar linkage, and trains a parallel-mechanism-aware policy through an end-to-end curriculum RL framework for BRUCE, a custom kid-sized humanoid robot. Unlike prior approaches that rely on simplified serial approximations, we simulate all closed-chain constraints natively using GPU-accelerated MuJoCo (MJX), preserving the hardware's nonlinear mechanical properties during training. We benchmark our RL approach against a model predictive controller (MPC), demonstrating better surface generalization and performance in real-world zero-shot deployment. This work highlights the computational approaches and performance benefits of fully simulating parallel mechanisms in end-to-end learning pipelines for legged humanoids.

Overview of all the parallel mechanisms on BRUCE and modeled in MuJoCo simulation.
BRUCE’s lower body uses three distinct parallel mechanisms: a cable-driven differential pulley at the hip, and four-bar and five-bar linkages in the legs. These designs combine motor outputs, reduce moving mass, and provide high transmission ratios. Instead of simplifying these closed-chain linkages into serial joints, our simulation models all equality constraints directly in GPU-accelerated MuJoCo (MJX). This preserves the true actuator-to-output mappings for position, velocity, and torque, while also making the nonlinear transmission and singularities of the four-bar and five-bar linkages explicit. With this high-fidelity model, we train locomotion policies through curriculum reinforcement learning that command the same actuator joints as the hardware, achieving zero-shot transfer and demonstrating the benefits of fully simulating parallel actuation.
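To illustrate the nonlinear transmission that such closed chains introduce, the sketch below solves the loop-closure constraint of a planar four-bar linkage by intersecting two circles, then estimates the input-to-output transmission ratio by finite differences. The link lengths, the assembly branch, and the function names are illustrative placeholders, not BRUCE's actual geometry or the paper's implementation.

```python
import math

def fourbar_output_angle(theta, a=0.03, b=0.08, c=0.05, g=0.07):
    """Output rocker angle phi (rad) for input crank angle theta (rad).

    Planar four-bar: crank of length a pivoted at the origin, ground link
    of length g along +x, coupler b, rocker c pivoted at (g, 0).
    Loop closure = intersection of two circles:
      circle 1: center at the crank tip A, radius b (coupler)
      circle 2: center at (g, 0),          radius c (rocker)
    Link lengths are illustrative placeholders, not BRUCE's dimensions.
    """
    ax, ay = a * math.cos(theta), a * math.sin(theta)
    dx, dy = g - ax, -ay                 # vector from A to the rocker pivot
    d = math.hypot(dx, dy)
    if not (abs(b - c) <= d <= b + c):
        raise ValueError("linkage cannot close at this input angle")
    # distance from A to the chord joining the two circle intersections
    m = (b * b - c * c + d * d) / (2.0 * d)
    h = math.sqrt(max(b * b - m * m, 0.0))
    px, py = ax + m * dx / d, ay + m * dy / d   # chord midpoint
    bx, by = px + h * dy / d, py - h * dx / d   # pick one assembly branch
    return math.atan2(by, bx - g)

def transmission_ratio(theta, eps=1e-4):
    """d(phi)/d(theta) via central difference; it varies with theta,
    which is the configuration-dependent (nonlinear) mapping that a
    serial-joint approximation discards."""
    return (fourbar_output_angle(theta + eps)
            - fourbar_output_angle(theta - eps)) / (2.0 * eps)
```

Evaluating `transmission_ratio` at different crank angles shows the gearing changes with configuration; simulating the closed chain natively (e.g., via equality constraints in MuJoCo) preserves exactly this effect for position, velocity, and torque mappings.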

We trained BRUCE with a curriculum reinforcement learning pipeline that gradually increased task difficulty, starting from simple balancing and progressing to dynamic walking under disturbances. The entire process ran on GPU-accelerated MuJoCo (MJX), allowing thousands of parallel environments to simulate BRUCE’s closed-chain mechanisms efficiently. This large-scale, staged training produced robust policies that transferred directly to the real robot.
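A staged curriculum like the one described above can be sketched as a simple promotion rule: each stage widens the velocity-command range and adds disturbances, and training advances once the policy's mean episode reward clears a threshold. The stage names, parameter values, and reward-based promotion criterion below are illustrative assumptions, not AURA's actual schedule.

```python
# Illustrative curriculum schedule: stage names, parameter values, and the
# reward-based promotion rule are assumptions, not AURA's actual settings.
STAGES = [
    {"name": "balance",        "push_force_n": 0.0,  "cmd_vel_max": 0.0},
    {"name": "walk_flat",      "push_force_n": 0.0,  "cmd_vel_max": 0.3},
    {"name": "walk_disturbed", "push_force_n": 20.0, "cmd_vel_max": 0.6},
]

def curriculum_step(stage_idx, mean_episode_reward, promote_at=0.8):
    """Advance to the next stage once the current one is mastered;
    hold at the final stage otherwise."""
    if mean_episode_reward >= promote_at and stage_idx < len(STAGES) - 1:
        return stage_idx + 1
    return stage_idx
```

In a vectorized training loop, the active stage's parameters would be broadcast to all parallel simulation environments at each promotion, so every environment samples commands and disturbances from the same difficulty band.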
A high-level overview of BRUCE’s RL training framework. The robot’s mechanically intelligent parallel mechanisms were modeled in simulation and trained with AURA, an autonomous RL training framework.
Full Video
Citation
If you use this work or find it helpful, please consider citing:
@misc{tanaka2025mechanicalintelligenceawarecurriculumreinforcement,
  title={Mechanical Intelligence-Aware Curriculum Reinforcement Learning for Humanoids with Parallel Actuation},
  author={Yusuke Tanaka and Alvin Zhu and Quanyou Wang and Dennis Hong},
  year={2025},
  eprint={2507.00273},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2507.00273},
}