Related Publication, Project Website: aura-research.org

Parallel Training Environments

I trained reinforcement learning policies for BRUCE, a full-body humanoid robot, in GPU-accelerated MJX simulation, running thousands of parallel environments with JAX. Parallel mechanisms such as differential drives and four-bar and five-bar linkages were modeled with MuJoCo’s equality constraints, which enforce kinematic consistency across the closed loops. To enable zero-shot sim-to-real transfer, the simulation applied domain randomization (actuator dynamics, contact, sensor noise, latency) and injected disturbances (external pushes, torque noise). With this setup, BRUCE learned robust locomotion skills such as walking, turning, and recovery without any hardware fine-tuning.
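As a rough illustration of the per-environment domain randomization described above, here is a minimal sketch in plain Python. The parameter names, the ranges, and the `EnvParams` and `sample_batch` helpers are all hypothetical and chosen only for illustration; they are not the values or code used for BRUCE, and a real MJX setup would sample these on the GPU with JAX rather than in a Python loop.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvParams:
    """Randomized physical parameters for one simulated environment."""
    motor_gain_scale: float   # multiplier on the nominal actuator gain
    friction: float           # ground contact friction coefficient
    sensor_noise_std: float   # std of Gaussian noise added to observations
    latency_steps: int        # control delay, in simulation steps


# Hypothetical ranges, for illustration only.
RANGES = {
    "motor_gain_scale": (0.8, 1.2),
    "friction": (0.4, 1.2),
    "sensor_noise_std": (0.0, 0.05),
}
MAX_LATENCY_STEPS = 3


def sample_env_params(rng: random.Random) -> EnvParams:
    """Draw one set of parameters uniformly from the ranges above."""
    return EnvParams(
        motor_gain_scale=rng.uniform(*RANGES["motor_gain_scale"]),
        friction=rng.uniform(*RANGES["friction"]),
        sensor_noise_std=rng.uniform(*RANGES["sensor_noise_std"]),
        latency_steps=rng.randint(0, MAX_LATENCY_STEPS),  # inclusive bounds
    )


def sample_batch(num_envs: int, seed: int = 0) -> list:
    """Sample independent parameters for each parallel environment."""
    rng = random.Random(seed)  # seeded so a training run is reproducible
    return [sample_env_params(rng) for _ in range(num_envs)]


batch = sample_batch(4096)
```

Each environment gets its own draw, so a single policy update sees a spread of actuator, contact, noise, and latency conditions instead of one nominal model.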

Auto-Training

I built a multi-agent LLM framework powered by retrieval-augmented generation (RAG) to automate curriculum learning for humanoid reinforcement learning. Instead of relying on manual tuning, the system uses a team of language agents that retrieve relevant prior training examples and collaboratively generate new environment configurations, reward functions, and difficulty stages. The result is an adaptive, closed-loop training process that evolves with policy performance. Compared to expert-designed curricula, the RAG-driven system learned faster, generalized better, and was more robust to noise and disturbances. In benchmark comparisons, it matched or exceeded the performance of human-tuned pipelines while operating almost entirely autonomously after initialization.
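The closed loop described above (retrieve prior examples, propose a new stage, evaluate, repeat) can be sketched as follows. This is only a toy under stated assumptions: the language agents are replaced by a fixed rule in `propose_next_stage`, retrieval is a nearest-neighbor lookup on a single difficulty number, and `StageConfig`, `PRIOR_EXAMPLES`, the thresholds, and the success rates are all hypothetical placeholders rather than anything from the actual framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    terrain_roughness: float  # 0 = flat, 1 = roughest
    push_force_n: float       # magnitude of external pushes, in newtons


# Hypothetical store of prior training examples.
PRIOR_EXAMPLES = [
    StageConfig(0.2, 20.0),
    StageConfig(0.5, 50.0),
    StageConfig(0.8, 100.0),
]
PROMOTE_AT = 0.8  # success rate above which difficulty is raised
DEMOTE_AT = 0.4   # success rate below which difficulty is lowered


def difficulty(cfg: StageConfig) -> float:
    """Crude scalar proxy for difficulty (an assumption of this sketch)."""
    return cfg.push_force_n


def retrieve(examples, current: StageConfig, k: int = 2):
    """Return the k prior examples closest in difficulty to the current stage."""
    return sorted(examples, key=lambda c: abs(difficulty(c) - difficulty(current)))[:k]


def propose_next_stage(current: StageConfig, success_rate: float, retrieved):
    """Rule-based stand-in for the agent team: step to a retrieved neighbor."""
    if success_rate >= PROMOTE_AT:
        harder = [c for c in retrieved if difficulty(c) > difficulty(current)]
        return min(harder, key=difficulty) if harder else current
    if success_rate <= DEMOTE_AT:
        easier = [c for c in retrieved if difficulty(c) < difficulty(current)]
        return max(easier, key=difficulty) if easier else current
    return current  # performance is in the target band; keep the stage


stage = PRIOR_EXAMPLES[0]
history = [stage]
for success_rate in [0.85, 0.6, 0.3]:  # placeholder policy evaluations
    stage = propose_next_stage(stage, success_rate, retrieve(PRIOR_EXAMPLES, stage))
    history.append(stage)
```

The point of the sketch is the feedback structure: each proposal is conditioned on both the measured policy performance and the retrieved examples, so the curriculum can move up or down without manual intervention.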

Zero-Shot Deployment

The policies produced by this automated training framework were deployed zero-shot on the BRUCE humanoid robot. On hardware, BRUCE demonstrated robust locomotion across diverse real-world scenarios, including walking on outdoor surfaces such as asphalt and grass and handling challenging transitions such as stepping off elevated platforms. The robot remained stable and adapted well, even in conditions that were not explicitly seen during training. This degree of generalization from simulation to hardware highlights the strength of the curriculum learning pipeline and the fidelity of the simulated environment.

Simulation rollout of BRUCE trained for omni-directional locomotion on rough terrain with high perturbations and domain randomization.

BRUCE walking outside on dirt and grass.

BRUCE stabilizing after walking off a 50 mm high platform.