HDMI is a framework that enables humanoid robots to acquire diverse whole-body interaction skills directly from monocular RGB videos of human demonstrations. This repository contains the official training code for HDMI: Learning Interactive Humanoid Whole-Body Control from Human Videos.
Set up the environment, then install IsaacSim, IsaacLab, and HDMI:
```bash
# 1) Conda env
conda create -n hdmi python=3.10 -y
conda activate hdmi

# 2) IsaacSim
pip install "isaacsim[all,extscache]==4.5.0" --extra-index-url https://pypi.nvidia.com
isaacsim  # verify that Isaac Sim launches

# 3) IsaacLab
cd ..
git clone git@github.com:isaac-sim/IsaacLab.git
cd IsaacLab
git checkout v2.2.0
./isaaclab.sh -i none

# 4) HDMI
cd ..
git clone https://github.com/LeCAR-Lab/HDMI
cd HDMI
pip install -e .
```

This codebase is designed as a flexible, high-performance RL framework for Isaac Sim, built from composable MDP components, modular RL algorithms, and Hydra-driven configs. It relies on tensordict/torchrl for efficient data flow (a brief sketch follows the layout overview below).
- `active_adaptation/envs/` — unified base env with composable, modular MDP components: Documentation →
- `active_adaptation/learning/` — single-file PPO implementations: Documentation →
- `scripts/` — training, evaluation, and visualization entry points: Documentation →
- `cfg/` — Hydra configs for tasks, algorithms, and app launch settings
- `data/` — motion assets and samples referenced by configs
HDMI-specific code lives primarily in `active_adaptation/envs/mdp/commands/hdmi/` (commands, observations, rewards) and `active_adaptation/learning/ppo_roa.py` (PPO with residual action distillation).
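As noted above, data flows through the framework as batched tensordicts. The snippet below is a minimal, illustrative sketch of that pattern using the `tensordict` library; it is not HDMI's actual API, and the key names and dimensions are placeholders.

```python
# Illustrative only: batched, keyed rollout data as a TensorDict.
# Key names ("policy_obs", "action", ...) and sizes are placeholders, not HDMI's schema.
import torch
from tensordict import TensorDict

num_envs = 4096
step = TensorDict(
    {
        "policy_obs": torch.zeros(num_envs, 123),  # placeholder observation dim
        "action": torch.zeros(num_envs, 29),       # placeholder action dim
        "reward": torch.zeros(num_envs, 1),
    },
    batch_size=[num_envs],
)

# Components can read/write nested keys without hard-coding tensor layouts.
step["next", "policy_obs"] = torch.zeros(num_envs, 123)
print(step)
```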
The training scripts load motion data from `motion.npz` (see `active_adaptation/utils/motion.py`). The expected data format is:

- Body states: `pos`, `quat`, `lin_vel`, `ang_vel` → `[T, B, 3/4]`
- Joint states: `pos`, `vel` → `[T, J]`

where T = time steps, B = bodies (including appended objects), and J = joints. Body/joint ordering is defined in the accompanying `meta.json`.
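A quick way to sanity-check a converted file against this layout is sketched below. The array key names and paths are assumptions for illustration only; the authoritative schema is whatever `active_adaptation/utils/motion.py` reads.

```python
# Sketch: validate a motion.npz against the shapes described above.
# Key names (body_pos, joint_pos, ...) and paths are assumed, not the official schema.
import json
import numpy as np

motion = np.load("data/example/motion.npz")       # placeholder path
meta = json.load(open("data/example/meta.json"))  # defines body/joint ordering

T, B = motion["body_pos"].shape[:2]
J = motion["joint_pos"].shape[1]

assert motion["body_pos"].shape     == (T, B, 3)
assert motion["body_quat"].shape    == (T, B, 4)
assert motion["body_lin_vel"].shape == (T, B, 3)
assert motion["body_ang_vel"].shape == (T, B, 3)
assert motion["joint_pos"].shape    == (T, J)
assert motion["joint_vel"].shape    == (T, J)
print(f"{T} steps, {B} bodies, {J} joints")
```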
To turn HOI/video data into this format:

- Convert human motion to robot motion via GVHMR → GMR/LocoMujoco to obtain robot body/joint states.
- Extract the object trajectory (position, orientation, velocities).
- Append the object name to `meta.json`, then concatenate the object body states (`pos`, `quat`, `lin_vel`, `ang_vel`) to the robot body states so shapes become `[T, B_robot + B_object, 3/4]`, as sketched below.
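A minimal sketch of the last step, reusing the placeholder key names from the snippet above (file names and keys are assumptions, not the official format):

```python
# Sketch: append one object's trajectory to the robot body states.
# File names and key names are placeholders for illustration.
import numpy as np

robot_npz = np.load("robot_motion.npz")                    # robot-only motion, placeholder path
robot = {k: robot_npz[k] for k in robot_npz.files}
obj = np.load("object_traj.npz")                           # object pos/quat/lin_vel/ang_vel, [T, 1, 3/4]

combined = {
    "body_pos":     np.concatenate([robot["body_pos"],     obj["pos"]],     axis=1),
    "body_quat":    np.concatenate([robot["body_quat"],    obj["quat"]],    axis=1),
    "body_lin_vel": np.concatenate([robot["body_lin_vel"], obj["lin_vel"]], axis=1),
    "body_ang_vel": np.concatenate([robot["body_ang_vel"], obj["ang_vel"]], axis=1),
    "joint_pos":    robot["joint_pos"],  # joint states are unchanged
    "joint_vel":    robot["joint_vel"],
}
np.savez("motion.npz", **combined)
# Also append the object's name to the body list in meta.json.
```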
Visualize motions in Isaac Sim with `+task.command.replay_motion=true`:

```bash
python scripts/train.py algo=ppo_roa_train task=G1/hdmi/move_suitcase +task.command.replay_motion=true
```

Or visualize a `motion.npz` in MuJoCo:
```bash
# one terminal
python scripts/vis/mujoco_mocap_viewer.py

# another terminal
python scripts/vis/motion_data_publisher.py <path-to-motion-folder>
```

**Teacher policy**
```bash
# train teacher
python scripts/train.py algo=ppo_roa_train task=G1/hdmi/move_suitcase

# evaluate teacher
python scripts/play.py algo=ppo_roa_train task=G1/hdmi/move_suitcase checkpoint_path=run:<teacher-wandb_run_path>
```

**Student policy**
```bash
# train student
python scripts/train.py algo=ppo_roa_finetune task=G1/hdmi/move_suitcase checkpoint_path=run:<teacher-wandb_run_path>

# evaluate student
python scripts/play.py algo=ppo_roa_finetune task=G1/hdmi/move_suitcase checkpoint_path=run:<student-wandb_run_path>
```

To export a trained policy, add `export_policy=true` to the play command.
For sim-to-real deployment, please see github.com/EGalahad/sim2real for details.
If you find our work useful for your research, please consider citing us:
```bibtex
@misc{weng2025hdmilearninginteractivehumanoid,
  title={HDMI: Learning Interactive Humanoid Whole-Body Control from Human Videos},
  author={Haoyang Weng and Yitang Li and Nikhil Sobanbabu and Zihan Wang and Zhengyi Luo and Tairan He and Deva Ramanan and Guanya Shi},
  year={2025},
  eprint={2509.16757},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2509.16757},
}
```