HDMI is a framework that enables humanoid robots to acquire diverse whole-body interaction skills directly from monocular RGB videos of human demonstrations. This repository contains the official training code for HDMI: Learning Interactive Humanoid Whole-Body Control from Human Videos.
Set up the environment, then install IsaacSim, IsaacLab, and HDMI:
```bash
# 1) Conda env
conda create -n hdmi python=3.10 -y
conda activate hdmi

# 2) IsaacSim
pip install "isaacsim[all,extscache]==4.5.0" --extra-index-url https://pypi.nvidia.com
isaacsim  # verify that Isaac Sim launches

# 3) IsaacLab
cd ..
git clone git@github.com:isaac-sim/IsaacLab.git
cd IsaacLab
git checkout v2.2.0
./isaaclab.sh -i none

# 4) HDMI
cd ..
git clone https://github.com/LeCAR-Lab/HDMI
cd HDMI
pip install -e .
```

This codebase is designed as a flexible, high-performance RL framework for Isaac Sim, built from composable MDP components, modular RL algorithms, and Hydra-driven configs. It relies on tensordict/torchrl for efficient data flow (a brief sketch follows the layout overview below).
- `active_adaptation/envs/` — unified base env with composable, modular MDP components: Documentation →
- `active_adaptation/learning/` — single-file PPO implementations: Documentation →
- `scripts/` — training, evaluation, and visualization entry points: Documentation →
- `cfg/` — Hydra configs for tasks, algorithms, and app launch settings
- `data/` — motion assets and samples referenced by configs
HDMI-specific code lives primarily in `active_adaptation/envs/mdp/commands/hdmi/` (commands, observations, rewards) and `active_adaptation/learning/ppo_roa.py` (PPO with residual action distillation).
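As noted above, data flows through the framework as batched tensordicts. The snippet below is a minimal, illustrative sketch of that pattern using the `tensordict` library; it is not HDMI's actual API, and the key names and dimensions are placeholders.

```python
# Illustrative only: batched, keyed rollout data as a TensorDict.
# Key names ("policy_obs", "action", ...) and sizes are placeholders, not HDMI's schema.
import torch
from tensordict import TensorDict

num_envs = 4096
step = TensorDict(
    {
        "policy_obs": torch.zeros(num_envs, 123),  # placeholder observation dim
        "action": torch.zeros(num_envs, 29),       # placeholder action dim
        "reward": torch.zeros(num_envs, 1),
    },
    batch_size=[num_envs],
)

# Components can read/write nested keys without hard-coding tensor layouts.
step["next", "policy_obs"] = torch.zeros(num_envs, 123)
print(step)
```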
The training scripts load motion data from `motion.npz` (see `active_adaptation/utils/motion.py`). The expected data format is:

- Body states: `pos`, `quat`, `lin_vel`, `ang_vel` → `[T, B, 3/4]`
- Joint states: `pos`, `vel` → `[T, J]`

where T = time steps, B = bodies (including appended objects), and J = joints. Body/joint ordering is defined in the accompanying `meta.json`.
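A quick way to sanity-check a converted file against this layout is sketched below. The array key names and paths are assumptions for illustration only; the authoritative schema is whatever `active_adaptation/utils/motion.py` reads.

```python
# Sketch: validate a motion.npz against the shapes described above.
# Key names (body_pos, joint_pos, ...) and paths are assumed, not the official schema.
import json
import numpy as np

motion = np.load("data/example/motion.npz")       # placeholder path
meta = json.load(open("data/example/meta.json"))  # defines body/joint ordering

T, B = motion["body_pos"].shape[:2]
J = motion["joint_pos"].shape[1]

assert motion["body_pos"].shape     == (T, B, 3)
assert motion["body_quat"].shape    == (T, B, 4)
assert motion["body_lin_vel"].shape == (T, B, 3)
assert motion["body_ang_vel"].shape == (T, B, 3)
assert motion["joint_pos"].shape    == (T, J)
assert motion["joint_vel"].shape    == (T, J)
print(f"{T} steps, {B} bodies, {J} joints")
```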
To turn HOI/video data into this format:

- Convert human motion to robot motion via GVHMR → GMR/LocoMujoco to obtain robot body/joint states.
- Extract the object trajectory (position, orientation, velocities).
- Append the object name to `meta.json`, then concatenate the object body states (`pos`, `quat`, `lin_vel`, `ang_vel`) to the robot body states so shapes become `[T, B_robot + B_object, 3/4]`, as sketched below.
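A minimal sketch of the last step, reusing the placeholder key names from the snippet above (file names and keys are assumptions, not the official format):

```python
# Sketch: append one object's trajectory to the robot body states.
# File names and key names are placeholders for illustration.
import numpy as np

robot_npz = np.load("robot_motion.npz")                    # robot-only motion, placeholder path
robot = {k: robot_npz[k] for k in robot_npz.files}
obj = np.load("object_traj.npz")                           # object pos/quat/lin_vel/ang_vel, [T, 1, 3/4]

combined = {
    "body_pos":     np.concatenate([robot["body_pos"],     obj["pos"]],     axis=1),
    "body_quat":    np.concatenate([robot["body_quat"],    obj["quat"]],    axis=1),
    "body_lin_vel": np.concatenate([robot["body_lin_vel"], obj["lin_vel"]], axis=1),
    "body_ang_vel": np.concatenate([robot["body_ang_vel"], obj["ang_vel"]], axis=1),
    "joint_pos":    robot["joint_pos"],  # joint states are unchanged
    "joint_vel":    robot["joint_vel"],
}
np.savez("motion.npz", **combined)
# Also append the object's name to the body list in meta.json.
```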
Visualize motions in Isaac Sim with `+task.command.replay_motion=true`:

```bash
python scripts/train.py algo=ppo_roa_train task=G1/hdmi/move_suitcase +task.command.replay_motion=true
```

Or visualize a `motion.npz` in MuJoCo:
```bash
# one terminal
python scripts/vis/mujoco_mocap_viewer.py

# another terminal
python scripts/vis/motion_data_publisher.py <path-to-motion-folder>
```

**Teacher policy**
```bash
# train teacher
python scripts/train.py algo=ppo_roa_train task=G1/hdmi/move_suitcase

# evaluate teacher
python scripts/play.py algo=ppo_roa_train task=G1/hdmi/move_suitcase checkpoint_path=run:<teacher-wandb_run_path>
```

**Student policy**
```bash
# train student
python scripts/train.py algo=ppo_roa_finetune task=G1/hdmi/move_suitcase checkpoint_path=run:<teacher-wandb_run_path>

# evaluate student
python scripts/play.py algo=ppo_roa_finetune task=G1/hdmi/move_suitcase checkpoint_path=run:<student-wandb_run_path>
```

To export a trained policy, add `export_policy=true` to the play command.
For sim-to-real deployment, please see github.com/EGalahad/sim2real for details.
If you find our work useful for your research, please consider citing us:
```bibtex
@misc{weng2025hdmilearninginteractivehumanoid,
  title={HDMI: Learning Interactive Humanoid Whole-Body Control from Human Videos},
  author={Haoyang Weng and Yitang Li and Nikhil Sobanbabu and Zihan Wang and Zhengyi Luo and Tairan He and Deva Ramanan and Guanya Shi},
  year={2025},
  eprint={2509.16757},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2509.16757},
}
```