This is the official PyTorch implementation of Pose-Aware Diffusion for 3D Generation.
conda create -n pad python=3.10 -y
conda activate pad

Install the PyTorch build matching your CUDA version. For CUDA 12.1:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

pip install -r requirements.txt

torch-cluster requires a build matching your PyTorch and CUDA versions. For PyTorch 2.5.1 + CUDA 12.1:

pip install torch-cluster -f https://data.pyg.org/whl/torch-2.5.1+cu121.html

For other versions, see the PyG installation guide.
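
As a rough aid for picking the right wheel index, the sketch below rebuilds the URL pattern shown above from a PyTorch version string and a CUDA version string. The function name `pyg_wheel_index` is hypothetical, and the `cpu` tag for CPU-only builds is an assumption; not every version combination necessarily has prebuilt wheels, so check the PyG installation guide if the URL does not resolve.

```python
from typing import Optional


def pyg_wheel_index(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build a PyG wheel index URL following the pattern used in this README.

    torch_version: e.g. "2.5.1" or "2.5.1+cu121" (any local "+..." suffix is dropped).
    cuda_version:  e.g. "12.1", or None for a CPU-only build (assumed tag: "cpu").
    """
    base = torch_version.split("+")[0]
    if cuda_version is None:
        tag = "cpu"
    else:
        # "12.1" -> "cu121"
        tag = "cu" + cuda_version.replace(".", "")
    return f"https://data.pyg.org/whl/torch-{base}+{tag}.html"
```

With PyTorch installed, passing `torch.__version__` and `torch.version.cuda` to this function should give the index that matches the current environment, which can then be supplied to `pip install torch-cluster -f`.
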
MoGe is used for monocular depth estimation to lift a 2D image into a 3D point cloud.
pip install git+https://github.com/microsoft/MoGe.git

The following models are downloaded automatically from HuggingFace on first run (~17 GB total). No manual action is needed.

| Model | HuggingFace ID | Purpose | Cache Location |
|---|---|---|---|
| Hunyuan3D-2.1 (DiT + VAE) | tencent/Hunyuan3D-2.1 | Base 3D generation model | ~/.cache/hy3dgen/ |
| DINOv2-Large | facebook/dinov2-large | Image condition encoder (loaded via config YAML) | ~/.cache/huggingface/ |
| MoGe v2 | Ruicheng/moge-2-vitl-normal | Monocular depth estimation | ~/.cache/huggingface/ |
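
Since the automatic downloads take roughly 17 GB, it can help to see how much space the caches already use. The following is a minimal sketch using only the standard library; the helper name `dir_size_bytes` is hypothetical, and the directory list simply mirrors the cache locations in the table above.

```python
from pathlib import Path

# Cache locations taken from the table above.
CACHE_DIRS = ["~/.cache/hy3dgen", "~/.cache/huggingface"]


def dir_size_bytes(path) -> int:
    """Return the total size of regular files under `path`, or 0 if it is missing.

    Symlinks are skipped so that caches which link snapshots to shared
    blob files are not counted twice.
    """
    root = Path(path).expanduser()
    if not root.is_dir():
        return 0
    return sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    )
```

Calling `dir_size_bytes(d) / 1e9` for each entry in `CACHE_DIRS` gives an approximate size in GB; a result of 0 suggests the model has not been downloaded yet.
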
The finetuned denoiser weights must be downloaded separately from zzh0000/PAD:
Option A: Using huggingface-cli (recommended)

huggingface-cli download zzh0000/PAD pytorch_model.bin --local-dir checkpoints/

Option B: Using Python

from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="zzh0000/PAD", filename="pytorch_model.bin", local_dir="checkpoints/")

Option C: Direct download
Download pytorch_model.bin from https://huggingface.co/zzh0000/PAD and place it under checkpoints/.
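
Whichever option is used, the checkpoint should end up at `checkpoints/pytorch_model.bin`. The sketch below wraps Option B so that the download only happens when the file is missing. The function name `ensure_checkpoint` is hypothetical; it assumes `huggingface_hub` is installed whenever a download is actually needed.

```python
from pathlib import Path


def ensure_checkpoint(ckpt_dir="checkpoints", filename="pytorch_model.bin") -> Path:
    """Return the local checkpoint path, downloading from zzh0000/PAD if absent."""
    path = Path(ckpt_dir) / filename
    if path.is_file():
        # Already downloaded (by any of the three options); nothing to do.
        return path
    # Imported lazily so the existence check works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    downloaded = hf_hub_download(
        repo_id="zzh0000/PAD", filename=filename, local_dir=ckpt_dir
    )
    return Path(downloaded)
```

The returned path can be passed to `inference.py` through `--ckpt`.
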
# Minimal example — other models download automatically on first run
python inference.py \
--image ./assets/example_images/052.png \
--output ./results/ \
--ckpt_config configs/objaverse_ptscond.yaml \
--ckpt checkpoints/pytorch_model.bin

# Process all images in a directory
python inference.py \
--image ./assets/example_images/ \
--output ./results/ \
--ckpt_config configs/objaverse_ptscond.yaml \
--ckpt checkpoints/pytorch_model.bin

You can also run with the base Hunyuan3D-2.1 model only (no finetuning):

python inference.py --image ./assets/example_images/052.png --output ./results/

The output is one .glb 3D mesh file per input image.
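
Because each input image should yield one .glb file, a quick count can confirm that a batch run completed. This is a minimal sketch with a hypothetical helper name; the set of image extensions and the assumption that meshes are written directly into the output directory (not into subfolders) are guesses, not something this README specifies.

```python
from pathlib import Path

# Assumed image extensions; adjust to whatever inference.py actually accepts.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def count_inputs_and_meshes(image_path, output_dir):
    """Return (number of input images, number of .glb files in output_dir)."""
    image_path = Path(image_path)
    if image_path.is_dir():
        n_images = sum(
            1 for p in image_path.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
        )
    else:
        # A single image file was passed to --image.
        n_images = 1
    # Only the top level of output_dir is searched (assumed flat layout).
    n_meshes = sum(1 for _ in Path(output_dir).glob("*.glb"))
    return n_images, n_meshes
```

For example, `count_inputs_and_meshes("./assets/example_images/", "./results/")` returns two counts that should match after a successful directory run, provided the output directory was empty beforehand.
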
- Release scene generation model weights and inference code
- Release training code
- Release data preprocessing code
This work builds on many amazing open-source projects. Thanks to all the authors!
This project builds upon Hunyuan3D-2.1 and is subject to the Tencent Hunyuan Community License Agreement. Non-commercial use only.