We build and benchmark world models, embodied agents, and world–action models — systems that simulate, predict, and act in the physical world. Our public releases include video generation benchmarks, robot policies, action models, and the datasets and judges that hold them to a physical standard.
PhyGround is a benchmark for the physical plausibility of text-and-image-to-video (ti2v) generations: 250 curated prompts, a 13-physical-law taxonomy across solid-body, fluid, and optical domains, and a quality-controlled human study (459 annotators, 37K fine-grained labels). We also release PhyJudge-9B, an open VLM judge fine-tuned on the human ratings.
| Artifact | Link |
|---|---|
| 🌐 Project page | phyground.github.io |
| 💻 Code | PhyGround |
| 📦 Dataset | phyground |
| 🧑‍⚖️ Judge model | phyjudge-9B |
PhyWorld is a video-generation world model post-trained from Wan2.2-I2V-A14B in two stages: flow-matching fine-tuning for temporal coherence, then DPO over physics preference pairs sourced from the PhyGround human-annotation pool. It reaches 3.09 on PhyGround (vs. 2.99 for the strongest open baseline) and 0.769 on VBench (vs. 0.756 or below for state-of-the-art baselines).
| Artifact | Link |
|---|---|
| 🌐 Project page | nu-world-model-embodied-ai.github.io/PhyWorld |
| 💻 Code | PhyWorld |
| 🤖 Model | phyworld |
The two projects share a loop: the same human annotations that score every model on PhyGround supply the preference pairs that train PhyWorld. More releases — robot policies, world–action models, and additional benchmarks — are on the way.
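
To make the shared loop concrete, here is a minimal sketch of how per-video human ratings could be turned into chosen/rejected pairs for DPO. It is an illustration under stated assumptions, not the PhyWorld pipeline: the `(prompt_id, video_id, score)` record layout, the `build_preference_pairs` helper, and the `min_gap` threshold are all hypothetical names and choices introduced here.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def build_preference_pairs(ratings, min_gap=0.5):
    """Turn human ratings into (chosen, rejected) pairs per prompt.

    ratings: iterable of (prompt_id, video_id, score) tuples, one per human
             label. This record layout is an assumption for illustration.
    min_gap: hypothetical threshold; pairs whose mean scores differ by less
             than this are dropped as too ambiguous to express a preference.
    """
    # Collect every human score for each (prompt, video).
    scores = defaultdict(list)
    for prompt_id, video_id, score in ratings:
        scores[(prompt_id, video_id)].append(score)

    # Average the scores and group the videos by prompt.
    by_prompt = defaultdict(dict)
    for (prompt_id, video_id), values in scores.items():
        by_prompt[prompt_id][video_id] = mean(values)

    # Compare every two videos generated for the same prompt.
    pairs = []
    for prompt_id, videos in by_prompt.items():
        for a, b in combinations(sorted(videos), 2):
            gap = videos[a] - videos[b]
            if abs(gap) < min_gap:
                continue
            chosen, rejected = (a, b) if gap > 0 else (b, a)
            pairs.append(
                {"prompt": prompt_id, "chosen": chosen, "rejected": rejected}
            )
    return pairs


example_ratings = [
    ("p1", "v1", 4), ("p1", "v1", 5),  # two labels for the same video
    ("p1", "v2", 2),
    ("p1", "v3", 4.4),
    ("p2", "v4", 3),                   # a lone video yields no pair
]
example_pairs = build_preference_pairs(example_ratings)
```

Filtering on a score gap is one simple way to keep only pairs where annotators clearly preferred one video; a real pipeline might instead weight pairs by annotator agreement or work per physical law.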