- Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations. CVPR 2024. project page.
- Emu3. Emu3: Next-Token Prediction is All You Need. 2024. project page.
- Simran Khanuja et al., An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance. EMNLP 2024 Best Paper. paper.
- Janus Series: Unified Multimodal Understanding and Generation Models. GitHub.