  1. Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations. CVPR 2024. project page
  2. Emu3. Emu3: Next-Token Prediction is All You Need. 2024. project page
  3. Simran Khanuja et al., An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance. EMNLP 2024 Best Paper. paper.
  4. Janus Series: Unified Multimodal Understanding and Generation Models. GitHub.