Bingnan Li 1* · Chen-Yu Wang 1* · Haiyang Xu 1* · Xiang Zhang 1 · Ethan Armand 1 · Divyansh Srivastava 1 · Xiaojun Shan 1 · Zeyuan Chen 1 · Jianwen Xie 2 · Zhuowen Tu 1
1UC San Diego · 2Lambda, Inc.
Examples from OverLayBench with difficulty increasing from left to right.
Despite steady progress in layout-to-image generation, current methods still struggle with layouts containing significant overlap between bounding boxes. We identify two primary challenges: (1) large overlapping regions and (2) overlapping instances with minimal semantic distinction. Through both qualitative examples and quantitative analysis, we demonstrate how these factors degrade generation quality. To systematically assess this issue, we introduce OverLayScore, a novel metric that quantifies the complexity of overlapping bounding boxes. Our analysis reveals that existing benchmarks are biased toward simpler cases with low OverLayScore values, limiting their effectiveness in evaluating models under more challenging conditions. To reduce this gap, we present OverLayBench, a new benchmark featuring balanced OverLayScore distributions and high-quality annotations. As an initial step toward improved performance on complex overlaps, we also propose CreatiLayout-AM, a model trained on a curated amodal mask dataset. Together, our contributions establish a foundation for more robust layout-to-image generation under realistic and challenging scenarios.
- [2025-09-23]: The preprint is available on arXiv!
- [2025-09-19]: OverLayBench has been accepted to the NeurIPS 2025 D&B Track! 🎉🎉🎉
- [2024-06-17]: The code and the evaluation toolkit are released!
If you have multiple GPUs, we recommend using vllm for accelerated inference.
git clone https://github.com/mlpc-ucsd/OverLayBench.git
cd OverLayBenchPyTools
conda create -n overlaybench python=3.10.16 --yes
conda activate overlaybench
bash install_vllm.sh

Otherwise, you can use the default huggingface transformers backend, which is slower but more stable.
git clone https://github.com/mlpc-ucsd/OverLayBench.git
cd OverLayBenchPyTools
conda create -n overlaybench python=3.10.16 --yes
conda activate overlaybench
bash install.sh

OverLayBenchMeter assumes that the generated images are organized in the following structure:
EXP_NAME
├── simple
│   ├── seed_1
│   │   ├── img_id_1.png
│   │   ├── img_id_2.png
│   │   ├── img_id_3.png
│   │   └── ...
│   ├── seed_2
│   └── seed_3
├── medium
│   ├── seed_1
│   ├── seed_2
│   └── seed_3
└── hard
    ├── seed_1
    ├── seed_2
    └── seed_3
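
Before running the meter, it can help to check that an output directory matches this layout. The following is a minimal sketch, not part of OverLayBenchPyTools: the function name `find_layout_problems` is hypothetical, and it assumes only the three split names and the per-seed image folders shown in the tree above.

```python
from pathlib import Path

# Split names taken from the directory tree above.
SPLITS = ("simple", "medium", "hard")


def find_layout_problems(root, extension="png"):
    """Return a list of problems found; an empty list means the layout looks fine."""
    root = Path(root)
    problems = []
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing split directory: {split_dir}")
            continue
        # Every sub-directory of a split is treated as one seed folder.
        seed_dirs = sorted(p for p in split_dir.iterdir() if p.is_dir())
        if not seed_dirs:
            problems.append(f"no seed directories in: {split_dir}")
        for seed_dir in seed_dirs:
            if not any(seed_dir.glob(f"*.{extension}")):
                problems.append(f"no .{extension} images in: {seed_dir}")
    return problems
```

Calling `find_layout_problems('{YOUR_GENERATED_IMAGES_DIR}')` and printing the returned list gives a quick report of missing splits or empty seed folders.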
According to the discussion, for vllm inference, please set the environment variable VLLM_WORKER_MULTIPROC_METHOD=spawn before running the code.
Also, make sure that OverLayBenchMeter is initialized inside an if __name__ == "__main__": block to avoid the error RuntimeError: Cannot re-initialize CUDA in forked subprocess.
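
If exporting the variable in the shell is inconvenient, it can also be set at the top of the evaluation script. The sketch below assumes that setting it in-process, before vllm is imported, has the same effect as exporting it in the shell; the `main` function is a placeholder for the evaluation code.

```python
import os

# Assumption: set this before vllm (or any module that imports it) is loaded.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"


def main() -> None:
    # Import and construct OverLayBenchMeter here, inside the guarded entry
    # point, so that spawned workers re-importing this module do not re-run it.
    print("vllm worker start method:", os.environ["VLLM_WORKER_MULTIPROC_METHOD"])


if __name__ == "__main__":
    main()
```

With the spawn start method, each worker process re-imports the main module, which is why the meter must be created under the main guard and not at module level.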
from overlaybenchpytools.meter import OverLayBenchMeter

if __name__ == "__main__":
    meter = OverLayBenchMeter(
        root='{YOUR_GENERATED_IMAGES_DIR}',
        extension='png', save_dir='./metrics',
        resolution=1024, bs_qwen="all", use_vllm=True,
        vllm_args={"tensor_parallel_size": 8})
    for split in ["simple", "medium", "hard"]:
        meter.set_split(split, '{YOUR_SEED}')
        meter.evaluate()

For transformers-based inference, remove the use_vllm and vllm_args arguments and set bs_qwen to a reasonable size.
from overlaybenchpytools.meter import OverLayBenchMeter

if __name__ == "__main__":
    meter = OverLayBenchMeter(
        root='{YOUR_GENERATED_IMAGES_DIR}',
        extension='png', save_dir='./metrics',
        resolution=1024, bs_qwen=8)
    for split in ["simple", "medium", "hard"]:
        meter.set_split(split, '{YOUR_SEED}')
        meter.evaluate()

OverLayBenchMeter covers the evaluation of mIoU, Overlay mIoU (o-mIoU), Entity Success Rate (SR_E), Relationship Success Rate (SR_R), Global CLIPScore, and Local CLIPScore.
For FID, please refer to the IQA-PyTorch package.
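
For readers unfamiliar with the box-level metrics, the sketch below shows the standard intersection-over-union (IoU) between two axis-aligned boxes and its mean over paired boxes. It illustrates the general definition only. How OverLayBenchMeter detects boxes, pairs them with the layout, and restricts o-mIoU to overlapping regions is not shown here, and the function names are hypothetical.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the intersection, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(target_boxes, predicted_boxes):
    """Average IoU over already-paired target and predicted boxes."""
    pairs = list(zip(target_boxes, predicted_boxes))
    if not pairs:
        return 0.0
    return sum(box_iou(t, p) for t, p in pairs) / len(pairs)
```

For example, two 2x2 boxes offset by one unit in each direction share an area of 1 out of a union of 7, so `box_iou((0, 0, 2, 2), (1, 1, 3, 3))` evaluates to 1/7.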
The expected log file structure is as follows:
metrics
├── overlay_bench.log
├── simple
│   ├── seed_1
│   │   ├── baseline_bbox_predictions.json
│   │   ├── entity_VQA.json
│   │   ├── relation_VQA.json
│   │   └── ...
│   ├── seed_2
│   └── seed_3
├── medium
│   ├── seed_1
│   ├── seed_2
│   └── seed_3
└── hard
    ├── seed_1
    ├── seed_2
    └── seed_3
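
To inspect the saved outputs programmatically, the JSON files can be collected per split and seed. This is a minimal sketch that assumes only the directory layout shown above; it makes no assumption about the schema inside each JSON file, and `load_saved_results` is a hypothetical helper, not part of the toolkit.

```python
import json
from pathlib import Path


def load_saved_results(save_dir="./metrics"):
    """Map (split, seed directory name) to {file stem: parsed JSON content}."""
    results = {}
    # Matches save_dir/<split>/<seed>/<name>.json and skips overlay_bench.log.
    for json_path in sorted(Path(save_dir).glob("*/*/*.json")):
        key = (json_path.parent.parent.name, json_path.parent.name)
        with json_path.open(encoding="utf-8") as handle:
            results.setdefault(key, {})[json_path.stem] = json.load(handle)
    return results
```

The returned dictionary can then be indexed by split and seed folder, for example `load_saved_results()[("simple", "seed_1")]["entity_VQA"]`, given that the corresponding file exists.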
Comparison of generated images from different models on OverLayBench.
We deeply appreciate the contributions of the following projects:
@article{li2025overlaybench,
title={OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps},
author={Li, Bingnan and Wang, Chen-Yu and Xu, Haiyang and Zhang, Xiang and Armand, Ethan and Srivastava, Divyansh and Shan, Xiaojun and Chen, Zeyuan and Xie, Jianwen and Tu, Zhuowen},
journal={arXiv preprint arXiv:2509.19282},
year={2025}
}
