[Benchmark] Support SArena_MINI Benchmark #1353

JoeLeelyf · 2025-12-07T03:38:43Z

This PR adds support for the SArena_MINI benchmark for SVG understanding, editing, and generation.

SArena is a benchmark for evaluating MLLMs on SVG-related tasks (icons, illustrations, chemistry diagrams, etc.) across understanding, editing, Text-to-SVG (T2SVG), and Image-to-SVG (I2SVG).
SArena_MINI is a small subset sampled from SArena-Icon / SArena-Illustration / SArena-Chemistry, designed for quick validation and debugging of SVG-related capabilities while keeping the original task definitions and metrics.
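The PR does not say how the MINI subset was drawn. Purely as an illustration, a fixed-seed sample that keeps every (domain, task) split represented could look like the sketch below. The function name `sample_mini`, the record fields `domain` and `task`, and the per-split size are hypothetical. This is not the procedure used to build SArena_MINI.

```python
import random
from collections import defaultdict


def sample_mini(records, per_split, seed=0):
    """Draw up to `per_split` records from each (domain, task) split.

    Hypothetical sketch: the 'domain' and 'task' fields are assumed, and this
    is not the actual SArena_MINI construction procedure.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["domain"], rec["task"])].append(rec)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for key in sorted(groups):  # sorted keys give a deterministic order
        items = groups[key]
        subset.extend(rng.sample(items, min(per_split, len(items))))
    return subset


toy = [
    {"id": i, "domain": d, "task": t}
    for i, (d, t) in enumerate([("Icon", "T2SVG")] * 10 + [("Chemistry", "I2SVG")] * 3)
]
print(len(sample_mini(toy, per_split=5, seed=42)))  # 8 (5 Icon + 3 Chemistry)
```

Sampling per split, rather than from the pooled data, keeps small splits such as Chemistry from being crowded out by larger ones.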

What this PR does

Adds SArena_MINI benchmark configuration and dataset support to VLMEvalKit.
Defines SArena_MINI as a sampled subset of:
- SArena-Icon: understanding, editing, T2SVG, I2SVG.
- SArena-Illustration: T2SVG, I2SVG.
- SArena-Chemistry: T2SVG, I2SVG.
Reuses the official SArena evaluation metrics for SVG tasks:
- Understanding: O (overall), C (color), G (geometry), Q (quantity), S (semantic).
- Editing / I2SVG: DINO, SSIM, LPIPS, PSNR.
- T2SVG: FID, FID-C, CLIP-T2I, CLIP-I2I, token length.
Provides example configs and evaluation scripts for quickly running SArena_MINI in VLMEvalKit.
Verifies the integration by reproducing InternVL3-8B performance on SArena / SArena_MINI.
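The split layout and metric list above can be summarized as a small lookup table. The sketch below is for readers only and does not reflect VLMEvalKit's actual config format. The names `SARENA_MINI_SPLITS`, `METRICS`, and `metrics_for` are illustrative. It follows the metric list above, although the Chemistry T2SVG results table reports a slightly different set.

```python
# Hypothetical summary of the SArena_MINI layout described in this PR.
# Not VLMEvalKit's actual config format.
SARENA_MINI_SPLITS = {
    "Icon": ["understanding", "editing", "T2SVG", "I2SVG"],
    "Illustration": ["T2SVG", "I2SVG"],
    "Chemistry": ["T2SVG", "I2SVG"],
}

METRICS = {
    "understanding": ["O", "C", "G", "Q", "S"],
    "editing": ["DINO", "SSIM", "LPIPS", "PSNR"],
    "I2SVG": ["DINO", "SSIM", "LPIPS", "PSNR"],
    "T2SVG": ["FID", "FID-C", "CLIP-T2I", "CLIP-I2I", "tokens"],
}


def metrics_for(domain: str, task: str) -> list:
    """Return the metric names listed for one (domain, task) split."""
    if task not in SARENA_MINI_SPLITS[domain]:
        raise KeyError(f"{domain} has no {task} split in SArena_MINI")
    return METRICS[task]


all_splits = [(d, t) for d, tasks in SARENA_MINI_SPLITS.items() for t in tasks]
print(len(all_splits))  # 8
```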

Experimental results (InternVL3-8B)

All experiments below use the official SArena metrics.
We compare the original paper numbers (Ori), the official benchmark implementation (Use-Ori-Bench), and this PR’s VLMEvalKit implementation (Ours).

SArena-Icon

Understanding (accuracy)

Setting	O ↑	C ↑	G ↑	Q ↑	S ↑
Ori	59.5	79.1	59.3	38.2	61.3
Use-Ori-Bench	60.775	80.9	63.0	44.1	55.1
Ours	59.7	80.9	60.9	44.6	57.14
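O can be read either as accuracy over all questions (micro average) or as the mean of the four category accuracies (macro average). For the Ori and Use-Ori-Bench rows, O matches the mean of C, G, Q, and S up to rounding (for example, (80.9 + 63.0 + 44.1 + 55.1) / 4 = 60.775). For Ours it does not, so the rows may use different averaging. The sketch below computes both definitions and assumes each record carries a hypothetical `category` label and a boolean `correct` flag.

```python
from collections import defaultdict


def understanding_scores(records):
    """Per-category accuracy (in %) plus micro and macro overall scores.

    Each record is assumed to look like {"category": "C", "correct": True}.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        totals[rec["category"]] += 1
        hits[rec["category"]] += int(rec["correct"])

    per_cat = {c: 100.0 * hits[c] / totals[c] for c in totals}
    micro = 100.0 * sum(hits.values()) / sum(totals.values())  # over all questions
    macro = sum(per_cat.values()) / len(per_cat)  # mean of category accuracies
    return per_cat, micro, macro


records = (
    [{"category": "C", "correct": True}] * 3
    + [{"category": "Q", "correct": False}]
    + [{"category": "Q", "correct": True}]
)
print(*understanding_scores(records))  # {'C': 100.0, 'Q': 50.0} 80.0 75.0
```

The two averages diverge whenever categories have different numbers of questions, which is why it helps to state which one O reports.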

Editing (rendered-image metrics)

Setting	DINO ↑	SSIM ↑	LPIPS ↓	PSNR ↑
Ori	0.921	0.761	0.170	29.615
Use-Ori-Bench	0.862	0.637	0.219	25.550
Ours	0.902	0.702	0.196	24.790
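Editing and I2SVG are scored on rendered images, so the predicted SVG and the reference are compared after rasterization. PSNR is one example: 10·log10(MAX² / MSE) over pixel values. The sketch below assumes both renders are already same-sized 8-bit arrays (MAX = 255). This may differ from the exact preprocessing in the official SArena code.

```python
import numpy as np


def psnr(pred: np.ndarray, ref: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    if pred.shape != ref.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((pred.astype(np.float64) - ref.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


ref = np.zeros((4, 4, 3), dtype=np.uint8)
pred = ref.copy()
pred[0, 0, 0] = 255  # one channel of one pixel is completely wrong
print(round(psnr(pred, ref), 2))  # 16.81
```

Casting to float64 before subtracting avoids uint8 wrap-around in the difference.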

Text-to-SVG (T2SVG)

Setting	FID ↓	FID-C ↓	CLIP-T2I ↑	CLIP-I2I ↑
Ori	23.061	14.303	21.897	71.45
Use-Ori-Bench	120.82	23.700	21.68	72.21
Ours	124.84	25.070	21.09	70.64
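FID compares Gaussian fits of image features for generated and reference renders: ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^{1/2}). The sketch below computes this distance from two feature matrices using NumPy only. It obtains the trace of the matrix square root from the eigenvalues of Σ₁Σ₂. Feature extraction is out of scope, and the random features are stand-ins. Because the means and covariances are estimated from samples, FID is biased upward when the sample is small, so FID values on a small subset are not directly comparable to values from a larger evaluation set.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two (N, D) feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((cov_a @ cov_b)^(1/2)) is the sum of square roots of the eigenvalues
    # of cov_a @ cov_b, which are real and non-negative in exact arithmetic;
    # drop round-off imaginary parts and clip tiny negatives.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
fake = rng.normal(loc=0.5, size=(500, 8))
print(frechet_distance(real, real))  # close to 0
print(frechet_distance(real, fake))  # roughly 8 * 0.5**2 = 2.0, plus sampling noise
```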

Image-to-SVG (I2SVG)

Setting	DINO ↑	SSIM ↑	LPIPS ↓	PSNR ↑
Ori	0.812	0.557	0.361	7.22
Use-Ori-Bench	0.813	0.588	0.358	7.763
Ours	0.785	0.516	0.378	6.458

SArena-Illustration

Text-to-SVG (T2SVG)

Setting	FID ↓	FID-C ↓	CLIP-T2I ↑	CLIP-I2I ↑	Tokens
Ori	36.736	25.682	18.493	61.964	493
Use-Ori-Bench	154.02	44.850	17.79	60.9	464
Ours	161.98	49.190	16.89	59.96	1003
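CLIP-T2I and CLIP-I2I are presumably cosine similarities in CLIP embedding space. The first would compare the text prompt with the rendered SVG, and the second would compare the rendered SVG with the reference image. The scale in the tables (for example 21.897 and 71.45) is consistent with reporting 100 × cosine similarity. However, the exact CLIP checkpoint and any clipping at zero are assumptions here. The sketch below takes precomputed embeddings, and `clip_score` is a hypothetical helper. The reported number would be the mean over all samples.

```python
import numpy as np


def clip_score(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """100 * cosine similarity between paired rows of two (N, D) arrays.

    Negative similarities are clipped to 0, following the common
    CLIPScore-style convention (an assumption, not confirmed for SArena).
    """
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cos = np.sum(a * b, axis=1)
    return 100.0 * np.clip(cos, 0.0, None)


text_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
image_emb = np.array([[1.0, 1.0], [0.0, -2.0]])
scores = clip_score(text_emb, image_emb)
print(scores)  # about 70.71 for the first pair, 0 for the second
```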

Image-to-SVG (I2SVG)

Setting	DINO ↑	SSIM ↑	LPIPS ↓	PSNR ↑	Tokens
Ori	0.772	0.569	0.397	8.542	716
Use-Ori-Bench	0.774	0.546	0.385	8.27	492.4
Ours	0.719	0.309	0.409	5.07	1863

SArena-Chemistry

Text-to-SVG (T2SVG)

Setting	FID ↓	FID-C ↓	CLIP-I2I ↑	DINO ↑	SSIM ↑	LPIPS ↓	PSNR ↑
Ori	33.613	61.675	56.856	0.865	0.783	0.203	13.84
Use-Ori-Bench	114.63	65.220	57.51	0.871	0.807	0.193	14.49
Ours	137.13	76.880	53.13	0.791	0.791	0.256	9.317

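On most metrics, the results from this PR closely match the official benchmark implementation. Understanding accuracy stays within about two points of Use-Ori-Bench in every category, and the Icon and Illustration CLIP scores are also within about two points. In every domain, FID is far higher than the paper value for both Use-Ori-Bench and Ours, so most of that gap reflects the evaluation setting rather than this integration. The most noticeable differences between Ours and Use-Ori-Bench are in SArena-Illustration. There, this PR's run produces much longer outputs (1003 vs. 464 tokens for T2SVG, 1863 vs. 492.4 for I2SVG) and lower I2SVG SSIM and PSNR. Overall, this suggests that the SArena_MINI subset and its VLMEvalKit integration provide a reliable and efficient way to validate SVG-related tasks.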
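To quantify how far this PR's numbers are from the official implementation, the relative gap can be computed for each metric. The sketch below does this for the SArena-Icon editing row, using the Use-Ori-Bench and Ours values from the editing table. The helper `relative_gap` is illustrative, and this PR does not define an acceptable threshold.

```python
def relative_gap(reference: dict, ours: dict) -> dict:
    """Percentage difference of `ours` relative to `reference`, per metric."""
    return {k: 100.0 * (ours[k] - reference[k]) / reference[k] for k in reference}


# SArena-Icon editing row (Use-Ori-Bench vs. Ours).
use_ori_bench = {"DINO": 0.862, "SSIM": 0.637, "LPIPS": 0.219, "PSNR": 25.550}
ours = {"DINO": 0.902, "SSIM": 0.702, "LPIPS": 0.196, "PSNR": 24.790}

for metric, gap in relative_gap(use_ori_bench, ours).items():
    print(f"{metric}: {gap:+.1f}%")
```

Interpret the sign using each metric's direction. LPIPS is lower-is-better, so a negative LPIPS gap is an improvement, while a negative PSNR gap is not.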

References

InternSVG

Versions

torch: 2.6.0
flash_attn: 2.7.4.post1
transformers: 4.49.0

add support for SArena_MINI

85a83f0

mzr1996 approved these changes Dec 8, 2025


Merge branch 'main' into feature/add-support-SArena-mini

845b00d

mzr1996 merged commit 18ce87c into open-compass:main Dec 11, 2025
2 checks passed
