Official Repository: A Comprehensive Benchmark for Logical Reasoning in MLLMs
Updated Jun 17, 2025 - Python
Live Deep Research Bench. A challenging, objective benchmark for deep research tasks.
Benchmark for evaluating advanced reasoning, recursive dependency resolution, and robustness capabilities of large language models in dynamic, noisy, and structurally challenging environments.
Non-Western symbolic reasoning benchmark for LLMs. Multi-step rule-following inference within the Ba Zi (八字) formal system. Frozen lookup tables, Python reference implementation, mechanically verified gold CoT cases.
Public preview of Society of Thought, a Qwen adapter that reasons through visible multi-persona debate, with benchmark evidence, raw traces, and demo.