reasoning-benchmark

Here are 5 public repositories matching this topic...

InternScience / MME-Reasoning

Official Repository: A Comprehensive Benchmark for Logical Reasoning in MLLMs

logical-reasoning reasoning-benchmark vlm-benchmark

Updated Jun 17, 2025
Python

microsoft / livedrbench

Live Deep Research Bench. A challenging, objective benchmark for deep research tasks.

reasoning-agents deep-research reasoning-benchmark

Updated Oct 16, 2025
Python

vbepipe / vmrrb-benchmark

Benchmark for evaluating advanced reasoning, recursive dependency resolution, and robustness capabilities of large language models in dynamic, noisy, and structurally challenging environments.

benchmark dependency-resolution ai-accuracy multistep-reasoning ai-evaluation large-language-models ai-reasoning llm-evaluation reasoning-benchmark llm-benchmark ai-reliability recursive-reasoning ai-stability long-chain-reasoning

Updated May 15, 2026
Python

Notcokeaddictedanymore / Bazi-Bech

Non-Western symbolic reasoning benchmark for LLMs. Multi-step rule-following inference within the Ba Zi (八字) formal system. Frozen lookup tables, Python reference implementation, mechanically verified gold CoT cases.

python nlp ai evaluation chinese-nlp symbolic-reasoning ba-zi rule-based-system multi-step-reasoning llm chain-of-thought llm-evaluation reasoning-benchmark llm-benchmark

Updated May 13, 2026
Python

scasella / society-of-thought-bench

Public preview of Society of Thought, a Qwen adapter that reasons through visible multi-persona debate, with benchmark evidence, raw traces, and demo.

benchmark evaluation transformer debate language-model gradio reasoning llm qwen reasoning-benchmark multi-persona reasoning-trace

Updated Apr 1, 2026
Python
