Add SCT (Script Concordance Test) benchmark #5

liamgmccoy · 2026-01-03T04:58:09Z

Add SCT (Script Concordance Test) Benchmark

Summary

This PR adds a new benchmark for evaluating AI clinical reasoning using Script Concordance Tests (SCTs).

Examples are from the public SCT-Bench/sctpublic repository.

Paper: McCoy et al., NEJM AI 2025

What is SCT?

Script Concordance Testing is a validated assessment method that measures clinical reasoning by asking how a new piece of information changes the likelihood or appropriateness of a diagnostic or therapeutic hypothesis. Unlike multiple-choice questions, SCT captures the nuanced, probabilistic nature of clinical decision-making.

Changes

New benchmark: benchmarks/sct/
- prompt.md - Prompt template for SCT questions
- schema.json - JSON schema for response validation
- validator.py - API validation script
- inputs/ - 5 example test cases
- outputs/ - Reference outputs for examples
- README.md - Benchmark documentation

Response Format

Models respond with a JSON object containing a Rating on the -2 to +2 scale and a brief Rationale:

{
  "Rating": 1,
  "Rationale": "Brief clinical justification"
}
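As a rough illustration of how a reply in this format could be checked, here is a minimal sketch in Python. It is not the contents of validator.py or schema.json; the function name `parse_sct_response` and the assumption that ratings are whole numbers are hypothetical, and only the `Rating` and `Rationale` keys and the -2 to +2 range come from the description above.

```python
import json

# Assumption: ratings are the five whole numbers on the -2..+2 scale.
ALLOWED_RATINGS = {-2, -1, 0, 1, 2}


def parse_sct_response(raw: str) -> tuple[int, str]:
    """Parse a model reply and return (rating, rationale); raise ValueError if malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")

    rating = data.get("Rating")
    rationale = data.get("Rationale")

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(rating, bool) or not isinstance(rating, int):
        raise ValueError("Rating must be an integer")
    if rating not in ALLOWED_RATINGS:
        raise ValueError("Rating must be between -2 and +2")
    if not isinstance(rationale, str) or not rationale.strip():
        raise ValueError("Rationale must be a non-empty string")

    return rating, rationale


if __name__ == "__main__":
    reply = '{"Rating": 1, "Rationale": "Brief clinical justification"}'
    print(parse_sct_response(reply))
```

A stricter check could also reject unexpected extra keys; whether the real schema does so is not stated here.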

Example Cases

Five calibration examples are included, covering the full rating scale:

Example	Clinical Context	Expected
001	Otitis externa vs oral antibiotics	-2
002	Pediatric diarrhea + fever	-1
003	Pregnancy test + denial	0
004	Atopic dermatitis + fever/rash	+1
005	Trisomy 21 + petechiae	+2
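To show how the expected ratings in this table might be used, the sketch below compares a set of model ratings against them and reports the fraction of exact matches. Exact-match scoring is an assumption made for illustration only; this description does not say how the benchmark scores responses, and the names `EXPECTED` and `exact_match_rate` are hypothetical.

```python
# Expected ratings for the five calibration examples, copied from the table above.
EXPECTED = {"001": -2, "002": -1, "003": 0, "004": 1, "005": 2}


def exact_match_rate(predicted: dict[str, int]) -> float:
    """Return the fraction of examples whose predicted rating equals the expected one.

    Examples missing from `predicted` count as misses.
    """
    hits = sum(
        1
        for case_id, expected in EXPECTED.items()
        if predicted.get(case_id) == expected
    )
    return hits / len(EXPECTED)


if __name__ == "__main__":
    # Hypothetical model ratings: example 002 is off by one, the rest match.
    model_ratings = {"001": -2, "002": 0, "003": 0, "004": 1, "005": 2}
    print(f"exact-match rate: {exact_match_rate(model_ratings):.2f}")
```

Because the scale is ordinal, a distance-aware measure (for example mean absolute difference from the expected rating) may be more informative than exact match.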

Full Benchmark

The complete benchmark includes 750 validated questions from 10 international medical institutions across multiple specialties (internal medicine, emergency medicine, neurology, pediatrics, physiotherapy).

Testing

python benchmarks/sct/validator.py example_001

Adds a new benchmark for evaluating AI clinical reasoning using Script Concordance Tests from McCoy et al., NEJM AI 2025.

- prompt.md: SCT prompt template
- schema.json: JSON response validation schema
- validator.py: API validation script
- inputs/: 5 example test cases (from SCT-Bench/sctpublic)
- outputs/: Reference outputs for examples

Paper: https://ai.nejm.org/doi/full/10.1056/AIdbp2500120

vishnuravi merged commit 0c24004 into HealthRex:main Jan 21, 2026
