Add per eval score threshold support to Evalite #352

cantemizyurek · 2025-11-26T18:24:28Z

Per-eval `scoreThreshold` override

Adds the ability to set `scoreThreshold` per eval, overriding the global threshold when specified.

Usage

evalite("Eval that requires high precision", {
  data: [...],
  task: ...,
  scoreThreshold: 95, // Requires 95%
});

evalite("Baseline Eval", {
  data: [...],
  task: ...,
  scoreThreshold: 80, // Only requires 80%
});

Behavior

- A per-eval threshold overrides the global threshold when set.
- An eval without its own threshold falls back to the global threshold.
- If any eval fails its threshold, the exit code is 1.
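
The rules above can be sketched as a small resolution step. This is a minimal illustration, not Evalite's actual implementation: `EvalResult`, `resolveThreshold`, and `exitCodeFor` are hypothetical names, scores are assumed to be percentages from 0 to 100, and it is assumed that nothing is enforced when neither threshold is set and that a score equal to its threshold passes.

```typescript
// Hypothetical shape of one finished eval; not Evalite's internal type.
interface EvalResult {
  name: string;
  score: number; // average score as a percentage, 0-100 (assumption)
  scoreThreshold?: number; // per-eval override, if the eval set one
}

// The per-eval threshold wins when set; otherwise fall back to the global one.
// `??` only falls through on null/undefined, so an explicit 0 is respected.
function resolveThreshold(
  result: EvalResult,
  globalThreshold?: number,
): number | undefined {
  return result.scoreThreshold ?? globalThreshold;
}

// Returns 1 if any eval scores below its effective threshold, otherwise 0.
function exitCodeFor(results: EvalResult[], globalThreshold?: number): number {
  const anyFailed = results.some((r) => {
    const threshold = resolveThreshold(r, globalThreshold);
    return threshold !== undefined && r.score < threshold;
  });
  return anyFailed ? 1 : 0;
}
```

With a global threshold of 80, an eval scoring 90 with `scoreThreshold: 95` would fail under this sketch, while an eval scoring 85 with no override would pass.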

Watch mode output

When thresholds are not met, watch mode lists the evals that failed:

 FAIL  Score threshold not met. Watching for file changes...
       - High-Quality Eval: 75% < 95%
       - Baseline Eval: 60% < 80%
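
A report like the one above could be assembled roughly as follows. This is only a sketch under assumptions: `ThresholdFailure` and `formatThresholdFailures` are hypothetical names, the layout simply mirrors the sample output, and the real reporter may format or round scores differently.

```typescript
// Hypothetical record of one eval that missed its effective threshold.
interface ThresholdFailure {
  name: string;
  score: number; // percentage, 0-100 (assumption)
  threshold: number; // the per-eval or global threshold that applied
}

// Builds the header line followed by one "- name: score% < threshold%" line
// per failing eval. Scores are printed as given, without rounding.
function formatThresholdFailures(failures: ThresholdFailure[]): string {
  const header = " FAIL  Score threshold not met. Watching for file changes...";
  const lines = failures.map(
    (f) => `       - ${f.name}: ${f.score}% < ${f.threshold}%`,
  );
  return [header, ...lines].join("\n");
}
```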

Solves: #347

- Introduced per-eval score thresholds to enhance evaluation control. Solves: mattpocock#347

changeset-bot · 2025-11-26T18:24:32Z

🦋 Changeset detected

Latest commit: 73636d9

The changes in this PR will be included in the next version bump.

vercel · 2025-11-26T18:24:34Z

@cantemizyurek is attempting to deploy a commit to the Skill Recordings Team on Vercel.

A member of the Team first needs to authorize it.

mattpocock · 2025-12-03T12:38:19Z

Keeping this on ice for now, will put this in the post-v1 milestone.

feat: Add per eval score threshold support to Evalite

cdb4c17

- Introduced per-eval score thresholds to enhance evaluation control. Solves: mattpocock#347

cantemizyurek requested a review from mattpocock November 26, 2025 18:24

feat: Add changeset

73636d9

mattpocock added this to the Post-v1 milestone Dec 3, 2025
