feat!: version 13 - dataset evaluators #9642

mikeldking · 2025-09-25T20:54:32Z

This is the feature branch for the upcoming version 13.

pkg-pr-new · 2025-09-25T20:57:43Z

npm i https://pkg.pr.new/Arize-ai/phoenix/@arizeai/phoenix-client@9642

npm i https://pkg.pr.new/Arize-ai/phoenix/@arizeai/phoenix-mcp@9642

commit: 7a0c5f7

RogerHYang

blocking feature branch

* feat: Add name field to evaluators form
* Reorganize choices and their default state
* Disable prompt save, tools, response format
* Move input mapping field from select to combobox
* Update arrow icon
* Persist input mapping fields across labels
* Do not render response format or tools if they are saved as provider default

* Add dummy evaluation payloads to single playground run
* Implement for chat mutations and subscription over dataset
* Ruff 🐶 and update graphql schema
* compile relay
* frontend
* Add dataset example id and repetition number
* Address feedback
* Update input typing
* Update relay
* Load and display real global evaluators

---------

Co-authored-by: Alexander Song <[email protected]>
Co-authored-by: Tony Powell <[email protected]>

* add EvaluatorSelect to dataset page
* stub out evaluator config dialog and rework data fetching
* add readonly prompt messages to eval config modal
* add output config to modal
* add dataset example preview and input mapping section to modal
* wire up add evaluator mutation
* add suspense boundaries
* Refactor promptVersionToInstance to depend on inline fragment
* remove unnecessary type annotations:

---------

Co-authored-by: Tony Powell <[email protected]>

* evaluator crud
* clean
* patch mutation
* update
* types
* Revert "types" This reverts commit 25579b5.
* type ignore
* plural delete
* clean
* decorator
* fix metadata
* clean
* clean
* already exists
* test
* simplify
* test
* simplify

* evaluator prompt validation
* cursor tests
* clean
* condense
* test
* clean
* clean
* test
* parse pydantic errors
* clean
* validate mutations
* fix tests
* validate choices
* test with form
* test
* type check
* clean

* include only dataset-specific evaluators in playground eval selector
* fix dataset page tab selection
* add aria label to dialog
* add annotation names to playground select
* handle long annotation names
* separate components for DatasetEvaluatorSelect and PlaygroundEvaluatorSelect
* remove extra opacity css var
* updates to Menu
* updates to evaluator menus
* fix menu item flicker
* wip: enable mapping evaluator from playground
* formatting

mikeldking requested review from a team as code owners September 25, 2025 20:54

github-project-automation bot added this to phoenix Sep 25, 2025

github-project-automation bot moved this to 📘 Todo in phoenix Sep 25, 2025

dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Sep 25, 2025

mikeldking changed the base branch from main to feat/version-12 September 25, 2025 20:56

dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels Sep 25, 2025

mikeldking changed the title ~~version 13~~ feat!: version 13 - dataset evaluators Sep 25, 2025

mikeldking added the feature branch a feature branch that consolidates multiple features into a single commit on main label Sep 25, 2025

mikeldking marked this pull request as draft September 25, 2025 21:22

RogerHYang requested changes Sep 25, 2025

github-project-automation bot moved this from 📘 Todo to 🔍. Needs Review in phoenix Sep 25, 2025

axiomofjoy force-pushed the feat/version-12 branch from 001f109 to b65ea42 Compare September 29, 2025 17:18

Base automatically changed from feat/version-12 to main September 29, 2025 18:13

An error occurred while trying to automatically change base from feat/version-12 to main September 29, 2025 18:13

mikeldking force-pushed the version-13 branch from 3ed12e6 to 4ad9dc8 Compare October 2, 2025 08:01

RogerHYang force-pushed the version-13 branch from 4ad9dc8 to 4da307b Compare October 6, 2025 17:24

RogerHYang removed this from phoenix Oct 6, 2025

RogerHYang force-pushed the version-13 branch from 5afdccb to e0e6709 Compare October 8, 2025 06:19

RogerHYang force-pushed the version-13 branch 4 times, most recently from fc49ed1 to 2b19c56 Compare October 24, 2025 15:47

RogerHYang force-pushed the version-13 branch 3 times, most recently from f4ab1f0 to 2ef94fd Compare October 29, 2025 15:31

mikeldking closed this Nov 4, 2025

mikeldking reopened this Nov 4, 2025

RogerHYang and others added 23 commits November 8, 2025 13:29

fix: drop support for python 3.9 (#9818)

4ca4a46

feat(evaluators): db migration for evaluator tables (#9960)

bd60c88

Co-authored-by: Mikyo King <[email protected]> Co-authored-by: Xander Song <[email protected]>

feat(evaluators): mutations for playground evaluator selector (#10042)

29d8fdc

* feat(evaluators): mutations for playground evaluator selector
* add test
* use upsert update
* clean up
* clean up
* clean up

---------

Co-authored-by: Roger Yang <[email protected]>

feat: Create evaluator mutations with optional dataset_id (#10065)

6e15506

fix: ensure fields of polymorphic evaluator orm types are eagerly loaded (#10068)

41c11e8

* fix
* fix
* fix test

feat: Evaluators creation page (#10054)

080ca91

fix(evaluators): persist choices (#10076)

c00393c

* backend
* fix tests
* fix tests
* types
* types
* update frontend
* clean
* fix

feat: Collect all json path segments when flattening example keys (#10075)

8b31543

feat(evaluators): add evaluator select (#10063)

6323a3b

* add minimal evaluators menu
* handle selection
* styling
* add menu footer
* set menu max width
* replace footer button with link

feat: Add examples route with examples table (#10123)

f7307c1

* Add filter and sort capabilities to evaluators
* Improve clarity of allowed sort columns

feat: Add optional description field to new evaluator creation (#10132)

629e489

feat: Improve rendering of dataset evals on playground (#10136)

0b311c3

feat: add metadata to evaluator db table (#10139)

b8a666f

fix(evaluators): return annotation name in output config resolver (#10152)

db79532

* output config resolver
* clean

feat(evaluators): add annotation name to eval menu (#10156)

03e30b7

* add annotation name to eval select
* address feedback

feat: Add evaluators table to dataset evaluators page (#10157)

cbdc6ad

fix: Fix import error on evaluator page (#10185)

69b924d

feat(evaluators): load in a default template for the evaluator that is useful (#10187)

ddffc13

* feat(evaluators): provide a useful correctness pre-built evaluator
* feat(evaluators): provide a useful correctness pre-built evaluator
* simplify

fix: eslint errors

4e667b9

mikeldking force-pushed the version-13 branch from 941d3b4 to 4e667b9 Compare November 8, 2025 20:43

mikeldking and others added 6 commits November 8, 2025 13:52

ci: add ci for 12 (#10196)

8996ee1

feat: persist tools with eval (#10220)

f0b4c2d

feat: Refactor evaluator form for usage in create and edit workflows (#10253)

b672c62

only include dataset-specific evaluators in playground eval select (#10292)

b0edf44

7 participants
