feat!: version 13 - dataset evaluators #9642
Draft: mikeldking wants to merge 32 commits into main from version-13
RogerHYang (Contributor) requested changes on Sep 25, 2025 and left a comment:
blocking feature branch
001f109 to b65ea42
An error occurred while trying to automatically change base from feat/version-12 to main (September 29, 2025 18:13)
fc49ed1 to 2b19c56
f4ab1f0 to 2ef94fd
Co-authored-by: Mikyo King <[email protected]>
Co-authored-by: Xander Song <[email protected]>

* feat(evaluators): mutations for playground evaluator selector
* add test
* use upsert update
* clean up
* clean up
* clean up
---------
Co-authored-by: Roger Yang <[email protected]>

…ded (#10068)
* fix
* fix
* fix test

* backend
* fix tests
* fix tests
* types
* types
* update frontend
* clean
* fix

* add minimal evaluators menu
* handle selection
* styling
* add menu footer
* set menu max width
* replace footer button with link

* feat: Add name field to evaluators form
* Reorganize choices and their default state
* Disable prompt save, tools, response format
* Move input mapping field from select to combobox
* Update arrow icon
* Persist input mapping fields across labels
* Do not render response format or tools if they are saved as provider default

* Add dummy evaluation payloads to single playground run
* Implement for chat mutations and subscription over dataset
* Ruff 🐶 and update graphql schema
* compile relay
* frontend
* Add dataset example id and repetition number
* Address feedback
* Update input typing
* Update relay
* Load and display real global evaluators
---------
Co-authored-by: Alexander Song <[email protected]>
Co-authored-by: Tony Powell <[email protected]>

* Add filter and sort capabilities to evaluators
* Improve clarity of allowed sort columns

* add EvaluatorSelect to dataset page
* stub out evaluator config dialog and rework data fetching
* add readonly prompt messages to eval config modal
* add output config to modal
* add dataset example preview and input mapping section to modal
* wire up add evaluator mutation
* add suspense boundaries
* Refactor promptVersionToInstance to depend on inline fragment
* remove unnecessary type annotations:
---------
Co-authored-by: Tony Powell <[email protected]>

…0152)
* output config resolver
* clean

* evaluator crud
* clean
* patch mutation
* update
* types
* Revert "types" This reverts commit 25579b5.
* type ignore
* plural delete
* clean
* decorator
* fix metadata
* clean
* clean
* already exists
* test
* simplify
* test
* simplify

* add annotation name to eval select
* address feedback

…s useful (#10187)
* feat(evaluators): provide a useful correctness pre-built evaluator
* feat(evaluators): provide a useful correctness pre-built evaluator
* simplify
941d3b4 to 4e667b9
* evaluator prompt validation
* cursor tests
* clean
* condense
* test
* clean
* clean
* test
* parse pydantic errors
* clean
* validate mutations
* fix tests
* validate choices
* test with form
* test
* type check
* clean

* include only dataset-specific evaluators in playground eval selector
* fix dataset page tab selection
* add aria label to dialog
* add annotation names to playground select
* handle long annotation names
* separate components for DatasetEvaluatorSelect and PlaygroundEvaluatorSelect
* remove extra opacity css var
* updates to Menu
* updates to evaluator menus
* fix menu item flicker
* wip: enable mapping evaluator from playground
* formatting
Labels:
* feature branch: a feature branch that consolidates multiple features into a single commit on main
* size:XS: This PR changes 0-9 lines, ignoring generated files.
This is the feature branch for the upcoming version 13.