-
Notifications
You must be signed in to change notification settings - Fork 10
enforce single evaluator upload per command #387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Reject comma-separated --entry values in upload command - Enforce single test selection after discovery - Fix non-interactive mode to auto-select first test only - Simplify upload logic by removing loop (only 1 evaluator per run) This prevents duplicate evaluator creation when the same test is discovered multiple times or when using ep create rft.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| def _prompt_select(tests: list[DiscoveredTest], non_interactive: bool) -> list[DiscoveredTest]: | ||
| """Prompt user to select tests to upload.""" | ||
| if non_interactive: | ||
| return tests | ||
| # In non-interactive mode, auto-select only the first test | ||
| if len(tests) > 1: | ||
| print(f"Note: {len(tests)} tests discovered. Auto-selecting first test in non-interactive mode:") | ||
| print(f" {_format_test_choice(tests[0], 1)}") | ||
| return [tests[0]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-interactive selection masks ambiguous test choice
In _prompt_select non-interactive mode now always returns [tests[0]], so _discover_and_select_tests no longer surfaces when multiple evaluation tests are discovered. create_rft._resolve_evaluator (eval_protocol/cli_commands/create_rft.py:319-341) depends on len(selected_tests) != 1 to force disambiguation; with this change ep create rft --yes in a repo containing multiple tests will silently choose whichever test _discover_tests returns first and proceed to create datasets/jobs for it instead of erroring, risking creating resources for the wrong evaluator.
Useful? React with 👍 / 👎.
| if len(tests) > 1: | ||
| print(f"Note: {len(tests)} tests discovered. Auto-selecting first test in non-interactive mode:") | ||
| print(f" {_format_test_choice(tests[0], 1)}") | ||
| return [tests[0]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty list access causes crash in non-interactive mode
The _prompt_select function now accesses tests[0] without checking if the list is empty. Previously, return tests would safely handle an empty list by returning [], but return [tests[0]] raises an IndexError if tests is empty. Although the current caller _discover_and_select_tests validates that tests is non-empty before calling this function, this change makes _prompt_select less robust and could cause crashes if a future refactor or new caller passes an empty list.
This prevents duplicate evaluator creation when the same test
is discovered multiple times or when using ep create rft.
Note
Behavior change: single evaluator per upload
--entry; only onemodule::functionorpath::functionis allowedqualname/source_file_pathImplementation updates
entry_pointand performs onecreate_evaluationcallWritten by Cursor Bugbot for commit 56fe56a. This will update automatically on new commits. Configure here.