enforce single evaluator upload per command #387

benjibc · 2025-12-24T08:11:12Z

- Reject comma-separated --entry values in upload command
- Enforce single test selection after discovery
- Fix non-interactive mode to auto-select first test only
- Simplify upload logic by removing loop (only 1 evaluator per run)

This prevents duplicate evaluator creation when the same test
is discovered multiple times or when using ep create rft.
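A minimal sketch of the --entry check described above, assuming the value arrives as a plain string. The helper name validate_single_entry is hypothetical, and the actual validation in the upload command may be structured differently.

```python
def validate_single_entry(entry: str) -> str:
    """Reject comma-separated --entry values and return the single entry.

    Hypothetical helper for illustration; not the upload command's real code.
    """
    entry = entry.strip()
    if "," in entry:
        # Only one module::function or path::function is allowed per upload.
        raise ValueError(
            "--entry accepts a single module::function or path::function; "
            f"got a comma-separated value: {entry!r}"
        )
    return entry


print(validate_single_entry("tests/test_math.py::test_add"))
```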

Note

Behavior change: single evaluator per upload

- Rejects comma-separated --entry; only one module::function or path::function is allowed
- After discovery, requires exactly one test; in non-interactive mode auto-selects the first and warns when multiple are found
- Removes per-test loop; upload now runs once per command with a single qualname/source_file_path
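The selection rule summarized above can be sketched as follows. DiscoveredTest here is a stand-in dataclass whose fields (qualname, source_file_path) are taken from the summary; select_single_test is a hypothetical name, and the interactive branch is stubbed out because the real prompt is not shown in this PR.

```python
from dataclasses import dataclass


@dataclass
class DiscoveredTest:
    # Stand-in for the real eval_protocol type; field names follow the PR summary.
    qualname: str
    source_file_path: str


def select_single_test(tests: list[DiscoveredTest], non_interactive: bool) -> DiscoveredTest:
    """Return exactly one test, following the rule described in the summary."""
    if not tests:
        raise ValueError("No evaluation tests discovered")
    if len(tests) == 1:
        return tests[0]
    if non_interactive:
        # Auto-select the first test and warn that the others were skipped.
        print(f"Note: {len(tests)} tests discovered. Auto-selecting the first: {tests[0].qualname}")
        return tests[0]
    # Hypothetical stub: the real command prompts the user to pick one test here.
    raise ValueError("Multiple tests discovered; select exactly one")
```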

Implementation updates

- Simplifies evaluator ID generation and normalization for the single selected test
- Computes a single pytest entry_point and performs one create_evaluation call
- Updates success/error messaging and return codes to reflect single-run execution
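A rough sketch of the single-run preparation step: one entry point in pytest's path::function form and one evaluator ID, computed once with no loop. The ID scheme (file stem plus function name, lowercased, runs of non-alphanumeric characters collapsed to hyphens) and both helper names are assumptions for illustration, not the normalization this PR implements. The create_evaluation call itself is omitted because its signature is not shown here.

```python
import re
from pathlib import Path


def normalize_evaluator_id(raw: str) -> str:
    # Hypothetical normalization: lowercase, collapse non-alphanumeric runs to "-".
    return re.sub(r"[^a-z0-9]+", "-", raw.lower()).strip("-")


def prepare_single_upload(source_file_path: str, function_name: str) -> tuple[str, str]:
    """Compute one (evaluator_id, entry_point) pair for the selected test."""
    # pytest node IDs take the form path::function.
    entry_point = f"{source_file_path}::{function_name}"
    evaluator_id = normalize_evaluator_id(f"{Path(source_file_path).stem}-{function_name}")
    return evaluator_id, entry_point


print(prepare_single_upload("tests/test_math.py", "test_add"))
```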

Written by Cursor Bugbot for commit 56fe56a. This will update automatically on new commits.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

chatgpt-codex-connector · 2025-12-24T08:14:59Z

eval_protocol/cli_commands/utils.py

 def _prompt_select(tests: list[DiscoveredTest], non_interactive: bool) -> list[DiscoveredTest]:
     """Prompt user to select tests to upload."""
     if non_interactive:
-        return tests
+        # In non-interactive mode, auto-select only the first test
+        if len(tests) > 1:
+            print(f"Note: {len(tests)} tests discovered. Auto-selecting first test in non-interactive mode:")
+            print(f"  {_format_test_choice(tests[0], 1)}")
+        return [tests[0]]


Non-interactive selection masks ambiguous test choice

In _prompt_select, non-interactive mode now always returns [tests[0]], so _discover_and_select_tests no longer surfaces the case where multiple evaluation tests are discovered. create_rft._resolve_evaluator (eval_protocol/cli_commands/create_rft.py:319-341) relies on len(selected_tests) != 1 to force disambiguation; with this change, ep create rft --yes in a repo containing multiple tests will silently choose whichever test _discover_tests returns first and proceed to create datasets/jobs for it instead of erroring, risking resources being created for the wrong evaluator.
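One way to address this, sketched under assumptions: keep non-interactive mode strict and fail when more than one test is discovered, so callers such as create_rft cannot silently proceed with the wrong evaluator. DiscoveredTest is a stand-in type, and select_non_interactive_strict and AmbiguousSelectionError are hypothetical names. An alternative that preserves create_rft's existing len(selected_tests) != 1 check is to keep returning the full list in non-interactive mode and let each caller decide.

```python
from dataclasses import dataclass


@dataclass
class DiscoveredTest:
    # Stand-in type for this sketch; the real class lives in eval_protocol.
    qualname: str
    source_file_path: str


class AmbiguousSelectionError(ValueError):
    """Raised when non-interactive mode cannot pick a test unambiguously."""


def select_non_interactive_strict(tests: list[DiscoveredTest]) -> list[DiscoveredTest]:
    if len(tests) > 1:
        names = ", ".join(t.qualname for t in tests)
        # Surface the ambiguity instead of silently picking tests[0].
        raise AmbiguousSelectionError(
            f"{len(tests)} tests discovered ({names}); specify which one to use"
        )
    # Zero or one test: return a copy unchanged (an empty list stays empty).
    return list(tests)
```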

cursor · 2025-12-24T08:16:59Z

eval_protocol/cli_commands/utils.py

+        if len(tests) > 1:
+            print(f"Note: {len(tests)} tests discovered. Auto-selecting first test in non-interactive mode:")
+            print(f"  {_format_test_choice(tests[0], 1)}")
+        return [tests[0]]


Empty list access causes crash in non-interactive mode

The _prompt_select function now accesses tests[0] without checking if the list is empty. Previously, return tests would safely handle an empty list by returning [], but return [tests[0]] raises an IndexError if tests is empty. Although the current caller _discover_and_select_tests validates that tests is non-empty before calling this function, this change makes _prompt_select less robust and could cause crashes if a future refactor or new caller passes an empty list.
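A minimal guard for the empty-list case, assuming the non-interactive branch otherwise keeps the auto-select behavior from this PR. first_or_empty is a hypothetical helper, written generically so it does not depend on the real DiscoveredTest type.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def first_or_empty(tests: Sequence[T]) -> list[T]:
    """Non-interactive auto-selection that tolerates an empty input."""
    if not tests:
        # Matches the old `return tests` behavior for empty input instead of raising IndexError.
        return []
    if len(tests) > 1:
        print(f"Note: {len(tests)} tests discovered. Auto-selecting first test in non-interactive mode.")
    return [tests[0]]
```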

benjibc requested a review from xzrderek December 24, 2025 08:11

chatgpt-codex-connector bot reviewed Dec 24, 2025

cursor bot reviewed Dec 24, 2025
