Skip to content

Conversation

@benjibc
Copy link
Contributor

@benjibc benjibc commented Dec 24, 2025

  • Reject comma-separated --entry values in upload command
  • Enforce single test selection after discovery
  • Fix non-interactive mode to auto-select first test only
  • Simplify upload logic by removing loop (only 1 evaluator per run)

This prevents duplicate evaluator creation when the same test
is discovered multiple times or when using ep create rft.


Note

Behavior change: single evaluator per upload

  • Rejects comma-separated --entry; only one module::function or path::function is allowed
  • After discovery, requires exactly one test; in non-interactive mode auto-selects the first and warns when multiple are found
  • Removes per-test loop; upload now runs once per command with a single qualname/source_file_path

Implementation updates

  • Simplifies evaluator ID generation and normalization for the single selected test
  • Computes a single pytest entry_point and performs one create_evaluation call
  • Updates success/error messaging and return codes to reflect single-run execution

Written by Cursor Bugbot for commit 56fe56a. This will update automatically on new commits. Configure here.

- Reject comma-separated --entry values in upload command
- Enforce single test selection after discovery
- Fix non-interactive mode to auto-select first test only
- Simplify upload logic by removing loop (only 1 evaluator per run)

This prevents duplicate evaluator creation when the same test
is discovered multiple times or when using ep create rft.
@benjibc benjibc requested a review from xzrderek December 24, 2025 08:11
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 334 to +341
def _prompt_select(tests: list[DiscoveredTest], non_interactive: bool) -> list[DiscoveredTest]:
"""Prompt user to select tests to upload."""
if non_interactive:
return tests
# In non-interactive mode, auto-select only the first test
if len(tests) > 1:
print(f"Note: {len(tests)} tests discovered. Auto-selecting first test in non-interactive mode:")
print(f" {_format_test_choice(tests[0], 1)}")
return [tests[0]]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Non-interactive selection masks ambiguous test choice

In _prompt_select non-interactive mode now always returns [tests[0]], so _discover_and_select_tests no longer surfaces when multiple evaluation tests are discovered. create_rft._resolve_evaluator (eval_protocol/cli_commands/create_rft.py:319-341) depends on len(selected_tests) != 1 to force disambiguation; with this change ep create rft --yes in a repo containing multiple tests will silently choose whichever test _discover_tests returns first and proceed to create datasets/jobs for it instead of erroring, risking creating resources for the wrong evaluator.

Useful? React with 👍 / 👎.

if len(tests) > 1:
print(f"Note: {len(tests)} tests discovered. Auto-selecting first test in non-interactive mode:")
print(f" {_format_test_choice(tests[0], 1)}")
return [tests[0]]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty list access causes crash in non-interactive mode

The _prompt_select function now accesses tests[0] without checking if the list is empty. Previously, return tests would safely handle an empty list by returning [], but return [tests[0]] raises an IndexError if tests is empty. Although the current caller _discover_and_select_tests validates that tests is non-empty before calling this function, this change makes _prompt_select less robust and could cause crashes if a future refactor or new caller passes an empty list.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants