Implement Evaluator Versions #402

dphuang2 · 2026-01-08T00:25:29Z

Note

Introduces evaluator versioning and a unified Fireworks SDK client across the codebase.

- evaluation.py: after evaluators.create, creates evaluator_versions, uploads code via evaluator_versions.get_upload_endpoint, then validate_upload; handles 409 by fetching the existing evaluator and proceeding to versioning
- New eval_protocol/fireworks_client.py: central factory to instantiate Fireworks with env-resolved api_key, account_id, base_url, and optional FIREWORKS_EXTRA_HEADERS
- Refactor CLI and helpers (cli.py, cli_commands/create_rft.py, platform_api.py, cli_commands/upload.py) to use create_fireworks_client; remove --force and overwrite logic; ensure evaluator existence via GET + poll
- auth.py: load .env.dev/.env on import; tweak verifyApiKey base selection (use api.fireworks.ai or fall back to dev)
- Tests: update to mock create_fireworks_client and the new evaluator-version flow; add tests/test_fireworks_client.py; remove force-flow test
- Deps: point fireworks-ai to a specific wheel URL in pyproject.toml and uv.lock
- Repo hygiene: ignore .vscode/launch.json, add .vscode/launch.json.example

Written by Cursor Bugbot for commit f103b69. This will update automatically on new commits.

- Introduced a new `fireworks_client.py` module to centralize Fireworks SDK client creation.
- Updated CLI and evaluation modules to use the new `create_fireworks_client` function instead of direct instantiation of the Fireworks class.
- Enhanced handling of API key, account ID, base URL, and extra headers through environment variables.
- Added tests for the new Fireworks client factory to ensure proper functionality and configuration.
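A minimal sketch of the environment-resolution step such a factory might perform, not the actual contents of `fireworks_client.py`. The helper name `resolve_client_config`, the variable names `FIREWORKS_API_KEY`, `FIREWORKS_ACCOUNT_ID`, and `FIREWORKS_API_BASE`, the `default_headers` key, and the JSON encoding of `FIREWORKS_EXTRA_HEADERS` are all assumptions made for illustration; only `FIREWORKS_EXTRA_HEADERS` itself is named in this PR.

```python
import json
import os
from typing import Dict, Mapping, Optional


def resolve_client_config(
    env: Mapping[str, str] = os.environ,
    api_key: Optional[str] = None,
    account_id: Optional[str] = None,
    base_url: Optional[str] = None,
) -> Dict[str, object]:
    """Hypothetical helper: explicit arguments win, then environment variables."""
    config: Dict[str, object] = {
        "api_key": api_key or env.get("FIREWORKS_API_KEY"),
        "account_id": account_id or env.get("FIREWORKS_ACCOUNT_ID"),
        "base_url": base_url or env.get("FIREWORKS_API_BASE"),
    }
    raw_headers = env.get("FIREWORKS_EXTRA_HEADERS")
    if raw_headers:
        # Assumption: the variable holds a JSON object of header name -> value.
        parsed = json.loads(raw_headers)
        if not isinstance(parsed, dict):
            raise ValueError("FIREWORKS_EXTRA_HEADERS must be a JSON object")
        config["default_headers"] = {str(k): str(v) for k, v in parsed.items()}
    return config
```

Passing `env` as a parameter keeps the resolution logic testable without patching the process environment, which is one reason to route every call site through a single factory.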

- Added functionality to load environment variables from .env.dev or .env as a fallback when the auth module is imported.
- Updated the API key verification process to allow explicit base URL handling, defaulting to dev.api.fireworks.ai if not provided.
- Removed redundant environment variable loading code from platform_api module.

- Introduced functionality to create evaluator versions using parameters such as commit hash, entry point, and requirements.
- Updated the upload endpoint call to utilize the newly created evaluator version ID instead of a hardcoded test version ID.
- Added error handling for missing evaluator version ID in the response to ensure robustness during code uploads.
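The ordering described above (create the evaluator, create a version, request an upload endpoint, validate the upload) can be sketched with a stub in place of the real SDK. `StubClient`, `publish_version`, the method names, and the `version_id` response key are hypothetical and chosen only to show the call order and the missing-ID guard; they are not the SDK's signatures.

```python
from typing import Any, Dict, List, Optional


class StubClient:
    """Hypothetical stand-in for the SDK client; it only records call order."""

    def __init__(self, version_id: Optional[str] = "v1") -> None:
        self.calls: List[str] = []
        self._version_id = version_id

    def create_evaluator(self, evaluator_id: str) -> Dict[str, Any]:
        self.calls.append("create_evaluator")
        return {"id": evaluator_id}

    def create_version(self, evaluator_id: str, params: Dict[str, Any]) -> Dict[str, Any]:
        self.calls.append("create_version")
        # An empty response models a server reply without a version ID.
        return {"version_id": self._version_id} if self._version_id else {}

    def get_upload_endpoint(self, evaluator_id: str, version_id: str) -> str:
        self.calls.append("get_upload_endpoint")
        return f"https://upload.invalid/{evaluator_id}/{version_id}"

    def validate_upload(self, evaluator_id: str, version_id: str) -> None:
        self.calls.append("validate_upload")


def publish_version(client: StubClient, evaluator_id: str, params: Dict[str, Any]) -> str:
    client.create_evaluator(evaluator_id)
    version = client.create_version(evaluator_id, params)
    version_id = version.get("version_id")
    if not version_id:
        # Fail before requesting an upload URL for a version that does not exist.
        raise ValueError(f"no evaluator version ID returned for {evaluator_id!r}")
    client.get_upload_endpoint(evaluator_id, version_id)
    # The real flow would upload the code archive to the returned URL here.
    client.validate_upload(evaluator_id, version_id)
    return version_id
```

Checking the ID immediately after version creation means a malformed response stops the flow before any upload is attempted.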

eval_protocol/cli.py

update to latest once SDK is published with changes

- Implemented a try-except block to handle APIStatusError during evaluator creation.
- Added logic to check for existing evaluators and retrieve the existing one if a conflict occurs (status code 409).
- Enhanced logging for better traceability of evaluator creation process.
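A minimal sketch of the create-or-fetch pattern this commit describes. `StatusError` is a hypothetical stand-in for the SDK's `APIStatusError`, and `create_or_fetch` with its two callables is illustrative; the sketch assumes only that the raised error exposes an HTTP status code.

```python
from typing import Any, Callable, Dict


class StatusError(Exception):
    """Hypothetical stand-in for an SDK error that carries an HTTP status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def create_or_fetch(
    create: Callable[[str], Dict[str, Any]],
    fetch: Callable[[str], Dict[str, Any]],
    evaluator_id: str,
) -> Dict[str, Any]:
    try:
        return create(evaluator_id)
    except StatusError as exc:
        if exc.status_code != 409:
            raise  # only a conflict means the evaluator already exists
        return fetch(evaluator_id)
```

Re-raising every status other than 409 keeps authentication or server failures from being mistaken for an existing evaluator.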

cursor · 2026-01-09T23:52:21Z

eval_protocol/cli_commands/create_rft.py

+                print(f"📊 Please check the evaluator status at: {dashboard_url}")
+                print("   Wait for it to become ACTIVE, then run 'eval-protocol create rft' again.")
+                return False
+            return True


RFT flow skips version creation for existing evaluators

High Severity

The _upload_and_ensure_evaluator function unconditionally skips the upload process when an evaluator already exists (lines 580-596). Previously, this check was guarded by if not force:, allowing users to bypass it with --force. Now that --force was removed, the short-circuit always triggers, preventing new evaluator versions from being created in the RFT workflow. The upload_command properly handles existing evaluators by catching 409 errors and creating new versions, but this short-circuit bypasses that logic entirely. Users running eval-protocol create rft twice cannot deploy updated evaluator code because the function returns early without calling upload_command.

🔬 Verification Test

Why verification test was not possible: This bug requires integration with the Fireworks API to fully verify. The issue is a logic flow problem where the GET request to check evaluator existence returns successfully, causing an early return that bypasses the upload_command call. The bug can be confirmed by code inspection: when resp.ok is True on line 580, the function returns True on line 596 without ever reaching the upload_command call on line 627. This breaks the evaluator versioning feature for the RFT workflow.
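The control-flow problem can be reproduced in isolation without the Fireworks API. The sketch below is a simplified model, not the code in `create_rft.py`: `ensure_with_short_circuit`, `ensure_always_upload`, and the `exists`/`upload` callables are hypothetical names, and the second function is only one possible shape for a fix, assuming the upload step tolerates an existing evaluator as the report says `upload_command` does.

```python
from typing import Callable


def ensure_with_short_circuit(exists: Callable[[], bool], upload: Callable[[], None]) -> bool:
    """Models the reported flow: an existing evaluator ends the function early."""
    if exists():
        return True  # upload() is never reached, so no new version is created
    upload()
    return True


def ensure_always_upload(upload: Callable[[], None]) -> bool:
    """One possible alternative: always upload and let that step handle the 409 case."""
    upload()
    return True
```

Counting calls to `upload` in each variant shows the difference: with an existing evaluator the first never uploads, while the second uploads on every run.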

cursor · 2026-01-10T00:02:50Z

eval_protocol/evaluation.py

+            if "entry_point" in evaluator_params:
+                evaluator_version_param["entry_point"] = evaluator_params["entry_point"]
+            if "requirements" in evaluator_params:
+                evaluator_version_param["requirements"] = evaluator_params["requirements"]


Requirements never copied to evaluator version params

Low Severity

The check if "requirements" in evaluator_params at line 234 will always be False because requirements is never added to the evaluator_params dictionary. Looking at lines 184-192, only display_name, description, commit_hash, and entry_point are conditionally added to evaluator_params. This means the evaluator_version_param["requirements"] assignment will never execute, and requirements data won't be included in the evaluator version. Either the code to populate evaluator_params["requirements"] is missing, or these lines are dead code that should be removed.

🔬 Verification Test

Test code:

# Examining the code flow in eval_protocol/evaluation.py
# Lines 184-192 show what's added to evaluator_params:
evaluator_params = {
    "display_name": self.display_name,
    "description": self.description,
}
# commit_hash added if version_str exists
# entry_point added if self.entry_point exists
# NO "requirements" is ever added

# Lines 234-235 check for requirements:
# if "requirements" in evaluator_params:  # This is always False
#     evaluator_version_param["requirements"] = evaluator_params["requirements"]

Command run:

grep -n "evaluator_params\[" eval_protocol/evaluation.py

Output:

189: evaluator_params["commit_hash"] = version_str
191: evaluator_params["entry_point"] = self.entry_point
231: evaluator_version_param["commit_hash"] = evaluator_params["commit_hash"]
233: evaluator_version_param["entry_point"] = evaluator_params["entry_point"]
235: evaluator_version_param["requirements"] = evaluator_params["requirements"]

Why this proves the bug: The grep output shows that only commit_hash and entry_point are ever assigned to evaluator_params (lines 189 and 191), while requirements is never assigned. Thus the condition on line 234 if "requirements" in evaluator_params will always be False, making lines 234-235 dead code.
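The dead branch can also be shown with a runnable model of the two dictionaries. The function names and the separate `requirements` argument are hypothetical; the last step illustrates only the first of the two options the report offers (populate the value rather than delete the check) and is not taken from the PR.

```python
from typing import Any, Dict, Optional


def build_evaluator_params(
    display_name: str,
    description: str,
    version_str: Optional[str] = None,
    entry_point: Optional[str] = None,
) -> Dict[str, Any]:
    params: Dict[str, Any] = {"display_name": display_name, "description": description}
    if version_str:
        params["commit_hash"] = version_str
    if entry_point:
        params["entry_point"] = entry_point
    return params  # "requirements" is never set here, as in the reviewed code


def build_version_params(
    evaluator_params: Dict[str, Any],
    requirements: Optional[str] = None,
) -> Dict[str, Any]:
    version_params: Dict[str, Any] = {}
    for key in ("commit_hash", "entry_point", "requirements"):
        if key in evaluator_params:  # never true for "requirements"
            version_params[key] = evaluator_params[key]
    # Hypothetical fix: take requirements from their own source instead.
    if requirements is not None:
        version_params["requirements"] = requirements
    return version_params
```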

dphuang2 added 5 commits January 7, 2026 13:07

remove launch.json

d465a89

Add .vscode/launch.json to .gitignore

348bb58

cursor bot reviewed Jan 8, 2026


dphuang2 added 6 commits January 8, 2026 15:20

test

3dbcd59

REVERT this later

532e071

update to latest once SDK is published with changes

Merge branch 'main' into dhuang/dxe-478-implement-evaluator-versions

5e7a5fa

fix mock tests

060d72c

Support EP_REMOTE_API_KEY

ea08062

cursor bot reviewed Jan 9, 2026


Merge branch 'main' into dhuang/dxe-478-implement-evaluator-versions

f246087

cursor bot reviewed Jan 10, 2026


dphuang2 added 5 commits January 12, 2026 10:28

include launch.json.backup

6b53ac1

rename to .example and add docker run extra arg

ec0c8ca

use ignore-docker by default

fc036f5

delete backup

4566584

ignore-docker by default in dev

f103b69
