Skip to content

Conversation

@dphuang2
Copy link
Collaborator

@dphuang2 dphuang2 commented Jan 8, 2026

Note

Introduces evaluator versioning and a unified Fireworks SDK client across the codebase.

  • evaluation.py: After evaluators.create, creates evaluator_versions, uploads code via evaluator_versions.get_upload_endpoint, then validate_upload; handles 409 by fetching existing evaluator and proceeding to versioning
  • New eval_protocol/fireworks_client.py: central factory to instantiate Fireworks with env-resolved api_key, account_id, base_url, and optional FIREWORKS_EXTRA_HEADERS
  • Refactor CLI and helpers (cli.py, cli_commands/create_rft.py, platform_api.py, cli_commands/upload.py) to use create_fireworks_client; remove --force and overwrite logic; ensure evaluator existence via GET + poll
  • auth.py: load .env.dev/.env on import; tweak verifyApiKey base selection (use api.fireworks.ai or fallback to dev)
  • Tests: update to mock create_fireworks_client and new evaluator-version flow; add tests/test_fireworks_client.py; remove force-flow test
  • Deps: point fireworks-ai to a specific wheel URL in pyproject.toml and uv.lock
  • Repo hygiene: ignore .vscode/launch.json, add .vscode/launch.json.example

Written by Cursor Bugbot for commit f103b69. This will update automatically on new commits. Configure here.

- Introduced a new `fireworks_client.py` module to centralize Fireworks SDK client creation.
- Updated CLI and evaluation modules to use the new `create_fireworks_client` function instead of direct instantiation of the Fireworks class.
- Enhanced handling of API key, account ID, base URL, and extra headers through environment variables.
- Added tests for the new Fireworks client factory to ensure proper functionality and configuration.
- Added functionality to load environment variables from .env.dev or .env as a fallback when the auth module is imported.
- Updated the API key verification process to allow explicit base URL handling, defaulting to dev.api.fireworks.ai if not provided.
- Removed redundant environment variable loading code from platform_api module.
- Introduced functionality to create evaluator versions using parameters such as commit hash, entry point, and requirements.
- Updated the upload endpoint call to utilize the newly created evaluator version ID instead of a hardcoded test version ID.
- Added error handling for missing evaluator version ID in the response to ensure robustness during code uploads.
update to latest once SDK is published with changes
- Implemented a try-except block to handle APIStatusError during evaluator creation.
- Added logic to check for existing evaluators and retrieve the existing one if a conflict occurs (status code 409).
- Enhanced logging for better traceability of evaluator creation process.
print(f"📊 Please check the evaluator status at: {dashboard_url}")
print(" Wait for it to become ACTIVE, then run 'eval-protocol create rft' again.")
return False
return True
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFT flow skips version creation for existing evaluators

High Severity

The _upload_and_ensure_evaluator function unconditionally skips the upload process when an evaluator already exists (lines 580-596). Previously, this check was guarded by if not force:, allowing users to bypass it with --force. Now that --force was removed, the short-circuit always triggers, preventing new evaluator versions from being created in the RFT workflow. The upload_command properly handles existing evaluators by catching 409 errors and creating new versions, but this short-circuit bypasses that logic entirely. Users running eval-protocol create rft twice cannot deploy updated evaluator code because the function returns early without calling upload_command.

🔬 Verification Test

Why verification test was not possible: This bug requires integration with the Fireworks API to fully verify. The issue is a logic flow problem where the GET request to check evaluator existence returns successfully, causing an early return that bypasses the upload_command call. The bug can be confirmed by code inspection: when resp.ok is True on line 580, the function returns True on line 596 without ever reaching the upload_command call on line 627. This breaks the evaluator versioning feature for the RFT workflow.

Fix in Cursor Fix in Web

if "entry_point" in evaluator_params:
evaluator_version_param["entry_point"] = evaluator_params["entry_point"]
if "requirements" in evaluator_params:
evaluator_version_param["requirements"] = evaluator_params["requirements"]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requirements never copied to evaluator version params

Low Severity

The check if "requirements" in evaluator_params at line 234 will always be False because requirements is never added to the evaluator_params dictionary. Looking at lines 184-192, only display_name, description, commit_hash, and entry_point are conditionally added to evaluator_params. This means the evaluator_version_param["requirements"] assignment will never execute, and requirements data won't be included in the evaluator version. Either the code to populate evaluator_params["requirements"] is missing, or these lines are dead code that should be removed.

🔬 Verification Test

Test code:

# Examining the code flow in eval_protocol/evaluation.py

# Lines 184-192 show what's added to evaluator_params:
evaluator_params = {
    "display_name": self.display_name,
    "description": self.description,
}
# commit_hash added if version_str exists
# entry_point added if self.entry_point exists
# NO "requirements" is ever added

# Lines 234-235 check for requirements:
# if "requirements" in evaluator_params:  # This is always False
#     evaluator_version_param["requirements"] = evaluator_params["requirements"]

Command run:

grep -n "evaluator_params\[" eval_protocol/evaluation.py

Output:

189:            evaluator_params["commit_hash"] = version_str
191:            evaluator_params["entry_point"] = self.entry_point
231:                evaluator_version_param["commit_hash"] = evaluator_params["commit_hash"]
233:                evaluator_version_param["entry_point"] = evaluator_params["entry_point"]
235:                evaluator_version_param["requirements"] = evaluator_params["requirements"]

Why this proves the bug: The grep output shows that only commit_hash and entry_point are ever assigned to evaluator_params (lines 189 and 191), while requirements is never assigned. Thus the condition on line 234 if "requirements" in evaluator_params will always be False, making lines 234-235 dead code.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants