Feat/stop on modes conflict#267

Open
rapidandres wants to merge 3 commits into main from feat/stop-on-modes-conflict

Conversation

rapidandres (Collaborator) commented Jun 2, 2026


Note

Medium Risk
Changes entry points for training and evals to fail fast on mode mismatch; low security impact but users with cross-mode notebooks will see new hard errors until they re-init.

Overview
Adds installed-mode validation so run_fit() and run_evals() stop before starting work when RapidFire was initialized for the other mode (as recorded in $RF_HOME/rf_mode.txt). Previously, a mismatch left experiments running while the dispatcher/IC Ops pointed at the wrong DB and the control panel stayed disabled; failures now surface via display_pretty_error() with re-init/restart instructions.

Introduces rapidfireai/utils/mode_utils.py (get_installed_mode, assert_mode_matches) and wires doctor diagnostics to the same reader instead of duplicating file logic. Adds pytest coverage in tests/test_mode_utils.py plus a small standalone test_mode_guard.py script.

Remaining diff in experiment.py / doctor.py is mostly import ordering and formatting, not behavior.
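
For readers without the diff open, the two helpers named in the overview might look roughly like the sketch below. This is not the PR's code: the signatures, the `rf_home` and `init_cmd` parameters, and the `ModeMismatchError` class are assumptions made for illustration, and the sketch raises an exception where the PR reports through display_pretty_error(). Only the function names, the `rf_mode.txt` location, and the error wording come from this page.

```python
import os
from pathlib import Path
from typing import Optional

VALID_MODES = ("fit", "evals")


class ModeMismatchError(RuntimeError):
    """Hypothetical error type; the PR surfaces the message via display_pretty_error()."""


def get_installed_mode(rf_home: Optional[str] = None) -> Optional[str]:
    """Return the mode recorded in <RF_HOME>/rf_mode.txt, or None if unavailable."""
    home = rf_home or os.environ.get("RF_HOME")
    if not home:
        return None
    try:
        mode = (Path(home) / "rf_mode.txt").read_text(encoding="utf-8").strip().lower()
    except OSError:
        # Missing or unreadable file: no mode is known.
        return None
    return mode if mode in VALID_MODES else None


def assert_mode_matches(required: str, init_cmd: str, rf_home: Optional[str] = None) -> None:
    """Raise if the installed mode differs from the mode the caller needs."""
    if get_installed_mode(rf_home) != required:
        raise ModeMismatchError(
            f"run_{required}() requires '{required}' mode. "
            f"Initialize RapidFire with `{init_cmd}`, then restart services "
            f"(`rapidfireai stop && rapidfireai start`)."
        )
```

Keeping the file read in one function is what lets the doctor diagnostics and the run guards share a single reader.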

Reviewed by Cursor Bugbot for commit 1b6fccc. Bugbot is set up for automated code reviews on this repo.

cursor (Bot) left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit 6c16eeb.

f"run_{required}() requires '{required}' mode. "
f"Initialize RapidFire with `{init_cmd}`, then restart services "
f"(`rapidfireai stop && rapidfireai start`)."
)


Missing mode file blocks fit

Medium Severity

When rf_mode.txt is absent, get_installed_mode() returns None and run_fit() is stopped, but setup/start.sh still starts the fit dispatcher by default. Fit notebooks can be blocked even though services are already in fit mode.
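
One possible way to address this, sketched under the assumption that an absent rf_mode.txt should be treated as fit mode because setup/start.sh starts the fit dispatcher by default. The names `DEFAULT_MODE`, `effective_mode`, and `is_mode_conflict` are hypothetical and do not appear in the PR.

```python
from typing import Optional

# Assumption: mirrors setup/start.sh, which starts the fit dispatcher by default.
DEFAULT_MODE = "fit"


def effective_mode(installed: Optional[str]) -> str:
    """Fall back to the default mode when no mode file was found."""
    return installed if installed is not None else DEFAULT_MODE


def is_mode_conflict(required: str, installed: Optional[str]) -> bool:
    """True when the requested run mode differs from the effective installed mode."""
    return effective_mode(installed) != required
```

With this fallback, a fit notebook is no longer blocked when the file is missing, while an evals notebook still is.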


Comment thread rapidfireai/experiment.py
- print(f"Using {cfg['num_actors']} actors, {cfg['gpus_per_actor']} GPUs per actor, {cfg['cpus_per_actor']} CPUs per actor")
+ print(
+     f"Using {cfg['num_actors']} actors, {cfg['gpus_per_actor']} GPUs per actor, {cfg['cpus_per_actor']} CPUs per actor"
+ )


Mode check runs too late

Medium Severity

Installed-mode validation runs only inside run_fit() and run_evals(), after Experiment.__init__ has already run _init_fit_mode() or _init_evals_mode() (including ray.init). A mode mismatch is detected only after heavy setup, not when the conflict is already knowable.
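
The ordering this finding asks for can be illustrated with a simplified stand-in, shown below. The `Experiment` class here is not the real one: `read_installed_mode`, `ModeConflictError`, `_init_heavy`, and `setup_calls` are hypothetical names, and `_init_heavy` only stands in for _init_fit_mode() / _init_evals_mode() (including ray.init). The point is that the cheap check runs first in the constructor, so a mismatch fails before any expensive setup.

```python
from typing import Callable, List, Optional

# Records which modes reached heavy setup, so the ordering is observable.
setup_calls: List[str] = []


class ModeConflictError(RuntimeError):
    """Hypothetical error type used only in this sketch."""


class Experiment:
    """Simplified stand-in for the real Experiment class."""

    def __init__(self, mode: str, read_installed_mode: Callable[[], Optional[str]]):
        # Cheap check first: reading the mode is all that is needed to detect a conflict.
        installed = read_installed_mode()
        if installed != mode:
            raise ModeConflictError(
                f"Experiment requested '{mode}' mode but installed mode is {installed!r}."
            )
        self.mode = mode
        self._init_heavy()

    def _init_heavy(self) -> None:
        # Stand-in for _init_fit_mode() / _init_evals_mode(), including ray.init.
        setup_calls.append(self.mode)
```

Injecting the reader as a callable keeps the sketch testable without touching the filesystem; the real class would presumably call the shared reader directly.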


2 participants