
fix(backend): resolve MLX and Torch .safetensors checkpoint collision #254

Open: eaglstun wants to merge 1 commit into nikopueringer:main from eaglstun:fix/mlx-torch-checkpoint-collision


eaglstun commented Jun 6, 2026


The MLX setup in the README places the MLX weights at CorridorKeyModule/checkpoints/corridorkey_mlx.safetensors, next to the auto-downloaded Torch checkpoint (CorridorKey_v1.0.safetensors). Checkpoint discovery distinguished checkpoints by screen colour (the "blue" filename token) but not by backend, and both backends share the .safetensors extension. With two "green" .safetensors present, both the Torch and the MLX path raised "Multiple ... checkpoints. Keep exactly one." — so following the documented MLX setup broke inference on both backends.
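The collision described above can be reproduced with a minimal sketch. This is not the project's code: `discover_by_colour_only`, `AmbiguousCheckpointError`, and the error wording are hypothetical stand-ins, and the value of `BLUE_FILENAME_TOKEN` is assumed from the description. Only the two filenames come from the PR text.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Name taken from the PR description; the value "blue" is an assumption.
BLUE_FILENAME_TOKEN = "blue"


class AmbiguousCheckpointError(RuntimeError):
    """Hypothetical error type standing in for the project's real one."""


def discover_by_colour_only(checkpoint_dir: Path, screen: str) -> Path:
    """Hypothetical pre-fix discovery: filters by screen colour, never by backend."""
    want_blue = screen == "blue"
    matches = sorted(
        p
        for p in checkpoint_dir.glob("*.safetensors")
        if (BLUE_FILENAME_TOKEN in p.name.lower()) == want_blue
    )
    if not matches:
        raise FileNotFoundError(f"no {screen} checkpoint in {checkpoint_dir}")
    if len(matches) > 1:
        # Stand-in wording; the real message is elided in the description.
        raise AmbiguousCheckpointError(
            f"{len(matches)} candidate checkpoints for screen={screen!r}"
        )
    return matches[0]


def reproduce_collision() -> str:
    """Create the documented layout in a temp dir and return the resulting error text."""
    with tempfile.TemporaryDirectory() as tmp:
        ckpt_dir = Path(tmp)
        (ckpt_dir / "CorridorKey_v1.0.safetensors").touch()  # Torch weights
        (ckpt_dir / "corridorkey_mlx.safetensors").touch()  # MLX weights
        try:
            discover_by_colour_only(ckpt_dir, "green")
        except AmbiguousCheckpointError as exc:
            return str(exc)
    return "no error"
```

Neither filename contains the blue token, so both count as "green" and the colour-only filter cannot tell them apart, regardless of which backend asked.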

Make discovery backend-aware via an "mlx" filename token (mirroring the existing BLUE_FILENAME_TOKEN convention and the MLX_MODEL_FILENAME constant):

  • Torch discovery excludes MLX-named .safetensors files.
  • MLX discovery prefers an MLX-named file when present, but still falls back to any lone .safetensors so the previous contract is preserved.
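The two rules above might look roughly like the following sketch. It assumes the candidate list has already been filtered by screen colour; `MLX_FILENAME_TOKEN`, `is_mlx_named`, and `pick_checkpoint` are hypothetical names chosen for illustration, not identifiers confirmed by the PR.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Assumed constant name; the PR only says an "mlx" filename token is used.
MLX_FILENAME_TOKEN = "mlx"


def is_mlx_named(path: Path) -> bool:
    """True when the filename carries the MLX token (case-insensitive)."""
    return MLX_FILENAME_TOKEN in path.name.lower()


def pick_checkpoint(candidates: list[Path], backend: str) -> Path:
    """Hypothetical backend-aware selection over colour-filtered candidates."""
    if backend == "torch":
        # Torch never considers MLX-named files.
        pool = [p for p in candidates if not is_mlx_named(p)]
    elif backend == "mlx":
        # MLX prefers MLX-named files, else falls back to whatever is there.
        pool = [p for p in candidates if is_mlx_named(p)] or list(candidates)
    else:
        raise ValueError(f"unknown backend: {backend!r}")
    if len(pool) != 1:
        raise RuntimeError(
            f"expected exactly one {backend} checkpoint, found {len(pool)}"
        )
    return pool[0]


def demo() -> tuple[str, str, str]:
    """Exercise the coexistence case and the lone-file fallback."""
    with tempfile.TemporaryDirectory() as tmp:
        ckpt_dir = Path(tmp)
        torch_ckpt = ckpt_dir / "CorridorKey_v1.0.safetensors"
        mlx_ckpt = ckpt_dir / "corridorkey_mlx.safetensors"
        torch_ckpt.touch()
        mlx_ckpt.touch()
        both = sorted(ckpt_dir.glob("*.safetensors"))
        return (
            pick_checkpoint(both, "torch").name,
            pick_checkpoint(both, "mlx").name,
            pick_checkpoint([torch_ckpt], "mlx").name,  # lone-file fallback
        )
```

With both files present, each backend resolves to its own checkpoint. With only a single non-MLX-named file, the MLX path still accepts it, which is the fallback the second bullet describes.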

Adds a regression test covering the coexistence case; it fails on the prior code with the exact "Multiple ... checkpoints" error and passes with the fix.


Checklist

  • uv run pytest passes
  • uv run ruff check passes
  • uv run ruff format --check passes


Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>