Adds Quark as Biome inference engine backend #132

Merged
philpax merged 47 commits into main from feat/quark-engine-macos-v2 on May 13, 2026

Conversation

Clydingus (Collaborator) commented May 7, 2026

Fixes #125. Wires up quark as a backend dependency, with a selectable UI option.

Clydingus and others added 8 commits May 6, 2026 23:31
…ctor

Re-applies the Apple Silicon backend swap (originally
``e7a2850`` on ``feat/quark-engine-macos``) against the refactored
``server-components/`` layout. The refactor went CUDA-only at the
device-helpers layer (``engine/devices.py``) and dropped the
``Platform`` / ``available_quants`` / ``MLX`` machinery; this commit
re-introduces just enough platform multiplexing to keep the existing
CUDA path untouched while routing Apple Silicon through ``quark.Engine``.

Three files change:

* ``server-components/engine/devices.py`` — device helpers were
  pure CUDA. Detect Apple Silicon (``sys.platform == "darwin"`` +
  ``arm64``); on that platform:
    - ``WORLD_ENGINE_DEVICE`` / ``SCENE_AUTHORING_DEVICE`` /
      ``SAFETY_DEVICE`` route to ``"cpu"`` so every existing
      ``frame.to(device=WORLD_ENGINE_DEVICE)`` call site stays a
      no-op without per-call branching. Quark owns its own Metal
      allocator and consumes torch tensors / numpy at the API
      boundary, so torch tensors stay on CPU until they cross into
      ``quark.Engine``.
    - ``OutOfMemoryError`` aliases to plain ``MemoryError``
      (``torch.cuda.OutOfMemoryError`` doesn't exist when CUDA
      isn't built into the active wheel; the ``except
      devices.OutOfMemoryError`` blocks just never trigger on this
      path).
    - ``pynvml`` import is soft-guarded — Apple Silicon ships no
      NVML and importing the package would raise. Every NVML
      caller (``open_nvml_handle`` / ``driver_version_via_nvml`` /
      ``utilization_via_nvml``) short-circuits when
      ``pynvml is None``. The existing NVML-call try/except blocks
      already returned sentinels on failure, so the broader
      contract is unchanged.

  The other helpers (``is_available`` / ``memory_allocated`` /
  ``synchronize`` / ``empty_cache`` / ``reset_compiled_graphs``)
  already short-circuit gracefully via ``torch.cuda.is_available()``;
  they remain CUDA-only and just no-op on Apple Silicon.
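
  A minimal sketch of the guard described above (an illustration, not
  the literal ``devices.py`` diff; the CUDA branch keeps the values it
  had before the refactor):

      import platform
      import sys

      import torch

      _IS_DARWIN_ARM64 = sys.platform == "darwin" and platform.machine() == "arm64"

      if _IS_DARWIN_ARM64:
          # torch tensors stay on CPU; quark owns the Metal allocator.
          WORLD_ENGINE_DEVICE = SCENE_AUTHORING_DEVICE = SAFETY_DEVICE = "cpu"
          # torch.cuda.OutOfMemoryError may be absent from a CUDA-less wheel.
          OutOfMemoryError = MemoryError
          try:
              import pynvml  # no NVML on Apple Silicon; callers check for None
          except Exception:
              pynvml = None
      else:
          WORLD_ENGINE_DEVICE = SCENE_AUTHORING_DEVICE = SAFETY_DEVICE = "cuda"
          OutOfMemoryError = torch.cuda.OutOfMemoryError
          import pynvml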

* ``server-components/engine/manager.py`` — module-level conditional
  import of the engine class:

      if _IS_DARWIN_ARM64:
          from quark import CtrlInput, Engine as WorldEngine
      else:
          from world_engine import CtrlInput, WorldEngine

  Aliasing ``quark.Engine`` to the local ``WorldEngine`` symbol means
  every existing type annotation / construction site reads as before.
  In ``load_engine``, branch on ``_IS_DARWIN_ARM64`` to skip the
  dtype-fallback loop (quark is bf16-only on Metal — no native fp8 in
  MSL, no int8 KV path today) and skip the OOM-retry loop (no CUDA
  allocator pressure). The Apple branch passes
  ``quant="bf16"`` and ignores the client's ``requested_quant``.

  TAEHV cache plumbing: read ``BIOME_TAEHV_CACHE_DIR`` from the env
  and pass it as ``taehv_cache_dir=`` to ``quark.Engine(...)``. The
  Electron host sets this env var when spawning the server so that
  pre-built CoreML artifacts pulled from HF land inside Biome's app
  data dir — uninstall / "clear cache" flows can ``rm -rf`` it
  without leaving artifacts in ``~/.cache/quark/taehv/``. Unset
  falls through to quark's default (``~/.cache/quark/taehv``).
  Electron-side wiring is a follow-up; the server-side hook is
  ready.
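
  A rough sketch of the server-side hook (``build_quark_kwargs`` is a
  hypothetical helper; only the env var, ``quant="bf16"``, and the
  ``taehv_cache_dir`` kwarg come from this commit):

      import os

      def build_quark_kwargs(model_uri: str) -> dict:
          # bf16 only on Metal; the client's requested_quant is ignored here.
          kwargs = {"model_uri": model_uri, "quant": "bf16"}
          # Electron sets BIOME_TAEHV_CACHE_DIR when spawning the server so
          # CoreML artifacts land in Biome's app data dir; unset falls through
          # to quark's default ~/.cache/quark/taehv.
          cache_dir = os.environ.get("BIOME_TAEHV_CACHE_DIR")
          if cache_dir:
              kwargs["taehv_cache_dir"] = cache_dir
          return kwargs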

* ``server-components/pyproject.toml`` — drop unconditional pins on
  the CUDA-specific deps (``nvidia-ml-py``, ``bitsandbytes``,
  ``llama-cpp-python``, ``gguf``); mark them ``sys_platform !=
  'darwin'`` so ``uv sync`` on Apple Silicon doesn't pull them. Add
  ``"quark[engine] ; sys_platform == 'darwin'"`` to the dependency
  list and a ``[tool.uv.sources]`` pin for the
  ``experimental/biome-base`` integration branch on quark
  (rev ``c2a87ba`` — carries the ``taehv_cache_dir`` kwarg + the
  HF-fetched CoreML artifact pipeline that's needed for app-managed
  cache storage). ``torch`` / ``torchvision`` already came from the
  ``pytorch-cu128`` index — that index is now marked ``sys_platform
  != 'darwin'`` so Apple resolves torch from PyPI's default index
  (no CUDA build available there for darwin/arm64 anyway).

  Add ``sys_platform == 'darwin' and platform_machine == 'arm64'``
  to ``[tool.uv].environments`` so the lockfile resolves under that
  marker too.

CUDA path: identical to ``the-great-server-refactor`` HEAD. None of
the platform-conditional code in this commit fires when
``sys.platform != "darwin"``; every CUDA call site reaches the same
``WorldEngine`` import, the same ``WORLD_ENGINE_DEVICE = "cuda"``,
the same dtype-fallback loop, and the same OOM-retry path it had
before.

Apple Silicon path: still requires the upstream
``Overworld-Models/taehv1_5-coreml`` HF repo to be populated (the
publish flow lives at ``scripts/publish_taehv_coreml.py`` in
quark). Until that repo exists, the runtime fall-back error message
points to the publish script.

Verified: ``ast.parse`` clean on both modified Python files. Full
runtime verification needs ``uv sync`` on a Mac with the quark hash
``c2a87ba`` reachable from origin — pushing
``Overworldai/quark experimental/biome-base`` is the prereq.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ARK toggle

Generalises the Apple-Silicon-only quark integration so the same
``quark.Engine`` path runs on CUDA hosts too. quark's
``Engine.__new__`` factory dispatches to ``EngineCUDA`` (Linux /
Windows / CUDA Macs) or ``EngineMetal`` (Apple Silicon), with both
subclasses accepting the same ``model_uri / quant / device / dtype /
taehv_cache_dir`` kwargs.

A top-level ``USE_QUARK`` constant in ``manager.py`` selects between
``quark.Engine`` and the legacy ``world_engine.WorldEngine``; hard-
coded to ``True`` for now, slated to become a runtime setting once
the quark CUDA path stabilises. Imports are aliased so the rest of
the file stays backend-agnostic. ``load_engine`` collapses the two
prior branches (Apple bf16-only quark vs. CUDA dtype-fallback
world_engine) into one unified body that shares the dtype/OOM
fallback loop and the per-failure cleanup; ``taehv_cache_dir`` (a
quark-only kwarg) is gated on ``USE_QUARK``.
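
A compressed, hypothetical sketch of that unified body (the candidate
list, helper names, and control flow are illustrative; only
``USE_QUARK``, the aliased import, ``devices.OutOfMemoryError``,
``devices.empty_cache``, and the ``taehv_cache_dir`` gating come from
the text above):

    from engine import devices  # illustrative import path

    USE_QUARK = True  # hard-coded for now; runtime setting later

    if USE_QUARK:
        from quark import CtrlInput, Engine as WorldEngine
    else:
        from world_engine import CtrlInput, WorldEngine

    def load_engine(model_uri: str, quant_candidates: list[str],
                    taehv_cache_dir: str | None = None):
        last_exc = None
        for quant in quant_candidates:            # shared dtype/OOM fallback loop
            kwargs = {"model_uri": model_uri, "quant": quant}
            if USE_QUARK and taehv_cache_dir:
                kwargs["taehv_cache_dir"] = taehv_cache_dir  # quark-only kwarg
            try:
                return WorldEngine(**kwargs)
            except devices.OutOfMemoryError as exc:
                last_exc = exc
                devices.empty_cache()             # per-failure cleanup; no-op off CUDA
        raise RuntimeError("no quant candidate could be loaded") from last_exc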

``pyproject.toml`` drops the ``sys_platform == 'darwin'`` marker on
``quark[engine]`` so the package installs everywhere.

Brings in PR #128's StreamingContext hook split + src/ reorganisation
so the quark engine integration on this branch lands on top of the new
context structure. Pure-frontend merge (all PR #128 changes are under
src/); no overlap with the server-components/ Python touched by this
branch.

…_interval - gen ms from input latency

When ``cap_inference_fps=True``, the per-iteration order was

    sleep(~frame_interval − gen)  →  read input  →  submit  →  flush_pending  →  wait  →  stash

so a frame stashed at the end of iter K-1 sat in memory through iter
K's full pacing sleep before being encoded and sent. The sleep window
was pure idle CPU time that could have been spent encoding.

Now the loop also flushes at the top, before the sleep — gated on
``cap_inference_fps`` so uncapped (benchmark) mode keeps its existing
encode-during-gen overlap. Pending is empty by the time the
post-submit flush runs, so the second call is a deliberate no-op.

End-to-end: shaves ~(frame_interval − gen_time) ms off the
client-observed ``inputLatency`` for the pacing path. On wp1.5 360p
@ 60fps that's ~37 ms (66.67 ms interval − 30 ms gen). Server
compute (INFER / SYNC / ENC / MTRC / OVER in the perf overlay) is
unchanged; the win shows up as a drop in XMIT.
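
A schematic of the reordered iteration, written as a self-contained
helper (the callables stand in for the real loop's internals, which
this commit message doesn't name):

    import time

    def stream_loop(running, read_input, submit, flush_pending,
                    wait_for_generation, stash_frame,
                    frame_interval: float, cap_inference_fps: bool) -> None:
        gen_time = 0.0
        while running():
            if cap_inference_fps:
                flush_pending()   # top-of-loop flush: encode during what was idle sleep
                time.sleep(max(0.0, frame_interval - gen_time))
            submit(read_input())
            flush_pending()       # pending already drained when capped: deliberate no-op
            gen_time = wait_for_generation()
            stash_frame()
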
Clydingus requested a review from philpax on May 7, 2026 at 18:48
Clydingus and others added 21 commits May 8, 2026 02:50

CONTRIBUTING.md asks for pyproject + uv.lock changes to land
together; the prior commit (`6fe8f36 update(pyproj): update
quark ref`) updated the pin without the regenerated lockfile,
which would trip the `uv lock --check` step in CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Settings whose changes need a restart used to live in three drifting
hand-written lists (useEngineRespawn for process-class, useSessionInit
for session/live, plus EngineTab.hasChangesRequiringRestart for the
modal trigger). Adding engine_backend to the session list was missed,
so toggling it mid-session saved without a reset and dropped back to
the pause menu.

SETTING_CLASSES in types/settings.ts is now the single source of
truth, with helpers in utils/settingsClassifier consuming it.
Top-level keys cover whole subtrees; dot-paths split when siblings
differ (debug_overlays.action_logging is live; siblings are UI-only).

Toggling a process-class setting mid-stream raced: useEngineRespawn
called stopServer() fire-and-forget then transitioned to LOADING
immediately. The warm flow read stale isServerRunning=true from React
state, picked attachToRunningStandalone, and threw "Server exited
before becoming ready" while the doomed server finished dying.

Awaiting stopServer lets the IPC "server exited" event propagate to
isServerRunning=false before LOADING fires, so the warm flow correctly
picks bootStandalone and starts a fresh server with the new env vars.

In offline mode uv can't reach the index to verify the lockfile and
fails with "No solution found when resolving dependencies" — even
though the venv is already correctly populated from the previous
online sync.

Setting UV_NO_SYNC=1 (which also implies --frozen) inside getOfflineEnv
skips both the resolve and sync passes on uv run, so the server just
exec's python in the existing venv. Online mode is untouched: uv run
still auto-syncs when pyproject changes.

The connectionLost flag was only cleared in MAIN_MENU, so once set it
stuck through LOADING and back into STREAMING. Hit visibly when a
process-class respawn (useEngineRespawn) fires its disconnect from
outside the lifecycle reducer — the brief failed-connection render
inside STREAMING latched connectionLost true, and entering LOADING
didn't clear it.

Clearing on LOADING entry is symmetric with the existing
clearEngineErrorOnLoadingEntry: any LOADING transition is "starting
fresh", so prior overlays should reset. Also fixes the click-reconnect
recovery path, which previously kept the overlay up until the user
backed out to main menu.

(Follow-up: the overlay still flashes for a frame during a respawn
because the disconnect lands in a STREAMING render before LOADING
transitions in. To suppress it entirely, the reducer needs to know the
respawn is intentional — separate change.)

Toggling a process-class setting mid-stream briefly flashed the
"Connection Lost" overlay. The session-class reconnect path already
suppresses the overlay via `intentionalReconnectInProgress`, but
process-class respawns are driven by useEngineRespawn (outside the
reducer) and never set that flag.

Rather than adding a parallel `intentionalRespawnInProgress` boolean
plus parallel `currentProcessSig` / `lastAppliedProcess` payload
fields plus parallel detection branches, collapse all of it through a
single discriminator:

- New `RestartSignatures = { session, process }` bundle in the
  classifier; `getRestartSignatures(settings)` builds both at once.
- `useSessionInit` returns `lastApplied: RestartSignatures | null`
  (one piece of state instead of two).
- Payload carries `currentSignatures` + `lastAppliedSignatures`.
- Reducer state replaces `intentionalReconnectInProgress` with
  `intentionalRestart: 'reconnect' | 'respawn' | null`. A new
  `computeIntentionalRestart()` picks the strongest applicable
  intent (process beats session) and returns null when nothing's
  pending.
- Side-effect dispatch keys off the discriminator: `'reconnect'`
  fires the existing effect chain, `'respawn'` is silent because
  useEngineRespawn owns those side effects. Suppression checks
  become `intentionalRestart !== null`.

Two corrections to the Pydantic→TS codegen so the on-disk file
round-trips through both `codegen --check` and `prettier --check`
on every platform:

- `Path.write_text` defaults to translating `\n` → `os.linesep` on
  Windows, so the codegen was silently writing CRLF on dev machines
  while prettier expected LF; the resulting on-disk file failed
  prettier even though the in-memory comparison passed (read_text
  translates CRLF back). Pinning `newline="\n"` writes LF
  unconditionally.

- `render_enum` always emitted multi-line `z.enum([...])`, but
  prettier collapses short enums onto a single line under the
  project's 120-char `printWidth`. Mirror that behaviour so the
  codegen-emitted shape matches the formatted shape — short enums
  inline, long ones (e.g. `ServerStageId`) break across lines with
  two-space indent. The `_PRINT_WIDTH` constant is the gate.

Surfaced while a new short enum (`EngineBackendSchema`) landed in
the generated output and tripped both gates at once on Windows.
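
A minimal sketch of the newline pin (the surrounding codegen names are
illustrative; only the `newline="\n"` argument is the fix described
above):

    from pathlib import Path

    def write_generated(path: Path, content: str) -> None:
        # newline="\n" disables the default universal-newlines translation,
        # so the emitted file is LF even on Windows and round-trips through
        # both `codegen --check` and `prettier --check`.
        path.write_text(content, encoding="utf-8", newline="\n")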

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…os-v2

# Conflicts:
#	electron/ipc/server.ts
#	server-components/server/routes.py
#	src/components/settings/EngineTab.tsx
#	src/context/streaming/StreamingContext.tsx
#	src/context/streaming/streamingWarmConnection.ts
#	src/hooks/engine/useEngineApi.ts
#	src/hooks/streaming/useWarmConnection.ts
#	src/i18n/en.ts
#	src/i18n/goose.ts
#	src/i18n/he.ts
#	src/i18n/ja.ts
#	src/i18n/zh.ts
#	src/types/ipc.ts

…p ~frame_interval - gen ms from input latency"

This reverts commit 55c71ad.

…n set

Quark only supports waypoint-1.5 (NotImplementedError on wp-1 configs:
no TAEHV VAE, different scheduler shape). The picker now hides those
rows when backend=quark, and the settings panel refuses to leave on
save if the saved model is incompatible with the in-flight backend.

Server: /api/models accepts ?backend=… and drops rows whose model_type
falls outside COMPATIBLE_MODEL_TYPES_BY_BACKEND. Each PickerModel
carries its resolved model_type so unresolvable rows (offline / HF
outage / malformed config) pass through and stay backend-agnostic
rather than silently emptying the picker. _scan_cache now returns the
cached model_type per repo; uncached collection entries get a
TTL-cached HF config.yaml fetch (~1KB) via _resolve_model_type.
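
A hypothetical sketch of the server-side filter (the mapping values
and row shape are illustrative; only COMPATIBLE_MODEL_TYPES_BY_BACKEND
and the pass-through for unresolved model_type come from this commit):

    COMPATIBLE_MODEL_TYPES_BY_BACKEND = {
        "quark": {"waypoint-1.5"},                       # wp-1 raises NotImplementedError in quark
        "world_engine": {"waypoint-1", "waypoint-1.5"},
    }

    def filter_models(rows: list[dict], backend: str | None) -> list[dict]:
        allowed = COMPATIBLE_MODEL_TYPES_BY_BACKEND.get(backend or "")
        if allowed is None:
            return rows  # no ?backend= (or an unknown one): leave the list untouched
        # Rows whose model_type couldn't be resolved (offline, HF outage,
        # malformed config) pass through rather than emptying the picker.
        return [r for r in rows if r.get("model_type") is None or r["model_type"] in allowed]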

Renderer: list-models IPC takes backend, threads it into the query
string. EngineTab's loader refires on backend toggle and tracks
menuWorldModelAvailable; validateBeforeSave shows a new "Incompatible
Model" confirm modal and blocks the save when the saved model fell
off the filtered list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

philpax (Contributor) commented May 12, 2026

Quick test seems to work. Will look over the code more thoroughly and run it through the gauntlet of tests, but I think we'll be shipping this tomorrow 🎉

philpax added 13 commits May 13, 2026 18:17

HF ids with reserved characters (`#`, `?`, `&`, whitespace) typed via
the custom model picker would corrupt the URL. Split on `/`, encode each
segment, rejoin — preserves the slash the FastAPI `{model_id:path}`
matcher relies on.
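
A sketch of the per-segment encoding, shown in Python for illustration
(the actual change lives in the renderer's picker code):

    from urllib.parse import quote

    def encode_model_id(model_id: str) -> str:
        # Encode each segment so reserved characters (#, ?, &, whitespace)
        # survive, while keeping the literal "/" that the FastAPI
        # `{model_id:path}` matcher relies on.
        return "/".join(quote(segment, safe="") for segment in model_id.split("/"))

    # encode_model_id("org/my model?v2") == "org/my%20model%3Fv2"
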
With Quark in play, "which backend was active" is part of the
diagnostic surface — a wp-1 + quark or intw8a8 + quark mismatch is the
likeliest failure mode for early users. Thread `requested_backend`
through the diagnostics payload alongside `requested_model` and
`requested_quant` at all three call sites (connection-lost overlay,
terminal-display loading error overlay, debug-tab copy-to-clipboard).

`useEngineRespawn` hand-checked `engine_mode`/`server_url`/`offline_mode`
to skip a respawn when only `offline_mode` flipped in server mode. A
future process-class field added to `SETTING_CLASSES` would be silently
swallowed by that bespoke check. Pull the logic up: a new `pathsThatDiffer`
helper returns the changed paths in a class, and the hook drops to
"the only delta is `offline_mode` in non-standalone" against that set,
so new process-class fields fall into the respawn branch by default.

`OutOfMemoryError = MemoryError` swept up unrelated CPU-side memory
faults on macOS, which would falsely trigger the dtype-downgrade retry
in `WorldEngineManager.load_engine`. Replace with a private
`_UnreachableOOM(BaseException)` sentinel — `except devices.OutOfMemoryError`
still type-checks and resolves at runtime, but never matches anything
on Apple Silicon (where there's no CUDA allocator pressure to surface
in the first place).
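
A minimal sketch of the sentinel (the guard variable and the CUDA-side
assignment are surrounding context, not part of this commit's diff):

    import platform
    import sys

    import torch

    _IS_DARWIN_ARM64 = sys.platform == "darwin" and platform.machine() == "arm64"

    class _UnreachableOOM(BaseException):
        """Never raised. Keeps `except devices.OutOfMemoryError` valid on Apple
        Silicon without also catching ordinary CPU-side MemoryError."""

    OutOfMemoryError = _UnreachableOOM if _IS_DARWIN_ARM64 else torch.cuda.OutOfMemoryError
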
The fallback function it references doesn't exist — the real fallback
is just empty dropdowns until a probe lands. Tighten the comment to
describe what the code actually does.

`customLabel` was optional, so a caller that set `allowCustom={true}`
without a label would silently render a blank "Custom..." footer. Split
the props into a discriminated union: `customLabel` is required iff
`allowCustom` is true, and forbidden otherwise so accidental passes
where it's irrelevant fail at compile time.

`handleCustomModelBlur` used to persist `custom_models` immediately
but leave `engine_model` for the Back-click — a mid-edit app crash
between the two writes would land a custom id in the saved list
without the corresponding selection. Save both fields together so
the two advance as one. Other pending menu fields (backend, quant)
still wait for Back; cleaning up that asymmetry is a separate
refactor.

A user with `engine_backend: 'world_engine'` on a quark-only host
(Apple Silicon, or a remote that only advertises quark) would see the
wire silently clamp to a valid value on every session, but the saved
setting on disk stayed stale — so every menu open while streaming
surfaced the EngineTab snap-effect rewrite as a "settings changed,
restart?" modal even when the user touched nothing.

Introduce `useClampedSettings` as the single seam where the clamp
policy is applied: it returns the effective settings to consumers
*and* writes them back to disk on first divergence. `buildSessionConfig`
drops its internal clamp and trusts the caller to pass effective
settings; `useSessionInit` drops `serverCapabilities` and signs
`lastApplied` straight off `settings`. With one upstream derivation
feeding the wire, the lifecycle signature, and the persisted state,
the three can't drift and re-introduce the post-clamp-save
race-into-reconnect bug.

Reuses the raw settings reference when the clamp is a no-op so
downstream `useEffect` deps don't churn on every render that just
touches `useSettings()`.

Treats the standalone server as a guaranteed-available resource for
the duration of the app session rather than a per-stream process.
Settings menu, model picker, capability probe etc. all depend on a
live `/health`; making the server come and go forced every consumer
to handle a death they couldn't recover from.

- `EngineLifecycle.restartServer` becomes the only verb that touches
  the process. Atomic kill+spawn — no public stop. Refreshes engine
  status post-spawn so consumers see the new port and `isServerRunning`
  immediately.
- `useConnectionActions` drops the stop calls from `cancelConnection`
  and `prepareReturnToMainMenu`. User-facing teardown leaves the
  server alive.
- `useEngineRespawn` calls `restartServer` for genuine process-class
  changes (offline_mode, server_url in standalone), and skips it for
  `engine_mode` flips because the lifecycle's own orchestration effect
  already handles that — avoids a redundant double-spawn race.
- `useLoadingFailureCleanup` collapses to just `runWarmConnection`.
  Server cleanup on failed load happens server-side in
  `_unload_engine_sync`, so the client only needs to re-establish the
  WS; the engine-error overlay persists across the reconnect.
- New lifecycle reconciliation effect detects "state.kind=ready but
  isServerRunning=false" and auto-fires `restartServer`. Covers
  external crashes (Python OOM, user pkill) that pre-existing flows
  couldn't recover from. Status is re-polled on MAIN_MENU entry so the
  reconciliation has fresh data when the user is most likely to next
  touch settings.
- Bootstrap bails while engineError is set; session-class settings
  changes clear engineError so a retry can fire against the still-warm
  WS.
- `clearEngineErrorOnLoadingEntry` reducer effect dropped: every
  explicit-action path (cancelConnection, useEngineRespawn) already
  clears engineError, and the recovery path now deliberately
  preserves it across the LOADING re-entry.

`_resolve_model_type` fetches `config.yaml` on demand for collection
entries via `hf_hub_download`, which materialises a repo directory
with only that file. `_scan_cache` then sees the repo, computes
`total = 0` (no `.safetensors`), and registers it in `cached_sizes`
anyway — so the picker reports the model as `is_local: true,
size_bytes: 0`. Skip the registration when no weight files exist;
the picker falls back to the HF size lookup and renders the row
as downloadable.
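
A small, hypothetical sketch of the guard (names are placeholders for
`_scan_cache`'s internals):

    def register_cached_repo(repo_id: str, weight_file_sizes: list[int],
                             cached_sizes: dict[str, int]) -> None:
        total = sum(weight_file_sizes)
        # A config.yaml-only repo (materialised by _resolve_model_type) has no
        # .safetensors files; leaving it unregistered lets the picker fall back
        # to the HF size lookup and render the row as downloadable.
        if total > 0:
            cached_sizes[repo_id] = total
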
With Quark in play, "World Engine" the generic-concept is ambiguous
against "World Engine" the specific backend. The renderer's surfaces
collapse to "Engine":

- `WorldEngineSection` component → `EngineSection`, file renamed too.
- i18n keys `app.settings.worldEngine.*` → `app.settings.engine.*`.
- `DEFAULT_WORLD_ENGINE_MODEL` → `DEFAULT_ENGINE_MODEL`.
- Comments referring to "WorldEngine server" → "engine server".

Settings-page copy also resolves a redundancy (every section was
prefixed with "Engine" inside the Engine tab) and tightens the
question-style subtitles:

- "Engine Mode" → "Mode"; subtitle now points at the engine
  ("where will the engine run?") rather than the model.
- "Engine" status section → "Local Engine"; subtitle becomes a
  question ("how's the engine doing?") whose answer is the
  status dot beside it.
- "Simulation" subtitle moves from "how will your world be
  simulated?" to "what should simulate your world?" so it umbrellas
  both the World Model dropdown and the Backend dropdown.

The literal backend name "World Engine" stays — that refers to the
specific upstream package, not the generic concept. Server-side
Python (`WorldEngineManager`, `WORLD_ENGINE_DEVICE`, etc.) is
unchanged: both backends implement a WorldEngine-style interface
and the manager's name reflects that contract.

Two near-simultaneous `/api/models` calls (e.g. EngineTab's snap-effect
re-firing the loader once the capability probe lands) both missed the
TTL cache and both fired the underlying HF requests in parallel — one
collection fetch and one `model_info` per model, doubled.

Add `get_or_fetch(key, fetcher)` that tracks in-flight `Future`s per key.
Concurrent misses share a single fetch; the cache is populated exactly
once and every coalesced waiter resolves off the same result. Failures
propagate to all waiters and aren't cached, so callers can soft-fall a
transient outage without pinning the failure for the full TTL.

Refactor `_get_size`, `_get_model_type`, `_fetch_waypoint_ids`, and
`get_model_info` onto the new primitive. For the two that have
fallback-without-caching semantics (waypoint collection, transient
HF errors on `model-info`), let the fetcher raise and catch outside
`get_or_fetch` so the cache stays empty for the retry.

Also tightens `model_type_cache` typing from `TtlCache[str, str | None]`
to `TtlCache[str, str]` — None was never stored, the sentinel string
was. The old annotation was misleading and would have masked a real
None value if one ever slipped through.
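
A minimal sketch of the coalescing primitive, assuming a thread-based
server and a plain get/put TTL store underneath (the real `TtlCache`
API and the project's concurrency model may differ):

    import threading
    from concurrent.futures import Future

    class CoalescingCache:
        """TTL cache front where concurrent misses on a key share one fetch."""

        def __init__(self, ttl_get, ttl_put):
            self._get, self._put = ttl_get, ttl_put   # underlying TTL store
            self._inflight: dict[str, Future] = {}
            self._lock = threading.Lock()

        def get_or_fetch(self, key, fetcher):
            cached = self._get(key)
            if cached is not None:
                return cached
            with self._lock:
                fut = self._inflight.get(key)
                owner = fut is None
                if owner:
                    fut = self._inflight[key] = Future()
            if not owner:
                # Coalesced waiter: share the owner's result; failures propagate too.
                return fut.result()
            try:
                value = fetcher()
                self._put(key, value)        # cache populated exactly once
                fut.set_result(value)
                return value
            except Exception as exc:
                fut.set_exception(exc)       # failure not cached; next caller retries
                raise
            finally:
                with self._lock:
                    self._inflight.pop(key, None)
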
Two blank lines before the `_UnreachableOOM` class definition. Missed
by the original commit since the relevant style check only kicked in
once the file had passed through `ruff format`.

philpax merged commit a3aa792 into main on May 13, 2026
12 checks passed