
feat: add cpu_offload option for low-VRAM model loading#40

Open
omnificate wants to merge 1 commit into Overworldai:main from omnificate:feat/cpu-quantize

Conversation


@omnificate omnificate commented Apr 12, 2026

When cpu_offload=True, the model is built and patched on CPU before being moved to GPU. Quantization runs on GPU after the move to ensure compatibility with all backends (FP8, INT8/GemLite, NVFP4).

This reduces peak VRAM during model initialization, making it feasible to run on systems with limited GPU memory.

Changes to WorldEngine.__init__:

  • New parameter: cpu_offload: bool = False
  • When enabled: the model is created and patched inside a torch.device('cpu') context, then moved to the target device with .to()
  • Quantization always runs on the target device (after move) for full backend compatibility
  • When disabled: zero behavioral change from the existing code path
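A minimal sketch of the load path described above, assuming PyTorch 2.x (the `build_model`, `patch_model`, and `quantize_model` names are illustrative stand-ins, not the actual WorldEngine internals):

```python
import torch
import torch.nn as nn


def build_model() -> nn.Module:
    # Stand-in for the real model constructor.
    return nn.Linear(8, 8)


def patch_model(model: nn.Module) -> None:
    # Stand-in for the patching step the PR mentions.
    pass


def quantize_model(model: nn.Module) -> None:
    # Stand-in for FP8 / INT8-GemLite / NVFP4 quantization.
    pass


def load_model(cpu_offload: bool = False, device: str = "cuda") -> nn.Module:
    if cpu_offload:
        # Build and patch inside a CPU device context, so no weights
        # touch VRAM during initialization.
        with torch.device("cpu"):
            model = build_model()
            patch_model(model)
        # Single transfer to the target device afterwards.
        model = model.to(device)
    else:
        # cpu_offload=False: unchanged existing path, everything
        # happens directly on the target device.
        with torch.device(device):
            model = build_model()
            patch_model(model)
    # Quantization always runs after the move, on the target device,
    # so every backend sees target-device tensors.
    quantize_model(model)
    return model
```

The key ordering constraint is that quantization is deliberately left out of the CPU context: it runs only after `.to()`, which is what keeps the offload path compatible with GPU-only quantization backends.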

Companion PR: Overworldai/Biome#97

omnificate pushed a commit to omnificate/Biome that referenced this pull request Apr 12, 2026
Adds a 'CPU Model Loading' checkbox in Performance settings that sends
cpu_offload in the WebSocket init message. When enabled, the world_engine
server builds the model on CPU before moving to GPU, reducing peak VRAM
during initialization. Essential for systems with limited GPU memory.

Changes:
- Top-level cpu_offload setting (default: false)
- Checkbox in Performance section with i18n (en/ja/zh/goose)
- WebSocket init message includes cpu_offload flag
- Lifecycle model key encodes cpu_offload so toggling triggers reconnect
- Mode-switch modal shown when toggling during active streaming
- Server passes cpu_offload through to WorldEngine constructor
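The server-side pass-through in the last bullet could look roughly like this. The init-message schema and the commented-out `WorldEngine` call site are assumptions for illustration; only the `cpu_offload` flag itself comes from this PR:

```python
import json


def engine_kwargs_from_init(raw: str) -> dict:
    """Extract WorldEngine kwargs from a WebSocket init message (hypothetical schema)."""
    msg = json.loads(raw)
    return {
        # Defaulting to False preserves the existing behavior when the
        # client omits the flag (e.g. an older Biome build).
        "cpu_offload": bool(msg.get("cpu_offload", False)),
    }


# engine = WorldEngine(**engine_kwargs_from_init(raw))  # hypothetical call site
```

Defaulting the flag to `False` on the server mirrors the `cpu_offload: bool = False` signature, so clients that never send the field get the unchanged code path.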

Companion PR: Overworldai/world_engine#40
@omnificate omnificate changed the title from "feat: add cpu_quantize option for low-VRAM systems" to "feat: add cpu_offload option for low-VRAM model loading" Apr 12, 2026