Changes from all commits
28 commits
d0553fd
feat: support SD1.5/SDXL/Dreamshaper alongside SD-Turbo
happyFish Apr 25, 2026
633f427
feat: model_id_or_path as enum dropdown of tested models
happyFish Apr 25, 2026
43a1d8f
fix: accept model_id_or_path kwarg so the UI dropdown actually loads
happyFish Apr 25, 2026
788f7a5
feat: hot-swap model when model_id_or_path changes at runtime
happyFish Apr 25, 2026
c4151c8
refactor: read model_id_or_path via get_param like every other runtim…
happyFish Apr 25, 2026
8315c82
fix: sequential denoising in txt2img mode so multi-step doesn't flash…
happyFish Apr 25, 2026
9fdbb7e
Revert "fix: sequential denoising in txt2img mode so multi-step doesn…
happyFish Apr 25, 2026
7872f91
feat: serial denoise path for txt2img and image-loopback modes
happyFish Apr 25, 2026
25478b7
feat(trt): node-id-keyed adapter cache to survive graph edits
happyFish May 3, 2026
7c89f3d
feat(schema): width/height as Resolution IntEnum with field_validator
happyFish May 5, 2026
705da78
Merge feat/node-keyed-trt-cache
happyFish May 5, 2026
d8d2fbd
Merge branch 'main' into sd-multi-model
happyFish May 5, 2026
29f1ea7
feat(schema): restrict to Turbo-only, drop num_inference_steps
happyFish May 5, 2026
1015f4b
refactor(pipeline): drop non-Turbo paths, hardcode 1-step denoise
happyFish May 5, 2026
cc82a98
docs: handoff notes for the Turbo-only simplification
happyFish May 5, 2026
206300f
feat: DMD2-SDXL-1step preset via extensible MODEL_PRESETS
happyFish May 5, 2026
819ec3d
fix: load DMD2 UNet via state_dict override, not from_pretrained
happyFish May 5, 2026
1f676a4
fix(dmd2): pin training timestep [399] via timesteps_override
happyFish May 5, 2026
8763277
feat(trt): SDXL UNet engine support (Turbo + DMD2 verified)
happyFish May 6, 2026
c56a581
feat(trt): SDXL UNet dynamic shape over 512-1024
happyFish May 6, 2026
12609dd
chore: refresh PR docs to reflect expanded scope
happyFish May 6, 2026
54c4ad6
fix(schema): ModelId as StrEnum so HF URL formatting uses values
happyFish May 6, 2026
393fcab
Testing sd multi
happyFish May 8, 2026
51d0083
feat(loopback): per-model implicit_loopback flag for CFG-distilled mo…
happyFish May 9, 2026
2c91427
feat(negative): embedding-space negative subtraction for single-pass …
happyFish May 9, 2026
90d1c1f
fix(trt): defensive re-activate of TAESD engines if context is lost
happyFish May 9, 2026
b1b5478
refactor: extract PromptEncoder into its own module
happyFish May 9, 2026
468e837
docs(plans): hand-off plans for refactor, SDXL ControlNet, and LoRA
happyFish May 9, 2026
1 change: 1 addition & 0 deletions .claude/scheduled_tasks.lock
@@ -0,0 +1 @@
{"sessionId":"d8c05a9f-ecd4-4918-ba19-035cdd531a1a","pid":805816,"acquiredAt":1778002935354}
377 changes: 248 additions & 129 deletions CLAUDE.md

Large diffs are not rendered by default.

155 changes: 155 additions & 0 deletions docs/plans/LORA_PLAN.md
@@ -0,0 +1,155 @@
# Plan: LoRA Support

## Context
`schema.py` has `supports_lora = True` already. `pipeline.py` has stub `load_lora` and `fuse_lora` methods that aren't called. Scope has a `download_lora` endpoint already — verify in the parent repo `daydreamlive-scope`.

## Schema Changes (`src/scope_streamdiffusion/schema.py`)

Add a `LoraSpec` model and a `loras` list field on `StreamDiffusionConfig`:
```python
from typing import Optional

from pydantic import BaseModel, Field


class LoraSpec(BaseModel):
    repo_id: str                       # HF repo or local path
    weight_name: Optional[str] = None  # for repos with multiple files
    adapter_name: str                  # diffusers adapter name; required for stack/swap
    scale: float = 1.0                 # 0..2 typical


class StreamDiffusionConfig(BaseModel):
    ...
    loras: list[LoraSpec] = Field(
        default_factory=list,
        json_schema_extra=ui_field_config(order=..., label="LoRAs"),
    )
```

Order field: place after model selection but before ControlNet config. Reuse Scope's existing LoRA picker UI if one exists in the parent repo's other pipelines.

## Loader Wiring (`ModelLoader` post-refactor, or `pipeline.py` if pre-refactor)

LoRAs attach via `pipe.load_lora_weights(repo_id, weight_name=..., adapter_name=...)`. After loading all requested adapters, call `pipe.set_adapters([names...], adapter_weights=[scales...])`.

**Lifecycle order:**
1. `ModelLoader._load_model` loads the diffusers pipe.
2. SDXL fp16 VAE swap.
3. **LoRA attach.** Iterate `config.loras`, call `pipe.load_lora_weights` per spec (steps 3-4 are sketched after this list).
4. `pipe.set_adapters(...)` with names + scales.
5. **Do NOT call `fuse_lora`.** Keep adapters live so scales/swaps work without reload. Only fuse before TRT compilation (next step).
6. PromptEncoder.attach, ControlNetHandler.attach.
7. TRTLifecycle.attach. **If TRT is enabled, fuse_lora here** before compiling — TRT bakes weights at compile time, so fused-then-compiled is the only correct path (unless using the refit path — see TRT Refit below).

## Change Detection
Track a "LoRA signature" (sorted tuple of `(repo_id, weight_name, adapter_name, scale)`) on the model loader; a sketch follows the list below. On `_swap_model` / `_ensure_pipe_loaded`:
- Same model + same LoRA signature → no-op.
- Same model + different LoRA signature, **eager mode** → call `pipe.unload_lora_weights()`, then re-attach. Cheap, no reload needed.
- Same model + different LoRA signature, **TRT mode without refit** → full reload required. Treat this as a model swap. Surface the cost in the UI — recompiling SDXL UNet is 10+ minutes.
- Same model + different LoRA signature, **TRT mode with refit-capable engine** → refit (see below). 1–10s instead of 10+ min.
- Scale-only change with same adapters loaded, **eager mode** → `pipe.set_adapters(...)` with new weights. No reload.
- Scale-only change, **TRT non-refit** → full reload. **TRT refit** → refit.
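
A sketch of the signature helpers (names are illustrative), assuming the `loras` list from the schema section; comparing the scale-free variant first distinguishes scale-only changes from adapter swaps:

```python
def lora_signature(loras: list[LoraSpec]) -> tuple:
    # Sorted so config reordering doesn't read as a swap; weight_name can be
    # None, so normalize to "" to keep the tuples comparable.
    return tuple(sorted(
        (s.repo_id, s.weight_name or "", s.adapter_name, s.scale)
        for s in loras
    ))

def adapter_signature(loras: list[LoraSpec]) -> tuple:
    # Same tuple minus scale: if this matches but lora_signature differs,
    # the change is scale-only.
    return tuple(sorted(
        (s.repo_id, s.weight_name or "", s.adapter_name)
        for s in loras
    ))
```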

## Cache Coordination with TRT
The TRT cache key (in `_trt_cache.py` / `trt_engines.py`) must include the LoRA signature. Otherwise two different LoRA stacks will collide on the same cache slot and you'll silently load the wrong engine. Hash the sorted signature into the engine filename.
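
One way to fold the signature in (a sketch; the exact path scheme is whatever `_trt_cache.py` already does):

```python
import hashlib

def lora_cache_component(signature: tuple) -> str:
    # Short, stable digest of the sorted LoRA signature for the filename.
    if not signature:
        return "lora-none"
    return "lora-" + hashlib.sha256(repr(signature).encode()).hexdigest()[:12]
```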

When using refit, the cache key for the *engine* uses only the base model + refit-capable flag (LoRA signature does NOT affect the engine identity). The fused weights are applied at refit time. The LoRA signature is tracked separately as the "currently refit-applied state" and used only for change detection.

## Scope Integration
The user mentioned Scope has a `download_lora` endpoint already. Find it in the parent repo (`daydreamlive-scope`) and confirm:
- Whether it returns a local path or a repo_id.
- Whether the UI already has a LoRA picker in other pipelines that we can match.
- Whether LoRA management is per-pipeline or global.

Match the existing pattern. Don't invent a new one.

## Testing
1. Eager SD-Turbo + a single style LoRA from CivitAI (download via Scope, attach via config).
2. Live scale change 0.0 → 1.0 → 1.5. Should update without reload.
3. Live LoRA swap (different adapter). Should be fast (unload + load), no model reload.
4. Toggle TRT on with LoRAs attached. Confirm fuse-then-compile path runs and engine is cached with LoRA-aware key.
5. Live LoRA change with TRT on (non-refit) — confirm full reload + recompile triggers and completes.
6. Stack 2 LoRAs simultaneously. Verify `set_adapters` with multiple names works and scales are independent.
7. SDXL + LoRA (eager and TRT).

## Out of Scope (defer)
- Multi-LoRA blending UI beyond stack-with-scales.
- LoRA training or merging.

---

# Addendum: TRT Refit Path for LoRAs

The base plan above says "LoRA change with TRT → full reload" — correct but expensive (10+ min for SDXL). TensorRT's **refit** feature lets you update weights in a built engine without rebuilding it. This is the right answer for live LoRA swaps on TRT.

## What Refit Buys You
- Engine structure (layers, shapes, fusions) stays compiled.
- Only the weight tensors get re-uploaded.
- Typical refit time: **1–10 seconds** for SDXL UNet vs. 10+ minutes for full rebuild.
- Works for scale changes AND adapter swaps, as long as the LoRA targets the same layers.

## Build-Time Requirements
The engine must be compiled with refit enabled. Two flags in the TRT builder:
- `BuilderFlag.REFIT` — required.
- `BuilderFlag.STRIP_PLAN` (TRT 10+) — optional but recommended; strips weights from the engine file so you ship a smaller cache and refit at load. Trade-off: load is no longer instant — must refit before first inference.

**Decision:** use `REFIT` only (not `STRIP_PLAN`). Cached engines stay self-sufficient; refit only runs when LoRAs change. The size penalty for `REFIT`-only is small (~5%) and inference perf is unchanged.

## Implementation Sketch

### Builder changes (`src/scope_streamdiffusion/_trt/builder.py` or wherever the network config lives)
Add `network_flags` / `builder_config.flags |= 1 << int(trt.BuilderFlag.REFIT)` to all UNet builders (`build_unet_engine`, `build_unet_sdxl_engine`, `build_unet_with_control_engine`, and the new SDXL+control variant). VAE/TAESD/ControlNet engines don't need it — LoRAs target UNet only (cross-attention layers).
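
For reference, a sketch of setting the flag, assuming the existing build path already has a `builder_config` (`IBuilderConfig`) in scope:

```python
import tensorrt as trt

# Idiomatic form:
builder_config.set_flag(trt.BuilderFlag.REFIT)
# Equivalent bit-twiddling form:
# builder_config.flags |= 1 << int(trt.BuilderFlag.REFIT)
```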

### Refit at runtime (new method on `TRTLifecycle`)
```python
def refit_lora(self, lora_signature):
    # 1. Load base UNet weights into a temporary diffusers UNet (CPU OK).
    # 2. Apply LoRA stack to that UNet (load_lora_weights + set_adapters + fuse_lora).
    # 3. Use trt.Refitter to push the fused weights into the live engine.
    # 4. Discard the temp UNet.
    ...
```

The refitter API:
```python
refitter = trt.Refitter(self._trt_unet_engine.engine, TRT_LOGGER)
for name in refitter.get_all_weights():  # or get_missing_weights() for a partial pass
    torch_weights = fused_unet_state_dict[map_trt_name_to_torch(name)]
    # Refitter wants host memory; hand it a contiguous numpy view.
    refitter.set_named_weights(name, torch_weights.cpu().contiguous().numpy())
assert refitter.refit_cuda_engine()
```

### Name mapping (the hard part)
TRT weight names come from the ONNX export and don't match diffusers' `state_dict` keys 1:1. You need a map. Two approaches:
1. **Build the map at compile time.** During ONNX export, record the `(torch_param_name → onnx_initializer_name)` mapping and persist it next to the engine in the cache. At refit time, load the map and translate.
2. **Reconstruct the map at refit time** by re-running ONNX export on a dummy UNet with the same architecture and reading the resulting initializer names. Slower but simpler.

Recommend approach 1. Save the map as `<engine>.refit_map.json` alongside the engine file. The TRT cache key already covers architecture variants, so the map is valid for the engine.
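
A sketch of the persistence step for approach 1, assuming `torch.onnx.export` keeps most parameter names as initializer names (anything renamed by constant folding lands in `unmatched` for manual handling):

```python
import json
import onnx

def save_refit_map(onnx_path: str, map_path: str, param_names: set[str]) -> None:
    # Record which ONNX initializer names line up with torch state_dict keys.
    initializers = {init.name for init in onnx.load(onnx_path).graph.initializer}
    refit_map = {name: name for name in initializers if name in param_names}
    unmatched = sorted(initializers - refit_map.keys())
    with open(map_path, "w") as f:
        json.dump({"map": refit_map, "unmatched": unmatched}, f, indent=2)
```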

### Cache key change
Refit-capable engines and refit-incapable engines are different artifacts. Add `refit=True` to the cache key path component so old (non-refit) cached engines aren't reused. Old engines stay valid for non-LoRA streams; new ones get used when LoRAs are configured.

## Updated LoRA Lifecycle (replaces "full reload" branch in the base plan)

| Change | Eager | TRT (refit-capable engine) | TRT (legacy non-refit engine) |
|---|---|---|---|
| Scale only | `set_adapters` | refit | rebuild |
| Adapter swap, same layers | unload + load + `set_adapters` | refit | rebuild |
| Adapter swap, different layers | same | refit (zero out unused) | rebuild |
| Add ControlNet, etc. | rebuild pipeline state | rebuild engine | rebuild |

"Different layers" case: if a new LoRA targets layers the previous one didn't, those original-weight slots need to be restored to the base model's weights during refit. The fused-state-dict approach handles this naturally since the temp UNet is built from base weights + new LoRA stack.

## When to Skip Refit
- First time TRT is enabled with LoRAs configured → fuse first, then build (current plan). Refit only helps on subsequent changes.
- Engine compiled before this feature lands → fall back to rebuild. Detect via the cache-key version bump.
- Refitter reports missing weights → log and rebuild. Don't run a partially-refit engine.

## Testing (Refit-specific)
1. Cold start with one LoRA + TRT. Confirm engine builds with `REFIT` flag (check `engine.refittable`).
2. Live scale change 0.0 → 1.5. Should complete in <10s, no recompile log.
3. Live adapter swap (different LoRA, same target layers). Same speed.
4. Live adapter swap to a LoRA that targets *additional* layers. Confirm refit covers all weights and output is correct.
5. Stress test: 20 rapid scale/adapter changes. Memory should stay stable (the temp UNet must actually be freed).
6. SDXL refit specifically — name-map size is larger; verify no missing weights.

## Risk
- TRT refit name mapping is fiddly. Budget time for debugging the ONNX-name ↔ torch-name mapping.
- Some TRT optimizations bake constants. If a LoRA's effective rank changes the optimal kernel choice, refit produces correct but suboptimal output. Acceptable trade-off.
- `STRIP_PLAN` is tempting but adds first-inference latency. Skip it.

This makes live LoRA swaps on TRT actually viable instead of "technically supported but never used."
19 changes: 19 additions & 0 deletions docs/plans/README.md
@@ -0,0 +1,19 @@
# Plans

Hand-off plans for the next round of work on `sd-multi-model`. Each plan is self-contained and intended to be executed by another agent without needing the originating conversation.

- [REFACTOR_PLAN.md](REFACTOR_PLAN.md) — decompose `pipeline.py` into helper classes (`TRTLifecycle`, `ModelLoader`, `InferenceCore`) following the `PromptEncoder` / `ControlNetHandler` pattern.
- [SDXL_CONTROLNET_PLAN.md](SDXL_CONTROLNET_PLAN.md) — wire SDXL ControlNet through the eager and TRT paths (currently raises `NotImplementedError` on TRT for SDXL).
- [LORA_PLAN.md](LORA_PLAN.md) — schema, loader wiring, change detection, and the TRT refit path for live LoRA swaps.

## Recommended order
1. Refactor (lands first — the LoRA plan assumes the `ModelLoader` and `TRTLifecycle` helpers exist).
2. SDXL ControlNet (independent of LoRA).
3. LoRA (depends on refactor; benefits from but does not require ControlNet work).

## Architectural pattern (read first)
All three plans assume the helper-class composition pattern. The canonical examples in the repo:
- `src/scope_streamdiffusion/prompt_encoder.py`
- `src/scope_streamdiffusion/controlnet.py`

Helpers take `(device, dtype)` at construction, gain a pipe back-reference via `attach(pipe, sdxl)`, and expose runtime state as instance attributes.
116 changes: 116 additions & 0 deletions docs/plans/REFACTOR_PLAN.md
@@ -0,0 +1,116 @@
# Refactor Plan: pipeline.py Decomposition

## Goal
Reduce `pipeline.py` from ~1900 lines to a thin orchestrator (~400 lines) by extracting cohesive responsibilities into helper classes. Follow the pattern established by `PromptEncoder` (commit `b1b5478`) and the existing `ControlNetHandler`.

## Architectural Pattern (non-negotiable — already established)
- Helper class lives in its own module under `src/scope_streamdiffusion/`.
- Constructor takes `(device, dtype)` and any static config.
- `attach(pipe, sdxl: bool)` lifecycle method called from `_ensure_pipe_loaded` and `_swap_model` after the diffusers pipeline is loaded. Helpers re-bind to the new pipe here.
- Helper owns its caches and exposes runtime state as instance attributes the pipeline reads through (e.g., `self.prompts.prompt_embeds`).
- Helper has explicit `reset_caches()` / `release()` methods called on model swap or teardown.
- **No mixins.** Composition only. The user explicitly rejected mixins.
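
A skeleton of the pattern (illustrative name; a doc sketch, not a shared base class, per the rules at the bottom):

```python
class SomeHelper:
    def __init__(self, device, dtype):
        self.device = device
        self.dtype = dtype
        self.pipe = None   # back-reference, bound in attach()
        self.sdxl = False

    def attach(self, pipe, sdxl: bool) -> None:
        # Re-bind to the (possibly new) diffusers pipeline after load/swap.
        self.pipe = pipe
        self.sdxl = sdxl

    def reset_caches(self) -> None:
        ...  # drop per-model state on swap

    def release(self) -> None:
        ...  # free engines/tensors on teardown
```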

## Reference Files
- `src/scope_streamdiffusion/prompt_encoder.py` — the template. Read this first.
- `src/scope_streamdiffusion/controlnet.py` — second example of the pattern.
- `src/scope_streamdiffusion/pipeline.py` — the source to extract from.

## Extraction Order (do them in this order, commit between each)

### Extraction 1: `TRTLifecycle` → `src/scope_streamdiffusion/trt_lifecycle.py`
**Methods to move:**
- `_ensure_trt_taesd`
- `_ensure_trt_controlnet`
- `_ensure_trt_unet`
- `_setup_trt`
- `_reset_trt_state`
- `_set_acceleration_mode`
- `_deactivate_trt`
- `_trt_setup_args_from_config`

**Compromise to accept:** these methods currently mutate `self.unet`, `self.controlnet`, `self.vae`, `self._taesd_vae` directly. Don't fight it — give the helper a back-reference to the pipeline (`self.pipe = pipe` set in `attach()`) and have it write through. The win is moving 500 lines of TRT-specific lifecycle code out of the orchestrator, not pretending TRT doesn't touch pipeline state.

**Caches the helper owns:** `_trt_taesd_paths`, `_trt_controlnet_paths`, `_trt_unet_paths`, `_trt_unet_engine`, `_trt_controlnet_engine`, the `acceleration_mode` last-applied value, and the `_trt_cache` adapter handles. The module-scope `_trt_cache._CACHE` stays where it is — it must survive plugin reinit.

**Pipeline-side after extraction:**
```python
self.trt = TRTLifecycle(device=self.device, dtype=self.dtype)
# in _ensure_pipe_loaded / _swap_model:
self.trt.attach(self, self.sdxl)
# in __call__'s pre-inference setup:
self.trt.ensure_engines(config, want_control=...)
```

**Testing checkpoint after this extraction:**
1. Cold-load each model with `acceleration_mode="trt"`: SD-Turbo, SDXL-Turbo, DMD2.
2. Live-swap from SD-Turbo → SDXL-Turbo → DMD2 → SD-Turbo. Confirm no `context=None` crashes (the band-aid `_ensure_activated` in `_trt/engine.py` should still cover this; if it triggers, that's a regression in the swap teardown path).
3. Toggle ControlNet on SD1.5 + SD-Turbo while running.
4. Switch `acceleration_mode` between `none` / `xformers` / `trt` mid-stream.

---

### Extraction 2: `ModelLoader` → `src/scope_streamdiffusion/model_loader.py`
**Methods to move:**
- `_load_model`
- `_load_preset`
- `_release_pipe_state`
- `_swap_model`
- `_install_sdxl_fp16_vae`
- `_set_taesd`
- `load_lora` (currently a stub — leave as-is, the LoRA plan wires it up)
- `fuse_lora` (stub — same)

**State the helper owns:** the `MODEL_PRESETS` dict (move it to this module), last-loaded `model_id`, last-loaded preset signature, the SDXL fp16 VAE replacement state, TAESD-installed flag.

**Compromise:** like TRT, this writes through to `self.pipe`, `self.unet`, `self.vae`, `self.text_encoder`, `self.text_encoder_2`, `self.tokenizer`, `self.tokenizer_2`, `self.scheduler`, `self.sdxl`. Use the back-reference; the goal is consolidation, not purity.

**Order matters in `attach`/swap flow:** ModelLoader runs first, then PromptEncoder.attach, then ControlNetHandler.attach, then TRTLifecycle.attach. Document this in a comment at the top of `pipeline._ensure_pipe_loaded`.
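
A hypothetical sketch of that flow (attribute names are assumptions; match whatever the extractions actually use):

```python
def _ensure_pipe_loaded(self, config):
    # Order matters: loader first (creates/swaps the pipe), TRT last
    # (engines must see the final weights, including any fused LoRAs).
    self.loader.attach(self, self.sdxl)
    self.prompts.attach(self, self.sdxl)
    self.controlnet_handler.attach(self, self.sdxl)
    self.trt.attach(self, self.sdxl)
```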

**Testing checkpoint:**
1. Cold load each preset.
2. Swap each direction. Verify no double-loaded models in VRAM (`nvidia-smi` while swapping).
3. Verify SDXL fp16 VAE replacement still happens on SDXL-Turbo and DMD2.
4. Verify TAESD eager and TRT both still work.

---

### Extraction 3: `InferenceCore` → `src/scope_streamdiffusion/inference_core.py`
**Methods to move:**
- `_set_timesteps`
- `_initialize_noise`
- `_setup_seed_transition`
- `_slerp_noise`
- `_advance_seed_transition`
- `_cancel_seed_transition`
- `_encode_image`
- `_decode_image`
- `_add_noise`
- `_scheduler_step_batch`
- `_unet_step`
- `_predict_x0_batch`

**State the helper owns:** `alpha_prod_t_sqrt`, `beta_prod_t_sqrt`, `c_skip`, `c_out`, `sub_timesteps_tensor`, `init_noise`, `x_t_latent_buffer`, the seed-transition fields (`_pending_seed`, `_transition_remaining`, etc.).

**Reads (not writes) from pipeline:** `self.pipe.prompts.prompt_embeds`, `self.pipe.unet`, `self.pipe.controlnet`, `self.pipe.controlnet_input`, `self.pipe.vae`, `self.pipe.scheduler`. Pass these through the back-reference.

**`__call__` after this extraction shrinks to roughly:**
```python
def __call__(self, **kwargs):
    config = self._validate_config(kwargs)
    self._prepare_runtime_state(config)
    video = kwargs.get("video")
    self.prompts.encode_for_frame(...)
    if self.controlnet_handler:
        self.controlnet_handler.update(...)
    latent = self.inference.run_step(video, config)
    return {"video": self.inference.to_scope_format(latent)}
```

**Testing checkpoint:** full smoke test — every model × (txt2img / img2img / loopback) × (eager / xformers / TRT) × (with/without negative prompt) × seed transitions.

## Cross-cutting Rules
- **Don't change behavior.** This is a pure move. If you find a bug, note it in a comment — fix it in a separate commit after the refactor lands.
- **Commit per extraction.** Three commits. Each must pass the testing checkpoint before moving to the next.
- **Don't extract `__init__`, `prepare`, `_prepare_runtime_state`, `__call__`, `get_config_class`, or the schema-driven setters.** These are the orchestrator's job.
- **Don't add abstract base classes or interfaces** for the helpers. Three concrete classes is fine.
- **Don't introduce a `BaseHelper` parent class.** They share a pattern, not behavior.