[fal.ai] longlive/PEFT LoRA: mat1/mat2 shape mismatch (768x1536 vs 5120x32) during inference (LoRA input-dimension incompatibility at qkv linear layer)

## Summary

The `longlive` pipeline crashes with a PEFT LoRA matrix multiplication shape mismatch (768×1536 vs 5120×32) during chunk inference. The failure occurs deep in the self-attention QKV projection when a LoRA adapter is active: the adapter expects an input dimension of 5120, but the model's projection supplies 1536. The adapter was built for a different model size than the one it is loaded into.

cc @mjh1 @emranemran

## Error Messages

```
Error in block: (denoise, DenoiseBlock)
Error details: mat1 and mat2 shapes cannot be multiplied (768x1536 and 5120x32)
```

```
scope.server.pipeline_processor - ERROR - [067a55be] Error processing chunk for longlive: mat1 and mat2 shapes cannot be multiplied (768x1536 and 5120x32)
```

## Stack Trace

```
File "/app/src/scope/server/pipeline_processor.py", line 475, in process_chunk
    output_dict = self.pipeline(**call_params)
File "/app/src/scope/core/pipelines/longlive/pipeline.py", line 209, in __call__
    return self._generate(**kwargs)
File "/app/src/scope/core/pipelines/longlive/pipeline.py", line 250, in _generate
    _, self.state = self.blocks(self.components, self.state)
File "/app/.venv/lib/python3.12/site-packages/diffusers/modular_pipelines/modular_pipeline.py", line 932, in __call__
    pipeline, state = block(pipeline, state)
File "/app/src/scope/core/pipelines/wan2_1/blocks/denoise.py", line 185, in __call__
    _, denoised_pred = components.generator(...)
File "/app/src/scope/core/pipelines/wan2_1/components/generator.py", line 207, in _call_model
    return self.model(*args, **accepted)
File "/app/src/scope/core/pipelines/longlive/modules/causal_model.py", line 1425, in forward
File "/app/src/scope/core/pipelines/longlive/modules/causal_model.py", line 1206, in _forward_inference
File "/app/src/scope/core/pipelines/longlive/modules/causal_model.py", line 508, in forward
    self_attn_result = self.self_attn(...)
File "/app/src/scope/core/pipelines/longlive/modules/causal_model.py", line 132, in forward
File "/app/src/scope/core/pipelines/longlive/modules/causal_model.py", line 127, in qkv_fn
File "/app/.venv/lib/python3.12/site-packages/peft/tuners/lora/layer.py", line 807, in forward
File "/app/.venv/lib/python3.12/site-packages/torch/nn/modules/linear.py", line 134, in forward
RuntimeError: mat1 and mat2 shapes cannot be multiplied (768x1536 and 5120x32)
```

## Root Cause Analysis

The error is raised inside `peft/tuners/lora/layer.py`, where the LoRA down-projection (`lora_A`) linear layer receives an input of shape `768×1536`. PyTorch reports the second operand as the transposed weight, `5120×32`, so the layer's weight is `32×5120`: a rank-32 adapter that expects 5120 input features. This means:

- The model's QKV projection input dimension is **1536** (e.g. heads × head_dim in the LongLive/Wan2.1-1.3B architecture)
- The LoRA was trained/configured expecting an input dimension of **5120** (the hidden size of the larger Wan2.1-14B architecture)

The rank-32 LoRA adapter was trained for the **14B** variant but is being loaded into the **1.3B** model. Although the LoRA file loads successfully, the adapter dimensions are incompatible at runtime. The rank itself (32) is not the problem; the input dimension is.
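
The shape arithmetic can be read straight from the error text. The sketch below uses a hypothetical helper (not part of scope, PEFT, or PyTorch) that parses the message and names each dimension. It relies on `torch.nn.Linear` computing `x @ weight.T`, so the second shape in the message is the transposed weight, and it assumes the failing layer is the LoRA down-projection.

```python
import re

# Hypothetical helper for illustration; not part of scope, PEFT, or PyTorch.
_SHAPE_RE = re.compile(r"\((\d+)x(\d+) and (\d+)x(\d+)\)")


def explain_matmul_mismatch(message: str) -> dict:
    """Extract the operand shapes from a PyTorch matmul shape error."""
    match = _SHAPE_RE.search(message)
    if match is None:
        raise ValueError("no '(AxB and CxD)' shape pair found in message")
    rows, input_dim, expected_dim, out_dim = map(int, match.groups())
    return {
        "rows": rows,                       # rows of the flattened activation (mat1)
        "model_input_dim": input_dim,       # features the model feeds into the layer
        "adapter_input_dim": expected_dim,  # in_features the LoRA A matrix expects
        "adapter_rank": out_dim,            # out_features of LoRA A (assumed: the rank)
        "compatible": input_dim == expected_dim,
    }


info = explain_matmul_mismatch(
    "mat1 and mat2 shapes cannot be multiplied (768x1536 and 5120x32)"
)
print(info)
```

For the message in this issue, the model side is 1536 and the adapter side is 5120, so `compatible` is `False` regardless of the rank.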

## Session Context

Session `067a55be` had loaded params:
```json
{
  "loras": [{"path": "/tmp/.daydream-scope/assets/lora/SUPERSUISH_LoRA_V1_000000750.safetensors", "scale": 2, "merge_mode": "permanent_merge"}],
  "lora_merge_mode": "permanent_merge"
}
```

The LoRA loaded successfully (log: *"load_adapter: Loaded adapter 'SUPERSUISH_LoRA_V1_000000750' in 0.407s"*) but then failed at the first inference call.

## Frequency (last 12h, 2026-04-12 06:09 – 18:09 UTC)

- **~156 occurrences** in session `067a55be`
- Time window: 14:41–14:55 UTC
- App: `github_f1lhgmk5v76a0ev1w0u378by-scope-app--prod`

## Impact

The pipeline produces no output for the duration of the session while continuing to consume GPU resources.

## Suggested Fix

1. **Dimension validation at LoRA load time:** Check that the LoRA's down-projection input dimension matches the model's hidden dim. If mismatched, reject with a user-friendly error instead of loading and failing at inference.
2. **Architecture detection:** The LoRA loader (`peft_lora.py`) should detect which model variant the LoRA was trained for (e.g. from its 1536 vs 5120 input dimension) and refuse incompatible adapters.
3. **User-facing message:** Surface something like *"LoRA 'SUPERSUISH_LoRA_V1_000000750' is incompatible with the selected model size"* rather than a cryptic runtime crash.
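
A minimal sketch of what load-time validation could look like, using only the standard library. It reads the safetensors header (an 8-byte little-endian length followed by a JSON table of tensor names and shapes) without loading any weights. All function names here are hypothetical, and the key patterns (`lora_A` for PEFT-style files, `lora_down` for kohya-style files) are assumptions about how the adapter was exported; this is not the existing `peft_lora.py` logic.

```python
import json
import struct

# Hypothetical helpers for illustration; names are not from scope or PEFT.


def read_safetensors_shapes(path: str) -> dict:
    """Return {tensor_name: shape} by reading only the safetensors header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    return {
        name: tuple(meta["shape"])
        for name, meta in header.items()
        if name != "__metadata__"
    }


def find_incompatible_lora_tensors(shapes: dict, allowed_in_dims: set) -> list:
    """List down-projection tensors whose input dim the model cannot feed."""
    bad = []
    for name, shape in shapes.items():
        # Assumption: down-projection keys contain "lora_A" (PEFT style) or
        # "lora_down" (kohya style) and have shape (rank, in_features).
        is_down = "lora_A" in name or "lora_down" in name
        if is_down and len(shape) == 2 and shape[1] not in allowed_in_dims:
            bad.append((name, shape[1]))
    return bad


def validate_lora(path: str, allowed_in_dims: set) -> None:
    """Raise a readable error before the adapter reaches inference."""
    bad = find_incompatible_lora_tensors(
        read_safetensors_shapes(path), allowed_in_dims
    )
    if bad:
        name, dim = bad[0]
        raise ValueError(
            f"LoRA '{path}' is incompatible with the selected model size: "
            f"{len(bad)} tensor(s) expect an unsupported input dimension "
            f"(e.g. '{name}' expects {dim}, model provides "
            f"{sorted(allowed_in_dims)})."
        )
```

Here `allowed_in_dims` would be collected from the loaded model, for example the `in_features` of every linear layer the adapter targets. A stricter check would match each LoRA key to its specific target module rather than to a set of dimensions, since a set can accept a tensor whose dimension is valid for some other layer.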

