[fal.ai] longlive/PEFT LoRA: mat1/mat2 shape mismatch (768x1536 vs 5120x32) during inference — LoRA rank incompatibility at qkv linear layer #922

@livepeer-tessa

Summary

The longlive pipeline crashes with a PEFT LoRA matrix-multiplication shape mismatch (768×1536 vs 5120×32) during chunk inference. The failure occurs deep in the self-attention QKV projection when a LoRA adapter is active, which indicates that the adapter was built for a different projection input width than the loaded model uses. The rank (32) is not itself the problem.

cc @mjh1 @emranemran

Error Messages

Error in block: (denoise, DenoiseBlock)
Error details: mat1 and mat2 shapes cannot be multiplied (768x1536 and 5120x32)
scope.server.pipeline_processor - ERROR - [067a55be] Error processing chunk for longlive: mat1 and mat2 shapes cannot be multiplied (768x1536 and 5120x32)

Stack Trace

File "/app/src/scope/server/pipeline_processor.py", line 475, in process_chunk
    output_dict = self.pipeline(**call_params)
File "/app/src/scope/core/pipelines/longlive/pipeline.py", line 209, in __call__
    return self._generate(**kwargs)
File "/app/src/scope/core/pipelines/longlive/pipeline.py", line 250, in _generate
    _, self.state = self.blocks(self.components, self.state)
File "/app/.venv/lib/python3.12/site-packages/diffusers/modular_pipelines/modular_pipeline.py", line 932, in __call__
    pipeline, state = block(pipeline, state)
File "/app/src/scope/core/pipelines/wan2_1/blocks/denoise.py", line 185, in __call__
    _, denoised_pred = components.generator(...)
File "/app/src/scope/core/pipelines/wan2_1/components/generator.py", line 207, in _call_model
    return self.model(*args, **accepted)
File "/app/src/scope/core/pipelines/longlive/modules/causal_model.py", line 1425, in forward
File "/app/src/scope/core/pipelines/longlive/modules/causal_model.py", line 1206, in _forward_inference
File "/app/src/scope/core/pipelines/longlive/modules/causal_model.py", line 508, in forward
    self_attn_result = self.self_attn(...)
File "/app/src/scope/core/pipelines/longlive/modules/causal_model.py", line 132, in forward
File "/app/src/scope/core/pipelines/longlive/modules/causal_model.py", line 127, in qkv_fn
File "/app/.venv/lib/python3.12/site-packages/peft/tuners/lora/layer.py", line 807, in forward
File "/app/.venv/lib/python3.12/site-packages/torch/nn/modules/linear.py", line 134, in forward
RuntimeError: mat1 and mat2 shapes cannot be multiplied (768x1536 and 5120x32)

Root Cause Analysis

The error originates in peft/tuners/lora/layer.py. The PEFT LoRA down-projection (lora_A) receives an input of shape 768×1536, but its weight, stored as 32×5120 and reported transposed as 5120×32 in the error, expects an input width of 5120 (rank 32). This means:

  • The model's QKV projection input dimension is 1536 (heads × head_dim, the hidden size of the LongLive/Wan2.1-1.3B architecture)
  • The LoRA was trained/configured expecting an input dimension of 5120, which is the hidden size of the larger Wan2.1-14B architecture (Wan2.1 has no 5B variant)

The rank-32 LoRA adapter therefore appears to have been trained for the 14B variant but is being loaded into the 1.3B model. The LoRA file loads without error, but the adapter dimensions are incompatible at runtime.
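
The shape arithmetic behind the error can be checked without loading a model. The following is a minimal sketch in plain Python, not code from the pipeline or from PEFT: lora_down_output_shape is a hypothetical helper, and it assumes the usual linear-layer convention that the weight is stored as (out_features, in_features) and the input is multiplied by its transpose.

```python
def lora_down_output_shape(input_shape, weight_shape):
    """Return the output shape of `input @ weight.T`, or raise on mismatch.

    Hypothetical helper for illustration only. `weight_shape` follows the
    (out_features, in_features) layout, so a rank-r LoRA down-projection
    has shape (r, in_features).
    """
    rows, in_dim = input_shape
    rank, expected_in = weight_shape
    if in_dim != expected_in:
        # Mirror the runtime wording: mat2 is the transposed weight.
        raise ValueError(
            "mat1 and mat2 shapes cannot be multiplied "
            f"({rows}x{in_dim} and {expected_in}x{rank})"
        )
    return (rows, rank)


# Shapes from the failing session: 768 rows of width 1536 fed to a rank-32
# down-projection that expects width 5120.
try:
    lora_down_output_shape((768, 1536), (32, 5120))
    message = None
except ValueError as exc:
    message = str(exc)
print(message)

# The same input against a rank-32 adapter built for width 1536 is accepted.
ok_shape = lora_down_output_shape((768, 1536), (32, 1536))
print(ok_shape)
```

The first call reproduces the message from the logs, which shows that the mismatch lies in the adapter's input width and not in its rank.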

Session Context

Session 067a55be had loaded params:

{
  "loras": [{"path": "/tmp/.daydream-scope/assets/lora/SUPERSUISH_LoRA_V1_000000750.safetensors", "scale": 2, "merge_mode": "permanent_merge"}],
  "lora_merge_mode": "permanent_merge"
}

The LoRA loaded successfully (log: "load_adapter: Loaded adapter 'SUPERSUISH_LoRA_V1_000000750' in 0.407s") but then fails at first inference.

Frequency (last 12h, 2026-04-12 06:09 – 18:09 UTC)

  • ~156 occurrences in session 067a55be
  • Time window: 14:41–14:55 UTC
  • App: github_f1lhgmk5v76a0ev1w0u378by-scope-app--prod

Impact

The pipeline produces no output for the duration of the session while continuing to consume GPU resources.

Suggested Fix

  1. Dimension validation at LoRA load time: Check that the LoRA's down-projection input dimension matches the model's hidden dim. If mismatched, reject with a user-friendly error instead of loading and failing at inference.
  2. Architecture detection: The LoRA loader (peft_lora.py) should detect which model size the LoRA was trained for (for example 1.3B vs 14B, judging by the adapter's input width) and refuse incompatible adapters.
  3. User-facing message: Surface something like "LoRA 'SUPERSUISH_LoRA_V1_000000750' is incompatible with the selected model size" rather than a cryptic runtime crash.
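
The load-time check in the list above could look roughly like the sketch below. It is not the existing peft_lora.py code. The function name, the exception class, the module names, and the key suffixes are all assumptions for illustration, and real adapter files may name their tensors differently. The function works on plain shape dictionaries so that it stays independent of any loader. In practice the adapter shapes could be read from the safetensors file (for example with safe_open and get_slice(key).get_shape()) without materializing the tensors.

```python
class IncompatibleLoRAError(ValueError):
    """Raised when an adapter's dimensions do not match the base model."""


# Assumed key suffixes for LoRA down-projection weights; trainers differ.
DOWN_SUFFIXES = (".lora_A.weight", ".lora_down.weight")


def validate_lora_shapes(lora_name, lora_shapes, model_in_features):
    """Compare each down-projection's input width with its target layer.

    lora_shapes: tensor key -> shape tuple, as read from the adapter file.
    model_in_features: module name -> in_features of the base layer.
    Returns the number of down-projections that were checked.
    """
    checked = 0
    for key, shape in lora_shapes.items():
        suffix = next((s for s in DOWN_SUFFIXES if key.endswith(s)), None)
        if suffix is None:
            continue  # not a down-projection (up-projection, alpha, ...)
        module = key[: -len(suffix)]
        if module not in model_in_features:
            continue  # unknown target; a stricter loader might reject this
        expected = model_in_features[module]
        actual = shape[1]  # down-projection layout is (rank, in_features)
        if actual != expected:
            raise IncompatibleLoRAError(
                f"LoRA '{lora_name}' is incompatible with the selected model: "
                f"'{module}' expects input width {expected}, adapter has {actual}"
            )
        checked += 1
    return checked


# Hypothetical module name; the widths are the ones from this issue.
model_in_features = {"blocks.0.self_attn.q": 1536}
adapter_shapes = {
    "blocks.0.self_attn.q.lora_A.weight": (32, 5120),
    "blocks.0.self_attn.q.lora_B.weight": (5120, 32),
}
try:
    validate_lora_shapes(
        "SUPERSUISH_LoRA_V1_000000750", adapter_shapes, model_in_features
    )
    error_text = None
except IncompatibleLoRAError as exc:
    error_text = str(exc)
print(error_text)
```

Raising a dedicated exception at load time lets the server reject the session parameters up front and return the message to the user, instead of failing on every chunk.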

Metadata

Labels: bug
