LoraConfig(bias="lora_only") trains base_layer.bias but save_pretrained/get_peft_model_state_dict drops it #3306

@hlc1209

System Info

Reproduced with:

  • peft: 0.19.1
  • accelerate: 1.13.0
  • transformers: 5.10.2
  • torch: 2.12.0+cu132
  • safetensors: 0.7.0
  • Python: 3.10.20
  • Platform: Linux / WSL2, x86_64
  • CUDA available: yes, CUDA 13.2

I also reproduced this against PEFT main from source:

  • commit: aa2b673
  • version reported: 0.19.2.dev0

Who can help?

@BenjaminBossan @githubnemo

Reproduction

LoraConfig(bias="lora_only") correctly marks the wrapped base layer bias as trainable, but get_peft_model_state_dict() and save_pretrained() do not include that trained bias. Reloading the adapter therefore does not reproduce the trained model output.

Minimal reproducer:

import os
import tempfile

import torch
from torch import nn
from peft import LoraConfig, get_peft_model, PeftModel
from peft.utils import get_peft_model_state_dict
from safetensors.torch import load_file


class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 3, bias=True)

    def forward(self, x):
        return self.proj(x)


x = torch.randn(2, 4)


def run(bias):
    torch.manual_seed(123)
    model = get_peft_model(
        Toy(),
        LoraConfig(
            r=2,
            lora_alpha=2,
            target_modules=["proj"],
            bias=bias,
        ),
    )

    # Simulate training by changing all trainable tensors.
    torch.manual_seed(456)
    with torch.no_grad():
        for name, param in model.named_parameters():
            if param.requires_grad:
                param.add_(torch.randn_like(param) * 0.5)

    trainable = [name for name, param in model.named_parameters() if param.requires_grad]

    with torch.no_grad():
        ref = model(x).detach().clone()

    state_dict = get_peft_model_state_dict(model)
    state_bias_keys = [key for key in state_dict if "bias" in key]

    with tempfile.TemporaryDirectory() as tmpdir:
        model.save_pretrained(tmpdir, safe_serialization=True)

        saved = load_file(os.path.join(tmpdir, "adapter_model.safetensors"))
        saved_bias_keys = [key for key in saved if "bias" in key]

        torch.manual_seed(123)
        reloaded = PeftModel.from_pretrained(Toy(), tmpdir, is_trainable=False)

        with torch.no_grad():
            diff = (reloaded(x) - ref).abs().max().item()

    print(f"bias={bias!r}")
    print("  trainable:", trainable)
    print("  get_peft_model_state_dict bias keys:", state_bias_keys)
    print("  saved bias keys:", saved_bias_keys)
    print(f"  roundtrip max diff: {diff:.6f}")


for bias in ["none", "lora_only", "all"]:
    run(bias)

Observed output:

bias='none'
  trainable: ['base_model.model.proj.lora_A.default.weight', 'base_model.model.proj.lora_B.default.weight']
  get_peft_model_state_dict bias keys: []
  saved bias keys: []
  roundtrip max diff: 0.000000

bias='lora_only'
  trainable: ['base_model.model.proj.base_layer.bias', 'base_model.model.proj.lora_A.default.weight', 'base_model.model.proj.lora_B.default.weight']
  get_peft_model_state_dict bias keys: []
  saved bias keys: []
  roundtrip max diff: 1.215291

bias='all'
  trainable: ['base_model.model.proj.base_layer.bias', 'base_model.model.proj.lora_A.default.weight', 'base_model.model.proj.lora_B.default.weight']
  get_peft_model_state_dict bias keys: ['base_model.model.proj.base_layer.bias']
  saved bias keys: ['base_model.model.proj.base_layer.bias']
  roundtrip max diff: 0.000000

The likely cause is this line in peft/utils/save_and_load.py, which derives the bias key from each LoRA weight key:

bias_name = k.split("lora_")[0] + "bias"

For a key such as:

base_model.model.proj.lora_A.default.weight

this constructs:

base_model.model.proj.bias

but the actual trained bias key is:

base_model.model.proj.base_layer.bias

so the bias is never included in the adapter state dict.
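The mismatch can be checked with plain strings, without loading PEFT. The sketch below reuses only the quoted expression; `derived_bias_key`, `candidate_bias_keys`, and `model_keys` are hypothetical names I made up for illustration, and the two-candidate lookup is one possible direction for a fix, not the maintainers' intended one.

```python
def derived_bias_key(lora_key: str) -> str:
    # Same expression as the line quoted from save_and_load.py.
    return lora_key.split("lora_")[0] + "bias"


def candidate_bias_keys(lora_key: str) -> list[str]:
    # Hypothetical alternative: also look under the wrapped base layer.
    prefix = lora_key.split("lora_")[0]
    return [prefix + "bias", prefix + "base_layer.bias"]


lora_key = "base_model.model.proj.lora_A.default.weight"
actual_bias_key = "base_model.model.proj.base_layer.bias"

# Parameter names taken from the reproducer output, plus the base weight.
model_keys = {
    lora_key,
    "base_model.model.proj.lora_B.default.weight",
    "base_model.model.proj.base_layer.weight",
    actual_bias_key,
}

derived = derived_bias_key(lora_key)
print(derived)                # base_model.model.proj.bias
print(derived in model_keys)  # False: this key does not exist on the wrapped layer

found = [k for k in candidate_bias_keys(lora_key) if k in model_keys]
print(found)                  # ['base_model.model.proj.base_layer.bias']
```

With the current derivation the lookup misses, so nothing is added for bias="lora_only"; checking the base_layer path as well finds the trained bias.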

Expected behavior

When LoraConfig(bias="lora_only") marks a wrapped layer's base_layer.bias as trainable, that bias should be included by get_peft_model_state_dict() and saved by save_pretrained().

Reloading the saved adapter with PeftModel.from_pretrained() should reproduce the original PEFT model output, as it does with bias="all".

Alternatively, if exporting bias="lora_only" is not intended to be supported for the current tuner-layer structure, PEFT should warn or raise instead of silently dropping trained parameters.
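As a rough illustration of the warn-or-raise option, a check like the one below could compare the trainable parameter names against the exported keys. This is only a sketch: `dropped_trainable` is a hypothetical helper, not a PEFT function, and it assumes the exported LoRA keys omit the ".default" adapter-name segment (the reproducer above only prints the bias keys, so the LoRA key names in `saved_keys` are my assumption).

```python
import warnings


def dropped_trainable(trainable_names, saved_keys, adapter_name="default"):
    """Return trainable parameter names that have no counterpart in saved_keys."""
    saved = set(saved_keys)
    missing = []
    for name in trainable_names:
        # Assumption: exported keys drop the ".<adapter_name>" segment.
        stripped = name.replace(f".{adapter_name}.", ".")
        if name not in saved and stripped not in saved:
            missing.append(name)
    return missing


# Trainable names copied from the bias='lora_only' output above.
trainable = [
    "base_model.model.proj.base_layer.bias",
    "base_model.model.proj.lora_A.default.weight",
    "base_model.model.proj.lora_B.default.weight",
]
# Assumed exported keys for bias='lora_only' (no bias entry was saved).
saved_keys = [
    "base_model.model.proj.lora_A.weight",
    "base_model.model.proj.lora_B.weight",
]

missing = dropped_trainable(trainable, saved_keys)
if missing:
    warnings.warn(f"Trainable parameters not exported: {missing}")

# With the bias present, as in the bias='all' output, nothing is reported.
missing_all = dropped_trainable(
    trainable, saved_keys + ["base_model.model.proj.base_layer.bias"]
)
```

In the reproducer, the same check could be fed from model.named_parameters() and get_peft_model_state_dict(model).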
