16 changes: 16 additions & 0 deletions docs/source/_toctree.yml
@@ -214,6 +214,22 @@
  title: PSOFT
- local: package_reference/pvera
  title: PVeRA
- local: package_reference/fourierft
  title: FourierFT
- local: package_reference/frod
  title: FRoD
- local: package_reference/gralora
  title: GraLoRA
- local: package_reference/vblora
  title: VB-LoRA
- local: package_reference/hira
  title: HiRA
- local: package_reference/hra
  title: HRA
- local: package_reference/cpt
  title: CPT
- local: package_reference/trainable_tokens
  title: Trainable Tokens
- local: package_reference/randlora
  title: RandLora
- local: package_reference/road
75 changes: 75 additions & 0 deletions docs/source/package_reference/frod.md
@@ -0,0 +1,75 @@
<!--Copyright 2026 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# FRoD: Full-Rank Efficient Fine-Tuning with Rotational Degrees

FRoD is a parameter-efficient fine-tuning method that combines a shared full-rank basis with sparse learnable
rotational degrees. The adapter update is expressed through fixed projection tensors and trainable coefficients, which
allows FRoD to apply full-rank updates while keeping the number of trained parameters small.
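As a rough illustration of how a full-rank update can come from very few trained values, here is a NumPy sketch of a *toy* parameterization. It is an assumption made for this example, not the exact formulation from the paper or from PEFT: the fixed projections `u` and `vt` come from an SVD of the base weight, and the trainable core is a diagonal plus a handful of sparse off-diagonal entries. All names (`toy_frod_delta`, `lambda_l`, `lambda_s_values`) are hypothetical.

```python
import numpy as np


def toy_frod_delta(weight, lambda_l, s_rows, s_cols, lambda_s_values):
    """Toy update u @ (diag(lambda_l) + S) @ vt (an illustrative assumption, not PEFT's code).

    u and vt are fixed projections derived from the base weight; only lambda_l
    (diagonal) and lambda_s_values (sparse off-diagonal entries of S) would be trained.
    """
    u, _, vt = np.linalg.svd(weight, full_matrices=False)  # fixed, never trained
    core = np.diag(lambda_l)
    core[s_rows, s_cols] += lambda_s_values  # sparse off-diagonal coefficients
    return u @ core @ vt


rng = np.random.default_rng(0)
out_features, in_features = 64, 48
weight = rng.standard_normal((out_features, in_features))
r = min(out_features, in_features)

# Trainable diagonal coefficients
lambda_l = rng.uniform(0.5, 1.5, size=r)

# Pick a small fraction of the off-diagonal positions of the r x r core to train
sparse_rate = 0.02
off_diag = np.flatnonzero(~np.eye(r, dtype=bool))
chosen = rng.choice(off_diag, size=int(sparse_rate * off_diag.size), replace=False)
s_rows, s_cols = np.unravel_index(chosen, (r, r))
lambda_s_values = 0.01 * rng.standard_normal(chosen.size)

delta = toy_frod_delta(weight, lambda_l, s_rows, s_cols, lambda_s_values)
print(delta.shape, np.linalg.matrix_rank(delta))  # (64, 48) 48
```

In this toy, the rank of the update is set by the core: because the core is diagonally dominant it is invertible, so the update is full-rank even though only `r + 45` values would be trained.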

Paper: [Full-Rank Efficient Fine-Tuning with Rotational Degrees](https://doi.org/10.1609/aaai.v40i31.39813).

When saving the adapter parameters, you can avoid storing the projection tensors by setting
`save_projection=False` on the `FrodConfig`. In that case, the projections are regenerated from the base model weights and
the random seed set by `projection_prng_key`. This reduces checkpoint size, but the default is
`save_projection=True` so that loading a checkpoint does not depend on regenerating the projections exactly.
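The sketch below shows why regeneration is possible and why it is fragile. Under toy assumptions (SVD-based projections and a seeded random choice of sparse positions; the helper name is hypothetical and `seed` only stands in for `projection_prng_key`), the fixed tensors are a deterministic function of the base weight and the seed. In practice the regenerated tensors can still differ if the decomposition backend or random number generator changes, for example because singular vectors are only defined up to sign.

```python
import numpy as np


def regenerate_fixed_tensors(weight, seed, sparse_rate):
    """Recompute toy fixed tensors from the base weight and a seed (hypothetical helper)."""
    u, _, vt = np.linalg.svd(weight, full_matrices=False)
    r = vt.shape[0]
    rng = np.random.default_rng(seed)  # stands in for projection_prng_key
    off_diag = np.flatnonzero(~np.eye(r, dtype=bool))
    k = int(sparse_rate * off_diag.size)
    sparse_positions = np.sort(rng.choice(off_diag, size=k, replace=False))
    return u, vt, sparse_positions


weight = np.random.default_rng(0).standard_normal((32, 24))
first = regenerate_fixed_tensors(weight, seed=3, sparse_rate=0.05)
second = regenerate_fixed_tensors(weight, seed=3, sparse_rate=0.05)
other_seed = regenerate_fixed_tensors(weight, seed=4, sparse_rate=0.05)

# Same weight and seed reproduce the same tensors; a different seed picks other positions
same_positions = np.array_equal(first[2], second[2])
same_projections = np.allclose(first[0], second[0]) and np.allclose(first[1], second[1])
print(same_positions, same_projections, np.array_equal(first[2], other_seed[2]))
```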

Compared to LoRA, FRoD can express a full-rank update in each adapted linear layer while training only the diagonal
coefficients and a sparse set of off-diagonal rotation coefficients. This can be useful when a low-rank update is too
restrictive. The trade-off is that FRoD computes fixed projection tensors from the base weights during adapter
injection, which makes setup more expensive and the implementation less broadly supported than LoRA.
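A back-of-the-envelope comparison for a 768 x 768 layer (the size of `query` and `value` in `bert-base-uncased`) makes the trade-off concrete. The LoRA count uses the standard `r * (in_features + out_features)` formula. The FRoD count is an assumption for illustration only: it takes an `n x n` core with `n = min(in_features, out_features)` and trains the diagonal plus a `sparse_rate` fraction of its off-diagonal entries. The actual count in PEFT may differ, so rely on `print_trainable_parameters()` for real numbers.

```python
def lora_trainable(in_features, out_features, r):
    # LoRA trains A (r x in_features) and B (out_features x r); the update rank is at most r
    return r * (in_features + out_features)


def toy_frod_trainable(in_features, out_features, sparse_rate):
    # Assumed layout: n x n core, trainable diagonal plus a sparse_rate fraction of off-diagonal entries
    n = min(in_features, out_features)
    return n + int(sparse_rate * (n * n - n))


print(lora_trainable(768, 768, r=8))  # 12288 parameters, update rank <= 8
print(toy_frod_trainable(768, 768, sparse_rate=0.02))  # 12549 parameters, update rank up to 768
```

Under these assumptions the budgets are similar, but the rank ceiling differs by two orders of magnitude.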

Projection initialization can be slow on large models because FRoD runs matrix decompositions over the target module
categories before injecting the adapters. A progress bar is shown by default and can be disabled with
`FrodConfig(progressbar=False)`.

For memory-constrained training, `runtime_offload_base_weight=True` keeps target base weights on CPU when the active
FRoD path does not need them. This is opt-in because PEFT methods usually keep all base parameters on the accelerator
after moving the model and after forward passes.

FRoD currently has the following constraint:

- Only `nn.Linear` and `transformers.pytorch_utils.Conv1D` layers are supported.
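The two supported layer types compute the same affine map but store their weights with opposite orientations: `nn.Linear` keeps `weight` as `(out_features, in_features)` and computes `x @ weight.T + bias`, while `transformers.pytorch_utils.Conv1D` (used by GPT-2-style models) keeps `(in_features, out_features)` and computes `x @ weight + bias`. The NumPy sketch below checks this equivalence; any method that derives projections from the base weight has to account for the orientation. How FRoD handles this internally is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))  # batch of 4 inputs, in_features=16
w_linear = rng.standard_normal((8, 16))  # nn.Linear layout: (out_features, in_features)
bias = rng.standard_normal(8)

y_linear = x @ w_linear.T + bias  # nn.Linear: x W^T + b

w_conv1d = w_linear.T  # Conv1D layout: (in_features, out_features)
y_conv1d = x @ w_conv1d + bias  # Conv1D: x W + b

print(np.allclose(y_linear, y_conv1d))  # True
```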

## Quickstart

```python
from transformers import AutoModelForSequenceClassification

from peft import FrodConfig, TaskType, get_peft_model

model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-uncased", num_labels=2)

peft_config = FrodConfig(
    task_type=TaskType.SEQ_CLS,
    target_modules=["query", "value"],
    modules_to_save=["classifier"],
    sparse_rate=0.02,
    frod_dropout=0.0,
    runtime_offload_base_weight=True,
)

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
```

## FrodConfig

[[autodoc]] tuners.frod.config.FrodConfig

## FrodModel

[[autodoc]] tuners.frod.model.FrodModel
27 changes: 27 additions & 0 deletions examples/frod_finetuning/README.md
@@ -0,0 +1,27 @@
# FRoD fine-tuning examples

These examples show minimal FRoD fine-tuning with the Transformers `Trainer`.

Install the example dependencies and run either script directly:

```bash
pip install -r examples/frod_finetuning/requirements.txt
python examples/frod_finetuning/frod_text_classification.py
python examples/frod_finetuning/frod_image_classification.py
```

The text example fine-tunes `google-bert/bert-base-uncased` on `nyu-mll/glue` with the `sst2` configuration. The image
example fine-tunes `openai/clip-vit-base-patch32` on the train and test parquet splits from `tanganke/stanford_cars`.

Both scripts use separate optimizer learning rates for FRoD diagonal coefficients, FRoD sparse coefficients, and the
classification head. FRoD dropout is set to `0.0` because the sparse rotational parameterization is the main
regularizer in these examples.
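When a custom optimizer is passed to `Trainer`, any trainable parameter that is not in one of its parameter groups is never updated. The pure-Python sketch below (the helper and the parameter names are hypothetical) mirrors the substring matching used by the example scripts and reports trainable names that match no group, which is a cheap sanity check before training. The learning rates are the image example's defaults.

```python
def group_by_substring(trainable_names, lr_by_key):
    """Assign each name to the first key it contains; collect names that match no key."""
    groups = {key: [] for key in lr_by_key}
    unmatched = []
    for name in trainable_names:
        for key in lr_by_key:
            if key in name:
                groups[key].append(name)
                break
        else:
            unmatched.append(name)
    return groups, unmatched


lr_by_key = {"frod_lambda_l": 5e-4, "frod_lambda_s_values": 5e-5, "classifier": 1e-4}
names = [  # hypothetical names of trainable parameters
    "encoder.layer.0.attention.self.query.frod_lambda_l.default",
    "encoder.layer.0.attention.self.query.frod_lambda_s_values.default",
    "classifier.modules_to_save.default.weight",
    "classifier.modules_to_save.default.bias",
    "pooler.dense.weight",
]
groups, unmatched = group_by_substring(names, lr_by_key)
print({key: len(group) for key, group in groups.items()})  # {'frod_lambda_l': 1, 'frod_lambda_s_values': 1, 'classifier': 2}
print(unmatched)  # ['pooler.dense.weight']
```

A non-empty `unmatched` list would signal a trainable parameter that the custom optimizer silently ignores.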

To use local mirrors of the image model or dataset, pass the paths as CLI arguments:

```bash
python examples/frod_finetuning/frod_image_classification.py \
--model_name_or_path /path/to/local/clip-vit-model \
--data_dir /path/to/local/stanford_cars \
--output_dir clip-vit-local-frod-stanford-cars
```
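The image script looks for exactly two train and two test parquet shards under `<data_dir>/data/`, named as in the `tanganke/stanford_cars` repository (for example `train-00000-of-00002.parquet`). A small helper like the hypothetical one below can confirm that a local mirror has that layout before launching a run.

```python
import os
import tempfile


def expected_stanford_cars_files(data_dir):
    """Return the split -> parquet paths the script reads, raising if any shard is missing."""
    files = {
        split: [os.path.join(data_dir, "data", f"{split}-{i:05d}-of-00002.parquet") for i in range(2)]
        for split in ("train", "test")
    }
    missing = [path for paths in files.values() for path in paths if not os.path.isfile(path)]
    if missing:
        raise FileNotFoundError(f"Missing parquet shards: {missing}")
    return files


# Demo on a throwaway directory containing empty placeholder shards
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "data"))
    for split in ("train", "test"):
        for i in range(2):
            open(os.path.join(tmp, "data", f"{split}-{i:05d}-of-00002.parquet"), "w").close()
    files = expected_stanford_cars_files(tmp)
    print([os.path.basename(path) for path in files["train"]])  # ['train-00000-of-00002.parquet', 'train-00001-of-00002.parquet']
```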
180 changes: 180 additions & 0 deletions examples/frod_finetuning/frod_image_classification.py
@@ -0,0 +1,180 @@
# Copyright 2026-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");

import os
from dataclasses import dataclass, field
from typing import Optional

import numpy as np
import torch
from datasets import load_dataset
from transformers import (
    AutoImageProcessor,
    AutoModelForImageClassification,
    HfArgumentParser,
    Trainer,
    TrainingArguments,
)

from peft import FrodConfig, get_peft_model


@dataclass
class FrodImageArguments:
    model_name_or_path: str = field(
        default="openai/clip-vit-base-patch32",
        metadata={"help": "Model checkpoint used for image classification."},
    )
    data_dir: Optional[str] = field(
        default=None,
        metadata={"help": "Optional local Stanford Cars dataset directory containing the parquet data files."},
    )
    target_modules: list[str] = field(
        default_factory=lambda: ["q_proj", "k_proj", "v_proj", "out_proj", "fc1", "fc2"],
        metadata={"help": "Module names to replace with FRoD adapters."},
    )
    sparse_rate: float = field(
        default=0.01,
        metadata={"help": "Fraction of off-diagonal entries trained in the sparse FRoD matrix."},
    )
    frod_dropout: float = field(
        default=0.0,
        metadata={"help": "Dropout probability applied before the FRoD adapter branch."},
    )
    frod_lambda_l_lr: float = field(
        default=5e-4,
        metadata={"help": "Learning rate for the trainable diagonal FRoD coefficients."},
    )
    frod_lambda_s_lr: float = field(
        default=5e-5,
        metadata={"help": "Learning rate for the trainable sparse FRoD coefficients."},
    )
    classifier_lr: float = field(default=1e-4, metadata={"help": "Learning rate for the classification head."})
    projection_prng_key: int = field(default=3, metadata={"help": "Random seed used for FRoD projection masks."})
    runtime_offload_base_weight: bool = field(
        default=False,
        metadata={"help": "Keep target base weights on CPU when active FRoD training does not need them."},
    )


@dataclass
class FrodImageTrainingArguments(TrainingArguments):
    output_dir: str = "clip-vit-base-patch32-frod-stanford-cars"
    learning_rate: float = 5e-4
    per_device_train_batch_size: int = 64
    per_device_eval_batch_size: int = 64
    num_train_epochs: float = 3
    eval_strategy: str = "epoch"
    save_strategy: str = "epoch"
    load_best_model_at_end: bool = True
    metric_for_best_model: str = "accuracy"
    lr_scheduler_type: str = "constant"
    remove_unused_columns: bool = False
    report_to: str = "none"


def main():
    parser = HfArgumentParser((FrodImageArguments, FrodImageTrainingArguments))
    frod_args, training_args = parser.parse_args_into_dataclasses()

    if frod_args.data_dir:
        data_files = {
            "train": [
                os.path.join(frod_args.data_dir, "data", "train-00000-of-00002.parquet"),
                os.path.join(frod_args.data_dir, "data", "train-00001-of-00002.parquet"),
            ],
            "test": [
                os.path.join(frod_args.data_dir, "data", "test-00000-of-00002.parquet"),
                os.path.join(frod_args.data_dir, "data", "test-00001-of-00002.parquet"),
            ],
        }
    else:
        data_files = {
            "train": [
                "hf://datasets/tanganke/stanford_cars/data/train-00000-of-00002.parquet",
                "hf://datasets/tanganke/stanford_cars/data/train-00001-of-00002.parquet",
            ],
            "test": [
                "hf://datasets/tanganke/stanford_cars/data/test-00000-of-00002.parquet",
                "hf://datasets/tanganke/stanford_cars/data/test-00001-of-00002.parquet",
            ],
        }

    dataset = load_dataset("parquet", data_files=data_files)
    train_split = dataset["train"]
    eval_split = dataset["test"]
    image_processor = AutoImageProcessor.from_pretrained(frod_args.model_name_or_path)
    label_feature = train_split.features["label"]
    label_names = (
        label_feature.names if hasattr(label_feature, "names") else [str(i) for i in sorted(set(train_split["label"]))]
    )
    id2label = dict(enumerate(label_names))
    label2id = {name: idx for idx, name in id2label.items()}

    model = AutoModelForImageClassification.from_pretrained(
        frod_args.model_name_or_path,
        num_labels=len(label_names),
        id2label=id2label,
        label2id=label2id,
        ignore_mismatched_sizes=True,
    )
    peft_config = FrodConfig(
        target_modules=frod_args.target_modules,
        modules_to_save=["classifier"],
        frod_dropout=frod_args.frod_dropout,
        sparse_rate=frod_args.sparse_rate,
        projection_prng_key=frod_args.projection_prng_key,
        runtime_offload_base_weight=frod_args.runtime_offload_base_weight,
    )
    model = get_peft_model(model, peft_config)
    model.print_trainable_parameters()

    def transform(batch):
        # Convert to RGB (some images may be grayscale) and attach labels for the Trainer
        images = [image.convert("RGB") for image in batch["image"]]
        inputs = image_processor(images, return_tensors="pt")
        inputs["labels"] = batch["label"]
        return inputs

    train_dataset = train_split.with_transform(transform)
    eval_dataset = eval_split.with_transform(transform)

    def collate_fn(examples):
        pixel_values = torch.stack([example["pixel_values"] for example in examples])
        labels = torch.tensor([example["labels"] for example in examples])
        return {"pixel_values": pixel_values, "labels": labels}

    def compute_metrics(eval_pred):
        predictions = np.argmax(eval_pred.predictions, axis=-1)
        return {"accuracy": (predictions == eval_pred.label_ids).mean().item()}

    # Only pass trainable parameters to the optimizer: with modules_to_save, the frozen
    # original copy of the classifier also contains "classifier" in its name.
    trainable = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(
        [
            {
                "params": [p for n, p in trainable if "frod_lambda_l" in n],
                "lr": frod_args.frod_lambda_l_lr,
            },
            {
                "params": [p for n, p in trainable if "frod_lambda_s_values" in n],
                "lr": frod_args.frod_lambda_s_lr,
            },
            {"params": [p for n, p in trainable if "classifier" in n], "lr": frod_args.classifier_lr},
        ]
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        data_collator=collate_fn,
        compute_metrics=compute_metrics,
        optimizers=(optimizer, None),
    )
    trainer.train()
    trainer.evaluate()
    model.save_pretrained(training_args.output_dir)


if __name__ == "__main__":
    main()