add glora by not-lain · Pull Request #3098 · huggingface/peft

not-lain · 2026-03-13T01:57:20Z

Follow-up to #780 and #2568.
This PR adds GLoRA to the library.
I also made a small Colab notebook for testing this remotely, which you can find here.


BenjaminBossan

Thanks for reviving the GLoRA implementation to PEFT. There is still a lot missing (docs, examples, more tests) but let's focus on the integration for now and work on the rest later.

Unfortunately, this PR seems to be based on the state that PEFT was in when the PR was first suggested. We made several refactors since then, which require different patterns to implement. The good news is that the final result should be much simpler with less code required. I marked the corresponding parts, please check. Maybe this is even something that a coding agent can do if asked to update the implementation and pointed to the most recent PEFT code for reference.

BenjaminBossan · 2026-03-16T16:12:15Z

+
+
+# Refactored GLoraLinear for PEFT compatibility
+class GLoraLinear(GLoraLayer, nn.Linear):


You probably based this PR on the very old original PR. Since then, we had several refactors in PEFT, some of which affect how we designed the adapter layers. Could you please check the latest implementation in tuners/lora/layer or PRs of other recent PEFT additions (e.g. #2851)? Most notably, we now pass the base_layer (i.e. the original layer) and wrap it inside the PEFT layer. To get the results of the base layer, we can then call self.base_layer(x) in the forward call (no need for F.linear(x, self.weight)).
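A minimal, framework-free sketch of the wrapping pattern described in the comment above. `ToyLinear` and `ToyAdapterLayer` are hypothetical stand-ins, not PEFT classes, and scalars replace tensors so the snippet runs without PyTorch. It only illustrates that the adapter keeps a reference to the original layer and delegates to it, instead of inheriting from it and recomputing the base output from `self.weight`.

```python
class ToyLinear:
    """Hypothetical stand-in for a frozen base layer: y = w * x + b."""

    def __init__(self, w: float, b: float) -> None:
        self.w = w
        self.b = b

    def __call__(self, x: float) -> float:
        return self.w * x + self.b


class ToyAdapterLayer:
    """Wraps the base layer instead of subclassing it."""

    def __init__(self, base_layer: ToyLinear, delta: float = 0.0) -> None:
        self.base_layer = base_layer  # the original layer, kept intact
        self.delta = delta  # hypothetical adapter weight

    def get_base_layer(self) -> ToyLinear:
        return self.base_layer

    def forward(self, x: float) -> float:
        # The base output comes from the wrapped layer itself.
        result = self.base_layer(x)
        return result + self.delta * x


base = ToyLinear(w=2.0, b=1.0)
layer = ToyAdapterLayer(base, delta=0.5)
print(layer.forward(3.0))  # 2*3 + 1 + 0.5*3 = 8.5
```

With `delta=0.0` the wrapper reproduces the base layer exactly, which is the behavior one would expect from a freshly initialized adapter in this toy setup.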

BenjaminBossan · 2026-03-16T16:18:20Z

+                m.bias.requires_grad = True
+
+
+class GLoraModel(BaseTuner):


Similar argument as for GLoraLayer: We have substantially refactored this part of PEFT. The good news is that it should greatly simplify the overall implementation: You only need to define _create_and_replace and _create_new_module, the remaining methods should all be fine when inherited from the parent class. Moreover, we need these class attributes:

prefix: str = "glora_"

tuner_layer_cls = GloraLayer

target_module_mapping = TRANSFORMERS_MODELS_TO_GLORA_TARGET_MODULES_MAPPING

BenjaminBossan · 2026-03-16T16:24:27Z

+        self.glora_Cu: nn.ParameterDict = nn.ParameterDict()
+        self.glora_D: nn.ParameterDict = nn.ParameterDict()
+        self.glora_E: nn.ParameterDict = nn.ParameterDict()
+        self.eval_config: dict[str, dict[str, object]] = {}


Why do we need this? IIUC, this is supposed to be an object that unifies the settings for each parameter, as we have distinct options like "vector" or "constant". I haven't checked all the details, but ideally we would just define during the layer initialization something like:

if config_A_B == "lora":
    self.glora_Ad[adapter_name] = nn.Linear(...)
elif config_A_B == "vector":
    self.glora_Ad[adapter_name] = ...

The arguments config_A_B etc. should be passed to the __init__ and update_layer methods, directly coming from the GloraConfig.

This change may require some custom nn.Modules but that would be fine. I would really like to avoid the whole prepare_path call during forward and frontload the whole resolution to the initialization. This makes the forward/merge/unmerge calls simpler to understand and should also be slightly more performant.

This comment is still relevant.
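The idea of resolving each path option once at initialization could look roughly like the sketch below. Everything here is hypothetical: the class names, the `PATH_BUILDERS` table, and the meaning given to each option are assumptions for illustration, plain Python lists stand in for tensors, and the "lora" option is left out for brevity.

```python
from typing import Callable, Dict, List

Vector = List[float]


class NonePath:
    """Disabled path: contributes nothing."""

    def __call__(self, h: Vector) -> Vector:
        return [0.0] * len(h)


class ConstantPath:
    """Assumed meaning of "constant": one scalar shared by all features."""

    def __init__(self) -> None:
        self.value = 0.0

    def __call__(self, h: Vector) -> Vector:
        return [self.value * v for v in h]


class VectorPath:
    """Assumed meaning of "vector": one scalar per feature."""

    def __init__(self, size: int) -> None:
        self.values = [0.0] * size

    def __call__(self, h: Vector) -> Vector:
        return [s * v for s, v in zip(self.values, h)]


# Option string -> builder; consulted only when an adapter is added.
PATH_BUILDERS: Dict[str, Callable[[int], Callable[[Vector], Vector]]] = {
    "none": lambda size: NonePath(),
    "constant": lambda size: ConstantPath(),
    "vector": lambda size: VectorPath(size),
}


class ToyLayer:
    def __init__(self) -> None:
        self.paths: Dict[str, Callable[[Vector], Vector]] = {}

    def update_layer(self, adapter_name: str, config_path: str, size: int) -> None:
        if config_path not in PATH_BUILDERS:
            raise ValueError(f"Unknown path option: {config_path!r}")
        self.paths[adapter_name] = PATH_BUILDERS[config_path](size)

    def forward(self, adapter_name: str, h: Vector) -> Vector:
        # No string comparisons here: the path object was chosen at init time.
        return self.paths[adapter_name](h)


toy = ToyLayer()
toy.update_layer("default", "vector", size=3)
toy.paths["default"].values = [1.0, 2.0, 3.0]
print(toy.forward("default", [1.0, 1.0, 1.0]))  # [1.0, 2.0, 3.0]
```

Because the dispatch happens in `update_layer`, an invalid option fails when the adapter is created rather than on the first forward call.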


BenjaminBossan · 2026-03-23T13:04:16Z

Note that due to another PEFT method being merged, there is now a merge conflict, but it should be straightforward to resolve. Once you're finished, please ping me for another review.

- Updated GloraLayer to inherit from BaseTunerLayer
- Enhanced adapter management with new methods and improved type checks.
- Refined initialization to ensure compatibility with nn.Linear layers only.
- Adjusted merge logic to handle weight and bias more robustly.
- Updated tests to skip unsupported quantized layers.

not-lain · 2026-04-20T00:38:34Z

@BenjaminBossan
I made the changes mentioned above, plus a couple of extra changes to match test_custom_models.
I would appreciate it if you could take a look at this PR.

BenjaminBossan

Thanks for the latest updates. I found that there is still some work left to make GLoRA consistent with other PEFT methods. Please check my comments and don't hesitate to ask if something is unclear.

Also note that in the meantime, we have updated the contribution guidelines for adding new PEFT methods: https://github.com/huggingface/peft/blob/main/docs/source/developer_guides/contributing.md#add-a-new-peft-fine-tuning-method. You already tick many boxes, but please refer to this when it comes to what's still missing.

BenjaminBossan · 2026-04-21T13:18:58Z

+        self.out_features: int
+        base = self.get_base_layer()
+        # Use exact type check: bitsandbytes.Linear4bit subclasses nn.Linear but is not compatible with GLORA math.
+        if type(base) is not nn.Linear:


Let's remove this. The check for the correct type should happen in _create_new_module.
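A small sketch of what moving the check into the module factory might look like. `Linear`, `Linear4bit`, `ToyGloraLinear`, and `create_new_module` are hypothetical stand-ins so the snippet runs without PyTorch or bitsandbytes, and raising `ValueError` for unsupported targets is an assumption, not necessarily what PEFT does.

```python
class Linear:
    """Hypothetical stand-in for nn.Linear."""


class Linear4bit(Linear):
    """Hypothetical stand-in for a quantized subclass of Linear."""


class ToyGloraLinear:
    """Hypothetical adapter layer; it no longer validates the base type itself."""

    def __init__(self, base_layer: Linear) -> None:
        self.base_layer = base_layer


def create_new_module(target: object) -> ToyGloraLinear:
    # Exact type check: isinstance() would also accept subclasses such as
    # Linear4bit, which the diff comment describes as incompatible.
    if type(target) is not Linear:
        raise ValueError(
            f"Target module of type {type(target).__name__} is not supported."
        )
    return ToyGloraLinear(target)


print(isinstance(Linear4bit(), Linear))  # True
print(type(Linear4bit()) is Linear)      # False
module = create_new_module(Linear())
```

Keeping the check in the factory means the layer class stays free of validation logic, and unsupported targets are rejected before any adapter object is built.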

BenjaminBossan · 2026-04-21T13:36:12Z

+            "config_D_E": peft_config.config_D_E,
+        }
+        new_module = GloraLinear(target, **kwargs_glora)
+        new_module.add_adapter(


Again, don't call add_adapter here, new_module = GloraLinear(target, **kwargs_glora) should be enough, as GloraLinear should call self.update_layer.

Also note that, as mentioned earlier, we did a small refactor here, so the peft_config should be passed to GloraLinear's __init__.
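The two points above can be sketched together as follows. This is a toy with hypothetical names (`ToyConfig`, `ToyGloraLinear`, `create_new_module`), not the PEFT API: the constructor receives the config and registers the first adapter itself through `update_layer`, so the factory never calls `add_adapter`.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ToyConfig:
    """Hypothetical stand-in for the adapter config."""

    r: int = 8
    config_A_B: str = "lora"


class ToyGloraLinear:
    def __init__(self, base_layer: object, adapter_name: str, config: ToyConfig) -> None:
        self.base_layer = base_layer
        self.r: Dict[str, int] = {}
        self.path_kind: Dict[str, str] = {}
        # The layer sets up its own first adapter.
        self.update_layer(adapter_name, config)

    def update_layer(self, adapter_name: str, config: ToyConfig) -> None:
        self.r[adapter_name] = config.r
        self.path_kind[adapter_name] = config.config_A_B


def create_new_module(config: ToyConfig, adapter_name: str, target: object) -> ToyGloraLinear:
    # Constructing the layer is enough; no separate add_adapter call.
    return ToyGloraLinear(target, adapter_name, config=config)


new_module = create_new_module(ToyConfig(r=4, config_A_B="vector"), "default", object())
print(new_module.r, new_module.path_kind)
```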

BenjaminBossan · 2026-04-21T13:53:18Z

        _skip_if_merging_not_supported(test_name, config_cls, config_kwargs_1)
        _skip_tests_with_multiple_adapters_with_target_parameters(config_cls, config_kwargs_2)

+        config_1 = config_cls(**config_kwargs_1)


This shouldn't be necessary. Why do the results differ?

not-lain · 2026-05-31 (edited)

The issue is that GLoRA's merge is multiplicative, not purely additive. During a forward pass with multiple active adapters, each adapter computes its delta against the original frozen W0, which gives:

result = W0@x + (W0*A1 + B1)@x + (W0*A2 + B2)@x

But sequential merging mutates W0 in place between adapters:

after adapter1: W0' = W0 + W0*A1 + B1

after adapter2: W0'' = W0' + W0'*A2 + B2

The second merge uses the already-modified W0', so W0'*A2 expands to (W0 + W0*A1 + B1)*A2 instead of the original W0*A2.

LoRA doesn't have this issue because its merge is purely additive (W0 + delta), so I chose to skip the test for GLoRA for now.
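The mismatch can be checked numerically with the formulas above. The sketch below assumes scalar weights in place of matrices (so `*` and `@` coincide); the function names are hypothetical. It compares the unmerged forward pass, a sequential merge, and a joint merge that sums the adapters against the original W0.

```python
from typing import List, Tuple

Adapter = Tuple[float, float]  # (A, B) for one adapter


def forward_unmerged(w0: float, adapters: List[Adapter], x: float) -> float:
    # Every adapter computes its delta against the original w0.
    out = w0 * x
    for a, b in adapters:
        out += (w0 * a + b) * x
    return out


def merge_sequential(w0: float, adapters: List[Adapter]) -> float:
    # Each merge sees the weight already modified by the previous one.
    w = w0
    for a, b in adapters:
        w = w + w * a + b
    return w


def merge_joint(w0: float, adapters: List[Adapter]) -> float:
    # Combine all adapters first, then apply them once to the original w0.
    a_sum = sum(a for a, _ in adapters)
    b_sum = sum(b for _, b in adapters)
    return w0 + w0 * a_sum + b_sum


w0, x = 2.0, 1.0
adapters = [(0.5, 1.0), (0.25, 3.0)]
print(forward_unmerged(w0, adapters, x))   # 7.5
print(merge_sequential(w0, adapters) * x)  # 8.0
print(merge_joint(w0, adapters) * x)       # 7.5
```

The extra 0.5 in the sequential result is exactly the cross term (W0*A1 + B1)*A2 = 2.0 * 0.25.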

BenjaminBossan · 2026-06-01T12:36:47Z

So you're saying that if I have a GLoRA model with two active adapters, it behaves differently when merged vs unmerged? I think this should be treated like a bug. IIUC, it should be possible to implement the merge so that it gives the same result by not merging sequentially but combining the adapters before merging.

That said, if a user wants to merge adapter 1 first and then adapter 2 in a separate call, that would indeed require extra logic, as the results of naively merging the 2nd adapter would not be correct. I see two solutions here:

1. First unmerge adapter 1, then merge 1 and 2 together.

2. When merging, check if there is already a merged adapter and raise an error, telling users they need to merge all adapters in one step.

WDYT?
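Both options can be sketched on a toy layer. The assumptions are loud ones: scalar weights, a hypothetical `ToyMergeLayer` class, a `strict` flag invented here to switch between the two behaviors, and a stored copy of the original weight to make unmerging trivial (a real layer would more likely reconstruct it). This is not the code from the PR.

```python
from typing import Dict, List, Tuple


class ToyMergeLayer:
    def __init__(self, w0: float, adapters: Dict[str, Tuple[float, float]]) -> None:
        self.w0 = w0              # original weight, kept only to simplify unmerge
        self.weight = w0          # weight currently used by forward
        self.adapters = adapters  # name -> (A, B)
        self.merged_adapters: List[str] = []

    def _joint_delta(self, names: List[str]) -> float:
        a_sum = sum(self.adapters[n][0] for n in names)
        b_sum = sum(self.adapters[n][1] for n in names)
        return self.w0 * a_sum + b_sum

    def unmerge(self) -> None:
        self.weight = self.w0
        self.merged_adapters = []

    def merge(self, names: List[str], strict: bool = False) -> None:
        if self.merged_adapters:
            if strict:
                # Option 2: refuse and ask for a single merge call.
                raise RuntimeError(
                    "An adapter is already merged; merge all adapters in one step."
                )
            # Option 1: unmerge first, then merge old and new adapters together.
            names = self.merged_adapters + [
                n for n in names if n not in self.merged_adapters
            ]
            self.unmerge()
        self.weight = self.w0 + self._joint_delta(names)
        self.merged_adapters = list(names)


merge_layer = ToyMergeLayer(2.0, {"a1": (0.5, 1.0), "a2": (0.25, 3.0)})
merge_layer.merge(["a1"])
merge_layer.merge(["a2"])  # triggers unmerge + joint merge
print(merge_layer.weight)  # 7.5, same as merging both at once
```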

not-lain

I chose to unmerge adapters before merging, since I didn't want to introduce an edge case for GLoRA.
You can review the related changes in 43ceaee.

BenjaminBossan

Thanks for the updates. I still found a couple of issues, but the integration is close to being ready. Please take a look.


HuggingFaceDocBuilderDev · 2026-06-05T09:59:11Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

BenjaminBossan · 2026-06-05T10:20:30Z

@not-lain Your latest changes look good. I wanted to run the existing tests to see if there are any issues but the linter is complaining. Could you please run make style?

If the tests pass, the next step would be to complete the "Full PR" according to the contribution guideline.

not-lain · 2026-06-05T15:46:46Z

Thanks a lot for the follow-up. I tried addressing the rest of the points from the guideline. Could you try running the CI again and let me know if you have any feedback?

BenjaminBossan

Thank you for adding the docs, examples, and other little changes. Some tests are failing, but only because the error message doesn't quite match, which should be easy to fix.

To complete the PR, could you please also add the tests to the other test files? test_config.py, test_encoder_decoder_models.py, test_feature_extraction_models.py, test_seq_classifier.py.

Also, we recently added a new benchmark for image generation similar to the MetaMath benchmark. It would be a nice addition to add an experiment for that as well.

BenjaminBossan · 2026-06-08T09:34:16Z

+- GLoRA is a superset of LoRA: setting all paths to "lora" recovers standard LoRA.
+- You can use different path types for A/B/C/D/E to experiment with new adaptation strategies.
+- GLoRA supports all standard PEFT adapter management features (add, delete, switch, merge, etc).
+- For unsupported module types, set `target_modules` to linear projections only.


Could you please elaborate on this comment?

BenjaminBossan · 2026-06-08T09:35:13Z

+    import argparse
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--base_model", type=str, default="path/to/model")


How about removing the invalid default value?
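One possible way to drop the placeholder default is to make the argument required, so that a missing value fails at parse time instead of surfacing later as a model-loading error. Using `required=True` is a choice made for this sketch, not something the review prescribes, and the model id passed below is a made-up placeholder.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # No default: the user must name a real model explicitly.
    parser.add_argument(
        "--base_model",
        type=str,
        required=True,
        help="Model id or local path of the base model",
    )
    return parser


# Passing the arguments explicitly keeps the sketch runnable as-is.
args = build_parser().parse_args(["--base_model", "some-org/some-model"])
print(args.base_model)
```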

BenjaminBossan · 2026-06-08T09:36:28Z

                "machine": platform.machine(),
                "processor": platform.processor(),
-                "accelerator": torch_accelerator_module.get_device_name(0) if torch_accelerator_module.is_available() else "N/A",
+                "accelerator": torch_accelerator_module.get_device_name(0)


Please undo unrelated changes in this file.

BenjaminBossan · 2026-06-08T10:04:07Z

I checked the example and it runs.

BenjaminBossan · 2026-06-08T10:05:29Z

Ran the experiment locally. It got 42% test accuracy, which is not top scoring, but expected given the relatively low rank. We can leave it as is and check for better settings in the future.

BenjaminBossan · 2026-06-08T10:20:02Z

+
+# GLoRA
+
+Generalized Low-Rank Adaptation ([GLoRA](https://huggingface.co/papers/2306.07967)) is a PEFT method that generalizes LoRA and related approaches. GLoRA decomposes updates into configurable paths (A, B, C, D, E), where each path can use low-rank, vector, constant, or disabled parameterization depending on the path.


It would be great if you could quickly explain what the four options ("lora", "vector", "constant", "none") mean. Let's also mention that they trade off number of params vs expressiveness.
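To make the parameter-count side of that trade-off concrete, here is a rough sketch. The interpretation of each option is an assumption made for illustration ("lora" as two low-rank factors, "vector" as one value per output feature, "constant" as a single scalar, "none" as no parameters); the actual shapes in GLoRA depend on the path and may differ. `path_param_count` is a hypothetical helper.

```python
def path_param_count(kind: str, in_features: int, out_features: int, r: int) -> int:
    """Trainable parameters of one path under the assumptions stated above."""
    if kind == "lora":
        # Two factors of shape (out_features, r) and (r, in_features).
        return r * (in_features + out_features)
    if kind == "vector":
        return out_features
    if kind == "constant":
        return 1
    if kind == "none":
        return 0
    raise ValueError(f"Unknown path option: {kind!r}")


for kind in ("lora", "vector", "constant", "none"):
    print(kind, path_param_count(kind, in_features=768, out_features=768, r=8))
# lora 12288
# vector 768
# constant 1
# none 0
```

Under these assumptions the options span several orders of magnitude in size for a 768 x 768 layer, which is the sense in which they trade parameter count against expressiveness.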

BenjaminBossan · 2026-06-08T10:23:54Z

I got 36% test accuracy with this one. Given the very small number of trainable parameters, I think it's a nice result.

not-lain added 7 commits March 6, 2026 04:22

add glora

2b76146

update glora layer to match paper implementation, and update tests

6108568

add missing inference_mode parameter in set_adapter method

ff1ded8

add missing safe_merge parameter in merge_and_unload method

b0dd263

fix parameter names in _create_new_module

7a7a820

update _replace_module, _mark_only_adapters_as_trainable and _prepare_adapter_config signature (for linting reasons)

14a2400

fix circular import

c1ed1c3

BenjaminBossan requested changes Mar 16, 2026


not-lain and others added 4 commits March 17, 2026 07:55

Update docs/source/package_reference/glora.md

f3272cb

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

GLora -> Glora

a1ecaff

r = 8

b1c0ff7

lower case config params "LoRA" -> "lora"

324173a

not-lain added 10 commits March 23, 2026 15:08

Merge branch 'main' of github.com:huggingface/peft into glora2

894a095

update metadata

912efe4

use literal instead of str in config

a2601a3

remove redundant allow_lora parameter

05f120d

add a reference to the paper

3e4b8c1

add target_module_mapping attribute

a6c9f57

use testing_common and fix all issues from failed tests

45009b0

initialize glora params directly and update readme

9f8352e

lint

93793b1

BenjaminBossan requested changes Apr 21, 2026


not-lain added 4 commits May 3, 2026 15:53

Merge upstream/main into glora2

55c04fa

better

0e9b580

reset testing_comming

8dfdbc6

Merge remote-tracking branch 'upstream/main' into glora2

b13e440

BenjaminBossan requested changes Jun 1, 2026


not-lain added 10 commits June 1, 2026 20:31

unformatt utils file

7b3201d

GloraLoRAPath -> GloraLoraPath

7388d9c

add reset_parameters method

b03ce91

update imports

695cbb3

better lint

26174e6

avoid unnecessary clones

bd5bba4

add mixed adapter support

66c57fa

less gpu overhead

27f8042

remove unecessary testing function

236f673

fix sequential merge

43ceaee

not-lain requested a review from BenjaminBossan June 3, 2026 07:55

not-lain added 4 commits June 5, 2026 15:42

lint

caa20ab

add runnable example

cfe506a

add method comparison bench

7aa118f

nits

fcdb0d1

BenjaminBossan requested changes Jun 8, 2026
