Fix bias gradient computations #76

baberabb · 2025-11-14T18:53:14Z

Couple of bug fixes to do with bias:

The gradients were contracted over both the batch and sequence dimension (dim=(0,1)), rather than just the sequence (dim=1).
Normalize weights with Adam before concatenating bias to avoid shape mismatch ([N, O, I+1] / [O, I] division error). ~~The biases are currently concatenated raw, as I wasn't sure the best way to handle them. More in comment.~~

update:

Added bias_avg_sq field to AdafactorNormalizer and AdamNormalizer to keep track of the bias second moments so we can handle bias normalization separately from weight gradients in AdafactorNormalizer.normalize_():
- Normalize bias from raw gradient G before weight processing
- Sum bias gradients over sequence dimension
- Append normalized bias as extra column when include_bias=True

Modified GradientCollectorCallback (with help from claude):

Extract bias second moments from both adam and adafactor optimziers
added scale_by_lr(lr) method to AdafactorNormalizer (also fixes bug where optimizer state tensors were being modified in-place)
added test_optimizer_state_extraction

Also added some unit tests. #75 should probably be merged before this.

Someone better at linear algebra than me should probably have a look at this as well.

baberabb · 2025-11-14T18:56:12Z

bergson/gradients.py

-                        .unsqueeze(2)
-                        .expand(P.shape[0], -1, 1),
-                    ],
+                    [P, G.sum(dim=1).unsqueeze(2)],  # [N, S, O] -> [N, O]  # [N, O, 1]


Currently this just concatenates raw bias gradients to the normalized weights. Adam does have second moments for bias, but to use them we would need to expose them through the Normalizer. Also wasn't sure if there's a linear algebra trick I'm missing. @norabelrose

luciaquirke · 2025-11-14T22:59:11Z

This is fabulous, thank you!! 🙏 Interested to hear what Nora thinks but I reckon exposing second moments for bias through the normalizer would be great

# Conflicts: # pyproject.toml

luciaquirke · 2025-11-17T22:06:06Z

Running

pip install -e ".[dev]"
pre-commit install

Should add formatting on commit, let me know if that doesn't work for some reason

baberabb · 2025-11-17T22:52:28Z

Running
pip install -e ".[dev]"
pre-commit install
Should add formatting on commit, let me know if that doesn't work for some reason

oh yeah, it was a problem with the ruff linter. it doesn't fix line length errors (leaves that to the formatter). Will add black back.

baberabb and others added 4 commits November 11, 2025 17:31

add req

3ea0b9c

Merge branch 'EleutherAI:main' into main

02c7341

ci: add unit tests workflow and restrict build to upstream repo

e1bb103

nit

50d4d0d

baberabb commented Nov 14, 2025

View reviewed changes

baberabb added 2 commits November 17, 2025 18:56

Merge branch 'main' into ci

9a8069b

# Conflicts: # pyproject.toml

nit

78fa643

baberabb force-pushed the bias branch from 9461cbf to 4722e6e Compare November 17, 2025 14:01

use ruff-format

45841c2

baberabb force-pushed the bias branch from 6a8756a to 0a1cdb2 Compare November 17, 2025 23:43

baberabb added 4 commits November 18, 2025 05:29

fix typing

b2c1017

fix: sum bias gradients over sequence dim only, not batch + tests

40d05a4

fix: add normalizer bias support. fix trainer callback. add tests

e5f6869

add scale_by_lr method to AdafactorNormalizer

31e5008

baberabb force-pushed the bias branch from 0a1cdb2 to 31e5008 Compare November 18, 2025 00:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix bias gradient computations #76

Fix bias gradient computations #76

Uh oh!

baberabb commented Nov 14, 2025 •

edited

Loading

Uh oh!

baberabb Nov 14, 2025 •

edited

Loading

Uh oh!

luciaquirke commented Nov 14, 2025

Uh oh!

luciaquirke commented Nov 17, 2025

Uh oh!

baberabb commented Nov 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix bias gradient computations #76

Are you sure you want to change the base?

Fix bias gradient computations #76

Uh oh!

Conversation

baberabb commented Nov 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

baberabb Nov 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

luciaquirke commented Nov 14, 2025

Uh oh!

luciaquirke commented Nov 17, 2025

Uh oh!

baberabb commented Nov 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

baberabb commented Nov 14, 2025 •

edited

Loading

baberabb Nov 14, 2025 •

edited

Loading