@susanbao susanbao commented Oct 21, 2025

Transfers code, mainly for host offloading, from g3 to GitHub:

  • b/816371919, Misc fixes to maxdiffusion: (1) shard the hidden state input to the model; (2) factor in the different flash block sizes when estimating the final padding for q/k/v.
  • b/817363551, Fix fused splash kernel configuration
  • b/817366090, Add remat policy for hidden states and attention activations
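
For reviewers who want a concrete picture of the padding estimate in the first item, here is a minimal sketch. It assumes that q and k/v are each padded up to a multiple of every block size that applies to them (so the least common multiple of those sizes). The function names and the example block sizes are hypothetical and are not taken from the maxdiffusion code.

```python
import math


def padded_length(seq_len: int, block_sizes: list[int]) -> int:
    """Round seq_len up to a multiple of every block size (assumed rule)."""
    multiple = math.lcm(*block_sizes)
    # Ceiling division, then scale back up to the padded length.
    return -(-seq_len // multiple) * multiple


def estimate_padding(q_len: int, kv_len: int,
                     q_blocks: list[int], kv_blocks: list[int]) -> dict:
    """Return the number of padded positions for q and for k/v."""
    return {
        "q_pad": padded_length(q_len, q_blocks) - q_len,
        "kv_pad": padded_length(kv_len, kv_blocks) - kv_len,
    }


# Hypothetical block sizes, chosen only for illustration.
example = estimate_padding(q_len=1000, kv_len=300,
                           q_blocks=[512, 256], kv_blocks=[128])
print(example)  # {'q_pad': 24, 'kv_pad': 84}
```

If the kernel instead pads each tensor to its largest block size only, replace `math.lcm` with `max`; for power-of-two block sizes the two give the same result.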

Tested on v5p-8:

  • FULL Remat, Step time 36718.2 ms, Peak Memory 49.37 GB
  • HIDDEN_STATE_WITH_OFFLOAD Remat, Step time 32776.9 ms, Peak Memory 48.72 GB
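
The relative difference between the two runs can be worked out from the figures above. This short sketch only restates the reported numbers; the variable and function names are illustrative.

```python
# Reported measurements on v5p-8 (copied from the list above).
full = {"step_ms": 36718.2, "peak_gb": 49.37}
offload = {"step_ms": 32776.9, "peak_gb": 48.72}


def relative_reduction(baseline: float, candidate: float) -> float:
    """Fraction by which candidate is lower than baseline."""
    return (baseline - candidate) / baseline


step_saving = relative_reduction(full["step_ms"], offload["step_ms"])
mem_saving = relative_reduction(full["peak_gb"], offload["peak_gb"])

print(f"step time: {step_saving:.1%} lower")    # step time: 10.7% lower
print(f"peak memory: {mem_saving:.1%} lower")   # peak memory: 1.3% lower
```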

@susanbao susanbao requested a review from entrpn October 22, 2025 15:57
@entrpn entrpn merged commit 9716fc5 into main Oct 22, 2025
3 of 4 checks passed
@susanbao susanbao deleted the sanbao/host branch December 11, 2025 23:58