
Add fused AdamW option and warn on torch attention mask memory usage#1270

Merged
SWivid merged 2 commits into SWivid:main from ZhikangNiu:main on Mar 4, 2026

Conversation

@ZhikangNiu
Collaborator

- Add `optim.use_fused_adamw` to the training configs, `train.py`, the finetune CLI, and the Gradio app (see the optimizer sketch after this list)
- Disallow enabling `bnb_optimizer` and `use_fused_adamw` at the same time
- Warn when `attn_mask_enabled=True` is combined with `attn_backend=torch`, which incurs high GPU memory usage (see the second sketch below)
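And a sketch of the attention-mask warning, again assuming config fields named as in the bullets; the exact message and backend names are placeholders:

```python
import logging

logger = logging.getLogger(__name__)

def warn_on_torch_attn_mask(cfg):
    # With the torch SDPA backend, passing an explicit attention mask
    # materializes a full (batch, heads, seq, seq) mask tensor, which can
    # dominate GPU memory at long sequence lengths.
    if cfg.attn_mask_enabled and cfg.attn_backend == "torch":
        logger.warning(
            "attn_mask_enabled=True with attn_backend='torch' can use a large "
            "amount of GPU memory; consider a memory-efficient attention backend."
        )
```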
@ZhikangNiu
Collaborator Author

cc @SWivid

@SWivid merged commit b5ab1af into SWivid:main on Mar 4, 2026
1 check passed