Streamline KD & QAD transformers Trainers #708
base: main
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Codecov Report ✅ All modified and coverable lines are covered by tests.

Coverage Diff:

|          |   main |   #708 | +/- |
|----------|-------:|-------:|-----|
| Coverage | 74.69% | 74.69% |     |
| Files    |    192 |    192 |     |
| Lines    |  18946 |  18946 |     |
| Hits     |  14152 |  14152 |     |
| Misses   |   4794 |   4794 |     |
examples/llm_qat/README.md (Outdated)

  > **_NOTE:_** `launch.sh` defaults to use `LlamaDecoderLayer` as the transformer layer class. If your model uses a different class, you need to pass `--fsdp_transformer_layer_cls_to_wrap <your_layer_class>` to the `launch.sh` script. For example, for `Qwen/Qwen3-8B`, specify `--fsdp_transformer_layer_cls_to_wrap Qwen3DecoderLayer` as an additional argument.
- > **_NOTE:_** The script defaults to using FSDP1. To use FSDP2, pass "--use_fsdp2 True" to the `launch.sh` script. Note that FSDP2 is less stable than FSDP1 currently. Use it with caution.
+ > **_NOTE:_** The script defaults to using FSDP1. To use FSDP2, pass "--backend=fsdp2" to the `launch.sh` script. Note that FSDP2 is less stable than FSDP1 currently. Use it with caution.
Is this statement still valid? "Note that FSDP2 is less stable than FSDP1 currently. Use it with caution."
I doubt it, but I don't have proof. I don't have proof that it is less stable either.
Force-pushed from bde5788 to 190e4d2.
    model = model.export() if export_student else model
    super().save_model(output_dir, _internal_call, *args, **kwargs)
    with model.hide_teacher_model(), model.hide_loss_modules(enable=not _internal_call):
        if _internal_call:
Why do we need to save the teacher if `_internal_call = True`? Can we avoid saving the teacher regardless of the `_internal_call` value? This would accelerate checkpoint save/load (you would likely need to hide the teacher during the final checkpoint load as well).
I think you misread it. The teacher is indeed always hidden: `hide_teacher_model()` is applied unconditionally, and only `hide_loss_modules` is conditioned on `_internal_call`.
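To make the behavior under discussion concrete, here is a minimal, self-contained sketch of the intended semantics. The classes below (`FakeDistillModel`, `FakeKDTrainer`) are illustrative stand-ins, not the actual modelopt or transformers code: the teacher is hidden for every save, while the loss modules are hidden only when `_internal_call` is false, i.e. for the final save.

```python
from contextlib import contextmanager


class FakeDistillModel:
    """Stand-in for a distillation wrapper exposing hide_* context managers."""

    @contextmanager
    def hide_teacher_model(self):
        print("teacher hidden")           # the teacher never reaches the checkpoint
        yield
        print("teacher restored")

    @contextmanager
    def hide_loss_modules(self, enable=True):
        if enable:
            print("loss modules hidden")  # dropped only from the final save
        yield


class FakeKDTrainer:
    def __init__(self, model):
        self.model = model

    def save_model(self, output_dir=None, _internal_call=False):
        # Teacher is hidden for every save; loss modules are hidden only
        # when this is the final (non-internal) save.
        with self.model.hide_teacher_model(), self.model.hide_loss_modules(
            enable=not _internal_call
        ):
            print(f"saving student to {output_dir} (internal={_internal_call})")


FakeKDTrainer(FakeDistillModel()).save_model("ckpt-step-100", _internal_call=True)
FakeKDTrainer(FakeDistillModel()).save_model("ckpt-final", _internal_call=False)
```

Running the sketch shows that intermediate checkpoints keep the loss modules (so training can resume) while the teacher is excluded in both cases, which is the point being clarified above.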
What does this PR do?
Type of change: Refactor and stabilization
Overview:
Usage
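The usage snippet below is a hedged sketch of the general pattern a knowledge-distillation trainer built on `transformers.Trainer` follows; the class name `SketchKDTrainer`, the `teacher_model` keyword, and the loss weighting are illustrative assumptions and not necessarily the API introduced by this PR.

```python
# Hedged sketch: one common way to wire knowledge distillation into a
# transformers Trainer. Illustrative only, not the PR's actual implementation.
import torch
import torch.nn.functional as F
from transformers import Trainer


class SketchKDTrainer(Trainer):
    """Student trainer that adds a KL-divergence loss against a frozen teacher."""

    def __init__(self, *args, teacher_model=None, kd_temperature=2.0, **kwargs):
        super().__init__(*args, **kwargs)
        self.teacher_model = teacher_model.eval()  # teacher is never updated
        self.kd_temperature = kd_temperature

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # Assumes `labels` are in `inputs` so outputs.loss is the task loss,
        # and that the teacher lives on the same device as the student.
        outputs = model(**inputs)
        with torch.no_grad():
            teacher_logits = self.teacher_model(**inputs).logits
        t = self.kd_temperature
        kd_loss = F.kl_div(
            F.log_softmax(outputs.logits / t, dim=-1),
            F.softmax(teacher_logits / t, dim=-1),
            reduction="batchmean",
        ) * (t * t)
        loss = outputs.loss + kd_loss  # combine task loss with distillation loss
        return (loss, outputs) if return_outputs else loss
```

The streamlined trainers in this PR additionally handle teacher hiding during checkpoint saves and student export (see the `save_model` discussion above), which this sketch omits.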
Testing
Before your PR is "Ready for review"
Additional Information