Prefill+decode gpt oss #608
Merged
+1,805
−174
Signed-off-by: vbaddi <[email protected]> Signed-off-by: Onkar Chougule <[email protected]>
Signed-off-by: Mamta Singh <[email protected]> Signed-off-by: Onkar Chougule <[email protected]>
…fill seq_len for prefill_only gpt_oss model
…able_chunking flag to get_specialization for gpt-oss
…taining full KV for decode-only model
For best performance, the GPT-OSS model should be deployed with disaggregated serving, i.e. as a separate prefill-only model and decode-only model.
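To make the prefill/decode split concrete, here is a minimal, library-free toy sketch of disaggregated serving. It is not QEfficient code: `toy_next_token` is a hypothetical stand-in for a forward pass, and the "KV cache" is just a list of token ids rather than per-layer key/value tensors. The point it illustrates is that a prefill-only stage can consume the whole prompt once and hand its cache to a separate decode-only stage that generates one token per step, with the same result as running both in one model.

```python
from typing import List, Tuple

VOCAB = 50  # toy vocabulary size (illustrative assumption)


def toy_next_token(kv_cache: List[int]) -> int:
    """Hypothetical stand-in for a model forward pass over the cached context."""
    return sum(kv_cache) % VOCAB


def prefill_stage(prompt: List[int]) -> Tuple[List[int], int]:
    """Prefill-only worker: process the full prompt at once, return cache + first token."""
    kv_cache = list(prompt)
    return kv_cache, toy_next_token(kv_cache)


def decode_stage(kv_cache: List[int], first_token: int, max_new_tokens: int) -> List[int]:
    """Decode-only worker: extend the handed-off cache one token per step (seq_len = 1)."""
    cache = list(kv_cache)  # copy so the prefill output is not mutated
    out = [first_token]
    while len(out) < max_new_tokens:
        cache.append(out[-1])  # append only the newest token to the cache
        out.append(toy_next_token(cache))
    return out


def monolithic_generate(prompt: List[int], max_new_tokens: int) -> List[int]:
    """Reference: prefill and decode in a single loop, for comparison."""
    cache = list(prompt)
    out: List[int] = []
    for _ in range(max_new_tokens):
        out.append(toy_next_token(cache))
        cache.append(out[-1])
    return out


if __name__ == "__main__":
    kv, first = prefill_stage([10, 20])
    print(decode_stage(kv, first, 4), monolithic_generate([10, 20], 4))
```

The decode stage only ever sees one new token per step, which is why the decode-only model described below is compiled with `prefill_seq_len=1`, while the prefill stage corresponds to the `prefill_only=True` model.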
- Prefill-only model
  - Blocking prefill is the default behaviour when `prefill_only=True` is passed in the compile API.
  - Chunking: pass `enable_chunking=True` and `prefill_only=True` in the compile API.
  - Pass `kv_cache_batch_size=<int>` in the compile API.
- Decode-only model
  - Retaining sliding-window-length KV for sliding-window layers is the default behaviour when:
    - `prefill_seq_len=1` is passed in the compile API, and
    - `continuous_batching=True` is passed in the `from_pretrained` call; `full_batch_size=<int>` must be passed, and optionally `kv_cache_batch_size=<int>` if needed.
  - For full KV in sliding-window layers, pass:
    - `retain_full_kv=True` along with `prefill_seq_len=1` in the compile API, and
    - `continuous_batching=True` in the `from_pretrained` call; `full_batch_size=<int>` must be passed, and optionally `kv_cache_batch_size=<int>` if needed.

NOTE:
- `use_onnx_subfunctions=True` so avoid using it
- `node_precision_info=<path to file>`
- Use `use_onnx_subfunctions=True` with the prefill-only model; otherwise compilation times are too high. With this flag the model is expected to export and then fail during compile, because it needs an assert SDK, so the user must run the compilation manually by pasting the command printed in the error.
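The flag combinations listed above can be captured in a small helper. This is a minimal sketch, not part of QEfficient: `build_gpt_oss_compile_kwargs` and `kv_positions_per_layer` are hypothetical names, and the `"sliding_attention"` / `"full_attention"` layer labels are an assumption modelled on the Hugging Face gpt-oss config. Only the flag names themselves (`prefill_only`, `enable_chunking`, `kv_cache_batch_size`, `prefill_seq_len`, `retain_full_kv`, `full_batch_size`) come from this PR.

```python
from typing import Dict, List, Optional, Union


def build_gpt_oss_compile_kwargs(
    mode: str,
    *,
    prefill_seq_len: Optional[int] = None,
    enable_chunking: bool = False,
    retain_full_kv: bool = False,
    full_batch_size: Optional[int] = None,
    kv_cache_batch_size: Optional[int] = None,
) -> Dict[str, Union[bool, int]]:
    """Hypothetical helper: collect compile kwargs for the disaggregated GPT-OSS modes.

    mode="prefill" -> prefill-only model (blocking by default, chunked if requested)
    mode="decode"  -> decode-only model (sliding-window KV by default, full KV if requested)
    """
    kwargs: Dict[str, Union[bool, int]] = {}
    if mode == "prefill":
        if retain_full_kv:
            raise ValueError("retain_full_kv applies to the decode-only model")
        kwargs["prefill_only"] = True
        if prefill_seq_len is not None:
            kwargs["prefill_seq_len"] = prefill_seq_len
        if enable_chunking:
            # Chunked prefill: enable_chunking=True together with prefill_only=True.
            kwargs["enable_chunking"] = True
    elif mode == "decode":
        if enable_chunking:
            raise ValueError("enable_chunking applies to the prefill-only model")
        if prefill_seq_len not in (None, 1):
            raise ValueError("decode-only model requires prefill_seq_len=1")
        if full_batch_size is None:
            # The decode-only model is loaded with continuous_batching=True,
            # so full_batch_size is mandatory here.
            raise ValueError("decode-only model requires full_batch_size")
        kwargs["prefill_seq_len"] = 1  # one token per decode step
        kwargs["full_batch_size"] = full_batch_size
        if retain_full_kv:
            kwargs["retain_full_kv"] = True
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if kv_cache_batch_size is not None:
        kwargs["kv_cache_batch_size"] = kv_cache_batch_size
    return kwargs


def kv_positions_per_layer(
    layer_types: List[str], ctx_len: int, sliding_window: int, retain_full_kv: bool = False
) -> List[int]:
    """KV positions kept per layer in the decode-only model (illustrative).

    Sliding-window layers keep at most `sliding_window` positions unless
    retain_full_kv is set; full-attention layers always keep ctx_len positions.
    """
    return [
        ctx_len if (retain_full_kv or t == "full_attention") else min(ctx_len, sliding_window)
        for t in layer_types
    ]
```

The returned dict would be unpacked into the model's compile call. `continuous_batching=True` is deliberately absent because, per the list above, it belongs to `from_pretrained`, not to compile. The second function shows why `retain_full_kv=True` costs memory: every sliding-window layer grows from the window length to the full context length.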