Commit 40e0639
Interleave chunked prefills with single decoding steps (#558)
# Description
* Deprioritize chunked prefill when previous step was already a prefill
and requests are decoding
* Evaluate blocks condition at the time of first chunked prefill
* Evaluate remaining conditions at the time of last chunked prefill
* Tests will be implemented in other PRs:
  * interleaving logic correctness: e.g. that after step X which was a
    prefill, step X+1 is a decode
  * ✅ DONE constraint correctness: verify that we cannot schedule the first
    chunked prefill when there aren't enough blocks
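
The rules in this description can be sketched as follows. This is a minimal illustration, not the code from this commit: `SchedulerState`, `PrefillChunk`, `can_schedule_prefill`, and `plan_steps` are hypothetical names, and `other_constraints_ok` is an assumed stand-in for the "remaining conditions", which this description does not enumerate.

```python
from dataclasses import dataclass


@dataclass
class SchedulerState:
    # Hypothetical snapshot of the scheduler, not the real vllm_spyre type.
    last_step_was_prefill: bool
    num_decoding: int  # requests currently in the decode phase
    free_blocks: int


@dataclass
class PrefillChunk:
    # Hypothetical description of the next chunk of a chunked prefill.
    is_first: bool
    is_last: bool
    blocks_needed: int
    other_constraints_ok: bool  # assumed stand-in for the remaining conditions


def can_schedule_prefill(state: SchedulerState, chunk: PrefillChunk) -> bool:
    # Deprioritize a prefill right after a prefill while requests are decoding.
    if state.last_step_was_prefill and state.num_decoding > 0:
        return False
    # The blocks condition is evaluated only for the first chunk.
    if chunk.is_first and chunk.blocks_needed > state.free_blocks:
        return False
    # The remaining conditions are evaluated only for the last chunk.
    if chunk.is_last and not chunk.other_constraints_ok:
        return False
    return True


def plan_steps(num_chunks, num_decoding, free_blocks, blocks_needed, max_steps=16):
    """Simulate which kind of step runs until the prefill completes."""
    steps = []
    done = 0
    last_was_prefill = False
    for _ in range(max_steps):
        if done == num_chunks:
            break
        chunk = PrefillChunk(
            is_first=(done == 0),
            is_last=(done == num_chunks - 1),
            blocks_needed=blocks_needed,
            other_constraints_ok=True,
        )
        state = SchedulerState(last_was_prefill, num_decoding, free_blocks)
        if can_schedule_prefill(state, chunk):
            steps.append("prefill")
            done += 1
            last_was_prefill = True
        elif num_decoding > 0:
            steps.append("decode")
            last_was_prefill = False
        else:
            break  # nothing can be scheduled in this toy model
    return steps


print(plan_steps(num_chunks=3, num_decoding=1, free_blocks=4, blocks_needed=2))
```

With one decoding request and enough blocks, the simulated plan alternates prefill chunks with single decode steps. With no decoding requests, the chunks run back to back, since there is nothing to interleave.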
---------
Signed-off-by: Sophie du Couédic <[email protected]>
Signed-off-by: Travis Johnson <[email protected]>
Co-authored-by: Travis Johnson <[email protected]>
1 parent bf07727 commit 40e0639
File tree
3 files changed, +217 -77 lines changed
- vllm_spyre
  - v1
    - core
    - worker