Add step tests for chunked prefill #575
Conversation
👋 Hi! Thank you for contributing to vLLM support on Spyre. Now you are good to go 🚀
Since we need to use random tokens, sometimes they end up in the cache file and codespell complains about them.
Signed-off-by: Max de Bayser <[email protected]>
The test failure seems related to state carried over from previous tests.
bot:test
…to reproduce the conditions of the bug that PR #576 fixes (fix: check only decoding requests in _satisfies_last_chunk_constraints)
Signed-off-by: Max de Bayser <[email protected]>
wallashss left a comment:
LGTM
```python
@pytest.mark.full_model
# These values are all parameterized for test sorting
@pytest.mark.parametrize("max_num_seqs", [2])
@pytest.mark.parametrize("max_model_len", [514])
```
@tjohnson31415, the max_model_len of 514 here is what would trigger the hang before your fix.
```python
else:
    self._teardown = teardown_method

self._preexisting_max_tkv = os.getenv("VLLM_DT_MAX_BATCH_TKV_LIMIT")
```
Tests that modify VLLM_DT_MAX_BATCH_TKV_LIMIT should be using monkeypatch, which would automatically restore the value after the test. Can we use that instead of having a special-case reset here?
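For illustration, a minimal sketch of the suggested approach; the test name and the value being set are hypothetical:

```python
import pytest


def test_with_tkv_limit(monkeypatch: pytest.MonkeyPatch):
    # monkeypatch.setenv records the prior value of the variable and
    # restores it (or unsets it) automatically when the test finishes.
    monkeypatch.setenv("VLLM_DT_MAX_BATCH_TKV_LIMIT", "2048")
    ...  # run the engine with the overridden limit
```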
It's a bit more involved than a monkeypatch because we have to replicate some logic from platform.py. But it's possible to do so with a fixture if you prefer.
I don't mean setting the actual value of VLLM_DT_MAX_BATCH_TKV_LIMIT with monkeypatch, but using setenv will register with monkeypatch that it needs to restore the previous value even if something in the test overrides it.
I thought we had an example of this in the code, but I can't find it now.
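A minimal sketch of that pattern, assuming the variable may or may not already be set; the test name is hypothetical:

```python
import os

import pytest


def test_something_that_touches_tkv(monkeypatch: pytest.MonkeyPatch):
    # Re-setting the variable to its current value registers it with
    # monkeypatch, so the original value is restored at teardown even
    # if code under test overwrites it later in the run.
    current = os.environ.get("VLLM_DT_MAX_BATCH_TKV_LIMIT")
    if current is not None:
        monkeypatch.setenv("VLLM_DT_MAX_BATCH_TKV_LIMIT", current)
    ...
```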
The problem is the interaction with the model cache. The setting/resetting of this variable has to be tied to the lifecycle of the LLM engine, but pytest is not aware of that since it's not a fixture. Do you remember if the example you're referring to also dealt with this problem?
Hmm, I don't recall the previous case dealing with the model cache. I'm not very familiar with how the caching works...
tjohnson31415 left a comment:
LGTM
Description
This PR adds step-by-step execution tests for the chunked prefill scheduler. It tests several scenarios.
The tests show that the interleaving of requests is working as expected.
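As a rough illustration of the step-by-step testing pattern (using an invented toy scheduler as a stand-in; the PR's actual tests drive the real vllm-spyre scheduler):

```python
class ToyChunkedScheduler:
    """Toy stand-in for illustration only, not the repo's scheduler."""

    def __init__(self, chunk_size: int):
        self.chunk_size = chunk_size
        self.pending = {}  # req_id -> remaining prompt tokens

    def add_request(self, req_id: str, prompt_len: int):
        self.pending[req_id] = prompt_len

    def step(self) -> list[str]:
        # Schedule one prompt chunk per waiting request, oldest first.
        scheduled = []
        for req_id in list(self.pending):
            self.pending[req_id] -= self.chunk_size
            scheduled.append(req_id)
            if self.pending[req_id] <= 0:
                del self.pending[req_id]
        return scheduled


def test_interleaving():
    sched = ToyChunkedScheduler(chunk_size=128)
    sched.add_request("long", prompt_len=300)   # needs 3 chunks
    sched.add_request("short", prompt_len=100)  # fits in 1 chunk

    # Assert which requests are scheduled at each engine step.
    assert sched.step() == ["long", "short"]    # both get a chunk
    assert sched.step() == ["long"]             # short is already done
    assert sched.step() == ["long"]             # last chunk of long
    assert sched.step() == []
```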