Conversation

@maxdebayser
Collaborator

Description

This PR adds step-by-step execution tests for the chunked prefill scheduler. It covers several scenarios:

  1. A single sequence with several chunks
  2. Two sequences scheduled at the same time, one with 1 chunk and the other with 4
  3. Same as 2, but with interleaving disabled to exercise this debugging feature
  4. Same as 2, but the second request arrives later

The tests show that the interleaving of requests is working as expected.
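
For illustration only, here is a self-contained toy version of the kind of per-step assertions scenarios 2, 3, and 4 make. ToyRequest and schedule_steps are invented stand-ins, not the vLLM Spyre scheduler or the PR's actual test code, which drives the real engine step by step:

from dataclasses import dataclass


@dataclass
class ToyRequest:
    name: str
    prefill_chunks: int   # prefill chunks still needed before decoding starts
    arrival_step: int = 0


def schedule_steps(requests, interleave=True):
    """Return, for each engine step, the requests that receive a prefill chunk."""
    pending = sorted(requests, key=lambda r: r.arrival_step)
    remaining = {r.name: r.prefill_chunks for r in pending}
    steps, step = [], 0
    while any(remaining.values()):
        active = [r for r in pending
                  if r.arrival_step <= step and remaining[r.name] > 0]
        if interleave:
            chosen = active        # every active request advances by one chunk
        else:
            chosen = active[:1]    # finish one request's prefill before starting the next
        for r in chosen:
            remaining[r.name] -= 1
        steps.append([r.name for r in chosen])
        step += 1
    return steps


def test_two_requests_interleaved():
    # Scenario 2: one request needs 1 chunk, the other 4; both arrive at step 0.
    a, b = ToyRequest("A", 1), ToyRequest("B", 4)
    assert schedule_steps([a, b]) == [["A", "B"], ["B"], ["B"], ["B"]]


def test_no_interleaving():
    # Scenario 3: with interleaving disabled, A's prefill completes before B starts.
    a, b = ToyRequest("A", 1), ToyRequest("B", 4)
    assert schedule_steps([a, b], interleave=False) == [["A"], ["B"], ["B"], ["B"], ["B"]]


def test_late_arrival():
    # Scenario 4: the second request only arrives at step 2.
    a, b = ToyRequest("A", 1), ToyRequest("B", 4, arrival_step=2)
    assert schedule_steps([a, b]) == [["A"], [], ["B"], ["B"], ["B"], ["B"]]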

@github-actions

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be mergeable. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

Since we sometimes need to use random tokens, they end up in the cache file and codespell complains about them.

Signed-off-by: Max de Bayser <[email protected]>
@maxdebayser
Collaborator Author

The test failure seems related to state carried over from tests like test_prefill_tkv_too_big. When run together with that test, test_cp_prefill_interleave1 fails locally as well.

@maxdebayser
Collaborator Author

bot:test

to reproduce the conditions of the bug that PR #576 fixes
(fix: check only decoding requests in _satisfies_last_chunk_constraints)

Signed-off-by: Max de Bayser <[email protected]>
Collaborator

@wallashss wallashss left a comment


LGTM

Signed-off-by: Max de Bayser <[email protected]>
@pytest.mark.full_model
# These values are all parameterized for test sorting
@pytest.mark.parametrize("max_num_seqs", [2])
@pytest.mark.parametrize("max_model_len", [514])
Collaborator Author


@tjohnson31415, the max_model_len of 514 here is what would trigger the hang before your fix.

else:
    self._teardown = teardown_method

self._preexisting_max_tkv = os.getenv("VLLM_DT_MAX_BATCH_TKV_LIMIT")
Collaborator


Tests that modify VLLM_DT_MAX_BATCH_TKV_LIMIT should be using monkeypatch, which would automatically restore the value after the test. Can we use that instead of having a special-case reset here?

Collaborator Author


It's a bit more involved than a monkeypatch because we have to replicate some logic from platform.py. But it's possible to do with a fixture if you prefer.

Collaborator

@tjohnson31415 tjohnson31415 Nov 24, 2025


I don't mean that the actual value of VLLM_DT_MAX_BATCH_TKV_LIMIT should be set with monkeypatch, but calling setenv registers with monkeypatch that the previous value needs to be restored, even if something in the test overrides it later.

I thought we had an example of this in the code, but I can't find it now.
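
As a rough illustration of that setenv behavior (the limit value and the test name below are made up for the sketch):

import os


def test_tkv_limit_is_restored(monkeypatch):
    # setenv records the variable's pre-test value; the monkeypatch fixture puts
    # that value back at teardown even if code under test overwrites it later.
    monkeypatch.setenv("VLLM_DT_MAX_BATCH_TKV_LIMIT", "2048")

    # ...later, something deeper in the stack (e.g. platform setup) may override it:
    os.environ["VLLM_DT_MAX_BATCH_TKV_LIMIT"] = "4096"

    # No manual reset is needed here: after the test, monkeypatch restores the
    # original value (or removes the variable if it wasn't set before).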

Collaborator Author


The problem is the interaction with the model cache. The setting/resetting of this variable has to be tied to the lifecycle of the LLM engine, but pytest isn't aware of that since it's not a fixture. Do you remember if the example you're referring to also dealt with this problem?

Collaborator


Hmm, I don't recall the previous case dealing with the model cache. I'm not very familiar with how the caching works...
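
For completeness, a rough sketch of the fixture shape discussed above; _DummyEngine and the fixture name are placeholders so the example is self-contained, and the model-cache interaction raised in the thread is not addressed here:

import pytest


class _DummyEngine:
    """Stand-in for the real LLM engine so the sketch is self-contained."""

    def __init__(self):
        self.alive = True

    def shutdown(self):
        self.alive = False


@pytest.fixture
def engine_with_tkv_limit(monkeypatch):
    # Pin the env var and tie its lifetime to the engine created here, so the
    # restore happens in the same fixture teardown as the engine shutdown.
    monkeypatch.setenv("VLLM_DT_MAX_BATCH_TKV_LIMIT", "2048")
    engine = _DummyEngine()
    yield engine
    engine.shutdown()
    # monkeypatch restores VLLM_DT_MAX_BATCH_TKV_LIMIT after this finalizer runs


def test_uses_pinned_limit(engine_with_tkv_limit):
    assert engine_with_tkv_limit.alive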

Collaborator

@tjohnson31415 tjohnson31415 left a comment


LGTM

@maxdebayser maxdebayser merged commit 87c69d5 into main Nov 24, 2025
20 checks passed
@maxdebayser maxdebayser deleted the cp_step_tests branch November 24, 2025 22:45