[Bug] Fix Batch API echoing request body instead of model responses #2117

Open
ianliuy wants to merge 2 commits into vllm-project:main from ianliuy:fix/issue-1802

Conversation

Contributor

ianliuy commented Apr 17, 2026

Pull Request Description

Fix a bug where the Batch API's JobDriver writes the original request body into the response body field instead of the actual model inference response. The batch job completes with status_code: 200 but the output JSONL echoes back the input (model, messages, max_tokens) rather than the chat completion result (choices, usage, etc.).

Root cause: The default InferenceEngineClient base class (job_driver.py:37-41) was a test stub that silently echoed request_data back as the "response." When JobDriver received no inference client (inference_client=None), it fell back to this stub without any warning, causing batch outputs to contain request bodies instead of model responses.
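For illustration, a minimal sketch of what such an echo stub amounts to (paraphrased, not the repository's exact code; the synchronous signature is an assumption):

class InferenceEngineClient:
    def inference_request(self, request_data: dict) -> dict:
        # Pre-fix behavior: the request body comes straight back as the
        # "response", so batch outputs carry (model, messages, max_tokens)
        # instead of (choices, usage, ...).
        return request_data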

Fix:

  • Make InferenceEngineClient.inference_request() raise NotImplementedError to prevent the base class from being used as a silent echo stub in production (a sketch of the resulting split follows this list)
  • Extract the echo behavior into an explicit MockInferenceEngineClient subclass for test usage
  • Add a targeted WARNING log in execute_worker() when the mock client is used for actual inference execution (prepare/finalize paths are not affected)
  • Update test_inference_client_integration.py to use MockInferenceEngineClient explicitly and add a regression test for the NotImplementedError guard
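A minimal sketch of the resulting split (illustrative only; class and method names follow the PR description, the signature is assumed):

class InferenceEngineClient:
    def inference_request(self, request_data: dict) -> dict:
        # Fail loudly: the base class can no longer act as a silent echo stub.
        raise NotImplementedError("Configure a real inference engine client.")

class MockInferenceEngineClient(InferenceEngineClient):
    def inference_request(self, request_data: dict) -> dict:
        # The echo behavior now lives in an explicitly named, test-only subclass.
        return request_data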

Testing:

  • All 4 tests in test_inference_client_integration.py pass (including the new test_base_client_raises_not_implemented; a sketch follows this list)
  • Verified locally via WSL with pytest
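A regression test for the guard could look roughly like this (the test name follows the PR description; the import path and the synchronous signature are assumptions):

import pytest

from job_driver import InferenceEngineClient  # hypothetical import path

def test_base_client_raises_not_implemented():
    # The base class must fail loudly rather than echo the request back.
    with pytest.raises(NotImplementedError):
        InferenceEngineClient().inference_request({"model": "m", "messages": []})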

Related Issues

Resolves: #1802

cc @Jeffwan @zhangjyr

The default InferenceEngineClient base class silently echoed request_data
back as the inference response. When JobDriver received no inference client
(inference_client=None), it fell back to this echo stub, causing batch
outputs to contain original request bodies instead of model responses.

Fix:
- Make InferenceEngineClient.inference_request() raise NotImplementedError
  to prevent silent echo in production
- Extract echo behavior into explicit MockInferenceEngineClient subclass
  for test usage
- Add WARNING log in JobDriver.__init__() when falling back to mock client
- Update test_inference_client_integration.py to use MockInferenceEngineClient
  and add test for NotImplementedError guard

Resolves: vllm-project#1802

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Yiyang Liu <37043548+ianliuy@users.noreply.github.com>
Contributor

gemini-code-assist Bot left a comment


Code Review

This pull request refactors the InferenceEngineClient into an abstract base class that raises a NotImplementedError and introduces a MockInferenceEngineClient for testing purposes. The JobDriver is updated to default to the mock client when none is provided, and a warning is logged if the mock client is used during worker execution. Review feedback suggests that defaulting to a mock client in production code could lead to silent failures; it is recommended to leave the client as None by default and implement explicit runtime checks to ensure a valid client is configured before execution.

Comment on lines 95 to 98
 if inference_client is None:
-    self._inference_client = InferenceEngineClient()
+    self._inference_client = MockInferenceEngineClient()
 else:
     self._inference_client = inference_client
Contributor


medium

Defaulting to a mock client in production code can lead to silent failures or incorrect behavior (like the echoing bug this PR addresses) if the configuration is missing. It is safer to leave the inference client as None by default and explicitly handle the missing client in methods that require it. This forces the caller to provide a valid client for execution tasks while still allowing the JobDriver to be used for non-inference tasks like preparation and finalization.

Suggested change
-if inference_client is None:
-    self._inference_client = MockInferenceEngineClient()
-else:
-    self._inference_client = inference_client
+self._inference_client = inference_client
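Read together with the runtime check proposed further below, the recommendation amounts to roughly this shape (a sketch of the reviewer's suggestion, not repository code; names are taken from the surrounding diff):

class JobDriver:
    def __init__(self, inference_client=None):
        # Construction stays permissive: prepare/finalize paths don't need
        # an engine, so None is allowed here.
        self._inference_client = inference_client

    def execute_worker(self, job_id):
        # Methods that actually run inference must verify a client exists.
        if self._inference_client is None:
            raise RuntimeError(f"Inference client not configured for job {job_id}.")
        ...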

Collaborator


Agree. Using a mock client doesn't fully solve the problem; I would rather raise a ValueError if the client is None.

Comment on lines +168 to +174
if isinstance(self._inference_client, MockInferenceEngineClient):
    logger.warning(
        "Using MockInferenceEngineClient for batch execution. "
        "Inference responses will echo request data instead of model responses. "
        "Configure a real LLM engine endpoint for production use.",
        job_id=job_id,
    )  # type: ignore[call-arg]
Contributor


medium

Since the inference client is now optional by default, we should explicitly check for its presence before starting the worker execution. This prevents accidental use of a missing client and provides a clear error message. We can also keep the warning for the mock client to alert users if it's being used in a context where a real client might be expected.

Suggested change
-if isinstance(self._inference_client, MockInferenceEngineClient):
-    logger.warning(
-        "Using MockInferenceEngineClient for batch execution. "
-        "Inference responses will echo request data instead of model responses. "
-        "Configure a real LLM engine endpoint for production use.",
-        job_id=job_id,
-    )  # type: ignore[call-arg]
+if self._inference_client is None:
+    raise RuntimeError(f"Inference client not configured for job {job_id}. Execution cannot proceed.")
+if isinstance(self._inference_client, MockInferenceEngineClient):
+    logger.warning(
+        "Using MockInferenceEngineClient for batch execution. "
+        "Inference responses will echo request data instead of model responses. "
+        "Ensure this is intended for testing purposes.",
+        job_id=job_id,
+    )  # type: ignore[call-arg]

Collaborator


Same feedback as Gemini: explicitly throwing an error is better.



Development

Successfully merging this pull request may close these issues.

Batch requests completed successfully but model responses are empty.
