[CI] Add vLLM-Omni support to Dockerfile.vllm and sample deployment #2129
Lucas-Qian6 wants to merge 4 commits into
Conversation
Code Review
This pull request integrates vLLM-Omni into the AIBrix build process by updating the Dockerfile and README and by adding a sample Kubernetes deployment for Qwen2.5-Omni. Feedback focuses on improving the Dockerfile's pip installation command for consistency and image size, ensuring image naming in samples matches the documentation, and fixing a missing newline in the new YAML file.
Pull request overview
Adds vLLM-Omni support to the AIBrix vLLM container build and introduces a Kubernetes sample intended to demonstrate running an Omni-capable model via vllm serve --omni.
Changes:
- Extend `build/container/Dockerfile.vllm` with a `VLLM_OMNI_VERSION` build arg and install `vllm-omni`.
- Update `build/container/README.md` to document the new build argument and note Omni support.
- Add a new sample deployment YAML for Qwen2.5-Omni.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| build/container/Dockerfile.vllm | Adds VLLM_OMNI_VERSION arg and installs vllm-omni in the vLLM runtime image. |
| build/container/README.md | Documents the new build arg and updates the image description/release matrix to include vllm-omni. |
| samples/vllm_omni/qwen2_5.yaml | New Deployment sample showing vllm serve ... --omni for Qwen2.5-Omni. |
```diff
@@ -1,4 +1,5 @@
 ARG VLLM_VERSION=v0.19.0
+ARG VLLM_OMNI_VERSION=0.19.0rc1
```
The ARG VLLM_OMNI_VERSION=0.19.0rc1 line has trailing whitespace. Please remove trailing spaces to avoid lint/formatting issues and noisy diffs.
```dockerfile
ARG PYTHON_BIN=python3
ARG VLLM_VERSION=v0.19.0
ARG VLLM_OMNI_VERSION=0.19.0rc1
```
This ARG VLLM_OMNI_VERSION=0.19.0rc1 redeclaration also has trailing whitespace. Please remove the extra spaces so the Dockerfile stays clean and formatting checks don't fail.
```dockerfile
RUN ${PYTHON_BIN} -m pip uninstall -y aibrix_kvcache && \
    ${PYTHON_BIN} -m pip install /tmp/aibrix/python/aibrix_kvcache/dist/*.whl

RUN pip install vllm-omni==${VLLM_OMNI_VERSION}
```
PYTHON_BIN is configurable for this stage, but this install uses pip directly (which may point at a different interpreter than ${PYTHON_BIN}). Use ${PYTHON_BIN} -m pip install ... here to ensure the package is installed into the intended Python environment.
Suggested change:
```diff
-RUN pip install vllm-omni==${VLLM_OMNI_VERSION}
+RUN ${PYTHON_BIN} -m pip install vllm-omni==${VLLM_OMNI_VERSION}
```
```yaml
apiVersion: apps/v1
kind: Deployment
```
This PR description mentions adding a sample at samples/qwen2-5-omni-deployment.yaml, but the PR actually adds samples/vllm_omni/qwen2_5.yaml. Please align the PR description and/or rename/move the sample so it matches the documented location/name.
```yaml
metadata:
  name: qwen25omni-7b
spec:
  replicas: 1
  selector:
    matchLabels:
      model.aibrix.ai/name: qwen2-5-omni-7b
  template:
    metadata:
      labels:
        model.aibrix.ai/name: qwen2-5-omni-7b
        model.aibrix.ai/port: "8000"
        model.aibrix.ai/engine: vllm
    spec:
```
Most AIBrix Deployment samples also include metadata.labels (model name/port/engine) and a corresponding Service resource so the model is discoverable/routable (and the Service name can match model.aibrix.ai/name). As written, this sample creates only a Deployment with no Service, so it won’t be reachable via a stable in-cluster endpoint. Consider adding the Service (and aligning labels/namespace with other samples) to make the sample runnable end-to-end.
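For illustration, a minimal sketch of the kind of companion Service other AIBrix samples pair with a Deployment; the Service name, selector, and port below are assumed from this sample's labels rather than taken from the PR:

```yaml
# Hypothetical companion Service for the sample Deployment.
# Name, selector, and port are assumed from the sample's labels, not from the PR.
apiVersion: v1
kind: Service
metadata:
  name: qwen2-5-omni-7b                  # matching model.aibrix.ai/name keeps routing consistent
  labels:
    model.aibrix.ai/name: qwen2-5-omni-7b
spec:
  selector:
    model.aibrix.ai/name: qwen2-5-omni-7b
  ports:
    - name: serve
      protocol: TCP
      port: 8000                         # matches model.aibrix.ai/port on the pod template
      targetPort: 8000
```

With such a Service in place, the model is reachable in-cluster at a stable DNS name regardless of pod restarts.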
```yaml
    spec:
      containers:
        - name: vllm-omni
          image: ghcr.io/vllm-project/aibrix/vllm-omni:v0.19.0
```
The sample image (ghcr.io/vllm-project/aibrix/vllm-omni:...) doesn’t match the image naming documented in build/container/README.md (which describes aibrix/vllm-openai:... as the enhanced vLLM image). If the intent is to demonstrate the newly updated Dockerfile image, please update the sample to reference the corresponding AIBrix vLLM image/tag (or update the docs to describe/publish the vllm-omni image name).
Suggested change:
```diff
-          image: ghcr.io/vllm-project/aibrix/vllm-omni:v0.19.0
+          image: ghcr.io/vllm-project/aibrix/vllm-openai:v0.19.0
```
| AIBrix KVCache | v0.5.0 | v0.5.0 | KV cache disaggregation support |
| NIXL | 0.7.1 | 0.7.1 | UCX-based RDMA networking |
| UCX | 1.19.0 | 1.19.0 | Pre-installed for debugging |
| vllm-omni |0.19.0rc1| - |Activate with `--omni` at runtime|
The new vllm-omni row in the release-history table is missing spacing around columns (e.g., |0.19.0rc1|), which makes the Markdown table harder to read and can render inconsistently. Please format it consistently with the other rows.
Suggested change:
```diff
-| vllm-omni |0.19.0rc1| - |Activate with `--omni` at runtime|
+| vllm-omni | 0.19.0rc1 | - | Activate with `--omni` at runtime |
```
Signed-off-by: qianyujun <yq2432@columbia.edu>
@Jeffwan Please take a look at this. I guess this will bring redundancy into the image if a user just wants to use vLLM.
Please help clean up the commits and I will spend some time reviewing the change tomorrow.
Signed-off-by: Shen Li <sl6008@columbia.edu>
@copilot |
Signed-off-by: Shen Li <sl6008@columbia.edu>
/gemini review
Code Review
This pull request introduces support for vllm-omni in the AIBrix vLLM Dockerfile, adding conditional patching logic and an optional installation step for the vllm-omni package. It also includes a new sample Kubernetes deployment for the Qwen2.5-Omni-7B model. Review feedback points out a critical redundancy in the Dockerfile where the original patch application block was not removed, which could lead to build failures or duplicate patching. Additionally, it is suggested that the sample deployment should be updated to reference the AIBrix-built image rather than the upstream image to ensure that the integrated patches and optimizations are utilized.
```dockerfile
# apply patch to vLLM (only if patch file exists for this version)
RUN PATCH_FILE="/tmp/aibrix/python/aibrix_kvcache/integration/vllm/patches/vllm_${VLLM_VERSION}-aibrix-kvcache.patch" && \
    if [ -f "$PATCH_FILE" ]; then \
        DIST_DIR=$(${PYTHON_BIN} -m pip show vllm | grep "Location:" | awk '{print $2}') && \
        cd $DIST_DIR && \
        patch -p 1 -l -i $PATCH_FILE; \
    else \
        echo "No patch file for vLLM ${VLLM_VERSION} — using monkey-patch mode (PR #2060)"; \
    fi
```
The new conditional patch logic introduced here is a good improvement for handling missing patch files. However, the original unconditional patch block (lines 57-60) was not removed. This redundancy will cause the build to fail if the patch file is missing (as the second block will still attempt to access it) or result in the patch being applied twice if it exists. Please remove the redundant block at lines 57-60.
```yaml
    spec:
      containers:
        - name: vllm-omni
          image: vllm/vllm-omni:v0.20.0  # Note: must use the vllm-omni image rather than the vllm-openai image
```
The deployment uses the upstream vllm/vllm-omni:v0.20.0 image. Since this PR adds vllm-omni support to the AIBrix Dockerfile, the sample should ideally reference the AIBrix-built image (or use a placeholder) to ensure that the AIBrix-specific patches and KV cache optimizations are utilized in the deployment.
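As a sketch of that suggestion, assuming the `aibrix/vllm-openai` image name described in `build/container/README.md` and an illustrative model reference (neither the exact tag nor the model path is confirmed by this PR):

```yaml
# Hypothetical container spec using the AIBrix-built image;
# the image tag and model name below are assumptions.
    spec:
      containers:
        - name: vllm-omni
          image: aibrix/vllm-openai:v0.19.0   # AIBrix image with vllm-omni installed
          command: ["vllm", "serve"]
          args:
            - Qwen/Qwen2.5-Omni-7B            # illustrative model reference
            - --omni                          # activates Omni-modality at runtime
            - --port
            - "8000"
```

This would keep the AIBrix-specific patches and KV cache optimizations in the serving path while still enabling Omni mode via the runtime flag.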
Pull Request Description
This PR adds vLLM-Omni support to the existing AIBrix vLLM image by installing `vllm-omni` on top of the base vLLM installation. Since vLLM-Omni is a superset of vLLM and exposes the same OpenAI-compatible API, the existing image remains fully backward compatible, so existing vLLM workloads are unaffected. Omni-modality support is activated at runtime by passing `--omni` to `vllm serve`. This enables deploying one Omni model on a single machine across multiple GPUs through AIBrix.

Changes:
- `build/container/Dockerfile.vllm`: add `VLLM_OMNI_VERSION` ARG and `pip install vllm-omni` step
- `build/container/README.md`: document the new `VLLM_OMNI_VERSION` build argument and update the image description
- `samples/vllm_omni/qwen2_5.yaml`: new sample showing Qwen2.5-Omni deployment with the `--omni` flag

Related Issues
Resolves: Part of #2009