[CI] Add vLLM-Omni support to Dockerfile.vllm and sample deployment #2129
Lucas-Qian6 wants to merge 4 commits into
Conversation
Code Review
This pull request integrates vLLM-Omni into the AIBrix build process by updating the Dockerfile and README and by adding a sample Kubernetes deployment for Qwen2.5-Omni. Feedback focuses on improving the Dockerfile's pip installation command for consistency and image size, ensuring image naming in samples matches the documentation, and fixing a missing newline in the new YAML file.
Pull request overview
Adds vLLM-Omni support to the AIBrix vLLM container build and introduces a Kubernetes sample intended to demonstrate running an Omni-capable model via vllm serve --omni.
Changes:
- Extend `build/container/Dockerfile.vllm` with a `VLLM_OMNI_VERSION` build arg and install `vllm-omni`.
- Update `build/container/README.md` to document the new build argument and note Omni support.
- Add a new sample deployment YAML for Qwen2.5-Omni.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| build/container/Dockerfile.vllm | Adds VLLM_OMNI_VERSION arg and installs vllm-omni in the vLLM runtime image. |
| build/container/README.md | Documents the new build arg and updates the image description/release matrix to include vllm-omni. |
| samples/vllm_omni/qwen2_5.yaml | New Deployment sample showing vllm serve ... --omni for Qwen2.5-Omni. |
```diff
@@ -1,4 +1,5 @@
 ARG VLLM_VERSION=v0.19.0
+ARG VLLM_OMNI_VERSION=0.19.0rc1
```
The ARG VLLM_OMNI_VERSION=0.19.0rc1 line has trailing whitespace. Please remove trailing spaces to avoid lint/formatting issues and noisy diffs.
```dockerfile
ARG PYTHON_BIN=python3
ARG VLLM_VERSION=v0.19.0
ARG VLLM_OMNI_VERSION=0.19.0rc1
```
This ARG VLLM_OMNI_VERSION=0.19.0rc1 redeclaration also has trailing whitespace. Please remove the extra spaces so the Dockerfile stays clean and formatting checks don't fail.
```dockerfile
RUN ${PYTHON_BIN} -m pip uninstall -y aibrix_kvcache && \
    ${PYTHON_BIN} -m pip install /tmp/aibrix/python/aibrix_kvcache/dist/*.whl

RUN pip install vllm-omni==${VLLM_OMNI_VERSION}
```
PYTHON_BIN is configurable for this stage, but this install uses pip directly (which may point at a different interpreter than ${PYTHON_BIN}). Use ${PYTHON_BIN} -m pip install ... here to ensure the package is installed into the intended Python environment.
Suggested change:
```diff
-RUN pip install vllm-omni==${VLLM_OMNI_VERSION}
+RUN ${PYTHON_BIN} -m pip install vllm-omni==${VLLM_OMNI_VERSION}
```
```yaml
apiVersion: apps/v1
kind: Deployment
```
This PR description mentions adding a sample at samples/qwen2-5-omni-deployment.yaml, but the PR actually adds samples/vllm_omni/qwen2_5.yaml. Please align the PR description and/or rename/move the sample so it matches the documented location/name.
```yaml
metadata:
  name: qwen25omni-7b
spec:
  replicas: 1
  selector:
    matchLabels:
      model.aibrix.ai/name: qwen2-5-omni-7b
  template:
    metadata:
      labels:
        model.aibrix.ai/name: qwen2-5-omni-7b
        model.aibrix.ai/port: "8000"
        model.aibrix.ai/engine: vllm
    spec:
```
Most AIBrix Deployment samples also include metadata.labels (model name/port/engine) and a corresponding Service resource so the model is discoverable/routable (and the Service name can match model.aibrix.ai/name). As written, this sample creates only a Deployment with no Service, so it won’t be reachable via a stable in-cluster endpoint. Consider adding the Service (and aligning labels/namespace with other samples) to make the sample runnable end-to-end.
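For illustration, a minimal sketch of the kind of companion Service other AIBrix samples pair with a Deployment; the Service name, selector, and port below are assumed from this sample's labels rather than taken from the PR:

```yaml
# Hypothetical companion Service for the sample Deployment.
# Name, selector, and port are assumed from the sample's labels, not from the PR.
apiVersion: v1
kind: Service
metadata:
  name: qwen2-5-omni-7b                  # matching model.aibrix.ai/name keeps routing consistent
  labels:
    model.aibrix.ai/name: qwen2-5-omni-7b
spec:
  selector:
    model.aibrix.ai/name: qwen2-5-omni-7b
  ports:
    - name: serve
      protocol: TCP
      port: 8000                         # matches model.aibrix.ai/port on the pod template
      targetPort: 8000
```

With such a Service in place, the model is reachable in-cluster at a stable DNS name regardless of pod restarts.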
```yaml
    spec:
      containers:
        - name: vllm-omni
          image: ghcr.io/vllm-project/aibrix/vllm-omni:v0.19.0
```
The sample image (ghcr.io/vllm-project/aibrix/vllm-omni:...) doesn’t match the image naming documented in build/container/README.md (which describes aibrix/vllm-openai:... as the enhanced vLLM image). If the intent is to demonstrate the newly updated Dockerfile image, please update the sample to reference the corresponding AIBrix vLLM image/tag (or update the docs to describe/publish the vllm-omni image name).
Suggested change:
```diff
-          image: ghcr.io/vllm-project/aibrix/vllm-omni:v0.19.0
+          image: ghcr.io/vllm-project/aibrix/vllm-openai:v0.19.0
```
| AIBrix KVCache | v0.5.0 | v0.5.0 | KV cache disaggregation support |
| NIXL | 0.7.1 | 0.7.1 | UCX-based RDMA networking |
| UCX | 1.19.0 | 1.19.0 | Pre-installed for debugging |
| vllm-omni |0.19.0rc1| - |Activate with `--omni` at runtime|
The new vllm-omni row in the release-history table is missing spacing around columns (e.g., |0.19.0rc1|), which makes the Markdown table harder to read and can render inconsistently. Please format it consistently with the other rows.
Suggested change:
```diff
-| vllm-omni |0.19.0rc1| - |Activate with `--omni` at runtime|
+| vllm-omni | 0.19.0rc1 | - | Activate with `--omni` at runtime |
```
Signed-off-by: qianyujun <yq2432@columbia.edu>
@Jeffwan Please take a look at this. I guess this will bring redundancy into the image if a user just wants to use vLLM.
Please help clean up the commits and I will spend some time reviewing the change tomorrow.
Signed-off-by: Shen Li <sl6008@columbia.edu>
@copilot |
Signed-off-by: Shen Li <sl6008@columbia.edu>
/gemini review
Code Review
This pull request introduces support for vllm-omni in the AIBrix vLLM Dockerfile, adding conditional patching logic and an optional installation step for the vllm-omni package. It also includes a new sample Kubernetes deployment for the Qwen2.5-Omni-7B model. Review feedback points out a critical redundancy in the Dockerfile where the original patch application block was not removed, which could lead to build failures or duplicate patching. Additionally, it is suggested that the sample deployment should be updated to reference the AIBrix-built image rather than the upstream image to ensure that the integrated patches and optimizations are utilized.
```dockerfile
# apply patch to vLLM (only if patch file exists for this version)
RUN PATCH_FILE="/tmp/aibrix/python/aibrix_kvcache/integration/vllm/patches/vllm_${VLLM_VERSION}-aibrix-kvcache.patch" && \
    if [ -f "$PATCH_FILE" ]; then \
        DIST_DIR=$(${PYTHON_BIN} -m pip show vllm | grep "Location:" | awk '{print $2}') && \
        cd $DIST_DIR && \
        patch -p 1 -l -i $PATCH_FILE; \
    else \
        echo "No patch file for vLLM ${VLLM_VERSION} — using monkey-patch mode (PR #2060)"; \
    fi
```
The new conditional patch logic introduced here is a good improvement for handling missing patch files. However, the original unconditional patch block (lines 57-60) was not removed. This redundancy will cause the build to fail if the patch file is missing (as the second block will still attempt to access it) or result in the patch being applied twice if it exists. Please remove the redundant block at lines 57-60.
```yaml
    spec:
      containers:
        - name: vllm-omni
          image: vllm/vllm-omni:v0.20.0  # Note: must use the vllm-omni image rather than the vllm-openai image
```
The deployment uses the upstream vllm/vllm-omni:v0.20.0 image. Since this PR adds vllm-omni support to the AIBrix Dockerfile, the sample should ideally reference the AIBrix-built image (or use a placeholder) to ensure that the AIBrix-specific patches and KV cache optimizations are utilized in the deployment.
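As a sketch of that suggestion, assuming the `aibrix/vllm-openai` image name described in `build/container/README.md` and an illustrative model reference (neither the exact tag nor the model path is confirmed by this PR):

```yaml
# Hypothetical container spec using the AIBrix-built image;
# the image tag and model name below are assumptions.
    spec:
      containers:
        - name: vllm-omni
          image: aibrix/vllm-openai:v0.19.0   # AIBrix image with vllm-omni installed
          command: ["vllm", "serve"]
          args:
            - Qwen/Qwen2.5-Omni-7B            # illustrative model reference
            - --omni                          # activates Omni-modality at runtime
            - --port
            - "8000"
```

This would keep the AIBrix-specific patches and KV cache optimizations in the serving path while still enabling Omni mode via the runtime flag.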
Pull Request Description
This PR adds vLLM-Omni support to the existing AIBrix vLLM image by installing `vllm-omni` on top of the base vLLM installation. Since vLLM-Omni is a superset of vLLM and exposes the same OpenAI-compatible API, the existing image remains fully backward compatible, so existing vLLM workloads are unaffected. Omni-modality support is activated at runtime by passing `--omni` to `vllm serve`. This enables deploying one Omni model on a single machine across multiple GPUs through AIBrix.

Changes:
- `build/container/Dockerfile.vllm`: add `VLLM_OMNI_VERSION` ARG and `pip install vllm-omni` step
- `build/container/README.md`: document the new `VLLM_OMNI_VERSION` build argument and update the image description
- `samples/vllm_omni/qwen2_5.yaml`: new sample showing Qwen2.5-Omni deployment with the `--omni` flag

Related Issues
Resolves: Part of #2009