
[CI] Add vLLM-Omni support to Dockerfile.vllm and sample deployment #2129

Open
Lucas-Qian6 wants to merge 4 commits into vllm-project:main from Lucas-Qian6:main

Conversation

Lucas-Qian6 (Contributor) commented Apr 22, 2026

Pull Request Description

This PR adds vLLM-Omni support to the existing AIBrix vLLM image by installing vllm-omni on top of the base vLLM installation. Since vLLM-Omni is a superset of vLLM and exposes the same OpenAI-compatible API, the image remains fully backward compatible and existing vLLM workloads are unaffected. Omni-modality support is activated at runtime by passing --omni to vllm serve. This enables deploying an Omni model on a single machine across multiple GPUs through AIBrix.

Changes:

  • build/container/Dockerfile.vllm: add VLLM_OMNI_VERSION ARG and pip install vllm-omni step (sketched after this list)
  • build/container/README.md: document new VLLM_OMNI_VERSION build argument and update image description
  • samples/vllm_omni/qwen2_5.yaml: new sample showing Qwen2.5-Omni deployment with --omni flag
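
For orientation, a minimal sketch of the Dockerfile.vllm change, assembled from the diff hunks quoted in the reviews below (not a verbatim copy of the final file):

    ARG VLLM_VERSION=v0.19.0
    ARG VLLM_OMNI_VERSION=0.19.0rc1

    # ... base vLLM installation and AIBrix KV cache patching ...

    # Install vllm-omni on top of the base vLLM installation.
    RUN pip install vllm-omni==${VLLM_OMNI_VERSION}

At runtime, Omni support is then enabled by appending --omni to vllm serve, e.g. vllm serve Qwen/Qwen2.5-Omni-7B --omni.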

Related Issues

Part of #2009

Copilot AI review requested due to automatic review settings April 22, 2026 20:33
gemini-code-assist (Bot) left a comment


Code Review

This pull request integrates vLLM-Omni into the AIBrix build process by updating the Dockerfile, README, and adding a sample Kubernetes deployment for Qwen2.5-Omni. Feedback focuses on improving the Dockerfile's pip installation command for consistency and image size, ensuring image naming in samples matches documentation, and fixing a missing newline in the new YAML file.

Outdated comment threads: build/container/Dockerfile.vllm; samples/vllm_omni/qwen2_5.yaml (×2)
Copilot AI left a comment


Pull request overview

Adds vLLM-Omni support to the AIBrix vLLM container build and introduces a Kubernetes sample intended to demonstrate running an Omni-capable model via vllm serve --omni.

Changes:

  • Extend build/container/Dockerfile.vllm with a VLLM_OMNI_VERSION build arg and install vllm-omni.
  • Update build/container/README.md to document the new build argument and note Omni support.
  • Add a new sample deployment YAML for Qwen2.5-Omni.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| build/container/Dockerfile.vllm | Adds VLLM_OMNI_VERSION arg and installs vllm-omni in the vLLM runtime image. |
| build/container/README.md | Documents the new build arg and updates the image description/release matrix to include vllm-omni. |
| samples/vllm_omni/qwen2_5.yaml | New Deployment sample showing vllm serve ... --omni for Qwen2.5-Omni. |


Comment thread build/container/Dockerfile.vllm Outdated
@@ -1,4 +1,5 @@
ARG VLLM_VERSION=v0.19.0
ARG VLLM_OMNI_VERSION=0.19.0rc1

Copilot AI Apr 22, 2026


The ARG VLLM_OMNI_VERSION=0.19.0rc1 line has trailing whitespace. Please remove trailing spaces to avoid lint/formatting issues and noisy diffs.

Comment thread build/container/Dockerfile.vllm Outdated

ARG PYTHON_BIN=python3
ARG VLLM_VERSION=v0.19.0
ARG VLLM_OMNI_VERSION=0.19.0rc1

Copilot AI Apr 22, 2026


This ARG VLLM_OMNI_VERSION=0.19.0rc1 redeclaration also has trailing whitespace. Please remove the extra spaces so the Dockerfile stays clean and formatting checks pass.

Comment thread build/container/Dockerfile.vllm Outdated
RUN ${PYTHON_BIN} -m pip uninstall -y aibrix_kvcache && \
${PYTHON_BIN} -m pip install /tmp/aibrix/python/aibrix_kvcache/dist/*.whl

RUN pip install vllm-omni==${VLLM_OMNI_VERSION}

Copilot AI Apr 22, 2026


PYTHON_BIN is configurable for this stage, but this install uses pip directly (which may point at a different interpreter than ${PYTHON_BIN}). Use ${PYTHON_BIN} -m pip install ... here to ensure the package is installed into the intended Python environment.

Suggested change
RUN pip install vllm-omni==${VLLM_OMNI_VERSION}
RUN ${PYTHON_BIN} -m pip install vllm-omni==${VLLM_OMNI_VERSION}

Comment thread samples/vllm_omni/qwen2_5.yaml Outdated
Comment on lines +1 to +2
apiVersion: apps/v1
kind: Deployment

Copilot AI Apr 22, 2026


This PR description mentions adding a sample at samples/qwen2-5-omni-deployment.yaml, but the PR actually adds samples/vllm_omni/qwen2_5.yaml. Please align the PR description and/or rename/move the sample so it matches the documented location/name.

Comment thread samples/vllm_omni/qwen2_5.yaml Outdated
Comment on lines +3 to +16
metadata:
  name: qwen25omni-7b
spec:
  replicas: 1
  selector:
    matchLabels:
      model.aibrix.ai/name: qwen2-5-omni-7b
  template:
    metadata:
      labels:
        model.aibrix.ai/name: qwen2-5-omni-7b
        model.aibrix.ai/port: "8000"
        model.aibrix.ai/engine: vllm
    spec:

Copilot AI Apr 22, 2026


Most AIBrix Deployment samples also include metadata.labels (model name/port/engine) and a corresponding Service resource so the model is discoverable/routable (and the Service name can match model.aibrix.ai/name). As written, this sample creates only a Deployment with no Service, so it won’t be reachable via a stable in-cluster endpoint. Consider adding the Service (and aligning labels/namespace with other samples) to make the sample runnable end-to-end.
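
For illustration, a minimal Service sketch that would give the sample a stable in-cluster endpoint; the Service name and port below are assumptions derived from this sample's labels, not taken from the PR:

    apiVersion: v1
    kind: Service
    metadata:
      name: qwen2-5-omni-7b              # assumed to match model.aibrix.ai/name
      labels:
        model.aibrix.ai/name: qwen2-5-omni-7b
    spec:
      selector:
        model.aibrix.ai/name: qwen2-5-omni-7b
      ports:
        - name: http
          port: 8000
          targetPort: 8000               # matches model.aibrix.ai/port in the sample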

Comment thread samples/vllm_omni/qwen2_5.yaml Outdated
    spec:
      containers:
        - name: vllm-omni
          image: ghcr.io/vllm-project/aibrix/vllm-omni:v0.19.0

Copilot AI Apr 22, 2026


The sample image (ghcr.io/vllm-project/aibrix/vllm-omni:...) doesn’t match the image naming documented in build/container/README.md (which describes aibrix/vllm-openai:... as the enhanced vLLM image). If the intent is to demonstrate the newly updated Dockerfile image, please update the sample to reference the corresponding AIBrix vLLM image/tag (or update the docs to describe/publish the vllm-omni image name).

Suggested change
image: ghcr.io/vllm-project/aibrix/vllm-omni:v0.19.0
image: ghcr.io/vllm-project/aibrix/vllm-openai:v0.19.0

Comment thread build/container/README.md Outdated
| AIBrix KVCache | v0.5.0 | v0.5.0 | KV cache disaggregation support |
| NIXL | 0.7.1 | 0.7.1 | UCX-based RDMA networking |
| UCX | 1.19.0 | 1.19.0 | Pre-installed for debugging |
| vllm-omni |0.19.0rc1| - |Activate with `--omni` at runtime|

Copilot AI Apr 22, 2026


The new vllm-omni row in the release-history table is missing spacing around columns (e.g., |0.19.0rc1|), which makes the Markdown table harder to read and can render inconsistently. Please format it consistently with the other rows.

Suggested change
| vllm-omni |0.19.0rc1| - |Activate with `--omni` at runtime|
| vllm-omni | 0.19.0rc1 | - | Activate with `--omni` at runtime |

Lucas-Qian6 (Contributor, Author):

@copilot

Signed-off-by: qianyujun <yq2432@columbia.edu>
Lucas-Qian6 (Contributor, Author):

@Jeffwan Please take a look at this. I guess this will bring redundancy into the image if a user just wants to use vLLM.

varungup90 (Collaborator):

⚠️ Critical Issues

  • Unconditional Installation of vllm-omni
    Currently, vllm-omni is installed by default in every AIBrix vLLM image. This adds unnecessary disk usage, potential dependency conflicts, and longer CI build times for users who do not require Omni support. Installation should be opt-in.

    • Suggested Fix: Use a build-guard with an empty default:
      ARG VLLM_OMNI_VERSION=""
      RUN if [ -n "${VLLM_OMNI_VERSION}" ]; then \
            ${PYTHON_BIN} -m pip install --no-cache-dir vllm-omni==${VLLM_OMNI_VERSION}; \
          fi
  • Pre-release (RC) as Default Version
    VLLM_OMNI_VERSION defaults to 0.19.0rc1. Shipping a release candidate as the default in a production-oriented image is risky. Please pin this to a stable release or explicitly label the PR/image as experimental.

  • Version and Naming Inconsistency
    There is a mismatch between VLLM_VERSION (defaults to v0.19.0 with a v prefix) and VLLM_OMNI_VERSION (defaults to 0.19.0rc1 without the prefix). These should follow a consistent naming convention and, ideally, be kept in lockstep to avoid compatibility issues.


🟠 Moderate Issues

  • Incorrect Image Tag in Sample YAML
    The sample references aibrix/vllm-openai:v0.19.0. This appears to be the upstream tag rather than the AIBrix-extended image. If the user pulls the standard upstream image, the --omni flag will fail because vllm-omni won't be present. Please update this to reference the AIBrix-published tag.

  • Missing Model Volume / Persistence
    The Qwen2.5-Omni-7B sample currently downloads the model from HuggingFace on every pod start. To align with other samples in this repo (which use hostPath or PVC), please add a volume mount (a sketch follows this list) or document the requirement for a HUGGING_FACE_HUB_TOKEN.

  • Undocumented --omni Flag
    The --omni flag is not standard in upstream vLLM documentation. To assist operators, please add a comment in the sample YAML or a note in the README explaining what this flag enables and which version of vllm-omni is required.
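
A hypothetical sketch for the volume point above; the hostPath, mount path, and names are illustrative assumptions, not taken from this PR:

    spec:
      volumes:
        - name: model-cache
          hostPath:
            path: /data/models                      # assumed host directory
            type: DirectoryOrCreate
      containers:
        - name: vllm-omni
          volumeMounts:
            - name: model-cache
              mountPath: /root/.cache/huggingface   # default HF cache path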


🟢 Minor Issues

  • README Historical Inconsistency
    The README updates imply that the existing v0.10.2 image includes vLLM-Omni. Since that image was built prior to this PR, it does not contain the package. The documentation should clarify that only images built after this change will include Omni support.

  • Table Formatting
    In the README release history table, the new vllm-omni row has misaligned columns (specifically a blank second column). Please format this to match the surrounding rows for better readability.

Jeffwan (Collaborator) commented Apr 27, 2026

please help clean up the commits and I will spend some time reviewing the change tomorrow

Signed-off-by: Shen Li <sl6008@columbia.edu>
Signed-off-by: Shen Li <sl6008@columbia.edu>
jasonlee-1024 (Contributor):

@copilot

Signed-off-by: Shen Li <sl6008@columbia.edu>
jasonlee-1024 (Contributor):

/gemini review

gemini-code-assist (Bot) left a comment


Code Review

This pull request introduces support for vllm-omni in the AIBrix vLLM Dockerfile, adding conditional patching logic and an optional installation step for the vllm-omni package. It also includes a new sample Kubernetes deployment for the Qwen2.5-Omni-7B model. Review feedback points out a critical redundancy in the Dockerfile where the original patch application block was not removed, which could lead to build failures or duplicate patching. Additionally, it is suggested that the sample deployment should be updated to reference the AIBrix-built image rather than the upstream image to ensure that the integrated patches and optimizations are utilized.

Comment on lines +48 to +56
# apply patch to vLLM (only if patch file exists for this version)
RUN PATCH_FILE="/tmp/aibrix/python/aibrix_kvcache/integration/vllm/patches/vllm_${VLLM_VERSION}-aibrix-kvcache.patch" && \
    if [ -f "$PATCH_FILE" ]; then \
        DIST_DIR=$(${PYTHON_BIN} -m pip show vllm | grep "Location:" | awk '{print $2}') && \
        cd $DIST_DIR && \
        patch -p 1 -l -i $PATCH_FILE; \
    else \
        echo "No patch file for vLLM ${VLLM_VERSION} — using monkey-patch mode (PR #2060)"; \
    fi

Severity: high

The new conditional patch logic introduced here is a good improvement for handling missing patch files. However, the original unconditional patch block (lines 57-60) was not removed. This redundancy will cause the build to fail if the patch file is missing (as the second block will still attempt to access it) or result in the patch being applied twice if it exists. Please remove the redundant block at lines 57-60.

    spec:
      containers:
        - name: vllm-omni
          image: vllm/vllm-omni:v0.20.0  # Note: must use the vllm-omni image rather than the vllm-openai image

Severity: medium

The deployment uses the upstream vllm/vllm-omni:v0.20.0 image. Since this PR adds vllm-omni support to the AIBrix Dockerfile, the sample should ideally reference the AIBrix-built image (or use a placeholder) to ensure that the AIBrix-specific patches and KV cache optimizations are utilized in the deployment.
