feat(infiniband): add DOCA OFED support for Ubuntu 22.04 and 24.04 #8240
- Add `InfiniBandSizes` SKU map in `gpu_components.go` for RDMA-capable VM sizes (ND-series A100/H100/H200 with `r`, HPC HB v3/v4, HC)
- Add `NeedsInfiniBand` template function in `baker.go` and `NEEDS_INFINIBAND` env var in `cse_cmd.sh`
- Add Mellanox DOCA apt repo setup (`updateAptWithMellanoxPkg`), cleanup (`removeMellanoxRepos`), and full dependency tree download (`downloadDocaOfedPackages`) in `cse_install_ubuntu.sh`
- Cache all `doca-ofed` `.deb` packages during VHD build for air-gapped installation at CSE provisioning time
- Add `installDocaOfedFromCache` to install cached packages at CSE time, unsetting `ARCH` to prevent DKMS postinstall failures
- Add `configureInfiniBand` to blacklist `ib_ipoib` kernel module
- Add `should_skip_doca_ofed` in `cse_helpers.sh` using IMDS VM tag `SkipDocaOfedInstall` to allow users to opt out of installation
- Wire up InfiniBand conditional block in `cse_main.sh` `nodePrep()` with skip-tag support

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds DOCA OFED support for InfiniBand/RDMA-capable Azure SKUs on Ubuntu by caching required packages during VHD build and conditionally installing them during node provisioning.
Changes:
- Add an InfiniBand/RDMA-capable SKU map and a `NeedsInfiniBand` template function to drive provisioning behavior.
- Add Mellanox DOCA apt repo configuration, package caching, and cached install logic for Ubuntu 22.04/24.04.
- Wire conditional InfiniBand installation into CSE flow with an IMDS tag to skip installation.
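
For illustration, the skip-tag check could look roughly like the sketch below. It assumes the IMDS `compute/tags` string has already been fetched (IMDS returns it as semicolon-separated `name:value` pairs). The function name `has_skip_tag` is hypothetical and is not the PR's `should_skip_doca_ofed`. Treating the value case-insensitively is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when the IMDS tag string opts out of
# DOCA OFED installation, 1 otherwise.
# $1: tags as returned by IMDS compute/tags, e.g. "env:prod;SkipDocaOfedInstall:true"
has_skip_tag() {
  local tags="$1" entry name value
  local IFS=';'                      # split the tag string on semicolons
  for entry in $tags; do
    name="${entry%%:*}"              # text before the first colon
    value="${entry#*:}"              # text after the first colon
    # Assumption: the tag name is matched exactly, the value case-insensitively.
    if [ "$name" = "SkipDocaOfedInstall" ] && [ "${value,,}" = "true" ]; then
      return 0
    fi
  done
  return 1
}
```

A caller would pass the fetched tag string and skip the install block when the function returns 0.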
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| vhdbuilder/packer/install-dependencies.sh | Adds Mellanox repo setup, caches DOCA OFED packages during VHD build, and removes Mellanox repos afterward. |
| pkg/agent/datamodel/gpu_components.go | Introduces InfiniBandSizes and IsInfiniBandSKU helper for RDMA-capable SKUs. |
| pkg/agent/baker.go | Exposes NeedsInfiniBand template function for CSE templating. |
| parts/linux/cloud-init/artifacts/ubuntu/cse_install_ubuntu.sh | Adds Mellanox repo management, DOCA OFED download/install-from-cache, and InfiniBand configuration. |
| parts/linux/cloud-init/artifacts/cse_main.sh | Conditionally installs DOCA OFED at provisioning time with skip-tag support. |
| parts/linux/cloud-init/artifacts/cse_helpers.sh | Adds IMDS-tag-based should_skip_doca_ofed helper. |
| parts/linux/cloud-init/artifacts/cse_cmd.sh | Exposes NEEDS_INFINIBAND env var for CSE runtime. |
| mkdir -p "${downloadDir}" | ||
|
|
||
| # Use apt-cache to resolve the full dependency tree of doca-ofed, | ||
| # then download all packages as .deb files for air-gapped installation at CSE time | ||
| local pkg_list | ||
| pkg_list=$(apt-cache depends --recurse --no-recommends --no-suggests \ | ||
| --no-conflicts --no-breaks --no-replaces --no-enhances \ | ||
| doca-ofed 2>/dev/null | grep "^\w" | sort -u) | ||
|
|
||
| pushd "${downloadDir}" >/dev/null || exit | ||
| for pkg in ${pkg_list}; do | ||
| apt-get download "${pkg}" 2>/dev/null || true | ||
| done | ||
| # Also download doca-ofed meta-package itself | ||
| apt-get download doca-ofed 2>/dev/null || exit $ERR_APT_INSTALL_TIMEOUT | ||
| popd >/dev/null || exit | ||
|
|
`apt-cache depends` output generally includes leading indentation before dependency package names, while relation headers like `Depends:` start at column 0. Using `grep "^\w"` will typically (a) miss the actual dependency package entries and (b) include non-package tokens like `Depends:`/`PreDepends:` in `pkg_list`, resulting in an incomplete cache and subsequent installation failures. Prefer a dependency resolution/download approach that reliably produces real package names (e.g., parse tokens after the colon and strip alternatives, or use an apt-native download-only install that pulls the full dependency closure into a specified cache directory).
| mkdir -p "${downloadDir}" | |
| # Use apt-cache to resolve the full dependency tree of doca-ofed, | |
| # then download all packages as .deb files for air-gapped installation at CSE time | |
| local pkg_list | |
| pkg_list=$(apt-cache depends --recurse --no-recommends --no-suggests \ | |
| --no-conflicts --no-breaks --no-replaces --no-enhances \ | |
| doca-ofed 2>/dev/null | grep "^\w" | sort -u) | |
| pushd "${downloadDir}" >/dev/null || exit | |
| for pkg in ${pkg_list}; do | |
| apt-get download "${pkg}" 2>/dev/null || true | |
| done | |
| # Also download doca-ofed meta-package itself | |
| apt-get download doca-ofed 2>/dev/null || exit $ERR_APT_INSTALL_TIMEOUT | |
| popd >/dev/null || exit | |
| mkdir -p "${downloadDir}/partial" | |
| # Use apt to resolve and download doca-ofed and its full dependency closure | |
| # as .deb files for air-gapped installation at CSE time. | |
| apt-get install --download-only -y --no-install-recommends \ | |
| -o Dir::Cache::archives="${downloadDir}" \ | |
| doca-ofed 2>/dev/null || exit $ERR_APT_INSTALL_TIMEOUT |
| if [ "$cpu_arch" = "amd64" ]; then | ||
| repo_arch="x86_64" | ||
| elif [ "$cpu_arch" = "arm64" ]; then | ||
| repo_arch="aarch64" | ||
| else | ||
| echo "Unknown CPU architecture: ${cpu_arch}" | ||
| return | ||
| fi | ||
|
|
||
| local mellanox_ubuntu_version="" | ||
| if [ "${UBUNTU_RELEASE}" = "22.04" ]; then | ||
| mellanox_ubuntu_version="ubuntu22.04" | ||
| elif [ "${UBUNTU_RELEASE}" = "24.04" ]; then | ||
| mellanox_ubuntu_version="ubuntu24.04" | ||
| else | ||
| echo "Mellanox DOCA repo setup is not supported on Ubuntu ${UBUNTU_RELEASE}" | ||
| return | ||
| fi |
Both error paths use `return` without a non-zero status, which will typically return the status of the preceding `echo` (0). That makes failures silently look successful to callers and can lead to later steps running with no Mellanox repo configured. Return a non-zero status (e.g., `return 1`) or consistently `exit` with an appropriate error code, matching how other repo-setup functions enforce failure.
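
A minimal sketch of the non-zero return pattern suggested above, isolated into a hypothetical helper named `map_repo_arch` (the PR itself performs this mapping inline):

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a Debian architecture name to the Mellanox repo
# architecture directory. Prints the mapped name on success, returns 1 for
# anything unsupported so callers can detect the failure.
map_repo_arch() {
  case "$1" in
    amd64) echo "x86_64" ;;
    arm64) echo "aarch64" ;;
    *)
      echo "Unknown CPU architecture: $1" >&2
      return 1
      ;;
  esac
}
```

A caller can then write `repo_arch=$(map_repo_arch "${cpu_arch}") || return 1`, so an unsupported architecture stops the repo setup instead of continuing silently.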
| DEBIAN_FRONTEND=noninteractive dpkg -i "${downloadDir}"/*.deb 2>&1 || { | ||
| # Fix any broken dependencies using only local packages | ||
| apt-get install -f -y --no-install-recommends 2>&1 || { | ||
| echo "Failed to install DOCA OFED packages" | ||
| if [ -n "${original_arch}" ]; then | ||
| export ARCH="${original_arch}" | ||
| fi | ||
| return 1 | ||
| } | ||
| } |
`apt-get install -f` may attempt to download missing packages from configured apt repositories, which undermines the stated goal of “air-gapped installation” and can cause provisioning failures in restricted networks. Also, `dpkg -i "${downloadDir}"/*.deb` can run into command-line argument length limits if the dependency set is large. Consider installing in a way that (1) only uses local `.deb` artifacts (fail fast if any are missing, without attempting network), and (2) avoids glob-arg limits (e.g., stream filenames to dpkg/apt via xargs/find or use an appropriate local-repo approach).
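
One way to stream filenames rather than expand a glob is sketched below. The helper name `run_on_debs` is hypothetical, and the sketch assumes GNU `find`, `sort -z`, and `xargs -r` as shipped on Ubuntu.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command over every .deb file in a directory
# without building one huge glob-expanded argument list.
# $1: directory holding the cached .deb files; remaining args: command to run.
run_on_debs() {
  local dir="$1"
  shift
  # NUL-delimited names survive spaces; sort keeps the order deterministic;
  # xargs -r skips the command entirely when no .deb files are found.
  find "${dir}" -maxdepth 1 -type f -name '*.deb' -print0 \
    | LC_ALL=C sort -z \
    | xargs -0 -r "$@"
}
```

For example, `run_on_debs "${downloadDir}" dpkg -i`. Note that `xargs` may split a very large set into several invocations, so dependency ordering across batches is not guaranteed. A local apt repo avoids that concern.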
- Replace `doca-ofed` with `doca-basic` to avoid caching unnecessary packages (`openmpi`, `ibutils2`, `sharp`, `ucx`, `opensm`)
- Cache DKMS build dependencies (`libelf-dev`, `libssl-dev`, `flex`, `bison`) and their transitive deps alongside `doca-basic`, since `mlnx-ofed-kernel-dkms` postinstall needs them for kernel module compilation but they are not in `doca-basic`'s dependency tree

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
The generic URL `https://linux.mellanox.com/public/keys/GPG-KEY-Mellanox.pub` returns a 404. Use the per-repo URL that includes the Ubuntu version and architecture instead.

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
- Fix GPG key path from `/etc/apt/keyrings/GPG-KEY-Mellanox.gpg` to `/usr/share/keyrings/mellanox-doca.pub` to match the ASCII-armored key format and avoid apt signature verification failures
- Fix GPG key URL from the defunct `/public/keys/` path to the per-repo URL that includes Ubuntu version and architecture
- Skip caching packages already installed at the exact same version, reducing cached `.deb` count from ~177 to ~49 while still caching packages that need upgrading (e.g., `libibverbs1` 39.0 → 2601.0)
- Replace raw `dpkg -i` with a local apt repo (`dpkg-scanpackages` + `apt-get install`) to handle dependency resolution, upgrade ordering, and `Breaks:` constraints properly
- Restrict `apt-get install` to only the local repo so nothing is fetched from the network during provisioning
- Update `removeMellanoxRepos` to match the new GPG key path

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
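
The "skip packages already installed at the exact same version" rule can be sketched as a plain string comparison. The helper `needs_caching` is hypothetical. How the PR obtains the two version strings is not shown in this thread, so the sketch takes them as arguments (the installed one could come from, for example, `dpkg-query -W -f='${Version}'`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide whether a package's .deb should be cached.
# $1: currently installed version (empty if the package is not installed)
# $2: candidate version offered by the repo
# Returns 0 (cache it) unless the installed version matches exactly.
needs_caching() {
  local installed="$1" candidate="$2"
  [ -z "${installed}" ] || [ "${installed}" != "${candidate}" ]
}
```

With the numbers from the commit message, `needs_caching "39.0" "2601.0"` succeeds, so `libibverbs1` stays in the cache, while an identical pair is skipped.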
Add `/etc/init.d/openibd restart` to `configureInfiniBand()` to unload the old in-tree kernel modules and load the new DKMS-built ones during CSE provisioning, avoiding a node reboot.

Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
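
A rough sketch of the two steps the commits describe for `configureInfiniBand()`: blacklist `ib_ipoib`, then restart `openibd`. The function name `configure_infiniband_sketch`, the blacklist file name, and the overridable path arguments are all assumptions made for illustration. The PR's actual implementation is not shown in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the PR's implementation.
# $1: modprobe config directory (defaults to /etc/modprobe.d)
# $2: path to the openibd init script (defaults to /etc/init.d/openibd)
configure_infiniband_sketch() {
  local modprobe_dir="${1:-/etc/modprobe.d}"
  local openibd="${2:-/etc/init.d/openibd}"

  # Prevent the IPoIB module from loading; the file name is an assumption.
  echo "blacklist ib_ipoib" > "${modprobe_dir}/blacklist-ib_ipoib.conf" || return 1

  if [ ! -x "${openibd}" ]; then
    echo "openibd init script not found: ${openibd}" >&2
    return 1
  fi

  # Restart so the DKMS-built modules replace the in-tree ones without a reboot.
  "${openibd}" restart || return 1
}
```

Returning non-zero from each step lets the caller decide whether a failed restart should abort provisioning.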
| pkg_list=$(apt-cache depends --recurse --no-recommends --no-suggests \ | ||
| --no-conflicts --no-breaks --no-replaces --no-enhances \ | ||
| doca-basic 2>/dev/null | grep "^\w" | sort -u) |
grep "^\w" is not a valid “word char” pattern in standard grep (it matches a literal 'w' unless using PCRE). This will likely produce an incomplete/empty dependency list from apt-cache depends, causing the VHD cache to miss required .debs and breaking the offline install path. Use a POSIX character class (e.g., lines starting with [[:alnum:]]) or another reliable filter for package header lines.
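
A sketch of the POSIX-class filter mentioned above, wrapped in a hypothetical `filter_pkg_names` function that reads `apt-cache depends --recurse` output on stdin. It assumes the usual layout of that output, where each resolved package name sits on its own line at column 0, relation lines such as `Depends:` are indented, and virtual packages appear as `<name>`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only lines that start with an alphanumeric
# character, which drops indented relation lines and <virtual> entries,
# then de-duplicate. LC_ALL=C keeps the sort order locale-independent.
filter_pkg_names() {
  grep -E '^[[:alnum:]]' | LC_ALL=C sort -u
}

# Possible usage (flags taken from the snippet under review):
#   apt-cache depends --recurse --no-recommends --no-suggests \
#     --no-conflicts --no-breaks --no-replaces --no-enhances \
#     doca-basic | filter_pkg_names
```

Checking that the resulting list is non-empty before downloading would also surface a failed resolution early.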
| pushd "${downloadDir}" >/dev/null || exit | ||
| dpkg-scanpackages . /dev/null 2>/dev/null | gzip -9c > Packages.gz | ||
| popd >/dev/null || exit |
dpkg-scanpackages is provided by dpkg-dev, which is not installed by default in this script’s dependency set. On an InfiniBand node this will fail at provisioning time and prevent DOCA OFED installation. Consider either ensuring dpkg-scanpackages is available (install/cached dpkg-dev) or generating and caching Packages.gz during VHD build (when the cache is created) so CSE doesn’t need this tool.
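
If the tool has to stay in the provisioning path, a small guard can at least make the failure explicit. The helper name `require_tool` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail with a clear message when a required command
# is not on PATH, instead of letting a later pipeline fail obscurely.
require_tool() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "Required tool not found: $1" >&2
    return 1
  fi
}
```

For example, `require_tool dpkg-scanpackages || return 1` before building the local index. Generating `Packages.gz` at VHD build time, as suggested above, would remove the need for the check at CSE time.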
| # from the network. The -o flags restrict apt to doca-local.list only. | ||
| local apt_local_opts=(-o Dir::Etc::sourcelist="/etc/apt/sources.list.d/doca-local.list" | ||
| -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0") | ||
| apt-get update "${apt_local_opts[@]}" 2>&1 |
apt-get update for the local file:// repo isn’t checked for failure here. If Packages.gz is missing/corrupt, the subsequent apt-get install error will be harder to diagnose. Capture and handle the return code (and fail fast with a clear error) before proceeding to install.
| apt-get update "${apt_local_opts[@]}" 2>&1 | |
| if ! apt-get update "${apt_local_opts[@]}" 2>&1; then | |
| echo "Failed to update local DOCA OFED apt repository" | |
| rm -f /etc/apt/sources.list.d/doca-local.list | |
| return 1 | |
| fi |
| # Download GPG key | ||
| retrycmd_curl_file 120 5 25 "${mellanox_gpg_keyring_path}" \ | ||
| "https://linux.mellanox.com/public/repo/doca/latest/${mellanox_ubuntu_version}/${repo_arch}/GPG-KEY-Mellanox.pub" \ | ||
| || exit $ERR_APT_UPDATE_TIMEOUT | ||
|
|
||
| # Add repo - use latest DOCA repo | ||
| echo "deb [signed-by=${mellanox_gpg_keyring_path}] https://linux.mellanox.com/public/repo/doca/latest/${mellanox_ubuntu_version}/${repo_arch}/ /" \ | ||
| >"${mellanox_sources_list_path}" | ||
|
|
||
| apt_get_update || exit $ERR_APT_UPDATE_TIMEOUT |
DOCA packages are pulled from https://linux.mellanox.com/.../doca/latest/... and not tracked in parts/common/components.json (no Mellanox entries). This introduces an unpinned external dependency ("latest") without Renovate/version governance, which can change VHD contents between builds unexpectedly. Consider pinning to a specific DOCA repo/version and/or modeling this dependency in components.json so it’s auditable and updateable via the existing workflow.
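
A sketch of what pinning could look like, using a hypothetical `doca_repo_url` helper. It assumes the Mellanox repo exposes versioned directories alongside `latest` under the same path layout, which should be verified before use. Any version string passed in is a placeholder to be sourced from wherever the project tracks it, such as `components.json`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the DOCA repo URL from an explicit version so
# the VHD build does not silently follow "latest".
# $1: pinned DOCA version, $2: e.g. ubuntu22.04, $3: e.g. x86_64
doca_repo_url() {
  local version="$1" ubuntu="$2" arch="$3"
  if [ -z "${version}" ] || [ "${version}" = "latest" ]; then
    echo "A pinned DOCA version is required" >&2
    return 1
  fi
  # Assumption: versioned directories mirror the layout used for "latest".
  echo "https://linux.mellanox.com/public/repo/doca/${version}/${ubuntu}/${arch}/"
}
```

Both the GPG key URL and the `deb` line could then be derived from the same pinned value.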
| if [ "${skip_doca_ofed_install}" != "true" ]; then | ||
| echo $(date),$(hostname), "Start configuring DOCA OFED for InfiniBand" | ||
| logs_to_events "AKS.CSE.installDocaOfedFromCache" installDocaOfedFromCache || exit $ERR_APT_INSTALL_TIMEOUT | ||
| logs_to_events "AKS.CSE.configureInfiniBand" configureInfiniBand |
configureInfiniBand failure is not handled here. If blacklisting or openibd restart fails, provisioning will continue but the node may not have a functional RDMA stack. Consider making this step fatal (or at least propagating the error) similar to installDocaOfedFromCache so InfiniBand nodes don’t come up partially configured.
| logs_to_events "AKS.CSE.configureInfiniBand" configureInfiniBand | |
| logs_to_events "AKS.CSE.configureInfiniBand" configureInfiniBand || exit $ERR_APT_INSTALL_TIMEOUT |
- Add `InfiniBandSizes` SKU map in `gpu_components.go` for RDMA-capable VM sizes (ND-series A100/H100/H200 with `r`, HPC HB v3/v4, HC)
- Add `NeedsInfiniBand` template function in `baker.go` and `NEEDS_INFINIBAND` env var in `cse_cmd.sh`
- Add Mellanox DOCA apt repo setup (`updateAptWithMellanoxPkg`), cleanup (`removeMellanoxRepos`), and full dependency tree download (`downloadDocaOfedPackages`) in `cse_install_ubuntu.sh`
- Cache all `doca-ofed` `.deb` packages during VHD build for air-gapped installation at CSE provisioning time
- Add `installDocaOfedFromCache` to install cached packages at CSE time, unsetting `ARCH` to prevent DKMS postinstall failures
- Add `configureInfiniBand` to blacklist `ib_ipoib` kernel module
- Add `should_skip_doca_ofed` in `cse_helpers.sh` using IMDS VM tag `SkipDocaOfedInstall` to allow users to opt out of installation
- Wire up InfiniBand conditional block in `cse_main.sh` `nodePrep()` with skip-tag support

What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #