NVIDIA-554: DPU-host mode: use ConfigMap for OVN feature enablement instead of per-node script gating#2944
NVIDIA-554: DPU-host mode: use ConfigMap for OVN feature enablement instead of per-node script gating#2944tsorya wants to merge 2 commits intoopenshift:masterfrom
Conversation
tsorya
commented
Mar 20, 2026
|
@tsorya: This pull request references NVIDIA-554 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughRemoved per-node feature gating via Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.4)level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: inconsistent vendoring in :\n\tgithub.com/Masterminds/semver@v1.5.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/Masterminds/sprig/v3@v3.2.3: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/containernetworking/cni@v0.8.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ghodss/yaml@v1.0.1-0.20190212211648-25d852aebe32: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/go-bindata/go-bindata@v3.1.2+incompatible: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/onsi/gomega@v1.39.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/ope ... [truncated 17356 characters] ... ired in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/gengo/v2@v2.0.0-20251215205346-5ee0d033ba5b: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/kms@v0.35.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tk8s.io/kube-aggregator@v0.35.1: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/randfill@v1.0.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/structured-merge-diff/v6@v6.3.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\n\tTo ignore the vendor directory, use -mod=readonly or -mod=mod.\n\tTo sync the vendor directory, run:\n\t\tgo mod vendor\n" Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: tsorya The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
bindata/network/ovn-kubernetes/common/008-script-lib.yaml (1)
701-724:⚠️ Potential issue | 🟠 MajorMulticast should be moved to
004-config.yamlper documented design.The documentation states that "Feature enablement (egress IP, multicast, multi-network, network segmentation, admin network policy, etc.) is managed through the cluster-wide ConfigMap (
004-config.yaml)", but--enable-multicastis still hardcoded as a CLI flag at line 723. The004-config.yamltemplates contain dozens ofenable-*settings (e.g.,enable-egress-ip,enable-multi-network,enable-admin-network-policy) butenable-multicastis absent. Either move multicast configuration into the config file to align with the documented design and other features, or update the documentation to clarify why multicast remains CLI-driven.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bindata/network/ovn-kubernetes/common/008-script-lib.yaml` around lines 701 - 724, The script currently hardcodes the CLI flag --enable-multicast in the ovnkube exec line; move multicast feature toggling into the cluster-wide ConfigMap like other features in 004-config.yaml. Add a ConfigMap/template key (e.g., enable-multicast) to 004-config.yaml, expose it as a rendered variable (e.g., multicast_enable_flag) alongside the existing flags (similar to network_observability_enabled_flag), then replace the literal --enable-multicast in the exec invocation with the variable ${multicast_enable_flag} (or remove it when false). Alternatively, if you intend multicast to stay CLI-driven, update the docs to explicitly state why --enable-multicast remains a hardcoded flag instead of being managed via 004-config.yaml.
🧹 Nitpick comments (1)
pkg/network/ovn_kubernetes_test.go (1)
296-306: Add one DPU-host regression case.These updated goldens prove the new
[ovnkubernetesfeature]contents, but they still don't lock in the behavior this PR is changing:dpu-hostshould stop rewriting feature flags while still changing gateway/interface selection,--ovnkube-node-mode, andinit_ovnkube_controller. A small render test aroundovnkube-lib.shwould make that contract much harder to regress.Also applies to: 342-352, 401-412, 463-474, 525-535, 586-596, 636-646, 689-699, 735-745, 782-793, 831-841, 877-887, 925-935, 972-982
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/network/ovn_kubernetes_test.go` around lines 296 - 306, Add a single regression test case in pkg/network/ovn_kubernetes_test.go that renders ovnkube-lib.sh for the dpu-host node mode: feed the test input containing the full [ovnkubernetesfeature] block shown in the diff, set node mode to "dpu-host", and assert that the output preserves the entire [ovnkubernetesfeature] block exactly (no feature flags rewritten) while still updating only the gateway/interface selection, the --ovnkube-node-mode value to "dpu-host", and the init_ovnkube_controller behavior; place the case alongside the existing render tests in the file that exercise ovnkube-lib.sh so it will fail if dpu-host rewrites feature flags.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@bindata/network/ovn-kubernetes/common/008-script-lib.yaml`:
- Around line 701-724: The script currently hardcodes the CLI flag
--enable-multicast in the ovnkube exec line; move multicast feature toggling
into the cluster-wide ConfigMap like other features in 004-config.yaml. Add a
ConfigMap/template key (e.g., enable-multicast) to 004-config.yaml, expose it as
a rendered variable (e.g., multicast_enable_flag) alongside the existing flags
(similar to network_observability_enabled_flag), then replace the literal
--enable-multicast in the exec invocation with the variable
${multicast_enable_flag} (or remove it when false). Alternatively, if you intend
multicast to stay CLI-driven, update the docs to explicitly state why
--enable-multicast remains a hardcoded flag instead of being managed via
004-config.yaml.
---
Nitpick comments:
In `@pkg/network/ovn_kubernetes_test.go`:
- Around line 296-306: Add a single regression test case in
pkg/network/ovn_kubernetes_test.go that renders ovnkube-lib.sh for the dpu-host
node mode: feed the test input containing the full [ovnkubernetesfeature] block
shown in the diff, set node mode to "dpu-host", and assert that the output
preserves the entire [ovnkubernetesfeature] block exactly (no feature flags
rewritten) while still updating only the gateway/interface selection, the
--ovnkube-node-mode value to "dpu-host", and the init_ovnkube_controller
behavior; place the case alongside the existing render tests in the file that
exercise ovnkube-lib.sh so it will fail if dpu-host rewrites feature flags.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5c1994eb-63ed-421c-ab98-8900f240f6ff
📒 Files selected for processing (10)
README.mdbindata/network/ovn-kubernetes/common/008-script-lib.yamlbindata/network/ovn-kubernetes/managed/004-config.yamlbindata/network/ovn-kubernetes/managed/ovnkube-control-plane.yamlbindata/network/ovn-kubernetes/self-hosted/004-config.yamlbindata/network/ovn-kubernetes/self-hosted/ovnkube-control-plane.yamldocs/architecture.mddocs/operands.mddocs/ovn_node_mode.mdpkg/network/ovn_kubernetes_test.go
💤 Files with no reviewable changes (3)
- README.md
- docs/architecture.md
- docs/operands.md
|
/retest-required |
|
/test ? |
|
oc exec -it -n openshift-ovn-kubernetes ovnkube-node-bpkjk -- bash [root@dhcp-8-231-235 ~]# ovn-nbctl find ACL | grep -i mcast match : "(ip4.mcast || mldv1 || mldv2 || (ip6.dst[120..127] == 0xff && ip6.dst[116] == 1))" match : "(ip4.mcast || mldv1 || mldv2 || (ip6.dst[120..127] == 0xff && ip6.dst[116] == 1))" match : "outport == @a15696136357465712812 && (igmp || (ip4.src == $a14513626322132345176 && ip4.mcast))" match : "outport == @a4743249366342378346 && (ip4.mcast || mldv1 || mldv2 || (ip6.dst[120..127] == 0xff && ip6.dst[116] == 1))" match : "inport == @a4743249366342378346 && (ip4.mcast || mldv1 || mldv2 || (ip6.dst[120..127] == 0xff && ip6.dst[116] == 1))" match : "inport == @a15696136357465712812 && ip4.mcast" 10.130.0.3 : unicast, xmt/rcv/%loss = 10/10/0%, min/avg/max/std-dev = 0.290/0.404/1.026/0.222 10.130.0.3 : multicast, xmt/rcv/%loss = 10/9/9% (seq>=2 0%), min/avg/max/std-dev = 0.307/0.522/1.115/0.319 10.128.0.30 : unicast, xmt/rcv/%loss = 10/10/0%, min/avg/max/std-dev = 0.314/0.449/0.717/0.140 10.128.0.30 : multicast, xmt/rcv/%loss = 10/9/9% (seq>=2 0%), min/avg/max/std-dev = 0.285/0.445/0.823/0.155 10.128.0.30 : unicast, xmt/rcv/%loss = 10/10/0%, min/avg/max/std-dev = 0.307/0.488/0.991/0.218 10.128.0.30 : multicast, xmt/rcv/%loss = 10/9/9% (seq>=2 0%), min/avg/max/std-dev = 0.313/0.499/1.212/0.273 10.129.0.7 : unicast, xmt/rcv/%loss = 10/10/0%, min/avg/max/std-dev = 0.307/0.462/1.057/0.252 10.129.0.7 : multicast, xmt/rcv/%loss = 10/9/9% (seq>=2 0%), min/avg/max/std-dev = 0.320/0.501/1.513/0.382 |
|
/retest |
…nstead of per-node script gating
Feature flags (egress IP, multicast, multi-network, network
segmentation, admin network policy, multi-external-gateway, etc.)
are managed via the cluster-wide ConfigMap (004-config.yaml) passed
to ovnkube through --config-file. They do not need per-node gating
in the startup script.
OVN_NODE_MODE remains used only for DPU-host structural differences:
gateway interface, ovnkube-node-mode flag, and init-controller.
Also re-applies the feature gate cleanup from f5b8490 (removal of
OVN_ADMIN_NETWORK_POLICY_ENABLE template references) and removes
redundant CLI flags from 008-script-lib.yaml that duplicate what
the ConfigMap already provides.
Co-authored-by: Cursor <cursoragent@cursor.com>
Made-with: Cursor
This reverts commit 6f5697c.
Remove enable-multicast=true from ovnkube config maps and pass it directly as --enable-multicast on the ovnkube CLI for node and control plane processes (both self-hosted and managed). Made-with: Cursor
f58272e to
7f928af
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/network/ovn_kubernetes_test.go`:
- Around line 298-308: Update the two failing tests to match the ConfigMap-based
migration: in TestOVNKubernetesControlPlaneFlags and
TestOVNKubernetesScriptLibCombined remove assertions that expect the removed CLI
flags and script-lib variables (the old enable-* CLI flags and corresponding
script-lib vars) and instead assert the presence/contents of the new
ovnkube.conf configmap golden (the block shown at lines 298–308). Also either
implement or remove references to the missing helpers
renderControlPlaneWithOverrides and renderScriptLibWithOverrides used by those
tests—if keeping them, implement helpers that render using the new ConfigMap
migration semantics and return the rendered ovnkube.conf/script-lib output;
otherwise delete calls to them and inline the appropriate rendering/assertion
using existing render helpers. Ensure all references are to the test names
TestOVNKubernetesControlPlaneFlags, TestOVNKubernetesScriptLibCombined and the
helper names renderControlPlaneWithOverrides/renderScriptLibWithOverrides so
reviewers can locate the edits.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d16edd6a-c511-46cc-ada2-5d68285bf3f3
📒 Files selected for processing (10)
README.mdbindata/network/ovn-kubernetes/common/008-script-lib.yamlbindata/network/ovn-kubernetes/managed/004-config.yamlbindata/network/ovn-kubernetes/managed/ovnkube-control-plane.yamlbindata/network/ovn-kubernetes/self-hosted/004-config.yamlbindata/network/ovn-kubernetes/self-hosted/ovnkube-control-plane.yamldocs/architecture.mddocs/operands.mddocs/ovn_node_mode.mdpkg/network/ovn_kubernetes_test.go
💤 Files with no reviewable changes (3)
- docs/architecture.md
- docs/operands.md
- README.md
✅ Files skipped from review due to trivial changes (3)
- bindata/network/ovn-kubernetes/managed/004-config.yaml
- docs/ovn_node_mode.md
- bindata/network/ovn-kubernetes/common/008-script-lib.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
- bindata/network/ovn-kubernetes/self-hosted/004-config.yaml
| [ovnkubernetesfeature] | ||
| enable-egress-ip=true | ||
| enable-egress-firewall=true | ||
| enable-egress-qos=true | ||
| enable-egress-service=true | ||
| egressip-node-healthcheck-port=9107 | ||
| enable-multi-network=true | ||
| enable-network-segmentation=true | ||
| enable-preconfigured-udn-addresses=true | ||
| enable-admin-network-policy=true | ||
| enable-multi-external-gateway=true |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf 'Helper definitions:\n'
rg -n --type=go '\bfunc\s+(renderControlPlaneWithOverrides|renderScriptLibWithOverrides)\s*\(' || true
printf '\nOld CLI/script-lib assertions in pkg/network/ovn_kubernetes_test.go:\n'
rg -n -C2 --fixed-strings \
-e '--enable-egress-ip=true' \
-e 'egress_features_enable_flag=' \
pkg/network/ovn_kubernetes_test.goRepository: openshift/cluster-network-operator
Length of output: 1827
Complete the test migration: remove stale CLI and script-lib assertions.
The new ovnkube.conf goldens at lines 298–308 reflect the ConfigMap migration, but TestOVNKubernetesControlPlaneFlags (lines 4435–4506) and TestOVNKubernetesScriptLibCombined (lines 4548–4575) still assert removed CLI flags and script-lib variables that no longer apply. Additionally, helper functions renderControlPlaneWithOverrides and renderScriptLibWithOverrides referenced at lines 4514 and 4529 are not defined in the codebase.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/network/ovn_kubernetes_test.go` around lines 298 - 308, Update the two
failing tests to match the ConfigMap-based migration: in
TestOVNKubernetesControlPlaneFlags and TestOVNKubernetesScriptLibCombined remove
assertions that expect the removed CLI flags and script-lib variables (the old
enable-* CLI flags and corresponding script-lib vars) and instead assert the
presence/contents of the new ovnkube.conf configmap golden (the block shown at
lines 298–308). Also either implement or remove references to the missing
helpers renderControlPlaneWithOverrides and renderScriptLibWithOverrides used by
those tests—if keeping them, implement helpers that render using the new
ConfigMap migration semantics and return the rendered ovnkube.conf/script-lib
output; otherwise delete calls to them and inline the appropriate
rendering/assertion using existing render helpers. Ensure all references are to
the test names TestOVNKubernetesControlPlaneFlags,
TestOVNKubernetesScriptLibCombined and the helper names
renderControlPlaneWithOverrides/renderScriptLibWithOverrides so reviewers can
locate the edits.
|
@tsorya: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |