GCP-447: inject token-minter as native sidecar init container#7965
GCP-447: inject token-minter as native sidecar init container#7965openshift-merge-bot[bot] merged 1 commit intoopenshift:mainfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
Skipping CI for Draft Pull Request. |
|
Important Review skippedAuto reviews are limited based on label configuration. 🚫 Review skipped — only excluded labels are configured. (1)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository YAML (base), Organization UI (inherited) Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis change introduces support for Kubernetes native sidecar containers (available in Kubernetes 1.29+) across the control plane operator. It detects native sidecar capability through management cluster version discovery, propagates this flag through the reconciler and control plane context, and uses it in token minter container injection logic to conditionally inject containers as native sidecars with startup probes or as regular sidecar containers. The feature also includes test coverage for various injection scenarios and removes a toleration entry for the GCP cloud controller manager. Sequence Diagram(s)sequenceDiagram
participant main as main.go
participant cap as Capability Detection
participant reconciler as HostedControlPlaneReconciler
participant context as ControlPlaneContext
participant injector as TokenMinterContainerInjector
participant pod as PodSpec
main->>cap: DetectManagementClusterCapabilities(discoveryClient)
cap->>cap: supportsNativeSidecarContainers(k8s >= 1.29)
cap-->>main: CapabilityNativeSidecarContainers flag
main->>reconciler: Initialize with NativeSidecarContainersEnabled flag
reconciler->>context: Create ControlPlaneContext with NativeSidecarContainersEnabled
context->>injector: injectContainer(nativeSidecarsEnabled, podSpec, container, ...)
alt nativeSidecarsEnabled
injector->>pod: Add container to InitContainers with RestartPolicy: Always
injector->>pod: Add StartupProbe (check token file existence)
else not nativeSidecarsEnabled
injector->>pod: Add container to Containers as regular sidecar
end
injector->>pod: Mount volume on first container in PodSpec
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@cristianoveiga: An error was encountered searching for bug GCP-447 on the Jira server at https://issues.redhat.com. No known errors were detected, please see the full error message for details. Full error message.
No response returned: Get "https://issues.redhat.com/rest/api/2/issue/GCP-447": GET https://issues.redhat.com/rest/api/2/issue/GCP-447 giving up after 5 attempt(s)
Please contact an administrator to resolve this issue, then request a bug refresh with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@cristianoveiga: An error was encountered searching for bug GCP-447 on the Jira server at https://issues.redhat.com. No known errors were detected, please see the full error message for details. Full error message.
No response returned: Get "https://issues.redhat.com/rest/api/2/issue/GCP-447": GET https://issues.redhat.com/rest/api/2/issue/GCP-447 giving up after 5 attempt(s)
Please contact an administrator to resolve this issue, then request a bug refresh with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
efc1c40 to
b9226a3
Compare
|
/test ? |
|
/test e2e-aks |
Test Resultse2e-aks
e2e-aws
|
|
/retest |
|
/uncc @bryan-cox |
|
/test ? |
|
/test e2e-gke |
| MetricsSet: r.MetricsSet, | ||
| EnableCIDebugOutput: r.EnableCIDebugOutput, | ||
| ImageMetadataProvider: r.ImageMetadataProvider, | ||
| NativeSidecarContainersEnabled: r.ManagementClusterCapabilities.Has(capabilities.CapabilityNativeSidecarContainers), |
There was a problem hiding this comment.
can we pass this to karpenter-operator/controllers/karpenter/karpenter_controller.go as well
There was a problem hiding this comment.
karpenter_controller.go creates the karpenter component (v2/karpenter/component.go) which doesn't call InjectTokenMinterContainer, so setting NativeSidecarContainersEnabled in its ControlPlaneContext would have no effect.
The karpenter-operator component (v2/karpenteroperator/component.go) does use InjectTokenMinterContainer, but it's reconciled by the HCP controller which already sets the flag.
Happy to add it proactively if you prefer. I just want to confirm since it appears to be a no-op currently.
| revisionHistoryLimit: 2 | ||
| selector: | ||
| matchLabels: | ||
| app: gcp-cloud-controller-manager |
There was a problem hiding this comment.
this will break on any already existing cluster
There was a problem hiding this comment.
We don't have any production GCP clusters at the moment - and we can re-create any clusters in integration, if needed.
There was a problem hiding this comment.
Please note that this reverts the label change in #7926.
This was only needed for the crash toleration exception, which is no longer necessary with the native sidecar fix.
We prefer reverting to keep GCP CCM consistent with how other CCMs are labeled (app: cloud-controller-manager).
|
dropped some feedback. This looks great to me overall. |
Change InjectTokenMinterContainer in the cpov2 framework to inject the token-minter as a native sidecar init container (RestartPolicy=Always) with a StartupProbe that blocks main containers until the token file exists. This eliminates the race condition where the main container starts before the token is written, causing fatal crashes (observed in GCP CCM CI). For backwards compatibility, the management cluster K8s version is detected at startup via DetectManagementClusterCapabilities. On K8s >= 1.29 (where the SidecarContainers feature gate is beta and enabled by default), the native sidecar pattern is used. On older clusters, the current regular sidecar container injection is preserved. OneShot token-minters are injected as regular init containers, which run to completion before main containers start. Ref: https://issues.redhat.com/browse/GCP-447 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
/test verify |
|
/retest-required |
|
/test e2e-gke |
1 similar comment
|
/test e2e-gke |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cristianoveiga, enxebre The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/verfified by @cristianoveiga The e2e-gke validated these changes - the failure is unrelated to this PR. PR #7887 recently added the controlPlaneVersion status check to the e2e validation, which requires all ControlPlaneComponents to report RolloutComplete: True. The cluster-network-operator never completes because cloud-network-config-controller is stuck — the secret cloud-network-config-controller-creds doesn't exist for GCP yet (tracked in PR #7824 / GCP-431). This was previously invisible and is now surfaced by the new check. cc: @apahim |
|
/lgtm |
|
Scheduling tests matching the |
|
/verified by @cristianoveiga |
|
@cristianoveiga: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@cristianoveiga: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
37e3785
into
openshift:main
What this PR does / why we need it:
Changes InjectTokenMinterContainer in the cpov2 framework to inject the token-minter as a https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/ (RestartPolicy: Always) with a StartupProbe that blocks main containers until the token file exists. This eliminates a race condition where the main container starts before the token is written, causing fatal crashes (observed in GCP CCM CI).
For backwards compatibility, the management cluster K8s version is detected at startup via ServerVersion(). On K8s >= 1.29 (where the SidecarContainers feature gate is beta and enabled by default), the native sidecar pattern is used. On older clusters, the current regular sidecar injection is preserved.
Which issue(s) this PR fixes:
Fixes GCP-447
Special notes for your reviewer:
The race condition has only been observed in GCP CI so far, but the underlying issue exists for all providers using InjectTokenMinterContainer and could surface for other cloud providers in the future.
Checklist:
Summary by CodeRabbit
Release Notes
New Features
Changes
Tests