GCP-447: inject token-minter as native sidecar init container #7965

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from cristianoveiga:GCP-447 on Mar 27, 2026
@cristianoveiga
Contributor

@cristianoveiga cristianoveiga commented Mar 13, 2026

What this PR does / why we need it:

Changes InjectTokenMinterContainer in the cpov2 framework to inject the token-minter as a native sidecar container (https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/), that is, an init container with RestartPolicy: Always, plus a StartupProbe that blocks main containers until the token file exists. This eliminates a race condition where the main container starts before the token is written, causing fatal crashes (observed in GCP CCM CI).

For backwards compatibility, the management cluster K8s version is detected at startup via ServerVersion(). On K8s >= 1.29 (where the SidecarContainers feature gate is beta and enabled by default), the native sidecar pattern is used. On older clusters, the current regular sidecar injection is preserved.
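
For readers unfamiliar with the version gate described above, here is a minimal sketch of a ">= 1.29" check on a version string of the kind ServerVersion() reports. The helper name supportsNativeSidecars and its parsing rules are assumptions made for illustration; this is not the code from the PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// supportsNativeSidecars reports whether a Kubernetes version string such as
// "v1.29.3" or "v1.31.2+k3s1" is at least 1.29.
// Hypothetical helper: the name and parsing rules are assumptions of this sketch.
func supportsNativeSidecars(gitVersion string) (bool, error) {
	v := strings.TrimPrefix(gitVersion, "v")
	parts := strings.SplitN(v, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected version %q", gitVersion)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("parsing major version of %q: %w", gitVersion, err)
	}
	// Keep only the leading digits of the minor component, so "29+" parses as 29.
	minorStr := parts[1]
	end := 0
	for end < len(minorStr) && minorStr[end] >= '0' && minorStr[end] <= '9' {
		end++
	}
	minor, err := strconv.Atoi(minorStr[:end])
	if err != nil {
		return false, fmt.Errorf("parsing minor version of %q: %w", gitVersion, err)
	}
	return major > 1 || (major == 1 && minor >= 29), nil
}

func main() {
	for _, v := range []string{"v1.28.9", "v1.29.0", "v1.31.2+k3s1"} {
		ok, err := supportsNativeSidecars(v)
		fmt.Println(v, ok, err)
	}
}
```

Returning an error instead of silently defaulting lets the caller decide the fallback; falling back to the regular sidecar path when the version cannot be parsed would be the conservative choice.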

Which issue(s) this PR fixes:

Fixes GCP-447

Special notes for your reviewer:

The race condition has only been observed in GCP CI so far, but the underlying issue exists for all providers using InjectTokenMinterContainer and could surface for other cloud providers in the future.

  • The second commit removes the GCP CCM crash toleration from podCrashTolerations in the e2e test suite, as it is no longer needed.

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for native Kubernetes sidecar containers on management clusters running Kubernetes 1.29+, enabling more efficient container initialization.
    • Enhanced token minter container injection to intelligently adapt between native sidecars, init containers, and regular sidecars based on platform capabilities.
  • Changes

    • Updated GCP cloud controller manager deployment labels for consistency.
  • Tests

    • Added comprehensive test coverage for native sidecar container injection across multiple scenarios and Kubernetes versions.

@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 13, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Mar 13, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Contributor

coderabbitai Bot commented Mar 13, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Review skipped — only excluded labels are configured. (1)
  • do-not-merge/work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: e8564532-a4e8-4649-adcd-4390a41d438d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

This change introduces support for Kubernetes native sidecar containers (available in Kubernetes 1.29+) across the control plane operator. It detects native sidecar capability through management cluster version discovery, propagates this flag through the reconciler and control plane context, and uses it in token minter container injection logic to conditionally inject containers as native sidecars with startup probes or as regular sidecar containers. The feature also includes test coverage for various injection scenarios and removes a toleration entry for the GCP cloud controller manager.

Sequence Diagram(s)

sequenceDiagram
    participant main as main.go
    participant cap as Capability Detection
    participant reconciler as HostedControlPlaneReconciler
    participant context as ControlPlaneContext
    participant injector as TokenMinterContainerInjector
    participant pod as PodSpec

    main->>cap: DetectManagementClusterCapabilities(discoveryClient)
    cap->>cap: supportsNativeSidecarContainers(k8s >= 1.29)
    cap-->>main: CapabilityNativeSidecarContainers flag
    
    main->>reconciler: Initialize with NativeSidecarContainersEnabled flag
    
    reconciler->>context: Create ControlPlaneContext with NativeSidecarContainersEnabled
    
    context->>injector: injectContainer(nativeSidecarsEnabled, podSpec, container, ...)
    
    alt nativeSidecarsEnabled
        injector->>pod: Add container to InitContainers with RestartPolicy: Always
        injector->>pod: Add StartupProbe (check token file existence)
    else not nativeSidecarsEnabled
        injector->>pod: Add container to Containers as regular sidecar
    end
    
    injector->>pod: Mount volume on first container in PodSpec
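
The two branches in the diagram can be sketched in Go as follows. The Probe, Container and PodSpec types are local stand-ins for the corresponding k8s.io/api/core/v1 types, so the sketch compiles with the standard library alone. The helper name injectTokenMinter, the probe command, the probe thresholds and the token path are all assumptions for illustration, not the values used in the PR.

```go
package main

import "fmt"

// Local stand-ins for the corev1 types involved (assumption: simplified
// fields; the real definitions live in k8s.io/api/core/v1).
type Probe struct {
	ExecCommand      []string
	PeriodSeconds    int
	FailureThreshold int
}

type Container struct {
	Name          string
	RestartPolicy string // "Always" on an init container marks it as a native sidecar
	StartupProbe  *Probe
}

type PodSpec struct {
	InitContainers []Container
	Containers     []Container
}

// injectTokenMinter is a hypothetical helper mirroring the two branches of
// the diagram: native sidecar when supported, regular sidecar otherwise.
func injectTokenMinter(pod *PodSpec, minter Container, tokenFile string, nativeSidecars bool) {
	if !nativeSidecars {
		// Older clusters: the minter runs next to the main containers,
		// with no ordering guarantee between them.
		pod.Containers = append(pod.Containers, minter)
		return
	}
	// Native sidecar: an init container that keeps running. Main containers
	// start only after its startup probe succeeds.
	minter.RestartPolicy = "Always"
	minter.StartupProbe = &Probe{
		ExecCommand:      []string{"test", "-f", tokenFile}, // assumed probe command
		PeriodSeconds:    1,                                 // assumed value
		FailureThreshold: 60,                                // assumed value
	}
	pod.InitContainers = append(pod.InitContainers, minter)
}

func main() {
	pod := &PodSpec{Containers: []Container{{Name: "cloud-controller-manager"}}}
	// The token path below is a placeholder, not the path used by the PR.
	injectTokenMinter(pod, Container{Name: "token-minter"}, "/var/run/secrets/example/token", true)
	fmt.Println("init containers:", len(pod.InitContainers), "containers:", len(pod.Containers))
}
```

The startup probe is what closes the race: without it, a native sidecar counts as started as soon as its process is running, which may be before the token file has been written.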
Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci-robot

@cristianoveiga: An error was encountered searching for bug GCP-447 on the Jira server at https://issues.redhat.com. No known errors were detected, please see the full error message for details.

Full error message. No response returned: Get "https://issues.redhat.com/rest/api/2/issue/GCP-447": GET https://issues.redhat.com/rest/api/2/issue/GCP-447 giving up after 5 attempt(s)

Please contact an administrator to resolve this issue, then request a bug refresh with /jira refresh.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release and removed do-not-merge/needs-area labels Mar 13, 2026
@cristianoveiga
Contributor Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented Mar 13, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@openshift-ci-robot

@cristianoveiga: An error was encountered searching for bug GCP-447 on the Jira server at https://issues.redhat.com. No known errors were detected, please see the full error message for details.

Full error message. No response returned: Get "https://issues.redhat.com/rest/api/2/issue/GCP-447": GET https://issues.redhat.com/rest/api/2/issue/GCP-447 giving up after 5 attempt(s)

Please contact an administrator to resolve this issue, then request a bug refresh with /jira refresh.

Details

In response to this:

Summary by CodeRabbit

  • New Features

  • Added automatic detection of native sidecar container support based on management cluster version (Kubernetes 1.29.0 and later).

  • Enabled conditional injection of sidecar containers as native init containers when supported, with appropriate startup probes and volume mounting.

  • Tests

  • Added comprehensive test coverage for native sidecar detection and injection logic across multiple Kubernetes versions and deployment scenarios.

@openshift-ci openshift-ci Bot added the area/testing Indicates the PR includes changes for e2e testing label Mar 13, 2026
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 15, 2026
@cristianoveiga cristianoveiga force-pushed the GCP-447 branch 2 times, most recently from efc1c40 to b9226a3 on March 15, 2026 14:53
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 15, 2026
@cristianoveiga
Contributor Author

/test ?

@cristianoveiga
Contributor Author

/test e2e-aks
/test e2e-aws

@cwbotbot

cwbotbot commented Mar 15, 2026

Test Results

e2e-aks

e2e-aws

@cristianoveiga cristianoveiga marked this pull request as ready for review March 16, 2026 20:37
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 16, 2026
@openshift-ci openshift-ci Bot requested review from bryan-cox and csrwng March 16, 2026 20:42
@cristianoveiga
Contributor Author

/retest

@bryan-cox
Member

/uncc @bryan-cox

@openshift-ci openshift-ci Bot removed the request for review from bryan-cox March 17, 2026 11:30
@cristianoveiga
Contributor Author

/test ?

@cristianoveiga
Contributor Author

/test e2e-gke

MetricsSet: r.MetricsSet,
EnableCIDebugOutput: r.EnableCIDebugOutput,
ImageMetadataProvider: r.ImageMetadataProvider,
NativeSidecarContainersEnabled: r.ManagementClusterCapabilities.Has(capabilities.CapabilityNativeSidecarContainers),
Member

Can we pass this to karpenter-operator/controllers/karpenter/karpenter_controller.go as well?

Contributor Author

karpenter_controller.go creates the karpenter component (v2/karpenter/component.go) which doesn't call InjectTokenMinterContainer, so setting NativeSidecarContainersEnabled in its ControlPlaneContext would have no effect.

The karpenter-operator component (v2/karpenteroperator/component.go) does use InjectTokenMinterContainer, but it's reconciled by the HCP controller which already sets the flag.

Happy to add it proactively if you prefer. I just want to confirm since it appears to be a no-op currently.

revisionHistoryLimit: 2
selector:
  matchLabels:
    app: gcp-cloud-controller-manager
Member

This will break on any already existing cluster.

Contributor Author

We don't have any production GCP clusters at the moment, and we can re-create any clusters in integration if needed.

Contributor Author

Please note that this reverts the label change in #7926.

This was only needed for the crash toleration exception, which is no longer necessary with the native sidecar fix.

We prefer reverting to keep GCP CCM consistent with how other CCMs are labeled (app: cloud-controller-manager).

@enxebre
Member

enxebre commented Mar 25, 2026

dropped some feedback. This looks great to me overall.

@openshift-ci-robot openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label Mar 25, 2026
@openshift-ci openshift-ci Bot added area/platform/aws PR/issue for AWS (AWSPlatform) platform area/platform/azure PR/issue for Azure (AzurePlatform) platform and removed lgtm Indicates that a PR is ready to be merged. labels Mar 25, 2026
@cristianoveiga cristianoveiga requested a review from enxebre March 25, 2026 19:34
Change InjectTokenMinterContainer in the cpov2 framework to inject the
token-minter as a native sidecar init container (RestartPolicy=Always)
with a StartupProbe that blocks main containers until the token file
exists. This eliminates the race condition where the main container
starts before the token is written, causing fatal crashes (observed in
GCP CCM CI).

For backwards compatibility, the management cluster K8s version is
detected at startup via DetectManagementClusterCapabilities. On K8s
>= 1.29 (where the SidecarContainers feature gate is beta and enabled
by default), the native sidecar pattern is used. On older clusters, the
current regular sidecar container injection is preserved.

OneShot token-minters are injected as regular init containers, which
run to completion before main containers start.

Ref: https://issues.redhat.com/browse/GCP-447

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
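
The commit message above names three injection outcomes. A minimal sketch of that decision follows; the type, constant and function names are hypothetical, and checking OneShot before native sidecar support is an assumption, since the commit message does not state how the two conditions are ordered.

```go
package main

import "fmt"

// injectionMode names where the token-minter container ends up.
// All identifiers here are hypothetical and used only for illustration.
type injectionMode string

const (
	modeInitContainer  injectionMode = "init-container"  // runs to completion first
	modeNativeSidecar  injectionMode = "native-sidecar"  // init container, RestartPolicy=Always
	modeRegularSidecar injectionMode = "regular-sidecar" // entry in the Containers list
)

// chooseInjectionMode picks an injection mode from the two inputs the commit
// message describes. Assumption: OneShot takes precedence over native sidecars.
func chooseInjectionMode(oneShot, nativeSidecars bool) injectionMode {
	switch {
	case oneShot:
		return modeInitContainer
	case nativeSidecars:
		return modeNativeSidecar
	default:
		return modeRegularSidecar
	}
}

func main() {
	fmt.Println(chooseInjectionMode(true, true))
	fmt.Println(chooseInjectionMode(false, true))
	fmt.Println(chooseInjectionMode(false, false))
}
```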
@cristianoveiga
Contributor Author

/test verify

@cristianoveiga
Contributor Author

/retest-required

@cristianoveiga
Contributor Author

/test e2e-gke

@enxebre
Member

enxebre commented Mar 26, 2026

/approve

@openshift-ci
Contributor

openshift-ci Bot commented Mar 26, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cristianoveiga, enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 26, 2026
@cristianoveiga
Contributor Author

cristianoveiga commented Mar 26, 2026

/verfified by @cristianoveiga

The e2e-gke run validated these changes; its failure is unrelated to this PR.

PR #7887 recently added the controlPlaneVersion status check to the e2e validation, which requires all ControlPlaneComponents to report RolloutComplete: True.

The cluster-network-operator never completes because cloud-network-config-controller is stuck: the secret cloud-network-config-controller-creds doesn't exist for GCP yet (tracked in PR #7824 / GCP-431).

This was previously invisible and is now surfaced by the new check.

cc: @apahim

@muraee
Contributor

muraee commented Mar 26, 2026

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Mar 26, 2026
@openshift-ci-robot

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-21
/test e2e-aws-4-21
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@cristianoveiga
Contributor Author

/verified by @cristianoveiga

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Mar 26, 2026
@openshift-ci-robot

@cristianoveiga: This PR has been marked as verified by @cristianoveiga.

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD f0b245f and 2 for PR HEAD 9dc48ce in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD a12b316 and 1 for PR HEAD 9dc48ce in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD ee66457 and 0 for PR HEAD 9dc48ce in total

@openshift-ci
Contributor

openshift-ci Bot commented Mar 27, 2026

@cristianoveiga: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-techpreview | 3c7f914 | link | false | /test e2e-aws-techpreview |
| ci/prow/verify | 77737f8 | link | true | /test verify |
| ci/prow/e2e-gke | 9dc48ce | link | false | /test e2e-gke |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit 37e3785 into openshift:main Mar 27, 2026
29 of 30 checks passed
Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/control-plane-operator: Indicates the PR includes changes for the control plane operator - in an OCP release
  • area/hypershift-operator: Indicates the PR includes changes for the hypershift operator and API - outside an OCP release
  • area/platform/aws: PR/issue for AWS (AWSPlatform) platform
  • area/platform/azure: PR/issue for Azure (AzurePlatform) platform
  • area/platform/gcp: PR/issue for GCP (GCPPlatform) platform
  • area/testing: Indicates the PR includes changes for e2e testing
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • verified: Signifies that the PR passed pre-merge verification criteria

9 participants