fix(hypershift/gcp): handle missing hosted-cluster-name in deprovision#76930

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from cristianoveiga:fix/deprovision-missing-hosted-cluster-name
Mar 27, 2026
@cristianoveiga

@cristianoveiga cristianoveiga commented Mar 26, 2026

Summary

  • Follow-up to GCP-484. When a job is aborted after gke-provision but before hosted-cluster-setup, the deprovision script fails while reading the missing hosted-cluster-name file. Because the script runs under set -euo pipefail, that failed read aborts it before any cleanup runs, so the GKE cluster, HC project, CP project, DNS records, and WIF bindings are all orphaned.
  • Adds a file-existence guard for hosted-cluster-name so deprovision can still clean up GCP projects and GKE clusters when the job is aborted before hosted-cluster-setup runs. DNS cleanup is skipped in this case, since no hosted clusters were created and no DNS records exist.
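
The guard described above could look roughly like the sketch below. This is not the code from the PR: the `SHARED_DIR` variable, its default path, and the function names `read_hosted_cluster_name`, `cleanup_dns`, `cleanup_projects`, and `deprovision` are assumptions made for illustration, and the cleanup functions only print what a real script would do.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: step outputs live in a shared directory. The variable name,
# default path, and all function names here are illustrative only.
SHARED_DIR="${SHARED_DIR:-/tmp/deprovision-sketch}"

# Print the hosted cluster name if the file exists; print nothing otherwise.
# The -f test keeps a missing file from aborting the script under `set -e`.
read_hosted_cluster_name() {
  local file="${SHARED_DIR}/hosted-cluster-name"
  if [[ -f "${file}" ]]; then
    cat "${file}"
  fi
}

# Placeholder for DNS record deletion (hypothetical).
cleanup_dns() {
  echo "would delete DNS records for hosted cluster: $1"
}

# Placeholder for GKE cluster and GCP project deletion (hypothetical).
cleanup_projects() {
  echo "would delete GKE cluster and GCP projects"
}

deprovision() {
  local hc_name
  hc_name="$(read_hosted_cluster_name)"
  if [[ -n "${hc_name}" ]]; then
    cleanup_dns "${hc_name}"
  else
    echo "hosted-cluster-name not found; skipping DNS cleanup"
  fi
  # Project and cluster cleanup runs whether or not the file was present.
  cleanup_projects
}

deprovision
```

The point of the design is that only the DNS step depends on the file, so the rest of the cleanup no longer shares its failure mode.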

Test plan

  • Verify deprovision completes successfully when hosted-cluster-name is missing (abort scenario)
  • Verify deprovision still cleans up DNS records when hosted-cluster-name exists (normal scenario)
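
Both scenarios in the test plan can be approximated locally with a small harness like the one below. It is a sketch under stated assumptions: `run_step` and the two snippets are hypothetical stand-ins for the real deprovision step, and each snippet runs in a fresh bash process so that `set -euo pipefail` behaves as it would in a standalone script.

```shell
#!/usr/bin/env bash
# Hypothetical harness: run_step and the snippets below are illustrative
# stand-ins, not the real deprovision script.

# Run a snippet in a fresh bash process with strict mode and report the result.
# $1 = snippet to run, $2 = directory that may contain hosted-cluster-name
run_step() {
  if bash -c "set -euo pipefail; $1" _ "$2" >/dev/null 2>&1; then
    echo "completed"
  else
    echo "aborted"
  fi
}

# Unguarded read: the failed `cat` makes the assignment fail, and errexit stops the step.
unguarded='name="$(cat "$1/hosted-cluster-name")"; echo "dns cleanup for ${name}"'

# Guarded read: the file is only read when it exists, so the step carries on.
guarded='name=""; if [[ -f "$1/hosted-cluster-name" ]]; then name="$(cat "$1/hosted-cluster-name")"; fi; echo "cleanup continues"'

dir="$(mktemp -d)"
echo "abort scenario, unguarded: $(run_step "${unguarded}" "${dir}")"
echo "abort scenario, guarded:   $(run_step "${guarded}" "${dir}")"

# Simulate hosted-cluster-setup having run.
echo "example-hc" > "${dir}/hosted-cluster-name"
echo "normal scenario, guarded:  $(run_step "${guarded}" "${dir}")"
```

A fresh process is used instead of a `( ... )` subshell because bash ignores errexit inside a compound command that is evaluated as an `if` condition, which would hide the failure being tested.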

🤖 Generated with Claude Code

Follow-up to GCP-484. When a job is aborted after gke-provision but
before hosted-cluster-setup, the deprovision script crashes reading
the missing hosted-cluster-name file, leaving GCP projects orphaned.

Make hosted-cluster-name optional and skip DNS cleanup when it is
absent — no hosted clusters means no DNS records to clean up.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 26, 2026
@openshift-ci

openshift-ci Bot commented Mar 26, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 26, 2026
@openshift-ci-robot

[REHEARSALNOTIFIER]
@cristianoveiga: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| pull-ci-openshift-hypershift-main-e2e-gke | openshift/hypershift | presubmit | Registry content changed |
| pull-ci-openshift-hypershift-release-5.0-e2e-gke | openshift/hypershift | presubmit | Registry content changed |
| pull-ci-openshift-hypershift-release-4.23-e2e-gke | openshift/hypershift | presubmit | Registry content changed |
| pull-ci-openshift-hypershift-release-4.22-e2e-gke | openshift/hypershift | presubmit | Registry content changed |

Prior to this PR being merged, you will need to either run and acknowledge or opt to skip these rehearsals.

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@cristianoveiga cristianoveiga marked this pull request as ready for review March 26, 2026 19:27
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 26, 2026
@openshift-ci openshift-ci Bot requested review from cblecker and jimdaga March 26, 2026 19:27
@cristianoveiga

/pj-rehearse pull-ci-openshift-hypershift-main-e2e-gke

@openshift-ci-robot

@cristianoveiga: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@cristianoveiga

cristianoveiga commented Mar 27, 2026

Note: The e2e-gke rehearsal is expected to fail — this is a known, pre-existing issue unrelated to this PR.

Root cause: controlPlaneVersion stays Partial because cluster-network-operator can't complete its rollout. The cloud-network-config-controller pod is stuck in PodInitializing due to a missing secret (cloud-network-config-controller-creds). This secret is created for AWS/Azure/OpenStack but not yet for GCP.

Why it fails now: hypershift#7887 added a controlPlaneVersion status check that requires all components to reach RolloutComplete: True. Before that PR, the missing secret was invisible to the test.

Fix: hypershift#7824 (GCP-431: Add CNCC support for GCP WIF) adds the missing secret for GCP but is not yet merged. The e2e-gke job will fail until that lands.

@patjlm

patjlm commented Mar 27, 2026

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Mar 27, 2026
@openshift-ci

openshift-ci Bot commented Mar 27, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cristianoveiga, patjlm

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cristianoveiga

/pj-rehearse ack

@openshift-ci-robot

@cristianoveiga: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot openshift-ci-robot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Mar 27, 2026
@openshift-ci

openshift-ci Bot commented Mar 27, 2026

@cristianoveiga: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/rehearse/openshift/hypershift/main/e2e-gke | 0debb3c | link | unknown | /pj-rehearse pull-ci-openshift-hypershift-main-e2e-gke |


@openshift-merge-bot openshift-merge-bot Bot merged commit 3979293 into openshift:main Mar 27, 2026
9 of 10 checks passed
memodi pushed a commit to memodi/release that referenced this pull request Mar 27, 2026
openshift#76930)

Follow-up to GCP-484. When a job is aborted after gke-provision but
before hosted-cluster-setup, the deprovision script crashes reading
the missing hosted-cluster-name file, leaving GCP projects orphaned.

Make hosted-cluster-name optional and skip DNS cleanup when it is
absent — no hosted clusters means no DNS records to clean up.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
mgencur pushed a commit to mgencur/release that referenced this pull request Mar 30, 2026
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. rehearsals-ack Signifies that rehearsal jobs have been acknowledged


3 participants