OCPBUGS-90545: Fix leaked s3 buckets by ipi-aws-post-disconnected #80814
openshift-e2e-aws-disconnected calls ipi-aws-pre-disconnected and ipi-aws-post-disconnected. ipi-aws-pre-disconnected provisions an S3 bucket for the bastion host, but ipi-aws-post-disconnected does not call aws-deprovision-s3buckets to clean it up, so the bucket leaks.
Subsequent runs of the job against the same PR reuse the same namespace. Because the bucket name includes the namespace, the second run in a PR also fails on the name collision.
This change fixes the leak by adding the missing cleanup step. It also tolerates a pre-existing bucket that we own, reusing it and cleaning it up after the run.
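A minimal sketch of the reuse-or-create check described above, written as a standalone Bash function. The function name `ensure_bucket` and its arguments are hypothetical; this is not the actual code in `aws-provision-bastionhost-commands.sh`. It assumes that a successful `aws s3api head-bucket` means the bucket exists and our credentials can access it, which is treated here as "owned by us". A bucket held by another account typically fails that check, and the following `create-bucket` then fails too, so the script stops instead of reusing someone else's bucket.

```shell
#!/bin/bash
# Hypothetical sketch: reuse an S3 bucket if it already exists and is
# accessible to us, otherwise create it. Not the actual contents of
# aws-provision-bastionhost-commands.sh.
set -euo pipefail

# ensure_bucket BUCKET REGION
# Returns 0 whether the bucket was reused or newly created.
ensure_bucket() {
  local bucket="$1" region="$2"

  # head-bucket succeeds only if the bucket exists and our credentials can
  # access it; a missing bucket fails (404), and a bucket owned by another
  # account typically fails too (403).
  if aws s3api head-bucket --bucket "${bucket}" >/dev/null 2>&1; then
    echo "Reusing existing bucket ${bucket}"
    return 0
  fi

  echo "Creating bucket ${bucket} in ${region}"
  if [[ "${region}" == "us-east-1" ]]; then
    # us-east-1 rejects an explicit LocationConstraint of us-east-1.
    aws s3api create-bucket --bucket "${bucket}" --region "${region}" >/dev/null
  else
    aws s3api create-bucket --bucket "${bucket}" --region "${region}" \
      --create-bucket-configuration "LocationConstraint=${region}" >/dev/null
  fi
}
```

Checking first matters because, outside us-east-1, `create-bucket` on a bucket you already own fails with `BucketAlreadyOwnedByYou` rather than succeeding.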
@mdbooth: This pull request references Jira Issue OCPBUGS-90545, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug.
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
No actionable comments were generated in the recent review. 🎉
Walkthrough: The bastion-host provisioning script gains a conditional S3 bucket check (
Changes: S3 Bucket Lifecycle
Estimated code review effort: 1 (Trivial), ~3 minutes
Pre-merge checks: 15 passed
@mdbooth: This pull request references Jira Issue OCPBUGS-90545, which is valid. 3 validation(s) were run on this bug.
[REHEARSALNOTIFIER]
A total of 1303 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs. A full list of affected jobs can be found here.
Interacting with pj-rehearse: Once you are satisfied with the results of the rehearsals, comment:
/pj-rehearse pull-ci-openshift-cluster-capi-operator-main-e2e-aws-capi-disconnected-techpreview
@mdbooth: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
/pj-rehearse
@mdbooth: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
Confirmed by checking the logs that S3 bucket creation and cleanup worked correctly in all the above jobs. The DR tests failed for unrelated reasons I'm not going into. The capi disconnected job still seems to have a problem, but that problem predates the S3 issue. I'm continuing to work on it, but I think we should merge this now anyway.
/pj-rehearse ack
@mdbooth: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
/pj-rehearse pull-ci-openshift-cluster-capi-operator-main-e2e-aws-capi-disconnected-techpreview
@mdbooth: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
/approve
/assign patrickdillon
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED.
This pull request has been approved by: mdbooth, miyadav, patrickdillon
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@mdbooth: The following tests failed, say
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Merged commit 9a8ee7f into openshift:main
@mdbooth: Jira Issue OCPBUGS-90545: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-90545 has been moved to the MODIFIED state.
Summary by CodeRabbit
This pull request fixes a resource leak in the AWS disconnected CI workflow used for OpenShift cluster testing. The issue occurs when the `openshift-e2e-aws-disconnected` job runs multiple times in the same namespace: S3 buckets created during the pre-disconnected setup phase were never cleaned up, so subsequent test runs hit name collisions.
Changes made:
- S3 bucket reuse and tolerance (`aws-provision-bastionhost-commands.sh`): checks for an existing bucket with `aws s3api head-bucket` before attempting to create a new one.
- Added S3 bucket deprovisioning step (`ipi-aws-post-disconnected-chain.yaml`): adds the `aws-deprovision-s3buckets` step to the post-disconnected cleanup chain.
The fix addresses both the symptom (a rerun reuses the existing bucket instead of failing) and the root cause (the bucket is now cleaned up after each run), so orphaned S3 buckets no longer accumulate in the AWS account across test runs.
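A minimal sketch of the cleanup half, for illustration only. It assumes the provisioning side records each bucket name, one per line, in a list file; the file, the function name `deprovision_buckets`, and the skip-if-missing behavior are hypothetical and not taken from the actual `aws-deprovision-s3buckets` step. `aws s3 rb --force` deletes the objects in a bucket and then the bucket itself; versioned objects are not deleted, so this sketch assumes the buckets are not versioned.

```shell
#!/bin/bash
# Hypothetical sketch of a post-run cleanup: delete every S3 bucket listed
# (one name per line) in a file written during provisioning. The file and
# function names are illustrative; this is not the aws-deprovision-s3buckets
# step itself.
set -euo pipefail

# deprovision_buckets LIST_FILE
deprovision_buckets() {
  local list_file="$1" bucket

  # Nothing was recorded, so there is nothing to clean up.
  [[ -f "${list_file}" ]] || return 0

  # The "|| [[ -n ... ]]" also handles a last line without a trailing newline.
  while IFS= read -r bucket || [[ -n "${bucket}" ]]; do
    [[ -n "${bucket}" ]] || continue   # skip blank lines
    # Skip buckets that are already gone so running cleanup twice is harmless.
    if ! aws s3api head-bucket --bucket "${bucket}" >/dev/null 2>&1; then
      echo "Bucket ${bucket} not found, skipping"
      continue
    fi
    echo "Deleting bucket ${bucket}"
    # --force empties the bucket (non-versioned objects) before removing it.
    aws s3 rb "s3://${bucket}" --force
  done < "${list_file}"
}
```

Skipping missing buckets keeps the cleanup idempotent, which is useful because reruns against the same PR share a namespace.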