[wip]OCPKUEUE-668: Add operator-upgrade-test Claude Code skill by anahas-redhat · Pull Request #2043 · openshift/kueue-operator

anahas-redhat · 2026-06-19T19:06:52Z

Summary

- Adds a Claude Code skill (.claude/skills/operator-upgrade-test/SKILL.md) that automates end-to-end Kueue operator upgrade testing (Part 2 of OCPKUEUE-668)
- Covers both uninstall scenarios: operator-only (Scenario A) and full CR deletion (Scenario B)
- Includes config migration checks, API deprecation detection, and automated results document generation
- Implemented as a natural-language runbook that adapts to version-specific differences (schema changes, new CRD fields, API deprecations) without requiring scripted logic for each case

What the skill does

- Gathers all parameters and validates credentials interactively (Phase 0)
- Optionally provisions a GCP cluster (Phase 1)
- Builds and applies an FBC catalog with source and target bundles (Phase 2)
- Installs the source version and creates 11 test workloads across 6 workload types (Phase 3)
- Runs Scenario A: operator-only uninstall → upgrade → verify (Phase 4)
- Runs Scenario B: full uninstall → upgrade → verify (Phases 5-6)
- Checks config migration compatibility (Phase 7)
- Generates a results document with PASS/FAIL tables (Phase 8)
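The "record a baseline, then verify after the upgrade" part of Phases 3-6 could look roughly like the bash sketch below. It is an illustration, not the skill's actual logic: `snapshot_admitted` and `compare_baseline` are hypothetical helpers, and the jsonpath assumes Kueue `Workload` objects (`workloads.kueue.x-k8s.io`) report an `Admitted` condition.

```bash
#!/usr/bin/env bash
# Sketch: record which Kueue workloads are admitted before the upgrade and
# report any that are no longer admitted afterwards. Helper names are hypothetical.

# Print sorted "namespace/name" for every Workload whose Admitted condition is True.
# Assumes `oc` is logged in and the Kueue Workload CRD is installed.
snapshot_admitted() {
  oc get workloads.kueue.x-k8s.io -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{" "}{.status.conditions[?(@.type=="Admitted")].status}{"\n"}{end}' \
    | awk '$2 == "True" { print $1 }' | LC_ALL=C sort
}

# compare_baseline BEFORE AFTER: both files hold sorted snapshot output.
# Prints workloads present in BEFORE but missing from AFTER; returns 1 if any.
compare_baseline() {
  local before="$1" after="$2" missing
  missing=$(LC_ALL=C comm -23 "$before" "$after")
  if [ -n "$missing" ]; then
    # Word-split intentionally: one output line per missing workload.
    printf 'MISSING: %s\n' $missing
    return 1
  fi
  echo "all baseline workloads still admitted"
}
```

Typical use would be `snapshot_admitted > baseline.txt` at the end of Phase 3, `snapshot_admitted > after-upgrade.txt` after the target operator is reinstalled, then `compare_baseline baseline.txt after-upgrade.txt`. Matching is on namespace/name only, so this sketch says nothing about quota or flavor assignment.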

Test plan

- Ran full upgrade test 1.3.1 → 1.4.0 on OCP 4.22.1
- Ran full upgrade test 1.3.0 → 1.4.0 on OCP 4.22.2
- Review skill instructions for clarity and completeness

Jira: https://redhat.atlassian.net/browse/OCPKUEUE-668

🤖 Generated with Claude Code

Summary by CodeRabbit

Tests
- Added an automated, end-to-end operator upgrade test runbook with multi-phase provisioning, uninstall/upgrade scenarios, and extensive post-upgrade verification to confirm resource persistence and continued admission.
Documentation
- Added detailed step-by-step runbook guidance covering required inputs, dynamic schema handling, unattended execution flow, optional environment cleanup, and results reporting for PASS/FAIL outcomes.

openshift-ci-robot · 2026-06-19T19:06:55Z

@anahas-redhat: This pull request references OCPKUEUE-668 which is a valid jira issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai · 2026-06-19T19:07:04Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: a3c8dabf-bceb-409f-b195-0d3178571ce7

📥 Commits

Reviewing files that changed from the base of the PR and between 1b0d234 and 708d292.

📒 Files selected for processing (1)

.claude/skills/operator-upgrade-test/SKILL.md

Walkthrough

Adds .claude/skills/operator-upgrade-test/SKILL.md, a new AI-agent skill defining a ten-phase (Phase 0–9) runbook for end-to-end Kueue operator upgrade testing on OpenShift. It covers two upgrade scenarios (operator-only vs. full uninstall with CR deletion), dynamic schema introspection, GCP cluster provisioning, FBC catalog build, workload survival verification, schema migration checks, and results document generation.

Changes

Operator Upgrade Test Skill

Layer / File(s)	Summary
Phase 0: interactive input collection and schema introspection `.claude/skills/operator-upgrade-test/SKILL.md`	Defines skill metadata, two uninstall scenario types, unattended execution model, and the full interactive Phase 0 flow: collects cluster/OCP version, GCP parameters, SSH/pull secret, source/target Kueue versions, bundle images, FBC catalog destination, credential checks, and performs dynamic Kueue CR schema introspection via `oc explain`.
Phase 1–2: cluster provisioning and FBC catalog build `.claude/skills/operator-upgrade-test/SKILL.md`	Phase 1 downloads and verifies `openshift-install`, generates `install-config.yaml`, creates the GCP-backed OCP cluster, and exports KUBECONFIG. Phase 2 clones kueue-fbc, updates catalog templates with source/target bundle images, generates/builds/pushes the catalog image, applies CatalogSource and optional ImageDigestMirrorSet, and waits for catalog readiness.
Phase 3–6: operator installation, baseline, and upgrade scenarios `.claude/skills/operator-upgrade-test/SKILL.md`	Phase 3 installs the source operator, provisions cert-manager/JobSet/LWS prerequisites, and dynamically builds/applies the Kueue CR with retry; creates test namespaces, ClusterQueues, LocalQueues, ResourceFlavors, and workloads and records the admission baseline. Phase 4 (Scenario A) performs operator-only uninstall then target reinstall and verifies workload and queue survival. Phase 5 cleans up and replays Phase 3. Phase 6 (Scenario B) deletes the Kueue CR before uninstalling, reinstalls using target-version schema, and mirrors Scenario A verification.
Phase 7–8: schema migration validation and results document `.claude/skills/operator-upgrade-test/SKILL.md`	Phase 7 compares source vs. target API schemas via dry-run server-side apply and performs API deprecation detection. Phase 8 generates a structured results markdown file with per-scenario PASS/FAIL tables and configuration migration findings under the work directory.
Phase 9 cleanup, error handling, and prerequisites `.claude/skills/operator-upgrade-test/SKILL.md`	Defines optional interactive Phase 9 cluster teardown via `openshift-install destroy cluster`, global error-handling rules (log to results doc, mark FAIL, continue when possible), and the full list of tool/dependency prerequisites validated during Phase 0.
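The error-handling rule in the last row (log to the results doc, mark FAIL, continue when possible) and the Phase 8 PASS/FAIL tables can be sketched together in bash. This is an assumption-labelled illustration: `RESULTS_FILE`, `LOG_FILE`, `init_results`, and `run_check` are hypothetical names, not taken from SKILL.md.

```bash
#!/usr/bin/env bash
# Sketch: record each verification step as a PASS/FAIL row and keep going.
# All names here are hypothetical, not taken from SKILL.md.
RESULTS_FILE="${RESULTS_FILE:-./upgrade-test-results.md}"
LOG_FILE="${LOG_FILE:-./upgrade-test.log}"

# Start a fresh markdown table.
init_results() {
  printf '| Check | Result |\n|---|---|\n' > "$RESULTS_FILE"
}

# run_check NAME CMD [ARGS...]
# Runs CMD with its output appended to LOG_FILE (never to the results doc),
# appends a PASS or FAIL row, and always returns 0 so later checks still run.
run_check() {
  local name="$1"; shift
  echo "== $name" >> "$LOG_FILE"
  if "$@" >>"$LOG_FILE" 2>&1; then
    printf '| %s | PASS |\n' "$name" >> "$RESULTS_FILE"
  else
    printf '| %s | FAIL |\n' "$name" >> "$RESULTS_FILE"
  fi
  return 0
}
```

For the Phase 7 schema check this might be used as `run_check "Kueue CR accepted by target schema" oc apply --server-side --dry-run=server -f kueue-cr.yaml`, where `kueue-cr.yaml` is a hypothetical file name. Keeping command output in a separate log keeps the results table short and limits where credential material could end up.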

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 15

✅ Passed checks (15 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly references the main change: adding a Claude Code skill for operator-upgrade-test automation (OCPKUEUE-668), which matches the file addition and PR objectives.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	All 156 Ginkgo test names in the PR are static and deterministic with no dynamic content like pod names, timestamps, UUIDs, or generated identifiers.
Test Structure And Quality	✅ Passed	The PR adds a markdown skill document, not Ginkgo test code. The custom check explicitly requires reviewing "Ginkgo test code" with patterns like BeforeEach/AfterEach/It blocks, which do not apply...
Microshift Test Compatibility	✅ Passed	This PR adds a Claude Code skill file (.claude/skills/operator-upgrade-test/SKILL.md), which is an 848-line Markdown runbook for Kueue operator upgrade testing. No Ginkgo e2e tests (It(), Describe()...
Single Node Openshift (Sno) Test Compatibility	✅ Passed	This PR adds only a markdown skill file (.claude/skills/operator-upgrade-test/SKILL.md), not Ginkgo e2e tests. The check for SNO compatibility applies only to new Go test files with Ginkgo patterns...
Topology-Aware Scheduling Compatibility	✅ Passed	The PR adds a Claude Code skill file (.claude/skills/operator-upgrade-test/SKILL.md) which is a natural language runbook for testing Kueue operator upgrades, not operator code or deployment manifes...
Ote Binary Stdout Contract	✅ Passed	PR adds only a Claude Code skill (SKILL.md) markdown runbook with embedded bash examples, not OTE binary source code or test binaries. Check only applies to compiled OTE binaries communicating with...
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	PR adds a Claude Code skill (SKILL.md markdown file), not Ginkgo e2e tests. Check is for Ginkgo e2e tests only; therefore not applicable to this PR.
No-Weak-Crypto	✅ Passed	No weak cryptography detected. File uses SHA256 checksums for integrity verification, HTTPS for all downloads, and explicitly filters credentials from logs.
Container-Privileges	✅ Passed	The PR adds a single markdown skill file (.claude/skills/operator-upgrade-test/SKILL.md) containing example Kubernetes manifests. None of the manifests contain privileged: true, hostPID, hostNetwor...
No-Sensitive-Data-In-Logs	✅ Passed	The SKILL.md file demonstrates strong awareness of credential protection. Line 350 explicitly instructs to "Filter out lines containing `pullSecret`, `sshKey`, or `password` to avoid exposing crede...


openshift-ci · 2026-06-19T19:07:13Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: anahas-redhat
Once this PR has been reviewed and has the lgtm label, please assign sohankunkerkar for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.


Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

anahas-redhat · 2026-06-19T19:10:27Z

Hey @sohankunkerkar @kannon92 — PR for a Claude Code skill that automates operator upgrade testing. Would love your feedback on the approach.

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (2)

.claude/skills/operator-upgrade-test/SKILL.md (2)
206-210: 💤 Low value

Add markdown language specifier to fenced code block (MD040).

Line 207 starts a fenced code block without a language specifier. While this renders correctly, it prevents syntax highlighting and linter validation.
📝 Proposed fix: add `bash` to the bare opening fence of the block that begins with `=== Kueue Upgrade Test Configuration ===`.
This enables syntax highlighting and bash-specific linting.
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.claude/skills/operator-upgrade-test/SKILL.md around lines 206 - 210, The
fenced code block starting with the "=== Kueue Upgrade Test Configuration ==="
content is missing a language specifier after the opening triple backticks. Add
"bash" as the language specifier immediately after the opening fence to enable syntax highlighting and bash-specific linting validation, i.e. change the bare opening fence to ```bash.
</details>



_Source: Linters/SAST tools_

---

`1-50`: _⚡ Quick win_

**Verify pull secret and SSH key handling is secure in all phases.**

The skill collects pull secrets and SSH keys from files and embeds them in install-config.yaml and API objects. Ensure that:

1. install-config.yaml file is not committed or persisted longer than the cluster creation
2. No credentials appear in logs or error messages from Phases 1-8
3. Pull secret and SSH key values are not printed in results document (Phase 8)

Add a note in Phase 9 (cluster cleanup) to remove the install-config.yaml file and any temporary credential files.
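A minimal bash sketch of points 2 and 3, assuming credential values in install-config.yaml sit on single lines (a YAML block scalar such as `sshKey: |` would slip past the line filter). Both function names are hypothetical, and the secret patterns are heuristics rather than an exhaustive scanner.

```bash
#!/usr/bin/env bash
# Sketch: keep credentials out of logs and the results document.
# Function names are hypothetical; patterns are heuristics, not a full scanner.

# Print install-config.yaml for debugging with credential lines removed.
# Drops any line mentioning pullSecret, sshKey, or password (case-insensitive).
show_install_config_redacted() {
  grep -Eiv 'pullsecret|sshkey|password' "$1"
}

# Return 1 (and warn) if FILE looks like it contains credential material,
# e.g. a pull secret's "auths" JSON, a private key, an SSH public key,
# or a GCP service account key's "private_key" field.
assert_no_secrets() {
  if grep -Eq '"auths"|PRIVATE KEY|ssh-(rsa|ed25519)|pullSecret|"private_key"' "$1"; then
    echo "possible credential material in $1" >&2
    return 1
  fi
}
```

Running `assert_no_secrets` on the generated results document before Phase 8 finishes would turn point 3 into a hard check instead of a convention.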





<details>
<summary>🔒 Suggested addition to Phase 9 (cluster cleanup)</summary>

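```bash
# Clean up sensitive files. WORK_DIR and CLUSTER_NAME stand in for the
# <work-dir> and <cluster-name> values collected in Phase 0.
# openshift-install normally consumes install-config.yaml during cluster
# creation, so these rm -f calls are harmless no-ops if the files are gone.
rm -f "${WORK_DIR:?}/${CLUSTER_NAME:?}/install-config.yaml"
rm -f "${WORK_DIR:?}/${CLUSTER_NAME:?}/install-config.yaml.bak"
rm -f /tmp/openshift-install-*.tar.gz*
```

Also ensure the results document (Phase 8) does not include:
- Pull secret contents
- SSH key contents
- GCP service account key paths
- Any other credential material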

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.claude/skills/operator-upgrade-test/SKILL.md around lines 1 - 50, The skill
must ensure that sensitive credentials (pull secrets, SSH keys, and GCP service
account keys) are not exposed in logs, error messages, or the results document.
Add explicit cleanup instructions to Phase 9 to remove the install-config.yaml
file and any temporary credential files created during cluster creation. In
Phase 8 (results document generation), explicitly filter out and document which
credential values must not be included. Additionally, review all log statements
and error handling in Phases 1-8 to ensure that when install-config.yaml or
credential variables are referenced, their actual contents are never printed or
logged—only their file paths or variable names should appear in output.
```

</details>



_Source: Coding guidelines_

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.claude/skills/operator-upgrade-test/SKILL.md:

Around line 477-498: Replace the three hardcoded /releases/latest/download/
URLs for cert-manager (line 479), JobSet (line 488), and LeaderWorkerSet (later
in the same range) with pinned version numbers instead of "latest". Define version variables
at the top of the document for cert-manager, jobset, and lws, then use those
variables in the oc apply commands. Additionally, add checksum verification
after downloading each manifest by comparing against published checksums, or
consider including the pinned manifests directly in the skill repository to
eliminate the need for remote downloads entirely. Document the version pinning
strategy at the beginning of Phase 0 or Phase 3 to explain how these specific
versions are selected and maintained.

Around line 1-100: The bundle image discovery in Phase 0e clones the kueue-fbc
repository and queries the quay.io API without pinning git revisions or
validating responses, creating a supply chain risk. Pin the git clone operation
to a specific commit hash or version tag instead of using --depth 1 without a
ref, and add response validation for the quay.io API queries by comparing
against a known-good manifest or checksum before using the discovered bundle
image references. This ensures that even if the remote repository or API is
compromised, the test will detect the tampering and fail safely rather than
deploying a malicious bundle.
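The "pin it" half of this suggestion can be sketched in bash as below. The function names are hypothetical, and the sketch does not authenticate the quay.io API response itself. It only rejects floating git refs and bundle image references that are not pinned by digest.

```bash
#!/usr/bin/env bash
# Sketch: refuse floating refs for the kueue-fbc clone and for bundle images.
# Function names are hypothetical.

# True if $1 is a full 40-character lowercase hex git commit SHA.
is_full_sha() {
  [[ "$1" =~ ^[0-9a-f]{40}$ ]]
}

# True if an image reference ends in @sha256:<64 hex chars> (immutable digest).
is_digest_pinned() {
  [[ "$1" =~ @sha256:[0-9a-f]{64}$ ]]
}

# clone_pinned URL DEST COMMIT: clone, check out COMMIT detached, and confirm
# HEAD really is that commit before anything in the repo is used.
clone_pinned() {
  local url="$1" dest="$2" commit="$3"
  if ! is_full_sha "$commit"; then
    echo "refusing unpinned ref: $commit" >&2
    return 1
  fi
  git clone --quiet "$url" "$dest" || return 1
  git -C "$dest" checkout --quiet --detach "$commit" || return 1
  [ "$(git -C "$dest" rev-parse HEAD)" = "$commit" ]
}
```

A full clone gives up the speed of `--depth 1` for simplicity. If the server allows fetching a single commit by SHA, `git fetch --depth 1 origin <sha>` followed by `git checkout FETCH_HEAD` in an empty repository is a lighter alternative.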

Around line 261-270: The openshift-install binary download section lacks
checksum verification, creating a security vulnerability where a compromised or
intercepted binary could be executed without validation. Add a step to download
the sha256sum.txt file from the same mirror location as the
openshift-install-mac-arm64-.tar.gz file, then verify the
downloaded tar.gz file against the published checksum using sha256sum before
extracting and executing it. The verification should be performed immediately
after the curl download and before the tar extraction step, ensuring the binary
integrity is confirmed before any execution occurs.

Around line 140-155: The IAM role assignment in the service account
auto-creation block grants 9 roles including iam.securityAdmin and
iam.roleAdmin, which exceed the least-privilege requirement for OpenShift
installer. Remove the unnecessary IAM roles from the for loop and retain only
compute.admin, dns.admin, compute.loadBalancerAdmin, and storage.admin.
Additionally, add a flag variable (such as CREATED_SA) to track whether the
service account was auto-created, then reference this flag in Phase 9 to add
cleanup logic that deletes the service account and its keys using gcloud iam
service-accounts delete when cluster deletion occurs.
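A bash sketch of this suggestion, with a `DRY_RUN` switch so the flow can be inspected without touching a project. `ROLES` is the reviewer's proposed list, not a verified minimum: the OpenShift documentation for installing on GCP lists a broader set of roles for some credentials modes, so confirm against it before trimming. All function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: grant a reduced role set and remember whether this run created the
# service account, so Phase 9 only deletes accounts it owns.
# ROLES is the reviewer's proposal; verify it against the OpenShift GCP docs.
ROLES=(compute.admin dns.admin compute.loadBalancerAdmin storage.admin)
CREATED_SA=false

# Run gcloud, or just print the command when DRY_RUN=1.
gcloud_cmd() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "gcloud $*"; else gcloud "$@"; fi
}

# create_sa PROJECT NAME: create the account and set the cleanup flag on success.
create_sa() {
  local project="$1" name="$2"
  gcloud_cmd iam service-accounts create "$name" \
    --project="$project" --display-name="$name" && CREATED_SA=true
}

# grant_roles PROJECT SA_EMAIL: bind each role in ROLES to the account.
grant_roles() {
  local project="$1" sa_email="$2" role
  for role in "${ROLES[@]}"; do
    gcloud_cmd projects add-iam-policy-binding "$project" \
      --member="serviceAccount:${sa_email}" --role="roles/${role}" --quiet
  done
}

# cleanup_sa SA_EMAIL: called from Phase 9; deletes the account only if this
# run created it (deleting the account also invalidates its keys).
cleanup_sa() {
  local sa_email="$1"
  if [ "$CREATED_SA" = true ]; then
    gcloud_cmd iam service-accounts delete "$sa_email" --quiet
  fi
}
```

Any local key file downloaded for the account would still need its own `rm -f` in Phase 9.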

</details>


---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Repository YAML (base), Central YAML (inherited)

**Review profile**: CHILL

**Plan**: Enterprise

**Run ID**: `6afaee1d-3b71-4517-8d6a-f43cda5119ee`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between fbfdf061d0b6fa1356191dd1d1062629ce315847 and 1b0d23420efb4b8a662658b0575c49e1481f712c.

</details>

<details>
<summary>📒 Files selected for processing (1)</summary>

* `.claude/skills/operator-upgrade-test/SKILL.md`

</details>

</details>

Automates end-to-end Kueue operator upgrade testing (Part 2 of OCPKUEUE-668). Covers both uninstall scenarios (operator-only and full CR deletion), config migration checks, and results document generation. Implemented as a Claude Code skill: a natural-language runbook that adapts to version-specific differences (schema changes, new fields, API deprecations) without requiring scripted logic for each case.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

openshift-ci · 2026-06-19T22:26:24Z

@anahas-redhat: all tests passed!

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 19, 2026

anahas-redhat changed the title ~~OCPKUEUE-668: Add operator-upgrade-test Claude Code skill~~ [wip]OCPKUEUE-668: Add operator-upgrade-test Claude Code skill Jun 19, 2026

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 19, 2026

coderabbitai Bot reviewed Jun 19, 2026


anahas-redhat force-pushed the add-kueue-upgrade-test-skill branch from a7e8e3f to 708d292 June 19, 2026 20:00

anahas-redhat wants to merge 1 commit into openshift:main from anahas-redhat:add-kueue-upgrade-test-skill
