[wip]OCPKUEUE-668: Add operator-upgrade-test Claude Code skill #2043

Open
anahas-redhat wants to merge 1 commit into openshift:main from anahas-redhat:add-kueue-upgrade-test-skill

Conversation

@anahas-redhat anahas-redhat commented Jun 19, 2026

Contributor

Summary

  • Adds a Claude Code skill (.claude/skills/operator-upgrade-test/SKILL.md) that automates end-to-end Kueue operator upgrade testing (Part 2 of OCPKUEUE-668)
  • Covers both uninstall scenarios: operator-only (Scenario A) and full CR deletion (Scenario B)
  • Includes config migration checks, API deprecation detection, and automated results document generation
  • Implemented as a natural language runbook that adapts to version-specific differences (schema changes, new CRD fields, API deprecations) without requiring scripted logic for each case

What the skill does

  1. Gathers all parameters and validates credentials interactively (Phase 0)
  2. Optionally provisions a GCP cluster (Phase 1)
  3. Builds and applies an FBC catalog with source and target bundles (Phase 2)
  4. Installs the source version and creates 11 test workloads across 6 workload types (Phase 3)
  5. Runs Scenario A: operator-only uninstall → upgrade → verify (Phase 4)
  6. Runs Scenario B: full uninstall → upgrade → verify (Phases 5-6)
  7. Checks config migration compatibility (Phase 7)
  8. Generates a results document with PASS/FAIL tables (Phase 8)
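
The verify steps in Phases 4 and 6 amount to comparing the admission baseline recorded in Phase 3 with what the cluster reports after the upgrade. A minimal sketch of that comparison, assuming both snapshots are plain text files with one `namespace/name` entry per line; the function name `verify_survival` and the file layout are illustrative and not taken from SKILL.md:

```bash
# Hypothetical helper: prints PASS if every baseline entry is still present
# after the upgrade, otherwise prints FAIL followed by the missing entries.
verify_survival() {
  baseline_file="$1"
  after_file="$2"
  # -F fixed strings, -x whole-line match, -v invert: keep baseline lines
  # that have no exact match in the post-upgrade snapshot.
  missing="$(grep -Fxv -f "$after_file" "$baseline_file" || true)"
  if [ -z "$missing" ]; then
    echo "PASS"
  else
    echo "FAIL"
    printf '%s\n' "$missing" | sed 's/^/missing: /'
    return 1
  fi
}
```

Line order does not matter for this check, so the two snapshots do not need to be sorted the same way.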

Test plan

  • Ran full upgrade test 1.3.1 → 1.4.0 on OCP 4.22.1
  • Ran full upgrade test 1.3.0 → 1.4.0 on OCP 4.22.2
  • Review skill instructions for clarity and completeness

Jira: https://redhat.atlassian.net/browse/OCPKUEUE-668

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests

    • Added an automated, end-to-end operator upgrade test runbook with multi-phase provisioning, uninstall/upgrade scenarios, and extensive post-upgrade verification to confirm resource persistence and continued admission.
  • Documentation

    • Added detailed step-by-step runbook guidance covering required inputs, dynamic schema handling, unattended execution flow, optional environment cleanup, and results reporting for PASS/FAIL outcomes.

openshift-ci-robot commented Jun 19, 2026


@anahas-redhat: This pull request references OCPKUEUE-668 which is a valid jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 19, 2026
coderabbitai Bot commented Jun 19, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: a3c8dabf-bceb-409f-b195-0d3178571ce7

📥 Commits

Reviewing files that changed from the base of the PR and between 1b0d234 and 708d292.

📒 Files selected for processing (1)
  • .claude/skills/operator-upgrade-test/SKILL.md

Walkthrough

Adds .claude/skills/operator-upgrade-test/SKILL.md, a new AI-agent skill defining a nine-phase scripted runbook for end-to-end Kueue operator upgrade testing on OpenShift. It covers two upgrade scenarios (operator-only vs. full uninstall with CR deletion), dynamic schema introspection, GCP cluster provisioning, FBC catalog build, workload survival verification, schema migration checks, and results document generation.

Changes

Operator Upgrade Test Skill

| Layer / File(s) | Summary |
| --- | --- |
| Phase 0: interactive input collection and schema introspection `.claude/skills/operator-upgrade-test/SKILL.md` | Defines skill metadata, two uninstall scenario types, unattended execution model, and the full interactive Phase 0 flow: collects cluster/OCP version, GCP parameters, SSH/pull secret, source/target Kueue versions, bundle images, FBC catalog destination, credential checks, and performs dynamic Kueue CR schema introspection via oc explain. |
| Phase 1–2: cluster provisioning and FBC catalog build `.claude/skills/operator-upgrade-test/SKILL.md` | Phase 1 downloads and verifies openshift-install, generates install-config.yaml, creates the GCP-backed OCP cluster, and exports KUBECONFIG. Phase 2 clones kueue-fbc, updates catalog templates with source/target bundle images, generates/builds/pushes the catalog image, applies CatalogSource and optional ImageDigestMirrorSet, and waits for catalog readiness. |
| Phase 3–6: operator installation, baseline, and upgrade scenarios `.claude/skills/operator-upgrade-test/SKILL.md` | Phase 3 installs the source operator, provisions cert-manager/JobSet/LWS prerequisites, and dynamically builds/applies the Kueue CR with retry; creates test namespaces, ClusterQueues, LocalQueues, ResourceFlavors, and workloads and records the admission baseline. Phase 4 (Scenario A) performs operator-only uninstall then target reinstall and verifies workload and queue survival. Phase 5 cleans up and replays Phase 3. Phase 6 (Scenario B) deletes the Kueue CR before uninstalling, reinstalls using target-version schema, and mirrors Scenario A verification. |
| Phase 7–8: schema migration validation and results document `.claude/skills/operator-upgrade-test/SKILL.md` | Phase 7 compares source vs. target API schemas via dry-run server-side apply and performs API deprecation detection. Phase 8 generates a structured results markdown file with per-scenario PASS/FAIL tables and configuration migration findings under the work directory. |
| Phase 9 cleanup, error handling, and prerequisites `.claude/skills/operator-upgrade-test/SKILL.md` | Defines optional interactive Phase 9 cluster teardown via openshift-install destroy cluster, global error-handling rules (log to results doc, mark FAIL, continue when possible), and the full list of tool/dependency prerequisites validated during Phase 0. |
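
The Phase 7 dry-run check can be pictured roughly as follows. This is a sketch under assumptions, not the skill's actual text: it assumes the source-version Kueue CR has been saved to a manifest file, and the function name `check_migration` and the `OC` override variable are hypothetical. `oc apply --server-side --dry-run=server` asks the API server to validate the manifest against the schema currently installed on the cluster without persisting anything.

```bash
# OC can be overridden (for example with a stub) when no cluster is available.
OC="${OC:-oc}"

# Hypothetical helper: prints PASS if the API server accepts the manifest
# in a server-side dry run, FAIL otherwise. Nothing is written to the cluster.
check_migration() {
  manifest="$1"
  if "$OC" apply --server-side --dry-run=server -f "$manifest" >/dev/null 2>&1; then
    echo "PASS"
  else
    echo "FAIL"
  fi
}
```

Run against a cluster where the target operator version is installed, a FAIL here would suggest that the source-version configuration needs changes before it is accepted by the target schema.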

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 15
✅ Passed checks (15 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly references the main change: adding a Claude Code skill for operator-upgrade-test automation (OCPKUEUE-668), which matches the file addition and PR objectives. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | All 156 Ginkgo test names in the PR are static and deterministic with no dynamic content like pod names, timestamps, UUIDs, or generated identifiers. |
| Test Structure And Quality | ✅ Passed | The PR adds a markdown skill document, not Ginkgo test code. The custom check explicitly requires reviewing "Ginkgo test code" with patterns like BeforeEach/AfterEach/It blocks, which do not apply... |
| Microshift Test Compatibility | ✅ Passed | This PR adds a Claude Code skill file (.claude/skills/operator-upgrade-test/SKILL.md), which is an 848-line Markdown runbook for Kueue operator upgrade testing. No Ginkgo e2e tests (It(), Describe()... |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | This PR adds only a markdown skill file (.claude/skills/operator-upgrade-test/SKILL.md), not Ginkgo e2e tests. The check for SNO compatibility applies only to new Go test files with Ginkgo patterns... |
| Topology-Aware Scheduling Compatibility | ✅ Passed | The PR adds a Claude Code skill file (.claude/skills/operator-upgrade-test/SKILL.md) which is a natural language runbook for testing Kueue operator upgrades, not operator code or deployment manifes... |
| Ote Binary Stdout Contract | ✅ Passed | PR adds only a Claude Code skill (SKILL.md) markdown runbook with embedded bash examples, not OTE binary source code or test binaries. Check only applies to compiled OTE binaries communicating with... |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR adds a Claude Code skill (SKILL.md markdown file), not Ginkgo e2e tests. Check is for Ginkgo e2e tests only; therefore not applicable to this PR. |
| No-Weak-Crypto | ✅ Passed | No weak cryptography detected. File uses SHA256 checksums for integrity verification, HTTPS for all downloads, and explicitly filters credentials from logs. |
| Container-Privileges | ✅ Passed | The PR adds a single markdown skill file (.claude/skills/operator-upgrade-test/SKILL.md) containing example Kubernetes manifests. None of the manifests contain privileged: true, hostPID, hostNetwor... |
| No-Sensitive-Data-In-Logs | ✅ Passed | The SKILL.md file demonstrates strong awareness of credential protection. Line 350 explicitly instructs to "Filter out lines containing pullSecret, sshKey, or password to avoid exposing crede... |
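
The SHA256 integrity check credited in the No-Weak-Crypto row can be sketched as below. It assumes a checksum file in the usual `sha256sum` output format (one `<hash>  <filename>` pair per line); the helper name `verify_sha256` is illustrative and not taken from SKILL.md. `sha256sum` is tried first, with `shasum -a 256` as a fallback for macOS hosts.

```bash
# Hypothetical helper: prints OK if the file's SHA256 matches the entry for
# its basename in the checksum file, MISMATCH otherwise.
verify_sha256() {
  file="$1"
  sums="$2"
  name="$(basename "$file")"
  # Expected hash: first field of the line whose filename field matches.
  # A leading "*" (binary-mode marker) on the filename is ignored.
  expected="$(awk -v n="$name" '{ f = $2; sub(/^\*/, "", f); if (f == n) { print $1; exit } }' "$sums")"
  if command -v sha256sum >/dev/null 2>&1; then
    actual="$(sha256sum "$file" | awk '{ print $1 }')"
  else
    actual="$(shasum -a 256 "$file" | awk '{ print $1 }')"
  fi
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "OK"
  else
    echo "MISMATCH"
    return 1
  fi
}
```

A missing entry in the checksum file is treated as a mismatch, so an archive is only extracted when a published hash exists and matches.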


openshift-ci Bot commented Jun 19, 2026


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: anahas-redhat
Once this PR has been reviewed and has the lgtm label, please assign sohankunkerkar for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@anahas-redhat anahas-redhat changed the title OCPKUEUE-668: Add operator-upgrade-test Claude Code skill [wip]OCPKUEUE-668: Add operator-upgrade-test Claude Code skill Jun 19, 2026
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 19, 2026
@anahas-redhat

Contributor Author

Hey @sohankunkerkar @kannon92 — PR for a Claude Code skill that automates operator upgrade testing. Would love your feedback on the approach.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (2)
.claude/skills/operator-upgrade-test/SKILL.md (2)

206-210: 💤 Low value

Add markdown language specifier to fenced code block (MD040).

Line 207 starts a fenced code block without a language specifier. While this renders correctly, it prevents syntax highlighting and linter validation.

📝 Proposed fix

+```bash
=== Kueue Upgrade Test Configuration ===


This enables syntax highlighting and bash-specific linting.
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.claude/skills/operator-upgrade-test/SKILL.md around lines 206 - 210, The
fenced code block starting with the "=== Kueue Upgrade Test Configuration ==="
content is missing a language specifier after the opening triple backticks. Add
"bash" as the language specifier immediately after the opening ``` to enable
syntax highlighting and bash-specific linting validation. This means changing
the opening fence from ``` to ```bash.


</details>

<!-- cr-comment:v1:624bb03863cea42f25620591 -->

_Source: Linters/SAST tools_

---

`1-50`: _⚡ Quick win_

**Verify pull secret and SSH key handling is secure in all phases.**

The skill collects pull secrets and SSH keys from files and embeds them in install-config.yaml and API objects. Ensure that:

1. install-config.yaml file is not committed or persisted longer than the cluster creation
2. No credentials appear in logs or error messages from Phases 1-8
3. Pull secret and SSH key values are not printed in results document (Phase 8)

Add a note in Phase 9 (cluster cleanup) to remove the install-config.yaml file and any temporary credential files.
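
One way to approach point 2 is to pass any output that may echo install-config.yaml content through a filter before it is logged. A minimal sketch, assuming line-oriented log output; the function name `redact_credentials` is hypothetical, and the three keywords are the ones the pre-merge check reports SKILL.md as filtering on (pullSecret, sshKey, password):

```bash
# Hypothetical helper: reads log text on stdin and drops every line that
# mentions a credential field (case-insensitive match).
redact_credentials() {
  # grep exits non-zero when every line is filtered out; "|| true" keeps
  # that from being treated as a failure.
  grep -Eiv 'pullSecret|sshKey|password' || true
}
```

Because the filter works per line, a credential whose value spans several lines (for example a YAML block scalar) would only lose its first line, so this is a baseline safeguard rather than a complete one.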





<details>
<summary>🔒 Suggested addition to Phase 9 (cluster cleanup)</summary>

```bash
# Clean up sensitive files
rm -f <work-dir>/<cluster-name>/install-config.yaml
rm -f <work-dir>/<cluster-name>/install-config.yaml.bak
rm -f /tmp/openshift-install-*.tar.gz*
```

Also ensure the results document (Phase 8) does not include:
- Pull secret contents
- SSH key contents
- GCP service account key paths
- Any other credential material

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.claude/skills/operator-upgrade-test/SKILL.md around lines 1 - 50, The skill
must ensure that sensitive credentials (pull secrets, SSH keys, and GCP service
account keys) are not exposed in logs, error messages, or the results document.
Add explicit cleanup instructions to Phase 9 to remove the install-config.yaml
file and any temporary credential files created during cluster creation. In
Phase 8 (results document generation), explicitly filter out and document which
credential values must not be included. Additionally, review all log statements
and error handling in Phases 1-8 to ensure that when install-config.yaml or
credential variables are referenced, their actual contents are never printed or
logged—only their file paths or variable names should appear in output.
```

</details>

<!-- cr-comment:v1:f9613ab14fca752abee56726 -->

_Source: Coding guidelines_

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.claude/skills/operator-upgrade-test/SKILL.md:

  • Around line 477-498: Replace the three hardcoded /releases/latest/download/
    URLs for cert-manager (line 479), JobSet (line 488), and LeaderWorkerSet
    with pinned version numbers instead of "latest". Define version variables
    at the top of the document for cert-manager, jobset, and lws, then use those
    variables in the oc apply commands. Additionally, add checksum verification
    after downloading each manifest by comparing against published checksums, or
    consider including the pinned manifests directly in the skill repository to
    eliminate the need for remote downloads entirely. Document the version pinning
    strategy at the beginning of Phase 0 or Phase 3 to explain how these specific
    versions are selected and maintained.
  • Around line 1-100: The bundle image discovery in Phase 0e clones the kueue-fbc
    repository and queries the quay.io API without pinning git revisions or
    validating responses, creating a supply chain risk. Pin the git clone operation
    to a specific commit hash or version tag instead of using --depth 1 without a
    ref, and add response validation for the quay.io API queries by comparing
    against a known-good manifest or checksum before using the discovered bundle
    image references. This ensures that even if the remote repository or API is
    compromised, the test will detect the tampering and fail safely rather than
    deploying a malicious bundle.
  • Around line 261-270: The openshift-install binary download section lacks
    checksum verification, creating a security vulnerability where a compromised or
    intercepted binary could be executed without validation. Add a step to download
    the sha256sum.txt file from the same mirror location as the
    openshift-install-mac-arm64-.tar.gz file, then verify the
    downloaded tar.gz file against the published checksum using sha256sum before
    extracting and executing it. The verification should be performed immediately
    after the curl download and before the tar extraction step, ensuring the binary
    integrity is confirmed before any execution occurs.
  • Around line 140-155: The IAM role assignment in the service account
    auto-creation block grants 9 roles including iam.securityAdmin and
    iam.roleAdmin, which exceed the least-privilege requirement for OpenShift
    installer. Remove the unnecessary IAM roles from the for loop and retain only
    compute.admin, dns.admin, compute.loadBalancerAdmin, and storage.admin.
    Additionally, add a flag variable (such as CREATED_SA) to track whether the
    service account was auto-created, then reference this flag in Phase 9 to add
    cleanup logic that deletes the service account and its keys using gcloud iam
    service-accounts delete when cluster deletion occurs.

Nitpick comments:
In @.claude/skills/operator-upgrade-test/SKILL.md:

  • Around line 206-210: The fenced code block starting with the "=== Kueue
    Upgrade Test Configuration ===" content is missing a language specifier after
    the opening triple backticks. Add "bash" as the language specifier immediately
    after the opening ``` to enable syntax highlighting and bash-specific linting
    validation. This means changing the opening fence from ``` to ```bash.
  • Around line 1-50: The skill must ensure that sensitive credentials (pull
    secrets, SSH keys, and GCP service account keys) are not exposed in logs, error
    messages, or the results document. Add explicit cleanup instructions to Phase 9
    to remove the install-config.yaml file and any temporary credential files
    created during cluster creation. In Phase 8 (results document generation),
    explicitly filter out and document which credential values must not be included.
    Additionally, review all log statements and error handling in Phases 1-8 to
    ensure that when install-config.yaml or credential variables are referenced,
    their actual contents are never printed or logged—only their file paths or
    variable names should appear in output.

</details>

<details>
<summary>🪄 Autofix (Beta)</summary>

Fix all unresolved CodeRabbit comments on this PR:

- [ ] <!-- {"checkboxId": "4b0d0e0a-96d7-4f10-b296-3a18ea78f0b9"} --> Push a commit to this branch (recommended)
- [ ] <!-- {"checkboxId": "ff5b1114-7d8c-49e6-8ac1-43f82af23a33"} --> Create a new PR with the fixes

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Repository YAML (base), Central YAML (inherited)

**Review profile**: CHILL

**Plan**: Enterprise

**Run ID**: `6afaee1d-3b71-4517-8d6a-f43cda5119ee`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between fbfdf061d0b6fa1356191dd1d1062629ce315847 and 1b0d23420efb4b8a662658b0575c49e1481f712c.

</details>

<details>
<summary>📒 Files selected for processing (1)</summary>

* `.claude/skills/operator-upgrade-test/SKILL.md`

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Comment thread .claude/skills/operator-upgrade-test/SKILL.md
Comment thread .claude/skills/operator-upgrade-test/SKILL.md
Comment thread .claude/skills/operator-upgrade-test/SKILL.md
Comment thread .claude/skills/operator-upgrade-test/SKILL.md
Automates end-to-end Kueue operator upgrade testing (Part 2 of
OCPKUEUE-668). Covers both uninstall scenarios (operator-only and
full CR deletion), config migration checks, and results document
generation.

Implemented as a Claude Code skill — a natural language runbook that
adapts to version-specific differences (schema changes, new fields,
API deprecations) without requiring scripted logic for each case.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@anahas-redhat anahas-redhat force-pushed the add-kueue-upgrade-test-skill branch from a7e8e3f to 708d292 Compare June 19, 2026 20:00
openshift-ci Bot commented Jun 19, 2026


@anahas-redhat: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.
jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

2 participants