Skip to content

SBR | SBRC lifecycle test (OCP-88734)#28

Open
maximunited wants to merge 2 commits into
medik8s:mainfrom
maximunited:feat/sbr-ocp88734-sbrc-lifecycle
Open

SBR | SBRC lifecycle test (OCP-88734)#28
maximunited wants to merge 2 commits into
medik8s:mainfrom
maximunited:feat/sbr-ocp88734-sbrc-lifecycle

Conversation

@maximunited

@maximunited maximunited commented Jun 21, 2026

Copy link
Copy Markdown

Summary

  • Adds Go/Ginkgo test for StorageBasedRemediationConfig lifecycle (OCP-88734)
  • Verifies SBRC create -> DaemonSet appears and reaches Ready
  • Verifies patch (sbrTimeoutSeconds change) -> DaemonSet rolls out and re-reaches Ready
  • Verifies multi-SBRC isolation: two independent DaemonSets coexist without interfering
  • Verifies delete -> DaemonSet is garbage-collected; sibling SBRC's DaemonSet unaffected

Note: depends on #24 (OCP-88737 remediation CR tests) which adds the waitForSBRCReady helper and SBRCReadyTimeout/SBRAgentDaemonSetPrefix constants used here.

Test plan

  • make vet passes
  • make lint passes
  • /test 4.22-konflux-e2e-sbr-aws-odf

Summary by CodeRabbit

  • Tests
    • Added functional test suite covering StorageBasedRemediation (SBR) lifecycle, including CR finalizer handling and safe cleanup.
    • Added StorageBasedRemediationConfig (SBRC) lifecycle system test with creation, patching, and deletion workflows (including DaemonSet readiness/garbage collection).
    • Improved test infrastructure with shared unstructured CR builders, node schedulability checks, and RWX-capable StorageClass discovery.
    • Added readiness and reconciliation waiting helpers to reduce flaky timing issues.
    • Removed duplicate node schedulability helper in the watchdog test in favor of shared implementation.

@maximunited

Copy link
Copy Markdown
Author

/test 4.22-konflux-e2e-sbr-aws-odf

@openshift-ci

openshift-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: maximunited

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot requested review from slintes and ugreener June 21, 2026 00:07
@coderabbitai

coderabbitai Bot commented Jun 21, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@maximunited, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 37 minutes and 23 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses rolling per-developer review limits. Reviews become available again as older review attempts age out of the rolling limit window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 048777ad-937e-4713-b0d6-feff4b604111

📥 Commits

Reviewing files that changed from the base of the PR and between f30b638 and 583268f.

📒 Files selected for processing (6)
  • tests/sbr-operator/internal/sbrparams/const.go
  • tests/sbr-operator/tests/remediation.go
  • tests/sbr-operator/tests/sbr.go
  • tests/sbr-operator/tests/sbr_helpers.go
  • tests/sbr-operator/tests/sbrc_lifecycle.go
  • tests/sbr-operator/tests/watchdog.go
📝 Walkthrough

Walkthrough

Adds two new Ginkgo functional test suites for the SBR operator: one covering StorageBasedRemediationConfig lifecycle (create, patch, multi-instance coexistence, DaemonSet GC after deletion) and one covering StorageBasedRemediation CR admission, finalizer management, and deletion. Supporting changes add shared constants, an RWX StorageClass discovery helper, a unified unstructured CR builder, a SBRC readiness poller, and move isNodeSchedulable from watchdog.go into the shared sbr.go.

Changes

SBR Operator Functional Tests

Layer / File(s) Summary
Shared constants, helpers, and builder refactor
tests/sbr-operator/internal/sbrparams/const.go, tests/sbr-operator/tests/sbr_helpers.go, tests/sbr-operator/tests/sbr.go, tests/sbr-operator/tests/watchdog.go
Adds exported constants for the SBR finalizer, test resource names, DaemonSet prefix, and timeout durations. Introduces discoverRWXStorageClass(). Moves isNodeSchedulable from watchdog.go to sbr.go (removing the corev1 import from watchdog.go). Refactors buildSBRC to delegate to a new shared buildSBRUnstructured helper. Adds waitForSBRCReady that polls for DaemonSet readiness.
SBRC lifecycle test suite
tests/sbr-operator/tests/sbrc_lifecycle.go
Ordered Ginkgo suite that pre-cleans stale SBRCs and DaemonSets in BeforeAll, then creates two SBRC instances with an RWX StorageClass, patches SBRC A's timeout, validates multi-instance DaemonSet coexistence, and verifies DaemonSet GC after sequential deletion of each SBRC.
StorageBasedRemediation CR functional test suite
tests/sbr-operator/tests/remediation.go
Adds helpers to build, fetch, and force-delete a StorageBasedRemediation CR; detects nodes hosting SBR controller pods. Implements a Ginkgo suite that creates an SBRC, selects a schedulable non-controller worker node, creates a node-targeted remediation CR, asserts the controller adds the expected finalizer, deletes the CR, and waits for full removal after the finalizer is released.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • medik8s/system-tests#9: Extends the same tests/sbr-operator/internal/sbrparams/const.go file with SBR-related exported constants, directly related to the constant additions in this PR.
  • medik8s/system-tests#10: Modifies the same shared SBR test infrastructure files (const.go and sbr.go) that this PR also changes.
  • medik8s/system-tests#19: Modifies the shared SBR operator test code around watchdog.go and sbr.go, particularly related to the isNodeSchedulable helper that this PR relocates.

Suggested labels

lgtm, ok-to-test

Suggested reviewers

  • mshitrit
  • ugreener
  • swgoswam
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'SBR | SBRC lifecycle test (OCP-88734)' directly and clearly reflects the main change: adding a comprehensive SBRC lifecycle test suite.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@qodo-2-for-medik8s

qodo-2-for-medik8s Bot commented Jun 21, 2026

Copy link
Copy Markdown

PR-Agent: could not find a component named 4.22-konflux-e2e-sbr-aws-odf in a supported language in this PR.

@qodo-2-for-medik8s

Copy link
Copy Markdown

PR Summary by Qodo

Add SBRC lifecycle functional test for DaemonSet rollout (OCP-88734)
🧪 Tests ✨ Enhancement 🕐 20-40 Minutes

Grey Divider

Description

• Add Ginkgo test covering SBRC create/patch/multi-instance/delete DaemonSet behavior.
• Add remediation CR lifecycle test validating finalizer add/remove and node uncordon.
• Introduce shared helpers/constants for building SBR objects and waiting for agent readiness.
Diagram

graph TD
  T(["Ginkgo functional tests"]) --> API["K8s API"] --> SBRC["SBRC CR"] --> DS["Agent DaemonSet"] --> P["Agent pods"]
  T --> API --> SBR["SBR CR"] --> P
Loading
High-Level Assessment

The following are alternative approaches to this PR:

1. Stronger DaemonSet rollout assertions
  • ➕ More robust verification of patch-triggered rollout (e.g., check observedGeneration/updated pods match desired)
  • ➕ Reduces false positives where only one pod becomes Ready
  • ➖ More complexity and more cluster-variance sensitivity (especially on small/tainted worker pools)
  • ➖ May increase test runtime/flakiness if cluster is slow
2. Use a generic DaemonSet readiness helper (reusable across suites)
  • ➕ Centralizes rollout/readiness logic; consistent timeouts and error messages
  • ➕ Makes future tests shorter and easier to maintain
  • ➖ Adds abstraction that can hide useful per-test context unless carefully designed

Recommendation: The PR’s approach (asserting DaemonSet existence plus at-least-one-ready-pod, and GC on delete) is a good balance for functional coverage without over-constraining cluster behavior. If this test becomes flaky or misses rollout regressions, consider upgrading the patch step to validate DaemonSet rollout more explicitly (generation/updated scheduling) via a shared helper.

Files changed (5) +576 / -20

Enhancement (1) +48 / -2
sbr.goRefactor unstructured CR builders and add SBRC readiness helper +48/-2

Refactor unstructured CR builders and add SBRC readiness helper

• Moves isNodeSchedulable into shared test helpers, generalizes unstructured CR construction via buildSBRUnstructured, restores buildSBRC as a thin wrapper, and adds waitForSBRCReady to block until the agent DaemonSet has a ready pod.

tests/sbr-operator/tests/sbr.go

Refactor (1) +0 / -18
watchdog.goRemove duplicated isNodeSchedulable helper (now shared) +0/-18

Remove duplicated isNodeSchedulable helper (now shared)

• Deletes the local node schedulability helper after moving it to sbr.go for reuse across multiple functional suites.

tests/sbr-operator/tests/watchdog.go

Tests (3) +528 / -0
const.goAdd SBRC/SBR functional test constants and timeouts +30/-0

Add SBRC/SBR functional test constants and timeouts

• Introduces constants for remediation finalizer validation, functional SBRC names, agent DaemonSet naming prefix, and several test timeouts used by the new lifecycle suites.

tests/sbr-operator/internal/sbrparams/const.go

remediation.goAdd StorageBasedRemediation CR lifecycle functional test (OCP-88737) +289/-0

Add StorageBasedRemediation CR lifecycle functional test (OCP-88737)

• Adds a Ginkgo suite that creates a minimal SBRC to ensure agents run, then validates StorageBasedRemediation CR admission, finalizer addition, deletion/finalizer release, and node uncordon behavior. Includes defensive cleanup that clears finalizers if needed.

tests/sbr-operator/tests/remediation.go

sbrc_lifecycle.goAdd SBRC lifecycle test covering create/patch/multi-instance/delete (OCP-88734) +209/-0

Add SBRC lifecycle test covering create/patch/multi-instance/delete (OCP-88734)

• Adds a presubmit functional test that validates SBRC creation produces a ready agent DaemonSet, patching triggers a rollout that re-reaches readiness, two SBRCs produce independent DaemonSets, and deletion GC removes only the corresponding DaemonSet.

tests/sbr-operator/tests/sbrc_lifecycle.go

@qodo-2-for-medik8s

qodo-2-for-medik8s Bot commented Jun 21, 2026

Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX issues (0) 🔗 Cross-repo conflicts (0) 📜 Skill insights (0)

Grey Divider


Action required

1. No pre-cleanup for SBRC ✓ Resolved 🐞 Bug ☼ Reliability
Description
The remediation CR suite creates a fixed-name StorageBasedRemediationConfig in BeforeAll without
deleting any pre-existing instance, so a leaked SBRC from a prior run will cause AlreadyExists and
fail the suite before any assertions run.
Code

tests/sbr-operator/tests/remediation.go[R114-125]

+		BeforeAll(func() {
+			By("Creating a minimal StorageBasedRemediationConfig so agent pods run and can reconcile SBR CRs")
+
+			setupSBRC = buildSBRC(sbrparams.SBRCFunctionalTestName, map[string]interface{}{})
+
+			createErr := APIClient.Create(context.TODO(), setupSBRC)
+			Expect(createErr).ToNot(HaveOccurred(),
+				"StorageBasedRemediationConfig %q must be created before the remediation CR test",
+				sbrparams.SBRCFunctionalTestName)
+
+			waitForSBRCReady(sbrparams.SBRCFunctionalTestName)
+
Relevance

⭐⭐⭐ High

Team previously accepted pre-test cleanup to avoid leaked SBRC/DaemonSet artifacts and flakes (PR
#11, #10).

PR-#11
PR-#10

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The remediation test unconditionally creates a fixed-name SBRC, but unlike other SBR suites it does
not delete stale CRs first; therefore reruns can fail on AlreadyExists.

tests/sbr-operator/tests/remediation.go[114-125]
tests/sbr-operator/tests/sbr.go[682-703]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`tests/sbr-operator/tests/remediation.go` creates `sbrparams.SBRCFunctionalTestName` in `BeforeAll` but does not remove any existing instance of that CR first. If a previous run leaked the SBRC, `APIClient.Create` will return `AlreadyExists` and the suite fails during setup.

## Issue Context
Other SBR suites already follow a “delete stale CRs in BeforeAll” pattern to make reruns stable.

## Fix Focus Areas
- tests/sbr-operator/tests/remediation.go[114-125]

## Suggested fix
In `BeforeAll`, attempt to delete a stale SBRC with the same name (ignore NotFound), then (optionally) `Eventually` wait for its DaemonSet (`sbr-agent-<name>`) to be gone before calling `Create(...)`. Alternatively, treat `AlreadyExists` as non-fatal and proceed to `waitForSBRCReady(...)`.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

2. Rollout not actually asserted 🐞 Bug ≡ Correctness
Description
The SBRC lifecycle test’s “patch -> DaemonSet rolls out” step only asserts that the DaemonSet still
has at least one ready pod, which can remain true even if the patch did not trigger any DaemonSet
update, so the test can miss regressions in patch reconciliation.
Code

tests/sbr-operator/tests/sbrc_lifecycle.go[R108-142]

+				By("Step 2: Patching StorageBasedRemediationConfig A (updating sbrTimeoutSeconds) " +
+					"and verifying the DaemonSet rolls out")
+
+				patchPayload, marshalErr := json.Marshal(map[string]interface{}{
+					"spec": map[string]interface{}{
+						"sbrTimeoutSeconds": sbrparams.SBRCTimeoutSecondsMin + 10,
+					},
+				})
+				Expect(marshalErr).ToNot(HaveOccurred(), "Failed to marshal patch payload for SBRC A")
+
+				patchErr := APIClient.Patch(
+					context.TODO(),
+					sbrcA.DeepCopy(),
+					runtimeclient.RawPatch(types.MergePatchType, patchPayload),
+				)
+				Expect(patchErr).ToNot(HaveOccurred(),
+					"Failed to patch StorageBasedRemediationConfig %q", sbrparams.SBRCLifecycleTestNameA)
+
+				dsNameA := sbrparams.SBRAgentDaemonSetPrefix + sbrparams.SBRCLifecycleTestNameA
+
+				Eventually(func() error {
+					agentDS, getErr := APIClient.DaemonSets(medik8sparams.OperatorNs).Get(
+						context.TODO(), dsNameA, metav1.GetOptions{})
+					if getErr != nil {
+						return fmt.Errorf("DaemonSet %s not found: %w", dsNameA, getErr)
+					}
+
+					if agentDS.Status.NumberReady == 0 {
+						return fmt.Errorf("DaemonSet %s: %d/%d pods ready after patch",
+							dsNameA, agentDS.Status.NumberReady, agentDS.Status.DesiredNumberScheduled)
+					}
+
+					return nil
+				}, sbrparams.SBRCLifecyclePatchedTimeout, sbrparams.DefaultPollInterval).Should(Succeed(),
+					"DaemonSet %s must have at least one ready pod after SBRC A is patched", dsNameA)
Relevance

⭐⭐ Medium

No direct precedent; but team often accepts strengthening test assertions for
reliability/correctness (e.g., PR #11, #13).

PR-#11
PR-#13

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The post-patch Eventually block only checks NumberReady and does not validate any
generation/revision change or updated scheduling, so it cannot prove a rollout occurred.

tests/sbr-operator/tests/sbrc_lifecycle.go[108-142]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
After patching SBRC A, the test claims to verify a DaemonSet rollout, but it only checks `agentDS.Status.NumberReady > 0`. That validates post-patch readiness, not that an update/rollout happened.

## Issue Context
A DaemonSet can stay Ready while being completely unchanged; this would make the test pass even if the controller stopped applying SBRC patches to the DaemonSet.

## Fix Focus Areas
- tests/sbr-operator/tests/sbrc_lifecycle.go[108-142]

## Suggested fix
Before patching, `Get` the DaemonSet and record a rollout indicator (e.g., `ds.Generation`, `ds.Status.ObservedGeneration`, and/or `ds.Status.UpdateRevision` via `apps.ControllerRevisionHashLabelKey` if present). After patching, `Eventually` assert:
- `ds.Generation` increased (or revision/hash changed), AND
- `ds.Status.ObservedGeneration == ds.Generation`, AND
- `ds.Status.UpdatedNumberScheduled == ds.Status.DesiredNumberScheduled` (or at least >0), AND
- `ds.Status.NumberReady == ds.Status.DesiredNumberScheduled` (or whatever threshold you intend).
This turns the step into a real rollout verification rather than a readiness re-check.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Informational

3. DaemonSet baseline parallel flake 🐞 Bug ☼ Reliability
Description
Existing SBR negative tests assert that no new DaemonSet at all appears during a Consistently
window, but this PR introduces additional suites that create agent DaemonSets (via SBRCs), which can
cause nondeterministic failures when specs run concurrently (e.g., Ginkgo parallel execution).
Code

tests/sbr-operator/tests/remediation.go[R114-125]

+		BeforeAll(func() {
+			By("Creating a minimal StorageBasedRemediationConfig so agent pods run and can reconcile SBR CRs")
+
+			setupSBRC = buildSBRC(sbrparams.SBRCFunctionalTestName, map[string]interface{}{})
+
+			createErr := APIClient.Create(context.TODO(), setupSBRC)
+			Expect(createErr).ToNot(HaveOccurred(),
+				"StorageBasedRemediationConfig %q must be created before the remediation CR test",
+				sbrparams.SBRCFunctionalTestName)
+
+			waitForSBRCReady(sbrparams.SBRCFunctionalTestName)
+
Relevance

⭐ Low

Similar “DaemonSet baseline too broad / flake, filter listings” suggestion was explicitly rejected
in PR #10.

PR-#10

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The new suites create SBRCs and wait for the corresponding agent DaemonSet (sbr-agent-<name>) to
become ready, while the existing negative test flags any newly appearing DaemonSet name as a failure
during Consistently, which is vulnerable to concurrent DaemonSet creation.

tests/sbr-operator/tests/remediation.go[114-125]
tests/sbr-operator/tests/sbr.go[654-675]
tests/sbr-operator/tests/sbr.go[747-851]
PR-#10

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The negative test in `tests/sbr-operator/tests/sbr.go` treats *any* DaemonSet not in a baseline snapshot as a failure during `Consistently(...)`. New suites in this PR create legitimate agent DaemonSets (`sbr-agent-<sbrcName>`), which can invalidate that assumption under concurrent spec execution.

## Issue Context
The intent of the negative test is to ensure the invalid SBRC does not create *its own* DaemonSet, not to ensure the entire namespace has zero DaemonSet churn.

## Fix Focus Areas
- tests/sbr-operator/tests/sbr.go[747-851]
- tests/sbr-operator/tests/sbr.go[654-675]

## Suggested fix
Change the Layer-2 Consistently assertion to check specifically for the absence of the DaemonSet that would correspond to the invalid SBRC (e.g. `dsName := sbrparams.SBRAgentDaemonSetPrefix + sbrparams.SBRCControllerTestName` and assert `Get(...)` returns NotFound for the full observation window). This preserves the test’s intent while removing cross-suite interference.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

Comment thread tests/sbr-operator/tests/remediation.go Outdated
maximunited added a commit to maximunited/medik8s-system-tests that referenced this pull request Jun 21, 2026
…review)

Wait for stale lifecycle SBRCs to be fully deleted (finalizer drain)
before creating new ones, preventing AlreadyExists failures on reruns.
@maximunited

Copy link
Copy Markdown
Author

Fixed: added an Eventually wait after the pre-cleanup delete loop that polls until each SBRC object is gone (i.e. finalizer has fully drained). This prevents AlreadyExists failures when the test is rerun and a prior SBRC is still in Terminating state at BeforeAll time.

@maximunited

Copy link
Copy Markdown
Author

/test 4.22-konflux-e2e-sbr-aws-odf

@qodo-2-for-medik8s

qodo-2-for-medik8s Bot commented Jun 21, 2026

Copy link
Copy Markdown

PR-Agent: could not find a component named 4.22-konflux-e2e-sbr-aws-odf in a supported language in this PR.

maximunited added a commit to maximunited/medik8s-system-tests that referenced this pull request Jun 21, 2026
…nSet (OCP-88737)

Root cause of e2e failure: the SBRC controller creates the agent DaemonSet only
AFTER a storage initialization job completes. Without sharedStorageClass set, no
job is created and no DaemonSet ever appears, so waitForSBRCReady times out.

- Add discoverRWXStorageClass() to sbr.go: reads SBR_STORAGE_CLASS env var or
  auto-discovers a CephFS StorageClass (provisioner contains "cephfs"); Skips
  if none found
- Add SBRStorageClass var to sbrparams for env-var override
- remediation.go BeforeAll: call discoverRWXStorageClass() and pass the result
  as spec.sharedStorageClass when creating the test SBRC
- Also add pre-cleanup (delete stale SBRC + wait IsNotFound) before Create to
  prevent AlreadyExists failures on reruns (Qodo finding from PR medik8s#28)
@maximunited

Copy link
Copy Markdown
Author

/test 4.22-konflux-e2e-sbr-aws-odf

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/sbr-operator/tests/remediation.go`:
- Around line 135-142: The APIClient.Get call error handling in the polling loop
is incomplete. Currently, only NotFound errors are explicitly handled to return
nil (indicating successful deletion), but any other error (including actual API
failures) falls through to the "still present" error message, masking real
problems. Fix this by first checking if getErr is not nil, then within that
block check if it's a NotFound error (return nil if so), and otherwise return
the actual getErr itself so real errors are properly propagated instead of being
hidden behind the "stale SBRC still present" message.

In `@tests/sbr-operator/tests/sbr_helpers.go`:
- Around line 29-34: The function currently returns an empty string when the
StorageClasses().List() API call fails, which masks real environment or
permission issues downstream. Instead of returning "" on the error from
APIClient.StorageV1Interface.StorageClasses().List(), fail the test immediately
using Ginkgo's Fail function with the error message to ensure that API/client
errors are properly detected and not silently skipped.

In `@tests/sbr-operator/tests/sbrc_lifecycle.go`:
- Around line 155-169: The test in the Eventually block checks only that
NumberReady is greater than zero, which does not prove the patch caused a
rollout since pods may have already been ready. To properly verify a rollout
occurred, capture the DaemonSet's generation or template hash before applying
the patch, then in the Eventually check, assert that this marker has changed
after the patch and verify that the updated pods with the new configuration are
converging and becoming ready. This ensures the patch actually triggered a
rollout rather than just confirming existing pod readiness.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ffc741ce-f70a-4ec9-bb7b-5d18629b54f6

📥 Commits

Reviewing files that changed from the base of the PR and between c4219c1 and 02a3679.

📒 Files selected for processing (6)
  • tests/sbr-operator/internal/sbrparams/const.go
  • tests/sbr-operator/tests/remediation.go
  • tests/sbr-operator/tests/sbr.go
  • tests/sbr-operator/tests/sbr_helpers.go
  • tests/sbr-operator/tests/sbrc_lifecycle.go
  • tests/sbr-operator/tests/watchdog.go
💤 Files with no reviewable changes (1)
  • tests/sbr-operator/tests/watchdog.go

Comment thread tests/sbr-operator/tests/remediation.go Outdated
Comment thread tests/sbr-operator/tests/sbr_helpers.go Outdated
Comment thread tests/sbr-operator/tests/sbrc_lifecycle.go Outdated
maximunited added a commit to maximunited/medik8s-system-tests that referenced this pull request Jun 21, 2026
…tion check (CodeRabbit PR medik8s#28)

- remediation.go: propagate non-NotFound errors in stale SBRC Eventually
  poll instead of treating them as "still present"
- sbr_helpers.go: replace GinkgoWriter.Printf+return with
  Expect(err).ToNot(HaveOccurred()) so API errors fail the test immediately
- sbrc_lifecycle.go: capture DaemonSet ObservedGeneration before the SBRC
  patch; assert it increases in the rollout check to prove a real rollout
  occurred, not just that pods happen to be ready
@maximunited

Copy link
Copy Markdown
Author

/test 4.22-konflux-e2e-sbr-aws-odf

@qodo-2-for-medik8s

qodo-2-for-medik8s Bot commented Jun 21, 2026

Copy link
Copy Markdown

PR-Agent: could not find a component named 4.22-konflux-e2e-sbr-aws-odf in a supported language in this PR.

@maximunited maximunited force-pushed the feat/sbr-ocp88734-sbrc-lifecycle branch from f30b638 to 5e7e255 Compare June 21, 2026 09:29
@maximunited

Copy link
Copy Markdown
Author

/test 4.22-konflux-e2e-sbr-aws-odf

@qodo-2-for-medik8s

qodo-2-for-medik8s Bot commented Jun 21, 2026

Copy link
Copy Markdown

PR-Agent: could not find a component named 4.22-konflux-e2e-sbr-aws-odf in a supported language in this PR.

@maximunited maximunited force-pushed the feat/sbr-ocp88734-sbrc-lifecycle branch from 5e7e255 to 6f91549 Compare June 21, 2026 11:24
@maximunited

Copy link
Copy Markdown
Author

/test 4.22-konflux-e2e-sbr-aws-odf

@qodo-2-for-medik8s

qodo-2-for-medik8s Bot commented Jun 21, 2026

Copy link
Copy Markdown

PR-Agent: could not find a component named 4.22-konflux-e2e-sbr-aws-odf in a supported language in this PR.

@maximunited maximunited force-pushed the feat/sbr-ocp88734-sbrc-lifecycle branch from 6f91549 to 5e81701 Compare June 21, 2026 11:27
@maximunited

Copy link
Copy Markdown
Author

/test 4.22-konflux-e2e-sbr-aws-odf

@qodo-2-for-medik8s

qodo-2-for-medik8s Bot commented Jun 21, 2026

Copy link
Copy Markdown

PR-Agent: could not find a component named 4.22-konflux-e2e-sbr-aws-odf in a supported language in this PR.

@maximunited maximunited force-pushed the feat/sbr-ocp88734-sbrc-lifecycle branch from 5e81701 to 1a38169 Compare June 21, 2026 14:00
@maximunited

Copy link
Copy Markdown
Author

/test 4.22-konflux-e2e-sbr-aws-odf

@qodo-2-for-medik8s

qodo-2-for-medik8s Bot commented Jun 21, 2026

Copy link
Copy Markdown

PR-Agent: could not find a component named 4.22-konflux-e2e-sbr-aws-odf in a supported language in this PR.

@maximunited maximunited force-pushed the feat/sbr-ocp88734-sbrc-lifecycle branch from 1a38169 to 22901e5 Compare June 21, 2026 15:54
@maximunited

Copy link
Copy Markdown
Author

/test 4.22-konflux-e2e-sbr-aws-odf

@qodo-2-for-medik8s

qodo-2-for-medik8s Bot commented Jun 21, 2026

Copy link
Copy Markdown

PR-Agent: could not find a component named 4.22-konflux-e2e-sbr-aws-odf in a supported language in this PR.

@maximunited maximunited force-pushed the feat/sbr-ocp88734-sbrc-lifecycle branch from 22901e5 to 35cc9bf Compare June 21, 2026 21:00
@maximunited

Copy link
Copy Markdown
Author

/test 4.22-konflux-e2e-sbr-aws-odf

@qodo-2-for-medik8s

qodo-2-for-medik8s Bot commented Jun 21, 2026

Copy link
Copy Markdown

PR-Agent: could not find a component named 4.22-konflux-e2e-sbr-aws-odf in a supported language in this PR.

Comment thread tests/sbr-operator/tests/remediation.go Outdated
Comment thread tests/sbr-operator/tests/remediation.go Outdated
Comment thread tests/sbr-operator/tests/sbr_helpers.go Outdated
Comment thread tests/sbr-operator/tests/remediation.go Outdated
Comment thread tests/sbr-operator/tests/watchdog.go Outdated
Comment thread tests/sbr-operator/tests/remediation.go Outdated
Comment thread tests/sbr-operator/tests/sbrc_lifecycle.go Outdated
Comment thread tests/sbr-operator/tests/remediation.go Outdated
Comment thread tests/sbr-operator/tests/sbrc_lifecycle.go Outdated
@maximunited maximunited force-pushed the feat/sbr-ocp88734-sbrc-lifecycle branch from 35cc9bf to a04ee27 Compare June 22, 2026 16:25
@maximunited

Copy link
Copy Markdown
Author

/test 4.22-konflux-e2e-sbr-aws-odf

@qodo-2-for-medik8s

qodo-2-for-medik8s Bot commented Jun 22, 2026

Copy link
Copy Markdown

PR-Agent: could not find a component named 4.22-konflux-e2e-sbr-aws-odf in a supported language in this PR.

@maximunited maximunited requested a review from ugreener June 23, 2026 05:44
return fmt.Errorf("DaemonSet %s not found: %w", dsNameA, getErr)
}

if agentDS.Status.NumberReady == 0 {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check only verifies NumberReady > 0, which Step 1 already established via waitForSBRCReady. The Eventually succeeds on the first poll using pre-patch readiness without proving the patch triggered a rollout.

Fix: Capture the DaemonSet Generation before the patch, then assert rollout completion:

prePatchDS, err := APIClient.DaemonSets(medik8sparams.OperatorNs).Get(
    context.TODO(), dsNameA, metav1.GetOptions{})
Expect(err).ToNot(HaveOccurred())
prePatchGen := prePatchDS.Generation

// ... apply patch ...

Eventually(func() error {
    agentDS, getErr := APIClient.DaemonSets(medik8sparams.OperatorNs).Get(
        context.TODO(), dsNameA, metav1.GetOptions{})
    if getErr != nil {
        return fmt.Errorf("DaemonSet %s not found: %w", dsNameA, getErr)
    }
    if agentDS.Generation <= prePatchGen {
        return fmt.Errorf("DaemonSet %s generation not advanced (current: %d, pre-patch: %d)",
            dsNameA, agentDS.Generation, prePatchGen)
    }
    if agentDS.Status.ObservedGeneration < agentDS.Generation {
        return fmt.Errorf("DaemonSet %s not reconciled (observed: %d, current: %d)",
            dsNameA, agentDS.Status.ObservedGeneration, agentDS.Generation)
    }
    if agentDS.Status.NumberReady == 0 {
        return fmt.Errorf("DaemonSet %s: 0/%d ready after rollout",
            dsNameA, agentDS.Status.DesiredNumberScheduled)
    }
    return nil
}, sbrparams.SBRCLifecyclePatchedTimeout, sbrparams.DefaultPollInterval).Should(Succeed())

Tests SBRC create→DaemonSet ready, sbrTimeoutSeconds patch→rollout,
multi-SBRC isolation, delete→DaemonSet GC. Fixes ugreener review:
AfterAll DaemonSet GC wait, operator readiness check, cephfs-only
discovery, node discovery before SBRC creation, dead empty-string check,
MergePatch uncordon, discoverRWXStorageClass in BeforeAll, naming
consistency, consistent timeout constants.

Polarion: OCP-88734
@maximunited maximunited force-pushed the feat/sbr-ocp88734-sbrc-lifecycle branch from a04ee27 to 7805e4f Compare June 23, 2026 10:42
@maximunited maximunited force-pushed the feat/sbr-ocp88734-sbrc-lifecycle branch from 64fcf90 to 583268f Compare June 23, 2026 10:59
@openshift-ci

openshift-ci Bot commented Jun 23, 2026

Copy link
Copy Markdown

@maximunited: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/lint 583268f link true /test lint

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants