SBR | SBRC lifecycle test (OCP-88734)#28
Conversation
|
/test 4.22-konflux-e2e-sbr-aws-odf |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: maximunited The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Warning Review limit reached
More reviews will be available in 37 minutes and 23 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses rolling per-developer review limits. Reviews become available again as older review attempts age out of the rolling limit window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (6)
📝 WalkthroughWalkthroughAdds two new Ginkgo functional test suites for the SBR operator: one covering ChangesSBR Operator Functional Tests
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
PR-Agent: could not find a component named |
PR Summary by QodoAdd SBRC lifecycle functional test for DaemonSet rollout (OCP-88734) Description
Diagram
High-Level Assessment
Files changed (5)
|
Code Review by Qodo
1.
|
…review) Wait for stale lifecycle SBRCs to be fully deleted (finalizer drain) before creating new ones, preventing AlreadyExists failures on reruns.
|
Fixed: added an |
|
/test 4.22-konflux-e2e-sbr-aws-odf |
|
PR-Agent: could not find a component named |
…nSet (OCP-88737) Root cause of e2e failure: the SBRC controller creates the agent DaemonSet only AFTER a storage initialization job completes. Without sharedStorageClass set, no job is created and no DaemonSet ever appears, so waitForSBRCReady times out. - Add discoverRWXStorageClass() to sbr.go: reads SBR_STORAGE_CLASS env var or auto-discovers a CephFS StorageClass (provisioner contains "cephfs"); Skips if none found - Add SBRStorageClass var to sbrparams for env-var override - remediation.go BeforeAll: call discoverRWXStorageClass() and pass the result as spec.sharedStorageClass when creating the test SBRC - Also add pre-cleanup (delete stale SBRC + wait IsNotFound) before Create to prevent AlreadyExists failures on reruns (Qodo finding from PR medik8s#28)
|
/test 4.22-konflux-e2e-sbr-aws-odf |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/sbr-operator/tests/remediation.go`:
- Around line 135-142: The APIClient.Get call error handling in the polling loop
is incomplete. Currently, only NotFound errors are explicitly handled to return
nil (indicating successful deletion), but any other error (including actual API
failures) falls through to the "still present" error message, masking real
problems. Fix this by first checking if getErr is not nil, then within that
block check if it's a NotFound error (return nil if so), and otherwise return
the actual getErr itself so real errors are properly propagated instead of being
hidden behind the "stale SBRC still present" message.
In `@tests/sbr-operator/tests/sbr_helpers.go`:
- Around line 29-34: The function currently returns an empty string when the
StorageClasses().List() API call fails, which masks real environment or
permission issues downstream. Instead of returning "" on the error from
APIClient.StorageV1Interface.StorageClasses().List(), fail the test immediately
using Ginkgo's Fail function with the error message to ensure that API/client
errors are properly detected and not silently skipped.
In `@tests/sbr-operator/tests/sbrc_lifecycle.go`:
- Around line 155-169: The test in the Eventually block checks only that
NumberReady is greater than zero, which does not prove the patch caused a
rollout since pods may have already been ready. To properly verify a rollout
occurred, capture the DaemonSet's generation or template hash before applying
the patch, then in the Eventually check, assert that this marker has changed
after the patch and verify that the updated pods with the new configuration are
converging and becoming ready. This ensures the patch actually triggered a
rollout rather than just confirming existing pod readiness.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ffc741ce-f70a-4ec9-bb7b-5d18629b54f6
📒 Files selected for processing (6)
tests/sbr-operator/internal/sbrparams/const.gotests/sbr-operator/tests/remediation.gotests/sbr-operator/tests/sbr.gotests/sbr-operator/tests/sbr_helpers.gotests/sbr-operator/tests/sbrc_lifecycle.gotests/sbr-operator/tests/watchdog.go
💤 Files with no reviewable changes (1)
- tests/sbr-operator/tests/watchdog.go
…tion check (CodeRabbit PR medik8s#28) - remediation.go: propagate non-NotFound errors in stale SBRC Eventually poll instead of treating them as "still present" - sbr_helpers.go: replace GinkgoWriter.Printf+return with Expect(err).ToNot(HaveOccurred()) so API errors fail the test immediately - sbrc_lifecycle.go: capture DaemonSet ObservedGeneration before the SBRC patch; assert it increases in the rollout check to prove a real rollout occurred, not just that pods happen to be ready
|
/test 4.22-konflux-e2e-sbr-aws-odf |
|
PR-Agent: could not find a component named |
f30b638 to
5e7e255
Compare
|
/test 4.22-konflux-e2e-sbr-aws-odf |
|
PR-Agent: could not find a component named |
5e7e255 to
6f91549
Compare
|
/test 4.22-konflux-e2e-sbr-aws-odf |
|
PR-Agent: could not find a component named |
6f91549 to
5e81701
Compare
|
/test 4.22-konflux-e2e-sbr-aws-odf |
|
PR-Agent: could not find a component named |
5e81701 to
1a38169
Compare
|
/test 4.22-konflux-e2e-sbr-aws-odf |
|
PR-Agent: could not find a component named |
1a38169 to
22901e5
Compare
|
/test 4.22-konflux-e2e-sbr-aws-odf |
|
PR-Agent: could not find a component named |
22901e5 to
35cc9bf
Compare
|
/test 4.22-konflux-e2e-sbr-aws-odf |
|
PR-Agent: could not find a component named |
35cc9bf to
a04ee27
Compare
|
/test 4.22-konflux-e2e-sbr-aws-odf |
|
PR-Agent: could not find a component named |
| return fmt.Errorf("DaemonSet %s not found: %w", dsNameA, getErr) | ||
| } | ||
|
|
||
| if agentDS.Status.NumberReady == 0 { |
There was a problem hiding this comment.
This check only verifies NumberReady > 0, which Step 1 already established via waitForSBRCReady. The Eventually succeeds on the first poll using pre-patch readiness without proving the patch triggered a rollout.
Fix: Capture the DaemonSet Generation before the patch, then assert rollout completion:
prePatchDS, err := APIClient.DaemonSets(medik8sparams.OperatorNs).Get(
context.TODO(), dsNameA, metav1.GetOptions{})
Expect(err).ToNot(HaveOccurred())
prePatchGen := prePatchDS.Generation
// ... apply patch ...
Eventually(func() error {
agentDS, getErr := APIClient.DaemonSets(medik8sparams.OperatorNs).Get(
context.TODO(), dsNameA, metav1.GetOptions{})
if getErr != nil {
return fmt.Errorf("DaemonSet %s not found: %w", dsNameA, getErr)
}
if agentDS.Generation <= prePatchGen {
return fmt.Errorf("DaemonSet %s generation not advanced (current: %d, pre-patch: %d)",
dsNameA, agentDS.Generation, prePatchGen)
}
if agentDS.Status.ObservedGeneration < agentDS.Generation {
return fmt.Errorf("DaemonSet %s not reconciled (observed: %d, current: %d)",
dsNameA, agentDS.Status.ObservedGeneration, agentDS.Generation)
}
if agentDS.Status.NumberReady == 0 {
return fmt.Errorf("DaemonSet %s: 0/%d ready after rollout",
dsNameA, agentDS.Status.DesiredNumberScheduled)
}
return nil
}, sbrparams.SBRCLifecyclePatchedTimeout, sbrparams.DefaultPollInterval).Should(Succeed())Tests SBRC create→DaemonSet ready, sbrTimeoutSeconds patch→rollout, multi-SBRC isolation, delete→DaemonSet GC. Fixes ugreener review: AfterAll DaemonSet GC wait, operator readiness check, cephfs-only discovery, node discovery before SBRC creation, dead empty-string check, MergePatch uncordon, discoverRWXStorageClass in BeforeAll, naming consistency, consistent timeout constants. Polarion: OCP-88734
a04ee27 to
7805e4f
Compare
64fcf90 to
583268f
Compare
|
@maximunited: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
Test plan
Summary by CodeRabbit