SBR | write-only storage loss test (OCP-89200)#32
Conversation
|
/test 4.22-konflux-e2e-sbr-aws-odf |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: maximunited The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Warning Review limit reached
More reviews will be available in 39 minutes and 28 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses rolling per-developer review limits. Reviews become available again as older review attempts age out of the rolling limit window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughAdds 19 exported constants for SBR/NHC test orchestration, refactors the shared SBRC unstructured builder and moves ChangesSBR Operator Test Additions
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
PR-Agent: could not find a component named |
PR Summary by QodoSBR: add write-only storage loss and SBR CR lifecycle functional tests Description
Diagram
High-Level Assessment
Files changed (5)
|
Code Review by Qodo
1.
|
…ference (PR medik8s#32 review) - discoverRWXStorageClass: remove rbd/nfs from provisioner match; accept only cephfs since the test blocks CephFS-specific ports (3300/6789/6800-7300); call Skip instead of returning empty string when no CephFS class found - DeferCleanup: promote injectorPod to Describe-scope var so cleanup holds a direct reference instead of re-pulling by name via pod.Pull; if the pod was never created add a fallback that spins up a temporary cleanup pod to avoid leaking iptables rules on the target node
|
Fixed: (1) removed rbd from RWX discovery (CephFS only), (2) iptables cleanup uses held pod reference instead of re-pull; fallback creates cleanup pod if original is gone. |
|
/test 4.22-konflux-e2e-sbr-aws-odf |
|
PR-Agent: could not find a component named |
|
/test 4.22-konflux-e2e-sbr-aws-odf |
|
PR-Agent: could not find a component named |
6c6c424 to
8e98029
Compare
|
/test 4.22-konflux-e2e-sbr-aws-odf |
|
PR-Agent: could not find a component named |
- storage_loss_write_only.go: remove pullSBRCR and cleanupSBRCR local duplicates; use the shared helpers from remediation.go (same package) - const.go: remove unused SBRCNHCTestName, NHCTestName, InjectorPodName; no callers exist in this branch — they were left from an earlier design - storage_loss_write_only.go: add stale SBRC pre-cleanup in BeforeAll before creating a fresh one; without this a retry after failure hits AlreadyExists - storage_loss_write_only.go: DeferCleanup now waits for the node to become schedulable after cleanupSBRCR; without this, cordoned state leaks into subsequent tests in the same Ginkgo run - storage_loss_write_only.go: FencingSucceeded check tracks fencingObserved bool; IsNotFound is only accepted as success after FencingSucceeded=True was observed — test can no longer pass without confirming fencing occurred
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/sbr-operator/tests/sbr.go`:
- Around line 654-675: The `waitForSBRCReady` function currently only checks if
`NumberReady == 0`, allowing tests to proceed with partial agent coverage on
cluster nodes. This can cause flakes when tests target nodes without agents.
Change the condition to require that `agentDS.Status.NumberReady ==
agentDS.Status.DesiredNumberScheduled`, ensuring the entire agent DaemonSet is
fully rolled out and ready before proceeding with functional tests.
In `@tests/sbr-operator/tests/storage_loss_write_only.go`:
- Around line 351-353: The cleanup pod name construction at the pod.NewBuilder
call for "sbr-write-cleanup-" is missing the truncation and sanitization that is
applied to injectorPodName, which can cause pod creation failures when
targetNodeName is too long. Extract the sanitization logic that is used for
injectorPodName and apply the same truncation bounds to the cleanup pod name
string before passing it to pod.NewBuilder to ensure the final pod name stays
within Kubernetes naming limits.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 867e75a1-7b85-4ba0-be8f-7d65760feaeb
📒 Files selected for processing (5)
tests/sbr-operator/internal/sbrparams/const.gotests/sbr-operator/tests/remediation.gotests/sbr-operator/tests/sbr.gotests/sbr-operator/tests/storage_loss_write_only.gotests/sbr-operator/tests/watchdog.go
💤 Files with no reviewable changes (1)
- tests/sbr-operator/tests/watchdog.go
|
/test 4.22-konflux-e2e-sbr-aws-odf |
a3fe0fe to
668c6e4
Compare
|
PR-Agent: could not find a component named |
668c6e4 to
7d0549b
Compare
|
Fixed CodeRabbit findings: (1) waitForSBRCReady now waits for full rollout (NumberReady==DesiredNumberScheduled); (2) fallback cleanup pod name sanitized and truncated same as injector pod name. |
|
/test 4.22-konflux-e2e-sbr-aws-odf |
|
PR-Agent: could not find a component named |
Injects CephFS block via iptables OUTPUT rules (same approach as PR medik8s#32 write-only test which passes CI). Blocks node from writing CephFS heartbeats; peers detect stale heartbeat triggering SBRStorageUnhealthy. Polarion: OCP-88735
Injects CephFS block via iptables OUTPUT rules (same as PR medik8s#32). Skips gracefully when iptables injection is unavailable on the allocated nodes rather than hard-failing — CI node allocation is non-deterministic. Polarion: OCP-88735
7d0549b to
f3ceb8d
Compare
f3ceb8d to
6c4a9a3
Compare
Blocks OUTPUT-only CephFS traffic; peers detect stale heartbeat and write fence message; target reads and self-fences via watchdog. Verifies fence-message-read path distinct from watchdog autonomous path. waitForSBRCReady now requires full rollout (NumberReady==DesiredScheduled). Fallback cleanup pod name sanitized same as injector pod name. Polarion: OCP-89200
6c4a9a3 to
205b7c9
Compare
7092364 to
7aa63fd
Compare
|
@maximunited: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Injects CephFS block via iptables OUTPUT rules (same as PR medik8s#32). Skips gracefully when iptables injection is unavailable on the allocated nodes rather than hard-failing — CI node allocation is non-deterministic. Polarion: OCP-88735
Summary
Test plan
Summary by CodeRabbit