SBR | write-only storage loss test (OCP-89200) #32

Open
maximunited wants to merge 2 commits into medik8s:main from maximunited:feat/sbr-ocp89200-write-only-storage-loss

maximunited commented Jun 21, 2026

Summary

  • Adds a very-high-priority test for write-only storage loss (OCP-89200)
  • Blocks only OUTPUT CephFS traffic (TCP ports 3300, 6789, 6800–7300); INPUT stays open
  • The target node reads the fence message written by its peers and self-fences via the fence-message-read path
  • Distinct from OCP-88880, which blocks both directions and relies on the watchdog-autonomous path
  • Adds NHC integration constants to sbrparams (NHCAPIGroup, SBRStorageUnhealthyCondition, NodeRebootTimeout, etc.)
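
As a rough illustration of the OUTPUT-only block described above, the sketch below builds the per-port iptables commands in the same shape as the cleanup commands quoted later in this review (`nsenter --target 1 --net iptables ...`). The names `cephPortSpecs` and `outputRejectRules` are hypothetical and not taken from the PR; the sketch only assembles argument lists and does not execute anything.

```go
package main

import (
	"fmt"
	"strings"
)

// cephPortSpecs are the CephFS destinations listed in the summary above.
// A spec containing ':' is a range and needs the multiport match.
// (Hypothetical name, for illustration only.)
var cephPortSpecs = []string{"3300", "6789", "6800:7300"}

// outputRejectRules builds one host-network-namespace iptables command per
// port spec. action is "-I" to insert the rule or "-D" to delete it again.
func outputRejectRules(action string) [][]string {
	rules := make([][]string, 0, len(cephPortSpecs))
	for _, spec := range cephPortSpecs {
		cmd := []string{"nsenter", "--target", "1", "--net",
			"iptables", action, "OUTPUT", "-p", "tcp"}
		if strings.Contains(spec, ":") {
			cmd = append(cmd, "--match", "multiport", "--dports", spec)
		} else {
			cmd = append(cmd, "--dport", spec)
		}
		rules = append(rules, append(cmd, "-j", "REJECT"))
	}
	return rules
}

func main() {
	// Print the insert commands; only the OUTPUT chain is touched, so
	// inbound traffic (and therefore reads) is left alone.
	for _, cmd := range outputRejectRules("-I") {
		fmt.Println(strings.Join(cmd, " "))
	}
}
```

Generating the insert and delete forms from one list keeps the two in step, so cleanup removes exactly the rules that were added.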

Test plan

  • make vet passes
  • golangci-lint passes
  • /test 4.22-konflux-e2e-sbr-aws-odf

Summary by CodeRabbit

  • Tests
    • Added comprehensive test coverage for StorageBasedRemediation CR lifecycle, including admission, finalizer handling, and cleanup verification.
    • Added functional test for write-only CephFS storage loss scenarios with self-fencing and node recovery verification.
    • Enhanced test infrastructure for node scheduling detection and storage configuration readiness validation.

@maximunited (Author)

/test 4.22-konflux-e2e-sbr-aws-odf

openshift-ci Bot requested review from beekhof and lyfofvipin on June 21, 2026 00:33

openshift-ci Bot commented Jun 21, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: maximunited

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai Bot commented Jun 21, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 48e89ba9-875c-4955-a398-1355664920e2

📥 Commits

Reviewing files that changed from the base of the PR and between a3fe0fe and 7aa63fd.

📒 Files selected for processing (5)
  • tests/sbr-operator/internal/sbrparams/const.go
  • tests/sbr-operator/tests/remediation.go
  • tests/sbr-operator/tests/sbr.go
  • tests/sbr-operator/tests/storage_loss_write_only.go
  • tests/sbr-operator/tests/watchdog.go
📝 Walkthrough

Adds 19 exported constants for SBR/NHC test orchestration, refactors the shared SBRC unstructured builder and moves isNodeSchedulable to a shared location, and introduces two new Ginkgo test suites: a StorageBasedRemediation CR lifecycle test (admission, finalizer, deletion) and a write-only CephFS storage loss fencing test using iptables injection and NodeHealthCheck.

Changes

SBR Operator Test Additions

| Layer / File(s) | Summary |
| --- | --- |
| Test constants and shared node/SBRC helpers: tests/sbr-operator/internal/sbrparams/const.go, tests/sbr-operator/tests/sbr.go, tests/sbr-operator/tests/watchdog.go | Adds 19 exported constants (finalizer name, NHC CRD identifiers, timeouts, poll intervals, test resource names). Refactors buildSBRC into a shared buildSBRUnstructured builder, promotes isNodeSchedulable from watchdog.go to sbr.go, and adds waitForSBRCReady polling against the agent DaemonSet. |
| StorageBasedRemediation CR lifecycle test: tests/sbr-operator/tests/remediation.go | Adds helpers to build, fetch, and force-cleanup StorageBasedRemediation CRs and to identify controller pod nodes. Implements the Ginkgo suite that creates an SBRC, waits for readiness, selects a suitable target node, then validates CR admission, finalizer addition (SBRRemediationFinalizer), finalizer release, and full CR deletion. |
| Write-only CephFS storage loss fencing test: tests/sbr-operator/tests/storage_loss_write_only.go | Adds helpers for CephFS RWX StorageClass discovery, boot ID reads, NHC CR construction, and condition extraction. Implements the Ginkgo suite that sets up SBRC and NHC, injects OUTPUT-only iptables REJECT rules via a privileged pod, then waits for SBRStorageUnhealthy=True, NHC-triggered SBR CR creation, node NotReady→Ready transition with a changed boot ID, FencingSucceeded=True, and controller-side SBR CR deletion. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • medik8s/system-tests#9: Modifies the same sbrparams/const.go and tests/sbr-operator/tests/sbr.go files that this PR extends with new constants and helper refactoring.
  • medik8s/system-tests#10: Works on the same SBR test suite files (sbrparams/const.go, sbr.go) including buildSBRC and schedulable-node handling that this PR directly refactors.
  • medik8s/system-tests#19: Intersects with watchdog.go and shared SBR helpers in sbr.go, the same files modified by removal of the local isNodeSchedulable in this PR.

Suggested labels

lgtm, ok-to-test

Suggested reviewers

  • beekhof
  • lyfofvipin
  • ugreener
  • mshitrit
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'SBR \| write-only storage loss test (OCP-89200)' directly and clearly describes the primary change: adding a new write-only storage loss test for the SBR operator, with the specific OCP ticket reference. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

qodo-2-for-medik8s Bot commented Jun 21, 2026

PR-Agent: could not find a component named 4.22-konflux-e2e-sbr-aws-odf in a supported language in this PR.

@qodo-2-for-medik8s

PR Summary by Qodo

SBR: add write-only storage loss and SBR CR lifecycle functional tests
🧪 Tests ✨ Enhancement 🕐 40+ Minutes

Description

• Add a high-priority functional test for write-only CephFS storage loss triggering
 fence-message-read self-fencing.
• Add a Tier-2 functional test validating StorageBasedRemediation CR finalizer lifecycle and
 cleanup.
• Introduce shared helpers/constants for SBRC readiness, node selection, and NHC integration
 parameters.
Diagram

graph TD
T["Ginkgo functional tests"] --> A["K8s API"] --> N["NHC"] --> O["SBR agents"] --> W["Target worker"]
T --> I["Injector pod"] --> W --> S[("Shared storage")]
W -. "OUTPUT blocked" .-> S
S --> W
High-Level Assessment

The following are alternative approaches to this PR:

1. Use tc/netem for fault injection instead of iptables REJECT
  • ➕ Can model latency/packet loss in addition to hard rejects
  • ➕ May better emulate partial storage-path degradation
  • ➖ More complex to implement/clean up reliably across OS variants
  • ➖ Harder to target specific Ceph ports cleanly without iptables anyway
2. Centralize SBR CR pull/cleanup helpers across tests
  • ➕ Avoids duplication (multiple pull/cleanup implementations)
  • ➕ Reduces risk of behavioral drift in cleanup/finalizer handling
  • ➖ Requires additional refactor/coordination across test files
  • ➖ May slightly reduce per-test readability if over-abstracted

Recommendation: The PR’s approach (privileged pod + nsenter + targeted OUTPUT REJECT rules) is appropriate for reliably simulating write-only CephFS loss on the node network namespace while preserving inbound reads for the fence-message-read path. Consider a follow-up to consolidate duplicated SBR CR pull/cleanup helpers to keep destructive-test cleanup semantics consistent.

Files changed (5) +1004 / -20

Enhancement (1) +71 / -0
const.go: Add SBRC/NHC and fencing test constants +71/-0

• Adds constants for SBRC names, agent DaemonSet naming, readiness timeouts, NHC API identifiers, node condition names, reboot timeouts, and injector pod prefixes. Introduces the SBR remediation finalizer constant used by lifecycle assertions.

tests/sbr-operator/internal/sbrparams/const.go

Refactor (2) +48 / -20
sbr.go: Refactor shared builders and add SBRC readiness wait +48/-2

• Moves isNodeSchedulable into a shared location and introduces a generic unstructured builder to support both SBRC and SBR CR creation. Adds waitForSBRCReady to block until the agent DaemonSet has a ready pod before running tests that depend on active agents.

tests/sbr-operator/tests/sbr.go

watchdog.go: Remove local isNodeSchedulable helper (now shared) +0/-18

• Deletes the watchdog-local implementation of isNodeSchedulable after moving it to sbr.go for reuse across tests.

tests/sbr-operator/tests/watchdog.go

Tests (2) +885 / -0
remediation.go: Add SBR CR lifecycle functional test (finalizer + cleanup) +289/-0

• Introduces a functional test that creates an SBRC to ensure agents run, creates a StorageBasedRemediation CR targeting a safe worker node, asserts the controller-added finalizer, and verifies deletion fully completes. Includes robust DeferCleanup to force-remove finalizers and ensure the node is schedulable after cleanup.

tests/sbr-operator/tests/remediation.go

storage_loss_write_only.go: Add write-only storage loss test via fence-message-read path (OCP-89200) +596/-0

• Adds a destructive functional test that creates an SBRC with a discovered RWX storage class, ensures NHC is installed, configures an NHC CR, and injects OUTPUT-only CephFS port blocks on the target node. Verifies SBRStorageUnhealthy=True, NHC-created StorageBasedRemediation CR, node reboot (BootID change), FencingSucceeded=True, and post-fencing controller cleanup.

tests/sbr-operator/tests/storage_loss_write_only.go

qodo-2-for-medik8s Bot commented Jun 21, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX issues (0) 🔗 Cross-repo conflicts (0) 📜 Skill insights (0)


Action required

1. iptables cleanup can leak ✓ Resolved 🐞 Bug ☼ Reliability
Description
The test injects host-level iptables OUTPUT REJECT rules but DeferCleanup only removes them when it
can pull the injector pod; if the pod cannot be pulled while the node remains up, the rules can
persist and impact later tests/cluster behavior.
Code

tests/sbr-operator/tests/storage_loss_write_only.go[R366-394]

+				DeferCleanup(func() {
+					By("DeferCleanup: removing OUTPUT REJECT rules on target node")
+
+					cleanupPod, pullErr := pod.Pull(APIClient, injectorPodName, medik8sparams.OperatorNs)
+					if pullErr == nil {
+						// Delete rules individually to avoid flushing the whole OUTPUT chain
+						// which could affect other concurrent tests.
+						outputRulesCleanup := [][]string{
+							{"nsenter", "--target", "1", "--net",
+								"iptables", "-D", "OUTPUT", "-p", "tcp", "--dport", "3300", "-j", "REJECT"},
+							{"nsenter", "--target", "1", "--net",
+								"iptables", "-D", "OUTPUT", "-p", "tcp", "--dport", "6789", "-j", "REJECT"},
+							{"nsenter", "--target", "1", "--net",
+								"iptables", "-D", "OUTPUT", "-p", "tcp", "--match", "multiport",
+								"--dports", "6800:7300", "-j", "REJECT"},
+						}
+
+						for _, cmd := range outputRulesCleanup {
+							if _, flushErr := cleanupPod.ExecCommand(cmd); flushErr != nil {
+								GinkgoWriter.Printf("Warning: iptables cleanup on node %s (cmd %v): %v\n",
+									targetNodeName, cmd, flushErr)
+							}
+						}
+
+						if _, delErr := cleanupPod.Delete(); delErr != nil {
+							GinkgoWriter.Printf("Warning: delete injector pod: %v\n", delErr)
+						}
+					}
+
Relevance

⭐⭐⭐ High

Team previously accepted hardening cleanups to prevent leaked test side-effects (conditional
DeferCleanup to avoid resource leaks).

PR-#10

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
Cleanup is guarded by if pullErr == nil, so failures to retrieve the pod skip iptables removal,
while the test always injects OUTPUT rules via nsenter/iptables earlier in the same test.

tests/sbr-operator/tests/storage_loss_write_only.go[366-393]
tests/sbr-operator/tests/storage_loss_write_only.go[429-437]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
DeferCleanup removes the injected iptables rules only if `pod.Pull(...)` succeeds. If the injector pod is missing/unreachable but the node is still running with the OUTPUT REJECT rules applied, cleanup is skipped and the node can remain network-partitioned for subsequent tests.

### Issue Context
Rules are inserted into the host network namespace using `nsenter ... iptables -I OUTPUT ... -j REJECT`.

### Fix Focus Areas
- tests/sbr-operator/tests/storage_loss_write_only.go[366-398]
- tests/sbr-operator/tests/storage_loss_write_only.go[429-443]

### What to change
- Prefer using the already-created `injectorPod` handle for cleanup (when available) instead of re-pulling.
- If the injector pod cannot be pulled (or is nil), create a new privileged cleanup pod on the target node and remove the rules.
- Optionally, delete rules in a loop until `iptables -D` fails (to handle duplicate insertions across retries/reruns).
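
The last suggestion above (delete until `iptables -D` fails) could be sketched roughly as follows. This is a minimal illustration, not the PR's code: `execFunc` is a hypothetical stand-in for whatever runs a command inside the privileged pod, `errNoRule` imitates the failure iptables reports once no matching rule is left, and the `maxAttempts` cap is an added assumption to avoid spinning forever if the command never fails.

```go
package main

import (
	"errors"
	"fmt"
)

// execFunc is a hypothetical stand-in for running a command in the
// privileged injector or cleanup pod.
type execFunc func(cmd []string) error

// errNoRule imitates the error returned once no matching rule remains.
var errNoRule = errors.New("no matching rule in chain")

// deleteAllCopies re-runs a single "iptables -D ..." command until it fails,
// which removes every duplicate of the rule left behind by retries or reruns.
// It returns how many copies were removed and stops after maxAttempts.
func deleteAllCopies(run execFunc, cmd []string, maxAttempts int) int {
	removed := 0
	for removed < maxAttempts {
		if err := run(cmd); err != nil {
			break // no more copies of this rule (or the pod is unreachable)
		}
		removed++
	}
	return removed
}

func main() {
	remaining := 2 // pretend the rule was inserted twice
	fake := func(cmd []string) error {
		if remaining == 0 {
			return errNoRule
		}
		remaining--
		return nil
	}
	cmd := []string{"nsenter", "--target", "1", "--net",
		"iptables", "-D", "OUTPUT", "-p", "tcp", "--dport", "3300", "-j", "REJECT"}
	fmt.Println("copies removed:", deleteAllCopies(fake, cmd, 10))
}
```

A failed delete is ambiguous here: it can mean the rule is gone or that the command could not run at all, so a real cleanup would probably still log the final error.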


2. Non-RWX class selected ✓ Resolved 🐞 Bug ≡ Correctness
Description
discoverRWXStorageClass() is documented/used as RWX discovery but it also matches provisioners
containing "rbd", which can select a non-RWX storage class and break the sharedStorageClass-based
storage-loss test setup/behavior.
Code

tests/sbr-operator/tests/storage_loss_write_only.go[R28-50]

+// discoverRWXStorageClass returns the RWX storage class to use for storage loss tests.
+// SBR_STORAGE_CLASS env var overrides auto-discovery. Falls back to a CephFS or NFS provisioner.
+func discoverRWXStorageClass() string {
+	if sc := os.Getenv("SBR_STORAGE_CLASS"); sc != "" {
+		GinkgoWriter.Printf("Using SBR_STORAGE_CLASS=%q from environment\n", sc)
+
+		return sc
+	}
+
+	scList, err := APIClient.StorageClasses().List(context.TODO(), metav1.ListOptions{})
+	if err != nil || len(scList.Items) == 0 {
+		return ""
+	}
+
+	for _, storageClass := range scList.Items {
+		prov := strings.ToLower(storageClass.Provisioner)
+		if strings.Contains(prov, "cephfs") || strings.Contains(prov, "nfs") ||
+			strings.Contains(prov, "rbd") {
+			GinkgoWriter.Printf("Auto-discovered storage class %q (provisioner: %s)\n",
+				storageClass.Name, storageClass.Provisioner)
+
+			return storageClass.Name
+		}
Relevance

⭐⭐ Medium

No prior review evidence on RWX StorageClass auto-discovery; similar issues not found in history
searches.

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The helper is explicitly described and asserted as returning an RWX storage class, but the selection
logic also accepts "rbd" provisioners; the resulting value is then used as sharedStorageClass for
the test SBRC.

tests/sbr-operator/tests/storage_loss_write_only.go[28-50]
tests/sbr-operator/tests/storage_loss_write_only.go[224-235]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`discoverRWXStorageClass()` claims to return an RWX storage class (CephFS/NFS) but currently also matches provisioners containing `rbd`. This can select a storage class that does not meet the test's shared-storage expectations and cause the write-only storage loss test to fail or exercise the wrong path.

### Issue Context
The returned storage class is used as `StorageBasedRemediationConfig.spec.sharedStorageClass`, which is expected to point to shared/RWX storage.

### Fix Focus Areas
- tests/sbr-operator/tests/storage_loss_write_only.go[28-54]

### What to change
- Remove the `rbd` provisioner match from the RWX discovery logic, or tighten discovery to only known RWX backends (CephFS/NFS).
- Keep the log output consistent with the new selection criteria.
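
One way to read the suggested tightening is as an explicit allow-list of provisioner substrings. The sketch below assumes a trimmed-down `storageClass` struct in place of the real Kubernetes object; `pickSharedClass` and the class names `block` and `shared` are hypothetical, and the two provisioner strings are only examples of what an ODF cluster might report.

```go
package main

import (
	"fmt"
	"strings"
)

// storageClass is a hypothetical, trimmed-down stand-in for the Kubernetes
// StorageClass object (only the fields the discovery logic looks at).
type storageClass struct {
	Name        string
	Provisioner string
}

// pickSharedClass returns the first class whose provisioner contains one of
// the allowed substrings. The bool is false when nothing matches, so the
// caller can decide how to react instead of receiving an empty name.
func pickSharedClass(classes []storageClass, allowed []string) (string, bool) {
	for _, sc := range classes {
		prov := strings.ToLower(sc.Provisioner)
		for _, want := range allowed {
			if strings.Contains(prov, want) {
				return sc.Name, true
			}
		}
	}
	return "", false
}

func main() {
	classes := []storageClass{
		{Name: "block", Provisioner: "openshift-storage.rbd.csi.ceph.com"},
		{Name: "shared", Provisioner: "openshift-storage.cephfs.csi.ceph.com"},
	}
	// "rbd" is deliberately absent from the allow-list.
	name, ok := pickSharedClass(classes, []string{"cephfs", "nfs"})
	fmt.Println(name, ok) // prints: shared true
}
```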

Comment thread tests/sbr-operator/tests/storage_loss_write_only.go Outdated
Comment thread tests/sbr-operator/tests/storage_loss_write_only.go
maximunited added a commit to maximunited/medik8s-system-tests that referenced this pull request Jun 21, 2026
…ference (PR medik8s#32 review)

- discoverRWXStorageClass: remove rbd/nfs from provisioner match; accept
  only cephfs since the test blocks CephFS-specific ports (3300/6789/6800-7300);
  call Skip instead of returning empty string when no CephFS class found
- DeferCleanup: promote injectorPod to Describe-scope var so cleanup holds
  a direct reference instead of re-pulling by name via pod.Pull; if the pod
  was never created add a fallback that spins up a temporary cleanup pod to
  avoid leaking iptables rules on the target node
@maximunited (Author)

Fixed: (1) removed rbd from RWX discovery (CephFS only), (2) iptables cleanup uses held pod reference instead of re-pull; fallback creates cleanup pod if original is gone.

@maximunited (Author)

/test 4.22-konflux-e2e-sbr-aws-odf

qodo-2-for-medik8s Bot commented Jun 21, 2026

PR-Agent: could not find a component named 4.22-konflux-e2e-sbr-aws-odf in a supported language in this PR.

@maximunited (Author)

/test 4.22-konflux-e2e-sbr-aws-odf

qodo-2-for-medik8s Bot commented Jun 21, 2026

PR-Agent: could not find a component named 4.22-konflux-e2e-sbr-aws-odf in a supported language in this PR.

maximunited force-pushed the feat/sbr-ocp89200-write-only-storage-loss branch from 6c6c424 to 8e98029 on June 21, 2026 09:30

@maximunited (Author)

/test 4.22-konflux-e2e-sbr-aws-odf

qodo-2-for-medik8s Bot commented Jun 21, 2026

PR-Agent: could not find a component named 4.22-konflux-e2e-sbr-aws-odf in a supported language in this PR.

Comment thread tests/sbr-operator/tests/storage_loss_write_only.go Outdated
Comment thread tests/sbr-operator/internal/sbrparams/const.go Outdated
Comment thread tests/sbr-operator/tests/storage_loss_write_only.go
Comment thread tests/sbr-operator/tests/storage_loss_write_only.go Outdated
Comment thread tests/sbr-operator/tests/storage_loss_write_only.go
Comment thread tests/sbr-operator/tests/storage_loss_write_only.go
maximunited added a commit to maximunited/medik8s-system-tests that referenced this pull request Jun 21, 2026
- storage_loss_write_only.go: remove pullSBRCR and cleanupSBRCR local
  duplicates; use the shared helpers from remediation.go (same package)
- const.go: remove unused SBRCNHCTestName, NHCTestName, InjectorPodName;
  no callers exist in this branch — they were left from an earlier design
- storage_loss_write_only.go: add stale SBRC pre-cleanup in BeforeAll before
  creating a fresh one; without this a retry after failure hits AlreadyExists
- storage_loss_write_only.go: DeferCleanup now waits for the node to become
  schedulable after cleanupSBRCR; without this, cordoned state leaks into
  subsequent tests in the same Ginkgo run
- storage_loss_write_only.go: FencingSucceeded check tracks fencingObserved
  bool; IsNotFound is only accepted as success after FencingSucceeded=True
  was observed — test can no longer pass without confirming fencing occurred
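
The fencingObserved rule in the last item can be pictured as a small piece of state carried across polls. The sketch below is an assumption about the shape of that logic, not the PR's implementation: `fencingTracker` and `poll` are hypothetical names, `found == false` models an IsNotFound result when fetching the CR, and `fencingSucceeded` models the FencingSucceeded=True condition.

```go
package main

import "fmt"

// fencingTracker remembers whether FencingSucceeded=True was ever seen.
// (Hypothetical type, for illustration only.)
type fencingTracker struct {
	observed bool
}

// poll evaluates one polling attempt and reports whether it counts as a pass.
// A missing CR (found == false) only passes if fencing was observed earlier,
// so a CR that vanishes before fencing is confirmed can never pass.
func (t *fencingTracker) poll(found, fencingSucceeded bool) bool {
	if !found {
		return t.observed
	}
	if fencingSucceeded {
		t.observed = true
	}
	return t.observed
}

func main() {
	var early fencingTracker
	fmt.Println("CR gone before fencing seen:", early.poll(false, false))

	var normal fencingTracker
	normal.poll(true, false) // CR exists, condition not set yet
	normal.poll(true, true)  // FencingSucceeded=True observed
	fmt.Println("CR gone after fencing seen:", normal.poll(false, false))
}
```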

coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/sbr-operator/tests/sbr.go`:
- Around line 654-675: The `waitForSBRCReady` function currently only waits while
`NumberReady == 0`, so it returns as soon as a single agent pod is ready and lets
tests proceed with partial agent coverage on cluster nodes. This can cause flakes
when tests target nodes without agents. Change the condition to require that
`agentDS.Status.NumberReady == agentDS.Status.DesiredNumberScheduled`, ensuring
the entire agent DaemonSet is fully rolled out and ready before proceeding with
functional tests.

In `@tests/sbr-operator/tests/storage_loss_write_only.go`:
- Around line 351-353: The cleanup pod name construction at the pod.NewBuilder
call for "sbr-write-cleanup-" is missing the truncation and sanitization that is
applied to injectorPodName, which can cause pod creation failures when
targetNodeName is too long. Extract the sanitization logic that is used for
injectorPodName and apply the same truncation bounds to the cleanup pod name
string before passing it to pod.NewBuilder to ensure the final pod name stays
within Kubernetes naming limits.
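
A shared helper for both pod names might look like the sketch below. The actual sanitization applied to injectorPodName is not shown in this thread, so everything here is an assumption: `podNameForNode` is a hypothetical name, the rule "lower-case, replace anything outside `[a-z0-9-]` with `-`" is one plausible choice, and the 63-character limit in `main` is the DNS label length, picked as a conservative bound.

```go
package main

import (
	"fmt"
	"strings"
)

// podNameForNode derives a pod name from a prefix and a node name.
// It lower-cases the node name, replaces characters outside [a-z0-9-]
// with '-', truncates the result to maxLen bytes, and trims trailing
// dashes so the name does not end in '-'.
// The prefix is assumed to be a valid lower-case ASCII name fragment.
func podNameForNode(prefix, nodeName string, maxLen int) string {
	var b strings.Builder
	for _, r := range strings.ToLower(nodeName) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') || r == '-' {
			b.WriteRune(r)
		} else {
			b.WriteByte('-')
		}
	}
	name := prefix + b.String()
	if len(name) > maxLen {
		name = name[:maxLen] // safe: the string is pure ASCII at this point
	}
	return strings.TrimRight(name, "-")
}

func main() {
	const maxLen = 63 // assumption: stay within the DNS label length
	node := "ip-10-0-1-23.ec2.internal" // example node name
	fmt.Println(podNameForNode("sbr-write-cleanup-", node, maxLen))
}
```

Routing both the injector and the fallback cleanup pod through one helper is what keeps the two names from drifting apart again.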

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 867e75a1-7b85-4ba0-be8f-7d65760feaeb

📥 Commits

Reviewing files that changed from the base of the PR and between c4219c1 and a3fe0fe.

📒 Files selected for processing (5)
  • tests/sbr-operator/internal/sbrparams/const.go
  • tests/sbr-operator/tests/remediation.go
  • tests/sbr-operator/tests/sbr.go
  • tests/sbr-operator/tests/storage_loss_write_only.go
  • tests/sbr-operator/tests/watchdog.go
💤 Files with no reviewable changes (1)
  • tests/sbr-operator/tests/watchdog.go

Comment thread tests/sbr-operator/tests/sbr.go Outdated
Comment thread tests/sbr-operator/tests/storage_loss_write_only.go
@maximunited (Author)

/test 4.22-konflux-e2e-sbr-aws-odf

maximunited force-pushed the feat/sbr-ocp89200-write-only-storage-loss branch from a3fe0fe to 668c6e4 on June 21, 2026 15:35

qodo-2-for-medik8s Bot commented Jun 21, 2026

PR-Agent: could not find a component named 4.22-konflux-e2e-sbr-aws-odf in a supported language in this PR.

maximunited force-pushed the feat/sbr-ocp89200-write-only-storage-loss branch from 668c6e4 to 7d0549b on June 21, 2026 15:40

@maximunited (Author)

Fixed CodeRabbit findings: (1) waitForSBRCReady now waits for full rollout (NumberReady==DesiredNumberScheduled); (2) fallback cleanup pod name sanitized and truncated same as injector pod name.
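
For reference, the full-rollout condition in item (1) reduces to a comparison of two DaemonSet status counters. The sketch below uses a hypothetical `daemonSetStatus` struct that mirrors only the two relevant fields of the real DaemonSet status; the extra `DesiredNumberScheduled > 0` guard is an added assumption, not something the thread states, meant to avoid treating a freshly created DaemonSet (0 == 0) as ready.

```go
package main

import "fmt"

// daemonSetStatus is a hypothetical stand-in carrying the two DaemonSet
// status counters that the readiness check compares.
type daemonSetStatus struct {
	DesiredNumberScheduled int32
	NumberReady            int32
}

// fullyRolledOut reports whether every scheduled agent pod is ready.
// The > 0 guard is an assumption added for this sketch.
func fullyRolledOut(s daemonSetStatus) bool {
	return s.DesiredNumberScheduled > 0 &&
		s.NumberReady == s.DesiredNumberScheduled
}

func main() {
	partial := daemonSetStatus{DesiredNumberScheduled: 3, NumberReady: 1}
	full := daemonSetStatus{DesiredNumberScheduled: 3, NumberReady: 3}
	// The old check (at least one ready pod) would have accepted "partial".
	fmt.Println("partial rollout ready:", fullyRolledOut(partial))
	fmt.Println("full rollout ready:", fullyRolledOut(full))
}
```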

@maximunited (Author)

/test 4.22-konflux-e2e-sbr-aws-odf

qodo-2-for-medik8s Bot commented Jun 21, 2026

PR-Agent: could not find a component named 4.22-konflux-e2e-sbr-aws-odf in a supported language in this PR.

maximunited added a commit to maximunited/medik8s-system-tests that referenced this pull request Jun 22, 2026
Injects CephFS block via iptables OUTPUT rules (same approach as PR medik8s#32
write-only test which passes CI). Blocks node from writing CephFS
heartbeats; peers detect stale heartbeat triggering SBRStorageUnhealthy.

Polarion: OCP-88735
maximunited added a commit to maximunited/medik8s-system-tests that referenced this pull request Jun 22, 2026
Injects CephFS block via iptables OUTPUT rules (same as PR medik8s#32). Skips
gracefully when iptables injection is unavailable on the allocated nodes
rather than hard-failing — CI node allocation is non-deterministic.

Polarion: OCP-88735
maximunited requested a review from ugreener on June 22, 2026 16:38
maximunited force-pushed the feat/sbr-ocp89200-write-only-storage-loss branch from 7d0549b to f3ceb8d on June 23, 2026 10:41
maximunited force-pushed the feat/sbr-ocp89200-write-only-storage-loss branch from f3ceb8d to 6c4a9a3 on June 23, 2026 10:45
Blocks OUTPUT-only CephFS traffic; peers detect stale heartbeat and write
fence message; target reads and self-fences via watchdog. Verifies
fence-message-read path distinct from watchdog autonomous path.
waitForSBRCReady now requires full rollout (NumberReady==DesiredScheduled).
Fallback cleanup pod name sanitized same as injector pod name.

Polarion: OCP-89200
maximunited force-pushed the feat/sbr-ocp89200-write-only-storage-loss branch from 6c4a9a3 to 205b7c9 on June 23, 2026 10:48
maximunited force-pushed the feat/sbr-ocp89200-write-only-storage-loss branch from 7092364 to 7aa63fd on June 23, 2026 10:59
openshift-ci Bot commented Jun 23, 2026

@maximunited: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/lint | 7aa63fd | link | true | /test lint |

Full PR test history. Your PR dashboard.

maximunited added a commit to maximunited/medik8s-system-tests that referenced this pull request Jun 23, 2026
Injects CephFS block via iptables OUTPUT rules (same as PR medik8s#32). Skips
gracefully when iptables injection is unavailable on the allocated nodes
rather than hard-failing — CI node allocation is non-deterministic.

Polarion: OCP-88735