Skip to content

Add FAR destructive test constants and cluster helpers#20

Open
ugreener wants to merge 1 commit into
medik8s:mainfrom
ugreener:feat/far-destructive-harness
Open

Add FAR destructive test constants and cluster helpers#20
ugreener wants to merge 1 commit into
medik8s:mainfrom
ugreener:feat/far-destructive-harness

Conversation

@ugreener

@ugreener ugreener commented Jun 16, 2026

Copy link
Copy Markdown

Summary

Add timeout constants, fence agent identifiers, AWS credential field names, and cluster-level helper functions to support FAR destructive E2E tests. This is the foundation layer that subsequent PRs (#23 and #22) build upon.

Changes

  • Extended tests/far-operator/internal/farparams/const.go with: timeout constants (NodeReadyTimeout, NodeNotReadyTimeout, NodeRebootTimeout, OcDebugTimeout, FARConditionTimeout), fence agent names (fence_aws, fence_ipmilan), node identifier parameters (--plug, --ipport), AWS credential Secret/field names, and the controller lease prefix
  • Added OperatorControllerPodLabels two-label map to tests/far-operator/internal/farparams/farvars.go for controller pod selection matching the control-plane: controller-manager + app.kubernetes.io/name labeling convention used across medik8s operators
  • New tests/far-operator/internal/farutils/harness.go with cluster helpers: platform detection via Infrastructure CR, AWS instance ID extraction from providerID, AWS credentials from CCO Secret, worker node selection with leader exclusion and deterministic ordering, leader controller node discovery via lease inspection, AWS node parameter map building, ready worker count, and controller pod readiness check

Dependencies

Part 1 of 3 for RHWA-962. No dependencies. Node disruption helpers in #23, test scaffold in #22.

Merge order: this PR first, then #23, then #22.

Jira: RHWA-962

@openshift-ci openshift-ci Bot requested review from lyfofvipin and slintes June 16, 2026 12:17
@openshift-ci

openshift-ci Bot commented Jun 16, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ugreener

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a new farparams/const.go file defining 21 exported constants (polling intervals, CRD names, PSA labels, fence agent identifiers, AWS credential keys, node/FAR timeouts, and a lease prefix), extends farparams/farvars.go with an OperatorControllerPodLabels selector map, and introduces farutils/harness.go with nine exported functions for platform detection, AWS credential retrieval, worker-node selection/counting, controller Lease-based leader discovery, AWS fence parameter construction, and FAR controller Pod listing.

Changes

FAR Test Infrastructure

Layer / File(s) Summary
farparams constants and label selectors
tests/far-operator/internal/farparams/const.go, tests/far-operator/internal/farparams/farvars.go
Defines 21 exported constants (polling interval, CRD names, PSA labels, priority class, controller label/lease keys, fence agent binary/parameter names, AWS credential fields, node/FAR timeout durations) and the OperatorControllerPodLabels selector map consumed by GetFARControllerPods.
Platform detection and AWS helpers
tests/far-operator/internal/farutils/harness.go
Adds DetectPlatform (fetches Infrastructure/cluster, returns platform type and AWS region), FenceAgentForPlatform (maps platform to fence agent binary and node-identifier parameter), ExtractAWSInstanceID (parses aws:// providerID), and GetAWSCredentials (reads CCO-provisioned Secret fields with missing-key errors).
Worker node selection, counting, and readiness predicate
tests/far-operator/internal/farutils/harness.go
Adds SelectWorkerNode (sorts and filters worker nodes by name exclusion, scheduling state, and NodeReady condition), CountReadyWorkerNodes (counts Ready schedulable workers), and the internal isNodeReady predicate scanning NodeReady conditions.
Controller discovery, AWS node parameters, and Pod listing
tests/far-operator/internal/farutils/harness.go
Adds GetActiveFARControllerNode (Lease-based leader discovery returning the leader Pod's spec.nodeName), BuildAWSNodeParameters (builds the fence_aws --plug map from Ready/schedulable workers and extracted instance IDs), and GetFARControllerPods (lists controller Pods filtered by label selectors and all-containers-Ready).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • lyfofvipin
  • maximunited
  • clobrano
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 94.44% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add FAR destructive test constants and cluster helpers' clearly and concisely summarizes the main changes: introducing new constants and helper functions for FAR destructive tests.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
tests/far-operator/internal/farutils/harness.go (1)

244-265: ⚡ Quick win

Hardcoded label value duplicates GetFARControllerPods.

Line 248 hardcodes "fence-agents-remediation-operator", which also appears in GetFARControllerPods (Line 213). Consider extracting this as a constant in farparams (e.g., ControllerPodLabelValue) to maintain consistency and simplify future updates.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/far-operator/internal/farutils/harness.go` around lines 244 - 265, The
hardcoded string "fence-agents-remediation-operator" in the
ListFARControllerPodsByLabel function duplicates the same value used in
GetFARControllerPods. Extract this value as a constant in the farparams package
(e.g., ControllerPodLabelValue) and replace the hardcoded string on line 248
with a reference to this new constant. Additionally, update the
GetFARControllerPods function to use the same constant to maintain consistency
across the codebase.
tests/far-operator/internal/farutils/node.go (1)

142-159: 💤 Low value

Hardcoded poll interval instead of farparams.DefaultPollInterval.

Line 147 uses a hardcoded 10*time.Second interval, while WaitForNodeNotReady and WaitForNodeReady use farparams.DefaultPollInterval (5 seconds). If the longer interval is intentional for reboot detection, consider extracting it as a named constant (e.g., RebootPollInterval) for clarity and maintainability.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/far-operator/internal/farutils/node.go` around lines 142 - 159, The
WaitForNodeReboot function uses a hardcoded 10*time.Second poll interval in the
wait.PollUntilContextTimeout call, which differs from the standard interval used
in similar functions like WaitForNodeNotReady and WaitForNodeReady that use
farparams.DefaultPollInterval. Either replace the hardcoded interval with
farparams.DefaultPollInterval for consistency, or if the longer interval is
intentional for reboot detection, extract it as a named constant (e.g.,
RebootPollInterval) in the farparams package for clarity and maintainability.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/far-operator/internal/farutils/harness.go`:
- Around line 132-162: The GetActiveFARControllerNode function has three
robustness issues that need to be addressed. First, replace the strings.Contains
check when looking for the controller lease to use strings.HasPrefix or exact
matching instead, as Contains is too loose and could match unintended leases
with similar names. Second, add validation after splitting the holderIdentity by
underscore to ensure the split result has at least one element and that it's
non-empty before using parts[0] as the pod name, otherwise return an appropriate
error. Third, before returning pod.Spec.NodeName, validate that it is not an
empty string and return an error if the pod has not been scheduled yet
(indicating an empty NodeName).

In `@tests/far-operator/internal/farutils/node.go`:
- Around line 42-50: The StopKubelet function uses fragile string matching with
strings.Contains() to detect connection refused errors instead of the proper
IsConnectionRefused() function available in
vendor/k8s.io/apimachinery/pkg/util/net/util.go. Replace the string matching
check in StopKubelet with a call to IsConnectionRefused() to properly detect
connection refused errors. Additionally, review the StartKubelet function (lines
52-57) to ensure consistent error handling between both functions - consider
whether it should also handle the connection refused case using the same
IsConnectionRefused() approach to maintain consistency.
- Around line 89-138: The WaitForNodeNotReady function has inconsistent error
handling compared to WaitForNodeReady. When the client.Get call fails in
WaitForNodeNotReady, it returns the error immediately (return false, err),
terminating polling, while WaitForNodeReady suppresses the error and continues
polling (return false, nil). Change the error handling in WaitForNodeNotReady to
suppress Get errors by returning false, nil instead of false, err, making it
consistent with WaitForNodeReady and the broader codebase pattern of suppressing
transient errors during polling operations.

---

Nitpick comments:
In `@tests/far-operator/internal/farutils/harness.go`:
- Around line 244-265: The hardcoded string "fence-agents-remediation-operator"
in the ListFARControllerPodsByLabel function duplicates the same value used in
GetFARControllerPods. Extract this value as a constant in the farparams package
(e.g., ControllerPodLabelValue) and replace the hardcoded string on line 248
with a reference to this new constant. Additionally, update the
GetFARControllerPods function to use the same constant to maintain consistency
across the codebase.

In `@tests/far-operator/internal/farutils/node.go`:
- Around line 142-159: The WaitForNodeReboot function uses a hardcoded
10*time.Second poll interval in the wait.PollUntilContextTimeout call, which
differs from the standard interval used in similar functions like
WaitForNodeNotReady and WaitForNodeReady that use farparams.DefaultPollInterval.
Either replace the hardcoded interval with farparams.DefaultPollInterval for
consistency, or if the longer interval is intentional for reboot detection,
extract it as a named constant (e.g., RebootPollInterval) in the farparams
package for clarity and maintainability.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6f0e31a7-be0f-4d41-af83-35cbd197b3f3

📥 Commits

Reviewing files that changed from the base of the PR and between 79c5b13 and d82d3b1.

📒 Files selected for processing (3)
  • tests/far-operator/internal/farparams/const.go
  • tests/far-operator/internal/farutils/harness.go
  • tests/far-operator/internal/farutils/node.go

Comment thread tests/far-operator/internal/farutils/harness.go Outdated
Comment thread tests/far-operator/internal/farutils/node.go Outdated
Comment thread tests/far-operator/internal/farutils/node.go Outdated
@ugreener ugreener changed the title Add FAR destructive test harness helpers Draft: Add FAR destructive test harness helpers Jun 16, 2026
@ugreener ugreener force-pushed the feat/far-destructive-harness branch 3 times, most recently from 64248f2 to f347cea Compare June 16, 2026 12:58

@ugreener ugreener left a comment

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This AI review was posted in error and has been retracted. Please disregard.)

ugreener added a commit to ugreener/system-tests that referenced this pull request Jun 16, 2026
Add the Ginkgo scaffold for FAR destructive E2E tests with:
- Serial + Ordered execution (destructive tests must not run in parallel)
- BeforeAll: platform detection, AWS credential retrieval, fence_aws
  parameter building, FAR controller leader detection, worker node count
  validation (requires 3+ workers)
- JustAfterEach safety net: restarts kubelet, waits for node Ready,
  cleans up orphaned FAR/FART CRs on test failure
- Inline FAR/FART CR builders using unstructured objects
- Inline delete helpers with wait-for-removal polling
- Empty Context blocks for RHWA-963 (7 standalone tests) and
  RHWA-1035 (4 NHC+FAR interop tests)
- One harness validation spec that exercises the full setup: node
  selection, boot ID retrieval, FART create/delete lifecycle

Depends on: PR medik8s#20 (farutils harness helpers)

Jira: RHWA-962

Co-Authored-By: Claude <noreply@anthropic.com>

@ugreener ugreener left a comment

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This AI review was posted in error and has been retracted. Please disregard.)

ugreener added a commit to ugreener/system-tests that referenced this pull request Jun 16, 2026
Add the Ginkgo scaffold for FAR destructive E2E tests with:
- Serial + Ordered execution (destructive tests must not run in parallel)
- BeforeAll: platform detection, AWS credential retrieval, fence_aws
  parameter building, FAR controller leader detection, worker node count
  validation (requires 3+ workers)
- JustAfterEach safety net: restarts kubelet, waits for node Ready,
  cleans up orphaned FAR/FART CRs on test failure
- Inline FAR/FART CR builders using unstructured objects
- Inline delete helpers with wait-for-removal polling
- Empty Context blocks for RHWA-963 (7 standalone tests) and
  RHWA-1035 (4 NHC+FAR interop tests)
- One harness validation spec that exercises the full setup: node
  selection, boot ID retrieval, FART create/delete lifecycle

Depends on: PR medik8s#20 (farutils harness helpers)

Jira: RHWA-962

Co-Authored-By: Claude <noreply@anthropic.com>
ugreener added a commit to ugreener/system-tests that referenced this pull request Jun 16, 2026
Add the Ginkgo scaffold for FAR destructive E2E tests with:
- Serial + Ordered execution (destructive tests must not run in parallel)
- BeforeAll: platform detection, AWS credential retrieval, fence_aws
  parameter building, FAR controller leader detection, worker node count
  validation (requires 3+ workers)
- JustAfterEach safety net: restarts kubelet, waits for node Ready,
  cleans up orphaned FAR/FART CRs on test failure
- Inline FAR/FART CR builders using unstructured objects
- Inline delete helpers with wait-for-removal polling
- Empty Context blocks for RHWA-963 (7 standalone tests) and
  RHWA-1035 (4 NHC+FAR interop tests)
- One harness validation spec that exercises the full setup: node
  selection, boot ID retrieval, FART create/delete lifecycle

Depends on: PR medik8s#20 (farutils harness helpers)

Jira: RHWA-962

Co-Authored-By: Claude <noreply@anthropic.com>
ugreener added a commit to ugreener/system-tests that referenced this pull request Jun 16, 2026
Add the Ginkgo scaffold for FAR destructive E2E tests with:
- Serial + Ordered execution (destructive tests must not run in parallel)
- BeforeAll: platform detection, AWS credential retrieval, fence_aws
  parameter building, FAR controller leader detection, worker node count
  validation (requires 3+ workers)
- JustAfterEach safety net: restarts kubelet, waits for node Ready,
  cleans up orphaned FAR/FART CRs on test failure
- Inline FAR/FART CR builders using unstructured objects
- Inline delete helpers with wait-for-removal polling
- Empty Context blocks for RHWA-963 (7 standalone tests) and
  RHWA-1035 (4 NHC+FAR interop tests)
- One harness validation spec that exercises the full setup: node
  selection, boot ID retrieval, FART create/delete lifecycle

Depends on: PR medik8s#20 (farutils harness helpers)

Jira: RHWA-962

Co-Authored-By: Claude <noreply@anthropic.com>
@ugreener ugreener force-pushed the feat/far-destructive-harness branch from f347cea to 4f94147 Compare June 16, 2026 14:38
@ugreener ugreener force-pushed the feat/far-destructive-harness branch from 4f94147 to 5a9beca Compare June 16, 2026 14:49
ugreener added a commit to ugreener/system-tests that referenced this pull request Jun 16, 2026
Add Prow-compatible node operation helpers for FAR destructive E2E
tests:

- farutils/node.go: RunOnNode (oc debug wrapper with context
  propagation and timeout), StopKubelet/StartKubelet, boot ID
  tracking via API and oc debug, node readiness polling
  (WaitForNodeNotReady, WaitForNodeReady, WaitForNodeReboot)

Part 2 of 3 for RHWA-962. Depends on PR medik8s#20 (constants).
Test scaffold in PR medik8s#22.

Jira: RHWA-962

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ugreener added a commit to ugreener/system-tests that referenced this pull request Jun 16, 2026
Add the Ginkgo test scaffold for FAR destructive E2E tests:

- far_destructive.go: Serial/Ordered Describe block with BeforeAll
  setup (platform detection, AWS credential retrieval, node parameter
  building, FAR leader identification), JustAfterEach safety-net
  cleanup, and initial AWS fencing remediation test case

Part 3 of 3 for RHWA-962. Depends on PR medik8s#20 (constants + cluster
helpers) and PR medik8s#21 (node disruption helpers).

Jira: RHWA-962

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ugreener ugreener changed the title Draft: Add FAR destructive test harness helpers Draft: Add FAR destructive test constants and cluster helpers Jun 16, 2026
@ugreener ugreener force-pushed the feat/far-destructive-harness branch 2 times, most recently from 5133e6f to 6389fda Compare June 16, 2026 20:19
Add constants and cluster-level helper functions for FAR destructive
E2E tests:

- farparams/const.go: timeout constants, fence agent names, AWS
  credential field names, CRD names, controller lease prefix
- farutils/harness.go: platform detection via Infrastructure CR, AWS
  instance ID extraction, AWS credentials from CCO Secret, worker node
  selection with FAR leader exclusion, controller pod readiness check

Part 1 of 3 for RHWA-962. Node disruption helpers in PR medik8s#21, test
scaffold in PR medik8s#22.

Jira: RHWA-962

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ugreener ugreener force-pushed the feat/far-destructive-harness branch from 6389fda to 87d365c Compare June 21, 2026 14:22
@ugreener ugreener changed the title Draft: Add FAR destructive test constants and cluster helpers Add FAR destructive test constants and cluster helpers Jun 21, 2026
@ugreener

Copy link
Copy Markdown
Author

/test 4.22-konflux-e2e-far-aws

@razo7 razo7 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but I think we have an issue with lease name

}

pod := &corev1.Pod{}
if err := k8sClient.Get(ctx, client.ObjectKey{Name: podName, Namespace: medik8sparams.OperatorNs}, pod); err != nil {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you choose "fence-agents-remediation" for the lease name?

The const is set to "fence-agents-remediation", but FAR's actual leader election lease name is "cb305759.medik8s.io" (see in cmd/main.go:120):

LeaderElectionID: "cb305759.medik8s.io",

So strings.HasPrefix("cb305759.medik8s.io", "fence-agents-remediation")false. This function will always return "FAR leader lease not found or has no holder".

IIUC this is invisible in PR #20 CI because the function isn't called yet, but it'll surface when PR #22 uses it.

@@ -0,0 +1,280 @@
package farutils

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the FAR repo, these helpers are split into:

  • test/e2e/utils/cluster.go - platform detection, credentials, node info
  • test/e2e/utils/command.go - controller pod management

Why harness.go? Would cluster.go (or splitting into cluster.go + controller.go) communicate the contents more clearly?

Also: some of these helpers are not FAR-specific. DetectPlatform, SelectWorkerNode, CountReadyWorkerNodes, isNodeReady would work unchanged for NHC, SNR, or any medik8s operator. Should they live in the shared tests/internal/ package instead of under farutils/?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants