Add FAR destructive test constants and cluster helpers#20
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ugreener The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
📝 WalkthroughWalkthroughAdds a new ChangesFAR Test Infrastructure
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
tests/far-operator/internal/farutils/harness.go (1)
244-265: ⚡ Quick winHardcoded label value duplicates
GetFARControllerPods.Line 248 hardcodes
"fence-agents-remediation-operator", which also appears inGetFARControllerPods(Line 213). Consider extracting this as a constant infarparams(e.g.,ControllerPodLabelValue) to maintain consistency and simplify future updates.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/far-operator/internal/farutils/harness.go` around lines 244 - 265, The hardcoded string "fence-agents-remediation-operator" in the ListFARControllerPodsByLabel function duplicates the same value used in GetFARControllerPods. Extract this value as a constant in the farparams package (e.g., ControllerPodLabelValue) and replace the hardcoded string on line 248 with a reference to this new constant. Additionally, update the GetFARControllerPods function to use the same constant to maintain consistency across the codebase.tests/far-operator/internal/farutils/node.go (1)
142-159: 💤 Low valueHardcoded poll interval instead of
farparams.DefaultPollInterval.Line 147 uses a hardcoded
10*time.Secondinterval, whileWaitForNodeNotReadyandWaitForNodeReadyusefarparams.DefaultPollInterval(5 seconds). If the longer interval is intentional for reboot detection, consider extracting it as a named constant (e.g.,RebootPollInterval) for clarity and maintainability.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/far-operator/internal/farutils/node.go` around lines 142 - 159, The WaitForNodeReboot function uses a hardcoded 10*time.Second poll interval in the wait.PollUntilContextTimeout call, which differs from the standard interval used in similar functions like WaitForNodeNotReady and WaitForNodeReady that use farparams.DefaultPollInterval. Either replace the hardcoded interval with farparams.DefaultPollInterval for consistency, or if the longer interval is intentional for reboot detection, extract it as a named constant (e.g., RebootPollInterval) in the farparams package for clarity and maintainability.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/far-operator/internal/farutils/harness.go`:
- Around line 132-162: The GetActiveFARControllerNode function has three
robustness issues that need to be addressed. First, replace the strings.Contains
check when looking for the controller lease to use strings.HasPrefix or exact
matching instead, as Contains is too loose and could match unintended leases
with similar names. Second, add validation after splitting the holderIdentity by
underscore to ensure the split result has at least one element and that it's
non-empty before using parts[0] as the pod name, otherwise return an appropriate
error. Third, before returning pod.Spec.NodeName, validate that it is not an
empty string and return an error if the pod has not been scheduled yet
(indicating an empty NodeName).
In `@tests/far-operator/internal/farutils/node.go`:
- Around line 42-50: The StopKubelet function uses fragile string matching with
strings.Contains() to detect connection refused errors instead of the proper
IsConnectionRefused() function available in
vendor/k8s.io/apimachinery/pkg/util/net/util.go. Replace the string matching
check in StopKubelet with a call to IsConnectionRefused() to properly detect
connection refused errors. Additionally, review the StartKubelet function (lines
52-57) to ensure consistent error handling between both functions - consider
whether it should also handle the connection refused case using the same
IsConnectionRefused() approach to maintain consistency.
- Around line 89-138: The WaitForNodeNotReady function has inconsistent error
handling compared to WaitForNodeReady. When the client.Get call fails in
WaitForNodeNotReady, it returns the error immediately (return false, err),
terminating polling, while WaitForNodeReady suppresses the error and continues
polling (return false, nil). Change the error handling in WaitForNodeNotReady to
suppress Get errors by returning false, nil instead of false, err, making it
consistent with WaitForNodeReady and the broader codebase pattern of suppressing
transient errors during polling operations.
---
Nitpick comments:
In `@tests/far-operator/internal/farutils/harness.go`:
- Around line 244-265: The hardcoded string "fence-agents-remediation-operator"
in the ListFARControllerPodsByLabel function duplicates the same value used in
GetFARControllerPods. Extract this value as a constant in the farparams package
(e.g., ControllerPodLabelValue) and replace the hardcoded string on line 248
with a reference to this new constant. Additionally, update the
GetFARControllerPods function to use the same constant to maintain consistency
across the codebase.
In `@tests/far-operator/internal/farutils/node.go`:
- Around line 142-159: The WaitForNodeReboot function uses a hardcoded
10*time.Second poll interval in the wait.PollUntilContextTimeout call, which
differs from the standard interval used in similar functions like
WaitForNodeNotReady and WaitForNodeReady that use farparams.DefaultPollInterval.
Either replace the hardcoded interval with farparams.DefaultPollInterval for
consistency, or if the longer interval is intentional for reboot detection,
extract it as a named constant (e.g., RebootPollInterval) in the farparams
package for clarity and maintainability.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6f0e31a7-be0f-4d41-af83-35cbd197b3f3
📒 Files selected for processing (3)
tests/far-operator/internal/farparams/const.gotests/far-operator/internal/farutils/harness.gotests/far-operator/internal/farutils/node.go
64248f2 to
f347cea
Compare
Add the Ginkgo scaffold for FAR destructive E2E tests with: - Serial + Ordered execution (destructive tests must not run in parallel) - BeforeAll: platform detection, AWS credential retrieval, fence_aws parameter building, FAR controller leader detection, worker node count validation (requires 3+ workers) - JustAfterEach safety net: restarts kubelet, waits for node Ready, cleans up orphaned FAR/FART CRs on test failure - Inline FAR/FART CR builders using unstructured objects - Inline delete helpers with wait-for-removal polling - Empty Context blocks for RHWA-963 (7 standalone tests) and RHWA-1035 (4 NHC+FAR interop tests) - One harness validation spec that exercises the full setup: node selection, boot ID retrieval, FART create/delete lifecycle Depends on: PR medik8s#20 (farutils harness helpers) Jira: RHWA-962 Co-Authored-By: Claude <noreply@anthropic.com>
Add the Ginkgo scaffold for FAR destructive E2E tests with: - Serial + Ordered execution (destructive tests must not run in parallel) - BeforeAll: platform detection, AWS credential retrieval, fence_aws parameter building, FAR controller leader detection, worker node count validation (requires 3+ workers) - JustAfterEach safety net: restarts kubelet, waits for node Ready, cleans up orphaned FAR/FART CRs on test failure - Inline FAR/FART CR builders using unstructured objects - Inline delete helpers with wait-for-removal polling - Empty Context blocks for RHWA-963 (7 standalone tests) and RHWA-1035 (4 NHC+FAR interop tests) - One harness validation spec that exercises the full setup: node selection, boot ID retrieval, FART create/delete lifecycle Depends on: PR medik8s#20 (farutils harness helpers) Jira: RHWA-962 Co-Authored-By: Claude <noreply@anthropic.com>
Add the Ginkgo scaffold for FAR destructive E2E tests with: - Serial + Ordered execution (destructive tests must not run in parallel) - BeforeAll: platform detection, AWS credential retrieval, fence_aws parameter building, FAR controller leader detection, worker node count validation (requires 3+ workers) - JustAfterEach safety net: restarts kubelet, waits for node Ready, cleans up orphaned FAR/FART CRs on test failure - Inline FAR/FART CR builders using unstructured objects - Inline delete helpers with wait-for-removal polling - Empty Context blocks for RHWA-963 (7 standalone tests) and RHWA-1035 (4 NHC+FAR interop tests) - One harness validation spec that exercises the full setup: node selection, boot ID retrieval, FART create/delete lifecycle Depends on: PR medik8s#20 (farutils harness helpers) Jira: RHWA-962 Co-Authored-By: Claude <noreply@anthropic.com>
Add the Ginkgo scaffold for FAR destructive E2E tests with: - Serial + Ordered execution (destructive tests must not run in parallel) - BeforeAll: platform detection, AWS credential retrieval, fence_aws parameter building, FAR controller leader detection, worker node count validation (requires 3+ workers) - JustAfterEach safety net: restarts kubelet, waits for node Ready, cleans up orphaned FAR/FART CRs on test failure - Inline FAR/FART CR builders using unstructured objects - Inline delete helpers with wait-for-removal polling - Empty Context blocks for RHWA-963 (7 standalone tests) and RHWA-1035 (4 NHC+FAR interop tests) - One harness validation spec that exercises the full setup: node selection, boot ID retrieval, FART create/delete lifecycle Depends on: PR medik8s#20 (farutils harness helpers) Jira: RHWA-962 Co-Authored-By: Claude <noreply@anthropic.com>
f347cea to
4f94147
Compare
4f94147 to
5a9beca
Compare
Add Prow-compatible node operation helpers for FAR destructive E2E tests: - farutils/node.go: RunOnNode (oc debug wrapper with context propagation and timeout), StopKubelet/StartKubelet, boot ID tracking via API and oc debug, node readiness polling (WaitForNodeNotReady, WaitForNodeReady, WaitForNodeReboot) Part 2 of 3 for RHWA-962. Depends on PR medik8s#20 (constants). Test scaffold in PR medik8s#22. Jira: RHWA-962 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add the Ginkgo test scaffold for FAR destructive E2E tests: - far_destructive.go: Serial/Ordered Describe block with BeforeAll setup (platform detection, AWS credential retrieval, node parameter building, FAR leader identification), JustAfterEach safety-net cleanup, and initial AWS fencing remediation test case Part 3 of 3 for RHWA-962. Depends on PR medik8s#20 (constants + cluster helpers) and PR medik8s#21 (node disruption helpers). Jira: RHWA-962 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
5133e6f to
6389fda
Compare
Add constants and cluster-level helper functions for FAR destructive E2E tests: - farparams/const.go: timeout constants, fence agent names, AWS credential field names, CRD names, controller lease prefix - farutils/harness.go: platform detection via Infrastructure CR, AWS instance ID extraction, AWS credentials from CCO Secret, worker node selection with FAR leader exclusion, controller pod readiness check Part 1 of 3 for RHWA-962. Node disruption helpers in PR medik8s#21, test scaffold in PR medik8s#22. Jira: RHWA-962 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6389fda to
87d365c
Compare
|
/test 4.22-konflux-e2e-far-aws |
razo7
left a comment
There was a problem hiding this comment.
Looks good, but I think we have an issue with lease name
| } | ||
|
|
||
| pod := &corev1.Pod{} | ||
| if err := k8sClient.Get(ctx, client.ObjectKey{Name: podName, Namespace: medik8sparams.OperatorNs}, pod); err != nil { |
There was a problem hiding this comment.
Why did you choose "fence-agents-remediation" for the lease name?
The const is set to "fence-agents-remediation", but FAR's actual leader election lease name is "cb305759.medik8s.io" (see in cmd/main.go:120):
LeaderElectionID: "cb305759.medik8s.io",So strings.HasPrefix("cb305759.medik8s.io", "fence-agents-remediation") → false. This function will always return "FAR leader lease not found or has no holder".
IIUC this is invisible in PR #20 CI because the function isn't called yet, but it'll surface when PR #22 uses it.
| @@ -0,0 +1,280 @@ | |||
| package farutils | |||
There was a problem hiding this comment.
In the FAR repo, these helpers are split into:
test/e2e/utils/cluster.go- platform detection, credentials, node infotest/e2e/utils/command.go- controller pod management
Why harness.go? Would cluster.go (or splitting into cluster.go + controller.go) communicate the contents more clearly?
Also: some of these helpers are not FAR-specific. DetectPlatform, SelectWorkerNode, CountReadyWorkerNodes, isNodeReady would work unchanged for NHC, SNR, or any medik8s operator. Should they live in the shared tests/internal/ package instead of under farutils/?
Summary
Add timeout constants, fence agent identifiers, AWS credential field names, and cluster-level helper functions to support FAR destructive E2E tests. This is the foundation layer that subsequent PRs (#23 and #22) build upon.
Changes
tests/far-operator/internal/farparams/const.gowith: timeout constants (NodeReadyTimeout,NodeNotReadyTimeout,NodeRebootTimeout,OcDebugTimeout,FARConditionTimeout), fence agent names (fence_aws,fence_ipmilan), node identifier parameters (--plug,--ipport), AWS credential Secret/field names, and the controller lease prefixOperatorControllerPodLabelstwo-label map totests/far-operator/internal/farparams/farvars.gofor controller pod selection matching thecontrol-plane: controller-manager+app.kubernetes.io/namelabeling convention used across medik8s operatorstests/far-operator/internal/farutils/harness.gowith cluster helpers: platform detection via Infrastructure CR, AWS instance ID extraction from providerID, AWS credentials from CCO Secret, worker node selection with leader exclusion and deterministic ordering, leader controller node discovery via lease inspection, AWS node parameter map building, ready worker count, and controller pod readiness checkDependencies
Part 1 of 3 for RHWA-962. No dependencies. Node disruption helpers in #23, test scaffold in #22.
Merge order: this PR first, then #23, then #22.
Jira: RHWA-962