Skip to content

OCPEDGE-2705: Add TNF Pacemaker metrics controller#1634

Draft
fonta-rh wants to merge 2 commits into
openshift:mainfrom
fonta-rh:ocpedge-2705-metrics-controller
Draft

OCPEDGE-2705: Add TNF Pacemaker metrics controller#1634
fonta-rh wants to merge 2 commits into
openshift:mainfrom
fonta-rh:ocpedge-2705-metrics-controller

Conversation

@fonta-rh

@fonta-rh fonta-rh commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Adds a metrics controller that projects PacemakerCluster CR conditions into Prometheus gauges (20 metric families, ~53 series on a 2-node cluster)
  • Shares the existing pacemakerInformer with HealthCheck — zero additional API server load
  • Uses table-driven sync logic with GaugeVec.Reset() to prevent stale data from removed nodes

Details

New package pkg/tnf/pkg/metriccontroller/ with three files:

File Purpose
metrics.go 20 metric definitions (3 cluster, 9 node, 8 resource), mapping tables, conditionToFloat64 helper
controller.go Controller struct, constructor with metrics.KubeRegistry injection, sync logic
controller_test.go 11 test cases covering healthy cluster, node offline, resource stopped, missing CR, nil nodes, missing condition, node removal, multi-resource, label correctness, idempotency, and edge cases

Wiring in pkg/tnf/operator/starter.go — controller created after HealthCheck, sharing pacemakerInformer.

Key design decisions

  • Registry injection for test isolation (matches EtcdCertSignerController pattern)
  • Missing CR → gauges reset to 0 — makes "unhealthy" the safe default for alerting rules (== 0 vs absent())
  • No operator condition — failure modes overlap with HealthCheck; self-health metric deferred to OCPEDGE-2707
  • StabilityLevel: metrics.ALPHA — explicit on all metrics

Metrics exposed

Level Count Type Labels
Cluster 3 Gauge none
Node 9 GaugeVec node
Resource 8 GaugeVec node, resource

Test plan

  • go build ./... — all binaries compile
  • go test ./pkg/tnf/pkg/metriccontroller/ -v -count=5 — 55/55 pass, no registration pollution
  • go test ./pkg/tnf/... -count=1 — 454 existing TNF tests pass (no regressions)
  • make build — clean
  • make test-unit — all tests pass with -race
  • E2E: Deploy on TNF cluster, verify metrics appear on :8443/metrics

Related

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added Prometheus metrics for PacemakerCluster, including cluster conditions, per-node online status, and per-resource state.
    • Metrics are maintained by a dedicated controller and reset appropriately when the cluster object or node details are unavailable.
  • Tests

    • Added unit tests validating metric outputs across healthy, unhealthy/offline, stopped/started, missing data, label correctness, node removal, and Sync idempotency.
    • Updated health check tests to use shared fake informer utilities.

Implement a metrics controller that projects PacemakerCluster CR
conditions into Prometheus gauges for the TNF monitoring pipeline.

The controller shares the existing pacemakerInformer with HealthCheck,
registers 20 metric families (~53 series on a 2-node cluster), and
uses table-driven sync logic with GaugeVec.Reset() to prevent stale
data from removed nodes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 18, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 18, 2026

Copy link
Copy Markdown

@fonta-rh: This pull request references OCPEDGE-2705 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

  • Adds a metrics controller that projects PacemakerCluster CR conditions into Prometheus gauges (20 metric families, ~53 series on a 2-node cluster)
  • Shares the existing pacemakerInformer with HealthCheck — zero additional API server load
  • Uses table-driven sync logic with GaugeVec.Reset() to prevent stale data from removed nodes

Details

New package pkg/tnf/pkg/metriccontroller/ with three files:

File Purpose
metrics.go 20 metric definitions (3 cluster, 9 node, 8 resource), mapping tables, conditionToFloat64 helper
controller.go Controller struct, constructor with metrics.KubeRegistry injection, sync logic
controller_test.go 11 test cases covering healthy cluster, node offline, resource stopped, missing CR, nil nodes, missing condition, node removal, multi-resource, label correctness, idempotency, and edge cases

Wiring in pkg/tnf/operator/starter.go — controller created after HealthCheck, sharing pacemakerInformer.

Key design decisions

  • Registry injection for test isolation (matches EtcdCertSignerController pattern)
  • Missing CR → gauges reset to 0 — makes "unhealthy" the safe default for alerting rules (== 0 vs absent())
  • No operator condition — failure modes overlap with HealthCheck; self-health metric deferred to OCPEDGE-2707
  • StabilityLevel: metrics.ALPHA — explicit on all metrics

Metrics exposed

Level Count Type Labels
Cluster 3 Gauge none
Node 9 GaugeVec node
Resource 8 GaugeVec node, resource

Test plan

  • go build ./... — all binaries compile
  • go test ./pkg/tnf/pkg/metriccontroller/ -v -count=5 — 55/55 pass, no registration pollution
  • go test ./pkg/tnf/... -count=1 — 454 existing TNF tests pass (no regressions)
  • make build — clean
  • make test-unit — all tests pass with -race
  • E2E: Deploy on TNF cluster, verify metrics appear on :8443/metrics

Related

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown

Walkthrough

A new metriccontroller package is added that exposes Prometheus gauges and gauge vectors tracking PacemakerCluster conditions at cluster, node, and resource levels. The controller syncs from a shared informer, resets metrics when the CR is absent, and updates labeled series per node and resource. It is wired into the operator startup alongside the existing Pacemaker healthcheck controller. A shared test utility for fake informers is introduced and reused across test files.

Changes

Pacemaker Metrics Controller

Layer / File(s) Summary
Prometheus metric definitions and helpers
pkg/tnf/pkg/metriccontroller/metrics.go
Declares cluster-level Gauge instances, node-level and resource-level GaugeVec instances, builds ordered mapping tables from Pacemaker condition type constants to those gauges, and adds conditionToFloat64 to convert *metav1.Condition to 1.0/0.0.
Metrics controller implementation
pkg/tnf/pkg/metriccontroller/controller.go
Defines pacemakerMetricsController struct and NewPacemakerMetricsController constructor that registers gauges against the provided KubeRegistry and wires the factory controller. The sync method fetches PacemakerCluster from the informer store, calls resetAllMetrics when absent, or iterates cluster/node/resource conditions to update all gauge families.
Operator startup integration
pkg/tnf/operator/starter.go
Adds imports for component-base/metrics, the legacy registry, and the new metriccontroller package; extends the Pacemaker startup block to construct and run the metrics controller in a background goroutine reusing the existing pacemakerInformer and event recorder.
Test utilities: fake informer
pkg/tnf/internal/testutil/fakeinformer.go
Introduces CreateFakeInformer factory and fakeInformer struct implementing cache.SharedIndexInformer. Provides a test-friendly informer backed by an in-memory cache store, enabling controller tests to populate CR data without a real Kubernetes client.
Comprehensive controller test suite
pkg/tnf/pkg/metriccontroller/controller_test.go
Provides PacemakerCluster CR builders with condition-mutator helpers and setupAndSync helper. Includes eleven test functions covering healthy clusters, per-node/resource condition toggles, missing CR reset, nil Nodes pointer, missing conditions, node removal between syncs, multiple resources per node, label correctness, sync idempotency, and conditionToFloat64 conversion.
Existing test refactoring
pkg/tnf/pkg/pacemaker/healthcheck_test.go
Replaces local fake informer implementation with testutil.CreateFakeInformer. Removes unused fmt import and updates createTestHealthCheck and createTestHealthCheckWithMockStatus to use the shared utility.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 13 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning Test assertions lack meaningful failure messages in 6 locations: 5 require.True() without messages (lines 206-210 in TestHealthyCluster) and 1 require.Equal() without message (line 394 in TestSyncI... Add failure messages to all assertions: require.True(t, condition, "description") and require.Equal(t, expected, actual, "description") to help diagnose test failures.
✅ Passed checks (13 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'OCPEDGE-2705: Add TNF Pacemaker metrics controller' clearly and specifically describes the main change: adding a metrics controller for TNF Pacemaker, as confirmed by the comprehensive PR objectives and file changes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed The custom check is not applicable; the PR uses standard Go testing (func TestXxx), not Ginkgo. All 11 test names are static and deterministic with no dynamic information.
Microshift Test Compatibility ✅ Passed No Ginkgo e2e tests are added in this PR. All new tests in controller_test.go are standard Go unit tests using testing.T, not Ginkgo-based e2e tests.
Single Node Openshift (Sno) Test Compatibility ✅ Passed This PR contains only standard Go unit tests (not Ginkgo e2e tests), so SNO compatibility check does not apply. New tests are in controller_test.go with Test* signatures and testing.T framework.
Topology-Aware Scheduling Compatibility ✅ Passed PR introduces only a metrics controller for Prometheus gauge collection from PacemakerCluster CRs. No deployment manifests, pod specs, affinity rules, nodeSelectors, tolerations, or topology-depend...
Ote Binary Stdout Contract ✅ Passed No stdout writes in process-level code. NewPacemakerMetricsController called inside background goroutine, not process-level code. No init() functions, no fmt.Print calls, only safe metric initializ...
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No new Ginkgo e2e tests are added. The PR only adds standard Go unit tests in controller_test.go using func Test...(*testing.T) pattern. Check is not applicable.
No-Weak-Crypto ✅ Passed No weak cryptographic patterns detected. PR adds metrics controller and test utilities with no crypto usage, imports, or secret comparisons.
Container-Privileges ✅ Passed PR introduces metrics controller in Go source/tests only; no Kubernetes manifests or container security configurations with privilege escalation settings are added or modified.
No-Sensitive-Data-In-Logs ✅ Passed All logging in the PR uses safe, non-sensitive data. The metrics controller logs only operational states (syncing metrics, node count) at debug verbosity (klog.V(4)), never exposing node names, res...

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from eggfoobar and mshitrit June 18, 2026 14:31
@openshift-ci

openshift-ci Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign eggfoobar for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fonta-rh fonta-rh marked this pull request as draft June 18, 2026 14:31
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 18, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/tnf/pkg/metriccontroller/controller_test.go (1)

21-68: 💤 Low value

Consider extracting fakeInformer to a shared test utility.

The comment on line 22 notes this is copied from pacemaker/healthcheck_test.go. To reduce duplication, consider extracting this to a shared test helper package (e.g., pkg/tnf/pkg/testutil) so both test files can reuse the same implementation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/tnf/pkg/metriccontroller/controller_test.go` around lines 21 - 68, The
fakeInformer struct and createFakeInformer function are duplicated between
controller_test.go and pacemaker/healthcheck_test.go as noted in the comment.
Extract both the createFakeInformer function and the fakeInformer type
definition (along with all its method implementations) to a shared test utility
package such as pkg/tnf/pkg/testutil. Update both test files to import and use
the shared implementation instead of maintaining duplicate code.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/tnf/operator/starter.go`:
- Around line 320-328: The type assertion on legacyregistry.DefaultGatherer to
metrics.KubeRegistry is unsafe and will cause a panic if the assertion fails at
runtime. Replace the bare type assertion with the comma-ok idiom by capturing
both the converted value and a boolean success indicator, then check the boolean
before proceeding. If the assertion fails, handle the error appropriately (e.g.,
log an error and return or exit) before creating the metricsController with
NewPacemakerMetricsController.

---

Nitpick comments:
In `@pkg/tnf/pkg/metriccontroller/controller_test.go`:
- Around line 21-68: The fakeInformer struct and createFakeInformer function are
duplicated between controller_test.go and pacemaker/healthcheck_test.go as noted
in the comment. Extract both the createFakeInformer function and the
fakeInformer type definition (along with all its method implementations) to a
shared test utility package such as pkg/tnf/pkg/testutil. Update both test files
to import and use the shared implementation instead of maintaining duplicate
code.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6cfb780f-0e07-4d69-909e-d7f8d653442a

📥 Commits

Reviewing files that changed from the base of the PR and between f38807a and 383e514.

📒 Files selected for processing (4)
  • pkg/tnf/operator/starter.go
  • pkg/tnf/pkg/metriccontroller/controller.go
  • pkg/tnf/pkg/metriccontroller/controller_test.go
  • pkg/tnf/pkg/metriccontroller/metrics.go

Comment on lines +320 to +328
// Create and start the metrics controller, sharing the same informer
klog.Infof("creating Pacemaker metrics controller")
metricsController := metriccontroller.NewPacemakerMetricsController(
pacemakerInformer,
controllerContext.EventRecorder,
legacyregistry.DefaultGatherer.(metrics.KubeRegistry),
)
klog.Infof("starting Pacemaker metrics controller")
go metricsController.Run(ctx, 1)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Bare type assertion can panic the operator.

Line 325 uses an unsafe type assertion without checking success. If legacyregistry.DefaultGatherer does not implement metrics.KubeRegistry at runtime, the operator will panic and crash.

🔒 Proposed fix using comma-ok idiom
 	// Create and start the metrics controller, sharing the same informer
 	klog.Infof("creating Pacemaker metrics controller")
+	registry, ok := legacyregistry.DefaultGatherer.(metrics.KubeRegistry)
+	if !ok {
+		klog.Errorf("failed to cast DefaultGatherer to KubeRegistry, metrics controller will not start")
+		return
+	}
 	metricsController := metriccontroller.NewPacemakerMetricsController(
 		pacemakerInformer,
 		controllerContext.EventRecorder,
-		legacyregistry.DefaultGatherer.(metrics.KubeRegistry),
+		registry,
 	)
 	klog.Infof("starting Pacemaker metrics controller")
 	go metricsController.Run(ctx, 1)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Create and start the metrics controller, sharing the same informer
klog.Infof("creating Pacemaker metrics controller")
metricsController := metriccontroller.NewPacemakerMetricsController(
pacemakerInformer,
controllerContext.EventRecorder,
legacyregistry.DefaultGatherer.(metrics.KubeRegistry),
)
klog.Infof("starting Pacemaker metrics controller")
go metricsController.Run(ctx, 1)
// Create and start the metrics controller, sharing the same informer
klog.Infof("creating Pacemaker metrics controller")
registry, ok := legacyregistry.DefaultGatherer.(metrics.KubeRegistry)
if !ok {
klog.Errorf("failed to cast DefaultGatherer to KubeRegistry, metrics controller will not start")
return
}
metricsController := metriccontroller.NewPacemakerMetricsController(
pacemakerInformer,
controllerContext.EventRecorder,
registry,
)
klog.Infof("starting Pacemaker metrics controller")
go metricsController.Run(ctx, 1)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/tnf/operator/starter.go` around lines 320 - 328, The type assertion on
legacyregistry.DefaultGatherer to metrics.KubeRegistry is unsafe and will cause
a panic if the assertion fails at runtime. Replace the bare type assertion with
the comma-ok idiom by capturing both the converted value and a boolean success
indicator, then check the boolean before proceeding. If the assertion fails,
handle the error appropriately (e.g., log an error and return or exit) before
creating the metricsController with NewPacemakerMetricsController.

Both metriccontroller and pacemaker tests had identical 47-line
fakeInformer implementations. Move to pkg/tnf/internal/testutil/
so future interface changes only need one update.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/tnf/internal/testutil/fakeinformer.go`:
- Around line 19-21: The `store.Add(obj)` call in the function containing the if
statement is suppressing the error with the blank identifier (_), which can hide
test setup failures. Modify the function signature to return an error type, and
instead of discarding the error, return it if `store.Add(obj)` fails. Then
update all five call sites of this function across controller_test.go and
healthcheck_test.go to handle the returned error appropriately by checking for
nil and failing the test if an error is returned.

In `@pkg/tnf/pkg/pacemaker/healthcheck_test.go`:
- Around line 53-59: The test setup in createTestHealthCheckWithMockStatus has
an implicit coupling between how the fake informer indexes objects (by
PacemakerCluster.Name) and how getPacemakerStatus fetches them (by
PacemakerClusterResourceName constant). Make this contract explicit by setting
mockStatus.Name to match the expected lookup key before passing it to
testutil.CreateFakeInformer(mockStatus), ensuring the mock object can be found
when getPacemakerStatus attempts to retrieve it by its fixed resource name.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9c1e7295-27cc-4c14-9264-76fe10a05aa9

📥 Commits

Reviewing files that changed from the base of the PR and between 383e514 and 17f75ce.

📒 Files selected for processing (3)
  • pkg/tnf/internal/testutil/fakeinformer.go
  • pkg/tnf/pkg/metriccontroller/controller_test.go
  • pkg/tnf/pkg/pacemaker/healthcheck_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/tnf/pkg/metriccontroller/controller_test.go

Comment on lines +19 to +21
if obj != nil {
_ = store.Add(obj)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify all call sites are updated if CreateFakeInformer signature changes.
rg -n -C2 '\bCreateFakeInformer\s*\(' --type go

Repository: openshift/cluster-etcd-operator

Length of output: 3025


🏁 Script executed:

cat -n pkg/tnf/internal/testutil/fakeinformer.go

Repository: openshift/cluster-etcd-operator

Length of output: 2666


🏁 Script executed:

# Check if the function signature has already been changed to return error
rg -A 15 'func CreateFakeInformer' pkg/tnf/internal/testutil/fakeinformer.go

Repository: openshift/cluster-etcd-operator

Length of output: 489


Handle store.Add errors instead of discarding them

Line 20 suppresses the error from store.Add(obj) with _, which can silently hide test setup failures. Per coding guidelines, "Never ignore error returns."

The fix requires changing the function signature to return an error and updating all 5 call sites in controller_test.go and healthcheck_test.go to handle the new error return.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/tnf/internal/testutil/fakeinformer.go` around lines 19 - 21, The
`store.Add(obj)` call in the function containing the if statement is suppressing
the error with the blank identifier (_), which can hide test setup failures.
Modify the function signature to return an error type, and instead of discarding
the error, return it if `store.Add(obj)` fails. Then update all five call sites
of this function across controller_test.go and healthcheck_test.go to handle the
returned error appropriately by checking for nil and failing the test if an
error is returned.

Source: Coding guidelines

Comment on lines 53 to 59
func createTestHealthCheckWithMockStatus(t *testing.T, mockStatus *pacmkrv1.PacemakerCluster) *HealthCheck {
return &HealthCheck{
operatorClient: createFakeOperatorClient(),
kubeClient: fake.NewSimpleClientset(),
eventRecorder: events.NewInMemoryRecorder("test", clock.RealClock{}),
pacemakerInformer: createFakeInformer(mockStatus),
pacemakerInformer: testutil.CreateFakeInformer(mockStatus),
recordedEvents: make(map[string]time.Time),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make the informer key contract explicit in test setup

CreateFakeInformer indexes by PacemakerCluster.Name, while getPacemakerStatus fetches with PacemakerClusterResourceName (cluster). Set the fixture name here to avoid implicit coupling and brittle tests.

Suggested hardening
 func createTestHealthCheckWithMockStatus(t *testing.T, mockStatus *pacmkrv1.PacemakerCluster) *HealthCheck {
+	status := mockStatus.DeepCopy()
+	if status.Name == "" {
+		status.Name = PacemakerClusterResourceName
+	}
 	return &HealthCheck{
 		operatorClient:    createFakeOperatorClient(),
 		kubeClient:        fake.NewSimpleClientset(),
 		eventRecorder:     events.NewInMemoryRecorder("test", clock.RealClock{}),
-		pacemakerInformer: testutil.CreateFakeInformer(mockStatus),
+		pacemakerInformer: testutil.CreateFakeInformer(status),
 		recordedEvents:    make(map[string]time.Time),
 	}
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func createTestHealthCheckWithMockStatus(t *testing.T, mockStatus *pacmkrv1.PacemakerCluster) *HealthCheck {
return &HealthCheck{
operatorClient: createFakeOperatorClient(),
kubeClient: fake.NewSimpleClientset(),
eventRecorder: events.NewInMemoryRecorder("test", clock.RealClock{}),
pacemakerInformer: createFakeInformer(mockStatus),
pacemakerInformer: testutil.CreateFakeInformer(mockStatus),
recordedEvents: make(map[string]time.Time),
func createTestHealthCheckWithMockStatus(t *testing.T, mockStatus *pacmkrv1.PacemakerCluster) *HealthCheck {
status := mockStatus.DeepCopy()
if status.Name == "" {
status.Name = PacemakerClusterResourceName
}
return &HealthCheck{
operatorClient: createFakeOperatorClient(),
kubeClient: fake.NewSimpleClientset(),
eventRecorder: events.NewInMemoryRecorder("test", clock.RealClock{}),
pacemakerInformer: testutil.CreateFakeInformer(status),
recordedEvents: make(map[string]time.Time),
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/tnf/pkg/pacemaker/healthcheck_test.go` around lines 53 - 59, The test
setup in createTestHealthCheckWithMockStatus has an implicit coupling between
how the fake informer indexes objects (by PacemakerCluster.Name) and how
getPacemakerStatus fetches them (by PacemakerClusterResourceName constant). Make
this contract explicit by setting mockStatus.Name to match the expected lookup
key before passing it to testutil.CreateFakeInformer(mockStatus), ensuring the
mock object can be found when getPacemakerStatus attempts to retrieve it by its
fixed resource name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants