Skip to content

Conversation

@elgnay
Copy link
Contributor

@elgnay elgnay commented Oct 31, 2025

Summary

Related issue(s)

Fixes #

Summary by CodeRabbit

  • Bug Fixes

    • Improved lease renewal timing: cluster status now syncs based on actual lease expiration state, ensuring timely synchronization for fresh leases and proper status updates for expired ones.
  • Tests

    • Added comprehensive test coverage for lease renewal timing verification.

@openshift-ci openshift-ci bot requested review from ldpliu and xuezhaojun October 31, 2025 03:29
@elgnay
Copy link
Contributor Author

elgnay commented Oct 31, 2025

/hold

@codecov
Copy link

codecov bot commented Oct 31, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 62.15%. Comparing base (5b1beb6) to head (9a3d4eb).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1236      +/-   ##
==========================================
+ Coverage   61.30%   62.15%   +0.84%     
==========================================
  Files         209      209              
  Lines       20943    16968    -3975     
==========================================
- Hits        12840    10547    -2293     
+ Misses       6986     5304    -1682     
  Partials     1117     1117              
Flag Coverage Δ
unit 62.15% <100.00%> (+0.84%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@elgnay elgnay force-pushed the fix-offline-delay branch from 9de9989 to 9a3d4eb Compare October 31, 2025 03:40
@coderabbitai
Copy link

coderabbitai bot commented Oct 31, 2025

Walkthrough

The lease controller's requeue logic is modified from unconditional requeue-after-grace-period to conditional requeue: expired leases trigger status update and requeue after grace period, while fresh leases requeue after calculated time-until-expiry. Tests validate requeue timing across lease expiry, grace period, and renewal scenarios.

Changes

Cohort / File(s) Summary
Lease Controller Requeue Logic
pkg/registration/hub/lease/controller.go, pkg/registration/hub/lease/controller_test.go
Modified requeue behavior from unconditional grace-period-based to conditional logic: expired leases update cluster status to unknown and requeue after grace period; fresh leases requeue after exact time-until-expiry. Added test scaffolding with spyQueue wrapper, testSyncContext, and TestRequeueTime cases to validate requeue delays against expected min/max ranges across lease lifecycle scenarios.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

  • Controller logic changes introduce conditional branching with time calculation (RenewTime + gracePeriod - now); requires verification of edge cases and correctness of expiry detection
  • Test additions include custom queue wrapping, helper functions, and multiple per-case test scenarios that need validation against controller behavior
  • Changes are focused within a single component with clear scope, reducing heterogeneity

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request description is incomplete and consists only of the template structure without substantive content. The Summary section is entirely empty, providing no explanation of what the pull request does or why these changes are needed. The Related issue(s) section contains only the placeholder text "Fixes #" with no issue number filled in. Both of these are required sections according to the repository template, and their absence leaves critical information about the PR's purpose and context missing. The author should fill in the Summary section with a clear explanation of the changes and their rationale, such as describing the optimization of conditional requeue logic based on lease expiry state. Additionally, the Related issue(s) section should be completed by either providing a specific issue number that this PR addresses (e.g., "Fixes #1234") or removing the line if no related issue exists. The description should match the level of detail and structure defined in the repository's PR template.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The pull request title "optimize the requeue timing for lease controller" accurately reflects the main change in the changeset. The modifications in the controller replace an unconditional requeue mechanism with a conditional one that requeues based on lease expiry state, which is precisely an optimization of requeue timing. The title is concise, clear, and specific enough that it meaningfully conveys the primary change to someone reviewing the repository history. The emoji usage (🌱) correctly follows the template guidelines for miscellaneous/other changes.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@elgnay
Copy link
Contributor Author

elgnay commented Oct 31, 2025

/hold cancel

Copy link
Member

@qiujian16 qiujian16 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
/lgtm

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 31, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elgnay, qiujian16

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/registration/hub/lease/controller.go (1)

129-140: Excellent optimization of requeue timing!

The conditional requeue logic correctly handles both scenarios:

  • Expired leases requeue after grace period, allowing immediate recovery detection via watch events
  • Fresh leases requeue exactly at expiry time for immediate detection

This optimization reduces unnecessary reconciliations while maintaining responsiveness.

Consider adding a defensive check to ensure timeUntilExpiry is positive, though the conditional makes this highly unlikely:

 	} else {
 		// Lease is fresh, requeue exactly when it will expire to detect expiration immediately
 		timeUntilExpiry := observedLease.Spec.RenewTime.Add(gracePeriod).Sub(now)
+		if timeUntilExpiry <= 0 {
+			timeUntilExpiry = time.Second // Minimum requeue delay
+		}
 		syncCtx.Queue().AddAfter(clusterName, timeUntilExpiry)
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3722a00 and 9a3d4eb.

📒 Files selected for processing (2)
  • pkg/registration/hub/lease/controller.go (1 hunks)
  • pkg/registration/hub/lease/controller_test.go (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-23T02:18:08.604Z
Learnt from: xuezhaojun
Repo: open-cluster-management-io/ocm PR: 1191
File: pkg/registration/spoke/registration/hub_timeout_controller.go:60-61
Timestamp: 2025-09-23T02:18:08.604Z
Learning: In Kubernetes coordination/v1 API, lease.Spec.RenewTime is a *metav1.MicroTime (pointer), not a direct time.Time. While .Time is indeed a time.Time field, accessing .Time on a nil pointer will panic, so nil checks are necessary when dereferencing lease.Spec.RenewTime.Time.

Applied to files:

  • pkg/registration/hub/lease/controller.go
  • pkg/registration/hub/lease/controller_test.go
📚 Learning: 2025-07-01T05:55:56.502Z
Learnt from: skeeey
Repo: open-cluster-management-io/ocm PR: 1058
File: pkg/server/services/lease/lease.go:98-121
Timestamp: 2025-07-01T05:55:56.502Z
Learning: In the Open Cluster Management lease service, deletion handling is not required. The LeaseService intentionally omits DeleteFunc in EventHandlerFuncs as lease deletion events are not part of the service's expected functionality.

Applied to files:

  • pkg/registration/hub/lease/controller.go
🧬 Code graph analysis (1)
pkg/registration/hub/lease/controller_test.go (6)
vendor/open-cluster-management.io/api/cluster/v1/types.go (2)
  • ManagedCluster (34-44)
  • ManagedClusterSpec (48-77)
pkg/registration/helpers/testing/testinghelpers.go (3)
  • NewAvailableManagedCluster (70-81)
  • NewManagedClusterLease (180-190)
  • TestManagedClusterName (38-38)
pkg/registration/hub/lease/controller.go (1)
  • LeaseDurationSeconds (34-34)
pkg/common/helpers/event_recorder.go (1)
  • NewEventRecorder (12-22)
vendor/open-cluster-management.io/addon-framework/pkg/addonmanager/addontesting/helpers.go (1)
  • NewTestingEventRecorder (318-320)
vendor/open-cluster-management.io/sdk-go/pkg/patcher/patcher.go (1)
  • NewPatcher (51-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: integration
  • GitHub Check: e2e-singleton
  • GitHub Check: e2e-hosted
  • GitHub Check: e2e
🔇 Additional comments (2)
pkg/registration/hub/lease/controller_test.go (2)

209-237: Well-designed test infrastructure for timing validation!

The test scaffolding effectively captures requeue timing:

  • spyQueue wraps the real queue to record AddAfter calls without disrupting queue behavior
  • testSyncContext provides a minimal, testable implementation of factory.SyncContext
  • newManagedClusterWithLeaseDuration helper simplifies test case setup

This infrastructure enables precise validation of the optimized requeue logic.


239-332: Comprehensive test coverage with correct timing calculations!

The test cases thoroughly validate the requeue optimization:

  1. Expired lease case: Verifies gracePeriod (300s) requeue for recovery detection
  2. Near-expiry lease case: Verifies timeUntilExpiry (60s) requeue for immediate detection
  3. Fresh lease case: Verifies precise timeUntilExpiry (270s) requeue

The math is correct for all scenarios, and the ±5 second tolerance ranges appropriately handle timing variability. Using the package-level now variable (line 30) ensures consistent timestamps across tests.


now := time.Now()
if !now.Before(observedLease.Spec.RenewTime.Add(gracePeriod)) {
leaseExpired := !now.Before(observedLease.Spec.RenewTime.Add(gracePeriod))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Add nil check for RenewTime to prevent potential panic.

Based on learnings, observedLease.Spec.RenewTime is a *metav1.MicroTime pointer that can be nil. Dereferencing it without a nil check will cause a panic if the lease comes from an external source without RenewTime set.

Apply this diff to add a defensive nil check:

 	now := time.Now()
+	if observedLease.Spec.RenewTime == nil {
+		// Lease has no RenewTime, treat as expired and update status
+		if err := c.updateClusterStatus(ctx, cluster); err != nil {
+			return err
+		}
+		syncCtx.Queue().AddAfter(clusterName, gracePeriod)
+		return nil
+	}
+
 	leaseExpired := !now.Before(observedLease.Spec.RenewTime.Add(gracePeriod))

Run the following script to verify if there are other locations in the codebase that safely handle nil RenewTime:

Based on learnings


🏁 Script executed:

#!/bin/bash
# Description: Search for RenewTime usage patterns to verify nil handling practices

# Search for RenewTime access patterns in lease-related code
rg -n -C3 'RenewTime' --type=go -g '*lease*.go' -g '*registration*.go'

Length of output: 20258


Now let me verify the actual code in the reviewed file:


🏁 Script executed:

#!/bin/bash
# Inspect the reviewed file to confirm the exact code at line 127
fd 'controller.go' -g '*hub/lease*' --exec cat {} \;

Length of output: 98


🏁 Script executed:

#!/bin/bash
# Find the controller file in hub/lease directory
find . -path '*hub/lease/controller.go' -type f | head -1 | xargs -I {} cat {}

Length of output: 6293


Add nil check for RenewTime to prevent potential panic.

The code at line 127 (and line 134) dereferences observedLease.Spec.RenewTime without checking for nil. Since RenewTime is an optional field in the Kubernetes Lease API (*metav1.MicroTime pointer), it can be nil when leases are retrieved from external sources. The Kubernetes client library itself explicitly checks for nil before dereferencing (see vendor/k8s.io/client-go/tools/leaderelection/resourcelock/leaselock.go:122-124). This will panic at runtime if RenewTime is nil.

Apply this defensive nil check:

 	now := time.Now()
+	if observedLease.Spec.RenewTime == nil {
+		// Lease has no RenewTime, treat as expired and update status
+		if err := c.updateClusterStatus(ctx, cluster); err != nil {
+			return err
+		}
+		syncCtx.Queue().AddAfter(clusterName, gracePeriod)
+		return nil
+	}
+
 	leaseExpired := !now.Before(observedLease.Spec.RenewTime.Add(gracePeriod))

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In pkg/registration/hub/lease/controller.go around lines 127 and 134, the code
dereferences observedLease.Spec.RenewTime without a nil check which can panic
because RenewTime is a *metav1.MicroTime; add a defensive nil check before any
dereference. Concretely, if observedLease.Spec.RenewTime == nil treat the lease
as expired (or otherwise handle as "no renew time") and set leaseExpired
accordingly, and for the later use at line 134 avoid dereferencing RenewTime
when nil (use a safe branch or a zero time fallback) so no panic occurs.

@openshift-merge-bot openshift-merge-bot bot merged commit 1384645 into open-cluster-management-io:main Oct 31, 2025
18 checks passed
elgnay added a commit to elgnay/ocm that referenced this pull request Nov 5, 2025
openshift-merge-bot bot pushed a commit to stolostron/ocm that referenced this pull request Nov 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants