-
Notifications
You must be signed in to change notification settings - Fork 115
🌱 optimize the requeue timing for lease controller #1236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🌱 optimize the requeue timing for lease controller #1236
Conversation
|
/hold |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1236 +/- ##
==========================================
+ Coverage 61.30% 62.15% +0.84%
==========================================
Files 209 209
Lines 20943 16968 -3975
==========================================
- Hits 12840 10547 -2293
+ Misses 6986 5304 -1682
Partials 1117 1117
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Signed-off-by: Yang Le <[email protected]>
9de9989 to
9a3d4eb
Compare
WalkthroughThe lease controller's requeue logic is modified from unconditional requeue-after-grace-period to conditional requeue: expired leases trigger status update and requeue after grace period, while fresh leases requeue after calculated time-until-expiry. Tests validate requeue timing across lease expiry, grace period, and renewal scenarios. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20–25 minutes
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/hold cancel |
qiujian16
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
/lgtm
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: elgnay, qiujian16 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/registration/hub/lease/controller.go (1)
129-140: Excellent optimization of requeue timing!The conditional requeue logic correctly handles both scenarios:
- Expired leases requeue after grace period, allowing immediate recovery detection via watch events
- Fresh leases requeue exactly at expiry time for immediate detection
This optimization reduces unnecessary reconciliations while maintaining responsiveness.
Consider adding a defensive check to ensure
timeUntilExpiryis positive, though the conditional makes this highly unlikely:} else { // Lease is fresh, requeue exactly when it will expire to detect expiration immediately timeUntilExpiry := observedLease.Spec.RenewTime.Add(gracePeriod).Sub(now) + if timeUntilExpiry <= 0 { + timeUntilExpiry = time.Second // Minimum requeue delay + } syncCtx.Queue().AddAfter(clusterName, timeUntilExpiry) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
pkg/registration/hub/lease/controller.go(1 hunks)pkg/registration/hub/lease/controller_test.go(2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-23T02:18:08.604Z
Learnt from: xuezhaojun
Repo: open-cluster-management-io/ocm PR: 1191
File: pkg/registration/spoke/registration/hub_timeout_controller.go:60-61
Timestamp: 2025-09-23T02:18:08.604Z
Learning: In Kubernetes coordination/v1 API, lease.Spec.RenewTime is a *metav1.MicroTime (pointer), not a direct time.Time. While .Time is indeed a time.Time field, accessing .Time on a nil pointer will panic, so nil checks are necessary when dereferencing lease.Spec.RenewTime.Time.
Applied to files:
pkg/registration/hub/lease/controller.gopkg/registration/hub/lease/controller_test.go
📚 Learning: 2025-07-01T05:55:56.502Z
Learnt from: skeeey
Repo: open-cluster-management-io/ocm PR: 1058
File: pkg/server/services/lease/lease.go:98-121
Timestamp: 2025-07-01T05:55:56.502Z
Learning: In the Open Cluster Management lease service, deletion handling is not required. The LeaseService intentionally omits DeleteFunc in EventHandlerFuncs as lease deletion events are not part of the service's expected functionality.
Applied to files:
pkg/registration/hub/lease/controller.go
🧬 Code graph analysis (1)
pkg/registration/hub/lease/controller_test.go (6)
vendor/open-cluster-management.io/api/cluster/v1/types.go (2)
ManagedCluster(34-44)ManagedClusterSpec(48-77)pkg/registration/helpers/testing/testinghelpers.go (3)
NewAvailableManagedCluster(70-81)NewManagedClusterLease(180-190)TestManagedClusterName(38-38)pkg/registration/hub/lease/controller.go (1)
LeaseDurationSeconds(34-34)pkg/common/helpers/event_recorder.go (1)
NewEventRecorder(12-22)vendor/open-cluster-management.io/addon-framework/pkg/addonmanager/addontesting/helpers.go (1)
NewTestingEventRecorder(318-320)vendor/open-cluster-management.io/sdk-go/pkg/patcher/patcher.go (1)
NewPatcher(51-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: integration
- GitHub Check: e2e-singleton
- GitHub Check: e2e-hosted
- GitHub Check: e2e
🔇 Additional comments (2)
pkg/registration/hub/lease/controller_test.go (2)
209-237: Well-designed test infrastructure for timing validation!The test scaffolding effectively captures requeue timing:
spyQueuewraps the real queue to recordAddAftercalls without disrupting queue behaviortestSyncContextprovides a minimal, testable implementation offactory.SyncContextnewManagedClusterWithLeaseDurationhelper simplifies test case setupThis infrastructure enables precise validation of the optimized requeue logic.
239-332: Comprehensive test coverage with correct timing calculations!The test cases thoroughly validate the requeue optimization:
- Expired lease case: Verifies gracePeriod (300s) requeue for recovery detection
- Near-expiry lease case: Verifies timeUntilExpiry (60s) requeue for immediate detection
- Fresh lease case: Verifies precise timeUntilExpiry (270s) requeue
The math is correct for all scenarios, and the ±5 second tolerance ranges appropriately handle timing variability. Using the package-level
nowvariable (line 30) ensures consistent timestamps across tests.
|
|
||
| now := time.Now() | ||
| if !now.Before(observedLease.Spec.RenewTime.Add(gracePeriod)) { | ||
| leaseExpired := !now.Before(observedLease.Spec.RenewTime.Add(gracePeriod)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Add nil check for RenewTime to prevent potential panic.
Based on learnings, observedLease.Spec.RenewTime is a *metav1.MicroTime pointer that can be nil. Dereferencing it without a nil check will cause a panic if the lease comes from an external source without RenewTime set.
Apply this diff to add a defensive nil check:
now := time.Now()
+ if observedLease.Spec.RenewTime == nil {
+ // Lease has no RenewTime, treat as expired and update status
+ if err := c.updateClusterStatus(ctx, cluster); err != nil {
+ return err
+ }
+ syncCtx.Queue().AddAfter(clusterName, gracePeriod)
+ return nil
+ }
+
leaseExpired := !now.Before(observedLease.Spec.RenewTime.Add(gracePeriod))Run the following script to verify if there are other locations in the codebase that safely handle nil RenewTime:
Based on learnings
🏁 Script executed:
#!/bin/bash
# Description: Search for RenewTime usage patterns to verify nil handling practices
# Search for RenewTime access patterns in lease-related code
rg -n -C3 'RenewTime' --type=go -g '*lease*.go' -g '*registration*.go'Length of output: 20258
Now let me verify the actual code in the reviewed file:
🏁 Script executed:
#!/bin/bash
# Inspect the reviewed file to confirm the exact code at line 127
fd 'controller.go' -g '*hub/lease*' --exec cat {} \;Length of output: 98
🏁 Script executed:
#!/bin/bash
# Find the controller file in hub/lease directory
find . -path '*hub/lease/controller.go' -type f | head -1 | xargs -I {} cat {}Length of output: 6293
Add nil check for RenewTime to prevent potential panic.
The code at line 127 (and line 134) dereferences observedLease.Spec.RenewTime without checking for nil. Since RenewTime is an optional field in the Kubernetes Lease API (*metav1.MicroTime pointer), it can be nil when leases are retrieved from external sources. The Kubernetes client library itself explicitly checks for nil before dereferencing (see vendor/k8s.io/client-go/tools/leaderelection/resourcelock/leaselock.go:122-124). This will panic at runtime if RenewTime is nil.
Apply this defensive nil check:
now := time.Now()
+ if observedLease.Spec.RenewTime == nil {
+ // Lease has no RenewTime, treat as expired and update status
+ if err := c.updateClusterStatus(ctx, cluster); err != nil {
+ return err
+ }
+ syncCtx.Queue().AddAfter(clusterName, gracePeriod)
+ return nil
+ }
+
leaseExpired := !now.Before(observedLease.Spec.RenewTime.Add(gracePeriod))Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In pkg/registration/hub/lease/controller.go around lines 127 and 134, the code
dereferences observedLease.Spec.RenewTime without a nil check which can panic
because RenewTime is a *metav1.MicroTime; add a defensive nil check before any
dereference. Concretely, if observedLease.Spec.RenewTime == nil treat the lease
as expired (or otherwise handle as "no renew time") and set leaseExpired
accordingly, and for the later use at line 134 avoid dereferencing RenewTime
when nil (use a safe branch or a zero time fallback) so no panic occurs.
1384645
into
open-cluster-management-io:main
…gement-io#1236) Signed-off-by: Yang Le <[email protected]>
…gement-io#1236) (#515) Signed-off-by: Yang Le <[email protected]>
Summary
Related issue(s)
Fixes #
Summary by CodeRabbit
Bug Fixes
Tests