Watchdog: don't re-assert stale value after reset#31039
Merged
Conversation
A watchdog re-write tick can be in flight (blocked on the mutex) while the value is switched to the reset value. Without re-checking the context after acquiring the lock, the tick re-asserts the old value after the reset was written, leaving the device in the previous state. Skip the write when the watchdog was cancelled while waiting for the lock.
Contributor
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The loop
for iter := range 200inTestWatchdogResetStopsInflightTickwill not compile in Go; it should be written as a standard indexed loop (e.g.for iter := 0; iter < 200; iter++ { ... }). - The test relies on very small fixed
time.Sleepdurations around the watchdog timeout, which can be flaky on slower or more contended CI runners; consider increasing the timeouts or using the injectedclockto control time deterministically.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The loop `for iter := range 200` in `TestWatchdogResetStopsInflightTick` will not compile in Go; it should be written as a standard indexed loop (e.g. `for iter := 0; iter < 200; iter++ { ... }`).
- The test relies on very small fixed `time.Sleep` durations around the watchdog timeout, which can be flaky on slower or more contended CI runners; consider increasing the timeouts or using the injected `clock` to control time deterministically.
## Individual Comments
### Comment 1
<location path="plugin/watchdog_test.go" line_range="192-198" />
<code_context>
+ return nil
+ }, []int{1})
+
+ require.NoError(t, set(2)) // non-reset: starts the watchdog
+ time.Sleep(3 * time.Millisecond)
+
+ require.NoError(t, set(1)) // reset: must stop the watchdog
+ time.Sleep(5 * time.Millisecond)
+
+ require.Falsef(t, stale.Load(), "iter %d: watchdog re-asserted old value after reset", iter)
+ }
+}
</code_context>
<issue_to_address>
**suggestion (testing):** Reduce reliance on real-time `Sleep` with tight millisecond windows to avoid flakiness
This test is probing a subtle race but relies on several very short `time.Sleep` calls and runs 200 times, which is likely to be flaky under load/CI or on slower machines. Since the plugin already takes a clock, please consider switching to a controllable/fake clock and explicit synchronization (e.g., advance the clock until the tick fires and coordinate via channels/WaitGroups) so the race reproduction is deterministic and the test remains robust across environments.
Suggested implementation:
```golang
func TestWatchdogResetStopsInflightTick(t *testing.T) {
// Switching to the reset value must stop the watchdog: an in-flight tick
// must not re-assert the old value after the reset value was written.
for iter := range 200 {
// Use a fake/controllable clock so we can deterministically advance time
// instead of relying on real-time sleeps, which can be flaky under load.
fakeClock := clock.NewFake()
p := &watchdogPlugin{
log: util.NewLogger("test"),
timeout: 2 * time.Millisecond,
clock: fakeClock,
}
```
```golang
require.NoError(t, set(2)) // non-reset: starts the watchdog
// Advance the fake clock past the timeout to trigger the watchdog tick.
// This replaces a real-time sleep and makes the test deterministic.
fakeClock.Advance(3 * time.Millisecond)
require.NoError(t, set(1)) // reset: must stop the watchdog
// Advance the fake clock again to ensure any pending tick would fire
// if it weren't properly cancelled by the reset.
fakeClock.Advance(5 * time.Millisecond)
```
I assumed the clock package exposes a controllable clock via `clock.NewFake()` with an `Advance(d time.Duration)` method, which is a common pattern. If your clock API differs, you will need to:
1. Replace `clock.NewFake()` with the appropriate constructor for your fake/controllable clock (e.g. `clock.NewMock()`, `clock.NewTestClock(time.Now())`, etc.).
2. Replace `fakeClock.Advance(...)` with the corresponding method on your fake clock to move time forward and trigger timers (e.g. `fakeClock.Step(...)`, `fakeClock.Add(...)`, etc.).
3. Ensure the fake clock type implements whatever interface `watchdogPlugin.clock` expects.
With those adjustments, the test will no longer rely on tight real-time `time.Sleep` windows and will instead deterministically drive the watchdog via the fake clock.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Member
Author
|
@CiNcH83 does this fix what you've observed before? |
Contributor
|
You mean this race-condition? I scheduled re-testing battery modes (especially |
Member
Author
|
I'll merge this anyway- the test errors without the fix. Seems this is a real bug. |
Contributor
|
Looks good. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The watchdog re-write tick acquires the mutex but never re-checks the context after obtaining the lock. When the value is switched to the reset value, an in-flight tick can wake up after the reset was written and re-assert the old value, leaving the device in the previous state (e.g. battery stuck in charge/hold after switching to normal). Skip the write when the watchdog was cancelled while waiting for the lock.
🤖 Generated with Claude Code