Skip to content

Watchdog: don't re-assert stale value after reset#31039

Merged
andig merged 1 commit into
masterfrom
fix/watchdog-reset-race
Jun 19, 2026
Merged

Watchdog: don't re-assert stale value after reset#31039
andig merged 1 commit into
masterfrom
fix/watchdog-reset-race

Conversation

@andig

@andig andig commented Jun 19, 2026

Copy link
Copy Markdown
Member

The watchdog re-write tick acquires the mutex but never re-checks the context after obtaining the lock. When the value is switched to the reset value, an in-flight tick can wake up after the reset was written and re-assert the old value, leaving the device in the previous state (e.g. battery stuck in charge/hold after switching to normal). Skip the write when the watchdog was cancelled while waiting for the lock.

🤖 Generated with Claude Code

A watchdog re-write tick can be in flight (blocked on the mutex) while the
value is switched to the reset value. Without re-checking the context after
acquiring the lock, the tick re-asserts the old value after the reset was
written, leaving the device in the previous state. Skip the write when the
watchdog was cancelled while waiting for the lock.
@andig andig added the bug Something isn't working label Jun 19, 2026

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The loop for iter := range 200 in TestWatchdogResetStopsInflightTick will not compile in Go; it should be written as a standard indexed loop (e.g. for iter := 0; iter < 200; iter++ { ... }).
  • The test relies on very small fixed time.Sleep durations around the watchdog timeout, which can be flaky on slower or more contended CI runners; consider increasing the timeouts or using the injected clock to control time deterministically.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The loop `for iter := range 200` in `TestWatchdogResetStopsInflightTick` will not compile in Go; it should be written as a standard indexed loop (e.g. `for iter := 0; iter < 200; iter++ { ... }`).
- The test relies on very small fixed `time.Sleep` durations around the watchdog timeout, which can be flaky on slower or more contended CI runners; consider increasing the timeouts or using the injected `clock` to control time deterministically.

## Individual Comments

### Comment 1
<location path="plugin/watchdog_test.go" line_range="192-198" />
<code_context>
+			return nil
+		}, []int{1})
+
+		require.NoError(t, set(2)) // non-reset: starts the watchdog
+		time.Sleep(3 * time.Millisecond)
+
+		require.NoError(t, set(1)) // reset: must stop the watchdog
+		time.Sleep(5 * time.Millisecond)
+
+		require.Falsef(t, stale.Load(), "iter %d: watchdog re-asserted old value after reset", iter)
+	}
+}
</code_context>
<issue_to_address>
**suggestion (testing):** Reduce reliance on real-time `Sleep` with tight millisecond windows to avoid flakiness

This test is probing a subtle race but relies on several very short `time.Sleep` calls and runs 200 times, which is likely to be flaky under load/CI or on slower machines. Since the plugin already takes a clock, please consider switching to a controllable/fake clock and explicit synchronization (e.g., advance the clock until the tick fires and coordinate via channels/WaitGroups) so the race reproduction is deterministic and the test remains robust across environments.

Suggested implementation:

```golang
func TestWatchdogResetStopsInflightTick(t *testing.T) {
	// Switching to the reset value must stop the watchdog: an in-flight tick
	// must not re-assert the old value after the reset value was written.
	for iter := range 200 {
		// Use a fake/controllable clock so we can deterministically advance time
		// instead of relying on real-time sleeps, which can be flaky under load.
		fakeClock := clock.NewFake()

		p := &watchdogPlugin{
			log:     util.NewLogger("test"),
			timeout: 2 * time.Millisecond,
			clock:   fakeClock,
		}

```

```golang
		require.NoError(t, set(2)) // non-reset: starts the watchdog

		// Advance the fake clock past the timeout to trigger the watchdog tick.
		// This replaces a real-time sleep and makes the test deterministic.
		fakeClock.Advance(3 * time.Millisecond)

		require.NoError(t, set(1)) // reset: must stop the watchdog

		// Advance the fake clock again to ensure any pending tick would fire
		// if it weren't properly cancelled by the reset.
		fakeClock.Advance(5 * time.Millisecond)

```

I assumed the clock package exposes a controllable clock via `clock.NewFake()` with an `Advance(d time.Duration)` method, which is a common pattern. If your clock API differs, you will need to:

1. Replace `clock.NewFake()` with the appropriate constructor for your fake/controllable clock (e.g. `clock.NewMock()`, `clock.NewTestClock(time.Now())`, etc.).
2. Replace `fakeClock.Advance(...)` with the corresponding method on your fake clock to move time forward and trigger timers (e.g. `fakeClock.Step(...)`, `fakeClock.Add(...)`, etc.).
3. Ensure the fake clock type implements whatever interface `watchdogPlugin.clock` expects.

With those adjustments, the test will no longer rely on tight real-time `time.Sleep` windows and will instead deterministically drive the watchdog via the fake clock.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread plugin/watchdog_test.go
@andig

andig commented Jun 19, 2026

Copy link
Copy Markdown
Member Author

@CiNcH83 does this fix what you've observed before?

@CiNcH83

CiNcH83 commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

You mean this race-condition? I scheduled re-testing battery modes (especially holdcharge) for Sunday or Monday. So please bear with me.

@andig andig merged commit 63b7d96 into master Jun 19, 2026
27 checks passed
@andig andig deleted the fix/watchdog-reset-race branch June 19, 2026 18:49
@andig

andig commented Jun 19, 2026

Copy link
Copy Markdown
Member Author

I'll merge this anyway- the test errors without the fix. Seems this is a real bug.

@CiNcH83 CiNcH83 mentioned this pull request Jun 20, 2026
17 tasks
@CiNcH83

CiNcH83 commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Looks good. charge for Huawei is now properly being stopped without the HW watchdog having to take care of it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants