Skip to content

fix(hub): surface silently swallowed errors in hubsync hook and replication loop#114

Open
omergk28 wants to merge 1 commit into
ActiveMemory:mainfrom
omergk28:fix/100-hub-silent-error-suppression
Open

fix(hub): surface silently swallowed errors in hubsync hook and replication loop#114
omergk28 wants to merge 1 commit into
ActiveMemory:mainfrom
omergk28:fix/100-hub-silent-error-suppression

Conversation

@omergk28

Copy link
Copy Markdown
Contributor

Summary

Two hub code paths swallowed errors with no logging surface — the session-start hubsync hook (internal/cli/system/core/hubsync) and the cluster replication loop (internal/hub/replicate.go). Operators couldn't tell a healthy quiet sync from one that never reached the network.

This PR wires every silent return through the established warn sink with format constants in config/warn, and un-conflates the sync-error check from the empty-result check (only the error warns now). Both functions keep their signatures and never-block contracts — logging is the only behavior change.

Closes #100.

Already landed

Two of #100's sub-items were already in main before this PR (the conn.Close defer warn and the store.Append warn + keep-consuming, via CloseHubClient/HubReplicateAppend). This PR covers the rest and adds the regression tests the issue asked for — including pinning the already-landed keep-consuming behavior, which previously had no coverage.

One deliberate deviation from the issue's proposed code

The issue's sketch warns on every RecvMsg error — but io.EOF is the normal end of every sync stream, so that would emit a warning once per ReplicateInterval on perfectly healthy replication. The receive path here stays silent for eof() (the hub package's own end-of-stream helper, same as the other two RecvMsg sites) and for caller shutdown (ctx.Err() != nil), and warns on everything else. TestReplicateOnce_CleanReplicationDoesNotWarn pins this.

Tests (mapped to the issue's Tests Required)

Issue ask Test
Warns on load error TestSync_WarnsOnLoadError
Warns on dial error TestSync_WarnsOnDialError — empirically, the only eager grpc.NewClient failure is a control character failing URL parse; every other malformed target (://invalid…) defers to first use. TestSync_WarnsOnPullError covers the lazy path (closed port → Sync RPC fails)
No warn on empty result (the un-conflation) TestSync_NoWarnOnEmptyResult — real in-process hub, zero entries, asserts an empty warn buffer
Replication warns on dial error TestReplicateOnce_WarnsOnDialError + TestReplicateOnce_WarnsOnTransportError
Keeps consuming after append error TestReplicateOnce_KeepsConsumingAfterAppendError — read-only follower dir, asserts exactly 2 append warnings (root/Windows skip guards per the rc/require_test.go precedent)

All use the existing warn.SetSink seam; the hubsync package previously had zero tests.

Verification

  • go test -race -count=2 on both packages — clean.
  • Mutation checks: three deliberate regressions each fail their pinning test — (1) re-conflating syncErr with the empty check fails TestSync_NoWarnOnEmptyResult; (2) removing the EOF guard fails TestReplicateOnce_CleanReplicationDoesNotWarn; (3) dropping the dial warn fails TestSync_WarnsOnDialError.
  • make audit — all checks passed.

🤖 Generated with Claude Code

…cation loop

The session-start hubsync hook returned "" on config-load, dial,
sync-RPC, and write failures with no trace, and conflated a real
sync error with the genuine "nothing new" result. The follower
replication loop returned silently on dial, stream-open, send, and
close-send failures and on every receive error. Operators could not
distinguish a healthy quiet sync from one that never reached the
network.

Wire every silent return through the warn sink with format
constants in config/warn. Behavior is otherwise unchanged: both
functions keep their signatures and never-block contracts.

- Sync now warns per failure site and stays silent only for a
  genuine zero-entry result (the un-conflation from ActiveMemory#100).
- replicateOnce warns per failure site; the receive path stays
  silent for the eof() end-of-stream (the one deliberate deviation
  from the issue's proposed code, which would have warned once per
  replication cycle) and for caller shutdown.
- Two ActiveMemory#100 sub-items (close-defer warn, append warn + keep
  consuming) had already landed; this covers the rest and adds the
  regression tests the issue asked for, including pinning
  keep-consuming-after-append-error and clean-EOF-no-warn.

Closes ActiveMemory#100.

Spec: specs/fix-hub-silent-error-suppression.md
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Omer Kocaoglu <omergk28@gmail.com>
@omergk28 omergk28 requested a review from josealekhine as a code owner June 12, 2026 01:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Hub: silent error suppression in hubsync hook + replication loop

1 participant