fix(hub): surface silently swallowed errors in hubsync hook and replication loop#114
Open
omergk28 wants to merge 1 commit into
Open
fix(hub): surface silently swallowed errors in hubsync hook and replication loop#114omergk28 wants to merge 1 commit into
omergk28 wants to merge 1 commit into
Conversation
…cation loop The session-start hubsync hook returned "" on config-load, dial, sync-RPC, and write failures with no trace, and conflated a real sync error with the genuine "nothing new" result. The follower replication loop returned silently on dial, stream-open, send, and close-send failures and on every receive error. Operators could not distinguish a healthy quiet sync from one that never reached the network. Wire every silent return through the warn sink with format constants in config/warn. Behavior is otherwise unchanged: both functions keep their signatures and never-block contracts. - Sync now warns per failure site and stays silent only for a genuine zero-entry result (the un-conflation from ActiveMemory#100). - replicateOnce warns per failure site; the receive path stays silent for the eof() end-of-stream (the one deliberate deviation from the issue's proposed code, which would have warned once per replication cycle) and for caller shutdown. - Two ActiveMemory#100 sub-items (close-defer warn, append warn + keep consuming) had already landed; this covers the rest and adds the regression tests the issue asked for, including pinning keep-consuming-after-append-error and clean-EOF-no-warn. Closes ActiveMemory#100. Spec: specs/fix-hub-silent-error-suppression.md Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Omer Kocaoglu <omergk28@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two hub code paths swallowed errors with no logging surface — the session-start hubsync hook (
internal/cli/system/core/hubsync) and the cluster replication loop (internal/hub/replicate.go). Operators couldn't tell a healthy quiet sync from one that never reached the network.This PR wires every silent return through the established
warnsink with format constants inconfig/warn, and un-conflates the sync-error check from the empty-result check (only the error warns now). Both functions keep their signatures and never-block contracts — logging is the only behavior change.Closes #100.
Already landed
Two of #100's sub-items were already in main before this PR (the
conn.Closedefer warn and thestore.Appendwarn + keep-consuming, viaCloseHubClient/HubReplicateAppend). This PR covers the rest and adds the regression tests the issue asked for — including pinning the already-landed keep-consuming behavior, which previously had no coverage.One deliberate deviation from the issue's proposed code
The issue's sketch warns on every
RecvMsgerror — butio.EOFis the normal end of every sync stream, so that would emit a warning once perReplicateIntervalon perfectly healthy replication. The receive path here stays silent foreof()(the hub package's own end-of-stream helper, same as the other two RecvMsg sites) and for caller shutdown (ctx.Err() != nil), and warns on everything else.TestReplicateOnce_CleanReplicationDoesNotWarnpins this.Tests (mapped to the issue's Tests Required)
TestSync_WarnsOnLoadErrorTestSync_WarnsOnDialError— empirically, the only eagergrpc.NewClientfailure is a control character failing URL parse; every other malformed target (://invalid…) defers to first use.TestSync_WarnsOnPullErrorcovers the lazy path (closed port → Sync RPC fails)TestSync_NoWarnOnEmptyResult— real in-process hub, zero entries, asserts an empty warn bufferTestReplicateOnce_WarnsOnDialError+TestReplicateOnce_WarnsOnTransportErrorTestReplicateOnce_KeepsConsumingAfterAppendError— read-only follower dir, asserts exactly 2 append warnings (root/Windows skip guards per therc/require_test.goprecedent)All use the existing
warn.SetSinkseam; the hubsync package previously had zero tests.Verification
go test -race -count=2on both packages — clean.syncErrwith the empty check failsTestSync_NoWarnOnEmptyResult; (2) removing the EOF guard failsTestReplicateOnce_CleanReplicationDoesNotWarn; (3) dropping the dial warn failsTestSync_WarnsOnDialError.make audit— all checks passed.🤖 Generated with Claude Code