feat(sdks/go): stream reconnect and lifecycle for listeners #4257
igor-kupczynski wants to merge 8 commits into
Stop unbounded background reconnect loops on NoProgress errors, pass caller context into initial subscribe/listen setup, and make transient-failure reconnect tests deterministic via a stream sleep test hook.
Share listener reconnect state across stream consumers and split worker action listening into smaller lifecycle helpers.
Move duplicated workflow and durable event listen loops into the shared reconnect helper and cover the behavior once at the policy layer.
Description
Implements PR #2 for issue #4228: stream reconnect and lifecycle for the legacy Go SDK (`pkg/client`). This PR stacks on #4240 (`rubylabs-audit-go-sdk`), which added REST read retries and shared backoff primitives; here we extend that foundation to gRPC stream listeners.

Long-lived gRPC subscriptions (workflow runs, durable events, worker actions, metadata streams) now use explicit app-level reconnect with full-jitter backoff instead of relying on the unary gRPC interceptor or ad-hoc hot loops. Sync paths (initial subscribe / `AddWorkflowRun`) use bounded reconnect; background listen loops reconnect without a bound while the listener remains open. Permanent errors and consecutive no-progress failures still surface to callers.

Dependencies: #4240 (`rubylabs-audit-go-sdk` base branch)

Fixes #4228
Type of change
What's Changed
- `pkg/client/retry/stream.go`: stream backoff (1s base, 30s cap, full jitter), cancellable sleep helpers, and `ClassifyStreamError`
- … (`retrySubscribeSync`/`retrySubscribeBackground`, mirrored for durable events) with singleflight coalescing
- … `Close()`
- … `Recv` failures; permanent errors still surface on `errCh`; V2→V1 fallback preserved
- … `Listen`/`ListenV2`; add app-level reconnect for `StreamByAdditionalMetadata` (including EOF)
- … `retry.Sleep` during reconnect backoff
- `Workflow.Result()`: fail fast when `AddWorkflowRun` fails instead of blocking indefinitely
- Stop unbounded background reconnect loops on `NoProgress` errors, pass caller context into initial subscribe/listen setup, and make transient-failure reconnect tests deterministic via a stream sleep test hook

Checklist
Changes have been:
Testing
`go test ./pkg/client/retry ./pkg/client -count=1`

- … (`pkg/client/retry/stream_test.go`)
- … `grpc_retry.Disable()` verification
- … `AddSignal`, permanent error stop, reconnect during retry backoff
- … `Recv` failures, V2→V1 fallback, V1 `Unimplemented` terminal, ctx cancel
- `StreamByAdditionalMetadata` recv reconnect (including EOF)
- `Workflow.Result()` fail-fast when subscribe fails

Related
- #4240 (`rubylabs-audit-go-sdk`): retry foundation for REST reads

Remaining risks
- `Workflow.Result()` has no caller ctx; it relies on bounded sync reconnect only

🤖 AI Disclosure
I acknowledge that an LLM was used in the creation of this Pull Request, in accordance with Hatchet's AI_POLICY.md.
Details: Cursor (Claude) was used for implementation, test coverage, review follow-ups, and PR description drafting across the PR #2 scope of Go SDK Retries (#4228).