Context
peat-node#91 landed a reconnect watchdog on SidecarNode (PR #99) to recover peer sessions after iroh's QUIC idle timeout fires during a blackout. The justification at the time was that MeshSyncTransport — the peat-mesh transport variant peat-node consumes — does not own a ReconnectionManager, whereas peat-mesh's other transport, IrohMeshTransport, does.
QA review on #99 flagged this as architectural debt to track:
The PR description notes that peat-mesh already owns a ReconnectionManager on IrohMeshTransport, but MeshSyncTransport (the variant peat-node consumes) does not — and uses that as the rationale to build a parallel watchdog here in peat-node. The justification ("the operator-intent registry is a peat-node concept") is defensible, but the resulting shape is two transport-tier reconnect mechanisms living in two repos for two transports that share a substrate.
The actual fork
Three plausible long-term shapes, only one of which is the current state:
1. Status quo (current): peat-node owns the watchdog on top of MeshSyncTransport. The operator-intent registry (which peers should be auto-reconnected vs. left dropped) is a peat-node concept that lives where the gRPC ConnectPeer / DisconnectPeer calls land.
2. Push the reconnect into the transport: MeshSyncTransport grows a ReconnectionManager parallel to IrohMeshTransport's. peat-node passes the operator-intent registry into the transport via register_for_reconnect(peer_id, addresses, relay) / unregister(peer_id). The transport handles the watchdog timing and dial retry; peat-node just declares intent.
3. Unify both transports under a common reconnect abstraction in peat-mesh: factor out the registry + watchdog into a ReconnectionManager trait or struct that both IrohMeshTransport and MeshSyncTransport use. Removes the per-transport divergence entirely; consumers that want auto-reconnect register peers; consumers that don't (e.g. one-shot CLI tools) skip it.
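To make the second and third shapes concrete, here is a minimal Rust sketch of what a transport-owned (or shared) registry could look like. It is an illustration under stated assumptions, not peat-mesh's actual API: only the `register_for_reconnect` / `unregister` names come from this issue. `ReconnectRegistry`, `ReconnectTarget`, the string-typed peer IDs and addresses, the `due` / `record_failure` / `record_success` helpers, and the exponential backoff policy are all hypothetical. The dial itself is deliberately left out; the sketch only tracks intent and retry timing.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical peer identifier; a real transport would use its own node-id type.
pub type PeerId = String;

/// What the consumer asked to keep connected (illustrative shape only).
#[derive(Clone, Debug, PartialEq)]
pub struct ReconnectTarget {
    pub addresses: Vec<String>,
    pub relay: Option<String>,
}

struct Entry {
    target: ReconnectTarget,
    attempts: u32,
    next_attempt: Instant,
}

/// Operator-intent registry plus retry schedule. Assumed design, not existing code.
pub struct ReconnectRegistry {
    entries: HashMap<PeerId, Entry>,
    base_delay: Duration,
    max_delay: Duration,
}

impl ReconnectRegistry {
    pub fn new(base_delay: Duration, max_delay: Duration) -> Self {
        Self { entries: HashMap::new(), base_delay, max_delay }
    }

    /// Declare intent: this peer should be redialed whenever it is found disconnected.
    pub fn register_for_reconnect(
        &mut self,
        peer_id: PeerId,
        addresses: Vec<String>,
        relay: Option<String>,
        now: Instant,
    ) {
        let target = ReconnectTarget { addresses, relay };
        self.entries.insert(peer_id, Entry { target, attempts: 0, next_attempt: now });
    }

    /// Withdraw intent (e.g. on an explicit disconnect). Returns true if the peer was registered.
    pub fn unregister(&mut self, peer_id: &str) -> bool {
        self.entries.remove(peer_id).is_some()
    }

    /// Peers that are registered, currently disconnected, and due for a retry.
    /// A watchdog tick would call this and hand the results to the transport's dial path.
    pub fn due(
        &self,
        now: Instant,
        is_connected: impl Fn(&str) -> bool,
    ) -> Vec<(PeerId, ReconnectTarget)> {
        self.entries
            .iter()
            .filter(|(id, e)| e.next_attempt <= now && !is_connected(id.as_str()))
            .map(|(id, e)| (id.clone(), e.target.clone()))
            .collect()
    }

    /// After a failed dial: back off exponentially, capped at `max_delay`.
    pub fn record_failure(&mut self, peer_id: &str, now: Instant) {
        if let Some(e) = self.entries.get_mut(peer_id) {
            e.attempts = e.attempts.saturating_add(1);
            let factor = 2u32.saturating_pow(e.attempts - 1);
            let delay = self.base_delay.saturating_mul(factor).min(self.max_delay);
            e.next_attempt = now + delay;
        }
    }

    /// After a successful dial: reset the backoff so the next drop retries promptly.
    pub fn record_success(&mut self, peer_id: &str, now: Instant) {
        if let Some(e) = self.entries.get_mut(peer_id) {
            e.attempts = 0;
            e.next_attempt = now;
        }
    }
}
```

In the status quo, the equivalent state lives in peat-node next to the gRPC handlers. Under shape 2 a structure like this would sit inside MeshSyncTransport, and under shape 3 it would be the shared piece both transports embed. In all three cases the consumer-facing surface reduces to the register/unregister pair, which is why the question is mostly about where the watchdog lives rather than what it does.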
Trade-offs
- (1) The status quo keeps consumer policy and transport mechanism cleanly separated, but it means every consumer that wants auto-reconnect re-implements the watchdog. peat-node has it; future consumers (mobile-app plugins, etc.) would either crib it or do without.
- (2) would put the reconnect logic next to the connection lifecycle it manages. That's a clean fit, but it gives the transport API two distinct modes (registered vs. ad-hoc connect), which couples MeshSyncTransport's surface to a policy peat-mesh-node may not need.
- (3) is the most "correct" shape architecturally but requires a peat-mesh-side ADR + refactor that touches both transport implementations. Highest design cost, highest long-term coherence payoff.
Acceptance for closing this issue
This issue is not for an immediate refactor. It's for filing the design tension so it surfaces as an ADR or design discussion before the next consumer (or the next reconnect-adjacent bug) re-litigates it from scratch.
When this is addressed, expect either:
- An ADR (in peat/docs/adr/) capturing the chosen shape with rationale, or
- A documented decision to keep the current per-consumer pattern, with a comment in peat-node's watchdog citing this issue as the reason it lives here.
References
- IrohMeshTransport::ReconnectionManager (for comparison shape)