[ARCH] consolidate transport-tier reconnect: peat-node watchdog + peat-mesh ReconnectionManager #100

@kitplummer

Context

peat-node#91 landed a reconnect watchdog on SidecarNode (PR #99) to recover peer sessions after iroh's QUIC idle timeout fires during a blackout. The justification at the time was that MeshSyncTransport — the peat-mesh transport variant peat-node consumes — does not own a ReconnectionManager, whereas peat-mesh's other transport, IrohMeshTransport, does.

QA review on #99 flagged this as architectural debt to track:

The PR description notes that peat-mesh already owns a ReconnectionManager on IrohMeshTransport, but MeshSyncTransport (the variant peat-node consumes) does not — and uses that as the rationale to build a parallel watchdog here in peat-node. The justification ("the operator-intent registry is a peat-node concept") is defensible, but the resulting shape is two transport-tier reconnect mechanisms living in two repos for two transports that share a substrate.

The actual fork

Three plausible long-term shapes, only one of which is the current state:

  1. Status quo (current): peat-node owns the watchdog on top of MeshSyncTransport. The operator-intent registry (which peers should be auto-reconnected vs. left dropped) is a peat-node concept that lives where the gRPC ConnectPeer / DisconnectPeer calls land.

  2. Push the reconnect into the transport: MeshSyncTransport grows a ReconnectionManager parallel to IrohMeshTransport's. peat-node passes the operator-intent registry into the transport via register_for_reconnect(peer_id, addresses, relay) / unregister(peer_id). The transport handles the watchdog timing and dial retry; peat-node just declares intent.

  3. Unify both transports under a common reconnect abstraction in peat-mesh: factor out the registry + watchdog into a ReconnectionManager trait or struct that both IrohMeshTransport and MeshSyncTransport use. Removes the per-transport divergence entirely; consumers that want auto-reconnect register peers; consumers that don't (e.g. one-shot CLI tools) skip it.
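For discussion, here is a minimal sketch of what the intent registry behind shape (2) might look like. Only the `register_for_reconnect(peer_id, addresses, relay)` / `unregister(peer_id)` call names come from this issue; the `PeerId` alias, the `ReconnectIntent` and `ReconnectRegistry` types, the string-typed addresses, and `peers_to_redial` are hypothetical stand-ins, not the peat-mesh or iroh API.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical peer identifier; the real type would come from peat-mesh / iroh.
type PeerId = String;

/// What a consumer declares about a peer it wants kept alive (assumed shape).
#[derive(Debug, Clone, PartialEq)]
struct ReconnectIntent {
    addresses: Vec<String>,
    relay: Option<String>,
}

/// Operator-intent registry: which peers should be auto-reconnected.
#[derive(Debug, Default)]
struct ReconnectRegistry {
    intents: HashMap<PeerId, ReconnectIntent>,
}

impl ReconnectRegistry {
    /// Declare that `peer_id` should be redialed whenever its session drops.
    /// Registering the same peer again replaces the earlier intent.
    fn register_for_reconnect(
        &mut self,
        peer_id: &str,
        addresses: Vec<String>,
        relay: Option<String>,
    ) {
        self.intents
            .insert(peer_id.to_string(), ReconnectIntent { addresses, relay });
    }

    /// Withdraw intent, e.g. after an explicit DisconnectPeer.
    /// Returns true if the peer had been registered.
    fn unregister(&mut self, peer_id: &str) -> bool {
        self.intents.remove(peer_id).is_some()
    }

    /// Peers that are registered but missing from `connected`.
    /// Sorted so the result is deterministic.
    fn peers_to_redial(&self, connected: &HashSet<PeerId>) -> Vec<PeerId> {
        let mut missing: Vec<PeerId> = self
            .intents
            .keys()
            .filter(|id| !connected.contains(*id))
            .cloned()
            .collect();
        missing.sort();
        missing
    }
}
```

In this sketch the consumer only declares intent. Whoever owns the watchdog timer (peat-node today, the transport under shape 2) would call `peers_to_redial` on each pass and dial the result.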

Trade-offs

  • (1) Current keeps consumer-policy and transport-mechanism cleanly separated, but means every consumer that wants auto-reconnect re-implements the watchdog. peat-node has it; future consumers (mobile-app plugins, etc.) would either crib it or do without.

  • (2) would put the reconnect logic next to the connection lifecycle it manages — that's a clean fit, but it gives the transport API two distinct modes (registered vs. ad-hoc connect), which couples MeshSyncTransport's surface to a policy peat-mesh-node may not need.

  • (3) is the most "correct" shape architecturally but requires a peat-mesh-side ADR + refactor that touches both transport implementations. Highest design cost, highest long-term coherence payoff.
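To make the cost of (3) more concrete, here is a rough sketch of a reconnect component shared across transports. Everything here is an assumption for illustration: the `Dial` trait, `SharedReconnectionManager`, `tick`, and `FakeTransport` are hypothetical names, and the existing `ReconnectionManager` on IrohMeshTransport may look quite different.

```rust
use std::collections::BTreeSet;

/// Hypothetical: the minimum a shared manager would need from any transport.
trait Dial {
    fn is_connected(&self, peer_id: &str) -> bool;
    fn dial(&mut self, peer_id: &str) -> Result<(), String>;
}

/// Registry plus watchdog pass, written once and reused by every transport.
#[derive(Default)]
struct SharedReconnectionManager {
    registered: BTreeSet<String>,
}

impl SharedReconnectionManager {
    fn register(&mut self, peer_id: &str) {
        self.registered.insert(peer_id.to_string());
    }

    fn unregister(&mut self, peer_id: &str) {
        self.registered.remove(peer_id);
    }

    /// One watchdog pass: redial each registered peer that is not connected.
    /// Returns the peers whose dial succeeded, in sorted order.
    fn tick<T: Dial>(&self, transport: &mut T) -> Vec<String> {
        let mut recovered = Vec::new();
        for peer in &self.registered {
            if !transport.is_connected(peer) && transport.dial(peer).is_ok() {
                recovered.push(peer.clone());
            }
        }
        recovered
    }
}

/// Stand-in transport used only to exercise the sketch; every dial succeeds.
#[derive(Default)]
struct FakeTransport {
    connected: BTreeSet<String>,
}

impl Dial for FakeTransport {
    fn is_connected(&self, peer_id: &str) -> bool {
        self.connected.contains(peer_id)
    }

    fn dial(&mut self, peer_id: &str) -> Result<(), String> {
        self.connected.insert(peer_id.to_string());
        Ok(())
    }
}
```

Under this shape, both IrohMeshTransport and MeshSyncTransport would implement something like `Dial`, and a one-shot CLI tool would simply never call `register`. Retry timing and relay handling are left out of the sketch on purpose.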

Acceptance for closing this issue

This issue is not for an immediate refactor. It's for filing the design tension so it surfaces as an ADR or design discussion before the next consumer (or the next reconnect-adjacent bug) re-litigates it from scratch.

When this is addressed, expect either:

  • An ADR (in peat/docs/adr/) capturing the chosen shape with rationale, or
  • A documented decision to keep the current per-consumer pattern, with a comment in peat-node's watchdog citing this issue as the reason it lives here.
