
Dynamic peer discovery — mDNS/DNS-SD (LAN, Option A from #5) #62

@kitplummer


Scoped follow-up to #5. Covers Option A only (mDNS/DNS-SD LAN discovery). Option B (K8s headless) is tracked separately in #63; Option C (gossip) remains parked under #5.

Context

Peers today must be configured statically (--peer endpoint_id@host:port at src/main.rs:60) or registered at runtime via the ConnectPeer gRPC method. Both require some orchestrator to already know the Iroh endpoint ID, which doesn't fit tactical edge / BLE-bridge scenarios where sidecars appear and disappear on the same L2 segment and no external orchestrator exists.
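For reference, a minimal sketch of splitting one static `--peer endpoint_id@host:port` value. `StaticPeer`, `PeerSpecError`, and `parse_peer_spec` are hypothetical names, not the actual parser at src/main.rs:60. The sketch assumes endpoint IDs never contain `@`, and it splits host from port at the last `:` so that hostnames and bracketed IPv6 literals survive.

```rust
use std::fmt;

/// Hypothetical parsed form of one `--peer endpoint_id@host:port` value.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StaticPeer {
    pub endpoint_id: String,
    pub host: String,
    pub port: u16,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PeerSpecError {
    MissingAt,
    EmptyEndpointId,
    MissingPort,
    EmptyHost,
    BadPort(String),
}

impl fmt::Display for PeerSpecError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PeerSpecError::MissingAt => write!(f, "expected endpoint_id@host:port (no '@')"),
            PeerSpecError::EmptyEndpointId => write!(f, "endpoint ID before '@' is empty"),
            PeerSpecError::MissingPort => write!(f, "expected host:port after '@' (no ':')"),
            PeerSpecError::EmptyHost => write!(f, "host between '@' and ':' is empty"),
            PeerSpecError::BadPort(p) => write!(f, "invalid port {p:?}"),
        }
    }
}

impl std::error::Error for PeerSpecError {}

pub fn parse_peer_spec(spec: &str) -> Result<StaticPeer, PeerSpecError> {
    // Split at the first '@' (assumption: endpoint IDs never contain '@').
    let (endpoint_id, addr) = spec.split_once('@').ok_or(PeerSpecError::MissingAt)?;
    if endpoint_id.is_empty() {
        return Err(PeerSpecError::EmptyEndpointId);
    }
    // Split at the last ':' so hostnames and bracketed IPv6 literals such as
    // "[fe80::1]:4433" keep their host part intact (brackets included).
    let (host, port) = addr.rsplit_once(':').ok_or(PeerSpecError::MissingPort)?;
    if host.is_empty() {
        return Err(PeerSpecError::EmptyHost);
    }
    let port = port
        .parse::<u16>()
        .map_err(|_| PeerSpecError::BadPort(port.to_string()))?;
    Ok(StaticPeer {
        endpoint_id: endpoint_id.to_string(),
        host: host.to_string(),
        port,
    })
}
```

Discovery only needs to produce the same three pieces (endpoint ID, address, port) that this flag already carries, which is why the mDNS path can stay additive.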

Customer driver: the Docker Desktop ↔ Jetson scenario referenced in #55 — sidecars on a shared LAN that should find each other without manual endpoint-ID exchange.

Prior art in peat-mesh (=0.9.0-rc.9): most of the work is already done

The mDNS implementation lives in peat-mesh and is wired into the peat-mesh-node reference binary. peat-node simply doesn't use it. Reusable today:

  • peat_mesh::discovery::MdnsDiscovery — full implementation
    • MdnsDiscovery::new() / MdnsDiscovery::with_service_type(&str)
    • advertise(node_id, port) / unadvertise(node_id)
    • Event-stream pattern for consuming discovery events
  • peat_mesh::discovery::DiscoveryStrategy trait + PeerInfo struct
  • Reference wiring: peat-mesh/src/bin/peat-mesh-node.rs (construct → take event stream → start → advertise → consume events)
  • mdns-sd and swarm-discovery are already in Cargo.lock (transitive via peat-mesh / iroh)

AutomergeBackend::with_iroh deliberately does not fold discovery into AutomergeBackendConfig — peat-mesh keeps the two concerns parallel, leaving the consumer to construct discovery alongside the backend. peat-node follows the same pattern.

Concrete gap list (what needs to be built in peat-node)

  1. Construct + spawn discovery in src/node.rs. Instantiate MdnsDiscovery::new() (or with_service_type if scoping by formation requires it; see gap 4), take its event stream, call start, call advertise(node_id, iroh_bind_port), and spawn an event-consumer task that maps PeerInfo events → node.connect_peer(endpoint_id, &addresses, "").
  2. CLI flags in src/main.rs:
    --discovery-mode <none|mdns>   / PEAT_NODE_DISCOVERY_MODE (default: none)
    --discovery-interval <seconds> / PEAT_NODE_DISCOVERY_INTERVAL (default: 30)
    --discovery-service-type <s>   / PEAT_NODE_DISCOVERY_SERVICE_TYPE (optional override)
    
  3. Iroh endpoint ID propagation through mDNS. MdnsDiscovery::advertise(node_id, port) carries node_id + port; connect_peer requires an Iroh endpoint ID. Confirm during implementation whether PeerInfo already carries the endpoint ID (likely — the reference binary works) or whether a TXT-record extension is needed. This is the one technical unknown. If a TXT extension is needed, it goes in a small additive peat-mesh PR linked from this issue.
  4. Formation scoping. Two formations on the same LAN must not cross-pollute. Decide between:
    • Per-formation mDNS service type (_peat-sidecar-<formation_id>._udp.local via MdnsDiscovery::with_service_type), or
    • Shared service type + app_id TXT record filtered on the consume side
     The first option is simpler and avoids a TXT-record dependency.
  5. Self-filter + dedup in the event consumer. Skip own announcement; skip peers already in the connected set.
  6. Graceful degradation. UDP/5353 blocked or multicast disabled → log + continue (do not fail startup). Back off on advertise / discover errors.
  7. Helm chart wiring (chart/peat-node/): new discovery.mode, discovery.interval, discovery.serviceType values, threaded through as env vars in templates/deployment.yaml. Default discovery.mode to none to stay backward-compatible.
  8. Docs: README.md config table + docs/CONFIGURATION.md deployment example for LAN auto-discovery.
  9. Tests:
    • Unit: event-consumer dedups and self-filters correctly.
    • Integration: two-node mDNS convergence test. LAN-dependent — likely feature-gated or skipped under k3d; test/cross-cluster-sync.sh should explicitly document that mDNS is out of its scope (k3d pod networking makes multicast brittle).
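A minimal sketch of the event consumer from gaps 1 and 5, in a shape the unit test in gap 9 could exercise. `DiscoveredPeer`, `DiscoveryEvent`, `Action`, and `DiscoveryConsumer` are hypothetical stand-ins: the real `PeerInfo` fields are unconfirmed (gap 3), and in peat-node the connect step would be the async `node.connect_peer(endpoint_id, &addresses, "")` call inside the spawned task rather than a synchronous closure.

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

/// Hypothetical stand-in for peat_mesh::discovery::PeerInfo. Whether the real
/// struct carries the Iroh endpoint ID is the open question in gap 3.
#[derive(Debug, Clone)]
pub struct DiscoveredPeer {
    pub endpoint_id: String,
    pub addresses: Vec<SocketAddr>,
}

/// Hypothetical event shape; the real peat-mesh event stream may differ.
#[derive(Debug, Clone)]
pub enum DiscoveryEvent {
    Found(DiscoveredPeer),
    Lost { endpoint_id: String },
}

/// What the consumer did with one event (convenient for test assertions).
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    SkippedSelf,
    SkippedAlreadyConnected,
    SkippedNoAddresses,
    Connected,
    ConnectFailed,
    Forgotten,
}

pub struct DiscoveryConsumer {
    own_endpoint_id: String,
    connected: HashSet<String>,
}

impl DiscoveryConsumer {
    pub fn new(own_endpoint_id: impl Into<String>) -> Self {
        Self { own_endpoint_id: own_endpoint_id.into(), connected: HashSet::new() }
    }

    /// Handle one event. `connect` stands in for node.connect_peer(...) and is
    /// invoked only for a new, non-self peer that advertised at least one address.
    pub fn handle<F, E>(&mut self, event: DiscoveryEvent, connect: F) -> Action
    where
        F: FnOnce(&str, &[SocketAddr]) -> Result<(), E>,
    {
        match event {
            // Forget a departed peer so a later re-announcement reconnects it.
            DiscoveryEvent::Lost { endpoint_id } => {
                self.connected.remove(&endpoint_id);
                Action::Forgotten
            }
            DiscoveryEvent::Found(peer) => {
                // Self-filter: our own advertisement comes back over multicast.
                if peer.endpoint_id == self.own_endpoint_id {
                    return Action::SkippedSelf;
                }
                // Dedup: mDNS re-announces periodically; connect only once.
                if self.connected.contains(&peer.endpoint_id) {
                    return Action::SkippedAlreadyConnected;
                }
                if peer.addresses.is_empty() {
                    return Action::SkippedNoAddresses;
                }
                let result = connect(peer.endpoint_id.as_str(), peer.addresses.as_slice());
                match result {
                    // Record the peer only after a successful connect, so a
                    // failed dial is retried on the next announcement.
                    Ok(()) => {
                        self.connected.insert(peer.endpoint_id);
                        Action::Connected
                    }
                    Err(_) => Action::ConnectFailed,
                }
            }
        }
    }
}
```

Marking a peer as connected only after the connect call succeeds keeps dedup from permanently suppressing a peer whose first dial failed. In the real node, the "connected set" could instead be read from the node's own connection state, which would also cover peers added via --peer or ConnectPeer.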
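Gap 4's first option needs a service type that every sidecar in a formation derives identically. One caveat to check before settling on the `_peat-sidecar-<formation_id>._udp.local` form: RFC 6335 §5.1 limits a service name to 15 characters (letters, digits, and hyphens, with no leading, trailing, or adjacent hyphens), and `peat-sidecar-` alone already uses 13, so most raw formation IDs would yield a non-conforming name. The sketch below is one hypothetical workaround, not peat-mesh behavior: it hashes the formation ID with 32-bit FNV-1a into an 8-hex-digit label, producing names of the form `_peat-xxxxxxxx._udp.local.`. `formation_service_type`, `fnv1a_32`, and `is_valid_service_name` are illustrative names, and whether `MdnsDiscovery::with_service_type` expects the trailing dot is an assumption to verify.

```rust
/// 32-bit FNV-1a. Its output is fixed by definition, unlike std's
/// DefaultHasher (whose algorithm may change between Rust releases), so
/// nodes built with different toolchains still derive the same label.
pub fn fnv1a_32(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5; // FNV offset basis
    for &b in bytes {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(0x0100_0193); // FNV prime
    }
    hash
}

/// RFC 6335 §5.1 service-name rules: 1..=15 chars of letters, digits, and
/// hyphens; at least one letter; no leading, trailing, or adjacent hyphens.
pub fn is_valid_service_name(name: &str) -> bool {
    !name.is_empty()
        && name.len() <= 15
        && name.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-')
        && name.bytes().any(|b| b.is_ascii_alphabetic())
        && !name.starts_with('-')
        && !name.ends_with('-')
        && !name.contains("--")
}

/// Hypothetical helper: per-formation DNS-SD service type.
/// "peat-" (5 chars) + 8 hex digits = 13 chars, inside the 15-char limit.
pub fn formation_service_type(formation_id: &str) -> String {
    let name = format!("peat-{:08x}", fnv1a_32(formation_id.as_bytes()));
    debug_assert!(is_valid_service_name(&name));
    format!("_{name}._udp.local.")
}
```

The cost of hashing is an opaque label in DNS-SD browsers and a small 32-bit collision chance between formations. The shared-service-type + TXT-filter option, or DNS-SD subtypes (RFC 6763 §7.1), avoid the length limit without hashing.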

Acceptance

  • cargo build and cargo test green
  • Two-node test on a shared LAN with --discovery-mode mdns (no --peer flag) converges within --discovery-interval seconds
  • Existing --peer flow continues to work when --discovery-mode=none (default)
  • helm template chart/peat-node renders cleanly with discovery.mode=mdns
  • README API/config table and docs/CONFIGURATION.md updated
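The default-off requirement (existing --peer flow unchanged when --discovery-mode=none) can be pinned down in a small config layer. A minimal sketch, assuming flag-over-environment precedence; `DiscoveryMode`, `resolve_mode`, `parse_interval`, and `DEFAULT_INTERVAL_SECS` are hypothetical names, with the flag names, env var names, and the 30-second default taken from gap 2.

```rust
use std::str::FromStr;
use std::time::Duration;

/// Default for --discovery-interval / PEAT_NODE_DISCOVERY_INTERVAL (gap 2).
pub const DEFAULT_INTERVAL_SECS: u64 = 30;

/// Mirrors --discovery-mode <none|mdns>. `None` is the default, so existing
/// --peer deployments behave exactly as before.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum DiscoveryMode {
    #[default]
    None,
    Mdns,
}

impl FromStr for DiscoveryMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "none" => Ok(DiscoveryMode::None),
            "mdns" => Ok(DiscoveryMode::Mdns),
            other => Err(format!("unknown discovery mode {other:?} (expected none|mdns)")),
        }
    }
}

/// Precedence: explicit flag, then PEAT_NODE_DISCOVERY_MODE, then the default.
pub fn resolve_mode(flag: Option<&str>, env: Option<&str>) -> Result<DiscoveryMode, String> {
    match flag.or(env) {
        Some(raw) => raw.parse(),
        None => Ok(DiscoveryMode::default()),
    }
}

/// Parse --discovery-interval in whole seconds; zero is rejected.
pub fn parse_interval(raw: Option<&str>) -> Result<Duration, String> {
    let Some(raw) = raw else {
        return Ok(Duration::from_secs(DEFAULT_INTERVAL_SECS));
    };
    match raw.trim().parse::<u64>() {
        Ok(0) => Err("discovery interval must be at least 1 second".to_string()),
        Ok(secs) => Ok(Duration::from_secs(secs)),
        Err(e) => Err(format!("invalid discovery interval {raw:?}: {e}")),
    }
}
```

If main.rs already uses clap (not confirmed here), `#[arg(long, env = "PEAT_NODE_DISCOVERY_MODE", default_value = "none")]` with clap's `env` feature gives the same flag > env > default precedence without hand-written resolution.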

Constraints

  • Proto-first per SKILL.md: if any discovery state is exposed via gRPC (e.g., "list discovered peers"), it goes in proto/sidecar.proto first. Optional for the first cut — CLI + auto-connect is sufficient.
  • Discovery off by default. Failure to announce / discover must log + degrade gracefully, never panic.
  • Don't break the endpoint_id@host:port parsing in main.rs — mDNS is additive.
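The "log + degrade gracefully, never panic" constraint and the backoff in gap 6 can share one small helper. A minimal sketch of capped exponential backoff; `Backoff` and its parameters are hypothetical, not an existing peat-mesh or peat-node type.

```rust
use std::time::Duration;

/// Hypothetical capped exponential backoff for retrying mDNS advertise/browse
/// after errors (e.g. UDP/5353 blocked or multicast disabled).
#[derive(Debug, Clone)]
pub struct Backoff {
    base: Duration,
    max: Duration,
    attempt: u32,
}

impl Backoff {
    pub fn new(base: Duration, max: Duration) -> Self {
        Self { base, max, attempt: 0 }
    }

    /// Delay before the next retry: base * 2^attempt, capped at `max`.
    pub fn next_delay(&mut self) -> Duration {
        // checked_shl returns None once the shift reaches 32 bits; saturate instead.
        let factor = 1u32.checked_shl(self.attempt).unwrap_or(u32::MAX);
        let delay = self.base.saturating_mul(factor).min(self.max);
        self.attempt = self.attempt.saturating_add(1);
        delay
    }

    /// Call after a successful advertise/browse so the next failure starts small.
    pub fn reset(&mut self) {
        self.attempt = 0;
    }
}
```

In the node, a failed `advertise` would be logged rather than returned as a startup error, then retried after `next_delay()` (for example via `tokio::time::sleep`, assuming the node runs on Tokio); a success calls `reset()`. Random jitter would keep sidecars that boot together from retrying in lockstep; it is omitted here to keep the sketch std-only and deterministic.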

Dependencies

  • Likely no peat-mesh change required. One is needed only if gap 3 reveals PeerInfo doesn't carry the Iroh endpoint ID, in which case a small additive PR in peat-mesh adds a TXT-record / PeerInfo.endpoint_id extension before peat-node can wire it.
  • Helm chart values addition; chart version bump per repo convention.

Effort estimate

Small. Most of the work is plumbing in node.rs / main.rs plus chart and docs. Plausibly a ~200-line PR if gap 3 lands favorably; +1 small peat-mesh PR if it doesn't.

Labels: enhancement (New feature or request)