Scoped follow-up to #5. Covers Option B only (Kubernetes peer discovery for in-cluster deployments). Option A (mDNS LAN) is tracked separately in #62; Option C (gossip) remains parked under #5.
Pivot from #5's original spec: #5 proposed a headless Service + DNS poll + BootstrapInfo handshake. peat-mesh's existing KubernetesDiscovery takes a different (better) approach: EndpointSlice watching plus annotation-carried metadata. This issue follows the peat-mesh design rather than reinventing it.
Context
In-cluster deployments today must pre-share Iroh endpoint IDs at sidecar startup (--peer endpoint_id@host:port, src/main.rs:60). For autoscaling deployments — replicaset scales 3 → 5, new pods need to join the mesh — there's no mechanism for them to find existing peers or for existing peers to learn about them short of an external orchestrator calling ConnectPeer per new pod.
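For reference, the static flow amounts to splitting each --peer value into an endpoint ID, a host, and a port. The sketch below is illustrative only: StaticPeer and parse_peer are hypothetical names, and this is not the actual parsing code at src/main.rs:60.

```rust
/// Hypothetical parsed form of one `--peer endpoint_id@host:port` value.
#[derive(Debug, PartialEq)]
pub struct StaticPeer {
    pub endpoint_id: String,
    pub host: String,
    pub port: u16,
}

/// Splits `endpoint_id@host:port` into its three parts.
/// Assumption: the endpoint ID never contains '@'; the port is the text
/// after the last ':' so bracketed IPv6 hosts keep their inner colons.
pub fn parse_peer(arg: &str) -> Result<StaticPeer, String> {
    let (endpoint_id, addr) = arg
        .split_once('@')
        .ok_or_else(|| format!("missing '@' in {arg:?}"))?;
    let (host, port) = addr
        .rsplit_once(':')
        .ok_or_else(|| format!("missing ':port' in {arg:?}"))?;
    if endpoint_id.is_empty() || host.is_empty() {
        return Err(format!("empty endpoint_id or host in {arg:?}"));
    }
    let port = port
        .parse::<u16>()
        .map_err(|e| format!("bad port in {arg:?}: {e}"))?;
    Ok(StaticPeer {
        endpoint_id: endpoint_id.to_string(),
        host: host.to_string(),
        port,
    })
}
```

Every value of this shape has to be known before the sidecar starts, which is exactly what breaks down when a ReplicaSet scales.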
Prior art in peat-mesh =0.9.0-rc.9 (most of the work is already done — but feature-gated)
The Kubernetes implementation lives in peat-mesh behind the kubernetes feature flag (not currently enabled in peat-node's dep). Reusable today:
peat_mesh::discovery::KubernetesDiscovery — watches EndpointSlice resources via the kube API
peat_mesh::discovery::KubernetesDiscoveryConfig:
namespace: Option<String> (default: the namespace from the service-account mount, falling back to "default")
label_selector: String (default: "app=peat-mesh")
annotation_prefix: String (default: "peat.", trailing dot included)
poll_interval: Duration (default: 30s)
extract_peers_from_endpoint_slice(...):
node_id ← endpoint.target_ref.name (the pod name)
addresses ← endpoint.addresses
port ← EndpointSlice port (defaults 8080)
custom metadata (e.g. relay_url, and presumably iroh endpoint ID) ← EndpointSlice annotations with the configured prefix
Reference wiring: peat-mesh/src/bin/peat-mesh-node.rs:152-156 (the kubernetes / k8s mode branch)
AutomergeBackend::with_iroh does not fold discovery into its config — peat-mesh keeps the two concerns parallel, leaving the consumer to construct discovery alongside the backend. peat-node follows the same pattern.
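To make the extraction rules above concrete, here is a minimal sketch over simplified stand-in types. Endpoint, EndpointSlice, Peer, and extract_peers are hypothetical names for illustration, not the peat-mesh or Kubernetes API types, and stripping the prefix from annotation keys is an assumption about how the metadata is keyed.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for one EndpointSlice endpoint.
#[derive(Debug, Clone)]
pub struct Endpoint {
    /// Stands in for endpoint.target_ref.name (the pod name).
    pub target_pod: Option<String>,
    pub addresses: Vec<String>,
}

/// Hypothetical, simplified stand-in for an EndpointSlice resource.
#[derive(Debug, Clone)]
pub struct EndpointSlice {
    pub annotations: BTreeMap<String, String>,
    pub port: Option<u16>,
    pub endpoints: Vec<Endpoint>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Peer {
    pub node_id: String,
    pub addresses: Vec<String>,
    pub port: u16,
    pub metadata: BTreeMap<String, String>,
}

/// Port used when the slice does not declare one.
const DEFAULT_PORT: u16 = 8080;

pub fn extract_peers(slice: &EndpointSlice, annotation_prefix: &str) -> Vec<Peer> {
    // Keep only annotations under the configured prefix.
    // Assumption: the prefix is stripped from the resulting metadata keys.
    let metadata: BTreeMap<String, String> = slice
        .annotations
        .iter()
        .filter_map(|(key, value)| {
            key.strip_prefix(annotation_prefix)
                .map(|rest| (rest.to_string(), value.clone()))
        })
        .collect();
    let port = slice.port.unwrap_or(DEFAULT_PORT);

    slice
        .endpoints
        .iter()
        .filter_map(|ep| {
            // Endpoints without a target pod cannot be given a node_id.
            let node_id = ep.target_pod.clone()?;
            Some(Peer {
                node_id,
                addresses: ep.addresses.clone(),
                port,
                metadata: metadata.clone(),
            })
        })
        .collect()
}
```

Note that in this shape every endpoint in a slice receives the same slice-level metadata, which is why a per-pod value such as the Iroh endpoint ID does not fit naturally into EndpointSlice annotations.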
Concrete gap list (what needs to be built in peat-node)
Enable the kubernetes feature on the peat-mesh dep. Cargo.toml:16 currently reads features = ["automerge-backend"]; add "kubernetes".
Construct + spawn discovery in src/node.rs. Instantiate KubernetesDiscovery::new(...), take its event stream, call start, call advertise(node_id, sidecar_port), and spawn an event-consumer task that maps PeerInfo events → node.connect_peer(endpoint_id, &addresses, ""). Mirrors the mDNS wiring in #62.
CLI flags in src/main.rs: --discovery-mode and --discovery-interval, shared with #62.
Iroh endpoint ID propagation into the EndpointSlice annotation — the key technical unknown. peat-mesh's extractor reads annotations off the EndpointSlice resource, not off pods. EndpointSlices are auto-managed by the kube-controller-manager and don't inherit per-pod annotations for free. Options to investigate during design:
a) Helm chart sets a static endpoint_id annotation on the Service → EndpointSlice mirroring carries it through. Only works if all replicas share an endpoint_id (they don't — endpoint_id is per-instance).
b) Each pod self-patches its own EndpointSlice annotation on startup via the kube API. Requires patch RBAC on endpointslices — viable but adds a sidecar-startup dependency on the API server.
c) Each pod self-patches its own pod annotation (cheaper RBAC), and we extend peat-mesh's extractor to look at the pod via target_ref.name → pod GET → annotations. Adds one API call per peer discovery cycle.
d) Use a deterministic keypair seeded from (formation_id, pod_name) so endpoint_id is computable from target_ref.name without any annotation lookup. Cleanest but requires a deterministic-keypair option in AutomergeBackendConfig (peat-mesh additive change).
First spelunk should be: does peat-mesh's reference binary or operator solve this today, and how? That answer dictates everything else.
RBAC manifests in chart/peat-node/templates/: new ServiceAccount, Role (or ClusterRole if cross-namespace) with get/list/watch on endpointslices.discovery.k8s.io, and RoleBinding. Optional additional patch on endpointslices (option 4b) or pods (option 4c) depending on gap 4's resolution.
Service / Deployment labels. app=peat-node label on the Deployment so the default label_selector works without forcing operators to think about it.
Self-filter + dedup in the event consumer. Skip own pod, skip already-connected peers.
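Graceful degradation. kube API unreachable (e.g., running outside a cluster) → log + continue, do not fail startup. The --discovery-mode=none path must remain the safe default.
Helm values (chart/peat-node/values.yaml): discovery.mode, discovery.namespace, discovery.labelSelector, discovery.annotationPrefix, discovery.interval. Default mode: none to stay backward-compatible.
Docs: README.md config table + docs/CONFIGURATION.md deployment example for in-cluster discovery, including a complete Deployment + Service + RBAC manifest example.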
Tests:
Unit: extract_peers_from_endpoint_slice is already covered in peat-mesh; peat-node tests cover the event-consumer dedup/self-filter.
Integration (extend test/cross-cluster-sync.sh or new k3d test): 3 sidecar replicas in a Deployment with no static --peer flags converge to a fully-connected mesh; scale to 5 and verify new pods join within --discovery-interval.
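The self-filter + dedup step in the gap list is the part peat-node has to unit-test itself, so a rough sketch of its shape may help. DiscoveredPeer and PeerFilter are hypothetical names, not peat-mesh types, and keying the dedup set on endpoint_id assumes gap 4 yields a per-pod endpoint ID by some means.

```rust
use std::collections::HashSet;

/// Hypothetical view of one discovery event, after gap 4 is resolved.
#[derive(Debug, Clone, PartialEq)]
pub struct DiscoveredPeer {
    /// Pod name, as carried in target_ref.name.
    pub node_id: String,
    /// Iroh endpoint ID (assumed to be available per pod).
    pub endpoint_id: String,
    pub addresses: Vec<String>,
}

/// Tracks which peers the event consumer has already acted on.
pub struct PeerFilter {
    own_node_id: String,
    connected: HashSet<String>,
}

impl PeerFilter {
    pub fn new(own_node_id: impl Into<String>) -> Self {
        Self {
            own_node_id: own_node_id.into(),
            connected: HashSet::new(),
        }
    }

    /// Returns true if the caller should attempt a connection to `peer`.
    /// Skips the local pod and any endpoint ID seen before.
    pub fn should_connect(&mut self, peer: &DiscoveredPeer) -> bool {
        if peer.node_id == self.own_node_id {
            return false;
        }
        // HashSet::insert returns false when the value was already present.
        self.connected.insert(peer.endpoint_id.clone())
    }

    /// Forgets a peer so a later discovery event triggers a reconnect,
    /// e.g. after a failed connect attempt or a dropped connection.
    pub fn forget(&mut self, endpoint_id: &str) {
        self.connected.remove(endpoint_id);
    }
}
```

In this sketch the event-consumer task would call should_connect for each event and invoke node.connect_peer only when it returns true. Recording the peer before the connection succeeds keeps the filter simple but means failures must call forget, otherwise a peer that failed once is never retried.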
Acceptance
cargo build and cargo test green
helm template chart/peat-node renders cleanly with discovery.mode=kubernetes and produces the RBAC + Service + Deployment manifests
Existing static --peer flow continues to work when --discovery-mode=none (default)
k3d integration test: 3-replica Deployment converges to a fully-connected mesh under the new discovery mode; scaling up adds the new pod within discovery.interval
README API/config table and docs/CONFIGURATION.md updated; sample Deployment + RBAC manifest included
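The "default off" and "degrade, never fail startup" criteria can be sketched together. DiscoveryMode and start_discovery are hypothetical names, and the closure stands in for whatever constructs and starts KubernetesDiscovery; only the none and kubernetes values from this issue are modelled.

```rust
use std::fmt::Display;
use std::str::FromStr;

/// Hypothetical value of the --discovery-mode flag.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum DiscoveryMode {
    /// `--discovery-mode=none`: static --peer flags only (the default).
    #[default]
    Disabled,
    /// `--discovery-mode=kubernetes`: EndpointSlice-based discovery.
    Kubernetes,
}

impl FromStr for DiscoveryMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "none" => Ok(Self::Disabled),
            "kubernetes" => Ok(Self::Kubernetes),
            other => Err(format!("unknown discovery mode: {other}")),
        }
    }
}

/// Runs `start` only in Kubernetes mode and reports whether discovery is
/// active. A failure is logged and swallowed so startup continues with
/// whatever static peers were configured.
pub fn start_discovery<E: Display>(
    mode: DiscoveryMode,
    start: impl FnOnce() -> Result<(), E>,
) -> bool {
    match mode {
        DiscoveryMode::Disabled => false,
        DiscoveryMode::Kubernetes => match start() {
            Ok(()) => true,
            Err(err) => {
                eprintln!("kubernetes discovery unavailable, continuing without it: {err}");
                false
            }
        },
    }
}
```

Returning a bool rather than a Result is deliberate here: the caller cannot turn a discovery failure into a startup failure by accident.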
Constraints
Proto-first per SKILL.md: if any discovery state is exposed via gRPC (e.g., "list discovered peers"), it goes in proto/sidecar.proto first. Optional for the first cut.
Discovery off by default. Failure to reach the kube API must log + degrade gracefully, never panic.
Don't break the endpoint_id@host:port parsing in main.rs — kubernetes discovery is additive.
Cross-namespace discovery is out of scope for the first cut; namespace defaults to the pod's own SA-mount namespace.
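The namespace default described above can be sketched as follows. resolve_namespace is a hypothetical helper, not peat-mesh's implementation; the path is the standard location where Kubernetes mounts the service-account namespace inside a pod.

```rust
use std::fs;
use std::path::Path;

/// Standard in-pod location of the service-account namespace file.
pub const SA_NAMESPACE_PATH: &str =
    "/var/run/secrets/kubernetes.io/serviceaccount/namespace";

/// Picks the discovery namespace: an explicit setting wins, then the
/// service-account mount, then the literal "default".
pub fn resolve_namespace(explicit: Option<&str>, sa_namespace_file: &Path) -> String {
    if let Some(ns) = explicit {
        return ns.to_string();
    }
    fs::read_to_string(sa_namespace_file)
        .ok()
        .map(|contents| contents.trim().to_string())
        // An unreadable or empty file falls through to "default".
        .filter(|ns| !ns.is_empty())
        .unwrap_or_else(|| "default".to_string())
}
```

A caller in the pod would pass Path::new(SA_NAMESPACE_PATH). Taking the path as a parameter keeps the fallback testable outside a cluster.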
Dependencies
Probably requires a small peat-mesh PR, depending on gap 4's resolution. If KubernetesDiscovery doesn't already have a story for per-pod endpoint_id propagation, the cleanest fix is option 4d (deterministic per-pod keypair seeded from (formation_id, pod_name)), which needs additive surface in AutomergeBackendConfig.
Helm chart values addition; chart version bump per repo convention.
Effort estimate
Medium. Larger than #62 because of the RBAC manifests and the gap-4 question. If gap 4 resolves to options (a–c) it's plumbing + chart work (~300 lines + manifests + docs). If it needs option (d), add a small peat-mesh PR first. The peat-mesh KubernetesDiscovery itself is done — peat-node is the wiring + RBAC + the endpoint_id propagation answer.
References
src/main.rs:60, src/node.rs:260
chart/peat-node/templates/ (no RBAC, no headless service today)
peat_mesh::discovery::{KubernetesDiscovery, KubernetesDiscoveryConfig} (feature-gated on kubernetes), reference binary at peat-mesh/src/bin/peat-mesh-node.rs:152-156