e2e and attestation #433
Each node generates an X25519 inference keypair at startup, advertised in gossip. When routing to a remote host that has an inference key, the proxy encrypts the full HTTP request body with NaCl box (ephemeral keypair per request for forward secrecy). The receiving node detects the 0xE1 magic byte, decrypts, forwards to the local backend, then encrypts the response back. Relay/routing nodes see only ciphertext.

Hardware attestation: on macOS Apple Silicon, a P-256 key is created in the Secure Enclave (private key never leaves hardware). It signs an attestation blob binding: node endpoint ID, inference public key, binary SHA-256 hash, and security posture (SIP, RDMA, secure boot). Any peer can verify the P-256 signature. Challenge-response nonce signing for continuous liveness proof.

Runtime hardening (macOS): PT_DENY_ATTACH blocks debuggers, core dumps disabled, dangerous env vars scrubbed, SIP and RDMA status checked, binary self-hash computed. SecurityPosture gossiped to all peers.

New CLI flag: --require-attested-hosts refuses to route inference to peers without a verified hardware attestation.

Proto: fields 36-38 on PeerAnnouncement (inference_public_key, SecurityPosture message, hardware_attestation_blob). Additive: old nodes ignore new fields.

New deps: p256 (ecdsa verification), security-framework (macOS SE).
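The receive-side detection step described above can be sketched as follows. This is a minimal illustration, not the PR's code: the `Payload` enum and `classify_payload` function are hypothetical names, and the layout of the bytes after the magic byte is left opaque. Because HTTP request lines begin with an ASCII method name, a first byte of 0xE1 cannot be the start of a plaintext HTTP request.

```rust
/// Magic byte that marks an encrypted inference payload (value from the PR description).
const ENCRYPTED_MAGIC: u8 = 0xE1;

/// Hypothetical classification of an incoming tunnel payload.
#[derive(Debug, PartialEq, Eq)]
pub enum Payload<'a> {
    /// First byte was 0xE1; the remainder is opaque ciphertext for the decryptor.
    Encrypted(&'a [u8]),
    /// Anything else is treated as a plaintext HTTP request.
    Plaintext(&'a [u8]),
    /// No bytes were received.
    Empty,
}

/// Inspect only the first byte; never attempt to parse ciphertext as HTTP.
pub fn classify_payload(buf: &[u8]) -> Payload<'_> {
    match buf.split_first() {
        None => Payload::Empty,
        Some((&ENCRYPTED_MAGIC, rest)) => Payload::Encrypted(rest),
        Some(_) => Payload::Plaintext(buf),
    }
}
```

A caller would hand the `Encrypted` remainder to the NaCl box decryptor and forward `Plaintext` unchanged to the local backend.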
Two things that weren't actually wired:

1. SE attestation: try_create_attestation() now called at startup. On Apple Silicon with SE access, creates a P-256 key in hardware, signs an attestation blob (chip, model, memory, posture, binary hash, node ID, inference key), stores in local_hardware_attestation which gets gossiped to all peers.
2. Encrypted response decryption: route_remote_attempt now handles the encrypted response path. After sending an encrypted request, it reads back 0xE1 + encrypted JSON, decrypts with the ephemeral session, and writes plaintext HTTP to the client. Previously this was broken: the proxy would try to parse encrypted bytes as HTTP and fail.

Also: finish() the QUIC send stream after writing encrypted payload so the receiver's read_to_end() completes.
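Since the attestation blob binds a node ID and an inference key, a verifier would presumably compare those claims with what the peer gossips, after the P-256 signature has already been checked. The sketch below covers only that non-cryptographic step. Everything in it is an assumption for illustration: the `AttestationClaims` and `BindingError` types, the `check_binding` function, the age and skew limits, and the use of Unix seconds in place of the RFC 3339 timestamp the PR stores.

```rust
/// Subset of the claims inside an already signature-verified attestation blob.
pub struct AttestationClaims {
    pub node_endpoint_id: String,
    pub inference_public_key: String,
    /// Simplification: Unix seconds rather than an RFC 3339 string.
    pub issued_at_unix: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum BindingError {
    WrongNode,
    WrongInferenceKey,
    Expired,
    FromFuture,
}

/// Hypothetical limits; the PR only states that attestations refresh every 5 minutes.
const MAX_AGE_SECS: u64 = 600;
const MAX_SKEW_SECS: u64 = 60;

/// Check that the signed claims match the gossiped identity and are fresh.
pub fn check_binding(
    claims: &AttestationClaims,
    gossiped_node_id: &str,
    gossiped_inference_key: &str,
    now_unix: u64,
) -> Result<(), BindingError> {
    if claims.node_endpoint_id != gossiped_node_id {
        return Err(BindingError::WrongNode);
    }
    if claims.inference_public_key != gossiped_inference_key {
        return Err(BindingError::WrongInferenceKey);
    }
    if claims.issued_at_unix > now_unix + MAX_SKEW_SECS {
        return Err(BindingError::FromFuture);
    }
    if now_unix.saturating_sub(claims.issued_at_unix) > MAX_AGE_SECS {
        return Err(BindingError::Expired);
    }
    Ok(())
}
```

Rejecting a mismatched inference key matters here because otherwise a valid attestation could be replayed alongside a different, attacker-controlled encryption key.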
- fix response sender auth: verify against gossiped host key, not self-declared
- chunked streaming encryption: SSE events stream through encrypted tunnel
- attestation refresh every 5 min so --require-attested-hosts doesn't expire
- fail closed: refuse plaintext fallback when attestation required
- parse HTTP status from first decrypted chunk
- move harden_runtime() before worker threads spawn
- document attestation trust model limitation
- add 10 tests for chunked crypto, model extraction, status parsing
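The "parse HTTP status from first decrypted chunk" item can be illustrated with a small standalone parser. This is a sketch under assumptions: `parse_http_status` is a hypothetical name, and it assumes the first decrypted chunk contains at least the complete status line.

```rust
/// Extract the status code from the start of a decrypted HTTP response.
/// Returns `None` if the chunk does not begin with a complete, well-formed status line.
pub fn parse_http_status(first_chunk: &[u8]) -> Option<u16> {
    // Only the first line matters; the chunk may also hold headers and body bytes.
    let line_end = first_chunk.iter().position(|&b| b == b'\n')?;
    let line = std::str::from_utf8(&first_chunk[..line_end]).ok()?;
    let mut parts = line.trim_end_matches('\r').splitn(3, ' ');

    let version = parts.next()?;
    if !version.starts_with("HTTP/") {
        return None;
    }

    // The status code is exactly three ASCII digits.
    let code = parts.next()?;
    if code.len() != 3 || !code.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let status: u16 = code.parse().ok()?;
    (100..=599).contains(&status).then_some(status)
}
```

Returning `None` for a chunk without a newline lets the caller decide whether to wait for more bytes or treat the response as malformed.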
Pull request overview
Adds end-to-end inference encryption and (macOS) Secure Enclave-based attestation, plus OS-level runtime hardening signals that are gossiped to peers and can be enforced via --require-attested-hosts.
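The interaction between automatic encryption and `--require-attested-hosts` can be summarised as a small decision function. This is a sketch of the policy as the PR text describes it, not the routing code itself; `RouteDecision`, `PeerView`, and `decide_route` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RouteDecision {
    /// Send the request through the encrypted tunnel.
    Encrypted,
    /// Send the request in plaintext (peer advertises no inference key).
    Plaintext,
    /// Do not route to this peer at all.
    Refuse,
}

/// What the local node knows about a candidate host (hypothetical view type).
pub struct PeerView {
    pub has_inference_key: bool,
    pub attestation_verified: bool,
}

pub fn decide_route(require_attested_hosts: bool, peer: &PeerView) -> RouteDecision {
    if require_attested_hosts {
        // Fail closed: without a verified attestation and a key to encrypt to,
        // refuse instead of falling back to plaintext.
        if peer.attestation_verified && peer.has_inference_key {
            return RouteDecision::Encrypted;
        }
        return RouteDecision::Refuse;
    }
    // Without the flag, encryption is automatic whenever the peer advertises a key.
    if peer.has_inference_key {
        RouteDecision::Encrypted
    } else {
        RouteDecision::Plaintext
    }
}
```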
Changes:
- Introduces X25519 + NaCl box request encryption with per-request ephemeral keys and a chunked encrypted streaming response format.
- Adds Secure Enclave (P-256) attestation generation + cross-platform verification and gossips attestation + security posture.
- Adds runtime hardening checks (debugger attach denial, core dump disable, env scrubbing) and routing gates for attestation enforcement.
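To make the chunked streaming format more concrete, here is a sketch of incremental length-prefixed framing. The review only says the chunks are length-prefixed; the 4-byte big-endian prefix, the `MAX_CHUNK_LEN` cap, and the `encode_frame` / `take_frame` names are assumptions for illustration, and the chunk bodies are treated as opaque ciphertext.

```rust
/// Hypothetical upper bound so a corrupt prefix cannot force a huge allocation.
const MAX_CHUNK_LEN: usize = 1 << 20;

#[derive(Debug, PartialEq, Eq)]
pub enum FrameError {
    TooLarge(usize),
}

/// Prefix an (already encrypted) chunk with its length as 4 big-endian bytes.
pub fn encode_frame(chunk: &[u8]) -> Vec<u8> {
    debug_assert!(chunk.len() <= MAX_CHUNK_LEN);
    let mut out = Vec::with_capacity(4 + chunk.len());
    out.extend_from_slice(&(chunk.len() as u32).to_be_bytes());
    out.extend_from_slice(chunk);
    out
}

/// Try to remove one complete frame from the front of `buf`.
/// `Ok(None)` means more bytes are needed before a frame is available.
pub fn take_frame(buf: &mut Vec<u8>) -> Result<Option<Vec<u8>>, FrameError> {
    if buf.len() < 4 {
        return Ok(None);
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    if len > MAX_CHUNK_LEN {
        return Err(FrameError::TooLarge(len));
    }
    if buf.len() < 4 + len {
        return Ok(None);
    }
    let frame = buf[4..4 + len].to_vec();
    buf.drain(..4 + len);
    Ok(Some(frame))
}
```

Because `take_frame` returns as soon as one frame is complete, each SSE event can be decrypted and forwarded without waiting for the rest of the response.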
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| crates/mesh-llm/src/system/mod.rs | Exposes new hardening module from system. |
| crates/mesh-llm/src/system/hardening.rs | Implements best-effort runtime hardening and a gossiped SecurityPosture. |
| crates/mesh-llm/src/runtime/mod.rs | Runs hardening early, creates/stores attestation + posture, and periodically refreshes attestation. |
| crates/mesh-llm/src/protocol/mod.rs | Updates protocol tests/fixtures for new gossip fields. |
| crates/mesh-llm/src/protocol/convert.rs | Converts new posture + attestation fields between local and protobuf. |
| crates/mesh-llm/src/network/tunnel.rs | Adds encrypted tunnel detection + decrypt/forward + encrypted streaming response. |
| crates/mesh-llm/src/network/openai/transport.rs | Adds attestation gating, request encryption, and encrypted response handling in proxy routing. |
| crates/mesh-llm/src/mesh/tests.rs | Updates mesh test node/peer construction for new Node/PeerInfo fields. |
| crates/mesh-llm/src/mesh/mod.rs | Extends peer announcement/info and Node with inference keys, posture, and attestation helpers. |
| crates/mesh-llm/src/mesh/gossip.rs | Gossips inference public key, posture, and attestation; merges these fields transitively. |
| crates/mesh-llm/src/crypto/mod.rs | Exposes new attestation and inference_encryption modules. |
| crates/mesh-llm/src/crypto/inference_encryption.rs | Implements request encryption/decryption + chunked streaming encryption primitives. |
| crates/mesh-llm/src/crypto/error.rs | Adds EncryptionFailed error variant. |
| crates/mesh-llm/src/crypto/attestation.rs | Implements SE key creation/signing, software signing for tests, and attestation verification. |
| crates/mesh-llm/src/cli/mod.rs | Adds --require-attested-hosts flag to enforce routing constraints. |
| crates/mesh-llm/src/api/tests.rs | Updates API tests for expanded peer info fields. |
| crates/mesh-llm/proto/node.proto | Adds inference key, security posture, and attestation blob fields to PeerAnnouncement. |
| crates/mesh-llm/Cargo.toml | Adds p256 plus macOS Security Framework dependencies. |
| Cargo.lock | Locks new transitive dependencies for crypto + macOS SE support. |
- system/mod.rs: pub mod hardening → pub(crate) for consistency
- mesh/mod.rs: pub inference_keypair → pub(crate) (secret key visibility)
- tunnel.rs: move inline use statement to top-of-file imports
- hardening.rs + main.rs + lib.rs: move env scrubbing before tokio runtime to eliminate UB from std::env::remove_var racing with worker threads. main.rs now builds the tokio runtime manually after scrubbing.
- attestation.rs + mesh/mod.rs + runtime/mod.rs: reuse SE identity across attestation refreshes via SeIdentityHandle, preserving SE public-key continuity for TOFU pinning instead of creating a new ephemeral key every 5-minute refresh cycle.
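The env-scrubbing ordering fix can be illustrated with a std-only sketch. The deny-list below is hypothetical (the PR says only "dangerous env vars"), and `is_dangerous` / `scrub_env` are illustrative names. The important property is that `scrub_env` runs while the process is still single-threaded, before any async runtime or worker thread exists.

```rust
/// Hypothetical deny-list of dynamic-loader variable prefixes.
const DANGEROUS_PREFIXES: &[&str] = &["DYLD_", "LD_"];

/// True if the variable name matches the deny-list.
pub fn is_dangerous(name: &str) -> bool {
    DANGEROUS_PREFIXES.iter().any(|p| name.starts_with(*p))
}

/// Remove deny-listed variables and return their names.
/// Must be called before any other thread is spawned.
#[allow(unused_unsafe)] // `remove_var` is only an unsafe fn from edition 2024 onward
pub fn scrub_env() -> Vec<String> {
    // Collect first so the environment is not mutated while it is being iterated.
    let doomed: Vec<String> = std::env::vars_os()
        .filter_map(|(name, _)| name.into_string().ok())
        .filter(|name| is_dangerous(name))
        .collect();
    for name in &doomed {
        // SAFETY: the caller guarantees no other threads exist yet, so nothing
        // can read the environment concurrently.
        unsafe { std::env::remove_var(name) };
    }
    doomed
}
```

In a binary this would be the first call in `main`, with the runtime constructed explicitly afterwards rather than through an attribute macro that starts worker threads before user code runs.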
don't merge without hand testing (at least 2 people)
* origin/main: fix: wire --discover into serve path, add --name to discover subcommand (#453)
Pull request overview
Copilot reviewed 21 out of 22 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
crates/mesh-llm/src/mesh/mod.rs:2328
`peer_is_attested` holds the `state` mutex while running `verify_attestation(...)` (base64 decode, signature verification, timestamp parsing). This can block unrelated peer/state operations. Consider copying out `hardware_attestation` + `inference_public_key` under the lock, dropping the lock, then performing verification outside the critical section.
local_security_posture: Arc::new(Mutex::new(None)),
local_hardware_attestation: Arc::new(Mutex::new(None)),
se_identity_handle: Arc::new(Mutex::new(
crate::crypto::attestation::SeIdentityHandle::empty(),
)),
require_attested_hosts: Arc::new(std::sync::atomic::AtomicBool::new(false)),
enumerate_host: true,
gpu_name: None,
hostname: None,
is_soc: Some(false),
gpu_vram: None,
gpu_reserved_bytes: None,
gpu_mem_bandwidth_gbps: Arc::new(tokio::sync::Mutex::new(None)),
gpu_compute_tflops_fp32: Arc::new(tokio::sync::Mutex::new(None)),
gpu_compute_tflops_fp16: Arc::new(tokio::sync::Mutex::new(None)),
config_state: Arc::new(tokio::sync::Mutex::new(
crate::runtime::config_state::ConfigState::default(),
)),
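The lock-scoping suggestion in the review comment above could look roughly like this. It is a sketch with hypothetical types: `PeerRecord` stands in for the real peer state, a `std::sync::Mutex` is assumed, and the verifier is passed in as a closure so that no particular `verify_attestation` signature is implied.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for the per-peer state kept behind the mutex.
#[derive(Clone, Default)]
pub struct PeerRecord {
    pub hardware_attestation: Option<String>,
    pub inference_public_key: Option<String>,
}

/// Copy the needed fields under the lock, release it, then verify.
pub fn peer_is_attested(
    state: &Mutex<HashMap<String, PeerRecord>>,
    peer_id: &str,
    verify: impl Fn(&str, &str) -> bool,
) -> bool {
    // Critical section: a cheap clone only, no decoding or signature work.
    let copied = {
        let guard = state.lock().expect("state mutex poisoned");
        guard.get(peer_id).cloned()
    }; // guard is dropped here, before any expensive verification

    match copied {
        Some(PeerRecord {
            hardware_attestation: Some(blob),
            inference_public_key: Some(key),
        }) => verify(&blob, &key),
        _ => false,
    }
}
```

The trade-off is that verification runs on a snapshot, so a peer update that lands during verification is only seen on the next call.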
}

/// Handle an encrypted response from a remote host tunnel.
///
/// The remote tunnel sends an encrypted stream header followed by
/// length-prefixed encrypted chunks. This function verifies the advertised
/// sender key against gossip, decrypts chunks into a bounded async pipe as they
/// arrive, then reuses the normal HTTP response relay path. Reusing
/// `relay_probed_response` keeps response translation (`/v1/responses`
/// adapters), context-overflow retry detection, error remapping, and token
/// accounting consistent with the plaintext tunnel path while preserving live
/// token/SSE streaming to the downstream client.
///

tracing::debug!(
    "API proxy: encrypted response decrypt task cancelled for {}",
    host_id.fmt_short()
);

let attestation = HardwareAttestation {
    node_endpoint_id: node_endpoint_id.to_string(),
    inference_public_key: inference_public_key.to_string(),
    se_public_key: se.public_key_base64().to_string(),
    binary_hash: posture.binary_hash.clone().unwrap_or_default(),
    chip_name,
    hardware_model,
    unified_memory_bytes,
    sip_enabled: posture.sip_enabled,
    secure_boot_enabled: true, // assume if SE works
    rdma_disabled: posture.rdma_disabled,
    timestamp: chrono::Utc::now().to_rfc3339(),

let stdout = String::from_utf8_lossy(&output.stdout);
stdout.trim() == "disabled"
This pull request has not been updated in at least 5 days. It will be closed after 7 days of inactivity to keep the active review queue current. Please update it within 2 days if the changes are still moving forward.

Closing this pull request because it has not been updated in at least 7 days. Please reopen or create a fresh pull request when the changes are ready to continue.

Closing: branch too stale after crate restructure. Porting to fresh branch micn/attestation-privacy.
E2E inference encryption (NaCl box, ephemeral keys, forward secrecy) + Secure Enclave hardware attestation + runtime hardening. Automatic when peer advertises an inference key — no opt-in needed.
Validation so far
Local validation on this branch:
cargo fmt --all -- --check
cargo check -p mesh-llm
just build
LLAMA_STAGE_BUILD_DIR=.deps/llama.cpp/build-stage-abi-metal cargo test -p mesh-llm --lib
Result:
1306 passed; 0 failed; 3 ignored.
Implementation note: encrypted streaming
The encrypted remote path is expected to preserve live SSE/token streaming. The host encrypts response chunks as they arrive, and the client-side tunnel decrypts those chunks into a bounded async pipe that feeds the normal plaintext HTTP response relay. This keeps `/v1/responses` adapters, context-overflow retry detection, error remapping, and token accounting aligned with the plaintext path without buffering the full response before the downstream client sees events.
Next manual validation: 2-Mac node test
Use two macOS machines, ideally Apple Silicon for Secure Enclave attestation coverage.
1. Build and bundle locally
From this branch:
2. Clean both Macs before starting
On both Mac A and Mac B:
Expected: no remaining `mesh-llm` processes.
3. Deploy the same bundle to both Macs
Copy the generated tarball from `dist/` to Mac B, extract it, and codesign if required by local macOS policy.
On both Macs verify the same binary is running:
Expected: both report the version/build from this PR.
4. Start Mac B as the model host
On Mac B:
Watch logs for startup, model hosting, and if on Apple Silicon, hardware attestation logs such as:
Verify status:
5. Start Mac A and join/discover Mac B
On Mac A, start this PR binary using the normal test mesh flow for your environment, for example auto/discovery or an explicit join token:
or, if using an invite/join flow:
RUST_LOG=mesh_llm=debug mesh-llm --join '<invite-token-from-mac-b>'
Verify both nodes see each other:
Expected:
6. Prove remote encrypted inference works
From Mac A, call a model hosted by Mac B. Use the exact model id shown in `/v1/models` if needed.
Streaming chat completions:
Non-streaming chat completions:
Expected:
If those log lines do not appear, the test may only be exercising plaintext routing.
7. Test `/v1/responses` through the same remote path
This is important because the encrypted response handler should reuse the normal response relay/adapter path.
From Mac A:
If streaming responses are enabled in the environment:
Expected:
- `/v1/responses` shape.
8. Test attestation-required behavior
On Apple Silicon Macs, run Mac A with attestation required:
Then repeat a remote inference request from Mac A to Mac B.
Expected when Mac B has valid Secure Enclave attestation:
Expected when Mac B is not attested, or when testing against a non-Apple-Silicon/older host:
- `--require-attested-hosts` is enabled.
9. Mixed-version compatibility check
Run one Mac on this PR and the other Mac on the latest released binary.
Test both useful directions:
Expected:
- `/api/status` shows peers.
- `/v1/models` works.
10. Failure-mode smoke
While a streaming request from Mac A to Mac B is active, stop Mac B:
Expected:
Also send a malformed request from Mac A:
Expected:
Pass criteria
Call the manual 2-Mac validation successful when:
- `/v1/chat/completions` work.
- `/v1/responses` works through the same path.
- `--require-attested-hosts` accepts an attested Apple Silicon host and rejects unattested hosts.