
e2e and attestation#433

Closed
michaelneale wants to merge 12 commits into main from micn/e2e-take-2


@michaelneale (Collaborator) commented May 5, 2026

This PR adds E2E inference encryption (NaCl box, ephemeral keys, forward secrecy), Secure Enclave hardware attestation, and runtime hardening. Encryption is automatic when a peer advertises an inference key; no opt-in is needed.

Validation so far

Local validation on this branch:

cargo fmt --all -- --check
cargo check -p mesh-llm
just build
LLAMA_STAGE_BUILD_DIR=.deps/llama.cpp/build-stage-abi-metal cargo test -p mesh-llm --lib

Result: 1306 passed; 0 failed; 3 ignored.

Implementation note: encrypted streaming

The encrypted remote path is expected to preserve live SSE/token streaming. The host encrypts response chunks as they arrive, and the client-side tunnel decrypts those chunks into a bounded async pipe that feeds the normal plaintext HTTP response relay. This keeps /v1/responses adapters, context-overflow retry detection, error remapping, and token accounting aligned with the plaintext path without buffering the full response before the downstream client sees events.
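
The bounded-pipe idea above can be sketched with the standard library alone. This is a minimal illustration, not the PR's implementation: the real path is async, whereas this sketch uses a thread and `std::sync::mpsc::sync_channel` as a stand-in, and `decrypt_chunk` is a hypothetical placeholder (a XOR, not real cryptography) so that the flow of chunks is visible.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical placeholder for per-chunk decryption. XOR is NOT encryption;
/// it only stands in for the real NaCl-box step in this sketch.
fn decrypt_chunk(chunk: &[u8]) -> Vec<u8> {
    chunk.iter().map(|b| b ^ 0x5A).collect()
}

/// Decrypt chunks on a producer thread and relay them through a bounded
/// channel. `send` blocks once `capacity` chunks are waiting, so a slow
/// downstream reader applies backpressure instead of growing a buffer.
fn relay_stream(encrypted: Vec<Vec<u8>>, capacity: usize) -> Vec<u8> {
    let (tx, rx) = sync_channel::<Vec<u8>>(capacity);
    let producer = thread::spawn(move || {
        for chunk in encrypted {
            // Stop early if the consumer has gone away.
            if tx.send(decrypt_chunk(&chunk)).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which ends the consumer loop below.
    });

    // The consumer sees plaintext chunks as they arrive, in order.
    let mut body = Vec::new();
    for plain in rx {
        body.extend_from_slice(&plain);
    }
    producer.join().expect("producer thread panicked");
    body
}

fn main() {
    let events = ["data: hello\n\n", "data: world\n\n"];
    let encrypted: Vec<Vec<u8>> = events
        .iter()
        .map(|e| e.bytes().map(|b| b ^ 0x5A).collect())
        .collect();
    let body = relay_stream(encrypted, 1);
    print!("{}", String::from_utf8_lossy(&body));
}
```

A small capacity keeps memory bounded and keeps latency close to the plaintext path, because each event is forwarded as soon as it is decrypted.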

Next manual validation: 2-Mac node test

Use two macOS machines, ideally Apple Silicon for Secure Enclave attestation coverage.

1. Build and bundle locally

From this branch:

just build
just bundle
./target/release/mesh-llm --version

2. Clean both Macs before starting

On both Mac A and Mac B:

pkill -9 -f mesh-llm || true
ps -eo pid,args | grep -E 'mesh-llm' | grep -v grep || true

Expected: no remaining mesh-llm processes.

3. Deploy the same bundle to both Macs

Copy the generated tarball from dist/ to Mac B, extract it, and codesign if required by local macOS policy.

On both Macs verify the same binary is running:

mesh-llm --version

Expected: both report the version/build from this PR.

4. Start Mac B as the model host

On Mac B:

RUST_LOG=mesh_llm=debug mesh-llm --auto

Watch logs for startup, model hosting, and if on Apple Silicon, hardware attestation logs such as:

hardware attestation: SE key bound to node ...

Verify status:

curl -s http://localhost:3131/api/status | jq
curl -s http://localhost:3131/v1/models | jq

5. Start Mac A and join/discover Mac B

On Mac A, start this PR binary using the normal test mesh flow for your environment, for example auto/discovery or an explicit join token:

RUST_LOG=mesh_llm=debug mesh-llm --auto

or, if using an invite/join flow:

RUST_LOG=mesh_llm=debug mesh-llm --join '<invite-token-from-mac-b>'

Verify both nodes see each other:

curl -s http://localhost:3131/api/status | jq '.peers // .mesh.peers // .'
curl -s http://localhost:3131/v1/models | jq

Expected:

  • Mac A sees Mac B as a peer.
  • Mac A sees models hosted/routable on Mac B.
  • Peer version in status matches this PR build.

6. Prove remote encrypted inference works

From Mac A, call a model hosted by Mac B. Use the exact model id shown in /v1/models if needed.

Streaming chat completions:

curl -N http://localhost:3131/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Return exactly: encrypted remote path ok"}],
    "stream": true
  }'

Non-streaming chat completions:

curl -s http://localhost:3131/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Say remote non-stream ok in one sentence."}],
    "stream": false
  }' | jq

Expected:

  • Responses succeed through Mac A while inference is hosted on Mac B.
  • Mac A debug logs show encrypted request/response handling, e.g. messages like:
      Encrypted inference request ...
      Decrypted response from ...
  • Mac B debug logs show encrypted tunnel handling, e.g. messages like:
      Decrypted inbound HTTP tunnel ...
      Encrypted streaming response ...

If those log lines do not appear, the test may only be exercising plaintext routing.

7. Test /v1/responses through the same remote path

This is important because the encrypted response handler should reuse the normal response relay/adapter path.

From Mac A:

curl -s http://localhost:3131/v1/responses \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "auto",
    "input": "Return exactly: responses endpoint ok"
  }' | jq

If streaming responses are enabled in the environment:

curl -N http://localhost:3131/v1/responses \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "auto",
    "input": "Stream three short words.",
    "stream": true
  }'

Expected:

  • Valid /v1/responses shape.
  • No broken chunking or hanging.
  • Debug logs still show encrypted tunnel path.

8. Test attestation-required behavior

On Apple Silicon Macs, run Mac A with attestation required:

RUST_LOG=mesh_llm=debug mesh-llm --auto --require-attested-hosts

Then repeat a remote inference request from Mac A to Mac B.

Expected when Mac B has valid Secure Enclave attestation:

  • Routing succeeds.
  • Logs show attestation/security posture and encrypted tunnel path.

Expected when Mac B is not attested, or when testing against a non-Apple-Silicon/older host:

  • Routing fails closed.
  • Mac A logs a refusal similar to:
      refusing to route to unattested host ... (--require-attested-hosts)
  • No silent plaintext fallback occurs while --require-attested-hosts is enabled.

9. Mixed-version compatibility check

Run one Mac on this PR and the other Mac on the latest released binary.

Test both useful directions:

  1. PR Mac as client/API, released Mac as host.
  2. Released Mac as client/API, PR Mac as host.

Expected:

  • Gossip/discovery still works.
  • /api/status shows peers.
  • /v1/models works.
  • Inference still works.
  • New-to-old routing should fall back to plaintext because the old host will not advertise an inference key.
  • Old-to-new routing should still work because the new host still accepts plaintext tunnels.
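
The routing expectations in steps 8 and 9 (encrypt when the peer advertises an inference key, fall back to plaintext for older hosts, fail closed under --require-attested-hosts) can be summarized as one decision function. This is a sketch of the described behavior only: `Peer`, `Route`, and `decide` are hypothetical names, not identifiers from the PR.

```rust
/// Hypothetical outcome of a routing decision for one remote host.
#[derive(Debug, PartialEq)]
enum Route {
    Encrypted,
    Plaintext,
    Refuse,
}

/// Hypothetical view of what gossip tells us about a peer.
struct Peer {
    has_inference_key: bool,
    attested: bool,
}

fn decide(peer: &Peer, require_attested_hosts: bool) -> Route {
    // Fail closed: an unattested host is never used when attestation is required.
    if require_attested_hosts && !peer.attested {
        return Route::Refuse;
    }
    if peer.has_inference_key {
        return Route::Encrypted;
    }
    // No inference key (e.g. an older release). Plaintext is only acceptable
    // when attestation is not being enforced.
    if require_attested_hosts {
        Route::Refuse
    } else {
        Route::Plaintext
    }
}

fn main() {
    let old_host = Peer { has_inference_key: false, attested: false };
    let new_host = Peer { has_inference_key: true, attested: true };
    println!("old host, flag off: {:?}", decide(&old_host, false));
    println!("old host, flag on:  {:?}", decide(&old_host, true));
    println!("new host, flag on:  {:?}", decide(&new_host, true));
}
```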

10. Failure-mode smoke

While a streaming request from Mac A to Mac B is active, stop Mac B:

pkill -f mesh-llm

Expected:

  • Mac A does not hang indefinitely.
  • Request fails cleanly or retries another host if available.
  • Logs classify the tunnel failure as timeout/unavailable rather than panicking.

Also send a malformed request from Mac A:

curl -s -i http://localhost:3131/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"bad": true}'

Expected:

  • Error response is relayed/remapped cleanly through the encrypted path.
  • No decrypt errors on successful requests.

Pass criteria

Call the manual 2-Mac validation successful when:

  • Both Macs run this PR binary and see each other.
  • Remote inference from Mac A to Mac B succeeds.
  • Logs prove the encrypted tunnel path is used.
  • Streaming and non-streaming /v1/chat/completions work.
  • /v1/responses works through the same path.
  • --require-attested-hosts accepts an attested Apple Silicon host and rejects unattested hosts.
  • Mixed-version PR/release nodes interoperate.
  • Killing the remote host during a stream fails cleanly without hangs or panics.

Each node generates an X25519 inference keypair at startup, advertised
in gossip. When routing to a remote host that has an inference key, the
proxy encrypts the full HTTP request body with NaCl box (ephemeral
keypair per request for forward secrecy). The receiving node detects the
0xE1 magic byte, decrypts, forwards to the local backend, then encrypts
the response back. Relay/routing nodes see only ciphertext.
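
A minimal sketch of the magic-byte check described above, assuming the 0xE1 byte is the first byte of the tunnel payload. `TunnelPayload` and `classify` are hypothetical names for illustration. The check is unambiguous because a plaintext HTTP request starts with an ASCII method name, so it cannot begin with 0xE1.

```rust
/// Marker byte named in the description above.
const ENCRYPTED_MAGIC: u8 = 0xE1;

/// Hypothetical classification of an inbound tunnel payload.
#[derive(Debug, PartialEq)]
enum TunnelPayload<'a> {
    /// Bytes after the magic byte; still ciphertext at this point.
    Encrypted(&'a [u8]),
    /// Untouched payload, handled by the existing plaintext path.
    Plaintext(&'a [u8]),
}

fn classify(payload: &[u8]) -> TunnelPayload<'_> {
    match payload.split_first() {
        Some((&ENCRYPTED_MAGIC, rest)) => TunnelPayload::Encrypted(rest),
        _ => TunnelPayload::Plaintext(payload),
    }
}

fn main() {
    println!("{:?}", classify(&[0xE1, 0x01, 0x02]));
    println!("{:?}", classify(b"POST /v1/chat/completions HTTP/1.1\r\n"));
}
```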

Hardware attestation: on macOS Apple Silicon, a P-256 key is created in
the Secure Enclave (private key never leaves hardware). It signs an
attestation blob binding: node endpoint ID, inference public key, binary
SHA-256 hash, and security posture (SIP, RDMA, secure boot). Any peer
can verify the P-256 signature. Challenge-response nonce signing provides
continuous liveness proof.

Runtime hardening (macOS): PT_DENY_ATTACH blocks debuggers, core dumps
disabled, dangerous env vars scrubbed, SIP and RDMA status checked,
binary self-hash computed. SecurityPosture gossiped to all peers.

New CLI flag: --require-attested-hosts refuses to route inference to
peers without a verified hardware attestation.

Proto: fields 36-38 on PeerAnnouncement (inference_public_key,
SecurityPosture message, hardware_attestation_blob). Additive — old
nodes ignore new fields.

New deps: p256 (ecdsa verification), security-framework (macOS SE).

Two things that weren't actually wired:

1. SE attestation: try_create_attestation() now called at startup.
   On Apple Silicon with SE access, creates a P-256 key in hardware,
   signs an attestation blob (chip, model, memory, posture, binary hash,
   node ID, inference key), stores in local_hardware_attestation which
   gets gossiped to all peers.

2. Encrypted response decryption: route_remote_attempt now handles the
   encrypted response path. After sending an encrypted request, it reads
   back 0xE1 + encrypted JSON, decrypts with the ephemeral session, and
   writes plaintext HTTP to the client. Previously this was broken — the
   proxy would try to parse encrypted bytes as HTTP and fail.

Also: finish() the QUIC send stream after writing encrypted payload so
the receiver's read_to_end() completes.

- fix response sender auth: verify against gossiped host key, not self-declared
- chunked streaming encryption: SSE events stream through encrypted tunnel
- refresh attestation every 5 min so it stays valid under --require-attested-hosts
- fail closed: refuse plaintext fallback when attestation required
- parse HTTP status from first decrypted chunk
- move harden_runtime() before worker threads spawn
- document attestation trust model limitation
- add 10 tests for chunked crypto, model extraction, status parsing
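
As an illustration of the "parse HTTP status from first decrypted chunk" item, here is a small sketch. It assumes the first decrypted chunk contains the complete HTTP/1.x status line; `parse_status` is a hypothetical helper, not the function added in the PR.

```rust
/// Extract the status code from the first decrypted chunk of a response.
/// Returns None if the status line is incomplete or malformed.
fn parse_status(first_chunk: &[u8]) -> Option<u16> {
    // The status line ends at the first LF; without one we cannot decide yet.
    let line_end = first_chunk.iter().position(|&b| b == b'\n')?;
    let line = std::str::from_utf8(&first_chunk[..line_end]).ok()?;

    // Expected shape: "HTTP/1.1 200 OK" (the trailing CR is trimmed).
    let mut parts = line.trim_end().split(' ');
    let version = parts.next()?;
    if !version.starts_with("HTTP/") {
        return None;
    }
    let code = parts.next()?;
    if code.len() != 3 || !code.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    code.parse().ok()
}

fn main() {
    let chunk = b"HTTP/1.1 200 OK\r\nContent-Type: text/event-stream\r\n\r\n";
    println!("{:?}", parse_status(chunk));
}
```

Returning `None` rather than a default status lets the caller treat a malformed first chunk as a tunnel error instead of relaying it as a success.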
Contributor

Copilot AI left a comment


Pull request overview

Adds end-to-end inference encryption and (macOS) Secure Enclave-based attestation, plus OS-level runtime hardening signals that are gossiped to peers and can be enforced via --require-attested-hosts.

Changes:

  • Introduces X25519 + NaCl box request encryption with per-request ephemeral keys and a chunked encrypted streaming response format.
  • Adds Secure Enclave (P-256) attestation generation + cross-platform verification and gossips attestation + security posture.
  • Adds runtime hardening checks (debugger attach denial, core dump disable, env scrubbing) and routing gates for attestation enforcement.

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| crates/mesh-llm/src/system/mod.rs | Exposes new hardening module from system. |
| crates/mesh-llm/src/system/hardening.rs | Implements best-effort runtime hardening and a gossiped SecurityPosture. |
| crates/mesh-llm/src/runtime/mod.rs | Runs hardening early, creates/stores attestation + posture, and periodically refreshes attestation. |
| crates/mesh-llm/src/protocol/mod.rs | Updates protocol tests/fixtures for new gossip fields. |
| crates/mesh-llm/src/protocol/convert.rs | Converts new posture + attestation fields between local and protobuf. |
| crates/mesh-llm/src/network/tunnel.rs | Adds encrypted tunnel detection + decrypt/forward + encrypted streaming response. |
| crates/mesh-llm/src/network/openai/transport.rs | Adds attestation gating, request encryption, and encrypted response handling in proxy routing. |
| crates/mesh-llm/src/mesh/tests.rs | Updates mesh test node/peer construction for new Node/PeerInfo fields. |
| crates/mesh-llm/src/mesh/mod.rs | Extends peer announcement/info and Node with inference keys, posture, and attestation helpers. |
| crates/mesh-llm/src/mesh/gossip.rs | Gossips inference public key, posture, and attestation; merges these fields transitively. |
| crates/mesh-llm/src/crypto/mod.rs | Exposes new attestation and inference_encryption modules. |
| crates/mesh-llm/src/crypto/inference_encryption.rs | Implements request encryption/decryption + chunked streaming encryption primitives. |
| crates/mesh-llm/src/crypto/error.rs | Adds EncryptionFailed error variant. |
| crates/mesh-llm/src/crypto/attestation.rs | Implements SE key creation/signing, software signing for tests, and attestation verification. |
| crates/mesh-llm/src/cli/mod.rs | Adds --require-attested-hosts flag to enforce routing constraints. |
| crates/mesh-llm/src/api/tests.rs | Updates API tests for expanded peer info fields. |
| crates/mesh-llm/proto/node.proto | Adds inference key, security posture, and attestation blob fields to PeerAnnouncement. |
| crates/mesh-llm/Cargo.toml | Adds p256 plus macOS Security Framework dependencies. |
| Cargo.lock | Locks new transitive dependencies for crypto + macOS SE support. |

Comment thread crates/mesh-llm/src/system/mod.rs Outdated
Comment thread crates/mesh-llm/src/mesh/mod.rs Outdated
Comment thread crates/mesh-llm/src/network/tunnel.rs
Comment thread crates/mesh-llm/src/network/tunnel.rs
Comment thread crates/mesh-llm/src/network/openai/transport.rs Outdated
Comment thread crates/mesh-llm/src/network/openai/transport.rs
Comment thread crates/mesh-llm/src/runtime/mod.rs Outdated
Comment thread crates/mesh-llm/src/crypto/attestation.rs
Comment thread crates/mesh-llm/src/crypto/attestation.rs
michaelneale and others added 8 commits May 5, 2026 19:14
- system/mod.rs: pub mod hardening → pub(crate) for consistency
- mesh/mod.rs: pub inference_keypair → pub(crate) (secret key visibility)
- tunnel.rs: move inline use statement to top-of-file imports
- hardening.rs + main.rs + lib.rs: move env scrubbing before tokio runtime
  to eliminate UB from std::env::remove_var racing with worker threads.
  main.rs now builds the tokio runtime manually after scrubbing.
- attestation.rs + mesh/mod.rs + runtime/mod.rs: reuse SE identity across
  attestation refreshes via SeIdentityHandle, preserving SE public-key
  continuity for TOFU pinning instead of creating a new ephemeral key
  every 5-minute refresh cycle.
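
The ordering fix for env scrubbing can be sketched as follows, using plain threads in place of the tokio runtime. The variable names in `SCRUBBED_VARS` are assumptions chosen for illustration (the commit does not list which variables are scrubbed), and `scrub_env` is a hypothetical helper. The `unsafe` block reflects that `std::env::remove_var` is an unsafe function in the Rust 2024 edition, precisely because it must not race with other threads.

```rust
use std::env;
use std::thread;

/// Hypothetical deny-list; the actual list in the PR is not shown here.
const SCRUBBED_VARS: &[&str] = &["DYLD_INSERT_LIBRARIES", "LD_PRELOAD"];

/// Remove the listed variables. Must run while the process is still
/// single-threaded. Returns how many variables were actually removed.
#[allow(unused_unsafe)]
fn scrub_env() -> usize {
    let mut removed = 0;
    for name in SCRUBBED_VARS {
        if env::var_os(name).is_some() {
            // SAFETY: no other threads exist yet, so nothing can read or
            // write the environment concurrently.
            unsafe { env::remove_var(name) };
            removed += 1;
        }
    }
    removed
}

fn main() {
    // 1. Scrub first, before any worker thread (or async runtime) exists.
    let removed = scrub_env();
    println!("scrubbed {removed} variable(s)");

    // 2. Only now start threads; they can never observe the scrubbed values.
    let worker = thread::spawn(|| SCRUBBED_VARS.iter().all(|n| env::var_os(n).is_none()));
    println!("worker sees clean env: {}", worker.join().expect("worker panicked"));
}
```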
* origin/main:
  future(telemetry-plugin): add opt-in survey OTLP metrics exporter
  Support duplicate local runtime instances for the same model (#450)
  Use GGUF metadata for context and parallel defaults (#449)

# Conflicts:
#	Cargo.lock
@michaelneale michaelneale marked this pull request as ready for review May 7, 2026 06:05
Copilot AI review requested due to automatic review settings May 7, 2026 06:05
@michaelneale
Collaborator Author

Don't merge without hand testing by at least 2 people.

* origin/main:
  fix: wire --discover into serve path, add --name to discover subcommand (#453)
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

crates/mesh-llm/src/mesh/mod.rs:2328

  • peer_is_attested holds the state mutex while running verify_attestation(...) (base64 decode, signature verification, timestamp parsing). This can block unrelated peer/state operations. Consider copying out hardware_attestation + inference_public_key under the lock, dropping the lock, then performing verification outside the critical section.
            local_security_posture: Arc::new(Mutex::new(None)),
            local_hardware_attestation: Arc::new(Mutex::new(None)),
            se_identity_handle: Arc::new(Mutex::new(
                crate::crypto::attestation::SeIdentityHandle::empty(),
            )),
            require_attested_hosts: Arc::new(std::sync::atomic::AtomicBool::new(false)),
            enumerate_host: true,
            gpu_name: None,
            hostname: None,
            is_soc: Some(false),
            gpu_vram: None,
            gpu_reserved_bytes: None,
            gpu_mem_bandwidth_gbps: Arc::new(tokio::sync::Mutex::new(None)),
            gpu_compute_tflops_fp32: Arc::new(tokio::sync::Mutex::new(None)),
            gpu_compute_tflops_fp16: Arc::new(tokio::sync::Mutex::new(None)),
            config_state: Arc::new(tokio::sync::Mutex::new(
                crate::runtime::config_state::ConfigState::default(),
            )),
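
The lock-scoping suggestion in the suppressed comment above can be sketched like this. It is a simplified stand-in, not the PR's code: `State`, `PeerRecord`, and the body of `verify_attestation` are hypothetical, and a `std::sync::Mutex` is assumed for brevity.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical per-peer data copied out of shared state.
#[derive(Clone)]
struct PeerRecord {
    hardware_attestation: Option<String>,
    inference_public_key: Option<String>,
}

struct State {
    peers: HashMap<String, PeerRecord>,
}

/// Stand-in for the real verifier (base64 decode, signature check,
/// timestamp parsing). Here it only checks that the blob mentions the key.
fn verify_attestation(blob: &str, inference_key: &str) -> bool {
    blob.contains(inference_key)
}

fn peer_is_attested(state: &Mutex<State>, peer_id: &str) -> bool {
    // Critical section: clone only what verification needs, then unlock.
    let guard = state.lock().expect("state mutex poisoned");
    let snapshot = guard.peers.get(peer_id).cloned();
    drop(guard);

    // The expensive verification runs without holding the lock.
    match snapshot {
        Some(PeerRecord {
            hardware_attestation: Some(blob),
            inference_public_key: Some(key),
        }) => verify_attestation(&blob, &key),
        _ => false,
    }
}

fn main() {
    let mut peers = HashMap::new();
    peers.insert(
        "mac-b".to_string(),
        PeerRecord {
            hardware_attestation: Some("blob-for:key-123".to_string()),
            inference_public_key: Some("key-123".to_string()),
        },
    );
    let state = Mutex::new(State { peers });
    println!("mac-b attested: {}", peer_is_attested(&state, "mac-b"));
    println!("unknown attested: {}", peer_is_attested(&state, "unknown"));
}
```

The trade-off is one clone per check in exchange for a much shorter critical section.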

Comment on lines 3252 to +3264
}

/// Handle an encrypted response from a remote host tunnel.
///
/// The remote tunnel sends an encrypted stream header followed by
/// length-prefixed encrypted chunks. This function verifies the advertised
/// sender key against gossip, decrypts chunks into a bounded async pipe as they
/// arrive, then reuses the normal HTTP response relay path. Reusing
/// `relay_probed_response` keeps response translation (`/v1/responses`
/// adapters), context-overflow retry detection, error remapping, and token
/// accounting consistent with the plaintext tunnel path while preserving live
/// token/SSE streaming to the downstream client.
///
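
For readers unfamiliar with the "length-prefixed encrypted chunks" wording in the doc comment above, here is a minimal framing sketch. The 4-byte big-endian prefix is an assumption made for illustration; the PR's actual wire format (prefix width, byte order, stream header layout) is not shown in this excerpt, and `encode_frame` and `split_frames` are hypothetical names.

```rust
/// Prefix a payload with its length as a u32 in big-endian order (assumed format).
fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let mut out = (payload.len() as u32).to_be_bytes().to_vec();
    out.extend_from_slice(payload);
    out
}

/// Split a buffer into frames. Each returned slice would still be ciphertext
/// in the real protocol; decryption happens per frame afterwards.
fn split_frames(mut buf: &[u8]) -> Result<Vec<&[u8]>, String> {
    let mut frames = Vec::new();
    while !buf.is_empty() {
        if buf.len() < 4 {
            return Err("truncated length prefix".to_string());
        }
        let (len_bytes, rest) = buf.split_at(4);
        let len =
            u32::from_be_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]]) as usize;
        if rest.len() < len {
            return Err("truncated frame body".to_string());
        }
        let (frame, tail) = rest.split_at(len);
        frames.push(frame);
        buf = tail;
    }
    Ok(frames)
}

fn main() {
    let mut wire = encode_frame(b"chunk-1");
    wire.extend(encode_frame(b"chunk-2"));
    match split_frames(&wire) {
        Ok(frames) => println!("parsed {} frame(s)", frames.len()),
        Err(e) => println!("framing error: {e}"),
    }
}
```

A production reader would also cap the accepted frame length before allocating, so that a corrupt or hostile prefix cannot force a large buffer.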
Comment on lines +3410 to +3413
tracing::debug!(
"API proxy: encrypted response decrypt task cancelled for {}",
host_id.fmt_short()
);
Comment on lines +480 to +491
let attestation = HardwareAttestation {
node_endpoint_id: node_endpoint_id.to_string(),
inference_public_key: inference_public_key.to_string(),
se_public_key: se.public_key_base64().to_string(),
binary_hash: posture.binary_hash.clone().unwrap_or_default(),
chip_name,
hardware_model,
unified_memory_bytes,
sip_enabled: posture.sip_enabled,
secure_boot_enabled: true, // assume if SE works
rdma_disabled: posture.rdma_disabled,
timestamp: chrono::Utc::now().to_rfc3339(),
Comment on lines +155 to +156
let stdout = String::from_utf8_lossy(&output.stdout);
stdout.trim() == "disabled"
@github-actions
Contributor

This pull request has not been updated in at least 5 days. It will be closed after 7 days of inactivity to keep the active review queue current. Please update it within 2 days if the changes are still moving forward.

@github-actions github-actions Bot added the stale label May 13, 2026
@github-actions
Contributor

Closing this pull request because it has not been updated in at least 7 days. Please reopen or create a fresh pull request when the changes are ready to continue.

@github-actions github-actions Bot closed this May 15, 2026
@michaelneale michaelneale reopened this May 15, 2026
@michaelneale
Collaborator Author

Closing — branch too stale after crate restructure. Porting to fresh branch micn/attestation-privacy.


3 participants