feat(picod): implement two-stage secure initialization to isolate sandbox sessions#352
feat(picod): implement two-stage secure initialization to isolate sandbox sessions#352Abhinav-kodes wants to merge 35 commits into
Conversation
|
@Abhinav-kodes: The label(s) DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
21ecf85 to
4e28b8b
Compare
There was a problem hiding this comment.
Code Review
This pull request implements a two-stage authentication mechanism for PicoD sandboxes by introducing an /init endpoint and bootstrap key generation. The reviewer identified a critical security flaw where the router's shared key is incorrectly reused for session initialization instead of a unique session key. Additionally, the review points out that the session key can be overwritten, suggests refactoring duplicated PEM parsing logic, highlights the need for comprehensive unit tests for the new initialization flow, and recommends making the HTTP client timeout configurable.
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #352 +/- ##
==========================================
+ Coverage 47.57% 50.53% +2.96%
==========================================
Files 30 32 +2
Lines 2819 3364 +545
==========================================
+ Hits 1341 1700 +359
- Misses 1338 1477 +139
- Partials 140 187 +47
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
1c65fc7 to
2ef218c
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements a two-stage authentication and initialization workflow for PicoD. It introduces bootstrap keys to verify an initial setup phase that establishes unique session-specific keys for each sandbox. The workload manager has been updated to generate these ephemeral keys and perform remote initialization of PicoD instances, while the router now supports signing JWTs with per-session private keys. Reviewer feedback identifies performance bottlenecks related to frequent RSA key parsing and redundant HTTP client instantiation, and provides corrections for the golang-jwt/jwt/v5 library usage to ensure proper claim enforcement.
d610e39 to
1672739
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements a two-stage authentication and initialization flow for PicoD, introducing a bootstrap public key and an /init endpoint for session-specific key registration. The changes include updates to the Workload Manager to generate and distribute these keys, and modifications to the Router to support session-specific JWT signing. Feedback includes concerns regarding a potential memory leak in the JWTManager due to an unbounded cache, the use of global variables in bootstrap_auth.go which hinders testability and thread safety, and a recommendation to improve security by verifying the issuer claim in the bootstrap JWT.
|
/gemini review |
Signed-off-by: Abhinav Singh <abhinavsingh717073@gmail.com>
…, and dead code Signed-off-by: Abhinav Singh <abhinavsingh717073@gmail.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a two-stage secure initialization trust chain between the Router, Workload Manager, and PicoD sandboxes. It replaces the static RSA public key environment variable mechanism with a dynamic, per-session ECDSA keypair established via a secure /init handshake. Critical compilation issues were identified in pkg/picod/auth.go due to the use of a non-existent jwt.WithIssuedAt() function from the golang-jwt/jwt/v5 library. Additionally, a locking optimization is recommended in pkg/router/jwt.go to avoid double-locking and reduce lock contention on the hot path of JWT generation.
…ity logging, testing binding, token generation fallback logic, and key caching locking Signed-off-by: Abhinav Singh <abhinavsingh717073@gmail.com>
Signed-off-by: Abhinav Singh <abhinavsingh717073@gmail.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements a two-stage secure initialization trust chain between the Router, Workload Manager, and PicoD to mitigate cross-sandbox token replay vulnerabilities. It replaces the static RSA key-pair authentication with a dynamic ECDSA (P-256) session key generated via a new /init handshake. The bootstrap public key (PICOD_BOOTSTRAP_PUBLIC_KEY) is now used solely to verify the initial handshake, which registers the session key for subsequent ES256 JWT verification. The review feedback highlights opportunities to improve context cancellation handling during bootstrap retries and to optimize the token generation hot path in the Router by utilizing a sentinel error to avoid redundant cache lookups.
Signed-off-by: Abhinav Singh <abhinavsingh717073@gmail.com>
Signed-off-by: Abhinav Singh <abhinavsingh717073@gmail.com>
Signed-off-by: Abhinav Singh <abhinavsingh717073@gmail.com>
Signed-off-by: Abhinav Singh <abhinavsingh717073@gmail.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a two-stage secure initialization trust chain between the Router, Workload Manager, and PicoD. It replaces the static RSA key pair model with a bootstrap RSA key pair used to verify an /init handshake, which dynamically establishes a unique, per-session ECDSA (P-256) key pair. The session private key is encrypted via AES-GCM and stored in Redis/Valkey. Feedback on the changes points out a compilation error in pkg/picod/auth.go due to the use of an invalid JWT parser option (jwt.WithIssuedAt), and suggests improving responsiveness to context cancellation in pkg/router/jwt.go by replacing time.Sleep with a select block.
Signed-off-by: Abhinav Singh <abhinavsingh717073@gmail.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements a two-stage secure initialization trust chain for PicoD, transitioning from a static RSA key pair to a dynamic ECDSA session key pair established via an /init handshake. It also introduces AES-GCM encryption for storing session private keys in Redis/Valkey. The review feedback highlights several critical issues: a strict session ID check that breaks the warm pool feature, fragile PEM-string hashing for encryption keys that should use raw DER bytes instead, Redis/Valkey cluster compatibility issues due to transactional multi-key operations (CROSSSLOT errors), and a potential goroutine leak in PicoD's background cleanup task due to uncanceled contexts.
Signed-off-by: Abhinav Singh <abhinavsingh717073@gmail.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements a two-stage secure initialization trust chain between the Router, Workload Manager, and PicoD to mitigate cross-sandbox token replay vulnerabilities. Instead of a single static RSA key pair, it introduces a bootstrap RSA key pair used only for an /init handshake, which dynamically establishes a unique ECDSA (P-256) session key pair for each sandbox instance. Subsequent requests are signed using ES256 JWTs. The changes span PicoD, Workload Manager, Router, and store packages, adding symmetric encryption (AES-GCM) for storing session private keys in Redis/Valkey. The reviewer identified two key issues: first, treating 409 Conflict as a successful initialization during /init retries is dangerous because a new key pair is generated on each retry, leading to a mismatch between the stored private key and PicoD's registered public key; second, the initialization context timeout is hardcoded to 5 seconds, ignoring the configured PicoInitTimeout value.
Signed-off-by: Abhinav Singh <abhinavsingh717073@gmail.com>
…ession auth errors Signed-off-by: Abhinav Singh <abhinavsingh717073@gmail.com>
|
cc @hzxuzhonghu |
feat(picod): implement two-stage secure initialization to isolate sandbox sessions
What type of PR is this?
/kind feature
/kind security
What this PR does / why we need it
This PR implements Two-Stage Secure Initialization for PicoD to resolve a critical security vulnerability regarding cross-sandbox token replays, while incorporating defense-in-depth, encrypted key storage, and performance optimizations.
Previously, under the Plain Authentication design, all PicoD sandboxes verified JWTs using the same shared public key injected via the
PICOD_AUTH_PUBLIC_KEYenvironment variable. This meant a valid JWT issued for one sandbox could theoretically be replayed against another sandbox, breaking tenant isolation.This PR introduces a cryptographically isolated two-stage handshake:
Bootstrap Stage: The Workload Manager generates a static RSA-2048 Bootstrap Key Pair on startup, stores it in the
agentcube-bootstrap-identityKubernetes Secret/ConfigMap (renamed frompicod-router-identity/picod-router-public-key), and injectsPICOD_BOOTSTRAP_PUBLIC_KEYalong with a uniquePICOD_SESSION_IDinto each PicoD container environment. PicoD loads the bootstrap key at startup and waits in an uninitialized state.Session Stage (
POST /init): The Workload Manager generates a per-sandbox ECDSA (P-256) Session Key Pair. It calls the newPOST /initendpoint on PicoD with a short-lived bootstrap JWT (RS256, signed by the Bootstrap Private Key, scoped bysub=PICOD_SESSION_ID) containing thesession_public_keyclaim. On success, PicoD stores the ECDSA public key and transitions to initialized. All subsequent user requests must be signed with the ECDSA Session Private Key (ES256).Files Changed
cmd/picod/main.gocontext.Background()toNewServerto enable lifecycle-aware goroutinescmd/workload-manager/main.goserver.GetBootstrapPublicKeyPEMintoCodeInterpreterReconcilerafter server constructionpkg/picod/auth.goAuthManager: split intobootstrapPublicKey(RSA) +sessionPublicKey(ECDSA); addedVerifyBootstrapJWT,SetSessionPublicKey, JTI replay-protection viasync.Map, background JTI eviction goroutinepkg/picod/server.goPOST /initroute with per-endpoint rate limiter (2 req/s, burst 5);InitHandlerverifies bootstrap JWT and sets session key; API routes return503 DAEMON_NOT_INITIALIZEDuntil session is establishedpkg/router/jwt.goagentcube-bootstrap-identity; added LRU key cache (keyCacheMaxSize=1000) for per-session ECDSA private keys; addedGenerateTokenWithKeyandGetCachedKeyhelperspkg/router/handlers.gogenerateSandboxJWTfetches per-session ECDSA private key from store (withErrKeyNotCachedfallback); returns503 SESSION_PRIVATE_KEY_NOT_FOUNDif key is missingpkg/router/server.goIdleConnTimeout: 85sandMaxIdleConnsPerHost: 100on HTTP transport; passstoreClienttoTryStoreOrLoadJWTKeySecretpkg/store/interface.goStoreSessionPrivateKey,GetSessionPrivateKey,SetEncryptionKeytoStoreinterfacepkg/store/crypto.go(new)Cryptohelper for at-rest encryption of session private keyspkg/store/keys.go(new)sessionPrivKeyPrefixconstant shared by Redis and Valkey backendspkg/store/store_redis.goStoreSessionPrivateKey/GetSessionPrivateKey(AES-GCM + base64); updatedDeleteSandboxBySessionIDto use non-transactionalPipeline()for cluster-safe deletionpkg/store/store_valkey.goDeleteSandboxBySessionIDuses bareDoMultifor cluster-safe deletionpkg/common/types/sandbox.goSessionPrivateKey(excluded from JSON, persisted separately) andAuthModefields toSandboxInfodocs/design/PicoD-Plain-Authentication-Design.mddocs/design/auth-proposal.mddocs/agentcube/blog/release-v0.1.0/index.mdpkg/picod/auth_test.go,picod_test.go,execute_test.go,server_test.goTestInitHandlercovering success/conflict/invalid-token cases,TestAuthMiddleware_ValidTokenpkg/router/session_manager_test.gopkg/workloadmanager/server.goPicoInitTimeout, bootstrap auth manager wiring,GetBootstrapPublicKeyPEM, and store encryption key setuppkg/workloadmanager/workload_builder.goAuthModedefaults from the new modelKey Security Hardening & Optimizations Included
PICOD_SESSION_ID): PicoD validates thesubclaim in the bootstrap JWT againstPICOD_SESSION_IDwhen set (on-demand pods). Warm pool pods that are pre-created before a session is assigned skip this check; replay protection is still enforced via the JTI guard.AuthManagertracks seen JTIs viasync.Mapwith a background 1-minute cleanup goroutine; bootstrap JTIs older than 3 minutes are evicted.SetSessionPublicKeyreturnsErrAlreadyInitializedon second call;InitHandlerresponds409 Conflictto prevent session key hijacking via re-initialization.pkg/store/crypto.go) before persistence.DeleteSandboxBySessionIDuses a non-transactionalPipeline()(Redis) / bareDoMulti(Valkey) to delete sandbox metadata and the private key safely in clustered deployments./init: A token-bucket rate limiter (2 req/s, burst 5) guards the/initendpoint against spam during the startup window.IdleConnTimeoutset to 85s (aligned with AWS ALB 60s timeout) andMaxIdleConnsPerHost=100to prevent stale-connection race conditions./inithandshake so sandbox initialization does not block indefinitely.SandboxInfoandCreateSandboxResponsenow carryAuthMode, allowing the Router to distinguishpicodsandboxes fromnone-auth sandboxes.PICOD_AUTH_PUBLIC_KEYis still accepted as a fallback; deployments should migrate toPICOD_BOOTSTRAP_PUBLIC_KEY.Which issue(s) this PR fixes
Fixes #
Special notes for your reviewer
This implementation realizes the architecture outlined in
docs/design/agentcube-proposal.mdunder section 5.2 (Picod Workflow). The relevant design documents (docs/design/PicoD-Plain-Authentication-Design.mdandauth-proposal.md) have been fully updated within this PR to accurately document the newPICOD_BOOTSTRAP_PUBLIC_KEY,PICOD_SESSION_ID, and the two-stage ECDSA handshake.The initialization order in
cmd/workload-manager/main.gowas intentionally changed:CodeInterpreterReconcileris now constructed afterNewServerso it can receiveserver.GetBootstrapPublicKeyPEMas a live function reference rather than a nil pointer.Does this PR introduce a user-facing change?