fix(training-agent): create fresh MCP Server per request to eliminate transport-binding race#4089
Draft
fix(training-agent): create fresh MCP Server per request to eliminate transport-binding race#4089
Conversation
… transport-binding race Fixes #4084. The tenant router reused a singleton MCP Server per tenant and called server.connect(transport) on every HTTP POST. Under concurrent load the SDK rejected re-connection with "Already connected to a transport", returning -32603 to ~1/3 of cold-start requests. Fix: add createServer(tenantId) factory to RegistryHolder that calls createAdcpServerFromPlatform(opts) fresh per invocation. Each HTTP POST now gets its own Protocol instance for connect→handle→close, matching the SDK's stateless-mode contract (sessionIdGenerator: undefined). https://claude.ai/code/session_011KzNaC6rJnpxvUe4QCMUkd
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #4084
The tenant router reused a singleton MCP
Serverper tenant (held byTenantRegistry) and calledserver.connect(transport)on every HTTP POST. Under concurrent load the MCP SDK rejected re-connection with"Already connected to a transport", producing-32603 Internal server erroron ~1/3 of cold-start requests. The compliance heartbeat and storyboard runners targeting the training agent both hit this under back-to-back load.Fix: Add
createServer(tenantId)factory toRegistryHolderinregistry.ts. The factory stores a closure per tenant (keyed bycfg.tenantId— same keyTenantRegistryuses) that callscreateAdcpServerFromPlatform(opts)fresh per invocation. Each HTTP POST intenantMcpHandlernow callsholder.createServer(resolved.tenantId)to get its own Protocol instance, then runs theconnect → handle → closelifecycle on that instance. The registry singleton is retained only forresolveByRequest(tenant existence and health checks).This matches the pattern already used in
standalone.tsand the legacy/strict MCP handlers inindex.ts, and matches the MCP SDK's stated requirement for stateless mode (sessionIdGenerator: undefined): one Server per connection.Non-breaking justification: purely internal server lifecycle change — no change to the HTTP interface, JSON-RPC wire format, tenant registration, or any public API surface. Existing clients are unaffected.
Pre-PR review:
cfg.tenantIdnotid), ordering guarantee tightresolved.serverno longer used for transport ops, comment adequateSession: https://claude.ai/code/session_011KzNaC6rJnpxvUe4QCMUkd
Generated by Claude Code