fix(training-agent): create fresh MCP Server per request to eliminate transport-binding race by bokelley · Pull Request #4089 · adcontextprotocol/adcp

bokelley · 2026-05-04T15:27:46Z

The tenant router reused a singleton MCP Server per tenant (held by TenantRegistry) and called server.connect(transport) on every HTTP POST. Under concurrent load the MCP SDK rejected re-connection with "Already connected to a transport", producing -32603 Internal server error on ~1/3 of cold-start requests. The compliance heartbeat and storyboard runners targeting the training agent both hit this under back-to-back load.

Fix: Add createServer(tenantId) factory to RegistryHolder in registry.ts. The factory stores a closure per tenant (keyed by cfg.tenantId — same key TenantRegistry uses) that calls createAdcpServerFromPlatform(opts) fresh per invocation. Each HTTP POST in tenantMcpHandler now calls holder.createServer(resolved.tenantId) to get its own Protocol instance, then runs the connect → handle → close lifecycle on that instance. The registry singleton is retained only for resolveByRequest (tenant existence and health checks).

This matches the pattern already used in standalone.ts and the legacy/strict MCP handlers in index.ts, and matches the MCP SDK's stated requirement for stateless mode (sessionIdGenerator: undefined): one Server per connection.

Non-breaking justification: purely internal server lifecycle change — no change to the HTTP interface, JSON-RPC wire format, tenant registration, or any public API surface. Existing clients are unaffected.

Pre-PR review:

code-reviewer: approved — closure correctness verified, factory-key fix applied (cfg.tenantId not id), ordering guarantee tight
dx-expert: approved — race correctly fixed, resolved.server no longer used for transport ops, comment adequate

Triage-managed PR. This bot does not currently iterate on
review comments or PR conversation threads (only on the source
issue). To unblock:

Push fixup commits directly: gh pr checkout <num> →
fix → push.

Or re-trigger: comment /triage execute on the source
issue.

See #3121
for context.

Session: https://claude.ai/code/session_011KzNaC6rJnpxvUe4QCMUkd

Generated by Claude Code

… transport-binding race Fixes #4084. The tenant router reused a singleton MCP Server per tenant and called server.connect(transport) on every HTTP POST. Under concurrent load the SDK rejected re-connection with "Already connected to a transport", returning -32603 to ~1/3 of cold-start requests. Fix: add createServer(tenantId) factory to RegistryHolder that calls createAdcpServerFromPlatform(opts) fresh per invocation. Each HTTP POST now gets its own Protocol instance for connect→handle→close, matching the SDK's stateless-mode contract (sessionIdGenerator: undefined). https://claude.ai/code/session_011KzNaC6rJnpxvUe4QCMUkd

bokelley added the claude-triaged Issue has been triaged by the Claude Code triage routine. Remove to re-triage. label May 4, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(training-agent): create fresh MCP Server per request to eliminate transport-binding race#4089

fix(training-agent): create fresh MCP Server per request to eliminate transport-binding race#4089
bokelley wants to merge 1 commit intomainfrom
claude/issue-4084-tenant-server-per-request-factory

bokelley commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bokelley commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants