Add DoS protection and idle timeout to vMCP session manager#4047
Add DoS protection and idle timeout to vMCP session manager#4047
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 88deb977d6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
Implements resource-exhaustion protections for SessionManagementV2 by adding configurable session limits and idle-session cleanup to reduce risk of backend connection/memory/FD exhaustion.
Changes:
- Add per-client (identity subject) session limiting and idle-session tracking/reaping to
sessionmanager.Manager. - Add server-level global session limit middleware returning HTTP 503 +
Retry-Afterfor new sessions when capped. - Extend/adjust unit tests to cover per-client limits and idle reaping behaviors.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
pkg/vmcp/server/sessionmanager/session_manager.go |
Adds limits struct, per-client counters, idle tracking + reaper, and integrates cleanup into session termination/tool calls. |
pkg/vmcp/server/server.go |
Adds config knobs/defaults, global session-limit middleware (503 + Retry-After), and wires idle reaper lifecycle into server start/shutdown. |
pkg/vmcp/server/sessionmanager/session_manager_test.go |
Updates constructors for new Limits parameter and adds tests for per-client limit + idle reaper behavior. |
Comments suppressed due to low confidence (1)
pkg/vmcp/server/server.go:607
- The new session-limit middleware is applied before AuthMiddleware in code, but due to wrapper ordering the auth middleware becomes the outermost handler and runs first. This contradicts the comment and means over-limit requests will still pay auth cost. Apply AuthMiddleware first and then wrap with sessionLimitMiddleware (so the limit check runs before auth).
// Apply session limit middleware when V2 session management is active.
// Runs before auth so over-limit requests are rejected early without auth overhead.
if s.vmcpSessionMgr != nil && s.config.MaxSessions > 0 {
mcpHandler = s.sessionLimitMiddleware(mcpHandler)
slog.Info("session limit middleware enabled", "max_sessions", s.config.MaxSessions)
}
// Apply authentication middleware if configured (runs first in chain)
if s.config.AuthMiddleware != nil {
mcpHandler = s.config.AuthMiddleware(mcpHandler)
slog.Info("authentication middleware enabled for MCP endpoints")
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4047 +/- ##
==========================================
+ Coverage 68.61% 68.63% +0.01%
==========================================
Files 445 445
Lines 45374 45516 +142
==========================================
+ Hits 31135 31240 +105
- Misses 11841 11867 +26
- Partials 2398 2409 +11 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Implements resource-exhaustion protections for the session-scoped backend lifecycle (SessionManagementV2), resolving issue #3874. Global session limit: new session requests (no Mcp-Session-Id header) receive HTTP 503 with a Retry-After header when the server-wide cap is reached. Default: 100 sessions. Per-client session limit: CreateSession enforces a maximum number of concurrent sessions per auth.Identity.Subject. Anonymous clients are exempt. The counter is rolled back on all failure paths. Default: 10 sessions per identity. Idle session timeout: a background reaper goroutine terminates sessions that have had no CallTool activity for longer than the configured threshold. The idle clock resets on every tool call and is initialised when a session is fully established. The reaper is wired into shutdownFuncs so it stops cleanly on server shutdown. Default: 5 min. All three limits are configurable via Config fields; zero values select the defaults. The Limits struct is passed to sessionmanager.New() and all existing call sites are updated. Closes: #3874
jerm-dro
left a comment
There was a problem hiding this comment.
I don't think these changes are required for the session v2 work. Would you be okay setting this work aside for now?
I think we'll tackle a related problem as part of the upcoming scale work, where we'll need to implement an LRU for the session storage so not all sessions are held in memory.
| // sessionLimitMiddleware is a best-effort fast-fail for new session requests | ||
| // (no Mcp-Session-Id header): it returns HTTP 503 + Retry-After before the | ||
| // request reaches the SDK when the global session cap appears to be reached. | ||
| // Existing sessions (with a valid Mcp-Session-Id) are never affected. | ||
| // | ||
| // This check is intentionally optimistic (non-atomic): it avoids the overhead | ||
| // of routing and SDK processing for clearly-over-limit requests, but it does | ||
| // not guarantee strict enforcement under concurrent load. Strict enforcement | ||
| // is provided atomically by sessionmanager.Manager.Generate(), which uses an | ||
| // increment-first reservation to prevent races between concurrent initialize | ||
| // requests. | ||
| func (s *Server) sessionLimitMiddleware(next http.Handler) http.Handler { | ||
| // Resolve the concrete manager once so we can call ActiveSessionCount(). | ||
| mgr, _ := s.vmcpSessionMgr.(*sessionmanager.Manager) | ||
| return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { | ||
| if r.Header.Get("Mcp-Session-Id") == "" && mgr != nil { | ||
| if mgr.ActiveSessionCount() >= s.config.MaxSessions { | ||
| w.Header().Set("Retry-After", strconv.Itoa(defaultRetryAfterSeconds)) | ||
| w.Header().Set("Content-Type", "application/json") | ||
| w.WriteHeader(http.StatusServiceUnavailable) | ||
| _, _ = w.Write([]byte( | ||
| `{"error":{"code":-32000,"message":"Maximum concurrent sessions exceeded. ` + | ||
| `Please try again later or contact administrator."}}`, | ||
| )) | ||
| return | ||
| } | ||
| } | ||
| next.ServeHTTP(w, r) | ||
| }) | ||
| } |
There was a problem hiding this comment.
Blocker: can we remove the rate limiting behavior from this PR?
It's not really needed today, so I'd prefer to avoid adding it.
| // perClientMu guards perClientCounts and sessionSubject. | ||
| perClientMu sync.Mutex | ||
| perClientCounts map[string]int // subject → active session count | ||
| sessionSubject map[string]string // sessionID → subject (for decrement on Terminate) | ||
|
|
||
| // idleActivityMu guards idleActivity. | ||
| idleActivityMu sync.RWMutex | ||
| idleActivity map[string]time.Time // sessionID → last active time | ||
|
|
||
| // activeSessionCount tracks sessions that have been generated but not yet | ||
| // terminated, excluding terminated placeholders left for TTL cleanup. | ||
| // This gives an accurate count for global limit enforcement, unlike | ||
| // storage.Count() which includes those terminated placeholders. | ||
| activeSessionCount atomic.Int64 |
There was a problem hiding this comment.
Blocker: this is adding a lot of complexity to Manager. We should refactor the Manager struct first so we don't have to mix all these concerns together in one class. That would also make unit testing much simpler. I would expect a change like this doesn't require the integration_tests at all.
|
ok setting as draft for now. I know that is not essential but i was speeding up with pending tasks while the main code was reviewed... |
Summary
Implements resource-exhaustion protections for the session-scoped backend lifecycle (SessionManagementV2), resolving issue #3874.
Global session limit: new session requests (no Mcp-Session-Id header) receive HTTP 503 with a Retry-After header when the server-wide cap is reached. Default: 100 sessions.
Per-client session limit: CreateSession enforces a maximum number of concurrent sessions per auth.Identity.Subject. Anonymous clients are exempt. The counter is rolled back on all failure paths. Default: 10 sessions per identity.
Idle session timeout: a background reaper goroutine terminates sessions that have had no CallTool activity for longer than the configured threshold. The idle clock resets on every tool call and is initialised when a session is fully established. The reaper is wired into shutdownFuncs so it stops cleanly on server shutdown. Default: 5 min.
All three limits are configurable via Config fields; zero values select the defaults. The Limits struct is passed to sessionmanager.New() and all existing call sites are updated.
Fixes #3874
Type of change
Test plan
task test)task test-e2e)task lint-fix)Does this introduce a user-facing change?
No
Special notes for reviewers