fix(sentinel): bound login trackers and report audit drops#314
fix(sentinel): bound login trackers and report audit drops#314Agent-Hellboy wants to merge 3 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces idle entry pruning and maximum entry limits to the login attempt trackers in both the API auth and UI services to prevent unbounded memory growth. It also refactors locking in the MCP gateway's analytics emitter to prevent deadlocks and tracks dropped events. However, the cache eviction policy in both trackers is vulnerable to a lockout bypass attack, as an attacker can flood the system with dummy requests to evict entries with accumulated failures before they trigger a lockout. It is recommended to prioritize preserving entries with active failures during eviction.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| type candidate struct { | ||
| key string | ||
| lastSeen time.Time | ||
| locked bool | ||
| } | ||
| candidates := make([]candidate, 0, len(t.entries)) | ||
| for key, state := range t.entries { | ||
| candidates = append(candidates, candidate{ | ||
| key: key, | ||
| lastSeen: state.LastSeen, | ||
| locked: state.LockedUntil.After(now), | ||
| }) | ||
| } | ||
| sort.Slice(candidates, func(i, j int) bool { | ||
| if candidates[i].locked != candidates[j].locked { | ||
| return !candidates[i].locked | ||
| } | ||
| return candidates[i].lastSeen.Before(candidates[j].lastSeen) | ||
| }) |
There was a problem hiding this comment.
The current eviction policy in enforceMaxLocked sorts candidates solely by whether they are locked and their lastSeen timestamp. This creates a vulnerability where an attacker can bypass the login lockout mechanism via a cache eviction attack (or key spray).
Specifically, if a client has accumulated login failures (e.g., just below the lockout threshold) but is not yet actively locked out, their entry is considered 'unlocked'. If an attacker then floods the tracker with requests from dummy keys/IPs, the tracker will exceed APILoginAttemptMaxEntries and evict the oldest unlocked entries. Since the target client's entry has an older lastSeen timestamp than the newly active dummy entries, it will be evicted, resetting its failure counter to 0. This allows the attacker to brute-force passwords indefinitely without ever triggering a lockout.
To mitigate this, the eviction logic should prioritize preserving entries with active failures (i.e., failures > 0) even if they are not currently locked out, or at least sort them such that entries with 0 failures are evicted first.
type candidate struct {
key string
lastSeen time.Time
failures int
locked bool
}
candidates := make([]candidate, 0, len(t.entries))
for key, state := range t.entries {
candidates = append(candidates, candidate{
key: key,
lastSeen: state.LastSeen,
failures: state.Failures,
locked: state.LockedUntil.After(now),
})
}
sort.Slice(candidates, func(i, j int) bool {
if candidates[i].locked != candidates[j].locked {
return !candidates[i].locked
}
if (candidates[i].failures > 0) != (candidates[j].failures > 0) {
return candidates[i].failures == 0
}
return candidates[i].lastSeen.Before(candidates[j].lastSeen)
})| type candidate struct { | ||
| clientID string | ||
| lastSeen time.Time | ||
| locked bool | ||
| } | ||
| candidates := make([]candidate, 0, len(t.clients)) | ||
| for clientID, state := range t.clients { | ||
| candidates = append(candidates, candidate{ | ||
| clientID: clientID, | ||
| lastSeen: state.lastSeen, | ||
| locked: now.Before(state.lockedUntil), | ||
| }) | ||
| } | ||
| sort.Slice(candidates, func(i, j int) bool { | ||
| if candidates[i].locked != candidates[j].locked { | ||
| return !candidates[i].locked | ||
| } | ||
| return candidates[i].lastSeen.Before(candidates[j].lastSeen) | ||
| }) |
There was a problem hiding this comment.
The current eviction policy in enforceMaxLocked sorts candidates solely by whether they are locked and their lastSeen timestamp. This creates a vulnerability where an attacker can bypass the login lockout mechanism via a cache eviction attack (or key spray).
Specifically, if a client has accumulated login failures (e.g., just below the lockout threshold) but is not yet actively locked out, their entry is considered 'unlocked'. If an attacker then floods the tracker with requests from dummy keys/IPs, the tracker will exceed loginAttemptMaxClients and evict the oldest unlocked entries. Since the target client's entry has an older lastSeen timestamp than the newly active dummy entries, it will be evicted, resetting its failure counter to 0. This allows the attacker to brute-force passwords indefinitely without ever triggering a lockout.
To mitigate this, the eviction logic should prioritize preserving entries with active failures (i.e., failures > 0) even if they are not currently locked out, or at least sort them such that entries with 0 failures are evicted first.
type candidate struct {
clientID string
lastSeen time.Time
failures int
locked bool
}
candidates := make([]candidate, 0, len(t.clients))
for clientID, state := range t.clients {
candidates = append(candidates, candidate{
clientID: clientID,
lastSeen: state.lastSeen,
failures: state.failures,
locked: now.Before(state.lockedUntil),
})
}
sort.Slice(candidates, func(i, j int) bool {
if candidates[i].locked != candidates[j].locked {
return !candidates[i].locked
}
if (candidates[i].failures > 0) != (candidates[j].failures > 0) {
return candidates[i].failures == 0
}
return candidates[i].lastSeen.Before(candidates[j].lastSeen)
})There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1282db9f49
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| sort.Slice(candidates, func(i, j int) bool { | ||
| if candidates[i].locked != candidates[j].locked { | ||
| return !candidates[i].locked | ||
| } | ||
| return candidates[i].lastSeen.Before(candidates[j].lastSeen) |
There was a problem hiding this comment.
Preserve failure counters during key-spray eviction
In password login the tracker key is IP|email (services/api/auth/login.go), so an attacker can make two attempts against a target account, submit enough distinct email addresses to exceed the 4096-entry cap, and evict the target's still-unlocked failure state before the third attempt. Repeating this keeps the target below the lockout threshold and bypasses the protection introduced here; retain pre-lockout counters independently of attacker-controlled email cardinality, such as with a separately bounded per-IP tracker.
Useful? React with 👍 / 👎.
| dropped := s.analyticsDropped.Add(1) | ||
| s.analyticsMu.Unlock() | ||
| log.Printf("gateway analytics queue full; dropped event total=%d source=%q event_type=%q", dropped, event.Source, event.EventType) |
There was a problem hiding this comment.
Rate-limit analytics drop logging
When the 256-event analytics queue fills because ingest is slow or unavailable, every subsequent request that emits an event synchronously executes log.Printf. This turns the intended nonblocking drop path into request-path log I/O and can flood stdout or the log backend precisely during sustained overload, amplifying the outage; log a sampled or periodically aggregated dropped count instead.
Useful? React with 👍 / 👎.
Revives the still-applicable scope from #195 on current main.
services/api/auth/lockout.goafter the auth refactorValidation:
go test -race -count=1 ./...inservices/apigo test -race -count=1 ./...inservices/uigo test -race -count=1 ./...inservices/mcp-gatewayThis does not address the remaining #146 gaps for processor shutdown flushing or gateway SIGTERM handling.