Skip to content

fix(sentinel): bound login trackers and report audit drops#314

Open
Agent-Hellboy wants to merge 3 commits into
mainfrom
sentinel/issue_195_revival
Open

fix(sentinel): bound login trackers and report audit drops#314
Agent-Hellboy wants to merge 3 commits into
mainfrom
sentinel/issue_195_revival

Conversation

@Agent-Hellboy

Copy link
Copy Markdown
Owner

Revives the still-applicable scope from #195 on current main.

  • ports API login-attempt bounding into services/api/auth/lockout.go after the auth refactor
  • bounds the UI tracker with the same 30-minute idle TTL and 4096-entry cap
  • amortizes maintenance with periodic pruning and batched eviction
  • evicts unlocked entries before active lockouts, preventing key spray from flushing lockouts
  • counts and logs gateway analytics queue drops, providing the backing value for the metrics work tracked by fix(mcp-gateway): count and log silently dropped audit events #303

Validation:

  • go test -race -count=1 ./... in services/api
  • go test -race -count=1 ./... in services/ui
  • go test -race -count=1 ./... in services/mcp-gateway
  • pre-commit: gitleaks, gofmt, staticcheck, vet, unit tests, generated drift

This does not address the remaining #146 gaps for processor shutdown flushing or gateway SIGTERM handling.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces idle entry pruning and maximum entry limits to the login attempt trackers in both the API auth and UI services to prevent unbounded memory growth. It also refactors locking in the MCP gateway's analytics emitter to prevent deadlocks and tracks dropped events. However, the cache eviction policy in both trackers is vulnerable to a lockout bypass attack, as an attacker can flood the system with dummy requests to evict entries with accumulated failures before they trigger a lockout. It is recommended to prioritize preserving entries with active failures during eviction.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +96 to +114
type candidate struct {
key string
lastSeen time.Time
locked bool
}
candidates := make([]candidate, 0, len(t.entries))
for key, state := range t.entries {
candidates = append(candidates, candidate{
key: key,
lastSeen: state.LastSeen,
locked: state.LockedUntil.After(now),
})
}
sort.Slice(candidates, func(i, j int) bool {
if candidates[i].locked != candidates[j].locked {
return !candidates[i].locked
}
return candidates[i].lastSeen.Before(candidates[j].lastSeen)
})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The current eviction policy in enforceMaxLocked sorts candidates solely by whether they are locked and their lastSeen timestamp. This creates a vulnerability where an attacker can bypass the login lockout mechanism via a cache eviction attack (or key spray).

Specifically, if a client has accumulated login failures (e.g., just below the lockout threshold) but is not yet actively locked out, their entry is considered 'unlocked'. If an attacker then floods the tracker with requests from dummy keys/IPs, the tracker will exceed APILoginAttemptMaxEntries and evict the oldest unlocked entries. Since the target client's entry has an older lastSeen timestamp than the newly active dummy entries, it will be evicted, resetting its failure counter to 0. This allows the attacker to brute-force passwords indefinitely without ever triggering a lockout.

To mitigate this, the eviction logic should prioritize preserving entries with active failures (i.e., failures > 0) even if they are not currently locked out, or at least sort them such that entries with 0 failures are evicted first.

	type candidate struct {
		key      string
		lastSeen time.Time
		failures int
		locked   bool
	}
	candidates := make([]candidate, 0, len(t.entries))
	for key, state := range t.entries {
		candidates = append(candidates, candidate{
			key:      key,
			lastSeen: state.LastSeen,
			failures: state.Failures,
			locked:   state.LockedUntil.After(now),
		})
	}
	sort.Slice(candidates, func(i, j int) bool {
		if candidates[i].locked != candidates[j].locked {
			return !candidates[i].locked
		}
		if (candidates[i].failures > 0) != (candidates[j].failures > 0) {
			return candidates[i].failures == 0
		}
		return candidates[i].lastSeen.Before(candidates[j].lastSeen)
	})

Comment thread services/ui/main.go
Comment on lines +830 to +848
type candidate struct {
clientID string
lastSeen time.Time
locked bool
}
candidates := make([]candidate, 0, len(t.clients))
for clientID, state := range t.clients {
candidates = append(candidates, candidate{
clientID: clientID,
lastSeen: state.lastSeen,
locked: now.Before(state.lockedUntil),
})
}
sort.Slice(candidates, func(i, j int) bool {
if candidates[i].locked != candidates[j].locked {
return !candidates[i].locked
}
return candidates[i].lastSeen.Before(candidates[j].lastSeen)
})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The current eviction policy in enforceMaxLocked sorts candidates solely by whether they are locked and their lastSeen timestamp. This creates a vulnerability where an attacker can bypass the login lockout mechanism via a cache eviction attack (or key spray).

Specifically, if a client has accumulated login failures (e.g., just below the lockout threshold) but is not yet actively locked out, their entry is considered 'unlocked'. If an attacker then floods the tracker with requests from dummy keys/IPs, the tracker will exceed loginAttemptMaxClients and evict the oldest unlocked entries. Since the target client's entry has an older lastSeen timestamp than the newly active dummy entries, it will be evicted, resetting its failure counter to 0. This allows the attacker to brute-force passwords indefinitely without ever triggering a lockout.

To mitigate this, the eviction logic should prioritize preserving entries with active failures (i.e., failures > 0) even if they are not currently locked out, or at least sort them such that entries with 0 failures are evicted first.

	type candidate struct {
		clientID string
		lastSeen time.Time
		failures int
		locked   bool
	}
	candidates := make([]candidate, 0, len(t.clients))
	for clientID, state := range t.clients {
		candidates = append(candidates, candidate{
			clientID: clientID,
			lastSeen: state.lastSeen,
			failures: state.failures,
			locked:   now.Before(state.lockedUntil),
		})
	}
	sort.Slice(candidates, func(i, j int) bool {
		if candidates[i].locked != candidates[j].locked {
			return !candidates[i].locked
		}
		if (candidates[i].failures > 0) != (candidates[j].failures > 0) {
			return candidates[i].failures == 0
		}
		return candidates[i].lastSeen.Before(candidates[j].lastSeen)
	})

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1282db9f49

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +109 to +113
sort.Slice(candidates, func(i, j int) bool {
if candidates[i].locked != candidates[j].locked {
return !candidates[i].locked
}
return candidates[i].lastSeen.Before(candidates[j].lastSeen)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve failure counters during key-spray eviction

In password login the tracker key is IP|email (services/api/auth/login.go), so an attacker can make two attempts against a target account, submit enough distinct email addresses to exceed the 4096-entry cap, and evict the target's still-unlocked failure state before the third attempt. Repeating this keeps the target below the lockout threshold and bypasses the protection introduced here; retain pre-lockout counters independently of attacker-controlled email cardinality, such as with a separately bounded per-IP tracker.

Useful? React with 👍 / 👎.

Comment thread services/mcp-gateway/analytics.go Outdated
Comment on lines +87 to +89
dropped := s.analyticsDropped.Add(1)
s.analyticsMu.Unlock()
log.Printf("gateway analytics queue full; dropped event total=%d source=%q event_type=%q", dropped, event.Source, event.EventType)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Rate-limit analytics drop logging

When the 256-event analytics queue fills because ingest is slow or unavailable, every subsequent request that emits an event synchronously executes log.Printf. This turns the intended nonblocking drop path into request-path log I/O and can flood stdout or the log backend precisely during sustained overload, amplifying the outage; log a sampled or periodically aggregated dropped count instead.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant