feat(mcp-gateway): validate and expose applied policy snapshots#298
Conversation
Add document-level metadata (schema_version, deterministic revision, informational generated_at) to the rendered gateway policy contract, a shared pkg/policy.Validate used by both producer and consumer, and last-known-good snapshot activation in the gateway. - pkg/policy: schema version + ComputeRevision/Stamp (SHA-256 over canonical content, excluding revision/generated_at); strict, fail-closed Validate covering schema version, server identity, auth/policy/decision/ trust/side-effect values, unique names, and OAuth issuer requirements. - operator: stamp schema+revision in renderGatewayPolicy, validate before replacing the ConfigMap, and avoid reconcile churn by preserving the prior payload when the revision is unchanged. - mcp-gateway: validate after decode and before activating a snapshot; invalid updates retain the previous valid policy and record the error; expand the snapshot with revision/loaded-at/ready/last-error; add /ready, /config/status, and /metrics (reload counters, active revision, last success timestamp). - docs/runtime.md: explain schema version, revision, and last-known-good. Closes #295 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0186c2930e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // Readiness: OK only after a valid policy snapshot has been activated. | ||
| mux.HandleFunc("/ready", srv.handleReady) |
There was a problem hiding this comment.
Wire Kubernetes readiness to /ready
Registering /ready does not affect pod readiness because the operator-generated gateway container still uses a TCP socket probe (internal/operator/deployment.go:532-535). Any serving gateway is therefore marked ready regardless of snapshot.Ready, so Kubernetes and MCPServer.status.gatewayReady cannot detect that no valid policy has been activated. Change the generated readiness probe to an HTTP GET on /ready.
Useful? React with 👍 / 👎.
| if _, ok := supportedSchemaVersions[doc.SchemaVersion]; !ok { | ||
| return fmt.Errorf("policy: unsupported schema version %q", doc.SchemaVersion) | ||
| } |
There was a problem hiding this comment.
Verify revision matches policy content
When a file-backed policy has a supported schema but a missing, stale, or arbitrary revision, Validate accepts it without recomputing ComputeRevision. The gateway then activates the changed authorization body while /config/status and mcp_gateway_policy_active_revision_info report the supplied revision, defeating the new revision contract and potentially making a policy change appear unapplied. Require a nonempty revision and compare it with the computed digest before activation.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Code Review
This pull request implements a robust "last-known-good" policy snapshot mechanism for the gateway. It introduces deterministic policy revision hashing, strict policy validation shared between the operator and gateway, and several observability endpoints including /ready, /config/status, and /metrics. Feedback suggests enhancing the policy validation logic to reject duplicate tool rules within a single grant to prevent conflicting policy evaluations.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| for j, rule := range grant.ToolRules { | ||
| if strings.TrimSpace(string(rule.Name)) == "" { | ||
| return fmt.Errorf("policy: grant %q tool_rules[%d] name is required", grant.Name, j) | ||
| } | ||
| if !validDecision(rule.Decision) { | ||
| return fmt.Errorf("policy: grant %q tool rule %q has invalid decision %q", grant.Name, rule.Name, rule.Decision) | ||
| } | ||
| if !validTrust(rule.RequiredTrust) { | ||
| return fmt.Errorf("policy: grant %q tool rule %q has invalid required_trust %q", grant.Name, rule.Name, rule.RequiredTrust) | ||
| } | ||
| } |
There was a problem hiding this comment.
To prevent ambiguous or conflicting policy evaluations, consider validating that each tool rule within a single grant is unique. Adding a duplicate check for rule.Name ensures that a grant does not define multiple conflicting rules for the same tool.
seenRules := make(map[ToolName]struct{}, len(grant.ToolRules))
for j, rule := range grant.ToolRules {
if strings.TrimSpace(string(rule.Name)) == "" {
return fmt.Errorf("policy: grant %q tool_rules[%d] name is required", grant.Name, j)
}
if _, dup := seenRules[rule.Name]; dup {
return fmt.Errorf("policy: grant %q has duplicate tool rule %q", grant.Name, rule.Name)
}
seenRules[rule.Name] = struct{}{}
if !validDecision(rule.Decision) {
return fmt.Errorf("policy: grant %q tool rule %q has invalid decision %q", grant.Name, rule.Name, rule.Decision)
}
if !validTrust(rule.RequiredTrust) {
return fmt.Errorf("policy: grant %q tool rule %q has invalid required_trust %q", grant.Name, rule.Name, rule.RequiredTrust)
}
}CVE-2026-42504 affects the Go 1.26.3 stdlib; upgrading the multi-stage builder to 1.26.4 (latest stable patch) removes the unfixed CVE so the Trivy operator-image scan no longer fails. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- operator: change gateway readiness probe from TCP socket to HTTP GET /ready so Kubernetes pod readiness gates on actual policy activation (internal/operator/deployment.go) - policy: require non-empty revision and verify it matches the computed SHA-256 digest before activation; prevents stale or arbitrary revision fields from masking undetected policy body changes (pkg/policy/validate.go) - policy: detect duplicate tool rules within a single grant; a grant with two rules for the same tool produced ambiguous evaluations (pkg/policy/validate.go) - tests: restamp documents after body mutations so revision reflects the mutated body and each rejection test fires the intended error path; add revision-missing and revision-mismatch cases (pkg/policy/validate_test.go) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The revision digest covers the document exactly as the operator rendered it. Validating after applyPolicyDefaults mutated the document (auth headers, server identity, policy mode) recomputed a different digest and rejected every operator-stamped policy with revision-mismatch. Validate file-backed documents before defaulting; gateway-generated default documents are stamped and validated after defaulting since their revision covers final content. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Combine last-known-good policy snapshot validation with main's scoped gateway request metrics. Merge conflicts in mcp-gateway main, metrics, and policy_cache retain /ready readiness, revision tracking, and both metric families. Update policy reload test to use stamped policy files. Co-authored-by: Cursor <cursoragent@cursor.com>
Add document-level metadata (schema_version, deterministic revision, informational generated_at) to the rendered gateway policy contract, a shared pkg/policy.Validate used by both producer and consumer, and last-known-good snapshot activation in the gateway.
Closes #295