Skip to content

feat(sentinel): add scoped observability links#198

Merged
Agent-Hellboy merged 4 commits into
mainfrom
sentinel/scoped_observability
Jun 12, 2026
Merged

feat(sentinel): add scoped observability links#198
Agent-Hellboy merged 4 commits into
mainfrom
sentinel/scoped_observability

Conversation

@Agent-Hellboy

Copy link
Copy Markdown
Owner

Summary

  • Add scoped observability API links for readable MCP servers and an allowlisted Prometheus query proxy that verifies the live namespace/server target before querying Prometheus.
  • Include observability metadata in runtime server listings and show My Activity Metrics / configured Grafana actions only when the API says the server is observable for the caller.
  • Document the split between raw admin-only Grafana/Prometheus and scoped user observability, plus PROMETHEUS_API_URL, GRAFANA_SERVER_DASHBOARD_URL, and GRAFANA_SCOPED_USER_ACCESS.

Testing

  • go test ./internal/runtimeapi -count=1 from services/api
  • go test ./... -count=1 from services/ui
  • go test ./... -count=1 from services/api
  • go test ./... -count=1 from repo root
  • go vet ./... from repo root, services/api, and services/ui
  • Commit pre-commit hooks: gitleaks, go fmt, staticcheck, go vet, go unit tests, generated file drift

Refs #181

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces scoped observability for tenant users, enabling access to Prometheus metrics and Grafana dashboards restricted to their own namespaces or owned servers. It adds new API endpoints for observability links and Prometheus query proxying, updates the UI with 'Metrics' and 'Grafana' actions, and provides supporting documentation and tests. A review comment suggests enhancing the Prometheus proxy by logging the response body of failed upstream queries to improve debuggability.

Comment on lines +129 to +133
if resp.StatusCode != http.StatusOK {
_, _ = io.Copy(io.Discard, io.LimitReader(resp.Body, 1024))
writeJSON(w, http.StatusBadGateway, map[string]string{"error": "prometheus_query_failed"})
return
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When the upstream Prometheus query fails with a non-200 status code, the response body is discarded without being logged. This can make debugging issues with the Prometheus integration difficult, as the reason for the failure is not captured. It would be beneficial to log the status code and the body of the error response from Prometheus to aid in troubleshooting.

Suggested change
if resp.StatusCode != http.StatusOK {
_, _ = io.Copy(io.Discard, io.LimitReader(resp.Body, 1024))
writeJSON(w, http.StatusBadGateway, map[string]string{"error": "prometheus_query_failed"})
return
}
if resp.StatusCode != http.StatusOK {
body, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))
log.Printf("observability prometheus query returned status %d namespace=%q server=%q query_id=%q body=%s", resp.StatusCode, target.Namespace, target.Name, query.ID, string(body))
writeJSON(w, http.StatusBadGateway, map[string]string{"error": "prometheus_query_failed"})
return
}

@Agent-Hellboy Agent-Hellboy marked this pull request as ready for review May 19, 2026 05:36

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c28bdd297f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

ID: "request_rate",
Name: "Request rate",
Description: "Five-minute MCP gateway request rate for this server.",
Query: "sum(rate(mcp_gateway_requests_total{" + selector + "}[5m]))",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Query metrics that Prometheus actually scrapes

In the bundled deployment this query returns no data for every server: rg finds no producer for mcp_gateway_requests_total (nor the other mcp_gateway_* series), and k8s/11-prometheus.yaml only scrapes mcp-sentinel-api, ingest, processor, and clickhouse, not MCP gateway sidecars. Because the UI now shows a Metrics button whenever these links are present, users get an apparently working scoped link that always opens an empty Prometheus result until the gateway metrics are emitted and scraped, or the allowlist is changed to series that actually exist.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Expose tenant Prometheus and platform-scoped Grafana actions by default for readable MCPServer rows, without requiring Grafana env flags for the safe default path.

Add gateway sidecar metrics, Prometheus discovery for annotated MCPServer services, and operator service annotations so scoped dashboards have real per-server traffic data.

Document the default behavior, update QA skills for tenant UI observability checks, and make service Docker builds honor the target platform for Kind arm64 nodes.
@Agent-Hellboy Agent-Hellboy force-pushed the sentinel/scoped_observability branch from 488053c to 494c639 Compare June 11, 2026 12:10
Check namespace auth before server fetch and return 404 to block cross-tenant
enumeration. Serve gateway /metrics on METRICS_PORT and merge Prometheus
service annotations without wiping user metadata.

Co-authored-by: Cursor <cursoragent@cursor.com>
@Agent-Hellboy

Copy link
Copy Markdown
Owner Author

Pushed review fixes (also on `fix/pr-198-review`):

  • Namespace auth before `GetServer`; return 404 to block cross-tenant enumeration
  • Gateway `/metrics` on separate `METRICS_PORT` (9103)
  • Merge Prometheus service annotations without wiping user metadata
  • Cross-tenant observability tests

Keep observability metrics port tests and main's audit risk_level coverage.

Co-authored-by: Cursor <cursoragent@cursor.com>
@Agent-Hellboy Agent-Hellboy merged commit 286ae74 into main Jun 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant