feat: add comprehensive health check endpoints for server and plugins #14

Open
reldothescribe wants to merge 1 commit into ethpandaops:master from reldothescribe:feat/health-checks

@reldothescribe

Summary

This PR adds health check endpoints for the MCP server and its plugins: an overall status endpoint, readiness and liveness probes, and per-plugin health reporting.

Changes

New Health Endpoints

  1. /health - Overall server health

    • Returns server status, version, and timestamp
    • HTTP 200 with JSON response: {"status": "healthy", "version": "...", "timestamp": "..."}

  2. /health/ready - Readiness probe (server initialized)

    • Returns "ready" when the server is running and initialized
    • Returns HTTP 503 if the server is not ready

  3. /health/live - Liveness probe (server running)

    • Returns "alive" when the server is running
    • Returns HTTP 503 if the server is not alive

  4. /health/plugins - Per-plugin health status

    • Returns overall status and per-plugin health check results
    • Includes plugin status, message, and timestamp for each plugin
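
For reviewers who want a concrete picture of the first three endpoints, here is a minimal sketch using only the Go standard library. It is not the code in this PR: `newHealthMux`, `probe`, and the `isReady` callback are hypothetical names, and the plain-text bodies for the readiness and liveness probes are an assumption, since the description does not say whether those responses are plain text or JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// healthResponse mirrors the JSON fields listed for /health.
type healthResponse struct {
	Status    string `json:"status"`
	Version   string `json:"version"`
	Timestamp string `json:"timestamp"`
}

// newHealthMux is a hypothetical helper, not the PR's code.
// isReady reports whether server initialization has finished.
func newHealthMux(version string, isReady func() bool) *http.ServeMux {
	mux := http.NewServeMux()

	mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(healthResponse{
			Status:    "healthy",
			Version:   version,
			Timestamp: time.Now().UTC().Format(time.RFC3339),
		})
	})

	mux.HandleFunc("/health/ready", func(w http.ResponseWriter, _ *http.Request) {
		if !isReady() {
			w.WriteHeader(http.StatusServiceUnavailable)
			fmt.Fprint(w, "not ready")
			return
		}
		fmt.Fprint(w, "ready")
	})

	// Assumption: if this handler runs at all, the process is alive,
	// so this sketch always answers 200 here.
	mux.HandleFunc("/health/live", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "alive")
	})

	return mux
}

// probe sends an in-process GET and returns the status code and body.
func probe(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	ready := false
	mux := newHealthMux("dev", func() bool { return ready })

	code, body := probe(mux, "/health/ready")
	fmt.Println("/health/ready before init:", code, body)

	ready = true
	for _, path := range []string{"/health", "/health/ready", "/health/live"} {
		code, body := probe(mux, path)
		fmt.Println(path, code, body)
	}
}
```

Passing readiness as a callback keeps the handlers free of global state, which also makes the 503 path easy to exercise in tests.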

Plugin HealthCheck Interface

  • Added HealthCheck() method to the Plugin interface in pkg/plugin/plugin.go
  • Added HealthStatus type with constants: healthy, unhealthy, unknown
  • Added HealthCheckResult struct with status, message, and checked_at fields
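
The types described above might look roughly like the following sketch. The constant names, the no-argument `HealthCheck()` signature, the `Name()` method, and the `encode` and `noopPlugin` helpers are assumptions for illustration; the real `Plugin` interface in pkg/plugin/plugin.go almost certainly has more methods than shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// HealthStatus is a string-backed status; the values come from the PR text.
type HealthStatus string

const (
	HealthStatusHealthy   HealthStatus = "healthy"
	HealthStatusUnhealthy HealthStatus = "unhealthy"
	HealthStatusUnknown   HealthStatus = "unknown"
)

// HealthCheckResult carries the three fields named in the PR description.
type HealthCheckResult struct {
	Status    HealthStatus `json:"status"`
	Message   string       `json:"message"`
	CheckedAt time.Time    `json:"checked_at"`
}

// Plugin is reduced to the methods this sketch needs (assumed signatures).
type Plugin interface {
	Name() string
	HealthCheck() HealthCheckResult
}

// encode is a hypothetical helper that renders a result as JSON.
func encode(r HealthCheckResult) string {
	b, err := json.Marshal(r)
	if err != nil {
		return ""
	}
	return string(b)
}

// noopPlugin is a stand-in showing that the interface can be satisfied.
type noopPlugin struct{}

func (noopPlugin) Name() string { return "noop" }

func (noopPlugin) HealthCheck() HealthCheckResult {
	return HealthCheckResult{
		Status:    HealthStatusUnknown,
		Message:   "no checks implemented",
		CheckedAt: time.Now().UTC(),
	}
}

// Compile-time check that noopPlugin implements Plugin.
var _ Plugin = noopPlugin{}

func main() {
	fmt.Println(encode(noopPlugin{}.HealthCheck()))
}
```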

Plugin Implementations

Implemented HealthCheck() for all existing plugins:

  • clickhouse: Reports number of clusters configured, checks proxy datasources
  • prometheus: Reports number of instances configured
  • loki: Reports number of instances configured
  • dora: Reports number of networks with Dora explorers available
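
As an illustration of a count-based check like the ones listed for prometheus and loki, the sketch below reports how many instances are configured. `instancePlugin` is a hypothetical stand-in, and mapping "zero instances" to the `unknown` status is an assumption; the PR description does not say which status the real plugins return in that case.

```go
package main

import (
	"fmt"
	"time"
)

type HealthStatus string

const (
	HealthStatusHealthy   HealthStatus = "healthy"
	HealthStatusUnhealthy HealthStatus = "unhealthy"
	HealthStatusUnknown   HealthStatus = "unknown"
)

type HealthCheckResult struct {
	Status    HealthStatus
	Message   string
	CheckedAt time.Time
}

// instancePlugin is a hypothetical plugin that only knows its configured instances.
type instancePlugin struct {
	name      string
	instances []string
}

// HealthCheck reports the number of configured instances.
// Assumption: an empty configuration is "unknown" rather than "unhealthy".
func (p instancePlugin) HealthCheck() HealthCheckResult {
	res := HealthCheckResult{CheckedAt: time.Now().UTC()}
	if len(p.instances) == 0 {
		res.Status = HealthStatusUnknown
		res.Message = "no instances configured"
		return res
	}
	res.Status = HealthStatusHealthy
	res.Message = fmt.Sprintf("%d instance(s) configured", len(p.instances))
	return res
}

func main() {
	plugins := []instancePlugin{
		{name: "prometheus", instances: []string{"mainnet", "holesky"}},
		{name: "loki"},
	}
	for _, p := range plugins {
		r := p.HealthCheck()
		fmt.Printf("%s: %s (%s)\n", p.name, r.Status, r.Message)
	}
}
```

A check like this only confirms configuration, not connectivity; probing the backing service would be a natural extension but is outside what the description states.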

Registry Updates

  • Added HealthChecks() method to plugin.Registry to aggregate health from all plugins
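
The aggregation step could be sketched as below. The `Registry` fields, the `Add` method, the map return type, and the `overallStatus` rule (unhealthy if any plugin is unhealthy, otherwise healthy) are all assumptions made for this example rather than details taken from the PR.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

type HealthStatus string

const (
	HealthStatusHealthy   HealthStatus = "healthy"
	HealthStatusUnhealthy HealthStatus = "unhealthy"
	HealthStatusUnknown   HealthStatus = "unknown"
)

type HealthCheckResult struct {
	Status    HealthStatus
	Message   string
	CheckedAt time.Time
}

type Plugin interface {
	Name() string
	HealthCheck() HealthCheckResult
}

// Registry is a simplified stand-in for plugin.Registry.
type Registry struct {
	plugins []Plugin
}

func (r *Registry) Add(p Plugin) { r.plugins = append(r.plugins, p) }

// HealthChecks runs every plugin's check and keys the results by plugin name.
func (r *Registry) HealthChecks() map[string]HealthCheckResult {
	out := make(map[string]HealthCheckResult, len(r.plugins))
	for _, p := range r.plugins {
		out[p.Name()] = p.HealthCheck()
	}
	return out
}

// overallStatus is an assumed roll-up rule: any unhealthy plugin makes the whole unhealthy.
func overallStatus(results map[string]HealthCheckResult) HealthStatus {
	for _, res := range results {
		if res.Status == HealthStatusUnhealthy {
			return HealthStatusUnhealthy
		}
	}
	return HealthStatusHealthy
}

// fixedPlugin returns a preset status; it exists only for this example.
type fixedPlugin struct {
	name   string
	status HealthStatus
}

func (p fixedPlugin) Name() string { return p.name }

func (p fixedPlugin) HealthCheck() HealthCheckResult {
	return HealthCheckResult{Status: p.status, Message: "fixed result", CheckedAt: time.Now().UTC()}
}

func main() {
	reg := &Registry{}
	reg.Add(fixedPlugin{name: "clickhouse", status: HealthStatusHealthy})
	reg.Add(fixedPlugin{name: "dora", status: HealthStatusUnhealthy})

	results := reg.HealthChecks()
	names := make([]string, 0, len(results))
	for n := range results {
		names = append(names, n)
	}
	sort.Strings(names) // map iteration order is random, so sort for stable output
	for _, n := range names {
		fmt.Println(n, results[n].Status)
	}
	fmt.Println("overall:", overallStatus(results))
}
```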

Backward Compatibility

  • Legacy /ready endpoint is preserved for backward compatibility
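
One simple way to keep a legacy path working is to register the same handler under both routes, as in this sketch. Whether the PR actually shares a handler between /ready and /health/ready is an assumption, and `readyHandler`, `newMux`, and `status` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// readyHandler answers 200 "ready" or 503 "not ready" based on isReady.
func readyHandler(isReady func() bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		if !isReady() {
			w.WriteHeader(http.StatusServiceUnavailable)
			fmt.Fprint(w, "not ready")
			return
		}
		fmt.Fprint(w, "ready")
	})
}

// newMux mounts one handler on both the new and the legacy path.
func newMux(isReady func() bool) *http.ServeMux {
	mux := http.NewServeMux()
	h := readyHandler(isReady)
	mux.Handle("/health/ready", h)
	mux.Handle("/ready", h) // legacy path kept for existing clients
	return mux
}

// status sends an in-process GET and returns the HTTP status code.
func status(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newMux(func() bool { return true })
	fmt.Println("/ready:", status(mux, "/ready"))
	fmt.Println("/health/ready:", status(mux, "/health/ready"))
}
```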

Tests

  • Added comprehensive tests for all health check endpoints
  • Added tests for plugin registry health checks

Testing

Run the tests:

make test

Related

This implementation follows Kubernetes health probe conventions and provides a foundation for monitoring the MCP server in production environments.
