feat: add OpenTelemetry tracing for tool calls#21
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Signed-off-by: priyank <priyank8445@gmail.com>
ec68142 to
bec06a0
Compare
|
Hey @priyank766 , this looks great 🚀 |
| "sphinx-design>=0.5", | ||
| ] | ||
| otel = [ | ||
| "opentelemetry-exporter-otlp>=1.25.0", |
There was a problem hiding this comment.
| "opentelemetry-exporter-otlp>=1.25.0", | |
| "opentelemetry-exporter-otlp-proto-http>=1.25.0", |
this avoids installation of unnecessary GRPC subpackage
There was a problem hiding this comment.
Thanks for the Suggestion
I will change the package as well
| exc_info=True, | ||
| ) | ||
| raise | ||
| with tracer.start_as_current_span("tool_call") as span: |
There was a problem hiding this comment.
I tried it just now and it seems span name "tool_call" makes it hard to filter by tool in tracing UIs, Maybe we can consider naming it after the tool: f"tool:{tool_name}" or even just tool_name. The attribute tool.name is still good to keep for structured querying, but the name gives context at a glance.
wdyt?
There was a problem hiding this comment.
Thanks for Reviewing @abhijeet-dhumal
I will push both changes and then you can review it later whenever you get time
Yes I think this naming scheme is much better I will change it .. 👍🏻
Signed-off-by: priyank <priyank8445@gmail.com>
There was a problem hiding this comment.
Pull request overview
Adds optional OpenTelemetry tracing for Kubeflow MCP tool calls, wiring tracing through configuration, CLI, runtime instrumentation, tests, and docs.
Changes:
- Adds telemetry setup/no-op helpers and an
oteloptional dependency extra. - Instruments
_audit_wrapspans with tool/persona/correlation/success/duration attributes. - Wires
--otel-endpoint, config/env loading, startup logging, tests, and README documentation.
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
kubeflow_mcp/core/telemetry.py |
Adds OpenTelemetry setup, validation, provider reuse, and no-op fallback helpers. |
kubeflow_mcp/core/server.py |
Adds span creation and attributes around audited tool calls. |
kubeflow_mcp/cli.py |
Adds --otel-endpoint and invokes tracing setup during server startup. |
kubeflow_mcp/core/config.py |
Adds observability.otel_endpoint config/env support. |
kubeflow_mcp/core/logging.py |
Includes tracing_enabled in structured startup logs. |
tests/unit/core/test_telemetry.py |
Adds telemetry helper and span attribute tests. |
kubeflow_mcp/cli_test.py |
Adds CLI tracing setup wiring tests. |
README.md |
Documents optional tracing setup and span attributes. |
pyproject.toml |
Adds the otel optional dependency extra. |
uv.lock |
Locks OpenTelemetry optional dependencies and related resolution changes. |
Comments suppressed due to low confidence (1)
kubeflow_mcp/core/server.py:126
- The circuit-open early-return tracing path is not covered by the new span behavior tests. Add coverage for a breaker with
can_execute() == Falseso thetool.success=falseand duration attributes remain verified for this failure mode.
if not breaker.can_execute():
duration_ms = int((time.monotonic() - start) * 1000)
span.set_attribute("tool.success", False)
span.set_attribute("tool.duration_ms", duration_ms)
logger.warning("circuit_open", extra={"tool": tool_name})
return {
"error": f"Circuit breaker open for '{tool_name}' — K8s API may be degraded. Retries automatically after recovery timeout.",
"error_code": ErrorCode.CIRCUIT_OPEN,
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| with tracer.start_as_current_span(f"tool:{tool_name}") as span: | ||
| span.set_attribute("tool.name", tool_name) | ||
| span.set_attribute("kubeflow.persona", persona) | ||
| span.set_attribute("correlation_id", cid) |
| breaker.record_failure() | ||
| span.set_attribute("tool.success", False) | ||
| span.set_attribute("tool.duration_ms", duration_ms) | ||
| span.record_exception(exc) |
| observability_file = file_config.get("observability", {}) | ||
| observability = ObservabilityConfig( | ||
| otel_endpoint=os.getenv( | ||
| "KUBEFLOW_MCP_OTEL_ENDPOINT", | ||
| observability_file.get("otel_endpoint"), | ||
| ) | ||
| ) |
| if _rate_limiter is not None and not _rate_limiter.acquire(): | ||
| duration_ms = int((time.monotonic() - start) * 1000) | ||
| span.set_attribute("tool.success", False) | ||
| span.set_attribute("tool.duration_ms", duration_ms) | ||
| logger.warning("rate_limited", extra={"tool": tool_name}) | ||
| return { | ||
| "error": "Rate limit exceeded. Retry after a brief pause.", | ||
| "error_code": ErrorCode.RATE_LIMITED, | ||
| } |
| - Install optional dependencies: `pip install ".[otel]"` | ||
| - Enable tracing with CLI flag or env var: | ||
|
|
||
| ```bash | ||
| kubeflow-mcp serve --otel-endpoint http://localhost:4318/v1/traces | ||
| # or | ||
| export KUBEFLOW_MCP_OTEL_ENDPOINT=http://localhost:4318/v1/traces | ||
| kubeflow-mcp serve |
Description
Adds optional OpenTelemetry tracing for tool calls in Kubeflow MCP Server.
core/telemetry.pywithsetup_tracing()andget_tracer()plus safe no-op fallback when OTel deps are unavailable.core.server._audit_wrapto create one span per tool invocation, set tool/persona/duration/success attributes, attachcorrelation_id, and recordexceptions.
--otel-endpointKUBEFLOW_MCP_OTEL_ENDPOINTobservability.otel_endpointImportant compatibility note:
correlation_idsemantics are preserved and exposed as a span attribute; it is not remapped to OTel trace ID.Type of Change
Checklist
make test-python)make verify)Related Issues
Fixes #18