Skip to content

feat: Add Custom Latency Benchmarking and HTML Reporting for Trainer MCP Tools#26

Open
haroon0x wants to merge 6 commits into
kubeflow:mainfrom
haroon0x:trainer/benchmark-suite
Open

feat: Add Custom Latency Benchmarking and HTML Reporting for Trainer MCP Tools#26
haroon0x wants to merge 6 commits into
kubeflow:mainfrom
haroon0x:trainer/benchmark-suite

Conversation

@haroon0x
Copy link
Copy Markdown

@haroon0x haroon0x commented May 18, 2026

First thing: this is not a vibe coded AI slop PR.

I have added the initial benchmark suite for trainer MCP tools as mentioned in #10 . This PR focuses only on latency benchmarks.

Only tests/benchmarks/report.py is AI generated. The benchmark cases, pytest setup, JSON format, and benchmark flow are written in a simple way so it is easy to review and extend later.

What is implemented

This PR adds latency benchmarks under:

tests/benchmarks/test_latency.py

The benchmarks use pytest-benchmark for timing. After pytest finishes, tests/benchmarks/conftest.py collects the benchmark samples and writes the local result files.

Run:

uv run pytest tests/benchmarks/

Generated files:

benchmark-results/latency.json
benchmark-results/index.html

benchmark-results/ is ignored because these files are generated locally.

Latency benchmarks added

Current latency suite covers:

  • server init in full mode
  • server init in progressive mode
  • server init in semantic mode
  • dynamic tool registry init
  • dynamic tool discovery:
    • list_tools
    • describe_tools
    • find_tools
  • trainer schema and metadata scan
  • preview tool paths:
    • fine_tune
    • run_custom_training
    • run_container_training
  • security validation:
    • validate_k8s_name
    • validate_resource_limits
    • is_safe_python_code

Preview benchmarks use confirmed=False, so they do not submit jobs to Kubernetes.

For fine_tune, the GPU pre-check is patched inside the benchmark so the benchmark can run without needing a real cluster or GPU.

Metrics

The JSON output contains:

  • P50
  • P95
  • P99
  • min
  • max

P95 and P99 are included because average or P50 alone will not show tail latency clearly.

Example output shape:

{
  "suite": "latency",
  "iterations": 100,
  "warmup": 5,
  "results": [
    {
      "name": "server_init_full",
      "unit": "ms",
      "p50": 18.1,
      "p95": 24.5,
      "p99": 31.2,
      "min": 15.9,
      "max": 40.8
    }
  ]
}

HTML report

The HTML report is generated at:

benchmark-results/index.html

It shows:

  • summary cards
  • latency table
  • P50, P95, P99
  • min and max
  • P99/P50 tail ratio
  • visual spread bars

The report code is kept separate so future benchmark suites can also be added to the same report.

Future plan

Next benchmark suites can be added as separate files:

tests/benchmarks/test_token_usage.py
tests/benchmarks/test_cpu_profile.py
tests/benchmarks/test_memory.py

Rough plan:

  • token usage: custom estimator because it is not a timing benchmark
  • CPU profile: Python cProfile
  • memory profile: Python tracemalloc

All of them can still run through pytest and write JSON files into benchmark-results/.

Validation done

I ran:

uv run pytest tests/benchmarks/
uv run ruff check tests/benchmarks/test_latency.py tests/benchmarks/conftest.py tests/benchmarks/report.py
uv run ruff format --check tests/benchmarks/test_latency.py tests/benchmarks/conftest.py tests/benchmarks/report.py
uv run python -m tests.benchmarks.report
uv lock --check

All passed locally.

There are 3 warnings during the benchmark run. These warnings come from the installed Kubeflow dependency using old Pydantic class-based config. They are not from this benchmark code.

Type of Change

  • feat: New feature
  • fix: Bug fix
  • revert: Revert a change
  • chore: Maintenance / tooling

Checklist

  • Benchmark tests pass locally
  • Linting passes for benchmark files
  • Documentation updated

… testing and HTML reporting

Signed-off-by: haroon0x <haroonbmc0@gmail.com>
@google-oss-prow
Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign astefanutti for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: haroon0x <haroonbmc0@gmail.com>
@haroon0x haroon0x changed the title feat: initial implementation of benchmarking (latency) feat: Add Custom Latency Benchmarking and HTML Reporting for Trainer MCP Tools May 18, 2026
@abhijeet-dhumal
Copy link
Copy Markdown
Member

Thanks for this @haroon0x 🚀
scope is right for a first PR. Just few nit picks

Comment thread tests/benchmarks/test_latency.py Outdated
Comment thread tests/benchmarks/benchmarks_runner.py Outdated
Signed-off-by: haroon0x <haroonbmc0@gmail.com>
Copilot AI review requested due to automatic review settings May 18, 2026 17:27
@haroon0x haroon0x requested a review from abhijeet-dhumal May 18, 2026 17:28
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an initial latency benchmark suite for trainer MCP tools, built on pytest-benchmark, plus a small JSON aggregation layer and a self-contained HTML report generator. Output is written under benchmark-results/ and is git-ignored.

Changes:

  • New tests/benchmarks/ package with test_latency.py (server init, dynamic tool discovery, schema scan, preview tool paths, security validation), conftest.py (latency fixture + percentile aggregation + terminal-summary hook), and report.py (HTML report rendering).
  • Add pytest-benchmark>=5.2.3 dev dependency in pyproject.toml / uv.lock.
  • Document benchmark usage in docs/benchmarks.md and ignore benchmark-results/.

Reviewed changes

Copilot reviewed 6 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
pyproject.toml Adds pytest-benchmark to dev extras.
uv.lock Locks pytest-benchmark and transitive py-cpuinfo.
.gitignore Ignores generated benchmark-results/ directory.
docs/benchmarks.md New documentation describing how to run benchmarks, output format, and planned suites.
tests/benchmarks/test_latency.py Defines latency benchmark cases for server init, dynamic tools, security validation, and preview tool paths.
tests/benchmarks/conftest.py Provides record_latency_benchmark fixture and writes latency.json + triggers report generation in pytest_terminal_summary.
tests/benchmarks/report.py Renders a static HTML report (summary cards, latency table, spread bars) from suite JSON files.
tests/benchmarks/benchmarks_utils.py Empty placeholder file.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@@ -0,0 +1 @@

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@haroon0x Why this empty file exists ?

Comment thread tests/benchmarks/report.py Outdated
Comment thread tests/benchmarks/conftest.py
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Haroon <106879583+haroon0x@users.noreply.github.com>
@haroon0x
Copy link
Copy Markdown
Author

@abhijeet-dhumal I have made the changes , can you review ?

Comment thread tests/benchmarks/conftest.py Outdated
Comment thread tests/benchmarks/conftest.py
Comment thread tests/benchmarks/report.py Outdated
Comment thread tests/benchmarks/test_latency.py
haroon0x and others added 2 commits May 25, 2026 23:49
Co-authored-by: Abhijeet Dhumal <84722973+abhijeet-dhumal@users.noreply.github.com>
Signed-off-by: Haroon <106879583+haroon0x@users.noreply.github.com>
Signed-off-by: haroon0x <haroonbmc0@gmail.com>
@haroon0x haroon0x requested a review from abhijeet-dhumal May 25, 2026 18:40
@haroon0x
Copy link
Copy Markdown
Author

@abhijeet-dhumal I have resolved the issues as you have mentioned. Could you take a look and let me know if this is ready to be merged?

After this i was thinking on working issue #5 . Do you have any specific thoughts or req before i start working on that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants