feat: Add Custom Latency Benchmarking and HTML Reporting for Trainer MCP Tools#26
feat: Add Custom Latency Benchmarking and HTML Reporting for Trainer MCP Tools#26haroon0x wants to merge 6 commits into
Conversation
… testing and HTML reporting Signed-off-by: haroon0x <haroonbmc0@gmail.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Signed-off-by: haroon0x <haroonbmc0@gmail.com>
|
Thanks for this @haroon0x 🚀 |
Signed-off-by: haroon0x <haroonbmc0@gmail.com>
There was a problem hiding this comment.
Pull request overview
Adds an initial latency benchmark suite for trainer MCP tools, built on pytest-benchmark, plus a small JSON aggregation layer and a self-contained HTML report generator. Output is written under benchmark-results/ and is git-ignored.
Changes:
- New
tests/benchmarks/package withtest_latency.py(server init, dynamic tool discovery, schema scan, preview tool paths, security validation),conftest.py(latency fixture + percentile aggregation + terminal-summary hook), andreport.py(HTML report rendering). - Add
pytest-benchmark>=5.2.3dev dependency inpyproject.toml/uv.lock. - Document benchmark usage in
docs/benchmarks.mdand ignorebenchmark-results/.
Reviewed changes
Copilot reviewed 6 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| pyproject.toml | Adds pytest-benchmark to dev extras. |
| uv.lock | Locks pytest-benchmark and transitive py-cpuinfo. |
| .gitignore | Ignores generated benchmark-results/ directory. |
| docs/benchmarks.md | New documentation describing how to run benchmarks, output format, and planned suites. |
| tests/benchmarks/test_latency.py | Defines latency benchmark cases for server init, dynamic tools, security validation, and preview tool paths. |
| tests/benchmarks/conftest.py | Provides record_latency_benchmark fixture and writes latency.json + triggers report generation in pytest_terminal_summary. |
| tests/benchmarks/report.py | Renders a static HTML report (summary cards, latency table, spread bars) from suite JSON files. |
| tests/benchmarks/benchmarks_utils.py | Empty placeholder file. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -0,0 +1 @@ | |||
|
|
|||
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Haroon <106879583+haroon0x@users.noreply.github.com>
|
@abhijeet-dhumal I have made the changes , can you review ? |
Co-authored-by: Abhijeet Dhumal <84722973+abhijeet-dhumal@users.noreply.github.com> Signed-off-by: Haroon <106879583+haroon0x@users.noreply.github.com>
Signed-off-by: haroon0x <haroonbmc0@gmail.com>
|
@abhijeet-dhumal I have resolved the issues as you have mentioned. Could you take a look and let me know if this is ready to be merged? After this i was thinking on working issue #5 . Do you have any specific thoughts or req before i start working on that? |
First thing: this is not a vibe coded AI slop PR.
I have added the initial benchmark suite for trainer MCP tools as mentioned in #10 . This PR focuses only on latency benchmarks.
Only
tests/benchmarks/report.pyis AI generated. The benchmark cases, pytest setup, JSON format, and benchmark flow are written in a simple way so it is easy to review and extend later.What is implemented
This PR adds latency benchmarks under:
The benchmarks use
pytest-benchmarkfor timing. After pytest finishes,tests/benchmarks/conftest.pycollects the benchmark samples and writes the local result files.Run:
Generated files:
benchmark-results/is ignored because these files are generated locally.Latency benchmarks added
Current latency suite covers:
fullmodeprogressivemodesemanticmodelist_toolsdescribe_toolsfind_toolsfine_tunerun_custom_trainingrun_container_trainingvalidate_k8s_namevalidate_resource_limitsis_safe_python_codePreview benchmarks use
confirmed=False, so they do not submit jobs to Kubernetes.For
fine_tune, the GPU pre-check is patched inside the benchmark so the benchmark can run without needing a real cluster or GPU.Metrics
The JSON output contains:
P95 and P99 are included because average or P50 alone will not show tail latency clearly.
Example output shape:
{ "suite": "latency", "iterations": 100, "warmup": 5, "results": [ { "name": "server_init_full", "unit": "ms", "p50": 18.1, "p95": 24.5, "p99": 31.2, "min": 15.9, "max": 40.8 } ] }HTML report
The HTML report is generated at:
It shows:
The report code is kept separate so future benchmark suites can also be added to the same report.
Future plan
Next benchmark suites can be added as separate files:
Rough plan:
cProfiletracemallocAll of them can still run through pytest and write JSON files into
benchmark-results/.Validation done
I ran:
All passed locally.
There are 3 warnings during the benchmark run. These warnings come from the installed Kubeflow dependency using old Pydantic class-based config. They are not from this benchmark code.
Type of Change
Checklist