
Add perf run metadata, variance/confidence reporting, and historical regression checks #35

Merged
wolfiesch merged 5 commits into master from codex/2026-05-08-08-30-21-add-performance-metadata-capture
May 8, 2026

Conversation

@wolfiesch

Collaborator

Motivation

  • Capture richer run context so benchmark outputs can be compared meaningfully across machines and runs, including CPU model, core count, total memory and Python version.
  • Surface variance and confidence guidance in human- and machine-readable outputs to help spot noisy measurements.
  • Add a lightweight, non-flaky baseline regression check using historical run data so regressions can be triaged without brittle statistical tests.
  • Document the canonical run-comparison procedure for maintainers and CI operators.

Description

  • Add a PerfRunEnvironment dataclass and a _collect_run_environment() helper that records cpu_model, core_count, and memory_total_mb; the result is attached to PerfMetadata and serialized into results.json via perf_results_to_json_dict() (changes in src/excelbench/perf/runner.py).
  • Extend markdown rendering in src/excelbench/perf/renderer.py to display run environment, add a short Confidence note, and render p50/p95 in summary tables via _fmt_p50_p95_ms() for variance visibility.
  • Extend CSV output with variance/confidence fields (read_cv, write_cv, confidence_note, regression_status) and add the helper functions _cv() and _regression_status(). The latter compares the current read_p50 to a historical baseline drawn from results/perf/history.jsonl (the median of the last 5 samples, requiring at least 3) and reports stable, regressed, or improved, or a lightweight status code when history is missing or insufficient.
  • Keep history append behavior (writing p50_wall_ms per run) and use that file for non-flaky median-based comparisons instead of per-run statistical tests.
  • Add tests to validate the new metadata and renderer shapes: updates to tests/test_perf_cli.py (assert run_environment in metadata and markdown cues), tests/test_perf_data_shape.py (CSV regression/confidence output shape), and tests/test_perf_workloads.py (metadata capture at runtime).
  • Update docs/trackers/performance-benchmark-runs.md with a canonical run-comparison procedure describing stable knobs, metadata checks, CSV confidence columns, and history-based regression gates.
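The history-based comparison described above can be sketched as follows. This is a minimal illustration, not the code in renderer.py: the function names, the 10% tolerance, and the "insufficient_history" label are assumptions made for the example. Only the window of the last 5 samples, the minimum of 3, the median baseline, and the stable/regressed/improved labels come from the description.

```python
from statistics import mean, median, stdev


def cv(samples):
    """Coefficient of variation (sample stdev / mean), or None if undefined."""
    if len(samples) < 2:
        return None
    m = mean(samples)
    if m == 0:
        return None
    return stdev(samples) / m


def regression_status(current_p50_ms, history_p50_ms, *,
                      window=5, min_samples=3, tolerance=0.10):
    """Compare the current p50 to the median of recent history.

    `tolerance` (10%) and the "insufficient_history" label are
    hypothetical; the PR does not state the actual values.
    """
    recent = history_p50_ms[-window:]  # keep only the most recent runs
    if len(recent) < min_samples:
        return "insufficient_history"
    baseline = median(recent)
    if baseline <= 0:
        return "insufficient_history"
    ratio = current_p50_ms / baseline
    if ratio > 1 + tolerance:
        return "regressed"
    if ratio < 1 - tolerance:
        return "improved"
    return "stable"
```

Using a median over a short window means a single noisy run in the history shifts the baseline very little, which is one way to keep the check from flapping.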

Testing

  • Ran the linter with python -m ruff check src/excelbench/perf/runner.py src/excelbench/perf/renderer.py tests/test_perf_cli.py tests/test_perf_data_shape.py tests/test_perf_workloads.py and fixed reported issues; the linter check passed.
  • Attempted to run the unit tests with python -m pytest -o addopts='' tests/test_perf_cli.py tests/test_perf_data_shape.py tests/test_perf_workloads.py, but test collection failed in this environment due to a missing external dependency (openpyxl), so the new/updated tests could not be fully executed here.


Copilot AI review requested due to automatic review settings May 8, 2026 08:30

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3902714609


Comment thread src/excelbench/perf/renderer.py Outdated

Copilot AI left a comment

Contributor


Pull request overview

This PR enhances ExcelBench’s performance benchmarking outputs by recording richer run context (machine/environment metadata), surfacing variance/confidence cues in human- and machine-readable reports, and adding a lightweight historical regression indicator based on prior runs.

Changes:

  • Add PerfRunEnvironment and include captured CPU/core/memory info in PerfMetadata and results.json.
  • Update markdown rendering to show run environment and include a confidence note plus p50/p95 display.
  • Extend CSV output with variance/confidence and regression-status columns, backed by history-based comparisons, and document a canonical comparison procedure.
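As a rough sketch of what capturing CPU/core/memory information might look like, the block below uses only the standard library. The class and function names are hypothetical stand-ins (the PR's own PerfRunEnvironment and _collect_run_environment() are not shown here), and the memory probe via os.sysconf is an assumption that works on typical POSIX systems and falls back to None elsewhere.

```python
import json
import os
import platform
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class RunEnvironment:
    """Hypothetical stand-in with the three fields named in the PR."""
    cpu_model: str
    core_count: Optional[int]
    memory_total_mb: Optional[int]


def _memory_total_mb() -> Optional[int]:
    # os.sysconf is POSIX-only; on other platforms report None
    # rather than guessing.
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None
    if page_size <= 0 or page_count <= 0:
        return None
    return (page_size * page_count) // (1024 * 1024)


def collect_run_environment() -> RunEnvironment:
    # platform.processor() can return an empty string on some systems,
    # so fall back to the machine architecture.
    cpu_model = platform.processor() or platform.machine() or "unknown"
    return RunEnvironment(
        cpu_model=cpu_model,
        core_count=os.cpu_count(),
        memory_total_mb=_memory_total_mb(),
    )


# The dataclass converts to a plain dict that is safe to embed in JSON.
env_json = json.dumps(asdict(collect_run_environment()))
```

Keeping unavailable values as None rather than raising lets a benchmark run finish on platforms where a probe is missing, at the cost of less complete metadata.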

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| src/excelbench/perf/runner.py | Adds run-environment collection and serializes it into perf metadata JSON. |
| src/excelbench/perf/renderer.py | Updates markdown/CSV rendering, adds CV + regression-status helpers, appends perf history. |
| tests/test_perf_cli.py | Updates CLI/render assertions to expect new metadata and markdown cues. |
| tests/test_perf_data_shape.py | Adds coverage that CSV output includes new regression/confidence fields. |
| tests/test_perf_workloads.py | Adds coverage that runtime perf metadata includes run environment. |
| docs/trackers/performance-benchmark-runs.md | Documents the canonical procedure for comparing perf runs and interpreting new columns. |
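The perf history mentioned for renderer.py is a JSON Lines file (results/perf/history.jsonl in the PR description) holding a p50_wall_ms value per run. A minimal sketch of appending to and reading such a file is shown below. The function names and the run_id field are hypothetical, and the actual record layout in the repository may contain more fields.

```python
import json
from pathlib import Path


def append_history(path: Path, p50_wall_ms: float, run_id: str) -> None:
    """Append one run as a single JSON line (run_id is a hypothetical field)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"run_id": run_id, "p50_wall_ms": p50_wall_ms}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def load_history_p50(path: Path) -> list:
    """Return p50_wall_ms values in file order, skipping unusable lines."""
    if not path.exists():
        return []
    values = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a truncated or corrupt line
            if not isinstance(record, dict):
                continue
            value = record.get("p50_wall_ms")
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                values.append(float(value))
    return values
```

Skipping malformed lines instead of raising keeps one bad write from blocking later comparisons, though a stricter reader might prefer to surface the corruption.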


Comment thread tests/test_perf_cli.py
Comment thread tests/test_perf_cli.py
Comment thread src/excelbench/perf/renderer.py Outdated
Comment thread src/excelbench/perf/renderer.py
Comment thread src/excelbench/perf/renderer.py Outdated
Comment thread src/excelbench/perf/renderer.py Outdated
Comment thread src/excelbench/perf/renderer.py
Comment thread src/excelbench/perf/renderer.py Outdated
Comment thread docs/trackers/performance-benchmark-runs.md Outdated
@wolfiesch wolfiesch merged commit 0785aeb into master May 8, 2026
5 checks passed
@wolfiesch wolfiesch deleted the codex/2026-05-08-08-30-21-add-performance-metadata-capture branch May 8, 2026 20:18

2 participants