Run benchmarks in parallel with tests, not sequentially #1130

Conversation
Move benchmark jobs from bench.yml into test.yml so tests and benchmarks run in parallel instead of sequentially, reducing merge time from ~10hrs to ~5hrs. Add cross-cancellation jobs so a failure in either group cancels the entire workflow run. Docs-only PRs continue to skip both. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
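The cross-cancellation jobs described above can be sketched in GitHub Actions YAML. This is an illustrative sketch based on the behavior described in this PR (the later review comments mention `gh run cancel` with `GH_TOKEN`); the exact job ids and step names in test.yml may differ:

```yaml
# Hypothetical sketch: cancel the whole workflow run when tests fail,
# so the (now parallel) benchmark jobs don't keep burning node-hours.
cancel-on-test-failure:
  needs: [tests]          # assumed job id for the test matrix
  if: failure()
  runs-on: ubuntu-latest
  steps:
    - name: Cancel Workflow Run
      run: gh run cancel ${{ github.run_id }} --repo ${{ github.repository }}
      env:
        GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

A symmetric `cancel-on-bench-failure` job with `needs: [bench]` gives the failure-in-either-group-cancels-everything behavior the description mentions.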
CodeAnt AI is reviewing your PR.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your review settings.

📝 Walkthrough

Removed the standalone benchmark workflow; its jobs now run from the main test workflow.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant GH as GitHub Actions (test.yml)
    participant Repo as Repository (PR & Master refs)
    participant Login as Login Node (build)
    participant SLURM as SLURM Scheduler
    participant Node as Compute Node (per-config)
    participant Diff as bench_diff / Poster
    rect rgba(93,156,236,0.5)
        GH->>Repo: clone PR and master refs
    end
    rect rgba(96,169,79,0.5)
        GH->>Login: run per-config builds (parallel)
        Login-->>GH: build artifacts/logs
    end
    rect rgba(236,151,31,0.5)
        GH->>SLURM: submit multi-node sbatch (run_frontier_all_*.sh)
        SLURM->>Node: allocate nodes and SSH per-config tasks
        Node->>Node: run frontier_test_config / frontier_bench_config
        Node-->>SLURM: write per-config YAML outputs
        SLURM-->>GH: job completion and collected outputs
    end
    rect rgba(178,120,255,0.5)
        GH->>Diff: run bench_diff & generate comment
        Diff-->>GH: diff results / comment payload
        GH->>Repo: post comment & upload logs/artifacts
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Note: Currently processing new changes in this PR. This may take a few minutes, please wait...
CodeAnt AI finished reviewing your PR.
📝 Walkthrough

The benchmark workflow has been consolidated from a standalone file into the main test workflow.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.github/workflows/test.yml:
- Around line 285-287: The bench job is missing the NODE_OPTIONS env used in the
self job and can cause OOMs on Phoenix self-hosted runners; update the bench
job's env block (the same place where ACTIONS_RUNNER_FORCE_ACTIONS_NODE_VERSION
and ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION are set) to include NODE_OPTIONS:
'--max-old-space-size=2048' so Phoenix bench runners get the same Node memory
limit as the self job.
🧹 Nitpick comments (2)
.github/workflows/test.yml (2)
301-306: Parallel build: first failure orphans the second background job.

GitHub Actions' default shell uses `set -eo pipefail`. If `wait %1` returns non-zero, the shell exits immediately and `wait %2` is never called — the second build process keeps running in the background on the self-hosted runner. While the step correctly fails, the orphaned process wastes resources. A more robust pattern waits for both jobs unconditionally and then checks both exit codes:

Proposed fix

```diff
 - name: Setup & Build
   if: matrix.build_script != ''
   run: |
     (cd pr && ${{ matrix.build_script }}) &
     (cd master && ${{ matrix.build_script }}) &
-    wait %1 && wait %2
+    wait %1 || s1=$?
+    wait %2 || s2=$?
+    if [ "${s1:-0}" -ne 0 ] || [ "${s2:-0}" -ne 0 ]; then exit 1; fi
```

A simpler alternative, suppressing `set -e` for the step so the plain waits can't abort early:

```diff
 - name: Setup & Build
   if: matrix.build_script != ''
   run: |
+    set +e
     (cd pr && ${{ matrix.build_script }}) &
     (cd master && ${{ matrix.build_script }}) &
-    wait %1 && wait %2
+    wait %1; s1=$?
+    wait %2; s2=$?
+    [ "$s1" -eq 0 ] && [ "$s2" -eq 0 ]
```
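The wait-for-both-then-check pattern discussed above can be exercised outside CI. In this self-contained sketch, `true`/`false` stand in for the two build commands:

```shell
#!/bin/bash
# Sketch: wait for BOTH background jobs before deciding pass/fail,
# so a first failure never orphans the second job.
set -e

run_pair() {
  set +e                       # don't abort between the two waits
  "$1" & p1=$!
  "$2" & p2=$!
  wait "$p1"; s1=$?
  wait "$p2"; s2=$?
  set -e
  [ "$s1" -eq 0 ] && [ "$s2" -eq 0 ]
}

ok=pass;    run_pair true true  || ok=fail
mixed=pass; run_pair true false || mixed=fail
echo "both-succeed=$ok one-fails=$mixed"
```

Because each `wait` runs to completion, neither child is left running when the other fails.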
311-314: `bench_diff` step missing `if: always()` or `if: success()` guard.

If the "Bench (Master v. PR)" step fails, the "Generate & Post Comment" step is skipped by default (GitHub Actions skips subsequent steps on failure). If you want the diff comment posted even on partial benchmark failure (e.g., to report which benchmarks regressed), consider adding `if: always()`. If the current behavior (skip on failure) is intended, this is fine as-is.
Caution: Review failed — failed to post review comments.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
Benchmark failures (e.g. transient network issues) should not cancel correctness tests. Keep only cancel-on-test-failure so that test failures still cancel benchmarks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

    @@ Coverage Diff @@
    ##           master    #1130   +/-   ##
    =======================================
      Coverage   44.03%   44.03%
    =======================================
      Files          70       70
      Lines       20649    20649
      Branches     2053     2053
    =======================================
      Hits         9093     9093
      Misses      10368    10368
      Partials     1188     1188

☔ View full report in Codecov by Sentry.
Reduces Frontier queue depth from 11 SLURM jobs to 2 by consolidating test configs into a single 5-node allocation and bench configs into a single 6-node allocation. Builds remain on login nodes; tests/benchmarks run on compute nodes via ssh to allocated nodes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CodeAnt AI is running incremental review.

CodeAnt AI incremental review completed.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @.github/scripts/frontier_bench_post.sh:
- Around line 34-35: The issue is that the module environment sourced by ".
./mfc.sh load -c \"$flag\" -m g" is lost because it runs in a separate subshell
from the subsequent "./mfc.sh bench_diff" call; fix it by running both commands
in the same subshell or sourcing the loader in the parent shell so the
environment persists — e.g., change the two separate "(cd pr && ...)"
invocations so they are combined into a single "(cd pr && . ./mfc.sh load -c
\"$flag\" -m g && ./mfc.sh bench_diff \"../$master_yaml\" \"../$pr_yaml\")" (or
alternatively run "cd pr" then source "mfc.sh load" in the current shell before
invoking "./mfc.sh bench_diff"), ensuring the loader (./mfc.sh load) and
bench_diff run in the same shell/session.
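The subshell pitfall this fix describes is easy to reproduce with plain shell. In this demo, `DEMO_VAR` stands in for whatever environment `. ./mfc.sh load` would export (illustrative names only):

```shell
#!/bin/bash
# Environment changed in one subshell is invisible to a sibling
# subshell or the parent shell.
unset DEMO_VAR
(export DEMO_VAR=loaded)               # like: (cd pr && . ./mfc.sh load ...)
sibling="${DEMO_VAR:-unset}"           # the export is already gone

# Fix: run the loader and the consumer in the SAME subshell.
combined=$(export DEMO_VAR=loaded; printf '%s' "$DEMO_VAR")

echo "sibling=$sibling combined=$combined"
```

This is why the review asks for `mfc.sh load` and `bench_diff` to be combined into a single `( ... && ... )` group.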
In @.github/scripts/frontier_test_config.sh:
- Around line 28-34: The GPU probe using rocm-smi (the gpus and ngpus
assignments) is executed unconditionally and will cause CPU-only runs to fail;
move the rocm-smi call and the ngpus calculation inside the branch that runs
when device equals "gpu". Specifically, relocate the lines that set
gpus=$(rocm-smi ...) and ngpus=$(...) into the if [ "$device" = "gpu" ] branch
before invoking ./mfc.sh, and add a safe fallback for ngpus (e.g., default to 1
or exit with a clear error) if rocm-smi yields no results; ensure you reference
the same variable names (gpus, ngpus, device) and preserve the existing ./mfc.sh
invocation when fixing.
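The fix described above — probing GPUs only inside the gpu branch, with a fallback — might look like this sketch. `probe_gpus` is a hypothetical stub standing in for `rocm-smi` so the snippet runs anywhere:

```shell
#!/bin/bash
set -e
device="${1:-cpu}"

# Stand-in for `rocm-smi --showid`; a real node would query ROCm here.
probe_gpus() { echo "GPU[0] GPU[1]"; }

if [ "$device" = "gpu" ]; then
  # GPU discovery happens ONLY on the gpu path, so CPU-only runs on
  # nodes without ROCm never execute it (and set -e can't kill them).
  gpus=$(probe_gpus | grep -Eo '[0-9]+' || true)
  ngpus=$(printf '%s\n' "$gpus" | grep -c . || true)
  [ "$ngpus" -ge 1 ] || { echo "no GPUs detected" >&2; exit 1; }
else
  ngpus=0
fi
echo "device=$device ngpus=$ngpus"
```

Run with no argument it takes the cpu path and reports zero GPUs without ever touching the probe.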
In @.github/workflows/test.yml:
- Around line 270-275: The "Setup & Build" workflow step is dead because
matrix.build_script is set to the empty string for every matrix entry, so the
conditional if: matrix.build_script != '' is never true; either remove the
entire "Setup & Build" step from the workflow or populate the matrix entries
with non-empty build_script values for the matrix entries that need building so
the condition can be true, and update any references to that step accordingly
(look for the job name/step "Setup & Build" and the matrix variable
matrix.build_script).
- Around line 310-319: The workflow only implements one-directional cancellation
via the job named cancel-on-test-failure; to make cancellation bidirectional add
a symmetric job (e.g., cancel-on-bench-failure) that uses needs: [bench], if:
failure(), runs-on: ubuntu-latest and the same steps (Cancel Workflow Run using
gh run cancel ${{ github.run_id }} --repo ${{ github.repository }} with GH_TOKEN
env) so benchmarks failing will cancel the tests; ensure the new job mirrors
cancel-on-test-failure but references needs: [bench] and an appropriate name and
position in the YAML.
🧹 Nitpick comments (4)
.github/scripts/frontier_bench_config.sh (1)
29-33: Quote `$(nproc)` to prevent word splitting (ShellCheck SC2046).

While `nproc` returns a single integer in practice, quoting the command substitution is the safer shell idiom and silences the linter.

Proposed fix

```diff
- ./mfc.sh bench --mem 1 -j $(nproc) -o "$job_slug.yaml" -- -c "$cluster" $device_opts -n $n_ranks
+ ./mfc.sh bench --mem 1 -j "$(nproc)" -o "$job_slug.yaml" -- -c "$cluster" $device_opts -n $n_ranks
```

.github/scripts/run_frontier_all_benchmarks.sh (2)

78-86: Duplicated config table inside the heredoc is a maintenance hazard.

The config array on lines 79–86 must be kept in exact sync with lines 15–22. If someone adds, removes, or reorders a config in one place but not the other, the node-to-config mapping silently breaks (wrong benchmark runs on wrong node, or index out of bounds). The same pattern appears in `run_frontier_all_tests.sh` lines 81–88. Consider generating the inner config table dynamically (e.g., write it to a file in Phase 1 and source it inside the heredoc), or at minimum add a CI-time assertion that both arrays match.

111-118: `$?` after a failed `if wait` may not carry the original exit code in all shells.

In bash, `$?` in the `else` branch of `if wait "$pid"` correctly reflects the child's exit status. However, this is a bash-specific behavior — the shebang (`#!/bin/bash`) ensures it, but worth a brief comment for future maintainers, since it's a subtle pattern.

.github/scripts/run_frontier_all_tests.sh (1)

56-56: Hardcoded "5-node" in the echo; the benchmarks script uses `$num_nodes`.

Minor inconsistency — `run_frontier_all_benchmarks.sh` line 53 uses `"Submitting ${num_nodes}-node SLURM job..."` while this script hardcodes the number. Using `$num_nodes` keeps it self-documenting if configs change.

Proposed fix

```diff
-echo "All builds complete. Submitting 5-node SLURM job..."
+echo "All builds complete. Submitting ${num_nodes}-node SLURM job..."
```
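The single-source-of-truth approach suggested for the duplicated config table — write the table to a file in Phase 1, then read the same file inside the sbatch heredoc — can be sketched as follows (file names are illustrative):

```shell
#!/bin/bash
# Sketch: keep ONE config table by writing it to a file first,
# then reading that file back where the SLURM heredoc would.
set -e
work=$(mktemp -d)

configs=(
  "pr frontier gpu acc"
  "master frontier gpu acc"
)
printf '%s\n' "${configs[@]}" > "$work/configs.txt"

# Inside the real sbatch script this would be the same readarray call,
# so both sides always see an identical table.
readarray -t inner_configs < "$work/configs.txt"

echo "outer=${#configs[@]} inner=${#inner_configs[@]}"
```

Since both arrays come from one file, a count or content mismatch becomes impossible rather than merely asserted.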
```yaml
- name: Setup & Build
  if: matrix.build_script != '' && matrix.cluster != 'frontier_all'
  run: |
    (cd pr && ${{ matrix.build_script }}) &
    (cd master && ${{ matrix.build_script }}) &
    wait %1 && wait %2
```
Setup & Build step is dead code — build_script is empty for every matrix entry.
All matrix entries set build_script: "" (lines 226, 234, 242, 250), so the condition matrix.build_script != '' is never true and this step never executes. Either remove it or populate build_script for the entries that need it.
🤖 Prompt for AI Agents
In @.github/workflows/test.yml around lines 270 - 275, The "Setup & Build"
workflow step is dead because matrix.build_script is set to the empty string for
every matrix entry, so the conditional if: matrix.build_script != '' is never
true; either remove the entire "Setup & Build" step from the workflow or
populate the matrix entries with non-empty build_script values for the matrix
entries that need building so the condition can be true, and update any
references to that step accordingly (look for the job name/step "Setup & Build"
and the matrix variable matrix.build_script).
4 issues found across 6 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name=".github/scripts/frontier_test_config.sh">
<violation number="1" location=".github/scripts/frontier_test_config.sh:28">
P1: The `rocm-smi` command is invoked unconditionally before the device type check. With `set -e` enabled, this will cause CPU-only tests to fail immediately on nodes without ROCm installed, as the script will terminate before reaching the `else` branch. Move the GPU discovery logic inside the `if [ "$device" = "gpu" ]` block.</violation>
</file>
<file name=".github/scripts/run_frontier_all_benchmarks.sh">
<violation number="1" location=".github/scripts/run_frontier_all_benchmarks.sh:140">
P1: The job ID extraction using `grep -oE '[0-9]+'` is fragile and can incorrectly match numbers from error messages (e.g., account names like 'ENG160' or error codes). This could cause the script to proceed with an invalid job ID when `sbatch` actually failed. Parse only the specific 'Submitted batch job' line to ensure a real job ID is captured.</violation>
</file>
<file name=".github/scripts/frontier_bench_config.sh">
<violation number="1" location=".github/scripts/frontier_bench_config.sh:13">
P2: `frontier_bench_config.sh` always loads GPU mode (`-m g`) even when `device=cpu`, unlike the test config. CPU benchmark runs will get the wrong module set; mirror the device-based mode selection.</violation>
</file>
<file name=".github/scripts/frontier_bench_post.sh">
<violation number="1" location=".github/scripts/frontier_bench_post.sh:35">
P1: The environment setup and `bench_diff` invocation are in separate subshells, so the environment changes from `. ./mfc.sh load` are not visible to `bench_diff`. Both commands should run in the same subshell to ensure the loaded environment is applied.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
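The stricter job-ID parse requested above — accepting only sbatch's `Submitted batch job N` line, never stray digits from error text — can be checked against canned output (the sample strings below are made up):

```shell
#!/bin/bash
# Extract the job ID only from sbatch's success line, so numbers in
# error messages (account names, error codes) can never match.
parse_job_id() {
  sed -n 's/^Submitted batch job \([0-9]\{1,\}\)$/\1/p'
}

good="Submitted batch job 2345678"
bad="sbatch: error: Invalid account ENG160, error code 1"

id_ok=$(printf '%s\n' "$good" | parse_job_id)
id_bad=$(printf '%s\n' "$bad" | parse_job_id)

echo "ok=$id_ok bad=${id_bad:-none}"
```

A bare `grep -oE '[0-9]+'` over the same inputs would happily return `160` from the error line; anchoring on the full success line avoids that.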
Use rsync with --link-dest and --exclude to avoid copying the target directories back into themselves when creating per-config source copies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move rocm-smi GPU discovery inside the gpu branch so CPU configs don't fail with set -e (frontier_test_config.sh) - Combine mfc.sh load and bench_diff into single subshell so loaded module environment persists (frontier_bench_post.sh) - Use device-based mode selection in bench config for consistency (frontier_bench_config.sh) - Parse 'Submitted batch job' line instead of bare grep for digits (run_frontier_all_tests.sh, run_frontier_all_benchmarks.sh) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevents rapid pushes from cancelling unrelated jobs. Each job type (github, self, bench) now has its own concurrency group keyed by ref and matrix params, so a new push only cancels stale instances of the same job — not all jobs across the workflow. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name=".github/workflows/test.yml">
<violation number="1" location=".github/workflows/test.yml:68">
P2: The concurrency group for the github matrix job omits matrix.precision, so the default-precision job and the `single` precision job share the same group and can cancel each other when `cancel-in-progress: true` is enabled. Include precision in the concurrency key to avoid unintended cancellations and preserve parallelism.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
/improve
Run all config builds concurrently instead of sequentially. Each build runs in its own subshell/directory with output captured to a per-config log file. On failure, only the failed build logs are printed. Tests: 5 builds x -j 8 = 40 cores; Bench: 6 builds x -j 8 = 48 cores. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
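The per-config log-capture scheme this commit describes might look like the reduced sketch below, with a hypothetical `fake_build` standing in for the real `./mfc.sh` builds:

```shell
#!/bin/bash
# Sketch: run every config build in parallel, each in its own subshell
# with output captured to a per-config log; report only the failures.
set -e
logdir=$(mktemp -d)
fake_build() { echo "building $1"; [ "$1" != "beta" ]; }   # beta fails

pids=""
for cfg in alpha beta gamma; do
  ( fake_build "$cfg" ) > "$logdir/$cfg.log" 2>&1 &
  pids="$pids $!:$cfg"
done

failed=""
for entry in $pids; do
  pid=${entry%%:*}; cfg=${entry#*:}
  wait "$pid" || failed="$failed $cfg"
done
echo "failed:${failed:-none}"
```

On failure, a real script would cat only the logs named in `$failed`, which is the "only the failed build logs are printed" behavior.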
New pushes should cancel all stale jobs to free limited runners. Per-job concurrency was overly granular — old Phoenix/Github jobs kept running even when a new push made them irrelevant. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @.github/scripts/frontier_bench_config.sh:
- Around line 30-34: The shell invocation of ./mfc.sh bench uses unquoted
numeric substitutions: wrap the $n_ranks and the $(nproc) expansions in double
quotes to avoid word-splitting or empty-value pitfalls (leave intentional
word-splitting for $device_opts as-is); update the two command lines that call
mfc.sh bench (the gpu branch using -j $n_ranks and the else branch using -j
$(nproc)) to use "-j \"$n_ranks\"" and "-j \"$(nproc)\"" respectively, keeping
other arguments like "$job_slug.yaml" and -c "$cluster" unchanged.
In @.github/scripts/run_frontier_all_benchmarks.sh:
- Around line 64-68: The append to build_exits is using an unquoted variable
which can cause word-splitting; change the array append from
build_exits+=($code) to append the quoted value (build_exits+=("$code")) so the
exit code is treated as a single element; update the occurrence around the else
branch where variables code, build_exits and build_failed are handled in the
run_frontier_all_benchmarks.sh script.
- Around line 29-35: The current loop using cp -al (in the for cfg ... read -r
version cluster device interface ... dir=... block) will include previously
created config dirs into subsequent copies; replace the cp-based source-copy
strategy with an rsync --archive --link-dest="$version" plus an --exclude list
(or dynamically build --exclude entries from already-created dir names) to avoid
copying sibling config directories into each other; specifically update the code
that creates "$dir" (the cp -al/cp -r calls) to invoke rsync with --link-dest
pointing at the original "$version" directory and --exclude for any
already-created config dirs so each new config is a hard-linked copy without
nesting.
In @.github/scripts/run_frontier_all_tests.sh:
- Around line 69-73: The variable expansion of code is unquoted which can cause
word-splitting; update the build failure block to quote $code everywhere it's
expanded — e.g., change echo " Build FAILED: $cluster $device $interface (exit
$code)" to ensure the exit code expansion is quoted where appropriate and push
the code into the array as build_exits+=("$code") (and similarly quote any other
$code uses) so SC2206 is resolved; locate this in the failure handling around
build_exits and build_failed in the script.
🧹 Nitpick comments (4)
.github/scripts/run_frontier_all_benchmarks.sh (1)

15-22: Duplicated config table inside the heredoc is a maintenance hazard.

The config array is defined twice — once on lines 15–22 and again inside the SLURM heredoc on lines 118–125 — with a comment "must match the outer script". Any future edit to one that misses the other will silently cause node–config mismatches and wrong benchmark results. Consider generating the inner config table dynamically (e.g., writing it to a file before sbatch and sourcing it inside, or using an `envsubst`/template approach) to keep a single source of truth. The same concern applies to `run_frontier_all_tests.sh`.

Also applies to: 117-125

.github/scripts/frontier_bench_config.sh (2)

6-9: Missing `set -u` — undefined positional args silently become empty strings.

The orchestrator scripts use `set -euo pipefail`, but this script only uses `set -e`. If called with fewer than 3 arguments, `$1`/`$2`/`$3` will silently expand to empty strings, leading to confusing downstream failures (e.g., empty `$cluster` in the `mfc.sh load` call).

Proposed fix

```diff
 set -e
 set -x

+if [ $# -ne 3 ]; then
+  echo "Usage: $0 <cluster> <device> <interface>"
+  exit 1
+fi
+
 cluster=$1; device=$2; interface=$3
```

20-28: GPU detection pipeline will abort the script if no GPUs are found (via `set -e` + `grep`).

If `rocm-smi` returns no numeric GPU IDs, `grep -Eo '[0-9]+'` exits non-zero and `set -e` kills the script without a clear error message. This is acceptable as a fast-fail on a compute node that should have GPUs, but a guard with an explicit error message would improve debuggability.

.github/scripts/run_frontier_all_tests.sh (1)

27-40: `$excludes` string should be an array for robustness.

Line 39 uses `$excludes` as an unquoted string to achieve word splitting. While the current directory names are safe, this pattern is fragile. Using an array would be more robust and consistent with bash best practices.

Proposed fix

```diff
-excludes=""
+excludes=()
 for cfg in "${configs[@]}"; do
   read -r cluster device interface <<< "$cfg"
-  excludes+=" --exclude=test-${cluster}-${device}-${interface}"
+  excludes+=(--exclude="test-${cluster}-${device}-${interface}")
 done

 for cfg in "${configs[@]}"; do
   read -r cluster device interface <<< "$cfg"
   dir="test-${cluster}-${device}-${interface}"
   echo "Creating source copy: $dir"
-  rsync -a --link-dest="$(pwd)" $excludes ./ "$dir/"
+  rsync -a --link-dest="$(pwd)" "${excludes[@]}" ./ "$dir/"
 done
```
Prints a periodic heartbeat (every 2 min) during parallel Frontier builds showing which configs are still running with log sizes and last output line. Build logs are printed in collapsible ::group:: sections for passed builds and in full for failed builds so failures are immediately visible. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2 issues found across 2 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name=".github/scripts/run_frontier_all_benchmarks.sh">
<violation number="1" location=".github/scripts/run_frontier_all_benchmarks.sh:94">
P2: `kill` can fail if the heartbeat already exited, and with `set -e` that aborts the script. Guard the kill with `|| true` to avoid false failures when the heartbeat finishes first.</violation>
</file>
<file name=".github/scripts/run_frontier_all_tests.sh">
<violation number="1" location=".github/scripts/run_frontier_all_tests.sh:99">
P2: `set -e` causes the unconditional `kill` to abort the script when the heartbeat has already exited. Guard the `kill` with `|| true` (or check the PID) so a normal heartbeat exit doesn’t fail the workflow.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
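The `|| true` guard both findings request is easy to sanity-check: killing a heartbeat that has already exited returns non-zero, and under `set -e` only the guard keeps that from failing the whole script.

```shell
#!/bin/bash
# Demo: under `set -e`, an unguarded `kill` of an already-exited
# heartbeat would abort the script; `|| true` absorbs the failure.
set -e
( sleep 0.1 ) &            # stand-in for the heartbeat subshell
hb_pid=$!
sleep 0.5                  # heartbeat has long since exited

kill "$hb_pid" 2>/dev/null || true
status=survived
echo "status=$status"
```

The same line works whether the heartbeat is still running (kill succeeds) or already gone (kill fails, `|| true` swallows it).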
Actionable comments posted: 1
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/test.yml (1)
9-11: ⚠️ Potential issue | 🟠 Major — Add `scancel` to cleanup handler in `monitor_slurm_job.sh`.

The script does set a trap on EXIT (line 13) but only kills the tail process. When GitHub Actions cancels the workflow via `cancel-in-progress: true`, the monitoring script exits while the submitted SLURM job continues running on Frontier until it completes or times out. The cleanup function should call `scancel $job_id` to terminate the job:

Suggested cleanup handler

```shell
cleanup() {
  if [ -n "${tail_pid:-}" ]; then
    kill "${tail_pid}" 2>/dev/null || true
  fi
  if [ -n "${job_id:-}" ]; then
    scancel "$job_id" 2>/dev/null || true
  fi
}
```
🤖 Fix all issues with AI agents
In @.github/scripts/run_frontier_all_tests.sh:
- Around line 127-129: The echo message currently hardcodes "5-node" which will
become incorrect if the node count changes; update the final status echo to
interpolate the variable num_nodes instead of the literal "5-node" (replace the
string in the echo that mentions "5-node SLURM job..." to use "$num_nodes" so it
reflects the actual configured node count), ensuring any surrounding text
remains the same and quoting is preserved.
🧹 Nitpick comments (5)
.github/scripts/run_frontier_all_benchmarks.sh (2)

15-22: Duplicated config table between outer script and SLURM heredoc is fragile.

The configs array is defined twice — lines 15-22 and lines 149-156 — with only a comment ("must match the outer script") linking them. If a config is added, removed, or reordered in one place but not the other, the SLURM job will silently map the wrong config to the wrong node. Consider generating the inner config table dynamically (e.g., write it to a file in Phase 1 and source it inside the heredoc) or at minimum add a count assertion inside the SLURM job.

Minimal guard: assert count inside the SLURM job

```diff
 # Config table (must match the outer script)
 configs=(
   "pr frontier gpu acc"
   "pr frontier gpu omp"
   "pr frontier_amd gpu omp"
   "master frontier gpu acc"
   "master frontier gpu omp"
   "master frontier_amd gpu omp"
 )
+
+if [ "${#configs[@]}" -ne "$SLURM_JOB_NUM_NODES" ]; then
+  echo "ERROR: config count (${#configs[@]}) != allocated nodes ($SLURM_JOB_NUM_NODES)"
+  exit 1
+fi
```

Also applies to: 149-156

168-171: SSH commands use unquoted `$SLURM_SUBMIT_DIR/$dir` — safe only if paths have no spaces.

Inside the SLURM heredoc, the `cd` and paths passed to SSH are unquoted. SLURM submit directories on Frontier are unlikely to contain spaces, but quoting them is low-cost insurance.

Proposed fix

```diff
 ssh -q -o StrictHostKeyChecking=no "$node" \
-  "cd $SLURM_SUBMIT_DIR/$dir && bash .github/scripts/frontier_bench_config.sh $cluster $device $interface" \
+  "cd \"$SLURM_SUBMIT_DIR/$dir\" && bash .github/scripts/frontier_bench_config.sh \"$cluster\" \"$device\" \"$interface\"" \
   > "$outfile" 2>&1 &
```

.github/scripts/run_frontier_all_tests.sh (2)

14-20: Same fragile config duplication as in `run_frontier_all_benchmarks.sh`.

The configs array is maintained in two places (lines 14-20 and 154-160) with a "must match" comment. Same risk of silent mismatch. Consider the same mitigation — at minimum, assert `${#configs[@]} == $SLURM_JOB_NUM_NODES` inside the SLURM job.

Also applies to: 154-160

172-176: Same unquoted path concern in SSH commands as benchmarks script.

`$SLURM_SUBMIT_DIR/$dir` is unquoted inside the SSH command string. Unlikely to cause issues on Frontier but quoting is low-cost.

Proposed fix

```diff
 ssh -q -o StrictHostKeyChecking=no "$node" \
-  "cd $SLURM_SUBMIT_DIR/$dir && bash .github/scripts/frontier_test_config.sh $cluster $device $interface" \
+  "cd \"$SLURM_SUBMIT_DIR/$dir\" && bash .github/scripts/frontier_test_config.sh \"$cluster\" \"$device\" \"$interface\"" \
   > "$outfile" 2>&1 &
```

.github/workflows/test.yml (1)

185-195: Build step (line 186) is effectively dead code for current matrix entries.

The condition `matrix.cluster != 'phoenix' && matrix.cluster != 'frontier_all'` excludes every current matrix entry (all are either `phoenix` or `frontier_all`). This is harmless since both clusters handle builds internally (phoenix via `submit.sh`, frontier_all via `run_frontier_all_tests.sh`), but it's worth knowing this step is a no-op today. No action needed unless you want to clean it up.
Add || true to kill command so set -e doesn't abort the script when the heartbeat subshell has already exited. Use $num_nodes instead of hardcoded "5-node" in the test script log message. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Self-hosted runners don't clean workspaces between runs, so config directories from cancelled prior runs can persist. Add rm -rf before cp -al/rsync to prevent "same file" hardlink errors when copying into an existing directory. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Eliminate config table duplication by writing configs to a file and reading it inside the sbatch heredoc (now unquoted for variable expansion) - Add EXIT trap in sbatch scripts to kill orphaned SSH processes on job cancellation - Add per-config timeout (90 min tests, 120 min benchmarks) to prevent a single hanging config from consuming the full walltime - Extract SLURM account, partition, walltime, and node count into variables at the top of each orchestration script - Validate GPU count from rocm-smi (1-16 range) with diagnostic output on failure in both test and benchmark compute-node scripts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CodeAnt AI is running incremental review.

CodeAnt AI incremental review completed.
User description
Summary
Test plan
Summary by CodeRabbit
Chores
New Features
CodeAnt-AI Description
Run benchmarks in parallel with tests and consolidate Frontier SLURM jobs
What Changed
Impact
✅ Shorter CI merge time for PRs with tests and benchmarks
✅ Fewer correctness test runs killed by benchmark flakes
✅ Lower Frontier SLURM queue depth and simpler node usage