[Debug] Use larger runner for (most) integration test suites #1032

Draft
cpuguy83 wants to merge 3 commits into project-dalec:main from cpuguy83:use_larger_runners

cpuguy83 (Collaborator) commented Apr 9, 2026

No description provided.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
cpuguy83 force-pushed the use_larger_runners branch 8 times, most recently from 4ee4703 to 52d00ea (April 9, 2026 18:19)
cpuguy83 marked this pull request as ready for review April 9, 2026 18:20
Copilot AI review requested due to automatic review settings April 9, 2026 18:20
Comment thread .github/workflows/ci.yml
- name: Setup source policy
if: inputs.source_policy
uses: ./.github/actions/setup-source-policy
- name: Aggressive cleanup
Collaborator Author

Disks are larger on the updated runners, and this step takes 3 minutes on its own, so it is no longer worth running.

Contributor

Copilot AI left a comment

Pull request overview

Adjusts the CI integration-test job to use a larger GitHub Actions runner for most suites and improves docker/containerd restart diagnostics during CI setup.

Changes:

  • Switch integration job runner selection to a conditional matrix-based runner (larger runner for most suites).
  • Add a Docker diagnostics step and tighten docker/containerd restart handling during OTEL tracing setup.
  • Update composite actions to stop/start Docker with clearer failure reporting.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| .github/workflows/ci.yml | Conditional runner selection for integration suites; adds Docker info; changes docker/containerd lifecycle around tracing setup; removes aggressive disk cleanup. |
| .github/actions/enable-containerd/action.yml | Stops/starts Docker when enabling containerd snapshotter and emits logs on failure. |
| .github/actions/dns-spoof-ubuntu-archive/action.yml | Stops/starts Docker after writing daemon DNS config and emits logs on failure. |
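
The "stop, reconfigure, start, emit logs on failure" pattern summarized above could look roughly like the sketch below. This is not the code from the composite actions. The function name `reconfigure_docker` and the `SYSTEMCTL` / `JOURNALCTL` override variables are hypothetical, introduced only so the sketch can run without a real systemd.

```shell
# Sketch only. SYSTEMCTL and JOURNALCTL are hypothetical override hooks;
# on a real runner they would likely be "sudo systemctl" and "sudo journalctl".
: "${SYSTEMCTL:=systemctl}"
: "${JOURNALCTL:=journalctl}"

# Stop Docker, run the caller-supplied config command, then start Docker.
# If the start fails, print recent unit logs to stderr and return non-zero.
reconfigure_docker() {
  $SYSTEMCTL stop docker.service || return 1

  # "$@" is whatever command rewrites the daemon config (caller-supplied).
  "$@" || return 1

  if ! $SYSTEMCTL start docker.service; then
    echo "docker failed to start; recent unit logs:" >&2
    $JOURNALCTL -u docker.service --no-pager -n 100 >&2
    return 1
  fi
}
```

A step would call it as `reconfigure_docker some_config_command`, where `some_config_command` stands in for whatever writes the daemon configuration.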

Comment thread .github/workflows/ci.yml

   integration:
-    runs-on: ubuntu-22.04
+    runs-on: ${{ matrix.suite == 'other' && 'ubuntu-22.04' || 'ubuntu-latest-4-cores' }}
Collaborator Author

The runners we have access to are all named ubuntu-latest-<n>-cores.
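
For readers unfamiliar with the `cond && a || b` idiom in the `runs-on` expression above: GitHub Actions expressions have no ternary operator, so this is the usual stand-in, and it behaves as expected here because `'ubuntu-22.04'` is a non-empty string. The same selection written as a shell function might look like the sketch below. `select_runner` is a hypothetical name, not something from this PR.

```shell
# Sketch only: mirrors the runs-on expression from the diff.
# The "other" suite stays on ubuntu-22.04; every other suite
# gets the larger ubuntu-latest-4-cores runner.
select_runner() {
  if [ "$1" = "other" ]; then
    echo "ubuntu-22.04"
  else
    echo "ubuntu-latest-4-cores"
  fi
}
```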

After switching to larger runners, CI had 3 jobs where dockerd would not
start, seemingly because we restart docker (for config updates) quickly
enough that systemd refuses to start it again.

This change resets the systemd fail counter when docker fails to
restart, then tries again.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
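
The retry described in the commit message above could be sketched as follows. This is an illustration, not the change in this PR. It relies on `systemctl reset-failed`, which clears a unit's failed state and its start rate limit counter. `restart_docker_with_reset` and the `SYSTEMCTL` override variable are hypothetical names.

```shell
# Sketch only. SYSTEMCTL is a hypothetical override hook; on a real
# runner it would likely be "sudo systemctl".
: "${SYSTEMCTL:=systemctl}"

restart_docker_with_reset() {
  # First attempt: a plain restart.
  if $SYSTEMCTL restart docker.service; then
    return 0
  fi

  # systemd may have refused because the unit hit its start rate limit.
  # reset-failed clears the failed state and the rate limit counter.
  $SYSTEMCTL reset-failed docker.service

  # Second and final attempt; its exit status is the function's status.
  $SYSTEMCTL restart docker.service
}
```

A single retry keeps a genuinely broken daemon config failing fast instead of looping.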
cpuguy83 force-pushed the use_larger_runners branch from 52d00ea to 86fbcc1 (April 9, 2026 20:47)
cpuguy83 marked this pull request as draft April 9, 2026 20:48
Add timeout signaling from test2json2gha to GITHUB_OUTPUT so subsequent
CI steps can detect when tests timed out. On timeout, the dump logs step
now collects goroutine stacks, a binary heap profile, and the dockerd
binary from the runner for offline analysis with go tool pprof.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
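
A rough sketch of the timeout signaling described in the commit message above. It shows a producer appending a step output to `GITHUB_OUTPUT` and a later step collecting debug artifacts only when that output is set. The output name `timeout`, the function names, the file names, and the `CURL` / `DOCKER_SOCK` / `DOCKERD_BIN` variables are all assumptions. So is the idea that dockerd's pprof endpoints are reachable over its API socket, which may depend on the daemon version and debug settings.

```shell
# Sketch only. All names below are hypothetical.
: "${CURL:=curl}"
: "${DOCKER_SOCK:=/var/run/docker.sock}"
: "${DOCKERD_BIN:=$(command -v dockerd || true)}"

# Producer side: record that the tests timed out as a step output.
signal_timeout() {
  echo "timeout=true" >> "$GITHUB_OUTPUT"
}

# Consumer side: $1 is the step output value, $2 is the dump directory.
collect_dockerd_debug() {
  local timed_out="$1" out_dir="$2"
  [ "$timed_out" = "true" ] || return 0
  mkdir -p "$out_dir"

  # Goroutine stacks as text (assumes pprof is served on the API socket).
  $CURL -sS --unix-socket "$DOCKER_SOCK" \
    -o "$out_dir/goroutines.txt" \
    "http://localhost/debug/pprof/goroutine?debug=2"

  # Binary heap profile for offline analysis.
  $CURL -sS --unix-socket "$DOCKER_SOCK" \
    -o "$out_dir/heap.pprof" \
    "http://localhost/debug/pprof/heap"

  # Keep the matching binary so pprof can resolve symbols.
  if [ -n "$DOCKERD_BIN" ]; then
    cp "$DOCKERD_BIN" "$out_dir/dockerd"
  fi
}
```

In a workflow, the later step would pass something like `${{ steps.<id>.outputs.timeout }}` as the first argument. The collected files could then be inspected offline with `go tool pprof ./dockerd heap.pprof`.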
cpuguy83 force-pushed the use_larger_runners branch from 5288397 to 8409f8e (April 10, 2026 20:28)
cpuguy83 self-assigned this Apr 13, 2026
cpuguy83 changed the title from "Use larger runner for (most) integration test suites" to "[Debug] Use larger runner for (most) integration test suites" Apr 13, 2026