Skip to content

ci: fix test result reporting after re-runs and PR updates#2624

Draft
Ricardo Salveti (ricardosalveti) wants to merge 2 commits into
qualcomm-linux:masterfrom
ricardosalveti:fix-test-reporting-reruns
Draft

ci: fix test result reporting after re-runs and PR updates#2624
Ricardo Salveti (ricardosalveti) wants to merge 2 commits into
qualcomm-linux:masterfrom
ricardosalveti:fix-test-reporting-reruns

Conversation

@ricardosalveti

Copy link
Copy Markdown
Contributor

Fix the "Test Results" check which can contain failures even after a fully green re-run by dropping stale re-run jobs and aborting older runs after the PR gets updated.

When a failed LAVA job is re-run, GitHub keeps the original failed job's
"<distro>-test-job-<id>" artifact and adds a new artifact for the
re-submitted job, which LAVA assigns a fresh, higher id. The summary
step downloads every matching artifact, so the stale failed job was
counted again and the PR comment reported phantom failures (e.g.
"Fail: 1") even after a fully green re-run. Because multiple boot jobs
for one device were merged in non-deterministic find(1) order, the
reported count was unstable too.

De-duplicate to the newest LAVA job (highest id) per
(requested_device_type, description) before counting. That pair is
stable across re-runs of the same job and is read directly from the
saved job-detail JSON, so no extra LAVA API calls are needed. The same
de-duplicated list now drives both the pass/fail table and the "All
jobs summary" table so they always agree.

The EnricoMi "Test Results" check is unaffected: it parses the
deterministically named result artifacts, which are overwritten on
re-run, so it already reflected the latest result. This change brings
the comment into agreement with it.

Signed-off-by: Ricardo Salveti <ricardo.salveti@oss.qualcomm.com>
Test reporting is decoupled from the build via workflow_run and the
LAVA jobs run for hours, so when a PR is updated or force-pushed the
in-flight test run keeps publishing the "Test Results" check and a PR
comment for an orphaned commit. The result is a red check and stale
comments on the PR head while a separate, green run exists for a
commit that is no longer part of the branch.

Add a concurrency group keyed on the PR head (head repository and
branch) with cancel-in-progress, so a newer build supersedes the older
test run instead of both reporting in parallel. Re-running failed jobs
on an existing run keeps the same run id and is therefore not
cancelled.

Tag the bot comment with comment-tag so a single comment is updated in
place per PR rather than a new one being posted on every run, which
previously accumulated multiple comments for different commits. The
"## Test jobs for commit <sha>" header is kept so the comment always
shows which commit it reflects.

Signed-off-by: Ricardo Salveti <ricardo.salveti@oss.qualcomm.com>
@ricardosalveti

Copy link
Copy Markdown
Contributor Author

Milosz Wasilewski (@mwasilew) not sure if we have an easy way to validate this, we could push in next but that won't really validate the PR workflow.

@ricardosalveti

Copy link
Copy Markdown
Contributor Author

Dmitry Baryshkov (@lumag) it seems you are validating another change in next, let me know once you are done there.

@lumag

Copy link
Copy Markdown
Contributor

Dmitry Baryshkov (@lumag) it seems you are validating another change in next, let me know once you are done there.

Please proceed, if you have something to test.

@mwasilew

Copy link
Copy Markdown
Contributor

The change for de-duplication looks OK. As for PR cancelling it's more tricky. I have a repository where I can test it. The main issue is that the cancellation doesn't propagate to LAVA jobs. If the jobs were submitted, they will run till the end. I would prefer cancelling them as well, but there seem to be no easy mechanism for doing so. I'll test it in my repository and then push to next

@ricardosalveti

Copy link
Copy Markdown
Contributor Author

Milosz Wasilewski (@mwasilew) were you able to give it a try?

@mwasilew

Copy link
Copy Markdown
Contributor

Milosz Wasilewski (Milosz Wasilewski (@mwasilew)) were you able to give it a try?

not yet. I tried to mock the lava jobs part and it partly worked, but didn't reproduce the same behaviour as with one failed test. I think I'll need to use real jobs and devices in the LAB, but I didn't want to get in the way of potential GA testing. Will continue next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants