Description
Lately, the failure rate of pipelines due to system failures has been increasing. To give a concrete example from today, here is a pipeline which should be ✔️ but failed because of system failures:
On the same day we ran two pipelines on develop:
- https://gitlab.spack.io/spack/spack/-/pipelines/912139
- https://gitlab.spack.io/spack/spack/-/pipelines/911863
They are both ❌ due to system failures.
Whenever something like that happens on a PR, users have two options:
- Close and reopen the PR, to trigger the creation of a new merge commit, and a new set of pipelines
- Comment @spackbot run pipeline, to re-run all the pipelines
If the failure rate is high enough, there is a fair chance the procedure needs to be repeated a few times to get a ✔️ CI mark. This multiplies the resources we need to run pipelines, in particular for "generate" jobs, which are always re-run.
I guess the best solution from the user's perspective would be having a low failure rate but, in the absence of that, I wonder if we could add a new command1:
@spackbot re-run failed pipelines
that re-runs only pipelines that failed due to system errors. This should:
- Reduce the chance of a possible new failure
- Reduce the resources we need to get a given CI run ✔️
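A minimal sketch of how the bot could decide which pipelines to retry (this is not spackbot's actual implementation). GitLab's job API reports a failure_reason per job (e.g. runner_system_failure vs. script_failure), so the bot could fetch a failed pipeline's jobs and retry only when every failure looks systemic. The should_retry helper and the exact set of reasons below are assumptions for illustration:

```python
# Sketch: decide whether a failed pipeline is worth retrying because its
# failures are system failures rather than genuine build errors.
# The failure_reason values below are illustrative; GitLab reports values
# such as "runner_system_failure" and "script_failure" per job.

SYSTEM_FAILURE_REASONS = {
    "runner_system_failure",
    "stuck_or_timeout_failure",
    "scheduler_failure",
    "api_failure",
}


def should_retry(failed_jobs):
    """Return True if every failed job looks like a system failure.

    `failed_jobs` is a list of job dicts, as returned by
    GET /projects/:id/pipelines/:pipeline_id/jobs?scope=failed,
    each carrying a `failure_reason` field.
    """
    return bool(failed_jobs) and all(
        job.get("failure_reason") in SYSTEM_FAILURE_REASONS
        for job in failed_jobs
    )
```

For each pipeline where should_retry(...) is true, the bot could then call GitLab's POST /projects/:id/pipelines/:pipeline_id/retry endpoint, which re-runs only the failed jobs, so successful jobs (including "generate" jobs that already passed) are not repeated.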
Footnotes
1. Naming is tentative, any better choice is welcome