Potentially incorrect information on SLURM-examples page #8

@novosirj

Hi there,

I was looking for information on this subject when I came across the SLURM-examples page (https://github.com/statgen/SLURM-examples), which says the following:

"scontrol show job -dd <job_id>. Shows all information about specific SLURM job. It is worth paying attention to the following information:

Requeue. Shows how many times your job was re-queued. Some jobs may have higher priority and may pre-empt (i.e. cancel) your running jobs and put them back to the queue. If your job takes too long time and Requeue is greater than 1 then, most probably, the reason why your job takes so long is because it was cancelled and re-queued several times."

For a moment I thought I had learned something new, but I don't believe this is true. Per the scontrol manual (https://slurm.schedmd.com/scontrol.html):

Requeue=<0|1>
Stipulates whether a job should be requeued after a node failure: 0 for no, 1 for yes.

That's from the "update" section of the scontrol manual, but I don't have a single job that shows anything other than Requeue=0 or Requeue=1, which suggests the field is a yes/no flag rather than a counter. I looked briefly at the source code but couldn't tell for certain; I may have been looking in the wrong place.
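
For anyone who wants to repeat the check, here is a minimal sketch of how the distinct Requeue values could be tallied from `scontrol show job -dd` output. The function name `tally_requeue_values` and the sample text are hypothetical and made up for illustration; the sample is not real scheduler output, and the sketch only assumes that each job record contains a `Requeue=<value>` token.

```python
import re
from collections import Counter


def tally_requeue_values(scontrol_output):
    """Count how often each Requeue=<value> token appears in the given text."""
    # \b keeps us from matching a longer key that merely ends in "Requeue".
    return Counter(re.findall(r"\bRequeue=(\S+)", scontrol_output))


# Hypothetical sample, shaped like Key=Value job records (not real output).
sample = """\
JobId=101 JobName=alpha
   Requeue=1 BatchFlag=1
JobId=102 JobName=beta
   Requeue=0 BatchFlag=1
JobId=103 JobName=gamma
   Requeue=1 BatchFlag=1
"""

counts = tally_requeue_values(sample)
distinct_values = sorted(counts)
# If Requeue were a counter, values other than "0" and "1" should eventually show up.
unexpected = [v for v in distinct_values if v not in ("0", "1")]
```

On a real cluster, the text passed in would be the captured output of `scontrol show job -dd` run without a job ID; an empty `unexpected` list across many jobs is consistent with the manual's description of a 0/1 flag, though it does not prove it.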
