parity(slurm): close reason-code vocabulary gap: expand PendingReason 14→~50 and wire reservation/license/QoS reasons (#300)

## Summary

Spur ships only ~14 `PendingReason` values against Slurm's ~50, and several reasons Spur *does* compute never reach the user. Three problems:

1. **Vocabulary gap** — missing Reservation, PartitionConfig, SystemFailure, AccountingPolicy, the full `Assoc*`/`QOS*` Grp/Max limit families, and `BurstBuffer*`. Workflow engines and CI gates scrape the reason string, so an absent or generic reason breaks them.
2. **Observable-wiring gap** — `pending_jobs()` drops jobs blocked by reservation, license, or QoS limits *before* `update_pending_reasons()` runs, so those jobs' `pending_reason` never reflects the real cause.
3. **NodeDown/Resources bug** — a fully-allocated (busy-but-up) cluster mis-reports as `NodeDown` because reason emission uses `NodeState::is_available()` (Idle|Mixed only). Slurm reports `Resources`.
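
To make the third problem concrete, here is a minimal sketch of the two predicates. The `NodeState` enum is a simplified, hypothetical stand-in for Spur's real type, and `unplaced_reason` / `unplaced_reason_before_fix` are illustrative names, not functions in the codebase. The rule "any node up means `Resources`" is an assumption drawn from the description above.

```rust
/// Hypothetical, simplified stand-in for Spur's node state enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum NodeState {
    Idle,
    Mixed,
    Allocated,
    Down,
    Drain,
}

impl NodeState {
    /// Has free capacity right now (Idle or Mixed only).
    pub fn is_available(self) -> bool {
        matches!(self, NodeState::Idle | NodeState::Mixed)
    }

    /// Healthy, whether or not it currently has free capacity.
    pub fn is_up(self) -> bool {
        matches!(
            self,
            NodeState::Idle | NodeState::Mixed | NodeState::Allocated
        )
    }
}

/// Pre-fix behavior as described in this issue: a cluster with no
/// *available* node reports NodeDown, even when every node is healthy.
pub fn unplaced_reason_before_fix(nodes: &[NodeState]) -> &'static str {
    if nodes.iter().any(|n| n.is_available()) {
        "Resources"
    } else {
        "NodeDown"
    }
}

/// Assumed post-fix rule: NodeDown only when no node is up at all.
pub fn unplaced_reason(nodes: &[NodeState]) -> &'static str {
    if nodes.iter().any(|n| n.is_up()) {
        "Resources"
    } else {
        "NodeDown"
    }
}
```

With two `Allocated` nodes, the pre-fix function returns `NodeDown` and the `is_up()` version returns `Resources`, which matches the Slurm behavior shown in the repro.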

## Versions probed

| | Version | Source |
|---|---|---|
| Spur | `main` (pre-fix) | Built from source |
| Slurm | `25.11.6` | `job_reason_string()` table `jsra[]` in `src/common/job_state_reason.c` |

## Repro

```bash
# ALL_CPUS stands in for <all-cpus>: the CPU count that fills one node.
# (A literal <all-cpus> would be parsed by bash as redirections.)
: "${ALL_CPUS:?set ALL_CPUS to the CPU count that fills one node}"

# Busy-but-up cluster: saturate all nodes, then submit one more
spur submit -c "$ALL_CPUS" /tmp/sleep.sh   # repeat (x2) to fill the cluster
jid=$(spur submit -c "$ALL_CPUS" /tmp/sleep.sh | grep -oE '[0-9]+' | head -1)
spur show job "$jid" | grep Reason
# Spur → Reason=NodeDown     ← wrong
# Slurm → Reason=Resources

# Absent/inactive reservation
jid=$(spur submit --reservation=nope /tmp/sleep.sh | grep -oE '[0-9]+' | head -1)
spur show job "$jid" | grep Reason
# Spur → (generic / dropped)   ← reason never surfaces
# Slurm → Reason=Reservation
```

## Expected behavior (Slurm parity)

1. Add the missing `PendingReason` variants with display strings byte-exact to Slurm 25.11.6's `job_reason_string()` (including Slurm's deliberate casing inconsistency — `QOS*` upper vs `Assoc*`/`Association*` mixed).
2. QoS limit checks emit the specific reason per cap (max wall → `QOSMaxWallDurationPerJobLimit`, MaxTRESPerJob cpu/mem → `QOSMaxCpuPerJobLimit`/`QOSMaxMemoryPerJob`, etc.) instead of generic `Resources`/`PartitionTimeLimit`.
3. A scheduler pass tags reservation/license/QoS-blocked jobs with the real reason before they are dropped from scheduling, so the drop decision and the displayed reason cannot diverge.
4. Add `NodeState::is_up()` (Idle|Mixed|Allocated) and use it in reason emission so a saturated cluster reports `Resources`, while only genuine down/drain/error/unknown/suspended yields `NodeDown`.
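
Items 1 and 2 can be sketched as follows. This is an illustrative subset, not the real enum: only reasons named in this issue appear, and `QosLimits`, `JobRequest`, and `qos_block_reason` are hypothetical names and shapes chosen for the example. The order in which caps are checked is also an assumption.

```rust
use std::fmt;

/// Subset of the reasons named in this issue; the real enum would carry ~50.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PendingReason {
    Resources,
    NodeDown,
    Reservation,
    QosMaxWallDurationPerJobLimit,
    QosMaxCpuPerJobLimit,
    QosMaxMemoryPerJob,
}

impl fmt::Display for PendingReason {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Display strings follow Slurm's spelling, including the
        // upper-case QOS prefix and the missing "Limit" suffix on memory.
        let s = match self {
            PendingReason::Resources => "Resources",
            PendingReason::NodeDown => "NodeDown",
            PendingReason::Reservation => "Reservation",
            PendingReason::QosMaxWallDurationPerJobLimit => "QOSMaxWallDurationPerJobLimit",
            PendingReason::QosMaxCpuPerJobLimit => "QOSMaxCpuPerJobLimit",
            PendingReason::QosMaxMemoryPerJob => "QOSMaxMemoryPerJob",
        };
        f.write_str(s)
    }
}

/// Hypothetical per-job QoS caps; `None` means "no cap configured".
#[derive(Default)]
pub struct QosLimits {
    pub max_wall_minutes: Option<u32>,
    pub max_cpus_per_job: Option<u32>,
    pub max_mem_mb_per_job: Option<u64>,
}

/// Hypothetical view of what a job asks for.
pub struct JobRequest {
    pub wall_minutes: u32,
    pub cpus: u32,
    pub mem_mb: u64,
}

/// Return the specific reason for the first cap the job exceeds,
/// instead of a generic Resources / PartitionTimeLimit.
pub fn qos_block_reason(limits: &QosLimits, job: &JobRequest) -> Option<PendingReason> {
    if matches!(limits.max_wall_minutes, Some(cap) if job.wall_minutes > cap) {
        return Some(PendingReason::QosMaxWallDurationPerJobLimit);
    }
    if matches!(limits.max_cpus_per_job, Some(cap) if job.cpus > cap) {
        return Some(PendingReason::QosMaxCpuPerJobLimit);
    }
    if matches!(limits.max_mem_mb_per_job, Some(cap) if job.mem_mb > cap) {
        return Some(PendingReason::QosMaxMemoryPerJob);
    }
    None
}
```

Keeping Rust-style variant names and putting Slurm's exact spelling only in `Display` means the casing inconsistency lives in one table that a test can compare against `job_reason_string()` output.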

## Scope / size

Single-PR change. Estimated **M (200–600 LOC)**:

- `spur-core/src/job.rs` — new `PendingReason` variants + Display/serde
- `spur-core/src/qos.rs` — specific QoS limit reasons
- `spur-core/src/node.rs` — `NodeState::is_up()`
- `spurctld` (`cluster.rs`, `scheduler_loop.rs`) — `tag_blocked_pending_reasons()` pass + shared eligibility helpers (`reservation_block`, `license_block`, `qos_block_for`); NodeDown→Resources fix
- Tests: Display/serde for all new variants, qos limit reasons, tag-blocked passes, `fully_allocated_cluster_reports_resources_not_nodedown`
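
A rough sketch of how the tag-blocked pass and a shared eligibility helper might fit together. Only the name `tag_blocked_pending_reasons` comes from this issue; its signature here, the `Job` and `ClusterView` types, the single `block_reason` helper (standing in for `reservation_block` / `license_block` / `qos_block_for`), and the set-based license model are all assumptions made for illustration.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PendingReason {
    Reservation,
    Licenses,
}

/// Hypothetical, trimmed-down job record.
pub struct Job {
    pub id: u32,
    pub reservation: Option<String>,
    pub licenses: Vec<String>,
    pub pending_reason: Option<PendingReason>,
}

/// Hypothetical snapshot of the cluster state the scheduler consults.
pub struct ClusterView {
    pub active_reservations: HashSet<String>,
    pub free_licenses: HashSet<String>,
}

/// Single source of truth for "why is this job ineligible?".
/// Both the drop decision and the displayed reason derive from it.
pub fn block_reason(job: &Job, view: &ClusterView) -> Option<PendingReason> {
    if let Some(name) = &job.reservation {
        if !view.active_reservations.contains(name) {
            return Some(PendingReason::Reservation);
        }
    }
    if job.licenses.iter().any(|l| !view.free_licenses.contains(l)) {
        return Some(PendingReason::Licenses);
    }
    None
}

/// Tag every blocked job with its reason and return the ids of the
/// jobs that remain eligible for scheduling.
pub fn tag_blocked_pending_reasons(jobs: &mut [Job], view: &ClusterView) -> Vec<u32> {
    let mut schedulable = Vec::new();
    for job in jobs.iter_mut() {
        match block_reason(job, view) {
            Some(reason) => job.pending_reason = Some(reason),
            None => schedulable.push(job.id),
        }
    }
    schedulable
}
```

Because filtering and tagging happen in the same `match`, a job cannot be dropped from scheduling without also receiving the reason that caused the drop.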

## Priority

**P1** — Slurm-visible behavior; mis-reports or drops the real reason rather than just being absent.

## Known follow-ups (out of scope here)

- QoS reasons are wired but **inert** until QoS limits are sourced from the accounting DB — see #282 (wire QOS definitions into scheduling) and #283 (enforce per-account limits).
- Submit-time validation of `--reservation` / `license:` GRES to match Slurm's reject-at-submit behavior (Spur admits to PENDING and surfaces the reason — a superset).
- `BeginTime` reason: `pending_jobs()` drops future-`begin_time` jobs with no reason; easy win via the same tag-blocked pass.

## Coordination

Variant set is **disjoint** from open PR #274 (NonZeroExitCode, RaisedSignal, JobLaunchFailure, JobHeldAdmin, BadConstraints, PartitionInactive, DependencyNeverSatisfied, InvalidAccount, InvalidQOS, BootFail, OutOfMemory) — can merge before or after #274 with no conflict.

## Related

Part of broader Category-4 (Job Lifecycle & State Machine) parity work. Siblings: DEADLINE (#258), exit-code (#269), dependency-engine (#259).

