Commit 52c8867
Module graceful shutdown support
Provide support for SmartSwitch DPU module graceful shutdown.
# Description:
* **Single source of truth for transitions**
  * All components now use the `sonic_platform_base.module_base.ModuleBase` helpers:
    * `set_module_state_transition(db, name, transition_type)`
    * `clear_module_state_transition(db, name)`
    * `get_module_state_transition(db, name) -> dict`
    * `is_module_state_transition_timed_out(db, name, timeout_secs) -> bool`
  * This eliminates duplicated logic and race-prone direct Redis writes.
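To make the helper contract concrete, here is a minimal in-memory sketch in Python. It is not the ModuleBase code: the `FakeStateDb` class, the `CHASSIS_MODULE_TABLE|<name>` key layout, and storing booleans as the strings `"True"`/`"False"` are assumptions; only the helper names and the field names come from this description.

```python
from datetime import datetime, timedelta, timezone

TABLE = "CHASSIS_MODULE_TABLE"  # table name used by this change


class FakeStateDb:
    """Hypothetical stand-in for a Redis connector: key -> {field: value}."""

    def __init__(self):
        self.data = {}

    def hset(self, key, field, value):
        self.data.setdefault(key, {})[field] = value

    def hgetall(self, key):
        return dict(self.data.get(key, {}))


def _key(name):
    # Assumed key layout: "<table>|<module name>".
    return f"{TABLE}|{name}"


def set_module_state_transition(db, name, transition_type):
    db.hset(_key(name), "state_transition_in_progress", "True")
    db.hset(_key(name), "transition_type", transition_type)
    db.hset(_key(name), "transition_start_time",
            datetime.now(timezone.utc).isoformat())


def clear_module_state_transition(db, name):
    db.hset(_key(name), "state_transition_in_progress", "False")


def get_module_state_transition(db, name):
    return db.hgetall(_key(name))


def is_module_state_transition_timed_out(db, name, timeout_secs):
    entry = get_module_state_transition(db, name)
    if entry.get("state_transition_in_progress") != "True":
        return False  # nothing in flight, so nothing can be stale
    start = entry.get("transition_start_time")
    if not start:
        return True  # assumption: a flag without a start stamp counts as stale
    started = datetime.fromisoformat(start)
    return datetime.now(timezone.utc) - started > timedelta(seconds=timeout_secs)
```

A real caller would pass a Redis-backed connector instead of `FakeStateDb`; the sketch only relies on `hset(key, field, value)` and `hgetall(key)`.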
* **Correct table everywhere**
  * Standardized on **`CHASSIS_MODULE_TABLE`** (replaces `CHASSIS_MODULE_INFO_TABLE`).
  * The mismatch with the HLD is resolved in code; the HLD fix is tracked separately.
* **Ownership & lifecycle**
  * The **initiator** of an operation (`startup`/`shutdown`/`reboot`) sets:
    * `state_transition_in_progress=True`
    * `transition_type=<op>`
    * `transition_start_time=<utc-iso8601>`
  * The **platform** (`set_admin_state()`) is responsible for clearing the transition by setting:
    * `state_transition_in_progress=False`
    * optionally, `transition_end_time=<epoch>` (or a similar end stamp).
  * The CLI pre-clears a transition only when the prior one has **timed out**.
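The ownership split can be illustrated with a small sketch in which the initiator marks the transition and the platform clears it. `initiator_begin` and `SketchPlatformModule` are hypothetical names, the plain dict stands in for `CHASSIS_MODULE_TABLE`, and clearing in a `finally` block (so a failed operation still releases the flag) is a design choice of this sketch, not something the description specifies.

```python
import time
from datetime import datetime, timezone


def initiator_begin(table, name, op):
    """Initiator (CLI or chassisd) marks the transition before calling the platform."""
    table.setdefault(name, {}).update({
        "state_transition_in_progress": "True",
        "transition_type": op,
        "transition_start_time": datetime.now(timezone.utc).isoformat(),
    })


class SketchPlatformModule:
    """Hypothetical platform module; only the set_admin_state() name is from the PR."""

    def __init__(self, table, name):
        self.table = table
        self.name = name
        self.admin_up = True

    def set_admin_state(self, up):
        try:
            # Stand-in for the real work (graceful wait, power control, ...).
            self.admin_up = up
            return True
        finally:
            # The platform, not the initiator, clears the flag and stamps the end.
            entry = self.table.setdefault(self.name, {})
            entry["state_transition_in_progress"] = "False"
            entry["transition_end_time"] = str(int(time.time()))
```

The initiator only ever sets the flag and the platform only ever clears it, so each phase of a transition has a single writer.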
* **Timeouts & policy**
  * Timeouts are read only from `/usr/share/sonic/device/{plat}/platform.json`; otherwise, built-in **constants** are used.
  * Typical production values:
    * `startup: 180s`, `shutdown: 180s` (≈ `graceful_wait 60s + power 120s`), `reboot: 120s`.
  * The **graceful wait** (e.g., waiting for "Graceful shutdown complete") is a **platform policy**, implemented inside the platform's `set_admin_state()` rather than in ModuleBase.
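A sketch of the timeout lookup with constant fallback, assuming a hypothetical `dpu_transition_timeouts` object inside `platform.json`; the description gives the file path but not its key layout. Any value that is missing or invalid in the file falls back to the constants listed above.

```python
import json
import os

# Fallback constants: the typical production values listed above.
DEFAULT_TIMEOUTS = {"startup": 180, "shutdown": 180, "reboot": 120}

# Hypothetical key; the real platform.json layout is not given here.
TIMEOUT_KEY = "dpu_transition_timeouts"


def load_transition_timeouts(platform, device_root="/usr/share/sonic/device"):
    """Per-operation timeouts from platform.json, falling back to the constants."""
    timeouts = dict(DEFAULT_TIMEOUTS)
    path = os.path.join(device_root, platform, "platform.json")
    try:
        with open(path) as f:
            data = json.load(f)
    except (OSError, ValueError):
        return timeouts  # missing or malformed file: constants only
    overrides = data.get(TIMEOUT_KEY) if isinstance(data, dict) else None
    if not isinstance(overrides, dict):
        return timeouts
    for op in timeouts:
        value = overrides.get(op)
        # Accept positive numbers only; bool is excluded because it subclasses int.
        if isinstance(value, (int, float)) and not isinstance(value, bool) and value > 0:
            timeouts[op] = int(value)
    return timeouts
```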
* **Boot behavior**
  * On start, `chassisd`:
    1. **Clears stale flags once** (centralized sweep).
    2. Runs `set_initial_dpu_admin_state()`, which **marks transitions** via ModuleBase before calling the platform `set_admin_state()`.
    3. Leaves clearing to the platform, or to well-defined status transitions (ONLINE/OFFLINE) where appropriate.
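The boot sequence can be sketched as follows. The function names `clear_stale_transitions_once`, `set_initial_admin_states` and `chassisd_boot` are hypothetical (the real entry point is `set_initial_dpu_admin_state()`), the plain dict stands in for the Redis table, and deriving `startup` or `shutdown` from the desired admin state is an assumption.

```python
from datetime import datetime, timezone


def clear_stale_transitions_once(table):
    """Boot-time sweep: drop in-progress flags left over from before the restart."""
    cleared = []
    for name, entry in table.items():
        if entry.get("state_transition_in_progress") == "True":
            entry["state_transition_in_progress"] = "False"
            cleared.append(name)
    return sorted(cleared)


def set_initial_admin_states(table, desired, platform_set_admin_state):
    """Mark each transition first, then hand off to the platform (which clears it)."""
    for name, up in desired.items():
        table.setdefault(name, {}).update({
            "state_transition_in_progress": "True",
            "transition_type": "startup" if up else "shutdown",
            "transition_start_time": datetime.now(timezone.utc).isoformat(),
        })
        platform_set_admin_state(name, up)


def chassisd_boot(table, desired, platform_set_admin_state):
    stale = clear_stale_transitions_once(table)  # runs exactly once per start
    set_initial_admin_states(table, desired, platform_set_admin_state)
    return stale
```

Sweeping before marking means a flag left behind by a crash cannot be mistaken for a transition started by this boot.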
* **gNOI shutdown daemon**
  * Listens on **`CHASSIS_MODULE_TABLE`** and triggers only when:
    * `state_transition_in_progress=True` **and** `transition_type=shutdown`.
  * Never clears the flag (ownership stays with the platform).
  * Uses bounded RPC timeouts and robust Redis access (swsssdk/swsscommon).
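The daemon's trigger condition reduces to a small predicate. In this sketch `should_trigger_gnoi_shutdown` and `on_table_event` are hypothetical names, `send_shutdown` stands in for the gNOI RPC call, and accepting `"true"` in any letter case is an assumption about how the flag is serialized. Nothing here writes back to the entry, matching the rule that the daemon never clears the flag.

```python
def should_trigger_gnoi_shutdown(entry):
    """True only for an in-flight shutdown transition."""
    in_progress = str(entry.get("state_transition_in_progress", "")).lower() == "true"
    return in_progress and entry.get("transition_type") == "shutdown"


def on_table_event(module_name, entry, send_shutdown):
    """Handle one CHASSIS_MODULE_TABLE update; never writes back to the entry."""
    if not should_trigger_gnoi_shutdown(entry):
        return False
    send_shutdown(module_name)  # stand-in for the bounded-timeout gNOI RPC
    return True
```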
* **CLI (`config chassis modules …`)**
  * Uses ModuleBase APIs for all set/get/timeout checks.
  * If a previous transition is stuck and `is_module_state_transition_timed_out()` reports it as timed out, the CLI clears it automatically and then proceeds.
  * Sets the transition at the start of `startup`/`shutdown`; the platform clears it on completion.
  * The fabric card flow is retained; edits are surgical.
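The CLI pre-check can be sketched like this. `cli_begin_transition` is a hypothetical name and the dict stands in for `CHASSIS_MODULE_TABLE`. The description only says that a timed-out transition is cleared before proceeding; refusing to start while a transition is still within its timeout is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone


def _timed_out(entry, timeout_secs, now):
    start = entry.get("transition_start_time")
    if not start:
        return True  # assumption: no start stamp means the flag is stale
    return now - datetime.fromisoformat(start) > timedelta(seconds=timeout_secs)


def cli_begin_transition(table, name, op, timeout_secs, now=None):
    """Clear a timed-out transition if present, then mark the new one."""
    now = now or datetime.now(timezone.utc)
    entry = table.setdefault(name, {})
    if entry.get("state_transition_in_progress") == "True":
        if not _timed_out(entry, timeout_secs, now):
            # Assumption: do not start a new operation over a live transition.
            return False
        entry["state_transition_in_progress"] = "False"  # auto-clear the stuck one
    entry.update({
        "state_transition_in_progress": "True",
        "transition_type": op,
        "transition_start_time": now.isoformat(),
    })
    return True
```

Passing `now` explicitly keeps the timeout logic deterministic in tests.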
* **Redis robustness**
  * Helpers handle both stacks (swsssdk/swsscommon) and avoid `hset(mapping=...)`.
  * Consistent HGETALL/HSET paths, resilient to connector differences.
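A sketch of connector-neutral write and read paths. `write_fields` and `read_fields` are hypothetical names; the only requirement on `conn` is `hset(key, field, value)` and `hgetall(key)`. Decoding `bytes` is an assumption meant to cover connectors that return raw bytes (for example redis-py without `decode_responses`).

```python
def write_fields(conn, key, fields):
    """Write each field with a plain per-field HSET instead of hset(mapping=...)."""
    for field, value in fields.items():
        conn.hset(key, field, str(value))


def read_fields(conn, key):
    """HGETALL with keys and values normalized to str."""
    raw = conn.hgetall(key) or {}

    def _s(x):
        return x.decode() if isinstance(x, bytes) else x

    return {_s(k): _s(v) for k, v in raw.items()}
```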
* **Race reduction & consistency**
  * Centralized writes prevent multi-writer races.
  * All transition writes include `transition_start_time`; clears may add an end stamp.
  * Existing PCI/file-lock logic is left intact; unrelated behavior is unchanged.
* **Change scope**
  * Minimal, targeted diffs.
  * No background tasks added and no broad refactors beyond transition handling.
  * Behavior changes are limited to making transition semantics correct and uniform across repos.
HLD: sonic-net/SONiC#1991
sonic-platform-common: sonic-net/sonic-platform-common#567
sonic-utilities: sonic-net/sonic-utilities#4031
sonic-platform-daemons: sonic-net/sonic-platform-daemons#667
# How to verify it
1. Issue the `config chassis modules shutdown DPUx` command.
2. Verify that the DPU module shut down gracefully by checking the logs in `/var/log/syslog` on both the NPU and the DPU.

1 parent: 1633661
8 files changed (+1047, -2 lines) in:
* `data/debian`
* `scripts`
* `tests`