Commit fceb770
[SmartSwitch] Add graceful shutdown and startup handling in platform daemons
#### Description
HLD: https://github.com/sonic-net/SONiC/blob/master/doc/smart-switch/graceful-shutdown/graceful-shutdown.md
These changes build upon the enhancements in [`sonic-platform-daemons#667`](sonic-net#667).
This PR introduces **graceful shutdown and startup orchestration** across SONiC platform daemons to ensure safe DPU and peripheral module transitions during reboot or administrative state changes.
Key updates include:
- Integration of `ModuleBase` lifecycle methods (`module_pre_shutdown`, `module_post_startup`, and `set_admin_state_gracefully`) into platform daemons.
- Relocation of the graceful handling of PCIe detach/reattach and sensor reload sequences into `set_admin_state_gracefully`.
- State tracking in `CHASSIS_MODULE_TABLE` via `STATE_DB` to synchronize transition state across processes.
- File-based operation locks to prevent concurrent access to shared hardware resources.
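
As an illustration of the file-based operation locks mentioned above, here is a minimal sketch built on the standard-library `fcntl.flock` call. The helper names `operation_lock` and `try_operation` and the lock file path are hypothetical and are not taken from this PR. The real daemons may use a different locking primitive or path layout.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def operation_lock(path, blocking=True):
    """Hold an exclusive advisory lock on `path` for the duration of the block.

    Hypothetical helper: illustrates the pattern only, not the PR's code.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        # With LOCK_NB set, flock raises BlockingIOError if the lock is held.
        fcntl.flock(fd, flags)
        try:
            yield fd
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def try_operation(path):
    """Return True if the lock could be taken, False if someone else holds it."""
    try:
        with operation_lock(path, blocking=False):
            return True
    except BlockingIOError:
        return False
```

Because `flock` locks are advisory, this only serializes processes that all agree to take the same lock file before touching the shared hardware resource.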
#### Motivation and Context
Platform daemons currently perform shutdown and startup independently, leading to:
- Race conditions during DPU detachment.
- Inconsistent Redis state across PMON daemons.
- Uncoordinated sensor and PCIe transitions during reboot.
This change introduces a unified **graceful shutdown framework** for SmartSwitch modules.
It ensures predictable module transitions, preserves hardware health, and supports orchestrated restarts without transient hardware errors.
#### How Has This Been Tested?
Testing was performed on **DPU-enabled (SmartSwitch)** platforms.
**Functional validation**
- Verified end-to-end reboot flow with DPU detach/reattach sequence.
- Confirmed that the PCIe state (`detaching`/`attaching`) is reflected in `STATE_DB`.
- `pcied` daemon logs confirm ordered detach before reboot and reattach after startup.
- Confirmed no stale Redis entries or orphaned locks post-reboot.
**Unit tests executed**
- `tests/test_DaemonPcied.py`
- `tests/test_chassisd_graceful.py`
Coverage includes:
- Transition flag handling
- Timeout behavior
- DB write/read operations
- Graceful admin state flow
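
To make the transition-flag and timeout items above concrete, here is a minimal sketch of how such checks could be modelled. A plain dictionary stands in for the `CHASSIS_MODULE_TABLE` entry in `STATE_DB`. The field names (`transition_in_progress`, `transition_type`, `transition_start_time`) and the function names are assumptions made for illustration and may not match the actual schema or the `ModuleBase` API.

```python
import time


def set_transition(table, module, transition_type, now=None):
    """Mark `module` as being in a transition (hypothetical field names)."""
    now = time.time() if now is None else now
    table[module] = {
        "transition_in_progress": "True",
        "transition_type": transition_type,
        "transition_start_time": str(now),
    }


def clear_transition(table, module):
    """Clear the transition flag once the operation has finished."""
    entry = table.setdefault(module, {})
    entry["transition_in_progress"] = "False"
    entry.pop("transition_start_time", None)


def transition_timed_out(table, module, timeout_s, now=None):
    """Return True if a transition is flagged and has run longer than timeout_s."""
    now = time.time() if now is None else now
    entry = table.get(module, {})
    if entry.get("transition_in_progress") != "True":
        return False
    start = float(entry.get("transition_start_time", now))
    return (now - start) > timeout_s
```

Passing `now` explicitly keeps the timeout logic deterministic in unit tests, which is one way the timeout behavior could be exercised without sleeping.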
**Manual validation**
#### Additional Information (Optional)

1 parent 69ce387 commit fceb770

3 files changed (+234, -196 lines):
- sonic-chassisd
- scripts
- tests