RDKB-63854: WanManager is not Restarting the DHCPv4 and DHCPv6 client parameters automatically when the parameters are disabled#212
Conversation
Root Cause: Race condition between interface state machine thread and DHCP event queue worker. When wan_transition_ipv6_down() executes before the 'DHCPv6 client stopped' event is processed, Dhcp6cStatus remains DHCPC_STARTED even though dibbler-client process is dead. Self-heal restart is incorrectly skipped, leaving IPv6 permanently DOWN. Fix: In addition to checking Dhcp6cStatus flag, also verify the actual dibbler-client process is still alive via WanMgr_IsPIDRunning(). If the PID is stale (process dead), restart the client regardless of the status flag state. Signed-off-by: aadhithan01 <64963550+aadhithan01@users.noreply.github.com>
Previous fix used (Dhcp6cPid > 0 && !IsPIDRunning(Pid)) but Dhcp6cPid=1 (placeholder set by self-heal via DHCP Manager) means IsPIDRunning(1) always returns TRUE since PID 1 is init. Self-heal skipped on repeat. Now triggers self-heal when ANY of: 1. Dhcp6cStatus != DHCPC_STARTED 2. Dhcp6cPid <= 1 (invalid/placeholder) 3. Dhcp6cPid > 1 but process dead
There was a problem hiding this comment.
Pull request overview
Fixes a race condition in wan_transition_ipv6_down() where the DHCPv6 self-heal check could be skipped because the "DHCP client stopped" event had not yet been processed by the event queue worker, leaving Dhcp6cStatus stale at DHCPC_STARTED and Dhcp6cPid set to the placeholder value 1 (init). The strengthened condition now also restarts the client when the recorded PID is <= 1 or when the running PID is no longer alive.
Changes:
- Expanded the DHCPv6 self-heal condition in
wan_transition_ipv6_down()to OR three independent checks (status, PID≤1, PID-not-running). - Added an explanatory block comment describing the race and the three trigger conditions.
- Enhanced the "Starting dibbler-client" log to include current
Dhcp6cStatusandPidfor debugging.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if(p_VirtIf->IP.Dhcp6cStatus != DHCPC_STARTED || | ||
| p_VirtIf->IP.Dhcp6cPid <= 1 || | ||
| WanMgr_IsPIDRunning(p_VirtIf->IP.Dhcp6cPid) != TRUE) |
| * 3. PID > 1 but process is no longer running (confirmed dead) | ||
| */ | ||
| if(p_VirtIf->IP.Dhcp6cStatus != DHCPC_STARTED || | ||
| p_VirtIf->IP.Dhcp6cPid <= 1 || |
There was a problem hiding this comment.
Dhcp6cPid shouldn't be used incase of DHCpmanager enabled build. WanManager will not have the info about the PID of the DHCP clients running
8ddda13 to
aee48a0
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
source/WanManager/wanmgr_interface_sm.c:2935
- The PR description says the IPv6-down self-heal should also restart DHCPv6 when the stored PID is a placeholder/invalid (<=1) or when the PID is not running, to avoid the race where Dhcp6cPid is temporarily set to 1. The current condition here still only checks
Dhcp6cStatus != DHCPC_STARTED, so the described race (PID=1 treated as running) can still prevent self-heal and leave IPv6 DOWN. Please update this condition to match the fix described in the PR (status != STARTED OR pid <= 1 OR pid not running), and ensure the PID check is safe when PID is 0/1.
if(p_VirtIf->IP.Dhcp6cStatus != DHCPC_STARTED)
{
/* Start DHCPv6 Client */
CcspTraceInfo(("%s %d - Starting dibbler-client on interface %s \n", __FUNCTION__, __LINE__, p_VirtIf->Name));
WanManager_StartDhcpv6Client(p_VirtIf, pInterface->IfaceType);
CcspTraceInfo(("%s %d - SELFHEAL - Started dibbler-client on interface %s\n", __FUNCTION__, __LINE__, p_VirtIf->Name));
| if (pVirtIf->IP.Dhcp4cStatus == DHCPC_STARTED) | ||
| { | ||
| /* Client was stopped externally (not by WanManager). | ||
| * WanManager_StopDhcpv4Client sets Dhcp4cStatus=STOPPED | ||
| * synchronously before any event arrives, so DHCPC_STARTED | ||
| * here means the stop was NOT initiated by WanManager. | ||
| * Restart the client to maintain IPv4 connectivity. */ | ||
| pVirtIf->IP.Dhcp4cStatus = DHCPC_STOPPED; | ||
| CcspTraceInfo(("%s-%d : SELFHEAL - Restarting DHCPv4 client for %s\n", __FUNCTION__, __LINE__, pVirtIf->Name)); | ||
| WanManager_StartDhcpv4Client(pVirtIf, gWanMgrDataBase.IfaceCtrl.pIface[pVirtIf->baseIfIdx].data.Name, gWanMgrDataBase.IfaceCtrl.pIface[pVirtIf->baseIfIdx].data.IfaceType); | ||
| } | ||
| else | ||
| { | ||
| pVirtIf->IP.Dhcp4cStatus = DHCPC_STOPPED; | ||
| } |
| CcspTraceInfo(("%s-%d : DHCPv6 client stopped for %s (Dhcp6cStatus=%d)\n", __FUNCTION__, __LINE__, pVirtIf->Name, pVirtIf->IP.Dhcp6cStatus)); | ||
| if (pVirtIf->IP.Dhcp6cStatus == DHCPC_STARTED) | ||
| { | ||
| /* Client was stopped externally (not by WanManager). | ||
| * WanManager_StopDhcpv6Client sets Dhcp6cStatus=STOPPED | ||
| * synchronously before any event arrives, so DHCPC_STARTED | ||
| * here means the stop was NOT initiated by WanManager. | ||
| * Restart the client to maintain IPv6 connectivity. */ | ||
| pVirtIf->IP.Dhcp6cStatus = DHCPC_STOPPED; | ||
| CcspTraceInfo(("%s-%d : SELFHEAL - Restarting DHCPv6 client for %s\n", __FUNCTION__, __LINE__, pVirtIf->Name)); | ||
| WanManager_StartDhcpv6Client(pVirtIf, gWanMgrDataBase.IfaceCtrl.pIface[pVirtIf->baseIfIdx].data.IfaceType); | ||
| } |
In FEATURE_RDKB_DHCP_MANAGER builds, when DHCPManager externally stops the DHCPv6 client, the CLIENT_STOPPED event arrives after LEASE_DEL has already triggered wan_transition_ipv6_down. By the time ISM checks Dhcp6cStatus, it is stale (DHCPC_STARTED) causing the self-heal to skip. Fix: In the DHCP_CLIENT_STOPPED event handler, detect external stops by checking if Dhcp6cStatus is still DHCPC_STARTED (WanManager_StopDhcpv6Client sets it to STOPPED synchronously before events arrive). If external, restart the client immediately. Revert wan_transition_ipv6_down to original logic (secondary self-heal for subsequent cycles remains intact).
aee48a0 to
b4cb180
Compare
Summary
WanManager fails to automatically restart the DHCPv6 client (self-heal) when
Device.DHCPv6.Client.1.Enableis toggled rapidly. The issue manifests intermittently — the self-heal works on some iterations but fails on others due to a race condition between the state machine thread and the DHCP event queue worker.Root Cause
In
wan_transition_ipv6_down(), there is a race condition:wan_transition_ipv6_downDhcp6cStatusWhen the state machine executes before the event worker processes the stop event:
Dhcp6cStatusis stillDHCPC_STARTED(stale)Dhcp6cPidis set to1(placeholder assigned byWanManager_StartDhcpv6Clientvia DHCP Manager)init/systemd(always alive),WanMgr_IsPIDRunning(1)returnsTRUEFix
Changed the self-heal condition in
wan_transition_ipv6_down()to trigger restart when ANY of:Dhcp6cStatus != DHCPC_STARTED— event was processed, client confirmed stoppedDhcp6cPid <= 1— PID is invalid/placeholder (0 = unset, 1 = init process, not dhcp6c)WanMgr_IsPIDRunning(Dhcp6cPid) != TRUE— PID > 1 but process is no longer runningBefore:
After:
Testing Steps
Affected Component
source/WanManager/wanmgr_interface_sm.cwan_transition_ipv6_down()Risk: Low
Single condition change in self-heal logic. No functional change when DHCPv6 client has a valid PID > 1 and is running.