Skip to content

RDKB-63854: WanManager is not Restarting the DHCPv4 and DHCPv6 client parameters automatically when the parameters are disabled#212

Open
aadhithan01 wants to merge 3 commits into
rdkcentral:mainfrom
aadhithan01:fix/ipv6-selfheal-race-condition
Open

RDKB-63854: WanManager is not Restarting the DHCPv4 and DHCPv6 client parameters automatically when the parameters are disabled#212
aadhithan01 wants to merge 3 commits into
rdkcentral:mainfrom
aadhithan01:fix/ipv6-selfheal-race-condition

Conversation

@aadhithan01
Copy link
Copy Markdown
Contributor

Summary

WanManager fails to automatically restart the DHCPv6 client (self-heal) when Device.DHCPv6.Client.1.Enable is toggled rapidly. The issue manifests intermittently — the self-heal works on some iterations but fails on others due to a race condition between the state machine thread and the DHCP event queue worker.

Root Cause

In wan_transition_ipv6_down(), there is a race condition:

  1. State machine thread (tid=8844) detects IPv6 is DOWN and enters wan_transition_ipv6_down
  2. DHCP event queue worker (tid=8983) processes the "DHCPv6 client stopped" event which updates Dhcp6cStatus

When the state machine executes before the event worker processes the stop event:

  • Dhcp6cStatus is still DHCPC_STARTED (stale)
  • Dhcp6cPid is set to 1 (placeholder assigned by WanManager_StartDhcpv6Client via DHCP Manager)
  • Since PID 1 is always init/systemd (always alive), WanMgr_IsPIDRunning(1) returns TRUE
  • The self-heal condition evaluates to FALSE — DHCPv6 client is NOT restarted
  • IPv6 remains permanently DOWN until manual intervention

Fix

Changed the self-heal condition in wan_transition_ipv6_down() to trigger restart when ANY of:

  1. Dhcp6cStatus != DHCPC_STARTED — event was processed, client confirmed stopped
  2. Dhcp6cPid <= 1 — PID is invalid/placeholder (0 = unset, 1 = init process, not dhcp6c)
  3. WanMgr_IsPIDRunning(Dhcp6cPid) != TRUE — PID > 1 but process is no longer running

Before:

if(p_VirtIf->IP.Dhcp6cStatus != DHCPC_STARTED ||
   (p_VirtIf->IP.Dhcp6cPid > 0 && WanMgr_IsPIDRunning(p_VirtIf->IP.Dhcp6cPid) != TRUE))

After:

if(p_VirtIf->IP.Dhcp6cStatus != DHCPC_STARTED ||
   p_VirtIf->IP.Dhcp6cPid <= 1 ||
   WanMgr_IsPIDRunning(p_VirtIf->IP.Dhcp6cPid) != TRUE)

Testing Steps

  1. SSH to the Docsis-Gateway (TECHNICOLORXB7)
  2. Tail the WanManager log:
    tail -f /rdklogs/logs/WANMANAGERLog.txt.0 &
    
  3. Rapidly toggle DHCPv6 disable/enable multiple times:
    dmcli eRT setv Device.DHCPv6.Client.1.Enable bool false
    dmcli eRT setv Device.DHCPv6.Client.1.Enable bool true
    dmcli eRT setv Device.DHCPv6.Client.1.Enable bool false
    dmcli eRT setv Device.DHCPv6.Client.1.Enable bool true
    dmcli eRT setv Device.DHCPv6.Client.1.Enable bool false
    dmcli eRT setv Device.DHCPv6.Client.1.Enable bool true
    
  4. Expected Result: After each disable/enable cycle, the log should show:
    wan_transition_ipv6_down XXXX - Starting dibbler-client on interface erouter0 (Dhcp6cStatus=X, Pid=X)
    wan_transition_ipv6_down XXXX - SELFHEAL - Started dibbler-client on interface erouter0
    
    Followed by IPv6 recovering to DUAL STACK ACTIVE state.
  5. Verify: IPv6 should recover on ALL iterations (not just some). Check:
    grep -c "SELFHEAL" /rdklogs/logs/WANMANAGERLog.txt.0
    
    Count should match number of disable cycles.
  6. Verify no IPv6 permanent DOWN state:
    grep "TRANSITION IPV4 LEASED" /rdklogs/logs/WANMANAGERLog.txt.0
    
    Each occurrence should be followed by a subsequent "TRANSITION DUAL STACK ACTIVE".

Affected Component

  • File: source/WanManager/wanmgr_interface_sm.c
  • Function: wan_transition_ipv6_down()
  • Platform: TECHNICOLORXB7 (tchxb7), RDKB

Risk: Low

Single condition change in self-heal logic. No functional change when DHCPv6 client has a valid PID > 1 and is running.

Root Cause: Race condition between interface state machine thread and DHCP
event queue worker. When wan_transition_ipv6_down() executes before the
'DHCPv6 client stopped' event is processed, Dhcp6cStatus remains
DHCPC_STARTED even though dibbler-client process is dead. Self-heal
restart is incorrectly skipped, leaving IPv6 permanently DOWN.

Fix: In addition to checking Dhcp6cStatus flag, also verify the actual
dibbler-client process is still alive via WanMgr_IsPIDRunning(). If the
PID is stale (process dead), restart the client regardless of the status
flag state.

Signed-off-by: aadhithan01 <64963550+aadhithan01@users.noreply.github.com>
Previous fix used (Dhcp6cPid > 0 && !IsPIDRunning(Pid)) but Dhcp6cPid=1
(placeholder set by self-heal via DHCP Manager) means IsPIDRunning(1)
always returns TRUE since PID 1 is init. Self-heal skipped on repeat.

Now triggers self-heal when ANY of:
  1. Dhcp6cStatus != DHCPC_STARTED
  2. Dhcp6cPid <= 1 (invalid/placeholder)
  3. Dhcp6cPid > 1 but process dead
@aadhithan01 aadhithan01 requested a review from a team as a code owner May 19, 2026 15:39
Copilot AI review requested due to automatic review settings May 19, 2026 15:39
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a race condition in wan_transition_ipv6_down() where the DHCPv6 self-heal check could be skipped because the "DHCP client stopped" event had not yet been processed by the event queue worker, leaving Dhcp6cStatus stale at DHCPC_STARTED and Dhcp6cPid set to the placeholder value 1 (init). The strengthened condition now also restarts the client when the recorded PID is <= 1 or when the running PID is no longer alive.

Changes:

  • Expanded the DHCPv6 self-heal condition in wan_transition_ipv6_down() to OR three independent checks (status, PID≤1, PID-not-running).
  • Added an explanatory block comment describing the race and the three trigger conditions.
  • Enhanced the "Starting dibbler-client" log to include current Dhcp6cStatus and Pid for debugging.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread source/WanManager/wanmgr_interface_sm.c Outdated
Comment on lines +2941 to +2943
if(p_VirtIf->IP.Dhcp6cStatus != DHCPC_STARTED ||
p_VirtIf->IP.Dhcp6cPid <= 1 ||
WanMgr_IsPIDRunning(p_VirtIf->IP.Dhcp6cPid) != TRUE)
Comment thread source/WanManager/wanmgr_interface_sm.c Outdated
* 3. PID > 1 but process is no longer running (confirmed dead)
*/
if(p_VirtIf->IP.Dhcp6cStatus != DHCPC_STARTED ||
p_VirtIf->IP.Dhcp6cPid <= 1 ||
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dhcp6cPid shouldn't be used incase of DHCpmanager enabled build. WanManager will not have the info about the PID of the DHCP clients running

Copilot AI review requested due to automatic review settings May 25, 2026 12:45
@aadhithan01 aadhithan01 force-pushed the fix/ipv6-selfheal-race-condition branch from 8ddda13 to aee48a0 Compare May 25, 2026 12:45
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

source/WanManager/wanmgr_interface_sm.c:2935

  • The PR description says the IPv6-down self-heal should also restart DHCPv6 when the stored PID is a placeholder/invalid (<=1) or when the PID is not running, to avoid the race where Dhcp6cPid is temporarily set to 1. The current condition here still only checks Dhcp6cStatus != DHCPC_STARTED, so the described race (PID=1 treated as running) can still prevent self-heal and leave IPv6 DOWN. Please update this condition to match the fix described in the PR (status != STARTED OR pid <= 1 OR pid not running), and ensure the PID check is safe when PID is 0/1.
        if(p_VirtIf->IP.Dhcp6cStatus != DHCPC_STARTED)
        {
            /* Start DHCPv6 Client */
            CcspTraceInfo(("%s %d - Starting dibbler-client on interface %s \n", __FUNCTION__, __LINE__, p_VirtIf->Name));
            WanManager_StartDhcpv6Client(p_VirtIf, pInterface->IfaceType);
            CcspTraceInfo(("%s %d - SELFHEAL - Started dibbler-client on interface %s\n", __FUNCTION__, __LINE__, p_VirtIf->Name));

Comment on lines +150 to +164
if (pVirtIf->IP.Dhcp4cStatus == DHCPC_STARTED)
{
/* Client was stopped externally (not by WanManager).
* WanManager_StopDhcpv4Client sets Dhcp4cStatus=STOPPED
* synchronously before any event arrives, so DHCPC_STARTED
* here means the stop was NOT initiated by WanManager.
* Restart the client to maintain IPv4 connectivity. */
pVirtIf->IP.Dhcp4cStatus = DHCPC_STOPPED;
CcspTraceInfo(("%s-%d : SELFHEAL - Restarting DHCPv4 client for %s\n", __FUNCTION__, __LINE__, pVirtIf->Name));
WanManager_StartDhcpv4Client(pVirtIf, gWanMgrDataBase.IfaceCtrl.pIface[pVirtIf->baseIfIdx].data.Name, gWanMgrDataBase.IfaceCtrl.pIface[pVirtIf->baseIfIdx].data.IfaceType);
}
else
{
pVirtIf->IP.Dhcp4cStatus = DHCPC_STOPPED;
}
Comment on lines +226 to +237
CcspTraceInfo(("%s-%d : DHCPv6 client stopped for %s (Dhcp6cStatus=%d)\n", __FUNCTION__, __LINE__, pVirtIf->Name, pVirtIf->IP.Dhcp6cStatus));
if (pVirtIf->IP.Dhcp6cStatus == DHCPC_STARTED)
{
/* Client was stopped externally (not by WanManager).
* WanManager_StopDhcpv6Client sets Dhcp6cStatus=STOPPED
* synchronously before any event arrives, so DHCPC_STARTED
* here means the stop was NOT initiated by WanManager.
* Restart the client to maintain IPv6 connectivity. */
pVirtIf->IP.Dhcp6cStatus = DHCPC_STOPPED;
CcspTraceInfo(("%s-%d : SELFHEAL - Restarting DHCPv6 client for %s\n", __FUNCTION__, __LINE__, pVirtIf->Name));
WanManager_StartDhcpv6Client(pVirtIf, gWanMgrDataBase.IfaceCtrl.pIface[pVirtIf->baseIfIdx].data.IfaceType);
}
In FEATURE_RDKB_DHCP_MANAGER builds, when DHCPManager externally stops
the DHCPv6 client, the CLIENT_STOPPED event arrives after LEASE_DEL has
already triggered wan_transition_ipv6_down. By the time ISM checks
Dhcp6cStatus, it is stale (DHCPC_STARTED) causing the self-heal to skip.

Fix: In the DHCP_CLIENT_STOPPED event handler, detect external stops by
checking if Dhcp6cStatus is still DHCPC_STARTED (WanManager_StopDhcpv6Client
sets it to STOPPED synchronously before events arrive). If external,
restart the client immediately.

Revert wan_transition_ipv6_down to original logic (secondary self-heal
for subsequent cycles remains intact).
@aadhithan01 aadhithan01 force-pushed the fix/ipv6-selfheal-race-condition branch from aee48a0 to b4cb180 Compare May 25, 2026 12:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants