Fix ESRCH in sched_setaffinity due to glibc TID caching with clone3 #264

Open
kiyoungkim-gg wants to merge 1 commit into google:master from kiyoungkim-gg:patch-1
@kiyoungkim-gg

When nsjail creates a process in a new PID namespace (`CLONE_NEWPID`) by invoking the `clone`/`clone3` syscall directly (introduced in d1f332b), glibc's internal PID/TID cache is not updated for the child process.

As a result, when the child calls the glibc wrapper `sched_setaffinity(0, ...)`, glibc inadvertently passes the parent's cached TID to the kernel instead of `0` (the current thread). Because the parent's TID does not exist in the new PID namespace, the kernel returns `ESRCH` (No such process).

This commit fixes the issue by bypassing the glibc wrapper and invoking the `sched_setaffinity` syscall directly via `util::syscall`. This ensures that `0` reaches the kernel unchanged, where it refers to the current thread.
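
For illustration, a minimal sketch of the approach described above, not the actual diff. It assumes libc's generic `syscall()` as a stand-in for nsjail's `util::syscall`, and the helper name `setAffinityRaw` is hypothetical.

```cpp
#include <sched.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not nsjail's real code): apply a CPU mask to the
// calling thread through the raw syscall rather than the glibc
// sched_setaffinity() wrapper, so that pid 0 is handed to the kernel as-is.
static bool setAffinityRaw(const cpu_set_t* mask, size_t mask_size) {
    // Raw syscall arguments: pid (0 = calling thread), mask size in bytes,
    // pointer to the mask.
    long ret = syscall(__NR_sched_setaffinity, 0, mask_size, mask);
    if (ret == -1) {
        fprintf(stderr, "sched_setaffinity failed: %s\n", strerror(errno));
        return false;
    }
    return true;
}
```

A caller would build the mask with `CPU_ZERO`/`CPU_SET` and pass it together with its size in bytes, for example `setAffinityRaw(&mask, sizeof(mask))` for a statically sized `cpu_set_t`.
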
@kiyoungkim-gg
Author

This issue was originally found when `max_cpus` was set lower than the actual number of CPUs.

The original error message looked something like this:

[W][2026-04-20T02:06:54+0000][27] initCpu():132 sched_setaffinity(mask=0,1,2,3,4,5,6,7,8,12,13,15,16,17,18,19,20,21,22,24,25,27,28,29,30,33,35,36,37,40,41,42,44,47,48,49,51,52,53,54,56,57,58,59,60,61,62,63 size=128 max_cpus=48 (CPU_COUNT=48)) failed: No such process

@kiyoungkim-gg
Author

@robertswiecki Could you help get this reviewed, if possible? Thanks!
