Fix worker-runner communication on cgroup v2 + containerd (#194)
Draft
hanthor wants to merge 2 commits into buildbarn:main from
Conversation
On cgroup v2 systems with containerd (e.g. Debian 12+, Ubuntu 22.04+), Unix sockets in shared emptyDir volumes are not visible across containers in the same pod due to namespace isolation. This fix switches the bb_runner <-> bb_worker communication from a Unix socket to TCP:

1. Runner config: `listenPaths: ['/worker/runner']` -> `listenAddresses: [':50051']` (`listenPaths` is Unix-socket-only; `listenAddresses` is TCP)
2. Worker config: endpoint address `'unix:///worker/runner'` -> `'127.0.0.1:50051'` (IPv4 explicitly, to avoid IPv6 resolution issues)
3. Worker deployment: add a TCP readiness probe for the runner container and fix cache directory permissions (`0700` -> `0777` for the `nobody` user)
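The three changes map onto the Buildbarn Jsonnet configs and the worker Deployment roughly as follows (a sketch only; the surrounding structure and field placement in the real files may differ):

```jsonnet
// runner config: serve gRPC over TCP instead of a Unix socket.
{
  grpcServers: [{
    // Before: listenPaths: ['/worker/runner'],
    listenAddresses: [':50051'],
  }],
}
```

```jsonnet
// worker config: dial the runner over loopback, IPv4 explicitly.
{
  runners: [{
    endpoint: {
      // Before: address: 'unix:///worker/runner',
      address: '127.0.0.1:50051',
    },
  }],
}
```

```yaml
# Worker Deployment, runner container: TCP readiness probe.
readinessProbe:
  tcpSocket:
    port: 50051
```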
Updated all manifests and configurations to use catthehacker/ubuntu:act-24.04 (Ubuntu 24.04 LTS, Noble Numbat) instead of 22.04. Ubuntu 24.04 is the current LTS release with better long-term support and updated tooling.

Changes:
- Renamed worker and runner configs: ubuntu22-04 -> ubuntu24-04
- Updated runner container image: act-22.04 -> act-24.04
- Updated image digest to point to the 24.04 build
- Updated all deployment/service selectors and labels
- Updated kustomization references

24.04 LTS is stable and well-tested, with better support for modern build tools than the older 22.04 LTS.
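The image bump is the kind of change that lands in the runner container spec; a hypothetical fragment (the digest shown is a placeholder, not the real one from this PR):

```yaml
# Runner container in the worker Deployment (illustrative; real digest omitted).
containers:
  - name: runner
    image: catthehacker/ubuntu:act-24.04@sha256:<digest>
```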
Problem
On cgroup v2 systems with containerd (Debian 12+, Ubuntu 22.04+, k3s with containerd), Unix sockets in shared `emptyDir` volumes are not visible across containers in the same pod due to namespace isolation. This causes `bb_worker` to fail to reach the runner: even though `bb_runner` successfully creates the socket in the shared volume, the worker container cannot see it.

Solution
Switch worker→runner communication from Unix socket to TCP:
- Runner config: `listenPaths: ["/worker/runner"]` → `listenAddresses: [":50051"]`. `listenPaths` is Unix-socket-only; `listenAddresses` uses TCP.
- Worker config: `address: "unix:///worker/runner"` → `address: "127.0.0.1:50051"`. IPv4 is used explicitly so the endpoint cannot resolve to `::1`.
- Worker deployment: add a TCP readiness probe (`tcp-socket :50051`) for the runner container; relax cache directory permissions (`0700` → `0777`) for the `nobody` user.

Why TCP?
Both containers are in the same pod, so `127.0.0.1` resolves to the same network namespace. TCP avoids the cgroup v2 isolation entirely while maintaining the same process-isolation model.

Testing
Verified on k3s v1.33.0+k3s1 with containerd on Debian 12.
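With the switch to TCP, the readiness check reduces to a plain connect on the loopback address. A minimal Python sketch of that probe logic (`tcp_ready` is an illustrative helper, not part of Buildbarn or Kubernetes; the demo binds an ephemeral port instead of the runner's :50051):

```python
import socket

def tcp_ready(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connect succeeds, mirroring a tcpSocket readiness probe."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Stand-in for the runner's gRPC listener; port 0 picks a free port for the demo.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

print(tcp_ready("127.0.0.1", port))  # True: listener is up, probe passes
server.close()
print(tcp_ready("127.0.0.1", port))  # False: listener gone, probe fails
```

Because both containers share the pod's network namespace, this is exactly the path the kubelet exercises: a successful connect means the runner's gRPC server is accepting connections, with no dependency on a shared filesystem.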