init/plymouth: replace plymouth CLI with direct socket IPC#358

Merged
anatol merged 6 commits into anatol:master from pilotstew:pr/plymouth-socket-ipc
May 7, 2026

pilotstew (Contributor) commented May 7, 2026

Summary

Talk to plymouthd directly over its abstract Unix socket
(\x00/org/freedesktop/plymouthd) instead of fork-exec'ing the
plymouth CLI client. The wire protocol is documented in Plymouth's
ply-boot-protocol.h.

Carved into 5 commits for review:

  1. init/plymouth: replace plymouth CLI with direct socket IPC
    add init/plysocket.go (transport), swap the 5 exec call sites for
    ping / show-splash / display-message / quit / update-root-fs. Drop
    /usr/bin/plymouth from the initramfs (~80KB binary plus
    glib + libply-boot-client transitive deps). No call-site
    signature changes; plymouthAskPassword stays on the exec path
    for now and converts in commit 5 when ctx threading lands.

  2. init/plymouth: redirect plymouthd stderr to /dev/kmsg
    inherited fds are closed (FD_CLOEXEC) when booster exec's to
    systemd, so a plymouthd that inherits booster's stderr receives
    EPIPE on its next stderr write and dies before
    plymouth-start.service can attach. Routing to /dev/kmsg
    survives the handoff and lands the diagnostic output in the
    kernel ring buffer.

  3. init/plymouth: make plymouthMessage fire-and-forget
    plymouthd is single-threaded; a synchronous in-process
    display-message call would block on splash state, slowing
    concurrent unlock work. Wrap in a goroutine.

  4. init/plymouth: thread ctx through waitForPlymouthInit
    small signature change so unlock paths can bail when a sibling
    token has already cleared the volume, instead of blocking on
    plymouthd init for a volume that's already being unlocked.

  5. init/plymouth: ctx-aware password prompt with cancel-on-hangup
    (the headline) — plymouthAskPassword(ctx, prompt) plus a
    serialization mutex. On ctx cancel the underlying socket is closed;
    the goroutine returns cleanly. plymouthd builds whose
    connection-hangup handler tears down pending prompts also dismiss
    the on-screen UI on close — older builds leave the UI visible
    until the splash is otherwise cleared, but boot proceeds correctly
    either way (matching upstream's prior behaviour minus the orphaned
    plymouth subprocess and leaked goroutine the exec path produced).
    askPasswordWithFallback skips the console fallback when ctx is
    already cancelled, so an already-unlocking volume doesn't flash a
    stray console prompt.

Why

No new dependencies. Tests for splash messaging (#357) still pass
unchanged.

Upstream Plymouth dependency

The cancel-on-hangup behavior in commit 5 — dismissing the on-screen
prompt when the booster client disconnects — depends on plymouthd
having a connection-hangup handler that tears down pending prompts.
That fix is Plymouth MR !393,
currently awaiting upstream review (closes Plymouth issues #125 and
#126).

This PR does not require !393 to land first. Without it, booster
still cleans up its end of the socket on cancel — boot proceeds
correctly, the goroutine returns, no orphaned subprocess. The only
visible difference is a stale prompt that lingers on the splash until
plymouth quits later in boot. Pure visual polish, not a regression vs.
the exec path.

Once !393 is in users' plymouthd, the splash prompt also dismisses
cleanly, completing the UX that #354 / #355 / #356 brought to the
console side.

Test plan

  • go build ./init ./generator clean
  • TestPromptVolumeUnlocked and TestTokenFriendlyName pass
  • Image builds with the smaller plymouth payload (no
    /usr/bin/plymouth in the cpio)
  • Boot test on host with concurrent FIDO2 + TPM2 + keyboard
    unlock paths
  • Boot test on a plymouthd build without the connection-hangup
    handler — confirm no behavioural regression vs. exec path

pilotstew added 5 commits May 6, 2026 22:15

init/plymouth: replace plymouth CLI with direct socket IPC

Talk to plymouthd directly over its abstract Unix socket instead of
fork-exec'ing the plymouth CLI client for ping, show-splash,
display-message, quit, and update-root-fs. The wire protocol is
documented in Plymouth's ply-boot-protocol.h.

plysocket.go encapsulates dial / send / recv with three helpers:

  - plymouthSendRecv(frame): raw frame, 1-byte response
  - plymouthCmd(typ, arg): NUL-terminated argument frame, expects ACK
  - plymouthPingOnce(): single ping, returns true on ACK

Drop /usr/bin/plymouth from the initramfs (~80KB binary plus
glib + libply-boot-client transitive deps). plymouthd alone is
sufficient now that init speaks the protocol directly.

No call-site signature changes; plymouthAskPassword stays on the exec
path for now and converts in a later commit when ctx threading lands.
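
A minimal sketch of the transport described in this commit message, not the code from init/plysocket.go. The helper names encodeFrame and sendRecv are hypothetical. The request bytes ('P' for ping, 'M' for display-message), the 0x02 argument marker, the length byte, and the 0x06 ACK are assumptions taken from a reading of ply-boot-protocol.h and Plymouth's client library, and should be checked against the header before reuse. Go's net package maps a leading "@" in a unix address to an abstract socket on Linux.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// Assumed protocol bytes (verify against ply-boot-protocol.h).
const (
	reqPing    = 'P'  // ping request
	reqMessage = 'M'  // display-message request
	respACK    = 0x06 // positive acknowledgement from plymouthd
)

// encodeFrame builds a request frame. Without an argument the frame is the
// command byte plus a NUL. With an argument it is: command, 0x02, one length
// byte (argument length including its NUL), the argument, and a NUL.
func encodeFrame(typ byte, arg string) ([]byte, error) {
	if arg == "" {
		return []byte{typ, 0}, nil
	}
	if len(arg) > 254 {
		return nil, fmt.Errorf("argument too long: %d bytes", len(arg))
	}
	frame := []byte{typ, 0x02, byte(len(arg) + 1)}
	frame = append(frame, arg...)
	return append(frame, 0), nil
}

// sendRecv dials the abstract socket, writes one frame, and reads the
// single response byte. The timeout covers dial, write, and read.
func sendRecv(frame []byte, timeout time.Duration) (byte, error) {
	conn, err := net.DialTimeout("unix", "@/org/freedesktop/plymouthd", timeout)
	if err != nil {
		return 0, err
	}
	defer conn.Close()
	if err := conn.SetDeadline(time.Now().Add(timeout)); err != nil {
		return 0, err
	}
	if _, err := conn.Write(frame); err != nil {
		return 0, err
	}
	var resp [1]byte
	if _, err := io.ReadFull(conn, resp[:]); err != nil {
		return 0, err
	}
	return resp[0], nil
}

func main() {
	frame, _ := encodeFrame(reqPing, "")
	fmt.Printf("ping frame: %q\n", frame)
	resp, err := sendRecv(frame, 500*time.Millisecond)
	if err != nil {
		fmt.Println("plymouthd not reachable:", err)
		return
	}
	fmt.Println("got ACK:", resp == respACK)
}
```

On a machine without a running plymouthd the dial fails and the program reports that instead of blocking.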

init/plymouth: redirect plymouthd stderr to /dev/kmsg

Inherited file descriptors are closed (FD_CLOEXEC) when booster exec's
to systemd, so a plymouthd that inherits booster's stderr will receive
EPIPE on its next stderr write and die — before systemd's
plymouth-start.service can attach to the existing session.

Open /dev/kmsg explicitly and assign it to cmd.Stderr so plymouthd's
diagnostic output ends up in the kernel ring buffer (visible in
journalctl -k post-boot) and survives the handoff to systemd.

init/plymouth: make plymouthMessage fire-and-forget

Wrap the display-message socket call in a goroutine. plymouthd is
single-threaded and can stall during render setup or while a password
prompt is on screen; a synchronous in-process call would block the
calling goroutine on splash state, slowing concurrent unlock work.

init/plymouth: thread ctx through waitForPlymouthInit

Change waitForPlymouthInit() to take a ctx and return an error so
unlock paths can bail when a sibling token has already cleared the
volume — instead of blocking on plymouthd startup for a volume that's
already being unlocked.

requestKeyboardPassword previously waited unconditionally on
plymouth-init then re-checked ctx; merging the two into a single
ctx-aware select removes the redundant check and lets cancellation
propagate cleanly through the wait.

init/plymouth: ctx-aware password prompt with cancel-on-hangup

plymouthAskPassword now takes a ctx. On cancellation the underlying
socket is closed; the in-flight read returns and the goroutine exits
cleanly, holding no resources from the dropped prompt.

In plymouthd builds whose connection-hangup handler tears down pending
prompts, the daemon also dismisses the on-screen prompt UI on the same
socket close, completing the UX. Older plymouthd builds leave the
prompt UI visible on the splash until something else clears it; boot
still proceeds correctly — matching upstream's prior behaviour minus
the orphaned plymouth subprocess and leaked goroutine the exec path
produced.

Serialize calls under plymouthPasswordMu so concurrent unlock
goroutines don't stack two prompts on the splash. The mutex pairs with
a re-check of ctx.Err() after acquire to skip our prompt entirely if
the volume was unlocked while we were waiting on the lock — avoids
flashing a UI for an already-unlocking volume.

askPasswordWithFallback honors ctx cancellation by returning ctx.Err()
without falling back to the console reader. Falling back would print a
prompt to /dev/console that the LUKS unlock loop has already abandoned,
which is at best confusing and at worst lets the user type a passphrase
that gets discarded.
Comment thread init/plysocket.go
Address review feedback on PR anatol#358: add a file-level reference covering
the upstream protocol header, frame format, the full verb table (with
the six booster uses called out), and server response bytes.

anatol (Owner) commented May 7, 2026

Thank you very much for this change!

anatol merged commit 629d841 into anatol:master May 7, 2026
pilotstew deleted the pr/plymouth-socket-ipc branch May 9, 2026 02:53
pilotstew added a commit to pilotstew/booster that referenced this pull request May 14, 2026
Adds a new NOTES subsection covering the concurrent-unlock model that
landed across PRs anatol#350, anatol#353, anatol#355, anatol#356, anatol#357, anatol#358, and anatol#362:
PIN-token serialization in ascending LUKS2 token-ID order, cancel-on-win
semantics for keyboard/FIDO2-PIN/TPM2-PIN prompts on both the console
and the Plymouth splash (with the MR !393 caveat for older Plymouth
builds), and the per-token 3-attempt PIN cap with empty-PIN skip.

Trims two paragraphs from the existing 'Password entry' subsection
(auto-dismiss and PIN attempts) now that the new section covers them
in fuller context. 'Password entry' keeps the Ctrl+W / Ctrl+U / Tab
edit-key reference.
anatol pushed a commit that referenced this pull request May 14, 2026
Adds a new NOTES subsection covering the concurrent-unlock model that
landed across PRs #350, #353, #355, #356, #357, #358, and #362:
PIN-token serialization in ascending LUKS2 token-ID order, cancel-on-win
semantics for keyboard/FIDO2-PIN/TPM2-PIN prompts on both the console
and the Plymouth splash (with the MR !393 caveat for older Plymouth
builds), and the per-token 3-attempt PIN cap with empty-PIN skip.

Trims two paragraphs from the existing 'Password entry' subsection
(auto-dismiss and PIN attempts) now that the new section covers them
in fuller context. 'Password entry' keeps the Ctrl+W / Ctrl+U / Tab
edit-key reference.