Add SSH remote LUKS unlock #366
Tracks the set of LUKS prompts currently awaiting keyboard input so external password sources (SSH remote unlock, next commits) can submit a passphrase through the existing unlock orchestration — tryPassphraseAgainstSlots, ctx cancellation, and the passphraseCache fast-path — rather than walking devices in parallel. requestKeyboardPassword registers on entry and unregisters on exit. trySubmitPassphraseToPending snapshots the registry under the lock, attempts each entry outside the lock, and seeds passphraseCache on any success.
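The snapshot-then-attempt pattern described above could look roughly like the sketch below. All names here (`registry`, `unlocker`, `submit`, `checkAgainst`) are hypothetical stand-ins rather than the identifiers used in init/luks.go, and the passphraseCache seeding and ctx handling are left out.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// unlocker is a hypothetical stand-in for whatever tries a passphrase
// against one waiting LUKS device.
type unlocker func(passphrase []byte) error

// registry tracks the prompts that are currently waiting for input.
type registry struct {
	mu      sync.Mutex
	nextID  int
	pending map[int]unlocker
}

func newRegistry() *registry {
	return &registry{pending: make(map[int]unlocker)}
}

// register adds an entry and returns a function that removes it again,
// so a caller can register on entry and defer the unregister.
func (r *registry) register(u unlocker) (unregister func()) {
	r.mu.Lock()
	id := r.nextID
	r.nextID++
	r.pending[id] = u
	r.mu.Unlock()
	return func() {
		r.mu.Lock()
		delete(r.pending, id)
		r.mu.Unlock()
	}
}

// submit snapshots the entries under the lock, then tries each one with
// the lock released so a slow unlock never blocks register/unregister.
// It returns how many entries accepted the passphrase.
func (r *registry) submit(passphrase []byte) int {
	r.mu.Lock()
	snapshot := make([]unlocker, 0, len(r.pending))
	for _, u := range r.pending {
		snapshot = append(snapshot, u)
	}
	r.mu.Unlock()

	unlocked := 0
	for _, u := range snapshot {
		if err := u(passphrase); err == nil {
			unlocked++
		}
	}
	return unlocked
}

var errWrong = errors.New("passphrase does not match")

// checkAgainst builds a fake unlocker that accepts exactly one passphrase.
func checkAgainst(want string) unlocker {
	return func(p []byte) error {
		if string(p) != want {
			return errWrong
		}
		return nil
	}
}

func main() {
	r := newRegistry()
	defer r.register(checkAgainst("hunter2"))()
	defer r.register(checkAgainst("hunter2"))()
	defer r.register(checkAgainst("other"))()
	fmt.Println(r.submit([]byte("hunter2"))) // prints 2
}
```

Holding the lock only for the copy keeps registration cheap even when an unlock attempt takes a long time, which is the property the commit message relies on.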
Three new network.ssh_* options in /etc/booster.yaml:
- ssh_host_key: path to PEM-encoded host private key
- ssh_authorized_keys: path to authorized_keys file
- ssh_listen: listen address (default :22)
Both ssh_host_key and ssh_authorized_keys are read at build time and embedded into the initramfs config. Validation: one without the other is a config error, and SSH requires either dhcp or static ip.
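As an illustration of the validation rules just listed, here is a minimal sketch. The `networkConfig` struct, its field names, the error strings, and the example path are assumptions made for this example and are not taken from the generator's code.

```go
package main

import (
	"errors"
	"fmt"
)

// networkConfig mirrors only the options discussed above. The struct and
// its field names are illustrative, not the generator's real types.
type networkConfig struct {
	DHCP              bool
	StaticIP          string
	SSHHostKey        string // path to PEM-encoded host private key
	SSHAuthorizedKeys string // path to authorized_keys file
	SSHListen         string // listen address; empty means ":22"
}

// validateSSH applies the two rules from the commit message and fills in
// the default listen address.
func validateSSH(c *networkConfig) error {
	hasKey := c.SSHHostKey != ""
	hasAuth := c.SSHAuthorizedKeys != ""
	if !hasKey && !hasAuth {
		return nil // SSH unlock not requested
	}
	if hasKey != hasAuth {
		return errors.New("ssh_host_key and ssh_authorized_keys must be set together")
	}
	if !c.DHCP && c.StaticIP == "" {
		return errors.New("ssh remote unlock requires dhcp or a static ip")
	}
	if c.SSHListen == "" {
		c.SSHListen = ":22"
	}
	return nil
}

func main() {
	// Hypothetical path; only one of the two SSH options is set.
	cfg := &networkConfig{DHCP: true, SSHHostKey: "/etc/booster/ssh_host_key"}
	fmt.Println(validateSSH(cfg))
}
```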
Starts a stdlib x/crypto/ssh server during boost() when ssh_authorized_keys is configured. Pubkey-only auth. Session is restricted to a passphrase prompt — no shell, no exec, no port forwarding. Submitted passphrase is fed to trySubmitPassphraseToPending so it reuses the existing LUKS unlock orchestration. sshShutdown is called from cleanup() before shutdownNetwork so live connections close cleanly before the network goes away. Co-authored-by: Kenneth Shaw <kenshaw@gmail.com>
Adds a dedicated REMOTE UNLOCK section between CRYPTTAB and NOTES covering host key generation, authorized_keys layout, the network: config block, client usage, and the session-restriction security notes. The existing network bullet gets a one-sentence pointer. Sample config gains a single ssh_authorized_keys line.
QEMU integration tests that boot LUKS roots with SSH remote unlock enabled, port-forward the listener to the host, dial in with a freshly generated ed25519 client key, submit the passphrase, and assert the "Unlocked:" handshake + normal boot completion. Cases:
- Single LUKS root.
- Two LUKS volumes sharing a passphrase: a single SSH submission unlocks both, pinning the pendingPrompts broadcast property.
- Unauthorized client key is rejected at the SSH layer.
- Wrong-then-correct passphrase walks the prompt-loop retry paths.
- FIDO2 token pending (systemd-fido2-nodev.img): SSH submission lands during the FIDO2 wait, slot unlocks via the concurrent passphrase path, ctx-cancel dismisses the FIDO2 goroutine.
…mpts Moves the pendingPrompts registration up from requestKeyboardPassword into luksOpen so out-of-band sources (SSH remote unlock) can submit a passphrase while tokens (TPM2 PIN, FIDO2 PIN, clevis) are still in flight, not just while the keyboard prompt is active. A successful out-of-band unlock cancels ctx the same way a sibling token's success does, dismissing any pending PIN/passphrase prompt on console or Plymouth. Slot-level conflict isn't possible because tokens unseal their own slots (t.Slots) while passphrase unlocks target checkSlotsWithPassword, which already excludes slotsWithTokens. The SSH "no devices matched" message is reworded to reflect that the submit attempted unlock against the registered targets — the failure mode is "passphrase wrong" not "no devices pending".
Some hosts bind QEMU user-mode hostfwd on IPv6-only loopback, so a client dialing 127.0.0.1:<port> hits connection refused even though the guest SSH server is listening fine. Pinning the forwarding endpoint to 127.0.0.1 explicitly removes the IPv4-vs-IPv6 race. Mirrors anatol/booster's tests-improvements commit f5e33f7 which applies the same fix to archlinux_test.go, clevis_test.go, and integration_test.go on master; this commit extends the same form to the SSH remote unlock vmtests added in this branch. Co-authored-by: Anatol Pomozov <anatol.pomozov@gmail.com>
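For readers unfamiliar with the hostfwd syntax, the difference between the pinned and unpinned forms can be sketched as follows. The helper name `hostfwdArg`, the netdev id, and the port numbers are made up for this example; the tests may build the argument differently.

```go
package main

import "fmt"

// hostfwdArg builds the value of a QEMU "-netdev" option that forwards a
// host TCP port to a guest port over user-mode networking. An explicit
// hostAddr such as "127.0.0.1" pins the host side to that address; an
// empty hostAddr leaves the bind address up to QEMU.
func hostfwdArg(id, hostAddr string, hostPort, guestPort int) string {
	return fmt.Sprintf("user,id=%s,hostfwd=tcp:%s:%d-:%d", id, hostAddr, hostPort, guestPort)
}

func main() {
	fmt.Println(hostfwdArg("n1", "", 10022, 22))          // user,id=n1,hostfwd=tcp::10022-:22
	fmt.Println(hostfwdArg("n1", "127.0.0.1", 10022, 22)) // user,id=n1,hostfwd=tcp:127.0.0.1:10022-:22
}
```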
// Ctrl+C byte (0x03) ourselves and abort. The exception is pasted
// text: a 0x03 that arrives inside a paste is real content, not a
// cancel, so we leave it alone there.
func sshReadLine(ch gossh.Channel) ([]byte, error) {
I see what @kenshaw meant saying that golang.org/x/crypto/ssh is too low level. Feel free to add github.com/gliderlabs/ssh if you want it, and let's hope that this package is secure enough for booster.
I looked at gliderlabs/ssh to make sure I'm not just being stubborn about the stdlib. There is some value in its net/http-style Handler API, idle timeout config, and auto-replied pty-req, and it would save us a few lines in the listen/dispatch path (sshRun/sshHandleConn/sshHandleSession). But it doesn't change the readline path: gliderlabs ssh.Session embeds gossh.Channel from x/crypto/ssh, so Session.Read([]byte) returns the same raw channel bytes either way. gliderlabs therefore provides no terminal-escape stripping, bracketed-paste handling, line editing, or Ctrl+C signal handling. Deciding whether a 0x03 byte means "cancel" or "literal paste content inside \x1b[200~...\x1b[201~" is an application-level decision, and it is what I addressed in the byte-mode reader.
I chose to route SSH input through the same per-byte FSM as the local console prompt (init/console_input.go) so the two unlock paths behave byte-for-byte identically: same line editing, same bracketed-paste handling, same UTF-8 reassembly across reads, same ctx cancellation. inputScanner comes from the raw byte read PR (commit 571cc8c) and was built with SSH paste in mind. It lets users paste passphrases from password managers; modern terminal emulators bracket the pasted content with \x1b[200~ ... \x1b[201~ markers, and a basic line reader would treat those markers as literal password bytes and corrupt the entry. I believe this is an improvement over SSH LUKS unlock systems that break with copy/paste and the other cases handled by the byte mapping I built in consoleInput.
Swapping to gliderlabs would route the same bytes through a different wrapper without removing the per-byte loop or the 0x03 check, and would add a dependency to initramfs code that currently runs on stdlib + anatol/luks.go only. I think you were right to want stdlib here: the savings are limited to listen/dispatch boilerplate, while the cost is another dependency and a larger security surface.
TestSshReadLineSharesScanner (init/ssh_test.go) pins the parity with the console scanner across 15 sub-cases: arrow-key strip, OSC sequence strip, BS/DEL editing, Ctrl+U kill, UTF-8 multi-byte, bracketed paste with a literal Ctrl+C inside preserved as content, etc.
I should be able to get these updates done tomorrow.
Mechanical modernization noted in the anatol#366 review — sync.WaitGroup.Go (Go 1.25) inlines the Add(1) + go + defer wg.Done() pattern. trySubmitPassphraseToPending was the last site still spelling it out the long way; the surrounding luksOpen senders already use it. Pure refactor — no behaviour change. Existing TestPendingPromptsInflightFence and TestTrySubmitPassphraseToPendingHoldsInflight cover the path; both still pass under -race.
Addresses the multi-disk UX gap raised in the anatol#366 review:
> How does it work if there are multiple disks that wait to be unlocked?
> How does the user know what disk to enter password for?
The previous prompt was a bare "Enter passphrase: " and the session exited after the first successful submission, so an operator with distinct-passphrase volumes had to reconnect once per volume with no way to see what was still waiting. Two changes:
1. Snapshot the live pendingPrompts entries on each iteration and embed their mapping names in the prompt, e.g. "Enter passphrase for cryptdata, cryptroot: ". Matches the console's "Enter passphrase for <name>:" wording.
2. Keep the session alive after a successful submission: the loop now only exits when pendingPrompts drains ("All devices unlocked.") or the per-session sshMaxPromptAttempts cap kicks in. One SSH session can now serve multiple devices with distinct passphrases sequentially: the prompt refreshes after each unlock to show the remaining names. Sibling-passphrase devices still unlock from a single submission via the existing broadcast-and-cache path.
Cap semantics unchanged (only wrong submissions advance the counter; successful unlocks do not). The vmtests' prompt assertions widened from "Enter passphrase:" to "Enter passphrase for " to match either prompt shape. The cap unit test now registers a stub luks.Device returning ErrPassphraseDoesNotMatch so the loop stays alive across sshMaxPromptAttempts wrong submissions; with an empty registry the loop now correctly short-circuits to "All devices unlocked."
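The loop behaviour described in this commit message can be modelled with a small simulation. Everything here is an assumption made for illustration: `runSession`, `promptFor`, the map of names to passphrases, the sorted name order, and the "Too many attempts." message are not taken from the PR, and `maxPromptAttempts` merely stands in for sshMaxPromptAttempts.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const maxPromptAttempts = 10 // stand-in for sshMaxPromptAttempts

// promptFor names every device that is still waiting. Sorting is an
// assumption made here to keep the output deterministic.
func promptFor(names []string) string {
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)
	return "Enter passphrase for " + strings.Join(sorted, ", ") + ": "
}

// runSession simulates one session. pending maps a device name to the
// passphrase that unlocks it; inputs are the lines the client submits.
// It returns everything the server would have written.
func runSession(pending map[string]string, inputs []string) (transcript []string) {
	wrong := 0
	for {
		if len(pending) == 0 {
			return append(transcript, "All devices unlocked.")
		}
		names := make([]string, 0, len(pending))
		for n := range pending {
			names = append(names, n)
		}
		transcript = append(transcript, promptFor(names))

		if len(inputs) == 0 {
			return transcript // client went away
		}
		in := inputs[0]
		inputs = inputs[1:]

		// Broadcast: one submission unlocks every device it matches.
		matched := false
		for n, want := range pending {
			if in == want {
				delete(pending, n)
				matched = true
			}
		}
		// Only wrong submissions count against the cap.
		if !matched {
			wrong++
			if wrong >= maxPromptAttempts {
				return append(transcript, "Too many attempts.")
			}
		}
	}
}

func main() {
	pending := map[string]string{"cryptroot": "1111", "cryptdata": "2222"}
	for _, line := range runSession(pending, []string{"oops", "1111", "2222"}) {
		fmt.Println(line)
	}
}
```

In this run the prompt names both devices twice (the first submission is wrong), then only cryptdata, and the session ends with "All devices unlocked."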
Almost done with testing the multidevice changes, I think I'll get this done tonight.
Addresses the multi-disk UX gap raised in the anatol#366 review:
> How does it work if there are multiple disks that wait to be unlocked?
> How does the user know what disk to enter password for?
The previous prompt was a bare "Enter passphrase: " and the session exited after the first successful submission, so an operator with distinct-passphrase volumes had to reconnect once per volume with no way to see what was still waiting. Three changes:
1. Snapshot the live pendingPrompts entries on each iteration and embed their mapping names in the prompt, e.g. "Enter passphrase for cryptdata, cryptroot: ". Matches the console's "Enter passphrase for <name>:" wording.
2. Keep the session alive after a successful submission: the loop now only exits when pendingPrompts drains ("All devices unlocked.") or the per-session sshMaxPromptAttempts cap kicks in. One SSH session can now serve multiple devices with distinct passphrases sequentially: the prompt refreshes after each unlock to show the remaining names. Sibling-passphrase devices still unlock from a single submission via the existing broadcast-and-cache path.
3. trySubmitPassphraseToPending now calls p.cancel() on the unlocked device immediately after a successful UnsealVolume (mirrors the token-success cancel at luksOpen). pendingDeviceNames's existing ctx.Err() filter then drops the entry on the very next prompt-loop iteration. Without this, the registration would linger until luksOpen's deferred unregister fires (~tens of ms after SetupMapper), and the next prompt would redundantly name the just-unlocked device.
Cap semantics unchanged (only wrong submissions advance the counter; successful unlocks do not). The vmtests' prompt assertions widened from "Enter passphrase:" to "Enter passphrase for " to match either prompt shape. The cap unit test now registers a stub luks.Device returning ErrPassphraseDoesNotMatch so the loop stays alive across sshMaxPromptAttempts wrong submissions; with an empty registry the loop now correctly short-circuits to "All devices unlocked."
Tests:
- TestTrySubmitPassphraseCancelsRegistrationOnSuccess (unit) pins the cancel property as a deterministic regression test. Fails when p.cancel() is removed from the success branch.
- TestSSHRemoteUnlockBtrfsRaid1SharedPassphrase (vmtest) exercises the broadcast path on a multi-device btrfs root. One submission unlocks both LUKS-wrapped members; the SSH listener stays alive through waitForBtrfsDevicesReady's polling until btrfs reports the array assembled. Only test asserting "All devices unlocked." is emitted.
- TestSSHRemoteUnlockBtrfsRaid1DistinctPassphrase (vmtest) is the full multi-device single-session distinct-passphrase case: each LUKS-wrapped btrfs member has its own passphrase, the btrfs gate keeps the SSH session alive across the gap, and the prompt redraws to name only the still-pending member after each successful unlock. The strongest end-to-end proof of (1) + (2) + (3) working together.
The luks_btrfs_raid1_distinct.sh generator was cherry-picked from the shelved pr/btrfs-raid1-distinct-pass branch (issue anatol#283), retaining its existing UUIDs/passphrases (1111 / 2222) so the asset image already on disk works without regeneration.
Let me know if anything else jumps out at you or if I need to reword the Ctrl+C comments. Otherwise I think (hope) this is ready.
@kenshaw if you have a chance, could you please try it and let us know your opinion?
Sure, I'll deploy it on some production servers this weekend and try it. Thanks.
Okay moving forward with this change. It would be really useful to get feedback from more users. @kenshaw if you can share your experience with this implementation it would be fantastic. Thank you!
Closes #320.
Picks up @kenshaw's original SSH-unlock proposal after his clearance to take it
over. Reworks against current master so the submission integrates with the
existing prompt machinery rather than walking devices itself.
Approach
- golang.org/x/crypto/ssh only. No dropbear, no external daemon.
- ssh_host_key + ssh_authorized_keys are paths in booster.yaml, read at build time and embedded into the initramfs.
- pendingPrompts registry in init/luks.go. The handler broadcasts the passphrase against every LUKS device currently waiting and seeds the in-boot passphrase cache so sibling volumes (e.g. btrfs RAID1) unlock without further prompts.
local keyboard prompt — not gated behind
tokenWg.Wait(). A successful out-of-band unlock cancels
ctx, dismissing the in-flight token/keyboard prompt.
Hardening
Pubkey-only auth, explicit MaxAuthTries=6, 15s handshake deadline (slow-loris guard), 10 wrong-passphrase attempts per session before disconnect. No PAM, no shell, no PTY, no exec, no port forwarding. Threat model and operator guidance (host key extractability from /boot, brute-force economics, dual-stack :22 caveat) live in REMOTE UNLOCK > Security notes in docs/manpage.md.
Scope
LUKS volumes only. ZFS-native encryption (#191) uses a different unlock path
and is not covered here.
Test plan
- init + generator unit suites pass under -race. New tests:
  - TestPendingPromptsInflightFence + TestTrySubmitPassphraseToPendingHoldsInflight: race fence preventing panic when an SSH submission races luksOpen's watcher closing volumes (both demonstrably fail without the fix).
  - TestSshPromptLoopDisconnectsAfterMaxAttempts: per-session brute-force cap (demonstrably fails without the cap).
  - TestSshReadLineSharesScanner: pasted-input safety via shared console inputScanner.
  - TestParseAuthorizedKeys*, TestReadConfigRejectsEmpty/GarbageAuthorizedKeys, TestReadConfigAcceptsLooseHostKeyPerms: build- and parse-time validation.
- tests/ssh_unlock_test.go: 5 QEMU integration cases: single root, multi-device shared passphrase, unauthorized key rejection, wrong-then-correct retry, FIDO2-pending concurrent unlock.
no redraw artifact. Paste from clipboard through real SSH client clean,
no line-mangling.
Yubikey will follow on the interactive harness).
Co-author credit
- init/ssh: SSH server for remote LUKS unlock preserves @kenshaw's co-authorship; much of the structural scaffold is from Add remote ssh LUKS device unlocking #320.
- tests: pin SSH hostfwd to IPv4 loopback credits @anatol; pattern taken from tests-improvements (f5e33f7).