Add SSH remote LUKS unlock #366

Merged
anatol merged 9 commits into anatol:master from pilotstew:pr/ssh-unlock
May 27, 2026

Conversation

@pilotstew
Contributor

Closes #320.

Picks up @kenshaw's original SSH-unlock proposal after his clearance to take it
over. Reworks against current master so the submission integrates with the
existing prompt machinery rather than walking devices itself.

Approach

  • stdlib golang.org/x/crypto/ssh only. No dropbear, no external daemon.
  • Pubkey-only auth. ssh_host_key + ssh_authorized_keys are paths in
    booster.yaml, read at build time and embedded into the initramfs.
  • SSH submissions land in a pendingPrompts registry in init/luks.go. The
    handler broadcasts the passphrase against every LUKS device currently
    waiting and seeds the in-boot passphrase cache so sibling volumes (e.g.
    btrfs RAID1) unlock without further prompts.
  • Runs concurrently with token attempts (TPM2 PIN, FIDO2, clevis) and the
    local keyboard prompt — not gated behind tokenWg.Wait(). A successful
    out-of-band unlock cancels ctx, dismissing the in-flight
    token/keyboard prompt.
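
The broadcast-and-cache behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in init/luks.go: the `registry`, `pendingPrompt`, `submit`, and `matchPassphrase` names are hypothetical, and the unlock callback stands in for the real LUKS slot check.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// errWrongPassphrase is a stand-in for a "passphrase does not match" error.
var errWrongPassphrase = errors.New("passphrase does not match")

// pendingPrompt is a hypothetical registry entry: a mapping name plus a
// callback that tries one passphrase against that device.
type pendingPrompt struct {
	name   string
	unlock func(passphrase []byte) error
}

// registry tracks the devices that are currently waiting for a passphrase.
type registry struct {
	mu      sync.Mutex
	pending map[string]*pendingPrompt
	cache   [][]byte // passphrases that unlocked something during this boot
}

func newRegistry() *registry {
	return &registry{pending: make(map[string]*pendingPrompt)}
}

func (r *registry) register(p *pendingPrompt) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.pending[p.name] = p
}

func (r *registry) unregister(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.pending, name)
}

// submit snapshots the registry under the lock, tries the passphrase against
// each entry outside the lock, and seeds the cache if anything unlocked.
// It returns the sorted names of the devices that accepted the passphrase.
func (r *registry) submit(passphrase []byte) []string {
	r.mu.Lock()
	snapshot := make([]*pendingPrompt, 0, len(r.pending))
	for _, p := range r.pending {
		snapshot = append(snapshot, p)
	}
	r.mu.Unlock()

	var unlocked []string
	for _, p := range snapshot {
		if p.unlock(passphrase) == nil {
			unlocked = append(unlocked, p.name)
		}
	}
	if len(unlocked) > 0 {
		r.mu.Lock()
		r.cache = append(r.cache, append([]byte(nil), passphrase...))
		r.mu.Unlock()
	}
	sort.Strings(unlocked)
	return unlocked
}

// matchPassphrase builds a fake unlock callback for the demo.
func matchPassphrase(want string) func([]byte) error {
	return func(got []byte) error {
		if string(got) != want {
			return errWrongPassphrase
		}
		return nil
	}
}

func main() {
	r := newRegistry()
	r.register(&pendingPrompt{name: "cryptroot", unlock: matchPassphrase("s3cret")})
	r.register(&pendingPrompt{name: "cryptdata", unlock: matchPassphrase("s3cret")})
	for _, name := range r.submit([]byte("s3cret")) {
		fmt.Println("Unlocked:", name)
		r.unregister(name) // in the sketch the caller unregisters, as the prompt owner would
	}
}
```

Running the slow unlock attempts outside the lock keeps registration and unregistration from blocking behind key derivation, which is the property the snapshot step is meant to provide.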

Hardening

Pubkey-only auth, explicit MaxAuthTries=6, 15s handshake deadline
(slow-loris guard), 10 wrong-passphrase attempts per session before
disconnect. No PAM, no shell, no PTY, no exec, no port forwarding. Threat
model and operator guidance (host key extractability from /boot,
brute-force economics, dual-stack :22 caveat) live in REMOTE UNLOCK >
Security notes in docs/manpage.md.

Scope

LUKS volumes only. ZFS-native encryption (#191) uses a different unlock path
and is not covered here.

Test plan

  • init + generator unit suites pass under -race. New tests:
    • TestPendingPromptsInflightFence + TestTrySubmitPassphraseToPendingHoldsInflight
      — race fence preventing panic when an SSH submission races luksOpen's
      watcher closing volumes (both demonstrably fail without the fix).
    • TestSshPromptLoopDisconnectsAfterMaxAttempts — per-session
      brute-force cap (demonstrably fails without the cap).
    • TestSshReadLineSharesScanner — pasted-input safety via shared
      console inputScanner.
    • TestParseAuthorizedKeys*,
      TestReadConfigRejectsEmpty/GarbageAuthorizedKeys,
      TestReadConfigAcceptsLooseHostKeyPerms — build- and parse-time
      validation.
  • tests/ssh_unlock_test.go — 5 QEMU integration cases: single root,
    multi-device shared passphrase, unauthorized key rejection,
    wrong-then-correct retry, FIDO2-pending concurrent unlock.
  • Manual QEMU: TPM2-both + Plymouth + SSH — PIN prompt dismissed cleanly,
    no redraw artifact. Paste from clipboard through real SSH client clean,
    no line-mangling.
  • FIDO2/Yubikey real-hardware test deferred (fake-FIDO2 path covered; real
    Yubikey will follow on the interactive harness).

Co-author credit

pilotstew and others added 7 commits May 19, 2026 20:44
Tracks the set of LUKS prompts currently awaiting keyboard input so
external password sources (SSH remote unlock, next commits) can submit
a passphrase through the existing unlock orchestration — tryPassphraseAgainstSlots,
ctx cancellation, and the passphraseCache fast-path — rather than
walking devices in parallel.

requestKeyboardPassword registers on entry and unregisters on exit.
trySubmitPassphraseToPending snapshots the registry under the lock,
attempts each entry outside the lock, and seeds passphraseCache on
any success.

Three new network.ssh_* options in /etc/booster.yaml:
  ssh_host_key         — path to PEM-encoded host private key
  ssh_authorized_keys  — path to authorized_keys file
  ssh_listen           — listen address (default :22)

Both ssh_host_key and ssh_authorized_keys are read at build time and
embedded into the initramfs config. Validation: one without the other
is a config error, and SSH requires either dhcp or static ip.
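
The validation rules in this commit message could look roughly like the sketch below. The `networkConfig` struct, its field names, and the error strings are assumptions made for illustration; they are not taken from the generator code.

```go
package main

import (
	"errors"
	"fmt"
)

// networkConfig mirrors only the options discussed here. The Go field names
// are hypothetical.
type networkConfig struct {
	DHCP              bool
	StaticIP          string
	SSHHostKey        string // ssh_host_key
	SSHAuthorizedKeys string // ssh_authorized_keys
	SSHListen         string // ssh_listen
}

// validateSSH enforces the two rules from the commit message and fills in
// the default listen address.
func (c *networkConfig) validateSSH() error {
	hasKey := c.SSHHostKey != ""
	hasAuth := c.SSHAuthorizedKeys != ""
	if !hasKey && !hasAuth {
		return nil // SSH remote unlock is not configured
	}
	if hasKey != hasAuth {
		return errors.New("ssh_host_key and ssh_authorized_keys must be set together")
	}
	if !c.DHCP && c.StaticIP == "" {
		return errors.New("ssh remote unlock requires dhcp or a static ip")
	}
	if c.SSHListen == "" {
		c.SSHListen = ":22"
	}
	return nil
}

func main() {
	// The paths below are placeholders, not documented defaults.
	cfg := networkConfig{
		DHCP:              true,
		SSHHostKey:        "/etc/booster/ssh_host_key",
		SSHAuthorizedKeys: "/etc/booster/authorized_keys",
	}
	if err := cfg.validateSSH(); err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("listening on", cfg.SSHListen)
}
```
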
Starts a stdlib x/crypto/ssh server during boost() when ssh_authorized_keys
is configured. Pubkey-only auth. Session is restricted to a passphrase
prompt — no shell, no exec, no port forwarding. Submitted passphrase
is fed to trySubmitPassphraseToPending so it reuses the existing LUKS
unlock orchestration.

sshShutdown is called from cleanup() before shutdownNetwork so live
connections close cleanly before the network goes away.

Co-authored-by: Kenneth Shaw <kenshaw@gmail.com>

Adds a dedicated REMOTE UNLOCK section between CRYPTTAB and NOTES
covering host key generation, authorized_keys layout, the network:
config block, client usage, and the session-restriction security
notes. The existing network bullet gets a one-sentence pointer.
Sample config gains a single ssh_authorized_keys line.

QEMU integration tests that boot LUKS roots with SSH remote unlock
enabled, port-forward the listener to the host, dial in with a freshly
generated ed25519 client key, submit the passphrase, and assert the
"Unlocked:" handshake + normal boot completion.

Cases:
- Single LUKS root.
- Two LUKS volumes sharing a passphrase — a single SSH submission
  unlocks both, pinning the pendingPrompts broadcast property.
- Unauthorized client key is rejected at the SSH layer.
- Wrong-then-correct passphrase walks the prompt-loop retry paths.
- FIDO2 token pending (systemd-fido2-nodev.img) — SSH submission lands
  during the FIDO2 wait, slot unlocks via the concurrent passphrase
  path, ctx-cancel dismisses the FIDO2 goroutine.
…mpts

Moves the pendingPrompts registration up from requestKeyboardPassword
into luksOpen so out-of-band sources (SSH remote unlock) can submit a
passphrase while tokens (TPM2 PIN, FIDO2 PIN, clevis) are still in
flight, not just while the keyboard prompt is active.

A successful out-of-band unlock cancels ctx the same way a sibling
token's success does, dismissing any pending PIN/passphrase prompt on
console or Plymouth. Slot-level conflict isn't possible because tokens
unseal their own slots (t.Slots) while passphrase unlocks target
checkSlotsWithPassword, which already excludes slotsWithTokens.

The SSH "no devices matched" message is reworded to reflect that the
submit attempted unlock against the registered targets — the failure
mode is "passphrase wrong" not "no devices pending".
Some hosts bind QEMU user-mode hostfwd on IPv6-only loopback, so a
client dialing 127.0.0.1:<port> hits connection refused even though
the guest SSH server is listening fine. Pinning the forwarding
endpoint to 127.0.0.1 explicitly removes the IPv4-vs-IPv6 race.

Mirrors anatol/booster's tests-improvements commit f5e33f7 which
applies the same fix to archlinux_test.go, clevis_test.go, and
integration_test.go on master; this commit extends the same form to
the SSH remote unlock vmtests added in this branch.

Co-authored-by: Anatol Pomozov <anatol.pomozov@gmail.com>
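
For reference, pinning the forwarding endpoint means writing the host address explicitly in QEMU's `hostfwd` rule instead of leaving it empty. A small sketch of how a test helper might build the option value; `userNetdev`, the netdev id, and the port numbers are hypothetical and not taken from the test suite.

```go
package main

import "fmt"

// userNetdev builds the value for QEMU's -netdev option with a TCP forward
// from 127.0.0.1:hostPort on the host to guestPort in the guest. Leaving the
// host address empty ("hostfwd=tcp::10022-:22") lets the host pick the bind
// address, which is the ambiguity the commit removes.
func userNetdev(id string, hostPort, guestPort int) string {
	return fmt.Sprintf("user,id=%s,hostfwd=tcp:127.0.0.1:%d-:%d", id, hostPort, guestPort)
}

func main() {
	fmt.Println("-netdev", userNetdev("n1", 10022, 22))
	// prints "-netdev user,id=n1,hostfwd=tcp:127.0.0.1:10022-:22"
}
```
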
Comment thread init/luks.go Outdated
Comment thread init/ssh.go Outdated
Comment thread init/ssh.go
// Ctrl+C byte (0x03) ourselves and abort. The exception is pasted
// text: a 0x03 that arrives inside a paste is real content, not a
// cancel, so we leave it alone there.
func sshReadLine(ch gossh.Channel) ([]byte, error) {
Owner

I see what @kenshaw meant saying that golang.org/x/crypto/ssh is too low level. Feel free to add github.com/gliderlabs/ssh if you want it, and let's hope that this package is secure enough for booster.

Contributor Author

I looked at gliderlabs/ssh to make sure I'm not just being stubborn about the stdlib. It has some value: a net/http-style Handler API, idle timeout config, and auto-replied pty-req, and it would save us a few lines in the listen/dispatch path (sshRun/sshHandleConn/sshHandleSession). But it doesn't change the readline path: gliderlabs ssh.Session embeds gossh.Channel from x/crypto/ssh, so Session.Read([]byte) returns the same raw channel bytes either way. That means gliderlabs doesn't provide terminal-escape stripping, bracketed-paste handling, line editing, or Ctrl+C signal handling. Deciding whether a 0x03 byte means "cancel" or "literal paste content inside \x1b[200~...\x1b[201~" is application-level, and that is what I addressed in the byte-mode reader.

I chose to route SSH input through the same per-byte FSM as the local console prompt (init/console_input.go) so the two unlock paths behave byte-for-byte identically: same line editing, same bracketed-paste handling, same UTF-8 reassembly across reads, same ctx cancellation. inputScanner comes from the raw byte read PR (commit 571cc8c) and was built with SSH paste in mind. It lets users paste passphrases from password managers: modern terminal emulators bracket pasted content with \x1b[200~ ... \x1b[201~ markers, and a basic line reader would treat those markers as literal password bytes and corrupt the entry. I believe this is an improvement over SSH LUKS unlock systems that break on copy/paste, and it also reuses the other handlers I built via byte mapping in consoleInput.

Swapping to gliderlabs would route the same bytes through a different wrapper without removing the per-byte loop or the 0x03 check, and would add a dependency to initramfs code that currently runs on stdlib + anatol/luks.go only. I think you were right to want stdlib here: the savings are limited to listen/dispatch boilerplate, while the cost is another dependency and a larger security surface.

TestSshReadLineSharesScanner (init/ssh_test.go) pins the parity with the console scanner across 15 sub-cases: arrow-key strip, OSC sequence strip, BS/DEL editing, Ctrl+U kill, UTF-8 multi-byte, bracketed paste with a literal Ctrl+C inside preserved as content, etc.

@pilotstew
Contributor Author

I should be able to get these updates done tomorrow.

Mechanical modernization noted in the anatol#366 review — sync.WaitGroup.Go
(Go 1.25) inlines the Add(1) + go + defer wg.Done() pattern.
trySubmitPassphraseToPending was the last site still spelling it out
the long way; the surrounding luksOpen senders already use it.

Pure refactor — no behaviour change. Existing TestPendingPromptsInflightFence
and TestTrySubmitPassphraseToPendingHoldsInflight cover the path; both
still pass under -race.
pilotstew added a commit to pilotstew/booster that referenced this pull request May 20, 2026
Addresses the multi-disk UX gap raised in the anatol#366 review:
> How does it work if there are multiple disks that wait to be unlocked?
> How does the user know what disk to enter password for?

The previous prompt was a bare "Enter passphrase: " and the session
exited after the first successful submission, so an operator with
distinct-passphrase volumes had to reconnect once per volume with no
way to see what was still waiting.

Two changes:

  1. Snapshot the live pendingPrompts entries on each iteration and
     embed their mapping names in the prompt, e.g.
     "Enter passphrase for cryptdata, cryptroot: ". Matches the
     console's "Enter passphrase for <name>:" wording.

  2. Keep the session alive after a successful submission — the loop
     now only exits when pendingPrompts drains ("All devices unlocked.")
     or the per-session sshMaxPromptAttempts cap kicks in. One SSH
     session can now serve multiple devices with distinct passphrases
     sequentially: prompt refreshes after each unlock to show the
     remaining names. Sibling-passphrase devices still unlock from a
     single submission via the existing broadcast-and-cache path.

Cap semantics unchanged (only wrong submissions advance the counter;
successful unlocks do not). The vmtests' prompt assertions widened
from "Enter passphrase:" to "Enter passphrase for " to match either
prompt shape. The cap unit test now registers a stub luks.Device
returning ErrPassphraseDoesNotMatch so the loop stays alive across
sshMaxPromptAttempts wrong submissions; with an empty registry the
loop now correctly short-circuits to "All devices unlocked."
@pilotstew
Contributor Author

Almost done testing the multi-device changes; I think I'll get this done tonight.

Addresses the multi-disk UX gap raised in the anatol#366 review:
> How does it work if there are multiple disks that wait to be unlocked?
> How does the user know what disk to enter password for?

The previous prompt was a bare "Enter passphrase: " and the session
exited after the first successful submission, so an operator with
distinct-passphrase volumes had to reconnect once per volume with no
way to see what was still waiting.

Three changes:

  1. Snapshot the live pendingPrompts entries on each iteration and
     embed their mapping names in the prompt, e.g.
     "Enter passphrase for cryptdata, cryptroot: ". Matches the
     console's "Enter passphrase for <name>:" wording.

  2. Keep the session alive after a successful submission — the loop
     now only exits when pendingPrompts drains ("All devices unlocked.")
     or the per-session sshMaxPromptAttempts cap kicks in. One SSH
     session can now serve multiple devices with distinct passphrases
     sequentially: prompt refreshes after each unlock to show the
     remaining names. Sibling-passphrase devices still unlock from a
     single submission via the existing broadcast-and-cache path.

  3. trySubmitPassphraseToPending now calls p.cancel() on the unlocked
     device immediately after a successful UnsealVolume (mirrors the
     token-success cancel at luksOpen). pendingDeviceNames's existing
     ctx.Err() filter then drops the entry on the very next prompt-
     loop iteration. Without this, the registration would linger
     until luksOpen's deferred unregister fires (~tens of ms after
     SetupMapper), and the next prompt would redundantly name the
     just-unlocked device.

Cap semantics unchanged (only wrong submissions advance the counter;
successful unlocks do not). The vmtests' prompt assertions widened
from "Enter passphrase:" to "Enter passphrase for " to match either
prompt shape. The cap unit test now registers a stub luks.Device
returning ErrPassphraseDoesNotMatch so the loop stays alive across
sshMaxPromptAttempts wrong submissions; with an empty registry the
loop now correctly short-circuits to "All devices unlocked."

Tests:
  - TestTrySubmitPassphraseCancelsRegistrationOnSuccess (unit) pins
    the cancel property as a deterministic regression test. Fails
    when p.cancel() is removed from the success branch.
  - TestSSHRemoteUnlockBtrfsRaid1SharedPassphrase (vmtest) exercises
    the broadcast path on a multi-device btrfs root. One submission
    unlocks both LUKS-wrapped members; the SSH listener stays alive
    through waitForBtrfsDevicesReady's polling until btrfs reports
    the array assembled. This is the only test asserting that
    "All devices unlocked." is emitted.
  - TestSSHRemoteUnlockBtrfsRaid1DistinctPassphrase (vmtest) is the
    full multi-device single-session distinct-passphrase case: each
    LUKS-wrapped btrfs member has its own passphrase, the btrfs gate
    keeps the SSH session alive across the gap, and the prompt
    redraws to name only the still-pending member after each
    successful unlock. The strongest end-to-end proof of (1) + (2)
    + (3) working together.

The luks_btrfs_raid1_distinct.sh generator was cherry-picked from
the shelved pr/btrfs-raid1-distinct-pass branch (issue anatol#283), retaining
its existing UUIDs/passphrases (1111 / 2222) so the asset image already
on disk works without regeneration.
@pilotstew
Contributor Author

Let me know if anything else jumps out at you or if I need to reword the Ctrl+C comments. Otherwise I think (hope) this is ready.

@anatol
Owner

anatol commented May 21, 2026

@kenshaw if you have a chance, could you please try it and let us know your opinion?

@kenshaw
Contributor

kenshaw commented May 21, 2026

Sure, I'll deploy it on some production servers this weekend and try it. Thanks.

@anatol
Owner

anatol commented May 27, 2026

Okay, moving forward with this change.

It would be really useful to get feedback from more users. @kenshaw if you can share your experience with this implementation it would be fantastic. Thank you!

@anatol anatol merged commit c0de6b9 into anatol:master May 27, 2026
anatol pushed a commit that referenced this pull request May 27, 2026
@pilotstew pilotstew deleted the pr/ssh-unlock branch May 28, 2026 04:01