ABX-375: All-in-HV + FEX runtime by AprilNEA · Pull Request #293 · arcboxlabs/arcbox

AprilNEA · 2026-05-31T10:12:09Z

ABX-375: All-in-HV + FEX64 for `linux/amd64` — status & handoff

Goal: run linux/amd64 OCI containers through FEX64 (binfmt) inside the single
HV utility VM, so no VZ/Rosetta runtime VM is needed. VZ/Rosetta kept only as an
optional build backend / fallback (ABX-374, PR #291, branch
feat/dual-utility-vm-routing — preserved, do not delete).

Status: ⛔ BLOCKED at Gate A on Apple Silicon (Apple M5 Max / macOS 26)

✅ arm64 native containers: PASS.
❌ linux/amd64 via FEX64: FEX SIGILLs — does not run.

Root cause (diagnosed live)

The HVF guest is advertised phantom SVE it cannot execute. /proc/cpuinfo in
the guest reports sve2 sve2p1 svebf16 …, yet a single SVE instruction (rdvl)
SIGILLs. FEX trusts HWCAP_SVE, runs SVE, and traps. This appears specific to
M5 + macOS-26 HVF (a real M1 advertises no SVE → FEX uses NEON and works).
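
A minimal sketch of the "what does the guest advertise" half of this check, assuming the arm64 `/proc/cpuinfo` layout quoted above. The function name `sve_flags` is hypothetical and not part of this branch; it only reports advertised flags and says nothing about whether an SVE instruction actually executes.

```rust
/// Return the SVE-related flags from the first `Features` line of a
/// /proc/cpuinfo dump (arm64 layout: "Features : fp asimd ... sve2 ...").
/// Hypothetical helper for illustration only.
fn sve_flags(cpuinfo: &str) -> Vec<&str> {
    cpuinfo
        .lines()
        .find(|line| line.starts_with("Features"))
        .and_then(|line| line.split_once(':'))
        .map(|(_, flags)| {
            flags
                .split_whitespace()
                // Keep only flags whose name begins with "sve" (sve, sve2, svebf16, ...).
                .filter(|flag| flag.starts_with("sve"))
                .collect()
        })
        .unwrap_or_default()
}
```

An empty result corresponds to the "0 SVE features" state reported after the `arm64.nosve` cmdline change.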

Fixes attempted

- `arm64.nosve arm64.nosme` guest cmdline (this branch, 514f36b, app/arcbox-core/src/vm_lifecycle/mod.rs): ✅ verified, the guest then reports 0 SVE features. Correct hardening regardless of the amd64 decision; keep.
- apple-m1 FEX build (boot-assets PR #23): removed FEX's auto-vectorized SVE, but FEX-2605 also emits explicit, unconditional SVE (gdb: `ld1rd {z5.d}, p5/z` in FEX's own code) that runs and traps even with guest SVE disabled.

So neither fix, nor both together, makes FEX run amd64 here.

Decision point

Per the ABX-375 plan, a Gate-A failure → resume ABX-374 (VZ Rosetta) for amd64
rather than duct-taping. Options:

A (recommended): route linux/amd64 runtime to VZ Rosetta (proven); keep the
HV path + arm64.nosve for arm64. FEX stays optional/experimental.
B: patch FEX-2605 to stop emitting unconditional SVE (or properly gate it),
or try a newer FEX — uncertain; may be a FEX-vs-M5/HVF incompatibility.

What's in this branch

- ABX-375 routing pivot: amd64→FEX translator, fail-closed when FEX is absent, no VZ runtime VM (f4387f9, app/arcbox-docker/src/routing.rs + handlers).
- assets.lock pinned to boot-assets v0.5.9 (static FEX at /arcbox/runtime/bin/FEX), c9e2e6b.
- Reproducible gate harness: tests/fex/validate-fex64.sh.
- arm64.nosve guest fix: 514f36b.

Related resources

- boot-assets PR #23, "fix(fex): target apple-m1 to avoid SVE codegen (SIGILL on Apple Silicon)" (FEX apple-m1 build + scoped SVE guard; CI-red by design, since the guard correctly catches FEX's remaining explicit SVE).
- boot-assets v0.5.9 release: the static FEX binaries that ship today.
- ABX-374 fallback (preserved): branch feat/dual-utility-vm-routing, PR #291 (feat(docker): add utility VM routing seam).

Reproduce the blocker

# in the guest (via any container):
grep -m1 Features /proc/cpuinfo | tr ' ' '\n' | grep sve   # -> sve2p1, svebf16, ...
# a one-line `rdvl` program compiled with -march=armv8-a+sve SIGILLs (exit 132).
# amd64 path:
docker --context arcbox run --rm --platform linux/amd64 alpine uname -m   # SIGILL

Amp-Thread-ID: https://ampcode.com/threads/T-019e68f7-9a35-745d-b9db-c8085864e5a7 Co-authored-by: Amp <amp@ampcode.com>

Records the chosen utility VM role against the canonical container ID returned by `POST /containers/create` (and the exec ID returned by `POST /containers/{id}/exec`) so every follow-up lifecycle call — start, stop, kill, restart, rename, remove, inspect, logs, top, stats, changes, wait, pause/unpause, attach, exec start/resize/inspect — is proxied to the same role. The registry is process-local; lookups for pre-existing or post-restart workloads return `None` and callers fall back to the native default. Durable persistence and strict fail-closed behavior are deferred to a later slice once the connector layer actually resolves each role to a distinct guest dockerd endpoint.

Address routing gaps that would surface once `native` and `rosetta` resolve to distinct guest dockerds:

- WorkloadRoleRegistry now tracks container `--name` aliases (with rename propagation) and resolves short hex IDs by canonical prefix, so `docker start web` and `docker logs ab12c3` land on the same role as the canonical entry instead of falling through to native.
- proxy_fallback resolves the role from the URI (container ID, then exec ID, then native default), so unrouted endpoints like `/containers/{id}/archive`, `/containers/{id}/update`, and `/exec/{id}/resize` follow the workload's VM.
- Module docs reworded to make clear the registry tracks bindings in-process rather than persisting them; durable persistence remains out of scope.

BuildKit `/session` routing is intentionally not addressed in this commit: the protocol opens `/session` before the matching `/build` sets the platform, so a session-UUID lookup cannot be honored at session-open time. A follow-up needs either lazy session forwarding or a pending-session buffer keyed by `X-Docker-Expose-Session-Uuid`. The same goes for per-role host port forwarding, which still uses `runtime.default_machine_name()` and needs a Runtime API for role → machine identity.

…rship consistent

Two correctness fixes on the workload role registry, before native and rosetta resolve to distinct guest dockerds:

- Short hex prefix lookup now collects every canonical that matches the requested prefix. If those canonicals agree on a role, that role is returned; if they disagree the registry returns `None` so callers fall back to the native default instead of silently choosing whichever HashMap iteration order surfaced first.
- WorkloadRoleRegistry gained an `alias_owner` reverse map. `add_alias` and `rename_alias` now detach an alias from any previous owner before reassigning it, so forgetting the previous canonical can no longer delete a binding that has since been adopted by another canonical. `forget` honors the reverse map symmetrically. Duplicate alias adds for the same canonical are deduped.

`query_param` docstring now flags that values are returned raw; the current callers only use ASCII-safe identifiers (`platform`, `name`), so percent-decoding stays deferred until something that may actually carry encoded bytes wants this helper.
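
The prefix rule described above can be sketched as follows. This is an illustration under assumptions, not the registry's actual code: the `Role` enum and `role_for_prefix` are hypothetical stand-ins, and the real registry also handles aliases.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the utility VM role type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Role {
    Native,
    Rosetta,
}

/// Resolve a short hex prefix: Some(role) only when every canonical ID
/// matching the prefix is bound to the same role.
fn role_for_prefix(bindings: &HashMap<String, Role>, prefix: &str) -> Option<Role> {
    let mut matches = bindings
        .iter()
        .filter(|(id, _)| id.starts_with(prefix))
        .map(|(_, role)| *role);
    // No match at all: nothing to return.
    let first = matches.next()?;
    // Any disagreement yields None, independent of HashMap iteration order.
    matches.all(|role| role == first).then_some(first)
}
```

Because the result depends only on whether the matching roles agree, it does not change with the order in which the map yields its entries.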

PLAN.md step 3. Lift UtilityVmRole into arcbox-core (workload module) so it is the shared currency between the daemon, runtime, and Docker compatibility layer; arcbox-docker now re-exports it.

Runtime gains three role-keyed accessors:

- `ensure_role_ready(role)`: role-aware boot/ready hook. Both roles resolve to the existing default VM today; the rosetta branch diverges once the dual lifecycle lands.
- `machine_name_for_role(role)`: machine name to address.
- `guest_docker_vsock_port_for_role(role)`: dockerd vsock port.

VsockConnector::connect_for(role) consumes the new lookups so the machine + port chosen for every connection follows the requested role. The Docker handler and fallback paths now drive the role-aware ensure hook and bubble the role into ensure errors, so a failure on the rosetta VM surfaces as such instead of a generic native error.

No behavior change yet: the lookups still alias both roles to the default VM. The seam is now in place for the dual-VM lifecycle slice.

PLAN.md step 4 prep. Threads the machine name and persistent dockerd data image through VmLifecycleManager so a single struct can drive either the default native machine or a secondary VZ Rosetta machine.

- New for_machine() constructor that takes the machine name and the docker.img filename. The existing new() delegates to it with the default values so all current callers (Runtime, daemon startup) behave identically.
- Internal create_default_machine / start_default_vm / wait_for_agent / idle monitor / event payloads now use self.machine_name instead of DEFAULT_MACHINE_NAME so a rosetta lifecycle reports the right machine in events and logs.
- data_image_path() yields the absolute path of this manager's docker.img, replacing the hard-coded DOCKER_DATA_IMAGE_NAME join in create_default_machine.

No behavior change: callers still construct one lifecycle on the "default" machine. Adding a second VmLifecycleManager for the rosetta role and wiring it into Runtime is the next slice.

PLAN.md step 4 prep. Replaces the hard-coded `VmBackend::Hv` in `VmManager::build_vmm_config` with a per-machine backend so the rosetta utility VM can run on VZ while the native one keeps running on HV.

- `VmConfig` gains a `backend: VmBackend` field defaulting to `Hv`. `build_vmm_config` now reads from it.
- `MachineConfig` gains matching `backend` and `enable_rosetta` fields so callers can set them at create-time. `MachineManager::create` threads both into the underlying `VmConfig`.
- `VmLifecycleConfig` gains `backend` so each per-role lifecycle decides its own backend. The `create_default_machine` path now feeds it (plus `default_vm.rosetta`) into the `MachineConfig` it builds.
- `arcbox-core` re-exports `VmBackend` so downstream crates (`arcbox-api`'s gRPC machine handler) can construct `MachineConfig` without taking a direct `arcbox-vmm` dependency.

Existing single-VM behavior is preserved: every constructor and default keeps `backend = Hv`. The rosetta lifecycle starts using `Vz` when the dual-VM Runtime wiring lands.

PLAN.md step 4. Runtime now builds a per-role lifecycle slot at construction time:

- Native (HV) slot: always present, drives the existing default machine and stays the eager-started utility VM.
- Rosetta (VZ) slot: present only on macOS Apple Silicon. Built with machine name "rosetta", docker-rosetta.img as its persistent data image, VmBackend::Vz, and default_vm.rosetta=true. The lifecycle is constructed up front so the slot's state is addressable, but the VM itself stays cold until `ensure_role_ready(Rosetta)` is first called by the Docker layer.

Role-keyed accessors now read from the slot map:

- `ensure_role_ready(role)` drives the role's container backend.
- `machine_name_for_role(role)` / `guest_docker_vsock_port_for_role(role)` return the slot's machine name and dockerd port.
- `lifecycle_for_role(role)` exposes the per-role lifecycle for diagnostics and future shared-control-plane wiring.
- `role_is_distinct(role)` lets callers tell whether a role has its own slot or is aliasing onto native on this platform.

The pre-existing `vm_lifecycle` / `container_backend` fields are kept and pinned to the native slot so the daemon-wide flows (Kubernetes, shutdown) keep behaving identically. When a role is not configured on the host (e.g. rosetta on non-Apple-Silicon), the slot lookup falls back to native so the Docker layer keeps working as a single-VM setup. Daemon startup wait-for-resources still waits only on the native docker.img; per-role XPC holder handling lands in the shared control plane slice.

PLAN.md step 5 (partial). Replaces the single inbound listener slot in Runtime with a per-machine map so each utility VM owns its own listener, then teaches the Docker handler to bind a container's published ports against the role the container was created on.

- Runtime now holds `inbound_listeners: HashMap<String, InboundListenerManager>` keyed by machine name and tracks the per-container rules as `(machine_name, rules)` so teardown reaches the correct listener even when the container migrated roles. `start_port_forwarding_macos`, `stop_port_forwarding_by_id`, and `stop_port_forwarding_all` follow the per-machine map.
- The Docker `setup_port_forwarding_from_inspect` path now resolves the machine name via `runtime.machine_name_for_role(role)` instead of `default_machine_name`, so an `amd64` container running on the rosetta VM lands on the rosetta bridge's listener.

Existing single-VM deployments are unaffected: when only the native slot is configured, every container ends up on the same machine and the old single-listener behavior reduces to one map entry.

…roles

PLAN.md step 5. Two host-side coordination changes needed before a real dual-VM deployment is safe.

- daemon `wait_for_resources` now scans every persistent dockerd image owned by a configured utility VM role (native `docker.img`, rosetta `docker-rosetta.img`) so a stale VZ XPC holder on either image is drained before `init_runtime` brings up either VM. The same 10-second bound applies per image.
- Docker handler `ensure_role_ready` refuses requests for a role that is not configured on this host (e.g. Rosetta on non-Apple-Silicon) with a clear platform-specific error rather than silently falling back to native. Silent fallback would land a `linux/amd64` workload on the HV native VM that cannot translate x86_64, with no useful diagnostic. The native default remains the fallback for Rosetta requests that fail open elsewhere; this only short-circuits the case where Rosetta is definitively unsupported by the host.

PLAN.md step 7. Compose-managed containers carry `com.docker.compose.project` on every service; ArcBox now uses that label to pin every service in a project to the same utility VM role so DNS, port forwarding, and volume sharing remain coherent within a project.

- `WorkloadRoleRegistry` gains `project_role(name)` and `record_project(name, role)`. Bindings are sticky across compose up/down cycles to keep group routing predictable.
- `UtilityVmRoleExt::can_host(platform)` codifies which roles can accept which platforms: rosetta hosts both arm64 and amd64; native refuses amd64. Used by the create handler to decide whether the next service in a project is compatible.
- `extract_compose_project(body)` reads the project label from a container-create payload.
- `create_container` now:
  1. Parses the routing decision from platform metadata.
  2. Reads the compose project label.
  3. If the project is already bound, uses that role when compatible and rejects with a 400 ("mixed-backend compose projects are not supported") otherwise.
  4. If not yet bound, records the first service's role as the project's role.

Containers without a compose label retain the per-container behavior.
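
A rough sketch of the binding steps listed in this commit, under stated assumptions: `Role`, `Platform`, and `role_for_service` are hypothetical names, the project map stands in for `project_role` / `record_project`, and the real handler returns an HTTP 400 rather than a string error.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the role and platform types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Role {
    Native,
    Rosetta,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Platform {
    Arm64,
    Amd64,
}

/// Per the commit message: rosetta hosts arm64 and amd64; native refuses amd64.
fn can_host(role: Role, platform: Platform) -> bool {
    !matches!((role, platform), (Role::Native, Platform::Amd64))
}

/// First-service-wins binding: return the role the service should use, or an
/// error when the project's bound role cannot host the requested platform.
fn role_for_service(
    projects: &mut HashMap<String, Role>,
    project: &str,
    requested: Role,
    platform: Platform,
) -> Result<Role, &'static str> {
    match projects.get(project).copied() {
        // Already bound and compatible: reuse the project's role.
        Some(bound) if can_host(bound, platform) => Ok(bound),
        // Already bound but incompatible: reject instead of splitting the project.
        Some(_) => Err("mixed-backend compose projects are not supported"),
        // Not yet bound: the first service decides.
        None => {
            projects.insert(project.to_string(), requested);
            Ok(requested)
        }
    }
}
```

Note that an arm64 service arriving after a native binding stays on native even if the caller asked for rosetta, which is the sticky behavior the commit describes.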

PLAN.md step 2 follow-up. The Docker CLI opens `/session` before it sends the matching `/build` that carries platform metadata, so the session role cannot be derived at session-open time. ArcBox forwards `/session` to the native (HV) utility VM by default; an `amd64` build that needs Rosetta-side BuildKit features will not see this session and side channels (secrets, ssh, build mounts) will fail. Adds a doc comment explaining the limitation and a debug log of the session UUID so operators can correlate `/session` and `/build` forwarding. Routing both endpoints together requires lazy session forwarding keyed by `X-Docker-Expose-Session-Uuid`; left as a follow-up rather than landed here, since correctly buffering the HTTP/1.1 upgrade until `/build` arrives is substantial work that deserves its own change.

…after daemon restart

Closes the two items previously deferred under PLAN.md step 2.

BuildKit /session role routing:

- `WorkloadRoleRegistry` gains `wait_for_role(key, max_wait)` backed by a `tokio::sync::Notify`; `record(...)` fires the signal.
- `build_image` records `X-Docker-Expose-Session-Uuid → role` so the parallel `/session` request can be routed coherently.
- `session()` reads the same UUID and parks on `wait_for_role` for up to 30 seconds (matches BuildKit's own session-handshake timeout), forwarding the upgrade to the role declared by `/build`. Both ordering races (`/build`-first or `/session`-first) resolve correctly. On timeout, `/session` forwards to native so the user sees a BuildKit-level error rather than a hung connection.

Cross-restart durability via lazy guest-probe rebuild:

- `resolve_container_role` and `resolve_role_from_uri` now treat a registry miss as a recovery signal: they probe each configured role's guest dockerd with `GET /containers/{id}/json`, accept the first 200 as the workload's role, and re-record it. Native is always probed first because it's already up; the Rosetta probe triggers lazy startup on first miss after a restart, which is exactly the recovery behavior we want for surviving rosetta workloads.
- No on-disk schema is introduced; correctness is recovered from the guest dockerds, which already persist their own container state.
- The dead `proxy_upgrade()` helper is dropped; all upgrade paths now go through `proxy_upgrade_to_role`.

Tests cover the three `wait_for_role` paths (cache hit, late record wakeup, timeout), bringing the docker-lib suite to 110 passing.

Closes the routing-correctness gaps surfaced by review: the Missing/Ambiguous conflation, the first-hit-wins rebuild, and the silent /session timeout fallback. Every place that previously collapsed "ambiguous" into "missing" and quietly fell back to native now surfaces a Docker-compatible 4xx instead.

WorkloadRoleRegistry::lookup returns a new WorkloadRoleLookup {Found(role), Missing, Ambiguous} tri-state. Cross-VM short-ID collisions report Ambiguous (previously None). wait_for_role propagates the same shape, and BuildKit /session no longer routes a timed-out session to native: it returns 400 with a clear message naming the UUID, since silently attaching the upgraded gRPC stream to the wrong VM would just leak the misroute into BuildKit's session layer.

resolve_container_role / resolve_exec_role / resolve_role_from_uri all return Result<UtilityVmRole>. Ambiguous identifiers (registry prefix collision *or* multi-guest probe match) surface as 409 Conflict via a new ambiguous_workload_error helper. Macros and the catch-all proxy fallback propagate the Result via `?`.

rebuild_container_role_from_guests now probes Native AND Rosetta unconditionally and collects every hit before deciding. Zero hits => Missing, one hit => Found, more than one => Ambiguous. Returning on the first match was a silent-misroute bug for cross-VM short-ID collisions after a daemon restart; this is now fixed.

Compose project scheduling docs/comments updated to call the current behavior what it actually is: "first-service-wins binding with mixed-VM rejection". PLAN's stronger "any amd64 service → whole project rosetta" requires reading the full compose file before any service is created, which is out of scope for a per-API-request routing layer; the limitation is documented in code and in PLAN.md rather than papered over.

All 110 lib tests pass; existing prefix-collision test renamed to cross_role_prefix_collision_is_ambiguous and updated to assert the new Ambiguous outcome.
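
The zero/one/many rule for the guest-probe rebuild can be sketched like this. The variant names follow the commit message, but the `Role` enum, the `classify_hits` function, and the slice-of-hits input shape are assumptions made for illustration.

```rust
// Hypothetical stand-in for UtilityVmRole.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Role {
    Native,
    Rosetta,
}

// Tri-state named in the commit message; the derive list is assumed.
#[derive(Debug, PartialEq, Eq)]
enum WorkloadRoleLookup {
    Found(Role),
    Missing,
    Ambiguous,
}

/// Classify guest probe results. `hits` holds every role whose dockerd
/// answered 200 for the identifier, collected before any decision is made.
fn classify_hits(hits: &[Role]) -> WorkloadRoleLookup {
    match hits {
        [] => WorkloadRoleLookup::Missing,
        [role] => WorkloadRoleLookup::Found(*role),
        // More than one guest knows the ID: refuse to guess.
        _ => WorkloadRoleLookup::Ambiguous,
    }
}
```

Collecting all hits first is what distinguishes this from the earlier first-match behavior, where the probe order decided the outcome.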

Host-driven validation script + README covering PLAN.md Decision Gates A/B/C: arm64 native + amd64-via-FEX64 `uname -m`, representative amd64 images (musl/glibc/busybox/node/python/go/apt), exit-status and stderr propagation, BuildKit amd64 build, and a mixed arm64/amd64 Compose project staying in the single HV VM. Records an environment header (macOS version, guest kernel, FEX version, binfmt status, arcbox commit) for reproducibility, and tags every check PASS/FAIL/UNSUPPORTED/INFRA so a real gate failure is distinguishable from a setup problem. The harness is executed by a developer on Apple Silicon — it cannot run where the daemon can't boot a VM. README documents the FEX-at-/arcbox/bin/FEX contract (registered by the boot-assets rootfs init) and the hardware-TSO go/no-go probe.

… (ABX-375)

Pivots the default runtime to a single HV utility VM. Platform no longer selects a utility VM role; it selects an in-guest translator:

- routing.rs: `RuntimeTranslator { Native, Fex64 }`; amd64 → Fex64, arm64/unspecified → Native. `RoutingDecision` carries the translator and always resolves the workload to the single HV VM (`utility_vm()` → Native). `is_admissible(decision, fex64_available)` is the fail-closed gate. Drops the dual-VM-only helpers (`utility_vm_role`, `UtilityVmRoleExt::can_host`, `extract_compose_project`).
- handlers: `require_amd64_runtime` rejects amd64 with a clear "requires FEX64 in the HV guest" error when FEX64 is not provisioned, never silently falling back to VZ/Rosetta or QEMU. `create_container` drops Compose project-role binding (single VM needs none) and the per-request build/session role machinery is removed; `/build` and `/session` go to the one HV VM. The registry rebuild probes only the HV VM so a lifecycle lookup can never boot the VZ build backend.
- core: `Runtime::amd64_runtime_supported()` returns whether `<data_dir>/bin/FEX` (guest `/arcbox/bin/FEX`) is present. This is the same artifact whose presence makes the boot-assets rootfs init register the x86_64 binfmt handler, so host admission and guest registration share one signal.
- workload.rs: drop the Compose-project and BuildKit `/session` role-sync machinery (unnecessary in single-VM); the registry now only maps IDs/aliases → role and fails closed on ambiguity.

VZ/Rosetta is demoted, not deleted: the role enum, slot, and ABX-374 machinery remain as the preserved fallback / future explicit build backend, but the runtime path never selects Rosetta or boots the VZ VM.

Routing/admission unit tests updated (amd64 → fex64 translator, amd64 fail-closed without FEX64, native always admissible).
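
A simplified sketch of the translator selection and the fail-closed gate. `RuntimeTranslator` and `is_admissible` are named in the commit; everything else is assumed: `translator_for` is a hypothetical helper, the platform is modeled as an optional string, and `is_admissible` here takes the translator directly rather than a full `RoutingDecision`.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RuntimeTranslator {
    Native,
    Fex64,
}

/// amd64 selects Fex64; arm64 or an unspecified platform selects Native.
/// (Hypothetical helper; the real parser likely accepts more spellings.)
fn translator_for(platform: Option<&str>) -> RuntimeTranslator {
    match platform {
        Some("linux/amd64") => RuntimeTranslator::Fex64,
        _ => RuntimeTranslator::Native,
    }
}

/// Fail-closed gate: native is always admissible, Fex64 only when the
/// interpreter is provisioned. There is no fallback branch.
fn is_admissible(translator: RuntimeTranslator, fex64_available: bool) -> bool {
    match translator {
        RuntimeTranslator::Native => true,
        RuntimeTranslator::Fex64 => fex64_available,
    }
}
```

The absence of a third "fall back to Rosetta" arm is the point: an amd64 request without FEX64 is rejected rather than rerouted.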

Running the harness against a live arcbox daemon exposed a misclassification: amd64 `exec format error` (no x86_64 binfmt handler) and the ABX-375 fail-closed error were tagged as a Gate-A FAIL, which per the goal would wrongly trigger "resume ABX-374". That is the FEX64-*unavailable* state (interpreter not provisioned), not a FEX64 gate failure. Only FEX64 actually running and mis-executing (wrong arch / crash) is a real FAIL.

The harness now:

- captures stderr and distinguishes "not provisioned" (exec format error / requires-FEX64 / binfmt / missing interpreter) → INFRA, and sets an `amd64_blocked` flag;
- reserves FAIL for FEX64 running but returning the wrong result;
- reports RESULT: BLOCKED (exit 2) with explicit "do not resume ABX-374 on this basis" guidance when amd64 is unprovisioned;
- applies the same distinction to the Gate B image matrix.

Also corrects the README Docker context endpoint to unix:///<home>/.arcbox/run/docker.sock. Verified against the live `arcbox` context: arm64 PASS, amd64 now INFRA (FEX64 not provisioned) instead of a false FAIL.
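
The harness itself is a shell script; the sketch below restates its INFRA-versus-FAIL distinction in Rust purely as an illustration. The marker strings are guesses derived from the list in this commit message, and `Verdict` and `classify_amd64_check` are hypothetical names, not taken from tests/fex/validate-fex64.sh.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Pass,
    Fail,
    Infra,
}

/// Classify one `uname -m` probe under --platform linux/amd64.
/// "Not provisioned" symptoms are INFRA; FAIL is reserved for a run that
/// executed but produced the wrong result or crashed.
fn classify_amd64_check(exit_code: i32, stdout: &str, stderr: &str) -> Verdict {
    // Assumed markers for the FEX64-unavailable state (lowercase).
    const NOT_PROVISIONED: [&str; 3] = ["exec format error", "requires fex64", "binfmt"];
    let err = stderr.to_lowercase();
    if NOT_PROVISIONED.iter().any(|marker| err.contains(*marker)) {
        return Verdict::Infra;
    }
    if exit_code == 0 && stdout.trim() == "x86_64" {
        Verdict::Pass
    } else {
        Verdict::Fail
    }
}
```

Checking the provisioning markers before looking at the exit code keeps an unprovisioned guest from ever being counted as a gate failure.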

…t path ArcBox installs boot-manifest runtime binaries to <data_dir>/runtime/bin (guest /arcbox/runtime/bin, via prepare_binaries), the same set the guest runs dockerd/containerd from. boot-assets v0.5.8+ registers the FEX binfmt handler at /arcbox/runtime/bin/FEX. Align the host-side amd64_runtime_supported() fail-closed gate to <data_dir>/runtime/bin/FEX (was <data_dir>/bin/FEX, which prepare_binaries never populates), and fix the amd64-unavailable error message to name the correct path. Pairs with boot-assets fix/fex-static-runtime-paths (static FEX + /arcbox/runtime/bin/FEX binfmt path).
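
A minimal sketch of the corrected host-side probe, assuming presence of the file is the only signal. `fex_host_path` is a hypothetical helper, and the free-function form of `amd64_runtime_supported` is a simplification of the `Runtime` method named in this branch.

```rust
use std::path::{Path, PathBuf};

/// Host path that prepare_binaries populates and that the guest sees as
/// /arcbox/runtime/bin/FEX. (Hypothetical helper.)
fn fex_host_path(data_dir: &Path) -> PathBuf {
    data_dir.join("runtime").join("bin").join("FEX")
}

/// Fail-closed admission signal: amd64 is supported only when the
/// interpreter is actually present on the host side.
fn amd64_runtime_supported(data_dir: &Path) -> bool {
    fex_host_path(data_dir).exists()
}
```

Deriving the error-message path from the same helper would keep the message and the check from drifting apart again.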

…provisioned Match the FEX runtime path to the boot-assets binfmt registration (/arcbox/runtime/bin/FEX). Also short-circuit to a BLOCKED summary when Gate A finds amd64 unprovisioned, so the runtime/build/compose amd64 sub-checks don't emit misleading FAIL lines and the verdict stays "decision pending" rather than falsely triggering "resume ABX-374".

Bump [boot] to v0.5.9, which ships the statically-linked FEX64 x86_64 interpreter (FEX/FEXServer, arm64) staged at bin/FEX/... in the manifest. The host syncs it to <data_dir>/runtime/bin/FEX, shared into the guest as /arcbox/runtime/bin/FEX — the path the rootfs init registers as the x86_64 binfmt_misc handler and that amd64_runtime_supported() probes. Static linking makes FEX usable as a binfmt interpreter inside OCI container mount namespaces (no external loader/library closure to resolve against the container rootfs). Update manifest_sha256 to the published v0.5.9 manifest so the boot-time integrity check passes. Also correct two stale comments that listed the synced runtime binaries without FEX: prepare_binaries downloads every manifest binary, and FEX is optional (absent FEX does not block boot; amd64 then fails closed).

On Apple Silicon under HVF (observed on M5 Max / macOS 26) the guest is advertised SVE feature bits it cannot execute: /proc/cpuinfo reports `sve2 sve2p1 svebf16 ...`, yet a single SVE instruction (e.g. `rdvl`) SIGILLs. Userspace that trusts HWCAP_SVE then crashes — glibc's ifunc-selected SVE memcpy/str* and FEX64's SVE paths. Append `arm64.nosve arm64.nosme` to the default machine's kernel cmdline so the guest kernel ignores the phantom features and userspace falls back to NEON. Verified live: guest then reports 0 SVE features and `rdvl` is no longer selected by HWCAP-gated code. Note: this alone does NOT make FEX64 amd64 work — FEX-2605 also emits unconditional explicit SVE that traps regardless (see ABX-375 handoff).

linear-code · 2026-05-31T10:12:12Z

ABX-375

v0.5.9 shipped a FEX that SIGILLs on Apple Silicon (compiler-emitted SVE) and required a FEXServer unreachable in container namespaces. v0.5.10 ships the fixed static-pie FEX (no SVE codegen, runs standalone), so amd64 containers route through FEX64 and run. Manifest verified: FEX present (arm64, FEX-2605), no FEXServer.

514f36b appended `arm64.nosve arm64.nosme` to the default machine's kernel cmdline on the theory that HVF advertised the guest phantom SVE feature bits it could not execute. That diagnosis was wrong: the real cause was the FEX binary being built with `-mcpu=native` on an SVE-capable host, so FEX's own codegen emitted unconditional SVE that trapped. That is fixed in the FEX build shipped in boot-assets 0.5.10 (ab3a218). With the root cause fixed at the build level, the cmdline workaround is dead weight — and unconditional (no model/macOS-version/flag gate), so it would silently force NEON fallback and lose SVE-backed glibc ifuncs and FEX translation paths on hardware where SVE works correctly.

f4387f9 documented the FEX probe at <data_dir>/bin/FEX; f019567 moved the actual check and error string to <data_dir>/runtime/bin/FEX (the path prepare_binaries populates) but missed this doc comment. Align it.

The 0.5.10 FEX build carries a small patch that strips the FEXServer requirement and runs purely as a binfmt_misc interpreter, so no server process needs to be reachable across container mount namespaces. Drop the stale "FEX/FEXServer" pairing from the boot-asset comments, and correct the harness to probe the actual binary at /arcbox/runtime/bin/FEX (not the upstream FEXInterpreter name).

…est setup_fex() The harness README credited a guest-agent setup_fex() for the x86_64 binfmt_misc registration, but no such function exists in this repo — the only binfmt code in arcbox-agent is for Rosetta. Per the boot-assets repo, the rootfs /sbin/init trampoline checks for /arcbox/runtime/bin/FEX and registers the handler with POCF flags. Fix the description and the scope note (runtime.rs already described it correctly).

The default-VM drift check compared the persisted kernel only against the boot-asset version string, so a `--kernel` override that kept the same boot-asset version was ignored and the stale VM was reused with its old kernel. Compare the persisted kernel against the resolved desired kernel path (custom `--kernel` override, else the versioned cache path) so a kernel change is detected even when the boot-asset version is unchanged.

Apple SME cores (e.g. M4 Pro) advertise SME — and SME-derived SVE — to the guest, but plain non-streaming SVE cannot execute on this silicon (a bare `rdvl` SIGILLs on the host). FEX's x86-64 JIT detects the feature by reading ID_AA64PFR1_EL1/ID_AA64PFR0_EL1 directly and emits SVE that traps, so amd64 containers SIGILL. `arm64.nosve` does not help — it only clears HWCAP, which FEX ignores. Clear the SME field [27:24] of the guest's ID_AA64PFR1_EL1 at vCPU init using the get/modify/set_sys_reg pattern QEMU's HVF backend uses to sanitize guest ID registers, presenting a NEON-only guest like Virtualization.framework. This eliminates the SIGILL; the remaining amd64 allocator fault is unrelated.
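
The register edit described here reduces to clearing one 4-bit field. A small sketch of only that bit manipulation follows; the constant and function names are hypothetical, and the hypervisor calls that read and write the register are deliberately left out.

```rust
/// SME field of ID_AA64PFR1_EL1 occupies bits [27:24], per the text above.
const ID_AA64PFR1_SME_SHIFT: u32 = 24;
const ID_AA64PFR1_SME_MASK: u64 = 0xF << ID_AA64PFR1_SME_SHIFT;

/// Return the register value with the SME field zeroed and every other
/// field left untouched.
fn clear_sme_field(id_aa64pfr1: u64) -> u64 {
    id_aa64pfr1 & !ID_AA64PFR1_SME_MASK
}
```

In a get/modify/set sequence this would be the "modify" step between reading the vCPU's register value and writing it back.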

…le fields Drift detection previously checked only cpus/memory and the kernel path inline, so cmdline changes (e.g. the guest docker vsock port, or an `arm64.nosve` toggle) silently reused the stale VM. Extract the kernel + cmdline resolution into a shared `resolve_desired_boot` used by both `create_default_machine` and the drift check, so the comparison can never diverge from what would be created, and add `machine_drift_reason` as the single place that compares every overridable persisted field (cpus, memory_mb, kernel, cmdline). Covered by a regression test.
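
One possible shape for the single comparison point, sketched under assumptions: `BootParams` is a hypothetical struct standing in for whatever is persisted, and returning the name of the first differing field is a guess at what a "reason" looks like.

```rust
// Hypothetical container for the overridable persisted fields.
#[derive(Debug, Clone, PartialEq, Eq)]
struct BootParams {
    cpus: u32,
    memory_mb: u64,
    kernel: String,
    cmdline: String,
}

/// Compare every overridable field and name the first one that drifted.
/// None means the persisted VM can be reused as-is.
fn machine_drift_reason(persisted: &BootParams, desired: &BootParams) -> Option<&'static str> {
    if persisted.cpus != desired.cpus {
        return Some("cpus");
    }
    if persisted.memory_mb != desired.memory_mb {
        return Some("memory_mb");
    }
    if persisted.kernel != desired.kernel {
        return Some("kernel");
    }
    if persisted.cmdline != desired.cmdline {
        return Some("cmdline");
    }
    None
}
```

Feeding `desired` from the same resolution step that machine creation uses is what prevents the check from diverging from what would actually be created.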

… SME cores" This reverts commit c89de9d.

PeronGH · 2026-06-09T13:37:57Z

Root cause found — it's `randomize_va_space`, not SVE

The Gate-A blocker (linux/amd64 via FEX64) is resolved, and the "phantom SVE / SIGILL" diagnosis in the description was a red herring — those SIGILLs were collateral from a deeper allocator failure.

Root cause: the guest kernel ships CONFIG_COMPAT_BRK=y, which forces kernel.randomize_va_space=1 at boot (mmap/stack/vDSO randomized, brk/heap not). FEX's x86-64 allocator can't lay out its VA space under =1 and fails non-deterministically with Failed to map VMA region → SIGSEGV. The SIGILLs only surfaced on runs that got far enough; once the allocator is fixed they're gone.

How it was isolated: OrbStack runs the same FEX binary cleanly — its guest has randomize_va_space=2; ours had =1. Toggling the sysctl at runtime: 1 → fails, 2 → works (x86_64, 10/10) on both a 52-bit (LPA2) and a 48-bit kernel. So VA width and page size were not the cause — the arm64.nosve and VA_BITS=48 detours were dead ends.
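
For anyone re-checking a guest, a small sketch of reading the sysctl that separated the two environments. The `Aslr` enum and both function names are hypothetical; the 0/1/2 meanings follow the kernel's documented values, with 1 leaving brk unrandomized as described above.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Aslr {
    Disabled,     // 0
    PartialNoBrk, // 1: mmap/stack/vDSO randomized, brk/heap not
    Full,         // 2: brk/heap randomized as well
}

/// Parse the contents of /proc/sys/kernel/randomize_va_space.
fn parse_randomize_va_space(raw: &str) -> Option<Aslr> {
    match raw.trim() {
        "0" => Some(Aslr::Disabled),
        "1" => Some(Aslr::PartialNoBrk),
        "2" => Some(Aslr::Full),
        _ => None,
    }
}

/// Read the live value; returns None when the file is missing or unreadable
/// (for example, when not running on Linux).
fn read_guest_aslr() -> Option<Aslr> {
    let raw = std::fs::read_to_string("/proc/sys/kernel/randomize_va_space").ok()?;
    parse_randomize_va_space(&raw)
}
```

On a guest affected by this issue, `read_guest_aslr()` would be expected to report `PartialNoBrk`.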

Fix: one line in the guest kernel config — disable COMPAT_BRK so it boots randomize_va_space=2 (matching OrbStack / stock distros): arcboxlabs/kernel#7

Result: with a kernel built from that PR, docker run --platform linux/amd64 alpine uname -m → x86_64, deterministic, no SVE masking and no manual sysctls. Gate A passes.

What changes on this branch: keep the drift-detection fixes (recreate the default VM when kernel / cmdline / cpus / memory drift). The VMM SME-mask and the guest VA_BITS=48 kernel change explored during debugging are unnecessary and were reverted/dropped — COMPAT_BRK is the sole fix.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR advances ABX-375 by routing linux/amd64 runtime workloads through FEX64 (binfmt_misc) inside the single HV utility VM, while making the proxy/lifecycle stack role-aware so per-workload follow-ups consistently hit the same utility VM role (supporting the preserved ABX-374 dual-VM fallback).

Changes:

Add a reproducible local FEX64 validation harness (tests/fex/*) and update boot-assets pin in assets.lock.
Introduce role-aware routing + workload role registry in arcbox-docker (container/exec/build/fallback proxying now selects a utility VM role deterministically).
Extend arcbox-core VM lifecycle/runtime to support per-role machine identity (machine name, data image) and per-machine hypervisor backend selection (HV vs VZ).

Reviewed changes

Copilot reviewed 25 out of 26 changed files in this pull request and generated 11 comments.

File	Description
tests/fex/validate-fex64.sh	Adds a local gate-based harness to validate FEX64 behavior for `linux/amd64` under HV.
tests/fex/README.md	Documents how to run/interpret the FEX64 harness and the decision gates.
assets.lock	Updates boot-assets version/manifest pin used to provision guest runtime binaries.
app/arcbox-docker/src/workload.rs	Adds an in-process registry mapping workload IDs/aliases to utility VM roles with ambiguity handling.
app/arcbox-docker/src/routing.rs	Introduces platform parsing and a translator concept (native vs FEX64) for single-HV routing decisions.
app/arcbox-docker/src/proxy/upload.rs	Adds role-selectable streaming upload forwarding into guest dockerd.
app/arcbox-docker/src/proxy/upgrade.rs	Adds role-selectable upgrade proxying (exec attach/session) into guest dockerd.
app/arcbox-docker/src/proxy/mod.rs	Extends the guest connector trait to support role-aware connections.
app/arcbox-docker/src/proxy/forward.rs	Adds role-selectable buffered/streaming proxy helpers to guest dockerd.
app/arcbox-docker/src/proxy/fallback.rs	Routes unmatched Docker API requests by resolving role from URI (container/exec IDs).
app/arcbox-docker/src/proxy/connector.rs	Implements `connect_for(role)` using per-role machine name + vsock port.
app/arcbox-docker/src/lib.rs	Exposes new `routing` and `workload` modules.
app/arcbox-docker/src/handlers/mod.rs	Adds role extraction/resolution from URIs plus role-aware proxy helpers and fail-closed admission for amd64 runtime.
app/arcbox-docker/src/handlers/exec.rs	Records exec IDs to role on exec-create and routes exec follow-ups to the recorded role.
app/arcbox-docker/src/handlers/container.rs	Adds fail-closed amd64 admission, records container IDs/names to role, and routes lifecycle/networking by role.
app/arcbox-docker/src/handlers/build.rs	Routes builds through HV and fails closed for amd64 when FEX64 is absent.
app/arcbox-docker/src/api.rs	Stores the workload role registry in shared app state.
app/arcbox-daemon/src/startup/mod.rs	Expands startup resource-wait to scan both native and rosetta dockerd images.
app/arcbox-core/src/workload.rs	Defines shared `UtilityVmRole` enum and helpers for cross-crate role identity.
app/arcbox-core/src/vm_lifecycle/mod.rs	Adds per-machine identity/config, drift detection based on resolved boot params, and backend selection.
app/arcbox-core/src/vm.rs	Makes VM backend selection a per-machine config property instead of hardcoding HV.
app/arcbox-core/src/runtime.rs	Introduces per-role slots (native + optional rosetta) and per-machine inbound port-forwarding state.
app/arcbox-core/src/machine.rs	Adds backend + rosetta exposure flags to machine creation config.
app/arcbox-core/src/lib.rs	Re-exports `VmBackend` and `UtilityVmRole`.
app/arcbox-core/src/boot_assets.rs	Updates docs for preparing all runtime binaries in the boot manifest (including optional FEX).
app/arcbox-api/src/grpc/machine.rs	Updates machine creation requests to populate new backend/rosetta fields.

+fn ambiguous_workload_error(id: &str) -> DockerError {
+    DockerError::Conflict(format!(
+        "workload identifier '{id}' is ambiguous: it matches multiple workloads. \
+         Use the full canonical container ID."
+    ))
+}


+    /// Ensures the utility VM for `role` is running and ready.
+    ///
+    /// Drives the per-role lifecycle so the native and rosetta VMs are
+    /// reachable independently. If `role` is not configured on this host
+    /// (e.g. rosetta on non-Apple-Silicon) the native slot answers as a
+    /// degradation path — the dockerd connector still works, but the
+    /// workload runs on HV instead of VZ+Rosetta.


+    pub async fn ensure_role_ready(&self, role: UtilityVmRole) -> Result<u32> {
+        self.slot_for(role).container_backend.ensure_ready().await
+    }


+    /// Returns the role slot, falling back to native if `role` is not
+    /// configured on this host.
+    fn slot_for(&self, role: UtilityVmRole) -> &RoleSlot {
+        if let Some(slot) = self.role_slots.get(&role) {
+            return slot;
+        }
+        self.role_slots
+            .get(&UtilityVmRole::Native)
+            .expect("Native role slot must always be present")
+    }


+    /// If the alias is currently owned by a different canonical (e.g. a
+    /// previous container with the same name that has not yet been
+    /// forgotten), the alias is detached from the previous owner first so
+    /// the old owner's alias list never points to a key that now resolves
+    /// to a different role.


+        detach_alias_from_previous_owner(&mut guard, &alias);
+        guard.roles.insert(alias.clone(), role);
+        guard
+            .alias_owner
+            .insert(alias.clone(), canonical.to_string());


    async fn get_cid(&self) -> Result<u32> {
        self.machine_manager
-            .get_cid(DEFAULT_MACHINE_NAME)
+            .get_cid(&self.machine_name)
            .ok_or_else(|| CoreError::Machine("default machine has no CID".to_string()))
    }


+            match self.machine_manager.start(&self.machine_name).await {
                Ok(()) => {
                    tracing::info!("Default VM started successfully");


+# --- Gate A: basic viability ------------------------------------------------
+section "Gate A: basic viability"
+
+arch="$("${DC[@]}" run --rm --platform linux/arm64 alpine uname -m 2>/dev/null)"


+amd64_out="$("${DC[@]}" run --rm --platform linux/amd64 alpine uname -m 2>&1)"
+if [ "$amd64_out" = "x86_64" ]; then


greptile-apps · 2026-06-09T13:52:37Z

Greptile Summary

This PR (ABX-375) routes all Docker runtime containers through the single HV utility VM, using FEX (binfmt_misc) for linux/amd64 workloads instead of a separate VZ/Rosetta VM. VZ/Rosetta is retained as an opt-in build backend only. The PR is explicitly marked blocked at Gate A on M5 Max/macOS-26 due to phantom SVE in HVF guests causing FEX SIGILLs; the code is correct but the hardware/firmware layer is broken.

Routing pivot (routing.rs, handlers/): linux/amd64 requests are admitted only when <data_dir>/runtime/bin/FEX is present; otherwise amd64 workloads fail closed with a clear error rather than silently falling back to VZ/Rosetta. Native arm64 workloads are unaffected.
Lifecycle refactor (vm_lifecycle/mod.rs, runtime.rs): VmLifecycleManager gains a machine_name/data_image_filename seam via for_machine() so a second (Rosetta) lifecycle can coexist; drift detection is extended to cover kernel cmdline changes.
WorkloadRoleRegistry (workload.rs): New in-process registry tracks container/exec IDs → utility VM role; handles short-ID prefix lookup and ambiguity detection.
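The short-ID prefix lookup and ambiguity detection described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Role`, `Lookup`, `RoleRegistry`, `record`, and `resolve` are hypothetical names, alias tracking is omitted, and treating every multi-match as ambiguous (even when all matches share a role) is an assumption.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the PR's `UtilityVmRole` (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Native,
    Rosetta,
}

/// Outcome of resolving a user-supplied identifier (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Lookup {
    Found(Role),
    Ambiguous,
    Unknown,
}

/// Hypothetical registry keyed by full canonical container ID.
#[derive(Default)]
pub struct RoleRegistry {
    roles: BTreeMap<String, Role>,
}

impl RoleRegistry {
    /// Records which utility VM role owns a canonical ID.
    pub fn record(&mut self, canonical_id: &str, role: Role) {
        self.roles.insert(canonical_id.to_string(), role);
    }

    /// An exact match wins; otherwise `id` is treated as a short-ID prefix.
    pub fn resolve(&self, id: &str) -> Lookup {
        if let Some(role) = self.roles.get(id) {
            return Lookup::Found(*role);
        }
        // An empty string is a prefix of every key, so reject it up front.
        if id.is_empty() {
            return Lookup::Unknown;
        }
        let mut matches = self.roles.iter().filter(|(key, _)| key.starts_with(id));
        match (matches.next(), matches.next()) {
            (Some((_, role)), None) => Lookup::Found(*role),
            // Two or more candidates: fail closed instead of guessing.
            (Some(_), Some(_)) => Lookup::Ambiguous,
            _ => Lookup::Unknown,
        }
    }
}
```

In a handler, an `Ambiguous` result would map to the conflict error quoted earlier in this review, which asks the caller for the full canonical container ID.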

Confidence Score: 3/5

Safe to continue iterating on, but two issues should be resolved before the branch is considered production-ready: the Rosetta lifecycle shutdown gap and the test harness false-FAIL path.

The routing pivot, WorkloadRoleRegistry, and FEX admission gate are all well-implemented and tested. runtime::shutdown() tears down the Rosetta VM machine directly but never calls the Rosetta lifecycle manager's own shutdown, leaving its in-memory state stale and its cancellation token uncancelled — any subsequent call to lifecycle_for_role(Rosetta).shutdown() would then target the wrong machine name. The validate-fex harness exit-code and stderr tests emit FAIL (the resume-ABX-374 signal) for transient infrastructure errors, making the gate unreliable as a decision artifact.

Files needing attention: app/arcbox-core/src/runtime.rs (shutdown/shutdown_force Rosetta lifecycle teardown) and tests/fex/validate-fex.sh (exit-code and stderr propagation gate checks).

Important Files Changed

Filename	Overview
app/arcbox-docker/src/routing.rs	New routing module: clean platform parsing, RoutingDecision, is_admissible gate, and query_param helper. All key cases covered by unit tests.
app/arcbox-docker/src/workload.rs	New WorkloadRoleRegistry: short-ID prefix lookup, alias tracking, and ambiguity detection are correct and well-tested.
app/arcbox-core/src/vm_lifecycle/mod.rs	Adds for_machine() seam and cmdline drift detection; shutdown() still uses hardcoded DEFAULT_MACHINE_NAME in graceful_stop (previously flagged), stale error string in get_cid (previously flagged).
app/arcbox-core/src/runtime.rs	Per-role slot map + macOS InboundListenerMap refactor look correct; amd64_runtime_supported() uses synchronous is_file() on async executor (previously flagged); Rosetta lifecycle not shut down cleanly.
app/arcbox-docker/src/handlers/mod.rs	New resolve_container_role, resolve_exec_role, require_amd64_runtime helpers are correct; fail-closed semantics properly implemented.
app/arcbox-docker/src/handlers/container.rs	create_container, start_container, remove_container correctly use require_amd64_runtime and record role bindings; exec ID cleanup gap flagged previously.
tests/fex/validate-fex.sh	Gate A/B/C harness is well-structured but exit-code and stderr propagation tests (lines 156-160) may emit FAIL for infrastructure errors, triggering the wrong architecture decision.
app/arcbox-docker/src/handlers/build.rs	build_image correctly gates on require_amd64_runtime; session handler correctly targets Native role only.

Comments Outside Diff (1)

app/arcbox-core/src/runtime.rs, lines 652-705

Rosetta lifecycle not shut down through its own lifecycle manager

shutdown() calls self.vm_lifecycle.shutdown() for the native VM then stops remaining machines directly via machine_manager.graceful_stop. The Rosetta lifecycle manager (stored in role_slots[Rosetta]) is never told to shut down — its health-monitor cancellation token is never cancelled and its in-memory state stays at Running/Idle rather than transitioning to Stopped. Any subsequent call to lifecycle_for_role(Rosetta).shutdown() would then hit the pre-existing wrong-machine-name bug on an already-stopped VM. The same gap exists in shutdown_force. Iterating role_slots and calling each slot's lifecycle.shutdown() directly would keep lifecycle state and event publishing consistent.

pullfrog

ℹ️ No critical issues — the findings are latent, scoped to the not-yet-routed Rosetta path. The Native runtime path (all that this PR exercises) is sound.

Reviewed changes — the ABX-375 pivot from multi-utility-VM routing to a single HV utility VM that runs linux/amd64 via in-guest FEX64, with VZ/Rosetta demoted to an opt-in build backend.

New routing.rs placement layer — WorkloadPlatform parsing maps to a RuntimeTranslator (Native/Fex64); RoutingDecision::utility_vm() always returns UtilityVmRole::Native, and is_admissible() fails closed for amd64 when FEX64 is absent rather than falling back to VZ/QEMU.
Per-role VM abstraction in Runtime — adds RoleSlot/role_slots, ensure_role_ready, machine_name_for_role, lifecycle_for_role, and amd64_runtime_supported. The Rosetta (VZ) slot is constructed lazily on macOS aarch64 but its VM only boots on ensure_role_ready(Rosetta), which nothing currently calls.
VmLifecycleManager parameterized per machine — new for_machine(name, data_image) constructor; new() delegates with DEFAULT_MACHINE_NAME/docker.img; DEFAULT_MACHINE_NAME replaced with &self.machine_name across the manager. Default-VM drift detection generalized via machine_drift_reason.
Role-aware Docker handlers — proxy_to_role, resolve_container_role, and a workload→role registry with alias tracking and fail-closed prefix-collision handling.
Per-machine inbound port forwarding — inbound_listeners keyed by machine name and inbound_rules carry the owning machine so teardown reaches the right listener.
Daemon resource wait — wait_for_resources now scans both docker.img and docker-rosetta.img (filtered by existence).
Assets + tooling — assets.lock bumped to boot-assets v0.5.10 (static FEX64); tests/fex/ validation harness + README added. The guest arm64.nosve/SME disable was added then reverted later in the branch (net no cmdline change).
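The fail-closed admission rule summarized in the list above might look roughly like this. It is a sketch under stated assumptions: `Translator`, `translator_for`, and `fex_present` are hypothetical names, the accepted architecture spellings are guesses, and only the `<data_dir>/runtime/bin/FEX` probe path and the fail-closed behavior come from the review text.

```rust
use std::path::Path;

/// How a workload's binaries get executed inside the single HV utility VM.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Translator {
    Native,
    Fex,
}

/// Parses a Docker-style platform string such as "linux/amd64" or
/// "linux/arm64/v8". Returns `None` for anything this sketch does not know.
pub fn translator_for(platform: &str) -> Option<Translator> {
    let mut parts = platform.split('/');
    let os = parts.next()?;
    let arch = parts.next()?;
    if os != "linux" {
        return None;
    }
    match arch {
        "arm64" | "aarch64" => Some(Translator::Native),
        "amd64" | "x86_64" => Some(Translator::Fex),
        _ => None,
    }
}

/// Fail-closed admission: amd64 is admitted only when FEX is present.
/// There is deliberately no fallback branch to another VM or emulator.
pub fn is_admissible(translator: Translator, fex_present: bool) -> bool {
    match translator {
        Translator::Native => true,
        Translator::Fex => fex_present,
    }
}

/// Probes the path named in the review. This is a blocking filesystem call,
/// so an async caller may prefer to compute it once and cache the result.
pub fn fex_present(data_dir: &Path) -> bool {
    data_dir.join("runtime/bin/FEX").is_file()
}
```

Passing `fex_present` in as a plain `bool` keeps the admission decision pure and easy to unit-test, and leaves the choice of when to touch the filesystem to the caller.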

ℹ️ `Runtime` shutdown never tears down the Rosetta role slot's lifecycle

The per-role refactor constructs a Rosetta VmLifecycleManager into role_slots on Apple Silicon, but Runtime::shutdown() / shutdown_force() only operate on self.vm_lifecycle (the native lifecycle). The Rosetta slot's lifecycle is never told to shut down, so its health-monitor task, state machine, and MachineStopped event are skipped on daemon teardown.

This is latent today: nothing routes to Rosetta (utility_vm() is always Native), so the Rosetta VM never boots and there is nothing to tear down. It becomes a real resource/teardown gap the moment the Rosetta route is re-enabled (ABX-374). shutdown()/shutdown_force() are outside this PR's diff, which is why this is in the body rather than inline.

Technical details

# Runtime shutdown skips non-native role slots

## Affected sites
- `app/arcbox-core/src/runtime.rs` — `Runtime::shutdown()` (~line 659) and `shutdown_force()` (~line 727) call only `self.vm_lifecycle.shutdown()` / `.force_stop()`.
- `role_slots[UtilityVmRole::Rosetta].lifecycle` is never shut down.

## Required outcome
- On runtime shutdown, every configured role slot's lifecycle manager is shut down (or force-stopped), not just the native one — so the health monitor is cancelled, state transitions to `Stopped`/`NotExist`, and `MachineStopped` is published per machine.

## Suggested approach (optional)
- Iterate `self.role_slots.values()` and call `lifecycle.shutdown()` / `force_stop()` on each, deduplicating the native slot which is already covered by `self.vm_lifecycle`. The generic step-3 "stop non-default machines" loop terminates the VM process but does not drive the lifecycle state machine.

## Open questions for the human
- Is Rosetta intended to stay a fully cold/dead path on this branch (handoff to ABX-374), or should the teardown wiring land here so the slot is safe to activate later?
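The suggested approach above could be sketched as follows. This is a synchronous toy model, not the PR's async implementation: `Lifecycle`, `State`, `monitor_cancelled`, and this `Runtime` are hypothetical stand-ins, and the returned list of machine names only models where a `MachineStopped` event would be published.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the PR's `UtilityVmRole` (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Role {
    Native,
    Rosetta,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Running,
    Stopped,
}

/// Hypothetical, synchronous stand-in for a per-machine lifecycle manager.
pub struct Lifecycle {
    pub machine_name: String,
    pub state: State,
    pub monitor_cancelled: bool,
}

impl Lifecycle {
    pub fn new(machine_name: &str) -> Self {
        Self {
            machine_name: machine_name.to_string(),
            state: State::Running,
            monitor_cancelled: false,
        }
    }

    /// Cancels the health monitor and moves to `Stopped`.
    /// Returns `false` if the machine was already stopped (idempotent).
    pub fn shutdown(&mut self) -> bool {
        if self.state == State::Stopped {
            return false;
        }
        self.monitor_cancelled = true;
        self.state = State::Stopped;
        true
    }
}

pub struct Runtime {
    pub role_slots: BTreeMap<Role, Lifecycle>,
}

impl Runtime {
    /// Drives every configured slot through its own lifecycle, not just the
    /// native one. Returns the machines that were actually stopped.
    pub fn shutdown(&mut self) -> Vec<String> {
        let mut stopped = Vec::new();
        for slot in self.role_slots.values_mut() {
            if slot.shutdown() {
                stopped.push(slot.machine_name.clone());
            }
        }
        stopped
    }
}
```

Because each slot's `shutdown` is idempotent in this model, a second call is a no-op, which sidesteps the deduplication concern for a native slot that another code path has already stopped.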

pullfrog · 2026-06-09T13:56:05Z

                    DEFAULT_MACHINE_NAME
                );
-                self.machine_manager.stop(DEFAULT_MACHINE_NAME)
+                self.machine_manager.stop(&self.machine_name)


The per-machine rename is incomplete in shutdown(): the graceful_stop(DEFAULT_MACHINE_NAME, …) call (line 1105) and both warning logs (lines 1112, 1119) still use the "default" constant, while these force-stop fallbacks correctly use &self.machine_name. For a non-native lifecycle (the rosetta slot), shutdown() would graceful-stop the native machine and only force-stop the right one — and the diagnostics would name the wrong machine.

Latent today (nothing calls the rosetta lifecycle's shutdown()), but it's a clear gap in the rename this branch otherwise completed.

Technical details

# shutdown() still hardcodes DEFAULT_MACHINE_NAME

## Affected sites
- `app/arcbox-core/src/vm_lifecycle/mod.rs:1105` — `graceful_stop(DEFAULT_MACHINE_NAME, …)`
- `app/arcbox-core/src/vm_lifecycle/mod.rs:1112`, `:1119` — `tracing::warn!` interpolate `DEFAULT_MACHINE_NAME`

## Required outcome
- All four sites address `&self.machine_name`, consistent with the force-stop fallbacks at 1114/1122.

pullfrog · 2026-06-09T13:56:05Z

        if let Some(id) = extract_container_id(&uri) {
            let _ = state.runtime.ensure_vm_ready().await;
-            if let Some(body_bytes) = inspect_container_body(&state, &id).await {
+            if let Some(body_bytes) = inspect_container_body(&state, role, &id).await {


restart_container resolves role (line 201) but the best-effort wake on line 209 calls state.runtime.ensure_vm_ready(), which always wakes the native VM, while the subsequent inspect_container_body(&state, role, &id) proxies to the role's VM. The same pattern is in resolve_canonical_from_uri (line 531). Should be state.runtime.ensure_role_ready(role).await.

Latent today since every container resolves to Native, but it would wake the wrong VM once Rosetta is routed.

pullfrog · 2026-06-09T13:56:05Z

+            // Resolve role → machine/port via the runtime. Today both
+            // roles still alias to the default machine; once the dual VM
+            // lifecycle lands the rosetta branch returns its own machine
+            // name and dockerd port without any change here.


This comment is already inaccurate as of this PR: machine_name_for_role(Rosetta) returns "rosetta" (not the default machine) on macOS aarch64, since the rosetta slot is populated. The dual-VM lifecycle has landed; the comment describes the prior state. Consider dropping or correcting it.

The interpreter has always been plain upstream FEX; FEX64 was an invented name. Renames RuntimeTranslator::Fex64, needs_fex64(), the fex64 tracing label, the fail-closed error message, and the validation script (validate-fex64.sh -> validate-fex.sh, with its 'requires fex64' match updated in the same change). Binary paths and the binfmt entry were already correct.

AprilNEA and others added 21 commits May 28, 2026 20:05

feat(docker): add utility VM routing seam

98f5f31

Amp-Thread-ID: https://ampcode.com/threads/T-019e68f7-9a35-745d-b9db-c8085864e5a7 Co-authored-by: Amp <amp@ampcode.com>

AprilNEA mentioned this pull request May 31, 2026

fix(fex): target apple-m1 to avoid SVE codegen (SIGILL on Apple Silicon) arcboxlabs/boot-assets#23

Open

PeronGH added 7 commits June 5, 2026 22:06

docs(docker): fix stale FEX path in require_amd64_runtime comment

a079567

f4387f9 documented the FEX probe at <data_dir>/bin/FEX; f019567 moved the actual check and error string to <data_dir>/runtime/bin/FEX (the path prepare_binaries populates) but missed this doc comment. Align it.

PeronGH added 2 commits June 9, 2026 20:11

Revert "fix(vmm): mask guest SME so FEX amd64 doesn't SIGILL on Apple SME cores"

b6b392c

This reverts commit c89de9d.

PeronGH marked this pull request as ready for review June 9, 2026 13:42

Copilot AI review requested due to automatic review settings June 9, 2026 13:42

Copilot AI reviewed Jun 9, 2026

greptile-apps Bot reviewed Jun 9, 2026

Comment thread app/arcbox-core/src/vm_lifecycle/mod.rs

Comment thread app/arcbox-core/src/runtime.rs

Comment thread app/arcbox-docker/src/handlers/container.rs

Copilot started reviewing on behalf of PeronGH June 9, 2026 13:53

pullfrog Bot reviewed Jun 9, 2026

PeronGH added 3 commits June 10, 2026 19:39

chore(assets): bump boot assets to 0.5.11

c57c70f

Revert "chore(assets): bump boot assets to 0.5.11"

7ff6abd

This reverts commit c57c70f.

chore(assets): bump boot assets to 0.5.13

06309a2

PeronGH changed the title ~~ABX-375: All-in-HV + FEX64 runtime (BLOCKED on M5/HVF phantom SVE — handoff)~~ ABX-375: All-in-HV + FEX runtime Jun 10, 2026

PeronGH added 2 commits June 10, 2026 20:41

chore(vmm): drop redundant clone in drift-detection test

315715a

Pre-existing clippy::redundant_clone on the last use of `current`.

AprilNEA merged commit 684ce18 into master Jun 10, 2026
6 checks passed

AprilNEA deleted the feat/fex64-hv-runtime-plan branch June 10, 2026 14:15

AprilNEA merged 35 commits into master from feat/fex64-hv-runtime-plan

PeronGH commented Jun 9, 2026

Root cause found — it's randomize_va_space, not SVE
