Feat/hetzner deploy design by aleksei-okatiev · Pull Request #3 · test-IO/browserkube

aleksei-okatiev · 2026-05-07T12:08:51Z

This is just a dirty attempt to adapt browserkube to agentic QA needs, so that it can connect to tooling.

Captures the agreed design for running browserkube on a single Hetzner Cloud VM beside testinator's docker-compose stack: k3s with ingress-nginx on hostPort 8080, build-on-host image pipeline imported into containerd, and a path forward to TLS / public auth without rebuilding the cluster.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

18 bite-sized tasks: scaffold deploy/hetzner/, write k3s/ingress/ufw bootstrap scripts, write Helm values + Makefile + README, run end-to-end on the Hetzner box including testinator tooling reconfig and smoke test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Chart resolves images via .<component>.tag | default .Chart.AppVersion, so the global --set tag=... in Task 5/6 was a no-op. Update Task 6 to add a values-key column to the IMAGES table and generate per-component --set <key>.tag=<sha> flags. Update Task 5 verification expectations and add a git-archive workaround for local validation when the working tree carries broken in-flight chart edits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
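For illustration only, the per-component flag generation described above could look roughly like the Go sketch below. The real pipeline lives in a Makefile; the `image` struct, the `helmTagFlags` helper, and the `abc1234` SHA are hypothetical names introduced here, and the values keys are assumed to match the chart's `playwrightFirefox` / `playwrightChromium` blocks.

```go
package main

import (
	"fmt"
	"strings"
)

// image is one row of a hypothetical IMAGES table: the image name plus the
// Helm values key under which the chart looks up that component's tag.
type image struct {
	Name      string
	ValuesKey string
}

// helmTagFlags returns one "--set <key>.tag=<sha>" pair per component, since a
// single global "--set tag=..." is not read by the chart.
func helmTagFlags(images []image, sha string) []string {
	flags := make([]string, 0, 2*len(images))
	for _, img := range images {
		flags = append(flags, "--set", fmt.Sprintf("%s.tag=%s", img.ValuesKey, sha))
	}
	return flags
}

func main() {
	// Assumed rows; the real table has more entries.
	images := []image{
		{Name: "playwright-firefox", ValuesKey: "playwrightFirefox"},
		{Name: "playwright-chromium", ValuesKey: "playwrightChromium"},
	}
	fmt.Println(strings.Join(helmTagFlags(images, "abc1234"), " "))
}
```

The point of the sketch is only that every component needs its own `<key>.tag` override; how the SHA is derived is left out.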

The IMAGES table was widened to 5 columns (added values-key) in the plan revision, but _build_one's awk stride was left at i+=4. That made `make image-<X>` fail for everything except the first row (browserkube). Tested with sidecar (row 2) and playwright-webkit (row 14): both now resolve correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
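The stride bug is easy to reproduce outside awk. The following Go sketch assumes the table is flattened into a list of fields with the image name in the first column; `findRow`, `demoTable`, and every placeholder value other than the two image names are hypothetical.

```go
package main

import "fmt"

// findRow scans a flattened table whose rows are `stride` fields wide and
// returns the row whose first field equals name.
func findRow(fields []string, stride int, name string) ([]string, bool) {
	for i := 0; i+stride <= len(fields); i += stride {
		if fields[i] == name {
			return fields[i : i+stride], true
		}
	}
	return nil, false
}

// demoTable holds two 5-column rows; all values except the names are placeholders.
var demoTable = []string{
	"browserkube", "ctx-1", "df-1", "repo-1", "key-1",
	"sidecar", "ctx-2", "df-2", "repo-2", "key-2",
}

func main() {
	// A stride of 4 lands on the wrong field from the second row onward.
	_, okOld := findRow(demoTable, 4, "sidecar")
	row, okNew := findRow(demoTable, 5, "sidecar")
	fmt.Println(okOld, okNew, row)
}
```

With a stride of 4 only the first row still starts on a name field, which matches the observed behaviour that only browserkube built.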

helm lint fails on pre-existing chart errors (Chart.yaml apiVersion mismatch, browserkube-quotas.yaml empty name) that aren't this work's to fix. With lint as a deploy prereq, `make deploy` would always fail before reaching helm upgrade. Drop lint from deploy's deps; keep it as a standalone advisory target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

In-flight playwright/MCP integration, committed onto the Hetzner deploy branch so the chart renders against our locally-built playwright-firefox / chromium / webkit images instead of upstream quay.io/browser/* (which predate browser.bind() and break MCP attach).

Changes:
- cmds/playwright-{firefox,chromium,webkit}/: new bind-server.js images. Each runs `<engine>.launch()` once, calls `browser.bind()`, and proxies the random-port endpoint through a stable :4444/ surface.
- helm/charts/browserkube/templates/browserset.yaml: BrowserSet's `playwright:` section now reads .Values.playwright{Firefox,Chromium,Webkit} instead of hardcoding quay.io/browser/playwright-*. Quoting fixed: outer single-quotes around `'{{ ... | default "..." }}'` to avoid YAML's nested-double-quote parse error.
- helm/charts/browserkube/values.yaml: added playwrightFirefox/Chromium/Webkit blocks; backend tweaks for the new flow.
- backend/cmd/browserkube/internal/playwright/{handler,proxy}.go: switch to stdlib httputil.ReverseProxy (Go 1.20+ handles WS upgrade cleanly); recording middlewares moved behind PLAYWRIGHT_RECORD env.
- backend/cmd/browserkube/internal/api/handler.go: long-lived /playwright-server/{sessionID} attach endpoint that does NOT delete the pod on disconnect (fixes one-shot tear-down).
- backend/cmd/sidecar/{main,sidecar_plugin,cdp_relay}.go: new CDP relay for Chromium DevTools forwarding.
- operator/api/v1/browser_types.go + zz_generated + CRDs: BrowserConfig field on Browser CR; pod_utils mounts a per-session ConfigMap and sets BROWSER_ENGINE on the bind-server container.
- skaffold.yaml + Taskfile.yaml: build the three new playwright images.
- .gitignore: exclude backend/sidecar (build artifact).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
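As a rough picture of the stdlib proxy switch, a minimal sketch is shown below. It is not the code in proxy.go: `newSessionProxy` and `proxyOnce` are hypothetical helpers, the upstream is an in-process stand-in rather than a browser pod, and the path `/playwright-server/demo` only mimics the attach route.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// newSessionProxy returns a stdlib reverse proxy that forwards every request
// to target (for example a pod's stable :4444 endpoint).
func newSessionProxy(target string) (*httputil.ReverseProxy, error) {
	u, err := url.Parse(target)
	if err != nil {
		return nil, err
	}
	if u.Scheme == "" || u.Host == "" {
		return nil, fmt.Errorf("target %q needs a scheme and a host", target)
	}
	return httputil.NewSingleHostReverseProxy(u), nil
}

// proxyOnce starts a stand-in upstream, fronts it with the proxy, sends one
// request through and returns the response body.
func proxyOnce() (string, error) {
	upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "upstream saw %s", r.URL.Path)
	}))
	defer upstream.Close()

	proxy, err := newSessionProxy(upstream.URL)
	if err != nil {
		return "", err
	}
	front := httptest.NewServer(proxy)
	defer front.Close()

	resp, err := http.Get(front.URL + "/playwright-server/demo")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := proxyOnce()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(body)
}
```

The sketch covers plain HTTP only; session lookup, recording middleware, and the no-delete-on-disconnect behaviour are out of scope here.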

Two bugs surfaced by the first real `make images` run on the Hetzner box:

1. `.PHONY: $(addprefix image-,$(IMAGE_NAMES))` declared image-<name> targets explicitly. GNU Make treats those as separate empty targets and the pattern rule `image-%` no longer fires for them, so `make images` was a no-op for every component. Drop the explicit .PHONY for image-* and rely on the pattern rule alone.
2. `_build_one` constructed `-f $(REPO_ROOT)/$$df` even when df was a bare `Dockerfile` (no slash) and the actual path is `$(REPO_ROOT)/$$ctx/Dockerfile`. Add a case to resolve the dockerfile path either as repo-relative (when df contains a slash, e.g. `backend/Dockerfile`, `cmds/clipboard/Dockerfile`) or as context-relative (when df is just `Dockerfile`).

After these fixes, all 14 images built and imported into k3s containerd on the test VM.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
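The Dockerfile path rule from item 2 can be stated compactly. The Go sketch below mirrors the shell `case` logic under the assumption that a slash is the only discriminator; `resolveDockerfile`, the `/repo` root, and the `cmds/playwright-firefox` context are illustrative.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// resolveDockerfile returns the path to pass to `docker build -f`.
// A df containing a slash is treated as repo-relative; a bare file name is
// treated as relative to the build context ctx.
func resolveDockerfile(repoRoot, ctx, df string) string {
	if strings.Contains(df, "/") {
		return path.Join(repoRoot, df)
	}
	return path.Join(repoRoot, ctx, df)
}

func main() {
	// Repo-relative: the Dockerfile column already carries the directory.
	fmt.Println(resolveDockerfile("/repo", "cmds/clipboard", "cmds/clipboard/Dockerfile"))
	// Context-relative: a bare "Dockerfile" lives inside the build context.
	fmt.Println(resolveDockerfile("/repo", "cmds/playwright-firefox", "Dockerfile"))
}
```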

Three issues surfaced during the first real `make deploy` on the Hetzner box:

1. The Helm chart does not include CRDs (Browser, BrowserSet, SessionResult). They live in operator/config/crd/bases/ and have to be applied separately. Add a `crds` target that does `kubectl apply -f operator/config/crd/bases/` and make `deploy` depend on it.
2. The chart's blob storage defaults to RustFS-backed S3 (blob.rustfs.enabled: true, BLOB_URL pointing at rustfs-svc). Without RustFS deployed, the backend crashes at startup. Override `blob.rustfs.enabled: false` and point blob.url + blob.archive.url at file:///tmp/... so the backend writes to a local filesystem inside the pod (ephemeral; sufficient for smoke testing).
3. The UI deployment template hardcodes `replicas: 1`, so `--set ui.replicaCount=0` is silently ignored: the UI pod always runs. Document the manual `kubectl scale` workaround in README.

Also document the CRD lifecycle (CRDs outlive `helm uninstall`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Today the operator adds x-server and vnc-server together, gated on EnableVNC. That couples two unrelated concerns:
- x-server (xvfb) is required for headed browsers: bind-server.js launches `browserType.launch({ headless: false })`, which exits with "Missing X server or $DISPLAY" without xvfb.
- vnc-server is for human debugging via the noVNC viewer; MCP-driven Playwright clients never use it.

Split them:
- x-server is now added unconditionally so headed browsers always have a display. DISPLAY env on the browser container is unconditional too.
- vnc-server is the only thing gated on EnableVNC.

Also let the create-session API caller opt out of VNC: add CreateBrowserRequest.EnableVNC (*bool, optional). It defaults to true when omitted to preserve backwards compatibility. Clients that don't need a debug VNC stream (e.g. testinator-tooling's MCP adapter) send "enableVNC": false to drop the vnc-server sidecar.

Result: tooling sessions get a 4-container pod (browser + sidecar + clipboard + x-server) instead of 5. Saves one container per session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
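The optional-pointer default is the part most likely to be misread, so here is a minimal sketch of it. Only the `EnableVNC` field and the `"enableVNC"` JSON key come from the description above; the trimmed-down struct, `decodeRequest`, `vncEnabled`, and `containersFor` are hypothetical and do not reflect the real handler or operator code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CreateBrowserRequest shows only the field discussed here; the real request
// carries more fields.
type CreateBrowserRequest struct {
	EnableVNC *bool `json:"enableVNC,omitempty"`
}

// decodeRequest parses a JSON request body.
func decodeRequest(body string) (CreateBrowserRequest, error) {
	var req CreateBrowserRequest
	err := json.Unmarshal([]byte(body), &req)
	return req, err
}

// vncEnabled applies the backwards-compatible default: an omitted field means true.
func vncEnabled(req CreateBrowserRequest) bool {
	if req.EnableVNC == nil {
		return true
	}
	return *req.EnableVNC
}

// containersFor lists the pod's containers: x-server is unconditional,
// vnc-server is the only one gated on the flag.
func containersFor(req CreateBrowserRequest) []string {
	containers := []string{"browser", "sidecar", "clipboard", "x-server"}
	if vncEnabled(req) {
		containers = append(containers, "vnc-server")
	}
	return containers
}

func main() {
	for _, body := range []string{`{}`, `{"enableVNC": false}`} {
		req, err := decodeRequest(body)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Println(body, "->", containersFor(req))
	}
}
```

A plain `bool` could not distinguish "omitted" from "false", which is why the pointer is needed to keep existing clients on the VNC-enabled path.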

Scheduler reservation was 1 CPU / 3 GiB per browser pod, but empirical RSS on testinator workloads is ~480 MiB / ~350m CPU per chromium. The 3 GiB request meant a 16 GiB box could only schedule 4 concurrent browser pods even with plenty of free RAM (5–6 GiB sitting idle). Drop request to 0.5 CPU / 1 GiB; keep the 4 CPU / 4 GiB limit so heavy SPAs can still burst. Now ~10 concurrent browsers fit on the same VM without changing the actual memory consumption pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
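The request arithmetic can be sketched as follows. The scheduler packs pods by requests, not by actual usage, so the pod count is bounded by whichever resource runs out first. The allocatable figures below (5 CPU, 12 GiB left for browser pods) are purely hypothetical numbers chosen for illustration, not measurements from the VM, and `podsThatFit` ignores everything else the real scheduler considers.

```go
package main

import "fmt"

// resources is a CPU/memory pair in millicores and MiB.
type resources struct {
	MilliCPU int64
	MemMiB   int64
}

// podsThatFit returns how many pods with the given request fit into alloc,
// limited by the scarcer of the two resources.
func podsThatFit(alloc, request resources) int64 {
	byCPU := alloc.MilliCPU / request.MilliCPU
	byMem := alloc.MemMiB / request.MemMiB
	if byCPU < byMem {
		return byCPU
	}
	return byMem
}

func main() {
	// Hypothetical capacity left for browser pods after system and other workloads.
	alloc := resources{MilliCPU: 5000, MemMiB: 12 * 1024}

	oldReq := resources{MilliCPU: 1000, MemMiB: 3 * 1024} // 1 CPU / 3 GiB
	newReq := resources{MilliCPU: 500, MemMiB: 1024}      // 0.5 CPU / 1 GiB

	fmt.Println("old request:", podsThatFit(alloc, oldReq), "pods")
	fmt.Println("new request:", podsThatFit(alloc, newReq), "pods")
}
```

Under these assumed numbers memory is the binding constraint before the change and CPU after it; limits do not enter the calculation at all.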

bind-server.js calls browserType.launch() immediately on startup, with `headless: false` requiring a working DISPLAY. The x-server runs as a sibling container in the same pod and starts in parallel; k8s gives no ordering guarantee. When the browser container starts a moment ahead of x-server, chromium/firefox/webkit exits with:

    Missing X server or $DISPLAY
    The platform failed to initialize. Exiting.

Most pods don't hit this because xvfb is fast, but under memory pressure (more concurrent pods, slower scheduling) the race window widens. Observed two failures during 8-parallel load testing.

Fix: before launch, poll for /tmp/.X11-unix/X<n> with a 30 s deadline. Same change for all three playwright bind-server images (chromium/firefox/webkit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
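The actual fix lives in bind-server.js; the Go sketch below only illustrates the same poll-until-deadline idea. `x11SocketPath` and `waitForPath` are hypothetical helpers, the 100 ms polling interval is an assumption, and the DISPLAY parsing handles only the common `:N` and `:N.S` forms.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// x11SocketPath maps a DISPLAY value such as ":99" or ":99.0" to the Unix
// socket the X server creates for it.
func x11SocketPath(display string) (string, error) {
	i := strings.LastIndex(display, ":")
	if i < 0 {
		return "", fmt.Errorf("DISPLAY %q has no display number", display)
	}
	num := display[i+1:]
	if dot := strings.Index(num, "."); dot >= 0 {
		num = num[:dot] // drop the screen suffix
	}
	if _, err := strconv.Atoi(num); err != nil {
		return "", fmt.Errorf("DISPLAY %q has no numeric display: %w", display, err)
	}
	return "/tmp/.X11-unix/X" + num, nil
}

// waitForPath polls until p exists or the timeout elapses.
func waitForPath(p string, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		if _, err := os.Stat(p); err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("%s did not appear within %s", p, timeout)
		}
		time.Sleep(interval)
	}
}

func main() {
	display := os.Getenv("DISPLAY")
	if display == "" {
		fmt.Println("DISPLAY is not set; nothing to wait for")
		return
	}
	sock, err := x11SocketPath(display)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// 30 s deadline as in the fix; the browser launch would follow on success.
	if err := waitForPath(sock, 30*time.Second, 100*time.Millisecond); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("X socket ready:", sock)
}
```

Checking for the socket file is a proxy for "the X server is accepting connections"; it narrows the race window without requiring any ordering between sibling containers.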

Playwright 1.60.0 went GA today on npm and Microsoft Container Registry published v1.60.0-noble alongside it. The browser.bind() API we depend on is present in stable with only a positional-arg rename (name → title), which doesn't affect our usage.

Switching unlocks two wins:

1. **Image size cut.** Was ~2.6 GB (chromium), ~1.7 GB (firefox/webkit) on `node:20-bookworm-slim` + manual apt-get + `playwright install --with-deps` + a `cp -r` of the browsers cache to dodge $HOME-path issues. Now ~1.4 GB / ~0.9 GB on `mcr.microsoft.com/playwright:v1.60.0-noble`: the base ships Node 22, all three browsers preinstalled at /ms-playwright, all system libs, and PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1 so the npm `playwright` install is metadata-only.
2. **Maintenance posture.** Off the daily-rolling alpha track and onto a stable, date-pinnable release. No more risk of the alpha API drifting underneath us.

Dropped from each Dockerfile:
- manual apt-get of libgtk/libnss3/libasound2/libegl1/libgl1/ffmpeg/xauth/…
- `playwright install --with-deps <browser>` (browsers preinstalled)
- `cp -r /root/.cache/ms-playwright /.cache/ms-playwright` (MCR's /ms-playwright is already accessible to non-root)

Dropped from firefox/package.json:
- playwright-firefox separate package (the `playwright` umbrella package + MCR's preinstalled binary covers it).

Compatibility note: @playwright/mcp@0.0.71 still bundles the 04-27 alpha; mcp hasn't shipped a stable-1.60-paired release yet. Wire protocol within 1.60.x has held compatible across alphas, so server on stable + client on late-alpha should work; verify with a smoke test before promoting in spawner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

aleksei-okatiev and others added 21 commits May 4, 2026 13:46

chore(deploy): scaffold deploy/hetzner directory

f47f524

feat(deploy): add idempotent k3s bootstrap script

eb4b07d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor(deploy): polish install-k3s.sh per code review (use _, helm-…

449ba4e

…already echo, comment) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(deploy): install ingress-nginx on hostPort 8080/8443

d625448

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(deploy): add ufw bootstrap closing 5432/6379, allowing 22/80/443

66908e3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(deploy): add Hetzner Helm values (local images, IfNotPresent, du…

0e175e5

…al ingress hosts) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(deploy): add Makefile with image build/import + helm orchestration

32b6f1e

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(deploy): add reset script for clean redeploy

ad5c190

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(deploy): write Hetzner README quickstart and ops

cf3d604

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

aleksei-okatiev wants to merge 21 commits into develop from feat/hetzner-deploy-design

aleksei-okatiev commented May 7, 2026
