Open
272 commits
bc8c04f
fix: included necessary secrets and generate files
erinversfeldcodes Apr 14, 2026
6a2955a
fix: set up other docker files properly
erinversfeldcodes Apr 14, 2026
d042c68
feat: use Nix by default
erinversfeldcodes Apr 14, 2026
922c17c
chore: use nix locally
erinversfeldcodes Apr 14, 2026
892ded0
fix: remove duplicate key
erinversfeldcodes Apr 14, 2026
463bae1
chore: fix linting
erinversfeldcodes Apr 14, 2026
f4f7d18
fix: small tweaks
erinversfeldcodes Apr 14, 2026
3ce260a
fix: include missed config update
erinversfeldcodes Apr 14, 2026
4a75fc7
fix: more small tweaks
erinversfeldcodes Apr 14, 2026
6daa5df
fix: small tweaks
erinversfeldcodes Apr 15, 2026
35e31c3
fix: add accept-key and ignore to all three image scans
erinversfeldcodes Apr 15, 2026
254e116
fix: use env vars instead of accept-key and ignore inputs
erinversfeldcodes Apr 15, 2026
2638fc3
fix: make sure e2e browsers installed correctly
erinversfeldcodes Apr 15, 2026
89485ef
fix: adapt config rather than switching envs
erinversfeldcodes Apr 15, 2026
404a757
fix: set home dir correctly
erinversfeldcodes Apr 15, 2026
9f193ab
fix: small tweaks
erinversfeldcodes Apr 15, 2026
ce7c6a1
fix: install /usr/bin/dockle via .deb
erinversfeldcodes Apr 15, 2026
3d7b806
fix: use --ignore-files flag
erinversfeldcodes Apr 15, 2026
6327d12
feat: add act for action debugging
erinversfeldcodes Apr 15, 2026
3efd119
feat: refine act configuration
erinversfeldcodes Apr 15, 2026
624be94
fix: ignore known issues
erinversfeldcodes Apr 15, 2026
fe89000
fix: use correct config
erinversfeldcodes Apr 15, 2026
0a19dfa
fix: ignore warnings
erinversfeldcodes Apr 15, 2026
bd67801
fix: experiment with another approach to compensate for shell
erinversfeldcodes Apr 15, 2026
4c30362
fix: configure dialyzer
erinversfeldcodes Apr 15, 2026
81d534d
refactor: only run the e2e tests against a real stack, not the local …
erinversfeldcodes Apr 15, 2026
c105dcf
fix: set correct dir for dialyzer
erinversfeldcodes Apr 15, 2026
0603c64
fix: debug coveralls
erinversfeldcodes Apr 15, 2026
f46c7ba
fix: build frontend for necessary integration in backend tests
erinversfeldcodes Apr 15, 2026
b6b746e
fix: update ci to run gen-elm-proto.sh first
erinversfeldcodes Apr 16, 2026
7097396
fix: configure db for migrations
erinversfeldcodes Apr 16, 2026
993ca9b
fix: extract SQL from migrations files for squawk
erinversfeldcodes Apr 16, 2026
9cfd677
fix: filter out interpolated execute blocks
erinversfeldcodes Apr 16, 2026
dea7848
fix: add flyctl to setup
erinversfeldcodes Apr 16, 2026
21601bb
fix: add termination symbol to extracted sql
erinversfeldcodes Apr 16, 2026
aa7346f
fix: fixed property test generator to avoid FilterTooNarrowError
erinversfeldcodes Apr 16, 2026
9b4d6b1
fix: assorted tweaks
erinversfeldcodes Apr 16, 2026
913d788
fix: use correct flag
erinversfeldcodes Apr 16, 2026
d8f3176
fix: complete setup
erinversfeldcodes Apr 16, 2026
835a1e7
feat: set up squawk locally
erinversfeldcodes Apr 16, 2026
0363e75
fix: add api key
erinversfeldcodes Apr 17, 2026
26eab05
fix: gen proto earlier
erinversfeldcodes Apr 17, 2026
c2998ea
chore: debug failure
erinversfeldcodes Apr 17, 2026
a7e57be
fix: move COPY apps/core/priv/static to a separate layer that always …
erinversfeldcodes Apr 17, 2026
d7d9086
chore: debug why fly isn't receiving everything when built in the pip…
erinversfeldcodes Apr 17, 2026
baf5b50
chore: temporarily disable previous steps
erinversfeldcodes Apr 17, 2026
8d43f92
fix: include -L to avoid semgrep path-traversal false positives
erinversfeldcodes Apr 17, 2026
ca71338
fix: ingest modal secrets
erinversfeldcodes Apr 17, 2026
9ad96d4
chore: debug inability to capture modal url
erinversfeldcodes Apr 17, 2026
f845c70
fix: handle split URL
erinversfeldcodes Apr 17, 2026
a35bf7b
fix: strip the Unicode tree drawing characters
erinversfeldcodes Apr 17, 2026
224f2d0
refactor: parse with python instead
erinversfeldcodes Apr 17, 2026
f97ca5c
fix: pass secrets to fly
erinversfeldcodes Apr 17, 2026
cb6fe1d
fix: add missing vars
erinversfeldcodes Apr 17, 2026
face52c
fix: correct warm up check
erinversfeldcodes Apr 17, 2026
79ef716
fix: force IPv4 via curl -4 flags in the warmup and pin /etc/hosts to…
erinversfeldcodes Apr 18, 2026
2396ac3
fix: add a wait-for-edge-ready loop before the warmup's auth attempt
erinversfeldcodes Apr 18, 2026
dfef133
fix: enable ipv4 on fly
erinversfeldcodes Apr 18, 2026
b8dc3a9
doc: plan out inclusion of merge to main workflow
erinversfeldcodes Apr 18, 2026
3c8e227
feat: add route-grouping plug for metric tagging
erinversfeldcodes Apr 18, 2026
d013419
feat: emit fuse state gauge every 10s
erinversfeldcodes Apr 18, 2026
74832ad
feat: emit [:stacks, :upload, :terminal] on status transitions
erinversfeldcodes Apr 18, 2026
94c1c4b
feat: require auth on /internal/metrics
erinversfeldcodes Apr 18, 2026
3ba1e03
feat: enable destructive squawk rules with test harness
erinversfeldcodes Apr 18, 2026
fc5a184
feat: migration linter with @breaking_ok annotation
erinversfeldcodes Apr 18, 2026
ad15a82
feat: schema diff with DB_BREAKING_LABEL bypass
erinversfeldcodes Apr 18, 2026
7a1345d
feat: add migration-safety CI job
erinversfeldcodes Apr 18, 2026
bd8c79d
chore: temporarily disable deploy-preview while iterating
erinversfeldcodes Apr 18, 2026
fd4313d
fix: run mix deps.get before scripts/gen-ecto-proto.sh
erinversfeldcodes Apr 18, 2026
35a05fc
refactor: dump structure by swapping migrations dir
erinversfeldcodes Apr 18, 2026
a788a12
doc: add migration standards with anti-pattern rules
erinversfeldcodes Apr 18, 2026
416805e
chore: untrack doc not ready for commit
erinversfeldcodes Apr 18, 2026
8e4172d
fix: drop 6PN allowlist from /internal/metrics auth
erinversfeldcodes Apr 18, 2026
5553db2
fix: terminal upload counter guards on pending→terminal transition
erinversfeldcodes Apr 18, 2026
7d8ec91
refactor: rename router_dispatch re-emit to Stacks namespace
erinversfeldcodes Apr 18, 2026
88b0fed
doc: clarify migration-safety gate detection posture
erinversfeldcodes Apr 18, 2026
34bd910
doc: refresh runtime.exs metrics_scrape_token comment
erinversfeldcodes Apr 18, 2026
87c580e
feat: synthetic probe script for production SLO gate
erinversfeldcodes Apr 18, 2026
fe0b932
feat: SLO gate with multi-machine prom scrape + thresholds
erinversfeldcodes Apr 18, 2026
5f21383
feat: deploy-stack.sh --production mode with retry + DB check
erinversfeldcodes Apr 18, 2026
83e69d9
feat: tag every main commit as main-<sha> for rollback targeting
erinversfeldcodes Apr 18, 2026
080cf81
feat: production deploy workflow with SLO gate + auto-rollback
erinversfeldcodes Apr 18, 2026
580b5e6
doc: #137 follow-up — rollback composite action
erinversfeldcodes Apr 18, 2026
d7f8763
doc: refresh Phase 2 completion record with iterations
erinversfeldcodes Apr 18, 2026
24e086d
fix: prod seeds create one owner, not dev fixtures
erinversfeldcodes Apr 18, 2026
3087978
chore: temporarily add pull_request trigger + wait-for-CI step
erinversfeldcodes Apr 18, 2026
ec751b5
chore: drop wait-for-CI step, run in parallel during iteration
erinversfeldcodes Apr 18, 2026
fb52a92
fix: pin cargo-chef transitive deps with --locked
erinversfeldcodes Apr 18, 2026
1f8b368
fix: set DB URL
erinversfeldcodes Apr 18, 2026
94e7193
fix: replace brittle JSON parser with awk match
erinversfeldcodes Apr 18, 2026
972859c
fix: correct db url check
erinversfeldcodes Apr 18, 2026
ebe531d
doc: document work for isolating prod db
erinversfeldcodes Apr 18, 2026
0f8c9ff
fix: use correct creds for SLO checks
erinversfeldcodes Apr 18, 2026
2e1e7c8
fix: make sure email is confirmed in seeding
erinversfeldcodes Apr 18, 2026
2e133e6
fix: correct the summary script
erinversfeldcodes Apr 18, 2026
a9cefa5
doc: update the follow up ticket
erinversfeldcodes Apr 18, 2026
bb16322
feat: custom stacks_* metrics export via PromEx
erinversfeldcodes Apr 19, 2026
e81098e
feat: SLO gate reflects real production signals
erinversfeldcodes Apr 19, 2026
06ac37c
feat: extract Accounts.mark_confirmed/1
erinversfeldcodes Apr 19, 2026
5fe98b6
refactor: decouple SearXNG from prod core release
erinversfeldcodes Apr 19, 2026
ec2a857
doc: rollback bootstrap docs + extend Issue #137 with migration-ordering
erinversfeldcodes Apr 19, 2026
e721309
fix: don't allow false-passes of the slo checks
erinversfeldcodes Apr 19, 2026
4444a75
doc: add follow up issues
erinversfeldcodes Apr 19, 2026
f1e1d78
feat: add synthetic to exercise searxng
erinversfeldcodes Apr 19, 2026
4296e8d
fix: use public URL
erinversfeldcodes Apr 19, 2026
b32b35a
fix: upgrade searxng image size
erinversfeldcodes Apr 19, 2026
45a5776
fix: update upload check to account for valid and invalid images
erinversfeldcodes Apr 19, 2026
c455f81
feat: forward prod logs to axiom
erinversfeldcodes Apr 19, 2026
6782157
fix: look up searxng and log-shipper in correct dir
erinversfeldcodes Apr 19, 2026
6ec9f63
fix: swap upload canary to real JPG
erinversfeldcodes Apr 19, 2026
63fb654
doc: file #141 for HEIF support
erinversfeldcodes Apr 19, 2026
ce27cd2
fix: update deploy-stack.sh to cd into the Dockerfile's directory bef…
erinversfeldcodes Apr 19, 2026
008d8c5
fix: don't use curl to probe searxng and log-shipper
erinversfeldcodes Apr 20, 2026
e2f909c
fix: don't duplicate axiom config
erinversfeldcodes Apr 20, 2026
591fd4d
fix: remove stock config
erinversfeldcodes Apr 20, 2026
180c032
fix: don't unset SEARXNG url
erinversfeldcodes Apr 20, 2026
7fcebfe
fix: make searxng critical to deployment
erinversfeldcodes Apr 20, 2026
195faf6
fix: use auth.strategy config
erinversfeldcodes Apr 20, 2026
9107829
fix: add searxng_url to the filter list
erinversfeldcodes Apr 20, 2026
520b84e
fix: bump pool size
erinversfeldcodes Apr 20, 2026
de12359
fix: correct nix setup
erinversfeldcodes Apr 20, 2026
8b332ad
refactor: merge modal calls into single call
erinversfeldcodes Apr 20, 2026
a47bdf1
chore: histogram buckets extended
erinversfeldcodes Apr 20, 2026
4abf166
chore: increase bucket size
erinversfeldcodes Apr 20, 2026
28c93a3
feat: split oban into own pool
erinversfeldcodes Apr 20, 2026
cf6c4eb
chore: add slow-query telemetry handler
erinversfeldcodes Apr 20, 2026
53969a0
doc: correct fly tokens command for log-shipper
erinversfeldcodes Apr 20, 2026
ffec05e
chore: bump oban pool size
erinversfeldcodes Apr 20, 2026
5f33ed3
chore: add partial index on oban_jobs(queue, state)
erinversfeldcodes Apr 20, 2026
22d5453
chore: add debugging details to isolate issues in handler performance
erinversfeldcodes Apr 20, 2026
6ce9d2e
chore: rebalance pool sizes
erinversfeldcodes Apr 20, 2026
e49b3ec
chore: worker-tagged query histograms
erinversfeldcodes Apr 20, 2026
51942de
fix: use correct event name in telemetry
erinversfeldcodes Apr 20, 2026
b412c14
chore: generate enough Core.Repo samples in 10 min
erinversfeldcodes Apr 20, 2026
60ada5f
refactor: single-prompt consolidation
erinversfeldcodes Apr 20, 2026
dace141
test: confirm oban split
erinversfeldcodes Apr 20, 2026
b136825
chore: tighten pool queue time
erinversfeldcodes Apr 21, 2026
555d917
fix: handle new connections
erinversfeldcodes Apr 21, 2026
7ac4a72
refactor: add threading to modal and pin region for consistent perfor…
erinversfeldcodes Apr 21, 2026
4f259f6
feat:resize images before Modal inference
erinversfeldcodes Apr 21, 2026
cfa9c8f
refactor: race OL + GB in parallel, add ETS cache
erinversfeldcodes Apr 21, 2026
40fbd1f
refactor: fast-path for checksum-valid local-OCR ISBNs
erinversfeldcodes Apr 21, 2026
bffe0e8
feat: client-side image compression + EXIF strip
erinversfeldcodes Apr 21, 2026
3efaacd
feat: expand upload canaries to cover wider range of inputs
erinversfeldcodes Apr 21, 2026
8b68aeb
chore: increase upload sampling
erinversfeldcodes Apr 21, 2026
d4b21b8
refactor:try_candidate/1 now races OpenLibrary and Google Books in pa…
erinversfeldcodes Apr 21, 2026
9aa23fb
refactor: @modal.concurrent(max_inputs=4) → max_inputs=8
erinversfeldcodes Apr 21, 2026
cd999f6
refactor: big modal refactor, use vllm for continuous batching and Pa…
erinversfeldcodes Apr 21, 2026
3cec0b2
chore: pin vllm version and use async load
erinversfeldcodes Apr 21, 2026
caa1cad
feat: implement brave fuse
erinversfeldcodes Apr 21, 2026
1750a23
chore: increase brave daily limit
erinversfeldcodes Apr 21, 2026
fb712cb
feat: add searxng and r2 fuses
erinversfeldcodes Apr 21, 2026
4667672
refactor: only exercise barcode path once in canaries
erinversfeldcodes Apr 21, 2026
9dd819e
fix: pin transformers==4.48.3, the last version compatible with vLLM …
erinversfeldcodes Apr 21, 2026
746b230
fix: pin to transformer 4.49.0
erinversfeldcodes Apr 21, 2026
f0619b5
chore: cap Modal container autoscale at 10
erinversfeldcodes Apr 22, 2026
1f18102
refactor: scale vision queue + upload rate limit to support concurren…
erinversfeldcodes Apr 22, 2026
8f2a9b5
refactor: move author-source discovery to nightly batch cron
erinversfeldcodes Apr 22, 2026
16b7400
feat: presigned-URL upload flow
erinversfeldcodes Apr 22, 2026
65a2ccc
feat: presigned-URL upload flow (frontend)
erinversfeldcodes Apr 22, 2026
58f1cc9
fix: enum now orders as awaiting_upload → pending → resolved → rejected
erinversfeldcodes Apr 22, 2026
665c543
refactor: early-terminate Modal generation on not_book classification
erinversfeldcodes Apr 22, 2026
f6608a2
refactor: cache ISBNResolver.search_by_title results
erinversfeldcodes Apr 22, 2026
39e27c9
fix: use speculative_config={...} dict form
erinversfeldcodes Apr 22, 2026
a9986b3
fix: revert speculative decoding, vLLM's V0 engine raises a bare Asse…
erinversfeldcodes Apr 22, 2026
d220efb
feat: add persistent cache tables via proto.sync
erinversfeldcodes Apr 22, 2026
e27af63
feat: back ISBN + title-search caches with Postgres L2
erinversfeldcodes Apr 22, 2026
b1e81bf
feat: sweep expired cache rows nightly
erinversfeldcodes Apr 22, 2026
0610b8b
chore: profiling upload_p95
erinversfeldcodes Apr 22, 2026
df45667
fix: properly integrate start measurements
erinversfeldcodes Apr 22, 2026
b5464de
refactor: switch gpu='A10G' → gpu='H100'
erinversfeldcodes Apr 22, 2026
b20d8c4
fix: address squawk failures
erinversfeldcodes Apr 22, 2026
8fb973d
doc: document vision service changes
erinversfeldcodes Apr 23, 2026
af23abf
chore: bump p95 to account for best performance obtained thus far
erinversfeldcodes Apr 23, 2026
41a3a99
fix: increase samples in testing
erinversfeldcodes Apr 23, 2026
99f65d7
chore: revert modal region pinning
erinversfeldcodes Apr 23, 2026
e601701
chore: warm up modal
erinversfeldcodes Apr 23, 2026
9bbdcf2
fix: use correct vars
erinversfeldcodes Apr 23, 2026
2889a96
fix: scrape metrics in slo bucket, not since start
erinversfeldcodes Apr 24, 2026
0a44856
chore: simplify deploy script
erinversfeldcodes Apr 24, 2026
1d50d2f
chore: more simplifications
erinversfeldcodes Apr 24, 2026
1996d55
doc: update docs with latest vision service details
erinversfeldcodes Apr 24, 2026
45f1bc0
doc: issue + plan for staging Neon bootstrap
erinversfeldcodes Apr 28, 2026
bbb2aa6
fix: restore @disable_migration_lock for CONCURRENTLY migrations on Neon
erinversfeldcodes Apr 28, 2026
a7a0244
feat: separate Neon project for staging + previews
erinversfeldcodes Apr 28, 2026
33fbe1e
fix: don't lint def down reversals as destructive
erinversfeldcodes Apr 28, 2026
447d3a3
fix: properly mock pre-signed URLs in mock flow
erinversfeldcodes Apr 28, 2026
fd15e82
test: add E2E testing to ensure presigned URLs work as expected
erinversfeldcodes Apr 28, 2026
cbd965b
chore: fail idor loudly if preconditions not met
erinversfeldcodes Apr 28, 2026
d38d268
fix: remove unreachable error clause in seed_prod owner creation
erinversfeldcodes Apr 28, 2026
bae08ab
fix: exclude cache schemas from factory-proto coverage check
erinversfeldcodes Apr 28, 2026
cd9dafb
fix: clear stale lint debt
erinversfeldcodes Apr 28, 2026
ecb24a7
fix: have githook use nix
erinversfeldcodes Apr 28, 2026
78dfed9
fix: have set up script handle python venv setup
erinversfeldcodes Apr 28, 2026
b90ad14
doc: mark finished docs as complete
erinversfeldcodes Apr 28, 2026
6667581
chore: add back the CI jobs
erinversfeldcodes Apr 28, 2026
3777da9
chore: clean up old PATH injection hack now that setup script handles…
erinversfeldcodes Apr 28, 2026
06731c6
chore: add a non-root USER to the log-shipper Dockerfile
erinversfeldcodes Apr 28, 2026
4b3c1a7
fix: localize IFS in probe-production.sh
erinversfeldcodes Apr 28, 2026
749d5d3
fix: sqlfluff fix on Jinja-block indent
erinversfeldcodes Apr 28, 2026
fa66580
feat: block CI on successful githook report
erinversfeldcodes Apr 28, 2026
fd69cdb
fix: checkov
erinversfeldcodes Apr 28, 2026
2228c18
fix: generate language-specific proto code before rust/python lint
erinversfeldcodes Apr 28, 2026
8b15477
fix: pull atheris out of vision dev requirements
erinversfeldcodes Apr 28, 2026
c9dbc70
test: implement missing tests
erinversfeldcodes Apr 28, 2026
8d088f0
fix: make sure to use correct python env in deploy script
erinversfeldcodes Apr 28, 2026
98e6293
fix: retry core deploy
erinversfeldcodes Apr 29, 2026
7db7afa
fix: pass GH_REPO to gate-pre-push-report's gh invocation
erinversfeldcodes Apr 29, 2026
d40d7e1
fix: only parse the ci-summary
erinversfeldcodes Apr 29, 2026
6b52bdc
fix: update runner paths
erinversfeldcodes Apr 29, 2026
4936100
fix: more path fixing
erinversfeldcodes Apr 29, 2026
9ca4c04
fix: bump auth + password_change limits, parameterise tests
erinversfeldcodes Apr 29, 2026
d120c15
fix: unblock vision test suite — config validator + libzbar
erinversfeldcodes Apr 29, 2026
ae95eb5
test: align tests with new rate-limit + secret-validation defaults
erinversfeldcodes Apr 29, 2026
7fd3c7c
feat: add Stacks.Audit.log_rollback/1 helper
erinversfeldcodes Apr 29, 2026
0304db2
feat: add Neon LSN restore + migration-failure detection to rollback …
erinversfeldcodes Apr 30, 2026
c79b192
feat: add rollback-production composite action + parser
erinversfeldcodes Apr 30, 2026
9ca438b
feat: wire deploy-production.yml to rollback composite action
erinversfeldcodes May 1, 2026
9653376
chore: adopt actionlint in CI
erinversfeldcodes May 2, 2026
3d1b607
doc: manual-rollback + migration-recovery runbooks
erinversfeldcodes May 2, 2026
eddb35e
fix: address security finding
erinversfeldcodes May 2, 2026
fd98810
fix: ping zap container
erinversfeldcodes May 3, 2026
a34b76e
fix: include docker-buildx and properly assess security
erinversfeldcodes May 3, 2026
dbe5277
fix: use correct variables in prod
erinversfeldcodes May 3, 2026
811dc25
refactor: don't duplicate build
erinversfeldcodes May 3, 2026
052a1e6
fix: don't use secrets in comments
erinversfeldcodes May 3, 2026
b0727d4
doc: make failure path clearer
erinversfeldcodes May 3, 2026
fd3cb5a
chore: temporarily disable SLOs and make rollbacks always happen
erinversfeldcodes May 3, 2026
2b308ff
chore: debugging
erinversfeldcodes May 4, 2026
8e05d4d
fix: syntax
erinversfeldcodes May 4, 2026
aa519f9
fix: include previously uncommitted file
erinversfeldcodes May 4, 2026
330c83e
fix: use correct repo name and validate after rollback
erinversfeldcodes May 4, 2026
64061fc
chore: make rollback invoke only after deployment failure now that su…
erinversfeldcodes May 4, 2026
62f629e
chore: make production deployment only happen on merge to main again
erinversfeldcodes May 4, 2026
14bf0ed
chore: add back changes checks for ci checks
erinversfeldcodes May 4, 2026
28cd451
chore: assorted clean up
erinversfeldcodes May 4, 2026
a82c111
doc: mark issue 137 as complete
erinversfeldcodes May 4, 2026
995f19c
fix: remove ref: head_sha parameter from actions/checkout
erinversfeldcodes May 4, 2026
8b4b3e1
fix: run Stacks.Release.seed/0 with ALLOW_SEEDS=true on staging
erinversfeldcodes May 4, 2026
a2df188
feat: audit foundations + prober user + append-only trigger
erinversfeldcodes May 5, 2026
6333f38
feat: TOTP/MFA + admin session pipeline
erinversfeldcodes May 5, 2026
acde4b4
feat: add break-glass admin data endpoints and audit middleware
erinversfeldcodes May 5, 2026
24e497e
feat: require admin MFA for owner routes and audit login/MFA events
erinversfeldcodes May 5, 2026
0920233
fix: elixir-lint, elixir-test, dbt-checkpoint fixes
erinversfeldcodes May 5, 2026
cf1c170
fix: resolve CI failures from PR
erinversfeldcodes May 6, 2026
93ad257
fix: ci
erinversfeldcodes May 6, 2026
f42d1a3
fix: ci
erinversfeldcodes May 6, 2026
f915093
fix: upload tests
erinversfeldcodes May 7, 2026
64b6a03
fix: ci
erinversfeldcodes May 10, 2026
5c4856f
test: update tests
erinversfeldcodes May 11, 2026
20 changes: 20 additions & 0 deletions .act/Dockerfile
@@ -0,0 +1,20 @@
# Custom act runner image: Ubuntu 24.04 + OpenSSL 1.1 compat.
# act's built-in ImageOS mapping downloads OTP binaries built for Ubuntu 20.04
# (linked against libcrypto.so.1.1) even when using a 24.04 container image.
# This adds the OpenSSL 1.1 compat library so OTP 28 loads correctly.
#
# checkov:skip=CKV_DOCKER_2: act runner image, not a production container — no healthcheck needed
# checkov:skip=CKV_DOCKER_3: act runner requires root to simulate GitHub Actions runner
FROM catthehacker/ubuntu:act-24.04
# hadolint ignore=DL3008,DL3059
RUN apt-get update && \
apt-get install -y --no-install-recommends libssl1.1 2>/dev/null || \
( ARCH=$(dpkg --print-architecture) && \
if [ "$ARCH" = "arm64" ]; then \
echo "deb http://ports.ubuntu.com/ubuntu-ports focal-security main" >> /etc/apt/sources.list; \
else \
echo "deb http://security.ubuntu.com/ubuntu focal-security main" >> /etc/apt/sources.list; \
fi && \
apt-get update && \
apt-get install -y --no-install-recommends libssl1.1 ) && \
rm -rf /var/lib/apt/lists/*
13 changes: 13 additions & 0 deletions .act/event.json
@@ -0,0 +1,13 @@
{
  "pull_request": {
    "base": {
      "ref": "main"
    },
    "head": {
      "ref": "chore/enable-pipelines"
    }
  },
  "repository": {
    "default_branch": "main"
  }
}
5 changes: 5 additions & 0 deletions .actrc
@@ -0,0 +1,5 @@
-P ubuntu-latest=stacks-act-runner
--pull=false
--env GITHUB_TOKEN
--env-file .act/.env
-e .act/event.json
22 changes: 18 additions & 4 deletions .claude/hooks/post-tool-lint.sh
@@ -49,10 +49,24 @@ run_check() {
 # Must appear before the extension-specific dispatch block.
 # ---------------------------------------------------------------------------
 if command -v gitleaks > /dev/null 2>&1; then
-  run_check \
-    "gitleaks detect --no-git --source ${FILE_PATH}" \
-    "Run: gitleaks detect --no-git --source ${FILE_PATH}" \
-    gitleaks detect --no-git --source "$FILE_PATH" --log-level error
+  # Skip .env files — they are gitignored by design and intentionally contain
+  # real secrets. The .gitleaks.toml path allowlist covers them in git-mode
+  # scans; --no-git --source on a bare file path bypasses that allowlist.
+  case "$BASENAME" in
+    .env|.env.local)
+      : # SKIP: gitignored env file
+      ;;
+    *)
+      _gitleaks_config=()
+      if [[ -f "${REPO_ROOT}/.gitleaks.toml" ]]; then
+        _gitleaks_config=(--config "${REPO_ROOT}/.gitleaks.toml")
+      fi
+      run_check \
+        "gitleaks detect --no-git --source ${FILE_PATH}" \
+        "Run: gitleaks detect --no-git --source ${FILE_PATH}" \
+        gitleaks detect --no-git --source "$FILE_PATH" --log-level error "${_gitleaks_config[@]}"
+      ;;
+  esac
 else
   : # SKIP: gitleaks not installed
 fi
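The skip-and-configure decision in this hunk can be restated as a small function. This is a minimal sketch in Python rather than the hook's own bash, and the names `gitleaks_command` and `SKIPPED_BASENAMES` are hypothetical; only the gitleaks flags and the two skipped basenames come from the hunk above.

```python
from pathlib import Path
from typing import List, Optional

# Basenames the hook skips: gitignored env files that hold real secrets by design.
SKIPPED_BASENAMES = {".env", ".env.local"}


def gitleaks_command(file_path: str, repo_root: str) -> Optional[List[str]]:
    """Return the gitleaks argv for one file, or None when the file is skipped."""
    if Path(file_path).name in SKIPPED_BASENAMES:
        return None
    cmd = ["gitleaks", "detect", "--no-git", "--source", file_path,
           "--log-level", "error"]
    config = Path(repo_root) / ".gitleaks.toml"
    if config.is_file():
        # Pass the repo config explicitly; a bare --source scan would not
        # otherwise pick up the allowlist defined there.
        cmd += ["--config", str(config)]
    return cmd
```

Returning `None` for the skipped files keeps the "do nothing" branch explicit, which mirrors the `: # SKIP` no-op in the shell version.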
15 changes: 10 additions & 5 deletions .dockerignore
@@ -39,18 +39,23 @@ LICENSE
 README.md
 *.md

-# Scripts (not needed inside container)
+# Scripts — exclude all except proto generation (needed by Dockerfile.core build)
 scripts
+!scripts/gen-ecto-proto.sh
+!scripts/gen_python_proto.py

-# Proto — raw schemas not needed, but proto/gen/elm/ has committed Elm decoders required for build
-proto/stacks
-proto/buf.yaml
-proto/buf.gen.yaml
+# Proto — exclude generated outputs but keep source schemas + buf config (needed for codegen)
+proto/gen

 # Test fixtures / images (only e2e needs them, not runtime)
 e2e
 images

+# Built static assets — generated by build.js on the runner before fly deploy.
+# Must be explicitly included because they are gitignored (build outputs),
+# and Fly's remote builder may exclude gitignored files from the context.
+!apps/core/priv/static/
+
 # Editor / OS
 .DS_Store
 .vscode
51 changes: 22 additions & 29 deletions .env.example
@@ -62,21 +62,6 @@ VISION_HMAC_SECRET=generate-a-strong-random-secret
 # Together AI API key for LLM features (used by vision sidecar)
 VISION_TOGETHER_API_KEY=together-api-key-for-llm

-# =============================================================================
-# Object storage (Cloudflare R2 — S3-compatible)
-# =============================================================================
-
-# Cloudflare account ID (used to construct the R2 endpoint URL)
-R2_ACCOUNT_ID=your_cloudflare_account_id
-
-# Access credentials for the R2 storage bucket
-# Create via: Cloudflare dashboard → R2 → Manage R2 API tokens
-R2_ACCESS_KEY_ID=your_r2_access_key
-R2_SECRET_ACCESS_KEY=your_r2_secret_key
-
-# Bucket name for uploaded book images
-R2_BUCKET_NAME=stacks-images
-
 # =============================================================================
 # Scraper (Rust microservice)
 # =============================================================================
@@ -117,9 +102,9 @@ STACKS_DBT_DB_PASSWORD=your-strong-password
 # Listed here for reference only.
 # OPEN_LIBRARY_BASE_URL=https://openlibrary.org

-# Google Books API key — currently hardcoded in ISBNResolver, not read from env.
-# Listed here for reference; wire through runtime.exs when API key rotation is needed.
-# GOOGLE_BOOKS_API_KEY=your-google-books-api-key
+# Google Books API key (ISBN resolution fallback — optional, raises rate limit from 1k/day to quota)
+# Obtain from: https://console.cloud.google.com/apis/credentials (Public data → API key, restrict to Books API)
+GOOGLE_BOOKS_API_KEY=your-google-books-api-key

 # Brave Search API key for source discovery
 BRAVE_SEARCH_API_KEY=your-brave-search-api-key
@@ -137,19 +122,27 @@ FLY_API_TOKEN=your-fly-api-token
 # =============================================================================
 # Neon (preview DB branching — only needed for deploy-preview.sh / CI)
 # =============================================================================
-
-# Neon project ID — found in the Neon console under Project Settings.
-# Used by deploy-preview.sh to fork a DB branch per PR.
-# Obtain from: https://console.neon.tech → your project → Settings → General
-NEON_PROJECT_ID=your-neon-project-id
-
-# Neon API key — used to create and delete preview branches via the Neon API.
+#
+# The Stacks uses two Neon projects with zero copy-on-write lineage:
+# - `thestacks` — production data. Prod Fly app only.
+# - `thestacks-staging` — staging branch + every preview/<pr> branch.
+# Previews are CoW children of `staging` inside `thestacks-staging`, so they
+# inherit migrations + dev fixtures with zero chance of leaking prod data.
+# See docs/deployment/NEON_BRANCH_TOPOLOGY.md for the full architecture.
+
+# Neon project ID for the `thestacks-staging` project (NOT the prod project).
+# Previews are created as branches inside this project.
+# Obtain from: https://console.neon.tech → thestacks-staging → Settings → General
+NEON_STAGING_PROJECT_ID=your-neon-staging-project-id
+
+# Neon API key scoped to the staging project (or an account-level key).
+# Used by deploy-preview.sh to create/delete preview branches.
 # Obtain from: https://console.neon.tech → Account → API Keys → New API Key
-NEON_API_KEY=your-neon-api-key
+NEON_STAGING_API_KEY=your-neon-staging-api-key

-# Name of the Neon branch used as parent for preview branches.
-# Preview branches inherit this branch's data (fixture data only — no production data).
-# Default: staging. See docs/deployment/NEON_BRANCH_TOPOLOGY.md for the branch hierarchy.
+# Name of the parent branch for preview creation inside `thestacks-staging`.
+# Default: `staging` — a branch containing migrations + the dev fixture set.
+# See docs/deployment/NEON_BRANCH_TOPOLOGY.md.
 NEON_PARENT_BRANCH=staging

 # =============================================================================
1 change: 1 addition & 0 deletions .envrc
@@ -0,0 +1 @@
use flake
45 changes: 45 additions & 0 deletions .github/actions/check-slo-gate/action.yml
@@ -0,0 +1,45 @@
name: "Check SLO gate (post-deploy health)"
description: >
  Wraps scripts/check-slo-gate.sh. Scrapes /internal/metrics for the
  PROBE_WINDOW_SECONDS window (default 600s = 10 min), runs
  probe-production.sh in parallel, computes SLIs against thresholds.
  Exit 0 iff every SLI is healthy. Same SLI definitions used at both
  deploy-time gating AND post-rollback verification — operators can
  also re-run the underlying script manually to distinguish
  genuinely-unhealthy state from probe flakiness.
inputs:
  out-path:
    description: "Path to write the gate-observations.json artifact."
    required: false
    default: gate-observations.json
  probe-window-seconds:
    description: >
      Window size in seconds. Default 600 (10 min). Use the same value
      across deploy-time and post-rollback verification so the SLI
      definitions stay aligned.
    required: false
    default: "600"
  force-breach:
    description: >
      For testing only. When set to a known SLI name (e.g.
      "beam_memory_mb"), forces that SLI to report breached=true
      regardless of actual measurements. Used by the workflow's
      force_rollback dispatch input to exercise the rollback path
      without a real regression.
    required: false
    default: ""
runs:
  using: composite
  steps:
    - id: check-slo-gate
      name: Run SLO gate script
      shell: bash
      # Inputs flow through env: rather than inline ${{...}} interpolation
      # so a malicious value cannot escape the bash context. Defense in
      # depth — same env-indirection pattern used by the rollback action.
      env:
        OUT_PATH: ${{ inputs.out-path }}
        PROBE_WINDOW_SECONDS: ${{ inputs.probe-window-seconds }}
        FORCE_BREACH: ${{ inputs.force-breach }}
      run: |
        bash "${{ github.action_path }}/../../../scripts/check-slo-gate.sh" --out "$OUT_PATH"
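The gate semantics described in this action (exit 0 only if every SLI is healthy, with `force-breach` overriding one SLI) can be sketched as below. This is an illustration in Python, not the contents of `scripts/check-slo-gate.sh`. It assumes every SLI is an upper-bound check and that a missing measurement counts as a breach; the function names and threshold values are hypothetical, and only the SLI names `beam_memory_mb` and `vision_fuse_open` appear in this PR.

```python
from typing import Dict, List, Optional


def evaluate_gate(
    observed: Dict[str, float],
    thresholds: Dict[str, float],
    force_breach: str = "",
) -> List[dict]:
    """Compare each observed SLI with its threshold and flag breaches."""
    results = []
    for name, limit in thresholds.items():
        value: Optional[float] = observed.get(name)
        # Assumption: a missing sample is treated as a breach, so an empty
        # scrape can never pass the gate.
        breached = value is None or value > limit
        if name == force_breach:
            # Testing hook: report this SLI as breached regardless of data.
            breached = True
        results.append(
            {"sli": name, "value": value, "threshold": limit, "breached": breached}
        )
    return results


def gate_exit_code(results: List[dict]) -> int:
    """Return 0 iff every SLI is healthy, else 1."""
    return 0 if all(not r["breached"] for r in results) else 1
```

Because the deploy-time gate and the post-rollback check would call the same function with the same thresholds, a result that differs between the two runs reflects the measurements rather than a change in definitions.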
176 changes: 176 additions & 0 deletions .github/actions/rollback-production/README.md
@@ -0,0 +1,176 @@
# `rollback-production` composite action

Wraps `scripts/rollback-production.sh` so its secret dependencies are
**declarative inputs** — every secret the script reads is named at the
call site instead of inherited silently from the surrounding `env:`
block. Reusable from `deploy-production.yml`'s SLO-gate failure path
and from any future `workflow_dispatch` operator-initiated rollback.

## What it does

Three rollback legs, **executed in this order**:

1. **Core image** — `fly deploy --image $CORE_PREV_IMAGE` against
`$CORE_APP`, then waits on `/api/health` via `fly proxy`.
2. **Neon DB** (optional) — `POST /branches/{id}/restore` resets the
prod branch to the captured pre-migrate LSN. The pre-rollback state
is preserved as a `pre-rollback-<sha7>-<ts>` Neon branch (free
safety net).
3. **Modal vision** (optional) — clones `origin-remote` at
`$MODAL_PREV_COMMIT`, runs `modal deploy apps/vision/modal_app.py`
to revert the Modal app to the previous revision.

### Ordering invariant

Core image first, then DB, then vision. The order follows from which
image/schema pairing is safe in each direction (see
[`docs/runbooks/vision-service-rollback.md`](../../../docs/runbooks/vision-service-rollback.md)
for the long form):

- **Image N-1 ↔ schema N** is **safe** by construction. The
`migration-safety` lint enforces expand-contract migrations, so
the post-migrate schema is forward-compatible with the previous
image. New columns are unused; no read/write conflicts.
- **Image N ↔ schema N-1** is **unsafe**: image N may write columns
that don't exist in the older schema → INSERT/UPDATE failures.

So we revert the image *first* (entering the safe corner), then the
DB, then vision (which is stateless w.r.t. the DB schema).
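
The ordering argument above can be checked mechanically. The sketch below is illustrative only and is not part of `scripts/rollback-production.sh`: it models the image and schema as version labels (`N`, `N-1`), walks a given leg order from the post-deploy state, and reports whether any intermediate state is the unsafe pairing. The function names `is_unsafe` and `walk` are hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative model only: versions are labels, nothing is deployed.
set -euo pipefail

# Unsafe corner per the invariant: new image (N) running on old schema (N-1).
is_unsafe() {
  local image="$1" schema="$2"
  [[ "$image" == "N" && "$schema" == "N-1" ]]
}

# Walk a leg order (e.g. "core db") starting from the post-deploy state
# (image N, schema N) and print the state reached after each leg.
# Returns 1 as soon as a leg lands in the unsafe corner.
walk() {
  local image="N" schema="N" leg
  for leg in "$@"; do
    case "$leg" in
      core) image="N-1" ;;
      db)   schema="N-1" ;;
    esac
    if is_unsafe "$image" "$schema"; then
      echo "after $leg: image=$image schema=$schema UNSAFE"
      return 1
    fi
    echo "after $leg: image=$image schema=$schema ok"
  done
}

walk core db           # the order this action uses
walk db core || true   # the reversed order passes through the unsafe corner
```

Reverting the image first moves the stack to (N-1, N), which the expand-contract rule makes safe; reverting the DB first would leave image N running against schema N-1 until the core leg finished.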

## Inputs

### Required

| Input | Used by | Example |
|---|---|---|
| `core-prev-image` | core leg | `registry.fly.io/thestacks-core@sha256:abc…` |
| `fly-api-token` | core leg (`fly deploy`) | `${{ secrets.FLY_API_TOKEN }}` |
| `rollback-reason` | audit log + stdout | `"SLO gate breached: vision_fuse_open=1"` |
| `failed-sha` | audit log (`metadata.failed_sha`) | `${{ github.sha }}` |
| `triggered-by` | audit log (`metadata.triggered_by`) | `slo-gate` \| `manual` \| `step-failure` \| `migration-failure` |
| `database-url` | audit log INSERT | `${{ secrets.DATABASE_URL }}` |
| `cloak-key` | audit-metadata encryption | `${{ secrets.CLOAK_KEY }}` |

### Optional (with defaults)

| Input | Default | Notes |
|---|---|---|
| `core-app` | `thestacks-core` | Fly app name. |
| `modal-app` | `thestacks-vision` | Modal prod app name. |
| `modal-prev-commit` | `""` | Empty = skip Modal rollback (bootstrap, see below). |
| `modal-token-id` | `""` | **Required when** `modal-prev-commit` is set; else unused. |
| `modal-token-secret` | `""` | **Required when** `modal-prev-commit` is set. |
| `origin-remote` | `https://github.com/erinversfeld/thestacks.git` | Git remote for Modal commit checkout. |
| `neon-project-id` | `""` | **Required when** `pre-migrate-lsn` is set. |
| `neon-api-key` | `""` | **Required when** `pre-migrate-lsn` is set. |
| `neon-branch-id` | `""` | **Required when** `pre-migrate-lsn` is set. |
| `pre-migrate-lsn` | `""` | Empty = skip DB rollback (image-only — see below). |
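
The "required when" rules in the table can be expressed as a small bash check. This is a minimal sketch of what a `validate-inputs` step might do, not the action's actual implementation: it assumes each input is exposed as an upper-snake-case environment variable, and the helper names `require_with` and `validate_inputs` and the `FAIL validate-inputs:` message are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of conditional-input validation (assumed env var names).

# require_with TRIGGER DEP...: if TRIGGER is non-empty, every DEP must be too.
require_with() {
  local trigger="$1" dep missing=0
  shift
  [[ -n "${!trigger:-}" ]] || return 0   # trigger empty: the leg is skipped
  for dep in "$@"; do
    if [[ -z "${!dep:-}" ]]; then
      echo "FAIL validate-inputs: $dep is required when $trigger is set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Check both optional legs and report every missing dependency in one pass.
validate_inputs() {
  local rc=0
  require_with MODAL_PREV_COMMIT MODAL_TOKEN_ID MODAL_TOKEN_SECRET || rc=1
  require_with PRE_MIGRATE_LSN NEON_PROJECT_ID NEON_API_KEY NEON_BRANCH_ID || rc=1
  return "$rc"
}
```

A step would call `validate_inputs || exit 1` before invoking the script. Collecting all missing names before returning lets an operator fix every omission in one pass.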

### Outputs

| Output | Values |
|---|---|
| `core-rolled-back` | `true` (rolled back), `false` (skipped — image already current), `error` (leg failed). |
| `modal-rolled-back` | `true`, `false` (skipped — `modal-prev-commit` empty), `error`. |
| `db-rolled-back` | `true`, `false` (skipped — `pre-migrate-lsn` empty), `error`. |

## Bootstrap edge cases

Both Modal and DB rollback are optional **by design**. The first
deploy on a brand-new prod stack and certain operator-suppressed
flows leave the corresponding inputs empty; the script skips that leg
and exits cleanly rather than failing:

- **No `main-<sha>` tag yet** → `modal-prev-commit` is empty. The
script prints `WARN rollback: MODAL_PREV_COMMIT is unset` and
completes a **core+DB-only rollback**. Output:
`modal-rolled-back=false`. Subsequent deploys (after `tag-main.yml`
stamps a tag) will roll vision back normally.
- **No pre-migrate LSN captured** → `pre-migrate-lsn` is empty (e.g.
the deploy ran without migrations, or operator override). The
script prints `WARN rollback: PRE_MIGRATE_LSN unset` and completes
a **core+vision-only rollback** (the DB leg is skipped). Output:
`db-rolled-back=false`.

Neither case is a failure — both are documented partial-rollback
paths.
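
The skip behaviour for an optional leg can be sketched as follows. This is an assumption about the shape of the logic, not a copy of `scripts/rollback-production.sh`: the helper name `optional_leg_status` is hypothetical, the leg body is a placeholder, and the warning text only approximates the messages quoted above.

```bash
#!/usr/bin/env bash
# Sketch of the optional-leg skip pattern; the real leg bodies are omitted.

# Prints the status an optional leg would report on stdout:
#   "false" (plus a warning on stderr) when its trigger variable is empty,
#   "true" once the leg has run.
optional_leg_status() {
  local var_name="$1"
  if [[ -z "${!var_name:-}" ]]; then
    echo "WARN rollback: $var_name is unset" >&2
    echo "false"
    return 0   # a skipped leg is not a failure
  fi
  # ... the real leg (Neon restore or modal deploy) would run here ...
  echo "true"
}
```

For example, `db_status="$(optional_leg_status PRE_MIGRATE_LSN)"` yields `false` on a deploy that captured no LSN, and the function still returns 0 so the remaining legs continue.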

## Failure modes

The action exits non-zero in each of the cases below. When it does,
`log-audit` does **not** run, so a missing audit row signals that the
rollback did not complete:

| Cause | Detection | Output |
|---|---|---|
| Required env missing | `validate-inputs` step's bash assertions | exit 1 before script runs |
| `fly deploy` fails | script exits 1 with `FAIL rollback: fly deploy (core) failed` | `core-rolled-back=error` |
| Neon restore HTTP non-2xx | script exits 1 with `FAIL rollback: Neon restore returned HTTP <code>` | `db-rolled-back=error` |
| Modal deploy fails | script exits 1 with `FAIL rollback: modal deploy …` | `modal-rolled-back=error` |
| `validate-inputs` fails (e.g. `pre-migrate-lsn` set without Neon vars) | bash `exit 1` | all three outputs `error` |

`emit-outputs` always runs (`if: always()`) so the workflow can read
the per-leg status even on failure. The audit row is the source of
truth for "did rollback complete?" — its **absence** indicates the
action exited before reaching `log-audit`.
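
A minimal sketch of an always-run output step, assuming earlier steps record each leg's status in environment variables. The names `CORE_STATUS`, `DB_STATUS`, `MODAL_STATUS`, and `emit_outputs` are hypothetical; `GITHUB_OUTPUT` is the file GitHub Actions provides for step outputs.

```bash
#!/usr/bin/env bash
# Sketch of an emit-outputs step (assumed status variable names).

# Append one key=value line per leg to the step-output file. A leg that
# never reported a status defaults to "error", matching the case where
# validation fails before any leg starts.
emit_outputs() {
  local out_file="${GITHUB_OUTPUT:?GITHUB_OUTPUT must be set}"
  {
    echo "core-rolled-back=${CORE_STATUS:-error}"
    echo "db-rolled-back=${DB_STATUS:-error}"
    echo "modal-rolled-back=${MODAL_STATUS:-error}"
  } >> "$out_file"
}
```

Defaulting to `error` rather than `false` keeps "skipped by design" distinguishable from "never reached" when a caller reads the outputs after a failure.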

## How to invoke from `workflow_dispatch`

The Phase 4 workflow change adds a `manual_rollback` boolean to
`deploy-production.yml`'s `workflow_dispatch:` inputs. When set, the
workflow short-circuits the deploy + gate steps and goes straight to
this composite action:

```yaml
on:
  workflow_dispatch:
    inputs:
      manual_rollback:
        description: "Roll back the prod stack without running a deploy first."
        type: boolean
        default: false

jobs:
  rollback:
    if: ${{ inputs.manual_rollback }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # full history so the main-* tags are present locally
      - name: Resolve previous-state SHAs
        id: prev
        run: |
          PREV_TAG=$(git tag --list 'main-*' --sort=-creatordate | head -1)
          # …extract CORE_PREV_IMAGE + MODAL_PREV_COMMIT from the tag…
      - name: Rollback production stack
        uses: ./.github/actions/rollback-production
        with:
          core-prev-image: ${{ steps.prev.outputs.core-image }}
          modal-prev-commit: ${{ steps.prev.outputs.modal-commit }}
          modal-token-id: ${{ secrets.MODAL_TOKEN_ID }}
          modal-token-secret: ${{ secrets.MODAL_TOKEN_SECRET }}
          fly-api-token: ${{ secrets.FLY_API_TOKEN }}
          neon-project-id: ${{ secrets.NEON_PROJECT_ID }}
          neon-api-key: ${{ secrets.NEON_API_KEY }}
          neon-branch-id: ${{ steps.prev.outputs.neon-branch-id }}
          pre-migrate-lsn: "" # manual rollbacks skip DB by default
          rollback-reason: "Manual rollback by @${{ github.actor }}"
          failed-sha: ${{ github.sha }}
          triggered-by: manual
          database-url: ${{ secrets.DATABASE_URL }}
          cloak-key: ${{ secrets.CLOAK_KEY }}
```

For the full operator procedure (when to manual-rollback, what to
expect, post-rollback checks) see the runbook at
[`docs/runbooks/manual-rollback.md`](../../../docs/runbooks/manual-rollback.md)
(landing in Phase 6 of Issue #137).

## See also

- [`docs/runbooks/vision-service-rollback.md`](../../../docs/runbooks/vision-service-rollback.md)
— rationale for the core → DB → vision ordering invariant.
- [`scripts/rollback-production.sh`](../../../scripts/rollback-production.sh)
— the script this action wraps; canonical env-var contract.
- [`apps/core/lib/stacks/audit.ex`](../../../apps/core/lib/stacks/audit.ex)
— `Stacks.Audit.log_rollback/1`, the audit + telemetry helper invoked
by the `log-audit` step.
- Issue [#137](../../../issues/137-rollback-action-composite.md) —
full design rationale and DoD checklist.