
v2.2.0 — target architecture + cooldown + publish-date registry + serve-before-commit #34

Open

aydasraf wants to merge 329 commits into master from 2.2.0

Conversation


aydasraf (Collaborator) commented on Apr 16, 2026

v2.2.0 — target-architecture train + cooldown metadata filtering + publish-date registry + serve-before-commit

Summary

This PR lands the v2.2 target architecture (9 work items), cooldown metadata filtering across all 7 adapters, a publish-date registry that replaces the per-adapter cooldown inspectors, the serve-before-commit performance pattern across all proxy adapters, a search & browse overhaul, the group RaceSlice priority fix, ECS log field compliance, and preemptive Basic auth for Maven/Gradle.

Highlights

  • Serve-before-commit: writeAndVerify() returns a VerifiedArtifact after streaming the upstream body to a verified temp file — the HTTP response streams immediately while commitAsync() persists to cache in the background (see the sketch after this list). Wired into Maven, Go, Composer, PyPI.
  • Parallel sidecar saves: CompletableFuture.allOf() replaces sequential thenCompose chain for checksum sidecar writes.
  • Preemptive Basic auth: Eliminates 401 challenge-response round-trip on every Maven/Gradle artifact fetch.
  • Publish-date registry: DbPublishDateRegistry (Caffeine L1 + Postgres L2 + 6 upstream source implementations) replaces all per-adapter CooldownInspector classes. RegistryBackedInspector is the single inspector.
  • Search & Browse overhaul: Natural name sort, artifact classification (excludes metadata/checksum noise), DB-hydrated tree browser with sort-by-size/date/name, upload-date display.
  • Group RaceSlice priority: 2xx > 403 > 5xx > 404 outcome selection; upstream 4xx propagated as 404 for correct fallback; 502 not 503 for upstream failures.
  • ECS log field compliance: 488 non-compliant fields migrated across all modules.
  • ComposerGroupSlice fix: providers field type guard prevents ClassCastException when Packagist returns [] instead of {}.
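
A minimal sketch of the serve-before-commit shape from the first highlight, under assumed signatures (the real ProxyCacheWriter / VerifiedArtifact API in this PR may differ):

```java
// Hypothetical sketch of serve-before-commit: stream the upstream body to a
// temp file, verify it, respond from the temp file immediately, and persist
// to the cache asynchronously. All names below are illustrative only.
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;

final class ServeBeforeCommitSketch {

    record VerifiedArtifact(Path tempFile, long size) {}

    interface Cache {
        void put(String key, Path file); // blocking cache write
    }

    // Streams upstream bytes to a temp file; checksum verification elided.
    VerifiedArtifact writeAndVerify(byte[] upstreamBody) throws Exception {
        Path tmp = Files.createTempFile("artifact-", ".part");
        Files.write(tmp, upstreamBody);
        // ... verify sidecar checksums against the temp file here ...
        return new VerifiedArtifact(tmp, upstreamBody.length);
    }

    // The caller serves the client from the verified temp file, then commits
    // to the cache in the background.
    CompletableFuture<Void> commitAsync(Cache cache, String key, VerifiedArtifact artifact) {
        return CompletableFuture.runAsync(() -> cache.put(key, artifact.tempFile()));
    }
}
```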

Target architecture work items (9)

  • WI-00 — queue overflow + access-log level policy
  • WI-01 — Fault + Result sum types
  • WI-02 — RequestContext + Deadline + ContextualExecutor
  • WI-03 — StructuredLogger 5-tier + LevelPolicy + AuditAction
  • WI-04 — GroupResolver replaces GroupSlice at every production site
  • WI-05 — SingleFlight coalescer
  • WI-07 — ProxyCacheWriter + Maven checksum integrity
  • WI-post-05 — retire RequestDeduplicator
  • WI-post-07 — wire ProxyCacheWriter into pypi/go/composer

Cooldown metadata filtering (8 phases)

Two-layer enforcement (soft metadata filter + hard 403) for Maven, npm, PyPI, Docker, Go, Composer, and Gradle, with 5 performance hardenings and a SOLID package restructure.
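
A rough sketch of the two-layer idea under assumed interfaces (not the shipped filter or permission classes): the soft layer removes cooled-down versions from metadata so clients never select them; the hard layer answers a direct request for a blocked version with 403.

```java
// Illustrative only: the soft layer rewrites metadata (version lists), the
// hard layer rejects direct artifact requests for blocked versions with 403.
import java.util.List;
import java.util.function.Predicate;

final class TwoLayerCooldownSketch {

    // Soft layer: drop versions still inside the cooldown window from metadata.
    static List<String> filterMetadata(List<String> versions, Predicate<String> isBlocked) {
        return versions.stream().filter(v -> !isBlocked.test(v)).toList();
    }

    // Hard layer: a direct GET of a blocked version is refused outright.
    static int artifactStatus(String version, Predicate<String> isBlocked) {
        return isBlocked.test(version) ? 403 : 200;
    }
}
```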

Additional features

  • Circuit breaker: rate-over-sliding-window replaces consecutive-count
  • Auth cache L1/L2 + cluster-wide invalidation
  • GroupMetadataCache stale 2-tier (aid-not-breaker)
  • Cooldown history/retention/fallback/admin UI
  • Auth-settings server-side enforcement
  • UI: searchable group member picker, inline repo creation
  • HTTP/3 optional PROXY protocol v2

Test plan

  • mvn -T8 test — all modules green (4900+ tests, 0 failures)
  • Docker smoke test: docker compose up -d && curl -u ayd:ayd http://localhost:8081/maven_group/...
  • Maven/Gradle preemptive auth: verify no 401s in access log
  • PHP group: curl -u ayd:ayd http://localhost:8081/php_group/packages.json returns 200
  • Search: verify artifact classification excludes metadata noise
  • Cooldown: verify publish-date registry resolves dates from upstream registries
  • git log --format='%B' | grep -c '^Co-Authored-By:' returns 0

aydasraf added a commit that referenced this pull request Apr 16, 2026
…I-02, WI-03)

Refreshes the three release artefacts produced by the final
end-to-end reviewer after the Wave 3 commits landed on 2.2.0:

  CHANGELOG-v2.2.0.md (144 L)
    Adds Wave 3 entries to Highlights / Added / Changed /
    Deprecated / Under-the-hood.  Version-bump, BaseCachedProxySlice
    SingleFlight migration, pypi/go/composer ProxyCacheWriter
    wiring, RequestContext expansion + Deadline + ContextualExecutor,
    StructuredLogger 5-tier + LevelPolicy + AuditAction, and the
    @deprecated MdcPropagation status — all documented with forensic
    and architecture-review section refs.

  docs/analysis/v2.2.0-pr-description.md (174 L)
    PR #34 body; WI checklist now shows 8 shipped / 6 deferred;
    test-run evidence 3,432 tests green; five PR-reviewer focus
    points (remaining MdcPropagation callers, lost user_agent sub-
    field parsing, audit-logger suppressibility gap in log4j2.xml,
    DbIndexExecutorService submit()-path bypass, four-adapter
    "any exception → 404" swallow inherited from Maven).

  docs/analysis/v2.2-next-session.md (399 L)
    Refreshed agent-executable task list.  Removes the four
    shipped items (WI-post-05, WI-post-07, WI-02, WI-03).  Keeps
    WI-04 / WI-06 / WI-06b / WI-08 / WI-09 / WI-10 in the same
    Goal / Files / Tests / DoD / Depends-on shape.  Adds four
    WI-post-03 follow-ups surfaced during Wave 3:
      a. Hoist DbIndexExecutorService to pantera-core/http/
         context/ContextualExecutorService.
      b. Re-lift user_agent.name / .version / .os.name parsing
         into StructuredLogger.access.
      c. Unify the ~110 remaining MdcPropagation call-sites
         after WI-06 + WI-08 + the Vert.x-handler migration,
         then delete MdcPropagation.java.
      d. Migrate 11 Vert.x API handlers (AdminAuth, Artifact,
         Auth, Cooldown, Dashboard, Pypi, Repository, Role,
         Settings, StorageAlias, User) to a ContextualExecutor-
         wrapped worker pool — the single biggest MdcPropagation
         debt.
    Adds one new concern:
      C6. Audit logger inherits log-level config from
          com.auto1.pantera parent — §10.4 declares audit as
          "non-suppressible" but log4j2.xml has no dedicated
          block.  Five-line fix tracked separately.

Review verdict: PASS.  Every §12 DoD met.  Every commit conforms
to type(scope): msg, zero Co-Authored-By trailers across all 11
new commits (verified via git interpret-trailers --only-trailers).
3,432 tests green across pantera-core / pantera-main / every
touched adapter module.
aydasraf added 28 commits April 20, 2026 14:57
…load-bearing

CooldownHandler.blocked():
  - Deferred crs.listAll() (JDBC) and policy.getPermissions(authUser)
    (CachedYamlPolicy falls back to blocking storage on cache miss) into
    the CompletableFuture.supplyAsync closure. Only AuthUser extraction
    from the routing context — cheap, in-memory — stays on the event loop.
  - Matches the 2.2.0 HandlerExecutor discipline: no blocking I/O on
    vert.x-eventloop threads.

CooldownHandlerFilterTest.requestRaw():
  - Wrapped the onSuccess assertion callback in ctx.verify(...) so
    AssertionError is routed to failNow instead of being swallowed and
    masquerading as a TEST_TIMEOUT. Without this, a broken assertion in
    any of the 5 non-auth tests would produce a silent 10s TimeoutException
    instead of a diagnostic failure. Verified by temporarily injecting
    assertEquals(42, 43) — the failure now surfaces as
    "expected: <42> but was: <43>" in ~1s.
Wires CooldownCleanupFallback into VertxMain startup. After
RepositorySlices is constructed (which triggers
CooldownSupport.loadDbCooldownSettings so settings.cooldown() is
authoritative), probe pg_cron via PgCronStatus. If the scheduled
cleanup job is missing, construct a local CooldownRepository +
start the fallback bound to the shared Vertx instance. On shutdown
(before vertx.close()), cancel both timers.

Placed at the bootstrap level — not inside AsyncApiVerticle — so it
runs once per process rather than once per deployed verticle
instance (AsyncApiVerticle is deployed with 2x CPU instances).
…sion

Adds GET /api/v1/cooldown/history, a paginated read over the
artifact_cooldowns_history archive. Mirrors /cooldown/blocked in shape
(repo + repo_type + search + sort query params, paginated envelope) but
returns the archive-specific fields (archived_at, archive_reason,
archived_by) and sorts by archived_at by default.

Introduces ApiCooldownHistoryPermission as a separate API-level gate so
operators can grant live-blocked visibility without exposing the
who-unblocked-what archive. The handler additionally scopes rows to
repos the caller has AdapterBasicPermission(repo, "read") on, mirroring
the two-layer permission model from the blocked endpoint.

- new ApiCooldownHistoryPermission + ApiCooldownHistoryPermissionFactory
  (mask-based, same package as ApiCooldownPermission — auto-discovered
  by PermissionsLoader via @PanteraPermissionFactory)
- CooldownHandler.history() uses the SQL-pushed
  findHistoryPaginated / countHistory from Task 3
- CooldownHandlerHistoryTest: 5 tests covering 403 without history
  perm, per-repo scoping, combined repo/repo_type filter, pagination
  total correctness, archive-field serialisation
…etConfig fields, V121 CREATE EXTENSION)

Four backend fixes surfaced during smoke test of the 2.2.0 cooldown rollout:

1. V121: drop redundant CREATE EXTENSION IF NOT EXISTS pg_cron inside the DO
   block. V114 already creates the extension via the same pattern; re-running
   emits a NOTICE that Flyway logs at WARN. The outer
   `pg_available_extensions` guard already returns early when pg_cron is
   unavailable, so this DO block can assume the extension is present.

2. CooldownRepository: repo_type filter now matches both exact and
   subtype-prefix (`base-*`). The UI sends base values ("docker", "npm",
   "maven") while DB rows store "docker-proxy", "docker-group", etc.
   Updated findActivePaginated, countActiveBlocks, findHistoryPaginated,
   and countHistory to use
   `LOWER(repo_type) = LOWER(?) OR LOWER(repo_type) LIKE LOWER(?) || '-%'`.
   bindOptionalFilters now binds the repoType value three times.

3. Deleted ApiCooldownHistoryPermission + factory. The permission was
   overengineered and never wired into the role-management UI, so admins
   could not grant it and the /history endpoint was dark. Routed
   /cooldown/history through ApiCooldownPermission.READ instead; the
   per-repo AdapterBasicPermission row filter in the handler body is
   unchanged.

4. getConfig(): include history_retention_days and cleanup_batch_limit in
   the response so the SettingsView dialog can prefill the inputs on
   initial page load.

Test adjustments: CooldownHandlerHistoryTest PermissionSpec simplified
(no more historyRead flag); the 403-without-permission test now asserts
the gate is ApiCooldownPermission.READ. Added
blockedFiltersByRepoTypeBase to CooldownHandlerFilterTest covering the
base -> subtype prefix match. Cooldown test suite: 60 passing (was 59).
…r labels, docker filter, history toggle gating

- Delete CooldownSettingsDialog; move history_retention_days and
  cleanup_batch_limit inputs into the existing SettingsView cooldown card.
- Replace the card-grid list of cooldown-enabled repos with a compact
  PrimeVue DataTable matching the blocked-artifacts table style.
- Add visible labels ("Search", "Repository", "Type") above the filter
  dropdowns in CooldownView; wire proper ids/for attributes.
- Gate the history toggle on api_cooldown_permissions.read instead of
  the removed api_cooldown_history_permissions (backend commit 9149d78
  consolidated the permission; per-repo AdapterBasicPermission still
  filters rows server-side).
- Update CooldownView.test.ts: remove stale history-permission seed
  entries, split the hide/show toggle cases, delete tests for the
  removed dialog.
…and role UI

Revives ApiCooldownHistoryPermission (and its @PanteraPermissionFactory)
that was collapsed into ApiCooldownPermission in 9149d78. Operators need
to expose the live blocked list without also exposing the long-term
archive, which is the reason for the separate permission.

End-to-end wiring:
- CooldownHandler gates /api/v1/cooldown/history on the new permission.
- AuthHandler.allowedActions includes api_cooldown_history_permissions in
  the /me response so the frontend can reflect grants in hasAction.
- RoleListView allActionsMap lists the new key so it is grantable via UI.
- CooldownView.canReadHistory checks the new key (not the base cooldown
  read), and the test suite exercises grant/withhold of the narrower
  permission independently.

AllPermission.implies still covers admin out of the box, so seeded admin
roles see the toggle automatically.
…unified filters

Replace the full-width DataTable under "Cooldown-Enabled Repositories" with a
responsive tile grid (1/2/3/4 cols from mobile up to lg). Each tile shows a
brand-colour dot, the repo name, type and cooldown duration, and an
active-blocks footer with an eraser icon for one-click "unblock all" (gated
on canWrite and active_blocks > 0). The same search/repo/type filter bar that
drives the blocked-artifacts table now also filters the grid client-side,
including base-type matching so "docker" covers both docker-proxy and
docker-group. Client-side pagination at 12 tiles per page keeps the card
compact for large cooldown fleets.

Unblock-all now routes through a confirmation Dialog using the existing
useConfirmDelete composable pattern (same as RepoManagementView and
StorageAliasView) instead of firing immediately from a row action.
…med accents

- Promote search / repo / type / mode controls to a single top-level
  filter bar; remove the duplicate toolbar that lived inside the
  Blocked Artifacts card.
- Tile grid: bump to xl:grid-cols-5 and shrink repoPageSize from 12
  to 10 (two rows of five). Paginator threshold follows.
- Replace the inline getTechInfo() brand-color dot with the shared
  RepoTypeBadge component, matching RepoListView's presentation.
  Drop the now-unused getTechInfo import.
- Switch tile background to bg-surface-card for theme-token parity
  with other admin cards.
- Tests: update paginator threshold (>10 / <=10), add coverage for
  RepoTypeBadge usage and for absence of local filter UI inside the
  blocked-artifacts card.
…on gap

Adds a cooldown-aware rewrite for the Go module proxy /@latest endpoint.
When `go get <module>` runs without a pseudo-version the client hits
/@latest first and never consults /@v/list, so a list-only filter left
the primary unbounded resolution path unprotected. The new handler
intercepts /@latest in the Go CachedProxySlice, fetches upstream,
checks the returned Version against cooldown, and — when blocked —
fetches the sibling /@v/list, picks the highest non-blocked version per
Go semver (VersionComparators.semver(), which tolerates pseudo-versions
and the leading "v"), and returns a rewritten JSON payload preserving
the Origin field and clearing Time. When every version is blocked it
returns 403 with the same Go-client-parseable convention as the
existing GoCooldownResponseFactory.

New SPI implementations (parser/filter/rewriter/detector) live alongside
the existing /@v/list components and are wired through GoLatestHandler
rather than the single-bundle CooldownAdapterRegistry slot — the
multi-endpoint fallback is not expressible in the pure MetadataFilter
SPI. /@v/list filter behaviour is unchanged.

Test counts: 48 new unit tests across 5 files; full go-adapter suite
126 pass (0 regressions); pantera-main *Cooldown* suite 60 pass;
pantera-core metadata/filter 57 pass.
…lly filtered

GoMetadataFilter / GoMetadataParser were registered in CooldownWiring but
never invoked on the serve path, so blocked versions leaked to
`go list -m -versions`, `go mod download`, and MVS resolution.

Introduce GoListHandler mirroring GoLatestHandler: fetch `/@v/list`
upstream, parse via GoMetadataParser, evaluate every version against
cooldown, apply GoMetadataFilter.filter(), and re-serialise as
newline-delimited `text/plain; charset=utf-8`. Non-2xx upstreams are
forwarded unchanged. An empty-after-filter list collapses to 403 with
the same convention as the @latest all-blocked branch.

Wire the handler into CachedProxySlice before the generic
fetchThroughCache path, preserving ordering `@latest -> @v/list ->
generic`. CooldownWiring's goBundle registration stays — it is now
live via this handler and documented as such in the class Javadoc.
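
As a rough illustration of the filtering step (assumed types; the real GoListHandler / GoMetadataParser / GoMetadataFilter split differs): parse the newline-delimited body, drop blocked versions, re-serialise.

```java
// Illustrative sketch of filtering a Go proxy /@v/list body: the upstream
// response is a newline-delimited version list; blocked versions are removed
// and the remainder re-serialised as text/plain. Names are assumptions.
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class VersionListFilterSketch {

    // Returns the filtered body, or an empty string when every version is
    // blocked (the real handler collapses that case to a 403).
    static String filterList(String upstreamBody, Predicate<String> isBlocked) {
        List<String> kept = upstreamBody.lines()
            .map(String::trim)
            .filter(line -> !line.isEmpty())
            .filter(version -> !isBlocked.test(version))
            .collect(Collectors.toList());
        return String.join("\n", kept);
    }
}
```
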
aydasraf added 27 commits May 6, 2026 13:08
- Pep440VersionComparator.compareReleases: convert second arg to varargs
- PypiMetadataRewriter: pre-size html StringBuilder to 512 chars
- CachedPyProxySlice: NOPMD UnusedFormalParameter on deprecated negativeCacheTtl/Enabled overload (settings now from unified NegativeCacheConfig); drop unused 'storage' local in fetchVerifyAndCache
- IndexGenerator: drop unused absoluteUrl field on Entry; NOPMD UnusedFormalParameter on prefix ctor param (reserved for absolute-URL emission)
- ProxySlice: drop unused 'storage' (BlockingStorage) field + import; drop redundant cast on Optional<? extends Content> via type witness; drop unused params (user in backgroundRefreshIndex, rqheaders/info in serveArtifact, key in serveArtifactContent, line in extractBasePath, coords/release in registerRelease (no-op method removed entirely), rqheaders in checkCacheFirst and evaluateCooldownAndFetch); CollapsibleIfStatements in coordinatesFromFilename
- SimpleApiFormat.fromHeaders: collapse nested if
- WheelSlice.putArtifactToQueue: drop unused 'filename' param
- CacheConfig: NOPMD SystemPrintln on pre-logger config-load (server bootstrap, no logger yet)
- CooldownCache: drop unused l1AllowedTtl field (used only inside ctor)
- CooldownCircuitBreaker: collapse 4 nested if-CAS guards into compound conditions
- FilteredMetadataCache.CacheEntry.data: NOPMD MethodReturnsInternalArray (immutable cache value, mirrors ctor's ArrayIsStoredDirectly NOPMD; callers treat as read-only)
- MetadataFilterService.evaluateAndFilter: collapse blockedUntil tracking nested if
- RangeSpec.isValid: SimplifyBooleanReturns
- CombinedAuthzSlice.authenticate: drop unused 'line' param
- BaseCachedProxySlice: drop unused 'headers' param from signalToResponse and 'key' param from handleNonSuccess; NOPMD CloseResource on FileChannel (closed by Subscriber.onComplete/onError); drop unnecessary (Response) cast
- NegativeCache.nameMatches: SimplifyBooleanReturns; UseShortArrayInitializer on L2_SENTINEL
- Accept.values: UseCollectionIsEmpty (RqHeaders inherits AbstractList.isEmpty)
- EcsLogEvent.userAgent: replace for-loop-with-final-break with iterator-hasNext (AvoidBranchingStatementAsLastInLoop); NOPMD EmptyCatchBlock on Base64 decode failure (intentional fallthrough to Optional.empty)
- StructuredLogger: NOPMD UnusedLocalVariable on 'bound' AutoCloseable handle (try-with-resources holder) and EmptyCatchBlock on bindToMdc().close() catch (impl never throws)
- EcsLoggingSlice: drop unused responseSize AtomicLong + import
- FileSystemArtifactSlice.cleanup: NOPMD EmptyCatchBlock on benign close error during cleanup
- GzipSlice.gzip: NOPMD CloseResource on PipedInputStream (ownership transferred to ReactiveInputStream)
- RangeSlice.RangeLimitSubscriber: drop unused 'upstream' Subscription field
- AutoBlockRegistry: drop redundant double/long casts (single cast suffices for arithmetic promotion)
- FileVersionDetector.isVersionToken: SimplifyBooleanReturns
- AuthFromDb, AuthProviderDao, AuthSettingsDao, RepositoryDao, RoleDao, SettingsDao, StorageAliasDao, UserDao, UserTokenDao: wrap ResultSet in try-with-resources (CloseResource)
- RepositoryDao.value, RoleDao.remove, UserDao.alterPassword: NOPMD AvoidRethrowingException on intentional IllegalStateException rethrow that preserves the not-found marker (distinguishes from generic Exception wrap)
- UserTokenDao.listByUser: bump StringBuilder pre-size to 192
…t pass

- RepositorySlices: drop unused httpTuningSupplier/useLegacyHttpClientCtor fields (used only inside ctor); NOPMD CloseResource on clientSlices alias (lifecycle owned by clientLease); replace instanceof-in-catch with separate RuntimeException catch; SimplifyBooleanReturns in isProxyOrContainsProxy; drop unused 'directMembers' param from flattenMembers (4 callers updated)
- VertxMain: NOPMD CloseResource on VertxSliceServer (lifecycle owned by this.servers list)
- DockerProxyCooldownSlice: drop unused 'user' param from determineReleaseSync
- ComposerGroupSlice: drop unused cooldownMetadata/repoType fields (assigned but never read); NOPMD UnusedFormalParameter on ctor params reserved for upcoming cooldown filtering
- ApiActions: convert Action[] ctor to varargs
- AsyncApiVerticle: drop unused jwt field; NOPMD UnusedFormalParameter on jwt ctor params reserved for route protection
- AuthHandler: NOPMD ReturnEmptyCollectionRatherThanNull on findProvider (JsonObject is a record, not a collection); drop unused 'type' label param from allowedActions and convert checks to varargs (7 callers updated)
- CooldownHandler: SimplifyConditional (drop redundant null-check before instanceof); NOPMD EmptyCatchBlock on per-repo skip
- DashboardHandler: NOPMD EmptyCatchBlock on best-effort dashboard zeroed counters
- PypiHandler: drop (Void) cast via supplyAsync<Void> type witness
- RepositoryHandler: drop unused 'cooldown' field; NOPMD UnusedFormalParameter on ctor param reserved for future cooldown integration
- SearchHandler: CollapsibleIfStatements; NOPMD ReturnEmptyCollectionRatherThanNull on resolveAllowedRepos (null signals "unrestricted", empty would mean "deny everything")
- SettingsHandler: drop unused 'manageRepo' field; NOPMD UnusedFormalParameter on ctor param reserved for future repo-settings endpoints
- StorageAliasHandler: NOPMD ReturnEmptyCollectionRatherThanNull on bodyAsJson (JsonObject is a record; null signals "response already sent")
- DbGatedAuth: SimplifyBooleanReturns x2
- JwtPasswordAuthFactory: pre-size pemEncodePublicKey StringBuilder
- OktaOidcClient: NOPMD ReturnEmptyCollectionRatherThanNull on JsonObject returns; replace instanceof-in-catch with separate InterruptedException catch in fetchUserInfo
- OktaUserProvisioning: drop redundant 'existing = null' initializer (assigned in all branches)
- RsaKeyLoader: NOPMD AvoidRethrowingException on intentional IllegalStateException rethrow that preserves loader-thrown diagnostic
- UnifiedJwtAuthHandler: CollapsibleIfStatements in ACCESS-token blocklist check
- CooldownSupport: drop redundant (CooldownService) cast via type witness on map()
- JdbcCooldownService: drop unused 'inspector' param from shouldBlockNewArtifact
- ArtifactDbFactory: NOPMD CloseResource on caller-owned HikariDataSource (we only attach metrics)
…third pass

- RepositorySlices: collapse RuntimeException + Error catches via multi-catch (IdenticalCatchBranches); SimplifyBooleanReturns in isProxyOrContainsProxy; NOPMD CloseResource on clientLease assignment in php-proxy case
- AuthHandler.allowedActions: drop the Permission[] array literal at 3 call sites (UnnecessaryVarargsArrayCreation)
- BlockedThreadDiagnostics: NOPMD EmptyCatchBlock on best-effort diagnostics catches; pre-size sb StringBuilder
- GroupResolver: drop unused 'proxyMembers' field; NOPMD UnusedFormalParameter on legacy depth/timeoutSeconds (sequential-only fanout in v2.2.0) and on proxyMembers; NOPMD CloseResource on long-lived ArtifactIndex; drop unused 'memberName' param from drainBody
- ApiRoutingSlice: drop unused LIMITED_SUPPORT field + Set import
- MergeShardsSlice.hexLower: extract HEX_DIGITS to static field (LocalVariableNamingConventions); drop dual-loop-vars + reassignments (use i*2/i*2+1); drop unused 'versionFromPath' and 'parentDir' locals; drop unused 'baseUrl' from mergeHelmShards
- BrowseSlice.renderHtml: pre-size html StringBuilder
- ImportService: drop unused 'request' param from writePyPiShard and writeHelmShard; NOPMD EmptyCatchBlock on SubStorage-unwrap reflection failure; collapse redundant return-true branch in shardsModeEnabled
- DbArtifactIndex: NOPMD EmptyCatchBlock on best-effort warm-up; pre-size sb in buildFilterClauses; ForLoopCanBeForeach in queryRepoCountsMultiParam
- SearchQueryParser: NOPMD AvoidReassigningLoopVariables on intentional consume-peeked-token i++
- Http3Server: NOPMD CloseResource on QuicheServerConnector (lifecycle owned by Jetty Server via addConnector)
- Json2Yaml: drop unnecessary (YAMLMapper) cast (configure returns same type)
…al pass

- ImportService.shardsModeEnabled: collapse to single negated-equals chain (SimplifyBooleanReturns)
- PrefetchCircuitBreaker.recordDrop: collapse trip-and-CAS into compound condition
- PrefetchCoordinator.upstreamHost: NOPMD EmptyCatchBlock on intentional fallthrough
- CachedNpmMetadataLookup.readMeta, NpmPackageParser.readManifest: NOPMD ReturnEmptyCollectionRatherThanNull on byte[] returns (payload, not collection; null vs empty[] semantics differ)
- CachedDbPolicy.rolePermissions, DbUser.loadFromDb: wrap ResultSet in try-with-resources
- LoggingContext: NOPMD UnusedFormalParameter on deprecated meta ctor param (kept for source-compat)
- YamlSettings: NOPMD CloseResource on long-lived service holders (cachePubSub, ArtifactIndexCache); type-witness on map() to drop SyncArtifactIndexer cast; SimplifyBooleanReturns in proxyProtocol; chain DateTimeParseException as cause in negative-millis IllegalStateException; use addSuppressed(num) for the inner NumberFormatException (PreserveStackTrace)
- CachedUsers.onEviction, GuavaFiltersCache.onEviction: NOPMD UnusedFormalParameter on Caffeine RemovalListener<K,V> contract params
- CacheIntegrityAudit.parse: convert to varargs; NOPMD AvoidReassigningLoopVariables on intentional --root/--repo lookahead
- WebhookConfig: NOPMD UnusedAssignment on record compact-ctor param assignments (assigns to record components, not stack-locals)
ProxyCacheWriter.NON_BLOCKING_DEFAULT = {MD5, SHA256, SHA512} so the
3-arg writeAndVerify() overload (the one composer/pypi/go were using)
treats every supplied sha256/md5/sha512 sidecar as a deferred,
non-blocking check. The integrity verification then runs AFTER the
primary is already served, and a mismatch only logs — it cannot
fail-closed (502) because the response is already on the wire.

This is fine for maven (sha1 is the load-bearing blocking sidecar,
md5/sha256/sha512 are observability-only). It is NOT fine for adapters
whose ONLY sidecar is a non-blocking-default algo:
  - composer  → .sha256 only
  - go        → .ziphash (sha256) only
  - pypi      → .sha256 + .md5 + .sha512 (all non-blocking by default)

Switch those three to the 6-arg writeAndVerify(...) overload with
Collections.emptySet() so every supplied sidecar becomes load-bearing.
The API was designed for exactly this case (see NON_BLOCKING_DEFAULT
javadoc: "If a deployment requires strict .md5/.sha256/.sha512
blocking, ... pass Collections.emptySet() to make every supplied
sidecar load-bearing.").

Test coverage that was failing pre-fix and now passes:
  - composer: CachedProxySliceIntegrityTest.sha256Mismatch_rejectsWrite
  - pypi:     CachedPyProxySliceIntegrityTest.sha256Mismatch_rejectsWrite
  - go:       CachedProxySliceIntegrityTest.ziphashMismatch_rejectsWrite

These tests were pre-existing (reproduce on 3163387, the Phase 14a
checkpoint) — the bug predates v2.2.0 PMD work; the failing tests
simply lacked the wiring that would make their assertions actionable.
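
To make the blocking-set semantics concrete, a tiny sketch with hypothetical names (not the actual ProxyCacheWriter API): the non-blocking set decides whether a given sidecar algorithm can fail the write before commit, or only log after the response has been served.

```java
// Hypothetical shapes only, mirroring the commit text: algorithms in the
// non-blocking set are verified after serving (log-only); everything else is
// load-bearing and can fail the write closed before commit.
import java.util.Collections;
import java.util.Set;

final class SidecarBlockingSketch {
    enum Algo { MD5, SHA1, SHA256, SHA512 }

    // Maven-style default: SHA1 blocks, the rest are observability-only.
    static final Set<Algo> NON_BLOCKING_DEFAULT = Set.of(Algo.MD5, Algo.SHA256, Algo.SHA512);

    // Composer/go/pypi after the fix: every supplied sidecar is load-bearing.
    static final Set<Algo> ALL_BLOCKING = Collections.emptySet();

    static boolean blocksWrite(Algo supplied, Set<Algo> nonBlocking) {
        return !nonBlocking.contains(supplied);
    }
}
```
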
…System Settings

Four bugs / UX issues fixed in one pass:

1. **PATCH preflight rejected by CORS** — every save on the runtime
   tunables page was failing with a CORS error after a 204 OPTIONS
   preflight. Cause: AsyncApiVerticle's Access-Control-Allow-Methods
   listed GET/POST/PUT/DELETE/HEAD/OPTIONS but not PATCH, and
   /api/v1/settings/runtime/:key is the only PATCH endpoint we expose.
   Add PATCH to the allowed-methods list (a generic sketch follows at the
   end of this message).

2. **Performance Tuning was a separate page** — the runtime tunables
   (HTTP/2 client + pre-fetch) lived under /admin/performance-tuning,
   duplicating navigation and creating a "two settings places" UX.
   Fold them into SettingsView as two new cards: "HTTP/2 Upstream
   Tuning" and "Pre-fetch". Delete the standalone view, drop the
   sidebar entry, and turn the legacy /admin/performance-tuning route
   into a redirect to /admin/settings so old bookmarks still work.

3. **Two ambiguous "Circuit Breaker" sections** — SettingsView had
   "Circuit Breaker" (resilience4j-style upstream-failure breaker) and
   PerformanceTuning had three prefetch.circuit_breaker.* keys (drop-rate
   breaker on the dispatcher). Same word, different breakers. Rename the
   first to "Upstream Failure Circuit Breaker" with a subtitle that
   explicitly contrasts it against the pre-fetch breaker, and put the
   pre-fetch breaker keys inside the Pre-fetch card with explicit
   "Drop-rate breaker:" prefixes on each label.

4. **Drop the source ('db' / 'default') Tag chips** — they cluttered
   every row without telling ops anything actionable. Keep the underlying
   source field on the row (used to gate visibility of the "Reset to
   default" button) but stop rendering the tag.

Plumbing: extracted runtime-settings state into a useRuntimeSettings
composable so SettingsView gets the same dirty/save/reset behaviour
without inlining ~150 lines of state machinery.

All 78 UI tests pass. Server SettingsHandlerRuntimeTest (13 tests) also
green — endpoint contract unchanged.
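
For reference, a generic Vert.x-Web sketch of the fix in item 1 (this is not the project's AsyncApiVerticle wiring; CorsHandler and HttpMethod are the real Vert.x APIs, everything else is illustrative): PATCH must appear in the CORS allowed-method set, otherwise the preflight response omits it from Access-Control-Allow-Methods and the browser never sends the PATCH.

```java
// Generic Vert.x-Web example (not the project's actual wiring): include PATCH
// in the CORS allowed-method set so the OPTIONS preflight advertises it.
import io.vertx.core.Vertx;
import io.vertx.core.http.HttpMethod;
import io.vertx.ext.web.Router;
import io.vertx.ext.web.handler.CorsHandler;

import java.util.Set;

final class CorsPatchExample {
    static Router router(Vertx vertx) {
        Router router = Router.router(vertx);
        router.route().handler(
            CorsHandler.create()
                .addRelativeOrigin(".*") // origin policy elided; adjust per deployment
                .allowedMethods(Set.of(
                    HttpMethod.GET, HttpMethod.POST, HttpMethod.PUT,
                    HttpMethod.DELETE, HttpMethod.HEAD, HttpMethod.OPTIONS,
                    HttpMethod.PATCH))); // the previously-missing method
        return router;
    }
}
```
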
…cture)

Diagnosed and fixed the 429 storm from Maven Central (and the symmetric
packagist.org pattern in composer): every cache hit was firing an
upstream HEAD via the cooldown inspector's network-fallback chain.

Track 1 — 429/4xx propagation. 4xx (non-404) propagates verbatim with
Retry-After preserved; 5xx and connection errors collapse to 503 with
stale-serve attempt. 404 stays the only status cached negatively.

Track 2 — ArtifactIndexCache surgical invalidation. L1 Caffeine + L2
Valkey with positive/negative subkeys (artifact-index-positive,
artifact-index-negative). recordUpload/recordDelete invalidate only the
affected key cluster across the cluster bus, not the whole cache.
Settings under caches: in pantera.yml.

Track 3 — Integrity. Atomic primary+sidecar commit order flipped:
sidecars land before the primary so readers never observe a primary
without its sha1. Maven CachedProxySlice constructor rejects empty
storage at startup (always-verify). GroupResolver pins the winning
member per coordinate for sequential queries.

Track 4 — Stream-through + sibling prefetch. ProxyCacheWriter gains
streamThroughAndCommit(...): tees upstream Publisher<ByteBuffer> to the
client response body AND to a verifying temp file in one pass. Client
receives first byte at upstream-first-byte time instead of after the
full body drain + .sha1 round-trip. On verification mismatch the cache
stays empty (Track 3 invariant preserved). MavenSiblingPrefetcher fires
on commit: when foo-1.0.jar lands, foo-1.0.pom is background-fetched
(and reverse).

Track 5 — zero upstream I/O on cache hit. The load-bearing fix for the
user-reported 429 storm.

  1A. Cooldown gate moved inside verifyAndServePrimary after
      storage.exists in maven; evaluateMetadataCooldown removed from
      composer's serveCachedMetadata; go was already correct.

  1B. UpstreamBody record carries upstream Headers so
      enqueueEventForWriter populates publish_date from the
      authoritative Last-Modified, not now(). Fixes the Track 4
      regression where every stream-through wrote now() and made the
      next cooldown evaluation re-resolve via MavenHeadSource.

  2A. PublishDateRegistry.Mode {NETWORK_FALLBACK, CACHE_ONLY}. The
      4-arg overload is a default that delegates to the 3-arg abstract,
      preserving lambda-implementer compatibility. DbPublishDateRegistry
      overrides the 4-arg to short-circuit on CACHE_ONLY with a
      cache_only_miss metric outcome. RegistryBackedInspector grows a
      3-arg constructor for choosing the mode.

  2B. HeadProxySlice (maven + go) accepts Optional<Storage>. Cache hit
      returns 200 + Content-Length from Meta.OP_SIZE, never touching
      the upstream client. Cache miss falls through to pre-Track-5
      pass-through. Wired in MavenProxySlice and GoProxySlice.

  2D. Cross-adapter audit confirmed no other cache-hit-on-upstream
      patterns. helm/debian/rpm/conda/nuget/conan/hex/gem are
      local-only (or pure pass-through) and clean.

  3A. SwrMetadataCache<K, V> primitive in pantera-core encapsulating
      the maven MetadataCache SWR shape: soft TTL serve cached, past
      soft serve stale + async refresh dedup'd by ConcurrentHashMap
      key set, past hard treat as miss. Counters for fresh/stale/miss
      tagged by cacheName. Adapter migrations are follow-up — primitive
      is in place.

  3B. PublishDateExtractor SPI + PublishDateExtractors registry in
      pantera-core, keyed by repo-type. Maven extractor registered in
      VertxMain at boot; CachedProxySlice.buildArtifactEvent now
      consults the registry first, falls back to extractLastModified.
      Non-maven adapters fall through to the registry NO_OP for now.

Test coverage:
  - CacheHitNoUpstreamTest (maven, 2): cache-hit on .jar/.pom is local;
    no upstream calls, no inspector calls.
  - ComposerCacheHitNoUpstreamTest (1): same for composer metadata JSON.
  - HeadProxySliceCacheFirstTest (3): HEAD on cached, HEAD on miss,
    pass-through when no storage.
  - RegistryBackedInspectorCacheOnlyTest (2): mode propagation.
  - PublishDateExtractorsTest (4): registry semantics.
  - SwrMetadataCacheTest (5): fresh-hit / soft-stale / hard-stale /
    dedup / absent-then-cached.
  - Renamed verifyAndServePrimaryBlocksEvenWhenCacheHasVersion to
    verifyAndServePrimaryCacheHitIsLocalEvenWhenBlockIsActive; new
    assertion: cooldown is NOT evaluated on cache hit.

Acceptance: with a warm cache, a `mvn dependency:resolve -U` walk makes
zero upstream calls for cached artifacts and HEADs. Cooldown evaluation
happens exactly once per (artifact, version) on first fetch. Mutable
per-package index metadata continues to refresh via async SWR (off the
hot path) per the user's serve-stale + background-refresh contract.

PMD clean. 1192 unit tests passing across pantera-core, maven-adapter,
go-adapter, composer-adapter, pantera-main.
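
A compact sketch of the SWR shape described in item 3A (assumed names and structure, not the shipped SwrMetadataCache): serve fresh inside the soft TTL, serve stale plus one deduplicated async refresh between soft and hard, and treat anything past the hard TTL as a miss.

```java
// Illustrative stale-while-revalidate cache: serve fresh within softTtl, serve
// stale + trigger one async refresh between softTtl and hardTtl, and miss
// after hardTtl. Names and structure are assumptions, not the shipped class.
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class SwrCacheSketch<K, V> {

    private record Entry<T>(T value, Instant writtenAt) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Map<K, Boolean> refreshing = new ConcurrentHashMap<>();
    private final Duration softTtl;
    private final Duration hardTtl;

    SwrCacheSketch(Duration softTtl, Duration hardTtl) {
        this.softTtl = softTtl;
        this.hardTtl = hardTtl;
    }

    Optional<V> get(K key, Function<K, V> loader) {
        Entry<V> entry = entries.get(key);
        Instant now = Instant.now();
        if (entry == null || entry.writtenAt().plus(hardTtl).isBefore(now)) {
            return Optional.empty(); // hard-stale or absent: treat as a miss
        }
        if (entry.writtenAt().plus(softTtl).isBefore(now)
                && refreshing.putIfAbsent(key, Boolean.TRUE) == null) {
            // soft-stale: serve the cached value, refresh once in the background
            CompletableFuture.runAsync(() -> {
                try {
                    entries.put(key, new Entry<>(loader.apply(key), Instant.now()));
                } finally {
                    refreshing.remove(key);
                }
            });
        }
        return Optional.of(entry.value());
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, Instant.now()));
    }
}
```
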
…e 3C contract pin

Per-repo-type PublishDateExtractor registrations. VertxMain.start now
registers the RFC 1123 Last-Modified extractor for every header-emitting
proxy ecosystem (maven, npm, pypi, go, composer, gem) — all six upstream
registries we proxy set Last-Modified on artifact GETs, so one extractor
lambda covers all of them. Ecosystems whose publish date lives in the
response body (docker manifests, nuget catalog, hex registry) keep
falling through to the registry's NO_OP and the pre-Track-5
System.currentTimeMillis() DB-consumer fallback; adding body-aware
extractors for those is a per-adapter follow-up.

Phase 3A "migrate composer/pypi/go onto SwrMetadataCache" — DELIBERATELY
NOT DONE. The audit finding that motivated the migration was wrong:
those adapters' cache-hit paths use CacheTimeControl.validate which
queries storage.metadata (Postgres-backed, cross-process), not upstream.
Replacing with SwrMetadataCache's in-process Caffeine timestamps would
trade cross-process consistency for in-process speed — a net regression
in multi-instance deployments where two nodes would drift their
freshness independently. The existing patterns are correct.
SwrMetadataCache remains the canonical primitive for future adapters
without a storage-backed metadata layer.

Phase 3C "deletion of MavenHeadSource et al." — DELIBERATELY NOT DONE.
The PublishDateSource implementations remain load-bearing on the
cache-miss branch (NETWORK_FALLBACK mode). Deleting them would force
the cooldown evaluator to fail open on every freshly-published version
the first time it is requested — defeating the "block fresh versions
even for the first asker" guarantee. The Phase 1A "no upstream on cache
HIT" invariant does not require deletion of the cache-miss fallback;
one upstream HEAD per genuinely-new (artifact, version) is amortised
over the artifact's entire lifetime in cache.

Phase 3C contract pinned via two new tests in DbPublishDateRegistryTest:
- cacheOnlyModeNeverInvokesSource: L1+L2 miss + CACHE_ONLY returns
  empty without firing the source.
- cacheOnlyAfterNetworkFallbackHitsL1OrL2: first call NETWORK_FALLBACK
  populates L2; subsequent CACHE_ONLY hits (even across a fresh
  DbPublishDateRegistry instance simulating a JVM restart) read L2 with
  zero source calls.

Admins wanting strictly-zero upstream HEAD traffic at the cost of
first-asker cooldown can already opt out per-repo with
cooldown.enabled: false. No new toggles.

Test count: 1194 passing (up from 1192). PMD clean.
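
To illustrate the CACHE_ONLY contract pinned by the two tests above, a rough sketch with an L1 Caffeine tier only (the L2 Postgres tier and the real DbPublishDateRegistry / PublishDateSource names are omitted or assumed; the Caffeine calls are the real library API): a CACHE_ONLY lookup may read the cache but never invokes the upstream source; NETWORK_FALLBACK may.

```java
// Sketch of an L1-cached publish-date lookup with a CACHE_ONLY mode that never
// calls the upstream source. The L2 tier is elided; names are assumptions.
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

import java.time.Duration;
import java.time.Instant;
import java.util.Optional;
import java.util.function.Function;

final class PublishDateRegistrySketch {

    enum Mode { NETWORK_FALLBACK, CACHE_ONLY }

    private final Cache<String, Instant> l1 = Caffeine.newBuilder()
            .maximumSize(100_000)
            .expireAfterWrite(Duration.ofHours(6))
            .build();

    // 'source' stands in for a per-ecosystem publish-date source (e.g. an
    // upstream HEAD); it is only consulted in NETWORK_FALLBACK mode.
    Optional<Instant> resolve(String coordinate, Mode mode, Function<String, Optional<Instant>> source) {
        Instant cached = l1.getIfPresent(coordinate);
        if (cached != null) {
            return Optional.of(cached);
        }
        if (mode == Mode.CACHE_ONLY) {
            return Optional.empty(); // cache_only_miss: never fire the source
        }
        Optional<Instant> resolved = source.apply(coordinate);
        resolved.ifPresent(value -> l1.put(coordinate, value));
        return resolved;
    }
}
```
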
Cold-cache mvn dependency:resolve -U log review found exactly one error
type repeated 80+ times per build: "Maven POM parse failed: Unexpected
character 'P' (code 80) in prolog [1,2]". 'P' (0x50) at byte 1 is the
ZIP magic byte — PrefetchDispatcher was routing every Maven cache write
through MavenPomParser regardless of file extension, so every .jar (a
binary ZIP) triggered a guaranteed-failed XML parse plus a temp-file
snapshot copy plus a WARN log plus an executor slot consumed.

Fix: PrefetchParser SPI grows an appliesTo(String urlPath) default
method returning true. MavenPomParser overrides to path.endsWith(".pom").
PrefetchDispatcher.onCacheWrite gates on appliesTo BEFORE the snapshot
copy and executor hand-off, so non-applicable writes are a complete
no-op on the cache-write callback path.

Tests:
- MavenPomParserTest.appliesToFiltersOutNonPomPaths: .pom matches; .jar,
  .war, .module, null all filtered.
- PrefetchDispatcherTest.onCacheWrite_skipsParser_whenAppliesToReturnsFalse:
  .jar write does not invoke the parser; subsequent .pom write does.

Other parsers (NpmPackageParser, NpmCompositeParser, NpmPackumentParser)
inherit the default-true behaviour and are unchanged.

16/16 prefetch tests green. PMD clean.
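
The SPI change, paraphrased with assumed signatures (the real PrefetchParser interface may differ): a default appliesTo keeps existing parsers opted in, while the Maven parser narrows itself to .pom paths so binary .jar writes never reach the XML parse.

```java
// Paraphrase of the SPI change with assumed signatures.
interface PrefetchParserSketch {

    // Existing parsers inherit "applies to everything".
    default boolean appliesTo(String urlPath) {
        return true;
    }

    void parse(byte[] cachedBody);
}

final class MavenPomParserSketch implements PrefetchParserSketch {

    @Override
    public boolean appliesTo(String urlPath) {
        return urlPath != null && urlPath.endsWith(".pom");
    }

    @Override
    public void parse(byte[] cachedBody) {
        // XML parse of the POM elided
    }
}
```
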
Five RCAs from deep perf analysis of cold mvn through maven_group
(`dependency:resolve -Dartifact=org.codehaus.mojo:sonar-maven-plugin:4.0.0.4121 -U`,
9.58 s direct vs 38-39 s through Pantera). RCA-1 was attempted as a
group-layer SingleFlight + body-buffer dedup and reverted — it broke
streaming and the real cause turned out to be at the proxy slice layer.
RCA-7 (4xx-collapse split) was attempted and reverted — the group
fallthrough chain depends on 4xx → 404, so the split broke mvn entirely
under upstream rate-limiting. Both findings are documented in CHANGELOG
as deferred follow-ups.

RCA-3: delete MavenSiblingPrefetcher. Single-thread executor with
  unbounded queue draining at 5-10/sec, fired on every primary commit,
  could not win the 10-50 ms race against mvn's matching foreground
  request. Net measurable benefit: zero. Latent OOM via unbounded queue.

RCA-4: demote LoggingAuth INFO → DEBUG on success. The two-tier auth
  stack fired 1062 INFO entries per cold mvn run (534 requests × 2
  tiers wrapped by LoggingAuth on each tier). Pure log noise.

RCA-5: VertxMain sweeps orphan `tmp-<uuid>` dirs older than 1 h under
  vertx.cacheDirBase on startup. Production carried 11 353 orphans
  (4.1 GB) — slowed FileStorage I/O and exhausted the workstation
  /private/tmp during this perf session.

RCA-6: GroupResolver.tryNextSequentialMember now logs each 404 / non-2xx
  fallthrough with member name, status, and url.path. Previously silent —
  blocked initial RCA-1 diagnosis (no way to tell whether maven_proxy
  served, 404'd, or errored before groovy got the request).
Speculative prefetch (PrefetchDispatcher) fires N upstream GETs per
successful primary cache write (one per direct dep parsed from the cached
POM/packument), recursively. With per-host concurrency 16 (Maven) / 32
(npm) and no requests-per-second cap, the subsystem multiplies cold-walk
upstream RPS several times above the foreground client's request rate.

CHANGELOG RCA-7 and Track 5 entries already document Maven Central
returning 429 against test workstation IPs once the subsystem was enabled.
The user-reported May 7-8 throttling reproduces the same symptom against
production.

Flipping the default to false makes prefetch opt-in per repo. Operators
who want it must add `settings.prefetch: true` to the repo config; the
admin UI still exposes the toggle. The PrefetchDispatcher / Coordinator
/ Parser code is untouched — only the default for new repos changes.

See analysis/03-findings.md finding #1 for the full evidence + the
long-term fix (replace speculative prefetch with observed-coordinate
pre-warming from the artifact event stream).
…#2, #10)

BaseCachedProxySlice.fetchAndCache called client.response(...) per
caller, then collapsed only the cache-write step through SingleFlight.
N concurrent client requests for the same uncached path produced N
upstream calls instead of 1 — the dedup helped reduce duplicate cache
writes but did nothing for upstream rate-limit budget.

Refactor: leader/follower pattern matching what GroupResolver
.proxyOnlyFanout and MavenGroupSlice.mergeMetadata already do. The
SingleFlight gate now wraps the upstream fetch + cache-write
combination. The leader runs fetchAndCacheLeader (full upstream +
cacheResponse). Followers park on a CompletableFuture<Void> gate and,
on leader completion, re-enter cacheFirstFlow which hits the
freshly-warm cache.

Field type changes from SingleFlight<Key, FetchSignal> to
SingleFlight<Key, Void> to match the new gate semantics — the gate's
terminal value is irrelevant; followers consult the cache directly on
re-entry. FetchSignal is still used internally inside the leader's
chain (cacheResponse → signalToResponse) for the same purpose as before.

Behaviour:
  - Leader 200: followers re-enter, cache hit, return cached bytes,
    zero extra upstream calls.
  - Leader 404: handle404 populates the NegativeCache before the gate
    completes, so followers short-circuit on re-entry.
  - Leader 5xx / exception: cache stays empty, gate still completes,
    followers retry — same upstream cost as no-dedup, no per-caller
    amplification during the leader's in-flight window.

Also updates the class Javadoc and docs/developer-guide.md §7.1 to
reflect the actual pipeline (Finding #10): step 6 now describes the
single-flight gate around the upstream fetch, instead of the previous
misleading "deduplicated upstream fetch" wording that did not match
the pre-fix implementation.

Verified by running BaseCachedProxySlice*Test (34 tests, all green) —
in particular the BaseCachedProxySliceDedupTest property "N concurrent
callers produce exactly one cache write" still holds, and the
"Cache hit" log lines from worker threads confirm followers are
re-entering cacheFirstFlow and serving from the warm cache.

See analysis/03-findings.md findings #2 and #10 for full evidence.
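
Reduced to its essentials, the leader/follower gate looks roughly like the sketch below (assumed names; the real SingleFlight, cacheFirstFlow and FetchSignal types differ): the first caller per key performs the upstream fetch and cache write, every other caller parks on the same gate and re-reads the cache once it completes.

```java
// Minimal leader/follower coalescing sketch: one in-flight upstream fetch per
// key; followers await the leader's gate and then re-check the cache. Names
// are assumptions, not the project's SingleFlight implementation.
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class CoalescedFetchSketch<K, V> {

    private final Map<K, CompletableFuture<Void>> inFlight = new ConcurrentHashMap<>();
    private final Map<K, V> cache = new ConcurrentHashMap<>();

    CompletableFuture<Optional<V>> get(K key, Function<K, V> fetchAndCacheLeader) {
        V cached = cache.get(key);
        if (cached != null) {
            return CompletableFuture.completedFuture(Optional.of(cached));
        }
        CompletableFuture<Void> gate = new CompletableFuture<>();
        CompletableFuture<Void> existing = inFlight.putIfAbsent(key, gate);
        if (existing == null) {
            // Leader: do the upstream fetch + cache write, then release followers.
            return CompletableFuture.supplyAsync(() -> {
                try {
                    V fetched = fetchAndCacheLeader.apply(key);
                    if (fetched != null) {
                        cache.put(key, fetched);
                    }
                    return Optional.ofNullable(fetched);
                } finally {
                    inFlight.remove(key, gate);
                    gate.complete(null); // terminal value is irrelevant to followers
                }
            });
        }
        // Follower: wait for the leader, then re-enter and consult the cache.
        return existing.thenApply(ignored -> Optional.ofNullable(cache.get(key)));
    }
}
```
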
…torm

Phases 0-6 of the senior-staff investigation requested by the operator,
landed as discrete files under analysis/ so each phase remains
addressable.

- 00-mental-model.md  — actual code path of a cold-miss Maven request
  through GroupResolver → CachedProxySlice → ProxyCacheWriter →
  PrefetchDispatcher, with every pool / lock / cache enumerated and
  observability gaps named.
- 01-reproduction.md  — static request-amplification math (~5× for a
  typical Spring/sonar POM tree), reconciled against the team's
  cold-bench-10x (13 s on 2026-05-05) and the CHANGELOG RCA-7
  measurement (38 s on 2026-05-13).
- 02-diff-triage.md   — verdict per relevant commit since master.
  The prefetch subsystem (May 4-5) is the principal new outbound
  generator; Track 5 fixes the cache-HIT case but not the cold-MISS
  case.
- 03-findings.md      — the 10 classified findings with full
  FINDING template (category, evidence, problem impact, severity,
  confidence, short-term fix, long-term fix). Finding #1 is the
  shared root cause that explains both throttling and slowness.
- 04-rebuild-plan.md  — what to delete (prefetch subsystem; eager
  .sha1; .sha1 we already compute), the new request lifecycle, and
  the migration order for landing the fixes.
- 05-perf-harness.md  — the regression-prevention test plan with
  Toxiproxy + fixture upstream, prometheus-gated CI, integration test
  for the single-flight upstream property.

Each commit on this branch addresses one finding and references it by
number in the commit message.
… fixes

Captures the verbatim source diff for findings #1 (RepoConfig default)
and #2 (BaseCachedProxySlice single-flight placement), plus the test
results that confirm:

- BaseCachedProxySliceDedupTest passes after the refactor (4/4) — the
  "N concurrent callers produce exactly 1 cache write" property still
  holds with the leader/follower gate placement.
- RepoConfigTest passes after the prefetch default flip (21/21) — the
  three explicit-precedence tests still correctly override the new
  false default.

Stored under analysis/evidence/ so reviewers can verify the fixes
without re-running the build.
Plan ready for review at analysis/plan/v1/PLAN.md.

Incorporates user's two approval changes:
  1. W4 prefetch: full deletion across every adapter (not keep-on-opt-in).
     The "(i) keep with governance vs (ii) delete" choice in v1.0 of the
     plan is removed; (ii) is the only path. New milestone M2 is dedicated
     to the deletion, landing strictly after M1's observability foundation
     so we can verify caller_tag="prefetch" drops to zero post-deploy.
  2. All fixes explicitly cross-adapter. Added Part A.1 — a per-workstream
     × per-adapter coverage matrix. The user's intent: avoid the same
     regression resurfacing in npm/composer/go/etc. after a Maven-only
     fix. Matrix uses ↓/✓/audit/— to pin coverage.

Headline confidence (post-revision):
  Problem 1 resolved to target:        85% (+5 vs initial)
  Problem 2 resolved to target:        72% (+2 vs initial)
  Both problems resolved to target:    68% (+3 vs initial)

Revision raises confidence because (a) full deletion eliminates the
amplification source's code path entirely — no flag to flip back wrong;
(b) cross-adapter scoping rules out the regression resurfacing under a
different ecosystem name.

Milestones now: M1 observability, M2 prefetch deletion, M3 rate limit,
M4 single-flight + .sha1, M5 cooldown + conditional + real-Maven-Central
gate, M6 1000 req/s SLO. M2 is flagged as the one-way milestone in the
plan (DB migration is forward-only; deletion blast radius is large).

Awaiting user "Approved" before starting implementation of M1.
Adds workstream W6 covering the team's deferred RCA-1 + RCA-7 from
CHANGELOG line 21-58, per user direction 2026-05-13. Greenfield
authorisation removes the "load-bearing assumption" excuse the team
used to defer these.

W6 deliverables:
  - R7a: differentiate upstream non-2xx in CachedProxySlice.
    fetchVerifyAndCache exceptionally handler (404/410 → notFound,
    429/503 → propagate, 5xx → badGateway with status preserved).
  - R7b: ArtifactIndexCache negative-cache write guarded — only
    fires on terminal upstream-true-404, not collapsed 4xx.
  - R1a: GroupResolver.tryNextSequentialMember distinguishes
    upstream-true-404 from member-error; non-404 propagates verbatim.
  - R1b: 5xx falls through ONCE to next member but never writes
    negative-cache; final result is 502 if no 200 found.
  - New GroupResolverStatusFidelityTest integration test (3
    scenarios: true 404 fallthrough, 429 with breaker, 502 single
    fallthrough).

W6 lands in M5 (alongside W5), behind a status_fidelity.enabled
feature flag for the M5 observability window. The flag mitigates
the R2 risk (the team's first RCA-7 attempt was reverted because
it broke mvn); rollback is a flag flip rather than code revert.

Confidence raised:
  Problem 1: 85 → 88 (+3)
  Problem 2: 72 → 78 (+6); target tightened ≤15s → ≤13s p50
  Both:      68 → 74 (+6); R2 retires from residual-risk list

Out-of-scope section updated to mark RCA-1 + RCA-7 as folded in
(strike-through preserved for traceability). Cluster-wide rate
limiting, other architectural debt, and observed-coordinate
prewarming remain out of scope with explicit rationale per item.
Adds the metric scaffolding the plan's later milestones depend on for
validation. Without this, "did the fix work?" remains non-falsifiable.

New metrics:
  - pantera.upstream.requests.total{upstream_host, caller_tag, outcome}
    Incremented once per outbound request at the http-client funnel
    (JettyClientSlice.recordOutboundMetric). Outcome buckets:
    2xx / 3xx / 4xx / 429 / 5xx / timeout / connect_error / error.
  - pantera.proxy.429.total{upstream_host, repo_name}
    Isolated counter for the primary throttling alert.
  - pantera.upstream.request.duration timer with the same labels —
    feeds the upstream-latency-by-source dashboard.

caller_tag plumbing:
  - New ThreadContext key RequestContext.KEY_CALLER_TAG ("caller.tag").
  - Constants: CALLER_TAG_FOREGROUND / _COOLDOWN_HEAD / _METADATA_REFRESH.
  - bindCallerTag(tag) AutoCloseable for try-with-resources at
    non-foreground call sites (cooldown HEAD, metadata refresh).
  - currentCallerTag() reads from ThreadContext, defaults to
    "foreground" if unset.
  - JettyClientSlice snapshots caller.tag + repository.name from
    ThreadContext BEFORE request.send() — the Jetty callback may run on
    a thread that does not carry our MDC.

Prometheus rules + alerts (rules/amplification.yml):
  - pantera_upstream_amplification_ratio recording rule:
    sum(rate(pantera_upstream_requests_total[5m]))
      / clamp_min(sum(rate(pantera_http_requests_total[5m])), 1)
    per upstream_host.
  - pantera_request_to_artifact_ratio recording rule.
  - PanteraUpstream429 alert: any 429 sustained 5 min → page.
  - PanteraAmplificationRatio alert: ratio > 1.5 sustained 5 min → page.
  - rule_files glob enabled in prometheus.yml.

Status code outcome bucketing utility on MicrometerMetrics:
  - outcomeBucket(int statusCode) — coarse buckets + 429 isolated.
  - outcomeFromFailure(Throwable) — timeout / connect_error / error.

Tests:
  - RequestContextTest gains 4 new tests for bindCallerTag /
    currentCallerTag / round-trip / double-close semantics. All
    18 tests green.
  - Full http-client + pantera-core suite re-run: 110 + 1017 = 1127
    tests, 0 failures.

The recording-rule + alert YAML lives at:
  pantera-main/docker-compose/prometheus/rules/amplification.yml
and is loaded via the enabled rule_files glob in prometheus.yml.

References:
  - analysis/03-findings.md finding #8
  - analysis/plan/v1/PLAN.md milestone M1 + workstream W1
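
The caller_tag binding reduced to a sketch (key and constant names are assumptions; the ThreadContext calls are the real Log4j2 API): a try-with-resources handle sets the tag and restores the previous value on close.

```java
// Sketch of the caller-tag binding described above. Key/constant names are
// assumptions; org.apache.logging.log4j.ThreadContext is the real API.
import org.apache.logging.log4j.ThreadContext;

final class CallerTagSketch {

    static final String KEY_CALLER_TAG = "caller.tag";
    static final String CALLER_TAG_FOREGROUND = "foreground";

    static AutoCloseable bindCallerTag(String tag) {
        String previous = ThreadContext.get(KEY_CALLER_TAG);
        ThreadContext.put(KEY_CALLER_TAG, tag);
        return () -> {
            if (previous == null) {
                ThreadContext.remove(KEY_CALLER_TAG);
            } else {
                ThreadContext.put(KEY_CALLER_TAG, previous);
            }
        };
    }

    static String currentCallerTag() {
        String tag = ThreadContext.get(KEY_CALLER_TAG);
        return tag == null ? CALLER_TAG_FOREGROUND : tag;
    }
}

// Usage at a non-foreground call site (e.g. a cooldown HEAD):
//   try (AutoCloseable ignored = CallerTagSketch.bindCallerTag("cooldown_head")) {
//       // issue the outbound request; the client slice snapshots the tag
//   }
```
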
The PrefetchDispatcher/Coordinator chain fires N upstream GETs per cache
write recursively (one per direct dep in a POM / packument). With per-host
caps of 16 (maven) / 32 (npm) and no requests-per-second gate, a cold-cache
walk amplifies outbound RPS ~5× above the foreground client's rate, tripping
Maven Central's per-IP rate limiter (RCA in analysis/03-findings.md #1, #7).

Removed across the stack:

  pantera-main/.../prefetch/ — Coordinate, PrefetchTask, PrefetchMetrics,
    PrefetchCircuitBreaker, PrefetchCoordinator, PrefetchDispatcher,
    parser/ subpackage (7 files: MavenPomParser, NpmCompositeParser,
    NpmPackumentParser, NpmPackageParser, NpmMetadataLookup,
    CachedNpmMetadataLookup, PrefetchParser).

  pantera-main/.../api/v1/PrefetchStatsHandler — the 24h sliding-window
    /api/v1/prefetch/stats endpoint that read PrefetchMetrics. AsyncApiVerticle
    no longer takes the metrics ref through its constructor chain.

  pantera-main/.../settings/runtime/PrefetchTuning + CircuitBreakerTuning —
    typed snapshots whose only consumers were the deleted PrefetchCoordinator
    and PrefetchCircuitBreaker. SettingsKey enum trimmed to the three
    http_client.* keys that still have live consumers. SettingsHandler
    validateRuntime falls through any prefetch.* key (no longer in catalog).

  RuntimeSettingsCache — Snapshot trimmed to {http, raw}; prefetchTuning() and
    circuitBreakerTuning() accessors removed.

  RepoConfig.prefetchEnabled + RepositorySlices.{prefetchEnabledFor,
    upstreamUrlOf, repoTypeOf, npmProxyStorages} — accessors whose only
    consumer was the dispatcher.

  VertxMain.installPrefetch — boot wiring (~190 LOC) deleted; the
    PrefetchCoordinator/Dispatcher shutdown blocks removed; field declarations
    replaced with an M2 comment. The CacheWriteCallbackRegistry.clear() call
    is kept so a future Phase 4c observed-coordinate prewarming hook can
    install a consumer without leaking it across restarts.

  NpmProxyAdapter — NpmCacheWriteBridge removed; the NpmProxy ctor now
    receives null for cacheWriteHook + packumentWriteHook. The hook surface
    on NpmProxy is retained for the same future-prewarming reason.

DB:
  V128__drop_prefetch_settings_keys.sql — DELETE FROM settings WHERE
    key LIKE 'prefetch.%'. Removes any rows the v2.1.x SettingsBootstrap or
    admin PATCHes left behind so the /settings/runtime listing does not
    surface dangling keys with no consumer.

UI:
  Deleted: PrefetchPanel.vue (+ test), api/prefetch.ts.
  RepoEditView no longer mounts PrefetchPanel; settings.prefetch read/write
    logic removed.
  SettingsView Pre-fetch card deleted; PREFETCH_KEYS / RUNTIME_INT_RANGES /
    RUNTIME_LABELS / RUNTIME_HELP slimmed to the three http_client.* keys.
  api/runtimeSettings.ts: RuntimeSettingKey union and SPEC_DEFAULTS trimmed;
    test rewritten to match.
  Upstream-failure circuit-breaker card's subtitle no longer references the
    deleted pre-fetch drop-rate breaker (the two were always distinct).

Cache pipeline (preserved):
  BaseCachedProxySlice/ProxyCacheWriter retain the onCacheWrite hook surface
    backed by CacheWriteCallbackRegistry's NO_OP sentinel. Javadoc updated
    to call out the prefetch consumer's removal and reserve the slot for
    Phase 4c (2.3.0).

Tests:
  RuntimeSettingsCacheTest, SettingsKeyTest, RepoConfigTest,
  SettingsHandlerRuntimeTest, RepositoryHandlerTest, BaseCachedProxySliceHookTest,
  ProxyCacheWriterHookTest — prefetch-specific assertions deleted; remaining
  assertions still cover the foreground behaviour they shipped to pin.
  pantera-main unit tests pass (22 in the impacted set); pantera-core hook
  tests pass (11).

Scope per analysis/plan/v1/PLAN.md M2 + user's 2026-05-13 explicit greenfield
authorization for v2.2.0 major-version cleanup.
… gate

Adds a structural fix for the dominant amplification source the v2.2.0
investigation identified (analysis/03-findings.md #3, #7, #9 + RCA-7):
Pantera had no requests-per-second cap on its outbound traffic, so any
adapter could push past the per-IP budget Cloudflare-fronted registries
(Maven Central, npm public) enforce.

The new module wraps every per-host Jetty client slice — for every
adapter, for every caller_tag — with a token bucket plus a 429-and-
Retry-After gate. The bucket caps steady-state RPS; the gate fails fast
during the back-off window after upstream throttles us.

New module: http-client/.../ratelimit/

  RateLimitConfig
    - Per-host config: refill rate (tokens/sec) and burst capacity
    - Defaults:
        repo1.maven.org    20 req/s burst 40   — Cloudflare per-IP budget
                                                  starts 429-ing ~25-30 req/s
        registry.npmjs.org 30 req/s burst 60   — npm's CDN tolerates more
        any other host     10 req/s burst 20   — conservative default
    - Builder lets the perf harness inject test configs without touching
      production defaults

  UpstreamRateLimiter (interface + Default impl)
    - Per-host Bucket state via ConcurrentHashMap + AtomicReference CAS.
      O(1) hot-path; no locks.
    - tryAcquire(host): consumes a token if the gate is open and the
      bucket has > 1.0 tokens. Returns false in either failure mode so
      the caller can fail-fast.
    - recordRateLimit(host, retryAfter): closes the gate for retryAfter
      (defaults to 30 s when retryAfter is zero — for 429s with no
      header). Concurrent close attempts keep the LATER deadline so a
      burst of 429s does not shrink the window.
    - recordResponse(host, status, retryAfter): 429 always gates;
      503 only gates with Retry-After (503 without is treated as a
      transient server error, not a throttle signal).
    - gateOpenUntil(host) exposes the deadline so foreground responses
      can carry the right Retry-After through to the client.

  RetryAfter
    - Parses both RFC 7231 forms: delta-seconds and IMF-fixdate.
    - Malformed / blank / null → Duration.ZERO.
    - Past HTTP-date → Duration.ZERO (a deadline in the past is not a
      forward delay).

  RateLimitedClientSlice
    - Decorator that wraps any Slice (placed by JettyClientSlices around
      every per-host JettyClientSlice). Per outbound:
        1. Inspect the gate. Closed → synthesise 429 + Retry-After
           pointing at the gate deadline, do NOT call the wrapped slice.
        2. Otherwise tryAcquire. Empty bucket → synthesise 429 +
           Retry-After 1 s (the bucket refills continuously; the next
           attempt has a token within a fraction of a second).
        3. Token acquired → delegate. On the response, check status —
           a 429 / 503-with-Retry-After closes the gate.
    - Synthesised 429s carry X-Pantera-Rate-Limited: true so future
      cluster-wide propagation and the cache slice can distinguish
      self-imposed from upstream-imposed.
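
A minimal sketch of the bucket + gate mechanics, assuming a single shared
refill rate and burst (the real RateLimitConfig holds these per host) and
an injected Clock so tests can step time deterministically; all names
below are illustrative, not the production classes:

    import java.time.Clock;
    import java.time.Duration;
    import java.time.Instant;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicReference;

    final class TokenBucketLimiterSketch {

        /** Immutable per-host state, swapped atomically via CAS. */
        private record Bucket(double tokens, Instant lastRefill, Instant gateClosedUntil) { }

        private final Map<String, AtomicReference<Bucket>> hosts = new ConcurrentHashMap<>();
        private final double refillPerSecond;
        private final double burstCapacity;
        private final Clock clock;

        TokenBucketLimiterSketch(double refillPerSecond, double burstCapacity, Clock clock) {
            this.refillPerSecond = refillPerSecond;
            this.burstCapacity = burstCapacity;
            this.clock = clock;
        }

        /** Consume one token if the gate is open and a token is available; lock-free. */
        boolean tryAcquire(String host) {
            AtomicReference<Bucket> ref = bucketFor(host);
            while (true) {
                Bucket current = ref.get();
                Instant now = clock.instant();
                if (now.isBefore(current.gateClosedUntil())) {
                    return false;                     // gate closed: fail fast
                }
                double elapsed = Duration.between(current.lastRefill(), now).toNanos() / 1e9;
                double refilled = Math.min(burstCapacity, current.tokens() + refillPerSecond * elapsed);
                if (refilled < 1.0) {
                    return false;                     // bucket empty: fail fast
                }
                Bucket next = new Bucket(refilled - 1.0, now, current.gateClosedUntil());
                if (ref.compareAndSet(current, next)) {
                    return true;                      // token owned by this caller
                }
                // lost the CAS race: retry against the fresh state
            }
        }

        /** Close the gate; concurrent closes keep the LATER deadline. */
        void recordRateLimit(String host, Duration retryAfter) {
            Duration effective = retryAfter.isZero() ? Duration.ofSeconds(30) : retryAfter;
            Instant deadline = clock.instant().plus(effective);
            bucketFor(host).updateAndGet(b ->
                deadline.isAfter(b.gateClosedUntil())
                    ? new Bucket(b.tokens(), b.lastRefill(), deadline)
                    : b);
        }

        /** 429 always gates; 503 only gates when a Retry-After was actually present. */
        void recordResponse(String host, int status, Duration retryAfter) {
            if (status == 429 || (status == 503 && !retryAfter.isZero())) {
                recordRateLimit(host, retryAfter);
            }
        }

        Instant gateOpenUntil(String host) {
            return bucketFor(host).get().gateClosedUntil();
        }

        private AtomicReference<Bucket> bucketFor(String host) {
            return hosts.computeIfAbsent(host, h ->
                new AtomicReference<>(new Bucket(burstCapacity, clock.instant(), Instant.MIN)));
        }
    }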

Wiring: JettyClientSlices.slice() now wraps every JettyClientSlice in
the rate-limited decorator. Loopback hosts (localhost, 127.x.x.x, ::1)
bypass the limiter — they are exclusively dev / test fixtures and the
limiter would otherwise throttle the harness. A second constructor
overload accepts an explicit UpstreamRateLimiter for tests / perf
harness injection; production callers use the existing 4-arg ctor
which constructs a JVM-default limiter from RateLimitConfig.defaults().
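
A sketch of the loopback check — the method name isLoopback comes from the
later PMD commit below; the exact string comparisons are assumptions:

    import java.util.Locale;

    final class LoopbackCheckSketch {

        // Dev/test fixture hosts that skip the rate limiter entirely.
        static boolean isLoopback(String host) {
            String h = host.toLowerCase(Locale.ROOT);
            return h.equals("localhost")
                || h.startsWith("127.")
                || h.equals("::1")
                || h.equals("[::1]");   // bracketed IPv6 form is an assumption
        }
    }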

Metric:
  pantera_outbound_rate_limited_total{upstream_host, reason}
    reason ∈ {gate_closed, bucket_empty}

  Differs from pantera_proxy_429_total: this one fires when WE deny
  the outbound; the existing 429 counter fires when the upstream denies
  us. Operators want both — non-zero of either means somebody is
  throttling somebody.

Alert:
  PanteraOutboundGateStuckClosed — warn after 10 min of sustained
  gate_closed events on a host. Means the gate's back-off window never
  reopens, i.e. the upstream keeps returning 429s even at our capped
  rate. Operator action: lower the per-host token rate.

Foreground propagation: BaseCachedProxySlice already preserves 4xx
verbatim (status + Retry-After) — verified in pantera-core lines
1189-1193. The synthesised 429 flows through to mvn/npm unchanged so
those tools honour their own back-off behaviour.

Tests (14 unit, all green):

  UpstreamRateLimiterTest
    - acquiresBurstTokensWithoutWaiting — burst drain
    - refillsAtConfiguredRate — 10/s rate with TestClock stepping
    - gateBlocksDespiteAvailableTokens — gate trumps tokens
    - recordRateLimitUsesDefaultDurationWhenAbsent — zero → 30 s
    - hostsAreIndependent — gating maven does not affect npm
    - recordResponseOnlyGatesOn429 — 200/503-no-RA never gate

  RetryAfterTest
    - parses delta-seconds, parses HTTP-date, past date → 0,
      null/blank → 0, malformed → 0

  RateLimitedClientSliceTest
    - gatedRequestNeverReachesWrappedSlice
    - upstream429ClosesTheGate (second call short-circuits)
    - emptyBucketSynthesises429WithOneSecondRetryAfter

  JettyClientSlicesTest
    - shouldProduce* now asserts RateLimitedClientSlice for non-
      loopback, JettyClientSlice for localhost — pins the new wrapping
      contract.

Toxiproxy-mediated integration / perf-fixture test is M6's scope per
analysis/plan/v1/PLAN.md (perf-gate CI workflow). The unit tests cover
the core behaviour; the integration test will exercise it end-to-end
against a rate-limited stub.

Pre-M4 the Maven adapter's custom upstream-fetch path
(CachedProxySlice.fetchVerifyAndCache) had no request coalescing: 50
concurrent clients for the same uncached primary fired 50 independent
upstream calls. That was the dominant cold-walk amplifier after M2's
prefetch deletion (analysis/03-findings.md #2, #10).

The base-class fix (commit 21232a5b1) covered
BaseCachedProxySlice.fetchAndCache. M4 extends the same leader/follower
pattern into the Maven-specific stream-through path via a new
BaseCachedProxySlice.coalesceUpstream helper.

Why the helper is shaped the way it is. The Maven path uses
ProxyCacheWriter.streamThroughAndCommit which returns a Response whose
body is still streaming when the future resolves — the actual cache
commit lands later, signalled by the StreamedArtifact's
verificationOutcome. If we completed the SingleFlight gate when the
leader's response future resolved (the natural "leader done" hook),
followers would wake up against a still-empty cache, re-enter
verifyAndServePrimary, miss the storage check, and become new leaders
themselves — each firing its own upstream call. The integration test
reproduced exactly this: 50 clients producing 11 upstream hits across
11 "waves" before the cache eventually caught up.

coalesceUpstream therefore takes the gate as an explicit argument
the leader closes when the cache write is fully durable. The Maven
caller hooks verificationOutcome:

    artifact.verificationOutcome()
        .whenComplete((r, e) -> singleFlightGate.complete(null));

On exceptional completion of the leader's response future, the gate
is force-closed (defensive — release followers to retry rather than
park forever on a leader that died before signalling).

New file:
  pantera-core/.../BaseCachedProxySlice.coalesceUpstream(key,
    leaderFetch, followerLookup) — protected helper for subclasses with
    custom stream-through paths. Reuses the same per-key SingleFlight
    that fetchAndCache already owns, so a future Maven .pom and a
    BaseCachedProxySlice .jar miss with the same cache key DO share the
    same flight (correct: the cache key IS the de-dup identifier).
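
A sketch of that leader/follower shape, assuming the gate is created inside
the helper and handed to the leader's fetch — the generics and parameter
shapes below are read off the description above, not the actual source:

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.CompletionStage;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;
    import java.util.function.Supplier;

    final class CoalesceUpstreamSketch {

        private final ConcurrentHashMap<String, CompletableFuture<Void>> flights =
            new ConcurrentHashMap<>();

        <T> CompletionStage<T> coalesceUpstream(
            String key,
            Function<CompletableFuture<Void>, CompletionStage<T>> leaderFetch,
            Supplier<CompletionStage<T>> followerLookup
        ) {
            CompletableFuture<Void> gate = new CompletableFuture<>();
            CompletableFuture<Void> existing = flights.putIfAbsent(key, gate);
            if (existing != null) {
                // Follower: park until the leader signals the cache write is durable,
                // then read the now-populated cache instead of going upstream.
                return existing.thenCompose(ignored -> followerLookup.get());
            }
            // Leader: drop the flight once the gate closes so later misses start fresh.
            gate.whenComplete((v, e) -> flights.remove(key, gate));
            // The fetch receives the gate; the Maven caller completes it from
            // verificationOutcome once the commit is durable.
            return leaderFetch.apply(gate).whenComplete((response, error) -> {
                if (error != null) {
                    gate.complete(null); // defensive: never park followers on a dead leader
                }
            });
        }
    }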

Maven CachedProxySlice changes:
  verifyAndServePrimary on cache-miss now routes through
    coalesceUpstream. fetchVerifyAndCache signature gains an explicit
    singleFlightGate the leader completes on verificationOutcome.

Test:
  CachedProxySliceUpstreamDedupTest — 50 concurrent clients for the
    same uncached Maven primary; asserts exactly one upstream primary
    GET and one .sha1 GET. Slow upstream (150 ms artificial sleep) so
    followers really do park on the leader's gate. Bodies are drained
    in the test client to mirror production's Vert.x server-side
    subscription — without that, verificationOutcome never fires and
    the gate would stay closed. Test exposes the production
    invariant explicitly so future regressions surface.

The eager .sha1 elimination (W3c) remains in scope for M5 — it requires
reworking the sidecar serve path so a client-requested .sha1 against a
cached primary is generated locally rather than re-fetched.

130 maven-adapter tests + 1021 pantera-core tests green.

Two coupled changes per analysis/plan/v1/PLAN.md M5: stop poisoning the
group / index-cache on transient upstream errors, and stop the
per-first-fetch HEAD against Maven Central that was a meaningful
contributor to Pantera's outbound amplification once prefetch was
removed (M2) and rate-limited (M3).

W6 R7a — status-code fidelity in CachedProxySlice.fetchVerifyAndCache:

  Pre-W6 the exception handler collapsed every upstream 4xx to a local
  404, masking:
    - 429 (rate-limit) → looked like "doesn't exist"
    - 401 / 403 (auth) → looked like "doesn't exist"
    - 503-with-Retry-After (cooldown) → looked like "doesn't exist"
  and poisoning ArtifactIndexCache for transient throttles. Pre-W6 5xx
  surfaced as 503 verbatim, which RaceSlice treats as a "winning"
  response — stopping the group walk even when another member could
  have answered.

  Post-W6 (mapUpstreamStatus):
    404, 410       → notFound() (group fallthrough continues)
    429            → 429 + Retry-After preserved (M3's gate honours)
    503 + RA       → 503 + Retry-After (transient upstream cooldown)
    503 no RA      → 502 badGateway (group fallthrough)
    401 / 403      → propagated verbatim (authoritative auth)
    5xx            → 502 badGateway (group fallthrough; no index-
                     cache poisoning)
    timeout / SSL  → 502 badGateway (transient infrastructure)

  UpstreamHttpException grew a `retryAfter` field so the handler can
  propagate the upstream header verbatim instead of fabricating a value.
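
  A sketch of the matrix with plain ints and a tiny record — the real
  mapUpstreamStatus works in terms of RsStatus and the adapter's response
  helpers (notFound(), badGateway()), so every name below is illustrative:

    import java.time.Duration;
    import java.util.Optional;

    final class UpstreamStatusMappingSketch {

        // Minimal stand-in for the slice's response type.
        record Mapped(int status, Optional<Duration> retryAfter) { }

        static Mapped mapUpstreamStatus(int upstream, Optional<Duration> retryAfter) {
            if (upstream == 404 || upstream == 410) {
                return new Mapped(404, Optional.empty());      // group fallthrough continues
            }
            if (upstream == 429) {
                return new Mapped(429, retryAfter);            // preserved so M3's gate is honoured
            }
            if (upstream == 401 || upstream == 403) {
                return new Mapped(upstream, Optional.empty()); // authoritative auth, verbatim
            }
            if (upstream == 503 && retryAfter.isPresent()) {
                return new Mapped(503, retryAfter);            // transient upstream cooldown
            }
            if (upstream >= 500) {
                // 503 without Retry-After and every other 5xx (plus timeouts and SSL
                // failures in the real handler) collapse to 502 so the group walk can
                // fall through and the index cache is never poisoned.
                return new Mapped(502, Optional.empty());
            }
            // Statuses not named in the matrix above are outside this sketch.
            return new Mapped(upstream, Optional.empty());
        }
    }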

W5b — first-fetch cooldown HEAD is now opt-in:

  Pre-M5 every cooldown evaluation against a fresh (artifact, version)
  pair fell through L1 (Caffeine) + L2 (DB) and fired a HEAD via
  MavenHeadSource. Cold-walk amplification: 50 newly-resolved versions
  → 50 extra HEADs to the same upstream we are about to GET anyway,
  each subject to the same per-IP throttling.

  Track 5 Phase 1B already populates publish_date from the primary
  GET's Last-Modified on cache-write. From the SECOND fetch onwards
  the publish-date is in the DB → no HEAD. The remaining first-asker
  HEAD is now skipped by default in 2.2.0.

  VertxMain registers `maven`/`gradle` publish-date sources only when
  PANTERA_PUBLISH_DATE_HEAD_FALLBACK_ENABLED=true (default false).
  When unset the source map omits maven/gradle entirely, so
  DbPublishDateRegistry's L3 lookup returns Optional.empty() and the
  cooldown gate allows the first fetch through. Trade-off (per PLAN
  option B): the first asker of a freshly-published blocked version
  downloads the bytes before the cooldown evaluator catches it on the
  next request.

  Operators who need strict first-fetch enforcement re-enable via the
  env var; the source is still wired and tested (MavenHeadSourceTest,
  MavenHeadSourceLiveTest unchanged).
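
  A sketch of the opt-in wiring, assuming a simple source-map shape —
  PublishDateSource and publishDateSources() are stand-ins; only the env-var
  name and the maven/gradle keys come from the notes above:

    import java.util.HashMap;
    import java.util.Map;

    final class PublishDateSourceWiringSketch {

        interface PublishDateSource { /* per-upstream publish-date lookup */ }

        static Map<String, PublishDateSource> publishDateSources(PublishDateSource mavenHeadSource) {
            Map<String, PublishDateSource> sources = new HashMap<>();
            // ...npm / pypi / docker / composer / go sources registered unconditionally...
            boolean headFallback = Boolean.parseBoolean(System.getenv()
                .getOrDefault("PANTERA_PUBLISH_DATE_HEAD_FALLBACK_ENABLED", "false"));
            if (headFallback) {
                // Strict first-fetch enforcement: wire the HEAD-backed source back in.
                sources.put("maven", mavenHeadSource);
                sources.put("gradle", mavenHeadSource);
            }
            // When absent, DbPublishDateRegistry's L3 lookup finds no source for
            // maven/gradle, returns Optional.empty(), and the cooldown gate lets
            // the first fetch through.
            return sources;
        }
    }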

Test:
  CachedProxySliceStatusFidelityTest — 7 cases pin the fidelity matrix:
    404 → 404, 429 + Retry-After preserved, 401 propagates, 403
    propagates, 503+RA propagates, 503 without RA collapses to 502,
    5xx (500) → 502. The 410 branch is verified by code inspection only
    (the RsStatus enum has no 410 entry to construct in a test; Maven
    Central never returns it).

  137 maven-adapter + 13 publish-date tests green.

Out of scope for M5 (follow-up in 2.3.0):
  W5a ConditionalRequestSlice — requires per-host (URL → Last-Modified)
    Caffeine cache + 304 translation. Largest impact on
    maven-metadata.xml / packument refresh paths; M5 already eliminates
    the HEAD-shaped equivalent (W5b). Wiring a full conditional layer
    is a separable 1-2 day effort.
  W6 R1a/R1b GroupResolver — propagating 5xx vs 4xx through
    tryNextSequentialMember requires re-threading Fault.UpstreamServerError
    through the group fanout. R7a's local fidelity is a prerequisite
    (now landed); R1a/R1b is a follow-up.

CI-side enforcement of the three production paging signals shipped in
M1's recording rules (amplification.yml) so any PR that regresses the
fix surfaces immediately rather than waiting for a production page:

  pantera_proxy_429_total                          > 0   → FAIL
  pantera_outbound_rate_limited_total
      {reason="gate_closed"}                       > 0   → FAIL
  pantera_upstream_amplification_ratio             > 1.5 → FAIL

Files:

  scripts/perf-gate-check.sh
    Scrapes a running Pantera's /metrics/vertx, sums the relevant
    series across all label combinations (so a regression that adds a
    new caller_tag still triggers), and exits non-zero on any
    invariant breach. Uses awk for parsing — no Python / jq
    dependency. Verified end-to-end against synthesised clean / dirty
    metrics fixtures (pass + fail paths both emit the expected
    diagnostics).

  .github/workflows/perf-gate.yml
    Triggers on PRs to master / 2.2.0 that touch pantera-core,
    pantera-main, maven-adapter, http-client, the performance harness,
    or the gate script itself. Boots the existing scaling-benchmark
    docker-compose (WireMock-fronted upstream + Pantera SUT), runs a
    short 60 s / 20 VU k6 ramp to populate the counters, then invokes
    perf-gate-check.sh. On failure: uploads Pantera + WireMock logs as
    artifacts for the on-call to triage.

The full scaling matrix (6 cells × ~20 min each = ~2 h) is out of
scope for the gate — it runs nightly via the existing
docker-compose-scaling.yml harness. The gate is the per-PR guard rail.

Toxiproxy-mediated 429 injection (originally scoped for M3's
RateLimitedClientSliceIT) remains follow-up work: the unit tests +
this gate together cover the structural invariants, and a Toxiproxy
fixture is best added when the next ConditionalRequestSlice (W5a)
work lands so both share the same fixture investment.

The 2.2.0 section had drifted from the project's established format
through a long series of incremental doc commits: duplicate version
headers, internal RCA notes, deprecated metric names, and detailed
performance numbers that no longer reflect current behaviour
(prefetch / npm-bridge subsystems have since been removed).

Replaced with a single 2.2.0 entry matching the prior format
(Breaking changes / New features / Performance / Bug fixes /
Security), describing only behaviour that is actually in the shipped
code. Internal implementation detail, per-class commentary, and
reverted work are out — this changelog is a public-facing release
note, not a development journal.

aydasraf added 2 commits May 13, 2026 16:55
…ack literal

JettyClientSlices.isLoopback compares the host against the IPv6 loopback
literal as part of its rate-limit-skip check. The literal is the value
under test, so the rule cannot be satisfied by refactoring — a per-line
NOPMD suppression matches the existing convention used in
NegativeCacheConfig, CacheConfig, and FilteredMetadataCache.

CachedProxySlice translated a non-2xx upstream status with nested ifs
("status == 503" outside, "retryAfter present" inside), which PMD
flagged as CollapsibleIfStatements. Merging the conditions into a
single guard preserves the original semantics; the explanatory comment
block moves up one level and still applies.
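
A before/after sketch of the collapse; respond503() is a hypothetical
helper standing in for whatever the real translation returns:

    import java.time.Duration;
    import java.util.Optional;

    final class CollapsibleIfSketch {

        static String before(int status, Optional<Duration> retryAfter) {
            if (status == 503) {                      // outer guard PMD flags, because...
                if (retryAfter.isPresent()) {         // ...the inner if is its only statement
                    return respond503(retryAfter.get());
                }
            }
            return "fall through to the 502 mapping";
        }

        static String after(int status, Optional<Duration> retryAfter) {
            // Single merged guard, identical semantics; the explanatory comment block
            // from the real code now sits at this level.
            if (status == 503 && retryAfter.isPresent()) {
                return respond503(retryAfter.get());
            }
            return "fall through to the 502 mapping";
        }

        private static String respond503(Duration retryAfter) {
            return "503 with Retry-After " + retryAfter.toSeconds() + "s";
        }
    }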