Conversation
aydasraf added a commit that referenced this pull request on Apr 16, 2026
…I-02, WI-03)
Refreshes the three release artefacts produced by the final
end-to-end reviewer after the Wave 3 commits landed on 2.2.0:
CHANGELOG-v2.2.0.md (144 L)
Adds Wave 3 entries to Highlights / Added / Changed /
Deprecated / Under-the-hood. Version-bump, BaseCachedProxySlice
SingleFlight migration, pypi/go/composer ProxyCacheWriter
wiring, RequestContext expansion + Deadline + ContextualExecutor,
StructuredLogger 5-tier + LevelPolicy + AuditAction, and the
@deprecated MdcPropagation status — all documented with forensic
and architecture-review section refs.
docs/analysis/v2.2.0-pr-description.md (174 L)
PR #34 body; WI checklist now shows 8 shipped / 6 deferred;
test-run evidence 3,432 tests green; five PR-reviewer focus
points (remaining MdcPropagation callers, lost user_agent sub-
field parsing, audit-logger suppressibility gap in log4j2.xml,
DbIndexExecutorService submit()-path bypass, four-adapter
"any exception → 404" swallow inherited from Maven).
docs/analysis/v2.2-next-session.md (399 L)
Refreshed agent-executable task list. Removes the four
shipped items (WI-post-05, WI-post-07, WI-02, WI-03). Keeps
WI-04 / WI-06 / WI-06b / WI-08 / WI-09 / WI-10 in the same
Goal / Files / Tests / DoD / Depends-on shape. Adds four
WI-post-03 follow-ups surfaced during Wave 3:
a. Hoist DbIndexExecutorService to pantera-core/http/
context/ContextualExecutorService.
b. Re-lift user_agent.name / .version / .os.name parsing
into StructuredLogger.access.
c. Unify the ~110 remaining MdcPropagation call-sites
after WI-06 + WI-08 + the Vert.x-handler migration,
then delete MdcPropagation.java.
d. Migrate 11 Vert.x API handlers (AdminAuth, Artifact,
Auth, Cooldown, Dashboard, Pypi, Repository, Role,
Settings, StorageAlias, User) to a ContextualExecutor-
wrapped worker pool — the single biggest MdcPropagation
debt.
Adds one new concern:
C6. Audit logger inherits log-level config from
com.auto1.pantera parent — §10.4 declares audit as
"non-suppressible" but log4j2.xml has no dedicated
block. Five-line fix tracked separately.
Review verdict: PASS. Every §12 DoD met. Every commit conforms
to type(scope): msg, zero Co-Authored-By trailers across all 11
new commits (verified via git interpret-trailers --only-trailers).
3,432 tests green across pantera-core / pantera-main / every
touched adapter module.
…load-bearing
CooldownHandler.blocked():
- Deferred crs.listAll() (JDBC) and policy.getPermissions(authUser)
(CachedYamlPolicy falls back to blocking storage on cache miss) into
the CompletableFuture.supplyAsync closure. Only AuthUser extraction
from the routing context — cheap, in-memory — stays on the event loop.
- Matches the 2.2.0 HandlerExecutor discipline: no blocking I/O on
vert.x-eventloop threads.
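  A minimal sketch of the resulting shape, with hypothetical helper and field names
  (authUserFrom, toJson and workerExecutor stand in for the handler's real collaborators):

  ```java
  // Hedged sketch: only cheap, in-memory AuthUser extraction stays on the event loop;
  // the JDBC listAll() and the cache-miss-capable policy lookup run on a worker pool.
  public void blocked(final RoutingContext ctx) {
      final AuthUser user = authUserFrom(ctx);               // in-memory, event-loop safe (hypothetical helper)
      CompletableFuture
          .supplyAsync(() -> {
              final var rows = crs.listAll();                 // blocking JDBC
              final var perms = policy.getPermissions(user);  // may fall back to blocking storage on cache miss
              return toJson(rows, perms);                     // hypothetical serialisation step
          }, workerExecutor)
          .whenComplete((json, err) -> {
              if (err != null) {
                  ctx.fail(err);
              } else {
                  ctx.response().putHeader("Content-Type", "application/json").end(json);
              }
          });
  }
  ```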
CooldownHandlerFilterTest.requestRaw():
- Wrapped the onSuccess assertion callback in ctx.verify(...) so
AssertionError is routed to failNow instead of being swallowed and
masquerading as a TEST_TIMEOUT. Without this, a broken assertion in
any of the 5 non-auth tests would produce a silent 10s TimeoutException
instead of a diagnostic failure. Verified by temporarily injecting
assertEquals(42, 43) — the failure now surfaces as
"expected: <42> but was: <43>" in ~1s.
Wires CooldownCleanupFallback into VertxMain startup. After RepositorySlices is constructed (which triggers CooldownSupport.loadDbCooldownSettings so settings.cooldown() is authoritative), probe pg_cron via PgCronStatus. If the scheduled cleanup job is missing, construct a local CooldownRepository + start the fallback bound to the shared Vertx instance. On shutdown (before vertx.close()), cancel both timers. Placed at the bootstrap level — not inside AsyncApiVerticle — so it runs once per process rather than once per deployed verticle instance (AsyncApiVerticle is deployed with 2x CPU instances).
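A rough sketch of that bootstrap wiring; the PgCronStatus / CooldownCleanupFallback method names below are assumptions based on the description above, not the actual signatures:

```java
// Hedged sketch of the VertxMain-level wiring described above.
void installCooldownCleanupFallback(final Vertx vertx, final DataSource ds,
                                    final CooldownSettings cooldown) {
    // Probe pg_cron: if the scheduled cleanup job exists, the database owns cleanup.
    if (new PgCronStatus(ds).cleanupJobScheduled()) {         // hypothetical probe method
        return;
    }
    final CooldownRepository repo = new CooldownRepository(ds);
    this.cooldownFallback = new CooldownCleanupFallback(vertx, repo, cooldown);
    this.cooldownFallback.start();                            // binds its timers to the shared Vertx

    // On shutdown, before vertx.close():
    //   if (cooldownFallback != null) { cooldownFallback.cancelTimers(); }   // hypothetical
}
```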
…sion
Adds GET /api/v1/cooldown/history, a paginated read over the artifact_cooldowns_history
archive. Mirrors /cooldown/blocked in shape (repo + repo_type + search + sort query
params, paginated envelope) but returns the archive-specific fields (archived_at,
archive_reason, archived_by) and sorts by archived_at by default. Introduces
ApiCooldownHistoryPermission as a separate API-level gate so operators can grant
live-blocked visibility without exposing the who-unblocked-what archive. The handler
additionally scopes rows to repos the caller has AdapterBasicPermission(repo, "read")
on, mirroring the two-layer permission model from the blocked endpoint.
- new ApiCooldownHistoryPermission + ApiCooldownHistoryPermissionFactory (mask-based,
  same package as ApiCooldownPermission — auto-discovered by PermissionsLoader via
  @PanteraPermissionFactory)
- CooldownHandler.history() uses the SQL-pushed findHistoryPaginated / countHistory
  from Task 3
- CooldownHandlerHistoryTest: 5 tests covering 403 without history perm, per-repo
  scoping, combined repo/repo_type filter, pagination total correctness,
  archive-field serialisation
…etConfig fields, V121 CREATE EXTENSION)
Four backend fixes surfaced during smoke test of the 2.2.0 cooldown rollout:
1. V121: drop redundant CREATE EXTENSION IF NOT EXISTS pg_cron inside the DO
block. V114 already creates the extension via the same pattern; re-running
emits a NOTICE that Flyway logs at WARN. The outer
`pg_available_extensions` guard already returns early when pg_cron is
unavailable, so this DO block can assume the extension is present.
2. CooldownRepository: repo_type filter now matches both exact and
subtype-prefix (`base-*`). The UI sends base values ("docker", "npm",
"maven") while DB rows store "docker-proxy", "docker-group", etc.
Updated findActivePaginated, countActiveBlocks, findHistoryPaginated,
and countHistory to use
`LOWER(repo_type) = LOWER(?) OR LOWER(repo_type) LIKE LOWER(?) || '-%'`.
bindOptionalFilters now binds the repoType value three times.
3. Deleted ApiCooldownHistoryPermission + factory. The permission was
overengineered and never wired into the role-management UI, so admins
could not grant it and the /history endpoint was dark. Routed
/cooldown/history through ApiCooldownPermission.READ instead; the
per-repo AdapterBasicPermission row filter in the handler body is
unchanged.
4. getConfig(): include history_retention_days and cleanup_batch_limit in
the response so the SettingsView dialog can prefill the inputs on
initial page load.
Test adjustments: CooldownHandlerHistoryTest PermissionSpec simplified
(no more historyRead flag); the 403-without-permission test now asserts
the gate is ApiCooldownPermission.READ. Added
blockedFiltersByRepoTypeBase to CooldownHandlerFilterTest covering the
base -> subtype prefix match. Cooldown test suite: 60 passing (was 59).
…r labels, docker filter, history toggle gating
- Delete CooldownSettingsDialog; move history_retention_days and
cleanup_batch_limit inputs into the existing SettingsView cooldown card.
- Replace the card-grid list of cooldown-enabled repos with a compact
PrimeVue DataTable matching the blocked-artifacts table style.
- Add visible labels ("Search", "Repository", "Type") above the filter
dropdowns in CooldownView; wire proper ids/for attributes.
- Gate the history toggle on api_cooldown_permissions.read instead of
the removed api_cooldown_history_permissions (backend commit 9149d78
consolidated the permission; per-repo AdapterBasicPermission still
filters rows server-side).
- Update CooldownView.test.ts: remove stale history-permission seed
entries, split the hide/show toggle cases, delete tests for the
removed dialog.
…and role UI
Revives ApiCooldownHistoryPermission (and its @PanteraPermissionFactory) that was
collapsed into ApiCooldownPermission in 9149d78. Operators need to expose the live
blocked list without also exposing the long-term archive, which is the reason for
the separate permission. End-to-end wiring:
- CooldownHandler gates /api/v1/cooldown/history on the new permission.
- AuthHandler.allowedActions includes api_cooldown_history_permissions in the /me
  response so the frontend can reflect grants in hasAction.
- RoleListView allActionsMap lists the new key so it is grantable via UI.
- CooldownView.canReadHistory checks the new key (not the base cooldown read), and
  the test suite exercises grant/withhold of the narrower permission independently.
AllPermission.implies still covers admin out of the box, so seeded admin roles see
the toggle automatically.
…unified filters
Replace the full-width DataTable under "Cooldown-Enabled Repositories" with a
responsive tile grid (1/2/3/4 cols from mobile up to lg). Each tile shows a
brand-colour dot, the repo name, type and cooldown duration, and an active-blocks
footer with an eraser icon for one-click "unblock all" (gated on canWrite and
active_blocks > 0).
The same search/repo/type filter bar that drives the blocked-artifacts table now also
filters the grid client-side, including base-type matching so "docker" covers both
docker-proxy and docker-group. Client-side pagination at 12 tiles per page keeps the
card compact for large cooldown fleets.
Unblock-all now routes through a confirmation Dialog using the existing
useConfirmDelete composable pattern (same as RepoManagementView and StorageAliasView)
instead of firing immediately from a row action.
…med accents
- Promote search / repo / type / mode controls to a single top-level filter bar;
  remove the duplicate toolbar that lived inside the Blocked Artifacts card.
- Tile grid: bump to xl:grid-cols-5 and shrink repoPageSize from 12 to 10 (two rows
  of five). Paginator threshold follows.
- Replace the inline getTechInfo() brand-color dot with the shared RepoTypeBadge
  component, matching RepoListView's presentation. Drop the now-unused getTechInfo
  import.
- Switch tile background to bg-surface-card for theme-token parity with other admin
  cards.
- Tests: update paginator threshold (>10 / <=10), add coverage for RepoTypeBadge
  usage and for absence of local filter UI inside the blocked-artifacts card.
…ge, drop blocked_date to fit width
…on gap
Adds a cooldown-aware rewrite for the Go module proxy /@latest endpoint. When
`go get <module>` runs without a pseudo-version the client hits /@latest first and
never consults /@v/list, so a list-only filter left the primary unbounded resolution
path unprotected.
The new handler intercepts /@latest in the Go CachedProxySlice, fetches upstream,
checks the returned Version against cooldown, and — when blocked — fetches the
sibling /@v/list, picks the highest non-blocked version per Go semver
(VersionComparators.semver(), which tolerates pseudo-versions and the leading "v"),
and returns a rewritten JSON payload preserving the Origin field and clearing Time.
When every version is blocked it returns 403 with the same Go-client-parseable
convention as the existing GoCooldownResponseFactory.
New SPI implementations (parser/filter/rewriter/detector) live alongside the existing
/@v/list components and are wired through GoLatestHandler rather than the
single-bundle CooldownAdapterRegistry slot — the multi-endpoint fallback is not
expressible in the pure MetadataFilter SPI. /@v/list filter behaviour is unchanged.
Test counts: 48 new unit tests across 5 files; full go-adapter suite 126 pass
(0 regressions); pantera-main *Cooldown* suite 60 pass; pantera-core metadata/filter
57 pass.
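The version-selection step reduces to something like the following sketch (only VersionComparators.semver() is named above; the method and parameter names are illustrative):

```java
// Hedged sketch: given the upstream /@v/list body and a cooldown predicate,
// pick the highest non-blocked version using the Go-aware comparator.
Optional<String> highestAllowedVersion(final List<String> listedVersions,
                                       final Predicate<String> blocked) {
    return listedVersions.stream()
        .filter(version -> !blocked.test(version))
        .max(VersionComparators.semver());   // tolerates pseudo-versions and the leading "v"
}
```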
…lly filtered
GoMetadataFilter / GoMetadataParser were registered in CooldownWiring but never
invoked on the serve path, so blocked versions leaked to `go list -m -versions`,
`go mod download`, and MVS resolution.
Introduce GoListHandler mirroring GoLatestHandler: fetch `/@v/list` upstream, parse
via GoMetadataParser, evaluate every version against cooldown, apply
GoMetadataFilter.filter(), and re-serialise as newline-delimited
`text/plain; charset=utf-8`. Non-2xx upstreams are forwarded unchanged. An
empty-after-filter list collapses to 403 with the same convention as the @latest
all-blocked branch.
Wire the handler into CachedProxySlice before the generic fetchThroughCache path,
preserving ordering `@latest -> @v/list -> generic`. CooldownWiring's goBundle
registration stays — it is now live via this handler and documented as such in the
class Javadoc.
- Pep440VersionComparator.compareReleases: convert second arg to varargs
- PypiMetadataRewriter: pre-size html StringBuilder to 512 chars
- CachedPyProxySlice: NOPMD UnusedFormalParameter on deprecated negativeCacheTtl/Enabled
  overload (settings now from unified NegativeCacheConfig); drop unused 'storage' local
  in fetchVerifyAndCache
- IndexGenerator: drop unused absoluteUrl field on Entry; NOPMD UnusedFormalParameter
  on prefix ctor param (reserved for absolute-URL emission)
- ProxySlice: drop unused 'storage' (BlockingStorage) field + import; drop redundant
  cast on Optional<? extends Content> via type witness; drop unused params (user in
  backgroundRefreshIndex, rqheaders/info in serveArtifact, key in serveArtifactContent,
  line in extractBasePath, coords/release in registerRelease (no-op method removed
  entirely), rqheaders in checkCacheFirst and evaluateCooldownAndFetch);
  CollapsibleIfStatements in coordinatesFromFilename
- SimpleApiFormat.fromHeaders: collapse nested if
- WheelSlice.putArtifactToQueue: drop unused 'filename' param
- CacheConfig: NOPMD SystemPrintln on pre-logger config-load (server bootstrap, no
  logger yet)
- CooldownCache: drop unused l1AllowedTtl field (used only inside ctor)
- CooldownCircuitBreaker: collapse 4 nested if-CAS guards into compound conditions
- FilteredMetadataCache.CacheEntry.data: NOPMD MethodReturnsInternalArray (immutable
  cache value, mirrors ctor's ArrayIsStoredDirectly NOPMD; callers treat as read-only)
- MetadataFilterService.evaluateAndFilter: collapse blockedUntil tracking nested if
- RangeSpec.isValid: SimplifyBooleanReturns
- CombinedAuthzSlice.authenticate: drop unused 'line' param
- BaseCachedProxySlice: drop unused 'headers' param from signalToResponse and 'key'
  param from handleNonSuccess; NOPMD CloseResource on FileChannel (closed by
  Subscriber.onComplete/onError); drop unnecessary (Response) cast
- NegativeCache.nameMatches: SimplifyBooleanReturns; UseShortArrayInitializer on
  L2_SENTINEL
- Accept.values: UseCollectionIsEmpty (RqHeaders inherits AbstractList.isEmpty)
- EcsLogEvent.userAgent: replace for-loop-with-final-break with iterator-hasNext
  (AvoidBranchingStatementAsLastInLoop); NOPMD EmptyCatchBlock on Base64 decode failure
  (intentional fallthrough to Optional.empty)
- StructuredLogger: NOPMD UnusedLocalVariable on 'bound' AutoCloseable handle
  (try-with-resources holder) and EmptyCatchBlock on bindToMdc().close() catch (impl
  never throws)
- EcsLoggingSlice: drop unused responseSize AtomicLong + import
- FileSystemArtifactSlice.cleanup: NOPMD EmptyCatchBlock on benign close error during
  cleanup
- GzipSlice.gzip: NOPMD CloseResource on PipedInputStream (ownership transferred to
  ReactiveInputStream)
- RangeSlice.RangeLimitSubscriber: drop unused 'upstream' Subscription field
- AutoBlockRegistry: drop redundant double/long casts (single cast suffices for
  arithmetic promotion)
- FileVersionDetector.isVersionToken: SimplifyBooleanReturns
- AuthFromDb, AuthProviderDao, AuthSettingsDao, RepositoryDao, RoleDao, SettingsDao,
  StorageAliasDao, UserDao, UserTokenDao: wrap ResultSet in try-with-resources
  (CloseResource)
- RepositoryDao.value, RoleDao.remove, UserDao.alterPassword: NOPMD
  AvoidRethrowingException on intentional IllegalStateException rethrow that preserves
  the not-found marker (distinguishes from generic Exception wrap)
- UserTokenDao.listByUser: bump StringBuilder pre-size to 192
…t pass
- RepositorySlices: drop unused httpTuningSupplier/useLegacyHttpClientCtor fields
  (used only inside ctor); NOPMD CloseResource on clientSlices alias (lifecycle owned
  by clientLease); replace instanceof-in-catch with separate RuntimeException catch;
  SimplifyBooleanReturns in isProxyOrContainsProxy; drop unused 'directMembers' param
  from flattenMembers (4 callers updated)
- VertxMain: NOPMD CloseResource on VertxSliceServer (lifecycle owned by this.servers
  list)
- DockerProxyCooldownSlice: drop unused 'user' param from determineReleaseSync
- ComposerGroupSlice: drop unused cooldownMetadata/repoType fields (assigned but never
  read); NOPMD UnusedFormalParameter on ctor params reserved for upcoming cooldown
  filtering
- ApiActions: convert Action[] ctor to varargs
- AsyncApiVerticle: drop unused jwt field; NOPMD UnusedFormalParameter on jwt ctor
  params reserved for route protection
- AuthHandler: NOPMD ReturnEmptyCollectionRatherThanNull on findProvider (JsonObject
  is a record, not a collection); drop unused 'type' label param from allowedActions
  and convert checks to varargs (7 callers updated)
- CooldownHandler: SimplifyConditional (drop redundant null-check before instanceof);
  NOPMD EmptyCatchBlock on per-repo skip
- DashboardHandler: NOPMD EmptyCatchBlock on best-effort dashboard zeroed counters
- PypiHandler: drop (Void) cast via supplyAsync<Void> type witness
- RepositoryHandler: drop unused 'cooldown' field; NOPMD UnusedFormalParameter on ctor
  param reserved for future cooldown integration
- SearchHandler: CollapsibleIfStatements; NOPMD ReturnEmptyCollectionRatherThanNull on
  resolveAllowedRepos (null signals "unrestricted", empty would mean "deny everything")
- SettingsHandler: drop unused 'manageRepo' field; NOPMD UnusedFormalParameter on ctor
  param reserved for future repo-settings endpoints
- StorageAliasHandler: NOPMD ReturnEmptyCollectionRatherThanNull on bodyAsJson
  (JsonObject is a record; null signals "response already sent")
- DbGatedAuth: SimplifyBooleanReturns x2
- JwtPasswordAuthFactory: pre-size pemEncodePublicKey StringBuilder
- OktaOidcClient: NOPMD ReturnEmptyCollectionRatherThanNull on JsonObject returns;
  replace instanceof-in-catch with separate InterruptedException catch in fetchUserInfo
- OktaUserProvisioning: drop redundant 'existing = null' initializer (assigned in all
  branches)
- RsaKeyLoader: NOPMD AvoidRethrowingException on intentional IllegalStateException
  rethrow that preserves loader-thrown diagnostic
- UnifiedJwtAuthHandler: CollapsibleIfStatements in ACCESS-token blocklist check
- CooldownSupport: drop redundant (CooldownService) cast via type witness on map()
- JdbcCooldownService: drop unused 'inspector' param from shouldBlockNewArtifact
- ArtifactDbFactory: NOPMD CloseResource on caller-owned HikariDataSource (we only
  attach metrics)
…third pass
- RepositorySlices: collapse RuntimeException + Error catches via multi-catch
  (IdenticalCatchBranches); SimplifyBooleanReturns in isProxyOrContainsProxy; NOPMD
  CloseResource on clientLease assignment in php-proxy case
- AuthHandler.allowedActions: drop the Permission[] array literal at 3 call sites
  (UnnecessaryVarargsArrayCreation)
- BlockedThreadDiagnostics: NOPMD EmptyCatchBlock on best-effort diagnostics catches;
  pre-size sb StringBuilder
- GroupResolver: drop unused 'proxyMembers' field; NOPMD UnusedFormalParameter on
  legacy depth/timeoutSeconds (sequential-only fanout in v2.2.0) and on proxyMembers;
  NOPMD CloseResource on long-lived ArtifactIndex; drop unused 'memberName' param from
  drainBody
- ApiRoutingSlice: drop unused LIMITED_SUPPORT field + Set import
- MergeShardsSlice.hexLower: extract HEX_DIGITS to static field
  (LocalVariableNamingConventions); drop dual-loop-vars + reassignments (use
  i*2/i*2+1); drop unused 'versionFromPath' and 'parentDir' locals; drop unused
  'baseUrl' from mergeHelmShards
- BrowseSlice.renderHtml: pre-size html StringBuilder
- ImportService: drop unused 'request' param from writePyPiShard and writeHelmShard;
  NOPMD EmptyCatchBlock on SubStorage-unwrap reflection failure; collapse redundant
  return-true branch in shardsModeEnabled
- DbArtifactIndex: NOPMD EmptyCatchBlock on best-effort warm-up; pre-size sb in
  buildFilterClauses; ForLoopCanBeForeach in queryRepoCountsMultiParam
- SearchQueryParser: NOPMD AvoidReassigningLoopVariables on intentional
  consume-peeked-token i++
- Http3Server: NOPMD CloseResource on QuicheServerConnector (lifecycle owned by Jetty
  Server via addConnector)
- Json2Yaml: drop unnecessary (YAMLMapper) cast (configure returns same type)
…al pass
- ImportService.shardsModeEnabled: collapse to single negated-equals chain
  (SimplifyBooleanReturns)
- PrefetchCircuitBreaker.recordDrop: collapse trip-and-CAS into compound condition
- PrefetchCoordinator.upstreamHost: NOPMD EmptyCatchBlock on intentional fallthrough
- CachedNpmMetadataLookup.readMeta, NpmPackageParser.readManifest: NOPMD
  ReturnEmptyCollectionRatherThanNull on byte[] returns (payload, not collection;
  null vs empty[] semantics differ)
- CachedDbPolicy.rolePermissions, DbUser.loadFromDb: wrap ResultSet in
  try-with-resources
- LoggingContext: NOPMD UnusedFormalParameter on deprecated meta ctor param (kept for
  source-compat)
- YamlSettings: NOPMD CloseResource on long-lived service holders (cachePubSub,
  ArtifactIndexCache); type-witness on map() to drop SyncArtifactIndexer cast;
  SimplifyBooleanReturns in proxyProtocol; chain DateTimeParseException as cause in
  negative-millis IllegalStateException; use addSuppressed(num) for the inner
  NumberFormatException (PreserveStackTrace)
- CachedUsers.onEviction, GuavaFiltersCache.onEviction: NOPMD UnusedFormalParameter on
  Caffeine RemovalListener<K,V> contract params
- CacheIntegrityAudit.parse: convert to varargs; NOPMD AvoidReassigningLoopVariables on
  intentional --root/--repo lookahead
- WebhookConfig: NOPMD UnusedAssignment on record compact-ctor param assignments
  (assigns to record components, not stack-locals)
ProxyCacheWriter.NON_BLOCKING_DEFAULT = {MD5, SHA256, SHA512} so the
3-arg writeAndVerify() overload (the one composer/pypi/go were using)
treats every supplied sha256/md5/sha512 sidecar as a deferred,
non-blocking check. The integrity verification then runs AFTER the
primary is already served, and a mismatch only logs — it cannot
fail-closed (502) because the response is already on the wire.
This is fine for maven (sha1 is the load-bearing blocking sidecar,
md5/sha256/sha512 are observability-only). It is NOT fine for adapters
whose ONLY sidecar is a non-blocking-default algo:
- composer → .sha256 only
- go → .ziphash (sha256) only
- pypi → .sha256 + .md5 + .sha512 (all non-blocking by default)
Switch those three to the 6-arg writeAndVerify(...) overload with
Collections.emptySet() so every supplied sidecar becomes load-bearing.
The API was designed for exactly this case (see NON_BLOCKING_DEFAULT
javadoc: "If a deployment requires strict .md5/.sha256/.sha512
blocking, ... pass Collections.emptySet() to make every supplied
sidecar load-bearing.").
Test coverage that was failing pre-fix and now passes:
- composer: CachedProxySliceIntegrityTest.sha256Mismatch_rejectsWrite
- pypi: CachedPyProxySliceIntegrityTest.sha256Mismatch_rejectsWrite
- go: CachedProxySliceIntegrityTest.ziphashMismatch_rejectsWrite
These tests were pre-existing (reproduce on 3163387, the Phase 14a
checkpoint) — the bug predates v2.2.0 PMD work; the failing tests
simply lacked the wiring that would make their assertions actionable.
…System Settings
Four bugs / UX issues fixed in one pass:
1. **PATCH preflight rejected by CORS** — every save on the runtime
tunables page was failing with a CORS error after a 204 OPTIONS
preflight. Cause: AsyncApiVerticle's Access-Control-Allow-Methods
listed GET/POST/PUT/DELETE/HEAD/OPTIONS but not PATCH, and
/api/v1/settings/runtime/:key is the only PATCH endpoint we expose.
Add PATCH to the allowed-methods list.
2. **Performance Tuning was a separate page** — the runtime tunables
(HTTP/2 client + pre-fetch) lived under /admin/performance-tuning,
duplicating navigation and creating a "two settings places" UX.
Fold them into SettingsView as two new cards: "HTTP/2 Upstream
Tuning" and "Pre-fetch". Delete the standalone view, drop the
sidebar entry, and turn the legacy /admin/performance-tuning route
into a redirect to /admin/settings so old bookmarks still work.
3. **Two ambiguous "Circuit Breaker" sections** — SettingsView had
"Circuit Breaker" (resilience4j-style upstream-failure breaker) and
PerformanceTuning had three prefetch.circuit_breaker.* keys (drop-rate
breaker on the dispatcher). Same word, different breakers. Rename the
first to "Upstream Failure Circuit Breaker" with a subtitle that
explicitly contrasts it against the pre-fetch breaker, and put the
pre-fetch breaker keys inside the Pre-fetch card with explicit
"Drop-rate breaker:" prefixes on each label.
4. **Drop the source ('db' / 'default') Tag chips** — they cluttered
every row without telling ops anything actionable. Keep the underlying
source field on the row (used to gate visibility of the "Reset to
default" button) but stop rendering the tag.
Plumbing: extracted runtime-settings state into a useRuntimeSettings
composable so SettingsView gets the same dirty/save/reset behaviour
without inlining ~150 lines of state machinery.
All 78 UI tests pass. Server SettingsHandlerRuntimeTest (13 tests) also
green — endpoint contract unchanged.
…cture)
Diagnosed and fixed the 429 storm from Maven Central (and the symmetric
packagist.org pattern in composer): every cache hit was firing an
upstream HEAD via the cooldown inspector's network-fallback chain.
Track 1 — 429/4xx propagation. 4xx (non-404) propagates verbatim with
Retry-After preserved; 5xx and connection errors collapse to 503 with
stale-serve attempt. 404 stays the only status cached negatively.
Track 2 — ArtifactIndexCache surgical invalidation. L1 Caffeine + L2
Valkey with positive/negative subkeys (artifact-index-positive,
artifact-index-negative). recordUpload/recordDelete invalidate only the
affected key cluster across the cluster bus, not the whole cache.
Settings live under the `caches:` key in pantera.yml.
Track 3 — Integrity. Atomic primary+sidecar commit order flipped:
sidecars land before the primary so readers never observe a primary
without its sha1. Maven CachedProxySlice constructor rejects empty
storage at startup (always-verify). GroupResolver pins the winning
member per coordinate for sequential queries.
Track 4 — Stream-through + sibling prefetch. ProxyCacheWriter gains
streamThroughAndCommit(...): tees upstream Publisher<ByteBuffer> to the
client response body AND to a verifying temp file in one pass. Client
receives first byte at upstream-first-byte time instead of after the
full body drain + .sha1 round-trip. On verification mismatch the cache
stays empty (Track 3 invariant preserved). MavenSiblingPrefetcher fires
on commit: when foo-1.0.jar lands, foo-1.0.pom is background-fetched
(and reverse).
Track 5 — zero upstream I/O on cache hit. The load-bearing fix for the
user-reported 429 storm.
1A. Cooldown gate moved inside verifyAndServePrimary after
storage.exists in maven; evaluateMetadataCooldown removed from
composer's serveCachedMetadata; go was already correct.
1B. UpstreamBody record carries upstream Headers so
enqueueEventForWriter populates publish_date from the
authoritative Last-Modified, not now(). Fixes the Track 4
regression where every stream-through wrote now() and made the
next cooldown evaluation re-resolve via MavenHeadSource.
2A. PublishDateRegistry.Mode {NETWORK_FALLBACK, CACHE_ONLY}. The
4-arg overload is a default that delegates to the 3-arg abstract,
preserving lambda-implementer compatibility. DbPublishDateRegistry
overrides the 4-arg to short-circuit on CACHE_ONLY with a
cache_only_miss metric outcome. RegistryBackedInspector grows a
3-arg constructor for choosing the mode.
2B. HeadProxySlice (maven + go) accepts Optional<Storage>. Cache hit
returns 200 + Content-Length from Meta.OP_SIZE, never touching
the upstream client. Cache miss falls through to pre-Track-5
pass-through. Wired in MavenProxySlice and GoProxySlice.
2D. Cross-adapter audit confirmed no other cache-hit-on-upstream
patterns. helm/debian/rpm/conda/nuget/conan/hex/gem are
local-only (or pure pass-through) and clean.
3A. SwrMetadataCache<K, V> primitive in pantera-core encapsulating
the maven MetadataCache SWR shape: soft TTL serve cached, past
soft serve stale + async refresh dedup'd by ConcurrentHashMap
key set, past hard treat as miss. Counters for fresh/stale/miss
tagged by cacheName. Adapter migrations are follow-up — primitive
is in place (see the sketch after this list).
3B. PublishDateExtractor SPI + PublishDateExtractors registry in
pantera-core, keyed by repo-type. Maven extractor registered in
VertxMain at boot; CachedProxySlice.buildArtifactEvent now
consults the registry first, falls back to extractLastModified.
Non-maven adapters fall through to the registry NO_OP for now.
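A compact sketch of the SWR shape 3A describes, using plain JDK types — this is not
the pantera-core class; names, the loader interface, and the omitted per-cacheName
counters are illustrative:

```java
import java.time.Duration;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

final class SwrSketch<K, V> {
    private record Entry<T>(T value, long writtenAtMillis) { }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Set<K> refreshing = ConcurrentHashMap.newKeySet(); // dedups async refreshes per key
    private final Duration softTtl;
    private final Duration hardTtl;
    private final Executor refreshPool;

    SwrSketch(final Duration softTtl, final Duration hardTtl, final Executor refreshPool) {
        this.softTtl = softTtl;
        this.hardTtl = hardTtl;
        this.refreshPool = refreshPool;
    }

    V get(final K key, final Function<K, V> loader) {
        final Entry<V> cached = entries.get(key);
        final long age = cached == null
            ? Long.MAX_VALUE
            : System.currentTimeMillis() - cached.writtenAtMillis();
        if (cached != null && age <= softTtl.toMillis()) {
            return cached.value();                       // fresh: serve cached
        }
        if (cached != null && age <= hardTtl.toMillis()) {
            if (refreshing.add(key)) {                   // only one background refresh per key
                refreshPool.execute(() -> {
                    try {
                        entries.put(key, new Entry<>(loader.apply(key), System.currentTimeMillis()));
                    } finally {
                        refreshing.remove(key);
                    }
                });
            }
            return cached.value();                       // soft-stale: serve old value, refresh async
        }
        final V loaded = loader.apply(key);              // past hard TTL or absent: treat as a miss
        entries.put(key, new Entry<>(loaded, System.currentTimeMillis()));
        return loaded;
    }
}
```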
Test coverage:
- CacheHitNoUpstreamTest (maven, 2): cache-hit on .jar/.pom is local;
no upstream calls, no inspector calls.
- ComposerCacheHitNoUpstreamTest (1): same for composer metadata JSON.
- HeadProxySliceCacheFirstTest (3): HEAD on cached, HEAD on miss,
pass-through when no storage.
- RegistryBackedInspectorCacheOnlyTest (2): mode propagation.
- PublishDateExtractorsTest (4): registry semantics.
- SwrMetadataCacheTest (5): fresh-hit / soft-stale / hard-stale /
dedup / absent-then-cached.
- Renamed verifyAndServePrimaryBlocksEvenWhenCacheHasVersion to
verifyAndServePrimaryCacheHitIsLocalEvenWhenBlockIsActive; new
assertion: cooldown is NOT evaluated on cache hit.
Acceptance: warm-cache `mvn dependency:resolve -U` walks make zero upstream
calls for cached artifacts and HEADs. Cooldown evaluation
happens exactly once per (artifact, version) on first fetch. Mutable
per-package index metadata continues to refresh via async SWR (off the
hot path) per the user's serve-stale + background-refresh contract.
PMD clean. 1192 unit tests passing across pantera-core, maven-adapter,
go-adapter, composer-adapter, pantera-main.
…e 3C contract pin
Per-repo-type PublishDateExtractor registrations. VertxMain.start now registers the
RFC 1123 Last-Modified extractor for every header-emitting proxy ecosystem (maven,
npm, pypi, go, composer, gem) — all six upstream registries we proxy set Last-Modified
on artifact GETs, so one extractor lambda covers all of them. Ecosystems whose publish
date lives in the response body (docker manifests, nuget catalog, hex registry) keep
falling through to the registry's NO_OP and the pre-Track-5 System.currentTimeMillis()
DB-consumer fallback; adding body-aware extractors for those is a per-adapter
follow-up.
Phase 3A "migrate composer/pypi/go onto SwrMetadataCache" — DELIBERATELY NOT DONE.
The audit finding that motivated the migration was wrong: those adapters' cache-hit
paths use CacheTimeControl.validate which queries storage.metadata (Postgres-backed,
cross-process), not upstream. Replacing with SwrMetadataCache's in-process Caffeine
timestamps would trade cross-process consistency for in-process speed — a net
regression in multi-instance deployments where two nodes would drift their freshness
independently. The existing patterns are correct. SwrMetadataCache remains the
canonical primitive for future adapters without a storage-backed metadata layer.
Phase 3C "deletion of MavenHeadSource et al." — DELIBERATELY NOT DONE. The
PublishDateSource implementations remain load-bearing on the cache-miss branch
(NETWORK_FALLBACK mode). Deleting them would force the cooldown evaluator to fail
open on every freshly-published version the first time it is requested — defeating
the "block fresh versions even for the first asker" guarantee. The Phase 1A "no
upstream on cache HIT" invariant does not require deletion of the cache-miss
fallback; one upstream HEAD per genuinely-new (artifact, version) is amortised over
the artifact's entire lifetime in cache.
Phase 3C contract pinned via two new tests in DbPublishDateRegistryTest:
- cacheOnlyModeNeverInvokesSource: L1+L2 miss + CACHE_ONLY returns empty without
  firing the source.
- cacheOnlyAfterNetworkFallbackHitsL1OrL2: first call NETWORK_FALLBACK populates L2;
  subsequent CACHE_ONLY hits (even across a fresh DbPublishDateRegistry instance
  simulating a JVM restart) read L2 with zero source calls.
Admins wanting strictly-zero upstream HEAD traffic at the cost of first-asker
cooldown can already opt out per-repo with cooldown.enabled: false. No new toggles.
Test count: 1194 passing (up from 1192). PMD clean.
A cold-cache `mvn dependency:resolve -U` log review found exactly one error
type repeated 80+ times per build: "Maven POM parse failed: Unexpected
character 'P' (code 80) in prolog [1,2]". 'P' (0x50) at byte 1 is the
ZIP magic byte — PrefetchDispatcher was routing every Maven cache write
through MavenPomParser regardless of file extension, so every .jar (a
binary ZIP) triggered a guaranteed-failed XML parse plus a temp-file
snapshot copy plus a WARN log plus an executor slot consumed.
Fix: PrefetchParser SPI grows an appliesTo(String urlPath) default
method returning true. MavenPomParser overrides to path.endsWith(".pom").
PrefetchDispatcher.onCacheWrite gates on appliesTo BEFORE the snapshot
copy and executor hand-off, so non-applicable writes are a complete
no-op on the cache-write callback path.
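Sketched below with an illustrative interface shape (the real PrefetchParser SPI has
more surface; parse() here is a stand-in for the existing entry point):

```java
import java.util.List;

// Hedged sketch of the SPI change and the Maven override.
interface PrefetchParser {
    /** Default keeps existing parsers receiving every cache write. */
    default boolean appliesTo(String urlPath) {
        return true;
    }

    List<Coordinate> parse(byte[] body);    // stand-in for the existing parse method
}

final class MavenPomParser implements PrefetchParser {
    @Override
    public boolean appliesTo(final String urlPath) {
        // .jar/.war/.module (binary ZIPs) are skipped before any snapshot copy
        // or executor hand-off in PrefetchDispatcher.onCacheWrite.
        return urlPath != null && urlPath.endsWith(".pom");
    }

    @Override
    public List<Coordinate> parse(final byte[] body) {
        return List.of();                    // XML parsing elided in this sketch
    }
}
```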
Tests:
- MavenPomParserTest.appliesToFiltersOutNonPomPaths: .pom matches; .jar,
.war, .module, null all filtered.
- PrefetchDispatcherTest.onCacheWrite_skipsParser_whenAppliesToReturnsFalse:
.jar write does not invoke the parser; subsequent .pom write does.
Other parsers (NpmPackageParser, NpmCompositeParser, NpmPackumentParser)
inherit the default-true behaviour and are unchanged.
16/16 prefetch tests green. PMD clean.
Five RCAs from deep perf analysis of cold mvn through maven_group
(`dependency:resolve -Dartifact=org.codehaus.mojo:sonar-maven-plugin:4.0.0.4121 -U`,
9.58 s direct vs 38-39 s through Pantera).
RCA-1 was attempted as a group-layer SingleFlight + body-buffer dedup and reverted —
it broke streaming and the real cause turned out to be at the proxy slice layer.
RCA-7 (4xx-collapse split) was attempted and reverted — the group fallthrough chain
depends on 4xx → 404, so the split broke mvn entirely under upstream rate-limiting.
Both findings are documented in CHANGELOG as deferred follow-ups.
RCA-3: delete MavenSiblingPrefetcher. Single-thread executor with unbounded queue
draining at 5-10/sec, fired on every primary commit, could not win the 10-50 ms race
against mvn's matching foreground request. Net measurable benefit: zero. Latent OOM
via unbounded queue.
RCA-4: demote LoggingAuth INFO → DEBUG on success. The two-tier auth stack fired
1062 INFO entries per cold mvn run (534 requests × 2 tiers wrapped by LoggingAuth on
each tier). Pure log noise.
RCA-5: VertxMain sweeps orphan `tmp-<uuid>` dirs older than 1 h under
vertx.cacheDirBase on startup. Production carried 11 353 orphans (4.1 GB) — slowed
FileStorage I/O and exhausted the workstation /private/tmp during this perf session.
RCA-6: GroupResolver.tryNextSequentialMember now logs each 404 / non-2xx fallthrough
with member name, status, and url.path. Previously silent — blocked initial RCA-1
diagnosis (no way to tell whether maven_proxy served, 404'd, or errored before groovy
got the request).
Speculative prefetch (PrefetchDispatcher) fires N upstream GETs per successful primary cache write (one per direct dep parsed from the cached POM/packument), recursively. With per-host concurrency 16 (Maven) / 32 (npm) and no requests-per-second cap, the subsystem multiplies cold-walk upstream RPS several times above the foreground client's request rate. CHANGELOG RCA-7 and Track 5 entries already document Maven Central returning 429 against test workstation IPs once the subsystem was enabled. The user-reported May 7-8 throttling reproduces the same symptom against production. Flipping the default to false makes prefetch opt-in per repo. Operators who want it must add `settings.prefetch: true` to the repo config; the admin UI still exposes the toggle. The PrefetchDispatcher / Coordinator / Parser code is untouched — only the default for new repos changes. See analysis/03-findings.md finding #1 for the full evidence + the long-term fix (replace speculative prefetch with observed-coordinate pre-warming from the artifact event stream).
…#2, #10)
BaseCachedProxySlice.fetchAndCache called client.response(...) per caller, then
collapsed only the cache-write step through SingleFlight. N concurrent client requests
for the same uncached path produced N upstream calls instead of 1 — the dedup helped
reduce duplicate cache writes but did nothing for upstream rate-limit budget.
Refactor: leader/follower pattern matching what GroupResolver.proxyOnlyFanout and
MavenGroupSlice.mergeMetadata already do. The SingleFlight gate now wraps the upstream
fetch + cache-write combination. The leader runs fetchAndCacheLeader (full upstream +
cacheResponse). Followers park on a CompletableFuture<Void> gate and, on leader
completion, re-enter cacheFirstFlow which hits the freshly-warm cache. Field type
changes from SingleFlight<Key, FetchSignal> to SingleFlight<Key, Void> to match the
new gate semantics — the gate's terminal value is irrelevant; followers consult the
cache directly on re-entry. FetchSignal is still used internally inside the leader's
chain (cacheResponse → signalToResponse) for the same purpose as before.
Behaviour:
- Leader 200: followers re-enter, cache hit, return cached bytes, zero extra upstream
  calls.
- Leader 404: handle404 populates the NegativeCache before the gate completes, so
  followers short-circuit on re-entry.
- Leader 5xx / exception: cache stays empty, gate still completes, followers retry —
  same upstream cost as no-dedup, no per-caller amplification during the leader's
  in-flight window.
Also updates the class Javadoc and docs/developer-guide.md §7.1 to reflect the actual
pipeline (Finding #10): step 6 now describes the single-flight gate around the
upstream fetch, instead of the previous misleading "deduplicated upstream fetch"
wording that did not match the pre-fix implementation.
Verified by running BaseCachedProxySlice*Test (34 tests, all green) — in particular
the BaseCachedProxySliceDedupTest property "N concurrent callers produce exactly one
cache write" still holds, and the "Cache hit" log lines from worker threads confirm
followers are re-entering cacheFirstFlow and serving from the warm cache. See
analysis/03-findings.md findings #2 and #10 for full evidence.
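The gate placement, sketched with plain JDK types — the real code uses the project's
SingleFlight<Key, Void>; cacheLookup and fetchAndCacheLeader are stand-ins here:

```java
// Hedged sketch of the leader/follower flow around the upstream fetch + cache write.
private final ConcurrentHashMap<Key, CompletableFuture<Void>> inFlight = new ConcurrentHashMap<>();

CompletionStage<Response> cacheFirstFlow(final Key key) {
    final Response cached = cacheLookup(key);                    // stand-in for the cache check
    if (cached != null) {
        return CompletableFuture.completedFuture(cached);
    }
    final CompletableFuture<Void> gate = new CompletableFuture<>();
    final CompletableFuture<Void> existing = inFlight.putIfAbsent(key, gate);
    if (existing != null) {
        // Follower: park on the leader's gate, then re-enter against the warm cache.
        return existing.exceptionally(err -> null)
            .thenCompose(ignored -> cacheFirstFlow(key));
    }
    // Leader: exactly one upstream fetch + cache write for this key.
    return fetchAndCacheLeader(key)                              // stand-in: upstream GET + cacheResponse
        .whenComplete((response, err) -> {
            inFlight.remove(key, gate);
            gate.complete(null);      // terminal value irrelevant; followers consult the cache
        });
}
```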
…torm
Phases 0-6 of the senior-staff investigation requested by the operator, landed as
discrete files under analysis/ so each phase remains addressable.
- 00-mental-model.md — actual code path of a cold-miss Maven request through
  GroupResolver → CachedProxySlice → ProxyCacheWriter → PrefetchDispatcher, with every
  pool / lock / cache enumerated and observability gaps named.
- 01-reproduction.md — static request-amplification math (~5× for a typical
  Spring/sonar POM tree), reconciled against the team's cold-bench-10x (13 s on
  2026-05-05) and the CHANGELOG RCA-7 measurement (38 s on 2026-05-13).
- 02-diff-triage.md — verdict per relevant commit since master. The prefetch subsystem
  (May 4-5) is the principal new outbound generator; Track 5 fixes the cache-HIT case
  but not the cold-MISS case.
- 03-findings.md — the 10 classified findings with full FINDING template (category,
  evidence, problem impact, severity, confidence, short-term fix, long-term fix).
  Finding #1 is the shared root cause that explains both throttling and slowness.
- 04-rebuild-plan.md — what to delete (prefetch subsystem; eager .sha1; .sha1 we
  already compute), the new request lifecycle, and the migration order for landing
  the fixes.
- 05-perf-harness.md — the regression-prevention test plan with Toxiproxy + fixture
  upstream, prometheus-gated CI, integration test for the single-flight upstream
  property.
Each commit on this branch addresses one finding and references it by number in the
commit message.
… fixes
Captures the verbatim source diff for findings #1 (RepoConfig default) and #2
(BaseCachedProxySlice single-flight placement), plus the test results that confirm:
- BaseCachedProxySliceDedupTest passes after the refactor (4/4) — the "N concurrent
  callers produce exactly 1 cache write" property still holds with the leader/follower
  gate placement.
- RepoConfigTest passes after the prefetch default flip (21/21) — the three
  explicit-precedence tests still correctly override the new false default.
Stored under analysis/evidence/ so reviewers can verify the fixes without re-running
the build.
Plan ready for review at analysis/plan/v1/PLAN.md.
Incorporates user's two approval changes:
1. W4 prefetch: full deletion across every adapter (not keep-on-opt-in).
The "(i) keep with governance vs (ii) delete" choice in v1.0 of the
plan is removed; (ii) is the only path. New milestone M2 is dedicated
to the deletion, landing strictly after M1's observability foundation
so we can verify caller_tag="prefetch" drops to zero post-deploy.
2. All fixes explicitly cross-adapter. Added Part A.1 — a per-workstream
× per-adapter coverage matrix. The user's intent: avoid the same
regression resurfacing in npm/composer/go/etc. after a Maven-only
fix. Matrix uses ↓/✓/audit/— to pin coverage.
Headline confidence (post-revision):
Problem 1 resolved to target: 85% (+5 vs initial)
Problem 2 resolved to target: 72% (+2 vs initial)
Both problems resolved to target: 68% (+3 vs initial)
Revision raises confidence because (a) full deletion eliminates the
amplification source's code path entirely — no flag to flip back wrong;
(b) cross-adapter scoping rules out the regression resurfacing under a
different ecosystem name.
Milestones now: M1 observability, M2 prefetch deletion, M3 rate limit,
M4 single-flight + .sha1, M5 cooldown + conditional + real-Maven-Central
gate, M6 1000 req/s SLO. M2 is flagged as the one-way milestone in the
plan (DB migration is forward-only; deletion blast radius is large).
Awaiting user "Approved" before starting implementation of M1.
Adds workstream W6 covering the team's deferred RCA-1 + RCA-7 from
CHANGELOG lines 21-58, per user direction 2026-05-13. Greenfield
authorisation removes the "load-bearing assumption" excuse the team
used to defer these.
W6 deliverables:
- R7a: differentiate upstream non-2xx in CachedProxySlice.
fetchVerifyAndCache exceptionally handler (404/410 → notFound,
429/503 → propagate, 5xx → badGateway with status preserved).
- R7b: ArtifactIndexCache negative-cache write guarded — only
fires on terminal upstream-true-404, not collapsed 4xx.
- R1a: GroupResolver.tryNextSequentialMember distinguishes
upstream-true-404 from member-error; non-404 propagates verbatim.
- R1b: 5xx falls through ONCE to next member but never writes
negative-cache; final result is 502 if no 200 found.
- New GroupResolverStatusFidelityTest integration test (3
scenarios: true 404 fallthrough, 429 with breaker, 502 single
fallthrough).
W6 lands in M5 (alongside W5), behind a status_fidelity.enabled
feature flag for the M5 observability window. The flag mitigates
the R2 risk (the team's first RCA-7 attempt was reverted because
it broke mvn); rollback is a flag flip rather than code revert.
Confidence raised:
Problem 1: 85 → 88 (+3)
Problem 2: 72 → 78 (+6); target tightened ≤15s → ≤13s p50
Both: 68 → 74 (+6); R2 retires from residual-risk list
Out-of-scope section updated to mark RCA-1 + RCA-7 as folded in
(strike-through preserved for traceability). Cluster-wide rate
limiting, other architectural debt, and observed-coordinate
prewarming remain out of scope with explicit rationale per item.
Adds the metric scaffolding the plan's later milestones depend on for
validation. Without this, "did the fix work?" remains non-falsifiable.
New metrics:
- pantera.upstream.requests.total{upstream_host, caller_tag, outcome}
Incremented once per outbound request at the http-client funnel
(JettyClientSlice.recordOutboundMetric). Outcome buckets:
2xx / 3xx / 4xx / 429 / 5xx / timeout / connect_error / error.
- pantera.proxy.429.total{upstream_host, repo_name}
Isolated counter for the primary throttling alert.
- pantera.upstream.request.duration timer with the same labels —
feeds the upstream-latency-by-source dashboard.
caller_tag plumbing:
- New ThreadContext key RequestContext.KEY_CALLER_TAG ("caller.tag").
- Constants: CALLER_TAG_FOREGROUND / _COOLDOWN_HEAD / _METADATA_REFRESH.
- bindCallerTag(tag) AutoCloseable for try-with-resources at
non-foreground call sites (cooldown HEAD, metadata refresh).
- currentCallerTag() reads from ThreadContext, defaults to
"foreground" if unset.
- JettyClientSlice snapshots caller.tag + repository.name from
ThreadContext BEFORE request.send() — the Jetty callback may run on
a thread that does not carry our MDC.
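A sketch of the two RequestContext additions (method bodies are assumptions;
KEY_CALLER_TAG and the constants are the ones named above, and ThreadContext is the
log4j2 MDC façade already used by the logging stack):

```java
// Hedged sketch of bindCallerTag / currentCallerTag.
public static AutoCloseable bindCallerTag(final String tag) {
    final String previous = ThreadContext.get(KEY_CALLER_TAG);
    ThreadContext.put(KEY_CALLER_TAG, tag);
    return () -> {
        if (previous == null) {
            ThreadContext.remove(KEY_CALLER_TAG);
        } else {
            ThreadContext.put(KEY_CALLER_TAG, previous);   // restore the outer tag
        }
    };
}

public static String currentCallerTag() {
    final String tag = ThreadContext.get(KEY_CALLER_TAG);
    return tag == null ? CALLER_TAG_FOREGROUND : tag;      // defaults to "foreground"
}
```

Non-foreground call sites then wrap their work in
try (var bound = RequestContext.bindCallerTag(CALLER_TAG_COOLDOWN_HEAD)) { ... }
so the tag is restored even when the wrapped call throws.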
Prometheus rules + alerts (rules/amplification.yml):
- pantera_upstream_amplification_ratio recording rule:
sum(rate(pantera_upstream_requests_total[5m]))
/ clamp_min(sum(rate(pantera_http_requests_total[5m])), 1)
per upstream_host.
- pantera_request_to_artifact_ratio recording rule.
- PanteraUpstream429 alert: any 429 sustained 5 min → page.
- PanteraAmplificationRatio alert: ratio > 1.5 sustained 5 min → page.
- rule_files glob enabled in prometheus.yml.
Status code outcome bucketing utility on MicrometerMetrics:
- outcomeBucket(int statusCode) — coarse buckets + 429 isolated.
- outcomeFromFailure(Throwable) — timeout / connect_error / error.
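Roughly (the real MicrometerMetrics helpers may handle more edge cases):

```java
// Hedged sketch of the two bucketing helpers.
static String outcomeBucket(final int statusCode) {
    if (statusCode == 429) {
        return "429";                                   // isolated so the throttling alert stays sharp
    }
    if (statusCode >= 200 && statusCode < 300) { return "2xx"; }
    if (statusCode >= 300 && statusCode < 400) { return "3xx"; }
    if (statusCode >= 400 && statusCode < 500) { return "4xx"; }
    if (statusCode >= 500 && statusCode < 600) { return "5xx"; }
    return "error";
}

static String outcomeFromFailure(final Throwable failure) {
    if (failure instanceof java.util.concurrent.TimeoutException) { return "timeout"; }
    if (failure instanceof java.net.ConnectException) { return "connect_error"; }
    return "error";
}
```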
Tests:
- RequestContextTest gains 4 new tests for bindCallerTag /
currentCallerTag / round-trip / double-close semantics. All
18 tests green.
- Full http-client + pantera-core suite re-run: 110 + 1017 = 1127
tests, 0 failures.
The recording-rule + alert YAML lives at:
pantera-main/docker-compose/prometheus/rules/amplification.yml
and is loaded via the enabled rule_files glob in prometheus.yml.
References:
- analysis/03-findings.md finding #8
- analysis/plan/v1/PLAN.md milestone M1 + workstream W1
The PrefetchDispatcher/Coordinator chain fires N upstream GETs per cache write
recursively (one per direct dep in a POM / packument). With per-host caps of 16
(maven) / 32 (npm) and no requests-per-second gate, a cold-cache walk amplifies
outbound RPS ~5× above the foreground client's rate, tripping Maven Central's per-IP
rate limiter (RCA in analysis/03-findings.md #1, #7).
Removed across the stack:
- pantera-main/.../prefetch/ — Coordinate, PrefetchTask, PrefetchMetrics,
  PrefetchCircuitBreaker, PrefetchCoordinator, PrefetchDispatcher, parser/ subpackage
  (7 files: MavenPomParser, NpmCompositeParser, NpmPackumentParser, NpmPackageParser,
  NpmMetadataLookup, CachedNpmMetadataLookup, PrefetchParser).
- pantera-main/.../api/v1/PrefetchStatsHandler — the 24h sliding-window
  /api/v1/prefetch/stats endpoint that read PrefetchMetrics. AsyncApiVerticle no
  longer takes the metrics ref through its constructor chain.
- pantera-main/.../settings/runtime/PrefetchTuning + CircuitBreakerTuning — typed
  snapshots whose only consumers were the deleted PrefetchCoordinator and
  PrefetchCircuitBreaker. SettingsKey enum trimmed to the three http_client.* keys
  that still have live consumers. SettingsHandler validateRuntime falls through any
  prefetch.* key (no longer in catalog).
- RuntimeSettingsCache — Snapshot trimmed to {http, raw}; prefetchTuning() and
  circuitBreakerTuning() accessors removed.
- RepoConfig.prefetchEnabled + RepositorySlices.{prefetchEnabledFor, upstreamUrlOf,
  repoTypeOf, npmProxyStorages} — accessors whose only consumer was the dispatcher.
- VertxMain.installPrefetch — boot wiring (~190 LOC) deleted; the
  PrefetchCoordinator/Dispatcher shutdown blocks removed; field declarations replaced
  with an M2 comment. The CacheWriteCallbackRegistry.clear() call is kept so a future
  Phase 4c observed-coordinate prewarming hook can install a consumer without leaking
  it across restarts.
- NpmProxyAdapter — NpmCacheWriteBridge removed; the NpmProxy ctor now receives null
  for cacheWriteHook + packumentWriteHook. The hook surface on NpmProxy is retained
  for the same future-prewarming reason.
DB: V128__drop_prefetch_settings_keys.sql — DELETE FROM settings WHERE key LIKE
'prefetch.%'. Removes any rows the v2.1.x SettingsBootstrap or admin PATCHes left
behind so the /settings/runtime listing does not surface dangling keys with no
consumer.
UI:
- Deleted: PrefetchPanel.vue (+ test), api/prefetch.ts.
- RepoEditView no longer mounts PrefetchPanel; settings.prefetch read/write logic
  removed.
- SettingsView Pre-fetch card deleted; PREFETCH_KEYS / RUNTIME_INT_RANGES /
  RUNTIME_LABELS / RUNTIME_HELP slimmed to the three http_client.* keys.
- api/runtimeSettings.ts: RuntimeSettingKey union and SPEC_DEFAULTS trimmed; test
  rewritten to match.
- Upstream-failure circuit-breaker card's subtitle no longer references the deleted
  pre-fetch drop-rate breaker (the two were always distinct).
Cache pipeline (preserved): BaseCachedProxySlice/ProxyCacheWriter retain the
onCacheWrite hook surface backed by CacheWriteCallbackRegistry's NO_OP sentinel.
Javadoc updated to call out the prefetch consumer's removal and reserve the slot for
Phase 4c (2.3.0).
Tests: RuntimeSettingsCacheTest, SettingsKeyTest, RepoConfigTest,
SettingsHandlerRuntimeTest, RepositoryHandlerTest, BaseCachedProxySliceHookTest,
ProxyCacheWriterHookTest — prefetch-specific assertions deleted; remaining assertions
still cover the foreground behaviour they shipped to pin. pantera-main unit tests
pass (22 in the impacted set); pantera-core hook tests pass (11).
Scope per analysis/plan/v1/PLAN.md M2 + user's 2026-05-13 explicit greenfield authorization for v2.2.0 major-version cleanup.
… gate
Adds a structural fix for the dominant amplification source the v2.2.0 investigation
identified (analysis/03-findings.md #3, #7, #9 + RCA-7): Pantera had no
requests-per-second cap on its outbound traffic, so any adapter could push past the
per-IP budget Cloudflare-fronted registries (Maven Central, npm public) enforce. The
new module wraps every per-host Jetty client slice — for every adapter, for every
caller_tag — with a token bucket plus a 429-and-Retry-After gate. The bucket caps
steady-state RPS; the gate fail-fasts during the back-off window after upstream
throttles us.
New module: http-client/.../ratelimit/
RateLimitConfig
- Per-host config: refill rate (tokens/sec) and burst capacity
- Defaults:
    repo1.maven.org 20 req/s burst 40 — Cloudflare per-IP budget starts 429-ing
    ~25-30 req/s
    registry.npmjs.org 30 req/s burst 60 — npm's CDN tolerates more
    any other host 10 req/s burst 20 — conservative default
- Builder lets the perf harness inject test configs without touching production
  defaults
UpstreamRateLimiter (interface + Default impl)
- Per-host Bucket state via ConcurrentHashMap + AtomicReference CAS. O(1) hot-path;
  no locks.
- tryAcquire(host): consumes a token if the gate is open and the bucket has > 1.0
  tokens. Returns false in either failure mode so the caller can fail-fast.
- recordRateLimit(host, retryAfter): closes the gate for retryAfter (defaults to 30 s
  when retryAfter is zero — for 429s with no header). Concurrent close attempts keep
  the LATER deadline so a burst of 429s does not shrink the window.
- recordResponse(host, status, retryAfter): 429 always gates; 503 only gates with
  Retry-After (503 without is treated as a transient server error, not a throttle
  signal).
- gateOpenUntil(host) exposes the deadline so foreground responses can carry the
  right Retry-After through to the client.
RetryAfter
- Parses both RFC 7231 forms: delta-seconds and IMF-fixdate.
- Malformed / blank / null → Duration.ZERO.
- Past HTTP-date → Duration.ZERO (a deadline in the past is not a forward delay).
RateLimitedClientSlice
- Decorator that wraps any Slice (placed by JettyClientSlices around every per-host
  JettyClientSlice). Per outbound:
  1. Inspect the gate. Closed → synthesise 429 + Retry-After pointing at the gate
     deadline, do NOT call the wrapped slice.
  2. Otherwise tryAcquire. Empty bucket → synthesise 429 + Retry-After 1 s (the
     bucket refills continuously; the next attempt has a token within a fraction of
     a second).
  3. Token acquired → delegate. On the response, check status — a 429 /
     503-with-Retry-After closes the gate.
- Synthesised 429s carry X-Pantera-Rate-Limited: true so future cluster-wide
  propagation and the cache slice can distinguish self-imposed from upstream-imposed.
Wiring: JettyClientSlices.slice() now wraps every JettyClientSlice in the
rate-limited decorator. Loopback hosts (localhost, 127.x.x.x, ::1) bypass the limiter
— they are exclusively dev / test fixtures and the limiter would otherwise throttle
the harness. A second constructor overload accepts an explicit UpstreamRateLimiter
for tests / perf harness injection; production callers use the existing 4-arg ctor
which constructs a JVM-default limiter from RateLimitConfig.defaults().
Metric: pantera_outbound_rate_limited_total{upstream_host, reason},
reason ∈ {gate_closed, bucket_empty}. Differs from pantera_proxy_429_total: this one
fires when WE deny the outbound; the existing 429 counter fires when the upstream
denies us. Operators want both — non-zero of either means somebody is throttling
somebody.
Alert: PanteraOutboundGateStuckClosed — warn after 10 min of sustained gate_closed
events on a host. Means our gate's back-off window is not opening, i.e., the upstream
is still 429-ing through our limit. Operator action: drop the per-host token rate.
Foreground propagation: BaseCachedProxySlice already preserves 4xx verbatim (status +
Retry-After) — verified in pantera-core lines 1189-1193. The synthesised 429 flows
through to mvn/npm unchanged so those tools honour their own back-off behaviour.
Tests (14 unit, all green):
UpstreamRateLimiterTest
- acquiresBurstTokensWithoutWaiting — burst drain
- refillsAtConfiguredRate — 10/s rate with TestClock stepping
- gateBlocksDespiteAvailableTokens — gate trumps tokens
- recordRateLimitUsesDefaultDurationWhenAbsent — zero → 30 s
- hostsAreIndependent — gating maven does not affect npm
- recordResponseOnlyGatesOn429 — 200/503-no-RA never gate
RetryAfterTest
- parses delta-seconds, parses HTTP-date, past date → 0, null/blank → 0, malformed → 0
RateLimitedClientSliceTest
- gatedRequestNeverReachesWrappedSlice
- upstream429ClosesTheGate (second call short-circuits)
- emptyBucketSynthesises429WithOneSecondRetryAfter
JettyClientSlicesTest
- shouldProduce* now asserts RateLimitedClientSlice for non-loopback,
  JettyClientSlice for localhost — pins the new wrapping contract.
Toxiproxy-mediated integration / perf-fixture test is M6's scope per
analysis/plan/v1/PLAN.md (perf-gate CI workflow). The unit tests cover the core
behaviour; the integration test will exercise it end-to-end against a rate-limited
stub.
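The limiter's hot path reduces to a per-host CAS loop over an immutable state record.
The sketch below is illustrative only — the real DefaultUpstreamRateLimiter keys this
state per host and also exposes gateOpenUntil:

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicReference;

// Hedged sketch of one host's token bucket + 429 back-off gate.
final class HostBucketSketch {
    private record State(double tokens, long lastRefillNanos, long gateClosedUntilNanos) { }

    private final double refillPerSecond;
    private final double burstCapacity;
    private final AtomicReference<State> state;

    HostBucketSketch(final double refillPerSecond, final double burstCapacity) {
        this.refillPerSecond = refillPerSecond;
        this.burstCapacity = burstCapacity;
        this.state = new AtomicReference<>(new State(burstCapacity, System.nanoTime(), 0L));
    }

    boolean tryAcquire() {
        while (true) {
            final long now = System.nanoTime();
            final State current = state.get();
            if (now < current.gateClosedUntilNanos()) {
                return false;                           // back-off window still closed
            }
            final double refilled = Math.min(
                burstCapacity,
                current.tokens() + (now - current.lastRefillNanos()) / 1.0e9 * refillPerSecond);
            if (refilled < 1.0) {
                return false;                           // bucket empty: caller synthesises a 429
            }
            final State next = new State(refilled - 1.0, now, current.gateClosedUntilNanos());
            if (state.compareAndSet(current, next)) {
                return true;                            // token consumed; O(1), lock-free
            }
            // Lost the CAS to a concurrent caller: retry against fresh state.
        }
    }

    void recordRateLimit(final Duration retryAfter) {
        final Duration window = retryAfter.isZero() ? Duration.ofSeconds(30) : retryAfter;
        final long deadline = System.nanoTime() + window.toNanos();
        state.updateAndGet(s -> new State(s.tokens(), s.lastRefillNanos(),
            Math.max(s.gateClosedUntilNanos(), deadline)));   // the LATER deadline always wins
    }
}
```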
Pre-M4 the Maven adapter's custom upstream-fetch path
(CachedProxySlice.fetchVerifyAndCache) had no request coalescing: 50 concurrent
clients for the same uncached primary fired 50 independent upstream calls. That was
the dominant cold-walk amplifier after M2's prefetch deletion (analysis/03-findings.md
#2, #10). The base-class fix (commit 21232a5b1) covered
BaseCachedProxySlice.fetchAndCache. M4 extends the same leader/follower pattern into
the Maven-specific stream-through path via a new BaseCachedProxySlice.coalesceUpstream
helper.
Why the helper is shaped the way it is. The Maven path uses
ProxyCacheWriter.streamThroughAndCommit which returns a Response whose body is still
streaming when the future resolves — the actual cache commit lands later, signalled
by the StreamedArtifact's verificationOutcome. If we completed the SingleFlight gate
when the leader's response future resolved (the natural "leader done" hook), followers
would wake up against a still-empty cache, re-enter verifyAndServePrimary, miss the
storage check, and become new leaders themselves — each firing its own upstream call.
The integration test reproduced exactly this: 50 clients producing 11 upstream hits
across 11 "waves" before the cache eventually caught up. coalesceUpstream therefore
takes the gate as an explicit argument the leader closes when the cache write is
fully durable. The Maven caller hooks verificationOutcome:
    artifact.verificationOutcome().whenComplete((r, e) -> singleFlightGate.complete(null));
On exceptional completion of the leader's response future, the gate is force-closed
(defensive — release followers to retry rather than park forever on a leader that
died before signalling).
New file: pantera-core/.../BaseCachedProxySlice.coalesceUpstream(key, leaderFetch,
followerLookup) — protected helper for subclasses with custom stream-through paths.
Reuses the same per-key SingleFlight that fetchAndCache already owns, so a future
Maven .pom and a BaseCachedProxySlice .jar miss with the same cache key DO share the
same flight (correct: the cache key IS the de-dup identifier).
Maven CachedProxySlice changes: verifyAndServePrimary on cache-miss now routes through
coalesceUpstream. fetchVerifyAndCache signature gains an explicit singleFlightGate the
leader completes on verificationOutcome.
Test: CachedProxySliceUpstreamDedupTest — 50 concurrent clients for the same uncached
Maven primary; asserts exactly one upstream primary GET and one .sha1 GET. Slow
upstream (150 ms artificial sleep) so followers really do park on the leader's gate.
Bodies are drained in the test client to mirror production's Vert.x server-side
subscription — without that, verificationOutcome never fires and the gate would stay
closed. Test exposes the production invariant explicitly so future regressions
surface.
The eager .sha1 elimination (W3c) remains in scope for M5 — it requires reworking the
sidecar serve path so a client-requested .sha1 against a cached primary is generated
locally rather than re-fetched. 130 maven-adapter tests + 1021 pantera-core tests
green.
Two coupled changes per analysis/plan/v1/PLAN.md M5: stop poisoning the
group / index-cache on transient upstream errors, and stop the
per-first-fetch HEAD against Maven Central that was a meaningful
contributor to Pantera's outbound amplification once prefetch was
removed (M2) and rate-limited (M3).
W6 R7a — status-code fidelity in CachedProxySlice.fetchVerifyAndCache:
Pre-W6 the exception handler collapsed every upstream 4xx to a local
404, masking:
- 429 (rate-limit) → looked like "doesn't exist"
- 401 / 403 (auth) → looked like "doesn't exist"
- 503-with-Retry-After (cooldown) → looked like "doesn't exist"
and poisoning ArtifactIndexCache for transient throttles. Pre-W6 5xx
surfaced as 503 verbatim, which RaceSlice treats as a "winning"
response — stopping the group walk even when another member could
have answered.
Post-W6 (mapUpstreamStatus):
404, 410 → notFound() (group fallthrough continues)
429 → 429 + Retry-After preserved (M3's gate honours)
503 + RA → 503 + Retry-After (transient upstream cooldown)
503 no RA → 502 badGateway (group fallthrough)
401 / 403 → propagated verbatim (authoritative auth)
5xx → 502 badGateway (group fallthrough; no index-
cache poisoning)
timeout / SSL → 502 badGateway (transient infrastructure)
UpstreamHttpException grew a `retryAfter` field so the handler can
propagate the upstream header verbatim instead of fabricating a value.
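In outline, the post-W6 mapping reduces to something like the following — the
record type, method name, and plain int statuses are illustrative, not the
actual mapUpstreamStatus signature:

    import java.util.Optional;

    // Illustrative status-fidelity mapping: keep authoritative and transient
    // signals distinct instead of collapsing every upstream failure to 404.
    final class UpstreamStatusSketch {

        record MappedStatus(int status, Optional<String> retryAfter) { }

        static MappedStatus map(final int upstream, final Optional<String> retryAfter) {
            if (upstream == 404 || upstream == 410) {
                return new MappedStatus(404, Optional.empty());           // group fallthrough continues
            }
            if (upstream == 429) {
                return new MappedStatus(429, retryAfter);                  // preserve upstream back-off
            }
            if (upstream == 401 || upstream == 403) {
                return new MappedStatus(upstream, Optional.empty());       // authoritative auth, verbatim
            }
            if (upstream == 503 && retryAfter.isPresent()) {
                return new MappedStatus(503, retryAfter);                  // transient upstream cooldown
            }
            // Remaining 5xx (and 503 without Retry-After), plus timeout/SSL failures
            // in the real handler: bad gateway so the group walk can continue and the
            // index cache is not poisoned.
            return new MappedStatus(502, Optional.empty());
        }
    }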
W5b — first-fetch cooldown HEAD is now opt-in:
Pre-M5 every cooldown evaluation against a fresh (artifact, version)
pair fell through L1 (Caffeine) + L2 (DB) and fired a HEAD via
MavenHeadSource. Cold-walk amplification: 50 newly-resolved versions
→ 50 extra HEADs to the same upstream we are about to GET anyway,
each subject to the same per-IP throttling.
Track 5 Phase 1B already populates publish_date from the primary
GET's Last-Modified on cache-write. From the SECOND fetch onwards
the publish-date is in the DB → no HEAD. The remaining first-asker
HEAD is now skipped by default in 2.2.0.
VertxMain registers `maven`/`gradle` publish-date sources only when
PANTERA_PUBLISH_DATE_HEAD_FALLBACK_ENABLED=true (default false).
When unset the source map omits maven/gradle entirely, so
DbPublishDateRegistry's L3 lookup returns Optional.empty() and the
cooldown gate allows the first fetch through. Trade-off (per PLAN
option B): the first asker of a freshly-published blocked version
downloads the bytes before the cooldown evaluator catches it on the
next request.
Operators who need strict first-fetch enforcement re-enable via the
env var; the source is still wired and tested (MavenHeadSourceTest,
MavenHeadSourceLiveTest unchanged).
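In outline, the opt-in amounts to conditionally registering the HEAD-based
sources — the names below are illustrative, not the exact VertxMain wiring:

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative opt-in wiring: the maven/gradle HEAD-based publish-date
    // sources are only registered when the operator re-enables them; otherwise
    // the registry's L3 lookup returns empty and the cooldown gate lets the
    // first fetch through.
    final class PublishDateSourceWiringSketch {

        interface PublishDateSource { /* resolves a publish date for (artifact, version) */ }

        static Map<String, PublishDateSource> buildSources(
            final PublishDateSource mavenHeadSource,
            final PublishDateSource gradleHeadSource,
            final Map<String, PublishDateSource> otherSources
        ) {
            final Map<String, PublishDateSource> sources = new HashMap<>(otherSources);
            final boolean headFallback = Boolean.parseBoolean(
                System.getenv().getOrDefault("PANTERA_PUBLISH_DATE_HEAD_FALLBACK_ENABLED", "false")
            );
            if (headFallback) {
                // Strict first-fetch enforcement: accept one extra HEAD per fresh version.
                sources.put("maven", mavenHeadSource);
                sources.put("gradle", gradleHeadSource);
            }
            return sources;
        }
    }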
Test:
CachedProxySliceStatusFidelityTest — 7 cases pin the fidelity matrix:
404 → 404, 429 + Retry-After preserved, 401 propagates, 403
propagates, 503+RA propagates, 503 without RA collapses to 502,
5xx (500) → 502. 410 is verified at code level only (the RsStatus enum has
no entry for it; Maven Central never returns it).
137 maven-adapter + 13 publish-date tests green.
Out of scope for M5 (follow-up in 2.3.0):
W5a ConditionalRequestSlice — requires per-host (URL → Last-Modified)
Caffeine cache + 304 translation (sketched after this list). Largest impact
on maven-metadata.xml / packument refresh paths; M5 already eliminates
the HEAD-shaped equivalent (W5b). Wiring a full conditional layer
is a separable 1-2 day effort.
W6 R1a/R1b GroupResolver — propagating 5xx vs 4xx through
tryNextSequentialMember requires re-threading Fault.UpstreamServerError
through the group fanout. R7a's local fidelity is a prerequisite
(now landed); R1a/R1b is a follow-up.
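For orientation only, a sketch of the direction the W5a follow-up describes —
not shipped in 2.2.0, and simplified to a single URL-keyed Caffeine cache where
the plan calls for one per host; all names are hypothetical:

    import com.github.benmanes.caffeine.cache.Cache;
    import com.github.benmanes.caffeine.cache.Caffeine;

    // Illustrative conditional-request layer: remember the Last-Modified we
    // last saw per URL, send it back as If-Modified-Since on refresh, and
    // translate an upstream 304 into "serve the cached copy".
    final class ConditionalRequestSketch {

        private final Cache<String, String> lastModifiedByUrl = Caffeine.newBuilder()
            .maximumSize(10_000)
            .build();

        /** Validator to attach to the outgoing refresh request, if we have one. */
        String ifModifiedSinceFor(final String url) {
            return lastModifiedByUrl.getIfPresent(url);
        }

        /** Record the validator from a 200; report whether a 304 means the cache is still fresh. */
        boolean onUpstreamResponse(final String url, final int status, final String lastModified) {
            if (status == 304) {
                return true;                    // serve the cached copy, no re-download
            }
            if (status == 200 && lastModified != null) {
                lastModifiedByUrl.put(url, lastModified);
            }
            return false;                        // fall through to the normal stream-through path
        }
    }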
CI-side enforcement of the three production paging signals shipped in
M1's recording rules (amplification.yml) so any PR that regresses the
fix surfaces immediately rather than waiting for a production page:
pantera_proxy_429_total > 0 → FAIL
pantera_outbound_rate_limited_total
{reason="gate_closed"} > 0 → FAIL
pantera_upstream_amplification_ratio > 1.5 → FAIL
Files:
scripts/perf-gate-check.sh
Scrapes a running Pantera's /metrics/vertx, sums the relevant
series across all label combinations (so a regression that adds a
new caller_tag still triggers), and exits non-zero on any
invariant breach. Uses awk for parsing — no Python / jq
dependency. Verified end-to-end against synthesised clean / dirty
metrics fixtures (pass + fail paths both emit the expected
diagnostics).
.github/workflows/perf-gate.yml
Triggers on PRs to master / 2.2.0 that touch pantera-core,
pantera-main, maven-adapter, http-client, the performance harness,
or the gate script itself. Boots the existing scaling-benchmark
docker-compose (WireMock-fronted upstream + Pantera SUT), runs a
short 60 s / 20 VU k6 ramp to populate the counters, then invokes
perf-gate-check.sh. On failure: uploads Pantera + WireMock logs as
artifacts for the on-call to triage.
The full scaling matrix (6 cells × ~20 min each = ~2 h) is out of
scope for the gate — it runs nightly via the existing
docker-compose-scaling.yml harness. The gate is the per-PR guard rail.
Toxiproxy-mediated 429 injection (originally scoped for M3's
RateLimitedClientSliceIT) remains follow-up work: the unit tests +
this gate together cover the structural invariants, and a Toxiproxy
fixture is best added when the next ConditionalRequestSlice (W5a)
work lands so both share the same fixture investment.
The 2.2.0 section had drifted from the project's established format through
a long series of incremental doc commits: duplicate version headers, internal
RCA notes, deprecated metric names, and detailed performance numbers that no
longer reflect current behaviour (the prefetch / npm-bridge subsystems have
since been removed).
Replaced with a single 2.2.0 entry matching the prior format (Breaking
changes / New features / Performance / Bug fixes / Security), describing
only behaviour that is actually in the shipped code. Internal implementation
detail, per-class commentary, and reverted work are out — this changelog is
a public-facing release note, not a development journal.
…ack literal
JettyClientSlices.isLoopback compares the host against the IPv6 loopback
literal as part of its rate-limit-skip check. The literal is the value under
test, so the rule cannot be satisfied by refactoring — a per-line NOPMD
suppression matches the existing convention used in NegativeCacheConfig,
CacheConfig, and FilteredMetadataCache.
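The suppression itself is a one-line NOPMD marker with a reason on the
offending line; the field name, the exact literal, and the surrounding code
below are hypothetical — only the convention is the point:

    // Hypothetical shape of the per-line suppression; the real JettyClientSlices
    // field and literal may differ.
    final class LoopbackCheckSketch {

        private static final String IPV6_LOOPBACK = "0:0:0:0:0:0:0:1"; // NOPMD - literal is the value under test

        static boolean isLoopback(final String host) {
            return "localhost".equals(host) || "127.0.0.1".equals(host) || IPV6_LOOPBACK.equals(host);
        }
    }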
CachedProxySlice translated a non-2xx upstream status with nested ifs
("status == 503" outside, "retryAfter present" inside), which PMD
flagged as CollapsibleIfStatements. Merging the conditions into a
single guard preserves the original semantics; the explanatory comment
block moves up one level and still applies.
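In shape — with hypothetical names, not the actual CachedProxySlice code —
the change is:

    import java.util.Optional;

    // Illustrative before/after of the CollapsibleIfStatements fix.
    final class CollapsibleIfSketch {

        static String translate(final int status, final Optional<String> retryAfter) {
            // Before (flagged by PMD):
            //   if (status == 503) {
            //       if (retryAfter.isPresent()) {
            //           return "503 Retry-After: " + retryAfter.get();
            //       }
            //   }
            // After — one merged guard, identical semantics:
            if (status == 503 && retryAfter.isPresent()) {
                return "503 Retry-After: " + retryAfter.get();
            }
            return "502";
        }
    }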
v2.2.0 — target-architecture train + cooldown metadata filtering + publish-date registry + serve-before-commit
Summary
This PR lands the v2.2 target architecture (9 work items), cooldown metadata filtering across all 7 adapters, publish-date registry replacing per-adapter cooldown inspectors, serve-before-commit performance pattern across all proxy adapters, search & browse overhaul, group RaceSlice priority fix, ECS log field compliance, and preemptive Basic auth for Maven/Gradle.
Highlights
- writeAndVerify() returns a VerifiedArtifact after streaming upstream to a verified temp file — the HTTP response streams immediately while commitAsync() persists to cache in the background. Wired into Maven, Go, Composer, PyPI.
- CompletableFuture.allOf() replaces the sequential thenCompose chain for checksum sidecar writes.
- DbPublishDateRegistry (Caffeine L1 + Postgres L2 + 6 upstream source implementations) replaces all per-adapter CooldownInspector classes. RegistryBackedInspector is the single inspector.
- providers field type guard prevents ClassCastException when Packagist returns [] instead of {}.
Target architecture work items (9)
Cooldown metadata filtering (8 phases)
Two-layer enforcement (soft metadata filter + hard 403) for Maven, npm, PyPI, Docker, Go, Composer, Gradle with 5 performance hardenings and SOLID package restructure.
Additional features
Test plan
- mvn -T8 test — all modules green (4900+ tests, 0 failures)
- docker compose up -d && curl -u ayd:ayd http://localhost:8081/maven_group/...
- curl -u ayd:ayd http://localhost:8081/php_group/packages.json returns 200
- git log --format='%B' | grep -c '^Co-Authored-By:' returns 0