
Fixes #XXXXX - Add cache logging, health endpoint, and concurrency limiter to registration module #1

Draft
pablomh wants to merge 6 commits into develop from registration-observability

Conversation


@pablomh pablomh commented Apr 4, 2026

Summary

Builds on smart-proxy#935 (in-memory registration script cache) with:

  • Cache HIT/MISS logging — read_registration_cache logs registration_script cache=HIT age=Xs or cache=MISS, with source=shared|local when Redis is in use
  • GET /register/health endpoint — returns 200 OK if the capsule can reach Foreman's /api/status, 503 if not; used by HAProxy health checks to route registration traffic away from degraded capsules
  • with_concurrency_limit(&block) — block-based abstraction that wraps the POST / handler; owns semaphore acquire/release and returns 503 Retry-After: 30 when the limit is exhausted. The handler expresses what to do; the helper owns how to limit concurrency.
  • Shared Redis/Valkey cache — optional :cache_url setting (redis://host:6379/0); when set, all capsule nodes in an LB pool share one cache so a single warm request benefits all nodes. Falls back to in-memory cache if Redis is unreachable.
  • :max_concurrent_registrations setting — configurable via registration.yml; default unlimited (backward compatible)

Depends on

  • smart-proxy#935 — in-memory registration script cache (base for this PR)

Design notes

with_concurrency_limit(&block) — top-down design

The POST / handler now reads as a statement of intent:

post '/' do
  with_concurrency_limit do
    resp = Proxy::Registration::ProxyRequest.new.host_register(request)
    handle_response(resp)
  end
end

The mechanical concern (semaphore acquire/release, 503 response, Retry-After header) lives entirely in with_concurrency_limit. The handler has no knowledge of how concurrency limiting works.
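
For reference, a minimal sketch of that helper, assuming a class-level Concurrent::Semaphore exposed as registration_semaphore (names here are illustrative, not the final API):

def with_concurrency_limit
  semaphore = self.class.registration_semaphore   # nil when no limit is configured
  return yield if semaphore.nil?

  if semaphore.try_acquire
    begin
      yield
    ensure
      semaphore.release
    end
  else
    halt 503, { 'Retry-After' => '30' }, 'Too many concurrent registrations, retry later'
  end
end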

Shared cache architecture

One Redis/Valkey instance on the LB host, shared by all capsule nodes in the pool. Each capsule connects over the private network. If Redis is unreachable, the capsule falls back to its in-memory cache — registration continues working with per-node warm-up.

# /etc/foreman-proxy/settings.d/registration.yml
:cache_url: redis://lb-host-private-ip:6379/0
:max_concurrent_registrations: 50

Companion PRs

Repo          PR                     What
smart-proxy   #935                   In-memory cache (base)
smart-proxy   this PR                HIT/MISS logging, health endpoint, concurrency limiter, shared cache
satperf       registration-metrics   registration_cache Ansible role + HAProxy health check config

Test plan

  • ruby test/registration/registration_api_test.rb
  • Verify cache=HIT / cache=MISS lines appear in proxy log
  • GET /register/health returns 200 when Foreman is reachable, 503 when not
  • POST / returns 503 with Retry-After: 30 when semaphore is exhausted
  • With :cache_url set, verify Redis keyspace_hits increments on cache hits
  • Without :cache_url, verify fallback to in-memory cache works

🤖 Generated with Claude Code

pablomh and others added 6 commits April 2, 2026 13:35
GET /register returns the same shell script for all hosts sharing the
same registration parameters (org, location, hostgroup, activation keys).
Under bulk registration — 100+ hosts hitting the same capsule simultaneously
— this endpoint is called once per host, each time proxying to Foreman and
waiting for the ERB template to render (~103ms in profiling).

Cache the rendered script in memory (5-minute TTL) keyed on a canonical
form of the request query string. The cache key is computed by parsing the
query string, sorting parameters alphabetically, and rebuilding — so
requests that differ only in parameter order share the same cache entry
and the same Foreman response, regardless of how the client ordered them.
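
Roughly, the canonical key can be built with Rack::Utils (sketch only; the helper name is made up):

  require 'rack/utils'

  def cache_key_for(query_string)
    params = Rack::Utils.parse_query(query_string)   # "b=2&a=1" => {"b"=>"2", "a"=>"1"}
    Rack::Utils.build_query(params.sort.to_h)        # => "a=1&b=2"
  end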

The implementation uses three clearly separated layers:

  get '/'             — handles errors only; delegates to registration_script
  registration_script — owns the cache key and the business logic (what
                        to cache and what to do on a miss); raises
                        ScriptFetchError on non-200 so errors never reach
                        the cache write
  cache(key, &block)  — owns the mechanism: per-key double-checked locking
                        via a block abstraction that keeps the caller free
                        of locking concerns

Per-key locking allows concurrent requests for genuinely different keys
(e.g. different activation keys) to fetch from Foreman in parallel, while
serialising only threads competing for the same key. The per-key Mutex is
evicted from KEY_MUTEXES immediately after caching — once a key is hot,
all future requests take the lock-free fast path and KEY_MUTEXES is empty
under steady state.

Only HTTP 200 responses are cached. Non-200 responses raise ScriptFetchError
out of the cache block, which is rescued in get '/' and rendered via
handle_response without poisoning the cache.

Both KEY_MUTEXES and SCRIPT_CACHE use Concurrent::Map (already a
smart-proxy dependency) for lock-free, thread-safe access on all Ruby
VMs without relying on MRI's GIL.
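
Condensed sketch of the mechanism (fresh? is shorthand for the TTL check; the real code may structure this differently):

  SCRIPT_CACHE = Concurrent::Map.new   # key => [script, cached_at]
  KEY_MUTEXES  = Concurrent::Map.new   # key => Mutex, only while a fetch is in flight
  CACHE_TTL    = 300                   # seconds

  def cache(key)
    entry = SCRIPT_CACHE[key]
    return entry.first if fresh?(entry)                       # lock-free fast path

    mutex = KEY_MUTEXES.compute_if_absent(key) { Mutex.new }
    mutex.synchronize do
      entry = SCRIPT_CACHE[key]                               # double-check under the per-key lock
      return entry.first if fresh?(entry)

      script = yield                                          # raises ScriptFetchError on non-200
      SCRIPT_CACHE[key] = [script, Time.now]
      KEY_MUTEXES.delete(key)                                 # evict: later requests take the fast path
      script
    end
  end

  def fresh?(entry)
    entry && Time.now - entry.last < CACHE_TTL
  end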

Tests added:
- Cache hit: Foreman called once for repeated identical requests
- Per-key isolation: different parameter sets cached independently
- Parameter order independence: requests differing only in param order
  share the same cache entry
- TTL expiry: expired entries are not served; Foreman is re-called
- Mutex eviction: KEY_MUTEXES is empty after a successful cache write
- Error non-caching: Foreman is called on every request when it errors
- setup clears both SCRIPT_CACHE and KEY_MUTEXES between tests
Makes the script cache added in #39208 observable at debug log level:

  registration_script cache=HIT age=42s key_prefix=org_id=1&location_id=...
  registration_script cache=MISS key_prefix=org_id=1&location_id=...

The key is truncated to 40 characters to avoid flooding the log while
still distinguishing between different parameter sets.  Log level is
debug so it is silent in production by default and available on demand
via the smart-proxy log level setting.
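
The lines above correspond roughly to (sketch; entry layout and names illustrative):

  entry = SCRIPT_CACHE[key]
  if entry
    logger.debug "registration_script cache=HIT age=#{(Time.now - entry.last).round}s key_prefix=#{key[0, 40]}"
  else
    logger.debug "registration_script cache=MISS key_prefix=#{key[0, 40]}"
  end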

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Load balancers that front multiple capsules currently have no way to
distinguish a capsule that is up but cannot reach Foreman (and will
therefore fail every registration) from a healthy capsule.

Adds GET /register/health that:
- Returns 200 {"status":"ok"} if the capsule can reach Foreman's
  /api/status endpoint (any HTTP response = reachable)
- Returns 503 {"status":"error",...} if the connection fails
- Any unexpected error also returns 503 so the LB removes the capsule

The check uses the existing ForemanRequest infrastructure already used
by the registration proxy, so no new configuration is needed.
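
Sketch of the handler, assuming the module is mounted under /register so the route is declared as get '/health'; foreman_reachable? stands in for the existing ForemanRequest call to /api/status:

  get '/health' do
    content_type :json
    if foreman_reachable?                            # any HTTP response counts as reachable
      {status: 'ok'}.to_json
    else
      status 503
      {status: 'error', message: 'Foreman API unreachable'}.to_json
    end
  rescue StandardError => e
    status 503                                       # unexpected errors also remove the capsule
    {status: 'error', message: e.message}.to_json
  end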

LB configuration example (HAProxy):
  option httpchk GET /register/health
  http-check expect status 200

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…deployments

When multiple capsule nodes sit behind a load balancer, the existing
per-node in-memory cache (PR #39208) means each node must independently
fetch the registration script from Foreman on its first request —
N capsule nodes = N warming requests. Under a burst of concurrent
registrations, each node experiences its own thundering herd.

This adds an optional shared Redis cache so that one warm request on
any node benefits all nodes in the pool. Configuration:

  # config/settings.d/registration.yml
  :redis_url: redis://lb-host:6379/0

Cache lookup order:
  1. Redis (shared, populated by whichever node fetched first)
  2. Per-node in-memory cache (fast path for repeat requests to same node)
  3. Foreman (miss — fetches and populates both caches)

A Redis HIT from one node also warms that node's local in-memory cache,
so subsequent requests to the same node skip Redis entirely.

Failure handling: Redis errors (connection refused, timeout, LoadError)
are caught and logged at warn level; the in-memory cache takes over
transparently. This means a Redis outage does not break registration —
it only reverts to per-node caching.
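
Sketch of the read path (helper names illustrative); the local fast path is what lets a node skip Redis once it has seen one hit:

  def read_registration_cache(key)
    local = SCRIPT_CACHE[key]
    return local.first if fresh?(local)               # per-node in-memory fast path

    script = redis_client&.get(key)                   # shared cache on the LB host
    SCRIPT_CACHE[key] = [script, Time.now] if script  # warm the local cache too
    script                                            # nil => caller fetches from Foreman
  rescue StandardError, LoadError => e
    logger.warn "shared cache unavailable, falling back to in-memory: #{e.message}"
    nil
  end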

The 'redis' gem is added with require: false so it is only loaded when
:redis_url is configured. Without configuration, zero overhead.

Note: standalone capsules (single node) gain no benefit from Redis;
the existing in-memory cache already provides full hit rate after one
warm request. Redis is recommended only for LB deployments with 2+
capsule nodes.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Both Redis (RHEL 9) and Valkey (RHEL 10+, the Redis fork that ships
in RHEL 10 due to Redis license change to SSPL) use the same redis://
wire protocol and are supported by the same Ruby redis gem. The setting
name should not imply a specific package.

Renames:
  :redis_url         → :cache_url       (plugin setting)
  redis_client()     → registration_cache_client()  (class method)
  cache=HIT source=redis → cache=HIT source=shared  (log message)

The redis:// URI scheme in the setting value is unchanged — it is the
wire protocol identifier accepted by both Redis and Valkey.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Under bulk registration (500+ hosts), all POST /register requests arrive
at the capsule simultaneously and are forwarded to Foreman without any
throttling, creating the same pile-up at POST /rhsm/consumers that the
Katello caching PRs are trying to reduce.

Adds an optional :max_concurrent_registrations setting that limits how
many host-register requests the capsule forwards to Foreman in parallel:

  # /etc/foreman-proxy/settings.d/registration.yml
  :max_concurrent_registrations: 50

When all permits are taken the capsule returns 503 + Retry-After: 30
immediately (no in-capsule wait queue). The 503 propagates to the
registration script, and the orchestration layer (Ansible retry_failed,
satperf wave batching) decides when to retry.

Uses Concurrent::Semaphore from the concurrent-ruby gem (already a
smart-proxy dependency). The semaphore is lazy-initialised at class
level and persists for the lifetime of the process so the permit count
is shared across all concurrent Sinatra handler threads.

Default: unset (unlimited) — backward-compatible with existing deploys.
Sizing guide: start at 50-80% of Foreman's Rails thread pool size, then
tune based on observed queue depth in the HAProxy stats page.
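
Sketch of the lazy initialisation (class and setting names illustrative):

  SEMAPHORE_LOCK = Mutex.new

  def self.registration_semaphore
    limit = Proxy::Registration::Plugin.settings.max_concurrent_registrations
    return nil unless limit                              # unset => unlimited, no semaphore

    SEMAPHORE_LOCK.synchronize do
      @semaphore ||= Concurrent::Semaphore.new(limit.to_i)  # shared by all handler threads
    end
  end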

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>