Fixes #XXXXX - Add cache logging, health endpoint, and concurrency limiter to registration module #1
GET /register returns the same shell script for all hosts sharing the
same registration parameters (org, location, hostgroup, activation keys).
Under bulk registration — 100+ hosts hitting the same capsule simultaneously
— this endpoint is called once per host, each time proxying to Foreman and
waiting for the ERB template to render (~103ms in profiling).
Cache the rendered script in memory (5-minute TTL) keyed on a canonical
form of the request query string. The cache key is computed by parsing the
query string, sorting parameters alphabetically, and rebuilding — so
requests that differ only in parameter order share the same cache entry
and the same Foreman response, regardless of how the client ordered them.
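A rough sketch of that canonical form, assuming Rack::Utils is available (smart-proxy modules are Sinatra apps on Rack); the helper name canonical_cache_key is illustrative:

```ruby
require 'rack/utils'

# Build a canonical cache key: parse the query string, sort parameters
# alphabetically, and rebuild, so parameter order no longer matters.
def canonical_cache_key(query_string)
  params = Rack::Utils.parse_query(query_string)   # "b=2&a=1" => {"b"=>"2", "a"=>"1"}
  Rack::Utils.build_query(params.sort.to_h)        # rebuilt sorted: "a=1&b=2"
end

canonical_cache_key('activation_keys=ak1&organization_id=1') ==
  canonical_cache_key('organization_id=1&activation_keys=ak1')   # => true
```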
The implementation uses three clearly separated layers:
- get '/' — handles errors only; delegates to registration_script
- registration_script — owns the cache key and the business logic (what to
  cache and what to do on a miss); raises ScriptFetchError on non-200 so
  errors never reach the cache write
- cache(key, &block) — owns the mechanism: per-key double-checked locking
  via a block abstraction that keeps the caller free of locking concerns
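An illustrative sketch of that layering, not the exact code in this PR; canonical_cache_key and fetch_from_foreman are assumed helper names, the other names follow the description above:

```ruby
# Inside the registration module's Sinatra API class.
class ScriptFetchError < StandardError
  attr_reader :response
  def initialize(response)
    @response = response
    super("Foreman returned #{response.code}")
  end
end

get '/' do
  content_type 'text/x-shellscript'
  begin
    registration_script(request.query_string)
  rescue ScriptFetchError => e
    handle_response(e.response)   # render Foreman's error without touching the cache
  end
end

def registration_script(query_string)
  key = canonical_cache_key(query_string)
  cache(key) do
    response = fetch_from_foreman(query_string)   # proxy to Foreman, ERB rendered there
    raise ScriptFetchError, response unless response.code == '200'
    response.body
  end
end
```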
Per-key locking allows concurrent requests for genuinely different keys
(e.g. different activation keys) to fetch from Foreman in parallel, while
serialising only threads competing for the same key. The per-key Mutex is
evicted from KEY_MUTEXES immediately after caching — once a key is hot,
all future requests take the lock-free fast path and KEY_MUTEXES is empty
under steady state.
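A minimal sketch of the cache(key, &block) mechanism under those constraints; the entry layout and the TTL constant are assumptions, SCRIPT_CACHE and KEY_MUTEXES are the names used in this PR:

```ruby
require 'concurrent'

SCRIPT_CACHE = Concurrent::Map.new   # key => { value:, expires_at: }
KEY_MUTEXES  = Concurrent::Map.new   # key => Mutex, present only while a fetch is in flight
CACHE_TTL    = 5 * 60                # seconds

def cache(key)
  entry = SCRIPT_CACHE[key]
  return entry[:value] if entry && entry[:expires_at] > Time.now   # lock-free fast path

  mutex = KEY_MUTEXES.compute_if_absent(key) { Mutex.new }         # one mutex per key
  mutex.synchronize do
    entry = SCRIPT_CACHE[key]                                      # double-check under the lock
    if entry && entry[:expires_at] > Time.now
      entry[:value]
    else
      value = yield                                                # raises ScriptFetchError on non-200
      SCRIPT_CACHE[key] = { value: value, expires_at: Time.now + CACHE_TTL }
      KEY_MUTEXES.delete(key)                                      # evict so KEY_MUTEXES stays empty at steady state
      value
    end
  end
end
```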
Only HTTP 200 responses are cached. Non-200 responses raise ScriptFetchError
out of the cache block, which is rescued in get '/' and rendered via
handle_response without poisoning the cache.
Both KEY_MUTEXES and SCRIPT_CACHE use Concurrent::Map (already a
smart-proxy dependency) for lock-free, thread-safe access on all Ruby
VMs without relying on MRI's GIL.
Tests added:
- Cache hit: Foreman called once for repeated identical requests
- Per-key isolation: different parameter sets cached independently
- Parameter order independence: requests differing only in param order
share the same cache entry
- TTL expiry: expired entries are not served; Foreman is re-called
- Mutex eviction: KEY_MUTEXES is empty after a successful cache write
- Error non-caching: Foreman is called on every request when it errors
- setup clears both SCRIPT_CACHE and KEY_MUTEXES between tests
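A hypothetical shape for the first of those tests, assuming the usual smart-proxy test stack (test-unit, Rack::Test, Mocha); the module path and fetch_from_foreman are illustrative names, not taken from this PR:

```ruby
require 'test_helper'

class RegistrationCacheTest < Test::Unit::TestCase
  include Rack::Test::Methods

  def app
    Proxy::Registration::Api.new   # illustrative module path
  end

  def setup
    # clear both stores so tests are independent
    Proxy::Registration::Api::SCRIPT_CACHE.clear
    Proxy::Registration::Api::KEY_MUTEXES.clear
  end

  def test_foreman_called_once_for_repeated_identical_requests
    Proxy::Registration::Api.any_instance.expects(:fetch_from_foreman).once
      .returns(stub(code: '200', body: '#!/bin/sh'))

    2.times { get '/?organization_id=1&activation_keys=ak1' }
    assert last_response.ok?
  end
end
```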
Makes the script cache added in #39208 observable at debug log level:

    registration_script cache=HIT age=42s key_prefix=org_id=1&location_id=...
    registration_script cache=MISS key_prefix=org_id=1&location_id=...

The key is truncated to 40 characters to avoid flooding the log while still
distinguishing between different parameter sets. Log level is debug so it is
silent in production by default and available on demand via the smart-proxy
log level setting.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
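A sketch of how the log lines described in the commit above might be emitted (standard Ruby Logger block form via the smart-proxy logger; method and constant names are illustrative):

```ruby
KEY_PREFIX_LENGTH = 40   # truncate keys so the log stays readable

def log_cache_hit(key, cached_at)
  logger.debug do
    age = (Time.now - cached_at).round
    "registration_script cache=HIT age=#{age}s key_prefix=#{key[0, KEY_PREFIX_LENGTH]}"
  end
end

def log_cache_miss(key)
  logger.debug { "registration_script cache=MISS key_prefix=#{key[0, KEY_PREFIX_LENGTH]}" }
end
```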
Load balancers that front multiple capsules currently have no way to
distinguish a capsule that is up but cannot reach Foreman (and will
therefore fail every registration) from a healthy capsule.
Adds GET /register/health that:
- Returns 200 {"status":"ok"} if the capsule can reach Foreman's
/api/status endpoint (any HTTP response = reachable)
- Returns 503 {"status":"error",...} if the connection fails
- Any unexpected error also returns 503 so the LB removes the capsule
The check uses the existing ForemanRequest infrastructure already used
by the registration proxy, so no new configuration is needed.
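A sketch of how the endpoint could look inside the registration Sinatra module (mounted under /register); check_foreman_status stands in for the existing ForemanRequest call and is an assumed name:

```ruby
get '/health' do
  content_type :json
  begin
    check_foreman_status                     # issues GET /api/status; any HTTP response counts as reachable
    { status: 'ok' }.to_json
  rescue StandardError => e                  # connection refused, timeout, anything unexpected
    status 503
    { status: 'error', error: e.message }.to_json
  end
end
```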
LB configuration example (HAProxy):
option httpchk GET /register/health
http-check expect status 200
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…deployments

When multiple capsule nodes sit behind a load balancer, the existing
per-node in-memory cache (PR #39208) means each node must independently
fetch the registration script from Foreman on its first request — N capsule
nodes = N warming requests. Under a burst of concurrent registrations, each
node experiences its own thundering herd.

This adds an optional shared Redis cache so that one warm request on any
node benefits all nodes in the pool.

Configuration:

    # config/settings.d/registration.yml
    :redis_url: redis://lb-host:6379/0

Cache lookup order:
1. Redis (shared, populated by whichever node fetched first)
2. Per-node in-memory cache (fast path for repeat requests to same node)
3. Foreman (miss — fetches and populates both caches)

A Redis HIT from one node also warms that node's local in-memory cache, so
subsequent requests to the same node skip Redis entirely.

Failure handling: Redis errors (connection refused, timeout, LoadError) are
caught and logged at warn level; the in-memory cache takes over
transparently. This means a Redis outage does not break registration — it
only reverts to per-node caching.

The 'redis' gem is added with require: false so it is only loaded when
:redis_url is configured. Without configuration, zero overhead.

Note: standalone capsules (single node) gain no benefit from Redis; the
existing in-memory cache already provides full hit rate after one warm
request. Redis is recommended only for LB deployments with 2+ capsule
nodes.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
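A sketch of the failure handling described in the commit above; the helper names (redis_client, redis_read, redis_write) and the settings access path are assumptions, only the warn-level rescue-and-fall-back behaviour comes from the commit:

```ruby
def redis_client
  url = Proxy::Registration::Plugin.settings.redis_url   # illustrative settings access
  return nil unless url
  require 'redis'                                        # gem declared with require: false, loaded lazily
  @redis_client ||= Redis.new(url: url)
end

def redis_read(key)
  client = redis_client or return nil                    # nil when :redis_url is not configured
  client.get(key)
rescue LoadError, StandardError => e
  logger.warn "registration cache: Redis read failed (#{e.class}), using in-memory cache"
  nil
end

def redis_write(key, script)
  client = redis_client or return
  client.setex(key, CACHE_TTL, script)                   # SETEX: store with the same 5-minute TTL as the in-memory cache
rescue LoadError, StandardError => e
  logger.warn "registration cache: Redis write failed (#{e.class})"
end
```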
Both Redis (RHEL 9) and Valkey (RHEL 10+, the Redis fork that ships in
RHEL 10 due to the Redis license change to SSPL) use the same redis:// wire
protocol and are supported by the same Ruby redis gem. The setting name
should not imply a specific package.

Renames:
- :redis_url → :cache_url (plugin setting)
- redis_client() → registration_cache_client() (class method)
- cache=HIT source=redis → cache=HIT source=shared (log message)

The redis:// URI scheme in the setting value is unchanged — it is the wire
protocol identifier accepted by both Redis and Valkey.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Under bulk registration (500+ hosts), all POST /register requests arrive at
the capsule simultaneously and are forwarded to Foreman without any
throttling, creating the same pile-up at POST /rhsm/consumers that the
Katello caching PRs are trying to reduce.

Adds an optional :max_concurrent_registrations setting that limits how many
host-register requests the capsule forwards to Foreman in parallel:

    # /etc/foreman-proxy/settings.d/registration.yml
    :max_concurrent_registrations: 50

When all permits are taken the capsule returns 503 + Retry-After: 30
immediately (no in-capsule wait queue). The 503 propagates to the
registration script, and the orchestration layer (Ansible retry_failed,
satperf wave batching) decides when to retry.

Uses Concurrent::Semaphore from the concurrent-ruby gem (already a
smart-proxy dependency). The semaphore is lazy-initialised at class level
and persists for the lifetime of the process so the permit count is shared
across all concurrent Sinatra handler threads.

Default: unset (unlimited) — backward-compatible with existing deploys.

Sizing guide: start at 50-80% of Foreman's Rails thread pool size, then
tune based on observed queue depth in the HAProxy stats page.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
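A sketch of how the limiter described above could be wired with Concurrent::Semaphore; the settings access path and messages are illustrative, with_concurrency_limit is the helper named in the summary below:

```ruby
require 'concurrent'

def self.registration_semaphore
  limit = Proxy::Registration::Plugin.settings.max_concurrent_registrations
  return nil unless limit                                  # unset => unlimited (default)
  @registration_semaphore ||= Concurrent::Semaphore.new(limit.to_i)
end

def with_concurrency_limit
  semaphore = self.class.registration_semaphore
  return yield unless semaphore                            # no limit configured

  unless semaphore.try_acquire                             # non-blocking: no in-capsule wait queue
    headers 'Retry-After' => '30'
    halt 503, 'Registration concurrency limit reached, retry later'
  end
  begin
    yield                                                  # forward the request to Foreman
  ensure
    semaphore.release
  end
end
```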
Summary
Builds on smart-proxy#935 (in-memory registration script cache) with:
- `read_registration_cache` logs `registration_script cache=HIT age=Xs` or
  `cache=MISS`, with `source=shared|local` when Redis is in use
- `GET /register/health` endpoint — returns `200 OK` if the capsule can
  reach Foreman's `/api/status`, `503` if not; used by HAProxy health checks
  to route registration traffic away from degraded capsules
- `with_concurrency_limit(&block)` — block-based abstraction that wraps the
  `POST /` handler; owns semaphore acquire/release and returns
  `503 Retry-After: 30` when the limit is exhausted. The handler expresses
  what to do; the helper owns how to limit concurrency.
- `:cache_url` setting (`redis://host:6379/0`); when set, all capsule nodes
  in an LB pool share one cache so a single warm request benefits all nodes.
  Falls back to in-memory cache if Redis is unreachable.
- `:max_concurrent_registrations` setting — configurable via
  `registration.yml`; default unlimited (backward compatible)

Depends on
Design notes
`with_concurrency_limit(&block)` — top-down design

The `POST /` handler now reads as a statement of intent (see the sketch
below): the mechanical concern (semaphore acquire/release, the 503 response,
the `Retry-After` header) lives entirely in `with_concurrency_limit`. The
handler has no knowledge of how concurrency limiting works.
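A minimal sketch of that statement of intent; proxy_registration_to_foreman stands in for the existing forwarding code and is an assumed name:

```ruby
post '/' do
  with_concurrency_limit do
    proxy_registration_to_foreman(request)   # existing Foreman forwarding, unchanged
  end
end
```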
Shared cache architecture

One Redis/Valkey instance on the LB host, shared by all capsule nodes in
the pool. Each capsule connects over the private network. If Redis is
unreachable, the capsule falls back to its in-memory cache — registration
continues working with per-node warm-up.
Companion PRs
- `registration-metrics`
- `registration_cache`
- Ansible role + HAProxy health check config

Test plan
- `ruby test/registration/registration_api_test.rb`
- `cache=HIT` / `cache=MISS` lines appear in proxy log
- `GET /register/health` returns 200 when Foreman is reachable, 503 when not
- `POST /` returns 503 with `Retry-After: 30` when semaphore is exhausted
- With `:cache_url` set, verify Redis `keyspace_hits` increments on cache hits
- Without `:cache_url`, verify fallback to in-memory cache works

🤖 Generated with Claude Code