feat: GoodJob oldest-queued-age, process count, and opt-in per-queue metrics #373
Open
xrl wants to merge 1 commit into
…eue metrics

Adds two purely-additive global gauges to the GoodJob instrumentation:
- good_job_oldest_queued_age_seconds (queue latency / backlog age)
- good_job_processes (active GoodJob processes)

Adds an opt-in `GoodJob.start(per_queue: true)` that breaks the job-state gauges and oldest-queued-age down by a `queue` label. It defaults to false, so existing (cluster-wide, unlabelled) output is unchanged; when enabled, a metric carries the queue label consistently so sum() still yields the total.

The server collector already applies custom_labels, so the queue label needs no collector change beyond registering the two new gauges.
What & why
While building Grafana dashboards for a Rails app's GoodJob queues, we found that the built-in `GoodJob` instrumentation gave us the job-state totals but was missing the signals we reached for most:

- `good_job_oldest_queued_age_seconds`: how long the oldest ready-to-run job has been waiting. This is the single best backlog/latency signal (a queue can have a small count but be badly behind), and it's what you'd actually alert on.
- `good_job_processes`: the number of active GoodJob processes (`GoodJob::Process.active.count`), so you can see whether workers are actually up.

Opening this to add them and to get your thoughts on the per-queue approach.
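To illustrate why backlog age can be a better alerting signal than a count, here is a minimal sketch in plain Ruby. It is not the code in this PR: the helper name `oldest_queued_age_seconds` and its inputs are hypothetical, and it works on an in-memory list of timestamps rather than querying GoodJob.

```ruby
# Hypothetical helper (not the PR's implementation): given the times at which
# ready-to-run jobs became eligible, return the age in seconds of the oldest one.
def oldest_queued_age_seconds(eligible_times, now: Time.now)
  oldest = eligible_times.min
  return 0.0 if oldest.nil? # empty queue: nothing is waiting

  # Clamp at zero in case of clock skew between job timestamps and `now`.
  [now - oldest, 0.0].max
end

now = Time.at(1_000)

# Two jobs only, but one has been waiting for a long time.
few_but_old = [Time.at(100), Time.at(990)]

# Fifty jobs, all enqueued a few seconds ago.
many_but_fresh = Array.new(50) { Time.at(995) }

puts oldest_queued_age_seconds(few_but_old, now: now)
puts oldest_queued_age_seconds(many_but_fresh, now: now)
```

In this sketch the short queue reports the larger age, which is the "small count but badly behind" case a count-based alert would miss.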
Backwards compatibility
- The two new global gauges are purely additive: new keys in the `collect` payload and the server collector's `GOOD_JOB_GAUGES`.
- The per-queue breakdown is opt-in via `GoodJob.start(per_queue: true)` (passed through `PeriodicStats.start`'s `**kwargs`). Default is `false`, so existing deployments emit the exact same cluster-wide, unlabelled series as before.
- With `per_queue: true`, the job-state gauges and oldest-age carry a `queue` label instead of an aggregate (so `sum(good_job_queued)` still returns the cluster total), while `good_job_processes` stays unlabelled. A metric never mixes labelled and unlabelled series, so nothing double-counts.

The server collector needed no label logic: it already applies `custom_labels`, so the instrumentation just sends one object per queue with `custom_labels: { queue: ... }`.

Tests / docs
- `test/server/good_job_collector_test.rb` (new gauges + a per-queue label assertion).
- `test/instrumentation/good_job_test.rb` with a minimal GoodJob double covering both `collect` (default) and `collect_per_queue`.
- 164 runs, 0 failures; rubocop clean.

Questions for you
- `good_job_oldest_queued_age_seconds` vs `good_job_queue_latency_seconds`; `good_job_processes` vs `good_job_active_processes`?
- `good_job_processes` is guarded with `defined?(::GoodJob::Process)`; happy to adjust the floor if you support older GoodJob.

Happy to iterate on any of it.
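For reviewers, the per-queue shape described under Backwards compatibility can be sketched roughly as follows. This is an illustration only, not the PR's code: `per_queue_payloads`, `total_queued`, and the `:queued` key are hypothetical names, and only `custom_labels: { queue: ... }` is taken from the description above.

```ruby
# Hypothetical sketch: build one payload object per queue, each carrying the
# queue name in `custom_labels`, instead of a single aggregate object.
def per_queue_payloads(queued_by_queue)
  queued_by_queue.map do |queue, queued|
    { queued: queued, custom_labels: { queue: queue } }
  end
end

# Summing the labelled series recovers the cluster-wide total, which mirrors
# what `sum(good_job_queued)` would do on the Prometheus side.
def total_queued(payloads)
  payloads.sum { |payload| payload[:queued] }
end

payloads = per_queue_payloads("default" => 3, "mailers" => 2)
payloads.each { |payload| puts payload.inspect }
puts total_queued(payloads)
```

Because every object in this sketch carries a `queue` label and no unlabelled aggregate is emitted alongside them, the sum counts each job once.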