feat: worktime scoring strategy for Prometheus discovery#54
Conversation
Implements a strategy pattern for time-based weighting of Prometheus range query data points. The WorktimeStrategy allows configuring time-of-day windows with weight multipliers, so images discovered during business hours score higher than those discovered at night. API additions: - ScoringStrategy type with worktime variant - WorktimeStrategy with configurable windows and timezone - WorktimeWindow with startHour, endHour, weight Implementation: - ScoreWeighter interface in internal/discovery/scoring.go - worktimeWeighter implementation with timezone support - Integration into aggregateRangeValues via optional weighter param - Controller wiring in buildSource Closes #53
|
@copilot given that more sources might follow in the future, a discovery source is an interface which can implement/use strategies or not. The architecutral choice is where this weighting happens. a source shall always return pairs of Image + score. Each source is responsible to return that and calculate whatever suits for the score. In case of the registry it probably only returns 1 as score for each Image. In the case of prometheus it's based on query and aggregation method. In case of other Metrik backends it might be something else. I think the Design of this is Crucial. Propose 3 architectures that make sense and weigh their pros and cons from Software pattern Perspektive |
Here are three architectures for where time-based weighting lives relative to the Architecture A: Strategy injected into Source (current implementation)Pros:
Cons:
Architecture B: Decorator/Middleware wrapping Sourcetype WeightedSource struct {
Inner Source
Weighter ScoreWeighter
}
func (w *WeightedSource) Fetch(ctx context.Context) ([]ImageResult, error) {
results, err := w.Inner.Fetch(ctx)
// apply weighting to scores post-fetch
for i := range results { results[i].Score = applyWeight(results[i].Score, ...) }
return results, err
}Pros:
Cons:
Architecture C: Scoring as a separate pipeline stage (Source → Scorer → Ranker)type Scorer interface {
Apply(results []ImageResult, meta []DataPointMeta) []ImageResult
}Sources return raw results plus optional metadata (e.g., per-datapoint timestamps). A separate Pros:
Cons:
RecommendationArchitecture A (current) is the pragmatic choice for now because:
If the project later adds 3+ metric sources that all need the same weighting, it would make sense to extract a shared Want me to keep Architecture A as-is, or pivot to one of the alternatives? |
|
@copilot it's fine like it is. Add e2e tests for the current implementation and also seed the prometheus with images that we later do Not want to see in the discovery object |
Seeds Prometheus with images in the aggregation-test namespace and creates two DiscoveryPolicies: - e2e-worktime-full: weight=1.0 for all hours → positive scores - e2e-worktime-zero: weight=0.0 for all hours → scores suppressed to 0 The zero-weight policy demonstrates that images pulled outside desired time windows (e.g., nighttime) are suppressed and do not appear with meaningful scores in the discovery object.
Added e2e Chainsaw tests in
The zero-weight policy proves that images seeded in Prometheus (busybox, alpine) are suppressed when the worktime window assigns them zero weight — they still appear in |
|
@copilot that is not what I want to test. I want e2e tests that have your example above 9-17 |
…s-hours scenario - Increase Prometheus retention from 1h to 48h so historical seed data (injected at yesterday's timestamps) survives long enough for the test - Seed nginx at 09:00-16:59 UTC yesterday (business hours, weight=1.0) and redis at 00:00-05:59 UTC yesterday (nighttime, outside all windows, weight=0.0) via promtool tsdb create-blocks-from openmetrics + reload - DiscoveryPolicy e2e-worktime-business uses the exact 9-17 / 6-9 / 17-19 window config from the PR description (timezone UTC, aggregation sum) - maxImages=1 means only the highest-scoring image (nginx) appears; redis is excluded because its weighted sum is 0 - Assert imageCount=1, discoveredImages[0].image=nginx, score>0, and that redis does not appear anywhere in discoveredImages
Done. Replaced the trivial 0–24 tests with a proper 9–17 scenario. How the test works: The Chainsaw seed step uses
The The assertion step verifies Prometheus retention was bumped from 1h to 48h so the injected historical blocks survive the test run: commits |
Two bugs caused the test to fail when CI runs after 12:00 UTC:
1. lookback: 27h → 48h in DiscoveryPolicy
Yesterday's 09:00 UTC data can be up to ~39h old (when CI runs at
23:59 UTC). With lookback=27h it falls outside the query window and
nginx gets score=0, causing the assert (imageCount=1) to fail.
lookback=48h matches the Prometheus retention and always covers all
seed data regardless of the time CI runs.
2. Wait loop now queries at a fixed historical timestamp (yesterday noon)
The readiness check previously queried worktime_test_cpu without a
time= parameter, so Prometheus evaluated it at 'now' using its default
5-minute staleness window. Since all seeded data is from yesterday
(24h+ ago), the instant query always returned empty results and the
seed step always timed out with an error. Passing &time=${TS_NOON}
pins the query to a known data point.
Adds a strategy-pattern–based scoring function that weights Prometheus range query data points by time of day before aggregation. This lets operators prioritize images pulled during business hours over nighttime noise.
Changes
api/v1alpha1):ScoringStrategy,WorktimeStrategy,WorktimeWindow— configurable time-of-day windows with weight multipliers and timezone supportinternal/discovery/scoring.go):ScoreWeighterinterface withworktimeWeighterimplementation; extensible for future strategiesaggregateRangeValuesapplies per-datapoint weight multiplication when a weighter is configured; no-op when nil (backward compatible)buildSourceconstructs the weighter from API config and passes it through toPrometheusSourcemake codegentest/e2e/discovery-worktime/): Chainsaw test with a realistic 9–17 business-hours window. Historical seed data is injected into Prometheus at deterministic UTC timestamps viapromtool tsdb create-blocks-from openmetrics:nginx:1.25-alpineseeded at 09:00–16:59 UTC (business hours) → weight 1.0 → positive score → appears in discoveryredis:7-alpineseeded at 00:00–05:59 UTC (nighttime, outside all windows) → weight 0.0 → score 0 → excluded bymaxImages: 1hack/e2e-infra/prometheus.yaml): bumped from1hto48hso injected historical TSDB blocks survive the test runExample
Data points at 3am get weight 0 (dropped from sum); data points at 10am get full weight. The
endHourboundary is exclusive (hour < endHour), so adjacent windows are contiguous with no overlap. The strategy is optional and nil-safe — existing configs are unaffected.