Goal
Extend Drop's DiscoveryPolicy so image discovery can rank images with more than a single usage-count score.
The proposed model is:
queries -> signals -> ranking -> selected images
This separates data collection from scoring:
- Queries fetch raw data from systems such as Prometheus or Loki.
- Signals derive named per-image metrics from query results.
- Ranking strategies combine one or more signals into the final ordered image list.
The goal is to support practical image prewarming strategies for Kubernetes CI/CD workloads, especially GitLab Kubernetes executor node pools.
Problem
A simple count-based discovery strategy answers:
Which images appeared most often?
That is useful, but incomplete.
CI workloads have different shapes:
- some images are used steadily throughout the day,
- some images are used mainly during developer feedback hours,
- some images appear in short high-concurrency bursts,
- some images are used in nightly validation jobs,
- some images are not frequent but are expensive when cold,
- some images matter because node rotation leaves many nodes cold for them.
To support these cases, Drop needs named input data, reusable derived signals, and explicit ranking logic.
Design Overview
A DiscoveryPolicy should define:
spec:
  queries: []
  signals: []
  ranking: {}
Query
A query fetches raw observations.
Examples:
- Prometheus range query for image usage.
- Loki range query for Kubernetes image pull events.
- Future external pull-cost profile.
Signal
A signal derives a named per-image value from query results.
Examples:
total-usage
peak-concurrency
developer-weighted-usage
recent-usage
p50-cold-pull-time
Ranking
A ranking strategy combines signals into the final score.
Examples:
- rank by one signal,
- weighted sum of normalized signals,
- model-aware exposure score.
Discovery Strategies
1. Total Usage
Ranks images by total observed usage over a lookback window.
score(I) = sum(count_I(t) for t in W)
Required signal:
total-usage
Required query:
Prometheus image-usage range query
Use when:
- the workload is stable,
- the goal is a simple hot-image baseline,
- the user wants the most commonly observed images.
Limitation:
- May miss images that are not globally frequent but appear in large bursts.
2. Peak Same-Image Concurrency
Ranks images by maximum observed concurrent usage.
score(I) = max(count_I(t) for t in W)
Required signal:
peak-concurrency
Required query:
Prometheus image-usage range query
Use when:
- CI has fan-out stages,
- CI has scheduled high-volume jobs,
- nightly validation jobs create many Pods using the same image,
- registry pressure from synchronized cold pulls is a concern.
Limitation:
- A rare spike can dominate if this is used alone.
3. Developer-Time Weighted Usage
Ranks images by usage during configured developer feedback windows.
score(I) = sum(weight(t) * count_I(t) for t in W)
Example weighting:
| Time window | Weight |
|---|---|
| 07:00-09:00 | 0.3 |
| 09:00-17:00 | 1.0 |
| 17:00-20:00 | 0.3 |
| otherwise | 0.0 |
Required signal:
developer-weighted-usage
Required query:
Prometheus image-usage range query
Use when:
- optimizing developer feedback time,
- the team has known working-hour patterns,
- interactive CI matters more than background/nightly work.
Limitation:
- Requires timezone and window configuration.
- May not fit globally distributed teams without multiple windows or broader policies.
4. Recent Usage
Ranks images by usage in a short recent window.
score(I) = sum(count_I(t) for t in recent window)
Required signal:
recent-usage
Required query:
Prometheus image-usage range query
Use when:
- image usage changes quickly,
- new images are introduced often,
- short-lived project activity should influence prewarming.
Limitation:
- Can overreact to temporary spikes.
5. Hybrid Usage + Peak Concurrency
Balances generally hot images and burst-heavy images.
score(I) =
alpha * normalize(total_usage(I))
+ (1 - alpha) * normalize(peak_concurrency(I))
Example:
alpha = 0.7
Meaning:
70% total usage
30% peak concurrency
Required signals:
total-usage
peak-concurrency
Required query:
Prometheus image-usage range query
Use when:
- the cluster has mixed workloads,
- both steady hot images and bursty images matter,
- pure count and pure max are both too narrow.
Limitation:
- Requires normalization and explainable status output.
6. Hybrid Developer-Time Usage + Peak Concurrency
Balances developer-feedback relevance with burst detection.
score(I) =
alpha * normalize(developer_weighted_usage(I))
+ (1 - alpha) * normalize(peak_concurrency(I))
Required signals:
developer-weighted-usage
peak-concurrency
Required query:
Prometheus image-usage range query
Use when:
- developer feedback is the primary goal,
- but off-hour bursts still matter operationally.
Limitation:
- Requires both time-window weighting and normalization.
7. Count × Pull Time
Ranks images by usage multiplied by measured image availability time.
score(I) = total_usage(I) * p_hat(I)
Required signals:
total-usage
p50-cold-pull-time
or:
total-usage
p95-cold-pull-time
Required queries:
Prometheus image-usage query
Loki pull-event query or external pull-cost profile
Use when:
- image pull costs vary significantly,
- a medium-frequency but expensive image should outrank a tiny frequent image.
Limitation:
- Requires per-image pull-time estimates.
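The effect of multiplying usage by pull time can be seen in a small worked sketch. The function name, the usage counts, and the pull times below are illustrative assumptions, not measured data, and images without a pull-time estimate are simply left out, which is one possible policy among several.

```python
from typing import Dict


def count_times_pull_time(total_usage: Dict[str, float],
                          pull_seconds: Dict[str, float]) -> Dict[str, float]:
    """score(I) = total_usage(I) * p_hat(I), for images that have both values."""
    return {
        image: usage * pull_seconds[image]
        for image, usage in total_usage.items()
        if image in pull_seconds  # assumption: skip images with no estimate
    }


# Illustrative numbers only: a small, very frequent image versus a
# medium-frequency image that is slow to pull.
usage = {
    "registry.example.com/ci/alpine:3": 1000,
    "registry.example.com/ci/java-gradle:21": 300,
    "registry.example.com/ci/unprofiled:1": 50,
}
pull = {
    "registry.example.com/ci/alpine:3": 2.0,
    "registry.example.com/ci/java-gradle:21": 45.0,
}
scores = count_times_pull_time(usage, pull)
```

With these numbers the slower image scores 13500 against 2000 for the frequent one, so it ranks first even though it is used less than a third as often.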
8. Developer-Weighted Count × Pull Time
Ranks developer-relevant images by estimated cold-start cost.
score(I) = developer_weighted_usage(I) * p_hat(I)
Required signals:
developer-weighted-usage
p50-cold-pull-time
Required queries:
Prometheus image-usage query
Loki pull-event query or external pull-cost profile
Use when:
- the goal is reducing developer-facing affected job-minutes.
Limitation:
- Requires time-window configuration and pull-time estimates.
9. Model-Aware Exposure
Ranks images by estimated post-rotation cold-node exposure.
score(I) =
J_target(I)
* cold_fraction_hat(I)
* p_hat(I)
with:
cold_fraction_hat(I) = (1 - 1/N) ^ J_pre(I)
Where:
N is the number of eligible CI nodes,
J_pre(I) is usage before the target window,
J_target(I) is usage during the target window,
p_hat(I) is measured or estimated image availability time.
Required signals:
pre-window-usage
target-window-usage
p50-cold-pull-time
Required configuration:
nodeCount
Required queries:
Prometheus image-usage query
Loki pull-event query or external pull-cost profile
Use when:
- prewarming should be node-rotation-aware,
- enough observability exists to estimate pull time,
- the user wants a closer approximation of affected job-minutes.
Limitation:
- More assumptions than usage-only strategies.
- Should be implemented as a typed ranking strategy.
Required Pipeline Capabilities
Query Types
Prometheus
Used for:
- total usage,
- peak concurrency,
- developer-time usage,
- recent usage,
- pre-window usage,
- target-window usage.
Normalized output:
timestamp,image,value
Loki
Used for Kubernetes image-pull event analysis when Prometheus does not expose useful per-image pull durations.
Normalized output:
timestamp,pod,image,reason,message
Pull Cost Profile
Optional future alternative to Loki.
Normalized output:
image,p50ColdPullSeconds,p95ColdPullSeconds,sampleCount
This can be generated by an external analyzer if pull-time parsing should not live inside the Drop controller.
Signal Types
| Signal type | Purpose | Example signals |
|---|---|---|
| aggregate | Aggregate all samples per image | total-usage, peak-concurrency |
| timeWeightedAggregate | Apply time-window weights before aggregation | developer-weighted-usage |
| windowAggregate | Aggregate a specific sub-window | recent-usage, pre-window-usage, target-window-usage |
| eventPullTime | Derive pull-time stats from events | p50-cold-pull-time, p95-cold-pull-time |
Ranking Strategies
| Ranking strategy | Purpose |
|---|---|
| signal | Rank directly by one signal |
| weightedSum | Combine normalized signals |
| modelExposure | Rank by expected post-rotation exposure |
Proposed CRD Shape
Overview
apiVersion: drop.corewire.io/v1alpha1
kind: DiscoveryPolicy
metadata:
  name: gitlab-runner-discovery
spec:
  syncInterval: 1h
  maxImages: 30
  queries: []
  signals: []
  ranking: {}
Queries
Prometheus Image Usage Query
queries:
  - name: runner-image-usage
    type: prometheus
    prometheus:
      endpoint: https://mimir.example.com
      queryType: range
      lookback: 168h
      step: 1m
      query: |
        count(
          container_memory_working_set_bytes{
            container!="",
            container!="POD",
            namespace="gitlab-runner",
            pod=~"runner-.*"
          }
        ) by (image)
The query must return an image label.
Normalized result:
timestamp,image,value
Example:
2026-06-18T09:00:00Z,registry.example.com/ci/node-build:22,18
2026-06-18T09:01:00Z,registry.example.com/ci/node-build:22,21
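A minimal sketch of this normalization step, assuming the controller receives the standard Prometheus HTTP API range-query response (a `matrix` result whose series carry `[unix_timestamp, "value"]` pairs). The `Sample` type and `normalize_matrix` function are hypothetical names, and skipping series without an `image` label is an assumption; rejecting the query would be an equally valid policy.

```python
from datetime import datetime, timezone
from typing import Iterator, NamedTuple


class Sample(NamedTuple):
    timestamp: datetime
    image: str
    value: float


def normalize_matrix(response: dict) -> Iterator[Sample]:
    """Flatten a Prometheus range-query (matrix) response into samples."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    for series in response["data"]["result"]:
        image = series.get("metric", {}).get("image")
        if not image:
            # Assumption: series without an image label are skipped.
            continue
        for ts, raw in series.get("values", []):
            when = datetime.fromtimestamp(float(ts), tz=timezone.utc)
            yield Sample(when, image, float(raw))
```

Each yielded `Sample` corresponds to one `timestamp,image,value` row, so a single query result can be reused by every signal that references it.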
Loki Image Pull Event Query
queries:
  - name: image-pull-events
    type: loki
    loki:
      endpoint: https://loki.example.com
      queryType: range
      lookback: 168h
      query: |
        {job="kubernetes-events", namespace="gitlab-runner"}
          | json
          | involvedObject_name =~ "runner-.*"
          | reason =~ "Pulling|Pulled|Failed|BackOff"
      parser:
        type: kubernetesEvents
        podField: involvedObject_name
        reasonField: reason
        messageField: message
        imageField: message
Normalized result:
timestamp,pod,image,reason,message
Expected event messages include:
Pulling image "registry.example.com/ci/java-gradle:21"
Successfully pulled image "registry.example.com/ci/java-gradle:21" in 42.3s
Container image "registry.example.com/ci/java-gradle:21" already present on machine
Failed to pull image "registry.example.com/ci/java-gradle:21"
Back-off pulling image "registry.example.com/ci/java-gradle:21"
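Because `imageField` and the pull duration both come out of the free-text message, the parser has to extract them with pattern matching. The sketch below covers only the message shapes listed above; the names `parse_message`, `parse_duration`, and `ParsedEvent` are hypothetical, and real kubelet message text can vary between Kubernetes versions, so the patterns should be treated as assumptions to verify against actual events.

```python
import re
from typing import NamedTuple, Optional

IMAGE_RE = re.compile(r'image "([^"]+)"')
# Take only the first token after " in ", ignoring any trailing text.
PULLED_RE = re.compile(r'^Successfully pulled image "[^"]+" in (\S+)')
PART_RE = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


class ParsedEvent(NamedTuple):
    image: Optional[str]
    cache_hit: bool
    pull_seconds: Optional[float]


def parse_duration(text: str) -> Optional[float]:
    """Parse a duration such as '42.3s' or '1m12.5s' into seconds."""
    parts = PART_RE.findall(text)
    # Reject strings that are not made up entirely of number+unit parts.
    if not parts or "".join(n + u for n, u in parts) != text:
        return None
    return sum(float(n) * UNIT_SECONDS[u] for n, u in parts)


def parse_message(message: str) -> ParsedEvent:
    image = IMAGE_RE.search(message)
    pulled = PULLED_RE.match(message)
    return ParsedEvent(
        image=image.group(1) if image else None,
        cache_hit="already present on machine" in message,
        pull_seconds=parse_duration(pulled.group(1)) if pulled else None,
    )
```

Returning `None` for an unparseable duration, rather than raising, lets the controller count such events as missing data and surface them in status.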
Signals
aggregate
Aggregates all samples per image.
Supported methods:
sum
max
avg
count
min
Total usage:
signals:
  - name: total-usage
    queryRef: runner-image-usage
    type: aggregate
    aggregate:
      method: sum
Peak concurrency:
signals:
  - name: peak-concurrency
    queryRef: runner-image-usage
    type: aggregate
    aggregate:
      method: max
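The two aggregate signals above differ only in the reduction applied to each image's samples. A minimal sketch, assuming samples arrive as `(timestamp, image, value)` rows; the `aggregate` function and `METHODS` table are hypothetical names used for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# (timestamp, image, value) rows from the normalized query result
Sample = Tuple[str, str, float]

METHODS: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "max": max,
    "min": min,
    "avg": lambda values: sum(values) / len(values),
    "count": len,
}


def aggregate(samples: Iterable[Sample], method: str) -> Dict[str, float]:
    """Group samples by image and reduce each group with the chosen method."""
    per_image: Dict[str, List[float]] = defaultdict(list)
    for _, image, value in samples:
        per_image[image].append(value)
    reduce_fn = METHODS[method]
    return {image: float(reduce_fn(values)) for image, values in per_image.items()}


samples = [
    ("2026-06-18T09:00:00Z", "registry.example.com/ci/node-build:22", 18),
    ("2026-06-18T09:01:00Z", "registry.example.com/ci/node-build:22", 21),
    ("2026-06-18T09:00:00Z", "registry.example.com/ci/java-gradle:21", 5),
]
total_usage = aggregate(samples, "sum")
peak_concurrency = aggregate(samples, "max")
```

Note that with a fixed query step, `sum` adds up per-step container counts, so it grows with how long containers run as well as with how many there are.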
timeWeightedAggregate
Applies configured time weights before aggregation.
signals:
  - name: developer-weighted-usage
    queryRef: runner-image-usage
    type: timeWeightedAggregate
    timeWeightedAggregate:
      method: sum
      timezone: Europe/Berlin
      defaultWeight: "0"
      windows:
        - startHour: 7
          endHour: 9
          weight: "0.3"
        - startHour: 9
          endHour: 17
          weight: "1.0"
        - startHour: 17
          endHour: 20
          weight: "0.3"
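A sketch of how such a weighted sum could be evaluated. It assumes windows are half-open (`startHour` inclusive, `endHour` exclusive) and that timestamps are timezone-aware; both are assumptions, as are the function names. The usage example uses a fixed UTC+2 offset to stay self-contained, whereas a real implementation would resolve a name like Europe/Berlin through a timezone database so that daylight-saving changes are handled.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Dict, Iterable, List, Tuple

# (start_hour, end_hour, weight); end_hour is exclusive (assumption)
Window = Tuple[int, int, float]


def weight_at(ts: datetime, tz: tzinfo, windows: List[Window],
              default: float = 0.0) -> float:
    """Return the weight of the window containing ts in local time."""
    hour = ts.astimezone(tz).hour
    for start, end, weight in windows:
        if start <= hour < end:
            return weight
    return default


def time_weighted_sum(samples: Iterable[Tuple[datetime, str, float]],
                      tz: tzinfo, windows: List[Window],
                      default: float = 0.0) -> Dict[str, float]:
    """score(I) = sum(weight(t) * count_I(t)) over all samples."""
    totals: Dict[str, float] = {}
    for ts, image, value in samples:
        weighted = weight_at(ts, tz, windows, default) * value
        totals[image] = totals.get(image, 0.0) + weighted
    return totals


local = timezone(timedelta(hours=2))  # stand-in for Europe/Berlin in summer
windows = [(7, 9, 0.3), (9, 17, 1.0), (17, 20, 0.3)]
day = datetime(2026, 6, 18, tzinfo=timezone.utc)
samples = [
    (day.replace(hour=6, minute=30), "img", 10.0),  # 08:30 local, weight 0.3
    (day.replace(hour=8), "img", 4.0),               # 10:00 local, weight 1.0
    (day.replace(hour=20), "img", 7.0),              # 22:00 local, weight 0.0
]
result = time_weighted_sum(samples, local, windows)
```

Here the night-time sample contributes nothing, so the image scores roughly 0.3 * 10 + 1.0 * 4 = 7.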
windowAggregate
Aggregates a specific time window.
Recent usage:
signals:
  - name: recent-usage
    queryRef: runner-image-usage
    type: windowAggregate
    windowAggregate:
      method: sum
      relativeWindow: 2h
Pre-window usage:
signals:
  - name: pre-window-usage
    queryRef: runner-image-usage
    type: windowAggregate
    windowAggregate:
      method: sum
      timezone: Europe/Berlin
      window:
        start: "00:00"
        end: "09:00"
Target-window usage:
signals:
  - name: developer-window-usage
    queryRef: runner-image-usage
    type: windowAggregate
    windowAggregate:
      method: sum
      timezone: Europe/Berlin
      window:
        start: "09:00"
        end: "17:00"
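The two window forms select samples differently: `relativeWindow` is anchored to the evaluation time, while `window` selects a daily time-of-day range. A minimal sketch of both filters follows; the function names are hypothetical, and the boundary handling (end of a daily window exclusive, start of a relative window exclusive) is an assumption the API would need to pin down.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo
from typing import Dict, Iterable, Tuple

Sample = Tuple[datetime, str, float]


def relative_window_sum(samples: Iterable[Sample], now: datetime,
                        window: timedelta) -> Dict[str, float]:
    """Sum samples with now - window < timestamp <= now."""
    totals: Dict[str, float] = {}
    for ts, image, value in samples:
        if now - window < ts <= now:
            totals[image] = totals.get(image, 0.0) + value
    return totals


def daily_window_sum(samples: Iterable[Sample], tz: tzinfo,
                     start: time, end: time) -> Dict[str, float]:
    """Sum samples whose local time of day falls in [start, end)."""
    totals: Dict[str, float] = {}
    for ts, image, value in samples:
        if start <= ts.astimezone(tz).time() < end:
            totals[image] = totals.get(image, 0.0) + value
    return totals


utc = timezone.utc
day = datetime(2026, 6, 18, tzinfo=utc)
samples = [
    (day.replace(hour=8, minute=59), "img", 2.0),
    (day.replace(hour=9), "img", 3.0),
    (day.replace(hour=16, minute=59), "img", 4.0),
    (day.replace(hour=17), "img", 5.0),
]
pre_window = daily_window_sum(samples, utc, time(0, 0), time(9, 0))
target_window = daily_window_sum(samples, utc, time(9, 0), time(17, 0))
recent = relative_window_sum(samples, day.replace(hour=17), timedelta(hours=2))
```

With half-open daily windows the 09:00 sample belongs only to the target window, so adjacent windows never double-count a sample.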
eventPullTime
Derives image pull-time statistics from event records.
signals:
  - name: p50-cold-pull-time
    queryRef: image-pull-events
    type: eventPullTime
    eventPullTime:
      statistic: p50
      includeCacheHits: false
      durationMode: eventPair
Supported statistics:
p50
p90
p95
avg
max
count
failureCount
cacheHitCount
Supported duration modes:
| Mode | Meaning |
|---|---|
| eventPair | Pulled.timestamp - Pulling.timestamp for the same Pod/image |
| messageDuration | Parse duration from a Pulled event message |
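The `eventPair` mode can be sketched as a single pass over time-ordered events that remembers the last `Pulling` timestamp per Pod/image pair. The function names and the tuple layout are assumptions for illustration. One side effect worth noting: an "already present on machine" event usually has no preceding `Pulling` event, so this pairing drops such cache hits without extra logic, though that behavior should be confirmed against real event streams.

```python
from datetime import datetime, timedelta, timezone
from statistics import median
from typing import Dict, Iterable, List, Tuple

# (timestamp, pod, image, reason) taken from the normalized Loki result
Event = Tuple[datetime, str, str, str]


def event_pair_durations(events: Iterable[Event]) -> Dict[str, List[float]]:
    """Pair Pulling and Pulled events per (pod, image) and collect seconds."""
    started: Dict[Tuple[str, str], datetime] = {}
    durations: Dict[str, List[float]] = {}
    for ts, pod, image, reason in sorted(events, key=lambda e: e[0]):
        key = (pod, image)
        if reason == "Pulling":
            started[key] = ts
        elif reason == "Pulled" and key in started:
            seconds = (ts - started.pop(key)).total_seconds()
            durations.setdefault(image, []).append(seconds)
    return durations


def p50(durations: Dict[str, List[float]]) -> Dict[str, float]:
    """Median cold-pull time per image."""
    return {image: median(values) for image, values in durations.items()}


t0 = datetime(2026, 6, 18, 9, 0, tzinfo=timezone.utc)
img = "registry.example.com/ci/java-gradle:21"
events = [
    (t0, "runner-1", img, "Pulling"),
    (t0 + timedelta(seconds=40), "runner-1", img, "Pulled"),
    (t0, "runner-2", img, "Pulling"),
    (t0 + timedelta(seconds=60), "runner-2", img, "Pulled"),
    (t0 + timedelta(seconds=5), "runner-3", img, "Pulled"),  # no Pulling: skipped
]
pull_times = p50(event_pair_durations(events))
```

Event timestamps often have coarse resolution, which is one reason the `messageDuration` mode may give more precise values when the message includes a duration.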
Cache hits should be detected separately and excluded from cold-pull duration when:
includeCacheHits: false
Ranking Strategies
signal
Ranks directly by one signal.
ranking:
  strategy: signal
  signal:
    signalRef: total-usage
weightedSum
Combines normalized signals.
ranking:
  strategy: weightedSum
  weightedSum:
    normalize: minMax
    missingSignal: zero
    terms:
      - signalRef: total-usage
        weight: "0.7"
      - signalRef: peak-concurrency
        weight: "0.3"
Formula:
final_score(I) =
0.7 * normalize(total_usage(I))
+ 0.3 * normalize(peak_concurrency(I))
Initial normalization method:
minMax
Formula:
normalized(x) = (x - min) / (max - min)
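A minimal sketch of min-max normalization combined with the weighted sum, using `missingSignal: zero` semantics for images that lack a signal. The function names are hypothetical. Mapping all values to 0 when every image has the same value is an assumption made here only to avoid dividing by zero; the proposal may settle on a different rule.

```python
from typing import Dict


def min_max(values: Dict[str, float]) -> Dict[str, float]:
    """normalized(x) = (x - min) / (max - min), per signal."""
    if not values:
        return {}
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Assumption: identical values carry no ranking information.
        return {image: 0.0 for image in values}
    return {image: (v - lo) / (hi - lo) for image, v in values.items()}


def weighted_sum(signals: Dict[str, Dict[str, float]],
                 weights: Dict[str, float]) -> Dict[str, float]:
    """Combine normalized signals; a missing signal contributes zero."""
    normalized = {name: min_max(vals) for name, vals in signals.items()}
    images = set().union(*(vals.keys() for vals in signals.values()))
    return {
        image: sum(w * normalized[name].get(image, 0.0)
                   for name, w in weights.items())
        for image in images
    }


signals = {
    "total-usage": {"a": 100.0, "b": 50.0, "c": 0.0},
    "peak-concurrency": {"a": 10.0, "b": 20.0},  # "c" has no peak value
}
scores = weighted_sum(signals, {"total-usage": 0.7, "peak-concurrency": 0.3})
```

Because each signal is normalized against the images present in the same run, scores are comparable within a run but not across runs.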
If all values are equal, max - min is zero and the formula above is undefined, so this case needs an explicit rule.
modelExposure
Ranks by expected post-rotation exposure.
ranking:
  strategy: modelExposure
  modelExposure:
    nodeCount: 100
    preWindowUsageSignalRef: pre-window-usage
    targetWindowUsageSignalRef: developer-window-usage
    pullTimeSignalRef: p50-cold-pull-time
Formula:
score(I) =
J_target(I)
* (1 - 1/N) ^ J_pre(I)
* p_hat(I)
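The formula can be evaluated directly. One way to read the middle term: if each of the J_pre earlier uses landed on one of N nodes uniformly at random, (1 - 1/N) ^ J_pre is the chance that a given node never received the image. The sketch below uses hypothetical function names and illustrative inputs.

```python
def cold_fraction(node_count: int, pre_window_usage: float) -> float:
    """Estimated share of nodes still cold: (1 - 1/N) ^ J_pre."""
    return (1.0 - 1.0 / node_count) ** pre_window_usage


def exposure_score(target_usage: float, pre_usage: float,
                   pull_seconds: float, node_count: int) -> float:
    """score(I) = J_target(I) * cold_fraction_hat(I) * p_hat(I)."""
    return target_usage * cold_fraction(node_count, pre_usage) * pull_seconds


# Illustrative inputs: same target usage and pull time, different pre-window usage.
never_seen = exposure_score(target_usage=200, pre_usage=0,
                            pull_seconds=40.0, node_count=100)
already_warm = exposure_score(target_usage=200, pre_usage=100,
                              pull_seconds=40.0, node_count=100)
```

An image with no pre-window usage keeps its full exposure, while 100 earlier uses across 100 nodes cut the estimate to a bit over a third, which is why this strategy favors images that are about to be needed but have not yet been pulled widely.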
Complete Examples
Example 1: Hybrid Usage and Peak Concurrency
apiVersion: drop.corewire.io/v1alpha1
kind: DiscoveryPolicy
metadata:
  name: gitlab-hybrid-usage-concurrency
spec:
  syncInterval: 1h
  maxImages: 30
  queries:
    - name: runner-image-usage
      type: prometheus
      prometheus:
        endpoint: https://mimir.example.com
        queryType: range
        lookback: 168h
        step: 1m
        query: |
          count(
            container_memory_working_set_bytes{
              container!="",
              container!="POD",
              namespace="gitlab-runner",
              pod=~"runner-.*"
            }
          ) by (image)
  signals:
    - name: total-usage
      queryRef: runner-image-usage
      type: aggregate
      aggregate:
        method: sum
    - name: peak-concurrency
      queryRef: runner-image-usage
      type: aggregate
      aggregate:
        method: max
  ranking:
    strategy: weightedSum
    weightedSum:
      normalize: minMax
      missingSignal: zero
      terms:
        - signalRef: total-usage
          weight: "0.7"
        - signalRef: peak-concurrency
          weight: "0.3"
Example 2: Developer-Time Usage and Peak Concurrency
apiVersion: drop.corewire.io/v1alpha1
kind: DiscoveryPolicy
metadata:
  name: gitlab-developer-and-burst
spec:
  syncInterval: 1h
  maxImages: 30
  queries:
    - name: runner-image-usage
      type: prometheus
      prometheus:
        endpoint: https://mimir.example.com
        queryType: range
        lookback: 168h
        step: 1m
        query: |
          count(
            container_memory_working_set_bytes{
              container!="",
              container!="POD",
              namespace="gitlab-runner",
              pod=~"runner-.*"
            }
          ) by (image)
  signals:
    - name: developer-weighted-usage
      queryRef: runner-image-usage
      type: timeWeightedAggregate
      timeWeightedAggregate:
        method: sum
        timezone: Europe/Berlin
        defaultWeight: "0"
        windows:
          - startHour: 7
            endHour: 9
            weight: "0.3"
          - startHour: 9
            endHour: 17
            weight: "1.0"
          - startHour: 17
            endHour: 20
            weight: "0.3"
    - name: peak-concurrency
      queryRef: runner-image-usage
      type: aggregate
      aggregate:
        method: max
  ranking:
    strategy: weightedSum
    weightedSum:
      normalize: minMax
      missingSignal: zero
      terms:
        - signalRef: developer-weighted-usage
          weight: "0.7"
        - signalRef: peak-concurrency
          weight: "0.3"
Example 3: Model-Aware Exposure
apiVersion: drop.corewire.io/v1alpha1
kind: DiscoveryPolicy
metadata:
  name: gitlab-model-aware-exposure
spec:
  syncInterval: 1h
  maxImages: 30
  queries:
    - name: runner-image-usage
      type: prometheus
      prometheus:
        endpoint: https://mimir.example.com
        queryType: range
        lookback: 168h
        step: 5m
        query: |
          count(
            container_memory_working_set_bytes{
              container!="",
              container!="POD",
              namespace="gitlab-runner",
              pod=~"runner-.*"
            }
          ) by (image)
    - name: image-pull-events
      type: loki
      loki:
        endpoint: https://loki.example.com
        queryType: range
        lookback: 168h
        query: |
          {job="kubernetes-events", namespace="gitlab-runner"}
            | json
            | involvedObject_name =~ "runner-.*"
            | reason =~ "Pulling|Pulled|Failed|BackOff"
        parser:
          type: kubernetesEvents
          podField: involvedObject_name
          reasonField: reason
          messageField: message
          imageField: message
  signals:
    - name: pre-window-usage
      queryRef: runner-image-usage
      type: windowAggregate
      windowAggregate:
        method: sum
        timezone: Europe/Berlin
        window:
          start: "00:00"
          end: "09:00"
    - name: developer-window-usage
      queryRef: runner-image-usage
      type: windowAggregate
      windowAggregate:
        method: sum
        timezone: Europe/Berlin
        window:
          start: "09:00"
          end: "17:00"
    - name: p50-cold-pull-time
      queryRef: image-pull-events
      type: eventPullTime
      eventPullTime:
        statistic: p50
        includeCacheHits: false
        durationMode: eventPair
  ranking:
    strategy: modelExposure
    modelExposure:
      nodeCount: 100
      preWindowUsageSignalRef: pre-window-usage
      targetWindowUsageSignalRef: developer-window-usage
      pullTimeSignalRef: p50-cold-pull-time
Status and Observability
The controller should expose enough status to explain every selected image.
Example:
status:
  lastRunTime: "2026-06-18T10:00:00Z"
  observedGeneration: 4
  queryResults:
    - name: runner-image-usage
      type: prometheus
      series: 30
      samples: 60480
      status: success
    - name: image-pull-events
      type: loki
      records: 1820
      status: success
  signalResults:
    - name: total-usage
      images: 30
      status: success
    - name: peak-concurrency
      images: 30
      status: success
  discoveredImages:
    - image: registry.example.com/ci/java-gradle:21
      rank: 1
      finalScore: "0.8768"
      selected: true
      signals:
        - name: total-usage
          rawValue: "8210"
          normalizedValue: "0.824"
        - name: peak-concurrency
          rawValue: "96"
          normalizedValue: "1.0"
      ranking:
        strategy: weightedSum
        terms:
          - signal: total-usage
            weight: "0.7"
            contribution: "0.5768"
          - signal: peak-concurrency
            weight: "0.3"
            contribution: "0.3"
Status output should support debugging:
- query failures,
- missing labels,
- missing signals,
- normalization values,
- ranking contributions,
- final selected images.
Validation Plan
Query Tests
- Prometheus query results are normalized into timestamp,image,value.
- Loki query results are normalized into timestamp,pod,image,reason,message.
- Missing image labels are rejected or ignored according to defined behavior.
- Query failures are surfaced in status.
Signal Tests
aggregate.sum
aggregate.max
aggregate.avg
aggregate.count
timeWeightedAggregate
windowAggregate
eventPullTime
Ranking Tests
signal
weightedSum
modelExposure
- missing signal handling,
- normalization behavior,
- deterministic tie-breaking.
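Deterministic tie-breaking can be tested against a selection step like the following sketch. Breaking ties by ascending image reference is an assumption, since the proposal does not fix a rule, and `select_images` is a hypothetical name; the important property is that equal scores always produce the same order.

```python
from typing import Dict, List


def select_images(scores: Dict[str, float], max_images: int) -> List[str]:
    """Order by score descending, then image reference ascending, and truncate."""
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [image for image, _ in ordered[:max_images]]


scores = {"registry.example.com/ci/b:1": 0.5,
          "registry.example.com/ci/a:1": 0.5,
          "registry.example.com/ci/c:1": 0.9}
selected = select_images(scores, max_images=2)
```

Without a secondary sort key, two images with the same score could swap places between runs, which would make the selected set flap at the `maxImages` cutoff.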
Integration Tests
Use fake Prometheus and Loki responses to verify:
- one query can feed multiple signals,
- multiple signals can feed one ranking,
- selected image order is deterministic,
- status contains query, signal, and ranking details.
Implementation Split
Issue 1: CRD for Query, Signal, and Ranking Pipeline
Define the queries, signals, and ranking API.
Issue 2: Prometheus Query Execution
Implement named Prometheus range queries and normalized sample output.
Issue 3: Aggregate Signals
Implement:
aggregate.sum
aggregate.max
aggregate.avg
aggregate.count
aggregate.min
Issue 4: Basic Ranking
Implement signal ranking.
Issue 5: Weighted Ranking
Implement weightedSum ranking with minMax normalization.
Issue 6: Status Output
Expose query results, signal results, ranking contributions, and selected images.
Issue 7: Time-Based Signals
Implement:
timeWeightedAggregate
windowAggregate
Issue 8: Loki Query Source
Implement Loki range query support.
Issue 9: Event Pull-Time Signal
Implement eventPullTime.
Issue 10: Model-Aware Exposure Ranking
Implement typed modelExposure.
Issue 11: Documentation
Document:
- total usage,
- peak concurrency,
- developer-time usage,
- hybrid usage/concurrency,
- pull-time-aware ranking,
- model-aware exposure.
Design Decisions to Resolve
Missing signal behavior
Initial proposal:
missingSignal: zero
Alternative:
drop image from ranking if a required signal is missing
Pull-time statistic
Initial proposal:
p50
Alternative:
p95
The choice should be configurable.
Pull-time source
Two options:
- Native Loki query and eventPullTime.
- External ImagePullCostProfile produced by a separate analyzer.
A native Loki source is convenient. An external profile may keep the controller simpler.
Recommendation
Adopt the queries -> signals -> ranking pipeline for Drop discovery.
This design supports:
- multiple signals from one query,
- true hybrid ranking,
- Prometheus and Loki inputs,
- pull-time-aware ranking,
- model-aware exposure scoring,
- explainable status output,
- and a clean split into implementation PRs.
The first production-ready strategies should be:
signal(total-usage)
signal(peak-concurrency)
weightedSum(total-usage, peak-concurrency)
signal(developer-weighted-usage)
weightedSum(developer-weighted-usage, peak-concurrency)
The advanced strategy should be:
modelExposure(pre-window-usage, target-window-usage, p50-cold-pull-time)