chore: integrate prometheus in tilt#1140
Conversation
Add opt-in Prometheus support to the KinD/Tilt development environment so developers can verify that metrics are properly exposed, ServiceMonitor manifests are valid, and the full scrape pipeline works end-to-end. ## Development environment (developing/) - Add `setup-prometheus.sh` script that installs the Prometheus Operator via upstream `bundle.yaml` and deploys a minimal Prometheus instance in `prometheus-system` namespace. Follows the same idempotent pattern as `setup-cert-manager.sh` and `setup-istio.sh`. - Add `manifests/prometheus/prometheus.yaml` with the Prometheus CR, ServiceAccount, ClusterRole, and ClusterRoleBinding. Configured with minimal resources (128Mi/256Mi), 2h retention, and wildcard ServiceMonitor discovery across all namespaces. - Add `manifests/overlays/controller-prometheus/kustomization.yaml`, a dev-only kustomize overlay that layers the existing prometheus component on top of the istio overlay. This avoids creating a new overlay in the controller's published manifests — enabling prometheus is a dev-environment concern, not a vendoring decision. - Update `Makefile` with `setup-prometheus` target, conditionally invoked by `tilt-up` when `ENABLE_PROMETHEUS=true`. - Update `Tiltfile` with `ENABLE_PROMETHEUS` env var, conditional overlay selection, and a `prometheus-ui` local_resource that port-forwards to localhost:9090. ## Production manifests (workspaces/) ### NetworkPolicy for metrics scraping Add `network-policy.yaml` to the controller's prometheus kustomize component. Without this, the existing `default-deny-ingress` policy blocks Prometheus from reaching the controller's metrics port. Uses `from: []` (allow from any source) scoped to the metrics port, matching the same pattern used by the istio component's webhook NetworkPolicy. ### Remove `managed-by` from CRD selectors The `common` kustomize component previously used a single `labels` block with `includeSelectors: true` for all three standard labels, including `app.kubernetes.io/managed-by: kustomize`. This injected `managed-by` into every selector field — including CRD-specific selectors like `ServiceMonitor.spec.selector` and `AuthorizationPolicy.spec.selector`. Deployment tools (Tilt, ArgoCD, Flux) override `managed-by` on resource metadata labels but cannot follow CRD-specific selector fields, causing selectors to not match their target resources. For example, Tilt sets `managed-by: tilt` on the metrics Service labels, but the ServiceMonitor selector still expects `managed-by: kustomize`. Fix by splitting the `common` component's `labels` block in all three modules (controller, backend, frontend): - `name` and `part-of`: `includeSelectors: true` (stable identifiers) - `managed-by`: `includeSelectors: false, includeTemplates: true` (deployment method label — belongs on metadata, not in selectors) Also remove hardcoded `managed-by: kustomize` from CRD selectors in: - `controller/components/prometheus/monitor.yaml` (ServiceMonitor) - `backend/components/istio/authorization-policy.yaml` - `frontend/components/istio/authorization-policy.yaml` ## Usage # Default — no Prometheus (unchanged behavior) make tilt-up # With Prometheus enabled ENABLE_PROMETHEUS=true make tilt-up Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Andy Stoneberg <astonebe@redhat.com>
5672dd6 to
87840dc
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Current manifests diff: Backenddiff -up manifests-diff/backend/*
--- /home/chris/Documents/shared_projects/kion/kubeflow-notebooks/manifests-diff/backend/base 2026-06-04 16:53:05.529052312 +0200
+++ /home/chris/Documents/shared_projects/kion/kubeflow-notebooks/manifests-diff/backend/head 2026-06-04 16:53:05.941054007 +0200
@@ -133,7 +133,6 @@ spec:
selector:
app: workspaces-backend
app.kubernetes.io/component: api
- app.kubernetes.io/managed-by: kustomize
app.kubernetes.io/name: workspaces-backend
app.kubernetes.io/part-of: kubeflow-workspaces
type: ClusterIP
@@ -155,7 +154,6 @@ spec:
matchLabels:
app: workspaces-backend
app.kubernetes.io/component: api
- app.kubernetes.io/managed-by: kustomize
app.kubernetes.io/name: workspaces-backend
app.kubernetes.io/part-of: kubeflow-workspaces
strategy:
@@ -291,7 +289,6 @@ spec:
matchLabels:
app: workspaces-backend
app.kubernetes.io/component: api
- app.kubernetes.io/managed-by: kustomize
app.kubernetes.io/name: workspaces-backend
app.kubernetes.io/part-of: kubeflow-workspaces
policyTypes:
@@ -318,6 +315,5 @@ spec:
matchLabels:
app: workspaces-backend
app.kubernetes.io/component: api
- app.kubernetes.io/managed-by: kustomize
app.kubernetes.io/name: workspaces-backend
app.kubernetes.io/part-of: kubeflow-workspacesControllerdiff -up manifests-diff/controller/*
--- /home/chris/Documents/shared_projects/kion/kubeflow-notebooks/manifests-diff/controller/base 2026-06-04 16:53:05.760053262 +0200
+++ /home/chris/Documents/shared_projects/kion/kubeflow-notebooks/manifests-diff/controller/head 2026-06-04 16:53:06.174054965 +0200
@@ -5457,7 +5457,6 @@ spec:
selector:
app: workspaces-controller
app.kubernetes.io/component: controller-manager
- app.kubernetes.io/managed-by: kustomize
app.kubernetes.io/name: workspaces-controller
app.kubernetes.io/part-of: kubeflow-workspaces
---
@@ -5478,7 +5477,6 @@ spec:
matchLabels:
app: workspaces-controller
app.kubernetes.io/component: controller-manager
- app.kubernetes.io/managed-by: kustomize
app.kubernetes.io/name: workspaces-controller
app.kubernetes.io/part-of: kubeflow-workspaces
template:
@@ -5635,7 +5633,6 @@ spec:
matchLabels:
app: workspaces-controller
app.kubernetes.io/component: controller-manager
- app.kubernetes.io/managed-by: kustomize
app.kubernetes.io/name: workspaces-controller
app.kubernetes.io/part-of: kubeflow-workspaces
policyTypes:Frontenddiff -up manifests-diff/frontend/*
--- /home/chris/Documents/shared_projects/kion/kubeflow-notebooks/manifests-diff/frontend/base 2026-06-04 16:53:05.819053505 +0200
+++ /home/chris/Documents/shared_projects/kion/kubeflow-notebooks/manifests-diff/frontend/head 2026-06-04 16:53:06.235055216 +0200
@@ -17,7 +17,6 @@ spec:
selector:
app: workspaces-frontend
app.kubernetes.io/component: ui
- app.kubernetes.io/managed-by: kustomize
app.kubernetes.io/name: workspaces-frontend
app.kubernetes.io/part-of: kubeflow-workspaces
type: ClusterIP
@@ -39,7 +38,6 @@ spec:
matchLabels:
app: workspaces-frontend
app.kubernetes.io/component: ui
- app.kubernetes.io/managed-by: kustomize
app.kubernetes.io/name: workspaces-frontend
app.kubernetes.io/part-of: kubeflow-workspaces
strategy:
@@ -170,7 +168,6 @@ spec:
matchLabels:
app: workspaces-frontend
app.kubernetes.io/component: ui
- app.kubernetes.io/managed-by: kustomize
app.kubernetes.io/name: workspaces-frontend
app.kubernetes.io/part-of: kubeflow-workspaces
policyTypes:
@@ -197,6 +194,5 @@ spec:
matchLabels:
app: workspaces-frontend
app.kubernetes.io/component: ui
- app.kubernetes.io/managed-by: kustomize
app.kubernetes.io/name: workspaces-frontend
app.kubernetes.io/part-of: kubeflow-workspaces |
christian-heusel
left a comment
There was a problem hiding this comment.
Hey @andyatmiami awesome to get a good starting point to be able to easily develop metrics! 🔥
Overall it seems to be working, I have left a review below 🤗
| kubectl wait --for=condition=ready pod \ | ||
| -l app.kubernetes.io/name=prometheus-operator \ | ||
| -n default \ | ||
| --timeout=120s | ||
|
|
||
| kubectl wait --for=condition=established crd/prometheuses.monitoring.coreos.com --timeout=60s | ||
| kubectl wait --for=condition=established crd/servicemonitors.monitoring.coreos.com --timeout=60s |
There was a problem hiding this comment.
It seems like there is a race-condition/fallthrough here:
Waiting for Prometheus to be ready...
error: no matching resources found
make[1]: *** [Makefile:55: setup-prometheus] Error 1
make[1]: Leaving directory '/home/chris/Documents/shared_projects/kion/kubeflow-notebooks/developing'
make: *** [Makefile:60: tilt-up] Error 2
| echo "Installing Prometheus Operator ${PROMETHEUS_OPERATOR_VERSION}..." | ||
| # NOTE: must use 'kubectl create' not 'kubectl apply' — bundle.yaml is too large | ||
| # for client-side apply annotations. | ||
| curl -sL "${BUNDLE_URL}" | kubectl create -f - 2>/dev/null || \ |
There was a problem hiding this comment.
You can omit curl here, as kubectl create -f already supports URLs:
-f, --filename=[] Filename, directory, or URL to files to use to create the resource
|
|
||
| # ---- Step 2: Deploy Prometheus instance ---- | ||
|
|
||
| kubectl create namespace "${PROMETHEUS_NAMESPACE}" --dry-run=client -o yaml | kubectl apply -f - |
There was a problem hiding this comment.
Please add a short comment that details that the reason for this kubectl create [...] --dry-run=client -o yaml | kubectl apply -f - construct is that it doesn't fail if the namespace already exists 🤗
There was a problem hiding this comment.
Also couldn't we just add the manifest of the namespace to developing/manifests/prometheus/prometheus.yaml to omit this statement completely? 🤔
| tilt-up: check-tilt setup-istio kustomize | ||
| ifeq ($(ENABLE_PROMETHEUS),true) | ||
| @$(MAKE) setup-prometheus | ||
| endif |
There was a problem hiding this comment.
Please also document this new variable (and maybe the existance/purpose of the whole setup in DEVELOPMENT_GUIDE.md

ℹ️ NO GH ISSUE
Add opt-in Prometheus support to the KinD/Tilt development environment so developers can verify that metrics are properly exposed, ServiceMonitor manifests are valid, and the full scrape pipeline works end-to-end.
Development environment (developing/)
Add
setup-prometheus.shscript that installs the Prometheus Operator via upstreambundle.yamland deploys a minimal Prometheus instance inprometheus-systemnamespace. Follows the same idempotent pattern assetup-cert-manager.shandsetup-istio.sh.Add
manifests/prometheus/prometheus.yamlwith the Prometheus CR, ServiceAccount, ClusterRole, and ClusterRoleBinding. Configured with minimal resources (128Mi/256Mi), 2h retention, and wildcard ServiceMonitor discovery across all namespaces.Add
manifests/overlays/controller-prometheus/kustomization.yaml, a dev-only kustomize overlay that layers the existing prometheus component on top of the istio overlay. This avoids creating a new overlay in the controller's published manifests — enabling prometheus is a dev-environment concern, not a vendoring decision.Update
Makefilewithsetup-prometheustarget, conditionally invoked bytilt-upwhenENABLE_PROMETHEUS=true.Update
TiltfilewithENABLE_PROMETHEUSenv var, conditional overlay selection, and aprometheus-uilocal_resource that port-forwards to localhost:9090.Production manifests (workspaces/)
NetworkPolicy for metrics scraping
Add
network-policy.yamlto the controller's prometheus kustomize component. Without this, the existingdefault-deny-ingresspolicy blocks Prometheus from reaching the controller's metrics port. Usesfrom: [](allow from any source) scoped to the metrics port, matching the same pattern used by the istio component's webhook NetworkPolicy.Remove
managed-byfrom CRD selectorsThe
commonkustomize component previously used a singlelabelsblock withincludeSelectors: truefor all three standard labels, includingapp.kubernetes.io/managed-by: kustomize. This injectedmanaged-byinto every selector field — including CRD-specific selectors likeServiceMonitor.spec.selectorandAuthorizationPolicy.spec.selector.Deployment tools (Tilt, ArgoCD, Flux) override
managed-byon resource metadata labels but cannot follow CRD-specific selector fields, causing selectors to not match their target resources. For example, Tilt setsmanaged-by: tilton the metrics Service labels, but the ServiceMonitor selector still expectsmanaged-by: kustomize.Fix by splitting the
commoncomponent'slabelsblock in all three modules (controller, backend, frontend):nameandpart-of:includeSelectors: true(stable identifiers)managed-by:includeSelectors: false, includeTemplates: true(deployment method label — belongs on metadata, not in selectors)Also remove hardcoded
managed-by: kustomizefrom CRD selectors in:controller/components/prometheus/monitor.yaml(ServiceMonitor)backend/components/istio/authorization-policy.yamlfrontend/components/istio/authorization-policy.yamlUsage
Default — no Prometheus (unchanged behavior)
make tilt-upWith Prometheus enabled
ENABLE_PROMETHEUS=true make tilt-up