
feat: Metrics for cardano nodes#25

Merged
gonzalezzfelipe merged 6 commits into main from feat/metrics-for-cnodes
Apr 16, 2026

Conversation

@gonzalezzfelipe (Contributor) commented Apr 15, 2026


Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced the dashboard workload details page with expanded metrics sections (Node, Mempool, Connections, Block Propagation, Resources, Producer), formatted values, and role-specific display
    • Added information tooltips throughout the dashboard UI
    • Implemented metrics collection from containerized pods with comprehensive formatting
  • Infrastructure

    • Updated Apex Fusion metrics port and Helm configuration
    • Added metrics script execution capability within pod containers
    • Improved dashboard query management for workload data

@gonzalezzfelipe gonzalezzfelipe requested a review from mduthey April 15, 2026 14:06
@coderabbitai Bot commented Apr 15, 2026

📝 Walkthrough

Walkthrough

This PR introduces metrics collection from Cardano node pods into the dashboard and updates the Helm charts to support this capability. It adds new Kubernetes exec utilities, configures metrics scripts in ConfigMaps for both cardano-node and apex-fusion charts, refactors the dashboard workload details page to use TanStack Query and display rich metrics, and adds comprehensive metric formatting helpers.

Changes

  • Apex Fusion Helm Chart — extensions/apex-fusion/values.yaml, extensions/apex-fusion/templates/_helpers.tpl, extensions/apex-fusion/templates/configmap-metrics.yaml, extensions/apex-fusion/templates/statefulset.yaml
    Updated the metrics port from 12798 to 12789, added a metricsConfigMapName helper, and created a ConfigMap containing a Bash script that fetches Prometheus metrics from the node, parses genesis parameters, and outputs a JSON metrics object with error tracking. Updated the StatefulSet to mount the metrics script and set the METIS_SHELLEY_GENESIS_PATH environment variable.

  • Cardano Node Helm Chart — extensions/cardano-node/templates/_helpers.tpl, extensions/cardano-node/templates/configmap-metrics.yaml, extensions/cardano-node/templates/statefulset.yaml
    Added a metricsConfigMapName helper and created a ConfigMap with a Bash metrics-collection script that extracts Prometheus metrics, parses genesis data, computes derived fields (density, epoch progress, KES expiration, sync percent), and outputs JSON with error handling. Updated the StatefulSet to mount the metrics script and configure the genesis-path environment variable.

  • Dashboard Type Definitions — frontends/dashboard/@types/workload.d.ts
    Extended SimplifiedPod with optional chartName and metrics properties. Added new type aliases and a CardanoNodeMetrics interface containing the node role, 40+ metric fields (all nullable), and an errors array.

  • Dashboard UI Components — frontends/dashboard/src/components/ui/InfoTooltip.tsx
    Added a new InfoTooltip React component that renders an info icon with a toggleable tooltip, supporting keyboard interaction (Escape to close) and accessibility attributes (aria-label, aria-describedby, role="tooltip").

  • Dashboard Route & Query — frontends/dashboard/src/routes/$namespace/$name/index.tsx
    Refactored to use TanStack Query with a 30s refetch interval, replaced the metrics display with a multi-section layout ("Node", "Mempool", "Connections", "Block propagation", "Resources", "Producer"), integrated InfoCard with optional description tooltips, moved log streaming from callback-based to effect-based per pod identity, and added metric-value formatting.

  • Dashboard Utilities — frontends/dashboard/src/utils/helm.ts, frontends/dashboard/src/utils/home/calls.ts, frontends/dashboard/src/utils/k8s.ts, frontends/dashboard/src/utils/metricsFormat.ts
    Added getMergedHelmValues and getNodeRoleFromHelmRelease for Helm data extraction; added execPodCommand to execute commands in pod containers via Kubernetes Exec with stdout/stderr capture and a 15s timeout; added the metrics-collection helpers supportsCardanoNodeMetrics and getCardanoNodeMetricsForPod, which execute /opt/metis/bin/metrics.sh inside pods; created a metrics-formatting module with 17 helpers for numeric, temporal, percentage, and role-based formatting.

  • GitHub Actions & Configuration — .github/workflows/check_extensions.yml, extensions/dolos/ci/values-default.yaml, frontends/dashboard/package.json
    Updated the workflow to add the HashiCorp Helm repo and supply an additional values file to the Dolos chart test, expanded the kubeconform skip list to include VaultStaticSecret, modified the test script to pass when no tests are found, and added upstream relay configuration to the Dolos values.
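The cardano-node chart's metrics script derives fields such as density, epoch progress, KES expiration, and sync percent in Bash. The sketch below translates the commonly used formulas into TypeScript for illustration only; the function names and exact formulas are assumptions, and the authoritative versions are the ones in the ConfigMap script.

```typescript
// Hedged sketch of the usual derived-metric arithmetic; not the chart's code.

// Chain density: blocks observed per slot elapsed.
export function chainDensity(blocks: number, slots: number): number {
  return slots > 0 ? blocks / slots : 0;
}

// Fraction of the current epoch elapsed, from the slot within the epoch and
// the epoch length taken from the Shelley genesis file.
export function epochProgress(slotInEpoch: number, epochLength: number): number {
  if (epochLength <= 0) return 0;
  return Math.min(1, slotInEpoch / epochLength);
}

// Rough sync percentage: current slot versus the slot expected from
// wall-clock time since the genesis system start (slotLength in seconds).
export function syncPercent(
  currentSlot: number,
  systemStartMs: number,
  nowMs: number,
  slotLengthSec: number,
): number {
  const expectedSlot = Math.floor((nowMs - systemStartMs) / (slotLengthSec * 1000));
  if (expectedSlot <= 0) return 100;
  return Math.min(100, (currentSlot / expectedSlot) * 100);
}

// Approximate KES key expiration timestamp: remaining KES periods converted
// to milliseconds and added to "now".
export function kesExpirationMs(
  currentKesPeriod: number,
  maxKesEvolutions: number,
  startKesPeriod: number,
  slotsPerKesPeriod: number,
  slotLengthSec: number,
  nowMs: number,
): number {
  const remaining = startKesPeriod + maxKesEvolutions - currentKesPeriod;
  return nowMs + remaining * slotsPerKesPeriod * slotLengthSec * 1000;
}
```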

Sequence Diagram(s)

sequenceDiagram
    participant Dashboard as Dashboard Component
    participant QueryClient as TanStack Query
    participant KubeAPI as Kubernetes API
    participant PodExec as Pod Exec Stream
    participant MetricsScript as metrics.sh Script
    participant Prometheus as Prometheus Endpoint

    Dashboard->>QueryClient: ensureQueryData(workloadDetailsQueryOptions)
    QueryClient->>KubeAPI: getServerWorkloadPods(namespace, name)
    KubeAPI->>KubeAPI: fetch pods & helm release
    
    KubeAPI->>Dashboard: return pod list + helm metadata
    Dashboard->>Dashboard: getNodeRoleFromHelmRelease(helmRelease)
    
    loop For each pod (parallel)
        Dashboard->>KubeAPI: execPodCommand(pod, 'metrics.sh')
        KubeAPI->>PodExec: establish exec websocket
        PodExec->>MetricsScript: execute /opt/metis/bin/metrics.sh
        MetricsScript->>Prometheus: curl http://127.0.0.1:metricsPort/metrics
        Prometheus-->>MetricsScript: Prometheus metrics text
        MetricsScript->>MetricsScript: parse metrics, read genesis, compute derived
        MetricsScript-->>PodExec: output JSON metrics object
        PodExec-->>KubeAPI: stdout/stderr streams close
        KubeAPI-->>Dashboard: { stdout: JSON metrics, stderr: errors }
    end
    
    Dashboard->>Dashboard: render metrics sections with formatMetricValue helpers
    Dashboard-->>Dashboard: display in InfoCard components with tooltips
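The final formatting step in the diagram relies on helpers from metricsFormat.ts. The names below (formatPercent, formatBytes, formatDuration) are illustrative stand-ins for that module's API, not its actual exports:

```typescript
// Hedged sketch of dashboard metric formatters; names are hypothetical.

export function formatPercent(value: number | null): string {
  if (value === null || !Number.isFinite(value)) return '-';
  return `${value.toFixed(1)}%`;
}

export function formatBytes(bytes: number | null): string {
  if (bytes === null || !Number.isFinite(bytes)) return '-';
  const units = ['B', 'KiB', 'MiB', 'GiB', 'TiB'];
  let i = 0;
  let v = bytes;
  while (v >= 1024 && i < units.length - 1) {
    v /= 1024;
    i++;
  }
  return `${v.toFixed(1)} ${units[i]}`;
}

export function formatDuration(seconds: number | null): string {
  if (seconds === null || !Number.isFinite(seconds)) return '-';
  const h = Math.floor(seconds / 3600);
  const m = Math.floor((seconds % 3600) / 60);
  const s = Math.floor(seconds % 60);
  return h > 0 ? `${h}h ${m}m` : m > 0 ? `${m}m ${s}s` : `${s}s`;
}
```

Each helper renders `-` for null or non-finite inputs, matching the all-nullable CardanoNodeMetrics fields described above.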

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested reviewers

  • scarmuega

Poem

🐰 With metrics fresh from pods so keen,
The dashboard now shows what they glean—
Through scripts and shells, data flows fast,
From genesis times to metrics so vast!
A hop, skip, and merge brings it all to light!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title 'feat: Metrics for cardano nodes' directly and accurately summarizes the main objective of this pull request, which adds comprehensive metrics collection and display capabilities for Cardano nodes across the dashboard and Helm charts.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
frontends/dashboard/src/utils/metrics.ts (1)

140-149: Consider parallelizing metric variant queries for performance.

getFirstMatchingMetricValue queries metric variants sequentially. Since these are independent lookups for the same conceptual metric (just different naming conventions), you could query them in parallel and return the first non-null result.

♻️ Parallel query approach
 async function getFirstMatchingMetricValue(queries: string[]): Promise<number | null> {
-  for (const query of queries) {
-    const value = await getInstantMetricValue(query);
-    if (value !== null) {
-      return value;
-    }
-  }
-
-  return null;
+  const results = await Promise.all(queries.map(getInstantMetricValue));
+  return results.find(v => v !== null) ?? null;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontends/dashboard/src/utils/metrics.ts` around lines 140 - 149,
getFirstMatchingMetricValue currently awaits getInstantMetricValue sequentially;
change it to start all getInstantMetricValue calls concurrently (map queries ->
promises) and then await their results (e.g., Promise.allSettled or Promise.all)
and return the first non-null fulfilled value found, otherwise return null.
Update getFirstMatchingMetricValue to fire promises for each query, handle
failures/settled results safely, and preserve the Promise<number | null> return
type; reference getFirstMatchingMetricValue and getInstantMetricValue when
making the change.
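The Promise.allSettled variant the prompt describes can be sketched as below: all metric-name variants are queried concurrently, a single failing query cannot reject the batch, and the original priority order still decides which non-null value wins. The fetcher is injected so the example stays self-contained; the real getInstantMetricValue (which queries Prometheus) is not reproduced here.

```typescript
type MetricFetcher = (query: string) => Promise<number | null>;

export async function getFirstMatchingMetricValue(
  queries: string[],
  fetchValue: MetricFetcher,
): Promise<number | null> {
  // Fire all variant queries concurrently; allSettled tolerates rejections.
  const results = await Promise.allSettled(queries.map((q) => fetchValue(q)));
  // Preserve priority: first fulfilled, non-null value in query order wins.
  for (const result of results) {
    if (result.status === 'fulfilled' && result.value !== null) {
      return result.value;
    }
  }
  return null;
}
```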
frontends/dashboard/src/components/ui/InfoTooltip.tsx (1)

37-39: Potential interaction conflict between click and focus/blur handlers.

When clicking an already-focused button, onClick toggles the state, but the subsequent blur/focus cycle may immediately override it. On touch devices, onBlur firing after a tap can close the tooltip unexpectedly. Consider simplifying to use only hover/focus for desktop and click-to-toggle for touch, or use a single interaction model.

♻️ Suggested simplification using pointer events
       <button
         ref={buttonRef}
         type="button"
         className="inline-flex cursor-help items-center text-[#969FAB] outline-none transition-colors hover:text-[#64748B] focus-visible:text-[#64748B]"
         aria-label="Show metric description"
         aria-describedby={open ? tooltipId : undefined}
         onFocus={() => setOpen(true)}
-        onBlur={() => setOpen(false)}
-        onClick={() => setOpen(prev => !prev)}
+        onBlur={(e) => {
+          // Don't close if clicking within the tooltip trigger
+          if (!e.currentTarget.contains(e.relatedTarget as Node)) {
+            setOpen(false);
+          }
+        }}
         onKeyDown={handleKeyDown}
       >
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontends/dashboard/src/components/ui/InfoTooltip.tsx` around lines 37 - 39,
The handlers conflict because onClick toggles while onFocus/onBlur also set open
and blur after a tap can immediately close the tooltip; replace the click
handler with a unified pointer-based handler: remove onClick and add an
onPointerDown/onPointerUp handler on the InfoTooltip trigger that inspects
event.pointerType (touch vs mouse) and for pointerType "touch" toggles the
tooltip via setOpen(prev=>!prev) while keeping onFocus/onBlur to open/close for
keyboard/hover; ensure the pointer handler does not trigger a subsequent blur
override (use event.preventDefault()/event.stopPropagation() as needed) so
setOpen is the single source of truth.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@extensions/apex-fusion/values.yaml`:
- Line 44: Revert the metrics port value for the apex-fusion extension by
changing the metricsPort setting (metricsPort: 12789) back to the Cardano
standard 12798 so it matches the cardano-node extension and allows Prometheus
scraping to work as expected; update the metricsPort entry in the apex-fusion
values.yaml to 12798.

In `@frontends/dashboard/src/routes/$namespace/$name/index.tsx`:
- Around line 274-289: The type error arises because activePod.name (string |
undefined) isn't narrowed inside the async streamLogs closure; capture the
validated values before defining/awaiting streamLogs by assigning const podName
= activePod.name, const namespace = activePodNamespace, and const containerName
= activePodContainerName after the initial if-check, then use those captured
locals in the async function and when calling streamWorkloadPodLogs so
TypeScript sees them as definite strings; update references to
activePod.name/activePodNamespace/activePodContainerName inside streamLogs to
use podName/namespace/containerName.
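The narrowing issue described above can be shown in miniature; the names below are illustrative, not the route's actual code. A check on a mutable property does not carry into an async closure (the property could change before the closure runs), while a captured const does:

```typescript
interface Pod {
  name?: string;
}

export function startLogStream(
  pod: Pod,
  stream: (podName: string) => Promise<void>,
): Promise<void> | undefined {
  if (!pod.name) return undefined;

  // Capture after the guard; inside the closure, `pod.name` alone would be
  // widened back to `string | undefined` and fail to typecheck.
  const podName = pod.name;

  return (async () => {
    await stream(podName); // podName is a definite string here
  })();
}
```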


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 939826f5-5419-4386-8fab-62ac39415039

📥 Commits

Reviewing files that changed from the base of the PR and between 192dcaf and 9a6bfc2.

📒 Files selected for processing (8)
  • extensions/apex-fusion/values.yaml
  • frontends/dashboard/@types/workload.d.ts
  • frontends/dashboard/package.json
  • frontends/dashboard/src/components/ui/InfoTooltip.tsx
  • frontends/dashboard/src/routes/$namespace/$name/index.tsx
  • frontends/dashboard/src/utils/helm.ts
  • frontends/dashboard/src/utils/home/calls.ts
  • frontends/dashboard/src/utils/metrics.ts

Comment thread extensions/apex-fusion/values.yaml
Comment thread frontends/dashboard/src/routes/$namespace/$name/index.tsx

@coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
frontends/dashboard/src/utils/k8s.ts (1)

48-49: Consider reusing KubeConfig/Exec instead of recreating per call.

On Line 48, loading config per execPodCommand call can become expensive when polling many pods. Prefer injecting or reusing a shared Exec/KubeConfig instance from existing client setup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontends/dashboard/src/utils/k8s.ts` around lines 48 - 49, The code
currently calls loadK8sConfig() and new Exec(kc) inside execPodCommand for every
invocation (expensive when polling many pods); instead reuse a shared
KubeConfig/Exec by either accepting an injected Exec or KubeConfig parameter on
execPodCommand (e.g., execPodCommand(..., exec?: Exec | kc?: KubeConfig)) or
create a lazy module-level singleton that calls loadK8sConfig() and new Exec(kc)
once and reuses it; update execPodCommand to use the provided/shared Exec and
adjust callers to pass the shared instance when available.
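One way to realize this suggestion is a lazy module-level singleton. The helper below shows the pattern generically so it does not depend on `@kubernetes/client-node`; the commented usage names (getExec, loadK8sConfig) are hypothetical:

```typescript
// Lazy singleton: build an expensive object once on first use, reuse after.
export function lazySingleton<T>(create: () => T): () => T {
  let instance: T | undefined;
  return () => {
    if (instance === undefined) {
      instance = create(); // first call pays the construction cost
    }
    return instance;
  };
}

// Hypothetical application in k8s.ts:
//   const getExec = lazySingleton(() => new Exec(loadK8sConfig()));
//   ...inside execPodCommand: const exec = getExec();
```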

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9625ff0b-8125-4be3-855b-ce4e46110952

📥 Commits

Reviewing files that changed from the base of the PR and between 9a6bfc2 and 4980779.

📒 Files selected for processing (8)
  • .github/workflows/check_extensions.yml
  • extensions/dolos/ci/values-default.yaml
  • frontends/dashboard/@types/workload.d.ts
  • frontends/dashboard/src/routes/$namespace/$name/index.tsx
  • frontends/dashboard/src/utils/home/calls.ts
  • frontends/dashboard/src/utils/k8s.ts
  • frontends/dashboard/src/utils/metrics.ts
  • frontends/dashboard/src/utils/metricsFormat.ts
✅ Files skipped from review due to trivial changes (2)
  • extensions/dolos/ci/values-default.yaml
  • frontends/dashboard/@types/workload.d.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • frontends/dashboard/src/utils/home/calls.ts

Comment on lines +80 to +103
await new Promise<void>((resolve, reject) => {
  const timeout = setTimeout(() => {
    reject(new Error(`Timed out waiting for exec command in pod ${namespace}/${podName}`));
  }, 15000);

  const cleanup = () => {
    clearTimeout(timeout);
    ws.off('close', handleClose);
    ws.off('error', handleError);
  };

  const handleClose = () => {
    cleanup();
    resolve();
  };

  const handleError = (error: Error) => {
    cleanup();
    reject(error);
  };

  ws.on('close', handleClose);
  ws.on('error', handleError);
});


⚠️ Potential issue | 🟠 Major

Timeout path does not clean up websocket/session resources.

On Line 82, timeout rejects without invoking cleanup(), and the websocket is never closed. This can leave in-pod exec sessions running and listeners attached after timeout.

Proposed fix
-  await new Promise<void>((resolve, reject) => {
-    const timeout = setTimeout(() => {
-      reject(new Error(`Timed out waiting for exec command in pod ${namespace}/${podName}`));
-    }, 15000);
+  await new Promise<void>((resolve, reject) => {
+    let settled = false;
+    const finish = (fn: () => void) => {
+      if (settled) return;
+      settled = true;
+      cleanup();
+      fn();
+    };
+    const timeout = setTimeout(() => {
+      try {
+        ws.close();
+      } catch {
+        // no-op
+      }
+      finish(() => reject(new Error(`Timed out waiting for exec command in pod ${namespace}/${podName}`)));
+    }, 15000);
@@
-    const handleClose = () => {
-      cleanup();
-      resolve();
-    };
+    const handleClose = () => finish(resolve);
@@
-    const handleError = (error: Error) => {
-      cleanup();
-      reject(error);
-    };
+    const handleError = (error: Error) => finish(() => reject(error));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontends/dashboard/src/utils/k8s.ts` around lines 80 - 103, The timeout
handler currently rejects without releasing resources; modify the setTimeout
callback to call cleanup() before rejecting and also explicitly close the
websocket (ws.close()) to terminate the exec session; ensure cleanup still
clears the timeout and removes listeners (keeps clearTimeout(timeout) in
cleanup) so the timeout branch uses cleanup(), ws.close(), then reject(new
Error(`Timed out waiting for exec command in pod ${namespace}/${podName}`));
keep existing handleClose and handleError logic unchanged.


@coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@frontends/dashboard/src/routes/$namespace/$name/index.tsx`:
- Around line 225-230: The component currently keeps a cumulative raw log buffer
in logState and calls AnsiToHtml.toHtml on the whole buffer with stream: true,
causing style bleed, semantic misuse of streaming mode, and truncation
corruption; fix by (a) when switching pods (detect via activePodKey) reset or
recreate the AnsiToHtml instance so streaming state is cleared, and stop
reconverting the entire raw buffer on every render, or (b) change AnsiToHtml
instantiation to stream: false for render-time conversion, or better yet convert
incoming chunks incrementally before storing (store converted HTML chunks or
keep per-pod parser state) so only new data is passed to AnsiToHtml.toHtml;
update usages around logState, setLogState, activePodKey and the toHtml call
accordingly to ensure escapes are never sliced mid-sequence.
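Option (c) above — incremental conversion with per-pod parser state — can be sketched as below. The converter interface is injected so the example does not hard-depend on a specific ANSI library; in the dashboard you would pass a factory producing the real converter (e.g. ansi-to-html's `new Convert({ stream: true })`).

```typescript
export interface AnsiConverter {
  toHtml(chunk: string): string;
}

export class PodLogBuffer {
  private converter: AnsiConverter;
  private html = '';

  constructor(private makeConverter: () => AnsiConverter) {
    this.converter = makeConverter();
  }

  // Convert only the newly received chunk and append the resulting HTML,
  // instead of re-converting the whole raw buffer on every render.
  append(chunk: string): void {
    this.html += this.converter.toHtml(chunk);
  }

  render(): string {
    return this.html;
  }

  // On pod switch: recreate the converter so streaming state (e.g. open
  // color spans) from the previous pod cannot bleed into the new one.
  reset(): void {
    this.converter = this.makeConverter();
    this.html = '';
  }
}
```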

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: beecf8fc-1db8-419c-9d56-4319d34fd56b

📥 Commits

Reviewing files that changed from the base of the PR and between 4980779 and bb6211c.

📒 Files selected for processing (10)
  • extensions/apex-fusion/templates/_helpers.tpl
  • extensions/apex-fusion/templates/configmap-metrics.yaml
  • extensions/apex-fusion/templates/statefulset.yaml
  • extensions/cardano-node/templates/_helpers.tpl
  • extensions/cardano-node/templates/configmap-metrics.yaml
  • extensions/cardano-node/templates/statefulset.yaml
  • frontends/dashboard/@types/workload.d.ts
  • frontends/dashboard/src/routes/$namespace/$name/index.tsx
  • frontends/dashboard/src/utils/home/calls.ts
  • frontends/dashboard/src/utils/metricsFormat.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • frontends/dashboard/src/utils/home/calls.ts
  • frontends/dashboard/src/utils/metricsFormat.ts

Comment on lines +130 to +133
- name: metrics-scripts
mountPath: /opt/metis/bin/metrics.sh
subPath: metrics.sh
readOnly: true


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check if the apex-fusion statefulset.yaml file exists and examine the relevant lines
find . -path "*/extensions/apex-fusion/templates/statefulset.yaml" -type f 2>/dev/null | head -20

Repository: txpipe/metis

Length of output: 107


🏁 Script executed:

# Read the entire file to verify the specific lines and structure
cat -n ./extensions/apex-fusion/templates/statefulset.yaml

Repository: txpipe/metis

Length of output: 11800


🏁 Script executed:

# Check if the cardano-node statefulset.yaml exists and examine relevant sections
find . -path "*/extensions/cardano-node/templates/statefulset.yaml" -type f 2>/dev/null

Repository: txpipe/metis

Length of output: 108


🏁 Script executed:

# Read the cardano-node statefulset.yaml file
cat -n ./extensions/cardano-node/templates/statefulset.yaml

Repository: txpipe/metis

Length of output: 11649


🏁 Script executed:

# Let's verify the Kubernetes behavior regarding subPath ConfigMap mounts and live updates
# by searching web documentation

Repository: txpipe/metis

Length of output: 38


🌐 Web query:

Kubernetes ConfigMap subPath mount live updates behavior

💡 Result:

Kubernetes ConfigMap volumes mounted with subPath do not receive live updates when the ConfigMap changes; the mounted file retains its original content until the Pod (or container) is restarted. This is a documented limitation. Regular ConfigMap volume mounts (without subPath) use symlinks that the kubelet updates during periodic syncs (typically within ~1-2 minutes), enabling live propagation of changes to the mounted directory, though applications must watch for file changes or poll to detect updates. subPath mounts, however, use bind mounts for security isolation, which bind to the original inode and do not update automatically. This behavior is consistent across ConfigMaps, Secrets, and projected volumes using subPath. Official Kubernetes documentation explicitly states: "A container using a ConfigMap as a subPath volume mount will not receive ConfigMap updates." This note appears in the ConfigMaps concepts page.

Workarounds include:

  • Avoid subPath for dynamic configs; mount the full ConfigMap volume to a separate directory and symlink.
  • Use tools like Reloader to trigger Pod restarts on ConfigMap changes.
  • Use immutable ConfigMaps with versioned names and update Deployments.

No changes to this behavior appear in recent Kubernetes versions (up to 2026); it remains a known limitation since at least v1.10.

Citations:


Add metrics checksum to roll ConfigMap changes into pod updates.

metrics.sh is mounted via subPath at lines 130-133. Kubernetes does not refresh subPath mounts when ConfigMaps change; the mounted file retains its original content until the pod restarts. This StatefulSet only includes checksums for proxy and configuration (lines 37-42), so edits to configmap-metrics.yaml won't trigger a rollout. Add a metrics checksum annotation to match the existing pattern, and apply the same fix to extensions/cardano-node/templates/statefulset.yaml (lines 130-133 and 220-223).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extensions/apex-fusion/templates/statefulset.yaml` around lines 130 - 133,
The StatefulSet mounts metrics.sh via subPath so changes to the ConfigMap don't
trigger pod updates; add a metrics checksum annotation to the pod template
annotations (same pattern as the existing proxy/config checksums) that computes
a checksum of the metrics ConfigMap (the one providing metrics.sh) so edits to
configmap-metrics.yaml roll the pods. Update the StatefulSet template that
contains the metrics-scripts mount (reference metrics.sh and the pod template
annotations block where checksum/proxy and checksum/config are defined) and
apply the identical change to the other StatefulSet that mounts metrics.sh.

Comment on lines +36 to +39
metrics="$(curl -s --fail "http://127.0.0.1:{{ .Values.service.metricsPort }}/metrics" || wget -qO- "http://127.0.0.1:{{ .Values.service.metricsPort }}/metrics" || true)"
if [ -z "$metrics" ]; then
append_error "failed to read node metrics"
fi


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Locate the configmap-metrics.yaml files mentioned
find . -name "configmap-metrics.yaml" -type f

Repository: txpipe/metis

Length of output: 172


🏁 Script executed:

# Check the exact content of the cardano-node file at lines 36-39
if [ -f "extensions/cardano-node/templates/configmap-metrics.yaml" ]; then
  echo "=== cardano-node configmap-metrics.yaml (lines 30-45) ==="
  sed -n '30,45p' "extensions/cardano-node/templates/configmap-metrics.yaml"
fi

Repository: txpipe/metis

Length of output: 624


🏁 Script executed:

# Check if apex-fusion file exists and its metrics handling
if [ -f "extensions/apex-fusion/templates/configmap-metrics.yaml" ]; then
  echo "=== apex-fusion configmap-metrics.yaml (lines 30-45) ==="
  sed -n '30,45p' "extensions/apex-fusion/templates/configmap-metrics.yaml"
fi

Repository: txpipe/metis

Length of output: 623


🏁 Script executed:

# Search for curl/wget timeout configurations in the repo
echo "=== Searching for timeout flags in curl/wget commands ==="
rg "(curl|wget).*--connect-timeout|--max-time|--timeout|-m|--read-timeout" --type yaml

Repository: txpipe/metis

Length of output: 50368


🏁 Script executed:

# Search for dashboard-related code that might execute metrics.sh
echo "=== Searching for dashboard execution or metrics.sh references ==="
rg "metrics\.sh|exec.*metrics|workload.*details" --type yaml --type js --type ts -A 3

Repository: txpipe/metis

Length of output: 1872


🏁 Script executed:

# Read the full metrics.sh section from cardano-node to check for any timeout wrappers
sed -n '1,50p' "extensions/cardano-node/templates/configmap-metrics.yaml" | tail -30

Repository: txpipe/metis

Length of output: 1230


🏁 Script executed:

# Check if there's a timeout wrapper function or any timeout mechanism in the script
rg "timeout|TIMEOUT" extensions/cardano-node/templates/configmap-metrics.yaml

Repository: txpipe/metis

Length of output: 38


🏁 Script executed:

# Verify the complete metrics fetch section with more context
sed -n '30,50p' "extensions/cardano-node/templates/configmap-metrics.yaml"

Repository: txpipe/metis

Length of output: 948


🏁 Script executed:

# Get the full metrics.sh script to check for any outer timeout wrapper
wc -l "extensions/cardano-node/templates/configmap-metrics.yaml"

Repository: txpipe/metis

Length of output: 116


🏁 Script executed:

# Check for timeout at script/function level
sed -n '1,100p' "extensions/cardano-node/templates/configmap-metrics.yaml"

Repository: txpipe/metis

Length of output: 7677


Add timeouts to the local metrics fetch.

The curl/wget fallback at lines 36-39 lacks connect and read timeouts. If the node metrics endpoint stalls, the script will block indefinitely. Since the dashboard executes this script synchronously via bash -lc, a hung read will stall the workload details page. Add explicit timeout flags to both commands and apply the same fix to extensions/apex-fusion/templates/configmap-metrics.yaml.

Current code
metrics="$(curl -s --fail "http://127.0.0.1:{{ .Values.service.metricsPort }}/metrics" || wget -qO- "http://127.0.0.1:{{ .Values.service.metricsPort }}/metrics" || true)"
if [ -z "$metrics" ]; then
  append_error "failed to read node metrics"
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@extensions/cardano-node/templates/configmap-metrics.yaml` around lines 36-39, the metrics fetch can hang because curl/wget lack timeouts; update the
command substitution that sets the metrics variable (metrics="$(...") to include
explicit connect and total/read timeouts for curl (e.g. --connect-timeout and
--max-time or -m) and equivalent timeout flags for wget (e.g. --timeout and
--tries=1) so the pipeline fails fast and triggers append_error "failed to read
node metrics" instead of blocking; apply the same change to the corresponding
metrics command in the apex-fusion config as well.

Comment on lines +225 to 230
const [logState, setLogState] = useState(() => ({
podKey: activePodKey,
value: '',
}));
const logs = logState.podKey === activePodKey ? logState.value : '';


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "index.tsx" -path "*/routes/*" | grep -E "namespace.*name"

Repository: txpipe/metis

Length of output: 181


🏁 Script executed:

cat -n frontends/dashboard/src/routes/\$namespace/\$name/index.tsx | head -100

Repository: txpipe/metis

Length of output: 5640


🏁 Script executed:

cat -n frontends/dashboard/src/routes/\$namespace/\$name/index.tsx | sed -n '150,180p'

Repository: txpipe/metis

Length of output: 1193


🏁 Script executed:

cat -n frontends/dashboard/src/routes/\$namespace/\$name/index.tsx | sed -n '220,240p'

Repository: txpipe/metis

Length of output: 1045


🏁 Script executed:

cat -n frontends/dashboard/src/routes/\$namespace/\$name/index.tsx | sed -n '270,290p'

Repository: txpipe/metis

Length of output: 692


🏁 Script executed:

cat -n frontends/dashboard/src/routes/\$namespace/\$name/index.tsx | sed -n '580,600p'

Repository: txpipe/metis

Length of output: 1008


🏁 Script executed:

# Check if ansi-to-html is used and how
rg "ansi-to-html|AnsiToHtml|toHtml" frontends/dashboard/src/routes/\$namespace/\$name/index.tsx -B2 -A2

Repository: txpipe/metis

Length of output: 588


🏁 Script executed:

# Check the package.json to see ansi-to-html version
cat frontends/dashboard/package.json | grep -A 2 -B 2 "ansi-to-html"

Repository: txpipe/metis

Length of output: 216


🏁 Script executed:

# Check if there's any reset/close method being called
rg "converter\.(reset|close|clear)" frontends/dashboard/src/routes/\$namespace/\$name/index.tsx

Repository: txpipe/metis

Length of output: 38


🏁 Script executed:

# Check ansi-to-html library documentation in node_modules or docs
find . -name "ansi-to-html" -type d | head -5

Repository: txpipe/metis

Length of output: 38


🏁 Script executed:

# Search for any prior converter state management
rg "stream.*true|converter" frontends/dashboard/src/routes/\$namespace/\$name/index.tsx -B2 -A2

Repository: txpipe/metis

Length of output: 407


🌐 Web query:

ansi-to-html stream mode state behavior reset method

💡 Result:

In the ansi-to-html library (rburns/ansi-to-html), stream mode is enabled by setting the stream: true option when creating a Convert instance. This saves style state (current colors, attributes like bold/underline) across multiple calls to toHtml, enabling proper handling of partial ANSI streams where chunks may split escape sequences or text. When stream: false (the default), each toHtml call processes input independently: it starts from default/reset state, applies ANSI codes within the input, and closes all open spans at the end.

Reset handling (ANSI \x1b[0m or similar reset codes) sets colors/attributes to defaults (fg: '#FFF', bg: '#000' by default, customizable). In stream mode this updates the persistent state for subsequent calls; in non-stream mode it only affects that call's output. CLI usage (the ansi-to-html command) automatically sets stream: true for streaming input like git log | ansi-to-html.

No explicit public "reset" method exists; state is reset via ANSI reset codes or by reinstantiating the Convert object. Issues note challenges with incomplete escape sequences in streams, but the core behavior maintains state continuity in stream mode. The README/docs on GitHub and npm confirm that the stream option explicitly "Save[s] style state across invocations of toHtml".



🏁 Script executed:

# Check if converter is used elsewhere in codebase
rg "converter\." frontends/dashboard/

Repository: txpipe/metis

Length of output: 218


Don't reparse the whole log buffer with a streaming ANSI parser on every render.

AnsiToHtml is configured with stream: true (lines 161–165), which preserves parser state across toHtml() calls. The component stores the full raw log buffer and reconverts that entire string on every render (line 586), causing:

  1. Style bleed across pod switches: When switching pods (line 229), the converter's state isn't reset, so colors and formatting from the previous pod apply to the new pod's logs.
  2. Semantic mismatch: Stream mode expects incremental chunk processing, not reprocessing the same cumulative buffer repeatedly.
  3. Truncation corruption: The 10k-character trim (line 280) uses slice(-10000) on raw ANSI text, which can bisect multi-character escape sequences like \x1b[38;5;123m, causing malformed output.

Either disable streaming for render-time conversion, convert only incremental chunks before storing them, or reset the converter instance when switching pods.
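The truncation half of this finding (point 3) can be addressed independently of the streaming question. Below is a hedged sketch of a cut-safe trim; the helper name trimAnsiSafe is hypothetical, and it only handles CSI sequences (ESC [ ... final byte), ignoring OSC and other escape families:

```typescript
// Hypothetical helper (not from the repo): trim a raw ANSI buffer to at most
// maxLen characters without cutting a CSI escape sequence in half. If the cut
// point lands inside a sequence, we advance past that sequence's final byte.
function trimAnsiSafe(buffer: string, maxLen: number): string {
  if (buffer.length <= maxLen) return buffer;
  let start = buffer.length - maxLen;
  // Find the last ESC at or before the cut point and check whether its
  // sequence is still open where we want to cut.
  const esc = buffer.lastIndexOf("\x1b", start);
  if (esc !== -1) {
    let end = esc + 1;
    if (buffer[end] === "[") {
      // A CSI sequence is ESC '[' then parameter/intermediate bytes,
      // terminated by a final byte in the range 0x40-0x7e.
      end++;
      while (
        end < buffer.length &&
        !(buffer.charCodeAt(end) >= 0x40 && buffer.charCodeAt(end) <= 0x7e)
      ) {
        end++;
      }
      end++; // include the final byte
    }
    if (end > start) start = end; // cut landed mid-sequence: skip past it
  }
  return buffer.slice(start);
}
```

For the state-bleed half, recreating the stream: true Convert instance whenever activePodKey changes (for example via a useMemo keyed on the pod) would clear the persisted style state; that wiring is omitted here.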

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontends/dashboard/src/routes/$namespace/$name/index.tsx` around lines 225-230, the component currently keeps a cumulative raw log buffer in logState and
calls AnsiToHtml.toHtml on the whole buffer with stream: true, causing style
bleed, semantic misuse of streaming mode, and truncation corruption; fix by (a)
when switching pods (detect via activePodKey) reset or recreate the AnsiToHtml
instance so streaming state is cleared, and stop reconverting the entire raw
buffer on every render, or (b) change AnsiToHtml instantiation to stream: false
for render-time conversion, or better yet convert incoming chunks incrementally
before storing (store converted HTML chunks or keep per-pod parser state) so
only new data is passed to AnsiToHtml.toHtml; update usages around logState,
setLogState, activePodKey and the toHtml call accordingly to ensure escapes are
never sliced mid-sequence.

@gonzalezzfelipe gonzalezzfelipe merged commit b230e64 into main Apr 16, 2026
15 checks passed
@gonzalezzfelipe gonzalezzfelipe deleted the feat/metrics-for-cnodes branch April 16, 2026 15:23