📝 Walkthrough

This PR introduces metrics collection from Cardano node pods into the dashboard and updates the Helm charts to support this capability. It adds new Kubernetes exec utilities and configures metrics scripts in ConfigMaps for both the cardano-node and apex-fusion charts.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Dashboard as Dashboard Component
    participant QueryClient as TanStack Query
    participant KubeAPI as Kubernetes API
    participant PodExec as Pod Exec Stream
    participant MetricsScript as metrics.sh Script
    participant Prometheus as Prometheus Endpoint

    Dashboard->>QueryClient: ensureQueryData(workloadDetailsQueryOptions)
    QueryClient->>KubeAPI: getServerWorkloadPods(namespace, name)
    KubeAPI->>KubeAPI: fetch pods & helm release
    KubeAPI->>Dashboard: return pod list + helm metadata
    Dashboard->>Dashboard: getNodeRoleFromHelmRelease(helmRelease)
    loop For each pod (parallel)
        Dashboard->>KubeAPI: execPodCommand(pod, 'metrics.sh')
        KubeAPI->>PodExec: establish exec websocket
        PodExec->>MetricsScript: execute /opt/metis/bin/metrics.sh
        MetricsScript->>Prometheus: curl http://127.0.0.1:metricsPort/metrics
        Prometheus-->>MetricsScript: Prometheus metrics text
        MetricsScript->>MetricsScript: parse metrics, read genesis, compute derived
        MetricsScript-->>PodExec: output JSON metrics object
        PodExec-->>KubeAPI: stdout/stderr streams close
        KubeAPI-->>Dashboard: { stdout: JSON metrics, stderr: errors }
    end
    Dashboard->>Dashboard: render metrics sections with formatMetricValue helpers
    Dashboard-->>Dashboard: display in InfoCard components with tooltips
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches

📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 2
🧹 Nitpick comments (2)
frontends/dashboard/src/utils/metrics.ts (1)
140-149: Consider parallelizing metric variant queries for performance.
`getFirstMatchingMetricValue` queries metric variants sequentially. Since these are independent lookups for the same conceptual metric (just different naming conventions), you could query them in parallel and return the first non-null result.

♻️ Parallel query approach

```diff
 async function getFirstMatchingMetricValue(queries: string[]): Promise<number | null> {
-  for (const query of queries) {
-    const value = await getInstantMetricValue(query);
-    if (value !== null) {
-      return value;
-    }
-  }
-
-  return null;
+  const results = await Promise.all(queries.map(getInstantMetricValue));
+  return results.find(v => v !== null) ?? null;
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@frontends/dashboard/src/utils/metrics.ts` around lines 140 - 149, getFirstMatchingMetricValue currently awaits getInstantMetricValue sequentially; change it to start all getInstantMetricValue calls concurrently (map queries -> promises) and then await their results (e.g., Promise.allSettled or Promise.all) and return the first non-null fulfilled value found, otherwise return null. Update getFirstMatchingMetricValue to fire promises for each query, handle failures/settled results safely, and preserve the Promise<number | null> return type; reference getFirstMatchingMetricValue and getInstantMetricValue when making the change.

frontends/dashboard/src/components/ui/InfoTooltip.tsx (1)
37-39: Potential interaction conflict between click and focus/blur handlers.

When clicking an already-focused button, `onClick` toggles the state, but the subsequent blur/focus cycle may immediately override it. On touch devices, `onBlur` firing after a tap can close the tooltip unexpectedly. Consider simplifying to use only hover/focus for desktop and click-to-toggle for touch, or use a single interaction model.

♻️ Suggested simplification using pointer events

```diff
 <button
   ref={buttonRef}
   type="button"
   className="inline-flex cursor-help items-center text-[#969FAB] outline-none transition-colors hover:text-[#64748B] focus-visible:text-[#64748B]"
   aria-label="Show metric description"
   aria-describedby={open ? tooltipId : undefined}
   onFocus={() => setOpen(true)}
-  onBlur={() => setOpen(false)}
-  onClick={() => setOpen(prev => !prev)}
+  onBlur={(e) => {
+    // Don't close if clicking within the tooltip trigger
+    if (!e.currentTarget.contains(e.relatedTarget as Node)) {
+      setOpen(false);
+    }
+  }}
   onKeyDown={handleKeyDown}
 >
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@frontends/dashboard/src/components/ui/InfoTooltip.tsx` around lines 37 - 39, The handlers conflict because onClick toggles while onFocus/onBlur also set open and blur after a tap can immediately close the tooltip; replace the click handler with a unified pointer-based handler: remove onClick and add an onPointerDown/onPointerUp handler on the InfoTooltip trigger that inspects event.pointerType (touch vs mouse) and for pointerType "touch" toggles the tooltip via setOpen(prev=>!prev) while keeping onFocus/onBlur to open/close for keyboard/hover; ensure the pointer handler does not trigger a subsequent blur override (use event.preventDefault()/event.stopPropagation() as needed) so setOpen is the single source of truth.
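As a framework-free illustration of the "single source of truth" idea from this comment, the open/close decision can live in one reducer (a hypothetical sketch — the event names here are illustrative, not the component's real React props):

```typescript
// Hypothetical sketch: one reducer decides the tooltip's open state from the
// interaction kind, so click and blur handlers cannot fight each other.
type TooltipEvent = 'focus' | 'blur-outside' | 'blur-inside' | 'touch-toggle';

function nextOpenState(open: boolean, event: TooltipEvent): boolean {
  switch (event) {
    case 'focus':
      return true; // keyboard focus or hover opens the tooltip
    case 'blur-outside':
      return false; // focus genuinely left the trigger: close
    case 'blur-inside':
      return open; // blur caused by a click within the trigger: ignore
    case 'touch-toggle':
      return !open; // taps on touch devices toggle explicitly
  }
}
```

Each handler would then dispatch exactly one event kind, so a tap's blur can no longer undo the toggle it just caused.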
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@extensions/apex-fusion/values.yaml`:
- Line 44: Revert the metrics port value for the apex-fusion extension by
changing the metricsPort setting (metricsPort: 12789) back to the Cardano
standard 12798 so it matches the cardano-node extension and allows Prometheus
scraping to work as expected; update the metricsPort entry in the apex-fusion
values.yaml to 12798.
In `@frontends/dashboard/src/routes/$namespace/$name/index.tsx`:
- Around line 274-289: The type error arises because activePod.name (string |
undefined) isn't narrowed inside the async streamLogs closure; capture the
validated values before defining/awaiting streamLogs by assigning const podName
= activePod.name, const namespace = activePodNamespace, and const containerName
= activePodContainerName after the initial if-check, then use those captured
locals in the async function and when calling streamWorkloadPodLogs so
TypeScript sees them as definite strings; update references to
activePod.name/activePodNamespace/activePodContainerName inside streamLogs to
use podName/namespace/containerName.
---
Nitpick comments:
In `@frontends/dashboard/src/components/ui/InfoTooltip.tsx`:
- Around line 37-39: The handlers conflict because onClick toggles while
onFocus/onBlur also set open and blur after a tap can immediately close the
tooltip; replace the click handler with a unified pointer-based handler: remove
onClick and add an onPointerDown/onPointerUp handler on the InfoTooltip trigger
that inspects event.pointerType (touch vs mouse) and for pointerType "touch"
toggles the tooltip via setOpen(prev=>!prev) while keeping onFocus/onBlur to
open/close for keyboard/hover; ensure the pointer handler does not trigger a
subsequent blur override (use event.preventDefault()/event.stopPropagation() as
needed) so setOpen is the single source of truth.
In `@frontends/dashboard/src/utils/metrics.ts`:
- Around line 140-149: getFirstMatchingMetricValue currently awaits
getInstantMetricValue sequentially; change it to start all getInstantMetricValue
calls concurrently (map queries -> promises) and then await their results (e.g.,
Promise.allSettled or Promise.all) and return the first non-null fulfilled value
found, otherwise return null. Update getFirstMatchingMetricValue to fire
promises for each query, handle failures/settled results safely, and preserve
the Promise<number | null> return type; reference getFirstMatchingMetricValue
and getInstantMetricValue when making the change.
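The refactor described above can be sketched as follows (a minimal illustration: `getInstantMetricValue` is stubbed here rather than querying Prometheus, and `Promise.allSettled` is used so one failing variant query cannot reject the whole lookup while scanning results in order preserves the original first-match priority):

```typescript
// Stub of getInstantMetricValue for illustration only; the real implementation
// queries the Prometheus instant-query API.
async function getInstantMetricValue(query: string): Promise<number | null> {
  const fake: Record<string, number> = { cardano_node_metrics_blockNum_int: 42 };
  return fake[query] ?? null;
}

// Fire all variant queries concurrently; allSettled tolerates individual
// failures, and iterating results in order keeps first-match semantics.
async function getFirstMatchingMetricValue(queries: string[]): Promise<number | null> {
  const results = await Promise.allSettled(queries.map(q => getInstantMetricValue(q)));
  for (const result of results) {
    if (result.status === 'fulfilled' && result.value !== null) {
      return result.value;
    }
  }
  return null;
}
```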
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 939826f5-5419-4386-8fab-62ac39415039
📒 Files selected for processing (8)
extensions/apex-fusion/values.yamlfrontends/dashboard/@types/workload.d.tsfrontends/dashboard/package.jsonfrontends/dashboard/src/components/ui/InfoTooltip.tsxfrontends/dashboard/src/routes/$namespace/$name/index.tsxfrontends/dashboard/src/utils/helm.tsfrontends/dashboard/src/utils/home/calls.tsfrontends/dashboard/src/utils/metrics.ts
Actionable comments posted: 1
🧹 Nitpick comments (1)
frontends/dashboard/src/utils/k8s.ts (1)
48-49: Consider reusing `KubeConfig`/`Exec` instead of recreating per call.

On Line 48, loading config per `execPodCommand` call can become expensive when polling many pods. Prefer injecting or reusing a shared `Exec`/`KubeConfig` instance from existing client setup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@frontends/dashboard/src/utils/k8s.ts` around lines 48 - 49, The code currently calls loadK8sConfig() and new Exec(kc) inside execPodCommand for every invocation (expensive when polling many pods); instead reuse a shared KubeConfig/Exec by either accepting an injected Exec or KubeConfig parameter on execPodCommand (e.g., execPodCommand(..., exec?: Exec | kc?: KubeConfig)) or create a lazy module-level singleton that calls loadK8sConfig() and new Exec(kc) once and reuses it; update execPodCommand to use the provided/shared Exec and adjust callers to pass the shared instance when available.
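A lazy module-level singleton along those lines might look like this (a hedged sketch: `KubeConfig` and `Exec` are stubbed to mimic the `@kubernetes/client-node` shape, and `getSharedExec` is a hypothetical helper name, not code from the PR):

```typescript
// Stubs mimicking the @kubernetes/client-node classes; in the real code these
// would be imported from that package instead of defined locally.
class KubeConfig {
  loaded = false;
  loadFromDefault(): void {
    // the real client reads ~/.kube/config or in-cluster credentials here
    this.loaded = true;
  }
}

class Exec {
  constructor(public readonly config: KubeConfig) {}
}

let sharedExec: Exec | null = null;

// Build the KubeConfig/Exec pair once; every execPodCommand-style caller
// reuses the same instance instead of reloading config per invocation.
function getSharedExec(): Exec {
  if (sharedExec === null) {
    const kc = new KubeConfig();
    kc.loadFromDefault();
    sharedExec = new Exec(kc);
  }
  return sharedExec;
}
```

Callers that need a custom config can still accept an optional `Exec` parameter and fall back to `getSharedExec()` when none is provided.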
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@frontends/dashboard/src/utils/k8s.ts`:
- Around line 80-103: The timeout handler currently rejects without releasing
resources; modify the setTimeout callback to call cleanup() before rejecting and
also explicitly close the websocket (ws.close()) to terminate the exec session;
ensure cleanup still clears the timeout and removes listeners (keeps
clearTimeout(timeout) in cleanup) so the timeout branch uses cleanup(),
ws.close(), then reject(new Error(`Timed out waiting for exec command in pod
${namespace}/${podName}`)); keep existing handleClose and handleError logic
unchanged.
---
Nitpick comments:
In `@frontends/dashboard/src/utils/k8s.ts`:
- Around line 48-49: The code currently calls loadK8sConfig() and new Exec(kc)
inside execPodCommand for every invocation (expensive when polling many pods);
instead reuse a shared KubeConfig/Exec by either accepting an injected Exec or
KubeConfig parameter on execPodCommand (e.g., execPodCommand(..., exec?: Exec |
kc?: KubeConfig)) or create a lazy module-level singleton that calls
loadK8sConfig() and new Exec(kc) once and reuses it; update execPodCommand to
use the provided/shared Exec and adjust callers to pass the shared instance when
available.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9625ff0b-8125-4be3-855b-ce4e46110952
📒 Files selected for processing (8)
- .github/workflows/check_extensions.yml
- extensions/dolos/ci/values-default.yaml
- frontends/dashboard/@types/workload.d.ts
- frontends/dashboard/src/routes/$namespace/$name/index.tsx
- frontends/dashboard/src/utils/home/calls.ts
- frontends/dashboard/src/utils/k8s.ts
- frontends/dashboard/src/utils/metrics.ts
- frontends/dashboard/src/utils/metricsFormat.ts
✅ Files skipped from review due to trivial changes (2)
- extensions/dolos/ci/values-default.yaml
- frontends/dashboard/@types/workload.d.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- frontends/dashboard/src/utils/home/calls.ts
```typescript
await new Promise<void>((resolve, reject) => {
  const timeout = setTimeout(() => {
    reject(new Error(`Timed out waiting for exec command in pod ${namespace}/${podName}`));
  }, 15000);

  const cleanup = () => {
    clearTimeout(timeout);
    ws.off('close', handleClose);
    ws.off('error', handleError);
  };

  const handleClose = () => {
    cleanup();
    resolve();
  };

  const handleError = (error: Error) => {
    cleanup();
    reject(error);
  };

  ws.on('close', handleClose);
  ws.on('error', handleError);
});
```
Timeout path does not clean up websocket/session resources.
On Line 82, timeout rejects without invoking cleanup(), and the websocket is never closed. This can leave in-pod exec sessions running and listeners attached after timeout.
Proposed fix
```diff
-  await new Promise<void>((resolve, reject) => {
-    const timeout = setTimeout(() => {
-      reject(new Error(`Timed out waiting for exec command in pod ${namespace}/${podName}`));
-    }, 15000);
+  await new Promise<void>((resolve, reject) => {
+    let settled = false;
+    const finish = (fn: () => void) => {
+      if (settled) return;
+      settled = true;
+      cleanup();
+      fn();
+    };
+    const timeout = setTimeout(() => {
+      try {
+        ws.close();
+      } catch {
+        // no-op
+      }
+      finish(() => reject(new Error(`Timed out waiting for exec command in pod ${namespace}/${podName}`)));
+    }, 15000);
@@
-    const handleClose = () => {
-      cleanup();
-      resolve();
-    };
+    const handleClose = () => finish(resolve);
@@
-    const handleError = (error: Error) => {
-      cleanup();
-      reject(error);
-    };
+    const handleError = (error: Error) => finish(() => reject(error));
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@frontends/dashboard/src/utils/k8s.ts` around lines 80 - 103, The timeout
handler currently rejects without releasing resources; modify the setTimeout
callback to call cleanup() before rejecting and also explicitly close the
websocket (ws.close()) to terminate the exec session; ensure cleanup still
clears the timeout and removes listeners (keeps clearTimeout(timeout) in
cleanup) so the timeout branch uses cleanup(), ws.close(), then reject(new
Error(`Timed out waiting for exec command in pod ${namespace}/${podName}`));
keep existing handleClose and handleError logic unchanged.
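The settle-once guard in the proposed fix can be exercised in isolation (a standalone sketch: a plain Node `EventEmitter` with a stubbed `close()` stands in for the exec websocket, and `waitForExecEnd` is a hypothetical name):

```typescript
import { EventEmitter } from 'node:events';

// Waits for 'close' or 'error' on the emitter, with a timeout branch that
// tears the session down (ws.close) and releases listeners before rejecting.
function waitForExecEnd(ws: EventEmitter & { close(): void }, ms: number): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    let settled = false;
    const finish = (fn: () => void) => {
      if (settled) return; // first outcome wins; later events are no-ops
      settled = true;
      cleanup();
      fn();
    };
    const cleanup = () => {
      clearTimeout(timeout);
      ws.off('close', handleClose);
      ws.off('error', handleError);
    };
    const timeout = setTimeout(() => {
      try {
        ws.close(); // release the exec session before giving up
      } catch {
        // ignore close failures on an already-broken socket
      }
      finish(() => reject(new Error('Timed out waiting for exec command')));
    }, ms);
    const handleClose = () => finish(resolve);
    const handleError = (error: Error) => finish(() => reject(error));
    ws.on('close', handleClose);
    ws.on('error', handleError);
  });
}
```

Because `finish` is the only path to settlement, a late 'error' after a timeout (or vice versa) can no longer double-settle the promise or leave the timer armed.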
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@extensions/apex-fusion/templates/statefulset.yaml`:
- Around line 130-133: The StatefulSet mounts metrics.sh via subPath so changes
to the ConfigMap don't trigger pod updates; add a metrics checksum annotation to
the pod template annotations (same pattern as the existing proxy/config
checksums) that computes a checksum of the metrics ConfigMap (the one providing
metrics.sh) so edits to configmap-metrics.yaml roll the pods. Update the
StatefulSet template that contains the metrics-scripts mount (reference
metrics.sh and the pod template annotations block where checksum/proxy and
checksum/config are defined) and apply the identical change to the other
StatefulSet that mounts metrics.sh.
In `@extensions/cardano-node/templates/configmap-metrics.yaml`:
- Around line 36-39: The metrics fetch can hang because curl/wget lack timeouts;
update the command substitution that sets the metrics variable (metrics="$(...")
to include explicit connect and total/read timeouts for curl (e.g.
--connect-timeout and --max-time or -m) and equivalent timeout flags for wget
(e.g. --timeout and --tries=1) so the pipeline fails fast and triggers
append_error "failed to read node metrics" instead of blocking; apply the same
change to the corresponding metrics command in the apex-fusion config as well.
In `@frontends/dashboard/src/routes/$namespace/$name/index.tsx`:
- Around line 225-230: The component currently keeps a cumulative raw log buffer
in logState and calls AnsiToHtml.toHtml on the whole buffer with stream: true,
causing style bleed, semantic misuse of streaming mode, and truncation
corruption; fix by (a) when switching pods (detect via activePodKey) reset or
recreate the AnsiToHtml instance so streaming state is cleared, and stop
reconverting the entire raw buffer on every render, or (b) change AnsiToHtml
instantiation to stream: false for render-time conversion, or better yet convert
incoming chunks incrementally before storing (store converted HTML chunks or
keep per-pod parser state) so only new data is passed to AnsiToHtml.toHtml;
update usages around logState, setLogState, activePodKey and the toHtml call
accordingly to ensure escapes are never sliced mid-sequence.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: beecf8fc-1db8-419c-9d56-4319d34fd56b
📒 Files selected for processing (10)
- extensions/apex-fusion/templates/_helpers.tpl
- extensions/apex-fusion/templates/configmap-metrics.yaml
- extensions/apex-fusion/templates/statefulset.yaml
- extensions/cardano-node/templates/_helpers.tpl
- extensions/cardano-node/templates/configmap-metrics.yaml
- extensions/cardano-node/templates/statefulset.yaml
- frontends/dashboard/@types/workload.d.ts
- frontends/dashboard/src/routes/$namespace/$name/index.tsx
- frontends/dashboard/src/utils/home/calls.ts
- frontends/dashboard/src/utils/metricsFormat.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- frontends/dashboard/src/utils/home/calls.ts
- frontends/dashboard/src/utils/metricsFormat.ts
```yaml
- name: metrics-scripts
  mountPath: /opt/metis/bin/metrics.sh
  subPath: metrics.sh
  readOnly: true
```
🧩 Analysis chain
🏁 Script executed:

```shell
# Check if the apex-fusion statefulset.yaml file exists and examine the relevant lines
find . -path "*/extensions/apex-fusion/templates/statefulset.yaml" -type f 2>/dev/null | head -20
```

Repository: txpipe/metis

Length of output: 107

🏁 Script executed:

```shell
# Read the entire file to verify the specific lines and structure
cat -n ./extensions/apex-fusion/templates/statefulset.yaml
```

Repository: txpipe/metis

Length of output: 11800

🏁 Script executed:

```shell
# Check if the cardano-node statefulset.yaml exists and examine relevant sections
find . -path "*/extensions/cardano-node/templates/statefulset.yaml" -type f 2>/dev/null
```

Repository: txpipe/metis

Length of output: 108

🏁 Script executed:

```shell
# Read the cardano-node statefulset.yaml file
cat -n ./extensions/cardano-node/templates/statefulset.yaml
```

Repository: txpipe/metis

Length of output: 11649

🏁 Script executed:

```shell
# Let's verify the Kubernetes behavior regarding subPath ConfigMap mounts and live updates
# by searching web documentation
```

Repository: txpipe/metis

Length of output: 38
🌐 Web query:
Kubernetes ConfigMap subPath mount live updates behavior
💡 Result:
Kubernetes ConfigMap volumes mounted with subPath do not receive live updates when the ConfigMap changes. The mounted file retains its original content until the Pod (or container) is restarted. This is a documented limitation.

Regular ConfigMap volume mounts (without subPath) use symlinks that Kubernetes updates via the kubelet during periodic syncs (typically within ~1-2 minutes), enabling live propagation of changes to the mounted directory. Applications must watch for file changes or poll to detect updates. However, subPath mounts use bind-mounts for security isolation, which bind to the original inode and do not update automatically. This behavior is consistent across ConfigMaps, Secrets, and projected volumes using subPath.

Official Kubernetes documentation explicitly states: "A container using a ConfigMap as a subPath volume mount will not receive ConfigMap updates." This note appears in the ConfigMaps concepts page.

Workarounds include:
- Avoid subPath for dynamic configs; mount the full ConfigMap volume to a separate directory and symlink.
- Use tools like Reloader to trigger Pod restarts on ConfigMap changes.
- Use immutable ConfigMaps with versioned names and update Deployments.

No changes to this behavior appear in recent Kubernetes versions (up to 2026); it remains a known limitation since at least v1.10.
Citations:
- 1: https://kubernetes.io/docs/concepts/configuration/configmap/
- 2: ConfigMaps and Secrets mounted with subPath do not update when changed kubernetes/kubernetes#50345
- 3: Configmap mounted with subPath doesn't get reloaded automatically kubernetes/kubernetes#92296
- 4: https://stackoverflow.com/questions/48561338/how-to-correctly-mount-configmap-with-subpath-in-kubernetes-not-update-configs
- 5: https://kubernetes.io/docs/concepts/storage/projected-volumes/
Add metrics checksum to roll ConfigMap changes into pod updates.
metrics.sh is mounted via subPath at lines 130-133. Kubernetes does not refresh subPath mounts when ConfigMaps change; the mounted file retains its original content until the pod restarts. This StatefulSet only includes checksums for proxy and configuration (lines 37-42), so edits to configmap-metrics.yaml won't trigger a rollout. Add a metrics checksum annotation to match the existing pattern, and apply the same fix to extensions/cardano-node/templates/statefulset.yaml (lines 130-133 and 220-223).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@extensions/apex-fusion/templates/statefulset.yaml` around lines 130 - 133,
The StatefulSet mounts metrics.sh via subPath so changes to the ConfigMap don't
trigger pod updates; add a metrics checksum annotation to the pod template
annotations (same pattern as the existing proxy/config checksums) that computes
a checksum of the metrics ConfigMap (the one providing metrics.sh) so edits to
configmap-metrics.yaml roll the pods. Update the StatefulSet template that
contains the metrics-scripts mount (reference metrics.sh and the pod template
annotations block where checksum/proxy and checksum/config are defined) and
apply the identical change to the other StatefulSet that mounts metrics.sh.
```shell
metrics="$(curl -s --fail "http://127.0.0.1:{{ .Values.service.metricsPort }}/metrics" || wget -qO- "http://127.0.0.1:{{ .Values.service.metricsPort }}/metrics" || true)"
if [ -z "$metrics" ]; then
  append_error "failed to read node metrics"
fi
```
🧩 Analysis chain
🏁 Script executed:

```shell
# Locate the configmap-metrics.yaml files mentioned
find . -name "configmap-metrics.yaml" -type f
```

Repository: txpipe/metis

Length of output: 172

🏁 Script executed:

```shell
# Check the exact content of the cardano-node file at lines 36-39
if [ -f "extensions/cardano-node/templates/configmap-metrics.yaml" ]; then
  echo "=== cardano-node configmap-metrics.yaml (lines 30-45) ==="
  sed -n '30,45p' "extensions/cardano-node/templates/configmap-metrics.yaml"
fi
```

Repository: txpipe/metis

Length of output: 624

🏁 Script executed:

```shell
# Check if apex-fusion file exists and its metrics handling
if [ -f "extensions/apex-fusion/templates/configmap-metrics.yaml" ]; then
  echo "=== apex-fusion configmap-metrics.yaml (lines 30-45) ==="
  sed -n '30,45p' "extensions/apex-fusion/templates/configmap-metrics.yaml"
fi
```

Repository: txpipe/metis

Length of output: 623

🏁 Script executed:

```shell
# Search for curl/wget timeout configurations in the repo
echo "=== Searching for timeout flags in curl/wget commands ==="
rg "(curl|wget).*--connect-timeout|--max-time|--timeout|-m|--read-timeout" --type yaml
```

Repository: txpipe/metis

Length of output: 50368

🏁 Script executed:

```shell
# Search for dashboard-related code that might execute metrics.sh
echo "=== Searching for dashboard execution or metrics.sh references ==="
rg "metrics\.sh|exec.*metrics|workload.*details" --type yaml --type js --type ts -A 3
```

Repository: txpipe/metis

Length of output: 1872

🏁 Script executed:

```shell
# Read the full metrics.sh section from cardano-node to check for any timeout wrappers
sed -n '1,50p' "extensions/cardano-node/templates/configmap-metrics.yaml" | tail -30
```

Repository: txpipe/metis

Length of output: 1230

🏁 Script executed:

```shell
# Check if there's a timeout wrapper function or any timeout mechanism in the script
rg "timeout|TIMEOUT" extensions/cardano-node/templates/configmap-metrics.yaml
```

Repository: txpipe/metis

Length of output: 38

🏁 Script executed:

```shell
# Verify the complete metrics fetch section with more context
sed -n '30,50p' "extensions/cardano-node/templates/configmap-metrics.yaml"
```

Repository: txpipe/metis

Length of output: 948

🏁 Script executed:

```shell
# Get the full metrics.sh script to check for any outer timeout wrapper
wc -l "extensions/cardano-node/templates/configmap-metrics.yaml"
```

Repository: txpipe/metis

Length of output: 116

🏁 Script executed:

```shell
# Check for timeout at script/function level
sed -n '1,100p' "extensions/cardano-node/templates/configmap-metrics.yaml"
```

Repository: txpipe/metis

Length of output: 7677
Add timeouts to the local metrics fetch.
The curl/wget fallback at lines 36-39 lacks connect and read timeouts. If the node metrics endpoint stalls, the script will block indefinitely. Since the dashboard executes this script synchronously via bash -lc, a hung read will stall the workload details page. Add explicit timeout flags to both commands and apply the same fix to extensions/apex-fusion/templates/configmap-metrics.yaml.
Current code

```shell
metrics="$(curl -s --fail "http://127.0.0.1:{{ .Values.service.metricsPort }}/metrics" || wget -qO- "http://127.0.0.1:{{ .Values.service.metricsPort }}/metrics" || true)"
if [ -z "$metrics" ]; then
  append_error "failed to read node metrics"
fi
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@extensions/cardano-node/templates/configmap-metrics.yaml` around lines 36 -
39, The metrics fetch can hang because curl/wget lack timeouts; update the
command substitution that sets the metrics variable (metrics="$(...") to include
explicit connect and total/read timeouts for curl (e.g. --connect-timeout and
--max-time or -m) and equivalent timeout flags for wget (e.g. --timeout and
--tries=1) so the pipeline fails fast and triggers append_error "failed to read
node metrics" instead of blocking; apply the same change to the corresponding
metrics command in the apex-fusion config as well.
```typescript
const [logState, setLogState] = useState(() => ({
  podKey: activePodKey,
  value: '',
}));
const logs = logState.podKey === activePodKey ? logState.value : '';
```
🧩 Analysis chain
🏁 Script executed:

```shell
find . -name "index.tsx" -path "*/routes/*" | grep -E "namespace.*name"
```

Repository: txpipe/metis

Length of output: 181

🏁 Script executed:

```shell
cat -n frontends/dashboard/src/routes/\$namespace/\$name/index.tsx | head -100
```

Repository: txpipe/metis

Length of output: 5640

🏁 Script executed:

```shell
cat -n frontends/dashboard/src/routes/\$namespace/\$name/index.tsx | sed -n '150,180p'
```

Repository: txpipe/metis

Length of output: 1193

🏁 Script executed:

```shell
cat -n frontends/dashboard/src/routes/\$namespace/\$name/index.tsx | sed -n '220,240p'
```

Repository: txpipe/metis

Length of output: 1045

🏁 Script executed:

```shell
cat -n frontends/dashboard/src/routes/\$namespace/\$name/index.tsx | sed -n '270,290p'
```

Repository: txpipe/metis

Length of output: 692

🏁 Script executed:

```shell
cat -n frontends/dashboard/src/routes/\$namespace/\$name/index.tsx | sed -n '580,600p'
```

Repository: txpipe/metis

Length of output: 1008

🏁 Script executed:

```shell
# Check if ansi-to-html is used and how
rg "ansi-to-html|AnsiToHtml|toHtml" frontends/dashboard/src/routes/\$namespace/\$name/index.tsx -B2 -A2
```

Repository: txpipe/metis

Length of output: 588

🏁 Script executed:

```shell
# Check the package.json to see ansi-to-html version
cat frontends/dashboard/package.json | grep -A 2 -B 2 "ansi-to-html"
```

Repository: txpipe/metis

Length of output: 216

🏁 Script executed:

```shell
# Check if there's any reset/close method being called
rg "converter\.(reset|close|clear)" frontends/dashboard/src/routes/\$namespace/\$name/index.tsx
```

Repository: txpipe/metis

Length of output: 38

🏁 Script executed:

```shell
# Check ansi-to-html library documentation in node_modules or docs
find . -name "ansi-to-html" -type d | head -5
```

Repository: txpipe/metis

Length of output: 38

🏁 Script executed:

```shell
# Search for any prior converter state management
rg "stream.*true|converter" frontends/dashboard/src/routes/\$namespace/\$name/index.tsx -B2 -A2
```

Repository: txpipe/metis

Length of output: 407
🌐 Web query:
ansi-to-html stream mode state behavior reset method
💡 Result:
In the ansi-to-html library (rburns/ansi-to-html), stream mode is enabled by setting the stream: true option when creating a Convert instance. This saves style state (current colors, attributes like bold/underline) across multiple calls to toHtml, enabling proper handling of partial ANSI streams where chunks may split escape sequences or text.

When stream: false (default), each toHtml call processes input independently: it starts from default/reset state, applies ANSI codes within the input, and closes all open spans at the end.

The reset method (handling ANSI \x1b[0m or similar reset codes) sets colors/attributes to defaults (fg: '#FFF', bg: '#000' by default, customizable). In stream mode, this updates the persistent state for subsequent calls. In non-stream mode, it only affects that call's output.

CLI usage (ansi-to-html command) automatically sets stream: true for streaming input like `git log | ansi-to-html`. No explicit public "reset" method exists; state reset occurs via ANSI reset codes or reinstantiating the Convert object. Issues note challenges with incomplete escape sequences in streams, but core behavior maintains state continuity in stream mode. Sources confirm via README/docs across GitHub and npm pages that the stream option explicitly "Save[s] style state across invocations of toHtml".
Citations:
- 1: https://github.com/rburns/ansi-to-html
- 2: https://registry.npmjs.org/ansi-to-html
- 3: https://npm.io/package/ansi-to-html
- 4: https://github.com/rburns/ansi-to-html/issues
- 5: https://www.npmjs.com/package/ansi-to-html
🏁 Script executed:

```shell
# Check if converter is used elsewhere in codebase
rg "converter\." frontends/dashboard/
```

Repository: txpipe/metis

Length of output: 218
Don't reparse the whole log buffer with a streaming ANSI parser on every render.
AnsiToHtml is configured with stream: true (lines 161–165), which preserves parser state across toHtml() calls. The component stores the full raw log buffer and reconverts that entire string on every render (line 586), causing:
- Style bleed across pod switches: When switching pods (line 229), the converter's state isn't reset, so colors and formatting from the previous pod apply to the new pod's logs.
- Semantic mismatch: Stream mode expects incremental chunk processing, not reprocessing the same cumulative buffer repeatedly.
- Truncation corruption: The 10k-character trim (line 280) uses `slice(-10000)` on raw ANSI text, which can bisect multi-byte escape sequences like `\x1b[38;5;123m`, causing malformed output.
Either disable streaming for render-time conversion, convert only incremental chunks before storing them, or reset the converter instance when switching pods.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@frontends/dashboard/src/routes/$namespace/$name/index.tsx` around lines 225 -
230, The component currently keeps a cumulative raw log buffer in logState and
calls AnsiToHtml.toHtml on the whole buffer with stream: true, causing style
bleed, semantic misuse of streaming mode, and truncation corruption; fix by (a)
when switching pods (detect via activePodKey) reset or recreate the AnsiToHtml
instance so streaming state is cleared, and stop reconverting the entire raw
buffer on every render, or (b) change AnsiToHtml instantiation to stream: false
for render-time conversion, or better yet convert incoming chunks incrementally
before storing (store converted HTML chunks or keep per-pod parser state) so
only new data is passed to AnsiToHtml.toHtml; update usages around logState,
setLogState, activePodKey and the toHtml call accordingly to ensure escapes are
never sliced mid-sequence.
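Whichever conversion strategy is adopted, the truncation hazard can be addressed separately with a trim that never cuts inside a color escape (a hypothetical helper, not from the PR; for simplicity it only understands SGR `\x1b[...m` sequences):

```typescript
// Hypothetical sketch: trim a raw log buffer to roughly `max` characters
// without bisecting an SGR escape sequence. If the naive cut point lands
// inside `\x1b[...m`, resume after that sequence's `m` terminator instead.
function trimAnsiSafe(buffer: string, max: number): string {
  if (buffer.length <= max) return buffer;
  let start = buffer.length - max;
  // Look backwards for an ESC that the cut point might have bisected.
  const esc = buffer.lastIndexOf('\x1b', start);
  if (esc !== -1) {
    const terminator = buffer.indexOf('m', esc);
    if (terminator !== -1 && terminator >= start) {
      start = terminator + 1; // skip past the complete escape sequence
    }
  }
  return buffer.slice(start);
}
```

This keeps the existing "bounded buffer" behavior while guaranteeing the converter never sees half an escape sequence.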
Summary by CodeRabbit
Release Notes
New Features
Infrastructure