Open
Description
After deploying or rolling out the node-problem-detector deployment, I encounter a "Kmsg channel closed" error on some nodes.
As a result, kernel monitor based metrics are not collected on those nodes.
The affected nodes are not under any significant load, and there are essentially no kernel log messages being generated.
Environment
- Ubuntu Jammy (kernel 5.15.0-x)
- Ubuntu Noble (kernel 6.8.0-x)
Log
I1030 06:41:19.605611 1 log_watchers.go:40] Use log watcher of plugin "kmsg"
<REDACTED>
I1030 06:41:19.605732 1 log_watchers.go:40] Use log watcher of plugin "filelog"
I1030 06:41:19.606512 1 k8s_exporter.go:54] Waiting for kube-apiserver to be ready (timeout 5m0s)...
I1030 06:41:19.614361 1 node_problem_detector.go:63] K8s exporter started.
I1030 06:41:19.614493 1 node_problem_detector.go:67] Prometheus exporter started.
I1030 06:41:19.614504 1 log_monitor.go:111] Start log monitor /custom-config/additional-filelog.json
I1030 06:41:19.614541 1 log_watcher.go:80] Start watching filelog
I1030 06:41:19.614549 1 log_monitor.go:111] Start log monitor /config/kernel-monitor.json
I1030 06:41:19.614613 1 log_monitor.go:236] Initialize condition generated: []
I1030 06:41:19.615573 1 log_monitor.go:111] Start log monitor /config/docker-monitor.json
I1030 06:41:19.615599 1 log_monitor.go:236] Initialize condition generated: [{Type:KernelDeadlock Status:False Transition:2025-10-30 06:41:19.615589877 +0000 UTC m=+0.055785295 Reason:KernelHasNoDeadlock Message:kernel has no deadlock} {Type:ReadonlyFilesystem Status:False Transition:2025-10-30 06:41:19.61558997 +0000 UTC m=+0.055785381 Reason:FilesystemIsNotReadOnly Message:Filesystem is not read-only}]
E1030 06:41:19.615656 1 log_watcher_linux.go:105] Kmsg channel closed
E1030 06:41:19.615696 1 log_monitor.go:137] Log channel closed: /config/kernel-monitor.json
I1030 06:41:19.619274 1 log_watcher.go:80] Start watching journald
I1030 06:41:19.619292 1 log_monitor.go:111] Start log monitor /config/systemd-monitor.json
I1030 06:41:19.619325 1 log_monitor.go:236] Initialize condition generated: [{Type:CorruptDockerOverlay2 Status:False Transition:2025-10-30 06:41:19.619318108 +0000 UTC m=+0.059513518 Reason:NoCorruptDockerOverlay2 Message:docker overlay2 is functioning properly}]
I1030 06:41:19.621986 1 log_watcher.go:80] Start watching journald
I1030 06:41:19.622011 1 log_monitor.go:111] Start log monitor /custom-config/additional.json
I1030 06:41:19.622120 1 log_monitor.go:236] Initialize condition generated: []
I1030 06:41:19.623024 1 problem_detector.go:76] Problem detector started
I1030 06:41:19.623053 1 log_monitor.go:236] Initialize condition generated: []
E1030 06:41:19.623115 1 log_watcher_linux.go:105] Kmsg channel closed
E1030 06:41:19.623138 1 log_monitor.go:137] Log channel closed: /custom-config/additional.json
Reproduce
On the affected nodes, cat /dev/kmsg exits immediately:
root@hostname:~# cat /dev/kmsg
6,2047,25658836,-;microcode: CPU 171: patch_level=0x0a0011d5
cat: /dev/kmsg: Broken pipe
What I've tried
- Running dmesg -C to consume kmsg has no effect. journalctl -kf is working normally.
- Running echo -n "kerustest" > /dev/kmsg can sometimes resolve the broken pipe problem.