roachtest/clusterstats: ignore PromQL info-level annotations#170947
Conversation
CollectPoint and CollectInterval treated any non-empty Prometheus warnings slice as a fatal error. With Prometheus 2.53+, this slice now includes informational annotations like PromQL info: metric might not be a counter, name does not end in _total/_sum/_count/_bucket: "sys_host_disk_write_bytes" which is emitted for queries such as rate(sys_host_disk_write_bytes[1m]) and is purely a naming-convention nudge -- the query result is still valid. After the recent bump to Prometheus 2.53.5 (cockroachdb#170676), this broke admission-control/disk-bandwidth-limiter outright (it t.Fatals on collection errors) and dropped bandwidth samples in admission- control/{index,single-node-index}-backfill, which already log the error and continue. Extract a handlePromWarnings helper that classifies entries by the "PromQL info:" wire-format prefix: - "PromQL info: ..." entries are logged and dropped. - "PromQL warning: ..." entries, and any unprefixed entries (e.g. legacy remote-read warnings, which predate the PromQL annotation system), continue to surface as the same error string as before. Prometheus' annotations package defines exactly two sentinel error types (PromQLInfo and PromQLWarning), so the prefix check is the actual wire-format contract; the client_golang v1 API we use flattens both into a single []string with no structured alternative. Resolves: cockroachdb#170841 See also: cockroachdb#170790, cockroachdb#170793, cockroachdb#170843 Release note: None
|
😎 Merged successfully - details. |
|
@cockroachdb/test-eng I gave this a shot, but I'm happy to scrap this if you have a better fix in mind. |
srosenberg
left a comment
There was a problem hiding this comment.
Thanks! It looks like there is an attempt to separate infos from warnings [1], but it hasn't landed as of yet. Meanwhile, the infos filter does the job.
|
/trunk merge |
|
blathers backport all |
|
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. Branch release-all doesn't exist, skipping Backport to branch all failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
|
blathers backport 26.2 26.1 25.4 25.2 |
|
Based on the specified backports for this PR, I applied new labels to the following linked issue(s). Please adjust the labels as needed to match the branches actually affected by the issue(s), including adding any known older branches. Issue #170841: branch-release-25.2, branch-release-25.4, branch-release-26.1, branch-release-26.2. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
CollectPoint and CollectInterval treated any non-empty Prometheus
warnings slice as a fatal error. With Prometheus 2.53+, this slice
now includes informational annotations like
PromQL info: metric might not be a counter, name does not end in
_total/_sum/_count/_bucket: "sys_host_disk_write_bytes"
which is emitted for queries such as rate(sys_host_disk_write_bytes[1m])
and is purely a naming-convention nudge -- the query result is still
valid. After the recent bump to Prometheus 2.53.5 (#170676), this
broke admission-control/disk-bandwidth-limiter outright (it t.Fatals
on collection errors) and dropped bandwidth samples in admission-
control/{index,single-node-index}-backfill, which already log the
error and continue.
Extract a handlePromWarnings helper that classifies entries by the
"PromQL info:" wire-format prefix:
legacy remote-read warnings, which predate the PromQL annotation
system), continue to surface as the same error string as before.
Prometheus' annotations package defines exactly two sentinel error
types (PromQLInfo and PromQLWarning), so the prefix check is the
actual wire-format contract; the client_golang v1 API we use flattens
both into a single []string with no structured alternative.
Resolves: #170841
See also: #170790, #170793, #170843
Release note: None