release-26.1: roachtest/clusterstats: ignore PromQL info-level annotations #171093
Merged
trunk-io[bot] merged 1 commit on May 29, 2026
561bc82 to d4a7c8f
Thanks for opening a backport. Before merging, please confirm that the change does not break backwards compatibility and otherwise complies with the backport policy. Include a brief release justification in the PR description explaining why the backport is appropriate. All backports must be reviewed by the TL for the owning area. While the stricter LTS policy does not yet apply, please exercise judgment and consider gating non-critical changes behind a disabled-by-default feature flag when appropriate.
pav-kv (Member) approved these changes on May 29, 2026.
Backport 1/1 commits from #170947 on behalf of @miraradeva.
CollectPoint and CollectInterval treated any non-empty Prometheus
warnings slice as a fatal error. With Prometheus 2.53+, this slice
now includes informational annotations like
    PromQL info: metric might not be a counter, name does not end in
    _total/_sum/_count/_bucket: "sys_host_disk_write_bytes"
which is emitted for queries such as rate(sys_host_disk_write_bytes[1m])
and is purely a naming-convention nudge -- the query result is still
valid. After the recent bump to Prometheus 2.53.5 (#170676), this
broke admission-control/disk-bandwidth-limiter outright (it t.Fatals
on collection errors) and dropped bandwidth samples in
admission-control/{index,single-node-index}-backfill, which already log
the error and continue.
Extract a handlePromWarnings helper that classifies entries by the
"PromQL info:" wire-format prefix:
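- "PromQL info: ..." entries are logged and dropped.
- "PromQL warning: ..." entries, and any unprefixed entries (e.g.
  legacy remote-read warnings, which predate the PromQL annotation
  system), continue to surface as the same error string as before.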
Prometheus' annotations package defines exactly two sentinel error
types (PromQLInfo and PromQLWarning), so the prefix check is the
actual wire-format contract; the client_golang v1 API we use flattens
both into a single []string with no structured alternative.
Resolves: #170841
See also: #170790, #170793, #170843
Release note: None
Release justification: test-only fix