Skip to content

roachtest/clusterstats: ignore PromQL info-level annotations#170947

Merged
trunk-io[bot] merged 1 commit into
cockroachdb:masterfrom
miraradeva:mira/fix-prom-warnings-collector
May 27, 2026
Merged

roachtest/clusterstats: ignore PromQL info-level annotations#170947
trunk-io[bot] merged 1 commit into
cockroachdb:masterfrom
miraradeva:mira/fix-prom-warnings-collector

Conversation

@miraradeva

Copy link
Copy Markdown
Contributor

CollectPoint and CollectInterval treated any non-empty Prometheus
warnings slice as a fatal error. With Prometheus 2.53+, this slice
now includes informational annotations like

PromQL info: metric might not be a counter, name does not end in
_total/_sum/_count/_bucket: "sys_host_disk_write_bytes"

which is emitted for queries such as rate(sys_host_disk_write_bytes[1m])
and is purely a naming-convention nudge -- the query result is still
valid. After the recent bump to Prometheus 2.53.5 (#170676), this
broke admission-control/disk-bandwidth-limiter outright (it t.Fatals
on collection errors) and dropped bandwidth samples in admission-
control/{index,single-node-index}-backfill, which already log the
error and continue.

Extract a handlePromWarnings helper that classifies entries by the
"PromQL info:" wire-format prefix:

  • "PromQL info: ..." entries are logged and dropped.
  • "PromQL warning: ..." entries, and any unprefixed entries (e.g.
    legacy remote-read warnings, which predate the PromQL annotation
    system), continue to surface as the same error string as before.

Prometheus' annotations package defines exactly two sentinel error
types (PromQLInfo and PromQLWarning), so the prefix check is the
actual wire-format contract; the client_golang v1 API we use flattens
both into a single []string with no structured alternative.

Resolves: #170841
See also: #170790, #170793, #170843

Release note: None

CollectPoint and CollectInterval treated any non-empty Prometheus
warnings slice as a fatal error. With Prometheus 2.53+, this slice
now includes informational annotations like

  PromQL info: metric might not be a counter, name does not end in
  _total/_sum/_count/_bucket: "sys_host_disk_write_bytes"

which is emitted for queries such as rate(sys_host_disk_write_bytes[1m])
and is purely a naming-convention nudge -- the query result is still
valid. After the recent bump to Prometheus 2.53.5 (cockroachdb#170676), this
broke admission-control/disk-bandwidth-limiter outright (it t.Fatals
on collection errors) and dropped bandwidth samples in admission-
control/{index,single-node-index}-backfill, which already log the
error and continue.

Extract a handlePromWarnings helper that classifies entries by the
"PromQL info:" wire-format prefix:

 - "PromQL info: ..." entries are logged and dropped.
 - "PromQL warning: ..." entries, and any unprefixed entries (e.g.
   legacy remote-read warnings, which predate the PromQL annotation
   system), continue to surface as the same error string as before.

Prometheus' annotations package defines exactly two sentinel error
types (PromQLInfo and PromQLWarning), so the prefix check is the
actual wire-format contract; the client_golang v1 API we use flattens
both into a single []string with no structured alternative.

Resolves: cockroachdb#170841
See also: cockroachdb#170790, cockroachdb#170793, cockroachdb#170843

Release note: None
@miraradeva miraradeva requested a review from a team as a code owner May 26, 2026 19:20
@miraradeva miraradeva requested review from cpj2195 and srosenberg and removed request for a team May 26, 2026 19:20
@trunk-io

trunk-io Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

😎 Merged successfully - details.

@cockroach-teamcity

Copy link
Copy Markdown
Member

This change is Reviewable

@miraradeva

Copy link
Copy Markdown
Contributor Author

@cockroachdb/test-eng I gave this a shot, but I'm happy to scrap this if you have a better fix in mind.

@srosenberg srosenberg left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! It looks like there is an attempt to separate infos from warnings [1], but it hasn't landed as of yet. Meanwhile, the infos filter does the job.

[1] prometheus/client_golang#1963

@miraradeva

Copy link
Copy Markdown
Contributor Author

/trunk merge

@trunk-io trunk-io Bot merged commit 744a820 into cockroachdb:master May 27, 2026
24 of 25 checks passed
@pav-kv

pav-kv commented May 28, 2026

Copy link
Copy Markdown
Collaborator

blathers backport all

@blathers-crl

blathers-crl Bot commented May 28, 2026

Copy link
Copy Markdown

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


Branch release-all doesn't exist, skipping

Backport to branch all failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@pav-kv

pav-kv commented May 28, 2026

Copy link
Copy Markdown
Collaborator

blathers backport 26.2 26.1 25.4 25.2

@blathers-crl

blathers-crl Bot commented May 28, 2026

Copy link
Copy Markdown

Based on the specified backports for this PR, I applied new labels to the following linked issue(s). Please adjust the labels as needed to match the branches actually affected by the issue(s), including adding any known older branches.


Issue #170841: branch-release-25.2, branch-release-25.4, branch-release-26.1, branch-release-26.2.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

roachtest: admission-control/disk-bandwidth-limiter failed

4 participants