Skip to content

release-25.4: roachtest/clusterstats: ignore PromQL info-level annotations#171094

Merged
trunk-io[bot] merged 1 commit into
release-25.4from
blathers/backport-release-25.4-170947
May 29, 2026
Merged

release-25.4: roachtest/clusterstats: ignore PromQL info-level annotations#171094
trunk-io[bot] merged 1 commit into
release-25.4from
blathers/backport-release-25.4-170947

Conversation

@blathers-crl

@blathers-crl blathers-crl Bot commented May 28, 2026

Copy link
Copy Markdown

Backport 1/1 commits from #170947 on behalf of @miraradeva.


CollectPoint and CollectInterval treated any non-empty Prometheus
warnings slice as a fatal error. With Prometheus 2.53+, this slice
now includes informational annotations like

PromQL info: metric might not be a counter, name does not end in
_total/_sum/_count/_bucket: "sys_host_disk_write_bytes"

which is emitted for queries such as rate(sys_host_disk_write_bytes[1m])
and is purely a naming-convention nudge -- the query result is still
valid. After the recent bump to Prometheus 2.53.5 (#170676), this
broke admission-control/disk-bandwidth-limiter outright (it t.Fatals
on collection errors) and dropped bandwidth samples in admission-
control/{index,single-node-index}-backfill, which already log the
error and continue.

Extract a handlePromWarnings helper that classifies entries by the
"PromQL info:" wire-format prefix:

  • "PromQL info: ..." entries are logged and dropped.
  • "PromQL warning: ..." entries, and any unprefixed entries (e.g.
    legacy remote-read warnings, which predate the PromQL annotation
    system), continue to surface as the same error string as before.

Prometheus' annotations package defines exactly two sentinel error
types (PromQLInfo and PromQLWarning), so the prefix check is the
actual wire-format contract; the client_golang v1 API we use flattens
both into a single []string with no structured alternative.

Resolves: #170841
See also: #170790, #170793, #170843

Release note: None


Release justification: test-only fix

CollectPoint and CollectInterval treated any non-empty Prometheus
warnings slice as a fatal error. With Prometheus 2.53+, this slice
now includes informational annotations like

  PromQL info: metric might not be a counter, name does not end in
  _total/_sum/_count/_bucket: "sys_host_disk_write_bytes"

which is emitted for queries such as rate(sys_host_disk_write_bytes[1m])
and is purely a naming-convention nudge -- the query result is still
valid. After the recent bump to Prometheus 2.53.5 (#170676), this
broke admission-control/disk-bandwidth-limiter outright (it t.Fatals
on collection errors) and dropped bandwidth samples in admission-
control/{index,single-node-index}-backfill, which already log the
error and continue.

Extract a handlePromWarnings helper that classifies entries by the
"PromQL info:" wire-format prefix:

 - "PromQL info: ..." entries are logged and dropped.
 - "PromQL warning: ..." entries, and any unprefixed entries (e.g.
   legacy remote-read warnings, which predate the PromQL annotation
   system), continue to surface as the same error string as before.

Prometheus' annotations package defines exactly two sentinel error
types (PromQLInfo and PromQLWarning), so the prefix check is the
actual wire-format contract; the client_golang v1 API we use flattens
both into a single []string with no structured alternative.

Resolves: #170841
See also: #170790, #170793, #170843

Release note: None
@blathers-crl blathers-crl Bot requested a review from a team as a code owner May 28, 2026 09:14
@blathers-crl blathers-crl Bot requested review from golgeek and shailendra-patel and removed request for a team May 28, 2026 09:14
@blathers-crl blathers-crl Bot added blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot. labels May 28, 2026
@blathers-crl

blathers-crl Bot commented May 28, 2026

Copy link
Copy Markdown
Author

Thanks for opening a backport.

Before merging, please confirm that it falls into one of the following categories (select one):

  • Non-production code changes OR fixes for serious issues. Non-production includes test-only changes, build system changes, etc. Serious issues are defined in the policy as correctness, stability, or security issues, data corruption/loss, significant performance regressions, breaking working and widely used functionality, or an inability to detect and debug production issues.
  • Other approved changes. These changes must be gated behind a disabled-by-default feature flag unless there is a strong justification not to. Reference the approved ENGREQ ticket in the PR body (e.g., "Fixes ENGREQ-123").

Add a brief release justification to the PR description explaining your selection.

Also, confirm that the change does not break backward compatibility and complies with all aspects of the backport policy.

All backports must be reviewed by the TL and EM for the owning area.

@trunk-io

trunk-io Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

😎 Merged successfully - details.

@blathers-crl blathers-crl Bot requested review from cpj2195 and srosenberg May 28, 2026 09:14
@blathers-crl blathers-crl Bot added backport Label PR's that are backports to older release branches T-kv KV Team labels May 28, 2026
@cockroach-teamcity

Copy link
Copy Markdown
Member

This change is Reviewable

@blathers-crl

blathers-crl Bot commented May 28, 2026

Copy link
Copy Markdown
Author

✅ PR #171094 is compliant with backport policy

Confidence: high
Backward compatible: true
Explanation: The pull request is compliant with the backport policy, as it exclusively modifies files within the non-production path pkg/cmd/roachtest/, which is specified in the policy as being exempt from the standard backport requirements. The changes pertain to the roachtest framework, specifically addressing how Prometheus warnings are handled in test scenarios. These changes do not affect the production codebase and are directly related to test infrastructure and behavior, thus not requiring a critical bug justification or feature flag gating as outlined in the policy. The PR body also explicitly indicates this as a 'test-only fix' which further aligns it with the exception criteria.

ENGREQ Check Passed: No ENGREQ required (non-production code or serious issues).

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl Bot added the backport-test-only Used to denote the backport has only non-production changes label May 28, 2026
@trunk-io trunk-io Bot merged commit d7b0fd2 into release-25.4 May 29, 2026
23 checks passed
@trunk-io trunk-io Bot deleted the blathers/backport-release-25.4-170947 branch May 29, 2026 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport Label PR's that are backports to older release branches backport-test-only Used to denote the backport has only non-production changes blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot. T-kv KV Team target-release-25.4.12

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants