
[server] Add OTel metrics to RocksDBStats #2731

Merged
m-nagarajan merged 5 commits into linkedin:main from m-nagarajan:mnagaraj/addOtelForRocksDbStats on May 5, 2026

Conversation

@m-nagarajan
Contributor

@m-nagarajan m-nagarajan commented Apr 13, 2026

Problem Statement

RocksDBStats (gated by rocksdb.statistics.enabled, default false) reports 27 Tehuti AsyncGauge sensors for RocksDB ticker counters but has no OTel metrics. This prevents migration to OTel-based monitoring for RocksDB block cache, memtable, bloom filter, compaction, and read amplification statistics.

Solution

Add 13 OTel metrics alongside the existing 27 Tehuti sensors, consolidating per-component metrics using dimensions:

  • 4 per-component block cache metrics (miss.count, hit.count, add.count, bytes.inserted) with VENICE_ROCKSDB_BLOCK_CACHE_COMPONENT dimension (INDEX/FILTER/DATA/COMPRESSION_DICT) — consolidates 16 Tehuti sensors into 4 OTel metrics
  • 7 joint Tehuti+OTel metrics for block cache add failures, bytes read/write, bloom filter, memtable hit/miss, and compaction cancelled
  • 1 per-level get-hit metric (get.hit.count) with VENICE_ROCKSDB_LEVEL dimension (LEVEL_0/LEVEL_1/LEVEL_2_AND_UP) — consolidates 3 Tehuti sensors into 1 OTel metric
  • 1 read amplification factor using ASYNC_DOUBLE_GAUGE with split callbacks (Tehuti returns -1 sentinel, OTel returns NaN when unavailable so SDK drops data point)

Overall block cache miss/hit/add counters remain Tehuti-only — OTel totals are derivable via sum by (component). Block cache hit ratio is also Tehuti-only (derivable from per-component OTel gauges).
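The split-callback design for read amplification can be sketched as follows. This is an illustrative snippet only (the class and field names are hypothetical, not the actual Venice types): the same underlying value feeds both reporting paths, but each path uses its own "unavailable" sentinel, so Tehuti still emits a data point while the OTel SDK drops it.

```java
import java.util.function.DoubleSupplier;

public class SplitCallbackSketch {
    // Stand-in for the underlying RocksDB property; null simulates "not yet available".
    static Double rawReadAmplification = null;

    // Tehuti path: -1 sentinel, so the gauge still emits a data point.
    static final DoubleSupplier tehutiCallback =
        () -> rawReadAmplification != null ? rawReadAmplification : -1.0;

    // OTel path: NaN, so the SDK drops the data point entirely.
    static final DoubleSupplier otelCallback =
        () -> rawReadAmplification != null ? rawReadAmplification : Double.NaN;

    public static void main(String[] args) {
        System.out.println(tehutiCallback.getAsDouble());             // -1.0 while unavailable
        System.out.println(Double.isNaN(otelCallback.getAsDouble())); // true

        rawReadAmplification = 3.5;
        System.out.println(tehutiCallback.getAsDouble());             // 3.5
        System.out.println(otelCallback.getAsDouble());               // 3.5
    }
}
```

The asymmetry is deliberate: a -1 sentinel in OTel would pollute aggregations, whereas a NaN in Tehuti would break consumers expecting numeric values.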

Tehuti → OTel metric mapping
| Tehuti Sensor | OTel Metric | OTel Dimensions | Notes |
| --- | --- | --- | --- |
| rocksdb_block_cache_index_miss | rocksdb.block_cache.miss.count | COMPONENT=INDEX | Per-component |
| rocksdb_block_cache_filter_miss | rocksdb.block_cache.miss.count | COMPONENT=FILTER | Per-component |
| rocksdb_block_cache_data_miss | rocksdb.block_cache.miss.count | COMPONENT=DATA | Per-component |
| rocksdb_block_cache_compression_dict_miss | rocksdb.block_cache.miss.count | COMPONENT=COMPRESSION_DICT | Per-component |
| rocksdb_block_cache_index_hit | rocksdb.block_cache.hit.count | COMPONENT=INDEX | Per-component |
| rocksdb_block_cache_filter_hit | rocksdb.block_cache.hit.count | COMPONENT=FILTER | Per-component |
| rocksdb_block_cache_data_hit | rocksdb.block_cache.hit.count | COMPONENT=DATA | Per-component |
| rocksdb_block_cache_compression_dict_hit | rocksdb.block_cache.hit.count | COMPONENT=COMPRESSION_DICT | Per-component |
| rocksdb_block_cache_index_add | rocksdb.block_cache.add.count | COMPONENT=INDEX | Per-component |
| rocksdb_block_cache_filter_add | rocksdb.block_cache.add.count | COMPONENT=FILTER | Per-component |
| rocksdb_block_cache_data_add | rocksdb.block_cache.add.count | COMPONENT=DATA | Per-component |
| rocksdb_block_cache_compression_dict_add | rocksdb.block_cache.add.count | COMPONENT=COMPRESSION_DICT | Per-component |
| rocksdb_block_cache_index_bytes_insert | rocksdb.block_cache.bytes.inserted | COMPONENT=INDEX | Per-component |
| rocksdb_block_cache_filter_bytes_insert | rocksdb.block_cache.bytes.inserted | COMPONENT=FILTER | Per-component |
| rocksdb_block_cache_data_bytes_insert | rocksdb.block_cache.bytes.inserted | COMPONENT=DATA | Per-component |
| rocksdb_block_cache_compression_dict_bytes_insert | rocksdb.block_cache.bytes.inserted | COMPONENT=COMPRESSION_DICT | Per-component |
| rocksdb_block_cache_add_failures | rocksdb.block_cache.add.failure.count | | Joint |
| rocksdb_block_cache_bytes_read | rocksdb.block_cache.bytes.read | | Joint |
| rocksdb_block_cache_bytes_write | rocksdb.block_cache.bytes.written | | Joint |
| rocksdb_bloom_filter_useful | rocksdb.bloom_filter.useful_count | | Joint |
| rocksdb_memtable_hit | rocksdb.memtable.hit.count | | Joint |
| rocksdb_memtable_miss | rocksdb.memtable.miss.count | | Joint |
| rocksdb_compaction_cancelled | rocksdb.compaction.cancelled_count | | Joint |
| rocksdb_get_hit_l0 | rocksdb.get.hit.count | LEVEL=LEVEL_0 | Per-level |
| rocksdb_get_hit_l1 | rocksdb.get.hit.count | LEVEL=LEVEL_1 | Per-level |
| rocksdb_get_hit_l2_and_up | rocksdb.get.hit.count | LEVEL=LEVEL_2_AND_UP | Per-level |
| rocksdb_read_amplification_factor | rocksdb.read_amplification_factor | | ASYNC_DOUBLE_GAUGE |
| rocksdb_block_cache_miss | Tehuti-only | | OTel derives total from per-component sum |
| rocksdb_block_cache_hit | Tehuti-only | | OTel derives total from per-component sum |
| rocksdb_block_cache_add | Tehuti-only | | OTel derives total from per-component sum |
| rocksdb_block_cache_hit_ratio | Tehuti-only | | Derivable from per-component OTel gauges |

Dimension-to-TickerType mappings use static EnumMaps. A missing mapping causes the liveStateResolver to return null (dormant — no data point emitted) rather than throwing; completeness is enforced by per-enum CI tests (testEveryVeniceRocksDBLevelEmitsForGetHitCount, testEveryVeniceRocksDBBlockCacheComponentEmitsForAllPerComponentMetrics) so adding a new VeniceRocksDBLevel or VeniceRocksDBBlockCacheComponent value without updating the ticker map fails CI rather than spamming failure metrics every collection cycle.
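The EnumMap lookup and its dormant-on-missing contract can be sketched as follows. All enum and ticker names below are simplified stand-ins for illustration, not the actual Venice or RocksDB identifiers; one mapping is deliberately omitted to show both the dormant path and what the completeness guard catches.

```java
import java.util.EnumMap;

public class TickerMapSketch {
    enum Level { LEVEL_0, LEVEL_1, LEVEL_2_AND_UP }
    enum Ticker { GET_HIT_L0, GET_HIT_L1, GET_HIT_L2_AND_UP }

    static final EnumMap<Level, Ticker> LEVEL_TO_TICKER = new EnumMap<>(Level.class);
    static {
        LEVEL_TO_TICKER.put(Level.LEVEL_0, Ticker.GET_HIT_L0);
        LEVEL_TO_TICKER.put(Level.LEVEL_1, Ticker.GET_HIT_L1);
        // LEVEL_2_AND_UP intentionally unmapped to demonstrate the dormant path.
    }

    // Missing mapping returns null: the caller emits no data point instead of throwing.
    static Ticker resolve(Level level) {
        return LEVEL_TO_TICKER.get(level);
    }

    // CI-style completeness guard: fails when any enum value lacks a mapping.
    static boolean everyLevelMapped() {
        for (Level l : Level.values()) {
            if (resolve(l) == null) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(resolve(Level.LEVEL_2_AND_UP)); // null => dormant
        System.out.println(everyLevelMapped());            // false => guard would fail CI
    }
}
```

A per-enum guard like everyLevelMapped() turns a silently missing data series into a deterministic test failure at build time.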

Also fixes: the AggRocksDBStats lambda was capturing the outer metricsRepository constructor parameter instead of the lambda parameter metricsRepo; adds a null guard to setRocksDBStat; and makes the rocksDBStat field volatile for JMM correctness. RocksDBStats does NOT propagate isTotalStats() to OpenTelemetryMetricsSetup: AggRocksDBStats is fire-and-forget in VeniceServer.java, so the totalStats instance is the only RocksDBStats ever created in production, and propagating isTotalStats(true) would suppress every RocksDB OTel emission. This matches the pattern used by RocksDBMemoryStats (the closest single-instance analog).
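The lambda-capture bug described above can be illustrated with a minimal sketch (hypothetical factory interface and string-valued repositories, not the actual Venice types): the buggy factory closes over the outer constructor parameter, so every invocation silently ignores the repository it was handed.

```java
public class LambdaCaptureSketch {
    interface StatsFactory {
        String create(String metricsRepo);
    }

    static StatsFactory buggy(String metricsRepository) {
        // BUG: captures the outer metricsRepository; the lambda parameter is ignored.
        return metricsRepo -> "stats(" + metricsRepository + ")";
    }

    static StatsFactory fixed(String metricsRepository) {
        // FIX: uses the lambda's own parameter.
        return metricsRepo -> "stats(" + metricsRepo + ")";
    }

    public static void main(String[] args) {
        System.out.println(buggy("outer").create("inner")); // stats(outer) -- wrong repo
        System.out.println(fixed("outer").create("inner")); // stats(inner) -- correct
    }
}
```

Shadow-prone names like metricsRepository vs. metricsRepo make this class of bug easy to introduce and hard to spot in review, which is why the fix also renamed the lambda parameter distinctly.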

Code changes

  • Added new code behind a config: gated by the existing rocksdb.statistics.enabled (default: false).
  • Introduced new log lines.
    • Confirmed if logs need to be rate limited to avoid excessive logging.

Concurrency-Specific Checks

Both reviewer and PR author to verify

  • Code has no race conditions or thread safety issues.
  • Proper synchronization mechanisms (e.g., synchronized, RWLock) are used where needed.
  • No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
  • Verified thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList).
  • Validated proper exception handling in multi-threaded code to avoid silent thread termination.

How was this PR tested?

  • New unit tests added.
  • New integration tests added.
  • Modified or extended existing tests.
  • Verified backward compatibility (if applicable).

New test files:

  • RocksDBStatsOtelTest (15 tests):
    • Per-component dimension validation for all 4 block cache metrics × 4 components, joint metrics, and SST-level dimensions.
    • Read amplification ASYNC_DOUBLE_GAUGE including the NaN edge case.
    • Pre-init behavior: joint AsyncMetricEntityStateBase metrics emit the -1 sentinel; per-component / per-level AsyncMetricEntityStateOneEnum metrics honor the dormant contract and emit no data point.
    • Per-enum completeness guards (testEveryVeniceRocksDBLevelEmitsForGetHitCount, testEveryVeniceRocksDBBlockCacheComponentEmitsForAllPerComponentMetrics).
    • isTotalStats regression guard: testOtelEmitsForTotalStatsName constructs RocksDBStats(repo, "total", ...) and asserts emission, so re-introducing isTotalStats() propagation fails CI.
    • NPE prevention (OTel disabled + plain MetricsRepository).
    • Tests use a dedicated AsyncGauge.AsyncGaugeExecutor per VeniceMetricsRepository so close() doesn't shut down the JVM-wide singleton executor and break later AsyncGauge tests.
  • RocksDBStatsTehutiTest (5 tests): Tehuti sensor baseline (ticker values, hit ratio formula, read amplification -1 sentinel, NaN on zero denominator, setRocksDBStat guards)
  • RocksDBStatsOtelMetricEntityTest: validates all 13 entity definitions via ModuleMetricEntityTestFixture
  • VeniceRocksDBBlockCacheComponentTest + VeniceRocksDBLevelTest: dimension enum validation via VeniceDimensionTestFixture
  • VeniceMetricsDimensionsTest: updated for both new dimensions across all 3 naming formats
  • ServerMetricEntityTest: count updated from 159 to 172

Does this PR introduce any user-facing or breaking changes?

  • No. You can skip the rest of this section.
  • Yes. Clearly explain the behavior change and its impact.

Copilot AI review requested due to automatic review settings April 13, 2026 10:02

Copilot AI left a comment


Pull request overview

Adds OpenTelemetry (OTel) metric coverage for RocksDB ticker-based stats to enable migration from Tehuti-only monitoring, including new dimension enums for per-component/per-level breakdowns.

Changes:

  • Add 13 new OTel metric entities for RocksDB ticker stats (per-component block cache, per-level get-hit, joint counters, and read amplification).
  • Wire the new metric entities into server metric aggregation and fix AggRocksDBStats to pass the correct MetricsRepository/cluster name.
  • Add new dimension enums + comprehensive unit tests for both Tehuti and OTel reporting paths.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| services/venice-server/src/main/java/com/linkedin/venice/stats/RocksDBStats.java | Registers Tehuti ticker gauges and adds OTel async metrics with component/level dimensions plus read amplification behavior. |
| services/venice-server/src/main/java/com/linkedin/venice/stats/AggRocksDBStats.java | Fixes factory lambda to use the lambda repo parameter and passes cluster name into RocksDBStats. |
| services/venice-server/src/test/java/com/linkedin/venice/stats/RocksDBStatsTehutiTest.java | Adds Tehuti-baseline tests for ticker gauges, hit ratio edge case, read amplification sentinel behavior, and setter guards. |
| services/venice-server/src/test/java/com/linkedin/venice/stats/RocksDBStatsOtelTest.java | Adds OTel emission tests validating attributes/dimensions, NaN-drop behavior, and NPE safety when OTel is disabled/plain repo. |
| internal/venice-client-common/src/main/java/com/linkedin/venice/stats/dimensions/VeniceRocksDBLevel.java | Introduces VENICE_ROCKSDB_LEVEL dimension enum. |
| internal/venice-client-common/src/main/java/com/linkedin/venice/stats/dimensions/VeniceRocksDBBlockCacheComponent.java | Introduces VENICE_ROCKSDB_BLOCK_CACHE_COMPONENT dimension enum. |
| internal/venice-client-common/src/main/java/com/linkedin/venice/stats/dimensions/VeniceMetricsDimensions.java | Adds the two new dimension keys to the shared dimension registry. |
| internal/venice-client-common/src/test/java/com/linkedin/venice/stats/dimensions/VeniceRocksDBLevelTest.java | Validates VeniceRocksDBLevel dimension values via the dimension fixture. |
| internal/venice-client-common/src/test/java/com/linkedin/venice/stats/dimensions/VeniceRocksDBBlockCacheComponentTest.java | Validates VeniceRocksDBBlockCacheComponent dimension values via the dimension fixture. |
| internal/venice-client-common/src/test/java/com/linkedin/venice/stats/dimensions/VeniceMetricsDimensionsTest.java | Updates naming-format assertions for the two new dimensions (snake/camel/pascal). |
| clients/da-vinci-client/src/main/java/com/linkedin/davinci/stats/RocksDBStatsOtelMetricEntity.java | Defines the new RocksDBStats OTel MetricEntity set (13 entities). |
| clients/da-vinci-client/src/main/java/com/linkedin/davinci/stats/ServerMetricEntity.java | Registers RocksDBStatsOtelMetricEntity into the server metric entity aggregation list. |
| clients/da-vinci-client/src/test/java/com/linkedin/davinci/stats/RocksDBStatsOtelMetricEntityTest.java | Adds entity-definition validation for all new metric entities/dimensions. |
| clients/da-vinci-client/src/test/java/com/linkedin/davinci/stats/ServerMetricEntityTest.java | Updates the expected aggregated entity count after adding the new enum. |


Comment thread services/venice-server/src/main/java/com/linkedin/venice/stats/RocksDBStats.java Outdated
@m-nagarajan
Contributor Author

services/venice-server/src/main/java/com/linkedin/venice/stats/RocksDBStats.java:829

💡 [SUGGESTION] Double-lambda wrapper in registerJointTickerMetric: reviewed, no change needed

The helper method creates a LongSupplier callback, then wraps it in a second lambda for Tehuti:

LongSupplier callback = () -> rocksDBStat != null ? rocksDBStat.getTickerCount(tickerType) : -1;
registerSensorIfAbsent(new AsyncGauge((ig, ig2) -> callback.getAsLong(), tehutiSensorName));

AsyncGauge accepts any MetricValueProvider, and the (ig, ig2) -> callback.getAsLong() wrapper exists only to bridge LongSupplier → MetricValueProvider. Since AsyncGauge does not accept a LongSupplier directly, and the callback variable is reused for both the Tehuti and OTel registrations, the two-lambda approach is already the minimal implementation. Minor note only; no code change needed.

@m-nagarajan
Copy link
Copy Markdown
Contributor Author

services/venice-server/src/main/java/com/linkedin/venice/stats/RocksDBStats.java:660

💡 [SUGGESTION] Overall block cache miss/hit/add OTel derivability should be documented at the caller site

The three overall block cache counters (BLOCK_CACHE_MISS, BLOCK_CACHE_HIT, BLOCK_CACHE_ADD) are registered as Tehuti-only with a comment "OTel derives totals via sum by (component)". The design is sound and documented in the class Javadoc, but the inline comments at the registration sites (lines 661–663) only say "Tehuti-only, OTel derives from per-component sum". For operators migrating alert rules from Tehuti to OTel, knowing the exact PromQL/OTel aggregation query would reduce friction. The class-level Javadoc already describes this, but the inline comment at the registration could reference it.

Suggested fix:
Add a short comment at the registerTickerSensor("rocksdb_block_cache_miss", ...) etc. lines pointing back to the class Javadoc: // Tehuti-only; OTel total derivable via: sum without (component) (rocksdb.block_cache.miss.count)

@m-nagarajan m-nagarajan added the needs-reviewer Looking for a reviewer to pick this up label Apr 20, 2026
Add 13 OTel metrics alongside existing 27 Tehuti sensors for RocksDB
ticker-based statistics (gated by rocksdb.statistics.enabled).

Per-component block cache metrics (miss, hit, add, bytes_inserted) are
consolidated into 4 OTel metrics with a BLOCK_CACHE_COMPONENT dimension
(INDEX/FILTER/DATA/COMPRESSION_DICT). Get-hit-by-level uses a LEVEL
dimension (LEVEL_0/LEVEL_1/LEVEL_2_AND_UP). Overall block cache
miss/hit/add are Tehuti-only (OTel derives via sum by component).
Block cache hit ratio is Tehuti-only (derivable from per-component
OTel gauges). Read amplification uses ASYNC_DOUBLE_GAUGE with split
callbacks (Tehuti returns -1, OTel returns NaN when unavailable).

Also fixes: AggRocksDBStats lambda capturing outer metricsRepository
instead of lambda parameter, and adds null guard to setRocksDBStat.
- Make rocksDBStat volatile for JMM correctness
- Rename block cache byte metrics for consistency: bytes.read, bytes.written
(matches bytes.inserted)
Copilot AI review requested due to automatic review settings May 2, 2026 02:07
@m-nagarajan m-nagarajan force-pushed the mnagaraj/addOtelForRocksDbStats branch from ed688f6 to 80deab0 Compare May 2, 2026 02:07

Copilot AI left a comment


Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.



Comment thread services/venice-server/src/main/java/com/linkedin/venice/stats/RocksDBStats.java Outdated
@sushantmane sushantmane added review-in-progress Reviewer is actively reviewing and removed needs-reviewer Looking for a reviewer to pick this up labels May 4, 2026
sushantmane
sushantmane previously approved these changes May 4, 2026
Contributor

@sushantmane sushantmane left a comment


LGTM

@sushantmane sushantmane added approved PR has been approved and removed review-in-progress Reviewer is actively reviewing labels May 4, 2026
# Conflicts:
#	clients/da-vinci-client/src/main/java/com/linkedin/davinci/stats/ServerMetricEntity.java
#	clients/da-vinci-client/src/test/java/com/linkedin/davinci/stats/ServerMetricEntityTest.java
#	internal/venice-client-common/src/main/java/com/linkedin/venice/stats/dimensions/VeniceMetricsDimensions.java
#	internal/venice-client-common/src/test/java/com/linkedin/venice/stats/dimensions/VeniceMetricsDimensionsTest.java
Copilot AI review requested due to automatic review settings May 5, 2026 07:21

Copilot AI left a comment


Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

services/venice-server/src/main/java/com/linkedin/venice/stats/RocksDBStats.java:318

  • setRocksDBStat is still racy under concurrent callers: two threads can both observe rocksDBStat == null, both skip the exception, and the later write silently overwrites the first one. volatile fixes visibility for readers but does not make this one-time initialization check atomic.
  public void setRocksDBStat(Statistics stat) {
    if (stat == null) {
      throw new IllegalArgumentException("'stat' must not be null");
    }
    if (this.rocksDBStat != null) {
      throw new VeniceException("'rocksDBStat' has already been initialized");
    }
    this.rocksDBStat = stat;
  }
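One way to close this race is an atomic one-time initialization via compareAndSet. This is a hedged sketch only, with a stand-in Statistics type and IllegalStateException in place of the VeniceException used in the actual code:

```java
import java.util.concurrent.atomic.AtomicReference;

public class OneTimeInitSketch {
    // Stand-in for org.rocksdb.Statistics.
    static class Statistics { }

    private final AtomicReference<Statistics> rocksDBStat = new AtomicReference<>();

    public void setRocksDBStat(Statistics stat) {
        if (stat == null) {
            throw new IllegalArgumentException("'stat' must not be null");
        }
        // compareAndSet makes the null-check-then-write a single atomic step,
        // so exactly one concurrent caller can win; all others throw.
        if (!rocksDBStat.compareAndSet(null, stat)) {
            throw new IllegalStateException("'rocksDBStat' has already been initialized");
        }
    }

    public Statistics getRocksDBStat() {
        return rocksDBStat.get();
    }
}
```

AtomicReference#get also provides the visibility guarantees that the volatile field currently supplies, so readers need no further change under this approach.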


@m-nagarajan m-nagarajan merged commit 9cd7569 into linkedin:main May 5, 2026
110 checks passed

Labels

approved PR has been approved

3 participants