Skip to content

[da-vinci] Add OTel metrics to AggVersionedDaVinciRecordTransformerStats#2725

Merged
m-nagarajan merged 6 commits into
linkedin:mainfrom
m-nagarajan:mnagaraj/addOtelMetricsForAggVersionedDaVinciRecordTransformerStats
May 6, 2026
Merged

[da-vinci] Add OTel metrics to AggVersionedDaVinciRecordTransformerStats#2725
m-nagarajan merged 6 commits into
linkedin:mainfrom
m-nagarajan:mnagaraj/addOtelMetricsForAggVersionedDaVinciRecordTransformerStats

Conversation

@m-nagarajan
Copy link
Copy Markdown
Contributor

@m-nagarajan m-nagarajan commented Apr 12, 2026

Problem Statement

DaVinciRecordTransformerStats has 4 Tehuti sensors for record transformer operations (put/delete latency and error counts) but no OTel counterparts, making these metrics unavailable in OTel-based monitoring dashboards.

Solution

Add 2 OTel metrics to AggVersionedDaVinciRecordTransformerStats with a VENICE_RECORD_TRANSFORMER_OPERATION dimension (PUT/DELETE):

  • record_transformer.latency (HISTOGRAM) — DaVinci record transformer operation latency
  • record_transformer.error_count (COUNTER) — DaVinci record transformer operation error count

Uses the separate Tehuti+OTel API: Tehuti metrics are managed by the Reporter layer (DaVinciRecordTransformerStatsReporter) with AsyncGauge polling, while OTel records directly at the call point in AggVersionedDaVinciRecordTransformerStats. This follows the established pattern for versioned stats where Tehuti and OTel have fundamentally different recording paths.

Key design decisions:

  • Per-store VeniceConcurrentHashMap maps for latency and error count metrics, cleaned up in handleStoreDeleted.
  • private final boolean emitOtelMetrics cached at construction; recordOtelLatency and recordOtelErrorCount early-return when OTel is disabled, so the per-store maps are never populated in non-OTel deployments. Mirrors the established AggVersionedDIVStats pattern.
  • @VisibleForTesting accessors (hasLatencyMetricFor, latencyStoreCount, etc.) let tests pin the cleanup contract on handleStoreDeleted without exposing the internal maps.

Code changes

  • Added new code behind a config.
  • Introduced new log lines.
    • Confirmed if logs need to be rate limited to avoid excessive logging.

Concurrency-Specific Checks

Both reviewer and PR author to verify

  • Code has no race conditions or thread safety issues.
  • Proper synchronization mechanisms (e.g., synchronized, RWLock) are used where needed.
  • No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
  • Verified thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList).
  • Validated proper exception handling in multi-threaded code to avoid silent thread termination.

How was this PR tested?

  • New unit tests added.
  • New integration tests added.
  • Modified or extended existing tests.
  • Verified backward compatibility (if applicable).

New tests:

  • AggVersionedDaVinciRecordTransformerStatsOtelTest (9 tests): OTel histogram for put/delete latency, OTel counter for put/delete errors, operation dimension isolation, multi-store isolation (validates per-store entries stay distinct via the doAnswer Store mock so AbstractVeniceAggVersionedStats doesn't collapse them), handleStoreDeleted cleanup contract (asserts per-store maps go from 1 entry → 0 → 1 across record/delete/re-record), NPE prevention with OTel disabled and plain MetricsRepository.
  • DaVinciRecordTransformerOtelMetricEntityTest: metric entity validation for both metrics.
  • VeniceRecordTransformerOperationTest: dimension value validation (PUT/DELETE).

Does this PR introduce any user-facing or breaking changes?

  • No. You can skip the rest of this section.
  • Yes. Clearly explain the behavior change and its impact.

Copilot AI review requested due to automatic review settings April 12, 2026 00:16
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds OpenTelemetry (OTel) coverage for DaVinci record transformer operation stats so they’re available in OTel-based dashboards, aligning this component with the existing metric-entity + dimension patterns used across DaVinci/server stats.

Changes:

  • Introduces a new OTel dimension VENICE_RECORD_TRANSFORMER_OPERATION (PUT/DELETE) and its enum value type.
  • Adds two new OTel metric entities: record_transformer.latency (histogram) and record_transformer.error_count (counter), and wires them into AggVersionedDaVinciRecordTransformerStats.
  • Registers the new metric entity enum in server metric aggregation and adds unit tests validating dimension naming and metric entity definitions/recording.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
internal/venice-client-common/src/main/java/com/linkedin/venice/stats/dimensions/VeniceMetricsDimensions.java Adds the new VENICE_RECORD_TRANSFORMER_OPERATION dimension definition.
internal/venice-client-common/src/main/java/com/linkedin/venice/stats/dimensions/VeniceRecordTransformerOperation.java Introduces PUT/DELETE dimension enum implementing VeniceDimensionInterface.
internal/venice-client-common/src/test/java/com/linkedin/venice/stats/dimensions/VeniceMetricsDimensionsTest.java Extends dimension-name format tests to cover the new dimension.
internal/venice-client-common/src/test/java/com/linkedin/venice/stats/dimensions/VeniceRecordTransformerOperationTest.java Validates dimension value mappings for PUT/DELETE.
clients/da-vinci-client/src/main/java/com/linkedin/davinci/stats/DaVinciRecordTransformerOtelMetricEntity.java Defines OTel metric entities for record transformer latency and error count.
clients/da-vinci-client/src/main/java/com/linkedin/davinci/stats/ServerMetricEntity.java Registers the new metric entity enum so it’s part of server metric aggregation.
clients/da-vinci-client/src/main/java/com/linkedin/davinci/stats/AggVersionedDaVinciRecordTransformerStats.java Records OTel histogram/counter at the call site with per-store cached metric state.
clients/da-vinci-client/src/test/java/com/linkedin/davinci/stats/ServerMetricEntityTest.java Updates expected aggregated metric entity count after adding new entities.
clients/da-vinci-client/src/test/java/com/linkedin/davinci/stats/DaVinciRecordTransformerOtelMetricEntityTest.java Verifies metric entity definitions (name/type/unit/dimensions) for the new entities.
clients/da-vinci-client/src/test/java/com/linkedin/davinci/stats/AggVersionedDaVinciRecordTransformerStatsOtelTest.java Adds OTel recording tests for latency/errors and validates operation-dimension isolation + NPE prevention.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings April 13, 2026 11:34
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@m-nagarajan m-nagarajan added the needs-reviewer Looking for a reviewer to pick this up label Apr 20, 2026
Add 2 OTel metrics for DaVinci record transformer:
- record_transformer.latency (HISTOGRAM with PUT/DELETE operation dimension)
- record_transformer.error_count (COUNTER with PUT/DELETE operation dimension)

Separate Tehuti+OTel API: Tehuti uses the Reporter layer (AsyncGauge polling),
OTel records at the call point in AggVersionedDaVinciRecordTransformerStats.
Per-store VeniceConcurrentHashMap maps for latency and error count metrics,
cleaned up in handleStoreDeleted.
- Add emitOtelMetrics guard to skip OTel recording when disabled
- Fix Javadoc: reference AggVersionedDaVinciRecordTransformerStats (not Stats
class)
- Merge getOrCreateLatencyMetric/getOrCreateErrorCountMetric into single
getOrCreateMetric
@m-nagarajan m-nagarajan force-pushed the mnagaraj/addOtelMetricsForAggVersionedDaVinciRecordTransformerStats branch from 1ff7662 to 2818990 Compare April 30, 2026 22:14
Copilot AI review requested due to automatic review settings April 30, 2026 22:21
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

@sushantmane sushantmane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed with help from a few specialized review agents. Four actionable items below — two important (matching the established AggVersionedDIVStats short-circuit pattern, and a multi-store mock that silently collapses), two suggestions. The HISTOGRAM+COUNTER split, separate Tehuti+OTel API choice, and handleStoreDeleted cleanup all look correct.

Separate from the inline comments: the PR is currently CONFLICTING with main and will need a rebase before merge.

@sushantmane sushantmane added requested-changes Reviewer requested changes and removed needs-reviewer Looking for a reviewer to pick this up labels May 5, 2026
mnagaraj/addOtelMetricsForAggVersionedDaVinciRecordTransformerStats

# Conflicts:
#	clients/da-vinci-client/src/main/java/com/linkedin/davinci/stats/ServerMetricEntity.java
#	clients/da-vinci-client/src/test/java/com/linkedin/davinci/stats/ServerMetricEntityTest.java
#	internal/venice-client-common/src/main/java/com/linkedin/venice/stats/dimensions/VeniceMetricsDimensions.java
#	internal/venice-client-common/src/test/java/com/linkedin/venice/stats/dimensions/VeniceMetricsDimensionsTest.java
Copilot AI review requested due to automatic review settings May 6, 2026 05:31
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.

@m-nagarajan m-nagarajan added review-addressed Author has addressed review comments and removed requested-changes Reviewer requested changes labels May 6, 2026
Copy link
Copy Markdown
Contributor

@sushantmane sushantmane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@sushantmane sushantmane added approved PR has been approved and removed review-addressed Author has addressed review comments labels May 6, 2026
@m-nagarajan m-nagarajan merged commit 55a09c7 into linkedin:main May 6, 2026
110 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved PR has been approved

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants