Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces tracking for retry attempts within OpenTelemetry spans for BigQuery operations. It utilizes a ContextKey to store an AtomicInteger in the OpenTelemetry context, allowing HttpTracingRequestInitializer to increment and record the http.request.resend_count attribute on outgoing request spans. The PR also includes visibility adjustments for testing and adds new integration and unit tests to verify the telemetry data. One piece of feedback suggests removing a redundant null check in BigQueryRetryHelper to improve code clarity.
...query/google-cloud-bigquery/src/main/java/com/google/cloud/bigquery/BigQueryRetryHelper.java
Outdated
Show resolved
Hide resolved
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
| if (attemptTracker != null) { | ||
| int attempt = attemptTracker.getAndIncrement(); | ||
| if (attempt > 0) { | ||
| span.setAttribute(HTTP_REQUEST_RESEND_COUNT, (long) attempt); |
There was a problem hiding this comment.
This logic looks good to me. However, is there a way to implement it without using OpenTelemetry Context so it can be reused for other things?
For example, metrics may also need this attribute in the future. We can call Context.current() as well if traces are enabled, but it is also possible that customers only enable metrics but not traces.
There was a problem hiding this comment.
I have 3 options for how to implement this:
-
Add a retry counter object per API request. This would require modifying all API calls, as currently retries are handled via static method call toBigQueryRetryHelper.runWithRetries(. This is not scalable and I would prefer not to implement this solution.
-
The solution proposed in this PR: use the OpenTelemetry Context (which uses its own ThreadLocal object for storage)
-
create our own ThreadLocal object in BigQueryRetryHelper that stores this value that can be also accessed by metrics.
Let me know if you are okay with implementing choice no. 3
cc @lqiu96 , in case you have some input.
There was a problem hiding this comment.
I'm not too familiar with Otel's Context so I wouldn't know when/ when not to use it. If there are concerns with it, then we don't have to go this route.
Option 3 isn't my favorite (using static methods and thread locals to track retries), but to Blake's point, it can be re-used for both metrics and traces (assuming that Context is a trace specific thing).
Since metrics are in scope, a route forward can be to use Option 3 and then migrate to Option 1 in the future. WDYT?
This adds the http.request.resend_count to span tracing in the case of a retry. We achieve this by adding a context to the parent span that tracks all retries. This remains at null for the initial retry and then gets incremented for all subsequent retries.
This PR also introduces integration tests to fully validate the resent attribute.
example trace with resent attribute set