Skip to content

fix: measure latencySecond with monotonic high-resolution clock#200

Open
cre8ivejp wants to merge 7 commits into
masterfrom
fix-invalid-duration
Open

fix: measure latencySecond with monotonic high-resolution clock#200
cre8ivejp wants to merge 7 commits into
masterfrom
fix-invalid-duration

Conversation

@cre8ivejp
Copy link
Copy Markdown
Member

Summary

Replace Date.now()-based latency measurement with process.hrtime.bigint() so that sub-millisecond operations no longer report latencySecond: 0 and get rejected by the backend with duration is nil and latencySecond is 0: gateway: metrics event has invalid duration.

Why

The Node SDK measured latency as:

const startTime = Date.now();
// ... operation ...
const second = (Date.now() - startTime) / 1000;
this.eventEmitter.emit('pushLatencyMetricsEvent', { latency: second, apiId });

Date.now() is wall-clock and integer-millisecond resolution, so for any operation that completes in less than 1 ms the diff is 0 and the SDK ships latencySecond: 0.0. The backend's LatencyMetricsEvent validation correctly flags that as invalid (proto3 double has no field-presence, so 0 is indistinguishable from "unset"):

// pkg/api/api/metrics_event.go
if ev.Duration == nil && ev.LatencySecond == 0 {
    return fmt.Errorf("duration is nil and latencySecond is 0: %w", MetricsSaveErrInvalidDuration)
}

This was reproduced empirically: a 100 000-iteration sweep showed Date.now() - Date.now() returning 0 in 99 997 / 100 000 consecutive calls, and the exact wire payload shipped to the backend was {"apiId":4,"latencySecond":0,"labels":{"tag":"nodejs"},"@type":".../LatencyMetricsEvent"}. The hottest source of these in production is getEvaluationLocally (in-memory evaluation typically completes in < 1 ms), with the cache processors as a secondary source.

The same issue affected the Android client SDK (System.currentTimeMillis()); fixed in a separate PR.

The other Bucketeer SDKs are unaffected because they already use higher-resolution monotonic timers — iOS uses Date().timeIntervalSince(...) (sub-µs Double), Go uses time.Since(...) (ns).

What changed

  • src/utils/time.ts: add latencyStart() and latencySecondsSince(start) helpers backed by process.hrtime.bigint() (monotonic, sub-microsecond resolution; subtraction is performed in bigint first to preserve precision on long-running processes).
  • src/client.ts: switch getEvaluationRemotely and getEvaluationLocally to the new helpers.
  • src/cache/processor/featureFlagCacheProcessor.ts, src/cache/processor/segmentUsersCacheProcessor.ts: switch their gRPC-call latency timers to the new helpers. The clock option is left in place for source compatibility but is no longer used to time latency.
  • src/__tests__/time.ts: new unit tests, including a regression test that asserts latencySecondsSince(...) > 0 for an awaited microtask (the exact pattern of getEvaluationLocally) across 100 iterations, plus a sub-millisecond-resolution test that the old Date.now() clock would have been physically incapable of satisfying.
  • src/__tests__/cache/processor/{featureCache,segementUsersCache}/polling.ts: the existing tests injected a mock clock.getTime() returning hand-picked values to assert specific latency: 3.21 etc. Since latency now comes from process.hrtime.bigint() (un-mockable), assertions were tightened to "pushLatencyMetricsEvent is emitted thrice with a numeric latency" — the meaningful invariant that survives the fix.

No backend or proto changes are required.

Test plan

  • make test — 387/387 passing, including the 6 new time › … tests and the two updated polling tests.
  • make type-check — passes (tsc --noEmit over tsconfig.json, tsconfig.test.json, e2e/tsconfig.json).

Signed-off-by: Alessandro Yuichi Okimoto <yuichijpn@gmail.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates Node SDK latency measurement to use a monotonic, high-resolution clock so sub-millisecond operations no longer emit latencySecond: 0 and get rejected by the backend.

Changes:

  • Added latencyStart() / latencySecondsSince() helpers backed by process.hrtime.bigint().
  • Switched client and cache processors to measure latency using the new helpers instead of Date.now() / injected clock.getTime().
  • Added/updated unit tests to cover the new helpers and adjusted polling tests to avoid asserting exact latency values.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/utils/time.ts Introduces monotonic, high-resolution latency timing helpers.
src/client.ts Uses new helpers for remote/local evaluation latency metrics.
src/cache/processor/featureFlagCacheProcessor.ts Uses new helpers for gRPC call latency metrics.
src/cache/processor/segmentUsersCacheProcessor.ts Uses new helpers for gRPC call latency metrics.
src/__tests__/time.ts Adds unit/regression tests for the new timing helpers.
src/__tests__/cache/processor/featureCache/polling.ts Updates polling assertions to accept real elapsed latency.
src/__tests__/cache/processor/segementUsersCache/polling.ts Updates polling assertions to accept real elapsed latency.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/__tests__/time.ts Outdated
Comment thread src/__tests__/time.ts
Comment thread src/__tests__/cache/processor/featureCache/polling.ts
Comment thread src/__tests__/cache/processor/segementUsersCache/polling.ts Outdated
also revert recent changes in cache polling and processors
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Comment thread src/__tests__/time.ts Outdated
@duyhungtnn duyhungtnn marked this pull request as draft May 9, 2026 11:57
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

Comment thread src/__tests__/time.ts
Comment thread src/__tests__/clock.ts
Comment thread src/__tests__/cache/processor/featureCache/polling.ts
Comment thread src/__tests__/cache/processor/segementUsersCache/polling.ts
Comment thread src/cache/processor/featureFlagCacheProcessor.ts
Comment thread src/cache/processor/segmentUsersCacheProcessor.ts
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Comment thread src/__tests__/time.ts
@duyhungtnn
Copy link
Copy Markdown
Collaborator

duyhungtnn commented May 9, 2026

I reverted the changes to route latency timing through an abstract Clock class (which internally delegates to process.hrtime.bigint()).
The Clock abstraction was already used previously.
It helps make unit tests easier to set up — by stubbing latencyStart/latencySecondsSince directly on the clock instance, we avoid patching process.hrtime.bigint at the module level, which is a common source of flaky tests.

Also included in the latest commit:

  • Restored the latency test that was accidentally removed in the previous commit. (00da8ad)
  • Fixed a flaky test. (064729d)

Clock Wrapper vs. Direct Helper Calls

Background

Issue Clock instances were replaced with direct calls to the latencyStart / latencySecondsSince helpers. This removed the injectable seam and caused several tests to silently lose their timing control — some tests were missing, others became coupled to process.hrtime internals.

This note explains why Clock must be kept and why the helpers alone are not a substitute.

The problem with calling helpers directly

latencyStart() and latencySecondsSince() in time.ts are statically imported free functions. If a component calls them directly:

// Not injectable — no seam for tests
import { latencyStart, latencySecondsSince } from '../utils/time';

class FeatureFlagCacheProcessor {
  measure() {
    const start = latencyStart();
    // ... work ...
    return latencySecondsSince(start);
  }
}

There is no way for a test to substitute a fake timer without reaching all the way down to process.hrtime.bigint — the wrong layer to be testing at.

Why not just stub the module with sinon?

sinon.stub(timeUtils, 'latencySecondsSince') looks like an easy escape hatch, but it has meaningful drawbacks when used in processor/component tests:

Scope leak risk — module stubbing patches a global object for the entire process durante that test. A missed restore() or concurrent test run causes the stub to bleed into unrelated tests. An injected Clock only affects the one instance under test.

ESM incompatibility — native ESM live bindings are read-only. sinon.stub on an ESM-exported function may silently fail or throw. Constructor injection works regardless of module format.

Hidden dependency — with injection the dependency is declared in the type signature:

constructor({ clock }: { clock: Clock })

Anyone reading the class knows it needs a clock. With module stubbing, the dependency is invisible until you read the full implementation.

Module stubbing is appropriate for testing Clock itself (verifying it delegates to the helpers). It is not a substitute for injection in the components that use Clock.

Why Clock exists

Clock is a thin injectable wrapper. Every component that needs timing receives it through its constructor:

// Injectable — tests control the clock
constructor({ clock, ... }: { clock: Clock; ... }) {
  this.clock = clock;
}

Used in: featureFlagCacheProcessor.ts, segmentUsersCacheProcessor.ts, client.ts, evaluator.ts.

Tests for those components pass a real Clock or a stub — without touching process or module internals at all.

Test layer ownership

Layer Stubs
time.ts helpers process.hrtime.bigint
Clock class timeUtils.latencyStart / timeUtils.latencySecondsSince
Processors / Client / Evaluator a real or mocked Clock instance

Each layer only stubs its direct dependency. Skipping Clock collapses two layers into one and breaks this contract.

Clock has no logic of its own. Its sole purpose is the injectable seam — removing it in favour of direct helper calls removes testability, not complexity.

@duyhungtnn duyhungtnn marked this pull request as ready for review May 10, 2026 00:48
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated no new comments.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 18 changed files in this pull request and generated 2 comments.

Comment thread src/client.ts
Comment thread src/utils/clock.ts
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 18 changed files in this pull request and generated no new comments.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 18 changed files in this pull request and generated no new comments.

chore(test): add client latency test

chore: inject clock to all client tests
@duyhungtnn duyhungtnn force-pushed the fix-invalid-duration branch from 046cabf to 91519f7 Compare May 14, 2026 07:29
@duyhungtnn
Copy link
Copy Markdown
Collaborator

duyhungtnn commented May 14, 2026

@cre8ivejp I updated client.ts to use Clock and added a few more tests. 91519f7

please help me to take a look

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 18 changed files in this pull request and generated no new comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants