fix: measure latencySecond with monotonic high-resolution clock#200
fix: measure latencySecond with monotonic high-resolution clock#200cre8ivejp wants to merge 7 commits into
Conversation
Signed-off-by: Alessandro Yuichi Okimoto <yuichijpn@gmail.com>
There was a problem hiding this comment.
Pull request overview
Updates Node SDK latency measurement to use a monotonic, high-resolution clock so sub-millisecond operations no longer emit latencySecond: 0 and get rejected by the backend.
Changes:
- Added
latencyStart()/latencySecondsSince()helpers backed byprocess.hrtime.bigint(). - Switched client and cache processors to measure latency using the new helpers instead of
Date.now()/ injectedclock.getTime(). - Added/updated unit tests to cover the new helpers and adjusted polling tests to avoid asserting exact latency values.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/utils/time.ts |
Introduces monotonic, high-resolution latency timing helpers. |
src/client.ts |
Uses new helpers for remote/local evaluation latency metrics. |
src/cache/processor/featureFlagCacheProcessor.ts |
Uses new helpers for gRPC call latency metrics. |
src/cache/processor/segmentUsersCacheProcessor.ts |
Uses new helpers for gRPC call latency metrics. |
src/__tests__/time.ts |
Adds unit/regression tests for the new timing helpers. |
src/__tests__/cache/processor/featureCache/polling.ts |
Updates polling assertions to accept real elapsed latency. |
src/__tests__/cache/processor/segementUsersCache/polling.ts |
Updates polling assertions to accept real elapsed latency. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
also revert recent changes in cache polling and processors
|
I reverted the changes to route latency timing through an abstract Clock class (which internally delegates to process.hrtime.bigint()). Also included in the latest commit:
Clock Wrapper vs. Direct Helper CallsBackgroundIssue This note explains why The problem with calling helpers directly
// Not injectable — no seam for tests
import { latencyStart, latencySecondsSince } from '../utils/time';
class FeatureFlagCacheProcessor {
measure() {
const start = latencyStart();
// ... work ...
return latencySecondsSince(start);
}
}There is no way for a test to substitute a fake timer without reaching all the way down to Why not just stub the module with sinon?
Scope leak risk — module stubbing patches a global object for the entire process durante that test. A missed ESM incompatibility — native ESM live bindings are read-only. Hidden dependency — with injection the dependency is declared in the type signature: constructor({ clock }: { clock: Clock })Anyone reading the class knows it needs a clock. With module stubbing, the dependency is invisible until you read the full implementation. Module stubbing is appropriate for testing Why Clock exists
// Injectable — tests control the clock
constructor({ clock, ... }: { clock: Clock; ... }) {
this.clock = clock;
}Used in: Tests for those components pass a real Test layer ownership
Each layer only stubs its direct dependency. Skipping
|
chore(test): add client latency test chore: inject clock to all client tests
046cabf to
91519f7
Compare
|
@cre8ivejp I updated please help me to take a look |
Summary
Replace
Date.now()-based latency measurement withprocess.hrtime.bigint()so that sub-millisecond operations no longer reportlatencySecond: 0and get rejected by the backend withduration is nil and latencySecond is 0: gateway: metrics event has invalid duration.Why
The Node SDK measured latency as:
Date.now()is wall-clock and integer-millisecond resolution, so for any operation that completes in less than 1 ms the diff is0and the SDK shipslatencySecond: 0.0. The backend'sLatencyMetricsEventvalidation correctly flags that as invalid (proto3doublehas no field-presence, so0is indistinguishable from "unset"):This was reproduced empirically: a 100 000-iteration sweep showed
Date.now() - Date.now()returning0in 99 997 / 100 000 consecutive calls, and the exact wire payload shipped to the backend was{"apiId":4,"latencySecond":0,"labels":{"tag":"nodejs"},"@type":".../LatencyMetricsEvent"}. The hottest source of these in production isgetEvaluationLocally(in-memory evaluation typically completes in < 1 ms), with the cache processors as a secondary source.The same issue affected the Android client SDK (
System.currentTimeMillis()); fixed in a separate PR.The other Bucketeer SDKs are unaffected because they already use higher-resolution monotonic timers — iOS uses
Date().timeIntervalSince(...)(sub-µsDouble), Go usestime.Since(...)(ns).What changed
src/utils/time.ts: addlatencyStart()andlatencySecondsSince(start)helpers backed byprocess.hrtime.bigint()(monotonic, sub-microsecond resolution; subtraction is performed inbigintfirst to preserve precision on long-running processes).src/client.ts: switchgetEvaluationRemotelyandgetEvaluationLocallyto the new helpers.src/cache/processor/featureFlagCacheProcessor.ts,src/cache/processor/segmentUsersCacheProcessor.ts: switch their gRPC-call latency timers to the new helpers. Theclockoption is left in place for source compatibility but is no longer used to time latency.src/__tests__/time.ts: new unit tests, including a regression test that assertslatencySecondsSince(...) > 0for an awaited microtask (the exact pattern ofgetEvaluationLocally) across 100 iterations, plus a sub-millisecond-resolution test that the oldDate.now()clock would have been physically incapable of satisfying.src/__tests__/cache/processor/{featureCache,segementUsersCache}/polling.ts: the existing tests injected a mockclock.getTime()returning hand-picked values to assert specificlatency: 3.21etc. Since latency now comes fromprocess.hrtime.bigint()(un-mockable), assertions were tightened to "pushLatencyMetricsEventis emitted thrice with a numericlatency" — the meaningful invariant that survives the fix.No backend or proto changes are required.
Test plan
make test— 387/387 passing, including the 6 newtime › …tests and the two updated polling tests.make type-check— passes (tsc --noEmitovertsconfig.json,tsconfig.test.json,e2e/tsconfig.json).