Skip to content

fix: normalize rollout timestamps before deriving started_at/ended_at#556

Open
eric-tramel wants to merge 1 commit intomainfrom
worktree-issue-497-timestamp-normalization
Open

fix: normalize rollout timestamps before deriving started_at/ended_at#556
eric-tramel wants to merge 1 commit intomainfrom
worktree-issue-497-timestamp-normalization

Conversation

@eric-tramel
Copy link
Copy Markdown
Contributor

📋 Summary

Parse ISO 8601 timestamps before comparing them in the Claude Code, Codex, and ATIF rollout ingesters. Raw string min()/max() can order chronologically-earlier timestamps as later when offsets or precisions differ (e.g. 2025-01-01T00:30:00+01:00 vs 2025-01-01T00:00:00Z), which could produce incorrect started_at / ended_at on otherwise valid rollouts.

🔗 Related Issue

Fixes #497

🔄 Changes

  • Add min_max_timestamps() + parse_iso8601() helpers in agent_rollout/utils.py. Naive timestamps are treated as UTC; unparseable values are silently skipped (matching the existing tolerance of coerce_optional_str). Winners are returned in their original string form, so the output contract is unchanged.
  • Swap min(timestamps) / max(timestamps) for the helper in claude_code.py, codex.py, and atif.py. Codex's session_meta["timestamp"] short-circuit for started_at is preserved.
  • Add a parameterized unit test covering the four interesting cases: mixed offsets, mixed precisions, naive-vs-aware, and unparseable-skipping.

🧪 Testing

  • make test passes (engine suite, 1867 tests; pre-existing interface failures unrelated to this change)
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

Claude Code, Codex, and ATIF ingesters called min()/max() on raw
timestamp strings. That assumes lexicographic order matches chronological
order, which breaks for ISO 8601 timestamps with mixed UTC offsets or
precisions (e.g. 2025-01-01T00:30:00+01:00 is earlier than
2025-01-01T00:00:00Z but sorts later as a string).

Introduce min_max_timestamps() in the shared rollout utils that parses
each value as ISO 8601 (naive values treated as UTC, unparseable values
skipped) and picks the chronological extremes, returning them in their
original string form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
@eric-tramel eric-tramel requested a review from a team as a code owner April 17, 2026 14:13
@github-actions
Copy link
Copy Markdown
Contributor

Code Review: PR #556 — fix: normalize rollout timestamps before deriving started_at/ended_at

Summary

This PR fixes a correctness bug in the agent rollout ingesters (Claude Code, Codex, ATIF) where started_at and ended_at were derived by calling min()/max() on raw ISO 8601 timestamp strings. Lexicographic comparison produces wrong results when timestamps have differing UTC offsets or sub-second precisions — e.g. "2025-01-01T00:30:00+01:00" sorts after "2025-01-01T00:00:00Z" as a string, but is chronologically 30 minutes earlier.

The fix introduces two helpers in utils.pyparse_iso8601() and min_max_timestamps() — that parse timestamps into datetime objects before comparison, then return the winners in their original string form so the output contract is unchanged. All three ingesters are updated to use the new helper. A parameterized test covers the key edge cases.

Files changed: 5 (3 ingesters, 1 utility module, 1 new test file)
Lines: +88 / -7

Findings

Correctness

  • Fix is sound. Parsing to datetime before comparison correctly handles mixed offsets, mixed precisions, and naive-vs-aware timestamps. The Z+00:00 replacement before fromisoformat() is the standard workaround for Python < 3.11 compatibility and is safe here since Z only appears as a trailing timezone designator in ISO 8601.
  • Output contract preserved. min_max_timestamps() returns the original string values, not stringified datetime objects, so downstream consumers are unaffected.
  • Codex short-circuit preserved. In codex.py, the session_meta.get("timestamp") fallback for started_at is correctly maintained (started_at=coerce_optional_str(session_meta.get("timestamp")) or earliest).

Design

  • Centralised helper is the right call. The same parsing logic was duplicated across three files; moving it to utils.py eliminates the duplication cleanly.
  • Silent skip of unparseable values is consistent. The PR description notes this mirrors the existing tolerance of coerce_optional_str. If all timestamps are unparseable, (None, None) is returned, which matches the previous behavior of an empty list producing None. No log warning is emitted for skipped values — this is a minor observation, not a blocker, since the ingestion layer already tolerates missing timestamps.

Style & Conventions

  • from __future__ import annotations present in both new/modified files — compliant.
  • SPDX license header present in the new test file.
  • Absolute imports only — no relative imports introduced.
  • Type annotations on all new functions and variables.
  • Naming: earliest/latest in codex.py avoids shadowing and reads naturally.

Tests

  • Coverage is good. The parameterized test (test_utils.py) covers 6 cases:
    1. Empty list → (None, None)
    2. Mixed offsets where lexicographic ordering disagrees with chronological
    3. Mixed sub-second precision
    4. Naive timestamp treated as UTC compared against aware
    5. Unparseable values skipped, valid ones retained
    6. Only unparseable values → (None, None)
  • No negative-offset test case. A case like "2025-01-01T00:00:00-05:00" vs "2025-01-01T04:00:00Z" (same instant, different representation) would exercise negative offsets, but this is not a gap since fromisoformat handles negative offsets identically to positive ones — nice-to-have at most.

Potential Concerns (non-blocking)

  1. value.replace("Z", "+00:00") — This replaces all occurrences of the character Z in the string, not just a trailing one. In practice, ISO 8601 timestamps won't contain a capital Z anywhere except as the UTC designator at the end, so this is safe. A more defensive approach would be value[:-1] + "+00:00" if value.endswith("Z") else value, but the current approach matches widespread convention and is fine.

  2. No logging on unparseable skip — If a rollout file contains garbled timestamps, the helper silently discards them. For debugging production ingestion issues, a logger.debug on skip could be useful, but this is a future enhancement, not a requirement for this fix.

Verdict

Approve. This is a clean, well-scoped bug fix with good test coverage. The implementation is correct, follows project conventions, and doesn't introduce unnecessary complexity. No blocking issues found.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 17, 2026

Greptile Summary

This PR replaces raw string min()/max() comparisons on ISO 8601 timestamp lists with a proper chronological sort in the Claude Code, Codex, and ATIF rollout ingesters. The new min_max_timestamps() / parse_iso8601() helpers in utils.py parse timestamps before comparing, treat naive timestamps as UTC, skip unparseable values, and return the winning entries in their original string form — preserving the downstream output contract while fixing the ordering bug.

Confidence Score: 5/5

Safe to merge — the fix is correct, well-tested, and output contract is unchanged.

All three ingesters are updated consistently, parse_iso8601 correctly normalises naive/aware datetimes, the Z-suffix workaround is cross-version compatible, and the test suite covers all advertised edge cases. No P0/P1 findings.

No files require special attention.

Important Files Changed

Filename Overview
packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/utils.py Adds parse_iso8601() and min_max_timestamps() helpers; Z→+00:00 normalisation is correct, naive timestamps are coerced to UTC, unparseable values return None.
packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/atif.py Replaces min/max string comparisons with min_max_timestamps(); change is minimal and correct.
packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/claude_code.py Replaces min/max string comparisons with min_max_timestamps(); change is minimal and correct.
packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/codex.py Replaces min/max string comparisons with min_max_timestamps(); session_meta["timestamp"] short-circuit for started_at is correctly preserved.
packages/data-designer-engine/tests/engine/resources/agent_rollout/test_utils.py New parameterised test covering empty list, mixed UTC offsets, mixed precision, naive-vs-aware, unparseable-skip, and all-unparseable cases — all correct.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["timestamps list[str]"] --> B{for each timestamp}
    B --> C["parse_iso8601(value)"]
    C --> D{ValueError?}
    D -- Yes --> E["skip - return None"]
    D -- No --> F{tzinfo present?}
    F -- No --> G["add UTC tzinfo"]
    F -- Yes --> H["keep as-is"]
    G --> I["append datetime + original string"]
    H --> I
    I --> B
    B --> J{list empty?}
    J -- Yes --> K["return None, None"]
    J -- No --> L["min by datetime - earliest string"]
    J -- No --> M["max by datetime - latest string"]
    L --> N["return earliest, latest"]
    M --> N
    N --> O["started_at / ended_at in NormalizedAgentRolloutRecord"]
Loading

Reviews (1): Last reviewed commit: "fix: normalize rollout timestamps before..." | Re-trigger Greptile

@andreatgretel
Copy link
Copy Markdown
Contributor

packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/pi_coding_agent.py:203-204

started_at=min(timestamps) if timestamps else None,
ended_at=max(timestamps) if timestamps else None,

looks like this file has the same min(timestamps) / max(timestamps) pattern that was fixed in the other three ingesters - worth including in this PR since the helper already exists?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Normalize rollout timestamps before deriving started_at/ended_at

2 participants