test(format): expand date-time suite to 132 exhaustive RFC 3339 cases#891

Open
AcEKaycgR wants to merge 1 commit into
json-schema-org:mainfrom
AcEKaycgR:exhaustive-date-time-suite

Conversation

@AcEKaycgR
Contributor

Addresses #965. This PR expands the date-time format test suite from 29 to 132 cases across all drafts. The expansion is designed to provide complete coverage of the RFC 3339 §5.6 ABNF and §5.7 prose constraints, serving as a standalone compliance target for implementors.

Technical Coverage Details

The new cases cover every structural dimension of the grammar:

  • Separator validation: T absent, space/tab/duplicate instead of T; colons absent/replaced/duplicated in time; hyphens absent/replaced in date.
  • Offset forms: Z-only, numeric offsets (+/-HH:MM), missing offset, offset without colon (+0000), Z followed by numeric offset, offset sign variants (±, ++, +-, -+), non-padded and 3-digit offset fields, and non-ASCII digits.
  • Year boundaries: 3-digit, 5-digit, characters just outside ASCII digit range (/ and :), non-ASCII Bengali digits, negative prefixes, and year 0000 (valid) and 9999 (valid) limits.
  • Month/Day logic: Full Gregorian calendar coverage—including non-century leap (1996), century/400 leap (2000), century non-leap (1900), and year-0000 leap—month-specific maximums (Jan 31/Apr 30/Feb 28), and day 00 rejection.
  • Time & Precision: Floor/ceiling valid values, non-padded fields, and 3-digit hour/minute/second fields; secfrac validation covering 1-digit, 19-digit (arbitrary precision), all-zeros, and non-ASCII mid-frac characters.
  • Leap seconds: Confirmed 2016-12-31 IERS date and offset scenarios that produce incorrect UTC.
  • Whitespace & Composites: Rejection of empty strings, leading/trailing spaces, tab separators, and trailing garbage.
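To make the grammar dimensions above concrete, here is a minimal grammar-level checker sketch (illustrative only, not code from this PR). It covers only the §5.6 ABNF shape; the §5.7 calendar constraints the suite also exercises (month lengths, leap years, leap-second dates) would need extra logic on top of the pattern:

```python
import re

# Grammar-level check of the RFC 3339 section 5.6 date-time production.
# A sketch: section 5.7 calendar rules are NOT enforced here.
_RFC3339 = re.compile(
    r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"  # full-date
    r"[Tt]"                                          # "T" separator only
    r"([01]\d|2[0-3]):[0-5]\d:([0-5]\d|60)"          # partial-time, second <= 60
    r"(\.\d+)?"                                      # optional secfrac, any length
    r"([Zz]|[+-]([01]\d|2[0-3]):[0-5]\d)$",          # Z or +/-HH:MM offset
    re.ASCII,  # keep \d from matching non-ASCII (e.g. Bengali) digits
)

def is_rfc3339_datetime(s: str) -> bool:
    return _RFC3339.fullmatch(s) is not None

assert is_rfc3339_datetime("1996-12-19T16:39:57-08:00")     # numeric offset
assert is_rfc3339_datetime("1990-12-31T23:59:60Z")          # leap-second form
assert not is_rfc3339_datetime("1998-12-31 23:59:59Z")      # space instead of T
assert not is_rfc3339_datetime("2018-08-08T08:08:08")       # offset missing
assert not is_rfc3339_datetime("2018-08-08T08:08:08+0000")  # no colon in offset
```

Note the `re.ASCII` flag: without it, Python's `\d` matches Unicode digits, which is exactly the non-ASCII-digit pitfall several of the new cases target.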

Standards & Traceability

Following the style established in the ipv4.json suite, this PR adds a top-level comment field citing the full RFC 3339 §5.6 ABNF and §5.7 prose rules. Additionally, it corrects one existing description: "an invalid date-time past leap second, UTC" is updated to "second 61 is above the absolute maximum of 60".

Triangulation Results

The suite was validated against the following implementations:

  • Ajv (v8 + ajv-formats): 129 Pass / 3 Mismatch
  • Python jsonschema: 126 Pass / 6 Mismatch
  • validator.js: 118 Pass / 14 Mismatch

These mismatches represent implementation differences from strict RFC 3339 compliance rather than errors in the test vectors.
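The pass/mismatch counts come from comparing each implementation's verdict against the expected outcome of every vector. A minimal harness sketch (hypothetical names, two illustrative vectors rather than the full suite; CPython's `datetime.fromisoformat` stands in for an ISO-8601-leaning implementation):

```python
from datetime import datetime

def triangulate(cases, check):
    """Return (pass_count, mismatches) for one implementation's format check."""
    mismatches = [c for c in cases if check(c["data"]) != c["valid"]]
    return len(cases) - len(mismatches), mismatches

# Two illustrative vectors, not taken verbatim from the PR.
cases = [
    {"data": "1996-12-19T16:39:57-08:00", "valid": True},
    {"data": "2018-08-08T08:08:08", "valid": False},  # RFC 3339 requires an offset
]

def iso_leaning_check(s):
    """Stand-in for an implementation that accepts plain ISO 8601."""
    try:
        datetime.fromisoformat(s)
        return True
    except ValueError:
        return False

passed, mismatched = triangulate(cases, iso_leaning_check)
# The missing-offset vector surfaces as a mismatch for the lenient checker.
```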

Feedback on the coverage matrix and test structure is appreciated @jviotti @jdesrosiers

@AcEKaycgR AcEKaycgR requested a review from a team as a code owner April 11, 2026 03:30
@karenetheridge
Member

Please respect the existing formatting of these tests and avoid making unnecessary whitespace changes.

Comment thread tests/draft2019-09/optional/format/date-time.json Outdated
@AcEKaycgR AcEKaycgR force-pushed the exhaustive-date-time-suite branch from c20e88b to 339a28e Compare April 11, 2026 17:33
@AcEKaycgR
Contributor Author

AcEKaycgR commented Apr 11, 2026

Thanks for the review, @karenetheridge.

All three points have been addressed in the latest commit:

  • Comment: Trimmed down to a concise prose summary; no more full ABNF quote inline.
  • Year 9999 tests: Removed both the "year 9999 is valid" case and the "maximum boundary" composite that also used 9999, for the same timezone/DST reason.
  • Indentation: Restored to 4 spaces to match the rest of the suite. For future PRs, I'll make sure to respect and match the existing file formatting rather than reformatting on write.

@jviotti
Member

jviotti commented May 6, 2026

Let me try to post some comments here, as GitHub is refusing to let me comment on specific diffs (internal errors)

  • On the last §5.2 mention for negative 00:00, shouldn't that be https://www.rfc-editor.org/rfc/rfc3339#section-4.3?
  • The "standard valid date-time" one seems a duplicate of "a valid date-time string"
  • Some tests say "rejects before reaching", not sure I understand what that means? Maybe a better description needed?
  • Let's preserve the old one-line form "schema": { "format": "date-time" }. No reason to expand it here?

@jviotti
Member

jviotti commented May 6, 2026

Other than that, I can't find any case that is not RFC 3339 compliant. It's an interesting one, as we check RFC 3339, which is stricter than ISO 8601, potentially tripping up some implementations.
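To illustrate the strictness gap, here is a sketch using CPython's `datetime.fromisoformat` as the lenient, ISO-leaning parser (the specific strings are my own examples, not vectors from this PR):

```python
from datetime import datetime

# Accepted by a lenient ISO-8601-style parser, but invalid under RFC 3339:
lenient_only = [
    "2018-08-08T08:08:08",  # RFC 3339 requires a numeric offset or Z
    "2018-08-08T08:08",     # RFC 3339 requires seconds; ISO allows reduced precision
]

for s in lenient_only:
    parsed = datetime.fromisoformat(s)  # parses without error
```

An implementation that delegates format checking to a general-purpose ISO 8601 parser will accept both strings and mismatch against this suite.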

@AcEKaycgR AcEKaycgR force-pushed the exhaustive-date-time-suite branch from 339a28e to 9e3f971 Compare May 6, 2026 16:17
@AcEKaycgR
Contributor Author

Thanks for the review, @jviotti. I’ve pushed the updates to address your feedback:

  • RFC Reference: Updated to §4.3 for -00:00.
  • Case Cleanup: Removed the duplicate "standard valid date-time" case.
  • Descriptions: Reworded "rejects before reaching" to explicitly mention invalid months. (I originally used the former to track the parsing sequence, but agreed that naming the error cause is clearer).
  • Schema Formatting: Restored the one-line "schema": { "format": "date-time" } format. (I had expanded these to match the later drafts, but the concise version is better for the standard suite).

I'll also keep the note on RFC 3339 vs. ISO 8601 strictness in mind; keeping the tests strict will definitely help surface implementation gaps.

Member

@jdesrosiers jdesrosiers left a comment


This is the third PR I've looked at today that's too large to review properly. I ran the tests against my implementation and didn't find any issues, but in cases like this I question the value of these tests. Are all of these tests really providing value? It's hard to say.

I'm approving because I didn't find anything wrong, but I'm anxious about pressing the merge button.

@jviotti
Member

jviotti commented May 6, 2026

I ran the tests against my implementation and didn't find any issues, but in cases like this I question the value of these tests. Are all of these tests really providing value? It's hard to say.

I think there is bias in that. We from the TSC read these things more carefully than other people and our implementations are likely correct or close to correct. But not every implementor is in the TSC. For date-time, I do remember seeing various implementations going the ISO side, for instance. I think plenty of tests are great for ensuring we all indeed comply to the same thing.

but I'm anxious about pressing the merge button.

If anything is off, we will realise it when running more implementations against the suite. At the very least, we get to discuss these things. The current suites are not very thorough, and I bet there is already divergence on these things that we would never know about.

@jdesrosiers
Member

Lots of tests are great, and we should have tests to make sure implementations aren't just using ISO validators. My concern is the quality of the tests. Are they really covering unique and valuable cases? I don't have the bandwidth to review that many tests in detail to verify quality. It could be the case that 100 of them are testing slightly different variations of the same edge case, and we could get the same value out of 32 new tests. I don't know, because I don't have all day to sit with the spec and go through every test in detail. Every test we add incurs a maintenance cost, and I don't want the test suite to become unmaintainable because we keep adding a bunch of tests without sufficient care.

@jviotti
Member

jviotti commented May 7, 2026

Yeah, I get your point too. We probably need to decide on a good strategy for the GSoC Format Assertion project in general. Some thoughts and looking forward to your input:

  1. We could prioritise tests that we prove break at least one implementation. The caveat here is that many implementations don't support format-assertion because our suite is a bit loose, so it's a bit of a chicken-and-egg problem
  2. We could do passes over the tests added in every PR to make sure they don't test the same thing or very similar thing twice. Might help a bit with the size of the PRs
  3. We could be more liberal with accepting tests where our review agrees they seem compliant, so we can eventually get closer to (1). Then if anybody (including us) catches something odd, at least we surface it and discuss it
  4. We can try to submit only up to N new tests on each PR to make it easier to review, though if many PRs come in, we might get into git conflict hell

Any thoughts?

@jdesrosiers
Member

  1. We could prioritise tests that we prove break at least one implementation.

👍

  2. We could do passes over the tests added in every PR to make sure they don't test the same thing or very similar thing twice.

That's the part that's hard to do at this scale. Ideally, it's trimmed down before it gets to us. Maybe we can use AI to flag issues. I don't imagine it's going to do very well at that task, but it might help a little.

  3. We could be more liberal with accepting tests where our review agrees they seem compliant, so we can eventually get closer to (1).

I hope it doesn't come to that, but that might be what we have to do for the format tests.

  4. We can try to submit only up to N new tests on each PR to make it easier to review, though if many PRs come in, we might get into git conflict hell

I was thinking more like only allowing one category of tests to be added in each PR. For example, this PR indicated seven categories of tests they added. That could be seven separate PRs, which would each be easier to review not just because they're smaller, but because they're related.

Yes, it could increase git conflicts. But, I think it's worth a try and we can always try something else if it gets to be too big a problem. I'm pretty comfortable with dealing with merge conflicts, but I know not everyone is.

@jviotti
Member

jviotti commented May 8, 2026

I was thinking more like only allowing one category of tests to be added in each PR. For example, this PR indicated seven categories of tests they added. That could be seven separate PRs, which would each be easier to review not just because they're smaller, but because they're related.

I like this approach! cc @AcEKaycgR @vtushar06

@AcEKaycgR
Contributor Author

@jdesrosiers @jviotti I understand and completely agree with the category-based approach as well. It will make the reviews much more focused and help ensure every case provides unique value. I'll follow this for all future formats and submit them in smaller, related batches.
