Skip to content

Expand email format tests#893

Open
vtushar06 wants to merge 10 commits into
json-schema-org:mainfrom
vtushar06:expand-email-format-tests
Open

Expand email format tests#893
vtushar06 wants to merge 10 commits into
json-schema-org:mainfrom
vtushar06:expand-email-format-tests

Conversation

@vtushar06
Copy link
Copy Markdown
Contributor

@vtushar06 vtushar06 commented Apr 13, 2026

The existing email format files have between 11 and 27 tests per draft. I went through RFC 5321 §4.1.2 and §4.1.3 and added 148 new tests per draft covering the parts that were missing.

Coverage

  • Quoted-string local parts: "test"@iana.org, ""@iana.org, "\""@iana.org, "\\"@iana.org, " "@iana.org - RFC 5321 §4.1.2 defines Local-part = Dot-string / Quoted-string so both forms are valid. Also tests the tricky @ inside quoted strings like "a@b"@iana.org where the mailbox separator is the unquoted @.
  • Address literals: IPv4 ([255.255.255.255]), full and compressed IPv6, IPv6v4 hybrid forms ([IPv6:...:255.255.255.255]), and the invalid cases - wrong octet count, missing IPv6: prefix, empty brackets, leading space inside brackets.
  • Domain forms: Single-label domains (test@io), all-digit labels (test@iana.123), dotted-quad without brackets (test@255.255.255.255), consecutive hyphens, hyphen at label boundaries.
  • Local-part boundaries: Leading/trailing/consecutive dots, multiple unquoted @ signs, mixed dot-string and quoted-string, brackets in local-part position.
  • Character boundaries: All atext specials including backtick (```), control characters (TAB, BEL, NUL, DEL, SOH, US, CR, LF), non-ASCII bytes - RFC 5321 §4.1.2 lines 2349-2351 has a MUST NOT for anything outside 7-bit ASCII printable range.
  • RFC 5322 FWS/comments: Parenthesized comments and folding whitespace sequences before/after the address - RFC 5321 §4.1.2 Mailbox grammar has none of this.

One thing flagged for discussion: a@[extn:test] syntactically matches General-address-literal in §4.1.3 but the MUST clause requires IANA registration. Currently marked invalid - been discussing with @jdesrosiers and @jviotti and happy to go either way based on consensus here.

Draft Coverage

  • draft3 -11 to 158
  • draft4 - 20 to 167
  • draft6 - 20 to 167
  • draft7 - 20 to 167
  • draft2019-09 - 20 to 167
  • draft2020-12 - 27 to 174
  • v1 - 27 to 174

Copilot AI review requested due to automatic review settings April 13, 2026 18:47
@vtushar06 vtushar06 requested a review from a team as a code owner April 13, 2026 18:47
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds substantial new JSON Schema email-format test coverage across all supported drafts to better exercise RFC 5321-valid cases (notably quoted-string local parts and address literals) and common edge/invalid cases.

Changes:

  • Expands each draft’s email format test suite with a large set of new valid/invalid examples.
  • Adds coverage for quoted-string local parts, IPv4/IPv6 address literals (incl. IPv6v4 hybrids), single-label domains, dotted-quad domains, and control/FWS/comment edge cases.
  • Keeps the new cases largely consistent across drafts/v1 to make cross-implementation comparisons easier.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tests/v1/format/email.json Adds expanded email-format test corpus for v1.
tests/draft7/optional/format/email.json Adds expanded email-format test corpus for draft7 optional tests.
tests/draft6/optional/format/email.json Adds expanded email-format test corpus for draft6 optional tests.
tests/draft4/optional/format/email.json Adds expanded email-format test corpus for draft4 optional tests.
tests/draft3/optional/format/email.json Adds expanded email-format test corpus for draft3 optional tests.
tests/draft2020-12/optional/format/email.json Adds expanded email-format test corpus for 2020-12 optional tests.
tests/draft2019-09/optional/format/email.json Adds expanded email-format test corpus for 2019-09 optional tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/draft3/optional/format/email.json Outdated
Comment thread tests/draft2020-12/optional/format/email.json Outdated
Comment thread tests/draft2019-09/optional/format/email.json Outdated
Comment thread tests/v1/format/email.json Outdated
Comment thread tests/draft7/optional/format/email.json Outdated
Comment thread tests/draft6/optional/format/email.json Outdated
Comment thread tests/draft4/optional/format/email.json Outdated
@karenetheridge
Copy link
Copy Markdown
Member

I'm not interested in what other test suites say, or what other implementations accept or reject. What do the relevant RFCs say about these test cases? You need to check these yourself: do not rely on AI.

Until you've done this, I don't consider this PR to be mergeable.

@vtushar06
Copy link
Copy Markdown
Contributor Author

vtushar06 commented Apr 14, 2026

Sorry for the noise in the description thought it should be more informative and well structured so used AI to organise this. And also I have gone back through RFC 5321 §4.1.2 and §4.1.3 directly for every test.

The bracket issue in test@[RFC-5322]-domain-literal] was a genuine mistake - fixed in the latest commit. The content after the first ] was outside the brackets which made the test ambiguous.

Two tests I am genuinely unsure about and would appreciate guidance on: a@[extn:test] and a@[1111:2222:...:8888] without the IPv6: prefix. Both syntactically match the General-address-literal grammar in §4.1.3 (Standardized-tag ":" 1*dcontent) but §4.1.3 also says the tag MUST be IANA registered.
let me know what to mark these valid or invalid based on what you think the test suite should do.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/draft2019-09/optional/format/email.json Outdated
Copy link
Copy Markdown
Member

@jviotti jviotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped one comment about a potentially duplicated tests, but overall I can't find anything non-compliant with these tests. They already increase coverage quite a lot more, which is great. E-mails are hard.

Copy link
Copy Markdown
Member

@jdesrosiers jdesrosiers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what to do with this. It's much too large to review everything. I'm sure there are plenty of good tests in there, but I'm not convinced that a lot of them are unnecessary.

What I could do was run these tests against my implementation and analyze what failed. I found a couple of things that I don't think are right.

"description": "domain label ending with hyphen is not valid",
"data": "test@iana-.com",
"valid": false
},
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find anything in the specification that states that domain labels have the same restrictions as hostnames. The spec just says ALPHA / DIGIT / "-".

Comment on lines +649 to +653
{
"description": "IPv6 address without 'IPv6:' prefix is not valid",
"data": "a@[1111:2222:3333:4444:5555:6666:7777:8888]",
"valid": false
},
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is technically a valid general address literal with tag 1111. I'm not sure what our requirements should be about handling these. Is it sufficient to match the general address literal format, or must the implementation validate the specification for the address literal tag? What if the implementation doesn't support the tag? (see the next comment for more discussion on this)

Comment on lines +694 to +698
{
"description": "general address literal with unregistered tag is not valid",
"data": "a@[extn:test]",
"valid": false
},
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General address literals are an extension mechanism, which are optional to implement. I think the right thing to do when encountering an extension you don't support is to throw an error because you don't know if it's valid or not. I would do that even if it's not in the IANA registry because the status of the registry can change and then the implementation behavior would change from being false to being an error. I'd rather it consistently be an error until it's supported.

The thing that makes me consider that it might be ok to make an exception for this one is that it's likely that the IANA registry isn't going to change much if ever.

Comment on lines +704 to +728
{
"description": "comment-like parentheses after domain are not valid",
"data": "test@iana.org(comment\\)",
"valid": false
},
{
"description": "open parenthesis with backslash after domain is not valid",
"data": "test@iana.org(comment\\",
"valid": false
},
{
"description": "CR inside parentheses after domain is not valid",
"data": "test@iana.org(\r)",
"valid": false
},
{
"description": "CR inside parentheses before address is not valid",
"data": "(\r)test@iana.org",
"valid": false
},
{
"description": "LF inside parentheses before address is not valid",
"data": "(\n)test@iana.org",
"valid": false
},
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's up with all the parenthesis tests? I don't think that's a valid syntax anywhere in an email address. This looks like a bunch of edge case tests for something that's not an edge case.

Comment on lines +654 to +658
{
"description": "lowercase 'ipv6:' prefix in address literal is valid",
"data": "a@[ipv6:::]",
"valid": true
},
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find anything indicating that the IPv6 label is case insensitive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants