Expand email format tests#893
Conversation
There was a problem hiding this comment.
Pull request overview
Adds substantial new JSON Schema email-format test coverage across all supported drafts to better exercise RFC 5321-valid cases (notably quoted-string local parts and address literals) and common edge/invalid cases.
Changes:
- Expands each draft’s
emailformat test suite with a large set of new valid/invalid examples. - Adds coverage for quoted-string local parts, IPv4/IPv6 address literals (incl. IPv6v4 hybrids), single-label domains, dotted-quad domains, and control/FWS/comment edge cases.
- Keeps the new cases largely consistent across drafts/v1 to make cross-implementation comparisons easier.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/v1/format/email.json | Adds expanded email-format test corpus for v1. |
| tests/draft7/optional/format/email.json | Adds expanded email-format test corpus for draft7 optional tests. |
| tests/draft6/optional/format/email.json | Adds expanded email-format test corpus for draft6 optional tests. |
| tests/draft4/optional/format/email.json | Adds expanded email-format test corpus for draft4 optional tests. |
| tests/draft3/optional/format/email.json | Adds expanded email-format test corpus for draft3 optional tests. |
| tests/draft2020-12/optional/format/email.json | Adds expanded email-format test corpus for 2020-12 optional tests. |
| tests/draft2019-09/optional/format/email.json | Adds expanded email-format test corpus for 2019-09 optional tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
I'm not interested in what other test suites say, or what other implementations accept or reject. What do the relevant RFCs say about these test cases? You need to check these yourself: do not rely on AI. Until you've done this, I don't consider this PR to be mergeable. |
|
Sorry for the noise in the description thought it should be more informative and well structured so used AI to organise this. And also I have gone back through RFC 5321 §4.1.2 and §4.1.3 directly for every test. The bracket issue in test@[RFC-5322]-domain-literal] was a genuine mistake - fixed in the latest commit. The content after the first ] was outside the brackets which made the test ambiguous. Two tests I am genuinely unsure about and would appreciate guidance on: a@[extn:test] and a@[1111:2222:...:8888] without the IPv6: prefix. Both syntactically match the General-address-literal grammar in §4.1.3 (Standardized-tag ":" 1*dcontent) but §4.1.3 also says the tag MUST be IANA registered. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
jviotti
left a comment
There was a problem hiding this comment.
I dropped one comment about a potentially duplicated tests, but overall I can't find anything non-compliant with these tests. They already increase coverage quite a lot more, which is great. E-mails are hard.
jdesrosiers
left a comment
There was a problem hiding this comment.
I'm not sure what to do with this. It's much too large to review everything. I'm sure there are plenty of good tests in there, but I'm not convinced that a lot of them are unnecessary.
What I could do was run these tests against my implementation and analyze what failed. I found a couple of things that I don't think are right.
| "description": "domain label ending with hyphen is not valid", | ||
| "data": "test@iana-.com", | ||
| "valid": false | ||
| }, |
There was a problem hiding this comment.
I didn't find anything in the specification that states that domain labels have the same restrictions as hostnames. The spec just says ALPHA / DIGIT / "-".
| { | ||
| "description": "IPv6 address without 'IPv6:' prefix is not valid", | ||
| "data": "a@[1111:2222:3333:4444:5555:6666:7777:8888]", | ||
| "valid": false | ||
| }, |
There was a problem hiding this comment.
This is technically a valid general address literal with tag 1111. I'm not sure what our requirements should be about handling these. Is it sufficient to match the general address literal format, or must the implementation validate the specification for the address literal tag? What if the implementation doesn't support the tag? (see the next comment for more discussion on this)
| { | ||
| "description": "general address literal with unregistered tag is not valid", | ||
| "data": "a@[extn:test]", | ||
| "valid": false | ||
| }, |
There was a problem hiding this comment.
General address literals are an extension mechanism, which are optional to implement. I think the right thing to do when encountering an extension you don't support is to throw an error because you don't know if it's valid or not. I would do that even if it's not in the IANA registry because the status of the registry can change and then the implementation behavior would change from being false to being an error. I'd rather it consistently be an error until it's supported.
The thing that makes me consider that it might be ok to make an exception for this one is that it's likely that the IANA registry isn't going to change much if ever.
| { | ||
| "description": "comment-like parentheses after domain are not valid", | ||
| "data": "test@iana.org(comment\\)", | ||
| "valid": false | ||
| }, | ||
| { | ||
| "description": "open parenthesis with backslash after domain is not valid", | ||
| "data": "test@iana.org(comment\\", | ||
| "valid": false | ||
| }, | ||
| { | ||
| "description": "CR inside parentheses after domain is not valid", | ||
| "data": "test@iana.org(\r)", | ||
| "valid": false | ||
| }, | ||
| { | ||
| "description": "CR inside parentheses before address is not valid", | ||
| "data": "(\r)test@iana.org", | ||
| "valid": false | ||
| }, | ||
| { | ||
| "description": "LF inside parentheses before address is not valid", | ||
| "data": "(\n)test@iana.org", | ||
| "valid": false | ||
| }, |
There was a problem hiding this comment.
What's up with all the parenthesis tests? I don't think that's a valid syntax anywhere in an email address. This looks like a bunch of edge case tests for something that's not an edge case.
| { | ||
| "description": "lowercase 'ipv6:' prefix in address literal is valid", | ||
| "data": "a@[ipv6:::]", | ||
| "valid": true | ||
| }, |
There was a problem hiding this comment.
I didn't find anything indicating that the IPv6 label is case insensitive.
The existing email format files have between 11 and 27 tests per draft. I went through RFC 5321 §4.1.2 and §4.1.3 and added 148 new tests per draft covering the parts that were missing.
Coverage
"test"@iana.org,""@iana.org,"\""@iana.org,"\\"@iana.org," "@iana.org- RFC 5321 §4.1.2 definesLocal-part = Dot-string / Quoted-stringso both forms are valid. Also tests the tricky@inside quoted strings like"a@b"@iana.orgwhere the mailbox separator is the unquoted@.[255.255.255.255]), full and compressed IPv6, IPv6v4 hybrid forms ([IPv6:...:255.255.255.255]), and the invalid cases - wrong octet count, missingIPv6:prefix, empty brackets, leading space inside brackets.test@io), all-digit labels (test@iana.123), dotted-quad without brackets (test@255.255.255.255), consecutive hyphens, hyphen at label boundaries.@signs, mixed dot-string and quoted-string, brackets in local-part position.One thing flagged for discussion:
a@[extn:test]syntactically matchesGeneral-address-literalin §4.1.3 but the MUST clause requires IANA registration. Currently marked invalid - been discussing with @jdesrosiers and @jviotti and happy to go either way based on consensus here.Draft Coverage