gh-145264: Do not ignore excess Base64 data after the first padded quad #145267

Open
serhiy-storchaka wants to merge 3 commits into python:main from serhiy-storchaka:base64-excess-data

@serhiy-storchaka commented Feb 26, 2026

The Base64 decoder (see binascii.a2b_base64(), base64.b64decode(), etc.) no longer ignores excess data after the first padded quad in non-strict (default) mode. Instead, in conformance with RFC 4648, it ignores the pad character, "=", if it is present before the end of the encoded data.
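
To make the RFC 4648 wording concrete, here is a minimal pure-Python sketch of "treat the pad character as non-alphabet data". The helper name `decode_skipping_pads` is hypothetical and is not part of this PR. It only illustrates the rule on inputs whose remaining characters form whole quads, and it does not claim to reproduce the patched C decoder for every input.

```python
import base64
import string

# The standard Base64 alphabet (RFC 4648, section 4) as byte values.
B64_ALPHABET = (string.ascii_uppercase + string.ascii_lowercase
                + string.digits + "+/").encode("ascii")


def decode_skipping_pads(data: bytes) -> bytes:
    """Hypothetical helper: drop every non-alphabet byte, including "=",
    then re-pad the remaining characters and decode them."""
    cleaned = bytes(c for c in data if c in B64_ALPHABET)
    # Restore the padding that the final quad needs to be complete.
    cleaned += b"=" * (-len(cleaned) % 4)
    return base64.b64decode(cleaned)


print(decode_skipping_pads(b"ab==cd"))  # b'i\xb7\x1d'
print(decode_skipping_pads(b"abc=d"))   # b'i\xb7\x1d'
```

Both inputs reduce to the quad `abcd` once the inner pads are skipped, which is why they decode to the same three bytes.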

*/
goto done;
}
if (!strict_mode || ignorechar(BASE64_PAD, ignorechars, ignorecache)) {
Member

add a comment in this block linking to the RFC section.

Member Author

Done.

@@ -0,0 +1,4 @@
Base64 decoder (see :func:`binascii.a2b_base64`, :func:`base64.b64decode`, etc) no
longer ignores excess data after the first padded quad in non-strict
(default) mode. Instead, in conformance with :rfc:`4648`, it ignores
Member

I guess this is in accordance with the MAY in https://datatracker.ietf.org/doc/html/rfc4648#section-3.3 about ignoring PADs as non-alphabet data? it'd be good to cite the specific section.

Member Author

Done.

# Test excess data exceptions
-def assertExcessData(data, non_strict_expected,
-                     ignore_padchar_expected=None):
+def assertExcessData(data, non_strict_expected):
Member

rename this from non_strict_expected to just expected.

Member Author

In strict mode you get an error. You get that value only in non-strict mode, either when strict_mode=False, or when ignorechars contains "=".

But I agree that expected is shorter. The old name was even longer: non_strict_mode_expected_result.
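
For readers who want to see the strict-mode error mentioned above, the following sketch uses `base64.b64decode(..., validate=True)` as a stand-in for `strict_mode=True`. That mapping is an assumption about how the `base64` module forwards the flag, and the helper name `is_rejected_when_validating` is hypothetical.

```python
import base64
import binascii


def is_rejected_when_validating(data: bytes) -> bool:
    """Hypothetical helper: report whether validating decoding raises."""
    try:
        base64.b64decode(data, validate=True)
    except binascii.Error:
        return True
    return False


print(is_rejected_when_validating(b"ab==cd"))  # True: data follows the padding
print(is_rejected_when_validating(b"abcd"))    # False: a well-formed quad
```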


assertExcessData(b'ab==c', b'i')
assertExcessData(b'ab==cd', b'i', b'i\xb7\x1d')
assertExcessData(b'abc=d', b'i\xb7', b'i\xb7\x1d')
Member

this test used to highlight the difference between strict and non-strict mode. we should keep a test highlighting that.

Member Author

In strict mode we get an error. We get a result only when strict_mode=False or when the new argument ignorechars contains "=", and the two used to give different results. Now this difference has been fixed.
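
As a rough illustration of the old non-strict results quoted in the test lines above (`b'i'` and `b'i\xb7'`), here is a simplified model that stops at the first "=". The helper `decode_up_to_first_pad` is hypothetical. It assumes the input contains only alphabet characters and "=", and that the first pad completes a quad, so it is not a faithful port of the previous C code.

```python
import base64


def decode_up_to_first_pad(data: bytes) -> bytes:
    """Simplified model of the old behaviour: keep only what precedes
    the first "=" and ignore everything after it."""
    head, _, _ = data.partition(b"=")
    # Re-pad so that the kept characters form a complete quad.
    head += b"=" * (-len(head) % 4)
    return base64.b64decode(head)


for sample in (b"ab==c", b"ab==cd", b"abc=d"):
    print(sample, decode_up_to_first_pad(sample))
# b'ab==c' b'i'
# b'ab==cd' b'i'
# b'abc=d' b'i\xb7'
```

Under this model the trailing `c`, `cd`, and `d` are silently dropped, which is the excess data the PR title refers to.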
