requests_toolbelt.multipart.decoder.ImproperBodyPartContentException: content does not contain CR-LF-CR-LF #352

@enigmathix

This error is raised even though RFC 2046 does not require the body to end with two CRLFs. From https://www.rfc-editor.org/rfc/rfc2046.html#section-5.1.1:

Overall, the body of a "multipart" entity may be specified as
   follows:

     dash-boundary := "--" boundary
                      ; boundary taken from the value of
                      ; boundary parameter of the
                      ; Content-Type field.

     multipart-body := [preamble CRLF]
                       dash-boundary transport-padding CRLF
                       body-part *encapsulation
                       close-delimiter transport-padding
                       [CRLF epilogue]

     transport-padding := *LWSP-char
                          ; Composers MUST NOT generate
                          ; non-zero length transport
                          ; padding, but receivers MUST
                          ; be able to handle padding
                          ; added by message transports.

     encapsulation := delimiter transport-padding
                      CRLF body-part

     delimiter := CRLF dash-boundary

     close-delimiter := delimiter "--"

     preamble := discard-text

     epilogue := discard-text

     discard-text := *(*text CRLF) *text
                     ; May be ignored or discarded.

     body-part := MIME-part-headers [CRLF *OCTET]
                  ; Lines in a body-part must not start
                  ; with the specified dash-boundary and
                  ; the delimiter must not appear anywhere
                  ; in the body part.  Note that the
                  ; semantics of a body-part differ from
                  ; the semantics of a message, as
                  ; described in the text.

     OCTET := <any 0-255 octet value>

As per this spec, the simplest multipart would look like this:

--boundary CRLF
MIME-part-headers
[CRLF MIME-part-headers]
[*]
CRLF --boundary--

Only one CRLF is required at the end of the body, not two. In fact, Google App Engine internally posts data that contains only one CRLF when a form field is left empty (the example below uses the data it generates).

Steps to reproduce:

from requests_toolbelt.multipart import decoder
data = b'--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\nContent-Disposition: form-data; name=empty\r\n\r\n--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\nContent-Disposition: form-data; name=text\r\n\r\nSome Text\r\n--foo--'

decoder.MultipartDecoder(data, 'multipart/form-data; boundary="foo"')

Output:

Traceback (most recent call last):
  File "/Users/christophe/toolbelt.py", line 4, in <module>
    decoder.MultipartDecoder(data, 'multipart/form-data; boundary="foo"')
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests_toolbelt/multipart/decoder.py", line 111, in __init__
    self._parse_body(content)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests_toolbelt/multipart/decoder.py", line 150, in _parse_body
    self.parts = tuple(body_part(x) for x in parts if test_part(x))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests_toolbelt/multipart/decoder.py", line 150, in <genexpr>
    self.parts = tuple(body_part(x) for x in parts if test_part(x))
                       ^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests_toolbelt/multipart/decoder.py", line 141, in body_part
    return BodyPart(fixed, self.encoding)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests_toolbelt/multipart/decoder.py", line 63, in __init__
    raise ImproperBodyPartContentException(
requests_toolbelt.multipart.decoder.ImproperBodyPartContentException: content does not contain CR-LF-CR-LF
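
The exception message suggests that each part is expected to contain CR-LF-CR-LF. The sketch below does not call requests_toolbelt; it only assumes, for illustration, that a decoder splits the body on the RFC 2046 delimiter (CRLF followed by the dash-boundary) and then looks for a blank line in each chunk. The helper name `split_on_delimiter` is hypothetical.

```python
CRLF = b"\r\n"

data = (
    b'--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\n'
    b'Content-Disposition: form-data; name=empty\r\n'
    b'\r\n--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\n'
    b'Content-Disposition: form-data; name=text\r\n\r\nSome Text\r\n--foo--'
)


def split_on_delimiter(body: bytes, boundary: bytes) -> list[bytes]:
    """Split a multipart body on the delimiter (CRLF + "--" + boundary).

    Hypothetical helper for illustration; not part of requests_toolbelt.
    """
    dash_boundary = b"--" + boundary
    chunks = body.split(CRLF + dash_boundary)
    # The opening dash-boundary has no leading CRLF, so strip it by hand.
    if chunks and chunks[0].startswith(dash_boundary):
        chunks[0] = chunks[0][len(dash_boundary):]
    # Drop what follows the close-delimiter ("--" plus any epilogue).
    return [chunk for chunk in chunks if not chunk.startswith(b"--")]


parts = split_on_delimiter(data, b"foo")
# True where a chunk contains a blank line (CRLF CRLF), False otherwise.
has_blank_line = [CRLF + CRLF in part for part in parts]
print(has_blank_line)
```

Under that assumption, the chunk for the empty field holds only its headers and one trailing CRLF, because the second CRLF in the raw bytes is consumed as part of the delimiter.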

For comparison, here is the same data processed with cgi:

from io import BytesIO
import cgi

data = b'--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\nContent-Disposition: form-data; name=empty\r\n\r\n--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\nContent-Disposition: form-data; name=text\r\n\r\nSome Text\r\n--foo--'
environ = {'CONTENT_LENGTH': str(len(data)),
        'CONTENT_TYPE': 'multipart/form-data; boundary="foo"',
        'REQUEST_METHOD': 'POST',
        'boundary': b'foo'}

stream = BytesIO(data)
print(cgi.parse_multipart(stream, environ))

Output:

{'empty': [''], 'text': ['Some Text']}
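
As a further point of comparison that is not part of the original report, the same bytes can be handed to the standard library's email parser. This is only a sketch: it assumes that prepending the Content-Type header is enough to make the data look like a complete MIME message, and the `fields` dictionary is an illustrative name.

```python
from email import message_from_bytes

data = (
    b'--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\n'
    b'Content-Disposition: form-data; name=empty\r\n'
    b'\r\n--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\n'
    b'Content-Disposition: form-data; name=text\r\n\r\nSome Text\r\n--foo--'
)
content_type = 'multipart/form-data; boundary="foo"'

# The email parser expects headers first, so prepend the Content-Type header
# and a blank line before the multipart body.
raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + data
message = message_from_bytes(raw)

fields = {}
for part in message.get_payload():
    # Read the form field name from the part's Content-Disposition header.
    name = part.get_param("name", header="content-disposition")
    # decode=True returns the payload as bytes.
    fields[name] = part.get_payload(decode=True)

print(fields)
```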
