@evantahler evantahler commented Nov 14, 2024

Hello! Thank you for the vcr library. It makes testing our multi-service application /possible/.

In one of our tests, we want to ensure that application A can upload a file to application B and get some data back. We do something like this in our code:

files = {"file": ("my_file.pdf", open("my_file.pdf", "rb"))}

async with httpx.AsyncClient() as client:
    response = await client.post(
        "https://my-upload-service.com/api/post",
        json=request_payload,
        files=files,
    )

Recording this interaction with VCR raises an error because the PDF is a binary file, so the request body cannot be decoded as UTF-8:

httpx_request = <Request('POST', 'http://parser:changeme123@localhost:8200/parser/api/v1/parse')>, kwargs = {}

    def _make_vcr_request(httpx_request, **kwargs):
>       body = httpx_request.read().decode("utf-8")
E       UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 154: invalid continuation byte

All the existing vcr filters require the request to be parsed first so that the body, headers, etc. can be inspected, so they won't help us here. The assumption that most requests are UTF-8 decodable makes perfect sense, and this is a bit of a weird edge case. So I'd like to keep the existing behavior as much as possible, but in the case of a `UnicodeDecodeError`, let's try decoding again and drop any bytes that cause trouble. In our case, it didn't make a meaningful difference to the cassette recording.
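To make the proposed fallback concrete, here is a minimal standalone sketch of the decode-then-retry idea, independent of vcr itself. The helper name `decode_request_body` and the sample bytes are made up for illustration; only `bytes.decode` and the `warnings` module are real APIs here.

```python
import warnings


def decode_request_body(raw: bytes) -> str:
    """Decode strictly as UTF-8, falling back to a lossy decode on failure.

    Hypothetical helper for illustration; not part of vcr.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        warnings.warn(
            f"Request body is not valid UTF-8; undecodable bytes were dropped. {exc}",
            stacklevel=2,
        )
        # errors="ignore" silently skips every byte that cannot be decoded.
        return raw.decode("utf-8", errors="ignore")


# 0xff and 0xfe never occur in valid UTF-8, so they stand in for binary data.
binary_like = b"abc\xff\xfedef"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    text = decode_request_body(binary_like)    # lossy path, emits a warning
    decoded = decode_request_body(b"plain text")  # strict path, no warning
```

Note that the fallback is lossy: `text.encode("utf-8")` no longer equals the original bytes, which is why a warning seems appropriate rather than failing silently.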

@evantahler
Author

For the moment, we've gone with a monkeypatching approach:

import warnings

import vcr  # type: ignore[import-untyped]
from vcr.request import Request as VcrRequest  # type: ignore[import-untyped]
# Likely needed so that the vcr.stubs.httpx_stubs submodule is loaded and can be
# reached as an attribute when it is patched at the bottom of this file.
from vcr.stubs.httpx_stubs import (  # type: ignore
    _make_vcr_request,  # noqa: F401
)


def _fixed__make_vcr_request(  # type: ignore
    httpx_request,
    **kwargs,  # noqa: ARG001
) -> VcrRequest:
    # Read the body once and reuse the bytes for both decode attempts.
    raw_body = httpx_request.read()
    try:
        body = raw_body.decode("utf-8")
    except UnicodeDecodeError as e:
        # Fall back to a lossy decode that drops the undecodable bytes.
        body = raw_body.decode("utf-8", errors="ignore")
        warnings.warn(
            f"Could not decode full request payload as UTF-8, recording may have lost bytes. {e}",
            stacklevel=2,
        )
    uri = str(httpx_request.url)
    headers = dict(httpx_request.headers)
    return VcrRequest(httpx_request.method, uri, body, headers)


vcr.stubs.httpx_stubs._make_vcr_request = _fixed__make_vcr_request
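The assignment above replaces the function for the whole process. If that is too broad, one option is to scope the patch with `unittest.mock.patch.object`, which restores the original on exit. The sketch below uses a `SimpleNamespace` as a stand-in for the `vcr.stubs.httpx_stubs` module so it runs without vcr installed; the stand-in functions and their return values are purely illustrative.

```python
from types import SimpleNamespace
from unittest import mock

# Stand-in for the module being patched (hypothetical; in the real setup this
# would be vcr.stubs.httpx_stubs).
httpx_stubs = SimpleNamespace(_make_vcr_request=lambda request: "strict")


def lenient_make_vcr_request(request):
    # Hypothetical replacement; in the real setup this would be the
    # lenient function defined in the comment above.
    return "lenient"


def call_inside_patch():
    # The replacement is only active inside the with-block.
    with mock.patch.object(
        httpx_stubs, "_make_vcr_request", lenient_make_vcr_request
    ):
        return httpx_stubs._make_vcr_request(None)


inside = call_inside_patch()
after = httpx_stubs._make_vcr_request(None)  # original is back in place
```

In a test suite, the same `with` block could live in a fixture that wraps only the tests uploading binary files.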

@kevin1024
Owner

Closing this PR as v8.0.0 included a complete rewrite of httpx support (now patching httpcore instead of httpx). The code path this PR was modifying has changed significantly. If you're still experiencing UTF-8 decoding issues with httpx on v8.0.0, please open a new issue and we can revisit. Thanks for the contribution!

@kevin1024 kevin1024 closed this Dec 5, 2025