Skip to content

Enabling DEBUG logging breaks WebSocket connections when sending large Unicode messages #1695

@kazuki-ma

Description

@kazuki-ma

simply enabling DEBUG logging causes WebSocket connections to fail when sending large messages containing multi-byte UTF-8 characters (e.g., Japanese, Chinese, emoji).

Summary

The websockets library itself fragments large messages at byte boundaries, and then fails to log those same frames it created. This is a bug in the library's Frame.__str__() method, not a user error.

When DEBUG logging is enabled, Frame.__str__() raises UnicodeDecodeError for OP_TEXT frames that contain incomplete UTF-8 byte sequences.

This happens when:

  1. A large text message is sent (e.g., websocket.send(large_json_with_japanese))
  2. The websockets library fragments the message into multiple frames (at byte boundaries, not UTF-8 character boundaries)
  3. DEBUG logging attempts to log the sent frame with logger.debug("> %s", frame)
  4. Frame.__str__() tries to decode the frame data as UTF-8, which fails if the frame ends in the middle of a multi-byte character

This causes the WebSocket connection to close with code 1007 (INVALID_DATA) even though the message data itself is valid - it's just split across multiple frames.

Environment

  • websockets version: 16.0
  • Python version: 3.11
  • OS: Linux (Azure App Service)

Reproduction

Minimal example (direct Frame creation)

from websockets.frames import OP_TEXT, Frame

# Create a large Japanese text (UTF-8 multi-byte characters)
text = "あいうえおかきくけこ" * 700
data = text.encode("utf-8")

# Truncate at position 16373 (middle of a multi-byte character)
# This simulates what happens when a large message is fragmented
truncated = data[:16373]

# OP_TEXT frame with incomplete UTF-8 (simulating first fragment)
frame = Frame(opcode=OP_TEXT, data=truncated, fin=False)

# This raises UnicodeDecodeError
print(str(frame))

Output:

UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 16371-16372: unexpected end of data

Realistic example (WebSocket server and client)

This example demonstrates the actual scenario where a user simply sends a large message and the library fails:

import asyncio
import logging

import websockets

# Enable DEBUG logging - this is all it takes to trigger the bug
logging.basicConfig(level=logging.DEBUG)


async def echo_handler(websocket):
    """Simple echo server - receives and echoes back messages."""
    async for message in websocket:
        await websocket.send(message)


async def test_large_unicode_message():
    """Test that sending a large Unicode message fails with DEBUG logging."""

    # Start a WebSocket server
    async with websockets.serve(echo_handler, "localhost", 8765):
        # Connect as a client
        async with websockets.connect("ws://localhost:8765") as websocket:
            # Create a large message with Japanese text (multi-byte UTF-8)
            # This is a realistic scenario - e.g., sending a large JSON with Japanese content
            large_message = "こんにちは世界" * 3000  # ~63KB of Japanese text

            try:
                # This will fail when the library tries to log the fragmented frame
                await websocket.send(large_message)
                response = await websocket.recv()
                print(f"Success: received {len(response)} characters")
            except websockets.exceptions.ConnectionClosedError as e:
                print(f"Connection closed unexpectedly: {e}")
                # Expected output with DEBUG logging:
                # ConnectionClosedError: sent 1007 (invalid frame payload data)
                #   unexpected end of data at position XXXXX


if __name__ == "__main__":
    asyncio.run(test_large_unicode_message())

Expected behavior: Message is sent and echoed back successfully.

Actual behavior with DEBUG logging:

ConnectionClosedError: sent 1007 (invalid frame payload data)
  unexpected end of data at position 16373

Note: The same code works perfectly when logging level is INFO or higher. The bug only manifests when DEBUG logging is enabled.

Expected behavior

Frame.__str__() should gracefully handle incomplete UTF-8 sequences, similar to how it handles OP_CONT frames.

Actual behavior

Frame.__str__() raises UnicodeDecodeError for OP_TEXT frames, which can propagate through the logging system and cause the connection to be closed.

Root cause

In frames.py, the __str__ method has inconsistent exception handling:

For OP_TEXT (line 159-162) - NO exception handling:

if self.opcode is OP_TEXT:
    # Decode the entire payload then elide later if necessary.
    data = repr(bytes(self.data).decode())  # <-- No try-except!

For OP_CONT/PING/PONG (line 179-182) - HAS exception handling:

try:
    data = repr(bytes(self.data).decode())
    coding = "text"
except (UnicodeDecodeError, AttributeError):
    # fallback to binary representation

Impact

Users are not doing anything wrong - they're simply:

  1. Enabling DEBUG logging (a standard practice for debugging)
  2. Sending a normal text message via websocket.send()

The library then:

  • Fragments the message into multiple frames (internal behavior)
  • Attempts to log those frames
  • Fails to decode its own fragmented frames
  • Closes the connection with an error

When DEBUG logging is enabled globally (e.g., logging.getLogger().setLevel(logging.DEBUG)), the following happens:

  1. Protocol.debug becomes True (line 108 in protocol.py)
  2. When sending a large message, the library fragments it into multiple frames
  3. For each frame being sent, self.logger.debug("> %s", frame) is called
  4. Frame.__str__() is invoked
  5. For fragmented OP_TEXT frames with incomplete UTF-8, UnicodeDecodeError is raised
  6. The exception propagates and the connection is closed with code 1007

This is particularly problematic for applications that:

  • Use DEBUG logging in development/staging environments
  • Send large text messages (which are fragmented by the library)
  • Process non-ASCII text (like Japanese, which uses 3-byte UTF-8 sequences)

Suggested fix

Add the same exception handling to OP_TEXT frames:

if self.opcode is OP_TEXT:
    try:
        data = repr(bytes(self.data).decode())
    except (UnicodeDecodeError, AttributeError):
        # Fallback to binary representation for incomplete UTF-8
        binary = self.data
        if len(binary) > self.MAX_LOG_SIZE // 3:
            cut = (self.MAX_LOG_SIZE // 3 - 1) // 3
            binary = b"".join([binary[: 2 * cut], b"\x00\x00", binary[-cut:]])
        data = " ".join(f"{byte:02x}" for byte in binary)
        coding = "binary (incomplete UTF-8)"

Workaround

As a workaround, set the websockets logger level to INFO or higher:

import logging
logging.getLogger("websockets").setLevel(logging.INFO)

This prevents Frame.__str__() from being called since Protocol.debug will be False.

Related

  • This issue is similar to how the library handles UTF-8 decode errors in the actual message processing (which correctly uses incremental UTF-8 decoding in the Assembler class)
  • The __str__ method is only used for debugging/logging purposes, so it should be fault-tolerant

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions