Enabling DEBUG logging breaks WebSocket connections when sending large Unicode messages

simply enabling DEBUG logging causes WebSocket connections to fail when sending large messages containing multi-byte UTF-8 characters (e.g., Japanese, Chinese, emoji).

## Summary

**The websockets library itself fragments large messages at byte boundaries, and then fails to log those same frames it created.** This is a bug in the library's `Frame.__str__()` method, not a user error.

When DEBUG logging is enabled, `Frame.__str__()` raises `UnicodeDecodeError` for `OP_TEXT` frames that contain incomplete UTF-8 byte sequences.

This happens when:
1. A large text message is **sent** (e.g., `websocket.send(large_json_with_japanese)`)
2. The websockets library fragments the message into multiple frames (at byte boundaries, not UTF-8 character boundaries)
3. DEBUG logging attempts to log the sent frame with `logger.debug("> %s", frame)`
4. `Frame.__str__()` tries to decode the frame data as UTF-8, which fails if the frame ends in the middle of a multi-byte character

This causes the WebSocket connection to close with code 1007 (INVALID_DATA) even though the message data itself is valid - it's just split across multiple frames.

## Environment

- websockets version: 16.0
- Python version: 3.11
- OS: Linux (Azure App Service)

## Reproduction

### Minimal example (direct Frame creation)

```python
from websockets.frames import OP_TEXT, Frame

# Create a large Japanese text (UTF-8 multi-byte characters)
text = "あいうえおかきくけこ" * 700
data = text.encode("utf-8")

# Truncate at position 16373 (middle of a multi-byte character)
# This simulates what happens when a large message is fragmented
truncated = data[:16373]

# OP_TEXT frame with incomplete UTF-8 (simulating first fragment)
frame = Frame(opcode=OP_TEXT, data=truncated, fin=False)

# This raises UnicodeDecodeError
print(str(frame))
```

**Output:**
```
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 16371-16372: unexpected end of data
```

### Realistic example (WebSocket server and client)

This example demonstrates the actual scenario where a user simply sends a large message and the library fails:

```python
import asyncio
import logging

import websockets

# Enable DEBUG logging - this is all it takes to trigger the bug
logging.basicConfig(level=logging.DEBUG)


async def echo_handler(websocket):
    """Simple echo server - receives and echoes back messages."""
    async for message in websocket:
        await websocket.send(message)


async def test_large_unicode_message():
    """Test that sending a large Unicode message fails with DEBUG logging."""

    # Start a WebSocket server
    async with websockets.serve(echo_handler, "localhost", 8765):
        # Connect as a client
        async with websockets.connect("ws://localhost:8765") as websocket:
            # Create a large message with Japanese text (multi-byte UTF-8)
            # This is a realistic scenario - e.g., sending a large JSON with Japanese content
            large_message = "こんにちは世界" * 3000  # ~63KB of Japanese text

            try:
                # This will fail when the library tries to log the fragmented frame
                await websocket.send(large_message)
                response = await websocket.recv()
                print(f"Success: received {len(response)} characters")
            except websockets.exceptions.ConnectionClosedError as e:
                print(f"Connection closed unexpectedly: {e}")
                # Expected output with DEBUG logging:
                # ConnectionClosedError: sent 1007 (invalid frame payload data)
                #   unexpected end of data at position XXXXX


if __name__ == "__main__":
    asyncio.run(test_large_unicode_message())
```

**Expected behavior:** Message is sent and echoed back successfully.

**Actual behavior with DEBUG logging:**
```
ConnectionClosedError: sent 1007 (invalid frame payload data)
  unexpected end of data at position 16373
```

**Note:** The same code works perfectly when logging level is INFO or higher. The bug only manifests when DEBUG logging is enabled.

## Expected behavior

`Frame.__str__()` should gracefully handle incomplete UTF-8 sequences, similar to how it handles `OP_CONT` frames.

## Actual behavior

`Frame.__str__()` raises `UnicodeDecodeError` for `OP_TEXT` frames, which can propagate through the logging system and cause the connection to be closed.

## Root cause

In `frames.py`, the `__str__` method has inconsistent exception handling:

**For OP_TEXT (line 159-162) - NO exception handling:**
```python
if self.opcode is OP_TEXT:
    # Decode the entire payload then elide later if necessary.
    data = repr(bytes(self.data).decode())  # <-- No try-except!
```

**For OP_CONT/PING/PONG (line 179-182) - HAS exception handling:**
```python
try:
    data = repr(bytes(self.data).decode())
    coding = "text"
except (UnicodeDecodeError, AttributeError):
    # fallback to binary representation
```

## Impact

**Users are not doing anything wrong** - they're simply:
1. Enabling DEBUG logging (a standard practice for debugging)
2. Sending a normal text message via `websocket.send()`

The library then:
- Fragments the message into multiple frames (internal behavior)
- Attempts to log those frames
- Fails to decode its own fragmented frames
- Closes the connection with an error

When DEBUG logging is enabled globally (e.g., `logging.getLogger().setLevel(logging.DEBUG)`), the following happens:

1. `Protocol.debug` becomes `True` (line 108 in protocol.py)
2. When **sending** a large message, the library fragments it into multiple frames
3. For each frame being sent, `self.logger.debug("> %s", frame)` is called
4. `Frame.__str__()` is invoked
5. For fragmented OP_TEXT frames with incomplete UTF-8, `UnicodeDecodeError` is raised
6. The exception propagates and the connection is closed with code 1007

This is particularly problematic for applications that:
- Use DEBUG logging in development/staging environments
- **Send** large text messages (which are fragmented by the library)
- Process non-ASCII text (like Japanese, which uses 3-byte UTF-8 sequences)

## Suggested fix

Add the same exception handling to OP_TEXT frames:

```python
if self.opcode is OP_TEXT:
    try:
        data = repr(bytes(self.data).decode())
    except (UnicodeDecodeError, AttributeError):
        # Fallback to binary representation for incomplete UTF-8
        binary = self.data
        if len(binary) > self.MAX_LOG_SIZE // 3:
            cut = (self.MAX_LOG_SIZE // 3 - 1) // 3
            binary = b"".join([binary[: 2 * cut], b"\x00\x00", binary[-cut:]])
        data = " ".join(f"{byte:02x}" for byte in binary)
        coding = "binary (incomplete UTF-8)"
```

## Workaround

As a workaround, set the websockets logger level to INFO or higher:

```python
import logging
logging.getLogger("websockets").setLevel(logging.INFO)
```

This prevents `Frame.__str__()` from being called since `Protocol.debug` will be `False`.

## Related

- This issue is similar to how the library handles UTF-8 decode errors in the actual message processing (which correctly uses incremental UTF-8 decoding in the Assembler class)
- The `__str__` method is only used for debugging/logging purposes, so it should be fault-tolerant

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Enabling DEBUG logging breaks WebSocket connections when sending large Unicode messages #1695

Summary

Environment

Reproduction

Minimal example (direct Frame creation)

Realistic example (WebSocket server and client)

Expected behavior

Actual behavior

Root cause

Impact

Suggested fix

Workaround

Related

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Enabling DEBUG logging breaks WebSocket connections when sending large Unicode messages #1695

Description

Summary

Environment

Reproduction

Minimal example (direct Frame creation)

Realistic example (WebSocket server and client)

Expected behavior

Actual behavior

Root cause

Impact

Suggested fix

Workaround

Related

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions