
🐛 Bug Report: [Critical Bug] OpenAI streaming instrumentation crashes production with "unhashable type: 'list'" when using tool calls #3428

@zwanzigg

Which component is this bug for?

OpenAI Instrumentation

📜 Description

Severity: CRITICAL

The OpenAI instrumentation causes production crashes when using streaming chat completions with tool definitions.
The error occurs during metric recording with TypeError: unhashable type: 'list', causing the entire request to fail with a 500 error.

Environment

  • traceloop-sdk version: 0.47.4, 0.47.5 (bug present in both)
  • Python version: 3.13
  • OpenAI SDK version: 1.66.5
  • OpenTelemetry SDK version: 1.38.0
  • Framework: LangGraph with direct OpenAI client calls

Root Cause Analysis

  1. OpenAI tool definitions contain lists/arrays (e.g., "required": ["param1", "param2"])
  2. Instrumentation captures these in _shared_attributes() for metric recording
  3. OpenTelemetry metrics require hashable attributes (for aggregation keys via frozenset())
  4. Lists are not hashable → the TypeError crashes the streaming iterator (see the snippet after this list)
  5. Error propagates to user code → entire request fails with 500
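The mechanism is easy to demonstrate in isolation; the attribute key below is illustrative, not necessarily the exact key the instrumentation emits:

# OpenTelemetry's metric aggregation builds a frozenset over the
# attribute items; a tuple containing a list is unhashable.
attributes = {
    "gen_ai.request.model": "gpt-4",
    "llm.request.functions.0.parameters.required": ["location"],  # illustrative key
}

frozenset(attributes.items())
# TypeError: unhashable type: 'list'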

Why This is Critical

Production Impact

  • ✅ User's code is 100% correct
  • ❌ Observability library crashes production
  • ❌ No way to catch the error (happens in sync iterator)
  • ❌ Results in 500 errors for end users

Failed Workarounds

  • should_enrich_metrics=False - still crashes
  • span_postprocess_callback - too late; the error happens during streaming
  • block_instruments={Instruments.OPENAI} - the only working solution (loses all OpenAI telemetry); see the snippet after this list
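For reference, the blocking workaround looks like this; the Instruments import path is taken from the Traceloop SDK and should be double-checked against its docs:

from traceloop.sdk import Traceloop
from traceloop.sdk.instruments import Instruments

# Disables the OpenAI instrumentation entirely: no spans or metrics for
# OpenAI calls, but the streaming iterator no longer crashes.
Traceloop.init(
    app_name="test-app",
    block_instruments={Instruments.OPENAI},
)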

Proposed Fix

In opentelemetry/instrumentation/openai/shared/chat_wrappers.py, sanitize attributes before metric recording:

import json

def _shared_attributes(self):
    """Get attributes for metrics - sanitize unhashable types."""
    attrs = {
        # ... existing attributes
    }

    # Sanitize for metric recording
    sanitized = {}
    for key, value in attrs.items():
        if isinstance(value, (list, dict)):
            # Convert to JSON string for hashability
            try:
                sanitized[key] = json.dumps(value)
            except (TypeError, ValueError):
                sanitized[key] = str(value)
        else:
            sanitized[key] = value

    return sanitized
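Applied to the failing attributes from the demonstration above, the same sanitization logic makes them hashable (standalone sketch):

import json

attributes = {
    "gen_ai.request.model": "gpt-4",
    "llm.request.functions.0.parameters.required": ["location"],
}

sanitized = {
    key: json.dumps(value) if isinstance(value, (list, dict)) else value
    for key, value in attributes.items()
}

frozenset(sanitized.items())  # no longer raises; the list is now a JSON string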

Alternatively, wrap the metric recording in try/except:

import logging

logger = logging.getLogger(__name__)  # the real module may already define one

def _process_item(self, chunk):
    try:
        self._streaming_time_to_first_token.record(
            self._time_of_first_token - self._start_time,
            attributes=self._shared_attributes(),
        )
    except (TypeError, ValueError) as e:
        # Log but don't crash user code
        logger.warning(f"Failed to record metric: {e}")
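Catching only TypeError and ValueError keeps the except clause narrow: genuine programming errors still surface, while attribute-hashing failures are downgraded to a log warning instead of breaking the user's streaming loop.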

This bug makes Traceloop unusable in production for any agent using OpenAI tools.

👟 Reproduction steps

Minimal Reproduction

from traceloop.sdk import Traceloop
from openai import OpenAI

# Initialize Traceloop (any configuration)
Traceloop.init(
    app_name="test-app",
    should_enrich_metrics=False,  # Even with this disabled, still crashes!
)

# Setup OpenAI client
client = OpenAI(api_key="your-api-key")

# Make streaming call with tools - THIS CRASHES
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string"}
                    },
                    "required": ["location"]  # ← LIST causes crash
                }
            }
        }
    ],
    tool_choice="auto",
    stream=True,  # Only crashes with streaming
)

# Crash happens during iteration
for chunk in response:  # TypeError on first chunk with tool_calls
    if chunk.choices:
        print(chunk.choices[0].delta)

👍 Expected behavior

Instrumentation should:

  1. Never crash user code - fail gracefully or skip problematic metrics
  2. Sanitize attributes before recording - convert unhashable types to strings
  3. Handle errors defensively - log a warning and continue (a decorator sketch follows this list)
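One way to enforce point 3 across the instrumentation is a suppressing decorator. The sketch below is generic; the decorator name is hypothetical, not an existing helper in this codebase:

import logging
from functools import wraps

logger = logging.getLogger(__name__)

def never_crash_user_code(func):  # hypothetical name, for illustration
    """Run an instrumentation hook, but log and swallow any exception so
    it cannot propagate into the user's request path."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            logger.warning("Instrumentation error in %s", func.__name__, exc_info=True)
            return None
    return wrapper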

👎 Actual Behavior with Screenshots

Complete Stack Trace

File "my_app.py", line 25, in main
    for chunk in response:
                 ^^^^^^^^
File "/venv/lib/python3.13/site-packages/opentelemetry/instrumentation/openai/shared/chat_wrappers.py", line 693, in __next__
    self._process_item(chunk)
    ~~~~~~~~~~~~~~~~~~^^^^^^^
File "/venv/lib/python3.13/site-packages/opentelemetry/instrumentation/openai/shared/chat_wrappers.py", line 718, in _process_item
    self._streaming_time_to_first_token.record(
        self._time_of_first_token - self._start_time,
        attributes=self._shared_attributes(),  # ← Problem: contains unhashable lists
    )
File "/venv/lib/python3.13/site-packages/opentelemetry/sdk/metrics/_internal/instrument.py", line 428, in record
    self._real_instrument.record(amount, attributes, context)
File "/venv/lib/python3.13/site-packages/opentelemetry/sdk/metrics/_internal/instrument.py", line 264, in record
    self._measurement_consumer.consume_measurement(...)
File "/venv/lib/python3.13/site-packages/opentelemetry/sdk/metrics/_internal/_view_instrument_match.py", line 105, in consume_measurement
    aggr_key = frozenset(attributes.items())  # ← Crash: can't hash lists
TypeError: unhashable type: 'list'

🤖 Python Version

3.13

📃 Provide any additional context for the Bug.

  • Non-streaming calls work fine (different code path)
  • Calls without tools work fine
  • Error occurs even with should_enrich_metrics=False
  • Only solution is block_instruments={Instruments.OPENAI}

👀 Have you spent some time to check if this bug has been raised before?

  • I checked and didn't find a similar issue

Are you willing to submit PR?

Yes, I am willing to submit a PR!
