openai-v2: StreamWrapper.headers fails for with_raw_response streaming even after #4184 fix #46

@adityamehra

Description

The fix in open-telemetry/opentelemetry-python-contrib#4184 added __getattr__ to StreamWrapper to proxy unknown attributes to self.stream. However, accessing .headers still fails when using with_raw_response.create(stream=True) with LiteLLM (or any client that accesses raw_response.headers directly on the stream wrapper).

This is a follow-up to open-telemetry/opentelemetry-python-contrib#4032 and open-telemetry/opentelemetry-python-contrib#4113.

Note: Originally filed as open-telemetry/opentelemetry-python-contrib#4606 — moving here as the openai-v2 instrumentation now lives in this repo.

Root Cause

__getattr__ proxies to self.stream, which is an AsyncStream (the result of calling result.parse()). However, AsyncStream does not have a .headers attribute — headers live on the original LegacyAPIResponse that was discarded before wrapping.

The call chain in patch.py:

result = await wrapped(*args, **kwargs)        # LegacyAPIResponse (has .headers)
parsed_result = result.parse()                  # AsyncStream (no .headers)
return StreamWrapper(parsed_result, ...)        # self.stream = AsyncStream

When a caller then does:

headers = dict(raw_response.headers)            # StreamWrapper.__getattr__("headers")
                                                # → getattr(AsyncStream, "headers")
                                                # → AttributeError  ← still crashes!

Steps to Reproduce

import asyncio
from unittest.mock import AsyncMock, MagicMock, patch

import httpx
from openai import AsyncAzureOpenAI
from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor
from opentelemetry.sdk.trace import TracerProvider

SSE_CHUNKS = [
    b'data: {"id":"chatcmpl-123","object":"chat.completion.chunk","model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}\n\n',
    b'data: {"id":"chatcmpl-123","object":"chat.completion.chunk","model":"gpt-4o","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":"stop"}]}\n\n',
    b"data: [DONE]\n\n",
]

def make_mock_response():
    async def aiter_bytes(_=None):
        for chunk in SSE_CHUNKS:
            yield chunk
    mock_request = MagicMock(spec=httpx.Request)
    mock_request.headers = httpx.Headers({"X-Stainless-Raw-Response": "true"})
    mock_response = MagicMock(spec=httpx.Response)
    mock_response.status_code = 200
    mock_response.headers = httpx.Headers({"content-type": "text/event-stream", "x-request-id": "abc"})
    mock_response.aiter_bytes = aiter_bytes
    mock_response.aclose = AsyncMock()
    mock_response.request = mock_request
    mock_response.http_version = "HTTP/1.1"
    mock_response.elapsed = MagicMock()
    return mock_response

# Verbatim copy of LiteLLM's make_azure_openai_chat_completion_request
async def make_azure_openai_chat_completion_request(azure_client, data, timeout):
    raw_response = await azure_client.chat.completions.with_raw_response.create(
        **data, timeout=timeout
    )
    headers = dict(raw_response.headers)   # <-- crashes here
    response = raw_response.parse()
    return headers, response

async def reproducer():
    OpenAIInstrumentor().instrument(tracer_provider=TracerProvider())
    client = AsyncAzureOpenAI(
        api_key="test", azure_endpoint="https://test.openai.azure.com", api_version="2024-02-15-preview"
    )
    with patch.object(client._client, "send", new_callable=AsyncMock, return_value=make_mock_response()):
        headers, response = await make_azure_openai_chat_completion_request(
            client, {"model": "gpt-4o", "messages": [{"role": "user", "content": "hi"}], "stream": True}, 60.0
        )
    print("headers:", headers)

asyncio.run(reproducer())

Actual Behavior

AttributeError: 'StreamWrapper' object has no attribute 'headers'

Observed in production when OpenAI v2 instrumentation was active and LiteLLM's Azure provider called with_raw_response.create(stream=True).

Expected Behavior

  • raw_response.headers should return the HTTP response headers from the original LegacyAPIResponse, same as when uninstrumented.
  • raw_response.parse() should return the StreamWrapper itself for iteration.

Suggested Fix

Capture LegacyAPIResponse.headers before calling .parse(), and store them directly on the wrapper:

# Before result.parse():
raw_headers = getattr(result, "headers", None)
parsed_result = result.parse()
if is_streaming(kwargs):
    return ChatStreamWrapper(parsed_result, ..., raw_headers=raw_headers)

# In ChatStreamWrapper / _ChatStreamMixin:
self._self_raw_headers = raw_headers

@property
def headers(self):
    return self._self_raw_headers

def parse(self):
    return self
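
A self-contained sketch of the same idea, again using hypothetical stand-in classes rather than the real instrumentation code. HeaderPreservingWrapper and wrap_raw_response are illustrative names that do not exist in the repository, and the real wrapper is asynchronous and takes additional arguments that are omitted here.

```python
class FakeStream:
    """Hypothetical stand-in for AsyncStream: iterable, but has no .headers."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def __iter__(self):
        return iter(self._chunks)


class FakeLegacyResponse:
    """Hypothetical stand-in for LegacyAPIResponse."""

    def __init__(self, headers, stream):
        self.headers = headers
        self._stream = stream

    def parse(self):
        return self._stream


class HeaderPreservingWrapper:
    """Illustrative wrapper that keeps the raw headers next to the stream."""

    def __init__(self, stream, raw_headers=None):
        self.stream = stream
        self._raw_headers = raw_headers

    @property
    def headers(self):
        # Resolved by normal lookup, so __getattr__ is never consulted for it.
        return self._raw_headers

    def parse(self):
        # Callers that expect a raw response call .parse(); hand back the
        # wrapper so iteration still goes through the instrumented path.
        return self

    def __iter__(self):
        return iter(self.stream)

    def __getattr__(self, name):
        # Anything else is still proxied to the wrapped stream.
        return getattr(self.stream, name)


def wrap_raw_response(result):
    """Capture headers before parse() discards the raw response."""
    raw_headers = getattr(result, "headers", None)
    return HeaderPreservingWrapper(result.parse(), raw_headers=raw_headers)


raw = FakeLegacyResponse({"x-request-id": "abc"}, FakeStream(["Hello", "!"]))
wrapped = wrap_raw_response(raw)

headers = dict(wrapped.headers)
response = wrapped.parse()
print(headers, list(response))
```

With this shape, the LiteLLM pattern of reading raw_response.headers and then calling raw_response.parse() works against the wrapper. Using getattr with a None default keeps the non-raw path safe, although callers would then need to tolerate headers being None.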

Environment

  • opentelemetry-instrumentation-openai-v2: latest main
  • openai: 1.82.0
  • litellm: 1.x (Azure provider)
  • Python: 3.11
