OpenAI Integration

Automatic tracing for OpenAI API calls.

Quick Start

import tracium
from openai import OpenAI

# Enable auto-instrumentation
tracium.trace()

# Use OpenAI normally - all calls are traced
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

What Gets Captured

For each OpenAI call, Tracium automatically captures:

  • Input messages - The full messages array sent to OpenAI
  • Model - The model being used (gpt-4, gpt-3.5-turbo, etc.)
  • Output - The response content
  • Token usage - Input tokens, output tokens, cached tokens
  • Latency - Time taken for the API call
  • Errors - Any errors or exceptions
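
For illustration only, a single captured record could look roughly like this; the field names below are hypothetical and do not reflect Tracium's documented schema:

# Hypothetical shape of one captured record (field names illustrative,
# not Tracium's actual schema)
captured_call = {
    "model": "gpt-4",
    "input": [{"role": "user", "content": "Hello!"}],
    "output": "Hi there! How can I help?",
    "usage": {"input_tokens": 9, "output_tokens": 8, "cached_tokens": 0},
    "latency_ms": 412,
    "error": None,
}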

Streaming Support

Streaming responses are fully supported. Tracium captures each chunk and records the complete response when the stream finishes:

import tracium
from openai import OpenAI

tracium.trace()
client = OpenAI()

# Streaming is automatically traced
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a poem about coding."}],
    stream=True
)

# Chunks are captured, full response recorded at stream end
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
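
Conceptually, recording the complete response amounts to concatenating the streamed deltas. The sketch below illustrates the idea; it is not Tracium's actual implementation:

# Illustrative only - roughly how a tracer can rebuild the full text
# from streamed deltas (assumes a fresh, unconsumed stream)
parts = []
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        parts.append(delta)
full_response = "".join(parts)  # recorded when the stream finishes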

Async Support

Async OpenAI calls are automatically traced:

import tracium
import asyncio
from openai import AsyncOpenAI

tracium.trace()
client = AsyncOpenAI()

async def generate_response(prompt: str) -> str:
    # Async calls are traced automatically
    response = await client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# Run async function
result = asyncio.run(generate_response("Hello, world!"))

With Manual Traces

Combine auto-instrumentation with manual traces for full control:

import tracium
from openai import OpenAI

client = tracium.init()
openai = OpenAI()

user_query = "What is our refund policy?"

def get_relevant_context(query: str) -> str:
    # Placeholder retrieval step - substitute your own retriever
    return "Refunds are accepted within 30 days of purchase."

with client.agent_trace(agent_name="qa-bot") as trace:
    # Add custom context
    with trace.span(span_type="retrieval", name="fetch_context") as span:
        span.record_input({"query": user_query})
        context = get_relevant_context(user_query)
        span.record_output({"context": context})

    # OpenAI call is auto-traced as a span within this trace
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"Context: {context}"},
            {"role": "user", "content": user_query}
        ]
    )

Function Calling

Function calls (tools) are captured in the trace:

import tracium
from openai import OpenAI

tracium.trace()
client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                }
            }
        }
    }
]

# Function calls are captured in the trace
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in NYC?"}],
    tools=tools,
    tool_choice="auto"
)
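
On the application side, the tool call the model returns can be read from the response as usual with the OpenAI client; this is the same data that shows up in the captured trace:

# Inspect the tool call the model returned (standard OpenAI client usage)
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name)       # e.g. "get_weather"
    print(call.function.arguments)  # JSON string, e.g. '{"location": "NYC"}'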

Completions API

The legacy Completions API is also supported:

import tracium
from openai import OpenAI

tracium.trace()
client = OpenAI()

# Completions API is traced
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Write a haiku about programming:",
    max_tokens=50
)

Error Handling

Errors are captured and the span is marked as failed:

import tracium
from openai import OpenAI, RateLimitError

tracium.trace()
client = OpenAI()

try:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except RateLimitError as e:
    # The span is automatically marked as failed with error details
    # including error type, message, and traceback
    handle_rate_limit(e)