# Proxy

Transparent LLM proxy that records traces automatically. No SDK needed.
The Traceway proxy is a transparent HTTP reverse proxy that sits between your application and your LLM provider. It forwards requests unchanged, records the full request and response as spans, and returns the response to your application. No code changes needed beyond swapping the base URL.
## Starting the proxy

The proxy starts automatically alongside the API server:

```bash
traceway serve
# API on :3000, Proxy on :3001
```

## Basic usage
Point your LLM client's base URL at the proxy:
```ts
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:3001/v1', // proxy
  apiKey: process.env.OPENAI_API_KEY,  // passed through to the real provider
});

// Use the client normally
const completion = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});
```

The proxy:
- Receives the request from your application
- Creates a trace and an `llm_call` span in `running` status
- Forwards the request to the real provider (e.g., `api.openai.com`)
- Records the full response
- Extracts token counts from the response
- Estimates cost using the model pricing table
- Completes the span
- Returns the response unchanged to your application
Your application sees exactly the response it would get from the provider directly; the proxy does not modify the request or the response.
## Provider detection
The proxy auto-detects which LLM provider to forward to based on the request URL and headers:
| Provider | Detection | Forward URL |
|---|---|---|
| OpenAI | Default (any request to `/v1/chat/completions`) | `https://api.openai.com` |
| Anthropic | `X-API-Key` header present, or `/v1/messages` path | `https://api.anthropic.com` |
| Ollama | `X-Ollama-Base` header, or configured Ollama URL | `http://localhost:11434` |
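The detection order in the table can be sketched as a small function (illustrative only, not Traceway's actual code; it assumes header names are already normalized to lowercase):

```typescript
type Provider = 'anthropic' | 'ollama' | 'openai';

function detectProvider(path: string, headers: Record<string, string>): Provider {
  // Anthropic: X-API-Key header present, or the /v1/messages path
  if ('x-api-key' in headers || path.startsWith('/v1/messages')) return 'anthropic';
  // Ollama: explicit X-Ollama-Base header
  if ('x-ollama-base' in headers) return 'ollama';
  // Everything else is treated as OpenAI-compatible
  return 'openai';
}
```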
For other OpenAI-compatible providers (Together, Groq, Fireworks, etc.), set the target URL via the `X-Forward-URL` header:
```ts
const client = new OpenAI({
  baseURL: 'http://localhost:3001/v1',
  apiKey: process.env.TOGETHER_API_KEY,
  defaultHeaders: {
    'X-Forward-URL': 'https://api.together.xyz',
  },
});
```

## What gets recorded
For each request through the proxy, Traceway creates:
- One trace, named after the model (e.g., `gpt-4o`)
- One `llm_call` span, containing the full request and response
The span includes:
| Field | Value |
|---|---|
| `name` | The model name from the request |
| `kind.model` | The model identifier |
| `kind.provider` | Auto-detected provider |
| `kind.input_tokens` | Extracted from the response |
| `kind.output_tokens` | Extracted from the response |
| `kind.cost` | Estimated from the pricing table |
| `input` | The full request body (messages, system prompt, tools, etc.) |
| `output` | The full response body |
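Putting the fields together, a recorded span for the earlier "Hello" request might look roughly like this (the field names come from the table above; the concrete shape and values are illustrative assumptions, not Traceway's exact wire format):

```typescript
const span = {
  name: 'gpt-4o',               // model name from the request
  kind: {
    model: 'gpt-4o',
    provider: 'openai',         // auto-detected
    input_tokens: 12,           // from response.usage
    output_tokens: 48,
    cost: 0.00051,              // estimated from the pricing table
  },
  input: { messages: [{ role: 'user', content: 'Hello' }] },
  output: { choices: [{ message: { role: 'assistant', content: 'Hi there!' } }] },
};
```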
## Token extraction
Token counts are extracted differently per provider:
- OpenAI: from `response.usage.prompt_tokens` and `response.usage.completion_tokens`
- Anthropic: from `response.usage.input_tokens` and `response.usage.output_tokens`
- Ollama: from `response.eval_count` and `response.prompt_eval_count`
If token counts aren't available in the response (e.g., streaming without `stream_options.include_usage`), the fields are left empty.
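The per-provider extraction described above can be sketched like this (illustrative, not Traceway's actual code; `body` is the parsed response JSON):

```typescript
interface TokenCounts { input?: number; output?: number }

function extractTokens(provider: string, body: any): TokenCounts {
  switch (provider) {
    case 'openai':
      return { input: body.usage?.prompt_tokens, output: body.usage?.completion_tokens };
    case 'anthropic':
      return { input: body.usage?.input_tokens, output: body.usage?.output_tokens };
    case 'ollama':
      return { input: body.prompt_eval_count, output: body.eval_count };
    default:
      return {}; // unknown provider or missing usage: leave the fields empty
  }
}
```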
## Streaming
The proxy supports streaming responses. For OpenAI-style streaming (`stream: true`), the proxy:
- Forwards the streaming response to your application in real-time (no buffering delay)
- Accumulates chunks in memory to build the full response
- After the stream ends, records the complete response as the span's output
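The accumulation step amounts to concatenating the content deltas from each streamed chunk; a minimal sketch (not Traceway's actual code):

```typescript
// Each OpenAI streaming chunk carries a delta; joining the deltas in order
// rebuilds the full assistant message recorded as the span's output.
function accumulate(
  chunks: Array<{ choices: Array<{ delta?: { content?: string } }> }>,
): string {
  let text = '';
  for (const chunk of chunks) {
    text += chunk.choices[0]?.delta?.content ?? '';
  }
  return text;
}
```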
If you want token counts with streaming, include `stream_options: { include_usage: true }` in your OpenAI request. The proxy will extract tokens from the final usage chunk.
```ts
const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Write a poem' }],
  stream: true,
  stream_options: { include_usage: true },
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
```

## Capture modes
The proxy supports three capture modes that control how much data is recorded:
| Mode | Behavior |
|---|---|
| `Full` | Records the complete request and response (default) |
| `Preview(N)` | Records only the first N characters of input and output |
| `Off` | Creates spans with metadata (model, tokens, cost) but no input/output |
This is useful for production deployments where you want cost and latency tracking without storing potentially sensitive prompt content.
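Conceptually, the three modes differ only in how much of the captured content survives; a hypothetical sketch of the behavior (the mode names come from the table above, the logic is an assumption):

```typescript
type CaptureMode =
  | { kind: 'full' }
  | { kind: 'preview'; n: number }
  | { kind: 'off' };

function capture(mode: CaptureMode, content: string): string | undefined {
  switch (mode.kind) {
    case 'full':
      return content;                  // store the complete payload
    case 'preview':
      return content.slice(0, mode.n); // first N characters only
    case 'off':
      return undefined;                // metadata-only span, no input/output
  }
}
```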
## Authentication
In cloud mode, the proxy requires authentication. Pass your Traceway API key as a query parameter or header:
```ts
const client = new OpenAI({
  baseURL: 'https://api.traceway.ai:3001/v1?token=tw_sk_...',
  apiKey: process.env.OPENAI_API_KEY,
});
```

The proxy uses the Traceway API key for authentication and the provider's API key (from the `Authorization` or `X-API-Key` header) for the upstream request.
## Error handling
If the upstream provider returns an error (4xx or 5xx), the proxy:
- Records the error response as the span's output
- Fails the span with the error status code and message
- Returns the error response unchanged to your application
If the proxy itself can't reach the upstream provider (DNS failure, timeout, connection refused), it returns a 502 Bad Gateway response and fails the span with the connection error.
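From the client's point of view, those cases can be sorted with a small classifier (an illustrative helper; whether a given 502 originated from the proxy or was forwarded from the provider is an assumption this sketch glosses over):

```typescript
type ProxyOutcome = 'ok' | 'provider-error' | 'proxy-unreachable-upstream';

function classifyProxyResponse(status: number): ProxyOutcome {
  if (status === 502) return 'proxy-unreachable-upstream'; // proxy couldn't reach upstream
  if (status >= 400) return 'provider-error';              // 4xx/5xx forwarded unchanged
  return 'ok';
}
```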
## Limitations
- The proxy currently runs on the same machine as the API server. It can't be deployed separately.
- Each request through the proxy creates exactly one trace with one span. If your application makes multiple LLM calls in a pipeline, each will be a separate trace. For correlated traces, use the SDK instead.
- The proxy doesn't support request modification (e.g., injecting system prompts or adding headers). It's purely transparent.