LLM Observability

See every call your models make.

Vector traces every request through your AI stack, flags regressions before users do, and keeps cost and latency in plain sight. The instrumentation layer your LLM features were missing.

Start free Read the docs

SOC 2 Type II Self-host or cloud 12-line install

POST /v1/chat · trace_9f3c1a 612 ms

http.request612

retriever.search134

prompt.build41

model.completion307

tool.lookup · retry98

Trusted by teams shipping AI to production

NorthwindLumenCohere-ishMistralyticsParabolaTldraw-ishReplicantVellumSierraHatch

Why teams switch

Shipping AI without observability is flying blind.

You wouldn't run a backend with no logs. Most teams run LLM features with exactly that.

Black-box outputs

A user reports a bad answer and you have no idea which prompt, model, or retrieval step produced it.

Silent cost creep

Token spend doubles in a week and the first you hear of it is the invoice, not a dashboard.

No idea why it broke

A prompt change quietly tanks quality. Without evals, the regression ships and nobody notices for days.

The platform

One layer, the whole picture.

Four capabilities that share one timeline, so a latency spike, a cost jump, and a quality dip line up on the same view.

Tracing

Follow every request, span by span.

Retrieval, prompt assembly, model call, tool use. See where the time actually goes.

Evals

Catch regressions before users do.

Run scored checks on every prompt change. Green means ship, amber means look.

pass rate 96.4%

Cost & latency

Watch spend and speed in real time.

p50 and p95 latency, tokens, and dollars on one timeline. Set a threshold, get told the moment it breaks.

p50 318msp95 901mstokens/min 42.1kspend/day $184

Prompt versioning

Diff prompts like code.

Every prompt is versioned. Compare any two, see what changed, and roll back in one click.

In the product

A console built for reading, not squinting.

app.vector.dev / traces

Install

Live in about five minutes.

Install the SDK

One package, Python or TypeScript. No agents, no sidecars.

Wrap your client

One line around your existing OpenAI, Anthropic, or custom client.

Watch traces land

Open the console and your first traces are already streaming in.

app.py

# 1. install


# 2. wrap your client
from vector import trace
from openai import OpenAI

client = trace(OpenAI())

# 3. ship. that's it.
client.chat.completions.create(
    model="gpt-4o",
    messages=msgs,
)

"We cut a latency regression from days of guessing to a ten-minute fix. Vector showed us the slow span on the first trace we opened."

Priya NairStaff Engineer, AI Platform at Parabola

lines to instrument an app

<0%

runtime overhead added

0ms

p95 ingest, end to end

Trust

Your prompts and data stay yours.

Run Vector in our cloud or fully self-hosted inside your own VPC. We never train on your data, and you control retention to the day.

Read the security overview

✓ SOC 2 Type II ✓ Self-host / VPC ✓ GDPR ready ✓ No training on your data

Start in minutes

Put your models in the light.

Start free Read the docs

Free up to 100k traces a month. No card required.