Vector — Case Study | Rupesh Chavan

12-Line Install

<1% Overhead

p95 <40ms Ingest Latency

100k Free Traces / Month

01 — The Problem

Your AI ships. You have no idea what it does.

The Challenge

Teams shipping LLM-powered features lack the instrumentation they take for granted in traditional software. There's no distributed trace, no regression alert, no cost ledger. You ship a prompt, it goes into production, and then you wait for a user to tell you something broke. Or you notice it three weeks later in your cloud bill.

Vector needed a marketing and documentation surface that communicated this problem — and its solution — in precise, developer-native language. No hype. Just signal.

The Design Problem

How do you make a developer tool feel trustworthy before the engineer installs it? How do you communicate observability as a value proposition without drowning the page in telemetry jargon? And how do you design for engineers who will immediately bounce from anything that looks like a startup marketing page?

The answer was a visual language that treats density as a feature, not a problem — every number on the page earns its space by being specific, not approximate.

Research Insights

INSIGHT 01

The Black Box Problem

Production LLM calls are invisible by default. Engineers instrument their REST APIs, their database queries, their CDN hits — but the model call is a black box. No span, no duration, no token count. Teams can't debug what they can't see.

INSIGHT 02

Silent Cost Creep

Token spend grows invisibly until the cloud bill arrives. Without per-call attribution, teams can't identify which features, users, or prompts are driving cost. Engineering directors face monthly surprises, not a cost curve they control.

INSIGHT 03

No Regression Signal

Prompt changes break silently. A tweak to the system prompt that improves one use case can degrade three others, and there is no test coverage for production behavior. By the time a regression is noticed, thousands of users may already have experienced it.

02 — User Research

Who Buys Observability

Vector's buyers range from ML engineers who live in traces to engineering directors who need to answer the CFO's question: "why did our AI costs double this quarter?" Each evaluates the product differently, but both need evidence, not promises.

Priya Sharma

ML Engineer · AI Startup

Goals

Trace every LLM call from request to response
Catch latency regressions before users notice
Compare prompt versions side-by-side with real eval scores

Pain Points

Logging scattered across print statements and Datadog dashboards
No unified view of model calls, tools, and retrieval steps

Daniel Reeves

Eng Director · Series B

Goals

Control token costs and attribute spend per feature
Maintain SOC 2 compliance for enterprise deals
Understand p95 latency to inform SLA commitments

Pain Points

Can't answer "why did this output change?" with any precision
Blind to cost spikes until monthly billing cycle

03 — Design Process

From production mystery
to designed clarity.

Developer tools live or die on trust, and trust comes from precision. Before any UI work began, the first step was understanding what ML engineers actually look at when something goes wrong, and which format communicates certainty at 2am during a production incident.

Domain Research & Developer Interviews

Analyzed OpenTelemetry, LangSmith, Helicone, and Weights & Biases for visual language, information architecture, and how each positions evaluation vs. monitoring. Synthesized 8 published developer surveys on LLM production challenges to identify three core anxieties: visibility, cost, and regression.

Information Architecture

Mapped the four product surfaces — Tracing, Evals, Cost Dashboard, Prompt Versioning — and designed the navigation hierarchy to mirror how engineers think about debugging, not how a PM would organize features. The trace waterfall had to be the first thing you saw, because it's the first thing you check.

Visual System — Technical Luxe

Developed a color system where signal teal (#34F5C5) marks live data and primary CTAs, inference violet (#8B7BFF) marks AI-specific UI (model names, token counts, embeddings), and alert amber (#FFB020) is reserved for anomalies and threshold breaches. The palette communicates operational status at a glance without requiring the user to read a label.

Component & Interaction Design

Designed the trace waterfall, eval grid, cost/latency chart, and prompt diff view as a coherent component system — each component using monospace type for numbers, precise durations (not rounded), and color-encoded status signals. Every interactive state was designed for engineers who will use keyboard navigation, not mouse hover exploration.

Solution Exploration

Three decisions that
shaped the tool.

Decision 01

Simplified summary view vs. Full-density trace waterfall

Problem

LLM traces contain dozens of nested spans — model calls, tool invocations, retrieval steps. A simplified summary loses the information engineers need to diagnose latency regressions.

Option A — Simplified Summary

Total latency, total cost, pass/fail status. Clean, scannable, easy to build. Loses the span-level detail that tells an engineer which step in the chain caused a 3-second regression.

Option B — Full Trace Waterfall (Chosen)

Nested spans with real proportional durations: every model call, tool use, and retrieval step appears as a sized bar. Hover reveals token counts, model ID, and finish reason. Color-coded by span type.

Why Option B

Engineers debugging production latency need span-level precision — not summaries. A simplified view forces them back to logs, which is exactly the workflow Vector is designed to replace.

Reasoning: Density is a feature for engineering tools. The waterfall communicates exactly what happened, in what order, for how long — which is the question engineers are always asking.

Decision 02

Proprietary diff UI vs. Code-review-identical prompt diff

Problem

Prompt changes need to be compared across versions. A custom "before/after" UI communicates the change — but requires engineers to learn a new interaction pattern for something they already do daily.

Option A — Custom Before/After UI

Side-by-side panels with highlighted changes in a proprietary format. Novel, brandable — and adds cognitive overhead for engineers who do code review all day.

Option B — GitHub-Style Diff (Chosen)

Green/red line highlights identical to a code review diff. Engineers read prompt changes the same way they read code changes — zero learning curve, immediate comprehension.

Why Option B

The best UI for an engineering audience is one that matches their existing mental model. A diff that looks like a GitHub diff is instantly understood — no onboarding required.

Reasoning: For a developer tool, familiarity is a design feature. Inventing new interaction patterns has a cost; borrowing from established ones has a benefit that compounds with user expertise.

Decision 03

Display font numerics vs. Monospace as first-class type element

Problem

Latency figures, token counts, and cost values appear throughout Vector's interface. Set in a display typeface, numbers read as rounded estimates: the wrong register for data measured to the millisecond and to fractions of a cent.

Option A — Display Font Numerics

Consistent with the rest of the UI type system. Numbers appear styled rather than technical — but in an observability tool, "styled" reads as "approximate."

Option B — Monospace Throughout (Chosen)

JetBrains Mono applied to all numerical values — not just code blocks. Latency in milliseconds, costs in fractions of a cent, token counts: all rendered as measured data, not styled copy.

Why Option B

When latency numbers appear in monospace, they feel measured. When they appear in a display typeface, they feel rounded. That's a trust difference engineers feel without being able to name it.

Reasoning: Typography in a data tool communicates the precision of the underlying measurement. Monospace numerics signal "these figures are exact" before any number is read.
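Monospace alone does not make figures line up; the values also need fixed precision and a fixed width so that digits and decimal points stack in columns. The minimal Python sketch below illustrates that idea. The helper names, the one-decimal latency precision, and the five-decimal dollar precision (thousandths of a cent) are illustrative assumptions, not Vector's actual formatting rules.

```python
def format_latency_ms(ms: float, width: int = 10) -> str:
    """Latency with fixed one-decimal precision, right-aligned to `width`."""
    return f"{ms:,.1f} ms".rjust(width)


def format_cost_usd(usd: float, width: int = 10) -> str:
    """Per-call cost in dollars to five decimals (thousandths of a cent)."""
    return f"${usd:.5f}".rjust(width)


# Hypothetical rows: in a monospace font the decimal points align vertically.
for latency, cost in [(38.4, 0.00042), (1284.0, 0.0131)]:
    print(format_latency_ms(latency), format_cost_usd(cost))
```

Right-alignment plus fixed decimals is what makes a column of values comparable at a glance. `rjust` never truncates, so a value wider than `width` pushes its row out of alignment; a real table would size the column to its widest value.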

04 — Design System

Density is a feature.
Not a problem.

The central design question for Vector was whether a screen full of telemetry data could feel legible rather than overwhelming. The answer was a strict visual grammar: every number uses monospace, every status uses a color from a three-value system (signal / inference / alert), and every interactive element has a minimum 44px target. The trace waterfall below is the product's heart — it's where ML engineers spend most of their debugging time, and it had to feel as readable as a profiler, not as cluttered as a log viewer.

Screens shown: app.vector.dev Trace Waterfall (Production), Evals Score Grid, and Cost & Latency.

Signature Components

Trace Waterfall

Problem

Log-based debugging of LLM pipelines requires engineers to manually correlate timestamps across dozens of log lines — a 30-minute debugging session for a 3-second latency regression.

Approach

Nested spans with real proportional durations: every model call, tool invocation, and retrieval step appears as a sized bar. Color-coded by span type: teal for model calls, violet for tools, amber for retrieval.

User Benefit

Engineers see at a glance which step in the pipeline caused the latency regression — without reading a single log line. Diagnosis time drops from minutes to seconds.

Business Benefit

The waterfall is the demo moment that converts engineers. Seeing their own production trace rendered as a visual makes the value proposition immediate and undeniable.
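The waterfall's core layout rule (bar position and width proportional to time, indentation from nesting) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `Span`, `Bar`, `layout_waterfall`, and the gray fallback color for span kinds outside model/tool/retrieval are hypothetical; the teal/violet/amber mapping follows the span-type colors described above. It assumes every `parent_id` refers to a span in the same trace and that there are no cycles.

```python
from dataclasses import dataclass
from typing import Optional

# Span-type colors as described for the waterfall; the gray fallback for
# other span kinds (e.g. a root "chain" span) is an assumption.
KIND_COLOR = {"model": "#34F5C5", "tool": "#8B7BFF", "retrieval": "#FFB020"}
FALLBACK_COLOR = "#8A8F98"


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    kind: str
    start_ms: float  # offset from the start of the trace
    end_ms: float


@dataclass
class Bar:
    span_id: str
    depth: int        # indentation level in the waterfall
    left_pct: float   # where the bar starts, as % of total trace duration
    width_pct: float  # bar length, as % of total trace duration
    color: str


def layout_waterfall(spans: list[Span]) -> list[Bar]:
    """Turn spans into bars whose position and width are proportional to time."""
    by_id = {s.span_id: s for s in spans}
    trace_start = min(s.start_ms for s in spans)
    total = (max(s.end_ms for s in spans) - trace_start) or 1.0  # avoid /0

    def depth(span: Span) -> int:
        level = 0
        while span.parent_id is not None:
            span = by_id[span.parent_id]
            level += 1
        return level

    return [
        Bar(
            span_id=s.span_id,
            depth=depth(s),
            left_pct=100 * (s.start_ms - trace_start) / total,
            width_pct=100 * (s.end_ms - s.start_ms) / total,
            color=KIND_COLOR.get(s.kind, FALLBACK_COLOR),
        )
        for s in sorted(spans, key=lambda s: s.start_ms)  # stable: ties keep input order
    ]


# Hypothetical trace: retrieval, then a model call that invokes a tool.
trace = [
    Span("request", None, "chain", 0, 3200),
    Span("retrieve", "request", "retrieval", 0, 400),
    Span("llm", "request", "model", 400, 3000),
    Span("search_tool", "llm", "tool", 1200, 1800),
]
for bar in layout_waterfall(trace):
    print(f"{'  ' * bar.depth}{bar.span_id:<12} {bar.left_pct:6.2f}% +{bar.width_pct:6.2f}%")
```

Because widths are true fractions of the trace rather than rounded buckets, the 2,600 ms model call visibly dominates the 400 ms retrieval step, which is exactly the "which step caused it" question the waterfall is meant to answer.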

Eval Grid

Problem

Prompt evaluation results are typically exported to spreadsheets — a format that requires manual scanning to find regressions across multiple evaluators and prompt versions.

Approach

Pass/fail matrix with score count-up animation on load. Each cell shows evaluator name, score, and delta from baseline. Red cells surface regressions immediately — no scrolling required.

User Benefit

Engineers see the health of a prompt change across all evaluators in a single view. A regression that would take 10 minutes to find in a spreadsheet is visible in 3 seconds.

Business Benefit

The eval grid makes prompt regression detection routine rather than exceptional — increasing the frequency of evaluation runs and catching issues earlier in the deployment cycle.
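The cell logic behind the grid (score, delta from baseline, and a red state for regressions) might look like the following Python sketch. The 0 to 1 score scale, the 0.8 pass threshold, the 0.02 regression tolerance, and the names `EvalCell` and `build_eval_row` are all hypothetical; the case study does not specify Vector's thresholds.

```python
from dataclasses import dataclass


@dataclass
class EvalCell:
    evaluator: str
    score: float
    delta: float  # score minus the baseline version's score
    status: str   # "regression" (rendered red), "pass" or "fail"


def build_eval_row(scores: dict[str, float], baseline: dict[str, float],
                   pass_threshold: float = 0.8,
                   tolerance: float = 0.02) -> list[EvalCell]:
    """Classify one prompt version's evaluator scores against a baseline."""
    cells = []
    for evaluator, score in scores.items():
        # An evaluator with no baseline score gets a delta of zero.
        delta = score - baseline.get(evaluator, score)
        if delta < -tolerance:
            status = "regression"  # checked first: a drop matters even if it still passes
        elif score >= pass_threshold:
            status = "pass"
        else:
            status = "fail"
        cells.append(EvalCell(evaluator, score, delta, status))
    return cells


row = build_eval_row(
    scores={"faithfulness": 0.91, "toxicity": 0.97, "relevance": 0.72},
    baseline={"faithfulness": 0.88, "toxicity": 0.97, "relevance": 0.81},
)
for cell in row:
    print(f"{cell.evaluator:<13} {cell.score:.2f} {cell.delta:+.2f} {cell.status}")
```

Checking the delta before the pass threshold is a deliberate choice in this sketch: a score that falls from 0.95 to 0.85 still "passes", but it is the kind of silent regression the grid exists to surface.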

Cost & Latency Chart

Problem

LLM cost and latency both matter to engineering teams, but they are typically tracked in separate tools, which makes it hard to see how a prompt change affects both at once.

Approach

Dual-axis area chart: p50 and p95 latency overlaid on token cost per call. Monospace axis labels precise to the millisecond and fraction of a cent. SLA threshold lines turn amber on approach, red on breach.

User Benefit

Engineers see the cost-latency tradeoff of every prompt change in a single view. A prompt that reduces latency by 400ms but increases cost 3× is visible before it ships to production.

Business Benefit

Visible SLA thresholds make compliance monitoring proactive rather than reactive — teams fix cost or latency breaches before they become incidents, reducing operational escalations.
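The two calculations behind the chart's latency series and threshold lines can be sketched in Python: a percentile over per-call latencies and a three-state SLA check. This uses the nearest-rank percentile definition; other definitions (such as linear interpolation) give different values on small samples, and the case study does not say which one Vector uses. The 90% "approach" ratio and the function names are assumptions.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("percentile of an empty sample set")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def sla_status(value: float, limit: float, warn_ratio: float = 0.9) -> str:
    """'red' on breach, 'amber' when within warn_ratio of the limit, else 'ok'."""
    if value > limit:
        return "red"
    if value >= warn_ratio * limit:
        return "amber"
    return "ok"


# Hypothetical per-call latencies (ms) for one prompt version.
latencies_ms = [120, 180, 210, 250, 300, 320, 410, 480, 900, 1500]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
print(p50, p95, sla_status(p95, limit=1200))
```

With only ten samples the p95 is simply the slowest call, which is why p95 lines on a real dashboard need enough traffic per time bucket before they mean much.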

Prompt Diff View

Problem

Prompt versioning is handled in plain text files or comments — making it difficult to compare what changed between the version that was working and the one that broke production evals.

Approach

GitHub-style side-by-side diff: green/red line highlights identical to a code review. Any two production versions can be compared, not just adjacent commits. The header shows the token delta and cost delta.

User Benefit

Engineers read prompt changes the same way they read code changes — zero learning curve, immediate comprehension. The mental model transfer from code review is instant and complete.

Business Benefit

A familiar diff UI reduces onboarding friction for engineering teams — the feature is self-evident on first use, shortening time-to-value and reducing support load during trial periods.
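The data behind a code-review-style prompt diff can be produced with Python's standard `difflib`; rendering it side by side is then a presentation concern. In this minimal sketch, `prompt_diff`, `diff_header`, and the per-1k-token price are hypothetical, and `approx_tokens` counts whitespace-separated words only as a stand-in: real token counts depend on the model's tokenizer.

```python
import difflib


def prompt_diff(old: str, new: str, old_label: str = "v1", new_label: str = "v2") -> str:
    """Unified, line-based diff of two prompt versions (GitHub-style -/+ lines)."""
    return "\n".join(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=old_label, tofile=new_label, lineterm="",
    ))


def approx_tokens(text: str) -> int:
    # Stand-in only: counts whitespace-separated words. Real token counts
    # depend on the model's tokenizer.
    return len(text.split())


def diff_header(old: str, new: str, usd_per_1k_tokens: float) -> str:
    """Token and per-call cost delta for the diff header, from the prompt text alone."""
    token_delta = approx_tokens(new) - approx_tokens(old)
    cost_delta = token_delta * usd_per_1k_tokens / 1000
    return f"tokens {token_delta:+d} | cost/call {cost_delta:+.5f} USD"


v1 = "You are a helpful assistant.\nAnswer concisely."
v2 = "You are a helpful assistant.\nAnswer concisely and cite sources."
print(diff_header(v1, v2, usd_per_1k_tokens=2.5))  # hypothetical input price
print(prompt_diff(v1, v2))
```

Since both functions take any two prompt strings, comparing non-adjacent versions is just a matter of which two versions are passed in.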

05 — Outcomes

Numbers that
engineers trust.

Vector's design constraint was that every claim had to be expressed as a number engineers could verify — not a marketing statement they had to take on faith. The metrics below were chosen because they answer the exact questions ML engineers ask before adopting any new tool in their stack.

12-Line

Install

Full instrumentation in 12 lines of code or fewer. No config files, no sidecar agents, no vendor lock-in for data export.

<1%

Runtime Overhead

Async, non-blocking trace export. Vector adds less than 1% overhead to LLM call latency — verified on GPT-4o and Claude 3.5 Sonnet.
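One common way to keep trace export off the request path is a bounded in-memory queue drained by a background worker, similar in spirit to the batch span processors in OpenTelemetry SDKs. The Python sketch below is a hypothetical illustration, not Vector's SDK: `BackgroundExporter`, `send_batch`, and the queue and batch sizes are assumptions. The case study does not say how Vector handles a full buffer, so dropping and counting spans here is also an assumption. The sketch illustrates the pattern; it does not measure or guarantee the overhead figure above.

```python
import queue
import threading
from typing import Callable

_STOP = object()  # sentinel telling the worker to flush and exit


class BackgroundExporter:
    """Non-blocking span export: the caller enqueues, a worker thread ships batches."""

    def __init__(self, send_batch: Callable[[list], None],
                 max_queue: int = 10_000, batch_size: int = 100) -> None:
        self._queue: queue.Queue = queue.Queue(maxsize=max_queue)
        self._send_batch = send_batch
        self._batch_size = batch_size
        self.dropped = 0  # approximate under heavy concurrency; fine for a sketch
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, span: dict) -> None:
        """Hot path: never waits on the network or on a full queue."""
        try:
            self._queue.put_nowait(span)
        except queue.Full:
            self.dropped += 1  # shed load instead of slowing the LLM call

    def close(self, timeout: float = 5.0) -> None:
        """Flush what is queued, then stop the worker."""
        self._queue.put(_STOP)
        self._worker.join(timeout)

    def _run(self) -> None:
        while True:
            item = self._queue.get()  # sleep until there is work
            stop = item is _STOP
            batch = [] if stop else [item]
            # Take whatever else is already queued; never wait to fill a batch.
            while not stop and len(batch) < self._batch_size:
                try:
                    item = self._queue.get_nowait()
                except queue.Empty:
                    break
                if item is _STOP:
                    stop = True
                else:
                    batch.append(item)
            if batch:
                try:
                    self._send_batch(batch)
                except Exception:
                    pass  # a real exporter would retry or log; never crash the worker
            if stop:
                return


received: list = []
exporter = BackgroundExporter(send_batch=received.extend, batch_size=3)
for i in range(10):
    exporter.record({"span_id": i, "duration_ms": 12.5})
exporter.close()
print(len(received), exporter.dropped)
```

The worker sends whatever is already queued instead of waiting for a batch to fill, so batching adds no deliberate delay. The trade-off sits in `record`: when the queue is full, this sketch drops spans rather than block the caller.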

p95 <40ms

Ingest Latency

95% of traces appear in the console within 40ms of emission (p95 < 40ms). No batch delay, no sampling loss at volume.

100k

Free Traces per Month

Generous enough to cover a real production workload during evaluation. No credit card required, no sampling on the free tier.

Key Learnings

What This Project Taught Me

Density is a feature for engineering tools

Designing for engineers ran counter to almost every design instinct I had before this project. Developers using Vector don't want whitespace and breathing room; they want six numbers in the same viewport. The challenge isn't simplification but information architecture that makes complexity legible: the goal is not to reduce the data, but to make it scannable at full density. That is a fundamentally different problem from the one most product design solves.

Monospace typography is a trust signal in data tools

Using JetBrains Mono for all numerical values, not just code blocks, was the single most effective design decision in the project. When latency numbers appear in a monospace font, they feel measured. When they appear in a display typeface, they feel rounded. That's a trust difference engineers feel without being able to name it. In an observability tool, trust in the numbers is the entire product, so the typography must not undermine it.

Familiarity is a design feature for developer tools

The prompt diff view that looks like a GitHub diff is immediately understood by every engineer who uses it — zero onboarding required. The mental model transfer from code review is complete and instant. Inventing novel interaction patterns for a developer audience has a real cost: it forces engineers to learn something new before they can evaluate whether the tool works. Borrowing from established patterns (diff, waterfall, grid) earns the benefit of existing expertise.

LLM observability is the next era of production engineering design

Every team shipping LLM features into production is operating partially blind: they know the input and the output, but not what happened in between, why latency spiked, or which prompt change broke the eval. Vector exists to close that gap. Designing the interface that makes LLM pipelines as inspectable as traditional software systems is the right design problem to be working on right now, and the patterns established here could help shape the conventions the industry builds on.

06 — Reflection

"Designing for engineers taught me something that counteracts almost every design instinct I had before: density is a feature. The developers using Vector don't want whitespace and breathing room — they want six numbers in the same viewport. The challenge isn't simplification. It's information architecture that makes complexity legible."

— Rupesh Chavan, Lead Product Designer

"The decision to treat monospace typography as a first-class design element — not just for code blocks — was the single most effective move in the project. When latency numbers appear in a monospace font, they feel measured. When they appear in a display typeface, they feel rounded. That's a trust difference engineers feel without being able to name it."

On typography as a trust mechanism in developer tools

Ship AI You Can Actually See
