Akforges
01— AI · Production
Telos AI
Sales intelligence · YC W24

From a Cursor demo to 18,000 paying users.

We re-architected the inference pipeline, replaced a fragile LangChain demo with a typed agent graph, added evals and observability, and cut p95 latency by 71%.

−71%
p95 latency
−58%
AI cost / req
5 wks
Time to ship
18k
Paying users

The problem

Telos AI was a YC W24 company with a working sales intelligence prototype — built fast in Cursor, demoed to investors — and suddenly it had 18,000 beta signups. The prototype worked. Production didn't.

The inference pipeline was a raw LangChain chain — no evals, no guardrails, no observability. p95 latency was 8.4 seconds. The model frequently returned malformed JSON that crashed downstream code. Costs were running $0.0038 per request, with no visibility into which calls were expensive.

They had 5 weeks before their public launch. The founding team couldn't afford to spend that time debugging LangChain internals — they needed to ship features and close enterprise pilots.

What we did

Week 1 was an audit. We read every prompt, traced every model call, and benchmarked each stage. The root causes were clear: the chain made 6 sequential LLM calls where 3 would do, structured output was not enforced (so the model could and did return anything), and there was zero caching.

We replaced the LangChain chain with a typed LangGraph agent graph. Each node had a strict Zod schema for its output. Parallel calls replaced sequential ones where the model calls were independent. A Redis semantic cache cut repeat lookups by 34%.
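The shape of that graph, reduced to two nodes, looks roughly like this sketch (node names, state fields, and prompts here are illustrative, not Telos's real pipeline):

```ts
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
import { z } from "zod";

// Hypothetical output schema for one node; every node in the real
// graph had its own strict schema like this.
const CompanyProfile = z.object({
  name: z.string(),
  industry: z.string(),
  summary: z.string(),
});

const GraphState = Annotation.Root({
  query: Annotation<string>,
  profile: Annotation<z.infer<typeof CompanyProfile> | null>,
  news: Annotation<string | null>,
});

const model = new ChatOpenAI({ model: "gpt-4o", temperature: 0 });

// withStructuredOutput enforces the Zod schema at the provider level,
// so malformed JSON is rejected here instead of crashing downstream code.
async function researchCompany(state: typeof GraphState.State) {
  const profile = await model
    .withStructuredOutput(CompanyProfile)
    .invoke(`Profile the company in this query: ${state.query}`);
  return { profile };
}

async function fetchNews(state: typeof GraphState.State) {
  const res = await model.invoke(`Summarize recent news for: ${state.query}`);
  return { news: res.content as string };
}

// Two edges from START run both nodes in the same step, i.e. in
// parallel, replacing what used to be sequential chain stages.
export const graph = new StateGraph(GraphState)
  .addNode("researchCompany", researchCompany)
  .addNode("fetchNews", fetchNews)
  .addEdge(START, "researchCompany")
  .addEdge(START, "fetchNews")
  .addEdge("researchCompany", END)
  .addEdge("fetchNews", END)
  .compile();
```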

We wired LangSmith tracing on every call — latency, token count, cost, and model version all logged with the user's org ID. This gave the Telos team the observability to debug issues themselves after handoff.
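The tracing itself is mostly configuration. A sketch of the pattern, assuming a hypothetical runResearch entry point:

```ts
import { graph } from "./graph"; // the compiled graph from the sketch above

// Tracing is enabled by environment, not code:
//   LANGCHAIN_TRACING_V2=true
//   LANGCHAIN_API_KEY=...
// Metadata and tags on the invoke config appear on the LangSmith trace,
// alongside the latency, token counts, and model version LangSmith
// records for each OpenAI call.
export async function runResearch(query: string, orgId: string) {
  return graph.invoke(
    { query, profile: null, news: null },
    { metadata: { orgId }, tags: ["prod", "sales-research"] }
  );
}
```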

We shipped an eval harness with 240 hand-labelled test cases covering their most common sales research queries. It ran on every PR and became the team's definition of "the model is working."
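In spirit the harness is a loop over labelled cases with a hard pass-rate gate. A minimal sketch, assuming a hypothetical case shape, file path, and threshold:

```ts
import { readFileSync } from "node:fs";
import { runResearch } from "./run"; // the traced entry point above

// Hypothetical labelled case: a real query plus a field a correct
// answer must contain. Telos's 240 cases were richer than this.
type EvalCase = { query: string; expectIndustry: string };

const cases: EvalCase[] = JSON.parse(readFileSync("evals/cases.json", "utf8"));

let passed = 0;
for (const c of cases) {
  const out = await runResearch(c.query, "eval");
  // Exact match on a stable field; fuzzier fields would use normalized
  // string comparison or an LLM judge instead.
  if (out.profile?.industry === c.expectIndustry) passed++;
}

const rate = passed / cases.length;
console.log(`pass rate: ${(100 * rate).toFixed(1)}% (${passed}/${cases.length})`);
// A non-zero exit fails the GitHub Actions job, which blocks the PR.
if (rate < 0.97) process.exit(1);
```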

Results

Delivered in 5 weeks. p95 latency dropped from 8.4s to 2.4s. AI cost per request fell 58%, from $0.0038 to $0.0016, through parallel calls, model routing (GPT-4o mini for cheaper tasks), and Redis caching.
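The routing rule is deliberately simple. A sketch, with illustrative task categories:

```ts
import { ChatOpenAI } from "@langchain/openai";

// Hypothetical split: bounded tasks (classification, field extraction)
// run on gpt-4o-mini; open-ended synthesis stays on gpt-4o.
type TaskKind = "classify" | "extract" | "synthesize";

const cheapTasks: TaskKind[] = ["classify", "extract"];

export function modelFor(kind: TaskKind): ChatOpenAI {
  return new ChatOpenAI({
    model: cheapTasks.includes(kind) ? "gpt-4o-mini" : "gpt-4o",
    temperature: 0,
  });
}
```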

Telos launched publicly, hit 18,000 paying users in the first month, and the inference pipeline has not been the bottleneck since. The eval harness caught two silent regressions from OpenAI model updates that would have reached users.

Tech stack

TypeScript · LangGraph · OpenAI GPT-4o · LangSmith · PostgreSQL · Redis · AWS ECS · GitHub Actions

"They shipped in five weeks what our last vendor couldn't in nine months. The eval harness alone changed how we think about model updates."

Mei Park
CTO, Telos AI