Observability for LLM systems that actually helps
You cannot improve what you cannot see, and you cannot debug an incident from a vibe. LLM systems need observability wired in from day one.
What to capture
Trace each request end to end: the query, the retrieved chunks, the prompt, the response, the eval scores, and the token cost. Aggregate the metrics that map to outcomes, such as eval scores, and alert on the ones that map to spend, such as cost per request.
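A minimal sketch of what one trace record and the two kinds of aggregation might look like. It is not tied to any tracing library; in production you would more likely emit these fields as spans to your tracing backend. The field names, the per-1k-token pricing model, and the per-request budget are all hypothetical assumptions for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean
import uuid


@dataclass
class Trace:
    """One request captured end to end (hypothetical schema, not a vendor's)."""
    query: str
    retrieved_chunks: list[str]
    prompt: str
    response: str
    eval_scores: dict[str, float]  # e.g. {"faithfulness": 0.9}; names are illustrative
    prompt_tokens: int
    completion_tokens: int
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def cost_usd(self, price_in_per_1k: float, price_out_per_1k: float) -> float:
        # Token cost from per-1k-token prices; the prices are caller-supplied assumptions.
        return (self.prompt_tokens * price_in_per_1k
                + self.completion_tokens * price_out_per_1k) / 1000


def outcome_metrics(traces: list[Trace]) -> dict[str, float]:
    """Mean of each eval score across the traces that recorded it."""
    names = {name for t in traces for name in t.eval_scores}
    return {
        name: mean(t.eval_scores[name] for t in traces if name in t.eval_scores)
        for name in sorted(names)
    }


def spend_alerts(traces: list[Trace], price_in_per_1k: float,
                 price_out_per_1k: float, max_cost_per_request: float) -> list[str]:
    """IDs of traces whose cost exceeds a per-request budget (threshold is an assumption)."""
    return [
        t.trace_id for t in traces
        if t.cost_usd(price_in_per_1k, price_out_per_1k) > max_cost_per_request
    ]
```

Outcome metrics are averaged because a single request tells you little about quality, while the spend check runs per request so that one runaway prompt surfaces on its own instead of disappearing into an average.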
The goal is simple: when something regresses, you can see exactly where and exactly what it cost.
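One way to get at "exactly where" is to keep per-stage metrics keyed by name and diff a baseline window against the current one. The following is a minimal sketch, assuming metrics have already been aggregated into plain dicts. The metric names and the 10% relative tolerance are hypothetical.

```python
def diff_windows(baseline: dict[str, float], current: dict[str, float],
                 rel_tolerance: float = 0.1) -> list[tuple[str, float, float, float]]:
    """Metrics whose relative change exceeds rel_tolerance, largest change first.

    Each entry is (metric, baseline_value, current_value, relative_change).
    """
    flagged = []
    # Only compare metrics present in both windows.
    for name in sorted(baseline.keys() & current.keys()):
        before, after = baseline[name], current[name]
        if before == 0:
            continue  # relative change is undefined; handle zero baselines separately
        change = (after - before) / abs(before)
        if abs(change) > rel_tolerance:
            flagged.append((name, before, after, change))
    # Biggest movers first, so the likely culprit stage is at the top.
    flagged.sort(key=lambda row: abs(row[3]), reverse=True)
    return flagged
```

Prefixing metric names with the pipeline stage (retrieval, eval, cost) means the flagged list reads directly as "which stage moved, and by how much".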