Notes from production
Platform, Observability, Ops

Observability for LLM systems that actually helps

You cannot improve what you cannot see, and you cannot debug an incident from a vibe. LLM systems need observability wired in from day one.

What to capture

Trace each request end to end: the query, retrieved chunks, the prompt, the response, eval scores, and token cost. Aggregate the metrics that map to outcomes, and alert on the ones that map to spend.
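As a rough illustration of what one such trace could look like, here is a minimal Python sketch. Every name in it (RequestTrace, summarize, spend_alert, the "groundedness" score, the per-1k-token prices) is a hypothetical chosen for this example, not part of any particular tracing library, and a real system would likely send these records to a tracing backend rather than keep them in memory.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class RequestTrace:
    """One end-to-end record per request. All field names are hypothetical."""
    request_id: str
    query: str
    retrieved_chunks: list[str]
    prompt: str
    response: str
    eval_scores: dict[str, float] = field(default_factory=dict)
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def cost_usd(self, usd_per_1k_prompt: float, usd_per_1k_completion: float) -> float:
        # Token cost from per-1k-token prices, which the caller supplies (assumed values).
        return (self.prompt_tokens / 1000 * usd_per_1k_prompt
                + self.completion_tokens / 1000 * usd_per_1k_completion)


def summarize(traces, usd_per_1k_prompt, usd_per_1k_completion, score_name="groundedness"):
    """Aggregate an outcome metric (mean eval score) and a spend metric (total cost)."""
    scores = [t.eval_scores[score_name] for t in traces if score_name in t.eval_scores]
    total_cost = sum(t.cost_usd(usd_per_1k_prompt, usd_per_1k_completion) for t in traces)
    return {
        "requests": len(traces),
        "mean_score": mean(scores) if scores else None,
        "total_cost_usd": total_cost,
    }


def spend_alert(summary, budget_usd):
    """Return True when total cost for the window exceeds the budget."""
    return summary["total_cost_usd"] > budget_usd
```

Keeping the query, chunks, prompt, and response on the same record as the scores and token counts is the point of the design: when the mean score drops or the spend alert fires, the individual traces behind the aggregate are available to inspect.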

The goal is simple: when something regresses, you can see exactly where and exactly what it cost.