Why “faithfulness ≥ 0.90” should gate your deploys
A practical look at turning eval scores into a CI gate — and what to do when a release fails it.
Eval, CI/CD, RAG
Field notes on shipping AI you can trust: evals, RAG, agents, and the engineering discipline around them.
BM25 vs dense vs hybrid, and why a reranker is usually the cheapest accuracy you can buy.
Jailbreaks, prompt injection and PII leakage — a checklist mapped to the OWASP LLM Top 10.
Tool-use orchestration with guardrails on every hop, and how to keep the bill predictable.
When to retry retrieval, when to abstain, and how to wire the feedback loop.
The traces, metrics and cost signals worth wiring from day one — not after the incident.