RAG · Retrieval · Reranking
Hybrid retrieval + reranking, explained
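Dense vectors capture meaning; sparse retrieval (BM25) captures exact terms. Most production questions need both: a query like "cannot log in, error E1234" needs semantic matching for the phrasing and exact matching for the error code. That is why hybrid retrieval usually beats either method alone on real corpora.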
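The post doesn't say how the two result lists get merged, so the sketch below makes its own assumptions. The sparse side is a minimal Okapi BM25 scorer using Lucene's default parameters (k1=1.2, b=0.75). The dense side is a hard-coded ranking that stands in for an embedding model. The two are combined with Reciprocal Rank Fusion (RRF, with the commonly used constant k=60). RRF is one common fusion choice among several; weighted score blending is another. The corpus, query, dense ranking and the function names `bm25_scores` and `rrf` are all hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each tokenized doc against a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: how many docs contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            f = tf[term]
            if f == 0:
                continue  # sparse retrieval: only exact term matches count
            # Lucene-style IDF: log(1 + ...) keeps it positive for common terms.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # k1 saturates repeated terms; b normalizes by document length.
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each ranking adds 1 / (k + rank) to a doc's score."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical three-document corpus and query.
corpus = {
    "d1": "reset your password from the account settings page",
    "d2": "error E1234 means the license server is unreachable",
    "d3": "how to recover access when you forget your login credentials",
}
ids = list(corpus)
query = "cannot log in with error E1234".lower().split()

sparse = dict(zip(ids, bm25_scores(query, [corpus[i].lower().split() for i in ids])))
sparse_ranking = sorted(ids, key=sparse.get, reverse=True)  # ['d2', 'd1', 'd3']

# Hypothetical dense ranking, standing in for embedding similarity:
# it puts the "login credentials" passage first even though it shares no terms.
dense_ranking = ["d3", "d2", "d1"]

print(rrf([sparse_ranking, dense_ranking]))  # ['d2', 'd3', 'd1']
```

RRF only looks at ranks, so you never have to calibrate BM25 scores against cosine similarities, which live on very different scales.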
Why add a reranker
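First-stage retrieval is recall-oriented: cast a wide net. A cross-encoder reranker then re-scores only the top candidates, reading the query and each passage together. That gives far more precision than the first-stage retriever can afford, since the retriever has to score the whole corpus. It is usually the cheapest accuracy you can buy.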
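Here is a minimal sketch of the rerank step. The scorer is any function that takes whole (query, passage) pairs and returns one score per pair, which is the shape a cross-encoder exposes. `rerank`, `toy_overlap_scores` and the sample passages are hypothetical. The toy scorer, which measures query-token overlap, is only a placeholder so the example runs without a model. It is not a cross-encoder.

```python
from typing import Callable, Sequence

# A pair scorer takes (query, passage) pairs and returns one score per pair.
PairScorer = Callable[[list[tuple[str, str]]], Sequence[float]]


def rerank(query: str, candidates: list[str],
           score_pairs: PairScorer, top_n: int) -> list[str]:
    """Re-score each candidate jointly with the query and keep the best top_n."""
    pairs = [(query, passage) for passage in candidates]
    scores = score_pairs(pairs)  # one score per (query, passage) pair
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order[:top_n]]


def toy_overlap_scores(pairs: list[tuple[str, str]]) -> list[float]:
    """Hypothetical placeholder for a cross-encoder: share of query tokens in the passage."""
    out = []
    for query, passage in pairs:
        q_tokens = set(query.lower().split())
        p_tokens = set(passage.lower().split())
        out.append(len(q_tokens & p_tokens) / len(q_tokens))
    return out


candidates = [
    "api keys are listed on the billing page",
    "to rotate the api key open settings and click regenerate",
    "the dashboard shows usage per day",
]
top = rerank("rotate the api key", candidates, toy_overlap_scores, top_n=2)
# The "rotate the api key" passage comes first, then the billing-page passage.
print(top)
```

In a real pipeline you might pass a cross-encoder's batch scorer instead. One example is the `predict` method of a sentence-transformers `CrossEncoder`, which accepts a list of (query, passage) pairs. Check the library docs for the model and batching options you need.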
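Budget for latency: retrieve k≈50 candidates, rerank them, and pass only the top 5–8 to the model. Measure context precision (how many of the passages you pass to the model are actually relevant) before and after adding the reranker; the lift is typically obvious.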
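Here is one way to run the before/after check, as a minimal sketch. It uses plain precision@k over the passages you pass to the model as a proxy for context precision. Some evaluation tools use a rank-weighted variant instead. The IDs, relevance labels and reranked order below are made up purely to exercise the function. In practice they come from your labeled eval set and your actual reranker.

```python
RETRIEVE_K = 50  # first-stage candidates (k≈50 in the text)
TOP_N = 6        # passages actually passed to the model (5–8 in the text)


def precision_at_k(ranked_ids: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the first k passages that are labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(doc_id in relevant for doc_id in ranked_ids[:k]) / k


# Hypothetical evaluation item: made-up IDs and relevance labels.
first_stage = [f"p{i}" for i in range(RETRIEVE_K)]
relevant = {"p3", "p11", "p27", "p40"}

# Hypothetical reranker output: same 50 candidates, relevant ones promoted.
promoted = ["p11", "p3", "p27", "p40"]
reranked = promoted + [p for p in first_stage if p not in promoted]

before = precision_at_k(first_stage, relevant, TOP_N)
after = precision_at_k(reranked, relevant, TOP_N)
print(f"precision@{TOP_N}: {before:.2f} -> {after:.2f}")  # precision@6: 0.17 -> 0.67
```

Averaging this over every query in the eval set, with the same `TOP_N`, gives the before/after comparison the text describes.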