Ch 13: Retrieval-Augmented Generation (RAG) - Advanced¶
Track: Practitioner
Read online or run locally
To run the code interactively, clone the repo and open chapters/chapter-13-retrieval-augmented-generation/notebooks/03_advanced_rag.ipynb in Jupyter.
This notebook covers hybrid search (dense + BM25 fused with reciprocal rank fusion), query rewriting (including HyDE and multi-query expansion), faithfulness and answer-relevance metrics, agentic/multi-hop retrieval intuition, and production concerns (latency, caching, freshness, sharding, and cost).
What you'll learn¶
| Topic | Section |
|---|---|
| Hybrid search: dense + BM25 with reciprocal rank fusion | §1 |
| Query rewriting, HyDE, and multi-query expansion | §2 |
| Faithfulness and answer-relevance metrics | §3 |
| Agentic / multi-hop retrieval intuition | §4 |
| Production: latency, caching, freshness, sharding, cost | §5 |
| Capstone design and bridge to Chapter 14 | §6 |
Time estimate: 2 hours
Key concepts¶
- Hybrid search — Combine dense (semantic) and sparse (keyword) retrieval; RRF fuses their rankings.
- HyDE — Have the LLM draft a hypothetical answer, embed it, then retrieve with that vector.
- Faithfulness — Does the answer only use information from the retrieved context?
- Multi-hop — Some questions need iterative retrieve-and-reason loops.
- Production RAG — Cache embeddings/answers, refresh stale chunks, shard the index, watch p95 latency.
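To make the hybrid-search concept concrete, here is a minimal sketch of reciprocal rank fusion. It assumes each retriever (dense and BM25) returns a ranked list of document IDs; the doc IDs and the `k=60` constant follow the common RRF convention, and the example lists are invented for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one ranking.

    Each doc accumulates 1 / (k + rank) for every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results: the dense and BM25 retrievers disagree on order,
# but d1 and d2 appear near the top of both lists.
dense = ["d2", "d1", "d5", "d3"]
bm25 = ["d1", "d4", "d2", "d6"]
fused = reciprocal_rank_fusion([dense, bm25])
```

Note that RRF only needs ranks, not scores, which is why it fuses dense similarity scores and BM25 scores without any normalization step.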
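The faithfulness idea can be sketched with a deliberately crude proxy: the fraction of an answer's content words that also appear in the retrieved context. This is an illustrative heuristic only, not the notebook's metric; production evaluators typically decompose the answer into claims and check each one (often with an LLM judge). The stopword list here is an arbitrary assumption.

```python
def faithfulness_score(answer: str, context: str) -> float:
    """Fraction of the answer's content words found in the context.

    A score near 1.0 suggests the answer sticks to the retrieved text;
    a low score flags possible hallucinated content.
    """
    # Tiny ad-hoc stopword list, just for this sketch.
    stop = {"the", "a", "an", "is", "are", "of", "to", "in", "and"}
    answer_words = {w for w in answer.lower().split() if w not in stop}
    context_words = set(context.lower().split())
    if not answer_words:
        return 1.0  # an empty answer makes no unsupported claims
    return len(answer_words & context_words) / len(answer_words)

context = "paris is the capital of france"
faithful = faithfulness_score("paris is the capital of france", context)
unfaithful = faithfulness_score("london is the capital", context)
```

Even this toy version shows the key property of a faithfulness metric: it compares the answer against the retrieved context, not against a gold answer.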
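The caching-and-freshness point can be sketched as a small in-memory embedding cache with a time-to-live. The class name, TTL default, and injectable clock are assumptions made for the example; a real deployment would typically use an external store such as Redis.

```python
import hashlib
import time

class TTLEmbeddingCache:
    """Cache embeddings keyed by a hash of the text.

    Entries expire after `ttl` seconds, so edited chunks get
    re-embedded instead of serving stale vectors forever.
    """

    def __init__(self, embed_fn, ttl=3600):
        self.embed_fn = embed_fn
        self.ttl = ttl
        self._store = {}  # key -> (inserted_at, vector)

    def get(self, text, now=None):
        now = time.time() if now is None else now  # injectable clock for tests
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh cache hit: no embedding call
        vector = self.embed_fn(text)  # miss or expired: recompute
        self._store[key] = (now, vector)
        return vector

# Dummy embedder that counts calls, standing in for a real model.
calls = []
def fake_embed(text):
    calls.append(text)
    return [float(len(text))]

cache = TTLEmbeddingCache(fake_embed, ttl=10)
cache.get("hello", now=0)   # miss: embeds
cache.get("hello", now=5)   # hit: cached
cache.get("hello", now=20)  # expired: embeds again
```

The same pattern applies one level up: caching whole answers keyed by a normalized query hash, with a shorter TTL when the underlying corpus changes often.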
Run the full notebook for code and outputs.