Ch 13: Retrieval-Augmented Generation (RAG) - Advanced

Read online or run locally

To run the code interactively, clone the repo and open chapters/chapter-13-retrieval-augmented-generation/notebooks/03_advanced_rag.ipynb in Jupyter.


Chapter 13: RAG — Notebook 03 (Advanced RAG)

This notebook covers hybrid search (dense + BM25 fused with reciprocal rank fusion); query rewriting, HyDE, and multi-query expansion; faithfulness and answer-relevance metrics; agentic / multi-hop retrieval intuition; and production concerns (latency, caching, freshness, sharding, cost).

What you'll learn

| Topic | Section |
| --- | --- |
| Hybrid search: dense + BM25 with reciprocal rank fusion | §1 |
| Query rewriting, HyDE, and multi-query expansion | §2 |
| Faithfulness and answer-relevance metrics | §3 |
| Agentic / multi-hop retrieval intuition | §4 |
| Production: latency, caching, freshness, sharding, cost | §5 |
| Capstone design and bridge to Chapter 14 | §6 |

Time estimate: 2 hours


Key concepts

  • Hybrid search — Combine dense (semantic) and sparse (keyword) retrieval; RRF fuses their rankings.
  • HyDE — Have the LLM draft a hypothetical answer, embed it, then retrieve with that vector.
  • Faithfulness — Does the answer only use information from the retrieved context?
  • Multi-hop — Some questions need iterative retrieve-and-reason loops.
  • Production RAG — Cache embeddings/answers, refresh stale chunks, shard the index, watch p95 latency.
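Reciprocal rank fusion is simple enough to sketch in full. A minimal version (assuming each retriever returns a ranked list of document IDs, and using the conventional constant k=60):

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs with RRF.

    Each document scores sum(1 / (k + rank)) over the lists it
    appears in, so documents ranked well by both retrievers rise
    to the top without any score normalization.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Dense and BM25 disagree; RRF rewards docs both rank highly.
dense = ["d2", "d1", "d3"]
bm25 = ["d1", "d4", "d2"]
print(reciprocal_rank_fusion([dense, bm25]))  # ['d1', 'd2', 'd4', 'd3']
```

Because RRF works on ranks rather than raw scores, it sidesteps the problem that dense similarity and BM25 scores live on incompatible scales.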
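Faithfulness is usually judged with an LLM, but a crude lexical proxy illustrates the idea: what fraction of the answer's word n-grams also appear in the retrieved context? The function below is a hypothetical heuristic for intuition only, not the metric the notebook uses:

```python
def faithfulness_score(answer, context, n=3):
    """Crude lexical proxy for faithfulness: the fraction of the
    answer's word n-grams that also occur in the context. Real
    evaluators (e.g. LLM-as-judge) check entailment, not overlap."""
    def ngrams(text, n):
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    answer_grams = ngrams(answer, n)
    if not answer_grams:
        return 1.0  # empty answer trivially adds nothing unsupported
    context_grams = ngrams(context, n)
    return sum(g in context_grams for g in answer_grams) / len(answer_grams)

context = "the capital of france is paris"
print(faithfulness_score("the capital of france is paris", context))  # 1.0
print(faithfulness_score("the capital of france is berlin", context))  # 0.75
```

The hallucinated "berlin" drops the score because its surrounding trigram never occurs in the context; paraphrased-but-faithful answers would also score low, which is exactly why production systems prefer entailment-based judges.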
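On the production side, caching embeddings by content hash is one of the cheapest wins: re-ingesting unchanged chunks or re-embedding repeated queries skips the model call entirely. A minimal sketch, assuming `embed_fn` stands in for a real embedding API:

```python
import hashlib

class EmbeddingCache:
    """Memoize embeddings keyed by a SHA-256 of the text, so only
    new or changed chunks trigger an embedding call."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn  # assumed: text -> vector
        self.store = {}
        self.misses = 0

    def embed(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.embed_fn(text)
        return self.store[key]

# Stand-in embedder; a real one would call a model endpoint.
cache = EmbeddingCache(lambda t: [float(len(t))])
cache.embed("hello")
cache.embed("hello")  # served from cache
cache.embed("world")
print(cache.misses)  # 2
```

Hashing content (rather than keying on chunk IDs) also handles freshness: an edited chunk hashes differently, so it is re-embedded automatically while untouched chunks stay cached.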

Run the full notebook for code and outputs.

