Ch 13: Retrieval-Augmented Generation (RAG) - Advanced¶
Track: Practitioner
Read online or run locally
To run the code interactively, clone the repo and open chapters/chapter-13-retrieval-augmented-generation/notebooks/03_advanced_rag.ipynb in Jupyter.
This notebook covers hybrid search (dense + BM25 fused with reciprocal rank fusion), query rewriting (including HyDE and multi-query expansion), faithfulness and answer-relevance metrics, agentic/multi-hop retrieval intuition, and production concerns (latency, caching, freshness, sharding, and cost).
What you'll learn¶
| Topic | Section |
|---|---|
| Hybrid search: dense + BM25 with reciprocal rank fusion | §1 |
| Query rewriting, HyDE, and multi-query expansion | §2 |
| Faithfulness and answer-relevance metrics | §3 |
| Agentic / multi-hop retrieval intuition | §4 |
| Production: latency, caching, freshness, sharding, cost | §5 |
| Capstone design and bridge to Chapter 14 | §6 |
Time estimate: 2 hours
Key concepts¶
- Hybrid search — Combine dense (semantic) and sparse (keyword) retrieval; RRF fuses their rankings.
- HyDE — Have the LLM draft a hypothetical answer, embed it, then retrieve with that vector.
- Faithfulness — Does the answer only use information from the retrieved context?
- Multi-hop — Some questions need iterative retrieve-and-reason loops.
- Production RAG — Cache embeddings/answers, refresh stale chunks, shard the index, watch p95 latency.
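To make the hybrid-search concept concrete, here is a minimal sketch of reciprocal rank fusion. It assumes each retriever (dense and BM25) returns a ranked list of document IDs; the doc IDs and the `k=60` constant follow the common RRF convention, and the example lists are invented for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one ranking.

    Each doc accumulates 1 / (k + rank) for every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results: the dense and BM25 retrievers disagree on order,
# but d1 and d2 appear near the top of both lists.
dense = ["d2", "d1", "d5", "d3"]
bm25 = ["d1", "d4", "d2", "d6"]
fused = reciprocal_rank_fusion([dense, bm25])
```

Note that RRF only needs ranks, not scores, which is why it fuses dense similarity scores and BM25 scores without any normalization step.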
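The faithfulness idea can be sketched with a deliberately crude proxy: the fraction of an answer's content words that also appear in the retrieved context. This is an illustrative heuristic only, not the notebook's metric; production evaluators typically decompose the answer into claims and check each one (often with an LLM judge). The stopword list here is an arbitrary assumption.

```python
def faithfulness_score(answer: str, context: str) -> float:
    """Fraction of the answer's content words found in the context.

    A score near 1.0 suggests the answer sticks to the retrieved text;
    a low score flags possible hallucinated content.
    """
    # Tiny ad-hoc stopword list, just for this sketch.
    stop = {"the", "a", "an", "is", "are", "of", "to", "in", "and"}
    answer_words = {w for w in answer.lower().split() if w not in stop}
    context_words = set(context.lower().split())
    if not answer_words:
        return 1.0  # an empty answer makes no unsupported claims
    return len(answer_words & context_words) / len(answer_words)

context = "paris is the capital of france"
faithful = faithfulness_score("paris is the capital of france", context)
unfaithful = faithfulness_score("london is the capital", context)
```

Even this toy version shows the key property of a faithfulness metric: it compares the answer against the retrieved context, not against a gold answer.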
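The caching-and-freshness point can be sketched as a small in-memory embedding cache with a time-to-live. The class name, TTL default, and injectable clock are assumptions made for the example; a real deployment would typically use an external store such as Redis.

```python
import hashlib
import time

class TTLEmbeddingCache:
    """Cache embeddings keyed by a hash of the text.

    Entries expire after `ttl` seconds, so edited chunks get
    re-embedded instead of serving stale vectors forever.
    """

    def __init__(self, embed_fn, ttl=3600):
        self.embed_fn = embed_fn
        self.ttl = ttl
        self._store = {}  # key -> (inserted_at, vector)

    def get(self, text, now=None):
        now = time.time() if now is None else now  # injectable clock for tests
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh cache hit: no embedding call
        vector = self.embed_fn(text)  # miss or expired: recompute
        self._store[key] = (now, vector)
        return vector

# Dummy embedder that counts calls, standing in for a real model.
calls = []
def fake_embed(text):
    calls.append(text)
    return [float(len(text))]

cache = TTLEmbeddingCache(fake_embed, ttl=10)
cache.get("hello", now=0)   # miss: embeds
cache.get("hello", now=5)   # hit: cached
cache.get("hello", now=20)  # expired: embeds again
```

The same pattern applies one level up: caching whole answers keyed by a normalized query hash, with a shorter TTL when the underlying corpus changes often.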
Run the full notebook for code and outputs.