Chapter 13: Retrieval-Augmented Generation (RAG)¶
Ground LLMs in your private data—chunking, embeddings, vector stores, hybrid search, reranking, citations, and end-to-end RAG evaluation, all running offline by default.
Metadata¶
| Field | Value |
|---|---|
| Track | Practitioner |
| Time | 8 hours |
| Prerequisites | Chapter 11 (LLMs & Transformers) and Chapter 12 (Prompt Engineering) |
Learning Objectives¶
- Explain why RAG matters: hallucination, recency, private data, context-window limits
- Implement vector similarity from scratch (cosine, top-k, in-memory index)
- Choose chunking strategies (fixed, sliding, sentence, semantic) for your data
- Use embeddings effectively and combine with TF-IDF / BM25 for hybrid search
- Apply reranking, query rewriting, HyDE, and multi-query expansion
- Evaluate RAG (hit@k, MRR, faithfulness, answer relevance) and design for production
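The "vector similarity from scratch" objective boils down to a few lines of NumPy. This is a minimal sketch of the idea, not the chapter's `vectorstore.py` API:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; 1.0 = same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query: np.ndarray, index: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k rows of `index` most similar to `query`."""
    scores = index @ query / (np.linalg.norm(index, axis=1) * np.linalg.norm(query))
    return np.argsort(-scores)[:k].tolist()

# Toy 2-D "embeddings": doc 0 points the same way as the query
docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
print(top_k(np.array([1.0, 0.0]), docs, k=2))  # → [0, 2]
```

A real in-memory index is just this plus a list of the texts the rows came from, so the top-k indices can be mapped back to chunks.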
What's Included¶
Notebooks¶
| Notebook | Description |
|---|---|
| 01_rag_fundamentals.ipynb | Why RAG, embeddings, cosine similarity, in-memory vector store, first end-to-end pipeline |
| 02_rag_pipeline.ipynb | Chunking strategies, embedding choices, vector stores, reranking, citations |
| 03_advanced_rag.ipynb | Hybrid search, query rewriting / HyDE, evaluation, production, capstone |
Scripts¶
- `config.py` — Chapter config, mock-LLM toggle, vector-store paths
- `chunking.py` — Fixed, sliding-window, sentence, and semantic chunkers
- `vectorstore.py` — `InMemoryVectorStore` with `add`, `search`, `save`, `load`
- `rag_pipeline.py` — End-to-end load → chunk → embed → retrieve → prompt → generate → cite
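As a taste of what `chunking.py` covers, a sliding-window chunker fits in a few lines. The function name and defaults here are illustrative assumptions, not the script's actual API:

```python
def sliding_window_chunks(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words, stepping by size - overlap,
    so consecutive chunks share `overlap` words of context."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]
```

The overlap is what distinguishes sliding windows from fixed chunking: a sentence that straddles a chunk boundary still appears whole in at least one chunk.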
Exercises¶
- Problem Set 1 (notebook) — Cosine similarity from scratch, build a chunker, encode + retrieve, top-k accuracy, compare chunk sizes, source-citing prompt template
- Problem Set 2 (notebook) — BM25 + dense hybrid, query rewriting, faithfulness scorer, multi-hop retrieval, RAG evaluation harness, latency profiling
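The retrieval metrics in the evaluation harness are small enough to sketch directly. A minimal version, assuming retrieved results are ranked lists of doc IDs and each query has a single gold ID (a simplification; the problem set may use richer relevance labels):

```python
def hit_at_k(retrieved: list[list[str]], gold: list[str], k: int) -> float:
    """Fraction of queries whose gold doc appears in the top-k results."""
    hits = sum(g in r[:k] for r, g in zip(retrieved, gold))
    return hits / len(gold)

def mrr(retrieved: list[list[str]], gold: list[str]) -> float:
    """Mean reciprocal rank of the gold doc (contributes 0 if absent)."""
    total = 0.0
    for r, g in zip(retrieved, gold):
        if g in r:
            total += 1.0 / (r.index(g) + 1)
    return total / len(gold)

runs = [["d1", "d2", "d3"], ["d9", "d4", "d7"]]
print(hit_at_k(runs, ["d1", "d4"], k=1))  # 0.5 — only query 1 hits at rank 1
print(mrr(runs, ["d1", "d4"]))            # (1/1 + 1/2) / 2 = 0.75
```

Hit@k answers "did retrieval find the right chunk at all?", while MRR also rewards placing it near the top, which matters when only the first few chunks fit in the prompt.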
- Solutions — In `exercises/solutions/` (notebooks and `solutions.py` for CI)
Diagrams (Mermaid)¶
`rag_architecture.mermaid`, `chunking_strategies.mermaid`, `retrieval_pipeline.mermaid`
Read Online¶
- 13.1 Introduction — RAG motivation, embeddings, cosine, vector store from scratch
- 13.2 Intermediate — Chunking, embedding choices, vector stores, reranking, citations
- 13.3 Advanced — Hybrid search, query rewriting, RAG eval, production, capstone
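Hybrid search (13.3) merges a sparse ranking such as BM25 with a dense embedding ranking. One common way to do that is reciprocal rank fusion; this sketch uses the widely used default constant k=60, which is an assumption, not a chapter-specified value:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each doc scores sum(1 / (k + rank)) over the
    lists it appears in, so docs ranked highly anywhere float to the top."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d2", "d1", "d3"]   # keyword/sparse ranking
dense_hits = ["d1", "d3", "d2"]  # embedding ranking
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))  # d1 first: top-2 in both
```

Fusing ranks rather than raw scores sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.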
Or try the code in the Playground.
How to Use This Chapter¶
Quick Start
Follow these steps to get coding in minutes.
1. Clone and install dependencies
```shell
git clone https://github.com/luigipascal/berta-chapters.git
cd berta-chapters
pip install -r requirements.txt
```
2. Navigate to the chapter
```shell
cd chapters/chapter-13-retrieval-augmented-generation
pip install -r requirements.txt
python -c "import nltk; nltk.download('punkt')"
```
3. (Optional) Install higher-quality dense embeddings and a vector DB
4. Launch Jupyter
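Steps 3 and 4 might look like the following. The package names are assumptions on my part (the chapter does not pin them here); substitute whichever dense-embedding library and vector DB you prefer:

```shell
# Optional extras — package names are illustrative, not prescribed by the chapter:
# sentence-transformers for dense embeddings, chromadb as a local vector DB
pip install sentence-transformers chromadb

# Launch Jupyter from the chapter directory
jupyter lab
```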
GitHub Folder
All chapter materials live in: chapters/chapter-13-retrieval-augmented-generation/
Created by Luigi Pascal Rondanini | Generated by Berta AI