Chapter 13: Retrieval-Augmented Generation (RAG)

Ground LLMs in your private data—chunking, embeddings, vector stores, hybrid search, reranking, citations, and end-to-end RAG evaluation, all running offline by default.


Metadata

Field Value
Track Practitioner
Time 8 hours
Prerequisites Chapter 11 (LLMs & Transformers) and Chapter 12 (Prompt Engineering)

Learning Objectives

  • Explain why RAG matters: hallucination, recency, private data, context-window limits
  • Implement vector similarity from scratch (cosine, top-k, in-memory index)
  • Choose chunking strategies (fixed, sliding, sentence, semantic) for your data
  • Use embeddings effectively and combine with TF-IDF / BM25 for hybrid search
  • Apply reranking, query rewriting, HyDE, and multi-query expansion
  • Evaluate RAG (hit@k, MRR, faithfulness, answer relevance) and design for production
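The second objective, vector similarity from scratch, can be sketched in a few lines of NumPy. This is a minimal illustration (not the notebook's exact code): cosine similarity between two vectors, and a brute-force top-k search over a corpus matrix.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: dot product over norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> list[int]:
    """Indices of the k corpus rows most similar to the query vector."""
    sims = corpus @ query / (np.linalg.norm(corpus, axis=1) * np.linalg.norm(query))
    return np.argsort(-sims)[:k].tolist()
```

Brute-force search like this is exact and perfectly adequate for small corpora; approximate indexes (e.g. FAISS, installed in the optional step below) only become necessary at scale.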

What's Included

Notebooks

Notebook Description
01_rag_fundamentals.ipynb Why RAG, embeddings, cosine similarity, in-memory vector store, first end-to-end
02_rag_pipeline.ipynb Chunking strategies, embedding choices, vector stores, reranking, citations
03_advanced_rag.ipynb Hybrid search, query rewriting / HyDE, evaluation, production, capstone
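As a taste of the chunking strategies covered in notebook 02, here is a minimal sliding-window chunker (a sketch, not the chapter's implementation): overlapping windows preserve context that fixed-size splitting would cut at chunk boundaries.

```python
def sliding_window_chunks(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows of `size` characters.

    Each chunk repeats the last `overlap` characters of the previous one,
    so sentences straddling a boundary survive in at least one chunk.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Sentence and semantic chunkers follow the same contract (text in, list of chunks out) but split on sentence boundaries or embedding-similarity drops instead of raw character counts.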

Scripts

  • config.py — Chapter config, mock-LLM toggle, vector-store paths
  • chunking.py — Fixed, sliding-window, sentence, and semantic chunkers
  • vectorstore.py — InMemoryVectorStore with add, search, save, load
  • rag_pipeline.py — End-to-end load → chunk → embed → retrieve → prompt → generate → cite
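To make the scripts' roles concrete, here is a hypothetical sketch of the add/search half of an in-memory vector store (the actual InMemoryVectorStore in vectorstore.py may differ; save/load are omitted):

```python
import numpy as np

class InMemoryVectorStore:
    """Toy vector store: exact cosine search over normalized vectors."""

    def __init__(self) -> None:
        self.vectors: list[np.ndarray] = []
        self.texts: list[str] = []

    def add(self, text: str, vector: np.ndarray) -> None:
        # Normalize on insert so search reduces to a dot product.
        self.vectors.append(vector / np.linalg.norm(vector))
        self.texts.append(text)

    def search(self, query: np.ndarray, k: int = 3) -> list[tuple[str, float]]:
        matrix = np.stack(self.vectors)
        sims = matrix @ (query / np.linalg.norm(query))
        order = np.argsort(-sims)[:k]
        return [(self.texts[i], float(sims[i])) for i in order]
```

Normalizing once at insert time is a common design choice: it turns every cosine query into a single matrix-vector product.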

Exercises

  • Problem Set 1 (notebook) — Cosine similarity from scratch, build a chunker, encode + retrieve, top-k accuracy, compare chunk sizes, source-citing prompt template
  • Problem Set 2 (notebook) — BM25 + dense hybrid, query rewriting, faithfulness scorer, multi-hop retrieval, RAG evaluation harness, latency profiling
  • Solutions — In exercises/solutions/ (notebooks and solutions.py for CI)
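The retrieval metrics named in Problem Set 2 are simple to state in code. A minimal sketch (illustrative only, not the harness you will build): hit@k asks whether the relevant document made the top k, and MRR averages the reciprocal of the relevant document's rank across queries.

```python
def hit_at_k(ranked_ids: list[str], relevant_id: str, k: int) -> float:
    """1.0 if the relevant document appears in the top-k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0

def mrr(all_rankings: list[list[str]], relevant_ids: list[str]) -> float:
    """Mean reciprocal rank: average of 1/rank of the relevant doc per query."""
    total = 0.0
    for ranked, relevant in zip(all_rankings, relevant_ids):
        if relevant in ranked:
            total += 1.0 / (ranked.index(relevant) + 1)  # ranks are 1-based
    return total / len(all_rankings)
```

Faithfulness and answer relevance, by contrast, judge the generated answer rather than the retrieval ranking, so they need an LLM or human grader rather than a one-liner.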

Diagrams (Mermaid)

  • rag_architecture.mermaid, chunking_strategies.mermaid, retrieval_pipeline.mermaid

Read Online

  • 13.1 Introduction — RAG motivation, embeddings, cosine, vector store from scratch
  • 13.2 Intermediate — Chunking, embedding choices, vector stores, reranking, citations
  • 13.3 Advanced — Hybrid search, query rewriting, RAG eval, production, capstone

Or try the code in the Playground.

How to Use This Chapter

Quick Start

Follow these steps to get coding in minutes.

1. Clone and install dependencies

git clone https://github.com/luigipascal/berta-chapters.git
cd berta-chapters
pip install -r requirements.txt

2. Navigate to the chapter

cd chapters/chapter-13-retrieval-augmented-generation
pip install -r requirements.txt
python -c "import nltk; nltk.download('punkt')"

3. (Optional) Install higher-quality dense embeddings and a vector DB

pip install sentence-transformers faiss-cpu chromadb

4. Launch Jupyter

jupyter notebook notebooks/01_rag_fundamentals.ipynb

GitHub Folder

All chapter materials live in: chapters/chapter-13-retrieval-augmented-generation/


Created by Luigi Pascal Rondanini | Generated by Berta AI