Chapter 13: Retrieval-Augmented Generation (RAG)

Ground LLMs in your private data—chunking, embeddings, vector stores, hybrid search, reranking, citations, and end-to-end RAG evaluation, all running offline by default.


Metadata

Field Value
Track Practitioner
Time 8 hours
Prerequisites Chapter 11 (LLMs & Transformers) and Chapter 12 (Prompt Engineering)

Learning Objectives

  • Explain why RAG matters: hallucination, recency, private data, context-window limits
  • Implement vector similarity from scratch (cosine, top-k, in-memory index)
  • Choose chunking strategies (fixed, sliding, sentence, semantic) for your data
  • Use embeddings effectively and combine with TF-IDF / BM25 for hybrid search
  • Apply reranking, query rewriting, HyDE, and multi-query expansion
  • Evaluate RAG (hit@k, MRR, faithfulness, answer relevance) and design for production
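The second objective, vector similarity from scratch, can be sketched in a few lines of NumPy. This is a minimal illustration (not the notebook's exact code): cosine similarity between two vectors, and a brute-force top-k search over a corpus matrix.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: dot product over norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> list[int]:
    """Indices of the k corpus rows most similar to the query vector."""
    sims = corpus @ query / (np.linalg.norm(corpus, axis=1) * np.linalg.norm(query))
    return np.argsort(-sims)[:k].tolist()
```

Brute-force search like this is exact and perfectly adequate for small corpora; approximate indexes (e.g. FAISS, installed in the optional step below) only become necessary at scale.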

What's Included

Notebooks

Notebook Description
01_rag_fundamentals.ipynb Why RAG, embeddings, cosine similarity, in-memory vector store, first end-to-end
02_rag_pipeline.ipynb Chunking strategies, embedding choices, vector stores, reranking, citations
03_advanced_rag.ipynb Hybrid search, query rewriting / HyDE, evaluation, production, capstone
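As a taste of the chunking strategies covered in notebook 02, here is a minimal sliding-window chunker (a sketch, not the chapter's implementation): overlapping windows preserve context that fixed-size splitting would cut at chunk boundaries.

```python
def sliding_window_chunks(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows of `size` characters.

    Each chunk repeats the last `overlap` characters of the previous one,
    so sentences straddling a boundary survive in at least one chunk.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Sentence and semantic chunkers follow the same contract (text in, list of chunks out) but split on sentence boundaries or embedding-similarity drops instead of raw character counts.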

Scripts

  • config.py — Chapter config, mock-LLM toggle, vector-store paths
  • chunking.py — Fixed, sliding-window, sentence, and semantic chunkers
  • vectorstore.py — InMemoryVectorStore with add, search, save, load
  • rag_pipeline.py — End-to-end load → chunk → embed → retrieve → prompt → generate → cite
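To make the scripts' roles concrete, here is a hypothetical sketch of the add/search half of an in-memory vector store (the actual InMemoryVectorStore in vectorstore.py may differ; save/load are omitted):

```python
import numpy as np

class InMemoryVectorStore:
    """Toy vector store: exact cosine search over normalized vectors."""

    def __init__(self) -> None:
        self.vectors: list[np.ndarray] = []
        self.texts: list[str] = []

    def add(self, text: str, vector: np.ndarray) -> None:
        # Normalize on insert so search reduces to a dot product.
        self.vectors.append(vector / np.linalg.norm(vector))
        self.texts.append(text)

    def search(self, query: np.ndarray, k: int = 3) -> list[tuple[str, float]]:
        matrix = np.stack(self.vectors)
        sims = matrix @ (query / np.linalg.norm(query))
        order = np.argsort(-sims)[:k]
        return [(self.texts[i], float(sims[i])) for i in order]
```

Normalizing once at insert time is a common design choice: it turns every cosine query into a single matrix-vector product.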

Exercises

  • Problem Set 1 (notebook) — Cosine similarity from scratch, build a chunker, encode + retrieve, top-k accuracy, compare chunk sizes, source-citing prompt template
  • Problem Set 2 (notebook) — BM25 + dense hybrid, query rewriting, faithfulness scorer, multi-hop retrieval, RAG evaluation harness, latency profiling
  • Solutions — In exercises/solutions/ (notebooks and solutions.py for CI)
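The retrieval metrics named in Problem Set 2 are simple to state in code. A minimal sketch (illustrative only, not the harness you will build): hit@k asks whether the relevant document made the top k, and MRR averages the reciprocal of the relevant document's rank across queries.

```python
def hit_at_k(ranked_ids: list[str], relevant_id: str, k: int) -> float:
    """1.0 if the relevant document appears in the top-k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0

def mrr(all_rankings: list[list[str]], relevant_ids: list[str]) -> float:
    """Mean reciprocal rank: average of 1/rank of the relevant doc per query."""
    total = 0.0
    for ranked, relevant in zip(all_rankings, relevant_ids):
        if relevant in ranked:
            total += 1.0 / (ranked.index(relevant) + 1)  # ranks are 1-based
    return total / len(all_rankings)
```

Faithfulness and answer relevance, by contrast, judge the generated answer rather than the retrieval ranking, so they need an LLM or human grader rather than a one-liner.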

Diagrams (Mermaid)

  • rag_architecture.mermaid, chunking_strategies.mermaid, retrieval_pipeline.mermaid

Read Online

  • 13.1 Introduction — RAG motivation, embeddings, cosine, vector store from scratch
  • 13.2 Intermediate — Chunking, embedding choices, vector stores, reranking, citations
  • 13.3 Advanced — Hybrid search, query rewriting, RAG eval, production, capstone

Or try the code in the Playground.

How to Use This Chapter

Quick Start

Follow these steps to get coding in minutes.

1. Clone and install dependencies

git clone https://github.com/luigipascal/berta-chapters.git
cd berta-chapters
pip install -r requirements.txt

2. Navigate to the chapter

cd chapters/chapter-13-retrieval-augmented-generation
pip install -r requirements.txt
python -c "import nltk; nltk.download('punkt')"

3. (Optional) Install higher-quality dense embeddings and a vector DB

pip install sentence-transformers faiss-cpu chromadb

4. Launch Jupyter

jupyter notebook notebooks/01_rag_fundamentals.ipynb

GitHub Folder

All chapter materials live in: chapters/chapter-13-retrieval-augmented-generation/


Created by Luigi Pascal Rondanini | Generated by Berta AI