Ch 13: Retrieval-Augmented Generation (RAG) - Intermediate¶
Track: Practitioner
To run the code interactively, clone the repo and open chapters/chapter-13-retrieval-augmented-generation/notebooks/02_rag_pipeline.ipynb in Jupyter.
Chapter 13: RAG — Notebook 02 (Building the RAG Pipeline)¶
This notebook builds the pipeline end to end: chunking strategies (fixed, sliding window, sentence, semantic), embedding model choices with a TF-IDF fallback, vector store options (in-memory NumPy, plus FAISS and Chroma sketches), a full RAG pipeline class, cross-encoder reranking, and prompt assembly with citations.
What you'll learn¶
| Topic | Section |
|---|---|
| Chunking strategies: fixed, sliding window, sentence, semantic | §1 |
| Embedding model choices and TF-IDF fallback | §2 |
| Vector store options: in-memory NumPy, FAISS, Chroma | §3 |
| End-to-end pipeline class | §4 |
| Reranking with a cross-encoder (sketch) | §5 |
| Prompt assembly with citations and source IDs | §6 |
Time estimate: 2.5 hours
Key concepts¶
- Chunking — Split documents into retrievable units; overlap and granularity affect recall and cost.
- Embedding model — Choice trades quality vs latency vs cost; TF-IDF is a strong, free baseline.
- Vector store — In-memory NumPy is fine for prototypes; FAISS / Chroma for scale.
- Reranking — Use a stronger cross-encoder to reorder the cheap retriever's top results.
- Citations — Always return source IDs / spans so users (and graders) can verify answers.
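To make the concepts above concrete before opening the notebook, here is a minimal sliding-window chunker. The function name, default sizes, and character-based splitting are illustrative choices, not the notebook's exact implementation; the overlap is what keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.

```python
def chunk_sliding(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap.

    Larger overlap raises recall (boundary-straddling content appears
    in two chunks) at the cost of more chunks to embed and store.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Token-based or sentence-based splitting follows the same pattern; only the unit being windowed changes.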
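The TF-IDF fallback mentioned above can be sketched without any ML dependency. This is a pure-Python toy (class name, whitespace tokenization, and the smoothed IDF formula are all illustrative assumptions), but it shows why TF-IDF is a credible free baseline: lexical overlap alone already ranks relevant documents first.

```python
import math
from collections import Counter


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class TfidfRetriever:
    """Toy TF-IDF retriever: whitespace tokens, tf * (log(n/df) + 1) weights."""

    def __init__(self, docs: list[str]):
        self.docs = docs
        tokenized = [d.lower().split() for d in docs]
        self.n = len(docs)
        self.df = Counter(t for toks in tokenized for t in set(toks))
        self.vecs = [self._vec(toks) for toks in tokenized]

    def _idf(self, t: str) -> float:
        return math.log(self.n / self.df[t]) + 1 if t in self.df else 0.0

    def _vec(self, toks: list[str]) -> dict[str, float]:
        tf = Counter(toks)
        return {t: (c / len(toks)) * self._idf(t) for t, c in tf.items()}

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        q = self._vec(query.lower().split())
        scored = sorted(
            ((cosine(q, v), i) for i, v in enumerate(self.vecs)), reverse=True
        )
        return [(self.docs[i], s) for s, i in scored[:k]]
```

A production system would swap in a learned embedding model behind the same `search` interface; the surrounding pipeline does not need to change.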
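The in-memory NumPy store is small enough to sketch here. This is an assumed brute-force design (class name and API are illustrative): vectors are L2-normalized at insert time so cosine similarity reduces to a single matrix-vector product at query time.

```python
import numpy as np


class NumpyVectorStore:
    """Brute-force cosine-similarity store; fine for prototype-scale corpora."""

    def __init__(self, dim: int):
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.ids: list[str] = []

    def add(self, doc_id: str, vector) -> None:
        v = np.asarray(vector, dtype=np.float32)
        v = v / (np.linalg.norm(v) + 1e-12)  # unit-normalize once at insert
        self.vectors = np.vstack([self.vectors, v])
        self.ids.append(doc_id)

    def search(self, query, k: int = 3) -> list[tuple[str, float]]:
        q = np.asarray(query, dtype=np.float32)
        q = q / (np.linalg.norm(q) + 1e-12)
        sims = self.vectors @ q              # cosine = dot of unit vectors
        top = np.argsort(-sims)[:k]
        return [(self.ids[i], float(sims[i])) for i in top]
```

Exact brute-force search is linear in corpus size; FAISS or Chroma replace this class when approximate nearest-neighbor indexing is needed at scale.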
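Reranking can be sketched independently of any particular model. The function below takes the scorer as a parameter so the sketch stays runnable; in the notebook that role is played by a cross-encoder, which scores each (query, document) pair jointly and is too slow to run over the whole corpus but cheap over a retriever's top few results.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float],
    k: int = 3,
) -> list[str]:
    """Reorder a cheap retriever's candidates using a stronger pairwise scorer.

    score_fn stands in for a cross-encoder's relevance score on a
    (query, document) pair; any callable with that shape works here.
    """
    ranked = sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)
    return ranked[:k]
```

The two-stage pattern (fast recall, slow precision) is the standard trade: the bi-encoder retriever casts a wide net, the cross-encoder reorders the catch.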
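Finally, prompt assembly with citations amounts to tagging each retrieved chunk with a numbered source marker. The function name, prompt wording, and `[n]` citation convention below are illustrative assumptions, not the notebook's exact template.

```python
def build_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """Assemble a grounded prompt; each chunk is a (source_id, text) pair.

    Numbered tags let the model cite passages as [1], [2], ... and let
    readers map every citation back to a concrete source document.
    """
    context = "\n".join(
        f"[{i}] (source: {sid}) {text}"
        for i, (sid, text) in enumerate(chunks, 1)
    )
    return (
        "Answer using only the context below. Cite supporting passages as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Returning the source IDs alongside the generated answer is what makes the final output verifiable.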
Run the full notebook for code and outputs.