Ch 13: Retrieval-Augmented Generation (RAG) - Intermediate¶
Track: Practitioner
To run the code interactively, clone the repo and open chapters/chapter-13-retrieval-augmented-generation/notebooks/02_rag_pipeline.ipynb in Jupyter.
Chapter 13: RAG — Notebook 02 (Building the RAG Pipeline)¶
This notebook builds the pipeline end to end: chunking strategies (fixed, sliding window, sentence, semantic), embedding model choices with a TF-IDF fallback, vector store options (in-memory NumPy, plus FAISS and Chroma sketches), a full RAG pipeline class, cross-encoder reranking, and prompt assembly with citations.
What you'll learn¶
| Topic | Section |
|---|---|
| Chunking strategies: fixed, sliding window, sentence, semantic | §1 |
| Embedding model choices and TF-IDF fallback | §2 |
| Vector store options: in-memory NumPy, FAISS, Chroma | §3 |
| End-to-end pipeline class | §4 |
| Reranking with a cross-encoder (sketch) | §5 |
| Prompt assembly with citations and source IDs | §6 |
Time estimate: 2.5 hours
Key concepts¶
- Chunking — Split documents into retrievable units; overlap and granularity affect recall and cost.
- Embedding model — Choice trades quality vs latency vs cost; TF-IDF is a strong, free baseline.
- Vector store — In-memory NumPy is fine for prototypes; FAISS / Chroma for scale.
- Reranking — Use a stronger cross-encoder to reorder the cheap retriever's top results.
- Citations — Always return source IDs / spans so users (and graders) can verify answers.
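To make the concepts above concrete before opening the notebook, here is a minimal sliding-window chunker. The function name, default sizes, and character-based splitting are illustrative choices, not the notebook's exact implementation; the overlap is what keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.

```python
def chunk_sliding(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap.

    Larger overlap raises recall (boundary-straddling content appears
    in two chunks) at the cost of more chunks to embed and store.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Token-based or sentence-based splitting follows the same pattern; only the unit being windowed changes.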
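The TF-IDF fallback mentioned above can be sketched without any ML dependency. This is a pure-Python toy (class name, whitespace tokenization, and the smoothed IDF formula are all illustrative assumptions), but it shows why TF-IDF is a credible free baseline: lexical overlap alone already ranks relevant documents first.

```python
import math
from collections import Counter


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class TfidfRetriever:
    """Toy TF-IDF retriever: whitespace tokens, tf * (log(n/df) + 1) weights."""

    def __init__(self, docs: list[str]):
        self.docs = docs
        tokenized = [d.lower().split() for d in docs]
        self.n = len(docs)
        self.df = Counter(t for toks in tokenized for t in set(toks))
        self.vecs = [self._vec(toks) for toks in tokenized]

    def _idf(self, t: str) -> float:
        return math.log(self.n / self.df[t]) + 1 if t in self.df else 0.0

    def _vec(self, toks: list[str]) -> dict[str, float]:
        tf = Counter(toks)
        return {t: (c / len(toks)) * self._idf(t) for t, c in tf.items()}

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        q = self._vec(query.lower().split())
        scored = sorted(
            ((cosine(q, v), i) for i, v in enumerate(self.vecs)), reverse=True
        )
        return [(self.docs[i], s) for s, i in scored[:k]]
```

A production system would swap in a learned embedding model behind the same `search` interface; the surrounding pipeline does not need to change.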
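The in-memory NumPy store is small enough to sketch here. This is an assumed brute-force design (class name and API are illustrative): vectors are L2-normalized at insert time so cosine similarity reduces to a single matrix-vector product at query time.

```python
import numpy as np


class NumpyVectorStore:
    """Brute-force cosine-similarity store; fine for prototype-scale corpora."""

    def __init__(self, dim: int):
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.ids: list[str] = []

    def add(self, doc_id: str, vector) -> None:
        v = np.asarray(vector, dtype=np.float32)
        v = v / (np.linalg.norm(v) + 1e-12)  # unit-normalize once at insert
        self.vectors = np.vstack([self.vectors, v])
        self.ids.append(doc_id)

    def search(self, query, k: int = 3) -> list[tuple[str, float]]:
        q = np.asarray(query, dtype=np.float32)
        q = q / (np.linalg.norm(q) + 1e-12)
        sims = self.vectors @ q              # cosine = dot of unit vectors
        top = np.argsort(-sims)[:k]
        return [(self.ids[i], float(sims[i])) for i in top]
```

Exact brute-force search is linear in corpus size; FAISS or Chroma replace this class when approximate nearest-neighbor indexing is needed at scale.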
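Reranking can be sketched independently of any particular model. The function below takes the scorer as a parameter so the sketch stays runnable; in the notebook that role is played by a cross-encoder, which scores each (query, document) pair jointly and is too slow to run over the whole corpus but cheap over a retriever's top few results.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float],
    k: int = 3,
) -> list[str]:
    """Reorder a cheap retriever's candidates using a stronger pairwise scorer.

    score_fn stands in for a cross-encoder's relevance score on a
    (query, document) pair; any callable with that shape works here.
    """
    ranked = sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)
    return ranked[:k]
```

The two-stage pattern (fast recall, slow precision) is the standard trade: the bi-encoder retriever casts a wide net, the cross-encoder reorders the catch.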
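Finally, prompt assembly with citations amounts to tagging each retrieved chunk with a numbered source marker. The function name, prompt wording, and `[n]` citation convention below are illustrative assumptions, not the notebook's exact template.

```python
def build_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """Assemble a grounded prompt; each chunk is a (source_id, text) pair.

    Numbered tags let the model cite passages as [1], [2], ... and let
    readers map every citation back to a concrete source document.
    """
    context = "\n".join(
        f"[{i}] (source: {sid}) {text}"
        for i, (sid, text) in enumerate(chunks, 1)
    )
    return (
        "Answer using only the context below. Cite supporting passages as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Returning the source IDs alongside the generated answer is what makes the final output verifiable.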
Run the full notebook for code and outputs.