
Ch 13: Retrieval-Augmented Generation (RAG) - Intermediate

Track: Practitioner

Read online or run locally

To run the code interactively, clone the repo and open chapters/chapter-13-retrieval-augmented-generation/notebooks/02_rag_pipeline.ipynb in Jupyter.


Chapter 13: RAG — Notebook 02 (Building the RAG Pipeline)

This notebook makes the pipeline real: chunking strategies (fixed / sliding / sentence / semantic), embedding model choices with TF-IDF fallback, vector store options (FAISS / Chroma sketches), a full RAG pipeline, reranking, and prompt assembly with citations.
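As a taste of the chunking step, here is a minimal sliding-window chunker. The function name and defaults are illustrative, not the notebook's exact implementation; the point is how `overlap` keeps neighbouring chunks sharing context.

```python
# Sliding-window chunking sketch (hypothetical helper, not the notebook's
# exact code): fixed-size chunks where each chunk overlaps its neighbour.
def chunk_sliding(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)
            if text[i:i + chunk_size]]

chunks = chunk_sliding("a" * 500, chunk_size=200, overlap=50)
# consecutive chunks share a 50-character overlap
```

Larger overlap improves recall at chunk boundaries but increases index size and retrieval cost, which is the trade-off §1 explores.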

What you'll learn

  • Chunking strategies: fixed, sliding window, sentence, semantic (§1)
  • Embedding model choices and TF-IDF fallback (§2)
  • Vector store options: in-memory NumPy, FAISS, Chroma (§3)
  • End-to-end pipeline class (§4)
  • Reranking with a cross-encoder (sketch) (§5)
  • Prompt assembly with citations and source IDs (§6)

Time estimate: 2.5 hours


Key concepts

  • Chunking — Split documents into retrievable units; overlap and granularity affect recall and cost.
  • Embedding model — Choice trades off quality, latency, and cost; TF-IDF is a strong, free baseline.
  • Vector store — In-memory NumPy is fine for prototypes; FAISS / Chroma for scale.
  • Reranking — Use a stronger cross-encoder to reorder the cheap retriever's top results.
  • Citations — Always return source IDs / spans so users (and graders) can verify answers.
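The first three concepts can be combined into a tiny retriever: TF-IDF as the free embedding baseline and plain NumPy cosine similarity as the in-memory "vector store". The documents and function names below are illustrative, and scikit-learn is assumed to be installed.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Minimal in-memory retriever sketch: TF-IDF vectors + NumPy cosine similarity.
docs = [
    "FAISS is a library for efficient similarity search.",
    "Chroma is an open-source embedding database.",
    "TF-IDF weighs terms by frequency and rarity.",
]
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs).toarray()  # shape (n_docs, n_terms)

def retrieve(query: str, k: int = 2) -> list[int]:
    """Return indices of the top-k documents by cosine similarity."""
    q = vectorizer.transform([query]).toarray()[0]
    norms = np.linalg.norm(doc_matrix, axis=1) * (np.linalg.norm(q) or 1.0)
    scores = doc_matrix @ q / np.where(norms == 0, 1.0, norms)
    return np.argsort(scores)[::-1][:k].tolist()

print(retrieve("similarity search library"))  # doc 0 should rank first
```

Swapping this for FAISS or Chroma changes only the index and search calls; the embed-then-score shape of the pipeline stays the same.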

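The reranking step reorders the cheap retriever's candidates with a stronger scorer. In the notebook that scorer is a cross-encoder over (query, passage) pairs; the token-overlap scorer below is a stand-in so this sketch runs anywhere without a model download.

```python
# Reranking sketch: a stronger scorer reorders the retriever's candidates.
# overlap_score is a stand-in for a real cross-encoder's relevance score.
def overlap_score(query: str, passage: str) -> float:
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / max(len(q), 1)

def rerank(query: str, candidates: list[str], top_k: int = 3) -> list[str]:
    return sorted(candidates,
                  key=lambda c: overlap_score(query, c),
                  reverse=True)[:top_k]

candidates = [
    "RAG retrieves then generates.",
    "Bananas are yellow.",
    "Retrieval augments generation with context.",
]
print(rerank("what does retrieval augmented generation do", candidates, top_k=2))
```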
Run the full notebook for code and outputs.
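For the citations concept, a prompt assembler can number each retrieved chunk with a source ID so the model can cite it and a reader can verify. The template wording and IDs below are illustrative.

```python
# Prompt-assembly sketch: label each chunk with a source ID so answers
# can cite verifiable sources. The template text is illustrative.
def build_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    context = "\n".join(f"[{sid}] {text}" for sid, text in chunks)
    return (
        "Answer using only the sources below. Cite source IDs like [doc1].\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "What is chunk overlap for?",
    [("doc1", "Overlap preserves context across chunk boundaries."),
     ("doc2", "Granularity affects recall and cost.")],
)
print(prompt)
```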
