Ch 13: Retrieval-Augmented Generation (RAG) - Introduction
Read online or run locally
You can read this content here on the web. To run the code interactively, either use the Playground or clone the repo and open chapters/chapter-13-retrieval-augmented-generation/notebooks/01_rag_fundamentals.ipynb in Jupyter.
Chapter 13: RAG — Notebook 01 (RAG Fundamentals)
This notebook motivates RAG, recaps embeddings and cosine similarity, builds an in-memory vector store from scratch, and ties it all together in a first end-to-end RAG pipeline with a mock LLM.
What you'll learn
| Topic | Section |
|---|---|
| Why RAG: hallucination, recency, private data, context-window limits | §1 |
| Embeddings recap and cosine similarity | §2 |
| In-memory vector store: add, search, top-k | §3 |
| Naive retrieval and prompt assembly | §4 |
| First end-to-end RAG with a mock LLM | §5 |
| Retrieval metrics: hit@k, MRR, precision@k | §6 |
Time estimate: 2.5 hours
Key concepts
- RAG — Retrieve relevant snippets at query time and inject them into the prompt for grounded answers.
- Embeddings — Dense vectors so semantically similar text is geometrically close.
- Cosine similarity — Angle-based score that's invariant to vector magnitude.
- Vector store — Indexes embeddings for fast top-k nearest-neighbor search.
- hit@k / MRR — Standard retrieval metrics: hit@k checks whether a relevant document appears in the top-k results; MRR additionally rewards ranking it nearer the top.
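To make the first three concepts concrete before opening the notebook, here is a minimal sketch of a toy embedding, cosine similarity, and an in-memory vector store with top-k search. The bag-of-words `embed` function and the fixed `VOCAB` are illustrative stand-ins for a real embedding model, not the notebook's actual code.

```python
import math
from collections import Counter

# Toy vocabulary; a real system would use a learned embedding model instead.
VOCAB = ["rag", "retrieval", "embedding", "vector", "llm", "prompt", "cat", "dog"]

def embed(text: str) -> list[float]:
    """Toy embedding: word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    """Angle-based similarity; invariant to vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class InMemoryVectorStore:
    """Stores (text, vector) pairs and does brute-force top-k search."""

    def __init__(self) -> None:
        self.docs: list[tuple[str, list[float]]] = []

    def add(self, text: str) -> None:
        self.docs.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        q = embed(query)
        scored = [(text, cosine(q, vec)) for text, vec in self.docs]
        return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

store = InMemoryVectorStore()
store.add("rag combines retrieval with an llm prompt")
store.add("a vector embedding maps text to numbers")
store.add("the cat chased the dog")
top = store.search("how does retrieval work with an llm", k=2)
```

Brute-force scoring like this is fine for a few thousand documents; the notebook later motivates approximate nearest-neighbor indexes for larger corpora.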
Run the full notebook in the chapter folder for code and outputs.
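As a complement, the two retrieval metrics from the concept list can be sketched in a few lines, assuming for simplicity that each query has exactly one relevant document id (the notebook's §6 handles the general case).

```python
def hit_at_k(ranked_ids: list[str], relevant_id: str, k: int) -> int:
    """1 if the relevant document appears in the top-k results, else 0."""
    return int(relevant_id in ranked_ids[:k])

def reciprocal_rank(ranked_ids: list[str], relevant_id: str) -> float:
    """1/rank of the relevant document, or 0.0 if it was not retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0

# MRR is the mean of reciprocal ranks over a set of queries.
# Hypothetical results: (ranked ids returned, the one relevant id).
results = [(["d3", "d1", "d7"], "d1"), (["d2", "d5"], "d9")]
mrr = sum(reciprocal_rank(ranked, rel) for ranked, rel in results) / len(results)
```

Here the first query ranks its relevant document second (reciprocal rank 0.5) and the second query misses entirely (0.0), giving an MRR of 0.25.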