# Chapter 11: Large Language Models & Transformers
Build a deep, hands-on understanding of the Transformer architecture and pretrained LLMs—from scaled dot-product attention in NumPy to embeddings, decoding strategies, and shipping LLM-powered features.
## Metadata
| Field | Value |
|---|---|
| Track | Practitioner |
| Time | 10 hours |
| Prerequisites | Chapter 10 (NLP Basics) and Chapter 9 (Deep Learning Fundamentals) |
## Learning Objectives
- Explain the Transformer: self-attention, multi-head attention, positional encoding, residuals, layer norm
- Implement scaled dot-product and multi-head attention from scratch in NumPy
- Distinguish encoder, decoder, and encoder-decoder families and pick the right one
- Use pretrained LLMs (BERT, DistilBERT, GPT-style) for embeddings and downstream tasks
- Generate text with controlled decoding (greedy, sampling, temperature, top-k, top-p)
- Evaluate LLMs (perplexity, BLEU/ROUGE, win-rate) and design LLM-powered systems
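To make the first two objectives concrete, here is a minimal NumPy sketch of scaled dot-product attention, softmax(QKᵀ / √d_k) · V. It is illustrative only; the chapter's `transformer_utils.py` helpers may use different names and shapes.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V for 2-D inputs."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (seq_q, seq_k) similarity scores
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # hide masked (padding/future) positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax over keys
    return weights @ V, weights
```

Notebook 01 builds multi-head attention on top of this idea by running several such attention operations in parallel over learned projections of Q, K, and V.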
## What's Included
### Notebooks
| Notebook | Description |
|---|---|
| `01_transformer_architecture.ipynb` | Attention from scratch, multi-head, positional encoding, encoder block, model families |
| `02_pretrained_llms.ipynb` | Hugging Face models, tokenizers, embeddings, frozen-embedding classifier |
| `03_advanced_llms.ipynb` | Decoding strategies, KV cache, scaling laws, evaluation, LLM apps, capstone |
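A rough sketch of the frozen-embedding workflow covered in notebook 02, assuming the `transformers` and `torch` packages are installed; the checkpoint name `distilbert-base-uncased` is an assumption, not necessarily the one the notebook uses.

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "distilbert-base-uncased"             # assumed checkpoint, swap as needed
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

texts = ["Attention is all you need.", "Transformers power modern NLP."]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():                        # frozen model: no fine-tuning
    hidden = model(**batch).last_hidden_state

# Mean-pool token vectors into one fixed-size embedding per sentence;
# these embeddings can feed a lightweight downstream classifier.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)                      # (2, hidden_size)
```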
### Scripts
- `config.py` — Shared chapter config (model names, paths, fallback flags)
- `transformer_utils.py` — NumPy attention, multi-head, positional encoding, encoder block helpers
- `llm_utils.py` — Pretrained-model loaders, tokenizer wrappers, embedding utilities
- `generation_utils.py` — Greedy, top-k, top-p, temperature samplers and decoding helpers
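As a taste of what the decoding helpers do, here is a NumPy sketch of a single sampling step with temperature and top-k filtering; the function signature is illustrative, not the actual `generation_utils.py` API.

```python
import numpy as np

def sample_top_k(logits, k=50, temperature=1.0, rng=None):
    """Sample one token id from raw logits using temperature + top-k filtering."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    keep = np.argpartition(logits, -k)[-k:]    # indices of the k highest logits
    filtered = np.full_like(logits, -np.inf)
    filtered[keep] = logits[keep]              # everything else gets zero probability
    probs = np.exp(filtered - filtered.max())
    probs /= probs.sum()                       # softmax over the surviving tokens
    return int(rng.choice(len(logits), p=probs))
```

Setting `k=1` recovers greedy decoding, and lowering the temperature sharpens the distribution; top-p (nucleus) sampling instead keeps the smallest set of tokens whose cumulative probability exceeds p.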
### Exercises
- Problem Set 1 (notebook) — Scaled dot-product attention, positional encoding, attention heatmap, BPE, multi-head shapes, model-family comparison
- Problem Set 2 (notebook) — Top-k sampling, tiny transformer block, perplexity, embedding classifier, prompt vs context-window trade-offs
- Solutions — In `exercises/solutions/` (notebooks and `solutions.py` for CI)
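Since Problem Set 1 asks for positional encoding, here is a minimal sketch of the standard sinusoidal variant; the exercise may ask for a different interface.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return the (seq_len, d_model) matrix of sinusoidal positional encodings."""
    positions = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                       # (1, d_model)
    # Paired dimensions share a frequency: 1 / 10000^(2i / d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                    # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                    # odd dimensions: cosine
    return pe
```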
### Diagrams (Mermaid)
`transformer_architecture.mermaid`, `self_attention.mermaid`, `multi_head_attention.mermaid`
## Read Online
- 11.1 Introduction — Transformer architecture: attention, multi-head, positional encoding, encoder block
- 11.2 Intermediate — Pretrained LLMs, tokenizers, embeddings, frozen-embedding classification
- 11.3 Advanced — Decoding strategies, KV cache, scaling, evaluation, LLM applications
Or try the code in the Playground.
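The evaluation material in 11.3 (and a Problem Set 2 exercise) leans on perplexity, which is just the exponential of the average per-token negative log-likelihood. A tiny sketch, assuming you already have per-token log-probabilities from a model:

```python
import numpy as np

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    return float(np.exp(-np.mean(token_log_probs)))

# A model that gives every token probability 0.25 has perplexity 4.
print(perplexity(np.log([0.25, 0.25, 0.25, 0.25])))  # -> 4.0
```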
## How to Use This Chapter
Quick Start
Follow these steps to get coding in minutes.
1. Clone and install dependencies

   ```bash
   git clone https://github.com/luigipascal/berta-chapters.git
   cd berta-chapters
   pip install -r requirements.txt
   ```
2. Navigate to the chapter
3. (Optional) Install the pretrained-LLM extras
4. Launch Jupyter
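The remaining steps might look like the following; the extras install line is an assumption rather than the chapter's documented command, so check the repository README if it differs.

```bash
# Step 2: enter the chapter folder
cd chapters/chapter-11-large-language-models-and-transformers

# Step 3 (optional): pretrained-LLM extras -- exact packages are an assumption
pip install transformers torch

# Step 4: launch Jupyter (or: jupyter notebook)
jupyter lab
```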
GitHub Folder
All chapter materials live in: `chapters/chapter-11-large-language-models-and-transformers/`
Created by Luigi Pascal Rondanini | Generated by Berta AI