
Chapter 11: Large Language Models & Transformers

Build a deep, hands-on understanding of the Transformer architecture and pretrained LLMs—from scaled dot-product attention in NumPy to embeddings, decoding strategies, and shipping LLM-powered features.


Metadata

  • Track: Practitioner
  • Time: 10 hours
  • Prerequisites: Chapter 10 (NLP Basics) and Chapter 9 (Deep Learning Fundamentals)

Learning Objectives

  • Explain the Transformer: self-attention, multi-head attention, positional encoding, residuals, layer norm
  • Implement scaled dot-product and multi-head attention from scratch in NumPy (see the sketch after this list)
  • Distinguish encoder, decoder, and encoder-decoder families and pick the right one
  • Use pretrained LLMs (BERT, DistilBERT, GPT-style) for embeddings and downstream tasks
  • Generate text with controlled decoding (greedy, sampling, temperature, top-k, top-p)
  • Evaluate LLMs (perplexity, BLEU/ROUGE, win-rate) and design LLM-powered systems
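
As a taste of that second objective, here is a minimal NumPy sketch of scaled dot-product attention (the function name and toy shapes are illustrative; the chapter's transformer_utils.py covers this ground in full):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) arrays; returns (seq_len, d_k) context vectors
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise similarities, scaled
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V                              # attention-weighted sum of values

# Toy check: 4 tokens, d_k = 8
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)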

What's Included

Notebooks

  • 01_transformer_architecture.ipynb — Attention from scratch, multi-head attention, positional encoding (sketched after this list), encoder block, model families
  • 02_pretrained_llms.ipynb — Hugging Face models, tokenizers, embeddings, frozen-embedding classifier
  • 03_advanced_llms.ipynb — Decoding strategies, KV cache, scaling laws, evaluation, LLM apps, capstone
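
For orientation, a minimal sketch of the sinusoidal positional encoding that notebook 01 builds up (the function name is illustrative, and d_model is assumed to be even):

import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # Classic sin/cos encoding from "Attention Is All You Need"; d_model must be even
    pos = np.arange(seq_len)[:, None]        # (seq_len, 1) token positions
    i = np.arange(d_model // 2)[None, :]     # (1, d_model/2) dimension pairs
    angles = pos / (10000.0 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)             # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)             # odd dimensions get cosine
    return pe

print(sinusoidal_positional_encoding(50, 64).shape)  # (50, 64)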

Scripts

  • config.py — Shared chapter config (model names, paths, fallback flags)
  • transformer_utils.py — NumPy attention, multi-head, positional encoding, encoder block helpers
  • llm_utils.py — Pretrained-model loaders, tokenizer wrappers, embedding utilities
  • generation_utils.py — Greedy, top-k, top-p, and temperature samplers plus decoding helpers (see the sketch after this list)
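
As a flavor of what the samplers do, here is a minimal top-k sampler with temperature in NumPy (the name sample_top_k and its defaults are illustrative, not the module's actual API):

import numpy as np

def sample_top_k(logits, k=50, temperature=1.0, rng=None):
    # Draw one token id from the k highest-scoring logits
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature  # <1 sharpens, >1 flattens
    top = np.argsort(logits)[-k:]                  # indices of the k largest logits
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                           # renormalize over the top k
    return int(rng.choice(top, p=probs))

vocab_logits = np.random.default_rng(0).normal(size=1000)  # stand-in next-token logits
print(sample_top_k(vocab_logits, k=10, temperature=0.8))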

Exercises

  • Problem Set 1 (notebook) — Scaled dot-product attention, positional encoding, attention heatmap, BPE, multi-head shapes, model-family comparison
  • Problem Set 2 (notebook) — Top-k sampling, tiny transformer block, perplexity (see the sketch after this list), embedding classifier, prompt vs. context-window trade-offs
  • Solutions — In exercises/solutions/ (notebooks and solutions.py for CI)
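
The perplexity exercise reduces to one line once you have per-token log-probabilities from a model; a sketch under that assumption:

import numpy as np

def perplexity(token_log_probs):
    # Perplexity = exp(mean negative log-likelihood);
    # token_log_probs are natural-log probabilities of each observed token
    return float(np.exp(-np.mean(token_log_probs)))

# Sanity check: uniform guesses over a 10-token vocabulary give perplexity 10
print(perplexity(np.log(np.full(20, 0.1))))  # 10.0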

Diagrams (Mermaid)

  • transformer_architecture.mermaid, self_attention.mermaid, multi_head_attention.mermaid

Read Online

  • 11.1 Introduction — Transformer architecture: attention, multi-head, positional encoding, encoder block
  • 11.2 Intermediate — Pretrained LLMs, tokenizers, embeddings, frozen-embedding classification
  • 11.3 Advanced — Decoding strategies, KV cache, scaling, evaluation, LLM applications

Or try the code in the Playground.

How to Use This Chapter

Quick Start

Follow these steps to get coding in minutes.

1. Clone and install dependencies

git clone https://github.com/luigipascal/berta-chapters.git
cd berta-chapters
pip install -r requirements.txt

2. Navigate to the chapter and install its requirements

cd chapters/chapter-11-large-language-models-and-transformers
pip install -r requirements.txt

3. (Optional) Install the pretrained-LLM extras

pip install torch transformers tokenizers accelerate datasets sentencepiece huggingface-hub
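
To confirm the extras installed correctly, a quick smoke test that pulls a sentence embedding from DistilBERT (the model id distilbert-base-uncased and mean pooling are assumptions; any Hugging Face encoder works):

import torch
from transformers import AutoModel, AutoTokenizer

name = "distilbert-base-uncased"  # assumed model id; swap in the chapter's config value
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

inputs = tokenizer("Attention is all you need.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
embedding = outputs.last_hidden_state.mean(dim=1)  # mean-pool token states
print(embedding.shape)  # torch.Size([1, 768])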

4. Launch Jupyter

jupyter notebook notebooks/01_transformer_architecture.ipynb

GitHub Folder

All chapter materials live in: chapters/chapter-11-large-language-models-and-transformers/


Created by Luigi Pascal Rondanini | Generated by Berta AI