
Chapter 11: Large Language Models & Transformers

Build a deep, hands-on understanding of the Transformer architecture and pretrained LLMs—from scaled dot-product attention in NumPy to embeddings, decoding strategies, and shipping LLM-powered features.


Metadata

  • Track: Practitioner
  • Time: 10 hours
  • Prerequisites: Chapter 10 (NLP Basics) and Chapter 9 (Deep Learning Fundamentals)

Learning Objectives

  • Explain the Transformer: self-attention, multi-head attention, positional encoding, residuals, layer norm
  • Implement scaled dot-product and multi-head attention from scratch in NumPy (see the sketch after this list)
  • Distinguish encoder, decoder, and encoder-decoder families and pick the right one
  • Use pretrained LLMs (BERT, DistilBERT, GPT-style) for embeddings and downstream tasks
  • Generate text with controlled decoding (greedy, sampling, temperature, top-k, top-p)
  • Evaluate LLMs (perplexity, BLEU/ROUGE, win-rate) and design LLM-powered systems
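
As a taste of that second objective, here is a minimal NumPy sketch of scaled dot-product attention (the function name and toy shapes are illustrative; the chapter's transformer_utils.py covers this ground in full):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) arrays; returns (seq_len, d_k) context vectors
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise similarities, scaled
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V                              # attention-weighted sum of values

# Toy check: 4 tokens, d_k = 8
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)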

What's Included

Notebooks

  • 01_transformer_architecture.ipynb — Attention from scratch, multi-head attention, positional encoding (sketched after this list), encoder block, model families
  • 02_pretrained_llms.ipynb — Hugging Face models, tokenizers, embeddings, frozen-embedding classifier
  • 03_advanced_llms.ipynb — Decoding strategies, KV cache, scaling laws, evaluation, LLM apps, capstone
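
For orientation, a minimal sketch of the sinusoidal positional encoding that notebook 01 builds up (the function name is illustrative, and d_model is assumed to be even):

import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # Classic sin/cos encoding from "Attention Is All You Need"; d_model must be even
    pos = np.arange(seq_len)[:, None]        # (seq_len, 1) token positions
    i = np.arange(d_model // 2)[None, :]     # (1, d_model/2) dimension pairs
    angles = pos / (10000.0 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)             # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)             # odd dimensions get cosine
    return pe

print(sinusoidal_positional_encoding(50, 64).shape)  # (50, 64)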

Scripts

  • config.py — Shared chapter config (model names, paths, fallback flags)
  • transformer_utils.py — NumPy attention, multi-head, positional encoding, encoder block helpers
  • llm_utils.py — Pretrained-model loaders, tokenizer wrappers, embedding utilities
  • generation_utils.py — Greedy, top-k, top-p, and temperature samplers plus decoding helpers (see the sketch after this list)
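
As a flavor of what the samplers do, here is a minimal top-k sampler with temperature in NumPy (the name sample_top_k and its defaults are illustrative, not the module's actual API):

import numpy as np

def sample_top_k(logits, k=50, temperature=1.0, rng=None):
    # Draw one token id from the k highest-scoring logits
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature  # <1 sharpens, >1 flattens
    top = np.argsort(logits)[-k:]                  # indices of the k largest logits
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                           # renormalize over the top k
    return int(rng.choice(top, p=probs))

vocab_logits = np.random.default_rng(0).normal(size=1000)  # stand-in next-token logits
print(sample_top_k(vocab_logits, k=10, temperature=0.8))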

Exercises

  • Problem Set 1 (notebook) — Scaled dot-product attention, positional encoding, attention heatmap, BPE, multi-head shapes, model-family comparison
  • Problem Set 2 (notebook) — Top-k sampling, tiny transformer block, perplexity (see the sketch after this list), embedding classifier, prompt vs. context-window trade-offs
  • Solutions — In exercises/solutions/ (notebooks and solutions.py for CI)
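
The perplexity exercise reduces to one line once you have per-token log-probabilities from a model; a sketch under that assumption:

import numpy as np

def perplexity(token_log_probs):
    # Perplexity = exp(mean negative log-likelihood);
    # token_log_probs are natural-log probabilities of each observed token
    return float(np.exp(-np.mean(token_log_probs)))

# Sanity check: uniform guesses over a 10-token vocabulary give perplexity 10
print(perplexity(np.log(np.full(20, 0.1))))  # 10.0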

Diagrams (Mermaid)

  • transformer_architecture.mermaid, self_attention.mermaid, multi_head_attention.mermaid

Read Online

  • 11.1 Introduction — Transformer architecture: attention, multi-head, positional encoding, encoder block
  • 11.2 Intermediate — Pretrained LLMs, tokenizers, embeddings, frozen-embedding classification
  • 11.3 Advanced — Decoding strategies, KV cache, scaling, evaluation, LLM applications

Or try the code in the Playground.

How to Use This Chapter

Quick Start

Follow these steps to get coding in minutes.

1. Clone and install dependencies

git clone https://github.com/luigipascal/berta-chapters.git
cd berta-chapters
pip install -r requirements.txt

2. Navigate to the chapter and install its requirements

cd chapters/chapter-11-large-language-models-and-transformers
pip install -r requirements.txt

3. (Optional) Install the pretrained-LLM extras

pip install torch transformers tokenizers accelerate datasets sentencepiece huggingface-hub
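
To confirm the extras installed correctly, a quick smoke test that pulls a sentence embedding from DistilBERT (the model id distilbert-base-uncased and mean pooling are assumptions; any Hugging Face encoder works):

import torch
from transformers import AutoModel, AutoTokenizer

name = "distilbert-base-uncased"  # assumed model id; swap in the chapter's config value
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

inputs = tokenizer("Attention is all you need.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
embedding = outputs.last_hidden_state.mean(dim=1)  # mean-pool token states
print(embedding.shape)  # torch.Size([1, 768])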

4. Launch Jupyter

jupyter notebook notebooks/01_transformer_architecture.ipynb

GitHub Folder

All chapter materials live in: chapters/chapter-11-large-language-models-and-transformers/


Created by Luigi Pascal Rondanini | Generated by Berta AI