Ch 11: Large Language Models & Transformers — Advanced
Read online or run locally
To run the code interactively, clone the repo and open `chapters/chapter-11-large-language-models-and-transformers/notebooks/03_advanced_llms.ipynb` in Jupyter.
Chapter 11: LLMs & Transformers — Notebook 03 (Advanced LLMs)
This notebook covers decoding strategies, KV cache mechanics, scaling laws, evaluation (perplexity, BLEU/ROUGE, LLM-as-judge), and patterns for building real LLM applications (chunking, streaming, function calling). It sets up Chapter 12 (Prompt Engineering) and Chapter 13 (RAG).
What you'll learn
| Topic | Section |
|---|---|
| Decoding: greedy, sampling, temperature, top-k, top-p, repetition penalty | §1 |
| KV cache shapes and inference efficiency | §2 |
| Scaling laws (parameters, data, compute) | §3 |
| Evaluation: perplexity, BLEU/ROUGE, win-rate, LLM-as-judge limits | §4 |
| Building LLM apps: chunking, streaming, function calling | §5 |
| Capstone design and bridge to Chapters 12–13 | §6–7 |
Time estimate: 2.5 hours
Key concepts
- Decoding — Choose the next token from logits; controls quality vs diversity (greedy → top-p sampling).
- KV cache — Reuse past key/value tensors at inference so each generation step costs O(n), not O(n²).
- Scaling laws — Loss falls predictably with model size, data, and compute — guides budget choices.
- Evaluation — Combine automatic metrics (perplexity, ROUGE) with human or LLM-as-judge win-rates.
- LLM apps — Real systems chunk inputs, stream tokens to the UI, and call tools/functions on demand.
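To make the decoding bullet concrete, here is a minimal NumPy sketch of temperature, top-k, and top-p (nucleus) sampling over a toy logit vector. The logits, vocabulary size, and function name are illustrative, not from the notebook.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Sample one token id from logits after temperature/top-k/top-p filtering."""
    rng = rng or np.random.default_rng(0)
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())      # numerically stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]            # token ids, most to least likely
    if top_k is not None:
        order = order[:top_k]                  # keep only the k most likely
    if top_p is not None:
        keep = np.cumsum(probs[order]) <= top_p
        keep[0] = True                         # always keep the single best token
        order = order[keep]
    kept = probs[order] / probs[order].sum()   # renormalize the truncated set
    return int(rng.choice(order, p=kept))

logits = [2.0, 1.0, 0.5, -1.0]                 # toy 4-token vocabulary
greedy = int(np.argmax(logits))                # greedy decoding = temperature → 0
print(greedy, sample_next_token(logits, temperature=0.8, top_k=3, top_p=0.9))
```

Lowering `temperature` sharpens the distribution toward greedy; `top_k` and `top_p` both truncate the tail, trading diversity for fewer low-probability (often degenerate) tokens.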
Run the full notebook for code and outputs.
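As a taste of the KV cache section, this back-of-envelope calculation sizes the cache for one sequence; the layer/head dimensions are illustrative (roughly GPT-2-small scale) and not tied to any specific checkpoint.

```python
# Per layer the cache holds K and V, each of shape (n_heads, seq_len, head_dim).
n_layers, n_heads, head_dim, seq_len = 12, 12, 64, 1024
bytes_per_value = 2  # fp16

kv_bytes = n_layers * 2 * n_heads * seq_len * head_dim * bytes_per_value
print(f"KV cache for one sequence: {kv_bytes / 1e6:.1f} MB")

# With the cache, step t only attends over t cached keys: O(t) work per token.
# Without it, the model re-encodes the whole prefix every step: O(t^2) per token.
```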