Ch 11: Large Language Models & Transformers — Advanced
Read online or run locally
To run the code interactively, clone the repo and open `chapters/chapter-11-large-language-models-and-transformers/notebooks/03_advanced_llms.ipynb` in Jupyter.
Chapter 11: LLMs & Transformers — Notebook 03 (Advanced LLMs)
This notebook covers decoding strategies, KV cache mechanics, scaling laws, evaluation (perplexity, BLEU/ROUGE, LLM-as-judge), and patterns for building real LLM applications (chunking, streaming, function calling). It sets up Chapter 12 (Prompt Engineering) and Chapter 13 (RAG).
What you'll learn
| Topic | Section |
|---|---|
| Decoding: greedy, sampling, temperature, top-k, top-p, repetition penalty | §1 |
| KV cache shapes and inference efficiency | §2 |
| Scaling laws (parameters, data, compute) | §3 |
| Evaluation: perplexity, BLEU/ROUGE, win-rate, LLM-as-judge limits | §4 |
| Building LLM apps: chunking, streaming, function calling | §5 |
| Capstone design and bridge to Chapters 12–13 | §6–7 |
Time estimate: 2.5 hours
Key concepts
- Decoding — Choose the next token from logits; controls quality vs diversity (greedy → top-p sampling).
- KV cache — Reuse past key/value tensors at inference so each generation step costs O(n), not O(n²).
- Scaling laws — Loss falls predictably with model size, data, and compute — guides budget choices.
- Evaluation — Combine automatic metrics (perplexity, ROUGE) with human or LLM-as-judge win-rates.
- LLM apps — Real systems chunk inputs, stream tokens to the UI, and call tools/functions on demand.
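To make the decoding bullet concrete, here is a minimal NumPy sketch of temperature, top-k, and top-p (nucleus) sampling over a toy logit vector. The logits, vocabulary size, and function name are illustrative, not from the notebook.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Sample one token id from logits after temperature/top-k/top-p filtering."""
    rng = rng or np.random.default_rng(0)
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())      # numerically stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]            # token ids, most to least likely
    if top_k is not None:
        order = order[:top_k]                  # keep only the k most likely
    if top_p is not None:
        keep = np.cumsum(probs[order]) <= top_p
        keep[0] = True                         # always keep the single best token
        order = order[keep]
    kept = probs[order] / probs[order].sum()   # renormalize the truncated set
    return int(rng.choice(order, p=kept))

logits = [2.0, 1.0, 0.5, -1.0]                 # toy 4-token vocabulary
greedy = int(np.argmax(logits))                # greedy decoding = temperature → 0
print(greedy, sample_next_token(logits, temperature=0.8, top_k=3, top_p=0.9))
```

Lowering `temperature` sharpens the distribution toward greedy; `top_k` and `top_p` both truncate the tail, trading diversity for fewer low-probability (often degenerate) tokens.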
Run the full notebook for code and outputs.
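As a taste of the KV cache section, this back-of-envelope calculation sizes the cache for one sequence; the layer/head dimensions are illustrative (roughly GPT-2-small scale) and not tied to any specific checkpoint.

```python
# Per layer the cache holds K and V, each of shape (n_heads, seq_len, head_dim).
n_layers, n_heads, head_dim, seq_len = 12, 12, 64, 1024
bytes_per_value = 2  # fp16

kv_bytes = n_layers * 2 * n_heads * seq_len * head_dim * bytes_per_value
print(f"KV cache for one sequence: {kv_bytes / 1e6:.1f} MB")

# With the cache, step t only attends over t cached keys: O(t) work per token.
# Without it, the model re-encodes the whole prefix every step: O(t^2) per token.
```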