Ch 11: Large Language Models & Transformers - Advanced

Track: Practitioner

Read online or run locally

To run the code interactively, clone the repo and open chapters/chapter-11-large-language-models-and-transformers/notebooks/03_advanced_llms.ipynb in Jupyter.


Chapter 11: LLMs & Transformers — Notebook 03 (Advanced LLMs)

This notebook covers decoding strategies, KV cache mechanics, scaling laws, evaluation (perplexity, BLEU/ROUGE, LLM-as-judge), and the patterns for building real LLM applications (chunking, streaming, function calling). It sets up Chapter 12 (Prompt Engineering) and Chapter 13 (RAG).

What you'll learn

| Topic | Section |
| --- | --- |
| Decoding: greedy, sampling, temperature, top-k, top-p, repetition penalty | §1 |
| KV cache shapes and inference efficiency | §2 |
| Scaling laws (parameters, data, compute) | §3 |
| Evaluation: perplexity, BLEU/ROUGE, win-rate, LLM-as-judge limits | §4 |
| Building LLM apps: chunking, streaming, function calling | §5 |
| Capstone design and bridge to Chapters 12–13 | §6–7 |

Time estimate: 2.5 hours


Key concepts

  • Decoding — Choose the next token from logits; controls quality vs diversity (greedy → top-p sampling).
  • KV cache — Reuse past key/value tensors at inference so generation is O(n) per token, not O(n²).
  • Scaling laws — Loss falls predictably with model size, data, and compute, which guides how to allocate a training budget.
  • Evaluation — Combine automatic metrics (perplexity, ROUGE) with human or LLM-as-judge win-rates.
  • LLM apps — Real systems chunk inputs, stream tokens to the UI, and call tools/functions on demand.
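To make the decoding concepts concrete before you open the notebook, here is a minimal sketch (not the notebook's code) of how temperature, top-k, and top-p (nucleus) filtering are typically applied to one step's logits; the function name and defaults are illustrative:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Illustrative decoding step: filter logits, then sample one token id."""
    rng = rng or np.random.default_rng(0)
    logits = np.asarray(logits, dtype=np.float64) / temperature  # temperature scaling
    if top_k is not None:
        # Top-k: keep only the k highest logits, mask the rest out.
        kth = np.sort(logits)[-top_k]
        logits = np.where(logits < kth, -np.inf, logits)
    # Softmax over the surviving candidates.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if top_p is not None:
        # Top-p (nucleus): keep the smallest set of tokens whose mass >= p.
        order = np.argsort(probs)[::-1]
        cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
        mask = np.zeros_like(probs, dtype=bool)
        mask[order[:cutoff]] = True
        probs = np.where(mask, probs, 0.0)
        probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```

With `top_k=1` this reduces to greedy decoding; raising temperature flattens the distribution and increases diversity at the cost of coherence.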

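The KV cache point can also be sketched in a few lines. This toy single-head decode loop (hypothetical shapes; real caches are `(batch, heads, seq_len, head_dim)`) appends each new token's key/value to a growing cache, so each step does O(t) dot products instead of recomputing attention over all pairs:

```python
import numpy as np

def decode_with_kv_cache(steps=4, d=8, seed=0):
    """Toy single-head decode loop illustrating KV-cache reuse."""
    rng = np.random.default_rng(seed)
    k_cache = np.empty((0, d))  # grows to (steps, d)
    v_cache = np.empty((0, d))
    outputs = []
    for _ in range(steps):
        q = rng.standard_normal(d)  # query for the NEW token only
        # Append this token's K/V; past entries are reused, never recomputed.
        k_cache = np.vstack([k_cache, rng.standard_normal(d)])
        v_cache = np.vstack([v_cache, rng.standard_normal(d)])
        scores = k_cache @ q / np.sqrt(d)   # O(t) work at step t
        w = np.exp(scores - scores.max())
        w /= w.sum()
        outputs.append(w @ v_cache)         # context vector for this token
    return k_cache, v_cache, outputs
```

The memory cost is the flip side: the cache holds one key and one value vector per layer, head, and past token, which is why long contexts are expensive at inference time.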
Run the full notebook for code and outputs.
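As a small taste of the evaluation section, perplexity is just the exponentiated mean negative log-likelihood of the target tokens; this sketch (assuming per-token log-probabilities are already available) shows the arithmetic:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood of the target tokens).

    A model that assigns every token probability 1/k has perplexity k,
    so lower perplexity means the model is less "surprised" by the text.
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)
```

For example, uniform probability 0.25 on each of four tokens gives a perplexity of exactly 4.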


Generated by Berta AI