


Read online or run locally

To run the code interactively, clone the repo and open chapters/chapter-14-fine-tuning-and-adaptation/notebooks/03_advanced_adaptation.ipynb in Jupyter.


Chapter 14: Fine-tuning — Notebook 03 (Advanced Adaptation)

This notebook covers instruction-tuning datasets (Alpaca format), RLHF and DPO (with a NumPy DPO loss), rigorous evaluation, catastrophic forgetting, and a model registry / versioning stub that hands off to Chapter 15.

What you'll learn

| Topic | Section |
| --- | --- |
| Instruction tuning and Alpaca-style datasets | §1 |
| RLHF concepts and DPO (Direct Preference Optimization) | §2 |
| NumPy DPO loss implementation | §3 |
| Held-out eval, win rates, and LLM-as-judge caveats | §4 |
| Catastrophic forgetting and how to avoid it | §5 |
| Model registry / versioning bridge to Chapter 15 | §6 |

Time estimate: 2 hours


Key concepts

  • Instruction tuning — Fine-tune on diverse (instruction, response) pairs for general helpfulness (record format sketched below).
  • RLHF / DPO — Use preference data (chosen vs. rejected); DPO replaces the RL loop with a closed-form loss (NumPy sketch below).
  • Win-rate eval — Compare the adapted and base models head-to-head; use bootstrap confidence intervals to avoid overclaiming (sketch below).
  • Catastrophic forgetting — Adapted models can lose general skills; mix general data back in and lower the learning rate (replay sketch below).
  • Registry — Version every run with hyperparameters, eval scores, and adapter pointers — the setup for Chapter 15 (stub sketched below).
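
For concreteness, here is a minimal sketch of one Alpaca-style record. The field names (instruction, input, output) follow the original Alpaca release; the example text itself is made up.

```python
import json

# One Alpaca-style record: "instruction" states the task, the optional
# "input" carries extra context, and "output" is the target response the
# model is fine-tuned to produce. The text here is illustrative only.
record = {
    "instruction": "Summarize the following paragraph in one sentence.",
    "input": "Fine-tuning adapts a pretrained model to a narrower task "
             "by continuing training on task-specific examples.",
    "output": "Fine-tuning specializes a pretrained model with task data.",
}
print(json.dumps(record, indent=2))
```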
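
The DPO loss itself fits in a few lines of NumPy: minimize -log σ(β · margin), where the margin is the policy-vs-reference log-ratio of the chosen response minus that of the rejected one. This is a sketch under the assumption that per-sequence log-probabilities have already been summed over tokens; the function and argument names are illustrative, and the full version lives in the notebook.

```python
import numpy as np

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss: -log sigmoid(beta * margin), averaged over the batch.

    Each argument is a (batch,)-shaped array of per-sequence log-probs,
    summed over tokens, under the policy or the frozen reference model.
    """
    # Policy-vs-reference log-ratios for chosen and rejected responses.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # DPO turns preference learning into logistic regression on this margin.
    logits = beta * (chosen_logratios - rejected_logratios)
    # -log sigmoid(x) == log(1 + exp(-x)), computed stably with logaddexp.
    return np.logaddexp(0.0, -logits).mean()

# Toy batch of 4 preference pairs (log-probs are fabricated for the demo).
rng = np.random.default_rng(0)
loss = dpo_loss(rng.normal(-10, 1, 4), rng.normal(-12, 1, 4),
                rng.normal(-11, 1, 4), rng.normal(-11, 1, 4))
print(f"DPO loss: {loss:.4f}")
```

Because the reference model enters only through log-ratios, no reward model or sampling loop is needed, which is the sense in which DPO is closed-form.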
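
For the win-rate evaluation, the bootstrap step is small enough to sketch here. winrate_ci is a hypothetical helper name, and the 0/1 outcomes in the demo are fabricated.

```python
import numpy as np

def winrate_ci(wins, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for a head-to-head win rate.

    `wins` is a 0/1 array with 1 where the adapted model beat the base.
    """
    rng = np.random.default_rng(seed)
    wins = np.asarray(wins, dtype=float)
    # Resample the comparisons with replacement; recompute each win rate.
    idx = rng.integers(0, len(wins), size=(n_boot, len(wins)))
    boot_rates = wins[idx].mean(axis=1)
    lo, hi = np.quantile(boot_rates, [alpha / 2, 1 - alpha / 2])
    return wins.mean(), (lo, hi)

# Fabricated outcomes for ten head-to-head comparisons.
rate, (lo, hi) = winrate_ci(np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1]))
print(f"win rate {rate:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```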
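
One common mitigation for forgetting is replay: blend a slice of general-domain examples back into the task data so the model keeps seeing both distributions. A minimal sketch; mix_with_replay and the 20% ratio are illustrative assumptions, not the notebook's exact recipe.

```python
import random

def mix_with_replay(task_data, general_data, replay_ratio=0.2, seed=0):
    """Blend general-domain examples into the task data (simple replay).

    replay_ratio=0.2 adds roughly one general example per five task ones.
    """
    rng = random.Random(seed)
    n_replay = min(int(len(task_data) * replay_ratio), len(general_data))
    mixed = list(task_data) + rng.sample(general_data, n_replay)
    rng.shuffle(mixed)  # interleave so every batch sees both sources
    return mixed

# Tiny demo with placeholder strings standing in for training examples.
print(mix_with_replay([f"task-{i}" for i in range(10)],
                      [f"general-{i}" for i in range(100)]))
```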
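
Finally, the registry can start as something as small as an append-only JSON-lines file. A sketch of that idea only: register_run, the field names, and the demo values are assumptions, and Chapter 15 replaces this stub with a real registry.

```python
import json
import pathlib
import time

def register_run(registry_dir, run_name, hyperparams, eval_scores, adapter_path):
    """Append one versioned run record to a JSON-lines registry file."""
    entry = {
        "run": run_name,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "hyperparams": hyperparams,   # e.g. lr, beta, epochs
        "eval": eval_scores,          # e.g. win rate and its CI
        "adapter": str(adapter_path), # pointer to the saved adapter weights
    }
    path = pathlib.Path(registry_dir) / "registry.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

# Placeholder values for the demo; real runs would log measured scores.
register_run("runs", "dpo-v1", {"lr": 5e-6, "beta": 0.1},
             {"win_rate": 0.68, "ci": [0.55, 0.79]}, "runs/dpo-v1/adapter")
```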

Run the full notebook for code and outputs.

