Ch 14: Fine-tuning & Adaptation - Advanced
Read online or run locally
To run the code interactively, clone the repo and open `chapters/chapter-14-fine-tuning-and-adaptation/notebooks/03_advanced_adaptation.ipynb` in Jupyter.
Chapter 14: Fine-tuning — Notebook 03 (Advanced Adaptation)
This notebook covers instruction-tuning datasets (Alpaca format), RLHF and DPO (with a NumPy DPO loss), rigorous evaluation, catastrophic forgetting, and a model registry / versioning stub that hands off to Chapter 15.
What you'll learn
| Topic | Section |
|---|---|
| Instruction tuning and Alpaca-style datasets | §1 |
| RLHF concepts and DPO (Direct Preference Optimization) | §2 |
| NumPy DPO loss implementation | §3 |
| Held-out eval, win-rates, LLM-as-judge caveats | §4 |
| Catastrophic forgetting and how to avoid it | §5 |
| Model registry / versioning bridge to Chapter 15 | §6 |
Time estimate: 2 hours
Key concepts
- Instruction tuning — Fine-tune on diverse (instruction, response) pairs for general helpfulness (Alpaca-format example below).
- RLHF / DPO — Use preference data (chosen vs rejected); DPO replaces the RL loop with a closed-form loss (NumPy sketch below).
- Win-rate eval — Compare adapted vs base model head-to-head; bootstrap CIs to avoid overclaiming (sketch below).
- Catastrophic forgetting — Adapted models can lose general skills; mix in general data and lower the LR (data-mixing sketch below).
- Registry — Version every run with hyperparams, eval scores, and adapter pointers — set up for Ch 15 (stub below).
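For orientation before you open the notebook, here is what an Alpaca-style record and its prompt template look like. The field names and template text follow the original Stanford Alpaca release; the record contents are illustrative, not taken from the notebook.

```python
# One Alpaca-style instruction record: "instruction", optional "input", "output".
example = {
    "instruction": "Summarize the following paragraph in one sentence.",
    "input": "Fine-tuning adapts a pretrained model to a narrower task...",
    "output": "Fine-tuning specializes a pretrained model using task data.",
}

# The standard Alpaca prompt template for records that include an input.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

prompt = PROMPT_WITH_INPUT.format(**example)  # model is trained to emit example["output"]
```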
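The closed-form DPO loss needs only the summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model. A minimal NumPy sketch matching the standard formulation from Rafailov et al. (2023); argument names are illustrative and may differ from the notebook's version:

```python
import numpy as np

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss.

    Each argument is a (batch,) array of summed per-response log-probs.
    beta controls how far the policy may drift from the reference model.
    """
    # Implicit rewards: how much more the policy prefers each response
    # than the reference model does.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margin = chosen_rewards - rejected_rewards
    # -log sigmoid(margin), computed stably as log(1 + exp(-margin)).
    losses = np.logaddexp(0.0, -margin)
    return losses.mean()
```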
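The win-rate bullet above is easy to operationalize: encode each head-to-head comparison as 1 (adapted model wins) or 0, then bootstrap-resample to get a confidence interval. A sketch, assuming ties are resolved or counted as 0.5 upstream:

```python
import numpy as np

def winrate_ci(wins, n_boot=10_000, alpha=0.05, seed=0):
    """Point estimate and bootstrap CI for a win-rate over paired comparisons."""
    rng = np.random.default_rng(seed)
    wins = np.asarray(wins, dtype=float)
    # Resample comparisons with replacement and recompute the mean each time.
    boots = rng.choice(wins, size=(n_boot, wins.size), replace=True).mean(axis=1)
    lo, hi = np.quantile(boots, [alpha / 2, 1 - alpha / 2])
    return wins.mean(), (lo, hi)

# e.g. 58 wins out of 100 comparisons: if the CI straddles 0.5,
# don't claim the adapted model is better.
rate, (lo, hi) = winrate_ci(np.r_[np.ones(58), np.zeros(42)])
```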
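"Mix in general data" usually means replaying general-domain examples inside each fine-tuning batch. A data-mixing sketch; the function, ratio, and batch size here are illustrative assumptions, not the notebook's actual pipeline:

```python
import random

def mixed_batches(task_data, general_data, mix_ratio=0.3, batch_size=8):
    """Yield batches that replay general-domain examples alongside task data.

    mix_ratio is the fraction of each batch drawn from general data;
    tune it (and lower the LR) to limit catastrophic forgetting.
    """
    while True:
        n_general = int(batch_size * mix_ratio)
        batch = random.sample(general_data, n_general)
        batch += random.sample(task_data, batch_size - n_general)
        random.shuffle(batch)  # avoid a fixed general/task ordering
        yield batch
```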
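Finally, the registry bridge to Chapter 15 can be as simple as an append-only JSONL file, one record per run. A minimal stub under that assumption; the schema and file name are illustrative:

```python
from dataclasses import dataclass, field, asdict
import json
import time

@dataclass
class RunRecord:
    """One registry entry per fine-tuning run (illustrative schema)."""
    run_id: str
    base_model: str
    adapter_path: str  # pointer to the saved LoRA/adapter weights
    hyperparams: dict = field(default_factory=dict)
    eval_scores: dict = field(default_factory=dict)
    created_at: float = field(default_factory=time.time)

def register(record: RunRecord, registry_file="registry.jsonl"):
    # Append-only JSONL preserves the full history of versioned runs.
    with open(registry_file, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")
```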
Run the full notebook for code and outputs.