Ch 14: Fine-tuning & Adaptation - Advanced
Read online or run locally
To run the code interactively, clone the repo and open `chapters/chapter-14-fine-tuning-and-adaptation/notebooks/03_advanced_adaptation.ipynb` in Jupyter.
Chapter 14: Fine-tuning — Notebook 03 (Advanced Adaptation)
This notebook covers instruction-tuning datasets (Alpaca format), RLHF and DPO (with a NumPy DPO loss), rigorous evaluation, catastrophic forgetting, and a model registry / versioning stub that hands off to Chapter 15.
What you'll learn
| Topic | Section |
|---|---|
| Instruction tuning and Alpaca-style datasets | §1 |
| RLHF concepts and DPO (Direct Preference Optimization) | §2 |
| NumPy DPO loss implementation | §3 |
| Held-out eval, win-rates, LLM-as-judge caveats | §4 |
| Catastrophic forgetting and how to avoid it | §5 |
| Model registry / versioning bridge to Chapter 15 | §6 |
Time estimate: 2 hours
Key concepts
- Instruction tuning — Fine-tune on diverse (instruction, response) pairs for general helpfulness (Alpaca-format example below).
- RLHF / DPO — Use preference data (chosen vs rejected); DPO replaces the RL loop with a closed-form loss (NumPy sketch below).
- Win-rate eval — Compare adapted vs base model head-to-head; bootstrap CIs to avoid overclaiming (sketch below).
- Catastrophic forgetting — Adapted models can lose general skills; mix in general data and lower the LR (data-mixing sketch below).
- Registry — Version every run with hyperparams, eval scores, and adapter pointers — set up for Ch 15 (stub below).
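For orientation before you open the notebook, here is what an Alpaca-style record and its prompt template look like. The field names and template text follow the original Stanford Alpaca release; the record contents are illustrative, not taken from the notebook.

```python
# One Alpaca-style instruction record: "instruction", optional "input", "output".
example = {
    "instruction": "Summarize the following paragraph in one sentence.",
    "input": "Fine-tuning adapts a pretrained model to a narrower task...",
    "output": "Fine-tuning specializes a pretrained model using task data.",
}

# The standard Alpaca prompt template for records that include an input.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

prompt = PROMPT_WITH_INPUT.format(**example)  # model is trained to emit example["output"]
```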
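The closed-form DPO loss needs only the summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model. A minimal NumPy sketch matching the standard formulation from Rafailov et al. (2023); argument names are illustrative and may differ from the notebook's version:

```python
import numpy as np

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss.

    Each argument is a (batch,) array of summed per-response log-probs.
    beta controls how far the policy may drift from the reference model.
    """
    # Implicit rewards: how much more the policy prefers each response
    # than the reference model does.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margin = chosen_rewards - rejected_rewards
    # -log sigmoid(margin), computed stably as log(1 + exp(-margin)).
    losses = np.logaddexp(0.0, -margin)
    return losses.mean()
```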
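The win-rate bullet above is easy to operationalize: encode each head-to-head comparison as 1 (adapted model wins) or 0, then bootstrap-resample to get a confidence interval. A sketch, assuming ties are resolved or counted as 0.5 upstream:

```python
import numpy as np

def winrate_ci(wins, n_boot=10_000, alpha=0.05, seed=0):
    """Point estimate and bootstrap CI for a win-rate over paired comparisons."""
    rng = np.random.default_rng(seed)
    wins = np.asarray(wins, dtype=float)
    # Resample comparisons with replacement and recompute the mean each time.
    boots = rng.choice(wins, size=(n_boot, wins.size), replace=True).mean(axis=1)
    lo, hi = np.quantile(boots, [alpha / 2, 1 - alpha / 2])
    return wins.mean(), (lo, hi)

# e.g. 58 wins out of 100 comparisons: if the CI straddles 0.5,
# don't claim the adapted model is better.
rate, (lo, hi) = winrate_ci(np.r_[np.ones(58), np.zeros(42)])
```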
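"Mix in general data" usually means replaying general-domain examples inside each fine-tuning batch. A data-mixing sketch; the function, ratio, and batch size here are illustrative assumptions, not the notebook's actual pipeline:

```python
import random

def mixed_batches(task_data, general_data, mix_ratio=0.3, batch_size=8):
    """Yield batches that replay general-domain examples alongside task data.

    mix_ratio is the fraction of each batch drawn from general data;
    tune it (and lower the LR) to limit catastrophic forgetting.
    """
    while True:
        n_general = int(batch_size * mix_ratio)
        batch = random.sample(general_data, n_general)
        batch += random.sample(task_data, batch_size - n_general)
        random.shuffle(batch)  # avoid a fixed general/task ordering
        yield batch
```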
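Finally, the registry bridge to Chapter 15 can be as simple as an append-only JSONL file, one record per run. A minimal stub under that assumption; the schema and file name are illustrative:

```python
from dataclasses import dataclass, field, asdict
import json
import time

@dataclass
class RunRecord:
    """One registry entry per fine-tuning run (illustrative schema)."""
    run_id: str
    base_model: str
    adapter_path: str  # pointer to the saved LoRA/adapter weights
    hyperparams: dict = field(default_factory=dict)
    eval_scores: dict = field(default_factory=dict)
    created_at: float = field(default_factory=time.time)

def register(record: RunRecord, registry_file="registry.jsonl"):
    # Append-only JSONL preserves the full history of versioned runs.
    with open(registry_file, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")
```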
Run the full notebook for code and outputs.