Ch 14: Fine-tuning & Adaptation - Intermediate
Track: Practitioner
Read online or run locally
To run the code interactively, clone the repo and open chapters/chapter-14-fine-tuning-and-adaptation/notebooks/02_peft_lora.ipynb in Jupyter.
Chapter 14: Fine-tuning — Notebook 02 (PEFT & LoRA)
This notebook digs into parameter-efficient fine-tuning: full fine-tuning vs PEFT trade-offs, LoRA math and a NumPy implementation, QLoRA, adapters, prefix tuning, IA3, and adapter merging / multi-adapter serving.
What you'll learn
| Topic | Section |
|---|---|
| Full fine-tuning vs PEFT trade-offs | §1 |
| LoRA math: low-rank update, rank, alpha, scaling (update rule shown below) | §2 |
| NumPy LoRA adapter from scratch | §3 |
| QLoRA conceptual (4-bit base + LoRA) | §4 |
| Adapters, prefix tuning, IA3 — when to use which | §5 |
| Merging adapters and multi-adapter serving | §6 |
Time estimate: 2.5 hours
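For orientation, this is the low-rank update that the §2 row above refers to, written in the standard LoRA notation (a reference sketch; the notebook's own symbol names may differ). Only $A$ and $B$ are trained, $W_0$ stays frozen, and the $\alpha / r$ factor rescales the update so that hyperparameters stay reasonably stable when the rank $r$ changes.

$$
h = W_0 x + \frac{\alpha}{r}\, B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
$$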
Key concepts
- PEFT — Train tiny additional parameters; freeze the rest of the base model.
- LoRA — Inject low-rank `B @ A` updates into linear layers, scaled by `alpha / r` (see the sketch after this list).
- QLoRA — LoRA on top of a 4-bit-quantized base; fits big models on a single GPU.
- Adapter merging — Fold `B @ A * (alpha / r)` back into the base weights for zero-overhead inference.
- Multi-adapter serving — Keep one base model loaded; hot-swap small adapters per tenant or task.
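The bullets above compress the core mechanics; the sketches below make them concrete. First, a minimal NumPy illustration of a LoRA adapter on a single linear layer: the base weight stays frozen, the trainable low-rank pair produces the update `B @ A`, the update is scaled by `alpha / r`, and merging folds it back into the base weight. All names and shapes here (`W0`, `A`, `B`, `d_in`, `d_out`, `lora_forward`, `merge`) are illustrative choices for this sketch, not the notebook's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in = 64, 128      # shape of the frozen linear layer
r, alpha = 8, 16           # LoRA rank and scaling hyperparameters

W0 = rng.normal(size=(d_out, d_in))           # frozen base weight
A = rng.normal(scale=0.01, size=(r, d_in))    # trainable down-projection
B = rng.normal(scale=0.01, size=(d_out, r))   # trainable up-projection
# (Real LoRA zero-initializes B so training starts exactly at the base model;
#  small random values are used here only so the check below is non-trivial.)

def lora_forward(x, W0, A, B, alpha, r):
    """Base output plus the scaled low-rank update: W0 @ x + (alpha/r) * B @ A @ x."""
    return W0 @ x + (alpha / r) * (B @ (A @ x))

def merge(W0, A, B, alpha, r):
    """Fold the adapter into the base weights for zero-overhead inference."""
    return W0 + (alpha / r) * (B @ A)

x = rng.normal(size=(d_in,))
y_adapter = lora_forward(x, W0, A, B, alpha, r)
y_merged = merge(W0, A, B, alpha, r) @ x
assert np.allclose(y_adapter, y_merged)   # merging changes nothing about the function

# Multi-adapter serving in miniature: keep one W0 and swap small (A, B) pairs per task.
adapters = {
    "task_a": (A, B),
    "task_b": (rng.normal(scale=0.01, size=(r, d_in)), np.zeros((d_out, r))),  # fresh zero-B adapter
}
A_b, B_b = adapters["task_b"]
y_task_b = lora_forward(x, W0, A_b, B_b, alpha, r)
```

Only `A` and `B` receive gradients during training: for this layer that is `r * (d_in + d_out)` parameters instead of `d_in * d_out`, which is where PEFT's memory savings come from.

QLoRA keeps the same adapter math but stores the frozen base in 4-bit precision. A conceptual configuration sketch, assuming the Hugging Face `transformers`, `peft`, and `bitsandbytes` packages (the model id is a placeholder, not one used by this notebook):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                      # store frozen base weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-7b-model",               # placeholder model id
    quantization_config=bnb,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],     # which linear layers get adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()           # only the LoRA matrices are trainable
```

After training, the adapter weights can be saved separately, so one base model serves many small adapters, or folded back into a full-precision copy of the base, as in the NumPy sketch above.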
Run the full notebook for code and outputs.