Ch 14: Fine-tuning & Adaptation - Intermediate

Track: Practitioner

Read online or run locally

To run the code interactively, clone the repo and open chapters/chapter-14-fine-tuning-and-adaptation/notebooks/02_peft_lora.ipynb in Jupyter.


Chapter 14: Fine-tuning — Notebook 02 (PEFT & LoRA)

This notebook digs into parameter-efficient fine-tuning: full fine-tuning vs PEFT trade-offs, the math behind LoRA and a from-scratch NumPy implementation, QLoRA, adapters, prefix tuning, IA3, and adapter merging / multi-adapter serving.

What you'll learn

Topic                                              Section
Full fine-tuning vs PEFT trade-offs                §1
LoRA math: low-rank update, rank, alpha, scaling   §2
NumPy LoRA adapter from scratch                    §3
QLoRA conceptual (4-bit base + LoRA)               §4
Adapters, prefix tuning, IA3 — when to use which   §5
Merging adapters and multi-adapter serving         §6

Time estimate: 2.5 hours


Key concepts

  • PEFT — Train tiny additional parameters; freeze the rest of the base model.
  • LoRA — Inject low-rank B @ A updates into linear layers; scaled by alpha / r.
  • QLoRA — LoRA on top of a 4-bit-quantized base — fits big models on a single GPU.
  • Adapter merging — Fold B @ A * (alpha / r) back into the base weights for zero-overhead inference.
  • Multi-adapter serving — Keep one base model loaded; hot-swap small adapters per tenant or task.
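The LoRA bullet above can be sketched in a few lines of NumPy. This is a minimal illustration, not the notebook's actual code; the layer sizes, rank, and alpha below are made-up values chosen only to show the shapes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: rank r is much smaller than the layer dimensions.
d_out, d_in, r, alpha = 16, 32, 4, 8

W = rng.normal(size=(d_out, d_in))       # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                 # trainable up-projection, zero-initialized
                                         # so the adapter starts as a no-op

def lora_forward(x):
    # y = W x + (alpha / r) * B A x  -- only A and B receive gradients
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
y = lora_forward(x)

# With B zero-initialized, the adapted layer matches the frozen base exactly.
assert np.allclose(y, W @ x)
```

Note the parameter savings: W has d_out * d_in entries, while the adapter adds only r * (d_in + d_out), which shrinks rapidly as the layer dimensions grow.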
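Adapter merging from the last two bullets can be checked numerically: folding B @ A * (alpha / r) into the base weight gives a single matrix with identical outputs and zero inference overhead. A minimal sketch with randomly filled stand-in "trained" factors (sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, r, alpha = 16, 32, 4, 8     # illustrative sizes

W = rng.normal(size=(d_out, d_in))       # frozen base weight
A = rng.normal(size=(r, d_in))           # stand-ins for trained adapter factors
B = rng.normal(size=(d_out, r))

def adapted_forward(x):
    # Un-merged path: base plus scaled low-rank update.
    return W @ x + (alpha / r) * (B @ (A @ x))

# Merge: fold the adapter into the base weight once, ahead of serving.
W_merged = W + (alpha / r) * (B @ A)

x = rng.normal(size=(d_in,))
assert np.allclose(adapted_forward(x), W_merged @ x)
```

Multi-adapter serving takes the opposite choice: keep W un-merged, and swap in a different small (A, B) pair per tenant or task at request time.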

Run the full notebook for code and outputs.


Generated by Berta AI