Ch 14: Fine-tuning & Adaptation - Intermediate

Track: Practitioner

Read online or run locally

To run the code interactively, clone the repo and open chapters/chapter-14-fine-tuning-and-adaptation/notebooks/02_peft_lora.ipynb in Jupyter.


Chapter 14: Fine-tuning — Notebook 02 (PEFT & LoRA)

This notebook digs into parameter-efficient fine-tuning: full fine-tuning vs PEFT trade-offs, the math behind LoRA and a from-scratch NumPy implementation, QLoRA, adapters, prefix tuning, IA3, and adapter merging / multi-adapter serving.

What you'll learn

Topic                                              Section
Full fine-tuning vs PEFT trade-offs                §1
LoRA math: low-rank update, rank, alpha, scaling   §2
NumPy LoRA adapter from scratch                    §3
QLoRA conceptual (4-bit base + LoRA)               §4
Adapters, prefix tuning, IA3 — when to use which   §5
Merging adapters and multi-adapter serving         §6

Time estimate: 2.5 hours


Key concepts

  • PEFT — Train tiny additional parameters; freeze the rest of the base model.
  • LoRA — Inject low-rank B @ A updates into linear layers; scaled by alpha / r.
  • QLoRA — LoRA on top of a 4-bit-quantized base — fits big models on a single GPU.
  • Adapter merging — Fold B @ A * (alpha / r) back into the base weights for zero-overhead inference.
  • Multi-adapter serving — Keep one base model loaded; hot-swap small adapters per tenant or task.
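The LoRA bullet above can be sketched in a few lines of NumPy. This is a minimal illustration, not the notebook's actual code; the layer sizes, rank, and alpha below are made-up values chosen only to show the shapes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: rank r is much smaller than the layer dimensions.
d_out, d_in, r, alpha = 16, 32, 4, 8

W = rng.normal(size=(d_out, d_in))       # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                 # trainable up-projection, zero-initialized
                                         # so the adapter starts as a no-op

def lora_forward(x):
    # y = W x + (alpha / r) * B A x  -- only A and B receive gradients
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
y = lora_forward(x)

# With B zero-initialized, the adapted layer matches the frozen base exactly.
assert np.allclose(y, W @ x)
```

Note the parameter savings: W has d_out * d_in entries, while the adapter adds only r * (d_in + d_out), which shrinks rapidly as the layer dimensions grow.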
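Adapter merging from the last two bullets can be checked numerically: folding B @ A * (alpha / r) into the base weight gives a single matrix with identical outputs and zero inference overhead. A minimal sketch with randomly filled stand-in "trained" factors (sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, r, alpha = 16, 32, 4, 8     # illustrative sizes

W = rng.normal(size=(d_out, d_in))       # frozen base weight
A = rng.normal(size=(r, d_in))           # stand-ins for trained adapter factors
B = rng.normal(size=(d_out, r))

def adapted_forward(x):
    # Un-merged path: base plus scaled low-rank update.
    return W @ x + (alpha / r) * (B @ (A @ x))

# Merge: fold the adapter into the base weight once, ahead of serving.
W_merged = W + (alpha / r) * (B @ A)

x = rng.normal(size=(d_in,))
assert np.allclose(adapted_forward(x), W_merged @ x)
```

Multi-adapter serving takes the opposite choice: keep W un-merged, and swap in a different small (A, B) pair per tenant or task at request time.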

Run the full notebook for code and outputs.


Generated by Berta AI