QLoRA · LoRA (PEFT) · Llama-3.2 · BitsAndBytes · SFTTrainer · PyTorch · Hugging Face · HuggingFace Hub

llama-3.2-3b-alpaca-qlora

QLoRA fine-tune of Llama-3.2-3B-Instruct on 52K instruction examples with an end-to-end training, evaluation, and HuggingFace Hub deployment pipeline.

Core Impact

"Reduced perplexity by 81.3% (25.84→4.82) and improved ROUGE-L by 36.3% by fine-tuning only 0.67% of parameters via QLoRA on a single 24GB GPU."


Architecture Breakdown

01

Fine-tuned Llama-3.2-3B-Instruct on 52K Alpaca instruction examples via QLoRA (4-bit NF4 + double quantization, LoRA r=16 α=32 across all 7 attention/MLP projections), reducing perplexity 81.3% (25.84→4.82) and improving ROUGE-L +36.3% (0.259→0.353) on a 2,601-example held-out split.
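The quantization and adapter settings above can be sketched as a configuration fragment. This is a minimal illustration assuming the Hugging Face `transformers` and `peft` APIs; the `lora_dropout` value is an assumption, not taken from the project.

```python
# Sketch of the QLoRA setup described above (assumes transformers, peft, bitsandbytes).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NF4 4-bit quantization
    bnb_4bit_use_double_quant=True,      # nested (double) quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct",
    quantization_config=bnb_config,
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,  # assumed value, for illustration only
    target_modules=[    # all 7 attention/MLP projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports the small trainable fraction (~0.67%)
```

Targeting all seven projection matrices (rather than just `q_proj`/`v_proj`) is what the QLoRA paper found necessary to match full fine-tuning quality.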

02

Reduced GPU memory ~40% by combining 4-bit NF4 double quantization (~0.4 bits/param savings via nested quant), gradient checkpointing, and paged AdamW-8bit — fine-tuning only 20M of 3B parameters (0.67%) on a single 24GB RTX 4090 in ~2.5 hours.
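The memory-side levers listed above map onto a handful of trainer settings. A hedged sketch, assuming TRL's `SFTConfig`; batch size and accumulation steps are placeholders, not the project's actual values.

```python
# Training arguments combining gradient checkpointing with a paged 8-bit
# optimizer (assumes trl; batch sizes are illustrative, not from the repo).
from trl import SFTConfig

args = SFTConfig(
    output_dir="llama-3.2-3b-alpaca-qlora",
    gradient_checkpointing=True,       # recompute activations to cut memory
    optim="paged_adamw_8bit",          # paged AdamW with 8-bit optimizer states
    per_device_train_batch_size=4,     # assumed; tune to fit 24GB VRAM
    gradient_accumulation_steps=4,     # assumed; preserves effective batch size
    bf16=True,
)
```

Gradient checkpointing trades extra forward-pass compute for activation memory, while the paged optimizer spills optimizer state to CPU RAM under pressure instead of OOM-ing.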

03

Implemented correct SFT loss masking by applying the model's native chat template via apply_chat_template and routing through SFTTrainer's formatting_func, ensuring causal LM loss is computed only over assistant tokens — a common pitfall in instruction tuning pipelines.
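The masking behavior described above can be illustrated in isolation. The helper below is hypothetical (not part of the repo); it shows the label layout that assistant-only loss produces, where prompt positions carry the ignore index that PyTorch's cross-entropy skips.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by cross-entropy loss

def mask_prompt_tokens(input_ids, assistant_start):
    """Build causal-LM labels from input_ids, masking every token before
    the assistant's response so loss covers only assistant tokens.
    (Toy illustration of the masking SFTTrainer applies internally.)"""
    return [IGNORE_INDEX] * assistant_start + input_ids[assistant_start:]

# Toy example: positions 0-4 are the templated system/user prompt,
# positions 5-7 are the assistant response.
labels = mask_prompt_tokens([10, 11, 12, 13, 14, 20, 21, 22], assistant_start=5)
# labels == [-100, -100, -100, -100, -100, 20, 21, 22]
```

Without this mask, the model also learns to regenerate the prompt, which dilutes the gradient signal and is the pitfall the bullet above refers to.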

04

Built an end-to-end MLOps pipeline with deterministic before/after evaluation (shared seed=42 split across train/eval scripts, perplexity via teacher-forcing + ROUGE-L via greedy decoding), versioned JSON result artifacts, and automated HuggingFace Hub deployment with real eval numbers injected into the model card.
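The perplexity half of the evaluation reduces to a one-line formula: exponentiate the mean per-token negative log-likelihood collected under teacher forcing. A minimal sketch of that reduction (the NLL collection itself is elided):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood),
    where each NLL comes from teacher-forced evaluation on the held-out split."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Sanity check: a uniform per-token NLL of ln(4.82) yields the reported
# post-finetune perplexity of 4.82.
ppl = perplexity([math.log(4.82)] * 100)
```

Fixing the split seed (seed=42) across the train and eval scripts is what makes the before/after numbers comparable: both models are scored on exactly the same 2,601 held-out examples.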


© 2026 Marian Glen Louis

Engineered with Next.js, Tailwind v4 & Framer Motion