llama-3.2-3b-sql-qlora
QLoRA fine-tune of Llama-3.2-3B on 19K+ SQL samples with end-to-end training, evaluation, vLLM inference server, and automated HuggingFace Hub deployment pipeline.
Core Impact
“94.6% perplexity drop (35.1 → 1.88) and ROUGE-L 0.909 → 0.986 on SQL generation by fine-tuning only 0.67% of parameters via QLoRA.”

Architecture Breakdown
Fine-tuned Llama-3.2-3B on 19,000+ SQL samples via QLoRA (4-bit NF4), achieving a 94.6% perplexity drop (35.1 → 1.88) while training only 0.67% of model weights (~20M of 3B params).
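The QLoRA setup above can be sketched with `transformers` and `peft`: the base weights are frozen in 4-bit NF4, and only small low-rank adapters train. The rank, alpha, and target modules below are illustrative assumptions, not the repo's actual hyperparameters.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization: base weights are frozen and stored in 4 bits.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B", quantization_config=bnb_config
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters: only these low-rank matrices are trainable, which is how
# the trainable fraction stays under 1% of the 3B parameters.
lora_config = LoraConfig(
    r=16,                       # assumed rank
    lora_alpha=32,              # assumed scaling
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports the trainable-parameter fraction
```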
ROUGE-L improved from 0.909 to 0.986 on a 200-sample held-out test set, indicating near-exact SQL query generation versus ground truth.
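ROUGE-L is the F-measure over the longest common subsequence (LCS) of the generated and reference token sequences. A minimal pure-Python version of the metric (the actual harness may tokenize differently):

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(prediction: str, reference: str) -> float:
    """ROUGE-L F1 over whitespace tokens (tokenization is an assumption)."""
    pred, ref = prediction.split(), reference.split()
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

print(rouge_l("SELECT id FROM users", "SELECT name FROM users"))  # 0.75
```

A score of 0.986 on SQL means predictions differ from references by only a few tokens on average.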
Designed a reproducible eval harness with a locked data split (seed=42), ensuring identical train/eval sets across all runs; before/after results are stored as versioned JSON artifacts.
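The locked split reduces to a seeded shuffle, so the exact train/eval indices can be regenerated from the seed alone. A sketch of the idea (function names and artifact layout are hypothetical):

```python
import json
import random

def locked_split(n_samples: int, test_size: int = 200, seed: int = 42):
    """Deterministic train/eval split: same seed yields the same indices every run."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded RNG, independent of global state
    return indices[test_size:], indices[:test_size]

train_idx, test_idx = locked_split(19000)

# Persist results next to the split description as a versioned JSON artifact
# (metric values here are the ones quoted above, for illustration).
artifact = json.dumps({
    "seed": 42,
    "test_size": len(test_idx),
    "metrics_after": {"perplexity": 1.88, "rouge_l": 0.986},
})
```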
Built a full inference stack: a CLI tool with an interactive REPL, a base-model comparison mode, and remote adapter loading from the HuggingFace Hub.
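The interactive REPL boils down to a read-generate-print loop around a prompt template. This sketch stubs out the generate step; the template and `generate_sql` stub are assumptions, not the repo's actual CLI:

```python
PROMPT_TEMPLATE = (
    "Translate the question into SQL.\n"
    "Question: {question}\n"
    "SQL:"
)

def build_prompt(question: str) -> str:
    """Format a user question with the instruction template."""
    return PROMPT_TEMPLATE.format(question=question.strip())

def generate_sql(question: str) -> str:
    """Stub standing in for adapter-backed model.generate()."""
    return "-- model output for: " + build_prompt(question).splitlines()[1]

def repl() -> None:
    """Minimal interactive loop; exits cleanly on EOF or 'quit'."""
    while True:
        try:
            question = input("sql> ")
        except EOFError:
            break
        if question.strip() in {"quit", "exit"}:
            break
        print(generate_sql(question))

if __name__ == "__main__":
    repl()
```

A base-model comparison mode would call the same loop twice, once with the adapter attached and once without.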
Deployed a production inference server on Modal using vLLM for optimized throughput; built a Next.js frontend for live query-generation demos.
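A vLLM server like this is typically exposed behind an OpenAI-compatible endpoint; this sketch only builds the request, since posting it needs the live deployment. The URL, model name, and prompt format are assumptions:

```python
import json

# Hypothetical Modal endpoint for the deployed vLLM server.
SERVER_URL = "https://example--sql-server.modal.run/v1/chat/completions"

def build_request(question: str, model: str = "llama-3.2-3b-sql-qlora"):
    """Return (url, JSON body) for an OpenAI-compatible chat completion call."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You translate questions into SQL."},
            {"role": "user", "content": question},
        ],
        "temperature": 0.0,   # deterministic decoding suits SQL generation
        "max_tokens": 256,
    }
    return SERVER_URL, json.dumps(payload)

url, body = build_request("How many users signed up last week?")
# POSTing `body` to `url` (e.g. via urllib.request) returns the generated SQL;
# the Next.js frontend would issue the same request from the browser.
```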
Automated the Hub publishing pipeline: it injects real eval metrics into the model card README before upload, ensuring published numbers always match actual results.
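Metric injection can be a simple template render over the versioned eval artifact, performed just before the `huggingface_hub` upload. The placeholders, template, and artifact schema here are assumptions:

```python
import json

CARD_TEMPLATE = """\
# llama-3.2-3b-sql-qlora

| Metric | Base | Fine-tuned |
|---|---|---|
| Perplexity | {ppl_base} | {ppl_ft} |
| ROUGE-L | {rouge_base} | {rouge_ft} |
"""

def render_model_card(metrics_json: str) -> str:
    """Fill the model-card template from the stored eval artifact."""
    m = json.loads(metrics_json)
    return CARD_TEMPLATE.format(
        ppl_base=m["base"]["perplexity"], ppl_ft=m["finetuned"]["perplexity"],
        rouge_base=m["base"]["rouge_l"], rouge_ft=m["finetuned"]["rouge_l"],
    )

# Example artifact using the numbers quoted above.
artifact = json.dumps({
    "base": {"perplexity": 35.1, "rouge_l": 0.909},
    "finetuned": {"perplexity": 1.88, "rouge_l": 0.986},
})
card = render_model_card(artifact)
```

Because the card is rendered from the same JSON the eval harness wrote, the published README cannot drift from the measured results.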