llama-3.2-3b-sql-qlora
QLoRA fine-tune of Llama-3.2-3B on 19K+ SQL samples with end-to-end training, evaluation, vLLM inference server, and automated HuggingFace Hub deployment pipeline.
Core Impact
“94.6% perplexity drop (35.1 → 1.88) and ROUGE-L 0.909 → 0.986 on SQL generation by fine-tuning only 0.67% of parameters via QLoRA.”

Architecture Breakdown
Fine-tuned Llama-3.2-3B on 19,000+ SQL samples via QLoRA (4-bit NF4 + double quantization, LoRA r=16 α=32 across all 7 attention/MLP projections) — 94.6% perplexity drop (35.1→1.88) while training only 0.67% of model weights (~20M of 3B params).
ROUGE-L improved 0.909→0.986 on 200-sample held-out test, indicating near-exact SQL query generation vs. ground truth.
Built full inference stack: CLI tool with interactive REPL, base model comparison mode, and remote adapter loading from HuggingFace Hub.
Deployed production inference server on Modal using vLLM for optimized throughput; built Next.js frontend for live query generation demos.
Automated Hub publishing pipeline — injects real eval metrics into model card README before upload, ensuring published numbers always match actual results.
Systems Analysis Concluded