QLoRA, LoRA (PEFT), Llama-3.2, BitsAndBytes, SFTTrainer, vLLM, Modal, PyTorch, Next.js, Hugging Face, HuggingFace Hub

llama-3.2-3b-sql-qlora

A QLoRA fine-tune of Llama-3.2-3B on 19K+ SQL samples, with end-to-end training and evaluation, a vLLM inference server, and an automated HuggingFace Hub deployment pipeline.

Core Impact

94.6% perplexity drop (35.1 → 1.88) and ROUGE-L 0.909 → 0.986 on SQL generation by fine-tuning only 0.67% of parameters via QLoRA.
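
For reference, perplexity is the exponential of the mean per-token negative log-likelihood, and the headline figure is the relative change between the base and fine-tuned values. The arithmetic can be sketched as follows; the function names are illustrative and are not taken from the project's code.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_drop(before, after):
    """Fractional reduction from `before` to `after`."""
    return (before - after) / before


# Reported values: base model 35.1, fine-tuned 1.88.
drop = relative_drop(35.1, 1.88)
print(f"{drop:.1%}")  # 94.6%
```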

Architecture Breakdown

01

Fine-tuned Llama-3.2-3B on 19,000+ SQL samples via QLoRA: 4-bit NF4 with double quantization, and LoRA (r=16, α=32) applied to all 7 attention/MLP projections. Perplexity dropped 94.6% (35.1 → 1.88) while training only 0.67% of model weights (~20M of 3B params).
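
A configuration along these lines would express the setup described above using the `transformers` and `peft` libraries. This is a sketch rather than the project's actual config: the compute dtype and dropout value are assumptions not stated in the write-up, and the module names are the standard projection names in Llama blocks.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: not stated in the write-up
)

# LoRA r=16, alpha=32 on the seven attention/MLP projections of each block.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    lora_dropout=0.05,  # assumption: not stated in the write-up
    bias="none",
    task_type="CAUSAL_LM",
)
```

With r=16 and α=32, the adapter update is scaled by α/r = 2 before being added to the frozen 4-bit weights.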

02

ROUGE-L improved 0.909 → 0.986 on a 200-sample held-out test set, indicating generated SQL that closely matches the ground-truth queries.
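
ROUGE-L scores a prediction by its longest common subsequence (LCS) with the reference. The write-up does not say which ROUGE implementation was used, so the following is a simplified sketch that assumes whitespace tokenization, case-sensitive matching, and the balanced F1 variant.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction, reference):
    """ROUGE-L F1 over whitespace tokens (simplified: no stemming)."""
    pred, ref = prediction.split(), reference.split()
    if not pred or not ref:
        return 0.0
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "SELECT name FROM users WHERE age > 30"
print(rouge_l_f1(reference, reference))                 # 1.0
print(round(rouge_l_f1("SELECT name FROM users", reference), 3))  # 0.667
```

Because the score rewards shared token order rather than exact equality, a query that omits a clause still scores well above zero, which is worth keeping in mind when reading values close to 1.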

03

Built a full inference stack: a CLI tool with an interactive REPL, a base-model comparison mode, and remote adapter loading from HuggingFace Hub.
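
The shape of such a CLI might look like the sketch below. Every flag name and function here is hypothetical, and model loading is left to a caller-supplied `generate` callable; in a real tool the adapter would typically be attached with PEFT's `PeftModel.from_pretrained`, which accepts either a local path or a Hub repo id.

```python
import argparse


def build_parser():
    """Argument parser; every flag name here is hypothetical."""
    parser = argparse.ArgumentParser(description="Generate SQL from a question.")
    parser.add_argument("--adapter", default=None,
                        help="local path or Hub repo id of the LoRA adapter")
    parser.add_argument("--compare-base", action="store_true",
                        help="also show the base model's output")
    parser.add_argument("--question", default=None,
                        help="single question; omit to start the REPL")
    return parser


def answer(question, generate, compare_base=False):
    """Return labelled outputs from `generate(question, use_adapter)`."""
    outputs = {"fine-tuned": generate(question, True)}
    if compare_base:
        outputs["base"] = generate(question, False)
    return outputs


def repl(generate, compare_base=False, read=input, write=print):
    """Read questions until EOF or an empty line."""
    while True:
        try:
            question = read("sql> ").strip()
        except EOFError:
            break
        if not question:
            break
        for label, sql in answer(question, generate, compare_base).items():
            write(f"[{label}] {sql}")
```

Passing `read` and `write` as parameters keeps the loop testable without a terminal.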

04

Deployed a production inference server on Modal using vLLM for optimized throughput; built a Next.js frontend for live query generation demos.
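
As an illustration of how a client could talk to such a server, the sketch below builds a request for vLLM's OpenAI-compatible `/v1/completions` route. It assumes the deployment exposes that route, which the write-up does not confirm; the base URL, model name, and decoding settings are placeholders.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=256):
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # assumption: greedy decoding for SQL output
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_completion(body):
    """Extract the generated text from a completions response body."""
    return json.loads(body)["choices"][0]["text"].strip()
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` and handing the response body to `parse_completion`.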

05

Automated the Hub publishing pipeline: it injects the measured eval metrics into the model card README before upload, so the published numbers always match actual results.
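
The injection step can be sketched as a template filled from the evaluation output. The template wording and the metrics dictionary layout below are hypothetical; only the four metric values come from this page.

```python
from string import Template

# Hypothetical template; the real model card's wording is not shown here.
CARD_TEMPLATE = Template(
    "# llama-3.2-3b-sql-qlora\n\n"
    "| Metric | Base | Fine-tuned |\n"
    "|---|---|---|\n"
    "| Perplexity | $base_ppl | $ft_ppl |\n"
    "| ROUGE-L | $base_rouge | $ft_rouge |\n"
)


def render_model_card(metrics):
    """Fill the card from measured metrics; a missing key raises KeyError."""
    return CARD_TEMPLATE.substitute(
        base_ppl=f"{metrics['base']['perplexity']:.2f}",
        ft_ppl=f"{metrics['fine_tuned']['perplexity']:.2f}",
        base_rouge=f"{metrics['base']['rouge_l']:.3f}",
        ft_rouge=f"{metrics['fine_tuned']['rouge_l']:.3f}",
    )


metrics = {
    "base": {"perplexity": 35.1, "rouge_l": 0.909},
    "fine_tuned": {"perplexity": 1.88, "rouge_l": 0.986},
}
print(render_model_card(metrics))
```

Failing loudly on a missing metric, rather than publishing a blank cell, is what keeps the card tied to a real evaluation run. The rendered text would then be written to `README.md` in the upload folder before pushing, for example with `huggingface_hub.HfApi.upload_folder`.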

Systems Analysis Concluded

© 2026 Marian Glen Louis

Engineered with Next.js, Tailwind v4 & Framer Motion
