llama-3.2-3b-sql-qlora
QLoRA fine-tune of Llama-3.2-3B on 19K+ SQL samples with end-to-end training, evaluation, vLLM inference server, and automated HuggingFace Hub deployment pipeline.
Core Impact
“94.6% perplexity drop (35.1 → 1.88) and ROUGE-L 0.909 → 0.986 on SQL generation by fine-tuning only 0.67% of parameters via QLoRA.”

Architecture Breakdown
Fine-tuned Llama-3.2-3B on 19,000+ SQL samples via QLoRA (4-bit NF4), achieving a 94.6% perplexity drop (35.1 → 1.88) while training only 0.67% of model weights (~20M of 3B params).
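The QLoRA setup above can be sketched with `transformers` and `peft`: the base weights are frozen in 4-bit NF4, and only small low-rank adapters train. The rank, alpha, and target modules below are illustrative assumptions, not the repo's actual hyperparameters.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization: base weights are frozen and stored in 4 bits.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B", quantization_config=bnb_config
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters: only these low-rank matrices are trainable, which is how
# the trainable fraction stays under 1% of the 3B parameters.
lora_config = LoraConfig(
    r=16,                       # assumed rank
    lora_alpha=32,              # assumed scaling
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports the trainable-parameter fraction
```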
ROUGE-L improved from 0.909 to 0.986 on a 200-sample held-out test set, indicating near-exact SQL query generation versus ground truth.
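ROUGE-L is the F-measure over the longest common subsequence (LCS) of the generated and reference token sequences. A minimal pure-Python version of the metric (the actual harness may tokenize differently):

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(prediction: str, reference: str) -> float:
    """ROUGE-L F1 over whitespace tokens (tokenization is an assumption)."""
    pred, ref = prediction.split(), reference.split()
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

print(rouge_l("SELECT id FROM users", "SELECT name FROM users"))  # 0.75
```

A score of 0.986 on SQL means predictions differ from references by only a few tokens on average.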
Designed a reproducible eval harness with a locked data split (seed=42), ensuring identical train/eval sets across all runs; before/after results are stored as versioned JSON artifacts.
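The locked split reduces to a seeded shuffle, so the exact train/eval indices can be regenerated from the seed alone. A sketch of the idea (function names and artifact layout are hypothetical):

```python
import json
import random

def locked_split(n_samples: int, test_size: int = 200, seed: int = 42):
    """Deterministic train/eval split: same seed yields the same indices every run."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded RNG, independent of global state
    return indices[test_size:], indices[:test_size]

train_idx, test_idx = locked_split(19000)

# Persist results next to the split description as a versioned JSON artifact
# (metric values here are the ones quoted above, for illustration).
artifact = json.dumps({
    "seed": 42,
    "test_size": len(test_idx),
    "metrics_after": {"perplexity": 1.88, "rouge_l": 0.986},
})
```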
Built a full inference stack: a CLI tool with an interactive REPL, a base-model comparison mode, and remote adapter loading from the HuggingFace Hub.
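The interactive REPL boils down to a read-generate-print loop around a prompt template. This sketch stubs out the generate step; the template and `generate_sql` stub are assumptions, not the repo's actual CLI:

```python
PROMPT_TEMPLATE = (
    "Translate the question into SQL.\n"
    "Question: {question}\n"
    "SQL:"
)

def build_prompt(question: str) -> str:
    """Format a user question with the instruction template."""
    return PROMPT_TEMPLATE.format(question=question.strip())

def generate_sql(question: str) -> str:
    """Stub standing in for adapter-backed model.generate()."""
    return "-- model output for: " + build_prompt(question).splitlines()[1]

def repl() -> None:
    """Minimal interactive loop; exits cleanly on EOF or 'quit'."""
    while True:
        try:
            question = input("sql> ")
        except EOFError:
            break
        if question.strip() in {"quit", "exit"}:
            break
        print(generate_sql(question))

if __name__ == "__main__":
    repl()
```

A base-model comparison mode would call the same loop twice, once with the adapter attached and once without.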
Deployed a production inference server on Modal using vLLM for optimized throughput; built a Next.js frontend for live query-generation demos.
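A vLLM server like this is typically exposed behind an OpenAI-compatible endpoint; this sketch only builds the request, since posting it needs the live deployment. The URL, model name, and prompt format are assumptions:

```python
import json

# Hypothetical Modal endpoint for the deployed vLLM server.
SERVER_URL = "https://example--sql-server.modal.run/v1/chat/completions"

def build_request(question: str, model: str = "llama-3.2-3b-sql-qlora"):
    """Return (url, JSON body) for an OpenAI-compatible chat completion call."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You translate questions into SQL."},
            {"role": "user", "content": question},
        ],
        "temperature": 0.0,   # deterministic decoding suits SQL generation
        "max_tokens": 256,
    }
    return SERVER_URL, json.dumps(payload)

url, body = build_request("How many users signed up last week?")
# POSTing `body` to `url` (e.g. via urllib.request) returns the generated SQL;
# the Next.js frontend would issue the same request from the browser.
```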
Automated the Hub publishing pipeline: it injects real eval metrics into the model card README before upload, ensuring published numbers always match actual results.
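Metric injection can be a simple template render over the versioned eval artifact, performed just before the `huggingface_hub` upload. The placeholders, template, and artifact schema here are assumptions:

```python
import json

CARD_TEMPLATE = """\
# llama-3.2-3b-sql-qlora

| Metric | Base | Fine-tuned |
|---|---|---|
| Perplexity | {ppl_base} | {ppl_ft} |
| ROUGE-L | {rouge_base} | {rouge_ft} |
"""

def render_model_card(metrics_json: str) -> str:
    """Fill the model-card template from the stored eval artifact."""
    m = json.loads(metrics_json)
    return CARD_TEMPLATE.format(
        ppl_base=m["base"]["perplexity"], ppl_ft=m["finetuned"]["perplexity"],
        rouge_base=m["base"]["rouge_l"], rouge_ft=m["finetuned"]["rouge_l"],
    )

# Example artifact using the numbers quoted above.
artifact = json.dumps({
    "base": {"perplexity": 35.1, "rouge_l": 0.909},
    "finetuned": {"perplexity": 1.88, "rouge_l": 0.986},
})
card = render_model_card(artifact)
```

Because the card is rendered from the same JSON the eval harness wrote, the published README cannot drift from the measured results.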