QLoRA · LoRA (PEFT) · Llama-3.2 · BitsAndBytes · SFTTrainer · vLLM · Modal · PyTorch · Next.js · Hugging Face · HuggingFace Hub

llama-3.2-3b-sql-qlora

QLoRA fine-tune of Llama-3.2-3B on 19K+ SQL samples with end-to-end training, evaluation, vLLM inference server, and automated HuggingFace Hub deployment pipeline.

Core Impact

94.6% perplexity drop (35.1 → 1.88) and ROUGE-L 0.909 → 0.986 on SQL generation by fine-tuning only 0.67% of parameters via QLoRA.
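For context, perplexity is the exponential of the mean per-token cross-entropy loss on the eval set, so the reported drop corresponds to the loss falling from roughly 3.56 to roughly 0.63 nats per token. A minimal sketch of that relationship (the loss values are back-calculations for illustration, not numbers from the project):

```python
import math

def perplexity(mean_nll: float) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(mean_nll)

# math.log(35.1) ~= 3.56 nats/token for the base model,
# math.log(1.88) ~= 0.63 nats/token after fine-tuning.
```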


Architecture Breakdown

01

Fine-tuned Llama-3.2-3B on 19,000+ SQL samples via QLoRA (4-bit NF4), achieving a 94.6% perplexity drop (35.1 → 1.88) while training only 0.67% of model weights (~20M of 3B params).
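A QLoRA setup of this shape typically pairs a 4-bit NF4 quantization config with LoRA adapters on the attention projections. A hedged sketch follows; the source only states 4-bit NF4 and ~0.67% trainable parameters, so the rank, alpha, dropout, and target modules below are assumptions:

```python
# Hypothetical QLoRA setup; r, lora_alpha, dropout, and target_modules are
# assumed hyperparameters, not taken from the project.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NF4, as stated above
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B", quantization_config=bnb
)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,   # assumed values
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # on the order of ~20M of 3B params
```

The frozen base stays in 4-bit while only the small adapter matrices train in full precision, which is what keeps the trainable fraction under 1%.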

02

ROUGE-L improved from 0.909 to 0.986 on a 200-sample held-out test set, indicating near-exact SQL query generation relative to ground truth.
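ROUGE-L scores the longest common subsequence (LCS) of tokens shared between a generated query and the reference. A minimal, dependency-free sketch of the F-measure such an eval computes (the project may well use a library implementation instead):

```python
def lcs_len(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence via classic DP."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def rouge_l(reference: str, candidate: str) -> float:
    """ROUGE-L F1 over whitespace tokens."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_len(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

An exact match scores 1.0, which is why a mean of 0.986 implies most generated queries match the reference almost token for token.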

03

Designed a reproducible eval harness with a locked data split (seed=42), ensuring identical train/eval sets across all runs; stored before/after results as versioned JSON artifacts.
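A locked split of this kind boils down to shuffling indices with a fixed-seed RNG so every run carves off the same held-out set. A minimal sketch under that assumption (the 200-sample eval size matches item 02; the JSON helper is illustrative):

```python
import json
import random

def locked_split(samples: list, eval_size: int = 200, seed: int = 42):
    """Deterministic train/eval split: same seed -> identical sets every run."""
    rng = random.Random(seed)           # isolated RNG, unaffected by global state
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    eval_idx = set(idx[:eval_size])
    train = [s for i, s in enumerate(samples) if i not in eval_idx]
    evals = [s for i, s in enumerate(samples) if i in eval_idx]
    return train, evals

def save_metrics(path: str, metrics: dict) -> None:
    """Persist before/after eval results as a versioned JSON artifact."""
    with open(path, "w") as f:
        json.dump(metrics, f, indent=2)
```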

04

Built full inference stack: CLI tool with interactive REPL, base model comparison mode, and remote adapter loading from HuggingFace Hub.
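The interactive REPL portion of such a CLI can be sketched as a small command dispatcher. The `:q`/`:base` commands and the `generate` callback below are hypothetical illustrations, not the project's actual interface:

```python
def handle(line: str, state: dict, generate) -> tuple[str, bool]:
    """Process one REPL line; returns (response, should_exit).

    `generate` stands in for the model call (hypothetical signature).
    """
    line = line.strip()
    if line in {":q", ":quit"}:          # hypothetical exit command
        return "bye", True
    if line == ":base":                  # hypothetical base-model comparison toggle
        state["compare_base"] = not state.get("compare_base", False)
        return f"base comparison: {state['compare_base']}", False
    # anything else is a natural-language request routed to the model
    return generate(line, compare_base=state.get("compare_base", False)), False
```

Keeping the loop's logic in a pure function like this makes the REPL testable without loading the model, with adapter weights fetched from the Hub only inside the real `generate`.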

05

Deployed production inference server on Modal using vLLM for optimized throughput; built Next.js frontend for live query generation demos.
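A Modal-plus-vLLM deployment of this shape might look like the following sketch; the app name, GPU type, and Hub repo id are placeholders, not taken from the source:

```python
# deploy.py -- hypothetical layout for a Modal + vLLM inference service.
import modal

app = modal.App("sql-qlora-vllm")                      # placeholder app name
image = modal.Image.debian_slim().pip_install("vllm")

@app.cls(image=image, gpu="A10G")                      # assumed GPU type
class Server:
    @modal.enter()
    def load(self):
        # load merged fine-tuned weights once per container cold start
        from vllm import LLM, SamplingParams
        self.llm = LLM(model="username/llama-3.2-3b-sql-qlora")  # placeholder repo id
        self.params = SamplingParams(temperature=0.0, max_tokens=256)

    @modal.method()
    def generate(self, prompt: str) -> str:
        out = self.llm.generate([prompt], self.params)
        return out[0].outputs[0].text
```

Loading the model in an `@modal.enter()` hook amortizes the cold-start cost across requests, while vLLM's batched paged-attention serving provides the throughput for live demos from the Next.js frontend.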

06

Automated the Hub publishing pipeline to inject real eval metrics into the model card README before upload, ensuring published numbers always match actual results.
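The metric-injection step amounts to rendering the model card from the versioned eval JSON rather than hand-editing numbers. A minimal sketch; the template layout and metric key names are illustrative, not the project's actual card:

```python
# Sketch of injecting eval metrics into a model card before Hub upload.
# Template and metric keys are hypothetical.
CARD_TEMPLATE = """\
# llama-3.2-3b-sql-qlora

| Metric     | Base         | Fine-tuned |
|------------|--------------|------------|
| Perplexity | {ppl_base}   | {ppl_ft}   |
| ROUGE-L    | {rouge_base} | {rouge_ft} |
"""

def render_card(metrics: dict) -> str:
    """Fill the README template from the eval-metrics artifact."""
    return CARD_TEMPLATE.format(**metrics)
```

Because the card is generated from the same JSON artifact the eval harness writes, the published table cannot silently drift from the measured results.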


© 2026 Marian Glen Louis
