QLoRA · DPO (TRL DPOTrainer) · Qwen2.5-7B-Instruct · SFTTrainer · LoRA (PEFT) · BitsAndBytes · PyTorch · Weights & Biases · HuggingFace Hub · AWS SageMaker · Python

FinReason: Financial QA LLM (SFT + DPO)

Fine-tuned Qwen2.5-7B-Instruct on FinQA (SEC filings) using QLoRA SFT followed by DPO alignment, targeting multi-step numerical reasoning over financial tables. Model published to HuggingFace Hub with automated eval metrics injected into the model card.

Core Impact

Accuracy rose from 0.3% to 56.5% and perplexity fell from 6.46 to 1.71 on FinQA SEC earnings via QLoRA SFT + DPO, training only 0.67% of Qwen2.5-7B parameters.

Architecture Breakdown

01

Fine-tuned Qwen2.5-7B-Instruct on 8K FinQA SEC earnings samples via QLoRA (NF4 4-bit, double quant, LoRA r=16 α=32 across all 7 projections) — accuracy jumped 0.3% → 56.5% (+56.2pp) while training only 0.67% of parameters.
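To make the adapter size concrete, here is a rough count of the LoRA parameters added at r=16. This is a sketch under assumptions not stated above: the layer dimensions are the ones recalled from the public Qwen2.5-7B config (verify against `config.json`), and "all 7 projections" is assumed to mean `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj` and `down_proj`.

```python
# Assumed Qwen2.5-7B dimensions; verify against the model's config.json.
HIDDEN = 3584
INTERMEDIATE = 18944
NUM_LAYERS = 28
NUM_HEADS = 28
NUM_KV_HEADS = 4
HEAD_DIM = HIDDEN // NUM_HEADS      # 128
KV_DIM = NUM_KV_HEADS * HEAD_DIM    # 512 (grouped-query attention)

RANK = 16  # LoRA r from the text

# (input features, output features) of each targeted linear layer.
# The module names are an assumption about what "all 7 projections" covers.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) to each wrapped layer."""
    return rank * (d_in + d_out)


per_layer = sum(lora_param_count(i, o, RANK) for i, o in PROJECTIONS.values())
total_adapter_params = per_layer * NUM_LAYERS

# 40,370,176 under the assumed dimensions above.
print(f"{total_adapter_params:,} trainable adapter parameters")
```

The trainable percentage depends on how the denominator is counted (for example, whether embeddings are included and how 4-bit packed weights are tallied), so this sketch does not attempt to reproduce the 0.67% figure exactly.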

02

Perplexity dropped 6.46 → 1.71 (base → SFT) on FinQA test set (313 examples), indicating strong adaptation to multi-step arithmetic reasoning over financial tables.
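For reference, perplexity is the exponential of the mean per-token negative log-likelihood. The sketch below shows that definition on plain Python lists; the function name and the toy input are illustrative, and the project's actual evaluation code is not shown here.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from natural-log probabilities of the target tokens.

    PPL = exp(-(1/N) * sum(log p(token_i | context_i)))
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy check: a model that assigns probability 1/4 to every target token
# is as uncertain as a uniform choice among 4 options.
uniform_log_probs = [math.log(0.25)] * 8
print(perplexity(uniform_log_probs))
```

One design choice to be aware of: pooling log-probabilities over all tokens in the test set and averaging per-example perplexities give different numbers, so the two should not be compared directly.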

03

Built a DPO alignment stage using TRL DPOTrainer with synthetic preference pairs on top of the SFT checkpoint. Win rate parity after DPO suggests that SFT had already internalized the correct reasoning patterns.
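The objective that DPO optimizes for a single preference pair can be written out in a few lines. This is a minimal sketch of the standard sigmoid DPO loss, not the TRL implementation; the argument names are illustrative, and `beta=0.1` is an assumed value since the setting used in this project is not stated.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed; the project's beta is not given
) -> float:
    """Sigmoid DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model.
    loss = -log(sigmoid(beta * (chosen_log_ratio - rejected_log_ratio)))
    """
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_log_ratio - rejected_log_ratio)
    # -log(sigmoid(z)) == softplus(-z), computed in a numerically stable way.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# When the policy still equals the reference, the margin is zero and the
# loss equals log(2), the starting point of DPO training.
start = dpo_loss(-12.0, -15.0, -12.0, -15.0)

# Raising the chosen response's likelihood relative to the reference
# lowers the loss.
improved = dpo_loss(-10.0, -15.0, -12.0, -15.0)
print(start, improved)
```

If the SFT checkpoint already ranks chosen above rejected responses on most pairs, there is little margin left for this loss to push on, which is one way to read a win rate that does not move after DPO.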

04

Automated HuggingFace Hub publishing pipeline — injects real eval metrics (accuracy, perplexity) into model card README before upload, ensuring published numbers always match actual results.
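One way such metric injection could work is to render the model card from a template immediately before upload, so the numbers come from the evaluation output rather than being typed by hand. The template text, placeholder names and function below are hypothetical and only illustrate the idea.

```python
from string import Template

# Hypothetical README template; the real model card layout is not shown
# in this write-up.
CARD_TEMPLATE = Template(
    "# FinReason\n"
    "\n"
    "| Metric | Value |\n"
    "| --- | --- |\n"
    "| Accuracy | ${accuracy}% |\n"
    "| Perplexity | ${perplexity} |\n"
)


def render_model_card(accuracy_pct: float, perplexity: float) -> str:
    """Fill the template with measured metrics.

    Template.substitute raises KeyError if a placeholder is left unfilled,
    so a card with a missing number cannot be produced silently.
    """
    return CARD_TEMPLATE.substitute(
        accuracy=f"{accuracy_pct:.1f}",
        perplexity=f"{perplexity:.2f}",
    )


# Values taken from the results reported above.
card = render_model_card(accuracy_pct=56.5, perplexity=1.71)
print(card)
```

The rendered string would then be written to `README.md` and pushed with the rest of the repository, for example through the `huggingface_hub` client.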

05

Implemented AWS SageMaker training job scripts for both SFT and DPO stages plus endpoint deployment to `ml.g5.2xlarge` — full production inference path beyond Colab.
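Once an endpoint is deployed, a client call might look like the sketch below. It assumes a Hugging Face style container that accepts a JSON body with `inputs` and `parameters`; the prompt layout, endpoint name and response shape are hypothetical. The runtime client is passed in, so a real `boto3.client("sagemaker-runtime")` can be used in practice while a stub stands in here.

```python
import io
import json


def build_payload(question: str, table_text: str, max_new_tokens: int = 256) -> bytes:
    """Serialize one question into a JSON request body (assumed schema)."""
    prompt = f"Table:\n{table_text}\n\nQuestion: {question}\nAnswer:"
    body = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "do_sample": False},
    }
    return json.dumps(body).encode("utf-8")


def ask_endpoint(runtime_client, endpoint_name: str, question: str, table_text: str):
    """Send the request through a SageMaker runtime client and decode the reply."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_payload(question, table_text),
    )
    return json.loads(response["Body"].read())


class StubRuntime:
    """Stand-in for the real client so the sketch runs without AWS access."""

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        request = json.loads(Body)
        reply = [{"generated_text": request["inputs"] + " <model output>"}]
        return {"Body": io.BytesIO(json.dumps(reply).encode("utf-8"))}


result = ask_endpoint(
    StubRuntime(),
    "finreason-demo",  # hypothetical endpoint name
    "What was the year-over-year change in revenue?",
    "year | revenue\n2019 | 100\n2020 | 110",
)
print(result[0]["generated_text"])
```

Injecting the client keeps the request-building logic testable offline and leaves credentials and region configuration to the caller.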


© 2026 Marian Glen Louis

