CitiBike Demand Forecaster: Recursive ML & MLOps Pipeline
End-to-end ML system for 24-hour Citi Bike demand forecasting using LightGBM with a recursive multi-step prediction engine, an automated Champion/Challenger MLOps pipeline, and a full-stack Next.js dashboard.
Core Impact
“Engineered a high-precision recursive forecasting engine with automated drift detection and Champion/Challenger model promotion.”

Architecture Breakdown
Built a production MLOps system on GitHub Actions that automatically ingests NYC Citi Bike trip data, engineers 32 lag and temporal features, trains a LightGBM forecasting model, and deploys hourly predictions with zero manual intervention. On 1,051 held-out test samples, the model achieves an MAE of 5.1 rides/hour and a WMAPE of 24%.
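For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how MAE and WMAPE are commonly computed. It assumes the usual definition of WMAPE (sum of absolute errors divided by sum of absolute actuals). The function names and the sample numbers are illustrative only and are not taken from the project.

```python
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error, in the unit of the target (here: rides/hour)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def wmape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Weighted MAPE: total absolute error divided by total absolute demand.

    Assumed definition; unlike plain MAPE it stays defined when individual
    hours have zero rides, as long as total demand is non-zero.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total_demand = sum(abs(a) for a in actual)
    if total_demand == 0:
        raise ValueError("WMAPE is undefined when total actual demand is zero")
    total_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    return total_error / total_demand


# Illustrative values only (not project data).
actual_rides = [10, 20, 30, 40]
predicted_rides = [12, 18, 33, 35]
print(mae(actual_rides, predicted_rides))    # 3.0
print(wmape(actual_rides, predicted_rides))  # 0.12
```

WMAPE weights each hour by its actual demand, so busy hours dominate the score. That makes it a reasonable companion to MAE for hourly ride counts, where many hours are close to zero.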
Designed a champion/challenger promotion loop using Hopsworks Model Registry and MLflow on DagsHub: a new challenger model is promoted automatically only if its MAE is strictly lower than the incumbent's. This removes manual model approval and keeps the best-performing version (by MAE) in production.
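The promotion rule itself can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the project's implementation. It makes no calls to the Hopsworks or MLflow APIs, the ModelCandidate type and select_champion function are hypothetical names, and promoting the first model when no incumbent exists is an assumed behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelCandidate:
    """Hypothetical record of a registered model and its held-out MAE."""
    version: str
    mae: float


def select_champion(
    champion: Optional[ModelCandidate], challenger: ModelCandidate
) -> ModelCandidate:
    """Return the model that should serve production.

    The challenger wins only on a strict MAE improvement; a tie keeps the
    incumbent, which avoids churn between equally good models.
    """
    if champion is None:
        # Assumption: with no incumbent, the first trained model is promoted.
        return challenger
    return challenger if challenger.mae < champion.mae else champion


# Illustrative values only.
incumbent = ModelCandidate(version="v1", mae=5.4)
print(select_champion(incumbent, ModelCandidate("v2", 5.1)).version)  # v2
print(select_champion(incumbent, ModelCandidate("v3", 5.4)).version)  # v1
```

The comparison is only meaningful if both models are scored on the same held-out data, which this sketch takes for granted.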
Engineered a recursive inference algorithm that bridges the ~20-day Citi Bike data publication lag: it walks hour by hour from the last known data point to the present, feeding each prediction back into the 28 lag features (top feature: lag_1 at 25% importance), and then generates a 24-hour demand forecast that GitHub Actions refreshes every hour.
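The recursive walk can be illustrated with a small sketch. It is an assumption-laden simplification: it uses three lags instead of the project's 28, omits the temporal features, and takes any callable as the model (a trained LightGBM model's predict method could be wrapped to fit). The names recursive_forecast, bridge_and_forecast, and predict_one are hypothetical.

```python
from typing import Callable, List, Sequence

# Illustrative lag set; the project description mentions 28 lag features.
LAGS = (1, 2, 24)


def recursive_forecast(
    history: Sequence[float],
    predict_one: Callable[[List[float]], float],
    steps: int,
    lags: Sequence[int] = LAGS,
) -> List[float]:
    """Predict `steps` hours ahead, feeding each prediction back as a lag."""
    if len(history) < max(lags):
        raise ValueError("history is shorter than the largest lag")
    series = list(history)  # copy so the caller's data is not mutated
    forecasts: List[float] = []
    for _ in range(steps):
        # Lag k is the value k hours before the hour being predicted.
        features = [series[-lag] for lag in lags]
        y_hat = predict_one(features)
        series.append(y_hat)  # the prediction becomes "known" for later steps
        forecasts.append(y_hat)
    return forecasts


def bridge_and_forecast(
    history: Sequence[float],
    predict_one: Callable[[List[float]], float],
    gap_hours: int,
    horizon: int = 24,
) -> List[float]:
    """Walk across the publication gap, then keep only the future horizon."""
    all_steps = recursive_forecast(history, predict_one, gap_hours + horizon)
    return all_steps[gap_hours:]


# Toy model for demonstration: next value is lag_1 plus one.
toy_model = lambda features: features[0] + 1
observed = list(range(24))  # hours 0..23 with values 0..23
print(bridge_and_forecast(observed, toy_model, gap_hours=2, horizon=3))
# [26, 27, 28]
```

Because every step consumes earlier predictions, errors can compound over a long gap. A ~20-day lag corresponds to roughly 480 bridging steps before the 24-hour forecast begins.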
Deployed a Next.js 14 dashboard on Vercel that reads live Parquet predictions directly from S3 using hyparquet, which removes the need for a dedicated API server and reduces infrastructure complexity and operational cost. The dashboard displays hourly demand forecasts across NYC stations with an interactive heatmap.