Forecasting Irregular Sales: A Machine Learning Approach to Predicting Customer Purchasing Behaviour
Information
Authors: Pontus Fredstam, Gustav Molin
Expected completion: 2026-06
Supervisor: Anindya Kimell Gupta
Supervisor's company/institution: Optinova
Subject reviewer: Stefanos Kaxiras
Other: -
Presentations
Presentation by Pontus Fredstam
Presentation time: 2026-06-10 14:15
Presentation by Gustav Molin
Presentation time: 2026-06-10 15:15
Opponents: Elias Ihrefjord, Karl Johansson
Abstract
Sales forecasting matters to manufacturing companies because forecast accuracy directly affects production planning. This thesis is carried out in collaboration with Optinova and explores how a customer-level sales forecasting system can be built using machine learning. A recurring challenge in this work is the intermittent and lumpy nature of business-to-business (B2B) demand. Optinova's customers are no exception, with approximately 95% of them exhibiting intermittent or lumpy demand. Because standard forecasting approaches often assume continuous demand, they must be adapted to handle this zero-inflated data.
To address this, a two-stage hurdle model was constructed. In the first stage, a binary classifier predicts whether a customer will purchase in a given month. If the classifier predicts a purchase, a regression model estimates its magnitude. If the classifier predicts no purchase, the revenue for that customer is set to zero. Two classifiers (XGBoost, LightGBM) and three regressors (XGBoost, LightGBM, Random Forest) were combined into six hurdle pairs, each trained and evaluated on the historical data. Features were engineered to capture demand-planning concepts and customer behavioral metrics.
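The two-stage logic described above can be sketched in a few lines of Python. This is only an illustration of the hurdle mechanism, not the thesis implementation: the `HurdleModel` class, the 0.5 decision threshold, the two feature names, and the toy stand-in models are all assumptions made for the example. In practice the two callables would wrap fitted models such as the XGBoost or LightGBM estimators named in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Features = Sequence[float]


@dataclass
class HurdleModel:
    """Two-stage hurdle: decide purchase / no purchase, then estimate the amount."""

    purchase_probability: Callable[[Features], float]  # stage 1: classifier
    purchase_amount: Callable[[Features], float]       # stage 2: regressor
    threshold: float = 0.5                             # assumed decision threshold

    def predict(self, rows: Sequence[Features]) -> List[float]:
        forecasts = []
        for row in rows:
            if self.purchase_probability(row) >= self.threshold:
                # Classifier says "purchase": ask the regressor for the magnitude.
                forecasts.append(self.purchase_amount(row))
            else:
                # Classifier says "no purchase": revenue is set to zero.
                forecasts.append(0.0)
        return forecasts


# Hypothetical stand-ins for fitted models.
# Each row is (months_since_last_order, average_order_value).
def toy_classifier(row: Features) -> float:
    months_since_last_order = row[0]
    return 0.9 if months_since_last_order <= 2 else 0.1


def toy_regressor(row: Features) -> float:
    average_order_value = row[1]
    return average_order_value


model = HurdleModel(toy_classifier, toy_regressor)
rows = [(1, 1200.0), (6, 800.0), (2, 450.0)]
forecasts = model.predict(rows)
print(forecasts)
```

Keeping the two stages as separate callables makes it easy to swap classifiers and regressors independently, which is how pairing two classifiers with three regressors yields six combinations.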
The results show that the best hurdle pair (LightGBM classifier and XGBoost regressor) achieved a monthly mean absolute percentage error of 11.89%. When aggregated to a full-year forecast, the same pair achieved an absolute percentage error of 1.43%, as erratic monthly behavior largely cancels out over a full year. The model consistently outperforms a naive baseline across all evaluation levels, captures seasonal patterns in the historical data, and explains over 80% of the variance in annual customer spend. These findings suggest that machine learning can serve as a valuable decision-support tool for customer-level forecasting in B2B environments characterized by intermittent demand.
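The gap between the monthly and the annual error can be illustrated with a small calculation. The sketch below uses made-up numbers, not Optinova data, and assumes that the monthly metric averages the per-month percentage errors of the totals while the annual metric compares the summed forecast against the summed actuals. The function names are chosen for this example only.

```python
from typing import Sequence


def monthly_mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean of the absolute percentage errors of each month, in percent."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def annual_ape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Absolute percentage error of the yearly total, in percent."""
    total_actual = sum(actual)
    total_forecast = sum(forecast)
    return 100.0 * abs(total_actual - total_forecast) / abs(total_actual)


# Hypothetical monthly totals: the forecast alternates between
# overshooting and undershooting a flat actual of 100 per month.
actual = [100.0] * 12
forecast = [110.0, 90.0] * 5 + [110.0, 95.0]

mape = monthly_mape(actual, forecast)
ape = annual_ape(actual, forecast)
print(f"monthly MAPE: {mape:.2f}%")
print(f"annual APE:   {ape:.2f}%")
```

In this toy series every month is off by 5 to 10 percent, so the monthly figure is close to 10%, yet the over- and under-forecasts offset each other and the yearly total is off by less than half a percent.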