Predicting purchase intentions of customers by using web data
Information
Author: Olle Kåhre Zäll
Expected completion: 2022-01
Supervisor: Helena Sjöberg
Supervisor's company/institution: Bonava AB
Subject reviewer: Niklas Wahlström
Other: -
Presentation
Presenter: Olle Kåhre Zäll
Presentation time: 2022-01-28 13:15
Opponent: Alexander Fors
Abstract
This master's thesis investigates whether customers' purchase intentions can be predicted during their sales processes in the real estate sector, using their web activity on a real estate company's website as the basis for prediction. A machine learning framework was developed, and its compliance with the GDPR was assessed. Five supervised machine learning algorithms (logistic regression, k-nearest neighbors, decision tree, random forest and multilayer perceptron) were used to classify customers as buyers or non-buyers. Three data sets were generated, each representing the customers who were active at a different point in time: on the day a sales process starts (day 0), and 10 and 20 days after it. The algorithms were applied to and evaluated on these data sets to identify when it is suitable to predict customers' purchase intentions. To improve the generalization capability of the algorithms, several techniques were applied: hyperparameter optimization, data resampling that combines undersampling with the synthetic minority over-sampling technique (SMOTE), k-fold cross-validation, and mutual information for feature selection.
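As an illustration of the combined resampling step mentioned above, the following is a minimal sketch that uses only the Python standard library. It is not the thesis implementation: the function names `smote` and `resample`, the toy feature vectors, the neighbour count `k` and the target class size are all assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic minority rows by interpolating between a
    randomly chosen row and one of its k nearest minority neighbours."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def resample(majority, minority, target, k=3, seed=0):
    """Randomly undersample the majority class and oversample the minority
    class with SMOTE so that both classes end up with `target` rows."""
    rng = random.Random(seed)
    kept_majority = rng.sample(majority, min(target, len(majority)))
    n_new = max(0, target - len(minority))
    new_minority = list(minority) + smote(minority, n_new, k, rng)
    return kept_majority, new_minority


# Hypothetical toy data: (visited pages, sessions) per customer.
non_buyers = [(float(i), float(i % 7)) for i in range(100)]
buyers = [(50.0, 1.0), (52.0, 2.0), (55.0, 3.0), (60.0, 1.5), (58.0, 2.5)]

balanced_non_buyers, balanced_buyers = resample(non_buyers, buyers, target=20)
print(len(balanced_non_buyers), len(balanced_buyers))
```

When resampling is combined with k-fold cross-validation, it is usually applied to the training folds only, so that each validation fold keeps the original class balance.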
The results show that the numbers of visited web pages, sessions, searched projects (concerning accommodations) and searched locations were relevant for all three data sets. The average price (in total and per square meter) of the most frequently visited project web page was also included in all the data sets. In addition, the total number of registrations of interest sent and the total time spent on the company's website were considered in the second (day 10) and third (day 20) data sets. A multilayer perceptron applied 10 days after the start of a sales process was considered the optimal model for classifying customers' purchase intentions. The developed machine learning framework is argued to be compliant with the GDPR, although further evaluation of its compliance is needed if the methodology is to be implemented in practice.
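To make the idea of per-snapshot features concrete, the sketch below aggregates a customer's web events up to a cutoff of 0, 10 or 20 days after the start of the sales process. The event schema (`date`, `session_id`, `project`, `seconds`), the function name `snapshot_features` and the choice to count all activity up to and including the cutoff day are assumptions for illustration; the thesis may define its features differently.

```python
from datetime import date, timedelta


def snapshot_features(events, process_start, days_after):
    """Aggregate one customer's web events up to and including the
    cutoff day (process_start + days_after)."""
    cutoff = process_start + timedelta(days=days_after)
    seen = [e for e in events if e["date"] <= cutoff]
    return {
        "page_views": len(seen),
        "sessions": len({e["session_id"] for e in seen}),
        "projects": len({e["project"] for e in seen if e["project"]}),
        "seconds_on_site": sum(e["seconds"] for e in seen),
    }


# Hypothetical event log for a single customer.
events = [
    {"date": date(2021, 3, 1), "session_id": "s1", "project": "A", "seconds": 40},
    {"date": date(2021, 3, 1), "session_id": "s1", "project": "B", "seconds": 20},
    {"date": date(2021, 3, 8), "session_id": "s2", "project": "A", "seconds": 90},
    {"date": date(2021, 3, 15), "session_id": "s3", "project": None, "seconds": 30},
]
start = date(2021, 3, 1)

day0 = snapshot_features(events, start, 0)
day10 = snapshot_features(events, start, 10)
day20 = snapshot_features(events, start, 20)
print(day0, day10, day20, sep="\n")
```

Later snapshots see more activity per customer, which is consistent with additional features such as time on site becoming informative only at day 10 and day 20.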