Streamlining invoice classification according to the UNSPSC standard: a machine learning solution
Information
Authors: Elaf Salam, Max Norberg
Expected completion: 2024-06
Supervisor: Kim Grandell
Supervisor's company/institution: Business Vision Consulting AB
Subject reviewer: Olle Gällmo
Other: -
Presentations
Presentation by Elaf Salam
Presentation time: 2024-06-04 09:15
Presentation by Max Norberg
Presentation time: 2024-06-04 10:15
Opponents: Niclas Björkqvist, Viktor Ernlund Evestam
Abstract
Procurement requirements and procurement analysis are approaches used by the Swedish
National Agency for Public Procurement to ensure and maintain sustainable societal
development. The aim is to safeguard tax funds and ensure that they are used for their intended
purpose. A further objective is to promote healthy competition between actors. One way to
achieve this is through supply chain management and spend analysis, which can help companies
improve spend efficiency by gaining more insight into their supply chain. This thesis aims to
explore the prerequisites for Business Vision Consulting AB to develop and train a
machine learning model that classifies invoices in order to improve and facilitate spend analysis.
By applying several preprocessing methods and two natural language processing algorithms, and by
training predictive models with four different machine learning algorithms, this thesis proposes
solutions for classifying invoice lines into their corresponding UNSPSC codes. The four chosen
machine learning algorithms are logistic regression, boosted decision trees, decision forest, and
neural network. Of these four, logistic regression combined with n-gram features, a
method for transforming words into numbers, proved to be the most effective at classifying invoice
lines. At the three highest levels of the hierarchical structure of UNSPSC (segment, family, and
class), a logistic regression model with n-gram features achieved an overall
accuracy of 96.3%, 95.5%, and 99.0%, respectively. While these accuracies are adequate, the
thesis concludes by proposing areas to investigate further for additional improvement.
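
As a minimal sketch of two ideas mentioned in the abstract, the Python snippet below shows how an 8-digit UNSPSC code could be reduced to its segment, family, and class levels, and how word n-gram counts could be derived from an invoice line. The function names, the sample code 43211503, and the sample invoice text are illustrative assumptions and are not taken from the thesis. The zero-padded form of the higher levels is one common convention, not necessarily the representation used by the authors.

```python
from collections import Counter
import re


def unspsc_levels(code: str) -> dict[str, str]:
    """Split an 8-digit UNSPSC code into zero-padded segment, family, and class codes.

    Hypothetical helper: the first 2, 4, and 6 digits identify the
    segment, family, and class; the remaining digits are padded with zeros.
    """
    if not re.fullmatch(r"\d{8}", code):
        raise ValueError(f"expected an 8-digit UNSPSC code, got {code!r}")
    return {
        "segment": code[:2] + "000000",
        "family": code[:4] + "0000",
        "class": code[:6] + "00",
    }


def word_ngrams(text: str, max_n: int = 2) -> Counter:
    """Count word n-grams of length 1..max_n in a lowercased invoice line.

    Hypothetical helper: a real pipeline would likely add further
    preprocessing (stop words, number handling) before counting.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    # Sample values are invented for illustration only.
    print(unspsc_levels("43211503"))
    print(word_ngrams("Laptop 15 inch laptop bag"))
```

Counts of this kind could serve as the numeric input to a classifier such as logistic regression, with one model (or one target column) per hierarchy level.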