Comparing job adverts and resumes using state-of-the-art natural language processing models
Information
Authors: Lise Rückert, Henry Sjögren
Expected completion: 2022-06
Supervisor: Mikael Nelsson
Supervisor's company/institution: Violet AI Lab AB
Subject reviewer: Thomas Schön
Other: -
Presentations
Presentation by Lise Rückert
Presentation time: 2022-06-02 08:15
Presentation by Henry Sjögren
Presentation time: 2022-06-02 09:15
Opponents: Mandus Hjelm, Eric Andersson
Abstract
Automating the comparison and matching of resumes with job adverts is a growing research field. This can be done with Natural Language Processing (NLP), the area of machine learning that enables a model to learn human language. This thesis explores and evaluates the application of the state-of-the-art NLP model SBERT to the task of comparing extracted text from resumes and adverts and calculating a measure of similarity between them. It also investigates what type of data generates the best-performing model on this task. The results show that SBERT can quickly be trained on unlabeled data from the HR domain using a Triplet network, and that it achieves high performance when tested on various tasks. The models are shown to be bilingual, to handle unseen vocabulary, and to understand the concept and descriptive context of entire sentences rather than only single words. Thus, the conclusion is that the models have a good understanding of semantic similarity and relatedness. However, in some cases the models are also shown to become binary in their calculations of similarity between inputs. Moreover, it is hard to tune a model that comprehensively covers a domain as diverse as HR. A model fine-tuned on clean and generic data extracted from adverts shows