Sentiment analysis of Swedish reviews and domain adaptation using Convolutional Neural Networks
Information
Author: Johan Sundström
Expected completion: 2018-01
Supervisor: Nils Dahlbom
Supervisor's company/institution: Findwise AB
Subject reviewer: Dave Zachariah
Other: -
Presentation
Presenter: Johan Sundström
Presentation time: 2018-01-05 12:15
Opponent: Emil Fleron
Abstract
Sentiment analysis is a field within machine learning that focuses on determining the contextual polarity of subjective information. It can be used to analyze the “voice of the customer” and has been applied successfully to opinionated information such as customer reviews, political opinions and social media data. A major problem with machine learning models is that they are domain dependent and tend not to perform well on other domains. Transfer learning is a research field that studies a model's ability to transfer knowledge across domains. The purpose of this thesis is to investigate how well suited the deep learning model Convolutional Neural Network (CNN) is for cross-domain sentiment analysis of Swedish reviews. This has been done by investigating how models perform when trained with data from different domains and with varying amounts of data. The impact of using sophisticated text representations has also been studied.

This study has shown that a fairly simple CNN without pre-trained word embeddings is not well suited for transfer learning, since it performs worse than a traditional logistic regression model. Substituting 20% of the source training data with target data can, in many of the test cases, boost the performance by 7-8% for both the logistic regression and the CNN model. Initializing a CNN with pre-trained word embeddings increases the transferability as well as the in-domain performance, and outperforms both the logistic regression and the CNN model without pre-trained word embeddings in the majority of test cases.
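The data-substitution setup mentioned in the abstract, where 20% of the source-domain training data is replaced with target-domain data, can be illustrated with a minimal sketch. This is not the thesis's code: the helper name `substitute_target_data`, the uniform random sampling without replacement, and the fixed seed are assumptions made for illustration. The total training-set size is kept equal to the original source set, so only the domain mix changes.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def substitute_target_data(
    source: Sequence[T],
    target: Sequence[T],
    fraction: float = 0.2,
    seed: int = 0,
) -> List[T]:
    """Hypothetical helper: return a training set the same size as `source`
    in which `fraction` of the examples are drawn from `target` instead."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the mix is reproducible
    n_total = len(source)
    n_target = round(fraction * n_total)
    if n_target > len(target):
        raise ValueError("not enough target-domain examples")
    # Keep a random subset of the source data and fill the rest with target data.
    kept_source = rng.sample(list(source), n_total - n_target)
    added_target = rng.sample(list(target), n_target)
    mixed = kept_source + added_target
    rng.shuffle(mixed)  # avoid ordering the examples by domain
    return mixed


if __name__ == "__main__":
    # Toy (domain, id) pairs standing in for labelled reviews.
    source = [("src", i) for i in range(100)]
    target = [("tgt", i) for i in range(50)]
    mixed = substitute_target_data(source, target, fraction=0.2)
    print(len(mixed), sum(1 for domain, _ in mixed if domain == "tgt"))
```

With 100 source examples and `fraction=0.2`, the returned set still holds 100 examples, 20 of which come from the target domain; either model could then be trained on it.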