Lights, Camera, BERT! – Autonomizing the Process of Reading and Interpreting Swedish Film Scripts

Information

Author: Leon Henzel
Expected completion: 2023-06
Supervisor: Björn Mosten
Supervisor's company/institution: Björn Mosten Företag
Subject reviewer: Maria Andreina Francisco Rodriguez
Other: -


Presentation

Presenter: Leon Henzel
Presentation time: 2023-06-20 10:15
Opponent: Adam Bergman Karlsson

Abstract

This thesis explores the autonomization of reading PDFs of Swedish film scripts using various machine learning techniques and named entity recognition (NER). It also explores whether the amount of labeled data needed for the NER tasks can be reduced in order to save time. The autonomization process is split into two subsystems: one extracts larger chunks of text, and the other uses NER to extract relevant information, in the form of named entities, from some of those chunks. Two approaches to reducing the labeling time for NER are explored: active learning and self learning. For active learning, three methods are explored: Logprob and Word Entropy as uncertainty-based methods, and ALPS as a diversity-based method. For self learning, a threshold is derived from the mean value of the Word Entropy uncertainty score. The results show that ALPS is the best-performing active learning method when it comes to saving time on labeling data for NER. Applying self learning through the derived threshold did not improve the NER model's performance; the reason for this remains inconclusive. The entire script-reading system was evaluated by competing against a human extracting information from a film script, with the human and the system competing on time and accuracy. Accuracy is defined as a custom F1-score based on the F1-score for NER. Overall, the system performed orders of magnitude faster than the human while still retaining fairly high accuracy. The subsystem for extracting named entities had quite low accuracy, which is hypothesised to be mainly due to high data imbalance and too little diversity in the training data.
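
To make the entropy-based selection concrete, here is a minimal Python sketch of how a word-entropy uncertainty score could drive both steps. The abstract does not give the thesis's exact definitions, so several things here are assumptions: scoring a sentence by the mean Shannon entropy of its per-token label distributions, sending the highest-scoring sentences to a human annotator, and pseudo-labeling only sentences that score below the pool mean. All function names and the toy probabilities are hypothetical. Logprob and ALPS are not shown.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical type: one probability distribution over NER labels per token.
TokenProbs = Sequence[Sequence[float]]


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one token's label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sentence_score(token_probs: TokenProbs) -> float:
    """Mean token entropy; averaging over tokens is an assumption."""
    return sum(token_entropy(p) for p in token_probs) / len(token_probs)


def pick_for_annotation(scores: Dict[str, float], k: int) -> List[str]:
    """Active learning step: the k most uncertain sentences go to a human."""
    return sorted(scores, key=lambda sid: scores[sid], reverse=True)[:k]


def pick_for_pseudo_labels(scores: Dict[str, float]) -> List[str]:
    """Self learning step (assumed): keep sentences below the mean score."""
    threshold = sum(scores.values()) / len(scores)
    return [sid for sid, s in scores.items() if s < threshold]


# Toy pool of unlabeled sentences with made-up model outputs (3 labels).
pool: Dict[str, TokenProbs] = {
    "s1": [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]],  # confident prediction
    "s2": [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]],  # very uncertain
    "s3": [[0.90, 0.05, 0.05], [0.60, 0.30, 0.10]],  # in between
}

scores = {sid: sentence_score(tp) for sid, tp in pool.items()}
to_annotate = pick_for_annotation(scores, k=1)
to_pseudo_label = pick_for_pseudo_labels(scores)
print("annotate:", to_annotate)
print("pseudo-label:", to_pseudo_label)
```

In this sketch the same score serves both purposes: high-entropy sentences are the ones worth a human's time, while low-entropy sentences are the only ones trusted enough to be labeled by the model itself.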

Download the report

Lights, Camera, BERT! – Autonomizing the Process of Reading and Interpreting Swedish Film Scripts