Graph Attention Networks for Link Prediction in Semantic Word Grouping
Information
Author: Anton Gollbo
Expected completion: 2023-01
Supervisor: Kevin Ajamlou
Supervisor's company/institution: Violet Automation
Subject reviewer: Ben Blamey
Other: -
Presentation
Presenter: Anton Gollbo
Presentation time: 2023-01-25 10:15
Opponent: Jakob Nikamo
Abstract
Manually extracting relevant information from extensive amounts of data can be time-consuming and labour-intensive. Automating this process can shift the focus toward analysing and using the extracted information, rather than spending time and resources on data collection and preparation. Information extraction refers to methods for automatically extracting structured information from unstructured or semi-structured documents. One class of such documents is visually rich documents, a term encompassing documents that contain a significant amount of visual content, such as images, charts, or diagrams, in addition to text. Examples of visually rich documents include PDFs and scanned documents.
A graph neural network (GNN) is a type of neural network designed to perform inference on data represented as graphs. Using graph representations, GNNs can incorporate both visual and textual information when performing inference. As such, GNNs can be particularly useful for information extraction from visually rich documents, since such documents commonly contain inherent structures that are essential for understanding them.
Semantic word grouping is a technique that groups individual word entities into corresponding entity groups. This thesis analyses the performance of state-of-the-art GNNs on the task of link prediction between nodes in a data set of labelled restaurant menus. The method shows promising results for link prediction in an information extraction setting. Further, incorporating additional features related to structural information in the documents can significantly improve performance.
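To make the idea of attention-based link prediction concrete, the following is a minimal sketch and not the implementation used in the thesis. It assumes a single-head graph attention layer in the standard formulation (attention logits from a LeakyReLU over concatenated, linearly transformed node features, normalised with a softmax over each node's neighbourhood, with the final nonlinearity omitted) and a dot-product link decoder with a sigmoid, which is one common choice. All names (`gat_layer`, `link_probability`), the toy features (normalised x, y position and an is-numeric flag), the neighbourhood graph, and the untrained weights are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, x):
    # W is a list of rows (out_dim x in_dim); x is a list of length in_dim.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gat_layer(features, neighbors, W, a):
    """Single-head graph attention layer (sketch, no output nonlinearity).

    features:  list of node feature vectors
    neighbors: dict mapping node index -> list of neighbouring node indices
    W:         weight matrix, out_dim x in_dim
    a:         attention vector of length 2 * out_dim
    """
    h = [matvec(W, x) for x in features]
    out = []
    for i in range(len(features)):
        # Each node attends over itself and its neighbours.
        nbrs = [i] + [j for j in neighbors.get(i, []) if j != i]
        # Attention logits: LeakyReLU(a . [h_i || h_j]).
        logits = [
            leaky_relu(sum(ak * v for ak, v in zip(a, h[i] + h[j])))
            for j in nbrs
        ]
        # Numerically stable softmax over the neighbourhood.
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # New embedding: attention-weighted sum of transformed features.
        out.append([
            sum(al * h[j][k] for al, j in zip(alphas, nbrs))
            for k in range(len(h[i]))
        ])
    return out


def link_probability(z_u, z_v):
    # Dot-product decoder followed by a sigmoid (an assumed decoder).
    score = sum(p * q for p, q in zip(z_u, z_v))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical toy menu: four words with [x, y, is_numeric] features.
words = ["Chicken", "Burger", "9.50", "Fries"]
features = [
    [0.10, 0.20, 0.0],
    [0.25, 0.20, 0.0],
    [0.80, 0.20, 1.0],
    [0.10, 0.30, 0.0],
]
# Hypothetical spatial neighbourhood graph between the words.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1], 3: [0]}
# Untrained, hand-picked weights, for illustration only.
W = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
a = [0.3, -0.2, 0.1, 0.4]

embeddings = gat_layer(features, neighbors, W, a)
p_same_item = link_probability(embeddings[0], embeddings[1])
```

Because the weights here are not trained, the resulting probability carries no meaning. In a trained model, a high link probability between "Chicken" and "Burger" would indicate that the two words belong to the same entity group.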