AR glasses for Speech-to-Text (STAR)
Information
Authors: Julia Falk, Siri Eksvärd
Expected completion: 2021-01
Supervisor: Kjell Brunnström
Supervisor's company/institution: RISE Research Institute of Sweden
Subject reviewer: Lars Oestreicher
Other: -
Presentations
Presentation by Julia Falk
Presentation time: 2021-01-14 13:15
Presentation by Siri Eksvärd
Presentation time: 2021-01-14 14:15
Opponents: Johanna Dagfalk, Ellen Kyhle
Abstract
Suffering from a hearing impairment or deafness has major consequences for the individual’s social life. Various aids exist today, but they come with challenges such as limited availability, limited reliability and a high cognitive load when the user tries to focus on both the aid and the surrounding context. One potential way to overcome these challenges is to combine Augmented Reality (AR) with speech-to-text systems, so that speech is converted into text that is then presented in AR-glasses. However, a crucial problem in AR is the legibility and readability of text under different environmental conditions. Moreover, different types of AR-glasses have different usage characteristics, which implies that some types of glasses might be more suitable for the proposed system than others. For speech-to-text systems, it is necessary to consider factors such as accuracy, latency and robustness when they are used in different acoustic environments and with different speech audio.
In this master's thesis, two different AR-glasses are evaluated with respect to their optical, visual and ergonomic characteristics. Moreover, user tests are conducted with 23 normal-hearing individuals to evaluate the legibility and readability of text in different environmental contexts. Due to the pandemic, it was not possible to conduct the tests with hearing-impaired individuals. Finally, a literature review is performed on speech-to-text systems available on the Swedish market.
The results indicate that legibility and readability are affected by several factors, such as ambient illuminance, background properties and how the text is presented with respect to polarity, opacity, size and number of lines. Moreover, the characteristics of the glasses affect the user experience, but which glasses are preferable depends on the individual’s preferences.
Regarding the choice of a speech-to-text system, four speech-to-text APIs available on the Swedish market were identified. Based on our research, Google Cloud Speech API is recommended for the proposed system. However, a more extensive evaluation of these systems would be required to confirm this recommendation.