Tagging a Norwegian Dialect Corpus.

Andre Kåsen, Anders Nøklestad, Kristin Hagen, Joel Priestley

2019

This paper describes an evaluation of five data-driven part-of-speech (PoS) taggers for spoken Norwegian. The taggers rely on different machine learning mechanisms: decision trees, hidden Markov models (HMMs), conditional random fields (CRFs), long short-term memory networks (LSTMs), and convolutional neural networks (CNNs). We discuss some of the challenges of tagging spoken, as opposed to written, language, in particular the wide range of dialects found in the recordings of the LIA (Language Infrastructure made Accessible) project. The results show that the taggers based on CRFs or neural networks perform much better than the rest, with the LSTM tagger achieving the highest score.

Keywords: Norwegian, History, Linguistics
