Unsupervised morphological analysis of minority languages with NPYLM: –Consideration for situations where training data is too small–

2021 
In this study, we propose a method based on the Nested Pitman-Yor Language Model (NPYLM) to support the segmentation of speech information into words in the archiving work performed by linguists. Because both data and prior linguistic knowledge are limited for minority languages, the training data available for unsupervised morphological analysis is often too small to construct an NPYLM effectively. We propose two methods to improve the accuracy of NPYLM-based analysis on small data. The first replaces all the words obtained in previous steps with different symbols; the second replaces only the uncommon words, identified by TF-IDF, with other symbols. Our experiments show that both methods are effective. We therefore confirm that unsupervised morphological analysis with NPYLM can support the segmentation of speech information into word units even when the available data is small.
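The two replacement strategies named in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the details, so everything here is an assumption made for illustration: the function names `replace_all` and `replace_uncommon`, the use of Unicode private-use characters as symbols, the reading of "uncommon" as "TF-IDF score above a threshold", the single shared placeholder symbol, and the particular TF-IDF formula. It is not the authors' implementation and does not include NPYLM itself.

```python
import math
from collections import Counter


def replace_all(docs):
    """Map every distinct word to its own symbol (assumed reading of method 1).

    Symbols are drawn from the Unicode private-use area, an arbitrary choice.
    Returns the rewritten documents and the word-to-symbol table.
    """
    symbols = {}
    rewritten = []
    for doc in docs:
        rewritten.append(
            [symbols.setdefault(w, chr(0xE000 + len(symbols))) for w in doc]
        )
    return rewritten, symbols


def tfidf_scores(docs):
    """Return the highest TF-IDF value each word reaches in any document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(word for doc in docs for word in set(doc))
    best = {}
    for doc in docs:
        tf = Counter(doc)
        for word, count in tf.items():
            score = (count / len(doc)) * math.log(n_docs / df[word])
            best[word] = max(best.get(word, 0.0), score)
    return best


def replace_uncommon(docs, threshold, symbol="\u25a0"):
    """Replace words whose TF-IDF exceeds `threshold` (assumed reading of method 2).

    Both the threshold and the shared placeholder symbol are hypothetical.
    """
    scores = tfidf_scores(docs)
    return [[symbol if scores[w] > threshold else w for w in doc] for doc in docs]


if __name__ == "__main__":
    # Toy, already-segmented utterances; real input would come from earlier
    # segmentation steps.
    toy = [["a", "b", "a"], ["a", "c"], ["a", "d", "d"]]
    print(replace_all(toy)[0])
    print(replace_uncommon(toy, threshold=0.1))
```

In this sketch a word that occurs in every utterance gets an IDF of zero and is never replaced, while words confined to a single utterance score highest. Whether that matches the paper's notion of "uncommon" would need to be checked against the full text.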