A comparative study of hidden Markov model and conditional random fields on a Yorùbá part-of-speech tagging task

2017 
Part-of-speech tagging, the predictive sequential labeling of words in a sentence given a context, is a challenging problem both because of ambiguity and because of the unbounded nature of natural-language vocabulary. Unlike English and most European languages, the Yorùbá language has no publicly available part-of-speech tagging tool. In this paper, we compare the performance of variants of a bigram hidden Markov model (HMM) with that of a linear-chain conditional random field (CRF) model on a Yorùbá part-of-speech tagging task. We investigate the improvements obtainable from smoothing techniques and morphological affixes in the HMM-based models. For the CRF model, we define feature functions that capture contexts similar to those available to the HMM-based models. Both kinds of models were trained and evaluated on the same data set. Experimental results show that the performance of both kinds of models is encouraging, with the CRF model recognizing more out-of-vocabulary (OOV) words than the best HMM model by a margin of 3.05%. While the overall accuracy of the best HMM-based model is 83.62%, that of the CRF model is 84.66%. Although the CRF model gives marginally superior performance, both the HMM and CRF modeling approaches are clearly promising, given their OOV word recognition rates.
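To make the HMM side of the comparison concrete, the following is a minimal sketch of a bigram HMM tagger with Viterbi decoding. It is not the authors' implementation: the function names (`train_hmm`, `smoothed_prob`, `viterbi`) and the toy English corpus are illustrative assumptions, and add-one (Laplace) smoothing stands in as one example of the smoothing techniques the paper investigates; the morphological-affix features are omitted.

```python
from collections import defaultdict
import math

def train_hmm(tagged_sents):
    """Count bigram tag transitions and tag->word emissions
    from a list of [(word, tag), ...] sentences."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    tags, vocab = set(), set()
    for sent in tagged_sents:
        prev = "<s>"
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tags.add(tag)
            vocab.add(word)
            prev = tag
        trans[prev]["</s>"] += 1
    return trans, emit, tags, vocab

def smoothed_prob(counts, key, support):
    """Add-one (Laplace) smoothed probability over a support of
    `support` outcomes; gives nonzero mass to unseen events."""
    total = sum(counts.values())
    return (counts.get(key, 0) + 1) / (total + support)

def viterbi(words, trans, emit, tags, vocab):
    """Most likely tag sequence under the smoothed bigram HMM."""
    V = len(vocab) + 1  # +1 reserves emission mass for OOV words
    T = len(tags) + 1   # +1 accounts for the </s> transition
    best, back = [{}], [{}]
    for t in tags:
        best[0][t] = (math.log(smoothed_prob(trans["<s>"], t, T))
                      + math.log(smoothed_prob(emit[t], words[0], V)))
        back[0][t] = None
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            e = math.log(smoothed_prob(emit[t], words[i], V))
            best[i][t], back[i][t] = max(
                (best[i - 1][pt]
                 + math.log(smoothed_prob(trans[pt], t, T)) + e, pt)
                for pt in tags)
    # Trace back from the best final tag.
    tag = max(best[-1], key=best[-1].get)
    seq = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = back[i][tag]
        seq.append(tag)
    return list(reversed(seq))
```

Because emission probabilities are smoothed over `len(vocab) + 1` outcomes, an OOV word still receives nonzero probability under every tag, so decoding falls back on the transition model; a CRF would instead exploit feature functions (for example, on affixes) to tag such words, which is consistent with its higher OOV recognition rate reported above.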