Unsupervised Accent Modeling for Language Identification

2014 
In this paper, we propose clustering iVectors to model the different accents within a language. The motivation is that not all speakers of a language share the same pronunciation style. This source of variability is usually not considered in state-of-the-art language identification systems, and we show that taking it into account improves performance. For each language, the iVector space is partitioned according to the similarity of the iVectors, and each resulting cluster is treated as a different accent. A simplified probabilistic linear discriminant analysis (PLDA) model is then trained on all the accents, and at test time each utterance is scored against every one of them. For each language, the highest score among its accents is used to make the decision. The experiments were carried out on 6 languages from the 2011 NIST Language Recognition Evaluation (LRE) dataset. For the 30 s condition, the relative improvement over the baseline was 11%.
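The cluster-then-max-score pipeline described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm. The abstract does not name the clustering algorithm, so plain k-means stands in for it. The number of accents per language (`accents_per_lang`) is a hypothetical hyperparameter. The simplified PLDA back-end is replaced by cosine similarity against each accent centroid, purely to keep the example short and dependency-free. All function names are illustrative, and the toy 2-D vectors stand in for real iVectors.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors: Sequence[Vector], k: int, iters: int = 50, seed: int = 0) -> List[Vector]:
    """Plain k-means (an assumed stand-in for the paper's clustering step).

    Returns k centroids; each centroid represents one hypothetical accent.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(list(vectors), k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: _dist2(v, centroids[c]))
            clusters[nearest].append(v)
        # Keep the previous centroid if a cluster ends up empty.
        updated = [_mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity (assumed substitute for simplified-PLDA scoring)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def train_accent_models(ivectors_by_lang: Dict[str, List[Vector]],
                        accents_per_lang: int) -> Dict[str, List[Vector]]:
    """Partition each language's iVectors into accent clusters."""
    return {lang: kmeans(vecs, accents_per_lang) for lang, vecs in ivectors_by_lang.items()}


def score_languages(models: Dict[str, List[Vector]], test_ivector: Vector) -> Dict[str, float]:
    """Score the test vector against every accent; keep the max per language."""
    return {lang: max(cosine(test_ivector, c) for c in centroids)
            for lang, centroids in models.items()}


# Toy data: language "A" has two "accents" near (1, 0) and (0, 1);
# language "B" has two near (-1, 0) and (0, -1).
data = {
    "A": [[1.0, 0.1], [0.9, -0.1], [0.1, 1.0], [-0.1, 0.9]],
    "B": [[-1.0, 0.1], [-0.9, -0.1], [-0.1, -1.0], [0.1, -0.9]],
}
models = train_accent_models(data, accents_per_lang=2)
scores = score_languages(models, [0.05, 1.0])
print(max(scores, key=scores.get))  # prints A
```

Taking the maximum over accents means a test utterance only needs to resemble one accent of a language to score highly. A single model per language would instead compare it against an average that may not match any real pronunciation style. The cosine scores here are uncalibrated. A real system would use the PLDA log-likelihood ratios and a calibration stage instead.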