Subword analysis of small vocabulary and large vocabulary ASR for Punjabi language

2020 
Words must be modeled into phones carefully, because these phones, or sound units, are used to build the acoustic model. Various units have been proposed for acoustic modeling, such as the phone, character, syllable, and subword. A problem arises when too many unique subwords or phones are generated in the dictionary, which makes automatic speech recognition (ASR) difficult, and researchers have formulated diverse techniques to deal with it. In this paper, a subword-based dictionary is explored for the Punjabi language. For the large vocabulary, the number of subwords generated considerably exceeds the number that is computationally permissible. To reduce the number of subwords to be modeled, an algorithm is proposed that replaces the least frequently occurring subword with a subword that has a similar sound. Acoustic models have been developed using both the small- and large-vocabulary data, and they are compared in terms of word error rate (WER) and size. The results show that the large-vocabulary models achieve a high recognition rate, with a WER of only 6%.
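The abstract does not spell out the replacement algorithm, so the following is only a minimal sketch of the general idea of merging the least frequent subword into a similar-sounding one until the inventory fits a limit. Everything in it is an assumption made for illustration: the function name `reduce_subwords`, the `max_units` limit, the romanized toy subwords, and in particular the use of string similarity from `difflib.SequenceMatcher` as a stand-in for acoustic similarity, which the paper may measure quite differently.

```python
from collections import Counter
from difflib import SequenceMatcher


def reduce_subwords(lexicon, max_units):
    """Merge the rarest subwords into similar ones until at most max_units remain.

    lexicon maps each word to its list of subword units (hypothetical format).
    Returns the rewritten lexicon and the mapping of replaced units.
    """
    counts = Counter(u for units in lexicon.values() for u in units)
    mapping = {}

    while len(counts) > max(max_units, 1):
        # Least frequent unit; ties are broken alphabetically for determinism.
        rare = min(counts, key=lambda u: (counts[u], u))
        candidates = [u for u in counts if u != rare]
        # ASSUMPTION: string similarity approximates "similar sound".
        target = max(
            candidates,
            key=lambda u: (SequenceMatcher(None, rare, u).ratio(), counts[u], u),
        )
        counts[target] += counts.pop(rare)
        mapping[rare] = target

    def resolve(unit):
        # Follow chains in case a replacement target was itself merged later.
        while unit in mapping:
            unit = mapping[unit]
        return unit

    reduced = {w: [resolve(u) for u in units] for w, units in lexicon.items()}
    return reduced, mapping


# Toy data with romanized, made-up subwords (not taken from the paper).
toy_lexicon = {
    "w1": ["ka", "ma"],
    "w2": ["ka", "la"],
    "w3": ["kaa", "ma"],
    "w4": ["la", "ma"],
}
reduced, mapping = reduce_subwords(toy_lexicon, max_units=3)
print(mapping)
print(reduced["w3"])
```

In this toy run the rarest unit is folded into its closest neighbour and every pronunciation that used it is rewritten, so the acoustic model has fewer units to train. A real system would likely replace the string-similarity proxy with a phonetically informed measure.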