Prodorshok I: A Bengali isolated speech dataset for voice-based assistive technologies: A comparative analysis of the effects of data augmentation on HMM-GMM and DNN classifiers

Mohi Reza, Warida Rashid, Moin Mostakim

2017

Prodorshok I is a Bengali isolated-word speech dataset designed to support speaker-independent, voice-command-driven assistive technologies built on automatic speech recognition (ASR), with the aim of improving human-computer interaction (HCI). This paper presents an objective analysis, carried out on a subset of words from Prodorshok I, of the dataset's reliability in ASR systems that use Hidden Markov Models with Gaussian mixture emissions (HMM-GMM) and Deep Neural Networks (DNN). The results show that simple data augmentation with a small pitch shift can yield surprisingly noticeable improvements in recognition accuracy.
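The abstract does not say how the pitch shift was implemented or how large it was, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each recording is a mono NumPy array, and it swaps in the simplest possible technique: linear-interpolation resampling. Because of this choice, the shifted clip is also shorter or longer by the same factor. The shift amounts (±1 semitone), the 16 kHz sample rate, and the helper names `pitch_shift_resample`, `augment_with_pitch_shifts`, and `dominant_frequency` are all hypothetical.

```python
import numpy as np


def pitch_shift_resample(signal: np.ndarray, n_semitones: float) -> np.ndarray:
    """Shift pitch by n_semitones using linear-interpolation resampling.

    Assumption: naive resampling, so duration changes by the same factor
    (a +1 semitone shift shortens the clip by about 5.6%).
    """
    factor = 2.0 ** (n_semitones / 12.0)  # frequency ratio for an equal-tempered shift
    n_out = int(np.floor((len(signal) - 1) / factor)) + 1
    # Read the original waveform at positions spaced `factor` samples apart,
    # so playback at the original sample rate scales every frequency by `factor`.
    positions = np.arange(n_out) * factor
    return np.interp(positions, np.arange(len(signal)), signal)


def augment_with_pitch_shifts(clips, labels, shifts=(-1.0, 1.0)):
    """Return the original clips plus one pitch-shifted copy per shift value.

    `shifts` (in semitones) is a hypothetical choice; the source only says "small".
    """
    out_clips, out_labels = list(clips), list(labels)
    for shift in shifts:
        for clip, label in zip(clips, labels):
            out_clips.append(pitch_shift_resample(clip, shift))
            out_labels.append(label)  # a pitch shift does not change the spoken word
    return out_clips, out_labels


def dominant_frequency(signal: np.ndarray, sr: int) -> float:
    """Frequency (Hz) of the largest-magnitude FFT bin, used here as a sanity check."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return float(freqs[np.argmax(spectrum)])


if __name__ == "__main__":
    sr = 16_000  # hypothetical sample rate
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 200.0 * t)  # 200 Hz tone standing in for a recorded word
    clips, labels = augment_with_pitch_shifts([tone], ["word"])
    for clip, label in zip(clips, labels):
        print(label, len(clip), round(dominant_frequency(clip, sr), 1))
```

Each augmented copy keeps its original word label, so the classifier sees more pitch variation per word without any new recordings. If clip duration must stay fixed, a phase-vocoder approach such as `librosa.effects.pitch_shift` is the more usual choice.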

Keywords:

Speech recognition
Computer science
Bengali
Gaussian
Artificial neural network
Mel-frequency cepstrum
Pitch shift
Feature extraction
Hidden Markov model
Artificial intelligence
Objective analysis
Deep neural networks
Machine learning