Automatic detection of new words in a large vocabulary continuous speech recognition system
9 Citations · 1 Reference · 10 Related Papers
Abstract:
In practical large vocabulary speech recognition systems, it is nearly impossible for a speaker to remember which words are in the vocabulary, so the probability of the speaker using words outside the vocabulary can be quite high. When a speaker uses a new word, current systems will always recognize other words within the vocabulary in place of the new word, and the speaker will not know what the problem is. In this paper, we describe a preliminary investigation of techniques that automatically detect when the speaker has used a word that is not in the vocabulary. We developed a technique that uses a general model for the acoustics of any word to recognize the existence of new words. Using this general word model, we measure the correct detection of new words versus the false alarm rate. Experiments were run using the DARPA 1000-word Resource Management Database for continuous speech recognition. The recognition system used is the BBN BYBLOS continuous speech recognition system (Chow et al., 1987). The preliminary results indicate a detection rate of 74% with a false alarm rate of 3.4%.

Keywords: Word error rate
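The abstract trades detection rate against false-alarm rate by comparing a general ("filler") word model with the in-vocabulary models. As an illustration only (the BYBLOS internals are not described here, and the score fields below are hypothetical), a minimal sketch of that thresholding and the two operating metrics:

```python
def detect_new_words(words, margin=0.0):
    """Flag a word as out-of-vocabulary (OOV) when the general word
    (filler) model scores it better than the best in-vocabulary word
    model by more than `margin` (log-probability units).
    Raising `margin` lowers both detections and false alarms."""
    return [w["filler_score"] - w["best_vocab_score"] > margin for w in words]

def detection_and_false_alarm(words, margin=0.0):
    """Detection rate: fraction of true new words that are flagged.
    False-alarm rate: fraction of in-vocabulary words wrongly flagged."""
    flags = detect_new_words(words, margin)
    oov = [f for f, w in zip(flags, words) if w["is_new"]]
    iv = [f for f, w in zip(flags, words) if not w["is_new"]]
    return sum(oov) / len(oov), sum(iv) / len(iv)
```

Sweeping `margin` over a range traces the detection-versus-false-alarm operating curve behind numbers like the 74% detection at a 3.4% false-alarm rate reported above.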
This study presents the conception and realisation of an automatic speaker-independent speech recognition system using hidden Markov models (HMM). The system recognises the 33 letters of the Amazigh language. The system performs well, identifying spoken Amazigh letters at an 88.44% recognition rate, an acceptable level of accuracy for speech recognition. The tests were based on HMMs with Gaussian mixture distributions, and the Hidden Markov Toolkit (HTK) was used in the implementation and test phases. The word error rate (WER) was initially 29.41% and was reduced to about 11.52% through extensive testing and tuning of the recognition parameters.

Keywords: Word error rate
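Several of the abstracts here report word error rate (WER). It is the word-level edit distance (substitutions + deletions + insertions) between the reference and hypothesis transcripts, divided by the reference length; a minimal dynamic-programming sketch:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed as a standard edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```

Note that WER can exceed 100% when insertions dominate, which is why it is an error rate rather than an accuracy.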
The Context-Dependent Deep-Neural-Network HMM, or CD-DNN-HMM, is a powerful acoustic modeling technique for HMM-based speech recognition systems and can greatly outperform conventional Gaussian-mixture HMMs. We therefore build a CD-DNN-HMM LVCSR system by modifying a mature GMM-HMM system. The baseline CD-DNN-HMM system achieves a word error rate of 18.6%, far better than the 24.9% achieved by the GMM-HMM system. However, the speed of the baseline CD-DNN-HMM system is a major roadblock: its real-time factor reaches 0.72 on the standard NIST 2000 Hub5 evaluation set. In this paper, we implement several optimization algorithms in our baseline system to accelerate recognition. Testing the optimized system on the same evaluation set, we achieve a real-time factor of 0.39, a relative reduction of 45.8%.

Keywords: NIST, Word error rate
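The 45.8% figure quoted above is a relative reduction; for reference, the same arithmetic applied to the quoted WER figures:

```python
def relative_reduction(baseline, improved):
    """Fractional reduction of a rate relative to its baseline value."""
    return (baseline - improved) / baseline

rtf_gain = relative_reduction(0.72, 0.39)   # real-time factor: ~45.8%
wer_gain = relative_reduction(24.9, 18.6)   # word error rate: ~25.3%
```

The real-time-factor reduction matches the abstract; the WER gap (18.6% vs 24.9%) corresponds to roughly a 25% relative improvement.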
In recent years, non-intrusive biometric identification systems have evolved very quickly. Their main advantage over classical systems based on usernames and passwords is that anatomical features are less exposed to theft or loss. In this context, this study deals with speaker recognition, one of the most fashionable biometric technologies. It presents speaker verification and speaker identification experiments carried out on Romanian speech, using the GMM-UBM framework in different configurations. The experiments are performed on a corpus about five times larger than previous attempts for the Romanian language, comprising connected digits uttered in Romanian by over 120 speakers. The results show that the false rejection rate is close to 0% and the false acceptance rate is around 1.50%. For speaker identification, the results are relatively good for the closed-set scenario (identification error < 1%), but not yet acceptable for the open-set scenario. The error rates were computed in different configurations, and error trends were obtained as a function of the main system parameters.

Keywords: Word error rate, Romanian, Identification, Speaker identification, Speaker diarisation
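In the GMM-UBM framework mentioned above, a verification trial is scored by the log-likelihood ratio between the claimed speaker's model and a universal background model (UBM). A minimal sketch with single diagonal-covariance Gaussians standing in for full mixtures (real systems MAP-adapt a many-component GMM from the UBM; the function names and data here are illustrative):

```python
import math

def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def llr_score(frames, speaker_model, ubm):
    """Average per-frame log-likelihood ratio; accept the claimed
    identity when the score exceeds a tuned threshold."""
    return sum(diag_gauss_logpdf(f, *speaker_model) - diag_gauss_logpdf(f, *ubm)
               for f in frames) / len(frames)
```

The decision threshold is what trades false rejections (close to 0% above) against false acceptances (around 1.5%).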
Speaker recognition belongs to the domain of biometrics. This paper deals with speaker recognition using the hidden Markov model (HMM) method. The system recognizes the speaker by translating the speech waveform into a set of feature vectors using the Mel-Frequency Cepstral Coefficients (MFCC) technique. However, input speech signals recorded at different times may contain variations: the same speaker may utter the same word at different speeds, which varies the total number of MFCC vectors. Vector quantization (VQ) is used to normalize the number of MFCC vectors. The HMM provides a highly reliable way of recognizing a speaker; HMMs are widely used and are usually considered a set of states with Markovian properties, with observations generated independently from those states. Viterbi decoding yields the most likely state sequence, which is used for speaker recognition. For a database of 50 speakers in a normal environment, the obtained recognition rate is 98%, which is better than previous methods used for speaker recognition.

Keywords: Mel-frequency cepstrum, Viterbi algorithm, Viterbi decoder, Speaker diarisation, Sequence labeling, Feature (linguistics), Feature vector
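Viterbi decoding, used above to obtain the most likely state sequence, is standard HMM dynamic programming. A self-contained sketch on a toy discrete HMM (toy probabilities, not the speaker-recognition models themselves):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for `obs` under an HMM, by dynamic
    programming. Uses raw probabilities for clarity; real decoders work
    in the log domain to avoid underflow on long utterances."""
    # V[t][s] = (best probability of any path ending in s at time t, that path)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        V.append({})
        for s in states:
            prob, path = max(
                (V[-2][p][0] * trans_p[p][s] * emit_p[s][o], V[-2][p][1] + [s])
                for p in states)
            V[-1][s] = (prob, path)
    return max(V[-1].values())
```

In speaker recognition the per-speaker HMM whose best path scores highest wins; in continuous speech recognition the same recursion runs over phone or word state graphs.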
We propose two methods to improve HMM speech recognition performance: the first applies an adjustment in the training stage, the second in the scoring stage. It is well known that speech recognition performance improves as the amount of labeled training data grows. However, due to factors such as inaccurate phonetic labeling, end-point detection, and voiced-unvoiced decisions, the labeling procedure can be prone to errors. We propose a selective hidden Markov model (HMM) training procedure that reduces the adverse influence of atypical training data on the generated models. To demonstrate its usefulness, selective training is applied to the problem of accent classification, resulting in a 9.4% improvement in classification error rate. The second goal is to improve HMM scoring performance. HMM training algorithms maximize the probability over the training tokens for each model, but this does not guarantee a minimized error rate across the entire model set; biases can typically be observed in the confusion matrices. We propose a method for estimating this bias from the training data and incorporating it into the general scoring algorithm. Using this technique, a 9.8% improvement is achieved in accent classification error rate.

Keywords: Word error rate, Stress, Training set
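The selective-training idea described above (reducing the influence of atypical or mislabeled tokens) can be sketched generically: train an initial model, rank training tokens by their score under it, drop the worst fraction, and retrain. The callables and drop fraction below are illustrative, not the paper's actual procedure:

```python
def selective_training(tokens, train, score, drop_fraction=0.1):
    """Selective training sketch: fit an initial model, score every
    training token under it, discard the lowest-scoring fraction
    (likely mislabeled or atypical), and retrain on the remainder.
    `train(tokens) -> model` and `score(model, token) -> float`
    are user-supplied callables."""
    model = train(tokens)
    ranked = sorted(tokens, key=lambda t: score(model, t))
    keep = ranked[int(len(ranked) * drop_fraction):]
    return train(keep)
```

As a toy check: with a "model" that is just the mean of the tokens and a score that penalizes distance from it, an outlier like 100 among {1, 2, 3} is dropped and the retrained mean becomes 2.0.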
In this paper we look at real-time computing issues in large vocabulary speech recognition, using the French broadcast audio transcription task from ETAPE 2011 for evaluation. We compare word error rate (WER) versus overall computing time for hidden Markov models with Gaussian mixtures (GMM-HMM) and deep neural networks (DNN-HMM). We show that for similar computation during recognition, the DNN-HMM combination is superior to the GMM-HMM. In a real-time computing scenario, the error rate on the ETAPE dev set is 23.5% for the DNN-HMM versus 27.9% for the GMM-HMM: a significant difference in accuracy for comparable computation. Rescoring lattices (generated by the DNN-HMM acoustic model) with a quadgram language model (LM), and then with a neural-network LM, reduces the WER to 22.0% while still running in real time.

Keywords: Word error rate
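The real-time constraint above is conventionally expressed as the real-time factor (RTF): processing time divided by audio duration, with RTF ≤ 1 meaning the recognizer keeps up with the input. A one-line sketch:

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; RTF <= 1.0 means the
    recognizer transcribes at least as fast as the audio plays."""
    return processing_seconds / audio_seconds
```

For example, the CD-DNN-HMM system cited earlier needed 0.72 seconds of computation per second of audio before optimization and 0.39 afterward, both within real time.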
The context-independent deep belief network (DBN) hidden Markov model (HMM) hybrid architecture has recently achieved promising results for phone recognition. In this work, we propose a context-dependent DBN-HMM system that dramatically outperforms strong Gaussian mixture model (GMM)-HMM baselines on a challenging, large vocabulary, spontaneous speech recognition dataset from the Bing mobile voice search task. Our system achieves absolute sentence accuracy improvements of 5.8% and 9.2% over GMM-HMMs trained using the minimum phone error rate (MPE) and maximum likelihood (ML) criteria, respectively, which translate to relative error reductions of 16.0% and 23.2%.
Keywords: Word error rate
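The pairing of absolute accuracy gains (5.8 and 9.2 points) with relative error reductions (16.0% and 23.2%) pins down the baseline sentence error rates, since relative reduction = absolute gain / baseline error. Back-solving is an inference from the quoted numbers, not figures stated in the abstract:

```python
def implied_baseline_error(absolute_gain_pts, relative_reduction):
    """Baseline error rate (in points) implied by an absolute accuracy
    gain that equals the given fraction of the baseline error."""
    return absolute_gain_pts / relative_reduction

mpe_baseline = implied_baseline_error(5.8, 0.160)   # MPE-trained GMM-HMM, ~36.3%
ml_baseline = implied_baseline_error(9.2, 0.232)    # ML-trained GMM-HMM, ~39.7%
```

This illustrates why both absolute and relative figures are worth reporting: the same absolute gain means more against a lower baseline error.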
The DARPA Resource Management task is used as the domain to investigate the performance of speaker-independent, speaker-dependent, and speaker-adaptive speech recognition. The authors start from a state-of-the-art speaker-independent speech recognition system, SPHINX, whose error rate on the RM2 test set is 4.3%. They extended SPHINX to speaker-dependent speech recognition, reducing the error rate to 1.4-2.6% with 600-2400 training sentences per speaker, which demonstrates a substantial difference between speaker-dependent and speaker-independent systems. Based on the speaker-independent models, speaker-adaptive speech recognition was also studied: with 40 adaptation sentences per speaker, the error rate can be reduced from 4.3% to 3.1%.

Keywords: Sphinx, Word error rate, Speaker diarisation