Automatic detection of new words in a large vocabulary continuous speech recognition system
9 Citations · 1 Reference · 10 Related Papers
Abstract:
In practical large vocabulary speech recognition systems, it is nearly impossible for a speaker to remember which words are in the vocabulary, so the probability of the speaker using words outside the vocabulary can be quite high. When a speaker uses a new word, current systems will always recognize other words within the vocabulary in place of the new word, and the speaker will not know what the problem is. In this paper, we describe a preliminary investigation of techniques that automatically detect when the speaker has used a word that is not in the vocabulary. We developed a technique that uses a general model for the acoustics of any word to recognize the existence of new words. Using this general word model, we measure the correct detection of new words versus the false alarm rate. Experiments were run using the DARPA 1000-word Resource Management Database for continuous speech recognition. The recognition system used is the BBN BYBLOS continuous speech recognition system (Chow et al., 1987). The preliminary results indicate a detection rate of 74% with a false alarm rate of 3.4%.

Keywords: Word error rate
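The abstract trades detection rate against false-alarm rate by comparing a general ("filler") word model with the in-vocabulary models. As an illustration only (the BYBLOS internals are not described here, and the score fields below are hypothetical), a minimal sketch of that thresholding and the two operating metrics:

```python
def detect_new_words(words, margin=0.0):
    """Flag a word as out-of-vocabulary (OOV) when the general word
    (filler) model scores it better than the best in-vocabulary word
    model by more than `margin` (log-probability units).
    Raising `margin` lowers both detections and false alarms."""
    return [w["filler_score"] - w["best_vocab_score"] > margin for w in words]

def detection_and_false_alarm(words, margin=0.0):
    """Detection rate: fraction of true new words that are flagged.
    False-alarm rate: fraction of in-vocabulary words wrongly flagged."""
    flags = detect_new_words(words, margin)
    oov = [f for f, w in zip(flags, words) if w["is_new"]]
    iv = [f for f, w in zip(flags, words) if not w["is_new"]]
    return sum(oov) / len(oov), sum(iv) / len(iv)
```

Sweeping `margin` over a range traces the detection-versus-false-alarm operating curve behind numbers like the 74% detection at a 3.4% false-alarm rate reported above.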
This study presents the conception and realisation of an automatic speaker-independent speech recognition system using hidden Markov models (HMM). The system recognises the 33 letters of the Amazigh language. The system performs well, identifying spoken Amazigh letters at an 88.44% recognition rate, an acceptable level of accuracy for speech recognition. The tests were based on HMMs with Gaussian mixture distributions, and the Hidden Markov Toolkit (HTK) was used in the implementation and test phases. The word error rate (WER) was initially 29.41% and was reduced to about 11.52% through extensive testing and tuning of the recognition parameters.

Keywords: Word error rate
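Several of the abstracts here report word error rate (WER). It is the word-level edit distance (substitutions + deletions + insertions) between the reference and hypothesis transcripts, divided by the reference length; a minimal dynamic-programming sketch:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed as a standard edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```

Note that WER can exceed 100% when insertions dominate, which is why it is an error rate rather than an accuracy.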
The Context-Dependent Deep-Neural-Network HMM, or CD-DNN-HMM, is a powerful acoustic modeling technique for HMM-based speech recognition systems and can greatly outperform conventional Gaussian-mixture HMMs. We therefore build a CD-DNN-HMM LVCSR system by modifying a mature GMM-HMM system. The baseline CD-DNN-HMM system achieves a word error rate of 18.6%, far better than the 24.9% achieved by the GMM-HMM system. However, the speed of the baseline CD-DNN-HMM system is a major roadblock: its real-time factor reaches 0.72 on the standard NIST 2000 Hub5 evaluation set. In this paper, we implement several optimization algorithms in our baseline system to accelerate recognition. Testing the optimized system on the same evaluation set, we achieve a real-time factor of 0.39, a relative reduction of 45.8%.

Keywords: NIST, Word error rate
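The 45.8% figure quoted above is a relative reduction; for reference, the same arithmetic applied to the quoted WER figures:

```python
def relative_reduction(baseline, improved):
    """Fractional reduction of a rate relative to its baseline value."""
    return (baseline - improved) / baseline

rtf_gain = relative_reduction(0.72, 0.39)   # real-time factor: ~45.8%
wer_gain = relative_reduction(24.9, 18.6)   # word error rate: ~25.3%
```

The real-time-factor reduction matches the abstract; the WER gap (18.6% vs 24.9%) corresponds to roughly a 25% relative improvement.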
In recent years, non-intrusive biometric identification systems have evolved very quickly. Their main advantage over classical systems based on usernames and passwords is that anatomical features are less exposed to theft or loss. In this context, this study deals with speaker recognition, one of the most fashionable biometric technologies. It presents speaker verification and speaker identification experiments carried out on Romanian speech, using the GMM-UBM framework in different configurations. The experiments are performed on a corpus about five times larger than previous attempts for the Romanian language, comprising connected digits uttered in Romanian by over 120 speakers. The results show that the false rejection rate is close to 0% and the false acceptance rate is around 1.50%. For speaker identification, the results are relatively good for the closed-set scenario (identification error < 1%), but not yet acceptable for the open-set scenario. The error rates were computed in different configurations, and error trends were obtained as a function of the main system parameters.

Keywords: Word error rate, Romanian, Identification, Speaker identification, Speaker diarisation
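In the GMM-UBM framework mentioned above, a verification trial is scored by the log-likelihood ratio between the claimed speaker's model and a universal background model (UBM). A minimal sketch with single diagonal-covariance Gaussians standing in for full mixtures (real systems MAP-adapt a many-component GMM from the UBM; the function names and data here are illustrative):

```python
import math

def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def llr_score(frames, speaker_model, ubm):
    """Average per-frame log-likelihood ratio; accept the claimed
    identity when the score exceeds a tuned threshold."""
    return sum(diag_gauss_logpdf(f, *speaker_model) - diag_gauss_logpdf(f, *ubm)
               for f in frames) / len(frames)
```

The decision threshold is what trades false rejections (close to 0% above) against false acceptances (around 1.5%).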
Speaker recognition belongs to the domain of biometrics. This paper deals with speaker recognition using the hidden Markov model (HMM) method. The system recognizes the speaker by translating the speech waveform into a set of feature vectors using the Mel-Frequency Cepstral Coefficients (MFCC) technique. However, input speech signals recorded at different times may contain variations: the same speaker may utter the same word at different speeds, which varies the total number of MFCC vectors. Vector quantization (VQ) is used to normalize the number of MFCC vectors. The HMM provides a highly reliable way of recognizing a speaker; HMMs are widely used and are usually considered a set of states with Markovian properties, with observations generated independently from those states. Viterbi decoding yields the most likely state sequence, which is used for speaker recognition. For a database of 50 speakers in a normal environment, the obtained recognition rate is 98%, which is better than previous methods used for speaker recognition.

Keywords: Mel-frequency cepstrum, Viterbi algorithm, Viterbi decoder, Speaker diarisation, Sequence labeling, Feature (linguistics), Feature vector
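Viterbi decoding, used above to obtain the most likely state sequence, is standard HMM dynamic programming. A self-contained sketch on a toy discrete HMM (toy probabilities, not the speaker-recognition models themselves):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for `obs` under an HMM, by dynamic
    programming. Uses raw probabilities for clarity; real decoders work
    in the log domain to avoid underflow on long utterances."""
    # V[t][s] = (best probability of any path ending in s at time t, that path)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        V.append({})
        for s in states:
            prob, path = max(
                (V[-2][p][0] * trans_p[p][s] * emit_p[s][o], V[-2][p][1] + [s])
                for p in states)
            V[-1][s] = (prob, path)
    return max(V[-1].values())
```

In speaker recognition the per-speaker HMM whose best path scores highest wins; in continuous speech recognition the same recursion runs over phone or word state graphs.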
We propose two methods to improve HMM speech recognition performance: the first applies an adjustment in the training stage, the second in the scoring stage. It is well known that speech recognition performance improves as the amount of labeled training data grows. However, due to factors such as inaccurate phonetic labeling, end-point detection, and voiced-unvoiced decisions, the labeling procedure can be prone to errors. We propose a selective hidden Markov model (HMM) training procedure that reduces the adverse influence of atypical training data on the generated models. To demonstrate its usefulness, selective training is applied to the problem of accent classification, resulting in a 9.4% improvement in classification error rate. The second goal is to improve HMM scoring performance. HMM training algorithms maximize the probability over the training tokens for each model, but this does not guarantee a minimized error rate across the entire model set; biases can typically be observed in the confusion matrices. We propose a method for estimating this bias from the training data and incorporating it into the general scoring algorithm. Using this technique, a 9.8% improvement is achieved in accent classification error rate.

Keywords: Word error rate, Stress, Training set
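The selective-training idea described above (reducing the influence of atypical or mislabeled tokens) can be sketched generically: train an initial model, rank training tokens by their score under it, drop the worst fraction, and retrain. The callables and drop fraction below are illustrative, not the paper's actual procedure:

```python
def selective_training(tokens, train, score, drop_fraction=0.1):
    """Selective training sketch: fit an initial model, score every
    training token under it, discard the lowest-scoring fraction
    (likely mislabeled or atypical), and retrain on the remainder.
    `train(tokens) -> model` and `score(model, token) -> float`
    are user-supplied callables."""
    model = train(tokens)
    ranked = sorted(tokens, key=lambda t: score(model, t))
    keep = ranked[int(len(ranked) * drop_fraction):]
    return train(keep)
```

As a toy check: with a "model" that is just the mean of the tokens and a score that penalizes distance from it, an outlier like 100 among {1, 2, 3} is dropped and the retrained mean becomes 2.0.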
In this paper we look at real-time computing issues in large vocabulary speech recognition, using the French broadcast audio transcription task from ETAPE 2011 for evaluation. We compare word error rate (WER) versus overall computing time for hidden Markov models with Gaussian mixtures (GMM-HMM) and deep neural networks (DNN-HMM). We show that for similar computation during recognition, the DNN-HMM combination is superior to the GMM-HMM. In a real-time computing scenario, the error rate on the ETAPE dev set is 23.5% for the DNN-HMM versus 27.9% for the GMM-HMM: a significant difference in accuracy for comparable computation. Rescoring lattices (generated by the DNN-HMM acoustic model) with a quadgram language model (LM), and then with a neural-network LM, reduces the WER to 22.0% while still running in real time.

Keywords: Word error rate
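The real-time constraint above is conventionally expressed as the real-time factor (RTF): processing time divided by audio duration, with RTF ≤ 1 meaning the recognizer keeps up with the input. A one-line sketch:

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; RTF <= 1.0 means the
    recognizer transcribes at least as fast as the audio plays."""
    return processing_seconds / audio_seconds
```

For example, the CD-DNN-HMM system cited earlier needed 0.72 seconds of computation per second of audio before optimization and 0.39 afterward, both within real time.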
The context-independent deep belief network (DBN) hidden Markov model (HMM) hybrid architecture has recently achieved promising results for phone recognition. In this work, we propose a context-dependent DBN-HMM system that dramatically outperforms strong Gaussian mixture model (GMM)-HMM baselines on a challenging, large vocabulary, spontaneous speech recognition dataset from the Bing mobile voice search task. Our system achieves absolute sentence accuracy improvements of 5.8% and 9.2% over GMM-HMMs trained using the minimum phone error rate (MPE) and maximum likelihood (ML) criteria, respectively, which translate to relative error reductions of 16.0% and 23.2%.
Keywords: Word error rate
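The pairing of absolute accuracy gains (5.8 and 9.2 points) with relative error reductions (16.0% and 23.2%) pins down the baseline sentence error rates, since relative reduction = absolute gain / baseline error. Back-solving is an inference from the quoted numbers, not figures stated in the abstract:

```python
def implied_baseline_error(absolute_gain_pts, relative_reduction):
    """Baseline error rate (in points) implied by an absolute accuracy
    gain that equals the given fraction of the baseline error."""
    return absolute_gain_pts / relative_reduction

mpe_baseline = implied_baseline_error(5.8, 0.160)   # MPE-trained GMM-HMM, ~36.3%
ml_baseline = implied_baseline_error(9.2, 0.232)    # ML-trained GMM-HMM, ~39.7%
```

This illustrates why both absolute and relative figures are worth reporting: the same absolute gain means more against a lower baseline error.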
The DARPA Resource Management task is used as the domain to investigate the performance of speaker-independent, speaker-dependent, and speaker-adaptive speech recognition. The authors start from a state-of-the-art speaker-independent speech recognition system, SPHINX, whose error rate on the RM2 test set is 4.3%. They extended SPHINX to speaker-dependent speech recognition, reducing the error rate to 1.4-2.6% with 600-2400 training sentences per speaker, which demonstrates a substantial difference between speaker-dependent and speaker-independent systems. Based on the speaker-independent models, speaker-adaptive speech recognition was also studied: with 40 adaptation sentences per speaker, the error rate can be reduced from 4.3% to 3.1%.

Keywords: Sphinx, Word error rate, Speaker diarisation