Computer Program to Aid in the Selection and Evaluation of Vocabularies for Word Recognizers
Citations: 0 · References: 0 · Related Papers: 10
Abstract:
The performance of a word recognizer depends upon the vocabulary that it attempts to recognize. A computer program has been written to aid in the selection and evaluation of vocabularies for word recognizers. The program operates by predicting the performance of a given recognizer on a given vocabulary. The words in the vocabulary are treated as strings of phonemes, and it is assumed that the word recognizer being evaluated operates in terms of such strings. The program searches the vocabulary for word pairs that would be confused with each other if the recognizer were unable to make certain discriminations. When a confusable pair is found, it may be added to a list of confusions for later printing, or one word of the pair may be deleted from the vocabulary. The latter mode of operation yields a reduced vocabulary on which the recognizer should perform well. Results are presented for a vocabulary of about 1000 common words.

A word has many senses, and each sense can be mapped to many target words. Therefore, to select a translation with the correct sense, the sense of a source word should be disambiguated before a target word is selected. Based on this observation, we propose a hybrid method for translation selection that combines disambiguation of the source word's sense with selection of a target word. Knowledge for translation selection is extracted from a bilingual dictionary and target-language corpora. By dividing translation selection into these two sub-problems, we make knowledge acquisition straightforward and select more appropriate target words.
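The two-stage method in the preceding related abstract might look like the following sketch. The data structures (sense_inventory, sense_lexicon) are hypothetical stand-ins for the knowledge extracted from a bilingual dictionary and target-language corpora, not the authors' actual representation:

```python
def select_translation(source_word, context, sense_inventory, sense_lexicon):
    """Two-stage translation selection (hypothetical data structures).

    sense_inventory: word -> {sense: set of context clue words}, built
    from a bilingual dictionary; sense_lexicon: (word, sense) -> list of
    (target_word, corpus_score) pairs from target-language corpora.
    """
    # Stage 1: disambiguate the source word by context-clue overlap.
    def overlap(sense):
        return len(sense_inventory[source_word][sense] & set(context))

    sense = max(sense_inventory[source_word], key=overlap)
    # Stage 2: pick the target word the corpora score highest for that sense.
    candidates = sense_lexicon[(source_word, sense)]
    return max(candidates, key=lambda tc: tc[1])[0]
```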
Topics: Word Sense Disambiguation, SemEval · Citations: 7
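Returning to the main abstract: its confusion search can be sketched as follows, assuming words arrive as strings of phoneme symbols and the recognizer's missing discriminations are modeled as a set of indistinguishable phoneme pairs. All names here are hypothetical, and only same-length words are compared, which is a simplification of the program described:

```python
# Phoneme pairs the hypothetical recognizer cannot tell apart.
INDISTINGUISHABLE = {frozenset(("m", "n")), frozenset(("p", "b"))}

def confusable(word_a, word_b):
    """True if two phoneme strings collapse to the same form once
    indistinguishable phonemes are merged."""
    if len(word_a) != len(word_b):
        return False
    return all(
        a == b or frozenset((a, b)) in INDISTINGUISHABLE
        for a, b in zip(word_a, word_b)
    )

def reduce_vocabulary(vocab):
    """Drop one word of every confusable pair, keeping earlier entries,
    yielding a reduced vocabulary the recognizer should handle well."""
    kept = []
    for word in vocab:
        if not any(confusable(word, k) for k in kept):
            kept.append(word)
    return kept

# Words as tuples of phoneme symbols (a toy vocabulary).
vocab = [("m", "ae", "n"), ("p", "ae", "n"), ("b", "ae", "n")]
print(reduce_vocabulary(vocab))  # "b ae n" is dropped: p/b are merged
```

The other mode described in the abstract, printing the confusions instead of deleting words, would simply collect the pairs found by confusable rather than filtering the list.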
Topics: Word list · Citations: 22
Recent work at Bell Laboratories has shown that statistical clustering techniques could be used to provide a reliable set of reference templates for a speaker-independent isolated-word recognition system. The vocabulary on which the system was tested consisted of the 26 letters of the alphabet, the 10 digits (0 to 9), and 3 command words. Since this vocabulary contained a large number of acoustically similar words (e.g., b, c, d, e, g, p, t, v, z), the recognition accuracy on the top candidate was only about 80 percent. In this paper, results are presented for a considerably less difficult 54-word vocabulary of computer terms. Recognition accuracies of 95 to 98 percent were obtained across a wide variety of talkers. These results tend to support the hypothesis that carefully trained speaker-independent word recognizers can perform essentially as well as casually trained speaker-independent systems.
Topics: Word error rate · Citations: 43
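The abstract above does not specify the clustering procedure, so the following is only a sketch of the general idea using plain k-means, assuming each training token of a word has been reduced to a fixed-length feature vector:

```python
import numpy as np

def cluster_templates(tokens, k=2, iters=20, seed=0):
    """Cluster training tokens of one word into k reference templates.

    tokens: (n_tokens, dim) array of fixed-length feature vectors from
    many talkers. Returns k centroids, used as speaker-independent
    reference templates for that word.
    """
    tokens = np.asarray(tokens, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = tokens[rng.choice(len(tokens), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each token to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(tokens[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster is empty.
        for j in range(k):
            members = tokens[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids
```

At recognition time, an unknown token would be compared against every word's templates and assigned to the word owning the nearest one.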
Utterance verification is used in variable-vocabulary word recognition to reject words that are not in the vocabulary or are not correctly recognized. Utterance verification is very important for designing a user-friendly speech recognition system. We propose a new utterance verification algorithm that requires no additional training and is based on the minimum verification error. First, using a PBW (Phonetically Balanced Words) database of 445 words, we generate anti-phoneme models without additional training. Then, for OOV (Out-Of-Vocabulary) rejection, a new confidence measure is designed that uses the likelihood ratio between the phoneme model and the anti-phoneme model. Using the proposed anti-phoneme models and confidence measure, we achieve a significant performance improvement: CA (Correct Acceptance of in-vocabulary words) is about 89% and CR (Correct Rejection of OOV words) is about 90%, an improvement of about 15-21% in ERR (Error Reduction Rate) in the vocabulary-independent case.
Topics: Utterance, Word error rate · Citations: 2
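A minimal sketch of the likelihood-ratio confidence measure described above, assuming the recognizer already provides per-phoneme log-likelihoods under the phoneme models and the anti-phoneme models (the function names and the averaging choice are mine):

```python
def confidence(loglik_phoneme, loglik_anti):
    """Average per-phoneme log-likelihood ratio over the utterance.

    loglik_phoneme / loglik_anti: per-phoneme log-likelihoods under the
    phoneme models and the corresponding anti-phoneme models.
    """
    ratios = [lp - la for lp, la in zip(loglik_phoneme, loglik_anti)]
    return sum(ratios) / len(ratios)

def verify(loglik_phoneme, loglik_anti, threshold=0.0):
    """Accept the recognized word only if its confidence clears a
    threshold; the threshold would be tuned for the CA/CR trade-off."""
    return confidence(loglik_phoneme, loglik_anti) >= threshold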
Topics: Yardstick, British National Corpus · Citations: 16
This work was an initial effort in the use of voice data entry for information data handling. The objective was to develop the technology for a large-vocabulary (1000-word) isolated word recognition system capable of quick adaptation and high accuracy for a limited number of people. Techniques for word boundary detection, noise suppression, and frequency scaling were examined. Tests were conducted on a 1000-word and a 100-word unstructured vocabulary. Recognition accuracies of 30.5% and 66% were obtained for the untrained case, and 62.4% and 90% after training each word once.
Citations: 2
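Of the techniques listed above, word boundary detection is the easiest to illustrate. Below is a sketch of classic short-time-energy endpointing, not the report's actual method; the frame size and threshold are arbitrary assumptions:

```python
import numpy as np

def detect_word_boundaries(samples, rate=8000, frame_ms=20, threshold_db=-35.0):
    """Return (start, end) sample indices of the word via an energy gate.

    Frames whose energy, relative to the loudest frame, exceeds
    threshold_db are treated as speech; the word spans the first through
    the last such frame. Returns None if no frame clears the gate.
    """
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    frames = np.asarray(samples[: n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)
    energy = (frames ** 2).sum(axis=1)
    db = 10.0 * np.log10(energy / (energy.max() + 1e-12) + 1e-12)
    speech = np.nonzero(db > threshold_db)[0]
    if len(speech) == 0:
        return None
    return speech[0] * frame_len, (speech[-1] + 1) * frame_len
```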
The modifications made to a connected word speech recognition algorithm based on hidden Markov models (HMMs) which allow it to recognize words from a predefined vocabulary list spoken in an unconstrained fashion are described. The novelty of this approach is that statistical models of both the actual vocabulary words and the extraneous speech and background are created. An HMM-based connected word recognition system is then used to find the best sequence of background, extraneous speech, and vocabulary word models matching the actual input. Word recognition accuracies of 99.3% on purely isolated speech (i.e., only vocabulary items and background noise were present) and 95.1% when the vocabulary word was embedded in unconstrained extraneous speech were obtained for the five-word vocabulary using the proposed recognition algorithm.
Citations: 407
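The decoding idea above, finding the best sequence of filler and vocabulary word models, can be caricatured with a much simpler dynamic program. This sketch collapses each model to a per-frame log-likelihood and assumes exactly one keyword occurrence, which a real HMM Viterbi decoder would not require:

```python
import numpy as np

def best_keyword_span(ll_keyword, ll_filler):
    """Locate the vocabulary word inside a frame sequence.

    ll_keyword / ll_filler: per-frame log-likelihoods of each frame under
    the keyword model and under a filler model (extraneous speech plus
    background). Scores the segmentation filler | keyword | filler for
    every span and returns (start, end, score) of the best one.
    """
    n = len(ll_keyword)
    fill = np.concatenate(([0.0], np.cumsum(ll_filler)))   # filler prefix sums
    key = np.concatenate(([0.0], np.cumsum(ll_keyword)))   # keyword prefix sums
    best_score, best_span = -np.inf, (0, 0)
    for s in range(n):
        for e in range(s + 1, n + 1):
            score = fill[s] + (key[e] - key[s]) + (fill[n] - fill[e])
            if score > best_score:
                best_score, best_span = score, (s, e)
    return best_span[0], best_span[1], best_score
```

Running this for every vocabulary word and keeping the highest-scoring span gives a crude version of the recognizer's word-plus-filler decoding.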
In practical large-vocabulary speech recognition systems, it is nearly impossible for a speaker to remember which words are in the vocabulary, so the probability of the speaker using words outside the vocabulary can be quite high. When a speaker uses a new word, current systems will always recognize some other word within the vocabulary in its place, and the speaker will not know what the problem is. In this paper, we describe a preliminary investigation of techniques that automatically detect when the speaker has used a word that is not in the vocabulary. We developed a technique that uses a general model for the acoustics of any word to recognize the existence of new words. Using this general word model, we measure the correct detection of new words versus the false alarm rate. Experiments were run on the DARPA 1000-word Resource Management Database for continuous speech recognition, using the BBN BYBLOS continuous speech recognition system (Chow et al., 1987). The preliminary results indicate a detection rate of 74% with a false alarm rate of 3.4%.
Topics: Word error rate · Citations: 9
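The detection rule implied by the abstract above reduces to comparing the best in-vocabulary score against the general word model's score. A minimal sketch, assuming both scores are log-likelihoods (the margin parameter is my addition for tuning):

```python
def detect_new_word(vocab_scores, general_score, margin=0.0):
    """Flag a spoken word as out-of-vocabulary.

    vocab_scores: dict mapping each vocabulary word to its acoustic
    log-likelihood; general_score: log-likelihood under a general
    "any word" model. If the general model wins by more than margin,
    report a new word (None); otherwise return the best vocabulary word.
    """
    best_word = max(vocab_scores, key=vocab_scores.get)
    if general_score - vocab_scores[best_word] > margin:
        return None  # new (out-of-vocabulary) word detected
    return best_word
```

Sweeping margin over held-out data traces out the detection-rate versus false-alarm trade-off the abstract reports.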
Small-vocabulary isolated word recognition systems, as well as large-vocabulary continuous speech recognition systems, are applicable in many areas. For isolated word recognition systems to be deployable in actual applications, the ability to reject out-of-vocabulary words is required. This paper presents a rejection method that uses clustered phoneme models combined with postprocessing by likelihood ratio scoring. Our baseline speech recognizer was based on whole-word continuous HMMs. Six clustered phoneme models were generated from the 45 context-independent Korean phoneme models using a statistical monophone clustering algorithm; the phoneme models were trained on a phonetically balanced Korean speech database. The performance of this method was assessed in terms of the out-of-vocabulary rejection rate and the accuracy on the predefined vocabulary. A performance test on a speaker-independent isolated word recognition task over 22 section names shows that this method is superior to the conventional postprocessing method, which performs rejection according to the likelihood difference between the first and second candidates. Furthermore, these clustered phoneme models do not require retraining for other isolated word recognition systems with different vocabulary sets.
Topics: Word error rate · Citations: 0
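The paper's statistical monophone clustering is not spelled out in the abstract, so here is a generic greedy agglomerative sketch of reducing 45 phoneme models to 6 clusters; the distance function is a hypothetical placeholder for whatever model dissimilarity the authors used:

```python
def cluster_phonemes(models, distance, n_clusters=6):
    """Greedy agglomerative clustering of phoneme models.

    models: dict mapping phoneme name -> model parameters.
    distance: hypothetical function giving a dissimilarity between two
    clusters of models. Repeatedly merges the closest pair of clusters
    until only n_clusters remain.
    """
    clusters = [[name] for name in models]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(clusters[i], clusters[j], models)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters
```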
This report presents our second-place solution to the ECCV 2022 challenge on Out-of-Vocabulary Scene Text Understanding (OOV-ST): Cropped Word Recognition. This challenge is held in the context of the ECCV 2022 workshop on Text in Everything (TiE), and aims to extract out-of-vocabulary words from natural scene images. In the competition, we first pre-train SCATTER on synthetic datasets, then fine-tune the model on the training set with data augmentation. Meanwhile, two additional models are trained specifically for long and vertical texts. Finally, we combine the outputs of models with different layers, different backbones, and different seeds as the final result. Our solution achieves a word accuracy of 59.45% when considering out-of-vocabulary words only.
Topics: Training set · Citations: 0
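The abstract above does not state the fusion rule, so the following is only a simple confidence-weighted vote over the per-model word hypotheses, one plausible way to combine outputs from models with different backbones and seeds:

```python
from collections import defaultdict

def combine_predictions(predictions):
    """Combine word predictions from several recognizers.

    predictions: list of (word, confidence) pairs, one per model.
    Confidences for identical hypotheses are summed and the
    highest-scoring word wins (a weighted vote, not necessarily the
    authors' exact fusion rule).
    """
    votes = defaultdict(float)
    for word, conf in predictions:
        votes[word] += conf
    return max(votes, key=votes.get)

# Example: two of three models agree on "hello".
print(combine_predictions([("hello", 0.9), ("he1lo", 0.6), ("hello", 0.7)]))
```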