Effects of out of vocabulary words in spoken document retrieval.

2000 
The effects of out-of-vocabulary (OOV) items in spoken document retrieval (SDR) are investigated. Several sets of transcriptions were created for the TREC-8 SDR task using a speech recognition system varying the vocabulary sizes and OOV rates, and the relative retrieval performance measured. The effects of OOV terms on a simple baseline IR system and on more sophisticated retrieval systems are described. The use of a parallel corpus for query and document expansion is found to be especially beneficial, and with this data set, good retrieval performance can be achieved even for fairly high OOV rates.