AN OVERVIEW OF THE AT&T SPOKEN DOCUMENT RETRIEVAL SYSTEM

John Choi, Don Hindle, Julia Hirschberg, Ivan Magrin-Chagnolleau, Christine H. Nakatani, Fernando Pereira, Amit Singhal, Steve Whittaker
Florham Park, NJ

1998

We present an overview of a spoken document retrieval system developed at AT&T Labs-Research for the HUB4 Broadcast News corpus. The overview describes the system's intonational phrase boundary detection, classification, speech recognition, information retrieval, and user interface components, and reports updated system assessments on the 49-query task defined for the TREC-6 SDR track. We also present results from a comparative ranking study whose queries are taken from AP Newswire headlines from the same period in which the Broadcast News corpus was recorded. For this AP task, retrieval accuracy is assessed by comparing the documents retrieved from ASR-generated transcriptions with those retrieved from human-generated transcriptions.
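The abstract does not state which measure is used to compare the two retrieval runs. As a minimal sketch, assuming a simple top-k overlap measure that treats the documents retrieved from the human transcriptions as the reference set, the comparison for one query (and the average over queries) could look like this. The function names, the choice of k, and the document IDs are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def overlap_at_k(reference_ranking: Sequence[str],
                 asr_ranking: Sequence[str],
                 k: int) -> float:
    """Hypothetical measure: fraction of the top-k documents retrieved from
    human transcriptions that also appear in the top-k retrieved from ASR
    transcriptions (i.e. precision at k against the human-transcript run)."""
    if k <= 0:
        raise ValueError("k must be positive")
    ref_top = set(reference_ranking[:k])
    asr_top = set(asr_ranking[:k])
    if not ref_top:
        return 0.0
    return len(ref_top & asr_top) / len(ref_top)


def mean_overlap_at_k(runs: Sequence[Tuple[Sequence[str], Sequence[str]]],
                      k: int) -> float:
    """Average overlap_at_k over (reference_ranking, asr_ranking) pairs,
    one pair per query."""
    if not runs:
        return 0.0
    return sum(overlap_at_k(ref, asr, k) for ref, asr in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical rankings for one query, best document first.
    human_run = ["d1", "d2", "d3", "d4"]
    asr_run = ["d2", "d5", "d1", "d3"]
    # Two of the three reference documents in the top 3 are recovered.
    print(overlap_at_k(human_run, asr_run, 3))
```

A rank-sensitive measure (for example, average precision against the reference run) would penalize ASR errors that push documents down the list rather than out of it; plain set overlap is used here only because it is the simplest instance of the comparison the abstract describes.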

Keywords:

Document retrieval
Transcription (linguistics)
Human–computer information retrieval
Speech recognition
Ranking
Information retrieval
User interface
Document clustering
Phrase
Visual Word
Computer science
Artificial intelligence
Natural language processing
