The 2003 ISL rich transcription system for conversational telephony speech
2004
This paper describes the ISL large vocabulary conversational telephony speech recognition system, which was tested in NIST's RT-03S ("Switchboard") evaluation. We present our experiments on improving preprocessing, acoustic modelling, and language modelling. The system features phone-dependent semi-tied full covariances, semi-tied clustering of septa-phones, clustering across phones, feature adaptive training, robust estimation of VTLN and MLLR, as well as context-dependent interpolation of language models. We present detailed results for each stage of our multi-pass transcription scheme. System development started from a 1997 Switchboard (SWB) system, which had a word error rate of 35.1% on our internal one-hour development set. The final system reached 21.8% on the same set, a 38% relative improvement. The word error rate on the RT-03 CTS evaluation set is 23.4%.
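The relative improvement quoted above follows from the two development-set word error rates. A minimal sketch of that arithmetic, where the function name `relative_reduction` is a hypothetical helper chosen for illustration and not something taken from the paper:

```python
def relative_reduction(baseline_wer: float, final_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - final_wer) / baseline_wer


# Development-set figures from the abstract, in percent WER.
dev_gain = relative_reduction(35.1, 21.8)
print(f"{dev_gain:.1%}")  # rounds to the 38% stated in the abstract
```

The 23.4% evaluation-set figure is not compared here, since the abstract gives no baseline error rate on that set.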