A progress report of the Taiwan Mandarin radio speech corpus project

Yuan-Fu Liao, Yung hsiang Shawn Chang, Sing-yue Wang, Jhih-wei Chen, Sheng-Ming Wang, Jenq-Haur Wang

2017

The Taiwan Mandarin Radio Speech Corpus contains 300 hours (and growing) of high-quality recordings selected from the archive of Taiwan's National Education Radio (NER). The corpus comprises speech in a variety of speaking styles, produced by hundreds of speakers, together with the corresponding transcriptions (automatically generated and then manually corrected) and annotations, making it suitable for speech and language research. In this paper, we report progress on the corpus development and, in particular, present experimental results on audio event detection/segmentation and semi-supervised acoustic model training on this corpus.

Keywords:

Speech recognition
Transcription (linguistics)
Acoustic model
Mandarin Chinese
Speech corpus
Computer science
Segmentation
Natural language processing
Speaking style
Artificial intelligence
Language research
National education
