A progress report of the Taiwan Mandarin radio speech corpus project

2017 
The Taiwan Mandarin Radio Speech Corpus contains 300 (and growing) hours of high-quality recordings selected from Taiwan's National Education Radio (NER) archive. The corpus features speech (of various speaking styles, produced by hundreds of speakers) and their corresponding transcriptions (automatically transcribed and manually corrected) and annotations, which are suitable for speech and language research. In this paper, we report the progress of the corpus development and especially show the experimental results of audio event detection/segmentation and semi-supervised acoustic model training on this corpus.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    5
    References
    5
    Citations
    NaN
    KQI
    []