Automatic transcription of the Polish newsreel

2019 
This paper describes an automatic transcription system for the Polish Newsreel, which is a collection of mid to late 20th century news segments presented in audio and video form. They are characterized by their use of archaic language and poor audio quality, which makes them a demanding problem for speech recognition systems. Acoustic and language models had to be retrained using data from in-domain corpora. During the adaptation of the models, experiments were carried out to select optimal adaptation parameters. The experiments showed that the adaptation of the speech recognition system to a narrow and clearly defined domain significantly increases its efficiency. The final word error rate obtained for this domain was 10.97%
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    9
    References
    0
    Citations
    NaN
    KQI
    []