Optical Character Recognition for Audio-Visual Broadcast Transcription System

Josef Chaloupka, Karel Palecek, Petr Cerva, Jindřich Ždánský

2020

This paper investigates the use of optical character recognition (OCR) in a system for audio-visual broadcast transcription. Characters were recognized from video frames using the open-source OCR program Tesseract. Since version 4, OCR in Tesseract has been based on recurrent neural networks (RNNs), and the program post-processes the recognized text with a bigram language model. However, the resulting text still contains a number of errors: in some images, the text is either recognized incorrectly or not detected at all. We designed and tested image pre-processing and text post-processing methods to reduce these OCR errors. Together, they reduced the word error rate (WER) from 29.4% to 15.4%.
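The WER figures above follow the usual definition WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the recognized text into the reference, and N is the number of reference words. The following is a minimal Python sketch of that computation via word-level edit distance, assuming whitespace tokenization and case-sensitive matching; the paper does not state its exact scoring conventions, and the function name and example strings are hypothetical, not taken from the paper.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Assumption: words are whitespace-separated and compared case-sensitively.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion of a reference word
                dist[i][j - 1] + 1,         # insertion of an extra word
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical OCR output for an on-screen caption (invented example, not paper data).
    reference = "breaking news from the parliament today"
    hypothesis = "breakinq news from parliament tod ay"
    print(f"WER = {word_error_rate(reference, hypothesis):.3f}")
```

In the example, the OCR output needs four edits against six reference words (two substitutions, one deletion, one insertion), giving a WER of about 0.667. Because insertions are counted, WER can exceed 100% for very noisy output. Read against this definition, the reported drop from 29.4% to 15.4% is 14.0 percentage points, a relative reduction of roughly 47.6%.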

Keywords:

Bigram
Language model
Tesseract
Optical character recognition
Recurrent neural network
Speech recognition
Audio-visual
Word error rate
Broadcasting
Computer science
