Text Line Detection in Corrupted and Damaged Historical Manuscripts

Irina Rabaev, Ofer Biller, Jihad El-Sana, Klara Kedem, Its'hak Dinstein

2013

Most algorithms proposed for text line detection take binary images as input. For severely degraded documents, however, binarization often introduces significant noise and other artifacts. In this work we present a novel method that detects text lines directly in grayscale images. The method consists of two stages. In the first stage, potential characters are detected by analyzing the evolution maps of connected components obtained with a sliding threshold. In the second stage, the detected potential characters are grouped into text lines using a sweep-line approach. The method is especially effective on torn and damaged documents that other algorithms cannot handle.
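
The two stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the evolution maps are analyzed or how the sweep line groups characters, so the sketch simply counts connected components at each threshold as a crude stand-in for an evolution map, and groups component bounding boxes by vertical overlap in a top-to-bottom sweep. All function names (`components_below`, `component_counts`, `group_into_lines`), the 4-connectivity choice, and the toy image are assumptions made for illustration.

```python
from collections import deque


def components_below(img, t):
    """Bounding boxes (top, left, bottom, right) of 4-connected regions
    whose pixels are darker than threshold t (dark ink on light paper)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or img[y][x] >= t:
                continue
            # Breadth-first flood fill from an unvisited dark pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and img[ny][nx] < t:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def component_counts(img, thresholds):
    """Number of components at each threshold (simplified evolution record)."""
    return [len(components_below(img, t)) for t in thresholds]


def group_into_lines(boxes):
    """Sweep from top to bottom; a box joins the current line if its top
    edge does not lie below that line's current bottom edge."""
    lines = []
    for box in sorted(boxes):
        if lines and box[0] <= lines[-1]["bottom"]:
            lines[-1]["boxes"].append(box)
            lines[-1]["bottom"] = max(lines[-1]["bottom"], box[2])
        else:
            lines.append({"bottom": box[2], "boxes": [box]})
    return [line["boxes"] for line in lines]


# Toy grayscale image: 255 is paper, lower values are ink of varying darkness.
W = 255
image = [
    [W, W, W, W, W, W, W],
    [W, 40, W, 60, W, 50, W],
    [W, 40, W, 60, W, 50, W],
    [W, W, W, W, W, W, W],
    [W, 90, 90, W, 30, W, W],
    [W, W, W, W, W, W, W],
]

counts = component_counts(image, [35, 55, 100])
text_lines = group_into_lines(components_below(image, 100))
print(counts, [len(line) for line in text_lines])
```

As the threshold rises, fainter strokes join the set of components, which is the kind of information a sliding threshold exposes without committing to a single binarization. A real evolution-map analysis would track how individual components grow and merge rather than only counting them.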

Keywords:

Binary image
Computer vision
Computer science
Artificial intelligence
Object detection
Grayscale
Pattern recognition
Connected component
Text mining
Two stages
Document image processing
