Combining Learned Script Points and Combinatorial Optimization for Text Line Extraction

Joan Pastor-Pellicer, Angelika Garz, Rolf Ingold, María José Castro Bleda

2015

Complex layouts, curved text lines, heterogeneous backgrounds, noise, and clutter still make text line extraction from historical documents a challenging task on which traditional methods do not excel. We propose a novel text line extraction method with two contributions: first, text-specific interest points extracted by supervised machine learning; and second, a reformulation of bottom-up text line aggregation as a noise-robust combinatorial optimization problem. In a final step, unsupervised clustering eliminates invalid text lines. Because the method builds on interest points and poses aggregation as a global optimization problem, it can detect text lines with arbitrary orientation and curvature and is robust to noise and clutter. Experimental evaluations on the IAM Saint Gall dataset show promising results.
