Combining Learned Script Points and Combinatorial Optimization for Text Line Extraction

2015 
Complex layouts, curved text lines, heterogeneous backgrounds, noise, and clutter still make text line extraction in historical documents a challenging task at which traditional methods do not excel. We propose a novel text line extraction method with two contributions: first, text-specific interest points extracted by supervised machine learning; and second, a reformulation of bottom-up text line aggregation as a noise-robust combinatorial optimization problem. In a final step, unsupervised clustering eliminates invalid text lines. Because the method is built on interest points and poses aggregation as a global optimization problem, it can detect text lines of arbitrary orientation and curvature and is robust to noise and clutter. Experimental evaluations on the IAM Saint Gall dataset show promising results.
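The abstract does not state how the aggregation step is formulated. As an illustration only, the sketch below substitutes one common combinatorial formulation, min-cost linear assignment: each interest point may link to at most one successor on its right and receive at most one predecessor, and a global optimum decides which links survive. The interest points are assumed to be already detected, and the angle-penalized link cost, `max_angle_deg`, and `no_link_cost` are hypothetical choices, not the paper's.

```python
import math

BIG = 1e9  # finite "forbidden" cost; avoids inf - inf in the potential updates


def hungarian(cost):
    """Min-cost assignment of each row to a distinct column (rows <= columns).

    Classic potentials-based Hungarian algorithm; returns the column chosen per row.
    """
    n, m = len(cost), len(cost[0])
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)
    p, way = [0] * (m + 1), [0] * (m + 1)  # p[j]: 1-based row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [math.inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path back to column 0
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    result = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            result[p[j] - 1] = j - 1
    return result


def link_cost(a, b, max_angle_deg=30.0):
    """Hypothetical cost of linking point a to point b: distance scaled by slope."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx <= 0:
        return BIG  # only left-to-right links, so chains can never form cycles
    angle = math.degrees(math.atan2(abs(dy), dx))
    if angle > max_angle_deg:
        return BIG  # too steep to belong to the same text line
    return math.hypot(dx, dy) * (1.0 + angle / max_angle_deg)


def aggregate_lines(points, no_link_cost=50.0):
    """Group (x, y) points into chains via one global assignment problem."""
    n = len(points)
    if n == 0:
        return []
    # Rows: points as predecessors. Columns 0..n-1: points as successors;
    # columns n..2n-1: "no successor" slots, each costing no_link_cost.
    cost = [
        [link_cost(points[i], points[j]) for j in range(n)] + [no_link_cost] * n
        for i in range(n)
    ]
    assignment = hungarian(cost)
    succ = {i: j for i, j in enumerate(assignment) if j < n}
    has_pred = set(succ.values())
    lines = []
    for start in range(n):
        if start in has_pred:
            continue  # not the leftmost point of its chain
        chain = [start]
        while chain[-1] in succ:
            chain.append(succ[chain[-1]])
        lines.append(chain)
    return lines


pts = [(0, 0), (10, 1), (20, 0), (30, 1),      # first line
       (0, 40), (10, 41), (20, 40), (30, 39),  # second line
       (15, 20)]                               # isolated noise point
print(aggregate_lines(pts))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8]]
```

Because every point can always fall back to a "no successor" column at `no_link_cost`, a link is kept only when it is cheaper than that threshold and does not conflict with a better link elsewhere, which is what makes the decision global rather than greedy. Isolated points such as the noise blob end up as singleton chains; the paper removes invalid lines with unsupervised clustering, which this sketch does not reproduce.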