Generic method for grid line detection and removal in scanned documents

2018 
The detection and extraction of writing grid lines (WGL) in document images is an important task for a wide variety of systems. It is a pre-processing operation that aims to clean up the document image and make the recognition process easier. Much work has been done on staff line extraction in the context of Optical Music Recognition, and two competitions on this topic were recently held at the 2011 and 2013 ICDAR/GREC conferences. The method proposed in this paper aims to remove WGL without degrading the content. The whole method is based on estimating the line_space (inter-line spacing) and line_height parameters, and on using run-length segments to locate WGL points. These points are then grouped together to form longer lines. Missing points are estimated using a linear model and the context of adjacent lines. We show that our method depends neither on the nature of the writing (printed or handwritten) nor on the script (musical symbols, Latin or Arabic writing). The results obtained are close to the state of the art on undeformed documents. Furthermore, our method outperforms the other methods we were able to test on our grid image datasets.
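Two ingredients named in the abstract, run-length estimation of the line metrics and linear estimation of missing line points, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a binarized image stored as a NumPy array with 1 for ink and 0 for background. It takes line_height as the most frequent vertical black run and line_space as the most frequent vertical white run, which is the classic staff-line heuristic from Optical Music Recognition. The abstract does not say whether line_space is the white gap or the center-to-center distance. The sketch also fits one straight line per grid line and ignores the context of adjacent lines that the paper also uses. All function names are hypothetical.

```python
from collections import Counter

import numpy as np


def vertical_runs(column):
    """Return (value, length) pairs for consecutive runs in one image column."""
    runs = []
    start = 0
    for i in range(1, len(column) + 1):
        # Close the current run at the end of the column or when the value changes.
        if i == len(column) or column[i] != column[start]:
            runs.append((int(column[start]), i - start))
            start = i
    return runs


def estimate_line_metrics(binary):
    """Assumed heuristic: line_height = modal black run length,
    line_space = modal white run length (1 = ink, 0 = background)."""
    black, white = Counter(), Counter()
    for x in range(binary.shape[1]):
        for value, length in vertical_runs(binary[:, x]):
            (black if value == 1 else white)[length] += 1
    line_height = black.most_common(1)[0][0]
    line_space = white.most_common(1)[0][0]
    return line_height, line_space


def fill_missing_points(xs, ys, missing_xs):
    """Fit y = a*x + b by least squares on the detected points of one
    grid line, then predict y at the x positions where no point was found."""
    a, b = np.polyfit(xs, ys, 1)
    return [a * x + b for x in missing_xs]


# Synthetic grid: four horizontal lines, 2 px thick, separated by 8 px gaps.
img = np.zeros((40, 10), dtype=np.uint8)
for row in (5, 15, 25, 35):
    img[row:row + 2, :] = 1
print(estimate_line_metrics(img))  # (2, 8)
```

The border white runs above the first line and below the last one occur only once per column, so on a regular grid they do not displace the inter-line gap as the mode.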