AColDPS: Robust and Unsupervised Automatic Color Document Processing System

2015 
This paper presents the first fully automatic color analysis system suited to business documents. Our pixel-based approach relies mainly on color morphology and requires no training, manual assistance, prior knowledge, or model. We developed a robust color segmentation system adapted to invoices and forms with significant color complexity and dithered backgrounds. The system performs several operations: it automatically segments color images, separates text from noise and graphics, and reports the color of the text. The contribution of our work is threefold. First, we use color morphology to segment text and inverted text simultaneously. The system processes inverted and non-inverted text automatically using conditional color dilation and erosion, even where the two overlap. Second, we extract geodesic measures using morphological convolution in order to separate text, noise, and graphical elements. Third, we develop a method to disconnect characters that touch or overlap graphical elements. The system can separate characters that touch straight lines, split overlapping characters of different colors, and separate characters from graphics when their colors differ. A color analysis stage automatically calculates the number of character colors. The proposed system is generic enough to process a wide range of digitized business document images from different origins. It outperforms the classical approach based on binarization of greyscale images.
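The abstract names conditional dilation as a core operation but does not give its details. As a rough illustration only, the following minimal sketch shows the standard binary form of conditional (geodesic) dilation: a marker is grown repeatedly, but only inside a mask, until it stops changing. It is not the authors' color operator; the function names `dilate` and `conditional_dilate`, the 3x3 structuring element, and the use of binary grids instead of color images are all assumptions made for this example.

```python
from typing import List

Grid = List[List[int]]  # binary image as rows of 0/1 values (assumption: not color)


def dilate(img: Grid) -> Grid:
    """Binary dilation with a 3x3 square structuring element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # A pixel is set if it or any of its 8 neighbours is set.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx]:
                        out[y][x] = 1
    return out


def conditional_dilate(marker: Grid, mask: Grid) -> Grid:
    """Grow the marker inside the mask until the result is stable."""
    # Start from the part of the marker that lies inside the mask.
    current = [[m & k for m, k in zip(mr, kr)] for mr, kr in zip(marker, mask)]
    while True:
        grown = dilate(current)
        # Restrict each dilation step to the mask (the "condition").
        nxt = [[g & k for g, k in zip(gr, kr)] for gr, kr in zip(grown, mask)]
        if nxt == current:
            return current
        current = nxt
```

Given a mask with two separate blobs and a marker placed in one of them, the result contains only the blob the marker touches. A color variant would replace the binary AND with a color-dependent condition, which the abstract does not specify.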