Business Forms Classification Using Earth Mover's Distance

2014 
Form classification has received little research attention over the last decade. Unfortunately, the algorithms published mainly in the 1980s and 1990s do not meet the requirements of our present commercial document analysis projects. There we are confronted with conditions and requirements unanticipated by that research, such as fax distortions and, even worse, form variations. In this work we introduce a new color-coded, pixel-based form classification method using Earth Mover's Distance (EMD) that is robust against fax distortions and content variations. Experimental results demonstrate the effectiveness of the presented method: it achieved more than 90% classification accuracy on a real-world business forms dataset, which is significantly better than the competing state-of-the-art methods.
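To make the role of EMD concrete, the following is a minimal sketch of nearest-template classification with a one-dimensional Earth Mover's Distance. It is not the method of this paper: the abstract does not describe the color coding or the pixel-level features, so the histogram inputs, the function names (`emd_1d`, `classify`), and the template labels are all hypothetical. The sketch only illustrates why EMD tolerates small shifts: mass moved to a neighboring bin costs little, whereas a bin-by-bin comparison would treat it as a full mismatch.

```python
from itertools import accumulate


def normalize(hist):
    """Scale a non-negative histogram so its bins sum to 1."""
    total = sum(hist)
    if total == 0:
        raise ValueError("histogram has no mass")
    return [h / total for h in hist]


def emd_1d(p, q):
    """Earth Mover's Distance between two 1-D histograms of equal length.

    With unit ground distance between adjacent bins, the EMD of two
    normalized histograms equals the sum of absolute differences of
    their cumulative distributions.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    p, q = normalize(p), normalize(q)
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def classify(query, templates):
    """Return the label of the template histogram closest to the query.

    `templates` maps a (hypothetical) form label to its histogram.
    """
    return min(templates, key=lambda label: emd_1d(query, templates[label]))


# Illustrative data only: per-column ink counts for two made-up form types.
templates = {"invoice": [4, 0, 1, 0], "order": [0, 1, 0, 4]}
print(classify([3, 1, 1, 0], templates))
```

In this toy example the query differs from the "invoice" template only by a small amount of mass moved one bin to the right, so its EMD to that template stays small and it is assigned to that class.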