A comparison of binarization methods for historical archive documents

2005 
This paper compares several alternative binarization algorithms for historical archive documents by evaluating their effect on end-to-end word recognition performance in a complete archive document recognition system utilising a commercial OCR engine. The algorithms evaluated are: global thresholding; Niblack's and Sauvola's algorithms; adaptive versions of Niblack's and Sauvola's algorithms; and Niblack's and Sauvola's algorithms applied to background-removed images. We found that, for our archive documents, Niblack's algorithm can achieve better performance than Sauvola's (which has been described as an evolution of Niblack's algorithm), and that it also achieved better performance than the internal binarization provided as part of the commercial OCR engine.
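
For readers unfamiliar with the thresholding rules named above, the following is a minimal sketch of global thresholding and of the standard textbook forms of Niblack's rule (T = m + k·s) and Sauvola's rule (T = m·(1 + k·(s/R − 1))), where m and s are the mean and standard deviation of a local window. It is not the paper's implementation: the parameter values (k = -0.2, k = 0.5, R = 128, a window radius of 1, a global threshold of 128) are commonly quoted defaults assumed here for illustration, and all function names are hypothetical. The adaptive and background-removal variants evaluated in the paper are not shown.

```python
from statistics import mean, pstdev


def local_stats(img, r, c, radius):
    """Mean and population std. dev. of the window centred on (r, c).

    The window is clipped at the image border (an assumption of this sketch).
    """
    rows, cols = len(img), len(img[0])
    window = [
        img[i][j]
        for i in range(max(0, r - radius), min(rows, r + radius + 1))
        for j in range(max(0, c - radius), min(cols, c + radius + 1))
    ]
    return mean(window), pstdev(window)


def niblack_threshold(m, s, k=-0.2):
    """Niblack's rule: T = m + k * s (k = -0.2 is an assumed default)."""
    return m + k * s


def sauvola_threshold(m, s, k=0.5, R=128.0):
    """Sauvola's rule: T = m * (1 + k * (s / R - 1)) (k, R assumed defaults)."""
    return m * (1.0 + k * (s / R - 1.0))


def binarize(img, threshold_fn, radius=1):
    """Local binarization: 1 marks foreground (dark ink), 0 marks background."""
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            m, s = local_stats(img, r, c, radius)
            row.append(1 if img[r][c] < threshold_fn(m, s) else 0)
        out.append(row)
    return out


def global_binarize(img, t=128):
    """Global thresholding with a single fixed threshold t (assumed value)."""
    return [[1 if p < t else 0 for p in row] for row in img]


# Toy 3x3 greyscale image: one dark "ink" pixel on a light background.
page = [
    [200, 200, 200],
    [200, 50, 200],
    [200, 200, 200],
]
niblack_result = binarize(page, niblack_threshold)
sauvola_result = binarize(page, sauvola_threshold)
global_result = global_binarize(page)
```

On this toy image all three methods mark only the centre pixel as foreground. The methods diverge on real archive pages with uneven illumination or staining, which is the situation the comparison in the paper addresses.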