A Document Analysis System Based on Text Line Matching of Multiple OCR Outputs

Yasuaki Nakano, Toshihiro Hananoi, Hidetoshi Miyao, Minoru Maruyama, Kenichi Maruyama

2004

It is well known that integrating the outputs of multiple OCR systems can give higher performance than a single OCR system. This idea has been applied to printed Japanese character recognition, where it improved performance. In conventional experiments, however, zoning, i.e. the extraction of text regions, was done manually, which is a serious problem from a practical point of view. To solve this problem, an approach that automatically matches the classified regions output by multiple OCR systems is proposed. With the proposed method, a recognition rate of 98.8% was obtained from OCR systems whose individual performance is no better than 97.6%.
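The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each OCR system returns a list of text-line strings, pairs lines across two outputs by string similarity, and then combines already aligned, equal-length lines by a per-character majority vote. The function names `match_lines` and `vote_chars` and the similarity threshold of 0.6 are hypothetical choices made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def match_lines(lines_a, lines_b, threshold=0.6):
    """Greedily pair each line of lines_a with its most similar unused line of lines_b.

    Hypothetical helper: similarity is difflib's ratio, and lines scoring
    below `threshold` are left unmatched.
    """
    pairs = []
    used = set()
    for i, a in enumerate(lines_a):
        best_j, best_score = None, threshold
        for j, b in enumerate(lines_b):
            if j in used:
                continue
            score = SequenceMatcher(None, a, b).ratio()
            if score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            used.add(best_j)
            pairs.append((i, best_j))
    return pairs


def vote_chars(candidates):
    """Per-position majority vote over equal-length candidate strings.

    Assumption: the candidates are already aligned character by character.
    Ties are resolved in favour of the earliest candidate.
    """
    if len({len(c) for c in candidates}) != 1:
        raise ValueError("candidates must have equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*candidates))


if __name__ == "__main__":
    ocr_a = ["document analysis", "line matching"]
    ocr_b = ["1ine matching", "docurnent analysis"]
    print(match_lines(ocr_a, ocr_b))
    print(vote_chars(["line matching", "1ine matching", "line rnatching"[:13]]))
```

Greedy pairing keeps the sketch short; a real system would likely also use the geometric position of each region and handle insertions and deletions when aligning characters, neither of which is modelled here.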

Keywords:

Machine learning
Optical character recognition
Real-time computing
Document Structure Description
Speech recognition
Artificial intelligence
Computer science
Text mining
Pattern recognition
character recognition
document analysis
line matching
