Indexing for a Digital Library of George Washington’s Manuscripts: A Study of Word Matching Techniques

2002 
In a multimedia world, one would like electronic access to all kinds of information. However, much important information still exists only on paper, and accessing or navigating it efficiently remains a challenge even after it has been scanned. The previously proposed “word spotting” idea is an approach for accessing and navigating a collection of handwritten documents, available as images, using an index that is generated automatically by matching words as pictures. The most difficult task in this approach is matching the word images: the quality of the aged documents and the variations in handwriting make it a challenging problem. Here we present a number of word matching techniques, along with new normalization methods that are crucial for their success. Efficient pruning techniques, which quickly reduce the set of possible matches for a given word, are also discussed. Our results show that the best of the discussed matching algorithms achieves an average precision of 73% for documents of reasonable quality.