Document image decoding in the UC Berkeley Digital Library

1996 
The UC Berkeley Environmental Digital Library Project is one of six university-led projects initiated in the fall of 1994 as part of a four-year digital library initiative sponsored by the NSF, NASA, and ARPA. The Berkeley project is particularly interesting from a document image analysis perspective because its testbed collection consists almost entirely of scanned materials. As a result, the Berkeley project makes extensive use of document recognition and other image analysis technology to provide content-based access to the collection. The Document Image Decoding (DID) group at Xerox PARC is a member of the Berkeley team and is investigating how DID techniques can be applied to produce high-quality (accurate and properly structured) transcriptions of the scanned documents in the collection. This paper briefly describes the Berkeley project, discusses some of its recognition requirements, and presents examples of online structured documents created using DID technology.

© (1996) COPYRIGHT SPIE--The International Society for Optical Engineering. Downloading of the abstract is permitted for personal use only.