Computer-assisted transcription of a historical botanical specimen book: organization and process overview

2014 
We describe a protocol, based on Handwritten Text Recognition (HTR) technology, for the computer-assisted transcription of a 17th-century botanical specimen book. Here we focus on the organization and coordination aspects of this protocol and outline the related technical issues. Using the proposed protocol, full ground-truth data has been produced for the first chapter of the book, and high-quality transcripts are being obtained cost-effectively for the rest of its approximately 1000 pages. The process comprises two main computer-assisted steps: image layout analysis and transcription. Layout analysis is based on a semi-supervised incremental approach, and transcription makes use of an interactive-predictive HTR prototype known as CATTI. Currently, the first step has been completed for the full book and the second step is close to completion. Ultimately, all the data produced will be made publicly available for research and development.
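To illustrate what "interactive-predictive" transcription means in general terms, the following is a minimal, hypothetical sketch of a prefix-based correction loop; it is not the CATTI implementation. It assumes the recognizer's output for a text line is available as a small scored N-best list (real systems typically search a word graph instead), and it simulates the user with a reference transcript. At each step the system proposes the best hypothesis consistent with the words validated so far; the simulated user corrects the first wrong word, which extends the validated prefix. The names `predict_suffix` and `simulate_session` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical representation: (list of words, recognition score); higher score is better.
Hypothesis = Tuple[List[str], float]


def predict_suffix(nbest: List[Hypothesis], prefix: List[str]) -> List[str]:
    """Return the best-scoring hypothesis that starts with the validated prefix.

    If no hypothesis is compatible with the prefix, fall back to the prefix alone.
    """
    compatible = [(words, score) for words, score in nbest
                  if words[:len(prefix)] == prefix]
    if not compatible:
        return list(prefix)
    best_words, _ = max(compatible, key=lambda h: h[1])
    return list(best_words)


def simulate_session(nbest: List[Hypothesis],
                     reference: List[str]) -> Tuple[List[str], int]:
    """Simulate a user who fixes the first wrong word until the line is correct.

    Returns the final transcript and the number of word corrections made.
    Invariant: the validated prefix always equals reference[:len(prefix)].
    """
    prefix: List[str] = []
    corrections = 0
    while True:
        hyp = predict_suffix(nbest, prefix)
        if hyp == reference:
            return hyp, corrections
        # Locate the first word (after the validated prefix) that disagrees.
        i = len(prefix)
        while i < len(hyp) and i < len(reference) and hyp[i] == reference[i]:
            i += 1
        if i == len(reference):
            # The hypothesis runs past the end of the line: the user marks the end.
            return list(reference), corrections + 1
        # The user types the correct word; everything up to it is now validated.
        prefix = list(reference[:i + 1])
        corrections += 1


if __name__ == "__main__":
    nbest = [(["the", "herbal", "booke"], 0.9),
             (["the", "herball", "book"], 0.5),
             (["a", "herball", "book"], 0.2)]
    print(simulate_session(nbest, ["the", "herball", "book"]))
```

In this toy run a single correction ("herbal" to "herball") lets the system complete the rest of the line from its second hypothesis, which is the kind of effort saving an interactive-predictive approach aims for compared with correcting every recognition error by hand.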