Textual restoration of occluded Tibetan document pages based on side-enhanced U-Net

2020 
Recognizing the content of occluded Tibetan document pages is very challenging because the documents have not been digitized and have been kept in long-term storage. Multiple pages are stuck together and their textual characters occlude one another, which makes restoration difficult. Given the large volume of Tibetan documents, it is impractical for professionals to separate and repair these occluded pages manually. The separation of overlapping pages and the restoration of occluded pages therefore play an important role in the digitization of Tibetan documents. We extract the underlying pages by show-through scanning and by eliminating the text area of the top pages. To restore the occluded underlying pages, we present a side-enhanced U-Net (SEU-Net), which attaches a side feature extraction module and a side classification module to the U-Net to improve the classification of textual edges. Experiments on a dataset of Tibetan document restoration patches show that SEU-Net classifies the textual pixels in the occluded pages accurately, and that the side feature extraction module and the side classification module each improve performance independently.
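The abstract does not say how the "textual edges" are defined or labelled. As a minimal illustration only, the sketch below assumes that an edge pixel is a text pixel with at least one non-text 4-neighbour (pixels outside the patch count as non-text) and derives such an edge map from a binary text mask. The function name `edge_mask` and this definition are hypothetical and are not taken from SEU-Net.

```python
from typing import List

Mask = List[List[int]]  # 1 = text pixel, 0 = background pixel


def edge_mask(mask: Mask) -> Mask:
    """Return a mask marking text pixels that border a non-text pixel.

    Assumption (not from the paper): edges are defined with the
    4-neighbourhood, and pixels outside the patch count as background.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x] != 1:
                continue  # only text pixels can be edge pixels
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < height and 0 <= nx < width)
                if outside or mask[ny][nx] == 0:
                    edges[y][x] = 1
                    break
    return edges


if __name__ == "__main__":
    # A 3x3 block of text pixels centred in a 5x5 patch.
    patch = [[0] * 5] + [[0, 1, 1, 1, 0] for _ in range(3)] + [[0] * 5]
    for row in edge_mask(patch):
        print(row)
```

An edge map of this kind could serve as a separate target for an edge-focused branch, but whether SEU-Net's side classification module is trained this way is not stated in the abstract.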