A Table Segmentation and Text Information Extraction Method for Power Work Ticket

2021 
Manually extracting information from power work tickets is time-consuming and tedious. As optical character recognition (OCR) technology has matured, automatically extracting information from electronic documents has become feasible. However, power work tickets contain a large amount of tabular data, and the complex table structure often scrambles the recognition results. This paper proposes a method for automatically extracting text information from work tickets. The method has two parts: table segmentation and text information extraction. For table segmentation, the table frame lines are first extracted through a combination of morphological erosion and dilation operations. The cell images are then segmented based on the extracted frame lines. For text information extraction, an OCR model based on CRNN (convolutional recurrent neural network) is used. To obtain the useful information, keywords in the recognition results are searched for and matched with regular expressions. Finally, an experiment verifies the feasibility of the proposed method.
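The frame-line extraction step can be illustrated with a minimal sketch. It assumes a binarized image stored as nested lists (1 = ink, 0 = background) and a hypothetical line-shaped structuring element of length `k`; the abstract does not state the kernel sizes or the implementation actually used, so the function names and values below are illustrative only. Erosion with a long, thin element removes short strokes such as characters, and the subsequent dilation restores the surviving long lines to their original length. In practice a library such as OpenCV (`cv2.erode`, `cv2.dilate`) would likely replace the pure-Python loops.

```python
from typing import List

Image = List[List[int]]  # 1 = ink (foreground), 0 = background


def erode_rows(img: Image, k: int) -> Image:
    """Horizontal erosion: mark x only if the k pixels starting at x are all ink."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w - k + 1):
            if all(img[y][x + d] for d in range(k)):
                out[y][x] = 1
    return out


def dilate_rows(img: Image, k: int) -> Image:
    """Horizontal dilation: paint k pixels to the right of every marked pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for d in range(k):
                    if x + d < w:
                        out[y][x + d] = 1
    return out


def transpose(img: Image) -> Image:
    """Swap rows and columns so the row-wise operators can act on columns."""
    return [list(col) for col in zip(*img)]


def extract_frame(img: Image, k: int) -> Image:
    """Keep only horizontal and vertical ink runs of at least k pixels."""
    horizontal = dilate_rows(erode_rows(img, k), k)
    vertical = transpose(dilate_rows(erode_rows(transpose(img), k), k))
    # Union of both directions gives the table frame without the text strokes.
    return [[a | b for a, b in zip(row_h, row_v)]
            for row_h, row_v in zip(horizontal, vertical)]


# Toy "ticket": a one-cell table border with a two-pixel text stroke inside.
ticket = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
frame = extract_frame(ticket, 4)  # k = 4 is an assumed value for this toy image
```

In this toy example the short stroke inside the cell is shorter than `k` and is therefore dropped, while the border survives. The resulting frame image could then be used to locate cell boundaries for the segmentation step.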