Basic research on a handwritten note image recognition system that combines two OCRs

2021 
Abstract In this paper, we propose an OCR system that recognizes handwritten content in handouts. Because the target handouts contain a mixture of Japanese text and mathematical formulas, we expected that a single OCR system alone would not provide sufficient accuracy. The proposed system therefore combines two OCR systems: Tesseract, which is good at recognizing Japanese, and Mathpix, which is good at recognizing mathematical formulas. When two OCRs are combined, the more plausible of the two results must be selected as the final output, and we considered whether the OCR score could serve as the selection criterion. The OCR score is a self-evaluation value that each OCR outputs together with its recognition result. However, there is no guarantee that the OCR score tracks the actual correctness of the recognition result. In this paper, we therefore classified the written content into three categories (words, sentences, and mathematical formulas) and verified whether the recognition result of each OCR can be selected appropriately based on the OCR score. At this stage, we found that further improvement is needed before the selection can be made based on the OCR score alone. However, if the proposed system becomes able to classify handwritten content, for example through additional training, handwritten content could be analyzed efficiently, which is expected to help improve the handouts and provide feedback for lesson planning.
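The score-based selection examined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `OcrResult` type, the `select_by_score` function, and the sample texts and scores are hypothetical, and the sketch assumes both engines' scores have already been normalized to a common [0, 1] scale. No real Tesseract or Mathpix calls are made.

```python
from dataclasses import dataclass


@dataclass
class OcrResult:
    """One engine's output for a single handwritten region (hypothetical type)."""
    engine: str   # e.g. "tesseract" or "mathpix"
    text: str     # recognized text
    score: float  # self-reported score, assumed normalized to [0, 1]


def select_by_score(results):
    """Return the candidate with the highest self-reported OCR score.

    This is the naive rule the abstract evaluates; a high score does not
    guarantee that the recognized text is actually correct.
    """
    if not results:
        raise ValueError("no OCR results to choose from")
    return max(results, key=lambda r: r.score)


# Made-up example values for one formula region.
candidates = [
    OcrResult("tesseract", "x2 + 1", 0.62),
    OcrResult("mathpix", "x^{2} + 1", 0.91),
]
best = select_by_score(candidates)
```

Because the rule trusts each engine's self-evaluation, it fails whenever an engine is confidently wrong, which is consistent with the abstract's conclusion that the score alone is not yet a sufficient criterion.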