SemiText: Scene Text Detection with Semi-supervised Learning

2020 
Abstract Scene text detection is an important step in scene text recognition and has seen significant progress. However, existing methods require large amounts of annotated data to train text detection models, which has become a major challenge. In this paper, we propose a semi-supervised scene text detection framework (SemiText), which trains robust and accurate scene text detectors using a pre-trained supervised model and unannotated data. Starting from a model pre-trained on the fully annotated synthetic dataset SynthText, we investigate inductive and transductive semi-supervised learning on unannotated data. For inductive learning, the pre-trained model is applied to the unannotated training dataset to search for additional training examples, which are then combined with SynthText to fine-tune the pre-trained model and obtain a superior detection model. For transductive learning, the unannotated training dataset is replaced with the unannotated test dataset. Meanwhile, with real-world applications in mind, we adopt Mask R-CNN to detect text of arbitrary shapes and exploit context information to suppress false positives. Extensive experiments on different datasets show that the performance of our text detection method clearly improves under both inductive and transductive semi-supervision. Additionally, we achieve state-of-the-art performance under full supervision.
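
The inductive step described above (apply the pre-trained model to unannotated images, keep the useful detections, and merge them with SynthText for fine-tuning) can be illustrated with a minimal sketch. The abstract does not state how training examples are selected, so the confidence threshold `min_score`, the `detector` callable, and the `Sample` layout below are all assumptions made for illustration, not the authors' procedure.

```python
from typing import Callable, List, Tuple

Polygon = List[Tuple[float, float]]      # text region as a list of (x, y) points
Detection = Tuple[Polygon, float]        # (text region, confidence score)
Sample = Tuple[str, List[Polygon]]       # (image id, text regions used as labels)


def select_pseudo_labels(
    image_ids: List[str],
    detector: Callable[[str], List[Detection]],  # hypothetical pre-trained model
    min_score: float = 0.9,                      # assumed selection rule
) -> List[Sample]:
    """Run the detector on unannotated images and keep confident regions."""
    pseudo: List[Sample] = []
    for image_id in image_ids:
        kept = [poly for poly, score in detector(image_id) if score >= min_score]
        if kept:  # skip images with no confident detection
            pseudo.append((image_id, kept))
    return pseudo


def build_finetune_set(synthetic: List[Sample], pseudo: List[Sample]) -> List[Sample]:
    """Combine the annotated synthetic samples with the pseudo-labelled ones."""
    return list(synthetic) + list(pseudo)
```

Under the same assumptions, the transductive variant would pass the unannotated test images as `image_ids` instead of the unannotated training images.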