Towards End-to-End Text Spotting in Natural Scenes

2022 
Text spotting in natural scene images is of great importance for many image understanding tasks. It includes two sub-tasks: text detection and text recognition. In this work, we propose a unified network that simultaneously localizes and recognizes text in a single forward pass, avoiding intermediate steps such as image cropping and feature re-calculation, word separation, and character grouping. The overall framework is trained end-to-end and is able to spot text of arbitrary shapes. The convolutional features are computed only once and shared by both the detection and recognition modules. Through multi-task training, the learned features become more discriminative and improve overall performance. A 2D attention model in the word recognition branch robustly addresses text irregularity: it provides the spatial location of each character, which not only guides local feature extraction during recognition but also yields an orientation angle used to refine text localization. Experiments demonstrate that our method achieves state-of-the-art performance on several commonly used text spotting benchmarks, covering both regular and irregular datasets. Extensive ablation experiments verify the effectiveness of each module design.
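To make the 2D attention idea concrete, the sketch below shows one decoding step in which a decoder state attends over a 2D convolutional feature map, producing a context vector for character classification and an attention map whose peak approximates the character's location. This is a minimal illustrative sketch, not the authors' implementation; the module name Attention2D, the dimensions, and the peak-based location estimate are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attention2D(nn.Module):
    """Illustrative 2D attention step for word recognition (not the paper's code)."""

    def __init__(self, feat_dim: int, hidden_dim: int, attn_dim: int = 256):
        super().__init__()
        self.proj_feat = nn.Conv2d(feat_dim, attn_dim, kernel_size=1)  # project feature map
        self.proj_hidden = nn.Linear(hidden_dim, attn_dim)             # project decoder state
        self.score = nn.Conv2d(attn_dim, 1, kernel_size=1)             # scalar score per spatial location

    def forward(self, feats, hidden):
        # feats:  (B, C, H, W) shared convolutional features inside a text region
        # hidden: (B, hidden_dim) decoder state at the current character step
        B, C, H, W = feats.shape
        e = torch.tanh(self.proj_feat(feats) + self.proj_hidden(hidden)[:, :, None, None])
        alpha = F.softmax(self.score(e).view(B, -1), dim=1).view(B, 1, H, W)  # 2D attention map
        context = (alpha * feats).sum(dim=(2, 3))                             # (B, C) glimpse for classification
        # The attention peak approximates the character centre; fitting a line over
        # successive peaks could give an orientation estimate to refine localization.
        peak = alpha.view(B, -1).argmax(dim=1)
        loc = torch.stack((torch.div(peak, W, rounding_mode="floor"), peak % W), dim=1)  # (row, col)
        return context, alpha, loc
```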