language-icon Old Web
English
Sign In

Mask Scene Text Recognizer.

2021 
Scene text recognition is a challenging sequence modeling problem. In this paper, a novel mask scene text recognizer (MSTR) is proposed to incorporate a supervised learning task of predicting text image mask into a CNN (convolutional neural network)-Transformer framework for scene text recognition. The incorporated mask predicting branch is connected in parallel with the CNN backbone, and the predicted mask is used as attention weights for the feature maps output by the CNN. We investigate three variants of the incorporated mask predicting branches, i.e. a) mask branch which predicts text foreground image mask; b) boundary branch which predicts boundaries of characters in the input images; c) fused mask and boundary branches with different fusion schemes. Experimental results on seven commonly used scene text recognition datasets show that our method with fused mask and boundary branches has outperformed previous state-of-the-art methods.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    47
    References
    0
    Citations
    NaN
    KQI
    []