SaHAN: Scale-aware hierarchical attention network for scene text recognition
2020
Abstract Scene text recognition has become a research hotspot owing to its abundant semantic information and various applications. Recent methods of scene text recognition usually focus on handling shape distortion, attention drift, or background noise, ignoring that text recognition encounters character scale-variation problem. To address this issue, in this paper, we propose a new scale-aware hierarchical attention network (SaHAN) for scene text recognition. Inspired by feature pyramid network, we exploit the inherent pyramidal structure of a deep convolutional network to retain multi-scale features for flexible receptive fields. Then, we construct a hierarchical attention decoder that performs the attention mechanism twice on multi-scale features to collect the most fine-grained information for prediction. The SaHAN is trained in a weak supervision way, requiring only images and corresponding text labels. Extensive experiments on seven benchmarks reveal that SaHAN achieves state-of-the-art performance.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
44
References
4
Citations
NaN
KQI