Rethinking Text Segmentation: A Novel Dataset and a Text-Specific Refinement Approach
2021
Text segmentation is a prerequisite for many real-world text-related tasks,
e.g., text style transfer and scene text removal. However, owing to the lack of
high-quality datasets and dedicated investigations, this critical prerequisite
has been left as an assumption in many works and has been largely overlooked
by current research. To bridge this gap, we propose TextSeg, a large-scale,
finely annotated text dataset with six types of annotations: word- and
character-wise bounding polygons, masks, and transcriptions. We also introduce
Text Refinement Network (TexRNet), a novel text segmentation approach that
adapts to the unique properties of text, such as non-convex boundaries and
diverse textures, which often pose difficulties for traditional segmentation
models. In TexRNet, we propose text-specific network designs to address such
challenges, including key features pooling and attention-based similarity
checking. We also introduce trimap and discriminator losses that yield
significant improvements in text segmentation. Extensive experiments are carried
out on both our TextSeg dataset and other existing datasets. We demonstrate
that TexRNet consistently improves text segmentation performance by nearly 2%
compared to other state-of-the-art segmentation methods. Our dataset and code
will be made available at
https://github.com/SHI-Labs/Rethinking-Text-Segmentation.