Neural Sign Language Synthesis: Words Are Our Glosses

Jan Zelinka, Jakub Kanis

2020

This paper deals with text-to-video sign language synthesis. Instead of producing video directly, we focus on producing skeletal models. Our main goal was to design a fully end-to-end automatic sign language synthesis system trained only on freely available data (daily TV broadcasts). We therefore excluded any manual video annotation. Furthermore, our approach does not rely on any video segmentation. We investigated a proposed feed-forward transformer and a recurrent transformer. To improve the performance of our sequence-to-sequence transformer, soft non-monotonic attention was employed in the training process. The benefit of character-level features was compared with that of word-level features. We focused our experiments on a weather forecasting dataset in Czech Sign Language.

Keywords:

Synthesis system
Weather forecasting
Transformer
Speech recognition
Computer vision
Video production
Computer science
Sign language
Czech
Segmentation
Artificial intelligence
Broadcasting
