Learning to Write Anywhere with Spatial Transformer Image-to-Motion Encoder-Decoder Networks
2019
Learning to recognize and reproduce handwritten characters is already a challenging task for humans and robots alike, but learning to do the same for characters that can be transformed arbitrarily in space, as humans do when writing on a blackboard for instance, raises the difficulty considerably from a robot vision and control perspective. In previous work we proposed several forms of encoder-decoder networks that map raw digit images to dynamic movement primitives (DMPs), allowing a robot to translate digit images into motion trajectories and reproduce the digits in written form. However, even with the addition of convolutional layers in the image encoder, the extent to which these networks are spatially invariant or equivariant is rather limited. In this paper, we propose a new architecture that incorporates both an image-to-motion encoder-decoder and a spatial transformer in a fully differentiable overall network. The network learns to rectify affine-transformed digits in input images into canonical forms, converts these into DMPs with accompanying motion trajectories, and finally transforms the trajectories back to match the original digit drawings so that a robot can write the digits in their original forms. We present experiments on several challenging datasets that show the new architecture outperforms our previous work, and we demonstrate its use with a humanoid robot in a real writing task.
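The abstract describes a three-stage, fully differentiable pipeline: a spatial transformer rectifies the affine-transformed digit into a canonical pose, an image-to-motion encoder-decoder maps the canonical image to DMP parameters, and the resulting trajectory is mapped back through the inverse of the predicted transform so the robot writes the digit in its original pose. The following is a minimal sketch of such a forward pass, assuming PyTorch; all module names, layer sizes, and the basis-function count are illustrative assumptions and do not reproduce the authors' implementation.

```python
# Illustrative sketch only; names, layer sizes, and n_basis are assumptions,
# not the architecture or hyperparameters from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class STNImageToMotionNet(nn.Module):
    def __init__(self, n_basis=25, traj_dim=2):
        super().__init__()
        # Localization network: predicts a 2x3 affine matrix that maps the
        # transformed digit back toward a canonical pose.
        self.loc = nn.Sequential(
            nn.Conv2d(1, 8, 7), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 10, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
            nn.Linear(10 * 4 * 4, 32), nn.ReLU(),
            nn.Linear(32, 6),
        )
        # Initialize the last layer to the identity transform ("no warp").
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))
        # Image-to-motion encoder-decoder: canonical image -> DMP parameters
        # (basis-function weights plus a goal for each trajectory dimension).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((5, 5)), nn.Flatten(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(32 * 5 * 5, 256), nn.ReLU(),
            nn.Linear(256, traj_dim * (n_basis + 1)),
        )

    def forward(self, img):
        # img: (B, 1, H, W) image of an affine-transformed digit.
        theta = self.loc(img).view(-1, 2, 3)
        # Spatial transformer: resample the image into its canonical form.
        grid = F.affine_grid(theta, img.size(), align_corners=False)
        canonical = F.grid_sample(img, grid, align_corners=False)
        # Predict DMP parameters for the canonical digit.
        dmp_params = self.decoder(self.encoder(canonical))
        # Downstream (not shown): roll out the DMP to a canonical trajectory,
        # then apply the inverse of theta so the robot writes the digit in
        # its original pose on the writing surface.
        return theta, dmp_params
```

Because both the spatial transformer and the encoder-decoder are differentiable, a trajectory-level loss can in principle be backpropagated through the whole pipeline end to end, which is the property the abstract emphasizes.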