Deep Multimodal Embedding: Manipulating Novel Objects with Point-clouds, Language and Trajectories

2015 
A robot operating in a real-world environment needs to perform reasoning with a variety of sensing modalities. However, manually designing features that allow a learning algorithm to relate these different modalities can be extremely challenging. In this work, we consider the task of manipulating novel objects and appliances. To this end, we learn to embed point-cloud, natural language, and manipulation trajectory data into a shared embedding space using a deep neural network. In order to learn semantically meaningful spaces throughout our network, we introduce a method for pre-training its lower layers for multimodal feature embedding and a method for fine-tuning this embedding space using a loss-based margin. We test our model on the Robobarista dataset [22], where we achieve significant improvements in both accuracy and inference time over the previous state of the art.
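The central ideas described above — encoding each modality into a shared embedding space and training it with a margin-based ranking loss — can be illustrated with a small sketch. The following PyTorch snippet is a minimal, hypothetical illustration and not the authors' implementation: the encoder sizes, input feature dimensions, and the simple hinge-style ranking loss are assumptions chosen for clarity (the paper's loss-based margin scales with trajectory dissimilarity rather than being a fixed constant).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    """Maps one modality's feature vector into the shared embedding space."""
    def __init__(self, in_dim, embed_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128),
            nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, x):
        # L2-normalize so similarity can be measured with a dot product.
        return F.normalize(self.net(x), dim=-1)

# Hypothetical dimensions for pre-extracted features (assumptions).
pointcloud_enc = ModalityEncoder(in_dim=150)
language_enc   = ModalityEncoder(in_dim=300)
trajectory_enc = ModalityEncoder(in_dim=100)

def margin_ranking_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style margin loss: the matching trajectory should score higher
    than a non-matching one by at least `margin` (a fixed stand-in for the
    paper's loss-based margin)."""
    pos_sim = (anchor * positive).sum(dim=-1)
    neg_sim = (anchor * negative).sum(dim=-1)
    return F.relu(margin - pos_sim + neg_sim).mean()

# Toy batch of random features standing in for real data.
pc       = torch.randn(8, 150)   # point-cloud segment features
lang     = torch.randn(8, 300)   # language instruction features
traj_pos = torch.randn(8, 100)   # matching trajectory features
traj_neg = torch.randn(8, 100)   # non-matching trajectory features

# Combine scene and instruction embeddings (here: a simple sum, normalized)
# and rank the correct trajectory above the incorrect one.
context = F.normalize(pointcloud_enc(pc) + language_enc(lang), dim=-1)
loss = margin_ranking_loss(context, trajectory_enc(traj_pos), trajectory_enc(traj_neg))
loss.backward()
print(float(loss))
```

At inference time, a model trained this way can rank candidate trajectories for a new object by their dot-product similarity to the combined point-cloud and language embedding, which is what makes retrieval in the shared space fast.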