PhotoReply: Automatically Suggesting Conversational Responses to Photos

2018 
We introduce the problem of automatically suggesting conversational responses to photos and present an intelligent assistant called PhotoReply that solves the problem in the context of a messaging application. For example, when a user receives a photo showing a dog, PhotoReply suggests responses such as "Aww»» and "Cute terrier!»» This simplifies composing responses on constrained mobile keyboards, and delights users with uncanny insights into the photos they receive. PhotoReply is an integral part of the Allo chat application, being its predictive assistance feature with highest click-through rate. We formalize the problem of suggesting responses to images as an instance of multimodal learning which is akin to caption generation models, a topic that has recently received significant attention \citeKarpathy2015,Vinyals2015,Xu2015. We then present a system that "translates»» image pixels to text responses. The system includes a conditioned language model, based on an LSTM, which, given an embedding of image pixels and the previous predicted words, calculates the probability of all words in a vocabulary of being the next word in the generated response and a triggering model trained with image embeddings and concept labels from a large concept taxonomy. We describe training of the models and a thorough experimental evaluation based on crowdsourced datasets and live traffic.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    29
    References
    6
    Citations
    NaN
    KQI
    []