Dialog-based Interactive Image Retrieval

2018 
Motivated by the rapid growth of large online media collections of many types (e.g., images, audio, video, e-books) and the paucity of intelligent retrieval systems, this paper introduces a novel approach to interactive visual content retrieval. The proposed retrieval framework is guided by free-form natural language feedback from users, allowing for more natural and effective communication. Such a system constitutes a multi-modal dialog in which, at each turn, a user submits a natural-language request to a retrieval agent, which then attempts to retrieve the target object. We formulate the retrieval task as a reinforcement learning problem, and reward the dialog system for improving the rank of the target object during each dialog turn. This framework can be applied to a variety of visual media types (images, videos, graphics, etc.), and in this paper we study in depth its application to the task of interactive image retrieval. To avoid the cumbersome and costly process of collecting human-machine conversations as the dialog system learns, we train the dialog system with a user simulator, which is itself trained to describe the differences between target and candidate images. The efficacy of our approach is demonstrated in a footwear image retrieval application. Extensive experiments on both simulated and real-world data show that: 1) our proposed learning framework achieves better accuracy than other supervised and reinforcement learning baselines; and 2) user feedback based on natural language rather than pre-specified attributes leads to more effective retrieval results, and a more natural and expressive communication interface.
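The abstract states that the dialog system is rewarded for improving the rank of the target image at each turn, but gives no further detail. The snippet below is a minimal sketch of such a rank-improvement reward, not the authors' implementation; the scoring scheme and all function names are hypothetical, assuming only that the retriever orders candidates by a similarity score.

```python
# Minimal sketch (not the paper's code) of a per-turn rank-improvement reward,
# assuming candidates are ranked by descending similarity to the current query.
import numpy as np

def target_rank(scores: np.ndarray, target_idx: int) -> int:
    """Rank of the target image among all candidates (1 = best)."""
    order = np.argsort(-scores)                     # indices sorted by descending score
    return int(np.where(order == target_idx)[0][0]) + 1

def turn_reward(prev_scores: np.ndarray, new_scores: np.ndarray, target_idx: int) -> int:
    """Reward = how many positions the target's rank improved after one feedback turn."""
    return target_rank(prev_scores, target_idx) - target_rank(new_scores, target_idx)

# Example: feedback that moves the target from rank 3 to rank 1 yields a reward of 2.
prev = np.array([0.90, 0.80, 0.50, 0.40])           # target (index 2) ranked 3rd
new  = np.array([0.60, 0.50, 0.95, 0.40])           # target now ranked 1st
print(turn_reward(prev, new, target_idx=2))         # -> 2
```

Under this sketch, a positive reward encourages natural-language feedback turns that push the target image toward the top of the ranked list, which is the behavior the abstract describes.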