DenseBert4Ret: Deep bi-modal image retrieval

2022 
In this study, we focus on retrieving a desired image from an extensive database, a task that falls under the purview of image retrieval systems. Users input an image and a text as a query and expect the system to retrieve an image close to the desires expressed in that query. Digital media now generates more than petabytes of imagery data per day, and the internet makes this massive amount of data readily available to users; however, extracting the desired images from such a colossal databank is a challenging task. Users prefer to retrieve an image that reflects their mental picture, and they may wish to alter a reference image according to their abstract thoughts. For example, Elizabeth wants a laptop similar to her friend's, but with a built-in GPU and in silver color, so she expects the e-business platform to show a laptop matching her wish. This paper devises a multi-modal algorithm for such tasks: it takes a query image and a text as input, where the text reflects the desired modifications, and retrieves an image similar to the input image but modified according to the text query. We propose a bi-modal image retrieval system named DenseBert4Ret that learns image and text features concurrently; as the name indicates, DenseNet and BERT models are used for image and text feature extraction, respectively. The system is based on deep learning techniques for the joint representation of image and text features, and the model is trained so that the input image's representation is modified according to the user's textual query. We trained and tested our model on three challenging real-world datasets, i.e., MIT-States, Fashion200K, and FashionIQ, and we show that the proposed model outperforms its predecessor with tuned parameters.
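As a concrete illustration of the architecture described above, the following is a minimal PyTorch sketch, assuming a DenseNet-121 backbone for the image branch, a bert-base-uncased encoder for the text branch, and a simple concatenation-and-MLP fusion into a joint embedding. The layer sizes, fusion scheme, checkpoints, and the class name DenseBert4RetSketch are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a DenseNet + BERT bi-modal retrieval model.
# Assumptions (not from the paper): DenseNet-121 backbone,
# bert-base-uncased, a 512-d joint space, and concat+MLP fusion.
import torch
import torch.nn as nn
from torchvision.models import densenet121
from transformers import BertModel


class DenseBert4RetSketch(nn.Module):
    def __init__(self, embed_dim=512):
        super().__init__()
        # Image branch: DenseNet with its classifier head replaced
        # by a projection into the joint embedding space.
        self.image_encoder = densenet121(weights="DEFAULT")
        self.image_encoder.classifier = nn.Linear(1024, embed_dim)
        # Text branch: BERT's pooled [CLS] output, projected to embed_dim.
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        self.text_proj = nn.Linear(768, embed_dim)
        # Fusion head: combines query-image and modification-text
        # features into a single query embedding.
        self.fusion = nn.Sequential(
            nn.Linear(2 * embed_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def encode_image(self, images):
        # Used both for the query image and for indexing database images.
        return self.image_encoder(images)

    def forward(self, images, input_ids, attention_mask):
        img_feat = self.encode_image(images)
        txt_feat = self.text_proj(
            self.text_encoder(
                input_ids=input_ids, attention_mask=attention_mask
            ).pooler_output
        )
        # The fused query embedding should land near the embedding
        # of the target (modified) image.
        return self.fusion(torch.cat([img_feat, txt_feat], dim=-1))
```

Under these assumptions, training would pull the fused query embedding toward the embedding of the ground-truth target image and push it away from other images, for example with a triplet or softmax-based metric loss; at retrieval time, database images are ranked by similarity to the fused query embedding.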