Image Captioning Using Inception V3 Transfer Learning Model

2021 
As artificial intelligence has grown rapidly in recent years, image captioning has attracted the interest of numerous researchers and has become a fascinating and challenging task. A critical component of scene analysis, combining computer vision with natural language processing, is visual captioning: automatically generating natural-language descriptions from image content. This paper applies different NLP strategies for perceiving and describing an image's meaning in a natural language such as English. The proposed InceptionV3 image caption generator combines a CNN (Convolutional Neural Network) with LSTM (Long Short-Term Memory) units. The InceptionV3 model was trained on the ImageNet dataset with 1000 different classes and was imported directly from the Keras applications module. The final classification layer of the InceptionV3 model is removed to obtain a feature vector of dimension (1343,). An embedding matrix is used to connect the vocabulary: it is a linear transformation of the original vocabulary space into a real-valued vector space that preserves meaningful relations. Image captioning is widely used and is important, for example, in enabling interaction between humans and computers.
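As a rough sketch of the embedding step described above, the following shows how an embedding matrix linearly maps integer word indices into a dense real-valued vector space. The toy vocabulary, dimension sizes, and random matrix values here are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Hypothetical toy vocabulary (illustrative, not from the paper).
vocab = {"<start>": 0, "a": 1, "dog": 2, "runs": 3, "<end>": 4}
vocab_size = len(vocab)
embedding_dim = 4  # illustrative; real caption models use e.g. 200-300 dims

# The embedding matrix: one row of real-valued features per vocabulary word.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

def embed(tokens):
    """Map word tokens to their dense embedding vectors by row lookup."""
    indices = [vocab[t] for t in tokens]
    return embedding_matrix[indices]

caption = ["<start>", "a", "dog", "runs", "<end>"]
vectors = embed(caption)
print(vectors.shape)  # one embedding vector per token: (5, 4)
```

In a full captioning model, rows of this matrix would typically be initialized from pretrained word vectors and fed to the LSTM at each decoding step alongside the CNN image features.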