Object-based classification for Visual Question Answering
2020
The Visual Question Answering (VQA) task requires producing an accurate natural-language answer to a natural-language question about a given image. This paper addresses the VQA problem with a deep learning based architecture. The proposed model builds on two popular neural network architectures: Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM). A CNN encodes the given image and extracts visual features, while word embeddings encode the question and an LSTM models question understanding. Finally, the results are compared with the Multi-Layer Perceptron (MLP) and Stacked Attention Network (SAN) models, and the effectiveness of the proposed model is evaluated against these baselines.
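To illustrate the kind of architecture the abstract describes, the following is a minimal sketch of a CNN + LSTM VQA baseline in PyTorch. It is not the authors' exact model: the backbone (ResNet-18), layer sizes, element-wise fusion, and the fixed answer vocabulary are all illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models


class CNNLSTMVQA(nn.Module):
    """Minimal CNN + LSTM VQA baseline; all sizes are illustrative assumptions."""

    def __init__(self, vocab_size, num_answers, embed_dim=300, hidden_dim=1024):
        super().__init__()
        # Image encoder: a CNN backbone with the classification head removed.
        backbone = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # -> (B, 512, 1, 1)
        self.img_fc = nn.Linear(512, hidden_dim)

        # Question encoder: word embeddings followed by an LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

        # Classifier over a fixed answer vocabulary (answer-classification setup).
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, image, question_tokens):
        # Encode the image into a single feature vector.
        img_feat = self.cnn(image).flatten(1)           # (B, 512)
        img_feat = torch.tanh(self.img_fc(img_feat))    # (B, hidden_dim)

        # Encode the question; use the final LSTM hidden state.
        emb = self.embed(question_tokens)               # (B, T, embed_dim)
        _, (h_n, _) = self.lstm(emb)
        q_feat = h_n[-1]                                # (B, hidden_dim)

        # Fuse image and question features (element-wise product is one common
        # choice) and predict a distribution over candidate answers.
        fused = img_feat * q_feat
        return self.classifier(fused)                   # (B, num_answers) logits


if __name__ == "__main__":
    model = CNNLSTMVQA(vocab_size=10000, num_answers=1000)
    image = torch.randn(2, 3, 224, 224)
    question = torch.randint(1, 10000, (2, 14))         # batch of tokenized questions
    logits = model(image, question)
    print(logits.shape)                                 # torch.Size([2, 1000])
```

Casting VQA as classification over a fixed set of frequent answers, as sketched here, is the standard setup used by baselines such as the MLP and SAN models the paper compares against.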