Extractive Text-Image Summarization Using Multi-Modal RNN

2018 
The rapid growth of multi-modal documents containing images on the Internet makes multi-modal summarization necessary. Recent advances in neural text summarization show the strength of deep learning techniques for summarization. This paper proposes a neural extractive multi-modal summarization method based on a multi-modal RNN. Our method first encodes documents and images with a multi-modal RNN, and then computes the summary probability of each sentence with a logistic classifier using text coverage, text redundancy, and image set coverage as features. We extend the DailyMail corpus by collecting images from the Web. Experiments show that our method outperforms state-of-the-art neural summarization methods.
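The abstract's scoring step can be illustrated with a minimal sketch: a logistic classifier mapping the three named per-sentence features to a summary probability. The feature names, weights, and function signature below are illustrative assumptions, not the paper's actual implementation (which learns these jointly with the multi-modal RNN encoder).

```python
import math

def sentence_summary_probability(weights, bias, features):
    """Logistic classifier over per-sentence features.

    `features` holds the three feature names suggested by the abstract
    (assumed keys; the paper's exact feature definitions may differ):
    text_coverage, text_redundancy, image_set_coverage.
    """
    z = bias + sum(weights[name] * features[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid -> probability in (0, 1)

# Hypothetical weights: coverage features raise the score, redundancy lowers it.
weights = {"text_coverage": 2.0, "text_redundancy": -1.5, "image_set_coverage": 1.0}
bias = -0.5

p = sentence_summary_probability(weights, bias, {
    "text_coverage": 0.8,
    "text_redundancy": 0.2,
    "image_set_coverage": 0.5,
})
```

In the full method the feature values would come from the multi-modal RNN's encodings of the document text and the image set, and sentences would be ranked by `p` to extract the summary.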