An Improved LDA Multi-document Summarization Model Based on TensorFlow

2017 
Latent Dirichlet Allocation (LDA) has recently been used to automatically generate topics from text corpora and has been applied to sentence-extraction-based multi-document summarization algorithms. In this paper, we propose a novel approach to the automatic generation of aspect-oriented summaries from multiple documents. Our approach combines a traditional summary generation algorithm with an abstract generation algorithm based on deep learning. We employ the improved traditional summary generation algorithm to convert multiple documents into a single document, and then feed the resulting single document to the deep learning method to extract the final summary. First, we apply an improved LDA model to cluster the sentences in all documents. Second, we employ an extended LexRank algorithm to rank the sentences in each cluster. Third, we use an extended Hedge Trimmer algorithm for sentence compression. Fourth, we apply Integer Linear Programming for sentence selection, and this step yields the single document. Finally, we employ textum on TensorFlow to obtain the final abstract. Experiments show that the proposed algorithm achieves better performance than other state-of-the-art algorithms on the DUC2005 and TAC2010 corpora.
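To make the sentence-ranking step more concrete, the following is a minimal sketch of plain thresholded LexRank in pure Python. It is not the extended LexRank used in the paper, whose details the abstract does not give. The function names, the whitespace tokenizer, and the `threshold`, `damping`, and `iterations` values are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, iterations=50):
    """Score sentences by power iteration over a thresholded similarity graph.

    Illustrative only: whitespace tokenization, raw term counts, and the
    default parameter values are assumptions, not taken from the paper.
    """
    n = len(sentences)
    if n == 0:
        return []
    bags = [Counter(s.lower().split()) for s in sentences]
    # Connect two distinct sentences when their similarity exceeds the threshold.
    adj = [[1.0 if i != j and cosine(bags[i], bags[j]) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbor j passes an equal share of its score to i.
            # Isolated sentences (degree 0) pass nothing on.
            incoming = sum(adj[j][i] / degree[j] * scores[j]
                           for j in range(n) if degree[j])
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores
```

Within one cluster, sentences would then be sorted by descending score. In the pipeline described above, that ordering would feed the compression and selection steps.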