An Approach to Generate Topic Similar Document by Seed Extraction-Based SeqGAN Training for Bait Document

2018 
In recent years, topic-similar document generation has drawn increasing attention in both academia and industry. Bait document generation in particular is important for security. To generate bait documents that closely resemble a subject document, and to do so quickly, we propose a topic-similar document generation model based on SeqGAN (TSDG-SeqGAN). In the training phase, we segment the training text with the jieba word segmentation tool, which greatly reduces training time. In the generation phase, we extract keywords and key sentences from the subject document as seeds and feed these seeds into the trained generator network. The generator then produces keyword-based documents and key-sentence-based documents. Finally, we output the documents most similar to the subject document as the final result. Experiments show the effectiveness of our model.
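The seed-extraction and selection steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which keyword extractor, key-sentence method, or similarity measure TSDG-SeqGAN uses, so TF-IDF keyword scoring, keyword-overlap key-sentence selection, and term-frequency cosine similarity are stand-ins chosen for illustration. Input is assumed to be already segmented into tokens (for Chinese text, jieba's `jieba.lcut` returns such a token list); English words stand in for Chinese tokens. The trained SeqGAN generator is not modeled, so the candidate documents are hypothetical placeholders for its keyword-seeded and sentence-seeded outputs. The helper names `tfidf_keywords`, `key_sentence`, `cosine`, and `most_similar` are hypothetical.

```python
import math
from collections import Counter

# Assumption: documents arrive already segmented into tokens (e.g. the output
# of a word segmenter such as jieba); English words stand in for Chinese tokens.


def tfidf_keywords(doc_tokens, corpus, top_k=3):
    """Rank the terms of doc_tokens by TF-IDF against a reference corpus.

    Hypothetical stand-in keyword extractor (the abstract does not name one).
    IDF = log((1 + N) / (1 + df)), so a term found in every corpus document scores 0.
    """
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for d in corpus if term in d)
        idf = math.log((1 + n_docs) / (1 + df))
        scores[term] = (count / len(doc_tokens)) * idf
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


def key_sentence(sentences, keywords):
    """Pick the sentence containing the most keyword tokens (first one on ties)."""
    kw = set(keywords)
    return max(sentences, key=lambda s: sum(1 for t in s if t in kw))


def cosine(a, b):
    """Cosine similarity between the term-frequency vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def most_similar(subject_tokens, candidates):
    """Return the candidate document most similar to the subject document."""
    return max(candidates, key=lambda c: cosine(subject_tokens, c))


# Toy subject document, split into segmented sentences.
subject_sentences = [
    ["the", "budget", "report", "lists", "salary", "costs"],
    ["the", "finance", "team", "approved", "the", "budget"],
    ["lunch", "is", "at", "noon"],
]
subject_tokens = [t for s in subject_sentences for t in s]

# Toy background corpus used only for IDF statistics.
corpus = [
    ["the", "lunch", "is", "at", "noon"],
    ["the", "team", "lists", "the", "report"],
    ["the", "costs", "were", "approved"],
]

keywords = tfidf_keywords(subject_tokens, corpus, top_k=3)  # keyword seeds
sentence_seed = key_sentence(subject_sentences, keywords)   # key-sentence seed

# Hypothetical placeholders for generator output; a trained SeqGAN generator
# would produce these from the keyword seeds and the key-sentence seed.
cand_kw = ["budget", "finance", "salary", "budget", "report"]
cand_sent = ["the", "budget", "report", "lists", "salary", "costs", "today"]
cand_off = ["lunch", "is", "at", "noon"]
candidates = [cand_kw, cand_sent, cand_off]

best = most_similar(subject_tokens, candidates)
print("keywords:", keywords)
print("key sentence:", " ".join(sentence_seed))
print("selected:", " ".join(best))
```

With token-count cosine similarity, the sentence-seeded candidate is selected in this toy example because it reuses more of the subject's frequent terms. The choice of similarity measure here is an assumption for illustration, not something the abstract specifies.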