Paraphrase Generation with Chinese Short Text Dataset

2020 
A major obstacle to research on paraphrase generation is the shortage of high-quality, publicly available labeled datasets of sentential paraphrases, and the problem is particularly serious for Chinese. As a result, research on Chinese paraphrase generation is still at an early stage. This paper proposes a novel way to create a Chinese paraphrase dataset, which contains 8K sentence pairs. The data come from a bank QA dataset in which each question is expressed by several sentences. By calculating the similarity between sentences that share the same meaning, we obtain paraphrase pairs from which the dataset is built. We then perform the paraphrase generation task with a classical Seq2Seq model with an attention mechanism and, following previous work, evaluate the generated paraphrases on our Chinese dataset. Experimental results show that the dataset is suitable for the Chinese paraphrase generation task and provide a benchmark for further research in this area.
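The pair-extraction step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure or thresholds were used, so character-level Jaccard similarity, the `low`/`high` bounds, the function names, and the toy questions below are all assumptions made for illustration, not the authors' method.

```python
from itertools import combinations


def char_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over the sets of characters in two strings.

    Assumption: the paper's actual similarity measure is not stated;
    character overlap is used here only as a simple stand-in.
    """
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def extract_pairs(groups, low=0.3, high=0.9):
    """Pair up sentences that belong to the same question group.

    `groups` maps a question id to the sentences that express it.
    A pair is kept when its similarity lies in [low, high]: similar
    enough to be a plausible paraphrase, but not near-identical.
    Both thresholds are hypothetical values.
    """
    pairs = []
    for sentences in groups.values():
        for s1, s2 in combinations(sentences, 2):
            score = char_jaccard(s1, s2)
            if low <= score <= high:
                pairs.append((s1, s2, score))
    return pairs


# Toy data (hypothetical): three ways of asking about a forgotten password.
toy_groups = {
    "reset_password": [
        "如何重置密码",    # "How do I reset my password"
        "怎么重置密码",    # "How can I reset my password"
        "密码忘了怎么办",  # "What should I do if I forgot my password"
    ],
}

toy_pairs = extract_pairs(toy_groups)
```

On this toy group the first and third sentences share too few characters to pass the lower bound, so only two of the three candidate pairs are kept. In practice the bounds would need tuning against manually checked pairs.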