TransVAE: A Novel Variational Sequence-to-Sequence Framework for Semi-supervised Learning and Diversity Improvement
0 Citations
39 References
10 Related Papers
Abstract:
Text generation tasks require the generated text to be diverse while remaining relevant. Traditional Seq2Seq models usually use cross-entropy as the objective function, which demands that the output match the ground-truth text exactly and therefore easily leads to a lack of variability in the generated text. In this paper, we propose a novel framework, TransVAE, which applies a Variational Auto-Encoder (VAE) to improve the Seq2Seq architecture. We design a Translator module that transforms the latent variable space of the original input into that of the target output, thus enhancing the diversity of generated texts and supporting semi-supervised learning. Moreover, we add attention and copy mechanisms to the TransVAE model to balance relevance and diversity. Extensive experiments are carried out on three different string transduction tasks: dialogue generation, machine translation, and text summarization. The experimental results verify the effectiveness of our method.
Keywords:
Relevance
Sequence
Text generation
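As a rough illustration of the mechanism the abstract describes, the sketch below shows the VAE reparameterization step and a stand-in for the Translator module as a single affine map between latent spaces. The function names and the affine form are assumptions for illustration, not the paper's actual implementation.

```python
import math
import random

def reparameterize(mu, log_var, rng=random):
    # VAE reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1),
    # so the sampling step stays differentiable w.r.t. mu and log_var.
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def translate_latent(z, weight, bias):
    # Hypothetical stand-in for the Translator module: one affine map
    # from the source latent space to the target latent space.
    return [sum(w * x for w, x in zip(row, z)) + b
            for row, b in zip(weight, bias)]

# Encode -> sample -> translate; decoding from z_tgt would follow.
z_src = reparameterize([0.1, -0.2], [0.0, 0.0])
z_tgt = translate_latent(z_src, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In the paper's setting the translated latent code, rather than the source code itself, conditions the decoder, which is what allows sampling different outputs for one input.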
With the development of Internet technology, text information overload has become a frequent problem, and automatic text summarization has become a research hotspot. However, in many real-world fields there is not enough accumulated data, and high-quality labeled summarization data are lacking. Therefore, this paper generates Chinese text summaries based on an LSTM and an attention mechanism, utilizing the LSTM to capture semantic features and combining it with an attention mechanism based on contextual semantics. Experiments show that our text summarization model achieves an obvious improvement in F1 score compared with other models. It can complete the task of text summarization generation and alleviate the problem of low-quality summaries in some fields.
Multi-document summarization
Information Overload
Text generation
Citations (1)
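The LSTM-plus-attention design described above centers on one computation: a context vector built as an attention-weighted mix of encoder hidden states. A minimal dot-product version is sketched below; dot-product scoring is a common choice here, but the abstract does not specify the paper's exact scoring function.

```python
import math

def attention_context(query, keys, values):
    # Dot-product attention: score each encoder state (key) against the
    # decoder query, softmax the scores, and mix the values accordingly.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(dim)]
    return context, weights
```

The context vector is then concatenated with the decoder state before predicting the next summary token, letting each output step focus on different parts of the source text.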
Text summarization is a process of distilling the most important content from text documents. While human beings have proven to be extremely capable summarizers, computer-based automatic abstracting and summarizing has proven to be an extremely challenging task. In this paper we report our experience with applying extractive summarization techniques to process news articles, economic reports, and nursing narratives. We present an analysis of the effect of different summarization methods and parameters on the summarization results. We also compare the performance of the summarizers across the three different document genres. The lessons learned are discussed, and the possibilities for applying the theory of Computing with Words to text summarization are elaborated.
Multi-document summarization
Citations (8)
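Extractive summarization, as described above, reduces to scoring and selecting sentences from the source. A minimal Luhn-style frequency scorer is sketched below; the regex sentence splitting and the toy stopword list are assumptions for illustration, not the methods evaluated in the paper.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is",
             "are", "for", "on", "with", "that"}

def extract_summary(text, n_sentences=1):
    # Luhn-style extractive scoring: rank sentences by the average
    # document frequency of their non-stopword terms, then return the
    # top sentences in their original order.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if s.strip()]
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        terms = [w for w in re.findall(r"[a-z']+", sentence.lower())
                 if w not in STOPWORDS]
        return sum(freq[t] for t in terms) / (len(terms) or 1)

    top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    return [s for s in sentences if s in top]
```

Real extractors add position, cue-phrase, and redundancy features on top of this frequency core, but the score-and-select structure is the same.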
Experience summarization, which undertakes the summarization of a monographic study, serves as an important tool for teaching and scientific research. Its features include cause analysis, applicability, and practicality. This paper shows that experience summarization consists of determining the subject under discussion, writing an outline, collecting and analyzing materials, expressing the results in words, and correcting mistakes. The paper also points out that experience summarization should follow the principles of applicability, creativity, and science.
Multi-document summarization
Citations (0)
In order to produce summaries from dynamic content, we address the definition of dynamic summarization. In this paper, the modeling of dynamic summarization is discussed, and two solutions extending classic summarization are proposed: model improvement based on set theory, and algorithm improvement based on reranking. Finally, the performance of these two solutions is evaluated on the DUC 2007 dataset. Our results demonstrate that the model-improvement solution is more effective, but as another stride toward summarization, dynamic summarization research still needs further study.
Multi-document summarization
Citations (0)
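The abstract above does not spell out its reranking criterion. As one plausible sketch, dynamic summarization can rerank candidate sentences to favor content the reader has not already been shown, an MMR-style novelty heuristic; this heuristic is an assumption for illustration, not the paper's algorithm.

```python
def rerank_for_novelty(candidates, shown, relevance):
    # Rerank candidate sentences: reward relevance, penalize word
    # overlap with content already delivered to the reader.
    seen_words = {w for s in shown for w in s.lower().split()}

    def score(sentence):
        words = set(sentence.lower().split())
        overlap = len(words & seen_words) / (len(words) or 1)
        return relevance(sentence) - overlap

    return sorted(candidates, key=score, reverse=True)
```

With a constant relevance function, the reranker simply surfaces the sentence least redundant with what was shown before, which is the core behavior a dynamic summarizer needs.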
Cross-Lingual Summarization (CLS) is a task that extracts important information from a source document and summarizes it into a summary in another language. It is a challenging task that requires a system to understand, summarize, and translate at the same time, making it closely related to Monolingual Summarization (MS) and Machine Translation (MT). In practice, the training resources for Machine Translation far exceed those for cross-lingual and monolingual summarization, so incorporating a Machine Translation corpus into CLS would benefit its performance. However, present work only leverages a simple multi-task framework to bring Machine Translation in, without deeper exploration.
Multi-document summarization
Citations (8)
Dropped pronouns (DPs) are a common problem in dialogue machine translation: pronouns are frequently dropped in the source sentence and are thus missing from its translation. In response to this problem, we propose a novel approach to improve the translation of DPs for dialogue machine translation. First, we build training data for DP generation, in which DPs are automatically added according to alignment information from a parallel corpus. Then we model DP generation as a sequence labelling task and develop a generation model based on recurrent neural networks and language models. Finally, we apply the DP generator to the machine translation task by completing source sentences with the missing pronouns. Experimental results show that our approach achieves a significant improvement of 1.7 BLEU points by recalling possible DPs in the source sentences.
Text generation
Parallel corpora
Transfer-based machine translation
Citations (16)
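The sequence-labelling formulation above can be made concrete with a toy tag scheme: each source token is labelled either KEEP or INSERT:&lt;pronoun&gt;, and decoding re-inserts the dropped pronoun before the tagged token. The tag names are illustrative assumptions, not the paper's actual label set.

```python
def complete_sentence(tokens, labels):
    # Decode a sequence-labelled sentence: whenever a token carries an
    # "INSERT:<pronoun>" label, emit the recovered pronoun before it;
    # "KEEP" tokens pass through unchanged.
    out = []
    for tok, lab in zip(tokens, labels):
        if lab.startswith("INSERT:"):
            out.append(lab.split(":", 1)[1])
        out.append(tok)
    return out
```

In the paper's pipeline a tagger (RNN plus language model) predicts these labels; the completed sentence is then fed to an ordinary MT system, which is why the approach needs no change to the translator itself.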
We study the correlation of rankings of text summarization systems under evaluation methods with and without human models. We apply our comparison framework to various well-established content-based evaluation measures in text summarization, such as Coverage, Responsiveness, Pyramid, and ROUGE, studying their associations in various text summarization tasks, including generic and focus-based multi-document summarization in English and generic single-document summarization in French and Spanish. The research is carried out using a new content-based evaluation framework called Fresa, which computes a variety of divergences among probability distributions.
Multi-document summarization
Citations (78)
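The abstract above mentions Fresa computing divergences among probability distributions; one standard such measure is the Jensen-Shannon divergence between the unigram distributions of a summary and its source. A minimal base-2 version, bounded in [0, 1], is sketched below; whether Fresa uses exactly this variant is an assumption here.

```python
import math
from collections import Counter

def js_divergence(text_a, text_b):
    # Jensen-Shannon divergence between the unigram (word-frequency)
    # distributions of two texts, using base-2 logs so the result
    # lies in [0, 1]: 0 for identical, 1 for disjoint vocabularies.
    pa, pb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    vocab = set(pa) | set(pb)
    na, nb = sum(pa.values()), sum(pb.values())
    P = {w: pa[w] / na for w in vocab}
    Q = {w: pb[w] / nb for w in vocab}
    M = {w: 0.5 * (P[w] + Q[w]) for w in vocab}

    def kl(p, q):
        # KL(p || q); terms with p[w] == 0 contribute nothing.
        return sum(p[w] * math.log2(p[w] / q[w]) for w in vocab if p[w] > 0)

    return 0.5 * kl(P, M) + 0.5 * kl(Q, M)
```

A model-free evaluator can score a summary by how small this divergence is against the source document, with no human reference summaries required.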