ACROSS: An Alignment-based Framework for Low-Resource Many-to-One Cross-Lingual Summarization
Abstract:
This research addresses the challenges of Cross-Lingual Summarization (CLS) in low-resource scenarios and over imbalanced multilingual data. Existing CLS studies mostly resort to pipeline frameworks or multi-task methods in bilingual settings. However, they ignore the data imbalance in multilingual scenarios and do not utilize high-resource monolingual summarization data. In this paper, we propose the Aligned CROSs-lingual Summarization (ACROSS) model to tackle these issues. Our framework aligns low-resource cross-lingual data with high-resource monolingual data via contrastive and consistency losses, which enrich low-resource information for high-quality summaries. In addition, we introduce a data augmentation method that selects informative monolingual sentences, which facilitates a deeper exploitation of high-resource information and introduces new information for low-resource languages. Experiments on the CrossSum dataset show that ACROSS outperforms baseline models and achieves consistently dominant performance on 45 language pairs.
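The alignment objective described in the abstract pairs a contrastive term (pull each cross-lingual encoding toward its aligned monolingual encoding, push it away from the others) with a consistency term between the two decoder distributions. A minimal toy sketch of such a combination is below; the InfoNCE-style form, the temperature value, and the KL-based consistency term are assumptions for illustration, not the paper's actual implementation:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors (as plain lists)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_loss(cls_vecs, mono_vecs, tau=0.1):
    """InfoNCE-style loss: the i-th cross-lingual encoding should be
    closest to the i-th (aligned) monolingual encoding."""
    loss = 0.0
    for i, c in enumerate(cls_vecs):
        sims = [cosine(c, m) / tau for m in mono_vecs]
        log_denom = math.log(sum(math.exp(s) for s in sims))
        loss += -(sims[i] - log_denom)
    return loss / len(cls_vecs)

def consistency_loss(mono_probs, cls_probs, eps=1e-12):
    """KL divergence between the monolingual and cross-lingual decoder
    distributions over the same vocabulary."""
    return sum(p * math.log((p + eps) / (q + eps))
               for p, q in zip(mono_probs, cls_probs))
```

With aligned pairs the contrastive loss is near zero; with shuffled pairs it grows, which is the training signal that pulls the two representation spaces together.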
The technology of automatic document summarization is maturing and may provide a solution to the information overload problem. Document summarization now plays an important role in information retrieval: given a large volume of documents, presenting the user with a summary of each one greatly facilitates finding the desired documents. Document summarization automatically creates a compressed version of a given document that provides useful information to users, and multi-document summarization produces a summary delivering the majority of the information content from a set of documents about an explicit or implicit main topic. In this paper, we use the Wikipedia knowledge base and the words of the main text to create independent graphs. We then determine the importance of each graph, and assign importance to sentences whose topics have high graph importance. Finally, we extract the sentences with the highest importance. Experimental results on the open benchmark datasets DUC01 and DUC02 show that our proposed approach improves performance compared to state-of-the-art summarization approaches.
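The graph-importance idea above can be illustrated with a minimal degree-centrality extractor: build a sentence graph from word overlap and keep the best-connected sentences. This sketch uses plain word-overlap similarity and an assumed threshold; the paper's actual graphs are built from Wikipedia concepts, which this toy version does not model:

```python
def tokenize(sentence):
    return set(sentence.lower().split())

def overlap(s1, s2):
    """Word-overlap similarity, normalized by the shorter sentence."""
    a, b = tokenize(s1), tokenize(s2)
    return len(a & b) / max(1, min(len(a), len(b)))

def extract_summary(sentences, k=2, threshold=0.2):
    """Rank sentences by graph degree (number of sufficiently similar
    neighbours) and return the top-k in original document order."""
    degree = [0] * len(sentences)
    for i in range(len(sentences)):
        for j in range(len(sentences)):
            if i != j and overlap(sentences[i], sentences[j]) >= threshold:
                degree[i] += 1
    ranked = sorted(range(len(sentences)), key=lambda i: -degree[i])[:k]
    return [sentences[i] for i in sorted(ranked)]
```

Sentences that share vocabulary with many others sit at the centre of the graph and are treated as topic-bearing, which is the intuition behind centrality-based extractive summarization.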
Text summarization is an important branch of Natural Language Processing and is very useful in information retrieval. This paper presents an approach to automatic text summarization based on Named Entities. The approach assumes that the number of Named Entities in a summary reflects the amount of information in it. Using this approach we generate event-focused summaries, which were submitted to and evaluated by the Document Understanding Conference 2003. The evaluation results show that the approach is effective. This paper also introduces the Document Understanding Conference.
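The "entity density reflects information" assumption above lends itself to a very small sketch: score sentences by how many named-entity mentions they contain and keep the densest ones. Counting non-sentence-initial capitalized tokens is a crude stand-in for a real NER system, used here only so the example is self-contained:

```python
def count_entities(sentence):
    """Crude NER proxy: count capitalized tokens that are not the
    sentence-initial word. A real system would run a proper NER tagger."""
    tokens = sentence.split()
    return sum(1 for t in tokens[1:] if t[:1].isupper())

def ne_summary(sentences, k=1):
    """Select the k sentences with the most entity mentions, on the
    assumption that entity density reflects information content."""
    ranked = sorted(range(len(sentences)),
                    key=lambda i: -count_entities(sentences[i]))[:k]
    return [sentences[i] for i in sorted(ranked)]
```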
We report the results of the DialogSum Challenge, the shared task on summarizing real-life scenario dialogues at INLG 2022. Four teams participated in this shared task and three submitted their system reports, exploring different methods to improve the performance of dialogue summarization. Although there is a great improvement over the baseline models in terms of automatic evaluation metrics such as ROUGE scores, we find a salient gap between model-generated outputs and human-annotated summaries under human evaluation from multiple aspects. These findings demonstrate the difficulty of dialogue summarization and suggest that more fine-grained evaluation metrics are needed.
Many applications require generation of summaries tailored to the user’s information needs, i.e., their intent. Methods that express intent via explicit user queries fall short when query interpretation is subjective. Several datasets exist for summarization with objective intents where, for each document and intent (e.g., “weather”), a single summary suffices for all users. No datasets exist, however, for subjective intents (e.g., “interesting places”) where different users will provide different summaries. We present SUBSUME, the first dataset for evaluation of SUBjective SUMmary Extraction systems. SUBSUME contains 2,200 (document, intent, summary) triplets over 48 Wikipedia pages, with ten intents of varying subjectivity, provided by 103 individuals over Mechanical Turk. We demonstrate statistically that the intents in SUBSUME vary systematically in subjectivity. To indicate SUBSUME’s usefulness, we explore a collection of baseline algorithms for subjective extractive summarization and show that (i) as expected, example-based approaches better capture subjective intents than query-based ones, and (ii) there is ample scope for improving upon the baseline algorithms, thereby motivating further research on this challenging problem.
Dependency parsing, which is a fundamental task in Natural Language Processing (NLP), has attracted a lot of interest in recent years. In general, it is one module in an NLP pipeline, together with word segmentation and Part-Of-Speech (POS) tagging, in real Chinese NLP applications. Because the NLP pipeline is a cascade system, it can propagate errors into the parsing stage. This paper proposes a global discriminative re-ranking model using non-local features from word segmentation, POS tagging, and dependency parsing to re-rank the parse trees produced by an N-best enhanced NLP pipeline. Experimental results indicate that the proposed model can improve the performance of dependency parsing as well as word segmentation and POS tagging in an NLP pipeline.
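The re-ranking step described above reduces, at its core, to scoring each N-best candidate with a weighted combination of features and keeping the highest-scoring one. The sketch below illustrates that shape; the feature names and weights are invented for the example, and the paper's actual model is trained discriminatively rather than hand-weighted:

```python
def rerank(candidates, weights):
    """Return the candidate parse with the highest weighted feature score.
    Each candidate is (parse, features), where `features` mixes local
    pipeline scores with non-local segmentation/POS/parse features."""
    def score(features):
        return sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return max(candidates, key=lambda c: score(c[1]))[0]
```

The point of non-local features is visible here: a candidate that the base pipeline ranks slightly lower can win once cross-module evidence (e.g. POS/segmentation agreement) is added to its score.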
A multi-document summarization method based on subtopic partition and the user's query is described in this paper. The similarity of sentences is measured using a thesaurus dictionary. Subtopics are found by sentence clustering and sorted by the user's query. Sentences from all subtopics are then selected using a dynamic sentence-scoring strategy. The experimental results indicate that the resulting summaries have less redundancy and more information.
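A dynamic scoring strategy of the kind described, where each pick trades query relevance against redundancy with already-selected sentences, can be sketched in the style of Maximal Marginal Relevance. Jaccard word overlap stands in for the thesaurus-based similarity, and the balance parameter is an assumption:

```python
def jaccard(a, b):
    """Word-overlap similarity; a stand-in for thesaurus-based similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / max(1, len(sa | sb))

def mmr_select(sentences, query, k=2, lam=0.7):
    """Dynamic scoring: each pick balances relevance to the query against
    redundancy with the sentences already selected."""
    selected = []
    remaining = list(sentences)
    while remaining and len(selected) < k:
        def score(s):
            rel = jaccard(s, query)
            red = max((jaccard(s, t) for t in selected), default=0.0)
            return lam * rel - (1 - lam) * red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because scores are recomputed after every pick, a near-duplicate of an already-chosen sentence is penalized, which is exactly how the dynamic strategy keeps redundancy low.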
Cross-lingual summarization (CLS) is the task of generating summaries in one specific language for source documents in different languages. Existing methods simply divide the cross-lingual task into two tasks: a summarization task and a translation task. Such pipeline-based methods can suffer from error propagation. In this paper, a round-trip translation strategy (RTT) is used to construct a Tibetan-Chinese cross-lingual abstract dataset from an existing monolingual Tibetan abstract dataset, and then an end-to-end framework is trained on the Tibetan-Chinese cross-lingual abstract task. Experimental results show that our method achieves significant improvements in performance metrics compared to traditional pipeline methods.
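A round-trip translation strategy for dataset construction can be sketched as follows: translate each monolingual summary into the target language, translate it back, and keep the pair only when the round trip reconstructs the original well enough to suggest a clean translation. The translator arguments, the overlap filter, and the keep threshold are all assumptions for illustration; the paper does not specify this exact filter:

```python
def build_rtt_dataset(mono_pairs, forward, backward, keep_ratio=0.6):
    """Build pseudo cross-lingual (document, summary) pairs from a
    monolingual summarization dataset. `forward` and `backward` are
    caller-supplied translation functions (hypothetical here); a pair is
    kept only if the round-trip reconstruction overlaps enough with the
    original summary, as a crude translation-quality filter."""
    def overlap(a, b):
        sa, sb = set(a.split()), set(b.split())
        return len(sa & sb) / max(1, len(sa | sb))

    dataset = []
    for doc, summary in mono_pairs:
        target_summary = forward(summary)      # e.g. Tibetan -> Chinese
        round_trip = backward(target_summary)  # e.g. Chinese -> Tibetan
        if overlap(summary, round_trip) >= keep_ratio:
            dataset.append({"document": doc, "summary": target_summary})
    return dataset
```

The resulting pairs (source-language document, target-language summary) can then train an end-to-end CLS model directly, avoiding the error propagation of a translate-then-summarize pipeline.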