Background Fine-grained sentiment analysis is used to interpret consumers' sentiments towards specific entities on specific aspects from their written comments. Previous researchers have introduced three main tasks in this field (ABSA, TABSA, and MEABSA), covering many kinds of social media data (e.g., review-specific, question-and-answer, and community-based data). In this paper, we identify and address two common challenges across these three tasks: the low-resource problem and sentiment polarity bias. Methods We propose a unified model called PEA that integrates data augmentation with a pre-trained language model and is applicable to all three tasks (ABSA, TABSA, and MEABSA). Two data augmentation methods, entity replacement and dual noise injection, are introduced to address both challenges simultaneously. An ensemble method is also introduced to combine the results of the RNN-based and BERT-based base models. Results PEA shows significant improvements on all three fine-grained sentiment analysis tasks compared with state-of-the-art models. It also achieves results comparable to those of the baseline models while using only 20% of their training data, demonstrating its strong performance under extremely low-resource conditions.
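To make the entity replacement idea concrete, the minimal sketch below (with hypothetical field names and toy data, not the authors' exact procedure) swaps the target entity in a labeled sentence for another entity of the same type while keeping the sentiment label, so that scarce or polarity-skewed classes gain additional training instances.

import random

def entity_replacement(sample, entity_pool):
    """sample: dict with 'text', 'entity', 'aspect', 'polarity' (hypothetical schema)."""
    candidates = [e for e in entity_pool if e != sample["entity"]]
    new_entity = random.choice(candidates)
    return {
        "text": sample["text"].replace(sample["entity"], new_entity),
        "entity": new_entity,
        "aspect": sample["aspect"],
        "polarity": sample["polarity"],  # the sentiment label is preserved
    }

sample = {"text": "The battery of LOC1 lasts all day.",
          "entity": "LOC1", "aspect": "battery", "polarity": "positive"}
print(entity_replacement(sample, ["LOC1", "LOC2", "LOC3"]))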
Since late 2019, the novel coronavirus disease (COVID-19) has become a global crisis. With the development of online social media, people increasingly express their opinions and discuss the latest news online. We have witnessed the positive influence of online social media, which has helped citizens and governments track the development of the pandemic in time. It is therefore necessary to apply artificial intelligence (AI) techniques to online social media to automatically discover and track public opinions posted online. In this paper, we take Sina Weibo, the most widely used online social media platform in China, for analysis and experiments. We collect multi-modal microblogs about COVID-19 from 2020/1/1 to 2020/3/31 with a web crawler, including texts and images posted by users. To effectively discover what is being discussed about COVID-19 without human labeling, we propose a unified multi-modal framework that includes an unsupervised short-text topic model to discover and track bursty topics, and a self-supervised model to learn image features so that we can retrieve images related to COVID-19. Experimental results show the effectiveness and superiority of the proposed models and demonstrate considerable promise for analyzing and tracking public opinion about COVID-19.
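As a rough illustration of the retrieval step (an assumed interface, not the paper's exact pipeline), the sketch below ranks corpus images by the cosine similarity of their self-supervised feature vectors to a query feature; the encoder itself is abstracted away and random vectors stand in for learned features.

import numpy as np

def retrieve(query_vec, corpus_vecs, top_k=5):
    """Return indices of the top_k corpus images most similar to the query feature."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    scores = c @ q                       # cosine similarity with the query
    return np.argsort(-scores)[:top_k]   # indices of the most similar images

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 128))    # stand-in for learned image features
query = rng.normal(size=128)
print(retrieve(query, corpus))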
Rumor detection is a popular research topic in natural language processing and data mining. Since the outbreak of COVID-19, related rumors have been widely posted and spread on online social media, seriously affecting people's daily lives, the national economy, and social stability. It is both theoretically and practically essential to detect and refute COVID-19 rumors quickly and effectively. Because COVID-19 was an emergent event that broke out rapidly, related rumor instances were scarce and distinct at its early stage, which makes the detection task a typical few-shot learning problem. However, traditional rumor detection techniques focus on detecting existing events with sufficient training instances, so they fail to detect emergent events such as COVID-19. Developing a new few-shot rumor detection framework has therefore become critical for preventing rumors from spreading at an early stage. This article focuses on few-shot rumor detection, especially detecting COVID-19 rumors on Sina Weibo with only a minimal number of labeled instances. We contribute a Sina Weibo COVID-19 rumor dataset and propose a few-shot learning-based multi-modality fusion model for few-shot rumor detection. A full microblog consists of the source post and the corresponding comments, which are treated as two modalities and fused with meta-learning methods. Experiments on the collected Weibo dataset and the public PHEME dataset show significant improvements and demonstrate the generality of the proposed model.
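The abstract does not specify the exact meta-learning method, so the following toy episode assumes a prototypical-network-style classifier for illustration: post and comment features are fused by concatenation, class prototypes are support-set means, and a query microblog is assigned the label of its nearest prototype.

import numpy as np

def fuse(post_vec, comment_vec):
    # Simple two-modality fusion by concatenation (an assumption for this sketch)
    return np.concatenate([post_vec, comment_vec])

def classify(support_feats, support_labels, query_feat):
    """Nearest-prototype classification over a small labeled support set."""
    labels = sorted(set(support_labels))
    protos = [np.mean([f for f, y in zip(support_feats, support_labels) if y == c],
                      axis=0) for c in labels]
    dists = [np.linalg.norm(query_feat - p) for p in protos]
    return labels[int(np.argmin(dists))]

rng = np.random.default_rng(1)
support = [fuse(rng.normal(size=64), rng.normal(size=64)) for _ in range(4)]
labels = ["rumor", "rumor", "non-rumor", "non-rumor"]
query = fuse(rng.normal(size=64), rng.normal(size=64))
print(classify(support, labels, query))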
Exploiting label correlation is crucially important in multi-label learning, where each instance is associated with multiple labels simultaneously. Multi-label learning is more complex than single-label learning because the labels tend to be correlated. Traditional multi-label learning algorithms learn an independent classifier for each label and apply ranking or thresholding to the classification results. Most existing methods take label correlation as prior knowledge, which works well, but they fail to make full use of label dependency. As a result, the real relationships among labels may not be correctly characterized and the final predictions are not explicitly correlated. To address these problems, we propose a novel high-order multi-label learning algorithm, Label collAboration based Multi-laBel learning (LAMB). For each label, LAMB exploits the collaboration between its own prediction and the predictions of the other labels. Extensive experiments on various datasets demonstrate that the proposed LAMB algorithm achieves superior performance over existing state-of-the-art algorithms. In addition, a real-world dataset of channelrhodopsin chimeras is assessed, which would be of great value as a pre-screen for membrane protein function.
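A toy sketch of the label collaboration idea is given below (a hypothetical formulation, not the exact LAMB objective): each label's final score combines its own base prediction with a weighted combination of the other labels' predictions.

import numpy as np

def collaborate(base_scores, collab_matrix, alpha=0.5):
    """base_scores: (n_samples, n_labels) independent per-label predictions.
    collab_matrix: (n_labels, n_labels) label-to-label weights."""
    W = collab_matrix.copy()
    np.fill_diagonal(W, 0.0)                      # a label does not collaborate with itself
    W = W / W.sum(axis=1, keepdims=True)          # normalize contributions from other labels
    collab_scores = base_scores @ W.T             # what the other labels' predictions suggest
    return alpha * base_scores + (1 - alpha) * collab_scores

rng = np.random.default_rng(2)
base = rng.uniform(size=(3, 4))                   # 3 instances, 4 labels
weights = rng.uniform(size=(4, 4))
print((collaborate(base, weights) > 0.5).astype(int))  # thresholded multi-label output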
Chinese is a representative East Asian language. Chinese Named Entity Recognition (CNER) aims to recognize various types of entities in Chinese text and is important for many downstream NLP tasks. Recent research on CNER systems has been dedicated either to word enhancement or to capturing global information in order to strengthen local composition and alleviate ambiguity in word meanings. However, word information acquired from external lexicons is often confusing, leading to incorrect judgments of word boundaries. Moreover, relevant studies typically use excessively complex models to capture the global semantics of sentences. To solve these two problems, we incorporate a global representation into the procedure of local word enhancement. We propose an intuitive and effective dual-module interactive network that enhances word boundaries and extracts global semantics, using a rethinking mechanism to refine the importance of local composition and global information. Experiments on four CNER datasets show that the proposed model outperforms baseline methods in terms of F1 score.
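The following schematic sketch (with assumed shapes and an assumed gating form, not the paper's exact architecture) shows one way a global sentence vector can re-weight lexicon-enhanced local token features in a rethinking step and then be fused with them through a gate.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rethink(local_feats, global_vec):
    """local_feats: (seq_len, dim) token features; global_vec: (dim,) sentence representation."""
    attn = sigmoid(local_feats @ global_vec)            # per-token relevance to the sentence
    refined = local_feats * attn[:, None]               # re-weight local composition by relevance
    gate = sigmoid(refined @ global_vec)[:, None]       # token-wise mixing gate
    return gate * refined + (1 - gate) * global_vec     # fuse local and global views

rng = np.random.default_rng(3)
tokens = rng.normal(size=(6, 32))                       # 6 tokens, 32-dim features
sentence = tokens.mean(axis=0)                          # simple stand-in for a global representation
print(rethink(tokens, sentence).shape)                  # (6, 32)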