Recent years have seen a surge of knowledge-based question answering (KB-QA) systems which provide crisp answers to user-issued questions by translating them to precise structured queries over a knowledge base (KB). A major challenge in KB-QA is bridging the gap between natural language expressions and the complex schema of the KB. As a result, existing methods focus on simple questions answerable with one main relation path in the KB and struggle with complex questions that require joining multiple relations. We propose a KB-QA system, TextRay, which answers complex questions using a novel decompose-execute-join approach. It constructs complex query patterns using a set of simple queries. It uses a semantic matching model which is able to learn simple queries using implicit supervision from question-answer pairs, thus eliminating the need for complex query patterns. Our proposed system significantly outperforms existing KB-QA systems on complex questions while achieving comparable results on simple questions.
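The decompose-execute-join idea in the TextRay abstract above can be illustrated with a minimal sketch. It assumes a toy knowledge base of triples and a hand-written decomposition; the entity and relation names (`Film_A`, `directed_by`, and so on) and the helper functions are hypothetical. TextRay itself selects simple queries with a learned semantic matching model, which is not reproduced here.

```python
# Toy knowledge base of (subject, relation, object) triples.
# All entities and relations are made up for illustration.
KB = {
    ("Film_A", "directed_by", "Director_X"),
    ("Film_B", "directed_by", "Director_X"),
    ("Film_A", "stars", "Actor_Y"),
    ("Film_C", "stars", "Actor_Y"),
}


def execute(relation, obj):
    """Run one simple query: every subject s with (s, relation, obj) in the KB."""
    return {s for (s, r, o) in KB if r == relation and o == obj}


def answer_conjunction(simple_queries):
    """Execute each simple query, then join the partial answers by intersection."""
    partial_answers = [execute(rel, obj) for rel, obj in simple_queries]
    return set.intersection(*partial_answers) if partial_answers else set()


# "Which film directed by Director_X stars Actor_Y?" decomposed by hand
# into two simple queries (an assumption; the real system learns this step).
queries = [("directed_by", "Director_X"), ("stars", "Actor_Y")]
result = answer_conjunction(queries)
print(result)
```

Intersection is only one possible join; a composition question would instead feed the answers of the first simple query into the second.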
Given a large, time-evolving graph of who-calls-whom-when, how can we help analysts find anomalies and fraudsters? How can we explain our decisions? We provide TgraphSpot, which carefully extracts features that are often related to fraud and provides informative, interactive plots that help analysts zoom in on the few strange nodes. We present the architecture and design decisions of TgraphSpot. Thanks to our careful feature-extraction algorithms, it scales linearly, taking 2.5 hours on a stock laptop to process 29 million phone calls. More importantly, when applied to a real dataset of millions of phone calls, it discovered suspicious nodes; experts confirmed that those nodes are fraudsters who had been undetected so far.
The rational development and utilization of renewable clean energy resources such as hydropower, wind energy, and biomass energy is in line with the trajectory of energy development, and plays a major role in establishing a sustainable energy system and promoting national economic development and environmental protection. The race to net zero will redefine global energy security, with secure, resilient, and sustainable clean energy supply chains at the heart of the global energy transition. This paper first screens and analyzes the risk factors of the clean energy supply chain in the context of China's "dual carbon" goals, and constructs a preliminary risk assessment index system with reference to the SCOR model. The fuzzy comprehensive evaluation method and the entropy value method are then used to evaluate the impact of risk factors in China's clean energy supply chain, establish a risk assessment model, and analyze those risk factors. Finally, based on this analysis, some suggestions are put forward for the risk management and optimization of China's energy supply chain.
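The combination of entropy-based weighting and fuzzy comprehensive evaluation named in the abstract above can be sketched as follows. This is a generic textbook formulation, not the paper's model: the indicator scores, the membership matrix, and the three risk grades are invented for illustration, and the weighted-average operator is an assumption.

```python
import math


def entropy_weights(matrix):
    """Entropy-based indicator weights.

    matrix[i][j] is the non-negative score of sample i on indicator j.
    Indicators whose scores vary more across samples receive larger weights.
    """
    m, n = len(matrix), len(matrix[0])
    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total <= 0:
            raise ValueError("each indicator needs a positive column sum")
        probs = [x / total for x in column]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(m)
        divergences.append(max(0.0, 1.0 - entropy))  # clamp rounding noise
    spread = sum(divergences)
    if spread <= 1e-12:  # every indicator is uniform: fall back to equal weights
        return [1.0 / n] * n
    return [d / spread for d in divergences]


def fuzzy_evaluate(weights, membership):
    """Weighted-average fuzzy evaluation.

    membership[j][g] is the degree to which indicator j belongs to grade g.
    Returns one aggregated membership degree per grade.
    """
    grades = len(membership[0])
    return [sum(w * row[g] for w, row in zip(weights, membership))
            for g in range(grades)]


# Hypothetical data: 3 samples x 3 risk indicators.
scores = [[0.20, 0.9, 0.5],
          [0.30, 0.1, 0.5],
          [0.25, 0.5, 0.5]]
# Hypothetical membership of each indicator in grades (low, medium, high).
membership = [[0.6, 0.3, 0.1],
              [0.1, 0.3, 0.6],
              [0.3, 0.4, 0.3]]

weights = entropy_weights(scores)
evaluation = fuzzy_evaluate(weights, membership)
print(weights, evaluation)
```

Because the third indicator is identical across samples, it carries no information and its weight is effectively zero; the overall grade is then read off as the grade with the largest aggregated membership.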
Frequent pattern mining is a key area of study that gives insights into the structure and dynamics of evolving networks, such as social or road networks. However, not only does a network evolve, but often the way that it evolves itself evolves. Thus, knowing, in addition to patterns' frequencies, for how long and how regularly they have occurred (i.e., their persistence) can add to our understanding of evolving networks. In this work, we propose the problem of mining activity that persists through time in continually evolving networks, i.e., activity that repeatedly and consistently occurs. We extend the notion of temporal motifs to capture activity among specific nodes, in what we call activity snippets, which are small sequences of edge updates that reoccur. We propose axioms and properties that a measure of persistence should satisfy, and develop such a persistence measure. We also propose PENminer, an efficient framework for mining activity snippets' Persistence in Evolving Networks, and design both offline and streaming algorithms. We apply PENminer to numerous real, large-scale evolving networks and edge streams, and find activity that is surprisingly regular over a long period of time but too infrequent to be discovered by aggregate count alone, as well as bursts of activity exposed by their lack of persistence. Our findings with PENminer include neighborhoods in NYC where taxi traffic persisted through Hurricane Sandy, the opening of new bike stations, characteristics of social network users, and more. Moreover, we use PENminer to identify anomalies in multiple networks, outperforming baselines at identifying subtle anomalies by 9.8-48% in AUC.
Plants generate a range of physiological and molecular responses to sustain their growth and development when suffering heat stress. Avocado is a tropical fruit tree with high economic value. Most avocado cultivars wilt, wither, or even die when exposed to heat stress for a long time, which seriously restricts the introduction and cultivation of avocados. In this study, samples of a heat-intolerant variety (‘Hass’) were subjected to heat stress, and their transcriptome and metabolome were analyzed, with the aim of providing information for the variety improvement and domestication of avocados. The differentially expressed genes identified using transcriptome analysis were mainly involved in pathways such as plant hormone signal transduction, plant–pathogen interaction, and protein processing in the endoplasmic reticulum. Combined transcriptome and metabolome analysis indicated that the down-regulation of Hass.g03.10206 and Hass.g03.10205 in heat shock-like proteins may result in reduced trehalose and sinapoyl aldehyde content. Metabolomics analysis results indicated that the decrease in trehalose and sinapoyl aldehyde content may be an important factor in heat intolerance. These results provide important clues for understanding the physiological mechanisms of adaptation to heat stress in avocados.
As new epidemics spread and develop, identifying how public emotion changes over the course of an epidemic is of great reference value. We designed and implemented a COVID-19 public opinion monitoring system based on time-series mining of trending new words. We propose a new-word discovery scheme based on the temporal burst of online topics, together with a Chinese sentiment analysis method for the COVID-19 public opinion environment. We establish a "Scrapy-Redis-Bloomfilter" distributed crawler framework to collect data. The system can judge whether a commenter's emotion is positive or negative from their comments, and can also reflect the intensity of seven emotions, such as Hopeful, Happy, and Depressed. Finally, we improved the sentiment discriminant model of this system and compared its discriminant error on COVID-19-related comments with that of the Jiagu deep learning model. The results show that our model has better generalization ability and a smaller discriminant error. We also designed a large data visualization screen that clearly shows the trend of public emotions, the proportion of each emotion category, keywords, hot topics, and so on, and fully and intuitively reflects the development of public opinion.
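The "Bloomfilter" part of the crawler framework mentioned above refers to Bloom-filter-based deduplication of already-seen URLs. The following is a minimal in-memory sketch of that idea, assuming a local bit array and salted SHA-256 hashes; the class, its parameters, and the example URLs are hypothetical, and a distributed setup would keep the bit array in Redis rather than in process memory.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, rare false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def dedupe(urls, seen=None):
    """Return URLs not seen before, in order, recording them in the filter."""
    seen = seen if seen is not None else BloomFilter()
    fresh = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


fresh_urls = dedupe([
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/a",  # duplicate, should be skipped
])
print(fresh_urls)
```

The trade-off is memory for accuracy: the filter never forgets a URL it has stored, but with a small bit array it may occasionally report an unseen URL as already crawled.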
Documents are often used for knowledge sharing and preservation in business and science, and the tables within them capture most of the critical data. Unfortunately, most documents are stored and distributed as PDF or scanned images, which fail to preserve logical table structure. Recent vision-based deep learning approaches have been proposed to address this gap, but most still cannot achieve state-of-the-art results. We present Global Table Extractor (GTE), a vision-guided systematic framework for joint table detection and cell structure recognition, which can be built on top of any object detection model. With GTE-Table, we introduce a new penalty based on the natural cell containment constraint of tables to train our table network aided by cell location predictions. GTE-Cell is a new hierarchical cell detection network that leverages table styles. Further, we design a method to automatically label table and cell structure in existing documents to cheaply create a large corpus of training and test data. We use this to enhance PubTabNet with cell labels and create FinTabNet, real-world and complex scientific and financial datasets with detailed table structure annotations to help train and test structure recognition. Our framework surpasses previous state-of-the-art results on the ICDAR 2013 and ICDAR 2019 table competitions in both table detection and cell structure recognition. Further experiments demonstrate a greater than 45% improvement in cell structure recognition when compared to a vanilla RetinaNet object detection model on our new out-of-domain FinTabNet.
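The cell containment constraint mentioned above (cells should lie inside their table) can be turned into a numeric penalty in many ways. The sketch below is one hypothetical formulation, not the loss used by GTE-Table: it averages, over predicted cell boxes, the fraction of each cell's area that falls outside a predicted table box. Boxes are assumed to be `(x1, y1, x2, y2)` tuples.

```python
def area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2); zero if degenerate."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def intersection(a, b):
    """Overlap box of a and b (may be degenerate if they do not overlap)."""
    return (max(a[0], b[0]), max(a[1], b[1]),
            min(a[2], b[2]), min(a[3], b[3]))


def containment_penalty(table, cells):
    """Mean fraction of cell area lying outside the table box.

    0.0 means every cell is fully contained; 1.0 means none overlaps the table.
    """
    fractions = []
    for cell in cells:
        cell_area = area(cell)
        if cell_area == 0:
            continue  # skip degenerate cells
        inside = area(intersection(cell, table))
        fractions.append(1.0 - inside / cell_area)
    return sum(fractions) / len(fractions) if fractions else 0.0


table_box = (0, 0, 10, 10)
cell_boxes = [(1, 1, 3, 3),     # fully inside the table
              (8, 8, 12, 12)]   # three quarters of its area is outside
penalty = containment_penalty(table_box, cell_boxes)
print(penalty)
```

In a training setting such a term would be computed on differentiable box predictions and added to the detection loss; the plain-Python version here only shows the geometry.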
To retrieve personalized campaigns and creatives while protecting user privacy, digital advertising is shifting from member-based identity to cohort-based identity. Under such an identity regime, an accurate and efficient cohort building algorithm is needed to group users with similar characteristics. In this paper, we propose a scalable K-anonymous cohort building algorithm called consecutive consistent weighted sampling (CCWS). The proposed method combines the spirit of (p-powered) consistent weighted sampling (CWS) and hierarchical clustering, so that K-anonymity is ensured by enforcing a lower bound on the size of cohorts. Evaluations on a LinkedIn dataset consisting of >70M users and ad campaigns demonstrate that CCWS achieves substantial improvements over several hashing-based methods, including sign random projections (SignRP), minwise hashing (MinHash), and vanilla CWS.
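As background for the CWS building block named above, here is a minimal sketch of a single consistent weighted sampling hash, following the widely used gamma-based scheme for non-negative weight vectors. It is not the CCWS algorithm itself: the consecutive splitting, the p-power transform, and the K-anonymity bound are omitted, and the user vectors and function names are hypothetical.

```python
import math
import random


def cws_hash(weights, seed):
    """One consistent weighted sample of a sparse non-negative vector.

    weights maps feature -> weight; returns a (feature, t) pair. Two vectors
    receive the same pair with probability close to their weighted Jaccard
    similarity, sum(min) / sum(max).
    """
    best = None
    for feature, w in weights.items():
        if w <= 0:
            continue
        # Randomness depends only on (seed, feature), so it is shared by all users.
        rng = random.Random(f"{seed}:{feature}")
        r = rng.gammavariate(2.0, 1.0)
        c = rng.gammavariate(2.0, 1.0)
        beta = rng.random()
        t = math.floor(math.log(w) / r + beta)
        y = math.exp(r * (t - beta))
        a = c / (y * math.exp(r))
        if best is None or a < best[0]:
            best = (a, feature, t)
    if best is None:
        raise ValueError("vector has no positive weights")
    return best[1], best[2]


def estimate_similarity(u, v, num_hashes=64):
    """Fraction of seeds on which the two vectors produce the same sample."""
    matches = sum(cws_hash(u, s) == cws_hash(v, s) for s in range(num_hashes))
    return matches / num_hashes


# Hypothetical user feature vectors.
user_a = {"skill:python": 3.0, "industry:ads": 1.0}
user_b = {"skill:python": 2.5, "industry:ads": 1.0}
user_c = {"skill:cooking": 4.0}

print(estimate_similarity(user_a, user_b))
print(estimate_similarity(user_a, user_c))
```

Users whose samples collide can be placed in the same bucket; a cohort builder would then need an additional rule, such as the lower bound on cohort size described in the abstract, to guarantee K-anonymity.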