Xinyi Zheng

Hainan University

Learning to Answer Complex Questions over Knowledge Bases with Query Composition

Nikita Bhutani Xinyi Zheng H. V. Jagadish

Recent years have seen a surge of knowledge-based question answering (KB-QA) systems which provide crisp answers to user-issued questions by translating them to precise structured queries over a knowledge base (KB). A major challenge in KB-QA is bridging the gap between natural language expressions and the complex schema of the KB. As a result, existing methods focus on simple questions answerable with one main relation path in the KB and struggle with complex questions that require joining multiple relations. We propose a KB-QA system, TextRay, which answers complex questions using a novel decompose-execute-join approach. It constructs complex query patterns using a set of simple queries. It uses a semantic matching model which is able to learn simple queries using implicit supervision from question-answer pairs, thus eliminating the need for complex query patterns. Our proposed system significantly outperforms existing KB-QA systems on complex questions while achieving comparable results on simple questions.

Schema (genetic algorithms)

10.1145/3357384.3358033

Citations (64)

TgraphSpot: Fast and Effective Anomaly Detection for Time-Evolving Graphs

2022 IEEE International Conference on Big Data (Big Data) (2022)

Mirela T. Cazzolato Saranya Vijayakumar Xinyi Zheng Namyong Park Meng-Chieh Lee

Given a large, time-evolving graph of who-calls-whom-when, how can we help analysts find anomalies and fraudsters? How can we explain our decisions? We provide TgraphSpot, which carefully extracts features that are often related to fraud; and which provides informative, interactive plots that help analysts zoom down to the few strange nodes. We present the architecture and design decisions of TgraphSpot. Thanks to our careful feature-extraction algorithms, it scales linearly, taking 2.5 hours on a stock laptop, to process 29 million phone calls. More importantly, when applied on a real dataset of millions of phone calls, it discovered suspicious nodes; experts confirmed that those nodes are fraudsters that had been undetected so far.

Laptop

10.1109/bigdata55660.2022.10020898

Citations (1)

Nitrogen deposition magnifies destabilizing effects of plant functional group loss

The Science of The Total Environment (2022)

Wenjin Li Shan Luo Junfeng Wang Xinyi Zheng Zhou Xi

Asynchrony (computer programming)

Extinction (optical mineralogy)

10.1016/j.scitotenv.2022.155419

Citations (2)

Risk Identification and Optimization of Clean Energy Supply Chain under the "Dual Carbon" Goal

Frontiers in Sustainable Development (2024)

Xinyi Zheng Jinsheng Zhu Lin Li

The rational development and utilization of renewable clean energy resources such as hydropower, wind energy, and biomass energy is in line with the trajectory of energy development, and plays a major role in establishing a sustainable energy system and promoting national economic development and environmental protection. The race to net zero will redefine global energy security, with secure, resilient and sustainable clean energy supply chains at the heart of the global energy transition. This paper first screens and analyzes the risk factors of the clean energy supply chain in the context of China's "dual carbon" goals, and constructs a preliminary risk assessment index system with reference to the SCOR model. The fuzzy comprehensive evaluation method and the entropy value method are then used to evaluate the impact of risk factors in China's clean energy supply chain, to establish a risk assessment model, and to analyze the risk factors of that supply chain. Finally, based on this analysis, some suggestions are put forward for the risk management and optimization of China's energy supply chain.
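The abstract above names the entropy value method and the fuzzy comprehensive evaluation method without giving formulas. The following is a minimal sketch of the textbook versions of both, not the paper's actual model: the function names, the data layout (rows are samples, columns are indicators), and the weighted-average fuzzy operator are all assumptions made for illustration.

```python
import math


def entropy_weights(matrix):
    """Textbook entropy weight method (illustrative, not the paper's model).

    matrix: list of rows; each row is one evaluated sample, each column one
    risk indicator. Assumes at least two rows, non-negative values, every
    column summing to more than zero, and at least one non-constant column.
    """
    n = len(matrix)
    m = len(matrix[0])
    k = 1.0 / math.log(n)  # normalizes entropy into [0, 1]
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        entropy = 0.0
        for x in column:
            p = x / total
            if p > 0:  # 0 * log(0) is treated as 0
                entropy -= k * p * math.log(p)
        # Indicators that vary more across samples carry more information.
        divergences.append(1.0 - entropy)
    norm = sum(divergences)
    return [d / norm for d in divergences]


def fuzzy_evaluate(weights, membership):
    """Weighted-average fuzzy operator B = W . R.

    membership[j][g] is the (hypothetical) degree to which indicator j
    belongs to risk grade g; the result has one score per grade.
    """
    grades = len(membership[0])
    return [
        sum(w * row[g] for w, row in zip(weights, membership))
        for g in range(grades)
    ]
```

In this sketch an indicator that takes the same value for every sample receives a weight of approximately zero, which is the usual motivation for entropy weighting: it lets the data, rather than expert judgment alone, decide how much each risk factor contributes to the final fuzzy score.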

Identification

10.54691/rms4er41

Citations (0)

Mining Persistent Activity in Continually Evolving Networks

Caleb Belth Xinyi Zheng Danai Koutra

Frequent pattern mining is a key area of study that gives insights into the structure and dynamics of evolving networks, such as social or road networks. However, not only does a network evolve, but often the way that it evolves, itself evolves. Thus, knowing, in addition to patterns' frequencies, for how long and how regularly they have occurred (i.e., their persistence) can add to our understanding of evolving networks. In this work, we propose the problem of mining activity that persists through time in continually evolving networks, i.e., activity that repeatedly and consistently occurs. We extend the notion of temporal motifs to capture activity among specific nodes, in what we call activity snippets, which are small sequences of edge-updates that reoccur. We propose axioms and properties that a measure of persistence should satisfy, and develop such a persistence measure. We also propose PENminer, an efficient framework for mining activity snippets' Persistence in Evolving Networks, and design both offline and streaming algorithms. We apply PENminer to numerous real, large-scale evolving networks and edge streams, and find activity that is surprisingly regular over a long period of time, but too infrequent to be discovered by aggregate count alone, and bursts of activity exposed by their lack of persistence. Our findings with PENminer include neighborhoods in NYC where taxi traffic persisted through Hurricane Sandy, the opening of new bike-stations, characteristics of social network users, and more. Moreover, we use PENminer towards identifying anomalies in multiple networks, outperforming baselines at identifying subtle anomalies by 9.8-48% in AUC.

Persistence (discontinuity)

Evolving networks

10.1145/3394486.3403136

Citations (22)

One-pot fabrication of poly (ionic liquid)s functionalized magnetic adsorbent for efficient enrichment of phenylurea herbicides in environmental waters

Analytica Chimica Acta (2022)

Kaisheng Hong Youfang Huang Lingxin Zheng Xinyi Zheng Xiaojia Huang

Azobisisobutyronitrile

Ethylene glycol dimethacrylate

Solid phase extraction

Linear range

10.1016/j.aca.2022.339549

Citations (19)

Combined Analysis of Transcriptome and Metabolome Provides Insights in Response Mechanism under Heat Stress in Avocado (Persea americana Mill.)

International Journal of Molecular Sciences (2024)

Xinyi Zheng Qing Zhu Yi Liu Junxiang Chen Lingxia Wang

Plants generate a range of physiological and molecular responses to sustain their growth and development when suffering heat stress. Avocado is a type of tropical fruit tree with high economic value. Most avocado cultivars delete, wither, or even die when exposed to heat stress for a long time, which seriously restricts the introduction and cultivation of avocados. In this study, samples of a heat-intolerant variety (‘Hass’) were treated under heat stress, and the transcriptomics and metabolomics were analyzed, with the expectation of providing information on the variety improvement and domestication of avocados. The differentially expressed genes identified using transcriptome analysis mainly involved metabolic pathways such as plant hormone signal transduction, plant–pathogen interaction, and protein processing in the endoplasmic reticulum. Combined transcriptome and metabolome analysis indicated that the down-regulation of Hass.g03.10206 and Hass.g03.10205 in heat shock-like proteins may result in the reduced Trehalose and Sinapoyl aldehyde content. Metabolomics analysis results indicated that the decrease in Trehalose and Sinapoyl aldehyde content may be an important factor for heat intolerance. These results provide important clues for understanding the physiological mechanisms of adaptation to heat stress in avocados.

Persea

Metabolome

Proteome

Plant Physiology

10.3390/ijms251910312

Citations (0)

COVID-19 Public Opinion and Emotion Monitoring System Based on Time Series Thermal New Word Mining

arXiv (Cornell University) (2020)

Yixian Zhang Jieren Chen Boyi Liu Yifan Yang Haocheng Li

With the spread and development of new epidemics, it is of great reference value to identify how public emotions change over the course of an epidemic. We designed and implemented the COVID-19 public opinion monitoring system based on time series thermal new word mining. A new word structure discovery scheme based on the timing explosion of network topics and a Chinese sentiment analysis method for the COVID-19 public opinion environment are proposed. We establish a "Scrapy-Redis-Bloomfilter" distributed crawler framework to collect data. The system can judge the positive and negative emotions of the reviewer based on the comments, and can also reflect the depth of seven emotions such as Hopeful, Happy, and Depressed. Finally, we improved the sentiment discriminant model of this system and compared the sentiment discriminant error of COVID-19 related comments with the Jiagu deep learning model. The results show that our model has better generalization ability and smaller discriminant error. We designed a large data visualization screen, which can clearly show the trend of public emotions, the proportion of various emotion categories, keywords, hot topics, etc., and fully and intuitively reflect the development of public opinion.
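The crawler framework named in the abstract relies on a Bloom filter to avoid fetching the same page twice. The sketch below shows only that deduplication idea with an in-memory filter; it does not use Scrapy or Redis, and the class, the `crawl_once` helper, and the default sizes are hypothetical choices for illustration rather than the system's configuration.

```python
import hashlib


class BloomFilter:
    """Minimal in-memory Bloom filter (illustrative sizes, not tuned)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        # num_hashes must stay below 256 because the seed is one byte.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            # Derive independent-looking hashes by prefixing a seed byte.
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def crawl_once(urls, seen):
    """Return the URLs not yet recorded in `seen`, recording them as we go."""
    fresh = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh
```

A Bloom filter trades a small false-positive rate (an unseen URL is occasionally skipped) for a fixed memory footprint, which is why it suits large crawls where storing every visited URL would be too expensive.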

Sentiment Analysis

Web crawler

10.48550/arxiv.2005.11458

Citations (2)

Global Table Extractor (GTE): A Framework for Joint Table Identification and Cell Structure Recognition Using Visual Context

Xinyi Zheng Douglas Burdick Lucian Popa Zhong Xu Nancy Xin Ru Wang

Documents are often used for knowledge sharing and preservation in business and science, within which are tables that capture most of the critical data. Unfortunately, most documents are stored and distributed as PDF or scanned images, which fail to preserve logical table structure. Recent vision-based deep learning approaches have been proposed to address this gap, but most still cannot achieve state-of-the-art results. We present Global Table Extractor (GTE), a vision-guided systematic framework for joint table detection and cell structure recognition, which could be built on top of any object detection model. With GTE-Table, we invent a new penalty based on the natural cell containment constraint of tables to train our table network aided by cell location predictions. GTE-Cell is a new hierarchical cell detection network that leverages table styles. Further, we design a method to automatically label table and cell structure in existing documents to cheaply create a large corpus of training and test data. We use this to enhance PubTabNet with cell labels and create FinTabNet, real-world and complex scientific and financial datasets with detailed table structure annotations to help train and test structure recognition. Our framework surpasses previous state-of-the-art results on the ICDAR 2013 and ICDAR 2019 table competition in both table detection and cell structure recognition. Further experiments demonstrate a greater than 45% improvement in cell structure recognition when compared to a vanilla RetinaNet object detection model in our new out-of-domain FinTabNet.

Table (database)

10.1109/wacv48630.2021.00074

Citations (119)

Building K-Anonymous User Cohorts with Consecutive Consistent Weighted Sampling (CCWS)

Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (2023)

Xinyi Zheng Weijie Zhao X.R. Li Ping Li

To retrieve personalized campaigns and creatives while protecting user privacy, digital advertising is shifting from member-based identity to cohort-based identity. Under such an identity regime, an accurate and efficient cohort building algorithm is desired to group users with similar characteristics. In this paper, we propose a scalable K-anonymous cohort building algorithm called consecutive consistent weighted sampling (CCWS). The proposed method combines the spirit of the (p-powered) consistent weighted sampling (CWS) and hierarchical clustering, so that the K-anonymity is ensured by enforcing a lower bound on the size of cohorts. Evaluations on a LinkedIn dataset consisting of >70M users and ads campaigns demonstrate that CCWS achieves substantial improvements over several hashing-based methods including sign random projections (SignRP), minwise hashing (MinHash), as well as the vanilla CWS.
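For readers unfamiliar with the hashing baselines listed above, here is a minimal sketch of minwise hashing (MinHash), one of the methods the abstract compares against. It is not CCWS or CWS: it handles unweighted sets only, and the function names, the seeded SHA-256 hash family, and the signature length are assumptions chosen for illustration.

```python
import hashlib


def _hash(seed, token):
    """Seeded 64-bit hash built from SHA-256 (an illustrative choice)."""
    digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens, num_hashes=64):
    """One minimum per hash function; `tokens` must be a non-empty set."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching positions estimates the Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

Users whose feature sets produce the same minimum under a given hash function are likely to be similar, so grouping by signature values is one way hashing-based methods form cohorts. Consistent weighted sampling extends this idea from sets to non-negative weighted vectors.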

10.1145/3539618.3591857

Citations (1)