Simiao Zuo, Qingyu Yin, Haoming Jiang, Shaohui Xi, Bing Yin, Chao Zhang, Tuo Zhao. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track). 2023.
Traditional text-to-image generation methods focus on extracting the information passed down from the preceding layer while ignoring the feature information of the current layer that is lost during subsequent propagation, which causes problems such as poor semantic consistency and loss of detail features. To address these issues, we propose CAGAN, a text-to-image model that combines an information-blending structure with attention and information-fusion capabilities. Word-level and sentence-level features of the text are encoded with a pre-trained BiLSTM model. The network structure includes a feature-fusion module that receives information transmitted from the upper layer while preserving feature information at the same layer. An affine transformation maps the visual features according to the natural-language description, which improves the semantic coherence of the generated image and refines its features. Experimental results show that CAGAN achieves an FID score of 15.93 and an IS of 4.84±0.04 on the CUB dataset, a substantial improvement over the standard AttnGAN and DM-GAN models, demonstrating the effectiveness of the proposed network.
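As a rough illustration of the text-conditioned affine transformation described above, the sketch below modulates visual feature maps with per-channel scale and shift parameters predicted from a sentence embedding. The class name and dimensions are hypothetical; this is a generic conditioning pattern, not the authors' implementation of CAGAN's fusion module.

```python
import torch
import torch.nn as nn

class ConditionalAffine(nn.Module):
    """Modulates visual feature maps with scale/shift predicted from a text embedding.

    Illustrative sketch of text-conditioned affine modulation; not CAGAN's exact module.
    """
    def __init__(self, text_dim: int, num_channels: int):
        super().__init__()
        self.gamma = nn.Linear(text_dim, num_channels)  # per-channel scale from text
        self.beta = nn.Linear(text_dim, num_channels)   # per-channel shift from text

    def forward(self, feat: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W); text: (B, text_dim)
        gamma = self.gamma(text).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        beta = self.beta(text).unsqueeze(-1).unsqueeze(-1)
        return gamma * feat + beta

# Usage with made-up sizes: a 256-d sentence embedding modulating 64-channel features.
affine = ConditionalAffine(text_dim=256, num_channels=64)
feat = torch.randn(2, 64, 16, 16)
sent = torch.randn(2, 256)
out = affine(feat, sent)  # (2, 64, 16, 16)
```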
Unsupervised text embedding has shown great power in a wide range of NLP tasks. While text embeddings are typically learned in the Euclidean space, directional similarity is often more effective in tasks such as word similarity and document clustering, which creates a gap between the training stage and usage stage of text embedding. To close this gap, we propose a spherical generative model based on which unsupervised word and paragraph embeddings are jointly learned. To learn text embeddings in the spherical space, we develop an efficient optimization algorithm with a convergence guarantee based on Riemannian optimization. Our model enjoys high efficiency and achieves state-of-the-art performance on various text embedding tasks including word similarity and document clustering.
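The Riemannian optimization the abstract mentions can be illustrated with the standard projection-and-retraction update on the unit sphere: project the Euclidean gradient onto the tangent space at the current point, step against it, and renormalize. This is a generic sketch of that update, not the paper's exact optimizer.

```python
import numpy as np

def riemannian_sgd_step(x: np.ndarray, euclidean_grad: np.ndarray, lr: float) -> np.ndarray:
    """One Riemannian SGD step on the unit sphere (generic sketch).

    Projects the Euclidean gradient onto the tangent space at x, moves
    against it, then retracts back onto the sphere by normalization.
    """
    # Tangent-space projection: remove the gradient component along x.
    riem_grad = euclidean_grad - np.dot(euclidean_grad, x) * x
    x_new = x - lr * riem_grad
    return x_new / np.linalg.norm(x_new)  # retraction onto the sphere

# Usage: keep a word vector on the unit sphere while optimizing some loss.
x = np.random.randn(100)
x /= np.linalg.norm(x)
g = np.random.randn(100)  # stand-in for the gradient of a loss w.r.t. x
x = riemannian_sgd_step(x, g, lr=0.05)
assert abs(np.linalg.norm(x) - 1.0) < 1e-9  # the iterate stays on the sphere
```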
Probabilistic time-series forecasting enables reliable decision making across many domains. Most forecasting problems have diverse sources of data containing multiple modalities and structures. Leveraging information from these data sources for accurate and well-calibrated forecasts is an important but challenging problem. Most previous works on multi-view time-series forecasting aggregate features from each data view by simple summation or concatenation and do not explicitly model uncertainty for each data view. We propose CAMul, a general probabilistic multi-view forecasting framework that can learn representations and uncertainty from diverse data sources. It integrates the information and uncertainty from each data view in a dynamic, context-specific manner, assigning more importance to useful views to model a well-calibrated forecast distribution. We apply CAMul to multiple domains with varied sources and modalities and show that it outperforms other state-of-the-art probabilistic forecasting models by over 25% in accuracy and calibration.
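One way to picture the context-specific fusion of per-view uncertainty described above is a softmax weighting over views driven by a context vector, combining per-view Gaussian forecasts as a mixture. The module and dimension names below are hypothetical, and this is a simplified stand-in for CAMul's actual mechanism.

```python
import torch
import torch.nn as nn

class ViewFusion(nn.Module):
    """Context-specific fusion of per-view Gaussian forecasts (illustrative sketch)."""
    def __init__(self, num_views: int, ctx_dim: int):
        super().__init__()
        self.score = nn.Linear(ctx_dim, num_views)  # context -> per-view importance

    def forward(self, means, variances, context):
        # means, variances: (B, V) per-view forecast parameters; context: (B, ctx_dim)
        w = torch.softmax(self.score(context), dim=-1)      # (B, V) view weights
        mean = (w * means).sum(-1)                          # mixture mean
        # Mixture-of-Gaussians variance: E[var] + Var[mean].
        var = (w * (variances + means ** 2)).sum(-1) - mean ** 2
        return mean, var

# Usage: fuse 3 data views into one calibrated forecast distribution per example.
fusion = ViewFusion(num_views=3, ctx_dim=16)
m, v, c = torch.randn(4, 3), torch.rand(4, 3), torch.randn(4, 16)
mu, sigma2 = fusion(m, v, c)  # each of shape (4,)
```

Weighting views through the context lets an unreliable view be down-weighted per example rather than globally, which is the intuition behind the calibration gains the abstract reports.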
Sequential recommendation requires understanding the dynamic patterns of users' behaviors, contexts, and preferences from their historical interactions. Most existing works model user-item interactions only at the item level, ignoring that they are driven by latent shopping intentions (e.g., ballpoint pens or miniatures). Detecting the underlying shopping intentions of users from their historical interactions is crucial for e-commerce platforms such as Amazon to improve the convenience and efficiency of their customers' shopping experience. Despite its significance, main shopping intention detection remains under-investigated in the academic literature. To fill this gap, we propose G-STO, a graph-regularized stochastic Transformer method. By considering intentions as sets of products and user preferences as compositions of intentions, we model both as stochastic Gaussian embeddings in the latent representation space. Instead of training the stochastic representations from scratch, we develop a global intention relational graph as prior knowledge for regularization, allowing relevant shopping intentions to be distributionally close. Finally, we feed the regularized stochastic embeddings into Transformer-based models to encode sequential information from the intention transitions. On three real-world datasets, G-STO significantly outperforms the baselines, by 18.08% in Hit@1, 7.01% in Hit@10, and 6.11% in NDCG@10 on average.
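To make the graph regularization of Gaussian embeddings concrete, the sketch below penalizes the 2-Wasserstein distance between the diagonal Gaussian embeddings of intentions connected in a relational graph, pulling related intentions distributionally close. The function name, distance choice, and shapes are assumptions for illustration, not G-STO's exact regularizer.

```python
import torch

def graph_wasserstein_reg(mu, logvar, edges):
    """Graph regularizer: squared 2-Wasserstein distance between diagonal
    Gaussian embeddings of intentions joined by a graph edge (illustrative).

    For diagonal Gaussians, W2^2 = ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2.
    """
    src, dst = edges  # LongTensors of endpoint node indices
    sigma_src = logvar[src].exp().sqrt()
    sigma_dst = logvar[dst].exp().sqrt()
    w2 = ((mu[src] - mu[dst]) ** 2).sum(-1) + ((sigma_src - sigma_dst) ** 2).sum(-1)
    return w2.mean()

# Usage: 5 intentions with 8-dim stochastic embeddings and 3 relational edges.
mu = torch.randn(5, 8, requires_grad=True)
logvar = torch.zeros(5, 8, requires_grad=True)
edges = (torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3]))  # related intention pairs
loss = graph_wasserstein_reg(mu, logvar, edges)
loss.backward()  # gradients pull connected distributions together
```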
Trajectory similarity computation is a fundamental problem for various applications in trajectory data analysis. However, the high computation cost of existing trajectory similarity measures has become the key bottleneck for trajectory analysis at scale. While there have been many research efforts to reduce this complexity, they are specific to one similarity measure and often yield limited speedups. We propose NeuTraj to accelerate trajectory similarity computation. NeuTraj is generic, accommodating any existing trajectory measure; fast, computing the similarity of a given trajectory pair in linear time; and elastic, collaborating with any spatial-based trajectory indexing method to reduce the search space. NeuTraj samples a number of seed trajectories from the given database and then uses their pairwise similarities as guidance to approximate the similarity function with a neural metric learning framework. NeuTraj features two novel modules to achieve an accurate approximation of the similarity function: (1) a spatial attention memory module that augments existing recurrent neural networks for trajectory encoding; and (2) a distance-weighted ranking loss that effectively transcribes information from the seed-based guidance. With these two modules, NeuTraj yields high accuracy and fast convergence even when the training data is small. Our experiments on two real-life datasets show that NeuTraj achieves over 80% accuracy on the Fréchet, Hausdorff, ERP, and DTW measures, consistently and significantly outperforming state-of-the-art baselines. It obtains 50x-1000x speedup over brute-force methods and 3x-500x speedup over existing approximate algorithms, while yielding more accurate approximations of the similarity functions.
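The core neural metric learning idea can be sketched as follows: encode each trajectory with a recurrent network and train the embedding so that embedding distance regresses toward the seed-pair similarities, with more weight on similar pairs. This omits NeuTraj's spatial attention memory and is a simplified stand-in for its distance-weighted ranking loss, with hypothetical names and sizes.

```python
import torch
import torch.nn as nn

class TrajEncoder(nn.Module):
    """GRU encoder mapping a trajectory (sequence of 2-D points) to an embedding,
    so that embedding distance approximates a target trajectory measure (sketch)."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(input_size=2, hidden_size=hidden, batch_first=True)

    def forward(self, traj):            # traj: (B, T, 2)
        _, h = self.gru(traj)           # h: (1, B, hidden)
        return h.squeeze(0)             # (B, hidden)

def weighted_ranking_loss(emb_a, emb_b, target_sim):
    """Weighted regression toward seed-pair similarities; similar pairs get
    larger weight. A simplified stand-in for NeuTraj's ranking loss."""
    pred = torch.exp(-torch.norm(emb_a - emb_b, dim=-1))  # map distance to (0, 1]
    weight = target_sim                                    # emphasize similar pairs
    return (weight * (pred - target_sim) ** 2).mean()

# Usage: 8 trajectory pairs of 20 points each, with ground-truth seed similarities.
enc = TrajEncoder()
a, b = torch.randn(8, 20, 2), torch.randn(8, 20, 2)
loss = weighted_ranking_loss(enc(a), enc(b), torch.rand(8))
loss.backward()
```

Once trained, a pair's similarity costs only two encoder passes and one vector distance, which is where the linear-time claim in the abstract comes from.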
Large language models (LLMs) have recently been leveraged as training data generators for various natural language processing (NLP) tasks. While previous research has explored different approaches to training models on generated data, these approaches generally rely on simple class-conditional prompts, which may limit the diversity of the generated data and inherit the systematic biases of the LLM. We therefore investigate training data generation with diversely attributed prompts (e.g., specifying attributes like length and style), which have the potential to yield diverse and attributed generated data. Our investigation focuses on datasets with high cardinality and diverse domains, where we demonstrate that attributed prompts outperform simple class-conditional prompts in terms of the resulting model's performance. Additionally, we present a comprehensive empirical study of data generation covering vital aspects like bias, diversity, and efficiency, and highlight three key observations: first, synthetic datasets generated by simple prompts exhibit significant biases, such as regional bias; second, attribute diversity plays a pivotal role in enhancing model performance; and third, attributed prompts achieve the performance of simple class-conditional prompts at only 5% of ChatGPT's querying cost. The data and code are available at https://github.com/yueyu1030/AttrPrompt.
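A minimal sketch of the attributed-prompt idea: sample one value per attribute and compose the combination into the generation prompt, rather than reusing a single fixed class-conditional prompt. The attribute pools and prompt template here are invented for illustration; the paper's attributes are curated per dataset.

```python
import random

# Hypothetical attribute pools for a sentiment-classification dataset.
ATTRIBUTES = {
    "length": ["short", "medium-length", "long"],
    "style": ["formal", "casual", "technical"],
    "subtopic": ["pricing", "reliability", "customer service"],
}

def attributed_prompt(label: str) -> str:
    """Sample one value per attribute and compose a diverse generation prompt."""
    attrs = {name: random.choice(values) for name, values in ATTRIBUTES.items()}
    return (
        f"Write a {attrs['length']}, {attrs['style']} product review about "
        f"{attrs['subtopic']} expressing {label} sentiment."
    )

# Each call yields a differently attributed prompt for the same class label.
for _ in range(3):
    print(attributed_prompt("positive"))
```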