Large language models (LLMs) demonstrate a remarkable ability to comprehend, reason, and generate text by following natural language instructions. However, LLM development has focused primarily on high-resource languages such as English, limiting their applicability and the research conducted in other languages. Consequently, we present PolyLM, a multilingual LLM trained on 640 billion (B) tokens, available in two model sizes: 1.7B and 13B. To enhance its multilingual capabilities, we 1) integrate bilingual data into the training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage of pre-training to 60% in the final stage. Furthermore, we propose a multilingual self-instruct method that automatically generates 132.7K diverse multilingual instructions for model fine-tuning. To assess the model's performance, we collect several existing multilingual tasks, covering multilingual understanding, question answering, generation, and translation. Extensive experiments show that PolyLM surpasses other open-source models such as LLaMA and BLOOM on multilingual tasks while maintaining comparable performance in English. Our models, along with the instruction data and the multilingual benchmark, are available at: \url{https://modelscope.cn/models/damo/nlp_polylm_13b_text_generation}.
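The curriculum strategy above can be illustrated with a minimal data-mixing sketch. This is not the authors' implementation; the linear ramp and the helper names (`non_english_ratio`, `sample_language`) are assumptions made for illustration, showing only how a sampler could shift the non-English share from 30% to 60% over training.

```python
import random

def non_english_ratio(step: int, total_steps: int,
                      start: float = 0.30, end: float = 0.60) -> float:
    """Target share of non-English data, ramped linearly over training."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac

def sample_language(step: int, total_steps: int, rng: random.Random) -> str:
    """Pick which language pool the next training example is drawn from."""
    if rng.random() < non_english_ratio(step, total_steps):
        return "non_english"
    return "english"

rng = random.Random(0)
# Early in training ~30% of draws are non-English; near the end, ~60%.
```

Any monotone schedule (stepwise, cosine, etc.) could replace the linear ramp; the key design point is that easier, higher-resource data dominates early and the multilingual share grows later.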
Identifying the purpose of citations plays an important role in evaluating the impact of the literature. The different types of citation intents are imbalanced in the data, which harms the performance of classification models. To alleviate this problem, we adapt the bilateral-branch network, originally proposed in the computer vision domain, to this natural language processing task by constructing shared and non-shared encoder layers with a pre-trained language model and a word attention layer, respectively. In addition, to learn rich representations by leveraging auxiliary information, we propose a multi-task bilateral-branch network. To integrate the multi-task model with the bilateral-branch network, and since one advantage of multi-task learning is using more data and information to learn better representations, we attach the networks of the auxiliary tasks to the representation learning branch of the bilateral-branch network. Experimental results show that our model outperforms other models used for citation intent classification.
Concerns that Large Language Models (LLMs) memorize and disclose private information, particularly Personally Identifiable Information (PII), have become prominent within the community. Many efforts have been made to mitigate these privacy risks. However, the mechanism through which LLMs memorize PII remains poorly understood. To bridge this gap, we introduce a pioneering method for pinpointing PII-sensitive neurons (privacy neurons) within LLMs. Our method employs learnable binary weight masks, trained adversarially, to localize the specific neurons that account for the memorization of PII in LLMs. Our investigations reveal that PII is memorized by a small subset of neurons across all layers, which exhibit the property of PII specificity. Furthermore, we validate the method's potential for PII risk mitigation by deactivating the localized privacy neurons. Both quantitative and qualitative experiments demonstrate the effectiveness of our neuron localization algorithm.
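The mask-then-deactivate idea can be sketched in a few lines. This is a toy NumPy illustration, not the paper's algorithm: the logits, threshold, and helper names (`binarize`, `apply_neuron_mask`) are assumptions, and the adversarial training that produces the mask logits is omitted. It shows only the final step: turning a learned real-valued mask into a hard selection and zeroing the selected neurons.

```python
import numpy as np

def binarize(mask_logits: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Threshold the learnable real-valued mask into a hard 0/1 selection."""
    return (mask_logits > threshold).astype(int)

def apply_neuron_mask(activations: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Deactivate (zero out) the neurons the binary mask marks as privacy neurons."""
    return activations * (1 - mask)

# Hypothetical logits for one layer; positive values flag privacy neurons.
logits = np.array([-2.1, 3.4, -0.7, 1.9])
mask = binarize(logits)
acts = np.array([0.5, 2.0, -1.2, 0.9])
masked = apply_neuron_mask(acts, mask)  # flagged neurons are silenced
```

Deactivating the masked neurons and re-running extraction prompts is then a direct way to test whether the localized subset really carries the memorized PII.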
In computer-assisted orthodontics, three-dimensional tooth models are required for many medical treatments. Tooth segmentation from cone-beam computed tomography (CBCT) images is a crucial step in constructing these models. However, CBCT image quality problems, such as metal artifacts and blurring caused by the imaging equipment and patients' dental conditions, make segmentation difficult. In this paper, we propose ToothSegNet, a new framework that familiarizes the segmentation model with generated degraded images during training. ToothSegNet merges information from the high- and low-quality images produced by the designed degradation simulation module using channel-wise cross fusion to reduce the semantic gap between the encoder and decoder, and refines the predicted tooth shape through a structural constraint loss. Experimental results show that ToothSegNet produces more precise segmentations and outperforms state-of-the-art medical image segmentation methods.
Text generation tasks require that the generated text exhibit a certain diversity while remaining relevant. Traditional Seq2Seq models usually use cross entropy as the objective function, which demands that outputs stay strictly consistent with the ground-truth texts and easily leads to a lack of variability in the generated text. In this paper, we propose a novel framework, TransVAE, which applies a Variational Auto-Encoder (VAE) to improve the Seq2Seq architecture. We design a Translator module that transforms the latent space of the source input into that of the target output, thereby enhancing the diversity of generated texts and supporting semi-supervised learning. Moreover, we add attention and copy mechanisms to TransVAE to balance relevance and diversity. Extensive experiments are carried out on three string transduction tasks: dialogue generation, machine translation, and text summarization. The results verify the effectiveness of our method.
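The VAE machinery behind this kind of model can be sketched briefly. This is a generic NumPy illustration, not TransVAE itself: `reparameterize` is the standard VAE reparameterization trick, and `translate_latent` is a hypothetical stand-in (a single linear map) for a learned Translator network mapping a source latent code to the target latent space.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu: np.ndarray, log_var: np.ndarray) -> np.ndarray:
    """Sample z = mu + sigma * eps; randomness lives in eps, so gradients
    can flow through mu and log_var during training."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def translate_latent(z_src: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical Translator: map a source-side latent code into the
    target-side latent space (a real model would use a learned network)."""
    return W @ z_src + b
```

Sampling different `eps` values for the same input yields different latent codes, and hence diverse decodings, which is the mechanism the abstract credits for increased output variability.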
Links between issue reports and the code commits that fix them can greatly reduce the maintenance costs of a software project. More often than not, however, these links are missing and thus cannot be fully utilized by developers. Current practices in issue-commit link recovery extract text features and code features, in terms of textual similarity, from issue reports and commit logs to train their models. These approaches are limited because semantic information can be lost. Furthermore, few of them consider the effect of the source code files related to a commit on issue-commit link recovery, let alone the semantics of the code context. To tackle these problems, we propose constructing a code knowledge graph of a code repository and generating embeddings of source code files to capture the semantics of the code context. We also use embeddings to capture the semantics of issue- and commit-related text. We then use these embeddings to compute semantic similarity and code similarity with a deep learning approach before training an SVM binary classifier with additional features. Evaluations on real-world projects show that our approach, DeepLink, outperforms the state-of-the-art method.
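The similarity-feature step can be sketched as follows. This is an illustrative NumPy sketch, not DeepLink's code: the feature layout and the helper names (`cosine_similarity`, `link_features`) are assumptions. It shows only how embedding pairs for a candidate issue-commit link could be reduced to scalar similarity features for a downstream SVM.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two embedding vectors, in [-1, 1]."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def link_features(issue_text: np.ndarray, commit_text: np.ndarray,
                  issue_code: np.ndarray, commit_code: np.ndarray) -> list:
    """Feature vector for a candidate issue-commit pair: one text-side and
    one code-side similarity (a real system would add further features)."""
    return [cosine_similarity(issue_text, commit_text),
            cosine_similarity(issue_code, commit_code)]
```

The resulting low-dimensional feature vectors are what a binary classifier such as an SVM would consume, with a label indicating whether the issue and commit are truly linked.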