Dayiheng Liu

Sichuan University

A Multi-Modal Chinese Poetry Generation Model

2018 International Joint Conference on Neural Networks (IJCNN) (2018)

Dayiheng Liu Quan Guo Wubo Li Jiancheng Lv

Recent studies in sequence-to-sequence learning demonstrate that the RNN encoder-decoder structure can successfully generate Chinese poetry. However, existing methods can only generate poetry from a given first line or a user-specified theme. In this paper, we propose a three-stage multi-modal Chinese poetry generation approach. Given a picture, the first line, the title, and the remaining lines of the poem are generated successively in three stages. Based on the characteristics of Chinese poems, we propose a hierarchy-attention seq2seq model that effectively captures character-, phrase-, and sentence-level information across contexts and improves the symmetry of the generated poems. In addition, a Latent Dirichlet allocation (LDA) model is used for title generation to improve the relevance between the whole poem and its title. Experimental results against strong baselines, using both machine evaluation and human judgment, demonstrate the effectiveness of our approach.

Phrase

Natural Language Generation

Chinese poetry

10.1109/ijcnn.2018.8489579

Citations (30)

PolyLM: An Open Source Polyglot Large Language Model

arXiv (Cornell University) (2023)

Xiangpeng Wei Haoran Wei Huan Lin Tianhao Li Pei Zhang

Large language models (LLMs) demonstrate a remarkable ability to comprehend, reason, and generate by following natural language instructions. However, LLM development has focused primarily on high-resource languages such as English, limiting their applicability to, and research in, other languages. Consequently, we present PolyLM, a multilingual LLM trained on 640 billion (B) tokens and available in two model sizes: 1.7B and 13B. To enhance its multilingual capabilities, we 1) integrate bilingual data into the training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage of pre-training. Further, we propose a multilingual self-instruct method that automatically generates 132.7K diverse multilingual instructions for model fine-tuning. To assess the model's performance, we collect several existing multilingual tasks, covering multilingual understanding, question answering, generation, and translation. Extensive experiments show that PolyLM surpasses other open-source models such as LLaMA and BLOOM on multilingual tasks while maintaining comparable performance in English. Our models, along with the instruction data and multilingual benchmark, are available at: https://modelscope.cn/models/damo/nlp_polylm_13b_text_generation.
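The abstract gives only the endpoints of the curriculum (30% non-English data in the first stage, 60% in the final stage). The sketch below assumes a linear interpolation across stages and simple per-batch sampling; `non_english_ratio`, `sample_batch`, and the stage count are hypothetical illustrations, not PolyLM's training code.

```python
import random
from typing import List


def non_english_ratio(stage: int, num_stages: int,
                      start: float = 0.30, end: float = 0.60) -> float:
    """Share of non-English data for a 0-based pre-training stage.

    Assumption: linear interpolation between the two endpoints stated in the abstract.
    """
    if num_stages < 2:
        return end
    t = stage / (num_stages - 1)
    return start + t * (end - start)


def sample_batch(english: List[str], non_english: List[str], batch_size: int,
                 ratio: float, rng: random.Random) -> List[str]:
    """Draw a batch whose non-English fraction is round(batch_size * ratio) / batch_size."""
    n_other = round(batch_size * ratio)
    batch = rng.choices(non_english, k=n_other) + rng.choices(english, k=batch_size - n_other)
    rng.shuffle(batch)  # mix languages within the batch
    return batch
```

With `num_stages=2` the schedule reduces to exactly the two stated endpoints; using more stages would add intermediate ratios, which goes beyond what the abstract specifies.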

Polyglot

Benchmark

10.48550/arxiv.2307.06018

Citations (7)

Let's be Humorous: Knowledge Enhanced Humor Generation

arXiv (Cornell University) (2020)

Hang Zhang Dayiheng Liu Dongdong Chen Cheng Luo

The generation of humor is an under-explored and challenging problem. Previous works mainly use templates or phrase replacement to generate humor. However, few works address freer forms of humor or the background knowledge it draws on. The linguistic theory of humor defines the structure of a humorous sentence as a set-up and a punchline. In this paper, we explore how to generate a punchline given the set-up and the relevant knowledge. We propose a framework that fuses this knowledge into end-to-end models. To our knowledge, this is the first attempt to generate punchlines with a knowledge-enhanced model. Furthermore, we create the first humor-knowledge dataset. The experimental results demonstrate that our method can use knowledge to generate fluent, funny punchlines and outperforms several baselines.

Humor research

Citations (1)

Alibaba-Translate China's Submission for WMT 2022 Quality Estimation Shared Task

arXiv (Cornell University) (2022)

Keqin Bao Yu Wan Dayiheng Liu Baosong Yang Wenqiang Lei

In this paper, we present our submission to the sentence-level MQM benchmark of the Quality Estimation Shared Task, named UniTE (Unified Translation Evaluation). Specifically, our systems employ the UniTE framework, which combines three types of input formats during training with a pre-trained language model. First, we use pseudo-labeled data examples in the continued pre-training phase. Notably, to reduce the gap between pre-training and fine-tuning, we use data pruning and a ranking-based score normalization strategy. For the fine-tuning phase, we use both Direct Assessment (DA) and Multidimensional Quality Metrics (MQM) data from past years' WMT competitions. Finally, we collect the source-only evaluation results and ensemble the predictions generated by two UniTE models, whose backbones are XLM-R and InfoXLM, respectively. Results show that our models rank 1st overall in the Multilingual and English-Russian settings and 2nd overall in the English-German and Chinese-English settings, showing relatively strong performance in this year's quality estimation competition.
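The abstract does not define its ranking-based score normalization. One plausible form, sketched here as an assumption, replaces each raw score with its rank among all scores, scaled to [0, 1], so that DA and MQM labels on different scales become comparable. `rank_normalize` is a hypothetical helper, not part of UniTE.

```python
from typing import List


def rank_normalize(scores: List[float]) -> List[float]:
    """Map raw scores to ranks scaled to [0, 1]; tied scores share their mean rank."""
    n = len(scores)
    if n == 1:
        return [0.5]
    order = sorted(range(n), key=lambda i: scores[i])  # indices from lowest to highest score
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of scores tied with scores[order[i]].
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2  # mean 0-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank / (n - 1)
        i = j + 1
    return ranks
```

Because only the ordering is kept, a DA score of 85 and an MQM penalty of -3 both become positions in [0, 1] once each is normalized within its own data source.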

Normalization

Benchmark

10.48550/arxiv.2210.10049

Citations (1)

Enabling Scalable Oversight via Self-Evolving Critic

arXiv (Cornell University) (2025)

Zhimin Tang Zhao Li Zuo Xiao Tian Ding Ruoyu Sun

Despite their remarkable performance, the development of Large Language Models (LLMs) faces a critical challenge in scalable oversight: providing effective feedback for tasks where human evaluation is difficult or where LLMs outperform humans. While there is growing interest in using LLMs for critique, current approaches still rely on human annotations or more powerful models, leaving the issue of enhancing critique capabilities without external supervision unresolved. We introduce SCRIT (Self-evolving CRITic), a framework that enables genuine self-evolution of critique abilities. Technically, SCRIT self-improves by training on synthetic data, generated by a contrastive-based self-critic that uses reference solutions for step-by-step critique, and a self-validation mechanism that ensures critique quality through correction outcomes. Implemented with Qwen2.5-72B-Instruct, one of the most powerful LLMs, SCRIT achieves up to a 10.3% improvement on critique-correction and error identification benchmarks. Our analysis reveals that SCRIT's performance scales positively with data and model size, outperforms alternative approaches, and benefits critically from its self-validation component.
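The self-validation step can be pictured as a filter over synthetic critiques: a critique is kept only if the correction it proposes reaches the reference solution's final answer. This is a minimal sketch under that reading; the `Critique` fields, `self_validate`, and the exact-match comparison are assumptions, not SCRIT's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Critique:
    problem_id: str
    first_error_step: int   # step the critic flags as the first error (-1 if none)
    corrected_answer: str   # final answer obtained after applying the critique's correction


def self_validate(critiques: List[Critique], references: Dict[str, str]) -> List[Critique]:
    """Keep only critiques whose correction reproduces the reference final answer."""
    kept = []
    for c in critiques:
        ref = references.get(c.problem_id)
        # Exact string match after trimming whitespace is a simplifying assumption.
        if ref is not None and c.corrected_answer.strip() == ref.strip():
            kept.append(c)
    return kept
```

Critiques that survive the filter would then serve as training data for the next round of self-improvement; problems without a reference answer are skipped rather than guessed.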

10.48550/arxiv.2501.05727

Citations (0)

A neural words encoding model

2016 International Joint Conference on Neural Networks (IJCNN) (2016)

Dayiheng Liu Jiancheng Lv Xiaofeng Qi Jiangshu Wei

This paper proposes a neural network model and learning algorithm for encoding words. The model performs word encoding and decoding, which can be applied to text encryption/decryption and word-based compression. The model is based on Deep Belief Networks (DBNs); it differs from traditional DBNs in that it has an asymmetric structure and outputs a binary vector. After pre-training a multi-layer stack of Restricted Boltzmann Machines (RBMs) and fine-tuning it to reconstruct the word set, the output of the code layer can be used as a representation code for words. The number of neurons in the code layer can be changed to control the length of the representation code for different applications. This paper reports on experiments that use English words from the American National Corpus to train a neural words encoding model that can encode/decode English words, realizing text encryption and data compression.
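To illustrate how a code layer yields a binary vector whose length equals its number of neurons, here is a single-layer sketch. It assumes a letter one-hot input and uses random weights in place of the pre-trained and fine-tuned RBM stack, so the codes carry no learned structure; `letter_one_hot` and `binary_code` are hypothetical names.

```python
import math
import random
from typing import List


def letter_one_hot(word: str, max_len: int = 8,
                   alphabet: str = "abcdefghijklmnopqrstuvwxyz") -> List[float]:
    """Concatenate one one-hot block per letter position; padding and unknown chars are all zeros.

    Assumed input format; words longer than max_len are truncated.
    """
    vec: List[float] = []
    for pos in range(max_len):
        ch = word[pos] if pos < len(word) else None
        vec.extend(1.0 if ch == a else 0.0 for a in alphabet)
    return vec


def binary_code(x: List[float], weights: List[List[float]], biases: List[float]) -> List[int]:
    """Sigmoid code layer thresholded at 0.5; one bit per code neuron (row of weights)."""
    code = []
    for w_row, b in zip(weights, biases):
        pre = sum(wi * xi for wi, xi in zip(w_row, x)) + b
        act = 1.0 / (1.0 + math.exp(-pre))
        code.append(1 if act >= 0.5 else 0)
    return code


# Usage: 16 code neurons give a 16-bit code (random, untrained weights).
rng = random.Random(0)
x = letter_one_hot("cat")
weights = [[rng.uniform(-1.0, 1.0) for _ in x] for _ in range(16)]
code = binary_code(x, weights, [0.0] * 16)
```

Thresholding a sigmoid at 0.5 is the same as checking whether the pre-activation is non-negative; adding or removing weight rows changes the code length, mirroring the adjustable code layer described above.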

Boltzmann machine

Representation

Binary code

Restricted Boltzmann machine

10.1109/ijcnn.2016.7727245

Citations (1)

Bridging the Domain Gaps in Context Representations for k-Nearest Neighbor Neural Machine Translation

Zhiwei Cao Baosong Yang Huan Lin Suhang Wu Xiangpeng Wei

Zhiwei Cao, Baosong Yang, Huan Lin, Suhang Wu, Xiangpeng Wei, Dayiheng Liu, Jun Xie, Min Zhang, Jinsong Su. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.

10.18653/v1/2023.acl-long.321

Citations (1)

AnchiBERT: A Pre-Trained Model for Ancient Chinese Language Understanding and Generation

arXiv (Cornell University) (2020)

Huishuang Tian Kexin Yang Dayiheng Liu Jiancheng Lv

Ancient Chinese is the essence of Chinese culture. There are several natural language processing tasks in the ancient Chinese domain, such as ancient-modern Chinese translation, poem generation, and couplet generation. Previous studies usually use supervised models that rely heavily on parallel data. However, large-scale parallel data for ancient Chinese is difficult to obtain. To make full use of the more easily available monolingual ancient Chinese corpora, we release AnchiBERT, a pre-trained language model based on the BERT architecture and trained on large-scale ancient Chinese corpora. We evaluate AnchiBERT on both language understanding and generation tasks, including poem classification, ancient-modern Chinese translation, poem generation, and couplet generation. The experimental results show that AnchiBERT outperforms BERT as well as the non-pretrained models and achieves state-of-the-art results in all cases.

Couplet

Chinese culture

10.48550/arxiv.2009.11473

Citations (1)

Tell Me How to Ask Again: Question Data Augmentation with Controllable Rewriting in Continuous Space

arXiv (Cornell University) (2020)

Dayiheng Liu Yeyun Gong Jie Fu Yu Yan Jiusheng Chen

In this paper, we propose a novel data augmentation method, referred to as Controllable Rewriting based Question Data Augmentation (CRQDA), for machine reading comprehension (MRC), question generation, and question-answering natural language inference tasks. We treat the question data augmentation task as a constrained question rewriting problem to generate context-relevant, high-quality, and diverse question data samples. CRQDA uses a Transformer autoencoder to map the original discrete question into a continuous embedding space. It then uses a pre-trained MRC model to revise the question representation iteratively with gradient-based optimization. Finally, the revised question representations are mapped back into the discrete space and serve as additional question data. Comprehensive experiments on SQuAD 2.0, SQuAD 1.1 question generation, and QNLI tasks demonstrate the effectiveness of CRQDA.
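The core of CRQDA's rewriting step is gradient-based optimization of a continuous question representation against a fixed model. The sketch below swaps in a toy logistic classifier for the pre-trained MRC model and a plain vector for the Transformer autoencoder's embedding, and adds an L2 term to keep the revision close to the original; `revise`, `lam`, and the toy weights are assumptions for illustration.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def revise(z: List[float], w: List[float], b: float, target: int,
           lr: float = 0.5, steps: int = 20, lam: float = 0.1) -> List[float]:
    """Revise embedding z so a toy classifier's output moves toward `target` (0 or 1).

    Loss = BCE(sigmoid(w.z + b), target) + lam / 2 * ||z - z_orig||^2.
    The L2 term keeps the revised representation near the original question.
    """
    z_orig = list(z)
    z = list(z)  # work on a copy; the caller's vector is not modified
    for _ in range(steps):
        p = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)
        # Gradient of the loss with respect to each coordinate of z.
        grad = [(p - target) * wi + lam * (zi - oi) for wi, zi, oi in zip(w, z, z_orig)]
        z = [zi - lr * g for zi, g in zip(z, grad)]
    return z
```

Mapping the revised vector back to text (the autoencoder's decoder) is not modeled here; the sketch only shows how gradient steps on a frozen model can steer a representation toward a chosen label.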

Autoencoder

Representation

10.48550/arxiv.2010.01475

Citations (1)

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

arXiv (Cornell University) (2025)

M-A-P Team Xinrun Du Yifan Yao Kaijing Ma Bingli Wang

Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. The capabilities of LLMs in many of these specialized fields, particularly light industry, agriculture, and service-oriented disciplines, remain inadequately evaluated. To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. Our benchmark employs a novel Human-LLM collaborative filtering mechanism to eliminate trivial or ambiguous questions through iterative refinement based on both LLM responses and expert feedback. Our experimental results reveal significant room for improvement in the performance of current state-of-the-art LLMs across diverse knowledge domains (e.g., the reasoning-focused model DeepSeek-R1 achieved the highest accuracy of 61.82% on SuperGPQA), highlighting the considerable gap between current model capabilities and artificial general intelligence. Additionally, we present comprehensive insights from our management of a large-scale annotation process, involving over 80 expert annotators and an interactive Human-LLM collaborative system, offering valuable methodological guidance for future research initiatives of comparable scope.
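One ingredient of a Human-LLM collaborative filter can be sketched as follows: a question that every model answers correctly is treated as trivial, and a question experts flag as ambiguous is dropped. The threshold, data layout, and single-pass structure are assumptions; the abstract describes an iterative process whose exact criteria are not given here, and `filter_questions` is a hypothetical helper.

```python
from typing import Dict, List, Set, Tuple


def filter_questions(
    model_answers: Dict[str, List[str]],  # question id -> answers from several LLMs
    gold: Dict[str, str],                 # question id -> reference answer
    expert_flagged: Set[str],             # ids that experts marked as ambiguous
    trivial_threshold: float = 1.0,       # assumed: drop if at least this share of models is correct
) -> Tuple[List[str], List[str]]:
    """Split question ids into (kept, dropped) for one filtering pass."""
    kept, dropped = [], []
    for qid, answers in model_answers.items():
        share_correct = sum(a == gold[qid] for a in answers) / len(answers)
        if share_correct >= trivial_threshold or qid in expert_flagged:
            dropped.append(qid)
        else:
            kept.append(qid)
    return kept, dropped
```

In an iterative setup, the kept questions would be re-checked after experts revise flagged items, rather than being discarded after a single pass.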

Graduate students

10.48550/arxiv.2502.14739

Citations (0)