Zihan Zhang

University of Manchester

Dual-Alignment Pre-training for Cross-lingual Sentence Embedding

Ziheng Li Shaohan Huang Zihan Zhang Zhihong Deng Qiang Lou

Ziheng Li, Shaohan Huang, Zihan Zhang, Zhi-Hong Deng, Qiang Lou, Haizhen Huang, Jian Jiao, Furu Wei, Weiwei Deng, Qi Zhang. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.

Zhàng

10.18653/v1/2023.acl-long.191

Citations (3)

E5-V: Universal Embeddings with Multimodal Large Language Models

arXiv (Cornell University) (2024)

Ting Jiang Minghui Song Zihan Zhang Haizhen Huang Wei‐Wei Deng

Multimodal large language models (MLLMs) have shown promising advancements in general visual and language understanding. However, the representation of multimodal information using MLLMs remains largely unexplored. In this work, we introduce a new framework, E5-V, designed to adapt MLLMs for achieving universal multimodal embeddings. Our findings highlight the significant potential of MLLMs in representing multimodal inputs compared to previous approaches. By leveraging MLLMs with prompts, E5-V effectively bridges the modality gap between different types of inputs, demonstrating strong performance in multimodal embeddings even without fine-tuning. We propose a single modality training approach for E5-V, where the model is trained exclusively on text pairs. This method demonstrates significant improvements over traditional multimodal training on image-text pairs, while reducing training costs by approximately 95%. Additionally, this approach eliminates the need for costly multimodal training data collection. Extensive experiments across four types of tasks demonstrate the effectiveness of E5-V. As a universal multimodal model, E5-V not only achieves but often surpasses state-of-the-art performance in each task, despite being trained on a single modality.

10.48550/arxiv.2407.12580

Citations (0)

Calculating Specific Integrals by Using Residue Theorem

Highlights in Science Engineering and Technology (2023)

Zihan Zhang Xinyang Xu Xinhao Yang

The aim of this paper is to solve some specific integrals such as , , , and . Traditionally, the integrand is first decomposed into several partial fractions. This process involves summing the fractions, expanding brackets, and performing matrix computations, which takes many calculation steps. The partial-fraction step also requires Euler’s formula and a large amount of bracket expansion to prove that the product of the partial denominators equals the denominator given in the original equation. A single small mistake invalidates every subsequent step. Introducing complex analysis and the residue theorem therefore requires considerably fewer calculation steps than completing hundreds of steps of partial fraction decomposition and substitution. To solve the first integral, , we rewrite Cauchy’s residue theorem in a new form by considering the residue at infinity, , . Then we obtain the coefficient of and hence compute the integral. For the second and third integrals, , and , we exploit the reciprocal function of Taylor polynomials, writing and in a new form to get the coefficient of . Then we compute the integrals by using the residue theorem.
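The formulas in this abstract did not survive extraction, so the paper's own integrals are not reproduced here. As a hedged illustration only, the standard textbook identity relating a contour integral to the residue at infinity can be written as follows; the function f, the contour γ, the poles a_k, and the worked example are assumptions chosen for illustration and are not taken from the paper.

```latex
% Assumption: f is meromorphic with finitely many poles a_1, ..., a_n,
% and \gamma is a positively oriented simple closed contour enclosing all of them.
\oint_{\gamma} f(z)\,dz
  = 2\pi i \sum_{k=1}^{n} \operatorname{Res}(f, a_k)
  = -2\pi i \operatorname{Res}(f, \infty),
\qquad
\operatorname{Res}(f, \infty)
  = -\operatorname{Res}\!\left(\frac{1}{w^{2}}\, f\!\left(\frac{1}{w}\right),\, w = 0\right).

% Illustrative example (not from the paper): f(z) = z / (z^2 + 1), \gamma = \{|z| = 2\}.
% Finite poles: Res(f, i) + Res(f, -i) = 1/2 + 1/2 = 1.
% At infinity: (1/w^2) f(1/w) = 1 / (w (1 + w^2)), whose residue at w = 0 is 1.
\oint_{|z| = 2} \frac{z}{z^{2} + 1}\,dz = 2\pi i .
```

Both routes give the same value; the residue-at-infinity route needs only one Laurent coefficient instead of one residue per finite pole.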

Residue theorem

Partial fraction decomposition

10.54097/hset.v38i.5820

Citations (0)

Learning from My Friends: Few-Shot Personalized Conversation Systems via Social Networks

Proceedings of the AAAI Conference on Artificial Intelligence (2021)

Zhiliang Tian Wei Bi Zihan Zhang Dongkyu Lee Yiping Song

Personalized conversation models (PCMs) generate responses according to speaker preferences. Existing personalized conversation tasks typically require models to extract speaker preferences from user descriptions or their conversation histories, which are scarce for newcomers and inactive users. In this paper, we propose a few-shot personalized conversation task with an auxiliary social network. The task requires models to generate personalized responses for a speaker given a few conversations from the speaker and a social network. Existing methods are mainly designed to incorporate descriptions or conversation histories. Those methods can hardly model speakers with so few conversations or connections between speakers. To better cater for newcomers with few resources, we propose a personalized conversation model (PCM) that learns to adapt to new speakers as well as enabling new speakers to learn from resource-rich speakers. Particularly, based on a meta-learning based PCM, we propose a task aggregator (TA) to collect other speakers' information from the social network. The TA provides prior knowledge of the new speaker in its meta-learning. Experimental results show our methods outperform all baselines in appropriateness, diversity, and consistency with speakers.

Social network (sociolinguistics)

10.1609/aaai.v35i15.17638

Citations (10)

Horizontal Federated Traffic Speed Prediction Base on Secure Node Attribute Aggregation

Communications in computer and information science (2023)

Enjie Ye Kun Guo Wenzhong Guo Dangrun Chen Zihan Zhang

Homomorphic Encryption

Speedup

10.1007/978-981-99-2385-4_49

Citations (0)

Two-Stage Neural Network for ICASSP 2023 Speech Signal Improvement Challenge

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023)

Mingshuai Liu Shubo Lv Zihan Zhang Runduo Han Xiang Hao

In the ICASSP 2023 Speech Signal Improvement Challenge, we developed a dual-stage neural model that improves speech signal quality degraded by different distortions in a stage-wise divide-and-conquer fashion. Specifically, in the first stage, the speech improvement network focuses on recovering the missing components of the spectrum, while in the second stage, our model aims to further suppress noise, reverberation, and artifacts introduced by the first-stage model. Achieving 0.446 in the final score and 0.517 in the P.835 score, our system ranks 4th in the non-real-time track.

SIGNAL (programming language)

10.1109/icassp49357.2023.10094827

Citations (3)

Dual-Alignment Pre-training for Cross-lingual Sentence Embedding

arXiv (Cornell University) (2023)

Ziheng Li Shaohan Huang Zihan Zhang Zhihong Deng Qiang Lou

Recent studies have shown that dual encoder models trained with the sentence-level translation ranking task are effective methods for cross-lingual sentence embedding. However, our research indicates that token-level alignment is also crucial in multilingual scenarios, which has not been fully explored previously. Based on our findings, we propose a dual-alignment pre-training (DAP) framework for cross-lingual sentence embedding that incorporates both sentence-level and token-level alignment. To achieve this, we introduce a novel representation translation learning (RTL) task, where the model learns to use one-side contextualized token representation to reconstruct its translation counterpart. This reconstruction objective encourages the model to embed translation information into the token representation. Compared to other token-level alignment methods such as translation language modeling, RTL is more suitable for dual encoder architectures and is computationally efficient. Extensive experiments on three sentence-level cross-lingual benchmarks demonstrate that our approach can significantly improve sentence embedding. Our code is available at https://github.com/ChillingDream/DAP.

Representation

10.48550/arxiv.2305.09148

Citations (0)

A Resilience Engineering Based Analysis Framework for Network Systems

Fuchun Ren Jian Jiao Zihan Zhang Tingdi Zhao

Many real-world systems can be abstracted into network systems. They have made a great contribution to daily life; however, the risks and disadvantages of these network systems are also serious, since a tiny fault may lead to a big disaster. Modern systems therefore need resilience: the ability to respond to an adverse disruption and recover to the normal condition afterwards. This paper proposes a resilience analysis framework based on the resilience engineering concept and presents numerical simulations on a generated scale-free network. The simulation results reveal that component reliability, failure propagation, failure detection, and recovery strategy all contribute to the resilience of network systems.

Resilience

10.2991/amms-17.2017.78

Citations (0)

MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

arXiv (Cornell University) (2024)

Ting Jiang Shaohan Huang Shengyue Luo Zihan Zhang Haizhen Huang

Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models. In this paper, we analyze the impact of low-rank updating, as implemented in LoRA. Our findings suggest that the low-rank updating mechanism may limit the ability of LLMs to effectively learn and memorize new knowledge. Inspired by this observation, we propose a new method called MoRA, which employs a square matrix to achieve high-rank updating while maintaining the same number of trainable parameters. To achieve this, we introduce corresponding non-parameter operators to reduce the input dimension and increase the output dimension for the square matrix. Furthermore, these operators ensure that the weight can be merged back into LLMs, so that our method can be deployed like LoRA. We perform a comprehensive evaluation of our method across five tasks: instruction tuning, mathematical reasoning, continual pretraining, memory and pretraining. Our method outperforms LoRA on memory-intensive tasks and achieves comparable performance on other tasks.
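The parameter-budget argument in this abstract can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the `compress` and `decompress` operators below are hypothetical stand-ins (chunk-and-sum, tile-and-truncate) for the paper's non-parameter operators, and all function names are invented for illustration. It shows that a square matrix with the same parameter count as a rank-r LoRA update can yield a merged weight update of higher rank.

```python
import math
import numpy as np


def square_size(d_in: int, d_out: int, lora_rank: int) -> int:
    """Side length of the largest square matrix that fits in LoRA's parameter budget."""
    # LoRA with rank r trains r * (d_in + d_out) parameters.
    return math.isqrt(lora_rank * (d_in + d_out))


def compress(x: np.ndarray, r_hat: int) -> np.ndarray:
    """Hypothetical parameter-free reduction: zero-pad the last axis, split it
    into chunks of length r_hat, and sum the chunks."""
    pad = (-x.shape[-1]) % r_hat
    widths = [(0, 0)] * (x.ndim - 1) + [(0, pad)]
    padded = np.pad(x, widths)
    return padded.reshape(*x.shape[:-1], -1, r_hat).sum(axis=-2)


def decompress(h: np.ndarray, d_out: int) -> np.ndarray:
    """Hypothetical parameter-free expansion: tile the last axis and truncate to d_out."""
    reps = -(-d_out // h.shape[-1])  # ceiling division
    return np.tile(h, reps)[..., :d_out]


def merged_delta(square: np.ndarray, d_in: int, d_out: int) -> np.ndarray:
    """Equivalent (d_out, d_in) weight update. Both operators are linear, so
    pushing the identity through them recovers the merged matrix."""
    r_hat = square.shape[0]
    return decompress(compress(np.eye(d_in), r_hat) @ square.T, d_out).T
```

Under these assumptions, with d_in = d_out = 64 and LoRA rank 8 the budget is 1024 parameters, so the square matrix is 32 x 32 and the merged update can reach rank 32 rather than 8. Because the update is an ordinary matrix, it can be added to the frozen weight after training, which matches the abstract's remark about deployment.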

Rank (graph theory)

10.48550/arxiv.2405.12130

Citations (1)

Increasing Parallelism in the ROOT I/O Subsystem

Journal of Physics Conference Series (2018)

G. Amádio Brian Bockelman Philippe Canal D. Piparo Enric Tejedor

When processing large amounts of data, the rate at which reading and writing can take place is a critical factor. High energy physics data processing relying on ROOT is no exception. The recent parallelisation of LHC experiments' software frameworks and the analysis of the ever-increasing amount of collision data collected by experiments further emphasised this issue, underlining the need to increase the implicit parallelism expressed within the ROOT I/O.

Root (linguistics)

10.1088/1742-6596/1085/3/032014

Citations (1)