Zhaofeng Ye

Tencent (China)

Author Statistics

Papers

Citation

H-Index

i-10 index

Co-Authors

Jiezhong Qiu

Zhejiang Lab

Chang‐Yu Hsieh

Zhejiang University

Ziyi Yang

Anhui University of Science and Technology

Shengyu Zhang

Tencent (China)

Jonathan Allcock

Tencent (China)

Rongjun Feng

Tencent (China)

Ziwei Xie

Toyota Technological Institute at Chicago

Jie Tang

Sichuan University of Science and Engineering

Jinbo Xu

Toyota Technological Institute at Chicago

Bo Chen

Wuhan University

Cooperative Institutions

Tencent (China)

Zhejiang University

Zhejiang Lab

Tsinghua University

University of Science and Technology of China

Chinese Academy of Sciences

Wuhan University

Central South University

University of Chinese Academy of Sciences

Chinese Academy of Medical Sciences & Peking Union Medical College

Research Field

MetalProGNet: a structure-based deep graph model for metalloprotein–ligand interaction predictions

Chemical Science (2023)

Dejun Jiang Zhaofeng Ye Chang‐Yu Hsieh Ziyi Yang Xujun Zhang

Metalloproteins play essential roles in various biological processes ranging from reaction catalysis to free radical scavenging, and they are also pertinent to numerous pathologies including cancer, HIV infection, and inflammation.

Metalloprotein

10.1039/d2sc06576b

Citations (10)

TensorCircuit: a Quantum Software Framework for the NISQ Era

Quantum (2023)

Shi‐Xin Zhang Jonathan Allcock Zhou‐Quan Wan Shuo Liu Jiace Sun

TensorCircuit is an open source quantum circuit simulator based on tensor network contraction, designed for speed, flexibility and code efficiency. Written purely in Python, and built on top of industry-standard machine learning frameworks, TensorCircuit supports automatic differentiation, just-in-time compilation, vectorized parallelism and hardware acceleration. These features allow TensorCircuit to simulate larger and more complex quantum circuits than existing simulators, and are especially suited to variational algorithms based on parameterized quantum circuits. TensorCircuit enables orders of magnitude speedup for various quantum simulation tasks compared to other common quantum software, and can simulate up to 600 qubits with moderate circuit depth and low-dimensional connectivity. With its time and space efficiency, flexible and extensible architecture and compact, user-friendly API, TensorCircuit has been built to facilitate the design, simulation and analysis of quantum algorithms in the Noisy Intermediate-Scale Quantum (NISQ) era.

Speedup

Quantum circuit

Python

10.22331/q-2023-02-02-912

Citations (65)

Improved the heterodimer protein complex prediction with protein language models

Briefings in Bioinformatics (2023)

Bo Chen Ziwei Xie Jiezhong Qiu Zhaofeng Ye Jinbo Xu

AlphaFold-Multimer has greatly improved protein complex structure prediction, but its accuracy also depends on the quality of the multiple sequence alignment (MSA) formed by the interacting homologs (i.e. interologs) of the complex under prediction. Here we propose a novel method, ESMPair, that can identify interologs of a complex using protein language models. We show that ESMPair can generate better interologs than the default MSA generation method in AlphaFold-Multimer. Our method results in better complex structure prediction than AlphaFold-Multimer by a large margin (+10.7% in terms of the Top-5 best DockQ), especially when the predicted complex structures have low confidence. We further show that by combining several MSA generation methods, we may yield even better complex structure prediction accuracy than AlphaFold-Multimer (+22% in terms of the Top-5 best DockQ). By systematically analyzing the impact factors of our algorithm, we find that the diversity of MSA of interologs significantly affects the prediction accuracy. Moreover, we show that ESMPair performs particularly well on complexes in eucaryotes.

Sequence (biology)

Margin (machine learning)

10.1093/bib/bbad221

Citations (13)

Improved the Protein Complex Prediction with Protein Language Models

bioRxiv (Cold Spring Harbor Laboratory) (2022)

Bo Chen Ziwei Xie Jiezhong Qiu Zhaofeng Ye Jinbo Xu

AlphaFold-Multimer has greatly improved protein complex structure prediction, but its accuracy also depends on the quality of the multiple sequence alignment (MSA) formed by the interacting homologs (i.e., interologs) of the complex under prediction. Here we propose a novel method, denoted as ESMPair, that can identify interologs of a complex by making use of protein language models (PLMs). We show that ESMPair can generate better interologs than the default MSA generation method in AlphaFold-Multimer. Our method results in better complex structure prediction than AlphaFold-Multimer by a large margin (+10.7% in terms of the Top-5 best DockQ), especially when the predicted complex structures have low confidence. We further show that by combining several MSA generation methods, we may yield even better complex structure prediction accuracy than AlphaFold-Multimer (+22% in terms of the Top-5 best DockQ). We systematically analyze the impact factors of our algorithm and find that the diversity of MSA of interologs significantly affects the prediction accuracy. Moreover, we show that ESMPair performs particularly well on complexes in eucaryotes.

Sequence (biology)

Margin (machine learning)

10.1101/2022.09.15.508065

Citations (5)

MdrDB: Mutation-induced drug resistance DataBase

bioRxiv (Cold Spring Harbor Laboratory) (2022)

Ziyi Yang Zhaofeng Ye Jiezhong Qiu Rongjun Feng Danyu Li

Mutation-induced drug resistance – where the efficacy of drugs is diminished by structural changes in proteins – presents a significant challenge to drug development and the clinical treatment of disease. Understanding the effects of mutation on protein-ligand binding affinities is a key step in developing more effective drugs and therapies, but as a research community we are currently hindered by the lack of a comprehensive database of relevant information. To address this issue, we have developed MdrDB, a database of information related to changes in protein-ligand affinity caused by mutations in protein structure. MdrDB combines data from seven publicly available datasets with calculated biochemical features, as well as 3D structures computed with PyMOL and AlphaFold 2.0, to form the largest database of its kind. With 3D structural information provided for all samples, MdrDB was specifically created to have the size, breadth, and complexity to be useful for practical protein mutation studies and drug resistance modeling. The database brings together wild type and mutant protein-ligand complexes, binding affinity changes upon mutation (ΔΔG), and biochemical features calculated from complexes to advance our understanding of mutation-induced drug resistance, the development of combination therapies, and the discovery of novel chemicals. In total, MdrDB contains 100,537 samples generated from 240 proteins (5,119 total PDB structures), 2,503 mutations, and 440 drugs. Of the total samples, 95,971 are based on available PDB structures, with the remaining 4,566 based on AlphaFold 2.0 predicted structures.

Protein Data Bank

10.1101/2022.10.20.513118

Citations (0)

SPLDExtraTrees: robust machine learning approach for predicting kinase inhibitor resistance

Briefings in Bioinformatics (2022)

Ziyi Yang Zhaofeng Ye Yijia Xiao Chang‐Yu Hsieh Shengyu Zhang

Drug resistance is a major threat to global health and a significant concern throughout the clinical treatment of diseases and drug development. Mutations in proteins that are related to drug binding are a common cause of adaptive drug resistance. Therefore, quantitative estimates of how mutations affect the interaction between a drug and its target protein are of vital significance for drug development and clinical practice. Computational methods that rely on molecular dynamics simulations, Rosetta protocols, and machine learning have been shown to be capable of predicting ligand affinity changes upon protein mutation. However, overfitting and generalization issues induced by severely limited sample sizes and heavy noise have impeded wide adoption of machine learning for studying drug resistance. In this paper, we propose a robust machine learning method, termed SPLDExtraTrees, which can accurately predict ligand binding affinity changes upon protein mutation and identify resistance-causing mutations. In particular, the proposed method ranks training data following a specific scheme that starts with easy-to-learn samples and gradually incorporates harder and more diverse samples into the training, and then iterates between sample weight recalculations and model updates. In addition, we calculate additional physics-based structural features to provide the machine learning model with valuable domain knowledge on proteins for these data-limited predictive tasks. The experiments substantiate the capability of the proposed method for predicting kinase inhibitor resistance under three scenarios and achieve predictive accuracy comparable with that of molecular dynamics and Rosetta methods at much lower computational cost.

Overfitting

10.1093/bib/bbac050

Citations (7)

A mutation-induced drug resistance database (MdrDB)

Communications Chemistry (2023)

Ziyi Yang Zhaofeng Ye Jiezhong Qiu Rongjun Feng Danyu Li

Mutation-induced drug resistance is a significant challenge to the clinical treatment of many diseases, as structural changes in proteins can diminish drug efficacy. Understanding how mutations affect protein-ligand binding affinities is crucial for developing new drugs and therapies. However, the lack of a large-scale and high-quality database has hindered research progress in this area. To address this issue, we have developed MdrDB, a database that integrates data from seven publicly available datasets and is the largest database of its kind. By integrating information on drug sensitivity and cell line mutations from Genomics of Drug Sensitivity in Cancer and DepMap, MdrDB has substantially expanded the existing drug resistance data. MdrDB comprises 100,537 samples of 240 proteins (which encompass 5,119 total PDB structures), 2,503 mutations, and 440 drugs. Each sample brings together 3D structures of wild type and mutant protein-ligand complexes, binding affinity changes upon mutation (ΔΔG), and biochemical features. Experimental results with MdrDB demonstrate its effectiveness in significantly enhancing the performance of commonly used machine learning models when predicting ΔΔG in three standard benchmarking scenarios. In conclusion, MdrDB is a comprehensive database that can advance the understanding of mutation-induced drug resistance and accelerate the discovery of novel chemicals.

Protein Data Bank

Structural genomics

Benchmarking

10.1038/s42004-023-00920-7

Citations (4)