WeightAln: Weighted Homologous Alignment for Protein Structure Property Prediction

2020 
Accurately predicting protein structure properties, such as secondary structure, solvent accessibility, and dihedral angles, is essential for analyzing the structure and function of a protein. A Multiple Sequence Alignment (MSA), which aligns multiple homologous protein sequences with the target protein, is widely used in protein structure property prediction. The most popular strategy for exploiting an MSA is to convert it into a position-specific scoring matrix (PSSM) and then feed the PSSM to the relevant prediction network. The PSSM is obtained by simply counting the frequency of the amino acids present at each position in the corresponding MSA, which means that each sequence in the MSA has the same weight with respect to the target protein. However, assigning the same weight to every homologous sequence of a protein cannot sufficiently model the complex relationships between them. Moreover, some sequences within the MSA are redundant, which raises a tantalizing question: can we generate a different weight for each sequence in the MSA and use the weighted PSSM to improve the performance of protein structure property prediction? To help answer this question, we present the WeightAln framework, which, to our knowledge, is the first attempt to generate learnable MSA weights for protein prediction tasks. We demonstrate the effectiveness of our method through extensive experiments on three protein structure property prediction tasks.
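
To make the contrast between a uniform and a weighted PSSM concrete, the following is a minimal sketch, not the WeightAln implementation. It assumes the PSSM is a plain per-position frequency profile (no background log-odds or pseudocounts), and the function name `weighted_frequency_profile`, the toy alignment, and the hand-picked weights are all hypothetical. In WeightAln the weights would be learned rather than fixed by hand.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def weighted_frequency_profile(msa, weights=None):
    """Return an (L, 20) matrix of per-position amino acid frequencies.

    msa: list of equal-length aligned sequences; gaps ('-') and any
         symbol outside the 20 standard amino acids are skipped.
    weights: one non-negative weight per sequence; None means every
         sequence counts equally (the conventional, unweighted case).
    """
    n_seqs, length = len(msa), len(msa[0])
    if weights is None:
        weights = np.ones(n_seqs)
    weights = np.asarray(weights, dtype=float)

    counts = np.zeros((length, len(AMINO_ACIDS)))
    for seq, w in zip(msa, weights):
        for pos, aa in enumerate(seq):
            idx = AA_INDEX.get(aa)
            if idx is not None:
                counts[pos, idx] += w  # each sequence contributes its weight

    # Normalize each position to sum to 1; all-gap columns stay zero.
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


# Toy alignment (hypothetical): three homologous sequences of length 3.
msa = ["ACD", "ACE", "GCE"]
uniform = weighted_frequency_profile(msa)
reweighted = weighted_frequency_profile(msa, weights=[2.0, 1.0, 1.0])
```

With uniform weights, alanine at the first position has frequency 2/3. Doubling the weight of the first sequence raises it to 3/4, which illustrates how per-sequence weights reshape the profile that a downstream prediction network would receive.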