Prediction of pathological mutations in proteins: the challenge of integrating sequence conservation and structure stability principles

2014 
The recent drop in genome sequencing costs has created a promising horizon for the development of genomic medicine. Within the biomedical environment, sequencing data are increasingly used for disease diagnosis and prognosis, treatment development, counseling, and so on. Many of these applications rely on the identification of disease causing variants. This is a particularly challenging problem because of the large number and wide variety of sequence variants identified in sequencing projects, and also because we only have a limited understanding of the physicochemical/biochemical properties that differentiate neutral from pathologic variants. Nonetheless, these last years have witnessed important methodologicaladvancesforoneclassofvariants,thosecorrespondingtochanges in the amino-acid sequence of proteins. Proteins are a main constituent of living systems. We know that although their biological properties are essentially determined by the amino-acid sequence, not all the changes in this sequence have the same impact. Some are neutral, but others affect protein function and lead to disease. A large body of evidence shows that whether one or the other is the case that depends on properties such as mutation location in the protein structure, interspecies conservation, and so on. Mutation prediction methods based on these features have good success rates, in the 70‐90% range, although representation over time suggests there is a performance plateau that would limit their applicability. In light of the most recent advances in the field, and after reviewing the foundations of prediction methods, we discuss the existence of this performance threshold and how it can be overcomed. C � 2013 John Wiley & Sons, Ltd.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    159
    References
    18
    Citations
    NaN
    KQI
    []