Protein evolution is structure dependent and non-homogeneous across the tree of life

2020 
Protein sequence evolution is a complex process that varies across the tree of life and among sites within proteins. Comparing evolutionary rate matrices for specific taxa ('clade-specific models') can reveal this variation and provide information about the basis for changes in the patterns of protein evolution over time. However, clade-specific models can only provide this information if the variation among taxa exceeds the variation among proteins. We showed this to be the case by demonstrating that clade-specific model fit could distinguish among proteins from the four taxa that we examined (vertebrates, plants, oomycetes, and yeasts). Model fit classified proteins correctly by clade of origin >70% of the time. A relatively small number of dimensions can explain differences among models. When model parameters are averaged across all sites, ~80% of the variance among models reflects clade; for models that consider protein structure, ~50% of the variance reflects relative solvent accessibility and ~25% reflects clade. Relaxed purifying selection in taxa with smaller long-term effective population sizes appears to explain much of the among-clade variance. Relaxed selection on solvent-exposed sites was correlated with the degree of change in amino acid side-chain volume for substitutions; other differences among models were more complex. Beyond the information they reveal about protein evolution, our clade-specific models also represent tools for phylogenomic inference. Availability: model files are available from https://github.com/ebraun68/clade_specific_prot_models.
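The classification by model fit described above can be illustrated with a minimal sketch. It assumes that each clade-specific model is a fixed rate matrix with the same number of free parameters, so that log-likelihoods for one protein alignment can be compared directly across models. The function names and all log-likelihood values are hypothetical, chosen only for illustration; they are not results from the study and this is not necessarily the authors' procedure.

```python
# Sketch: assign each protein alignment to the clade whose model fits best.
# All log-likelihood values below are made-up toy numbers, not study results.

def best_fit_clade(log_likelihoods):
    """Return the clade whose model gives the highest log-likelihood."""
    return max(log_likelihoods, key=log_likelihoods.get)


def classification_accuracy(records):
    """Fraction of proteins whose best-fitting model matches their true clade.

    records: list of (true_clade, {clade_name: log_likelihood}) pairs.
    """
    correct = sum(
        1 for true_clade, lnl in records if best_fit_clade(lnl) == true_clade
    )
    return correct / len(records)


# Hypothetical fits of four protein alignments under the four clade models.
toy_records = [
    ("vertebrates", {"vertebrates": -10250.3, "plants": -10301.8,
                     "oomycetes": -10322.1, "yeasts": -10340.6}),
    ("plants",      {"vertebrates": -8420.9, "plants": -8377.2,
                     "oomycetes": -8401.5, "yeasts": -8415.0}),
    ("oomycetes",   {"vertebrates": -6120.4, "plants": -6098.7,
                     "oomycetes": -6071.3, "yeasts": -6110.2}),
    # A deliberately misclassified case: the oomycete model fits best.
    ("yeasts",      {"vertebrates": -7350.8, "plants": -7322.4,
                     "oomycetes": -7301.9, "yeasts": -7305.6}),
]

if __name__ == "__main__":
    for true_clade, lnl in toy_records:
        print(true_clade, "->", best_fit_clade(lnl))
    print("accuracy:", classification_accuracy(toy_records))
```

With real data, the log-likelihoods would come from fitting each alignment under each model in a phylogenetic inference program; if the models differed in their number of free parameters, an information criterion would be a more appropriate basis for comparison than the raw log-likelihood.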