Deep learning model for the prediction and classification of protein toxins across all domains of life

2021 
Toxins are widely produced by different organisms to disrupt the physiology of other organisms and to support their own survival. Their study is useful for understanding protein evolution, environmental adaptation and competition for survival. In-silico predictions of toxic proteins can support empirical frameworks and help with the safety assessments needed for various industrial processes. Some in-silico methods are slow, hard to implement or lack representation of many taxa in their training datasets. Here we present ConvTOX, a deep learning model that classifies protein toxins using Convolutional Neural Networks. ConvTOX accurately identifies toxic proteins across the domains of life, with accuracies over 80% for animal and plant toxins and over 50% for bacterial toxins. Moreover, ConvTOX generalizes to distinguishing among toxin types, such as neurotoxins and myotoxins, and accurately identifies structural similarities between different protein toxins. ConvTOX overcomes limitations of previous models by predicting toxic proteins from across all domains of life and by not being limited to short toxin peptides. Limitations remain, notably lower accuracies for specific phylogenetic groups (such as bacterial toxins), but this work represents a step forward toward the universal use, classification and study of toxic proteins.
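The abstract does not describe ConvTOX's exact architecture. As a rough illustration only, the Python sketch below shows the general pattern that convolutional classifiers for protein sequences commonly follow. Residues are one-hot encoded, a 1D convolution filter with ReLU slides along the sequence, and global max pooling turns a sequence of any length (not only a short peptide) into a fixed-size feature that a logistic output maps to a probability. The 20-letter alphabet, the hand-written "CC" filter, the weights and the function names are all hypothetical. A real model would learn many filters from labeled toxin and non-toxin data.

```python
import math
from typing import List

# Assumption: the 20 standard amino acids; other residues encode as all-zero rows.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence: str) -> List[List[float]]:
    """Encode a protein sequence as a (length x 20) one-hot matrix."""
    rows = []
    for aa in sequence.upper():
        row = [0.0] * len(AMINO_ACIDS)
        if aa in INDEX:
            row[INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def conv1d_max(encoded: List[List[float]], kernel: List[List[float]], bias: float = 0.0) -> float:
    """Slide one (width x 20) filter along the sequence, apply ReLU to each
    window's activation, then take the global maximum over all windows."""
    width = len(kernel)
    best = 0.0  # ReLU floor: negative activations never exceed 0
    for start in range(len(encoded) - width + 1):
        activation = bias
        for offset in range(width):
            for channel, weight in enumerate(kernel[offset]):
                activation += weight * encoded[start + offset][channel]
        best = max(best, max(0.0, activation))
    return best


def predict_toxic_probability(sequence: str, filters, weights, bias: float) -> float:
    """Hypothetical classifier head: pooled filter responses -> logistic output."""
    encoded = one_hot(sequence)
    features = [conv1d_max(encoded, k) for k in filters]
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical hand-written filter that responds to two consecutive cysteines.
cc_filter = [[1.0 if aa == "C" else 0.0 for aa in AMINO_ACIDS] for _ in range(2)]
```

For example, `predict_toxic_probability("MKCCA", [cc_filter], [1.0], -1.0)` scores above 0.5 because the filter fires on the "CC" window, while a sequence without that motif scores below 0.5. Global max pooling is the design choice that makes the input length irrelevant to the classifier head.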