Feature extraction by statistical contact potentials and wavelet transform for predicting subcellular localizations in Gram-negative bacterial proteins.

2015 
Abstract Predicting the localization of a protein has become a useful practice for inferring its function. Most reported methods for predicting subcellular localization in Gram-negative bacterial proteins rely on standard protein representations that generally ignore both the distribution of amino acids along the sequence and the structural information of the protein. Here, we propose a protein representation based on the structural information contained in pairwise statistical contact potentials. The wavelet transform decodes the information contained in the primary structure, allowing the identification of patterns along the sequence that are used to characterize the subcellular localizations. A support vector machine classifier is then trained to assign proteins to these localizations. Cellular compartments such as the periplasm and the extracellular medium are difficult to predict and typically show a high false negative rate. The wavelet-based method achieves high overall performance while maintaining a low false negative rate, particularly for “periplasm” and “extracellular medium”. Our results suggest that the proposed protein characterization is a useful alternative to both classical and cutting-edge protein representations for representing protein sequences and predicting their localization.
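The encode, transform, and summarize steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the pairwise contact potentials are turned into a per-residue signal, which wavelet is used, or which coefficients become features. The sketch therefore makes three assumptions: the Kyte-Doolittle hydropathy index stands in for the contact-potential encoding, the Haar wavelet stands in for the wavelet family, and per-level energies serve as the features. The names `encode`, `haar_step`, and `wavelet_features` are hypothetical.

```python
import math

# ASSUMPTION: Kyte-Doolittle hydropathy index, used only as a stand-in
# per-residue scale. The paper builds its signal from pairwise statistical
# contact potentials, whose exact encoding the abstract does not give.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq, scale=KD):
    """Map a protein sequence to a numeric signal, skipping unknown symbols."""
    return [scale[aa] for aa in seq.upper() if aa in scale]


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    if len(signal) % 2:
        # Pad odd-length signals by repeating the last value.
        signal = signal + [signal[-1]]
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_features(seq, levels=3, scale=KD):
    """Fixed-length vector: detail energy per level, then final approximation energy."""
    approx = encode(seq, scale)
    feats = []
    for _ in range(levels):
        if len(approx) < 2:
            # Signal too short to decompose further; record zero energy.
            feats.append(0.0)
            continue
        approx, detail = haar_step(approx)
        feats.append(sum(d * d for d in detail))
    feats.append(sum(a * a for a in approx))
    return feats


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any dataset in the paper.
    print(wavelet_features("MKKLLAVA"))
```

Because every sequence yields a vector of length `levels + 1`, proteins of different lengths become comparable inputs for a classifier. The classification step could then be handled by a support vector machine, for example scikit-learn's `SVC`, trained on these vectors with localization labels.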