Sparsity of Protein-Protein Interaction Networks Hinders Function Prediction in Non-Model Species

2019 
Motivation: Physical interaction between two proteins is strong evidence that the proteins are involved in the same biological process, making Protein-Protein Interaction (PPI) networks a valuable data resource for predicting the cellular functions of proteins. However, PPI networks are largely incomplete for non-model species. Here, we test whether these incomplete networks are still useful for genome-wide function prediction.

Results: We used a simple network-based classifier to predict Biological Process Gene Ontology terms from protein interaction data in three species: Saccharomyces cerevisiae, Arabidopsis thaliana and Solanum lycopersicum (tomato). The classifier performed reasonably well in the well-studied yeast, but poorly in the other two species. We show that this poor performance arises because many proteins are disconnected in the network, and that performance can be considerably improved by adding edges predicted from various data sources. In yeast, adding predicted edges did not improve performance, although it did help once we had randomly removed a large number of edges.

Conclusion: Our work highlights the necessity of obtaining more protein-protein interactions in non-model species, either by means of prediction or experiment.

Availability: Data and code to reproduce the results are available at github.com/stamakro/ppi-missing-data.

Contact: s.makrodimitris@tudelft.nl

Supplementary information: Supplementary data are available online.
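
To make the effect of disconnected proteins concrete, here is a minimal sketch of a guilt-by-association classifier. It is an illustration only: the abstract does not specify which network-based classifier was used, so the neighbor-voting rule, the function name `neighbor_vote`, and the protein and term identifiers (`P1`, `GO:A`, and so on) are all assumptions made for this example. The authors' actual code is in the repository listed under Availability.

```python
from collections import defaultdict


def neighbor_vote(edges, annotations, protein):
    """Score each term by the fraction of the protein's neighbors annotated with it.

    Hypothetical helper for illustration; not the classifier from the paper.
    edges: iterable of (protein_a, protein_b) undirected interactions.
    annotations: dict mapping a protein to a set of known term identifiers.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    nbrs = neighbors.get(protein, set())
    if not nbrs:
        # A protein with no edges receives no votes, hence no predictions.
        return {}

    scores = defaultdict(float)
    for n in nbrs:
        for term in annotations.get(n, ()):
            scores[term] += 1.0 / len(nbrs)
    return dict(scores)


# Toy network with placeholder identifiers (not real data).
edges = [("P1", "P2"), ("P1", "P3")]
annotations = {"P2": {"GO:A"}, "P3": {"GO:A", "GO:B"}}

connected = neighbor_vote(edges, annotations, "P1")
disconnected = neighbor_vote(edges, annotations, "P4")

# Adding one predicted edge gives the isolated protein a neighbor to learn from.
with_predicted_edge = neighbor_vote(edges + [("P4", "P2")], annotations, "P4")
```

In this toy case `P4` has no interactions, so the classifier cannot say anything about it until an edge is added, which mirrors the situation the abstract describes for sparsely covered species.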