MitoFates: Improved Prediction of Mitochondrial Targeting Sequences and their Cleavage Sites

2015 
Mitochondria not only provide ATP but also play crucial roles in the metabolism of amino acids and lipids, the biosynthesis of iron-sulfur clusters, cell signaling pathways, and apoptosis in eukaryotic cells. Moreover, mitochondrial dysfunction has been implicated in a wide variety of medical conditions such as muscle and neurodegenerative diseases, cardiovascular disease, diabetes, and cancer (1). Obtaining the complete mitochondrial proteome is an essential step toward fully understanding the role of mitochondria in health and disease. To this end, ∼900 (in yeast) and ∼1100 (in mouse) mitochondrial proteins have been identified by large-scale proteomics analyses (2, 3), and these have been compiled together with other relevant mitochondrial proteomics data in useful databases such as MitoCarta (3) and MitoMiner (4). However, these lists are probably not yet complete: fungal and animal mitochondria have been estimated to host ∼1000 and ∼1500 distinct proteins, respectively (5). Thus, many mitochondrial proteins seem to remain undiscovered even in model organisms. If high accuracy can be achieved, predicting mitochondrial proteins from primary sequence can save time and effort by identifying promising novel candidate mitochondrial proteins.
The vast majority of mitochondrial proteins are encoded in the nuclear genome and imported by translocator complexes in the mitochondrial membranes. These proteins can be classified into two groups based on the type of targeting signal they contain: an N-terminal cleavable targeting signal (presequence), or a noncleavable internal targeting signal (6). A recent proteomic analysis of yeast estimated that ∼70% of mitochondrial proteins possess a presequence (7). Thus, improved prediction of presequences should contribute to detecting undiscovered mitochondrial proteins. Presequences reside within the first 10–90 N-terminal residues and exhibit a high arginine content and a near absence of negatively charged residues (8, 9).
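The compositional properties just described (arginine-rich, nearly free of acidic residues, within the first 10–90 residues) translate directly into simple sequence features. The sketch below is a hypothetical illustration, not the MitoFates feature set: the function name `nterm_features`, the default 90-residue window (taken from the upper end of the stated range), and the crude net-charge count that ignores histidine are all assumptions.

```python
# Hypothetical sketch: composition features over the N-terminal region.
# Window size and feature names are illustrative assumptions, not the
# MitoFates feature set.

ACIDIC = {"D", "E"}  # negatively charged residues
BASIC = {"R", "K"}   # positively charged residues (His ignored here)


def nterm_features(seq: str, window: int = 90) -> dict:
    """Composition features of the first `window` residues of `seq`."""
    region = seq[:window].upper()
    if not region:
        raise ValueError("empty sequence")
    n = len(region)
    n_arg = region.count("R")
    n_acidic = sum(aa in ACIDIC for aa in region)
    n_basic = sum(aa in BASIC for aa in region)
    return {
        "length": n,
        "arg_fraction": n_arg / n,
        "acidic_count": n_acidic,
        # crude net charge: (R + K) - (D + E)
        "net_charge": n_basic - n_acidic,
    }


if __name__ == "__main__":
    # Made-up toy sequence, only to show the call.
    print(nterm_features("MLRRSLR", window=5))
```

In a real predictor such features would be one input among many to a classifier; on their own they only flag sequences worth a closer look.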
Proteins containing such presequences are translocated by the TOM and TIM complexes in the outer and inner membranes, respectively (6, 10, 11). Tom20 and Tom22 in the TOM complex are reported to initiate import of these proteins by recognizing presequence segments capable of forming a local amphiphilic α-helix, with hydrophobic residues on one face and positively charged residues on the opposite face (6, 12, 13). Widely used prediction tools such as MitoProt, TargetP, Predotar, and TPpred2 were developed with these properties of presequences in mind (14–17). Cleavage of mitochondrial presequences is an important event implicated in efficient protein import (18) and in disease (19). Upon import into mitochondria, most presequences are cleaved off in the matrix by the heterodimeric mitochondrial processing peptidase (MPP), and some are subsequently cleaved further by intermediate peptidases such as Oct1 (20) and the recently discovered Icp55 (7). Although methods exist to predict these cleavage sites, their accuracy leaves much room for improvement (7, 21). Because the correct primary sequence of the mature protein is a prerequisite for precise structural modeling, more accurate cleavage site prediction should be useful for planning protein crystallography experiments and other structural studies of mitochondrial proteins. Accurate in silico prediction of the mature N-termini of mitochondrial proteins could also, in principle, improve the identification of N-terminal peptides in shotgun proteomics. In this study, we describe MitoFates, a novel method for predicting mitochondrial presequences and their cleavage sites. MitoFates formulates presequence prediction as a binary classification problem, employing a standard support vector machine (SVM) classifier.
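The amphiphilic helix recognized by Tom20 and Tom22 is classically quantified with Eisenberg's hydrophobic moment: each residue's hydrophobicity is treated as a vector pointing 100° further around the helix axis than the previous residue, and the length of the summed vector measures how strongly hydrophobic and hydrophilic residues segregate onto opposite faces. The sketch below implements that classical measure under stated assumptions. It is not MitoFates' positive amphiphilicity score, whose definition is not given in this section; the Kyte-Doolittle scale and the 18-residue window default are swapped-in, hypothetical choices.

```python
import math

# Kyte-Doolittle hydropathy values. Using this scale is an assumption for the
# sketch; Eisenberg's moment is usually computed with his consensus scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(segment: str, angle_deg: float = 100.0) -> float:
    """Mean hydrophobic moment of `segment`, modeled as an ideal helix.

    Residue i contributes a vector of length h_i pointing at i * angle_deg
    around the helix axis; 100 degrees per residue models an alpha-helix
    (3.6 residues per turn). The summed vector's length is divided by the
    segment length.
    """
    if not segment:
        raise ValueError("empty segment")
    delta = math.radians(angle_deg)
    sin_sum = cos_sum = 0.0
    for i, aa in enumerate(segment.upper()):
        h = KYTE_DOOLITTLE[aa]
        sin_sum += h * math.sin(i * delta)
        cos_sum += h * math.cos(i * delta)
    return math.hypot(sin_sum, cos_sum) / len(segment)


def most_amphiphilic_window(seq: str, width: int = 18) -> tuple:
    """Return (start, moment) of the highest-moment window of length `width`."""
    if len(seq) <= width:
        return 0, hydrophobic_moment(seq)
    best_moment, best_start = max(
        (hydrophobic_moment(seq[s:s + width]), s)
        for s in range(len(seq) - width + 1)
    )
    return best_start, best_moment
```

A uniformly hydrophobic segment spanning whole helical turns scores near zero, because its residue vectors cancel; only periodic alternation of hydrophobic and charged residues produces a large moment, which is the property the recognition model above relies on.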
Our contributions are the preparation of an updated data set incorporating recent proteomic data, and the selection of classical and novel sequence features such as amino acid composition, physicochemical properties, a novel positive amphiphilicity score, novel presequence motifs, and refined position weight matrices (PWMs) modeling peptidase cleavage sites. On the task of discriminating between presequences and nonpresequences, MitoFates achieves a true positive rate of 76% at a false positive rate of only 1.7%, a significant improvement over previous methods. Moreover, MitoFates predicts the position of cleavage sites with an error rate of only ∼29%, versus ∼47% for the best previous method. To investigate the potential of MitoFates to reveal interesting candidate mitochondrial proteins, we searched 42,217 human proteins (including isoforms arising from alternative splicing or alternative translation initiation) and obtained 580 candidate undiscovered mitochondrial proteins. Open-source software downloads and a convenient MitoFates web server are available at http://mitf.cbrc.jp/MitoFates/.
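Position weight matrices score a candidate cleavage site by summing, position by position, the log-odds of observing each residue at a cleavage site versus in the background. The sketch below shows a generic log-odds PWM with pseudocounts and a scan over the N-terminal region; it does not reproduce MitoFates' refined PWMs. The uniform background, the 90-residue search limit, the function names, and the toy training windows (each carrying Arg two residues upstream of the cut, loosely mimicking the well-known R-2 motif of MPP substrates) are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(sites, pseudocount=1.0):
    """Log-odds PWM from equal-length, aligned site windows.

    Uses a uniform background (an assumption; a real model would use
    proteome-derived residue frequencies).
    """
    width = len(sites[0])
    background = 1.0 / len(AMINO_ACIDS)
    pwm = []
    for pos in range(width):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({aa: math.log(counts[aa] / total / background)
                    for aa in AMINO_ACIDS})
    return pwm


def predict_cleavage(seq, pwm, offset, max_pos=90):
    """Return (c, score) for the best-scoring cleavage position c.

    Cleavage is assumed to occur between seq[c-1] and seq[c]; the PWM
    window covers seq[c-offset : c-offset+len(pwm)], i.e. `offset`
    residues upstream of the site. Only the first `max_pos` residues are
    searched. Returns None if the sequence is too short for any window.
    """
    width = len(pwm)
    limit = min(len(seq), max_pos)
    best = None
    for c in range(offset, limit - width + offset + 1):
        window = seq[c - offset:c - offset + width]
        score = sum(col[aa] for col, aa in zip(pwm, window))
        if best is None or score > best[1]:
            best = (c, score)
    return best


if __name__ == "__main__":
    # Made-up training windows spanning positions -3..+1 around the cut.
    toy_sites = ["ARYS", "LRFA", "SRYS", "GRLA"]
    pwm = build_pwm(toy_sites)
    seq = "MAAAAGLRYSAAAA"
    c, score = predict_cleavage(seq, pwm, offset=3)
    print(c, seq[c:])  # predicted cleavage index and mature N-terminus
```

Taking the single best-scoring position is the simplest decision rule; a practical predictor would combine such scores with other features and a threshold before reporting a site.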