Predicting protein subcellular localization by approximate nearest neighbor searching

2017 
Protein subcellular location prediction is an important problem in bioinformatics, and it is highly desirable to predict a protein's subcellular location directly from its sequence. We propose a novel prediction model that combines locality-sensitive hashing (LSH)-based approximate nearest neighbor searching (ANNS) with a global alignment dynamic programming algorithm. LSH was used to hash the amino acid composition feature vectors of the protein sequences, so that sequences with similar features were placed into the hash bucket with the corresponding key in a hash table. We then identified the sequences in the hash table that were similar to the target sequence and compared them with the sequence at the closest Euclidean distance using the global alignment dynamic programming algorithm to predict the protein's subcellular localization. Compared with other algorithms, this prediction model achieved relatively high overall accuracies on two benchmark datasets under jackknife testing, and it predicted target sequences quickly and effectively.
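The pipeline described above can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name the LSH family, so the sketch assumes a random-projection (p-stable) hash for Euclidean distance. It also assumes unit match/mismatch scores with a linear gap penalty in place of a substitution matrix. The names `EuclideanLSH`, `predict`, `n_hashes`, `width`, and `top_k` are hypothetical, and so is the way candidates are ranked: shortlist by Euclidean distance, then pick the best global alignment score.

```python
import math
import random
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """20-dimensional amino acid composition (relative frequencies)."""
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid division by zero for empty input
    return [c / total for c in counts]

def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

class EuclideanLSH:
    """Assumed hash family: h(v) = floor((a . v + b) / width), a ~ N(0, 1)."""

    def __init__(self, dim=20, n_hashes=4, width=0.5, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.projections = [[rng.gauss(0, 1) for _ in range(dim)]
                            for _ in range(n_hashes)]
        self.offsets = [rng.uniform(0, width) for _ in range(n_hashes)]
        self.table = defaultdict(list)  # bucket key -> [(seq, vec, label)]

    def key(self, vec):
        return tuple(
            math.floor((sum(a * x for a, x in zip(proj, vec)) + b) / self.width)
            for proj, b in zip(self.projections, self.offsets)
        )

    def add(self, seq, label):
        vec = composition(seq)
        self.table[self.key(vec)].append((seq, vec, label))

    def candidates(self, seq):
        # Sequences whose composition vector hashes to the same bucket.
        return self.table.get(self.key(composition(seq)), [])

def global_alignment_score(s, t, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score with linear gaps, keeping two rows in memory."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap]
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def predict(index, query, top_k=3):
    """Return the label of the best-aligning sequence among near neighbors."""
    pool = index.candidates(query)
    if not pool:  # empty bucket: fall back to a full scan (an assumption)
        pool = [item for bucket in index.table.values() for item in bucket]
    if not pool:
        return None
    qvec = composition(query)
    nearest = sorted(pool, key=lambda item: euclidean(qvec, item[1]))[:top_k]
    best = max(nearest, key=lambda item: global_alignment_score(query, item[0]))
    return best[2]
```

In this sketch the hash lookup keeps the expensive alignment step restricted to a handful of candidates, which is the speed advantage the abstract attributes to approximate nearest neighbor searching. A single hash table is used for brevity; practical LSH setups often query several tables to reduce missed neighbors.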