Mining Similarity-Aware Distinguishing Sequential Patterns from Biomedical Sequences

2017 
Mining distinguishing sequential patterns, which capture properties unique to a target family of biomedical sequences, is useful for explaining and characterizing phenomena concerning the target family and for identifying its biomarkers. However, previous studies on mining distinguishing sequential patterns did not consider the important and widely occurring case in which the elements of a given type of biomedical sequence are biochemically similar to one another. To fill that gap, this paper considers mining distinguishing sequential patterns from data whose sequence elements can be similar to each other; we call the resulting patterns similarity-aware distinguishing sequential patterns (simDSPs). After describing the challenges of mining simDSPs, we present simDSP-Miner, a mining method that uses effective pruning techniques and domain-specific similarity knowledge. Our empirical study on real-world protein sequences demonstrates that simDSP-Miner is effective and efficient, and that it discovers more novel distinguishing sequential patterns than previous algorithms for mining distinguishing sequential patterns.
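To make the notion of a similarity-aware distinguishing pattern concrete, the following is a minimal sketch and not simDSP-Miner itself. It assumes a pattern matches a sequence when it occurs as a subsequence up to element similarity, and that a pattern is distinguishing when its support is at least `alpha` in the target family and at most `beta` in the contrasting family. The similarity groups, the thresholds `alpha` and `beta`, the absence of gap constraints, and all function names are illustrative assumptions that the abstract does not specify.

```python
from typing import FrozenSet, List

# Hypothetical similarity groups (assumption): residues in the same group
# are treated as interchangeable when matching a pattern.
SIMILAR: List[FrozenSet[str]] = [frozenset("ILV"), frozenset("KR"), frozenset("DE")]


def similar(a: str, b: str) -> bool:
    """Two elements are similar if they are equal or share a group."""
    return a == b or any(a in g and b in g for g in SIMILAR)


def matches(pattern: str, seq: str) -> bool:
    """True if pattern occurs in seq as a subsequence, up to similarity.

    Greedy left-to-right scan: take the earliest position that matches
    the next pattern element. No gap constraint is applied (assumption).
    """
    i = 0
    for ch in seq:
        if i < len(pattern) and similar(pattern[i], ch):
            i += 1
    return i == len(pattern)


def support(pattern: str, seqs: List[str]) -> float:
    """Fraction of sequences matched by pattern (seqs must be non-empty)."""
    return sum(matches(pattern, s) for s in seqs) / len(seqs)


def is_distinguishing(pattern: str, pos: List[str], neg: List[str],
                      alpha: float = 0.6, beta: float = 0.3) -> bool:
    """Frequent in the target family, infrequent in the contrasting one."""
    return support(pattern, pos) >= alpha and support(pattern, neg) <= beta


# Toy data (made up for illustration, not from the paper).
pos = ["AIKD", "AVRE", "GLKE"]
neg = ["AKID", "GGGG", "DKLA"]
print(support("IKD", pos), support("IKD", neg))
print(is_distinguishing("IKD", pos, neg))
```

On this toy data, only the first target sequence contains `IKD` literally; the other two match solely through the assumed similarity groups, which is the kind of pattern an exact-match miner would undercount.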