Mining Contrast Sequential Patterns based on Subsequence Location Distribution from Biological Sequences

2019 
With the generation of large amounts of biological data, research on methods that can automatically analyze such data has become a hot topic. Contrast sequential patterns play an important role in identifying the characteristics of different biological sequences. However, previous studies on mining contrast sequential patterns did not consider the effect of gene/amino acid location distribution on patterns in the given biological sequences. In this paper, we introduce the subsequence location distribution into the conditions of contrast sequential pattern mining, extending previous studies, which considered only the support of patterns. We also design a novel algorithm, SLD-tree, which compresses the dataset into a tree to avoid repeated scans of the dataset and effectively mines contrast sequential patterns based on subsequence location distribution. An empirical study using real-world biological sequences demonstrates the effectiveness of our method. Moreover, we carry out a classification experiment, and the results show that our method achieves higher classification accuracy.
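
To make the two kinds of condition concrete, the following is a minimal illustrative sketch, not the SLD-tree algorithm itself. It computes the support of a candidate subsequence in two classes of sequences, together with a simple location statistic. The abstract does not define the location distribution measure, so the statistic used here (mean relative start position of the leftmost match) is an assumption, as are all function names, thresholds, and the toy sequences.

```python
from statistics import mean


def first_match_start(pattern, sequence):
    """Return the start index of the leftmost gapped match of `pattern`
    in `sequence`, or None if `pattern` is not a subsequence."""
    start = None
    k = 0  # number of pattern symbols matched so far
    for i, symbol in enumerate(sequence):
        if k < len(pattern) and symbol == pattern[k]:
            if k == 0:
                start = i
            k += 1
    return start if k == len(pattern) else None


def support_and_location(pattern, sequences):
    """Return (support, mean relative start position) over `sequences`.

    Support is the fraction of sequences containing `pattern`.
    The location statistic is a hypothetical stand-in for the paper's
    subsequence location distribution; it is None when nothing matches.
    """
    relative_starts = []
    for seq in sequences:
        start = first_match_start(pattern, seq)
        if start is not None:
            relative_starts.append(start / len(seq))
    support = len(relative_starts) / len(sequences) if sequences else 0.0
    location = mean(relative_starts) if relative_starts else None
    return support, location


def is_support_contrast(pattern, positive, negative, min_pos_sup, max_neg_sup):
    """Support-only contrast condition: frequent in one class, rare in the other."""
    pos_sup, _ = support_and_location(pattern, positive)
    neg_sup, _ = support_and_location(pattern, negative)
    return pos_sup >= min_pos_sup and neg_sup <= max_neg_sup


# Toy data, invented purely for illustration.
positive = ["ACGT", "AACG", "TACG"]
negative = ["GTTT", "CGAT", "TTAC"]

pos_stats = support_and_location("AC", positive)
neg_stats = support_and_location("AC", negative)
print(pos_stats, neg_stats)
print(is_support_contrast("AC", positive, negative, 0.6, 0.4))
```

In this sketch a pattern can differ between the classes both in how often it occurs and in where it tends to occur, which is the kind of information a support-only condition ignores.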