Protein Sequence Motif Information Generated by Fuzzy - Hybrid Hierarchical K-means Clustering Algorithm.

2010 
Recurring amino acid sequence patterns are referred to as protein sequence motifs. These patterns are important because conserved regions can reveal the role of the protein itself. In this paper, we modify the FGK model and apply the Hybrid Hierarchical K-means (HHK) clustering algorithm, a combination of agglomerative hierarchical clustering and K-means clustering, in place of the greedy K-means clustering algorithm to discover protein sequence motifs that transcend protein family boundaries. This dual algorithm requires no user-defined parameters to identify the similarities and dissimilarities between protein sequences. Our analysis of the motifs generated by the HHK algorithm shows that the results are significant not only at the sequence level but also in secondary structure. More than 49% of the clusters share more than 60% secondary structure similarity, and 14% of the clusters share more than 70% secondary structure similarity. Compared with previous work, which yielded only 25% and 0% of clusters in the 60% and 70% groups, respectively, the newly proposed approach gives a better understanding of the relationships within a set of sequences. We believe that the HHK algorithm, together with the change to the FGK model, generates better results than those previously reported.
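To illustrate the general idea of combining agglomerative hierarchical clustering with K-means, here is a minimal sketch in plain Python. It is not the paper's HHK implementation: the abstract does not give the algorithm's details, so the function names, the centroid-linkage merge rule, the Euclidean distance, and the explicit cluster count `k` are all assumptions made for illustration. In particular, the paper states that its algorithm needs no user-defined parameters, whereas this sketch takes `k` as input. The sketch first merges the closest clusters bottom-up until `k` remain, then uses their centroids to seed ordinary K-means iterations.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def agglomerative_seeds(points, k):
    """Merge the two closest clusters (centroid linkage, an assumption)
    until k clusters remain, then return their centroids."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = euclidean(mean(clusters[i]), mean(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [mean(c) for c in clusters]


def kmeans(points, seeds, max_iter=100):
    """Standard K-means (Lloyd) iterations starting from the given seeds."""
    centroids = [list(c) for c in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean(members)
    return labels, centroids


def hybrid_hierarchical_kmeans(points, k):
    """Hierarchical seeding followed by K-means refinement (hypothetical helper)."""
    return kmeans(points, agglomerative_seeds(points, k))


# Toy data: two well-separated groups of 2-D points (illustrative only).
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = hybrid_hierarchical_kmeans(points, 2)
```

Seeding K-means from a hierarchical pass removes the dependence on random initial centroids, which is one common motivation for hybrids of this kind. The pairwise search above recomputes centroids on every pass and is only suitable for small inputs.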