Feature extraction for improved profile HMM based biological sequence analysis

2004 
State-of-the-art systems for biological sequence analysis employ statistical modeling techniques, most notably so-called profile HMMs. However, all of these approaches still rely on a purely symbolic sequence representation, which severely limits their ability to describe weak similarities between remotely homologous members of sequence families. Therefore, we propose a multi-channel, signal-like sequence representation based on a combination of several numerically encoded biochemical properties of the individual residues. From this representation, features capturing relevant local sequence properties are extracted by applying wavelet analysis and principal component analysis. Evaluation results on a challenging sequence family classification task show that profile HMMs trained on the feature-based sequence representation significantly outperform discrete models.
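The pipeline outlined in the abstract (numeric multi-channel encoding, wavelet analysis of local context, principal component analysis) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the two property channels (Kyte-Doolittle hydropathy and a crude side-chain charge), the window width of 8, the single-level Haar wavelet, the number of retained components, and the example sequence are all hypothetical choices.

```python
import numpy as np

# Assumed channel 1: Kyte-Doolittle hydropathy index (illustrative choice).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Assumed channel 2: crude side-chain charge near neutral pH (others are 0).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def encode(seq):
    """Map a protein sequence to a (length, channels) numeric signal."""
    return np.array([[HYDROPATHY[r], CHARGE.get(r, 0.0)] for r in seq])


def haar_level(x):
    """One level of the orthonormal Haar transform along axis 0.

    Expects an even number of rows; returns (approximation, detail).
    """
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail


def window_features(signal, width=8):
    """Slide a window over the signal; one Haar coefficient vector per frame."""
    frames = []
    for start in range(len(signal) - width + 1):
        approx, detail = haar_level(signal[start:start + width])
        frames.append(np.concatenate([approx.ravel(), detail.ravel()]))
    return np.array(frames)


def pca_project(features, n_components=4):
    """Center the frames and project them onto the leading principal axes."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Hypothetical example sequence, used only to exercise the functions.
seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
feats = pca_project(window_features(encode(seq)))
print(feats.shape)  # one low-dimensional feature vector per window position
```

Each row of `feats` is a continuous feature vector for one window position. A profile HMM trained on such data would need continuous emission densities in place of discrete symbol distributions; that modeling step is not shown here.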