A Time and Space Efficient Algorithm for Alignment and Visualization of Sparse Retro-elements over Whole Genome Scale

2011 
It is not easy to align DNA sequences of whole genomes to find some useful biomarkers (in general term), as whole genome has more than 10 mega bases. Thus, we do not apply the straightforward Smith-Waterman O(N^2) time and space algorithms. If some specific DNA segments in which we are interested are very sparse, then the O(n^2) dynamic programming algorithm for aligning those are inefficient since that algorithm depends on the total length of the input string rather than the number of markers appearing on it. When we consider the retro-element over the whole genome, this problem becomes very clear since the size of whole genome is much more than a few (about 10-20) retro-elements we want to compare. In this paper we propose another alignment problem that consists of only two kinds of symbols, retro-elements and other general symbols. Thus, the whole genome can be regarded as a binary string of '1' (retroelement) and '0' (others). We propose an alignment algorithm for simplified binary string to reveal the structural similarity of retro-elements over two different genomes. The time complexity of this algorithm is O(M^2), where M denotes the number of retro-elements appearing in the genomes. We studied structural similarities of all HERV(Human Endogenous RetroViruses) element of four typical primates including Human, Chimpanzees, Orangutan and Rhesus monkey to show the usefulness of our algorithm. Our system successfully revealed the similarity distribution of retro elements for the four primates.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    15
    References
    0
    Citations
    NaN
    KQI
    []