Identification of endogenous retroviral sequences based on modular organization: proviral structure at the SSAV1 locus.

1997 
Abstract The current genome sequencing projects are revealing megabases of unknown genomic sequence. About 1% of these sequences can be expected to be of retroviral origin, but they are often severely deleted or mutated. Identifying their retroviral origin can therefore be very difficult, because convincing overall sequence similarity is absent. Many copies of solo LTRs (long terminal repeats) are also distributed throughout genomic sequences. LTR and envelope sequences are in general among the most divergent parts of the retroviral genome and are thus especially hard to detect in mutated endogenous sequences. We took advantage of the fact that these retroviral sections contain short, highly conserved sequence regions that provide retroviral hallmarks even after overall similarity has been lost. We defined several sequence elements and peptide motifs within LTR and Env sequences and used these elements to construct models for the LTRs and Env proteins of mammalian C-type retroviruses. We then used this strategy to successfully identify the hitherto missing LTRs and an env-like region in the S71 human retroviral sequence. Our approach provides a new strategy for identifying remotely related retroviral sequences in genomic DNA (especially human DNA) and is of potential significance for interpreting the genomic sequences obtained from the current large-scale sequencing projects.
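The idea of recognizing a sequence by an ordered set of short conserved elements, rather than by overall similarity, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the element names, the patterns, the gap limits, and the function `find_modular_hits` are hypothetical placeholders and are not the elements or models defined in the paper.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical model: (element name, regex, maximum gap in bases allowed
# between the end of the previous element and the start of this one).
# These patterns are placeholders, NOT the elements defined in the paper.
TOY_MODEL: List[Tuple[str, str, Optional[int]]] = [
    ("element_A", r"TG[AT]AG", None),  # first element has no gap constraint
    ("element_B", r"TATAAA", 40),
    ("element_C", r"AATAAA", 30),
]

Hit = List[Tuple[str, int, int]]  # (element name, start, end) per element


def find_modular_hits(seq: str, model: List[Tuple[str, str, Optional[int]]]) -> List[Hit]:
    """Return one hit for each occurrence of the first element that can be
    extended through all remaining elements, in order, within the gap limits."""
    seq = seq.upper()
    hits: List[Hit] = []
    first_name, first_pat, _ = model[0]
    for m in re.finditer(first_pat, seq):
        chain: Optional[Hit] = [(first_name, m.start(), m.end())]
        pos = m.end()
        for name, pat, max_gap in model[1:]:
            # Leftmost match starting at or after the previous element's end.
            nxt = re.compile(pat).search(seq, pos)
            if nxt is None or (max_gap is not None and nxt.start() - pos > max_gap):
                chain = None  # element missing or too far away: reject
                break
            chain.append((name, nxt.start(), nxt.end()))
            pos = nxt.end()
        if chain:
            hits.append(chain)
    return hits


if __name__ == "__main__":
    toy = "CCCC" + "TGAAG" + "GGGGG" + "TATAAA" + "C" * 10 + "AATAAA" + "GG"
    for hit in find_modular_hits(toy, TOY_MODEL):
        print(hit)
```

The sketch requires every element to be present. A model meant for heavily mutated endogenous sequences would more plausibly tolerate missing or degenerate elements and score partial matches, which this example does not attempt.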