Analysis of DNA sequences of HIV virus using information theory

2012 
The acquired immunodeficiency syndrome (AIDS) is a leading global pandemic. Its causative agent, the human immunodeficiency virus (HIV), progressively destroys the infected person's immune system, and AIDS has caused millions of deaths every year [1]. HIV is a retrovirus of the genus Lentivirus and shows great genetic diversity owing to its rapid replication and evolution. It can therefore be classified into several lineages, of which HIV-1 is the most prevalent worldwide. To support the study of these viruses, and of molecular biology more broadly, it is important to develop computational tools for analyzing biomolecules such as DNA, RNA and proteins at the level of sequence organization [2-4]. Comparing DNA sequences from different HIV strains may help elucidate aspects of the virus's mutability and/or possible deficiencies in its replication system. In this study, we selected two HIV genes that are central to the biology of the virus for sequence analysis: the GAG gene, which encodes the structural proteins that make up the matrix, the capsid and the nucleoprotein; and the POL gene, which encodes the enzymes responsible for reverse transcription and integration [5]. The sequences were obtained from the GenBank database [6] in 2011. Here we present a characterization method based on the entropy of triplet sequences (defined below) and show how it can reveal subgroups that are not apparent from an analysis of GC content (the fraction of guanine and cytosine) alone.
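The abstract relies on two sequence descriptors, GC content and the entropy of triplet sequences, but the triplet entropy is only defined later in the paper. The following is a minimal Python sketch under stated assumptions: it takes "triplet entropy" to be the Shannon entropy, in bits, of the frequency distribution of non-overlapping triplets read from the first base (an optional overlapping mode is included), which may differ from the authors' actual definition. The function names `gc_content` and `triplet_entropy` are hypothetical, and the two toy sequences are invented for illustration, not taken from GenBank.

```python
import math
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    acgt = [b for b in seq.upper() if b in "ACGT"]
    if not acgt:
        return 0.0
    return sum(b in "GC" for b in acgt) / len(acgt)


def triplet_entropy(seq: str, overlapping: bool = False) -> float:
    """Shannon entropy (bits) of the triplet frequency distribution.

    ASSUMPTION (hypothetical definition, not necessarily the paper's):
    triplets are read from position 0, non-overlapping by default
    (codon-like), and any triplet containing a non-ACGT symbol is skipped.
    """
    seq = seq.upper()
    step = 1 if overlapping else 3
    triplets = [
        seq[i:i + 3]
        for i in range(0, len(seq) - 2, step)
        if set(seq[i:i + 3]) <= set("ACGT")
    ]
    if not triplets:
        return 0.0
    counts = Counter(triplets)
    n = len(triplets)
    # H = sum over triplets of p * log2(1/p); at most log2(64) = 6 bits.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


if __name__ == "__main__":
    # Two invented sequences with identical GC content.
    s1 = "ATGATGATGATG"  # one repeated triplet
    s2 = "ATGTGAGATTAG"  # four distinct triplets
    print(f"{gc_content(s1):.3f} {triplet_entropy(s1):.1f}")  # -> 0.333 0.0
    print(f"{gc_content(s2):.3f} {triplet_entropy(s2):.1f}")  # -> 0.333 2.0
```

The toy pair illustrates the idea behind the abstract's claim: both sequences have the same GC content, yet their triplet entropies differ (0 versus 2 bits, out of a maximum of 6), so an entropy-based descriptor can separate sequences that GC content alone cannot.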