ABSTRACT The human genome harbors many distinct families of human endogenous retroviruses (HERVs) that stem from exogenous retroviruses that infected the germ line millions of years ago. Many HERV families remain to be investigated. Here we report a detailed characterization of the HERV-K14I and HERV-K14CI families as they are represented in the human genome. Most of the 68 HERV-K14I and 23 HERV-K14CI proviruses are severely mutated, frequently displaying uniform deletions of retroviral genes and long terminal repeats (LTRs). Both HERV families entered the germ line ∼39 million years ago, as evidenced by homologous sequences in hominoids and Old World primates and by calculation of evolutionary ages based on a molecular clock. Proviruses of both families were formed during a brief period. A majority of HERV-K14CI proviruses on the Y chromosome mimic a higher evolutionary age, showing that LTR-LTR divergence data can indicate false ages. Fully translatable consensus sequences encoding major retroviral proteins were generated. Most HERV-K14I loci lack an env gene and are structurally reminiscent of LTR retrotransposons, while a minority of HERV-K14I variants display an env gene. HERV-K14I proviruses are associated with three distinct LTR families, whereas HERV-K14CI is associated with a single LTR family. We detected hybrid proviruses consisting of HERV-K14I and HERV-W sequences that appear to have produced provirus progeny in the genome. Several HERV-K14I proviruses harbor TRPC6 mRNA portions, exemplifying mobilization of cellular transcripts by HERVs. Our analysis contributes essential information on two more HERV families and on the biology of HERV sequences in general.
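The LTR-LTR divergence dating mentioned above is commonly computed from the fact that the two LTRs of a provirus are identical at integration and then accumulate substitutions independently, so the age is roughly T = D / (2r). The following is a minimal sketch of that arithmetic only, not the procedure used in the study: the function name `ltr_age_years` and both parameters are hypothetical, and the substitution rate is an input the abstract does not state.

```python
def ltr_age_years(divergence: float, rate_per_site_per_year: float) -> float:
    """Estimate provirus age from the divergence between its two LTRs.

    divergence: fraction of differing sites between 5' and 3' LTR (0 <= D < 1).
    rate_per_site_per_year: assumed neutral substitution rate (hypothetical
        input; the abstract gives no value).
    """
    if not 0.0 <= divergence < 1.0:
        raise ValueError("divergence must be a fraction in [0, 1)")
    if rate_per_site_per_year <= 0.0:
        raise ValueError("substitution rate must be positive")
    # Both LTRs mutate independently, so the observed divergence
    # accumulates at twice the per-lineage rate.
    return divergence / (2.0 * rate_per_site_per_year)
```

Because the estimate scales linearly with the observed divergence, anything that inflates the divergence between the two LTRs also inflates the apparent age, which is consistent with the abstract's caution that such data can indicate false ages.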
A male Asian elephant (Elephas maximus) died at the Berlin zoological gardens in August 1998 of systemic infection with the novel endotheliotropic elephant herpesvirus (ElHV-1). This virus causes a fatal haemorrhagic disease in Asian elephants, the so-called endothelial inclusion body disease, as reported from North American zoological gardens. In the present work, ElHV-1 was visualized ultrastructurally in affected organ material. Furthermore, a gene block comprising the complete glycoprotein B (gB) and DNA polymerase (DPOL) genes as well as two partial genes was amplified by PCR-based genome walking and sequenced. The gene content and arrangement were similar to those of members of the Betaherpesvirinae. However, phylogenetic analysis with gB and DPOL consistently revealed a very distant relationship to the betaherpesviruses. Therefore, ElHV-1 may be a member of a new genus or even a new herpesvirus subfamily. The sequence information generated was used to set up a nested-PCR assay for diagnosis of suspected cases of endothelial inclusion body disease. Furthermore, it will aid in the development of antibody-based detection methods and of vaccination strategies against this fatal herpesvirus infection in the endangered Asian elephant.
Recently, there has been a surge of interest in gapped q-gram filters for approximate string matching. Important design parameters for such filters include the value of q, the filter threshold and, in particular, the shape (also known as the seed) of the filter. A good choice of parameters can improve the performance of a q-gram filter by orders of magnitude, and optimizing these parameters is a nontrivial combinatorial problem. We describe a new method for analyzing gapped q-gram filters. This method is simple and generic: it applies to a variety of filters, overcomes many restrictions present in existing algorithms and can easily be extended to new filter variants. To implement our approach, we use an extended version of BDDs (Binary Decision Diagrams), a data structure that efficiently represents sets of bit-strings. In a second step, we define a new class of multi-shape filters and analyze these filters with the BDD-based approach. Experiments show that multi-shape filters can outperform the best single-shape filters currently in use in many respects. The BDD-based algorithm is crucial for the design and analysis of these new and better multi-shape filters. Our results apply to the k-mismatches problem, i.e. approximate string matching with Hamming distance.
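To make the filter threshold concrete, the sketch below computes it by brute force for a single shape in the k-mismatches setting: it enumerates every placement of k mismatches in a length-m match and counts how many gapped q-grams survive in the worst case. This is an illustration only and is not the BDD-based method described above; the function names and the `#`/`-` shape notation are assumptions made for the example, and the enumeration is practical only for small m and k.

```python
from itertools import combinations


def shape_offsets(shape: str) -> list[int]:
    """Return the sampled positions of a shape such as '##-#' ('#' = sampled)."""
    return [i for i, c in enumerate(shape) if c == "#"]


def threshold(m: int, k: int, shape: str) -> int:
    """Minimum number of gapped q-gram hits over all placements of k mismatches.

    A q-gram starting at position s is a hit if none of its sampled
    positions s + offset coincides with a mismatch.
    Assumes 0 <= k <= m.
    """
    offsets = shape_offsets(shape)
    starts = range(m - len(shape) + 1)  # empty if the shape is longer than m
    best = len(starts)
    for mismatches in combinations(range(m), k):
        bad = set(mismatches)
        hits = sum(
            1 for s in starts if all(s + o not in bad for o in offsets)
        )
        best = min(best, hits)
    return best
```

For a contiguous shape the result agrees with the classical q-gram lemma, m - q + 1 - kq (for example, `threshold(10, 1, "###")` gives 5). A threshold of 0 means the shape cannot be used as a filter for that m and k, which is one reason the choice of shape matters so much.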