Gene Discovery in the Apicomplexa as Revealed by EST Sequencing and Assembly of a Comparative Gene Database

2003 
The generation of expressed sequence tags (ESTs) provides a rapid means of gene discovery from single-pass sequencing of randomly selected cDNAs. This approach has been particularly useful for complex model genomes including human (Hillier et al. 1996), rat (Scheetz et al. 2001), mouse (Marra et al. 1999b), fish (Clark et al. 2001), and rice (Ewing et al. 1999). One of the primary advantages of ESTs is that the identification of putative genes by BLAST comparisons (Altschul et al. 1990) enables researchers to begin biological analyses prior to the completion, or even initiation, of a full genome sequence. EST sequencing is likely to make its greatest impact on understudied genomes where little prior sequence data exists and where full genome sequencing projects may not be undertaken in the near future. Parasites provide such a group, and previous EST projects have revealed the tremendous utility of this approach for gene discovery (Reddy et al. 1993; Chakrabarti et al. 1994; Wan et al. 1995; Ajioka et al. 1998; Manger et al. 1998b; Howe 2001). EST sequencing not only allows the rapid identification of abundantly expressed genes but also provides data sets for informing phylogenetic analyses, examining strain diversity, and exploring developmentally regulated genes. Tools for recognizing and combining ESTs generated from the same gene into nonredundant assemblies in silico have recently been refined, improving the chances of establishing gene identities. Such identities, although only putative, enable rapid analysis of gene function, thus greatly facilitating traditional research approaches.

To further the process of gene discovery in protozoan parasites, we have undertaken large-scale EST sequencing projects for several apicomplexan parasites. The Apicomplexa is an ancient phylum of ∼5000 species, all of which are parasitic (Levine 1970).
Apicomplexans are most closely related to dinoflagellates and ciliates, as shown by phylogenetic reconstructions based on small subunit ribosomal RNA sequences (Gajadhar et al. 1991; Escalante and Ayala 1994) and, more recently, by analyses of conserved protein sequences (Baldauf et al. 2000). The age of the Apicomplexa predicts that many of their features will have diverged since their last common ancestry with the major eukaryotic kingdoms of plants, animals, and fungi. The relationships of major taxa within the Apicomplexa are depicted in Figure 1. Included are all major groupings in which EST or genome projects are presently underway. Additionally, the outgroups of ciliates (Paramecium, Oxytricha) and dinoflagellates (Prorocentrum, Symbiodinium) are shown for comparison. Apicomplexan parasites infect a wide range of vertebrate hosts and cause diseases of medical importance in humans or of veterinary importance in a range of domestic animals. In the present study, we have chosen the following organisms: Plasmodium falciparum and Toxoplasma gondii, which are both agents of human disease, and Eimeria tenella, Neospora caninum, and Sarcocystis neurona, which cause important diseases in agricultural and companion animals (Dubey 1977; Long 1993; Dubey and Lindsay 1996).

Figure 1. Unrooted distance phylogram generated from a neighbor-joining analysis of small subunit ribosomal genes. Organisms were chosen to illustrate relationships among the Alveolata with an emphasis on the Apicomplexa. The organisms chosen for study here include ...

One of the major drawbacks of EST sequencing is the large number of database entries that are submitted separately to dbEST without extensive annotation. This makes it difficult to establish which ESTs belong to a given gene, and whether similar ESTs belong to the same or closely related genes.
Therefore, in addition to generating new ESTs from these organisms, we have clustered and assembled the resulting sequences into RNA consensus sequences and created a gene database that provides a variety of features for comparative analyses. Herein, we describe the creation of this database, illustrate several of its important capabilities, and highlight major features of gene content and expression in the Apicomplexa.
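To give a concrete sense of what grouping ESTs from the same gene involves, the following is a minimal Python sketch of single-linkage clustering based on shared k-mers. It is an illustration only and not the procedure used to build the database described here; the function names (`kmers`, `cluster_ests`), the k-mer length, and the `min_shared` threshold are all assumptions chosen for the example. Production pipelines rely on alignment-based overlap detection and also handle reverse complements, vector contamination, and sequencing errors, none of which is addressed below.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests, k=8, min_shared=3):
    """Group ESTs that share at least min_shared distinct k-mers.

    ests: dict mapping a (hypothetical) EST identifier to its sequence.
    Returns a sorted list of clusters, each a sorted list of identifiers.
    Linkage is transitive: if A overlaps B and B overlaps C, then
    A, B, and C end up in the same cluster.
    """
    # Union-find structure: every EST starts as its own cluster.
    parent = {name: name for name in ests}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    profiles = {name: kmers(seq.upper(), k) for name, seq in ests.items()}

    # Compare every pair of ESTs; merge clusters when enough k-mers match.
    for a, b in combinations(ests, 2):
        if len(profiles[a] & profiles[b]) >= min_shared:
            parent[find(a)] = find(b)

    clusters = {}
    for name in ests:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())
```

Given two overlapping fragments of one transcript and a third unrelated sequence, `cluster_ests` would place the first two in one cluster and leave the third as a singleton. The all-against-all comparison is quadratic in the number of ESTs, so this sketch is suitable only for small illustrative inputs.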