Structural genomics: A pipeline for providing structures for the biologist

2002 
Progress in understanding the organization and sequences of genes in model organisms and humans is rapidly accelerating. Although genome sequences from prokaryotes have been available for some time, only recently have the genome sequences of several eukaryotic organisms been reported, including Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, and humans (Green 2001). A logical continuation of this line of scientific inquiry is to understand the structure and function of all genes in simple and complex organisms, including the pathways leading to the organization and biochemical function of macromolecular assemblies, organelles, cells, organs, and whole life forms. Such investigations have been variously called integrative or systems biology and -omics or high-throughput biology (Ideker et al. 2001; Greenbaum et al. 2001; Vidal 2001). These studies have blossomed because of advances in technologies that allow highly parallel examination of multiple genes and gene products, as well as a vision of biology that is not purely reductionist. Although a unified understanding of biological organisms is still far in the future, new high-throughput biological approaches are having a major impact on the scientific mainstream. One offshoot of the high-throughput approach, which directly leverages the accumulating gene sequence information, involves mining the sequence data to detect important evolutionary relationships, to identify the basic set of genes necessary for independent life, and to reveal important metabolic processes in humans and clinically relevant pathogens. Programs such as MAGPIE (www.genomes.rockefeller.edu/magpie/magpie.html) compare organisms at a whole-genome level (Gaasterland and Sensen 1996; Gaasterland and Ragan 1998) and ask what functions are conferred by the new genes that have evolved in higher organisms (Gaasterland and Oprea 2001).
Concurrent with computational annotations of gene structure and function, thousands of full-length ORFs from yeast and higher eukaryotes have become available because of advances in cloning and other molecular biology techniques (Walhout et al. 2000a). Structural biologists have embraced high-throughput biology by developing and implementing technologies that will enable the structures of hundreds of protein domains to be solved in a relatively short time. Although thousands of structures are deposited annually in the Protein Data Bank (PDB), most are identical or very similar in sequence to a structure already in the data bank, representing structures of mutants or different ligand-bound states (Brenner et al. 1997). Providing structural information for a broader range of sequences requires a focused effort on determining structures for sequences that are divergent from those already in the database. Although structure does not always elucidate function, in many instances (including the structures of two proteins reported here) the atomic structure readily provides insight into the function of a protein whose function was previously unknown. Typically, such functional annotations are based on homologies that are not recognizable at the sequence level but that are clearly revealed by inspection of the protein fold, by identification of a conserved constellation of side-chain functionalities, or by the observation of cofactors associated with function (Burley et al. 1999; Shi et al. 2001; Bonanno et al. 2002).