Abstract Motivation: Identifying protein enzymatic or pharmacological activities is an important area of research in biology and chemistry. Biological and chemical databases are increasingly being populated with linkages between protein sequences and chemical structures. There is now sufficient information to apply machine-learning techniques to predict interactions between chemicals and proteins at a genome scale. Current machine-learning techniques use as input either protein sequences and structures or chemical information. We propose here a method to infer protein–chemical interactions using heterogeneous input consisting of both protein sequence and chemical information. Results: Our method relies on expressing proteins and chemicals with a common cheminformatics representation. We demonstrate our approach by predicting whether proteins can catalyze reactions not present in training sets. We also predict whether a given drug can bind a target, in the absence of prior binding information for that drug and target. Such predictions cannot be made with current machine-learning techniques requiring binding information for individual reactions or individual targets. Availability and Contact: For questions or paper reprints, please contact Jean-Loup Faulon at jfaulon@sandia.gov. Additional information on the signature molecular descriptor and codes can be downloaded at: http://www.cs.sandia.gov/~jfaulon/publication-signature.html Supplementary information: Supplementary data are available at Bioinformatics online.
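As a rough illustration of placing a protein and a chemical in one shared feature space, the sketch below counts overlapping substrings of a protein sequence and of a SMILES string and merges them into a single sparse vector. This is not the signature molecular descriptor referred to above: character k-mers are only a stand-in, and the function names, the `prot:`/`chem:` prefixes, and the choice of `k` are all assumptions made for the example.

```python
from collections import Counter


def substring_counts(text, k, prefix):
    """Count overlapping length-k substrings of text, tagged with a prefix."""
    return Counter(f"{prefix}:{text[i:i + k]}" for i in range(len(text) - k + 1))


def pair_features(protein_seq, smiles, k=3):
    """Build one sparse feature vector for a (protein, chemical) pair.

    Hypothetical stand-in: k-mers of the amino acid sequence and of the
    SMILES string share one dictionary, kept apart by their prefixes.
    """
    feats = substring_counts(protein_seq, k, "prot")
    feats.update(substring_counts(smiles, k, "chem"))  # Counter.update adds counts
    return feats


if __name__ == "__main__":
    # Toy inputs, not taken from any data set.
    print(pair_features("MKVMKV", "CCO"))
```

A vector of this kind could be passed to any standard classifier trained on known interacting and non-interacting pairs; which learner and which descriptor the authors actually used is described in the paper itself.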
ABSTRACT Highly washed membrane preparations from cells of the hyperthermophilic archaeon Pyrococcus furiosus contain high hydrogenase activity (9.4 μmol of H2 evolved/mg at 80°C) using reduced methyl viologen as the electron donor. The enzyme was solubilized with n-dodecyl-β-D-maltoside and purified by multistep chromatography in the presence of Triton X-100. The purified preparation contained two major proteins (α and β) in an approximate 1:1 ratio with a minimum molecular mass near 65 kDa and contained ∼1 Ni and 4 Fe atoms/mol. The reduced enzyme gave rise to an electron paramagnetic resonance signal typical of the so-called Ni-C center of mesophilic NiFe-hydrogenases. Neither highly washed membranes nor the purified enzyme used NAD(P)(H) or P. furiosus ferredoxin as an electron carrier, nor did either catalyze the reduction of elemental sulfur with H2 as the electron donor. Using N-terminal amino acid sequence information, the genes proposed to encode the α and β subunits were located in the genome database within a putative 14-gene operon (termed mbh). The deduced sequences of the two subunits (Mbh 11 and 12) were distinctly different from those of the four subunits that comprise each of the two cytoplasmic NiFe-hydrogenases of P. furiosus and show that the α subunit contains the NiFe-catalytic site. Six of the open reading frames (ORFs) in the operon, including those encoding the α and β subunits, show high sequence similarity (>30% identity) with proteins associated with the membrane-bound NiFe-hydrogenase complexes from Methanosarcina barkeri, Escherichia coli, and Rhodospirillum rubrum. The remaining eight ORFs encode small (<19-kDa) hypothetical proteins. These data suggest that P. furiosus, which was thought to be solely a fermentative organism, may contain a previously unrecognized respiratory system in which H2 metabolism is coupled to energy conservation.
Development of cellulosic biofuels from non-food crops is currently an area of intense research interest. Tailoring depolymerizing enzymes to particular feedstocks and pretreatment conditions is one promising avenue of research in this area. Here we added a green-waste compost inoculum to switchgrass (Panicum virgatum) and simulated thermophilic composting in a bioreactor to select for a switchgrass-adapted community and to facilitate targeted discovery of glycoside hydrolases. Small-subunit (SSU) rRNA-based community profiles revealed that the microbial community changed dramatically between the initial and switchgrass-adapted compost (SAC), with some bacterial populations being enriched over 20-fold. We obtained 225 Mbp of 454-titanium pyrosequence data from the SAC community and conservatively identified 800 genes encoding glycoside hydrolase domains that were biased toward depolymerizing grass cell wall components. Of these, approximately 10% were putative cellulases, mostly belonging to families GH5 and GH9. We synthesized two SAC GH9 genes with codon optimization for heterologous expression in Escherichia coli and observed activity for one on carboxymethyl cellulose. The active GH9 enzyme has a temperature optimum of 50 °C and a pH range of 5.5 to 8, consistent with the composting conditions applied. We demonstrate that microbial communities adapt to switchgrass decomposition under simulated composting conditions and that full-length genes can be identified from complex metagenomic sequence data, synthesized, and expressed, resulting in active enzyme.
The hydrolysis of biomass to fermentable sugars using glycosyl hydrolases such as cellulases and hemicellulases is a limiting and costly step in the conversion of biomass to biofuels. Enhancement in hydrolysis efficiency is necessary and requires improvement in both enzymes and processing strategies. Advances in both areas in turn depend strongly on progress in developing high-throughput assays to rapidly and quantitatively screen a large number of enzymes and processing conditions. For example, the characterization of various cellodextrins and xylooligomers produced during the time course of saccharification is important in the design of suitable reactors, enzyme cocktail compositions, and biomass pretreatment schemes. We have developed a microfluidic-chip-based assay for rapid and precise characterization of glycans and xylans resulting from biomass hydrolysis. The technique enables multiplexed separation of soluble cellodextrins and xylose oligomers in around 1 min (10-fold faster than HPLC). The microfluidic device was used to elucidate the mode of action of Tm_Cel5A, a novel cellulase from the hyperthermophile Thermotoga maritima. The results demonstrate that the cellulase is active at 80 °C and effectively hydrolyzes cellodextrins and ionic-liquid-pretreated switchgrass and Avicel to glucose, cellobiose, and cellotriose. The proposed microscale approach is ideal for quantitative large-scale screening of enzyme libraries for biomass hydrolysis, for development of energy feedstocks, and for polysaccharide sequencing.
Author(s): Gaucher, Sara P.; Chirica, Gabriela; Sapra, Rajat; Redding, Alyssa M.; Mukhopadhyay, Aindrila; Buffleben, George M.; Kozina, Carrie; Phan, Richard; Joyner, Dominique C.; Keasling, Jay D.; Hazen, Terry C.; Arkin, Adam P.; Singh, Anup K. | Abstract: Sulfate-reducing bacteria (SRB), found widely in nature, use sulfate as the terminal electron acceptor in their respiratory cycle, leading to the production of hydrogen sulfide. These bacteria have both ecological and economic importance. SRB play a role in various biogeochemical cycles, including the sulfur and carbon cycles. They have a negative economic impact on the oil industry, where their metabolism causes corrosion and clogging of machinery and fouling of oil wells. However, they have also been shown to reduce and/or immobilize toxic water-soluble metals such as copper (II), chromium (VI) and uranium (VI), and thus are candidates for bioremediation applications. Desulfovibrio vulgaris Hildenborough (DvH) is a member of the best-studied genus of SRB. A goal of the Environmental Stress Pathway Project (ESPP) in the Virtual Institute for Microbial Stress and Survival (VIMSS) is to understand the regulatory networks in DvH for applications to bioremediation. One aspect of this is the elucidation of protein post-translational modifications (PTMs) in DvH. PTMs play various roles in the cell. Some modifications play a role in protein structure, such as lipid anchors or some disulfide bonds. Others, such as phosphorylation and glycosylation, are directly involved in the regulation of protein function. Still others arise through cellular damage, such as irreversible oxidation events. Whatever role these PTMs play, they must be characterized at the protein level because they are not directly coded for in the genome. Furthermore, DvH may be particularly likely to use PTMs as a regulatory mechanism: evidence for this includes the observation that the DvH genome encodes an unusually large number of histidine kinases.
Our goal is to determine the types of protein modifications that arise in DvH and how these modifications affect the ability of DvH to survive or adapt to its environment. This work leverages the unique resources of the Virtual Institute for Microbial Stress and Survival: quality-controlled biomass produced at LBL (Hazen lab) is used for all proteomic LC/MS/MS measurements at LBL (Keasling lab). Our initial survey of PTMs in DvH was obtained by mining the numerous proteomic LC/MS/MS data sets acquired over the course of ESPP for evidence of modified peptides. Data mining for PTMs is performed at Sandia National Labs. The modifications searched for were chosen based on literature precedent and a genome search for the existence of relevant transferases. To date we have found preliminary evidence for cysteine oxidation, lysine acetylation, and methylation of lysine and arginine. Data mining for additional PTMs is ongoing. Future work will focus on validating these findings and determining which, if any, of these modifications play a regulatory role in DvH. Validation will require selective isolation of the proteins of interest for further characterization. Here, protein isolation is made possible through the work being performed at LBL and the University of Missouri to generate DvH mutants containing tagged versions of DvH proteins.
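To make the idea of mining LC/MS/MS data for modified peptides concrete, the following minimal sketch flags a peptide as a candidate for one of the modification types named above when its observed mass differs from the unmodified mass by a known monoisotopic shift and the peptide contains a residue that can carry that modification. It does not reproduce the search pipeline used in this work: the function name, the tolerance, and the restriction to single mono-oxidation, acetylation and mono-methylation are assumptions made for the example. The mass shifts themselves are standard monoisotopic values.

```python
# Monoisotopic mass shift (Da) and target residues for each modification.
# Only single mono-oxidation, acetylation and mono-methylation are covered;
# this is an illustrative subset, not the search space of the actual study.
PTMS = {
    "oxidation": (15.994915, "C"),     # cysteine oxidation (+O)
    "acetylation": (42.010565, "K"),   # lysine acetylation (+C2H2O)
    "methylation": (14.015650, "KR"),  # lysine or arginine methylation (+CH2)
}


def candidate_ptms(peptide, observed_mass, unmodified_mass, tol_da=0.01):
    """Return the names of modifications consistent with the mass difference.

    A modification is reported only if the mass shift matches within tol_da
    and the peptide contains at least one residue that can carry it.
    """
    delta = observed_mass - unmodified_mass
    hits = []
    for name, (shift, residues) in PTMS.items():
        if abs(delta - shift) <= tol_da and any(r in peptide for r in residues):
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    # Toy peptide and masses, chosen only to exercise the function.
    print(candidate_ptms("AKLC", 1042.010565, 1000.0))
```

A mass match of this kind is only preliminary evidence; as stated above, confirming a modification requires isolating the protein of interest and characterizing it further.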