We develop a statistical framework to study the relationship between chromatin features and gene expression. The framework can be used to predict the expression of protein-coding genes as well as microRNAs. We demonstrate the prediction in a variety of contexts, focusing particularly on the modENCODE worm datasets. Moreover, our framework reveals how much distinct chromatin features contribute to the overall prediction of expression levels depending on their position relative to the gene (upstream or downstream).
MicroRNAs (miRNAs) represent ∼4% of the genes in vertebrates, where they regulate the deadenylation, translation, and decay of target messenger RNAs (mRNAs). The integrated role of miRNAs in regulating gene expression and cell function remains largely unknown. Therefore, to identify the targets coordinately regulated by muscle miRNAs in vivo, we performed gene expression arrays on muscle cells sorted from wild-type, dicer mutant, and single-miRNA knockdown embryos. Our analysis reveals that two particular miRNAs, miR-1 and miR-133, influence gene expression patterns in the zebrafish embryo, where they account for >54% of the miRNA-mediated regulation in the muscle. We also found that muscle miRNA targets (1) tend to be expressed at low levels in wild-type muscle but are more highly expressed in dicer mutant muscle, and (2) are enriched for actin-related and actin-binding proteins. Loss of dicer function or down-regulation of miR-1 and miR-133 alters muscle gene expression and disrupts actin organization during sarcomere assembly. These results suggest that miR-1 and miR-133 actively shape gene expression patterns in muscle tissue, where they regulate sarcomeric actin organization.
In recent years, terahertz (THz) radiation sources have increasingly been exploited in military and civil applications. However, only a few studies have so far examined the biological effects associated with terahertz radiation. In this study, we evaluated the cellular response of mouse mesenchymal stem cells exposed to THz radiation. We applied low-power radiation from both a pulsed broad-band source (centered at 10 THz) and a CW laser source (2.52 THz). Modeling, empirical characterization, and monitoring techniques were applied to minimize the impact of radiation-induced increases in temperature. qRT-PCR was used to evaluate changes in the transcriptional activity of selected heat-responsive genes. We found that temperature increases were minimal and that the differential expression of the investigated heat shock proteins (HSP105, HSP90, and CPR) was unaffected, whereas the expression of certain other genes (Adiponectin, GLUT4, and PPARG) showed clear effects of THz irradiation after prolonged broad-band exposure.
The identification of untranslated regions, introns, and coding regions within an organism remains challenging. We developed a quantitative sequencing-based method called RNA-Seq for mapping transcribed regions, in which complementary DNA fragments are subjected to high-throughput sequencing and mapped to the genome. We applied RNA-Seq to generate a high-resolution transcriptome map of the yeast genome and demonstrated that most (74.5%) of the nonrepetitive sequence of the yeast genome is transcribed. We confirmed many known and predicted introns and demonstrated that others are not actively used. Alternative initiation codons and upstream open reading frames also were identified for many yeast genes. We also found unexpected 3'-end heterogeneity and the presence of many overlapping genes. These results indicate that the yeast transcriptome is more complex than previously appreciated.
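The headline figure (74.5% of the nonrepetitive sequence transcribed) can be read as a coverage fraction. Below is a minimal sketch of that arithmetic. It assumes reads have already been mapped to half-open genome intervals and that repeats are supplied as a separate interval list. The function name and the toy coordinates are hypothetical. A real genome would call for sorted-interval merging rather than per-base sets.

```python
def transcribed_fraction(genome_length, repeat_intervals, read_intervals):
    """Fraction of non-repetitive bases covered by at least one mapped read.

    All intervals are half-open (start, end) in 0-based genome coordinates.
    """
    genome = set(range(genome_length))
    repetitive = set()
    for start, end in repeat_intervals:
        repetitive.update(range(start, end))
    covered = set()
    for start, end in read_intervals:
        covered.update(range(start, end))

    nonrepetitive = genome - repetitive           # bases eligible for the statistic
    transcribed = covered & nonrepetitive         # eligible bases seen in the reads
    return len(transcribed) / len(nonrepetitive)


# Toy example: a 100 bp "genome" with one repeat and three mapped reads.
repeats = [(40, 60)]
reads = [(0, 30), (25, 50), (70, 90)]
print(transcribed_fraction(100, repeats, reads))
```

Excluding repeats from both the numerator and the denominator mirrors the restriction to nonrepetitive sequence, where short reads can be mapped unambiguously.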
We have accumulated a large amount of biological network data and expect even more to come. We anticipate soon being able to compare many different biological networks, as we commonly do for molecular sequences. It has long been believed that many of these networks change, or "rewire", at different rates. It is therefore important to develop a framework that quantifies the differences between networks in a unified fashion. We developed such a formalism by analogy to simple models of sequence evolution and used it to conduct a systematic study of network rewiring across all currently available biological networks. We found that, similar to sequences, biological networks show a decreased rate of change at large time divergences, because of saturation in potential substitutions. However, different types of biological networks consistently rewire at different rates. Using comparative genomics and proteomics data, we found a consistent ordering of the rewiring rates, from fast to slow: transcription regulatory, phosphorylation regulatory, genetic interaction, miRNA regulatory, protein interaction, and metabolic pathway networks. This ordering held in every comparison of matched networks between organisms that we performed. To gain further intuition on network rewiring, we compared our observed rewirings with those obtained from simulation. We also investigated how readily our formalism could be mapped to other network contexts; in particular, we showed how it could be applied to analyze changes in a range of "commonplace" networks such as family trees, co-authorships, and Linux kernel function dependencies.
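The saturation argument can be made concrete with the simplest analogue of a sequence substitution model. The sketch below assumes that each potential edge flips between present and absent at equal rates, giving a two-state counterpart of the Jukes-Cantor correction. Under that assumption, the expected fraction of differing positions is p = (1 − e^(−2d))/2 for d expected changes per position. This is an illustrative assumption, not necessarily the formalism used in the study. Real networks are sparse, so gain and loss rates would generally differ.

```python
import math


def observed_difference(d):
    """Expected fraction of edge positions that differ after d expected changes
    per position, assuming a symmetric two-state (present/absent) model."""
    return (1.0 - math.exp(-2.0 * d)) / 2.0


def corrected_distance(p):
    """Invert observed_difference: estimate changes per position from the
    observed fraction p of differing positions (valid only for p < 0.5)."""
    if not 0.0 <= p < 0.5:
        raise ValueError("p must lie in [0, 0.5) under this model")
    return -0.5 * math.log(1.0 - 2.0 * p)


# The observed difference grows ever more slowly and plateaus near 0.5, so raw
# counts of differing edges underestimate rewiring at large divergences.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    p = observed_difference(d)
    print(f"true d = {d:4.1f}  observed p = {p:.3f}  corrected d = {corrected_distance(p):.2f}")
```

The correction recovers the true number of changes only when the model's assumptions hold. As p approaches its plateau, small errors in p translate into large errors in the corrected distance, just as they do for sequences.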
We present an integrative machine learning method, incRNA, for whole-genome identification of noncoding RNAs (ncRNAs). It combines a large amount of expression data, RNA secondary-structure stability, and evolutionary conservation at the protein and nucleic-acid levels. Using the incRNA model and data from the modENCODE consortium, we are able to separate known C. elegans ncRNAs from coding sequences and other genomic elements with a high level of accuracy (97% AUC on an independent validation set). We also find more than 7000 novel ncRNA candidates, of which more than 1000 are located in intergenic regions of the C. elegans genome. Based on the validation set, we estimate that 91% of the approximately 7000 novel ncRNA candidates are true positives. We then analyze 15 novel ncRNA candidates by RT-PCR and detect expression for 14 of them. In addition, we characterize the properties of all the novel ncRNA candidates and find that they have distinct expression patterns across developmental stages and tend to use novel RNA structural families. We also find that they are often targeted by specific transcription factors (∼59% of intergenic novel ncRNA candidates). Overall, our study identifies many new potential ncRNAs in C. elegans and provides a method that can be adapted to other organisms.
From Genome to Regulatory Networks
For biologists, having a genome in hand is only the beginning: much more investigation is needed to characterize how the genome is used to produce a functional organism (see the Perspective by Blaxter). In this vein, Gerstein et al. (p. 1775) for the Caenorhabditis elegans genome and The modENCODE Consortium (p. 1787) for the Drosophila melanogaster genome summarize full transcriptome analyses over developmental stages, genome-wide identification of transcription factor binding sites, and high-resolution maps of chromatin organization. Both studies identified highly occupied target (HOT) regions in the nematode and fly genomes, where DNA was bound by more than 15 of the transcription factors analyzed, and characterized the expression of the associated genes. Overall, the studies provide insights into the organization, structure, and function of the two genomes and supply the basic information needed to guide and correlate both focused and genome-wide studies.
The integration of molecular networks with other types of data, such as changing levels of gene expression or protein-structural features, can provide richer information about interactions than the simple node-and-edge representations commonly used in the network community. For example, mapping 3D structural data onto networks enables proteins to be classified as singlish- or multi-interface hubs, where multi-interface hubs have more than two interfaces. Similarly, interactions can be classified as permanent or transient, depending on whether their interface is used by only one partner or by multiple partners. Here, we incorporate an additional dimension into molecular networks: dynamic conformational changes. We parse the entire Protein Data Bank (PDB) for alternate conformations of proteins and map these onto the protein interaction network to compile a first version of the Dynamic Structural Interaction Network (DynaSIN). We make this network available as a readily downloadable resource file and then use it to address a variety of downstream questions. In particular, we show that multi-interface hubs display a greater degree of conformational change than singlish-interface hubs do. They thus show more plasticity, which perhaps enables them to use more interfaces for interactions. We also find that transient associations involve smaller conformational changes than permanent ones. Although this may appear counterintuitive, it is understandable in the following framework: as proteins involved in transient interactions shuttle between interchangeable associations, they interact with domains that are similar to each other and so do not require drastic structural changes for their activity. We provide evidence for this hypothesis by showing that interfaces involved in transient interactions bind fewer classes of domains than those in a control set.
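The two classification rules stated at the start of this abstract can be written down directly. The sketch below takes hypothetical (protein, interface, partner) records, such as might be derived from structural data. It labels each protein by its interface count and each interaction by how many partners share its interface. The record format, the identifiers, and the function name `classify` are assumptions for illustration. Hub status itself (high degree) is assumed to be determined separately, and interactions are labeled from the first protein's side only.

```python
from collections import defaultdict


def classify(interface_usage):
    """Apply the interface-based rules to (protein, interface_id, partner) records.

    Returns (hub_types, interaction_types):
      hub_types[protein] is "multi" if the protein uses more than two distinct
      interfaces, otherwise "singlish";
      interaction_types[(protein, partner)] is "permanent" if the protein's
      interface for that partner binds only one partner, otherwise "transient".
    """
    partners_by_interface = defaultdict(set)
    interfaces_by_protein = defaultdict(set)
    for protein, interface, partner in interface_usage:
        partners_by_interface[(protein, interface)].add(partner)
        interfaces_by_protein[protein].add(interface)

    hub_types = {protein: ("multi" if len(interfaces) > 2 else "singlish")
                 for protein, interfaces in interfaces_by_protein.items()}

    interaction_types = {}
    for protein, interface, partner in interface_usage:
        shared = len(partners_by_interface[(protein, interface)])
        interaction_types[(protein, partner)] = "permanent" if shared == 1 else "transient"
    return hub_types, interaction_types


# Hypothetical records: A binds B and C through the same interface i1.
usage = [
    ("A", "i1", "B"), ("A", "i1", "C"), ("A", "i2", "D"), ("A", "i3", "E"),
    ("F", "j1", "G"),
]
hubs, interactions = classify(usage)
print(hubs)          # A uses three interfaces; F uses one
print(interactions)  # A-B and A-C share interface i1
```

Labeling from one side keeps the sketch short. A symmetric treatment would also check the partner's interface and decide how to resolve disagreements between the two sides.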
We propose a method to predict yeast transcription factor targets by integrating histone modification profiles with transcription factor binding motif information. The method shows improved predictive power compared with a motif-only method. We find that transcription factors cluster into histone-sensitive and histone-insensitive classes. The target genes of histone-sensitive transcription factors have stronger histone modification signals than those of histone-insensitive ones. The two classes also differ in their tendency to interact with histone modifiers, their degree of connectivity in protein-protein interaction networks, their position in the transcriptional regulation hierarchy, and a number of additional features, indicating possible differences in their transcriptional regulation mechanisms.