Abstract. Background: A complete and accurate human reference genome is important for functional genomics research. Gaps in the reference genome and individual-specific sequences can therefore significantly affect many studies. Results: We used two RNA-Seq datasets, from human brain tissues and from 10 mixed cell lines, to investigate the completeness of the human reference genome. First, we demonstrated that of the previously identified ~5 Mb of Asian and ~5 Mb of African novel sequences absent from the human reference genome NCBI build 36, ~211 kb and ~201 kb, respectively, could be transcribed. Our results suggest that many of those transcribed regions are not specific to Asian and African individuals but are also present in Caucasians. We then found that 104 RefSeq genes that cannot be aligned to NCBI build 37 are expressed at more than 0.1 RPKM in brain and cell lines. Fifty-five of them are conserved across human, chimpanzee and macaque, suggesting that a significant number of functional human genes are still absent from the human reference genome. Moreover, we identified hundreds of novel transcript contigs that cannot be aligned to NCBI build 37, RefSeq genes or EST sequences. Some of those novel transcript contigs are also conserved among human, chimpanzee and macaque. By positioning those contigs onto the human genome, we identified several large deletions in the reference genome. Several conserved novel transcript contigs were further validated by RT-PCR. Conclusion: Our findings demonstrate that a significant number of genes are still absent from the incomplete human reference genome, highlighting the importance of further refining the reference genome and curating those missing genes. Our study also shows the importance of de novo transcriptome assembly. Transcriptome-based comparison between the reference genome and other related human genomes provides an alternative way to refine the human reference genome.
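The 0.1 RPKM expression cutoff mentioned above can be illustrated with a minimal sketch. It assumes the standard definition of RPKM (reads per kilobase of transcript per million mapped reads); the abstract does not state how the values were actually computed, and the function names, gene labels and counts below are hypothetical.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads (standard definition)."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # Dividing by (length / 1e3) and (total / 1e6) gives the combined 1e9 factor.
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def expressed_genes(counts: dict, lengths: dict, total_mapped_reads: int,
                    threshold: float = 0.1) -> list:
    """Return the genes whose RPKM exceeds the threshold (0.1, as in the abstract)."""
    return [
        gene for gene in counts
        if rpkm(counts[gene], lengths[gene], total_mapped_reads) > threshold
    ]


# Hypothetical example data, for illustration only.
counts = {"geneA": 10, "geneB": 1}
lengths = {"geneA": 2000, "geneB": 5000}
selected = expressed_genes(counts, lengths, total_mapped_reads=20_000_000)
```

With these made-up numbers, geneA has an RPKM of 0.25 and passes the cutoff, while geneB has an RPKM of 0.01 and does not.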
A limited cohort of transcription factors is capable of structuring diverse gene-expression patterns. Transcriptional cooperativity (TC) is deemed to be the main mechanism underlying complexity and precision in regulatory programs. Although many data types generated by numerous experimental technologies are used to study combinatorial transcriptional regulation, complementary computational approaches that can integrate diverse data resources and assimilate them into a biological model are still under development. We developed a novel Bayesian approach for integrative analysis of proteomic, transcriptomic and genomic data to identify specific TC. Model evaluation demonstrated the discriminative power of features derived from distinct data sources and their essentiality to model performance. Our model outperformed other classifiers and alternative methods. Applying the model to contextualize TC within hepatocarcinogenesis revealed carcinoma-associated alterations. The derived TC networks were highly significant in capturing validated cooperativity as well as revealing novel cooperativity. Our methodology is the first multiple-data integration approach to predict the dynamic nature of TC. It is promising for identifying tissue- or disease-specific TC and can further facilitate the interpretation of underlying mechanisms in various physiological conditions. Contact: tieliushi01@gmail.com. Supplementary data are available at Bioinformatics online.
Chloroplast development in plants is regulated by a series of coordinated biological processes. In this work, a genetic suppressor screen for the leaf variegation phenotype of the thylakoid formation 1 (thf1) mutant was combined with a proteomic assay to elucidate this complicated network. We identified a mutation in ClpR4, named clpR4-3, which leads to leaf virescence and also rescues the var2 variegation. Proteomic analysis showed that the chloroplast proteome of clpR4-3 thf1 is dominantly controlled by clpR4-3, providing a molecular explanation for the genetic epistasis of clpR4-3 to thf1. Classification of the proteins significantly mis-regulated in the mutants revealed that those functioning in the expression of plastid genes are oppositely regulated, whereas proteins functioning in antioxidative stress, protein folding, and starch metabolism change in the same direction in thf1 and clpR4-3. The levels of FtsHs, including FtsH2/VAR2, FtsH8, and FtsH5/VAR1, are greatly reduced in thf1 compared with those in the wild type, but are higher in clpR4-3 thf1 than in thf1. Quantitative PCR analysis revealed that FtsH expression in clpR4-3 thf1 is regulated post-transcriptionally. In addition, a number of ribosomal proteins are less abundant in the clpR4-3 proteome, which is in line with the reduced levels of rRNAs in clpR4-3. Furthermore, knocking out PRPL11, one of the most downregulated proteins in the clpR4-3 thf1 proteome, rescues the leaf variegation phenotype of the thf1 and var2 mutants. These results provide insights into the molecular mechanisms by which the virescent clpR4-3 mutation suppresses leaf variegation of thf1 and var2.