Abstract:
This chapter contains sections titled: Summary of the Chapter; Definitions of Keywords; Introduction. Short History of the Human Genome Project; Vectors. Description of the Main Types of Vectors Used in the HGP; Mapping of the Human Genome; Main Approaches to Sequence Human Genome; Identification of Genes. How Many Genes Are in the Human Genome?; Was it Worth it to Sequence Human Genome?; Acknowledgment; References
Keywords:
ENCODE
Identification
Sequence (biology)
Cancer genome sequencing
Abstract Background: A complete and accurate human reference genome is important for functional genomics research. An incomplete reference genome and individual-specific sequences can therefore have significant effects on various studies. Results: We used two RNA-Seq datasets, from human brain tissues and from 10 mixed cell lines, to investigate the completeness of the human reference genome. First, we demonstrated that, of the previously identified ~5 Mb of Asian and ~5 Mb of African novel sequences that are absent from the human reference genome NCBI build 36, ~211 kb and ~201 kb, respectively, could be transcribed. Our results suggest that many of those transcribed regions are not specific to Asian and African genomes but are also present in Caucasian genomes. Then, we found 104 RefSeq genes that are unalignable to NCBI build 37 yet are expressed at more than 0.1 RPKM in brain and cell lines. Of these, 55 are conserved across human, chimpanzee and macaque, suggesting that a significant number of functional human genes are still absent from the human reference genome. Moreover, we identified hundreds of novel transcript contigs that cannot be aligned to NCBI build 37, RefSeq genes or EST sequences. Some of those novel transcript contigs are also conserved among human, chimpanzee and macaque. By positioning those contigs onto the human genome, we identified several large deletions in the reference genome. Several conserved novel transcript contigs were further validated by RT-PCR. Conclusion: Our findings demonstrate that a significant number of genes are still absent from the incomplete human reference genome, highlighting the importance of further refining the human reference genome and curating those missing genes. Our study also shows the importance of de novo transcriptome assembly. A transcriptome-based comparison between the reference genome and other related human genomes provides an alternative way to refine the human reference genome.
RefSeq
Comparative Genomics
RNA-Seq
Citations (33)
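The RNA-Seq abstract above filters genes by an expression threshold of 0.1 RPKM. As a minimal sketch of what that threshold means, the snippet below uses the standard definition of RPKM (reads per kilobase of transcript per million mapped reads). The abstract does not describe the authors' pipeline, so the function names, gene labels, and counts here are purely hypothetical.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads (standard definition)."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # reads / (length in kb) / (mapped reads in millions)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def expressed_genes(counts, lengths, total_mapped_reads, threshold=0.1):
    """Return the sorted names of genes whose RPKM exceeds the threshold."""
    return sorted(
        gene
        for gene, reads in counts.items()
        if rpkm(reads, lengths[gene], total_mapped_reads) > threshold
    )


# Hypothetical illustration data, not taken from the study.
example_counts = {"geneA": 50, "geneB": 1, "geneC": 0}
example_lengths = {"geneA": 2000, "geneB": 5000, "geneC": 1500}
example_total_reads = 20_000_000
```

With these made-up numbers, only `geneA` clears the 0.1 RPKM cutoff. A low cutoff of this kind keeps weakly expressed genes while excluding genes supported by almost no reads relative to their length and the sequencing depth.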
The Encyclopedia of DNA Elements (ENCODE) project is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI), with the goal of delineating all functional elements encoded in the human genome. The project began in 2003 with a targeted analysis of a selected 1% of the human genome in order to test the methods. A second phase of funding was then provided to scale the project to the entire human genome. The project has identified a large number of new DNA regulatory elements and revealed novel relationships among DNA methylation, histone modification, nucleosome remodeling, and RNA-mediated targeting, which regulate many biological processes. The results of the second phase comprised 1640 data sets from 147 different cell types, and the findings were released in a coordinated set of 30 publications across several journals. The ENCODE publications report that 80.4% of the human genome displays some functionality. The project gives us new insights into the organization and regulation of the human genome and epigenome, which significantly enhance our understanding of human health and disease.
Key words:
ENCODE; Human genome; DNA methylation; Histone modification; Next generation sequencing
ENCODE
Epigenome
Citations (1)
ENCODE
Encyclopedia
Thematic map
Citations (15)
Abstract The availability of the human genome sequence has had an enormous impact on biomedical research. New discoveries emanating directly from the elucidation of the human genome sequence have included the unexpectedly low total number of genes, the existence of numerous transcribed but noncoding sequences and the multiplicity of low‐copy repeats and segmental duplications. The Human Genome Project has also spawned new research projects such as Encyclopedia of DNA Elements (ENCODE) and Haplotype Map (HapMap) which together are helping to reveal the remarkable complexity of the human genome. Finally, comparison of the human genome sequence with the genome sequences of other higher organisms has opened up numerous research avenues in evolutionary biology.
ENCODE
International HapMap Project
Noncoding DNA
Sequence (biology)
Personal genomics
Citations (0)
Abstract Identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES, the first supervised deep learning approach for the identification of enhancer and promoter regions in the human genome. Due to their ability to discover patterns in large and complex data, the introduction of deep learning methods enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data) and 26,000 candidate promoters (0.6% of the genome).
Citations (11)
The ENCyclopedia Of DNA Elements (ENCODE) Project aims to identify all functional elements in the human genome sequence. The pilot phase of the Project is focused on a specified 30 megabases (~1%) of the human genome sequence and is organized as an international consortium of computational and laboratory-based scientists working to develop and apply high-throughput approaches for detecting all sequence elements that confer biological function. The results of this pilot phase will guide future efforts to analyze the entire human genome. With the complete human genome sequence now in hand (1–3), we face the enormous challenge of interpreting it and learning how to use that information to understand the biology of human health and disease. The ENCyclopedia Of DNA Elements (ENCODE) Project is predicated on the belief that a comprehensive catalog of the structural and functional components encoded in the human genome sequence will be critical for understanding human biology well enough to address those fundamental aims of biomedical research. Such a complete catalog, or "parts list," would include protein-coding genes, non–protein-coding genes, transcriptional regulatory elements, and sequences that mediate chromosome structure and dynamics; undoubtedly, additional, yet-to-be-defined types of functional sequences will also need to be included. To illustrate the magnitude of the challenge involved, it only needs to be pointed out that an inventory of the best-defined functional components in the human genome, the protein-coding sequences, is still incomplete for a number of reasons, including the fragmented nature of human genes. Even with essentially all of the human genome sequence in hand, the number of protein-coding genes can still only be estimated (currently 20,000 to 25,000) (3). Non–protein-coding genes are much less well defined. Some, such as the ribosomal RNA and tRNA genes, were identified several decades ago, but more recent
ENCODE
Encyclopedia
Citations (4)
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research. This overview of the ENCODE project outlines the data accumulated so far, revealing that 80% of the human genome now has at least one biochemical function assigned to it; the newly identified functional elements should aid the interpretation of results of genome-wide association studies, as many correspond to sites of association with human disease. In this overview, the Consortium guides the readers through the project itself, the data and their integrated analyses.
In addition to expanding our understanding of how gene expression is regulated on a genome-wide scale, the newly identified functional elements should help researchers to interpret the results of genome-wide association studies because many correspond to sites associated with human disease.
ENCODE
Encyclopedia
Expansive
Citations (17,070)
In September 2012, with more than 30 papers published in Nature, Genome Research, and Genome Biology, the ENCODE (Encyclopedia of DNA Elements) project achieved its first success. After the HGP (Human Genome Project) came to an end with a map of the human genome sequence in 2001, it was still unclear how and where genes are regulated by other factors. So, in September 2003, the National Human Genome Research Institute launched a public research consortium named ENCODE, the Encyclopedia of DNA Elements, to identify all functional elements in the human genome sequence. Researchers pinpointed hundreds of thousands of landing spots for proteins that influence gene activity, many thousands of stretches of DNA that code for different types of RNA, and lots of places where chemical modifications serve to silence stretches of our chromosomes, concluding that 80% of the genome was biochemically active. Here, we review information related to ENCODE and offer readers guidance on recognizing and applying the achievements of ENCODE in the service of human health.
ENCODE
Encyclopedia
Genome Biology
Citations (0)
ENCODE
Encyclopedia
Citations (0)
Abstract The Human Genome Project (HGP) was initiated in 1990 and completed in 2003. It aimed to sequence the whole human genome. Although it represented an advance in understanding the human genome and its complexity, many questions remained unanswered. Other projects were launched in order to unravel the mysteries of our genome, including the ENCyclopedia of DNA Elements (ENCODE). This review aims to analyze the evolution of scientific knowledge related to both the HGP and ENCODE projects. Data were retrieved from scientific articles published in 1990–2014, a period comprising the development of the HGP and the 10 years following its completion. The fact that only 20,000 genes are protein and RNA‐coding is one of the most striking HGP results. A new concept of the organization of the genome arose. The ENCODE project was initiated in 2003 and aimed to map the functional elements of the human genome. This project revealed that the human genome is pervasively transcribed. Therefore, it was determined that a large part of the non‐protein coding regions are functional. Finally, a more sophisticated view of chromatin structure emerged. The mechanistic functioning of the genome has been redrafted, revealing a much more complex picture. In addition, the gene‐centric conception of the organism has to be revisited. A number of criticisms have emerged against the approaches of the ENCODE project, raising the question of whether non‐conserved but biochemically active regions are truly functional. Thus, the HGP and ENCODE projects produced an extensive map of the human genome, but the data generated still require further in‐depth analysis. © 2016 by The International Union of Biochemistry and Molecular Biology, 44:215–223, 2016.
Citations (100)