How Much Information is Provided by Human Epigenomic Data? An Evolutionary View

2018 
Here, we ask the question, "How much information do available epigenomic data sets provide about human genomic function, individually or in combination?" We measure genomic function by using signatures of natural selection as a proxy, and we measure information in terms of reductions in entropy derived from a probabilistic evolutionary model. Our analysis of the human genome considers measures of chromatin accessibility (DNase-seq), chromatin states (ChromHMM), RNA expression (RNA-seq), small RNAs, and DNA methylation across 115 cell types from the Roadmap Epigenomics project, together with gene annotations, splice sites, predicted transcription factor binding sites, and predicted DNA melting temperatures. Selective pressure is measured using patterns of genetic variation across ~50 modern humans and several nonhuman primates. We find that protein-coding gene annotations are most informative about genomic function, followed in decreasing order by RNA-seq, ChromHMM, splice sites, melting temperature, and DNase-seq. Several features exhibit clear synergy, meaning that they yield more information in combination than they do individually. Strikingly, most of the entropy in human genetic variation, by far, reflects mutation and neutral drift; indeed, the genome-wide information associated with natural selection in our data set measures only about 13 Mbits (1.6 MB), roughly equivalent to a typical digital photograph. Based on this framework, we produce cell-type-specific maps of the probability that a mutation at each nucleotide will have fitness consequences (FitCons scores), available as UCSC Genome Browser tracks for each cell type. These scores are predictive of known functional elements and disease-associated variants, they reveal relationships among cell types, and they suggest that ~8% of nucleotide sites are constrained by natural selection.
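The idea of measuring information as a reduction in entropy, and of synergy between features, can be illustrated with a minimal sketch. It is not the probabilistic evolutionary model used in the study. It assumes a hypothetical toy setting in which each site has a discrete outcome and one or more discrete feature labels, and all names (`entropy_bits`, `information_bits`, `mbits_to_megabytes`) are illustrative. Two binary features are constructed so that neither is informative alone but the pair is fully informative. The last helper reproduces the unit conversion behind the 13 Mbits (1.6 MB) figure.

```python
import math
from collections import Counter


def entropy_bits(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_bits(outcomes, features):
    """Per-site entropy reduction H(Y) - H(Y | X), in bits.

    outcomes: one discrete outcome per site (toy stand-in for a selection signal)
    features: one hashable feature label per site
    """
    n = len(outcomes)
    groups = {}
    for y, x in zip(outcomes, features):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy_bits(g) for g in groups.values())
    return entropy_bits(outcomes) - conditional


def mbits_to_megabytes(mbits):
    """Convert megabits to megabytes (8 bits per byte)."""
    return mbits / 8


# Hypothetical toy data: the outcome is the XOR of two binary features,
# so each feature alone tells us nothing but together they determine it.
feature_a = [0, 0, 1, 1]
feature_b = [0, 1, 0, 1]
outcome = [a ^ b for a, b in zip(feature_a, feature_b)]

info_a = information_bits(outcome, feature_a)
info_b = information_bits(outcome, feature_b)
info_ab = information_bits(outcome, list(zip(feature_a, feature_b)))

# Positive synergy: the combination yields more information than the
# sum of the individual contributions.
synergy = info_ab - (info_a + info_b)

print(info_a, info_b, info_ab, synergy)
print(mbits_to_megabytes(13))
```

In this toy case the synergy is positive because the joint feature removes all uncertainty about the outcome while each feature on its own removes none. Real genomic features would typically show partial information and partial synergy.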