Polymer physics and machine learning reveal a combinatorial code linking chromatin 3D architecture to 1D epigenetics

2021 
The mammalian genome has a complex 3D organization that serves vital functional purposes, yet it remains largely unknown how the multitude of specific DNA contacts, e.g., between transcribed and regulatory regions, is orchestrated by chromatin organizers such as transcription factors. Here, we implement a method combining machine learning and polymer physics to infer, from Hi-C data alone, the 1D genomic arrangement of the minimal set of binding sites sufficient to recapitulate, through physics alone, genome-wide 3D contact patterns in human and mouse cells. The inferred binding sites are validated by comparing their predictions of how chromatin refolds in a set of duplications at the Sox9 locus against available independent cHi-C data, showing that the different phenotypes of these duplications originate from distinct enhancer hijackings in their 3D structure. Although derived from Hi-C alone, our binding sites fall into epigenetic classes that closely match chromatin states from epigenetic segmentation studies, such as active, poised and repressed states. However, the inferred binding domains have an overlapping, combinatorial organization along chromosomes that is missing in epigenetic segmentations and is required to explain Hi-C contact specificity with high accuracy. In a reverse approach, the epigenetic profile of the binding domains provides a code to derive, from epigenetic marks alone, the DNA binding sites and hence the 3D architecture, as validated by successful predictions of Hi-C matrices in an independent set of chromosomes. Overall, our results shed light on how complex 3D architectural information is encrypted in 1D epigenetics via the related, combinatorial arrangement of specific binding sites along the genome.
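
The idea that a 1D arrangement of overlapping binding-site types can shape a 2D contact pattern can be illustrated with a toy sketch. This is not the method of the paper: the function `toy_contact_matrix`, the `affinity` and `alpha` parameters, the power-law background and the example site layout are all assumptions chosen only for illustration, and no polymer simulation or machine learning is involved.

```python
def toy_contact_matrix(site_types, affinity=2.0, alpha=1.0):
    """Return mock contact scores for a chain of beads (hypothetical toy model).

    site_types[i] is the set of binding-site types carried by bead i.
    A bead may carry several types, which mimics an overlapping,
    combinatorial arrangement along the chain.
    """
    n = len(site_types)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 1.0  # a bead is always in contact with itself
                continue
            # Assumed background: contact score decays with 1D distance.
            background = abs(i - j) ** (-alpha)
            # Assumed specificity: each shared site type boosts the score.
            shared = len(site_types[i] & site_types[j])
            matrix[i][j] = min(1.0, background * (1.0 + affinity * shared))
    return matrix


# Hypothetical layout: bead 1 carries two types, beads 2 and 5 carry none.
sites = [{"A"}, {"A", "B"}, set(), {"B"}, {"A"}, set()]
contacts = toy_contact_matrix(sites)

# Beads 0 and 4 share type "A"; beads 1 and 5 share nothing,
# although both pairs are four beads apart.
print(contacts[0][4], contacts[1][5])
```

In this toy, two pairs at the same 1D separation receive different scores only because of the site types they share, which is the kind of contact specificity that a purely distance-based background cannot produce.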