CoRE-ATAC: A Deep Learning model for the functional Classification of Regulatory Elements from single cell and bulk ATAC-seq data

2020 
Cis-regulatory elements (cis-REs) include promoters, enhancers, and insulators that regulate gene expression programs via binding of transcription factors. ATAC-seq technology effectively identifies active cis-REs in a given cell type (including from single cells) by mapping the accessible chromatin at base-pair resolution. However, these maps are not immediately useful for inferring specific functions of cis-REs. For this purpose, we developed a deep learning framework (CoRE-ATAC) with novel data encoders that integrate DNA sequence (reference or personal genotypes) and ATAC-seq read pileups. CoRE-ATAC was trained on 4 cell types (n=6 samples/replicates) and accurately predicted known cis-RE functions from 7 cell types (n=40 samples) that were not used in model training (average precision=0.80). CoRE-ATAC enhancer predictions from 19 human islets coincided with genetically modulated gain/loss of enhancer activity, which was confirmed by massively parallel reporter assays (MPRAs). Finally, CoRE-ATAC effectively inferred functionality of cis-REs from aggregate single nucleus ATAC-seq (snATAC-seq) data from human blood-derived immune cells that overlapped well with known functional annotations in sorted immune cells. Performance on snATAC-seq data demonstrates CoRE-ATAC's ability to infer cis-RE function in rare cell populations that can be identified by unsupervised clustering of snATAC-seq cells but are difficult to capture in bulk ATAC-seq. ATAC-seq maps from primary human cells reveal individual- and cell-specific variation in cis-RE activity. CoRE-ATAC increases the functional resolution of these maps, a critical step for studying regulatory disruptions behind diseases.
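To make the idea of integrating DNA sequence with ATAC-seq read pileups concrete, the following is a minimal illustrative sketch, not the CoRE-ATAC encoders themselves, whose design the abstract does not specify. It assumes a simple scheme: a one-hot matrix for the sequence (rows A, C, G, T) with a fifth row holding per-base read coverage scaled to the region's maximum. The function names (`one_hot_dna`, `pileup`, `encode_region`), the half-open `(start, end)` read representation, and the max-scaling are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

BASES = "ACGT"  # row order of the one-hot matrix (an assumption of this sketch)


def one_hot_dna(seq: str) -> List[List[float]]:
    """Return 4 rows (A, C, G, T) by len(seq) columns.

    Bases outside ACGT (for example N) are left as all-zero columns.
    """
    rows = [[0.0] * len(seq) for _ in BASES]
    for pos, base in enumerate(seq.upper()):
        idx = BASES.find(base)
        if idx >= 0:
            rows[idx][pos] = 1.0
    return rows


def pileup(region_start: int, region_len: int,
           reads: List[Tuple[int, int]]) -> List[float]:
    """Count reads covering each base of the region.

    Reads are hypothetical half-open intervals (start, end) in the same
    coordinate system as region_start.
    """
    cov = [0.0] * region_len
    for start, end in reads:
        lo = max(start, region_start)
        hi = min(end, region_start + region_len)
        for p in range(lo, hi):
            cov[p - region_start] += 1.0
    return cov


def encode_region(seq: str, region_start: int,
                  reads: List[Tuple[int, int]]) -> List[List[float]]:
    """Stack the one-hot sequence with a max-scaled coverage row (5 x L)."""
    rows = one_hot_dna(seq)
    cov = pileup(region_start, len(seq), reads)
    peak = max(cov) if cov and max(cov) > 0 else 1.0  # avoid division by zero
    rows.append([c / peak for c in cov])
    return rows


if __name__ == "__main__":
    # Toy region of 5 bases starting at coordinate 100, with two reads.
    encoded = encode_region("ACGTN", 100, [(98, 102), (101, 104)])
    for row in encoded:
        print(row)
```

A matrix of this shape could be fed to a convolutional model as a multi-channel 1D input. Substituting an individual's alleles into `seq` before encoding is one simple way to represent personal genotypes rather than the reference sequence, again as an assumption of this sketch.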