Spatial Clustering of Multivariate Genomic and Epigenomic Information

2009 
The combination of fully sequenced genomes and new technologies for high-density arrays and ultra-rapid sequencing enables the mapping of gene-regulatory and epigenetic marks on a global scale. This new experimental methodology was recently applied to map multiple histone marks and genomic factors, characterizing patterns of genome organization and discovering interactions among processes of epigenetic reprogramming during cellular differentiation. The new data pose a significant computational challenge in both size and statistical heterogeneity. Understanding them collectively and without bias remains an open problem. Here we introduce spatial clustering, a new unsupervised clustering methodology for dissecting large, multi-track genomic and epigenomic data sets into a spatially organized set of distinct combinatorial behaviors. We develop a probabilistic algorithm that finds spatial clustering solutions by learning a hidden Markov model (HMM) and inferring the most likely genomic layout of clusters. Application of our methods to meta-analysis of combined ChIP-seq and ChIP-chip epigenomic datasets in mouse and human reveals known and novel patterns of local co-occurrence among histone modifications and related factors. Moreover, the model weaves together these local patterns into a coherent global model that reflects the higher-level organization of the epigenome. Spatial clustering constitutes a powerful and scalable analysis methodology for dissecting the even larger-scale genomic datasets that will soon become available.
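The layout-inference step mentioned above can be illustrated with a minimal sketch. The abstract does not specify the model, so everything here is an assumption made for illustration: each cluster emits track values from a diagonal Gaussian, transitions use a single hypothetical `stay_prob` parameter, the function names `log_gauss` and `viterbi_layout` are invented, and the cluster parameters are taken as given rather than learned. The sketch uses standard Viterbi decoding to pick the most likely cluster for each genomic bin.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal Gaussian over the tracks of one bin (assumed emission model)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def viterbi_layout(bins, means, variances, stay_prob=0.9):
    """Return the most likely cluster index for each genomic bin.

    bins:      list of per-bin track vectors, in genomic order
    means:     per-cluster mean vectors (assumed already learned)
    variances: per-cluster variance vectors
    stay_prob: hypothetical probability of staying in the same cluster
    """
    if not bins:
        return []
    k = len(means)
    if k == 1:
        return [0] * len(bins)
    log_stay = math.log(stay_prob)
    log_switch = math.log((1 - stay_prob) / (k - 1))

    def trans(p, c):
        return log_stay if p == c else log_switch

    # Uniform prior over clusters for the first bin.
    score = [-math.log(k) + log_gauss(bins[0], means[c], variances[c]) for c in range(k)]
    back = []  # back[t][c] = best predecessor of cluster c at bin t + 1
    for x in bins[1:]:
        new_score, ptr = [], []
        for c in range(k):
            best_prev = max(range(k), key=lambda p: score[p] + trans(p, c))
            new_score.append(
                score[best_prev] + trans(best_prev, c) + log_gauss(x, means[c], variances[c])
            )
            ptr.append(best_prev)
        score = new_score
        back.append(ptr)

    # Trace back from the best final cluster.
    state = max(range(k), key=lambda c: score[c])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Toy data: two tracks, six bins; the first three bins are high on track 0,
# the last three are high on track 1.
bins = [[5.0, 0.1], [4.8, 0.2], [5.1, 0.0], [0.2, 3.9], [0.1, 4.2], [0.0, 4.0]]
means = [[5.0, 0.0], [0.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
print(viterbi_layout(bins, means, variances))  # [0, 0, 0, 1, 1, 1]
```

The transition term is what makes the clustering spatial in this sketch: a bin is assigned to a different cluster than its neighbor only when its track values outweigh the switching penalty, so isolated noisy bins tend to be absorbed into the surrounding cluster.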