Aggregation Tool for Genomic Concepts (ATGC): A deep learning framework for sparse genomic measures

2021 
Deep learning can extract meaningful features from data given enough training examples. Large-scale genomic data are well suited to this class of machine learning algorithms; however, for many of these data, the labels exist at the level of the sample rather than at the level of the individual genomic measures and features. To leverage the power of deep learning for these types of data, we turn to a multiple instance learning framework and present an easily extensible tool built with TensorFlow and Keras. We show how this tool can be applied to somatic variants (featurizing genomic position, sequence context, and read counts) on a range of artificial tasks (classification, regression, Cox regression). In addition, we confirm that the model achieves high performance on real-world problems, accurately classifying samples according to whether they contain a specific variant (hotspot or tumor suppressor), groups of variants (tumor clonality), or a type of variant (microsatellite instability). Our results suggest that this framework will lead to improvements on sample-level tasks that require aggregation of a set of genomic measures.
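To make the multiple instance learning idea concrete, the sketch below shows one common way to aggregate a bag of instance-level feature vectors (for example, one vector per variant) into a single sample-level vector using attention-weighted pooling. This is an illustration only, written in plain Python rather than TensorFlow: the abstract does not state which aggregation the tool uses, and the names `attention_pool` and `attention_weights`, as well as the toy feature values, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(instances, attention_weights):
    """Aggregate a bag of instance vectors into one sample-level vector.

    instances: list of equal-length feature vectors, one per instance
        (here imagined as one per variant; an assumption for illustration).
    attention_weights: hypothetical vector that scores each instance;
        in a trained model this would be learned, here it is fixed.
    """
    # Score each instance with a dot product against the weight vector.
    scores = [
        sum(w * x for w, x in zip(attention_weights, inst))
        for inst in instances
    ]
    # Normalize the scores so they sum to 1 across the bag.
    alphas = softmax(scores)
    # The sample representation is the attention-weighted sum of instances.
    dim = len(instances[0])
    pooled = [
        sum(a * inst[d] for a, inst in zip(alphas, instances))
        for d in range(dim)
    ]
    return pooled, alphas


# Toy bag: three instances with two made-up features each.
bag = [[0.1, 0.0], [0.2, 0.1], [3.0, 2.0]]
pooled, alphas = attention_pool(bag, attention_weights=[1.0, 1.0])
```

Because only the pooled vector is compared against the sample label, a model of this shape can be trained without instance-level labels, which is the setting the abstract describes.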