GENVISAGE: Rapid Identification of Discriminative and Explainable Feature Pairs for Genomic Analysis

2020 
Motivation. A common but critical task in genomic data analysis is finding features that separate and thereby help explain differences between two classes of biological objects, e.g., genes that explain the differences between healthy and diseased patients. As lower-cost, high-throughput experimental methods greatly increase the number of samples that are assayed as objects for analysis, computational methods are needed to quickly provide insights into high-dimensional datasets with tens of thousands of objects and features.

Results. We develop an interactive exploration tool called GENVISAGE that rapidly discovers the most discriminative feature pairs that best separate two classes in a dataset, and displays the corresponding visualizations. Since quickly finding top feature pairs is computationally challenging, especially when the numbers of objects and features are large, we propose a suite of optimizations to make GENVISAGE more responsive and demonstrate that our optimizations lead to a 400X speedup over competitive baselines for multiple biological data sets. With this speedup, GENVISAGE enables the exploration of more large-scale datasets and alternate hypotheses in an interactive and interpretable fashion. We apply GENVISAGE to uncover pairs of genes whose transcriptomic responses significantly discriminate treatments of several chemotherapy drugs.

Availability. Free webserver at http://genvisage.knoweng.org:443/ with source code at https://github.com/KnowEnG/Genvisage
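
To make the task concrete, the following is a minimal, unoptimized sketch of exhaustive feature-pair search. It is not the GENVISAGE implementation: the abstract does not state the separability measure or the optimizations, so the scoring rule here (accuracy of a line placed midway between the two class centroids in the plane of the feature pair) is an assumption, and the names `pair_score`, `top_feature_pairs`, `rows`, and `labels` are hypothetical. The cost grows with the square of the number of features times the number of objects, which illustrates why such a search becomes expensive on large datasets.

```python
from itertools import combinations


def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def pair_score(rows, labels, i, j):
    """Assumed separability measure (not taken from the paper): fraction of
    objects on the correct side of the perpendicular bisector between the
    two class centroids, using only features i and j."""
    pos = [(r[i], r[j]) for r, y in zip(rows, labels) if y == 1]
    neg = [(r[i], r[j]) for r, y in zip(rows, labels) if y == 0]
    cp, cn = centroid(pos), centroid(neg)
    w = (cp[0] - cn[0], cp[1] - cn[1])              # direction from negative to positive centroid
    m = ((cp[0] + cn[0]) / 2, (cp[1] + cn[1]) / 2)  # midpoint between the centroids
    correct = 0
    for r, y in zip(rows, labels):
        side = w[0] * (r[i] - m[0]) + w[1] * (r[j] - m[1])
        correct += (side > 0) == (y == 1)
    return correct / len(rows)


def top_feature_pairs(rows, labels, k=1):
    """Score every feature pair by brute force and return the k best
    as (score, (i, j)) tuples, highest score first."""
    n_features = len(rows[0])
    scored = [(pair_score(rows, labels, i, j), (i, j))
              for i, j in combinations(range(n_features), 2)]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:k]


# Toy data (made up for illustration): features 0 and 1 separate the classes
# only when used together; feature 2 carries no class signal.
rows = [
    [0, 3, 1], [1, 4, 0], [2, 5, 1], [3, 6, 0],  # class 1
    [3, 0, 0], [4, 1, 1], [5, 2, 0], [6, 3, 1],  # class 0
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(top_feature_pairs(rows, labels, k=1))  # [(1.0, (0, 1))]
```

In the toy data, neither feature 0 nor feature 1 separates the classes perfectly when paired with the uninformative feature 2, but together they do, which is the kind of pairwise effect a single-feature ranking would miss.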