Sparse Epistatic Regularization of Deep Neural Networks for Inferring Fitness Functions

2020 
Despite recent advances in high-throughput combinatorial mutagenesis assays, the number of labeled sequences available for predicting molecular functions has remained small relative to the vastness of the sequence space and the ruggedness of many fitness functions. Expressive models in machine learning (ML), such as deep neural networks (DNNs), can model the nonlinearities in rugged fitness functions, which manifest as high-order epistatic interactions among the mutational sites. However, in the absence of an inductive bias, DNNs overfit to the small number of labeled sequences available for training. Herein, we exploit recent biological evidence that epistatic interactions in many fitness functions are sparse; this knowledge can be used as an effective inductive bias to regularize DNNs. We have developed a method for sparse epistatic regularization of DNNs, called the epistatic net (EN), which constrains the number of non-zero coefficients in the spectral representation of DNNs. For larger sequences, where computing the spectral transform becomes computationally intractable, we have developed a scalable extension of EN, which uniformly subsamples the combinatorial sequence space to induce a sparse-graph-code structure and regularizes DNNs using the resulting novel greedy optimization method. Results on several biological landscapes, from bacterial to protein fitness functions, show that EN consistently improves the prediction accuracy of DNNs and enables them to outperform baseline supervised ML models that assume other forms of inductive bias. EN estimates all the higher-order epistatic interactions of DNNs trained on massive combinatorial sequence spaces, a computational problem that takes years to solve without leveraging the epistatic sparsity structure in the fitness functions.
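To make the core idea concrete, the following is a minimal sketch (not the authors' exact EN or EN-S algorithm) of sparse epistatic regularization on a small binary sequence space: the DNN's predictions over all sequences are mapped to epistatic coefficients via the Walsh-Hadamard transform, and a sparsity penalty on that spectrum is added to the supervised loss. All names, hyperparameters, and the use of an L1 surrogate for the L0 constraint are illustrative assumptions.

```python
# Sketch: spectral (epistatic) sparsity regularization of a DNN on a small binary
# sequence space. This uses an L1 relaxation of the L0 constraint for illustration;
# the published EN method uses a different optimization scheme.
import itertools
import torch
import torch.nn as nn
from scipy.linalg import hadamard

n_sites = 8                                   # number of binary mutational sites (assumed)
X_all = torch.tensor(list(itertools.product([0., 1.], repeat=n_sites)))  # all 2^n sequences
H = torch.tensor(hadamard(2 ** n_sites), dtype=torch.float32)            # Walsh-Hadamard matrix

model = nn.Sequential(nn.Linear(n_sites, 64), nn.ReLU(), nn.Linear(64, 1))

def en_style_loss(x_train, y_train, lam=0.1):
    """MSE on labeled sequences plus an L1 penalty on the spectral
    (epistatic) coefficients of the DNN over the full sequence space."""
    mse = nn.functional.mse_loss(model(x_train).squeeze(-1), y_train)
    y_all = model(X_all).squeeze(-1)          # DNN evaluated on every sequence
    spectrum = H @ y_all / (2 ** n_sites)     # epistatic coefficients via the WHT
    return mse + lam * spectrum.abs().sum()   # sparse epistatic regularization (L1 surrogate)

# Toy usage with a handful of (synthetic) labeled sequences.
x_train, y_train = X_all[:32], torch.randn(32)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    en_style_loss(x_train, y_train).backward()
    opt.step()
```

The explicit Hadamard matrix above only scales to short sequences; for larger sequence spaces the paper's scalable extension avoids enumerating all sequences by subsampling and exploiting sparse-graph codes.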