A data-driven method to learn a jump diffusion process from aggregate biological gene expression data

2021 
Dynamic models of gene expression are urgently required. Unlike trajectory inference and RNA velocity, our method reveals gene dynamics by directly learning a jump diffusion process that models the biological process. The algorithm takes aggregate gene expression data as input and outputs the parameters of the jump diffusion process. The learned process can predict population distributions of gene expression at any developmental stage, generate long-time trajectories for individual cells, and offer a novel approach to computing RNA velocity. Moreover, it allows biological systems to be studied from a stochastic dynamics perspective. Gene expression data at a time point, which is a snapshot of a cellular process, is treated as an empirical marginal distribution of a stochastic process. To learn the dynamics, we minimize the Wasserstein distance between the empirical distribution and the distribution predicted by the jump diffusion process. The trajectories of the learned jump diffusion equation correspond to the development process of cells, and its stochasticity determines the heterogeneity of cells. Its instantaneous rate of state change can be taken as "RNA velocity", and changes in the scales and orientations of clusters can also be observed. We demonstrate that our method recovers the underlying nonlinear dynamics better than parametric models and diffusion processes driven by Brownian motion, on both synthetic and real-world datasets. Our method is also robust to perturbations of the data because it only involves population expectations.
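To make the fitting idea concrete, the following is a minimal one-dimensional sketch in Python, not the authors' implementation. It treats a snapshot as an empirical marginal distribution, simulates candidate jump diffusions, and keeps the candidate with the smallest Wasserstein distance to the snapshot. Everything specific here is an assumption made for illustration: the linear drift -theta * x, the compound Poisson jumps with Gaussian sizes, the Euler-Maruyama discretization, the grid search over a single parameter, and all names (`simulate_jump_diffusion`, `wasserstein_1d`, `fit_theta`).

```python
import numpy as np


def simulate_jump_diffusion(x0, theta, sigma, jump_rate, jump_scale,
                            t_end, n_steps, rng):
    """Euler-Maruyama simulation of dX = -theta*X dt + sigma dW + dJ.

    J is assumed to be a compound Poisson process with Gaussian jump
    sizes (an illustrative choice, not taken from the paper).
    """
    dt = t_end / n_steps
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)
        n_jumps = rng.poisson(jump_rate * dt, size=x.shape)
        # The sum of n Gaussian jumps is Gaussian with std jump_scale*sqrt(n).
        jumps = jump_scale * np.sqrt(n_jumps) * rng.normal(size=x.shape)
        x = x - theta * x * dt + sigma * dw + jumps
    return x


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1D samples."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))


def fit_theta(observed, x0, candidates, seed=1):
    """Grid search: return the theta whose simulated marginal is closest."""
    losses = {}
    for theta in candidates:
        rng = np.random.default_rng(seed)  # same noise for every candidate
        predicted = simulate_jump_diffusion(
            x0, theta, sigma=0.3, jump_rate=1.0, jump_scale=0.5,
            t_end=1.0, n_steps=100, rng=rng)
        losses[theta] = wasserstein_1d(observed, predicted)
    best = min(losses, key=losses.get)
    return best, losses


# Synthetic "snapshots": cells at t=0 and the same population at t=1.
data_rng = np.random.default_rng(0)
x0 = data_rng.normal(2.0, 0.5, size=2000)
observed = simulate_jump_diffusion(
    x0, theta=1.0, sigma=0.3, jump_rate=1.0, jump_scale=0.5,
    t_end=1.0, n_steps=100, rng=data_rng)

best_theta, losses = fit_theta(observed, x0, [0.25, 0.5, 1.0, 2.0, 3.0])
print("best theta:", best_theta)
```

The loss only compares distributions, so the simulated cells never need to be matched one-to-one with observed cells. A realistic version would replace the grid search with gradient-based optimization over a richer drift and noise model and use a multivariate Wasserstein distance.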