Predicting the Emergence of SARS-CoV-2 Clades

2020 
Evolution is a process of change in which mutations in the viral RNA are selected based on their fitness for replication and survival. Given that current phylogenetic analysis of SARS-CoV-2 identifies new viral clades after they exhibit evolutionary selection, we ask whether viral selection can be identified and the emergence of new viral clades predicted. Inspired by the concept of Kolmogorov complexity, we propose a generative complexity (algorithmic) framework capable of analyzing viral RNA sequences by mapping the multiscale nucleotide dependencies onto a state machine, where states represent subsequences of nucleotides and state-transition probabilities encode the higher-order interactions between these states. We apply computational learning and classification techniques to identify the active state transitions and use them as features in clade classifiers to distinguish transient mutations (still evolving within a clade) from stable mutations (typical of a clade). In contrast to current analysis tools, which rely on the edit distance between sequences and require sequence alignment, our method is computationally local, does not require sequence alignment, and is robust to random errors (substitutions, insertions, and deletions). Using the GISAID viral sequence database, we demonstrate that our method can predict clade emergence, potentially aiding the design of medications and vaccines.
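The state-machine idea in the abstract can be illustrated with a much simpler stand-in: a fixed-order Markov chain over k-mers, where each state is a length-k subsequence and each transition probability is the estimated chance of the next nucleotide given that state. This is only a minimal sketch under stated assumptions, not the multiscale framework the abstract describes; the function names, the choice of k, and the toy sequence are all hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequence, k=3):
    """Estimate P(next nucleotide | preceding k-mer) from a single sequence.

    Assumption: a fixed-order k-mer chain, used here as a simplified
    stand-in for the multiscale state machine described in the abstract.
    """
    counts = defaultdict(Counter)
    # Slide a window of length k and count which nucleotide follows each k-mer.
    for i in range(len(sequence) - k):
        state = sequence[i:i + k]
        next_nucleotide = sequence[i + k]
        counts[state][next_nucleotide] += 1

    # Normalize the counts of each state into probabilities.
    probs = {}
    for state, next_counts in counts.items():
        total = sum(next_counts.values())
        probs[state] = {n: c / total for n, c in next_counts.items()}
    return probs


def feature_vector(probs, transitions):
    """Look up selected (state, next nucleotide) transitions as features.

    Transitions never observed in the sequence get probability 0.0.
    """
    return [probs.get(state, {}).get(nxt, 0.0) for state, nxt in transitions]


# Toy sequence (hypothetical, not real viral data).
toy = "ACGACGACGT"
probs = transition_probabilities(toy, k=2)
features = feature_vector(probs, [("AC", "G"), ("CG", "T"), ("TT", "A")])
```

Because the counts are gathered from local windows, no sequence alignment is needed. Vectors such as `features`, computed per genome, are the kind of input a clade classifier could be trained on.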