scSNVIndel: accurate and efficient calling of SNVs and indels from single cell sequencing using integrated Bi-LSTM

2020 
Single cell sequencing (SCS) data are sparse and show fluctuating coverage, which makes calling single nucleotide variants (SNVs) and indels more difficult than with data obtained from next-generation sequencing (NGS). Furthermore, most existing methods are unable to effectively call whole-genome SNVs and indels from SCS data. In this study, we propose scSNVIndel, a new method for the efficient identification of SNVs and indels from SCS data. scSNVIndel is built on bidirectional long short-term memory (Bi-LSTM) and integrates new natural language processing (NLP) technology. It extracts features automatically and calls SNVs and indels accurately from SCS data, which are characterized by uneven and discontinuous coverage. Moreover, scSNVIndel calls variants directly from the sequence and, unlike DeepVariant, does not convert the sequence into an image, so it retains valuable information from the SCS data. The results show that scSNVIndel achieves higher accuracy and recall in variant calling than other existing methods. scSNVIndel is open source and available at https://github.com/CSuperlei/scSNVIndel; usage instructions are published at https://www.aiguqu.com/2020/06/18/scSNVIndel/.
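To illustrate what working "from the sequence directly" can look like, the following is a minimal sketch, not the scSNVIndel implementation. It turns a window of bases around a candidate site into integer token IDs, the kind of input an NLP-style sequence model such as a Bi-LSTM typically consumes. The vocabulary, the function name `encode_window`, and the `flank` parameter are all assumptions made for illustration; the abstract does not describe the actual encoding.

```python
# Hypothetical token vocabulary; the encoding used by scSNVIndel is not
# specified in the abstract, so these IDs are illustrative only.
VOCAB = {"<pad>": 0, "A": 1, "C": 2, "G": 3, "T": 4, "N": 5, "-": 6}


def encode_window(seq, center, flank=3):
    """Encode the bases in seq[center - flank : center + flank + 1] as token IDs.

    Positions that fall outside the sequence are padded, and any symbol
    not in VOCAB is mapped to the ID for "N".
    """
    tokens = []
    for i in range(center - flank, center + flank + 1):
        if 0 <= i < len(seq):
            tokens.append(VOCAB.get(seq[i].upper(), VOCAB["N"]))
        else:
            tokens.append(VOCAB["<pad>"])
    return tokens


if __name__ == "__main__":
    # Window of 7 tokens centered on position 1 of a toy sequence.
    print(encode_window("ACGTNAC", center=1, flank=3))
```

A bidirectional model would read such a token window both left to right and right to left, so the prediction at the candidate site can draw on context from both flanks.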