Beyond sequencing: machine learning algorithms extract biology hidden in Nanopore signal data.

2021 
Nanopore sequencing provides signal data corresponding to the nucleotide motifs sequenced. Through machine learning-based methods, these signals are translated into long-read sequences that overcome the read-length limit of short-read sequencing. However, analyzing the raw nanopore signal data provides many more opportunities beyond just sequencing genomes and transcriptomes: algorithms that use machine learning approaches to extract biological information from these signals allow the detection of DNA and RNA modifications, the estimation of poly(A) tail length, and the prediction of RNA secondary structures. In this review, we discuss how developments in machine learning methodologies have contributed to more accurate basecalling and lower error rates, and how these methods enable new biological discoveries. We argue that direct nanopore sequencing of DNA and RNA adds a new dimension to genomics experiments, and we highlight challenges and future directions for computational approaches to extract the additional information provided by nanopore signal data.