LRScaf: Improving Draft Genomes Using Long Noisy Reads
2018
Background: The advent of third-generation sequencing (TGS) technologies opens the door to improved genome assembly. Long reads promise to enhance the quality of fragmented draft assemblies constructed from next-generation sequencing (NGS) data. To date, several tools capable of improving draft assemblies have been released, including SSPACE-LongRead, OPERA-LG, SMIS, npScarf, DBG2OLC, Unicycler, and LINKS. However, hybrid assembly of large genomes remains challenging. Results: We developed a scalable and computationally efficient scaffolder, Long Reads Scaffolder (LRScaf), that substantially boosts assembly contiguity using long reads. In our experiments, LRScaf significantly improved the contiguity of human draft assemblies, increasing the NG50 of CHM1 from 127.5 Kb to 10.4 Mb using a 20-fold coverage PacBio dataset and the NG50 of NA12878 from 115.7 Kb to 17.4 Mb using a 35-fold coverage Nanopore dataset. LRScaf had the shortest scaffolding run time in every case we tested: compared with SSPACE-LongRead, it was 300 times faster for S. cerevisiae and 2,300 times faster for D. melanogaster. LRScaf also used far less peak RAM than LINKS in our tests. For rice, the peak RAM of LINKS (877.72 Gb) was about 196 times that of LRScaf. For the human assemblies, the peak RAM of LINKS exceeded the system memory (1 Tb), whereas LRScaf used 20.28 Gb and 41.20 Gb on the CHM1 and NA12878 datasets, respectively. Conclusions: LRScaf yields the best, or at least moderate, scaffold contiguity and accuracy in the shortest run time among the state-of-the-art methods compared. It also offers a new opportunity for the hybrid assembly of large genomes.
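The abstract reports contiguity as NG50 without defining it. As a minimal sketch, assuming the standard definition (the length L such that sequences of length L or longer together cover at least half of the estimated genome size), the metric can be computed as below. The function name `ng50` and the toy lengths are illustrative assumptions and are not taken from LRScaf or its evaluation.

```python
from typing import Iterable, Optional


def ng50(lengths: Iterable[int], genome_size: int) -> Optional[int]:
    """Return the NG50 of a set of contig or scaffold lengths.

    NG50 is the length L such that sequences of length >= L together
    cover at least half of the estimated genome size. Returns None when
    the assembly covers less than half of the genome.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")

    target = genome_size / 2
    covered = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None


# Toy lengths (hypothetical, in arbitrary units); genome size assumed to be 300.
toy_lengths = [80, 70, 50, 40, 30, 20]
print(ng50(toy_lengths, 300))
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses the estimated genome size, so assemblies of the same genome can be compared even when their total lengths differ.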