High-Quality Nuclear Genome and Mitogenome of Bipolaris sorokiniana LK93, a Devastating Pathogen Causing Wheat Root Rot
Wanying Zhang, Qun Yang, Lei Yang, Haiyang Li, Wenqing Zhou, Jiaxing Meng, Yanfeng Hu, Limin Wang, Ruijiao Kang, Honglian Li, Shengli Ding, Guotian Li
Abstract:
Bipolaris sorokiniana, one of the most devastating hemibiotrophic fungal pathogens, causes root rot, crown rot, leaf blotching, and black embryos of gramineous crops worldwide, posing a serious threat to global food security. However, the host-pathogen interaction mechanism between B. sorokiniana and wheat remains poorly understood. To facilitate related studies, we sequenced and assembled the genome of B. sorokiniana LK93. Nanopore long reads and next-generation sequencing short reads were used in the genome assembly, and the final 36.4-Mb genome assembly contains 16 contigs with a contig N50 of 2.3 Mb. Subsequently, we annotated 11,811 protein-coding genes. Of these, 10,620 were functional genes, 258 of which were identified as secretory proteins, including 211 predicted effectors. Additionally, the 111,581-bp mitogenome of LK93 was assembled and annotated. The LK93 genomes presented in this study will facilitate research in the B. sorokiniana-wheat pathosystem for better control of crop diseases.
Copyright © 2023 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.
Keywords:
Bipolaris
Pathosystem
Sequence assembly
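The abstract above reports contiguity as a contig N50. As a minimal sketch of how that statistic is usually computed, the function below returns the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the assembly size. The function name `n50` and the example lengths are illustrative assumptions, not the LK93 contigs.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, for illustration only.
example_lengths = [50, 40, 30, 20, 10]
print(n50(example_lengths))  # prints 40
```

Here the total is 150, and the two longest contigs (50 + 40 = 90) are the first to cover at least half of it, so the N50 is 40.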
Background: Long DNA reads produced by single-molecule and pore-based sequencers are more suitable for assembly and structural variation discovery than short-read DNA fragments. For de novo assembly, PacBio and Oxford Nanopore Technologies (ONT) are the favored options. However, PacBio’s SMRT sequencing is expensive for a full human genome assembly and costs over 40,000 USD for 30x coverage as of 2019. ONT PromethION sequencing, on the other hand, is one-twelfth the price of PacBio for the same coverage. This study aimed to compare the cost-effectiveness of ONT PromethION and PacBio’s SMRT sequencing in relation to assembly quality.
Findings: We performed whole-genome de novo assemblies and comparisons to construct an improved version of KOREF, the Korean reference genome, using sequencing data produced by PromethION and PacBio. With PromethION, an assembly using sequenced reads with 64x coverage (193 Gb, 3 flowcell sequencing) resulted in 3,725 contigs with an N50 of 16.7 Mbp and a total genome length of 2.8 Gbp. It was comparable to a KOREF assembly constructed using PacBio at 62x coverage (188 Gbp, 2,695 contigs and an N50 of 17.9 Mbp). When we applied Hi-C-derived long-range mapping data, an even higher-quality assembly for the 64x coverage was achieved, resulting in 3,179 scaffolds with an N50 of 56.4 Mbp.
Conclusion: The pore-based PromethION approach provides a good-quality chromosome-scale human genome assembly at a low cost with long maximum contig and scaffold lengths and is more cost-effective than PacBio at comparable quality measurements.
Sequence assembly
Hybrid genome assembly
Minion
Citations (3)
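The coverage figures quoted in the abstract above relate sequenced bases to genome size by simple division. The sketch below shows that arithmetic, assuming "193 Gb" and "188 Gbp" both mean gigabases of sequence; the function names are illustrative, and the genome size it prints is only what the quoted numbers imply, not a value stated by the authors.

```python
def coverage(total_bases, genome_size):
    """Average sequencing depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


def implied_genome_size(total_bases, depth):
    """Genome size implied by a reported yield and a reported depth."""
    return total_bases / depth


# Figures quoted in the abstract: 193 Gb at 64x (PromethION), 188 Gbp at 62x (PacBio).
promethion = implied_genome_size(193e9, 64)
pacbio = implied_genome_size(188e9, 62)
print(round(promethion / 1e9, 2))  # prints 3.02
print(round(pacbio / 1e9, 2))      # prints 3.03
```

Both yields imply a genome size of roughly 3 Gb, so the two coverage figures appear to be computed against a similar reference size.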
De novo DNA sequence assembly is very important in genome sequence analysis. In this paper, we investigated two of the major approaches for de novo DNA sequence assembly of very short reads: overlap-layout-consensus (OLC) and Eulerian path. From that investigation, we developed a new assembly technique by combining the OLC and the Eulerian path methods in a hierarchical process. The contigs yielded by these two approaches were treated as reads and were assembled again to yield longer contigs. We tested our approach using three real very-short-read datasets generated by an Illumina Genome Analyzer and four simulated very-short-read datasets that contained sequencing errors. The sequencing errors were modeled based on Illumina's sequencing technology. As a result, our combined approach yielded longer contigs than those of Edena (OLC) and Velvet (Eulerian path) at various coverage depths and was comparable to SOAPdenovo in terms of N50 size and maximum contig lengths. The assembly results were also validated by comparing contigs that were produced by assemblers with their reference sequence from an NCBI database. The results show that our approach produces more accurate results than Velvet, Edena, and SOAPdenovo alone. This comparison indicates that our approach is a viable way to assemble very short reads from next-generation sequencers.
Sequence assembly
Hybrid genome assembly
Sequence (biology)
Velvet
Path length
k-mer
Citations (4)
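The Eulerian path approach named in the abstract above is commonly implemented on a de Bruijn graph, in which each k-mer from the reads becomes an edge between its (k-1)-base prefix and suffix. The sketch below covers only that graph-construction step, under that assumption; it is not the authors' combined OLC and Eulerian method, and the function name `de_bruijn_edges` is hypothetical.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes it leads to."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of width k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy read, for illustration only.
print(de_bruijn_edges(["ACGT"], 3))  # prints {'AC': ['CG'], 'CG': ['GT']}
```

An assembler would then look for a path that uses every edge once; error correction, reverse complements, and repeat resolution are all omitted here.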
Advances in DNA sequencing have made it easier to sequence and assemble plant genomes. Here, we extend an earlier study, and compare recent methods for long read sequencing and assembly. Updated Oxford Nanopore Technology software improved assemblies. Using more accurate sequences produced by repeated sequencing of the same molecule (Pacific Biosciences HiFi) resulted in less fragmented assembly of sequencing reads. Using data for increased genome coverage resulted in longer contigs, but reduced total assembly length and improved genome completeness. The original model species, Macadamia jansenii, was also compared with three other Macadamia species, as well as avocado (Persea americana) and jojoba (Simmondsia chinensis). In these angiosperms, increasing sequence data volumes caused a linear increase in contig size, decreased assembly length and further improved already high completeness. Differences in genome size and sequence complexity influenced the success of assembly. Advances in long read sequencing technology continue to improve plant genome sequencing and assembly. However, results were improved by greater genome coverage, with the amount needed to achieve a particular level of assembly being species dependent.
Sequence assembly
Hybrid genome assembly
Genome size
Citations (26)
For a long time, the construction of complete reference genomes for complex eukaryotic genomes has been hindered by the limitations of sequencing technologies. Recently, the Pacific Biosciences (PacBio) HiFi data and Oxford Nanopore Technologies (ONT) Ultra-Long data, leveraging their respective advantages in accuracy and length, have provided an opportunity for generating complete chromosome sequences. Nevertheless, for the majority of genomes, the chromosome-level assemblies generated using existing methods still miss a high proportion of sequences due to losing small contigs in the step of assembly and scaffolding. To address this shortcoming, in this paper, we propose a novel method that is able to identify and fill the gaps in the chromosome-level assembly by recalling the sequences in the lost small contigs. Experimental results on both real and simulated datasets demonstrate that this method is able to improve the completeness of the chromosome-level assembly.
Sequence assembly
Completeness (order theory)
Citations (1)
Updates in nanopore technology have made it possible to obtain gigabases of sequence data. Prior to this, nanopore sequencing technology was mainly used to analyze microbial samples. Here, we describe the generation of a comprehensive nanopore sequencing data set with a median read length of 11,979 bp for a self-compatible accession of the wild tomato species Solanum pennellii. We describe the assembly of its genome to a contig N50 of 2.5 Mb. The assembly pipeline comprised initial read correction with Canu and assembly with SMARTdenovo. The resulting raw nanopore-based de novo genome is structurally highly similar to that of the reference S. pennellii LA716 accession but has a high error rate and is rich in homopolymer deletions. After polishing the assembly with Illumina reads, we obtained an error rate of <0.02% when assessed versus the same Illumina data. We obtained a gene completeness of 96.53%, slightly surpassing that of the reference S. pennellii. Taken together, our data indicate that such long read sequencing data can be used to affordably sequence and assemble gigabase-sized plant genomes.
Sequence assembly
Nanopore
Hybrid genome assembly
Minion
Illumina dye sequencing
Citations (206)
Nanopore
Sequence assembly
Bacterial genome size
Citations (1,273)
The availability of reference genomes has revolutionized the study of biology. Multiple competing technologies have been developed to improve the quality and robustness of genome assemblies during the last decade. The two widely used long-read sequencing providers, PacBio (PB) and Oxford Nanopore Technologies (ONT), have recently updated their platforms: PB enables high-throughput HiFi reads with base-level resolution at >99% accuracy, and ONT generates reads as long as 2 Mb. We applied the two up-to-date platforms to one single rice individual, and then compared the two assemblies to investigate the advantages and limitations of each. The results showed that ONT ultralong reads delivered higher contiguity, producing a total of 18 contigs, of which 10 were each assembled as a single chromosome, compared with 394 contigs and three chromosome-level contigs for the PB assembly. The ONT ultralong reads also prevented assembly errors caused by long repetitive regions, for which we observed a total of 44 genes with false redundancies and 10 genes with false losses in the PB assembly, leading to over- or underestimation of the gene families in those long repetitive regions. We also noted that the PB HiFi reads generated assemblies with considerably fewer errors at the level of single nucleotides and small InDels than the ONT assembly, which had an average of 1.06 errors per kb and ultimately produced 1,475 incorrect gene annotations through altered or truncated protein predictions.
Sequence assembly
Hybrid genome assembly
Indel
Citations (13)
Background: Advances in DNA sequencing have reduced the difficulty of sequencing and assembling plant genomes. A range of methods for long read sequencing and assembly have been recently compared, and we now extend the earlier study and report a comparison with more recent methods.
Results: Updated Oxford Nanopore Technology software supported improved assemblies. The use of more accurate sequences produced by repeated sequencing of the same molecule (PacBio HiFi) resulted in much less fragmented assembly of sequencing reads. The use of more data to give increased genome coverage resulted in longer contigs (higher N50) but reduced the total length of the assemblies and improved genome completeness (BUSCO). The original model species, Macadamia jansenii, a basal eudicot, was also compared with the 3 other Macadamia species and with avocado (Persea americana), a magnoliid, and jojoba (Simmondsia chinensis), a core eudicot. In these phylogenetically diverse angiosperms, increasing sequence data volumes also caused a highly linear increase in contig size, decreased assembly length and further improved already high completeness. Differences in genome size and sequence complexity apparently influenced the success of assembly from these different species.
Conclusions: Advances in long read sequencing technology have continued to significantly improve the results of sequencing and assembly of plant genomes. However, results were consistently improved by greater genome coverage (using an increased number of reads), with the amount needed to achieve a particular level of assembly being species dependent.
Sequence assembly
Hybrid genome assembly
Genome size
Citations (8)
We report the complete genome and the plasmid (F′ episome) sequences of Escherichia coli JM101 assembled with a combination of Nanopore and Illumina data. The resulting genome is a single contig of 4,524,963 bp, and the plasmid consists of a single contig of 197,186 bp.
Illumina dye sequencing
Nanopore
Sequence assembly
Citations (1)
A method for de novo assembly of data from the Oxford Nanopore MinION instrument is presented which is able to reconstruct the sequence of an entire bacterial chromosome in a single contig. Initially, overlaps between nanopore reads are detected. Reads are then subjected to one or more rounds of error correction by a multiple alignment process employing partial order graphs. After correction, reads are assembled using the Celera assembler. Finally, the assembly is polished using signal-level data from the nanopore employing a novel hidden Markov model. We show that this method is able to assemble nanopore reads from Escherichia coli K-12 MG1655 into a single contig of length 4.6 Mb, permitting a full reconstruction of gene order. The resulting draft assembly has 98.4% nucleotide identity compared to the finished reference genome. After polishing the assembly with our signal-level HMM, the nucleotide identity is improved to 99.4%. We show that MinION sequencing data can be used to reconstruct genomes without the need for a reference sequence or data from other sequencing platforms.
Minion
Sequence assembly
Nanopore
Hybrid genome assembly
Bacterial genome size
Sequence (biology)
Citations (27)