Abstract
Background: Variations in the human genome have been studied extensively. However, little is known about the role of micro-inversions (MIs), generally defined as small (< 100 bp) inversions, in human evolution, diversity, and health. Characterizing the pattern of MIs among diverse populations is critical for interpreting human evolutionary history and gaining insight into genetic diseases.
Results: In this paper, we explored the distribution of MIs in genomes from 26 human populations and 7 nonhuman primate genomes, and analyzed the phylogenetic structure of the 26 human populations based on the MIs. We further investigated the functions of the MIs located within genes associated with human health. With hg19 as the reference genome, we detected 6968 MIs among the 1937 human samples and 24,476 MIs among the 7 nonhuman primate genomes. The analyses of MIs in human genomes showed that MIs were rarely located in exonic regions. Nonhuman primates and human populations shared only 82 inverted alleles, and Africans had the most inverted alleles in common with nonhuman primates, which is consistent with the "Out of Africa" hypothesis. The clustering of MIs among the human populations also coincided with human migration history and ancestral lineages.
Conclusions: We propose that MIs are potential evolutionary markers for investigating population dynamics. Our results reveal the diversity of MIs in human populations and show that they are essential for reconstructing relationships among human populations and may affect human health.
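To make the notion of a micro-inversion concrete, the following Python sketch treats an MI as a short window whose sequence in a sample equals the reverse complement of the same window in the reference. It is only an illustration and not the detection pipeline used in this study: the function names, the `min_len` parameter, and the toy sequences are hypothetical, and the sketch assumes two already-aligned sequences of equal length that differ by a single inversion and nothing else.

```python
# Toy illustration of a micro-inversion (MI); not the method used in the study.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_micro_inversion(ref: str, alt: str, min_len: int = 2, max_len: int = 100):
    """Return (start, end) of a single inverted window, or None.

    Assumptions (hypothetical, for illustration only):
    - ref and alt are aligned and have the same length;
    - the only difference between them is one inversion;
    - an MI is shorter than max_len bases (the "< 100 bp" definition).
    The reported window is bounded by the first and last mismatching
    positions, so matching bases at the inversion edges are not included.
    """
    if len(ref) != len(alt):
        return None
    diffs = [i for i, (a, b) in enumerate(zip(ref, alt)) if a != b]
    if not diffs:
        return None
    start, end = diffs[0], diffs[-1] + 1
    length = end - start
    if length < min_len or length >= max_len:
        return None
    if alt[start:end] == reverse_complement(ref[start:end]):
        return (start, end)
    return None


reference = "GGCTAAACTCCG"
sample = "GGCTGTTTTCCG"  # "AAAC" replaced by its reverse complement "GTTT"
window = find_micro_inversion(reference, sample)
print(window)
```

A real caller has to work from read alignments and tolerate sequencing errors and neighboring variants, which this sketch deliberately ignores.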
Nanopore sequencing is one of the most promising third-generation sequencing (TGS) technologies. Since 2014, Oxford Nanopore Technologies (ONT) has developed a series of nanopore-based devices that produce very long reads, which is expected to have an impact on genomics. However, nanopore sequencing reads suffer from a fairly high error rate owing to the difficulty of determining DNA bases from the complex electrical signals. Although a number of basecalling tools have been developed for nanopore sequencing in recent years, correcting the sequences after basecalling remains a challenge. In this study, we present NanoReviser, an open-source DNA base reviser built on a deep learning model that can correct the basecalling errors introduced by the various default basecallers. In our module, we re-segmented the raw electrical signals based on the basecalled sequences provided by the default basecallers, and this re-segmentation proved necessary for correcting the leak detection errors. By employing convolutional neural networks (CNN) and bidirectional long short-term memory (Bi-LSTM) networks, we took advantage of the information in both the raw electrical signals and the basecalled sequences. Our results show that NanoReviser, as a post-basecalling reviser, significantly improves basecalling quality. Trained and tested on standard ONT sequencing reads from public E. coli and human NA12878 datasets, NanoReviser reduces the sequencing error rate by over 5% on the E. coli dataset and by 7% on the human dataset. NanoReviser outperforms all current basecalling tools. Furthermore, we analyzed the modified bases of E. coli and added the methylation information to train our module. With the methylation annotation, NanoReviser reduces the error rate by 7% on the E. coli dataset and by over 10% in the methylated regions.
To the best of our knowledge, NanoReviser is the first post-basecalling tool to accurately correct nanopore sequences without the time-consuming procedure of building a consensus sequence. The NanoReviser package is available at https://github.com/pkubioinformatics/NanoReviser.
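The error-rate comparisons above can be illustrated with a minimal Python sketch. The abstract does not state how the error rate is computed, so the definition used here (edit distance between a read and its true sequence, divided by the length of the true sequence) is an assumption, and the function names and toy reads are hypothetical. The sketch does not reproduce NanoReviser itself; it only shows how a per-read error rate before and after a revision step might be compared.

```python
# Sketch of a per-read error rate; the definition is an assumption,
# not taken from the NanoReviser paper.


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca from a
                            curr[j - 1] + 1,      # insert cb into a
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def error_rate(read: str, truth: str) -> float:
    """Edit distance normalized by the length of the true sequence."""
    if not truth:
        raise ValueError("truth sequence must not be empty")
    return edit_distance(read, truth) / len(truth)


truth = "ACGTACGTAC"
basecalled = "ACGTCGTAC"   # hypothetical read with one deleted base
revised = "ACGTACGTAC"     # hypothetical output of a revision step

print(error_rate(basecalled, truth), error_rate(revised, truth))
```

In practice such a rate would be computed from alignments of full-length reads to a reference genome rather than from short exact strings.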
Ethylene has been regarded as a stress hormone that regulates myriad stress responses. Salinity stress is one of the most serious abiotic stresses limiting plant growth and development. However, how ethylene signaling is involved in the plant response to salt stress is poorly understood. Here we showed that Arabidopsis plants pretreated with ethylene exhibited enhanced tolerance to salt stress. Gain- and loss-of-function studies demonstrated that EIN3 (ETHYLENE INSENSITIVE 3) and EIL1 (EIN3-LIKE 1), two ethylene-activated transcription factors, are necessary and sufficient for the enhanced salt tolerance. High salinity induced the accumulation of EIN3/EIL1 proteins by promoting the proteasomal degradation of two EIN3/EIL1-targeting F-box proteins, EBF1 and EBF2, in an EIN2-independent manner. Whole-genome transcriptome analysis identified a set of SIED (Salt-Induced and EIN3/EIL1-Dependent) genes that participate in salt stress responses, including several genes encoding reactive oxygen species (ROS) scavengers. We performed a genetic screen for ein3 eil1-like salt-hypersensitive mutants and identified 5 EIN3 direct target genes, including a previously unknown gene, SIED1 (At5g22270), which encodes a 93-amino acid polypeptide involved in ROS dismissal. We also found that activation of EIN3 increased peroxidase (POD) activity through the direct transcriptional regulation of POD expression. Accordingly, ethylene pretreatment or EIN3 activation precluded excess ROS accumulation and increased tolerance to salt stress. Taken together, our study provides new insights into the molecular action of ethylene signaling in enhancing plant salt tolerance, and elucidates the transcriptional network of EIN3 in the salt stress response.