The wisent, or European bison, is the largest European herbivore and is completely cross-fertile with its American relative. However, the mtDNA genome of the wisent is similar to that of cattle, which suggests that the wisent emerged as a hybrid of bison and an extinct cattle-like species. Here, we analyzed nuclear whole-genome sequences of the bovine species and found only minor and recent gene flow between wisent and cattle. Furthermore, we identified an appreciable heterogeneity of the nuclear gene tree topologies of the bovine species. The relative frequencies of the various topologies, including the mtDNA topology, were consistent with the frequencies expected under incomplete lineage sorting (ILS) as estimated by tree coalescence analysis. This indicates that ILS has occurred and may well account for the anomalous wisent mtDNA phylogeny as the outcome of a rare event. We propose that ILS is a possible explanation of phylogenomic anomalies among closely related species.
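To make the link between ILS and topology frequencies concrete, the following minimal sketch evaluates the standard multispecies coalescent result for a rooted three-taxon species tree ((A,B),C). It is an illustration only and not the tree coalescence analysis used in the study; the function name and the taxon labels A, B, and C are assumptions introduced here.

```python
import math


def triplet_topology_probs(t):
    """Expected gene-tree topology probabilities for species tree ((A,B),C).

    t is the length of the internal branch in coalescent units
    (generations divided by 2*Ne for a diploid population).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # Probability that the A and B lineages fail to coalesce on the internal branch.
    p_no_coalescence = math.exp(-t)
    # If they fail, all three pairs are equally likely to coalesce first.
    discordant = p_no_coalescence / 3.0
    concordant = 1.0 - 2.0 * discordant
    return {
        "((A,B),C)": concordant,
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 2.0):
        print(t, triplet_topology_probs(t))
```

Under this model the two discordant topologies are expected at equal frequency, and both become rarer as the internal branch lengthens, which is why a rare topology at a single locus such as mtDNA is not by itself evidence of hybridization.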
Abstract Genome‐scale sequence data have become increasingly available in phylogenetic studies aimed at understanding the evolutionary histories of species. However, it is challenging to develop probabilistic models that account for the heterogeneity of phylogenomic data. The multispecies coalescent model describes gene trees as independent random variables generated from a coalescence process occurring along the lineages of the species tree. Since the multispecies coalescent model allows gene trees to vary across genes, coalescent‐based methods have been widely used to account for heterogeneous gene trees in phylogenomic data analysis. In this paper, we summarize and evaluate the performance of coalescent‐based methods for estimating species trees from genome‐scale sequence data. We investigate the effects of deep coalescence and mutation on the performance of species tree estimation methods. We find that the coalescent‐based methods perform well in estimating species trees for a large number of genes, regardless of the degree of deep coalescence and mutation. The performance of the coalescent methods is negatively correlated with the lengths of internal branches of the species tree.
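The idea that gene trees are independent draws from a coalescent process along the species tree can be sketched with a small simulation. The example below is a simplified illustration for a three-taxon species tree ((A,B),C) with one sampled lineage per species, not one of the methods evaluated in the paper; the function name, the `TOPOLOGIES` constant, and the parameters are assumptions made for this sketch.

```python
import random

TOPOLOGIES = ("((A,B),C)", "((A,C),B)", "((B,C),A)")


def simulate_triplet(t, n_loci, seed=0):
    """Simulate gene-tree topologies for n_loci independent loci.

    t is the internal branch length of ((A,B),C) in coalescent units.
    Returns the observed frequency of each rooted topology.
    """
    rng = random.Random(seed)
    counts = {topology: 0 for topology in TOPOLOGIES}
    for _ in range(n_loci):
        # The waiting time for two lineages to coalesce is Exponential(1)
        # when time is measured in coalescent units.
        if rng.expovariate(1.0) < t:
            counts["((A,B),C)"] += 1
        else:
            # Deep coalescence: all three lineages reach the root population,
            # where each pair is equally likely to coalesce first.
            counts[rng.choice(TOPOLOGIES)] += 1
    return {topology: c / n_loci for topology, c in counts.items()}


if __name__ == "__main__":
    for t in (0.1, 1.0, 2.0):
        print(t, simulate_triplet(t, n_loci=20000, seed=1))
```

Shortening the internal branch in this sketch raises the share of loci that experience deep coalescence, so fewer gene trees match the species tree.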
Abstract The genomic revolution offers renewed hope of resolving rapid radiations in the Tree of Life. The development of the multispecies coalescent model and improved gene tree estimation methods can better accommodate gene tree heterogeneity caused by incomplete lineage sorting (ILS) and gene tree estimation error stemming from short internal branches. However, the relative influence of these factors in species tree inference is not well understood. Using anchored hybrid enrichment, we generated a data set including 423 single-copy loci from 64 taxa representing 39 families to infer the species tree of the flowering plant order Malpighiales. This order includes 9 of the top 10 most unstable nodes in angiosperms, which have been hypothesized to arise from a rapid radiation during the Cretaceous. Here, we show that coalescent-based methods do not resolve the backbone of Malpighiales and concatenation methods yield inconsistent estimations, providing evidence that gene tree heterogeneity is high in this clade. Despite high levels of ILS and gene tree estimation error, our simulations demonstrate that these two factors alone are insufficient to explain the lack of resolution in this order. To explore this further, we examined triplet frequencies among empirical gene trees and discovered that some of them deviated significantly from those attributed to ILS and estimation error, suggesting gene flow as an additional and previously unappreciated phenomenon promoting gene tree variation in Malpighiales. Finally, we applied a novel method to quantify the relative contribution of these three primary sources of gene tree heterogeneity and demonstrated that ILS, gene tree estimation error, and gene flow accounted for 10.0%, 34.8%, and 21.4% of the variation, respectively.
Together, our results suggest that a perfect storm of factors likely influences this lack of resolution, and further indicate that recalcitrant phylogenetic relationships like the backbone of Malpighiales may be better represented as phylogenetic networks. Thus, reducing such groups solely to existing models that adhere strictly to bifurcating trees greatly oversimplifies reality and obscures our ability to discern the process of evolution more clearly. [Coalescent; concatenation; flanking region; hybrid enrichment; introgression; phylogenomics; rapid radiation; triplet frequency.]
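One common way to check triplet frequencies against an ILS-only expectation relies on the fact that, under the multispecies coalescent without gene flow, the two minor (discordant) topologies of a triplet are expected to be equally frequent. The sketch below applies an exact two-sided binomial test to the two minor counts. It is a hedged illustration of that general idea, not the procedure used in the study; the function name and the example counts are hypothetical.

```python
import math


def minor_topology_asymmetry_pvalue(n_minor1, n_minor2):
    """Exact two-sided binomial test that two minor topology counts are equal.

    Under ILS alone, each discordant gene tree is equally likely to show
    either minor topology, so the larger count is Binomial(n, 0.5).
    """
    if n_minor1 < 0 or n_minor2 < 0:
        raise ValueError("counts must be non-negative")
    n = n_minor1 + n_minor2
    if n == 0:
        return 1.0
    k = max(n_minor1, n_minor2)
    # Upper tail P(X >= k) for X ~ Binomial(n, 0.5).
    upper_tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # The null distribution is symmetric, so the two-sided p-value doubles the tail.
    return min(1.0, 2.0 * upper_tail)


if __name__ == "__main__":
    # Hypothetical counts of the two minor topologies for one triplet.
    print(minor_topology_asymmetry_pvalue(30, 10))
```

A small p-value flags a triplet whose minor topologies are more unbalanced than ILS alone would predict. Gene tree estimation error can also distort these counts, so such a test is only a first screen, and many triplets would require a multiple-testing correction.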
Abstract Ancient whole genome duplications (WGDs) are important in eukaryotic genome evolution and are especially prominent in plants. Recent genomic studies from large vascular plant clades, including ferns, gymnosperms, and angiosperms, suggest that WGDs may represent a crucial mode of speciation. Moreover, numerous WGDs have been dated to events coinciding with major episodes of global and climatic upheaval, including the mass extinction at the KT boundary (~65 Ma) and more recent intervals of global aridification in the Miocene (~10-5 Ma). These findings have led to the hypothesis that polyploidization may buffer lineages against the negative consequences of such disruptions. Here, we explore WGDs in the large and diverse flowering plant clade Malpighiales using a combination of transcriptomes and complete genomes from 42 species. We conservatively identify 22 ancient WGDs, widely distributed across Malpighiales subclades. Our results provide strong support for the hypothesis that WGD is an important mode of speciation in plants. Importantly, we also find that these events are clustered around the Eocene-Paleocene Transition (~54 Ma), during which time the planet was warmer and wetter than during any other period in the Cenozoic. These results establish that the Eocene Climate Optimum represents another, previously unrecognized, period of prolific WGDs in plants, and lend support to the hypothesis that polyploidization promotes adaptation and enhances plant survival during major episodes of global change. Malpighiales may have been particularly influenced by these events given their predominance in the tropics, where Eocene warming likely had profound impacts owing to the relatively tight thermal tolerances of tropical organisms.
Significance Statement Whole genome duplications (WGDs) are hypothesized to generate adaptive variations during episodes of climate change and global upheaval. Using large-scale phylogenomic assessments, we identify an impressive 22 ancient WGDs in the large, tropical flowering plant clade Malpighiales. This supports growing evidence that ancient WGDs are far more common than has been thought. Additionally, we find that WGDs are clustered during a narrow window of time, ~54 Ma, when the climate was warmer and more humid than during any other period in the last ~65 Ma. This lends support to the hypothesis that WGDs are associated with surviving climatic upheavals, especially for tropical organisms like Malpighiales, which have tight thermal tolerances.
Nomograms have demonstrated their capability to provide individualized estimates of survival in diverse cancers. Here we retrospectively investigated 1195 patients with esophageal squamous-cell carcinoma (ESCC) who underwent radical esophagectomy at Zhejiang Cancer Hospital in Hangzhou, China. We randomly assigned two-thirds of the patients to a training cohort (n = 797) and one-third to a validation cohort (n = 398). Cox proportional hazards regression analyses were performed using the training cohort, and a nomogram was developed for predicting 3-year and 5-year overall survival rates. Multivariate analysis identified tumor length, surgical approach, number of examined lymph nodes, number of positive lymph nodes, extent of positive lymph nodes, grade, and depth of invasion as independent risk factors for survival. The discriminative ability of the nomogram was externally determined using the validation cohort, showing that the nomogram exhibited a sufficient level of discrimination according to the C-index (0.715, 95% CI 0.671–0.759). The C-index of the nomogram was significantly higher than that of the sixth edition (0.664, P < 0.0001) and the seventh edition (0.696, P < 0.0003) of the TNM classification. This study developed the first nomogram for ESCC, which can be applied in daily clinical practice for individualized survival prediction.
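For readers unfamiliar with the C-index, the following minimal sketch computes Harrell's concordance index for right-censored survival data. It assumes that a higher risk score means a worse predicted outcome and ignores pairs with tied survival times. The function name and the toy data are hypothetical and are not taken from the study.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: observed follow-up times.
    events: 1 if the event (death) was observed, 0 if censored.
    risk_scores: predicted risk, where higher means shorter expected survival.
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # the earlier member of a pair must have an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier death had the higher predicted risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data: four patients, the last one censored.
    print(harrell_c_index([1, 2, 3, 4], [1, 1, 1, 0], [0.9, 0.7, 0.4, 0.1]))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of patients by risk, which gives context for the reported values of 0.664 to 0.715.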
Iterative multiple imputation is a popular technique for missing data analysis. It updates the parameter estimators iteratively using the multiple imputation method. This technique is convenient and flexible. However, the parameter estimators do not converge point-wise and are not efficient for a finite imputation size m. In this paper, we propose a regression multiple imputation method. It uses the parameter estimators obtained from the multiple imputation method to estimate the parameter estimators based on the expectation-maximization algorithm. We show that the resulting estimators are asymptotically efficient and converge point-wise for small values of m as the iteration number k of the iterative multiple imputation goes to infinity. We evaluate the performance of the newly proposed method through simulation studies. A real data analysis is also conducted to illustrate the new method.
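As background on why a finite imputation size m costs efficiency, the sketch below pools m completed-data estimates with Rubin's rules, the standard combining rules for ordinary multiple imputation. This is not the regression multiple imputation method proposed in the paper; the function name and the example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Pool m completed-data estimates of a scalar parameter with Rubin's rules.

    estimates: point estimate from each of the m imputed data sets.
    within_variances: squared standard error from each imputed data set.
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)
    within = mean(within_variances)   # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1 denominator)
    # The (1 + 1/m) factor inflates the variance because m is finite.
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


if __name__ == "__main__":
    # Hypothetical estimates from m = 3 imputed data sets.
    print(pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

The between-imputation term carries a factor of (1 + 1/m), so the pooled variance only approaches its limiting value as m grows. This is the finite-m inefficiency that motivates methods designed to work with small m.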
The vast majority of phylogenetic models focus on resolution of gene trees, despite the fact that phylogenies of species in which gene trees are embedded are of primary interest. We analyze a Bayesian model for estimating species trees that accounts for the stochastic variation expected for gene trees from multiple unlinked loci sampled from a single species history after a coalescent process. Application of the model to a 106-gene data set from yeast shows that the set of gene trees recovered by statistically acknowledging the shared but unknown species tree from which gene trees are sampled is much reduced compared with treating the history of each locus independently of an overarching species tree. The analysis also yields a concentrated posterior distribution of the yeast species tree whose mode is congruent with the concatenated gene tree, and it does so with fewer than half the loci required by the concatenation method. Using simulations, we show that, with large numbers of loci, highly resolved species trees can be estimated under conditions in which concatenation of sequence data will positively mislead phylogeny, and when the proportion of gene trees matching the species tree is <10%. However, when gene tree/species tree congruence is high, species trees can be resolved with just two or three loci. These results make accessible an alternative paradigm for combining data in phylogenomics that focuses attention on the singularity of species histories and away from the idiosyncrasies and multiplicities of individual gene histories.
Salvador (SAV) is a gene product that contains two protein-protein interaction modules known as WW domains and is believed to act as a scaffolding protein for Hippo and Warts. SAV1 is the human homolog of Salvador, which is the most well-characterized upstream signaling component of the Hippo pathway. Although its role in some tumors is known, the function of SAV1 in other types of tumors, including pancreatic tumors, is still obscure. Here, we determined the role of SAV1 in pancreatic ductal adenocarcinoma (PDAC) development and progression. Our results revealed that suppressed SAV1 expression promoted PDAC invasion and migration and repressed apoptosis of pancreatic cancer cells. Moreover, SAV1 was silenced by hypermethylation. Thus, SAV1 acts as a tumor suppressor and might be considered as a target for pancreatic cancer therapy.