MASCOT-Skyline integrates population and migration dynamics to enhance phylogeographic reconstructions
Citations: 1 | References: 59 | Related papers: 10
Abstract:
The spread of infectious diseases is shaped by spatial and temporal aspects, such as host population structure or changes in the transmission rate or number of infected individuals over time. These spatiotemporal dynamics are imprinted in the genomes of pathogens and can be recovered from those genomes using phylodynamic methods. However, phylodynamic methods typically quantify either the temporal or the spatial transmission dynamics, which leads to unclear biases, as one can potentially not be inferred without the other. Here, we address this challenge by introducing a structured coalescent skyline approach, MASCOT-Skyline, that allows us to jointly infer spatial and temporal transmission dynamics of infectious diseases using Markov chain Monte Carlo inference. To do so, we model the effective population size dynamics in different locations using a non-parametric function, allowing us to approximate a range of population size dynamics. We show, using a range of different viral outbreak datasets, potential issues with phylogeographic methods. We then use these viral datasets to motivate simulations of outbreaks that illuminate the nature of biases present in the different phylogeographic methods. We show that spatial and temporal dynamics should be modeled jointly even if one seeks to recover just one of the two. Further, we showcase conditions under which we can expect phylogeographic analyses to be biased, particularly under different subsampling approaches, and provide recommendations on when we can expect them to perform well. We implemented MASCOT-Skyline as part of the open-source software package MASCOT for the Bayesian phylodynamics platform BEAST2.
Keywords:
Coalescent theory
Approximate Bayesian Computation
Viral phylodynamics
Skyline
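As a rough illustration of the idea in the abstract above (location-specific effective population sizes that change through time), the sketch below computes the total coalescent rate when lineages can only coalesce within the same location. It assumes a piecewise-constant Ne per location, which is a simplification chosen for this example and not necessarily the parameterization used by MASCOT-Skyline. It also ignores migration between locations. The function names `ne_at` and `coalescent_rate` and all input values are hypothetical.

```python
from bisect import bisect_right


def ne_at(t, breakpoints, ne_values):
    """Return a piecewise-constant Ne at time t (time measured backwards from the present).

    breakpoints: ascending interval boundaries.
    ne_values: one Ne per interval, so len(ne_values) == len(breakpoints) + 1.
    """
    return ne_values[bisect_right(breakpoints, t)]


def coalescent_rate(t, lineages, breakpoints, ne_by_location):
    """Sum the within-location coalescent rates at time t.

    lineages: mapping location -> number of lineages currently in that location.
    Each location with k lineages contributes k * (k - 1) / 2 pairs,
    each pair coalescing at rate 1 / Ne_location(t).
    """
    total = 0.0
    for location, k in lineages.items():
        ne = ne_at(t, breakpoints, ne_by_location[location])
        total += k * (k - 1) / 2 / ne
    return total


# Hypothetical inputs: two locations, three skyline intervals.
breakpoints = [1.0, 2.0]
ne_by_location = {"A": [10.0, 5.0, 2.0], "B": [4.0, 4.0, 8.0]}
lineages = {"A": 3, "B": 2}

rate_recent = coalescent_rate(0.5, lineages, breakpoints, ne_by_location)
rate_old = coalescent_rate(2.5, lineages, breakpoints, ne_by_location)
```

With these made-up values, the rate rises further back in time mainly because Ne in location A shrinks, which is the kind of location-specific temporal signal a joint spatial and temporal model is meant to capture.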
The subtype C Eastern Africa clade (CEA), a particularly successful HIV-1 subtype C lineage, seeded several sub-epidemics in Eastern African countries and Southern Brazil during the 1960s and 1970s. Here, we characterized the past population dynamics of the major CEA sub-epidemics in Eastern Africa and Brazil by using Bayesian phylodynamic approaches based on coalescent and birth-death models. All phylodynamic models support similar epidemic dynamics and exponential growth rates until roughly the mid-1980s for all the CEA sub-epidemics. Divergent growth patterns, however, were supported afterwards. The Bayesian skygrid coalescent model (BSKG) and the birth-death skyline model (BDSKY) supported longer exponential growth phases than the Bayesian skyline coalescent model (BSKL). The BDSKY model uncovers patterns of a recent decline for the CEA sub-epidemics in Burundi/Rwanda and Tanzania (Re < 1) and a recent growth for Southern Brazil (Re > 1), whereas coalescent models infer an epidemic stabilization. In contrast, the BSKG model captured a decline of the Ethiopian CEA sub-epidemic between the mid-1990s and mid-2000s that was not uncovered by the BDSKY model. These results underscore that the joint use of different phylodynamic approaches may yield complementary insights into the past HIV population dynamics.
Coalescent theory
Viral phylodynamics
Effective population size
Evolutionary Dynamics
Lineage (genetic)
Citations (11)
Coalescent methods are widely used to infer the demographic history of populations from gene genealogies. These approaches—often referred to as phylodynamic methods—have proven especially useful for reconstructing the dynamics of rapidly evolving viral pathogens. Yet, population dynamics inferred from viral genealogies often differ widely from those observed from other sources of epidemiological data, such as hospitalization records. We demonstrate how a modeling framework that allows for the direct fitting of mechanistic epidemiological models to genealogies can be used to test different hypotheses about what ecological factors cause phylodynamic inferences to differ from observed dynamics. We use this framework to test different hypotheses about why dengue serotype 1 (DENV-1) population dynamics in southern Vietnam inferred using existing phylodynamic methods differ from hospitalization data. Specifically, we consider how factors such as seasonality, vector dynamics, and spatial structure can affect inferences drawn from genealogies. The coalescent models we derive to take into account vector dynamics and spatial structure reveal that these ecological complexities can substantially affect coalescent rates among lineages. We show that incorporating these additional ecological complexities into coalescent models can also greatly improve estimates of historical population dynamics and lead to new insights into the factors shaping viral genealogies.
Coalescent theory
Viral phylodynamics
Demographic history
Evolutionary Dynamics
Effective population size
Citations (49)
One of the central objectives in the field of phylodynamics is the quantification of population dynamic processes using genetic sequence data or in some cases phenotypic data. Phylodynamics has been successfully applied to many different processes, such as the spread of infectious diseases, within-host evolution of a pathogen, macroevolution and even language evolution. Phylodynamic analysis requires a probability distribution on phylogenetic trees spanned by the genetic data. Because such a probability distribution is not available for many common stochastic population dynamic processes, coalescent-based approximations assuming deterministic population size changes are widely employed. Key to many population dynamic models, in particular epidemiological models, is a period of exponential population growth during the initial phase. Here, we show that the coalescent does not well approximate stochastic exponential population growth, which is typically modelled by a birth–death process. We demonstrate that introducing demographic stochasticity into the population size function of the coalescent improves the approximation for values of R0 close to 1, but substantial differences remain for large R0. In addition, the computational advantage of using an approximation over exact models vanishes when introducing such demographic stochasticity. These results highlight that we need to increase efforts to develop phylodynamic tools that correctly account for the stochasticity of population dynamic models for inference.
Coalescent theory
Viral phylodynamics
Evolutionary Dynamics
Birth–death process
Citations (41)
Coalescent theory
Approximate Bayesian Computation
Python
Citations (31)
Phylodynamic models are widely used in infectious disease epidemiology to infer the dynamics and structure of pathogen populations. However, these models generally assume that individual hosts contact one another at random, ignoring the fact that many pathogens spread through highly structured contact networks. We present a new framework for phylodynamics on local contact networks based on pairwise epidemiological models that track the status of pairs of nodes in the network rather than just individuals. Shifting our focus from individuals to pairs leads naturally to coalescent models that describe how lineages move through networks and the rate at which lineages coalesce. These pairwise coalescent models not only consider how network structure directly shapes pathogen phylogenies, but also how the relationship between phylogenies and contact networks changes depending on epidemic dynamics and the fraction of infected hosts sampled. By considering pathogen phylogenies in a probabilistic framework, these coalescent models can also be used to estimate the statistical properties of contact networks directly from phylogenies using likelihood-based inference. We use this framework to explore how much information phylogenies retain about the underlying structure of contact networks and to infer the structure of a sexual contact network underlying a large HIV-1 sub-epidemic in Switzerland.
Coalescent theory
Viral phylodynamics
Sexual contact
Citations (22)
Coalescent theory
Approximate Bayesian Computation
Population Genetics
Effective population size
Citations (13)
The multispecies coalescent (MSC) model has emerged as a powerful framework for inferring species phylogenies while accounting for ancestral polymorphism and gene tree-species tree conflict. A number of methods have been developed in the past few years to estimate the species tree under the MSC. The full likelihood methods (including maximum likelihood and Bayesian inference) average over the unknown gene trees and accommodate their uncertainties properly but involve intensive computation. The approximate or summary coalescent methods are computationally fast and are applicable to genomic datasets with thousands of loci, but do not make an efficient use of information in the multilocus data. Most of them take the two-step approach of reconstructing the gene trees for multiple loci by phylogenetic methods and then treating the estimated gene trees as observed data, without accounting for their uncertainties appropriately. In this article we review the statistical nature of the species tree estimation problem under the MSC, and explore the conceptual issues and challenges of species tree estimation by focusing mainly on simple cases of three or four closely related species. We use mathematical analysis and computer simulation to demonstrate that large differences in statistical performance may exist between the two classes of methods. We illustrate that several counterintuitive behaviors may occur with the summary methods but they are due to inefficient use of information in the data by summary methods and vanish when the data are analyzed using full-likelihood methods. These include (i) unidentifiability of parameters in the model, (ii) inconsistency in the so-called anomaly zone, (iii) singularity on the likelihood surface, and (iv) deterioration of performance upon addition of more data. 
We discuss the challenges and strategies of species tree inference for distantly related species when the molecular clock is violated, and highlight the need for improving the computational efficiency and model realism of the likelihood methods as well as the statistical efficiency of the summary methods.
Coalescent theory
Approximate Bayesian Computation
Tree (set theory)
Citations (154)
The Kingman coalescent and its developments are often considered among the most important advances in population genetics of the last decades. Demographic inference based on coalescent theory has been used to reconstruct the population dynamics and evolutionary history of several species, including Mycobacterium tuberculosis (MTB), an important human pathogen causing tuberculosis. One key assumption of the Kingman coalescent is that the number of descendants of different individuals does not vary strongly, and violating this assumption could lead to severe biases caused by model misspecification. Individual lineages of MTB are expected to vary strongly in reproductive success because 1) MTB is potentially under constant selection due to the pressure of the host immune system and of antibiotic treatment, 2) MTB undergoes repeated population bottlenecks when it transmits from one host to the next, and 3) some hosts show much higher transmission rates compared with the average (superspreaders). Here, we used an approximate Bayesian computation approach to test whether multiple-merger coalescents (MMC), a class of models that allow for large variation in reproductive success among lineages, are more appropriate models to study MTB populations. We considered 11 publicly available whole-genome sequence data sets sampled from local MTB populations and outbreaks and found that MMC had a better fit compared with the Kingman coalescent for 10 of the 11 data sets. These results indicate that the null model for analyzing MTB outbreaks should be reassessed and that past findings based on the Kingman coalescent need to be revisited.
Coalescent theory
Approximate Bayesian Computation
Lineage (genetic)
Citations (16)
Coalescent theory
Approximate Bayesian Computation
Demographic history
Citations (0)