State-space models have been widely used to model the dynamics of communicable diseases in populations of interest by fitting to time-series data. Particle filters have enabled these models to incorporate stochasticity and so can better reflect the true nature of population behaviours. Relevant parameters such as the spread of the disease, $R_t$, and recovery rates can be inferred using Particle MCMC. The standard method uses a Metropolis-Hastings random-walk proposal, which can struggle to reach the stationary distribution in a reasonable time when there are multiple parameters. In this paper we obtain full Bayesian parameter estimates using gradient information and the No-U-Turn Sampler (NUTS) when proposing new parameters of stochastic non-linear Susceptible-Exposed-Infected-Recovered (SEIR) and SIR models. Although NUTS makes more than one target evaluation per iteration, we show that it can provide more accurate estimates in a shorter run time than Metropolis-Hastings.
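For concreteness, the sketch below is not the authors' code; it illustrates the particle-MCMC baseline the abstract compares against, under assumed binomial transition and Poisson observation models, illustrative parameter values and a flat prior. A bootstrap particle filter supplies an unbiased log-likelihood estimate for a stochastic SIR model, and a Metropolis-Hastings random walk proposes the parameters $(\beta, \gamma)$.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)

def pf_loglik(y, beta, gamma, n_pop=1000, n_particles=200):
    """Bootstrap particle filter estimate of the log-likelihood of daily case counts y."""
    S = np.full(n_particles, n_pop - 10)
    I = np.full(n_particles, 10)
    loglik = 0.0
    for obs in y:
        # Stochastic SIR transitions over one day.
        new_inf = rng.binomial(S, 1.0 - np.exp(-beta * I / n_pop))
        new_rec = rng.binomial(I, 1.0 - np.exp(-gamma))
        S, I = S - new_inf, I + new_inf - new_rec
        # Weight particles with a Poisson observation model for reported cases.
        logw = poisson.logpmf(obs, np.maximum(new_inf, 1e-9))
        m = logw.max()
        loglik += m + np.log(np.mean(np.exp(logw - m)))
        # Multinomial resampling.
        w = np.exp(logw - m)
        idx = rng.choice(n_particles, n_particles, p=w / w.sum())
        S, I = S[idx], I[idx]
    return loglik

def pmcmc_random_walk(y, n_iters=2000, step=0.05):
    """Metropolis-Hastings random walk on (beta, gamma) with a flat prior on (0, inf)."""
    theta = np.array([0.3, 0.1])
    ll = pf_loglik(y, *theta)
    chain = []
    for _ in range(n_iters):
        prop = theta + step * rng.normal(size=2)
        if np.all(prop > 0):
            ll_prop = pf_loglik(y, *prop)
            if np.log(rng.uniform()) < ll_prop - ll:
                theta, ll = prop, ll_prop
        chain.append(theta.copy())
    return np.array(chain)

chain = pmcmc_random_walk(y=[12, 18, 25, 31, 28, 22, 15])
print(chain.mean(axis=0))
```

A gradient-based proposal such as NUTS replaces the random-walk step above with trajectories informed by the gradient of the (estimated) log-posterior, which is what allows it to mix faster per unit of run time despite making several target evaluations per iteration.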
The emergence of the novel coronavirus (COVID-19) has generated a need to quickly and accurately assemble up-to-date information related to its spread. While it is possible to use deaths to provide a reliable information feed, the latency of data derived from deaths is significant. Confirmed cases derived from positive test results potentially provide a lower-latency data feed. However, the sampling of those tested varies with time and the reason for testing is often not recorded. Hospital admissions typically occur around 1-2 weeks after infection and can be considered out of date in relation to the time of initial infection. The extent to which these issues are problematic is likely to vary over time and between countries. We use a machine learning algorithm for natural language processing, trained in multiple languages, to identify symptomatic individuals from social media, in particular Twitter, in real time. We then use an extended SEIRD epidemiological model to fuse combinations of low-latency feeds, including the symptomatic counts from Twitter, with death data to estimate parameters of the model and nowcast the number of people in each compartment. The model is implemented in the probabilistic programming language Stan and uses a bespoke numerical integrator. We present results showing that using specific low-latency data feeds along with death data provides more consistent and accurate forecasts of COVID-19 related deaths than using death data alone.
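As a point of reference, the basic SEIRD skeleton that such an extended model builds on can be written as the ODE system below; the extended model in the abstract has additional compartments and parameters, so this is only the core structure, with $\beta$ the transmission rate, $\sigma$ the rate of progression from exposed to infectious, $\gamma$ the removal rate, $\mu$ the infection fatality fraction and $N$ the living population:

$$
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dE}{dt} &= \beta \frac{S I}{N} - \sigma E, \\
\frac{dI}{dt} &= \sigma E - \gamma I, \\
\frac{dR}{dt} &= (1 - \mu)\,\gamma I, \\
\frac{dD}{dt} &= \mu\,\gamma I.
\end{aligned}
$$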
Antimicrobial resistance (AMR) emerges when disease-causing microorganisms develop the ability to withstand the effects of antimicrobial therapy. This phenomenon is often fueled by the human-to-human transmission of pathogens and the overuse of antibiotics. Over the past 50 years, increased computational power has facilitated the application of Bayesian inference algorithms. In this comprehensive review, the basic theory of Markov Chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) methods is explained. These inference algorithms are instrumental in calibrating complex statistical models to the vast amounts of AMR-related data. Popular statistical models include hierarchical and mixture models as well as discrete and stochastic epidemiological compartmental and agent-based models. The studies reviewed encompass multi-drug resistance, the economic implications of vaccines, and modeling AMR in vitro as well as within specific populations. We describe how combining these topics in a coherent framework can result in effective antimicrobial stewardship. We also outline recent advancements in the methodology of Bayesian inference algorithms and provide insights into their prospective applicability for modeling AMR in the future.
It has been widely documented that the sampling and resampling steps in particle filters cannot be differentiated. The reparameterisation trick was introduced to allow the sampling step to be reformulated into a differentiable function. We extend the reparameterisation trick to include the stochastic input to resampling, thereby limiting the discontinuities in the gradient calculation after this step. Knowing the gradients of the prior and likelihood allows us to run particle Markov Chain Monte Carlo (p-MCMC) and use the No-U-Turn Sampler (NUTS) as the proposal when estimating parameters. We compare the Metropolis-adjusted Langevin algorithm (MALA), Hamiltonian Monte Carlo with different numbers of steps, and NUTS. We consider three state-space models and show that NUTS improves the mixing of the Markov chain and can produce more accurate results in less computational time.
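For readers unfamiliar with the starting point, the snippet below is a minimal sketch of the classic reparameterisation trick for the sampling step only (the paper's extension to resampling is not reproduced here). The distribution, parameters and objective are illustrative; the point is that writing the sample as a deterministic function of the parameters and an independent noise variable lets gradients flow back to the parameters.

```python
import torch

# Distribution parameters we want gradients with respect to.
mu = torch.tensor(0.5, requires_grad=True)
log_sigma = torch.tensor(-1.0, requires_grad=True)

eps = torch.randn(())                  # stochastic input, independent of the parameters
x = mu + torch.exp(log_sigma) * eps    # reparameterised sample: differentiable in mu and log_sigma

loss = (x - 1.0) ** 2                  # any downstream scalar objective
loss.backward()                        # gradients reach mu and log_sigma through the sample
print(mu.grad, log_sigma.grad)
```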
Calibrating statistical models using Bayesian inference often requires both accurate and timely estimates of parameters of interest. Particle Markov Chain Monte Carlo (p-MCMC) and Sequential Monte Carlo Squared (SMC$^2$) are two methods that use an unbiased estimate of the log-likelihood obtained from a particle filter (PF) to evaluate the target distribution. P-MCMC constructs a single Markov chain, which is sequential by nature and so cannot be readily parallelized using Distributed Memory (DM) architectures. This is in contrast to SMC$^2$, which includes processes, such as importance sampling, that are described as embarrassingly parallel. However, difficulties arise when attempting to parallelize resampling. Nonetheless, the choice of backward kernel, recycling scheme and compatibility with DM architectures makes SMC$^2$ an attractive option when compared with p-MCMC. In this paper, we present an SMC$^2$ framework that includes the following features: an optimal (in terms of time complexity) $\mathcal{O}(\log_2 N)$ parallelization for DM architectures, an approximately optimal (in terms of accuracy) backward kernel, and an efficient recycling scheme. On a cluster of 128 DM processors, the results on a biomedical application show that SMC$^2$ achieves up to a 70× speed-up versus its sequential implementation. It is also more accurate and roughly 54× faster than p-MCMC. A GitHub link is given which provides access to the code.
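To illustrate the kind of distributed-memory primitive behind logarithmic-time resampling (this is an assumption for illustration, not the paper's implementation), the sketch below uses an MPI prefix sum over per-rank particle-weight totals; an inclusive scan completes in $\mathcal{O}(\log_2 P)$ communication steps for $P$ ranks and gives each rank its offset on a global cumulative distribution of weights.

```python
# Run with, e.g., `mpiexec -n 4 python scan_demo.py`.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

rng = np.random.default_rng(rank)
local_weights = rng.random(8)          # this rank's (unnormalised) particle weights
local_total = local_weights.sum()

# Inclusive prefix sum across ranks: rank r receives the sum of totals on ranks 0..r.
cumulative = comm.scan(local_total, op=MPI.SUM)
# The last rank holds the grand total; share it so every rank can normalise.
grand_total = comm.bcast(cumulative if rank == size - 1 else None, root=size - 1)

# Offsets like these let each rank place its particles on a global CDF for resampling.
print(f"rank {rank}: cumulative weight {cumulative:.3f} of {grand_total:.3f}")
```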
The emergence of the novel coronavirus (COVID-19) generated a need to quickly and accurately assemble up-to-date information related to its spread. In this research article, we propose two methods in which Twitter is useful when modelling the spread of COVID-19: (1) machine learning algorithms trained in English, Spanish, German, Portuguese and Italian are used to identify symptomatic individuals on Twitter. Using the geo-location attached to each tweet, we map users to a geographic location to produce a time series of potential symptomatic individuals. We calibrate an extended SEIRD epidemiological model to combinations of low-latency data feeds, including the symptomatic tweets, together with death data, and infer the parameters of the model. We then evaluate the usefulness of the data feeds when making predictions of daily deaths in 50 US States, 16 Latin American countries, 2 European countries and 7 NHS (National Health Service) regions in the UK. We show that using symptomatic tweets can result in a 6% and 17% increase in mean squared error accuracy, on average, when predicting COVID-19 deaths in US States and the rest of the world, respectively, compared to using solely death data. (2) Origin/destination (O/D) matrices for movements between seven NHS regions are constructed by determining when a user has tweeted twice within a 24-hour period from two different locations. We show that increasing and decreasing a social connectivity parameter within an SIR model affects the rate of spread of a disease.
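The second method can be sketched as follows (hypothetical column names and toy data, not the authors' pipeline): a move is recorded whenever the same user posts from two different regions within 24 hours, and the counts are tabulated into an origin/destination matrix.

```python
import pandas as pd

tweets = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "region":  ["North West", "London", "Midlands", "Midlands", "South East"],
    "time": pd.to_datetime([
        "2020-04-01 08:00", "2020-04-01 19:00",
        "2020-04-02 09:00", "2020-04-03 21:00",
        "2020-04-02 12:00",
    ]),
})

moves = []
for _, g in tweets.sort_values("time").groupby("user_id"):
    rows = list(g.itertuples())
    for prev, curr in zip(rows, rows[1:]):
        # Record a move only if consecutive tweets are within 24 hours and in different regions.
        if (curr.time - prev.time) <= pd.Timedelta(hours=24) and prev.region != curr.region:
            moves.append((prev.region, curr.region))

moves = pd.DataFrame(moves, columns=["origin", "destination"])
od_matrix = pd.crosstab(moves["origin"], moves["destination"])  # rows: origin, columns: destination
print(od_matrix)
```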
Estimates from infectious disease models have constituted a significant part of the scientific evidence used to inform the response to the COVID-19 pandemic in the UK. These estimates can vary strikingly in their bias and variability. Epidemiological forecasts should be consistent with the observations that eventually materialize. We use simple scoring rules to refine the forecasts of a novel statistical model for multisource COVID-19 surveillance data by tuning its smoothness hyperparameter. This article is part of the theme issue 'Technical challenges of modelling real-life epidemics and examples of overcoming these'.
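A generic sketch of the tuning idea (not the paper's surveillance model; the scoring rule, the placeholder forecast and the candidate values are illustrative assumptions): probabilistic forecasts at each candidate smoothness value are scored against the observations that materialize, and the hyperparameter with the lowest mean score is retained. Here the continuous ranked probability score is estimated from forecast samples as $\mathrm{E}|X - y| - \tfrac{1}{2}\,\mathrm{E}|X - X'|$.

```python
import numpy as np

rng = np.random.default_rng(1)

def crps_from_samples(samples, y):
    """Sample-based estimate of the continuous ranked probability score (lower is better)."""
    term1 = np.mean(np.abs(samples - y))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term1 - term2

def forecast_samples(smoothness, n=500):
    """Placeholder forecast: stands in for running the model at one hyperparameter value."""
    return rng.normal(loc=10.0, scale=1.0 + 5.0 * smoothness, size=n)

observations = [9.2, 10.5, 11.1]
candidates = [0.01, 0.1, 0.5, 1.0]
scores = {s: np.mean([crps_from_samples(forecast_samples(s), y) for y in observations])
          for s in candidates}
best = min(scores, key=scores.get)
print(scores, "-> chosen smoothness:", best)
```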
Blood cultures are central to the management of patients with sepsis and bloodstream infection. Clinical decisions depend on the timely availability of laboratory information, which, in turn, depends on the optimal laboratory processing of specimens. Discrete event simulation (DES) offers insights into where optimization efforts can be targeted. Here, we generate a detailed process map of blood culture processing within a laboratory and use it to build a simulator. Direct observation of laboratory staff processing blood cultures was used to generate a flowchart of the blood culture laboratory pathway. Retrospective routinely collected data were combined with direct observations to generate probability distributions over the time taken for each event. These data were used to inform the DES model. A sensitivity analysis explored the impact of staff availability on turnaround times. A flowchart of the blood culture pathway was constructed, spanning labeling, incubation, organism identification, and antimicrobial susceptibility testing. Thirteen processes in earlier stages of the pathway, not otherwise captured by routinely collected data, were timed using direct observations. Observations revealed that specimen processing is predominantly batched. Another eight processes were timed using retrospective data. A simulator was built using DES. Sensitivity analysis revealed that specimen progression through the simulation was especially sensitive to laboratory technician availability. Gram stain reporting time was also sensitive to laboratory scientist availability. Our laboratory simulation model has wide-ranging applications for the optimization of laboratory processes and effective implementation of the changes required for faster and more accurate results.

IMPORTANCE: Optimization of laboratory pathways and resource availability has a direct impact on the clinical management of patients with bloodstream infection. This research offers an insight into the laboratory processing of blood cultures at a system level and allows clinical microbiology laboratories to explore the impact of changes to processes and resources.
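The kind of sensitivity analysis described above can be sketched with a minimal SimPy discrete event simulation (illustrative timings and resource levels, not the study's measured distributions or full pathway): specimens queue for a limited pool of laboratory technicians, so turnaround time can be explored as a function of staff availability by varying the resource capacity.

```python
import random
import simpy

random.seed(0)
RESULTS = []

def specimen(env, technicians):
    arrived = env.now
    with technicians.request() as req:                   # wait for a free technician
        yield req
        yield env.timeout(random.expovariate(1 / 10))    # labelling and loading, ~10 min mean
    yield env.timeout(random.uniform(600, 1800))         # incubation, 10-30 hours (in minutes)
    with technicians.request() as req:                   # second stage needs a technician again
        yield req
        yield env.timeout(random.expovariate(1 / 20))    # Gram stain and reporting, ~20 min mean
    RESULTS.append(env.now - arrived)

def arrivals(env, technicians):
    for _ in range(100):
        env.process(specimen(env, technicians))
        yield env.timeout(random.expovariate(1 / 30))    # a new specimen roughly every 30 min

env = simpy.Environment()
technicians = simpy.Resource(env, capacity=2)            # vary capacity for the sensitivity analysis
env.process(arrivals(env, technicians))
env.run()
print(f"mean turnaround: {sum(RESULTS) / len(RESULTS):.0f} minutes over {len(RESULTS)} specimens")
```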