Using Bayesian capture–recapture analysis, we estimated the number of current injecting drug users (IDUs) in Scotland in 2006 from the cross-counts of 5670 IDUs listed on four data-sources: social enquiry reports (901 IDUs listed), hospital records (953), drug treatment agencies (3504), and recent Hepatitis C virus (HCV) diagnoses (827 listed as IDU-risk). Further, we accessed exact numbers of opiate-related drugs-related deaths (DRDs) in 2006 and 2007 to improve estimation of Scotland's DRD rates per 100 current IDUs. Using all four data-sources, and model-averaging of standard hierarchical log-linear models to allow for pairwise interactions between data-sources and/or demographic classifications, we estimated that Scotland had 31700 IDUs in 2006 (95% credible interval: 24900–38700), but 25000 IDUs (95% credible interval: 20700–35000) when recent HCV diagnoses, whose IDU-risk can refer to past injecting, were excluded. Only in the younger age-group (15–34 years) were Scotland's opiate-related DRD rates significantly lower for females than for males. Older males' opiate-related DRD rate was 1.9 (1.24–2.40) per 100 current IDUs without inclusion of recent HCV diagnoses, or 1.3 (0.94–1.64) with them included. If, indeed, Scotland had only 25000 current IDUs in 2006, with only 8200 of them aged 35+ years, then the opiate-related DRD rate among this older age-group is higher than has been appreciated hitherto. There is counter-balancing good news for public health: the hitherto sharp increase in older current IDUs had stalled by 2006.
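As a rough illustration of the estimation machinery (not the authors' implementation, and not the Scottish data), the following Python sketch simulates a four-list capture–recapture problem, fits Poisson log-linear models with every subset of pairwise list interactions, and combines the resulting estimates of the unobserved cell using AIC weights as a crude frequentist stand-in for the paper's Bayesian model averaging.

```python
# Minimal sketch of model-averaged log-linear capture-recapture with four
# lists. All data are simulated placeholders, NOT the Scottish cross-counts.
import itertools
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Simulate a notional closed population and four imperfect lists.
N_true = 30000
p = np.array([0.03, 0.03, 0.12, 0.03])      # list inclusion probabilities
caps = rng.random((N_true, 4)) < p          # capture indicators
caps = caps[caps.any(axis=1)]               # keep individuals seen at least once

# Cross-classify into the 2^4 - 1 observable cells (zeros included).
cells = pd.DataFrame(caps.astype(int), columns=list("ABCD"))
idx = pd.MultiIndex.from_product([[0, 1]] * 4, names=list("ABCD"))
counts = (cells.value_counts().reindex(idx, fill_value=0)
          .rename("n").reset_index())
counts = counts[counts[list("ABCD")].sum(axis=1) > 0]   # drop unobservable cell

# Candidate models: main effects plus every subset of pairwise interactions.
pairs = [f"{a}:{b}" for a, b in itertools.combinations("ABCD", 2)]
fits = []
for k in range(len(pairs) + 1):
    for subset in itertools.combinations(pairs, k):
        formula = "n ~ A + B + C + D" + "".join(f" + {t}" for t in subset)
        fit = smf.glm(formula, data=counts, family=sm.families.Poisson()).fit()
        # With 0/1 coding, the unobserved (0,0,0,0) cell is exp(intercept).
        fits.append((fit.aic, np.exp(fit.params["Intercept"])))

# Akaike weights as a crude stand-in for Bayesian model averaging.
aics = np.array([a for a, _ in fits])
w = np.exp(-0.5 * (aics - aics.min()))
w /= w.sum()
N_hat = counts["n"].sum() + np.sum(w * np.array([n for _, n in fits]))
print(f"model-averaged population estimate: {N_hat:.0f}")
```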
A general Bayesian method for $L^2$ calibration of a mathematical model is presented. General Bayesian inference starts with the specification of a loss function. Then, the log-likelihood in Bayes' theorem is replaced by the negative loss. While the minimiser of the loss function is unchanged by, for example, multiplying the loss by a constant, the same is not true of the resulting general posterior distribution. To address this problem in the context of $L^2$ calibration of mathematical models, different automatic scalings of the general Bayesian posterior are proposed. These are based on equating asymptotic properties of the general Bayesian posterior and the minimiser of the $L^2$ loss.
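The following sketch illustrates the basic construction on a toy one-parameter calibration problem: the log-likelihood is replaced by a negative, scaled $L^2$ loss, and the general posterior is computed on a grid under a flat prior. The loss scale w below is a hand-picked placeholder; the paper's contribution is precisely to choose such scalings automatically.

```python
# Minimal sketch of a general Bayesian posterior built from an L2 loss.
# The loss scale w is a hand-picked placeholder, not the paper's automatic
# scaling; the toy model and data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy field data: true process y = 1.8 x + noise; model f(x, theta) = theta * x.
x = np.linspace(0, 1, 50)
y = 1.8 * x + rng.normal(0, 0.1, x.size)

def l2_loss(theta):
    """Empirical L2 calibration loss: squared discrepancy, data vs model."""
    return np.sum((y - theta * x) ** 2)

theta_grid = np.linspace(1.0, 2.5, 2001)
w = 1.0                                     # loss scale ("learning rate")
log_post = -w * np.array([l2_loss(t) for t in theta_grid])   # flat prior
log_post -= log_post.max()                  # stabilise before exponentiating
post = np.exp(log_post)
dx = theta_grid[1] - theta_grid[0]
post /= post.sum() * dx                     # normalise on the grid

post_mean = (theta_grid * post).sum() * dx
print(f"general-posterior mean: {post_mean:.3f}")   # near the L2 minimiser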
The construction of decision-theoretic Bayesian designs for realistically complex nonlinear models is computationally challenging, as it requires the optimization of analytically intractable expected utility functions over high-dimensional design spaces. We provide the most general solution to date for this problem through a novel approximate coordinate exchange algorithm. This methodology uses a Gaussian process emulator to approximate the expected utility as a function of a single design coordinate in a series of conditional optimization steps. It has the flexibility to address problems for any choice of utility function and for a wide range of statistical models with different numbers of variables and runs, and different randomization restrictions. In contrast to existing approaches to Bayesian design, the method can find multi-variable designs in large numbers of runs without resorting to asymptotic approximations to the posterior distribution or expected utility. The methodology is demonstrated on a variety of challenging examples of practical importance, including design for pharmacokinetic models and design for mixed models with discrete data. For many of these models, Bayesian designs are not currently available. Comparisons are made to results from the literature, and to designs obtained from asymptotic approximations.
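The sketch below illustrates one coordinate update in the spirit of the algorithm, under toy assumptions (a one-parameter exponential-mean model and a Fisher-information-based utility): the noisy Monte Carlo expected utility is evaluated at a few trial values of a single design coordinate, smoothed by a Gaussian process emulator (here scikit-learn's GaussianProcessRegressor), and the coordinate is moved to the emulator's maximiser. The full algorithm's accept/reject comparison of candidate designs is omitted.

```python
# Schematic single-coordinate update in the spirit of approximate coordinate
# exchange (ACE). Model, utility and all sizes are illustrative assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)

def mc_expected_utility(design, n_mc=200):
    """Crude Monte Carlo approximation of a pseudo-Bayesian D-type utility
    for the toy model E[y] = exp(-theta * x), theta ~ Uniform(0.5, 1.5)."""
    thetas = rng.uniform(0.5, 1.5, n_mc)
    vals = []
    for th in thetas:
        grad = -design * np.exp(-th * design)           # d/dtheta of the mean
        vals.append(np.log(np.sum(grad ** 2) + 1e-12))  # log Fisher information
    return np.mean(vals)

design = rng.uniform(0, 10, 6)       # current 6-point design
i = 0                                # coordinate to update this step
grid = np.linspace(0, 10, 20)        # trial values for coordinate x_i
u = []
for v in grid:
    trial = design.copy()
    trial[i] = v
    u.append(mc_expected_utility(trial))

# Emulate the noisy expected utility along this coordinate with a GP...
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(grid.reshape(-1, 1), np.array(u))

# ...then optimise the cheap, smooth emulator instead of the noisy target.
fine = np.linspace(0, 10, 1000).reshape(-1, 1)
design[i] = fine[np.argmax(gp.predict(fine)), 0]
print("updated coordinate:", design[i])
```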
Quality control in industrial processes is increasingly making use of prior scientific knowledge, often encoded in physical models that require numerical approximation. Statistical prediction, and subsequent optimization, is key to ensuring the process output meets a specification target. However, the numerical expense of approximating the models poses computational challenges to the identification of combinations of the process factors where there is confidence in the quality of the response. Recent work in Bayesian computation and statistical approximation (emulation) of expensive computational models is exploited to develop a novel strategy for optimizing the posterior probability of a process meeting specification. The ensuing methodology is motivated by, and demonstrated on, a chemical synthesis process to manufacture a pharmaceutical product, within which an initial set of substances evolve according to chemical reactions, under certain process conditions, into a series of new substances. One of these substances is a target pharmaceutical product and two are unwanted by-products. The aim is to determine the combinations of process conditions and amounts of initial substances that maximize the probability of obtaining sufficient target pharmaceutical product whilst ensuring unwanted by-products do not exceed a given level. The relationship between the factors and amounts of substances of interest is theoretically described by the solution to a system of ordinary differential equations incorporating temperature dependence. Using data from a small experiment, it is shown how the methodology can approximate the multivariate posterior predictive distribution of the pharmaceutical target and by-products, and therefore identify suitable operating values.
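A minimal sketch of the prediction step follows, with a toy A → B → C reaction network standing in for the paper's chemical system and an invented placeholder "posterior" on the rate constants: the posterior predictive probability of meeting specification is estimated by Monte Carlo over repeated ODE solves. The paper's key computational ingredient, an emulator replacing the expensive solver, is omitted here.

```python
# Minimal sketch: Monte Carlo estimate of the posterior predictive probability
# that a process meets specification, for a toy A -> B -> C reaction system
# (B = target product, C = by-product). The "posterior" over the rate
# constants is an illustrative placeholder, and no emulator is used.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(3)

def rhs(t, z, k1, k2):
    a, b, c = z
    return [-k1 * a, k1 * a - k2 * b, k2 * b]

def amounts_at_end(a0, k1, k2, t_end=2.0):
    """Solve the kinetics from initial amount a0 of A; return (A, B, C)."""
    sol = solve_ivp(rhs, (0, t_end), [a0, 0.0, 0.0], args=(k1, k2))
    return sol.y[:, -1]

def prob_in_spec(a0, b_min=0.5, c_max=0.3, n_draws=500):
    """P(target B >= b_min and by-product C <= c_max) under placeholder
    lognormal 'posterior' draws of the rate constants."""
    k1 = rng.lognormal(0.0, 0.2, n_draws)
    k2 = rng.lognormal(-1.0, 0.2, n_draws)
    ok = 0
    for k1i, k2i in zip(k1, k2):
        _, b, c = amounts_at_end(a0, k1i, k2i)
        ok += (b >= b_min) and (c <= c_max)
    return ok / n_draws

# Scan the controllable factor (initial amount of A) for a high-probability region.
for a0 in [0.8, 1.0, 1.2]:
    print(a0, prob_in_spec(a0))
```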
Recently, multiple systems estimation (MSE) has been applied to estimate the number of victims of human trafficking in different countries. The estimation procedure consists of a log-linear analysis of a contingency table of population registers and covariates. As the number of potential models increases exponentially with the number of registers and covariates, it is practically impossible to fit and compare all models. Therefore, the model search needs to be restricted to a small subset of all potential models. This paper addresses principles and criteria for model assessment and selection for MSE of human trafficking, with special attention to the sparsity that is typical of human-trafficking data. The concepts are illustrated on data from Slovakia and Romania.
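The following sketch shows the core MSE computation on invented counts (not the Slovak or Romanian data): Poisson log-linear models are fitted to the observable cells of a three-register table, the unobserved cell is estimated from the intercept, and two candidate models are compared by AIC and a hand-computed BIC, criteria whose behaviour under sparsity is exactly what the paper scrutinises.

```python
# Minimal sketch of multiple systems estimation with three registers and
# sparse cells. Counts are invented placeholders, NOT real trafficking data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Observable cells of a 2^3 table; the (0,0,0) cell is unobservable.
cells = pd.DataFrame(
    [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)],
    columns=["A", "B", "C"])
cells["n"] = [61, 34, 15, 7, 3, 1, 0]        # note the sparse higher-order cells

models = {"independence": "n ~ A + B + C",
          "A:B interaction": "n ~ A + B + C + A:B"}
n_obs = cells["n"].sum()
for name, f in models.items():
    fit = smf.glm(f, data=cells, family=sm.families.Poisson()).fit()
    n000 = np.exp(fit.params["Intercept"])   # estimated unobserved cell
    bic = -2 * fit.llf + len(fit.params) * np.log(len(cells))
    print(f"{name}: N_hat = {n_obs + n000:.0f}, "
          f"AIC = {fit.aic:.1f}, BIC = {bic:.1f}")
```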
Bayesian optimal design of experiments is a well-established approach to planning experiments. Briefly, a probability distribution for the responses, known as a statistical model, is assumed, which depends on a vector of unknown parameters. A utility function is then specified which gives the gain in information for estimating the true value of the parameters using the Bayesian posterior distribution. A Bayesian optimal design is given by maximising the expectation of the utility with respect to the joint distribution given by the statistical model and prior distribution for the true parameter values. The approach takes account of the experimental aim via specification of the utility, and of all assumed sources of uncertainty via the expected utility. However, it is predicated on the specification of the statistical model. Recently, a new type of statistical inference, known as Gibbs (or general Bayesian) inference, has been advanced. This is Bayesian-like, in that uncertainty about unknown quantities is represented by a posterior distribution, but does not necessarily rely on specification of a statistical model. Thus the resulting inference should be less sensitive to misspecification of the statistical model. The purpose of this paper is to propose Gibbs optimal design: a framework for optimal design of experiments for Gibbs inference. The concept behind the framework is introduced along with a computational approach to find Gibbs optimal designs in practice. The framework is demonstrated on exemplars including linear models, and experiments with count and time-to-event responses.
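The sketch below illustrates Gibbs inference itself, the ingredient the proposed designs are optimised for, on a toy location problem: an absolute-error loss replaces the log-likelihood, so no noise distribution is specified, and a random-walk Metropolis sampler targets the resulting Gibbs posterior. The loss scale w is a hand-picked placeholder, and the design question is not addressed.

```python
# Minimal sketch of Gibbs (general Bayesian) inference: a posterior built
# from a loss function rather than a likelihood. The loss scale w is a
# hand-picked placeholder, and the data-generating model is deliberately
# left unspecified in the inference.
import numpy as np

rng = np.random.default_rng(4)
y = rng.standard_t(df=3, size=40) + 5.0      # data from an unmodelled process

w = 1.0                                      # loss scale ("learning rate")
def log_gibbs_post(theta):
    """Negative scaled absolute-error loss; flat prior."""
    return -w * np.sum(np.abs(y - theta))

# Random-walk Metropolis targeting the Gibbs posterior.
theta, draws = np.median(y), []
lp = log_gibbs_post(theta)
for _ in range(20000):
    prop = theta + rng.normal(0, 0.3)
    lp_prop = log_gibbs_post(prop)
    if np.log(rng.random()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    draws.append(theta)

print("Gibbs-posterior mean:", np.mean(draws[5000:]))   # after burn-in
```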
A Bayesian design is given by maximising an expected utility over a design space. The utility is chosen to represent the aim of the experiment, and its expectation is taken with respect to all unknowns: responses, parameters and/or models. Although straightforward in principle, there are several challenges to finding Bayesian designs in practice. Firstly, the utility and expected utility are rarely available in closed form and require approximation. Secondly, the design space can be high-dimensional. In the case of intractable likelihood models, these problems are compounded by the fact that the likelihood function, whose evaluation is required to approximate the expected utility, is not available in closed form. A strategy is proposed to find Bayesian designs for intractable likelihood models. It relies on the development of an automatic, auxiliary modelling approach, using multivariate Gaussian process emulators, to approximate the likelihood function. This is then combined with a copula-based approach to approximate the marginal likelihood (a quantity commonly required to evaluate many utility functions). These approximations are demonstrated on examples of stochastic process models involving experimental aims of both parameter estimation and model comparison.
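The following sketch illustrates the emulation idea in heavily simplified form: a simulation-based estimate of the log-likelihood is computed at a few training parameter values, and a Gaussian process emulator is fitted so that later evaluations are cheap. The paper's auxiliary models, multivariate emulators and copula-based marginal-likelihood approximation are all simplified away; the Poisson "intractable" model is a stand-in whose likelihood we pretend is unavailable.

```python
# Schematic sketch of the auxiliary-modelling idea: emulate an expensive,
# simulation-based log-likelihood with a GP. Everything here is illustrative.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

rng = np.random.default_rng(5)
y_obs = rng.poisson(4.0, size=20)            # observed data (toy)

def sim_log_lik(theta, n_sims=300):
    """Simulation-based estimate of the log-likelihood at theta: a Gaussian
    approximation based on a single summary statistic (the sample mean)."""
    sims = rng.poisson(theta, size=(n_sims, y_obs.size))
    s = sims.mean(axis=1)
    mu, sd = s.mean(), s.std(ddof=1)
    s_obs = y_obs.mean()
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((s_obs - mu) / sd) ** 2

# Train the emulator on a handful of expensive evaluations...
theta_train = np.linspace(1.0, 8.0, 12)
ll_train = np.array([sim_log_lik(t) for t in theta_train])
gp = GaussianProcessRegressor(kernel=Matern() + WhiteKernel(), normalize_y=True)
gp.fit(theta_train.reshape(-1, 1), ll_train)

# ...then evaluate the emulated log-likelihood cheaply wherever needed,
# e.g. inside a Monte Carlo approximation of an expected utility.
theta_new = np.array([[3.5], [4.0], [4.5]])
print(gp.predict(theta_new))
```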
Complex models used to describe biological processes in epidemiology and ecology often have computationally intractable or expensive likelihoods. This poses significant challenges for Bayesian inference, and even more so for the design of experiments. Bayesian designs are found by maximising the expectation of a utility function over a design space, and typically this requires sampling from, or approximating, a large number of posterior distributions. This renders approaches adopted in inference computationally infeasible to implement in design. Consequently, optimal design in such fields has been limited to a small number of dimensions or a restricted range of utility functions. To overcome such limitations, we propose a synthetic likelihood-based Laplace approximation for approximating utility functions for models with intractable likelihoods. As will be seen, the proposed approximation is flexible in that a wide range of utility functions can be considered, and it remains computationally efficient in high dimensions. To explore the validity of this approximation, an illustrative example from epidemiology is considered. Then, our approach is used to design experiments with a relatively large number of observations in two motivating applications from epidemiology and ecology.
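A minimal sketch of the central approximation on a toy one-parameter model follows: a synthetic log-posterior is built from simulated summary statistics (with common random numbers so that the surface is smooth in the parameter), maximised numerically, and a finite-difference Hessian at the mode yields a Laplace (normal) approximation from which a simple precision-based utility is computed. All modelling choices are illustrative assumptions, not the paper's applications.

```python
# Minimal sketch of a synthetic-likelihood Laplace approximation to a
# design utility. Model, prior, summary and utility are all toy choices.
import numpy as np
from scipy.optimize import minimize_scalar

def make_synth_log_post(y_obs, seed=6, n_sims=400):
    rng = np.random.default_rng(seed)              # common random numbers:
    u = rng.standard_normal((n_sims, y_obs.size))  # fixed noise -> smooth surface
    def slp(theta):
        sims = theta + u                           # toy model: y = theta + noise
        s = sims.mean(axis=1)                      # summary statistic
        mu, sd = s.mean(), s.std(ddof=1)
        ll = (-0.5 * np.log(2 * np.pi * sd**2)
              - 0.5 * ((y_obs.mean() - mu) / sd) ** 2)   # synthetic log-likelihood
        return ll - 0.5 * theta**2 / 100.0         # vague normal prior
    return slp

rng = np.random.default_rng(7)
y_obs = 2.0 + rng.standard_normal(25)
slp = make_synth_log_post(y_obs)

# Laplace step: find the mode, then a finite-difference second derivative.
opt = minimize_scalar(lambda t: -slp(t), bounds=(-10, 10), method="bounded")
mode, h = opt.x, 1e-3
hess = (slp(mode + h) - 2 * slp(mode) + slp(mode - h)) / h**2
post_var = -1.0 / hess                             # normal approx. at the mode
utility = -0.5 * np.log(post_var)                  # e.g. half log posterior precision
print(f"mode {mode:.3f}, Laplace variance {post_var:.4f}, utility {utility:.3f}")
```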