This chapter discusses classical sample size determination. Success probability (SP) estimation for adapting/estimating the sample size is then presented, and some practical tools are introduced: launch criteria for phase III; the variability of sample size estimates; the averaged SP of phase III alone, once it has been launched; and the averaged SP of phases II and III considered jointly. Some frequentist conservative sample size estimation (CSSE) strategies are then presented, leading to the optimal CSSE strategy. Some Bayesian CSSE strategies are also presented, and the performances of all these strategies are compared. The one-tailed setting is adopted for presenting concepts and for showing examples and results, and balanced sampling is assumed. Generalization to unbalanced sampling is straightforward. Sample size estimation for the two-tailed setting is also discussed. Controlled Vocabulary Terms: Bayes estimator; effect size
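As a minimal sketch of classical sample size determination in the one-tailed, balanced setting, the per-group sample size for a two-sample comparison of means under a normal approximation can be computed as n = 2 (z_{1-alpha} + z_{1-beta})^2 / delta^2. The function name, the default values, and the use of a standardized effect size `delta` are assumptions made for illustration; they are not taken from the chapter.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, alpha=0.025, power=0.9):
    """Per-group sample size for a one-tailed, balanced two-sample z-test.

    delta : assumed standardized effect size (difference in means / common SD)
    alpha : one-tailed type I error
    power : desired power (1 - type II error)
    """
    if delta <= 0:
        raise ValueError("delta must be positive in the one-tailed setting")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)  # critical value of the test
    z_beta = z.inv_cdf(power)       # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / delta) ** 2)


if __name__ == "__main__":
    print(sample_size_per_group(0.5))             # power 90%
    print(sample_size_per_group(0.5, power=0.8))  # power 80%
```

Because the required size scales with 1/delta^2, small errors in the assumed effect size translate into large changes in n, which is one motivation for treating the sample size as an estimate with its own variability.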
A new method for assessing statistical reliability, stability calculation, is described and applied to statistical tests. Like the power of statistical tests, stability calculation enables us to assess their reliability. It belongs to the category of subsampling techniques, which rely on subsamples taken from the original sample. It provides descriptive, non-inferential results in the form of the stability percentage: the percentage of sample elements that must be removed in order to change the results obtained with the original sample. The higher the stability percentage, the more reliable the statistical test. Stability percentage and power are correlated. Stability calculation also provides information about the elements in the sample, namely the most powerful points.
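The idea of a stability percentage can be illustrated with a rough sketch. The code below is not the authors' algorithm: it assumes a one-sample, one-tailed test of a positive mean under a normal approximation, and it uses a greedy heuristic (hypothetical, chosen for simplicity) that removes, one at a time, the observation lending most support to the original conclusion until the conclusion changes.

```python
import math
from statistics import NormalDist, mean, stdev


def is_significant(sample, alpha=0.05):
    """One-tailed test of H0: mean <= 0, normal approximation (assumption)."""
    n = len(sample)
    s = stdev(sample)
    if s == 0:
        return mean(sample) > 0  # degenerate case: no variability
    z_stat = mean(sample) / (s / math.sqrt(n))
    return z_stat > NormalDist().inv_cdf(1 - alpha)


def stability_percentage(sample, alpha=0.05):
    """Percentage of points removed (greedily) before the test result flips.

    Returns None if the result never changes while at least two
    observations remain.
    """
    n = len(sample)
    original = is_significant(sample, alpha)
    remaining = sorted(sample)
    removed = 0
    while len(remaining) > 2:
        # Drop the point that most supports the original conclusion:
        # the largest if the test was significant, the smallest otherwise.
        remaining.pop() if original else remaining.pop(0)
        removed += 1
        if is_significant(remaining, alpha) != original:
            return 100 * removed / n
    return None


if __name__ == "__main__":
    print(stability_percentage([-1, -1, -1, -1, 5, 5, 5, 5]))
```

In this sketch, the points removed first play the role of the most influential elements of the sample; a fuller treatment would examine subsamples more systematically rather than follow a single greedy path.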
A quantitative evaluation of individual and collective ethics is proposed here, with the aim of providing a tool for sample size determination/estimation that goes further than the standard power setting of 80–90%. Individual ethics deals with issues that concern the patients enrolled in the trial, whereas collective ethics concerns the patients who are not enrolled in the trial and who might benefit from a positive result. The global ethical utility (GEU) of a phase III trial is introduced here as the sum of the individual and collective ethical utilities, and it can be viewed as a function of the sample size. The GEU model is based on the extent of the efficacy of the treatments under study, on the quality of life (QoL) of the patients being treated, and on the effects of potential adverse reactions; it accounts for the duration of the periods of interest and for the size of the population groups, and it also embeds the experimental power. This work argues the case for adopting GEU in sample size determination. The sample size that maximizes GEU can be adopted for planning the trial, even when it provides a power value outside the classical range [.8, .9]. Alternatively, among the sample sizes based on power values of 80% and 90%, the one providing the highest GEU can be adopted. Intuitively, when a treatment is assumed to work well, to have few adverse effects, and to improve the QoL of the ill population for a considerable amount of time, collective ethics may prevail, giving ethically optimal sample sizes larger than usual and, consequently, quite high power values (e.g. 99%). Conversely, medium, though still clinically meaningful, levels of effect, considerable adverse reactions, and limited improvement in life expectancy and QoL might shift the ethical balance toward individual ethics and give an ethically optimal sample size providing a power lower than standard values (e.g. 70%).
Some examples and an application in the cardiovascular area, including sensitivity analyses of the results based on the so‐called Bayesian “assurance” technique, are also discussed. Several possible extensions of the model related to particular clinical frameworks are also presented.
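The trade-off between individual and collective ethics can be sketched with a deliberately simplified toy utility. This is not the GEU model of the paper: the linear burden on enrolled patients, the benefit proportional to power times population size, and all parameter names and values are hypothetical assumptions, used only to show how an ethically optimal sample size can arise as the maximizer of a utility that embeds power.

```python
import math
from statistics import NormalDist

_Z = NormalDist()


def power(n, delta, alpha=0.025):
    """Power of a one-tailed, balanced two-sample z-test with n per group."""
    return _Z.cdf(delta * math.sqrt(n / 2) - _Z.inv_cdf(1 - alpha))


def toy_utility(n, delta, burden, benefit, population):
    """Toy stand-in for a global utility (NOT the paper's GEU model)."""
    individual = -burden * 2 * n  # hypothetical burden on the 2n enrolled patients
    collective = benefit * population * power(n, delta)  # hypothetical benefit
    return individual + collective


def optimal_n(delta, burden, benefit, population, n_max=2000):
    """Per-group sample size maximizing the toy utility on a grid."""
    return max(range(2, n_max + 1),
               key=lambda n: toy_utility(n, delta, burden, benefit, population))


if __name__ == "__main__":
    for pop in (1_000, 100_000):
        n = optimal_n(delta=0.5, burden=1.0, benefit=1.0, population=pop)
        print(pop, n, round(power(n, 0.5), 3))
```

Under these assumptions, a larger population of potential beneficiaries pushes the maximizer toward larger sample sizes and higher power, which mirrors the intuition described in the abstract.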
A study is presented on the robustness of adapting the sample size of a phase III trial on the basis of existing phase II data, when the phase III effect size is lower than the phase II effect size. A criterion of clinical relevance for phase II results is applied in order to launch phase III, where data from phase II cannot be included in the statistical analysis. The adaptation consists in adopting the conservative approach to sample size estimation, which takes into account the variability of phase II data. Some conservative sample size estimation strategies, Bayesian and frequentist, are compared with the calibrated optimal γ conservative strategy (viz. COS), which is the best performer when the phase II and phase III effect sizes are equal. The Overall Power (OP) of these strategies and the mean square error (MSE) of their sample size estimators are computed under different scenarios, in the presence of the structural bias due to a lower phase III effect size, in order to evaluate the robustness of the strategies. When the structural bias is quite small (i.e., the ratio of phase III to phase II effect size is greater than 0.8), and when some operating conditions for applying sample size estimation hold, COS can still provide acceptable results for planning phase III trials, although its OP is higher in the absence of bias. The main results concern the introduction of a correction, which affects only the sample size estimates and not the launch probabilities, for balancing the structural bias. In particular, the correction is based on a postulation of the structural bias; hence, it is more intuitive and easier to use than corrections based on modifying the Type I and/or Type II errors. A comparison of corrected conservative sample size estimation strategies is performed in the presence of a quite small bias. When the postulated correction is right, COS provides good OP and the lowest MSE.
Moreover, the OPs of COS are even higher than those observed without bias, thanks to a higher launch probability and a similar estimation performance. The structural bias can therefore be exploited to improve sample size estimation performance. When the postulated correction is smaller than necessary, COS is still the best performer and continues to work well. A correction higher than necessary should be avoided.
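The conservative approach can be sketched as follows, under stated assumptions: the phase II effect size estimate is replaced by the lower bound of its one-sided confidence interval at level gamma before being plugged into the classical sample size formula. The normal approximation with standard error sqrt(2/m) for a standardized effect size estimated from m patients per group, the function name, and the default gamma are illustrative assumptions; the calibration of gamma used by COS and the bias correction are not reproduced here.

```python
import math
from statistics import NormalDist


def conservative_sample_size(d_phase2, m_phase2, gamma=0.75,
                             alpha=0.025, power=0.9):
    """Per-group phase III sample size from a conservative effect size.

    d_phase2 : standardized effect size observed in phase II
    m_phase2 : phase II sample size per group
    gamma    : amount of conservativeness (gamma = 0.5 gives the
               plain pointwise estimate)
    Returns None when the conservative effect size is not positive.
    """
    z = NormalDist()
    # Lower bound of the one-sided confidence interval at level gamma.
    d_lower = d_phase2 - z.inv_cdf(gamma) * math.sqrt(2 / m_phase2)
    if d_lower <= 0:
        return None  # no finite sample size can be planned
    z_sum = z.inv_cdf(1 - alpha) + z.inv_cdf(power)
    return math.ceil(2 * (z_sum / d_lower) ** 2)


if __name__ == "__main__":
    print(conservative_sample_size(0.5, 60, gamma=0.5))
    print(conservative_sample_size(0.5, 60, gamma=0.75))
```

Raising gamma shrinks the effect size used for planning and therefore inflates the estimated sample size, which protects power against an over-optimistic phase II estimate at the cost of a larger trial.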
This chapter provides some basic statistical concepts and tools. Point estimation and confidence interval estimation are introduced; conservative estimation follows. An explanation is then given of what statistical tests are. The power function of a test is defined, together with the errors of the first and second type. The p-value is presented as an index for evaluating the outcome of the test. Some applications in the context of clinical trials are shown, and numerical examples and figures are provided. The probability of success in a trial is illustrated, including how to estimate it. Superiority tests are adopted first to illustrate the above topics. Then, inequality tests are considered. Finally, there is a brief section on how success probability estimation can be derived for tests of clinical superiority, of non-inferiority, and for equality tests. Controlled Vocabulary Terms: confidence interval; statistical data; type I error; type II error
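The relation between the p-value, the power function, and a plug-in estimate of the success probability can be sketched for a one-tailed superiority test. The two-sample z-test with a standardized effect size, the function names, and the plug-in estimator (the power function evaluated at the observed effect size) are assumptions made for illustration, not a reproduction of the chapter's derivations.

```python
import math
from statistics import NormalDist

_Z = NormalDist()


def p_value(z_stat):
    """One-tailed p-value of an observed z statistic."""
    return 1 - _Z.cdf(z_stat)


def power(n, delta, alpha=0.025):
    """Power function: probability of rejecting H0 with n per group
    when the true standardized effect size is delta."""
    return _Z.cdf(delta * math.sqrt(n / 2) - _Z.inv_cdf(1 - alpha))


def estimated_success_probability(n, d_observed, alpha=0.025):
    """Plug-in SP estimate: the power function at the observed effect."""
    return power(n, d_observed, alpha)


if __name__ == "__main__":
    n, d = 85, 0.5
    print("type I error :", round(power(n, 0.0), 4))    # equals alpha
    print("power        :", round(power(n, d), 4))
    print("type II error:", round(1 - power(n, d), 4))
    print("p-value      :", round(p_value(2.3), 4))
```

Evaluating the power function at delta = 0 returns the type I error, while one minus the power at a positive delta gives the type II error, so both error types can be read off the same function.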