    Abstract:
    We present a Bayesian jackknife test for assessing the probability that a data set contains biased subsets and, if so, which of the subsets are likely to be biased. The test can be used to assess the presence and likely source of statistical tension between different measurements of the same quantities in an automated manner. Under certain broadly applicable assumptions, the test is analytically tractable. We also provide an open-source code, chiborg, that performs both analytic and numerical computations of the test on general Gaussian-distributed data. After exploring the information-theoretic aspects of the test and its performance with an array of simulations, we apply it to data from the Hydrogen Epoch of Reionization Array (HERA) to assess whether different sub-seasons of observing can justifiably be combined to produce a deeper 21 cm power spectrum upper limit. We find that, with a handful of exceptions, the HERA data in question are statistically consistent and this decision is justified. We conclude by pointing out the wide applicability of this test, including to CMB experiments and the H0 tension.
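    For intuition, here is a minimal sketch of the kind of Bayesian model comparison such a test performs. This is illustrative Python only (the function name and the simplified one-bias-at-a-time model are ours, not the chiborg API), assuming independent Gaussian subset estimates with known errors and a Gaussian prior of width tau on any bias:

```python
import numpy as np
from scipy import stats

def jackknife_posteriors(x, sigma, tau=1.0):
    """Posterior probability of H_null (no subset biased) and of each
    H_i (subset i biased), in a simplified one-bias-at-a-time model.
    Under H_null, x[i] ~ N(mu, sigma[i]^2) with a common mean mu; under
    H_i, subset i carries an extra Gaussian bias with prior width tau,
    marginalized analytically by inflating its variance to
    sigma[i]^2 + tau^2.  mu is estimated by inverse-variance weighting."""
    x, sigma = np.asarray(x, float), np.asarray(sigma, float)
    n = len(x)
    w = 1.0 / sigma**2
    log_ev = np.empty(n + 1)                    # [H_null, H_1, ..., H_n]
    mu_all = np.sum(w * x) / np.sum(w)          # weighted mean under H_null
    log_ev[0] = stats.norm.logpdf(x, mu_all, sigma).sum()
    for i in range(n):
        keep = np.arange(n) != i
        mu_i = np.sum(w[keep] * x[keep]) / np.sum(w[keep])
        log_ev[i + 1] = (stats.norm.logpdf(x[keep], mu_i, sigma[keep]).sum()
                         + stats.norm.logpdf(x[i], mu_i, np.hypot(sigma[i], tau)))
    post = np.exp(log_ev - log_ev.max())        # equal prior odds
    return post / post.sum()

# Four subset estimates with unit errors; the last sits 5 sigma off.
print(jackknife_posteriors([0.1, -0.2, 0.05, 5.0], [1, 1, 1, 1], tau=3.0))
```

    With equal prior odds, the posterior concentrates on H_null when the subsets scatter as their error bars predict, and on the hypothesis flagging the discrepant subset otherwise.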
    Keywords:
    Jackknife resampling
    Statistical power
    Null hypothesis
    Scrutiny
    Alternative hypothesis
    p-value
    Null model
    Citations (12)
    Calculations of the power of statistical tests are important in planning research studies (including meta-analyses) and in interpreting situations in which a result has not proven statistically significant. The authors describe procedures to compute the statistical power of fixed- and random-effects tests of the mean effect size, tests for heterogeneity (or variation) of effect size parameters across studies, and tests for contrasts among effect sizes of different studies. Examples are given using two published meta-analyses, illustrating that statistical power is not always high in meta-analysis.
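    As a sketch of the fixed-effect case, the power of the two-sided test that the mean effect size is zero follows the standard normal-theory calculation (fixed_effect_power is an illustrative helper of ours; v_i are the within-study variances of the k effect estimates and mu1 the assumed true mean effect):

```python
import numpy as np
from scipy.stats import norm

def fixed_effect_power(mu1, v_i, alpha=0.05):
    """Power of the two-sided fixed-effect test of zero mean effect.
    The weighted-mean estimate has variance v = 1 / sum(1/v_i); under
    the alternative its z statistic is centered on mu1 / sqrt(v)."""
    v = 1.0 / np.sum(1.0 / np.asarray(v_i, float))
    lam = mu1 / np.sqrt(v)          # noncentrality under the alternative
    c = norm.ppf(1 - alpha / 2)     # two-sided critical value
    return norm.sf(c - lam) + norm.cdf(-c - lam)

# Ten studies, each with effect-size variance 0.04, true mean effect 0.2:
print(fixed_effect_power(0.2, [0.04] * 10))   # ~0.89 -- high but not certain
```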
    Statistical power
    Statistical Analysis
    Heterogeneity
    Statistical theory
    Citations (657)
    Although courts have incorporated statistical hypothesis testing into their evaluation of numerical evidence in a variety of cases, they have primarily focused on one aspect of a statistical analysis: whether or not the result is 'statistically significant' at the 0.05 or 'two-standard-deviation' level. The theory underlying hypothesis testing is also concerned with the power of the test to detect a meaningful difference. This article shows how the insights provided by power calculations can help courts better interpret and evaluate the statistical analyses submitted in evidence. In particular, the concept of power should help in assessing whether a sample is too small to provide reliable inferences. On the other hand, very large samples can classify minor differences as statistically significant; this occurs when the power of the test at the standard 0.05 level is very high. Requiring significance at a more stringent level, e.g. 0.005, which can be determined from a power calculation, often resolves this problem.
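    A hypothetical numerical illustration of the large-sample point, using a normal-approximation two-proportion test (two_prop_power is our helper, and the figures are invented): when power for the smallest meaningful difference is near 1 even at a stricter level, the stricter level can be required at no cost, screening out trivially small disparities:

```python
import numpy as np
from scipy.stats import norm

def two_prop_power(p1, p2, n, alpha):
    """Normal-approximation power of the two-sided test for a
    difference between two proportions, with n per group."""
    se = np.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
    z = abs(p2 - p1) / se
    c = norm.ppf(1 - alpha / 2)
    return norm.sf(c - z) + norm.cdf(-c - z)

n = 200_000                    # very large sample per group
meaningful = (0.50, 0.52)      # smallest difference deemed meaningful
print(two_prop_power(*meaningful, n, alpha=0.05))    # ~1.0: overpowered
print(two_prop_power(*meaningful, n, alpha=0.005))   # still ~1.0 at 0.005
```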
    Statistical power
    Sample size
    Statistical Analysis
    Statistical theory
    Statistical Inference
    Statistical evidence
    Significance testing
    Citations (15)
    Statistical significance, effect size and statistical power carry great weight in quantitative educational research, and this chapter addresses these three key factors: what they are, what they mean and how to work with them. The chapter cautions against over-reliance on significance testing and raises several concerns about null hypothesis significance testing (NHST). As an introduction to effect size it presents correlational analysis, and it advocates complementing significance testing with reports of effect size. It describes the different measures of effect size used with different statistics and how to calculate them, since measures of effect size, whether in standardized units, original units or unit-free measures, vary according to the statistical tests used. Finally, the chapter introduces statistical power and, in doing so, clarifies Type I and Type II errors, what they mean and how to avoid them.
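    For instance, Cohen's d is a common standardized-unit effect size to report alongside a t-test's p value. A short sketch (cohens_d is our illustrative helper, not from a particular package) shows how a large sample can pair a highly 'significant' p with a negligible effect:

```python
import numpy as np
from scipy import stats

def cohens_d(a, b):
    """Standardized mean difference (Cohen's d) using the pooled
    standard deviation -- an effect size to report with the p value."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) +
                  (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

rng = np.random.default_rng(0)
a = rng.normal(0.00, 1.0, 5000)   # huge samples ...
b = rng.normal(0.06, 1.0, 5000)   # ... tiny true difference
t, p = stats.ttest_ind(a, b)
print(f"p = {p:.3g}, d = {cohens_d(a, b):.2f}")  # small p, negligible d
```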
    Statistical power
    Statistical Analysis
    Alternative hypothesis
    Citations (7)
    In hypothesis testing, the p value is in routine use as a tool for making statistical decisions: it gathers evidence against the null hypothesis. Although a test is supposed to reject the null hypothesis when it is false and fail to reject it when it is true, there is the potential to err by incorrectly rejecting a true null hypothesis, or by failing to reject a false one. These are termed type I and type II errors, respectively. The type I error rate (α) is chosen arbitrarily by the researcher before the start of the experiment and serves as a cutoff that bifurcates the quantitative results into two qualitative groups, 'significant' and 'insignificant'; it is known as the level of significance (α level). The type II error rate (β) is also predetermined, so that the statistical test has enough statistical power (1 − β) to detect a statistically significant difference; the minimum sample size required for the study is then determined so as to achieve adequate power. This approach is potentially flawed, both because the level of significance is an arbitrary cutoff and because the power to detect a difference depends on the sample size. Moreover, the p value says nothing about the magnitude of the difference. One must therefore be aware of these errors and their role in statistical decisions.
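    The sample-size step mentioned above is, for a two-sample comparison of means, the familiar normal-approximation formula; a sketch (n_per_group is an illustrative helper of ours):

```python
from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Minimum n per group for a two-sided two-sample z test to detect
    a mean difference delta, given common sd sigma, alpha and power:
    n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return 2 * ((z_a + z_b) * sigma / delta) ** 2

# Half-standard-deviation difference at alpha = 0.05 and power 0.8:
print(n_per_group(delta=0.5, sigma=1.0))   # ~62.8, i.e. 63 per group
```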
    Statistical power
    p-value
    Null hypothesis
    Alternative hypothesis
    Cut-off
    Sample size
    Multiple comparisons problem
    Value (mathematics)
    Citations (0)
    In comparing or assessing methods of treatment, it is vital that an appropriate number of patients is selected, in order to ensure that the conclusions drawn are statistically sound. This annotation describes how a statistical power analysis, in the context of hypothesis testing, informs the determination of the optimal sample size for a study. The power of the test indicates how likely it is that the test will correctly produce a statistically significant result.
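    In practice such calculations are often delegated to a library. For example, statsmodels' power classes expose a solve_power method that solves for whichever quantity is left unspecified; the numbers below assume a two-sided two-sample t test with a medium (0.5) standardized effect size:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Solve for the sample size needed per group:
n = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n))   # ~64 per group
# The converse question -- achieved power at a fixed n of 30 per group:
print(analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=30))  # well under 0.8
```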
    Statistical power
    Relevance
    Significance testing
    Statistical Analysis
    Multiple comparisons problem
    Sample size
    Alternative hypothesis
    Statistical hypothesis testing has recently come under fire, as invalid and inadequately powered tests have become more frequent and their results harder to interpret. Following a review of the classic definition of a statistical test of a hypothesis, the article discusses common ways in which this approach goes awry. It concludes with a description of the structure that supports valid and powerful statistical testing, and of the context in which power calculations are properly done.
    Statistical power
    Alternative hypothesis
    Statistical Analysis
    Statistical theory
    Statistical evidence
    The Bonferroni procedure controls the risk of rejecting one or more true null hypotheses no matter how many significance tests are performed, but permits the risk of failing to reject false null hypotheses to grow with the number of tests. The loss of statistical power associated with the use of this procedure is demonstrated, and two options for alleviating the problem are explored. Setting a less stringent significance level for the set of tests is shown to be less effective than increasing the sample size.
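    A quick illustrative calculation (power_z is our helper; two-sided z test, m tests, invented figures) shows both the power loss under Bonferroni and why enlarging the sample recovers it: the standardized effect scales as sqrt(n), so roughly 1.9x the sample restores the original power at the corrected level here:

```python
from scipy.stats import norm

def power_z(delta_over_se, alpha):
    """Two-sided z-test power at standardized effect delta/SE."""
    c = norm.ppf(1 - alpha / 2)
    return norm.sf(c - delta_over_se) + norm.cdf(-c - delta_over_se)

m, effect = 20, 2.8                    # 20 tests, per-test effect 2.8 SE
print(power_z(effect, 0.05))           # ~0.80 uncorrected
print(power_z(effect, 0.05 / m))       # ~0.41 after Bonferroni correction
# Scale the sample by ~1.9 (effect grows by sqrt(1.9)) to recover power:
print(power_z(effect * 1.9 ** 0.5, 0.05 / m))   # ~0.80 again
```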
    Bonferroni correction
    Statistical power
    Null hypothesis
    Multiple comparisons problem
    Alternative hypothesis
    Statistical Analysis
    Citations (18)