Bayesian jackknife tests with a small number of subsets: application to HERA 21 cm power spectrum upper limits
Michael J. Wilensky, Fraser Kennedy, Philip Bull, Joshua S. Dillon, Zara Abdurashidova, Tyrone Adams, James E. Aguirre, Paul Alexander, Zaki S. Ali, Rushelle Baartman, Yanga Balfour, Adam P. Beardsley, G. Bernardi, Tashalee S. Billings, Judd D. Bowman, Richard F. Bradley, Jacob Burba, Steven Carey, C. L. Carilli, Carina Cheng, David R. DeBoer, Eloy de Lera Acedo, Matt Dexter, Nico Eksteen, John Ely, Aaron Ewall-Wice, Nicolas Fagnoni, Randall Fritz, Steven R. Furlanetto, Kingsley Gale-Sides, Brian Glendenning, Deepthi Gorthi, Bradley Greig, Jasper Grobbelaar, Ziyaad Halday, B. J. Hazelton, Jacqueline N. Hewitt, J. Hickish, Daniel Jacobs, Austin Julius, MacCalvin Kariseb, Nicholas S. Kern, Joshua Kerrigan, Piyanat Kittiwisit, Saul A. Kohn, Matthew Kolopanis, Adam Lanman, Paul La Plante, Adrian Liu, Anita Loots, David H. E. MacMahon, Lourence Malan, Cresshim Malgas, Keith Malgas, Bradley Marero, Zachary E. Martinot, Andrei Mesinger, Mathakane Molewa, M. F. Morales, Tshegofalang Mosiane, Steven Murray, Abraham R. Neben, Bojan Nikolic, Hans Nuwegeld, Aaron R. Parsons, Nipanjana Patra, Samantha Pieterse, N. Razavi-Ghods, James Robnett, Kathryn Rosie, Peter Sims, Hilton Swarts, Nithyanandan Thyagarajan, Pieter van Wyngaarden, Peter K. G. Williams, Haoxuan Zheng
Abstract:
We present a Bayesian jackknife test for assessing the probability that a data set contains biased subsets, and, if so, which of the subsets are likely to be biased. The test can be used to assess the presence and likely source of statistical tension between different measurements of the same quantities in an automated manner. Under certain broadly applicable assumptions, the test is analytically tractable. We also provide an open-source code, chiborg, that performs both analytic and numerical computations of the test on general Gaussian-distributed data. After exploring the information-theoretic aspects of the test and its performance with an array of simulations, we apply it to data from the Hydrogen Epoch of Reionization Array (HERA) to assess whether different sub-seasons of observing can justifiably be combined to produce a deeper 21 cm power spectrum upper limit. We find that, with a handful of exceptions, the HERA data in question are statistically consistent and this decision is justified. We conclude by pointing out the wide applicability of this test, including to CMB experiments and the H0 tension.

Keywords: jackknife resampling; statistical power; null hypothesis; alternative hypothesis; p-value; null model
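The abstract does not reproduce chiborg's interface, so the following is a minimal from-scratch sketch of the kind of Gaussian model comparison it describes: each hypothesis flags a pattern of biased subsets, the bias and the shared mean get Gaussian priors, and the analytic marginal likelihood ranks the hypotheses. All names, priors, and numbers here are illustrative assumptions, not chiborg's actual API.

```python
# A minimal sketch of a Bayesian "biased subset" (jackknife) test for
# Gaussian data, in the spirit of the method described in the abstract.
# NOT the chiborg API; the model details below are illustrative assumptions.
import itertools
import numpy as np
from scipy.stats import multivariate_normal

def hypothesis_posteriors(d, noise_cov, mu0=0.0, var_mu=1.0, var_bias=25.0):
    """Posterior probability of each 'which subsets are biased' hypothesis.

    d           : (n,) subset estimates of the same underlying quantity
    noise_cov   : (n, n) noise covariance of the estimates
    mu0, var_mu : Gaussian prior on the common (shared) mean
    var_bias    : prior variance of an additive bias on a flagged subset
    """
    n = len(d)
    ones = np.ones((n, n))
    posts = {}
    # Enumerate all 2^n bias configurations (feasible for a small n).
    for flags in itertools.product([0, 1], repeat=n):
        # Marginal covariance: noise + shared-mean prior + bias prior on flags
        cov = noise_cov + var_mu * ones + var_bias * np.diag(flags)
        evidence = multivariate_normal(mean=np.full(n, mu0), cov=cov).pdf(d)
        posts[flags] = evidence  # uniform prior over hypotheses
    norm = sum(posts.values())
    return {k: v / norm for k, v in posts.items()}

# Example: four subsets of the same measurement, one (index 2) biased high.
rng = np.random.default_rng(0)
d = rng.normal(0.0, 1.0, 4); d[2] += 5.0
posts = hypothesis_posteriors(d, noise_cov=np.eye(4))
best = max(posts, key=posts.get)
print("most probable bias pattern:", best, f"p = {posts[best]:.3f}")
```

Because every ingredient is Gaussian, the evidence for each hypothesis is a closed-form multivariate normal density, which is what makes the test analytically tractable; the 2^n enumeration is only practical for the small numbers of subsets the title refers to.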
Related papers:
Calculations of the power of statistical tests are important in planning research studies (including meta-analyses) and in interpreting situations in which a result has not proven to be statistically significant. The authors describe procedures to compute statistical power of fixed- and random-effects tests of the mean effect size, tests for heterogeneity (or variation) of effect size parameters across studies, and tests for contrasts among effect sizes of different studies. Examples are given using 2 published meta-analyses. The examples illustrate that statistical power is not always high in meta-analysis.
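As a concrete illustration of the fixed-effect calculation described above (a sketch under assumed inputs, not the authors' code), the power of the two-sided test that the mean effect is zero follows from the standard error of the weighted mean effect:

```python
# Hedged sketch of a fixed-effect meta-analysis power calculation of the
# kind the paragraph describes; the effect size and variances are invented.
import numpy as np
from scipy.stats import norm

def fixed_effect_power(effect, variances, alpha=0.05):
    """Power of the two-sided fixed-effect test that the mean effect is zero.

    effect    : assumed true common effect size
    variances : per-study sampling variances of the effect estimates
    """
    weights = 1.0 / np.asarray(variances)
    se = 1.0 / np.sqrt(weights.sum())   # SE of the weighted mean effect
    lam = effect / se                   # noncentrality under the alternative
    c = norm.ppf(1 - alpha / 2)         # two-sided critical value
    return (1 - norm.cdf(c - lam)) + norm.cdf(-c - lam)

# Five small studies with modest per-study precision: power ~ 0.5,
# illustrating that power is not always high in meta-analysis.
print(f"power = {fixed_effect_power(0.2, [0.05] * 5):.2f}")
```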
Although courts have incorporated statistical hypothesis testing into their evaluation of numerical evidence in a variety of cases, they have primarily focused on one aspect of a statistical analysis: whether or not the result is 'statistically significant' at the 0.05 or 'two-standard-deviation' level. The theory underlying hypothesis testing is also concerned with the power of the test to detect a meaningful difference. This article shows that using the insights provided by power calculations should help courts better interpret and evaluate the statistical analyses submitted into evidence. In particular, the concept of power should help in assessing whether a sample is too small to provide reliable inferences. On the other hand, very large samples can classify minor differences as statistically significant; this occurs when the power of the test at the standard 0.05 level is very high. Requiring significance at a more stringent level, e.g., 0.005, which can be determined from a power calculation, often resolves this problem.
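A hedged numerical illustration of that argument: with an assumed trivially small true difference, a two-sample z-test's power climbs toward 1 as the sample grows, and moving the significance level from 0.05 to 0.005 raises the bar. The effect size and sample sizes below are invented for illustration.

```python
# Power of a two-sided two-sample z-test for a tiny difference, showing how
# very large samples make it "significant" at 0.05 but less so at 0.005.
from scipy.stats import norm

def two_sample_power(delta, sigma, n, alpha):
    """Approximate power of a two-sided two-sample z-test (n per group)."""
    se = sigma * (2.0 / n) ** 0.5
    c = norm.ppf(1 - alpha / 2)
    lam = delta / se
    return (1 - norm.cdf(c - lam)) + norm.cdf(-c - lam)

for n in (100, 10_000, 1_000_000):
    p05 = two_sample_power(0.02, 1.0, n, 0.05)    # trivially small difference
    p005 = two_sample_power(0.02, 1.0, n, 0.005)
    print(f"n={n:>9,}  power@0.05={p05:.3f}  power@0.005={p005:.3f}")
```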
Statistical significance, effect size and statistical power carry great weight in quantitative educational research, and this chapter addresses these three key factors. It explains what they are, what they mean and how to work with them. The chapter cautions against over-reliance on significance testing and raises several concerns about null hypothesis significance testing (NHST). In addressing significance testing, and as an introduction to effect size, the chapter introduces correlational analysis and advocates complementing significance testing with the use of effect size. It describes the different measures of effect size used with different statistics and how to calculate them, since measures of effect size, whether in standardized units, original units or unit-free measures, vary according to the statistical tests used to calculate them. The chapter introduces statistical power and, in doing so, clarifies Type I and Type II errors, what they mean and how to avoid them.
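For instance, a standardized mean difference (Cohen's d) can be reported next to the significance test it complements. This sketch uses made-up samples:

```python
# Complementing a t-test with an effect size, per the chapter's advice.
import numpy as np
from scipy import stats

a = np.array([5.1, 4.8, 6.0, 5.5, 4.9, 5.7])
b = np.array([4.2, 4.6, 4.1, 4.9, 4.3, 4.5])

t, p = stats.ttest_ind(a, b)  # significance test
# Pooled-SD standardized mean difference (Cohen's d)
sp = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
             / (len(a) + len(b) - 2))
d = (a.mean() - b.mean()) / sp
print(f"t = {t:.2f}, p = {p:.3f}, Cohen's d = {d:.2f}")
```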
In hypothesis testing, the p-value is in routine use as a tool for making statistical decisions: it gathers evidence with which to reject the null hypothesis. Although the test is supposed to reject the null hypothesis when it is false and fail to reject it when it is true, there is a potential to err by incorrectly rejecting a true null hypothesis or by failing to reject a false one. These are known as type I and type II errors, respectively. The type I error rate (α) is chosen arbitrarily by the researcher before the start of the experiment and serves as a cutoff that bifurcates the quantitative results into two qualitative groups, 'significant' and 'insignificant'; this is known as the level of significance (α level). The type II error rate (β) is also predetermined, so that the statistical test has enough statistical power (1 − β) to detect a statistically significant difference. To achieve adequate statistical power, the minimum sample size required for the study is determined. This approach is potentially flawed, owing to the arbitrary cutoff chosen as the level of significance and to the dependence of statistical power on sample size. Moreover, the p-value says nothing about the magnitude of the difference. One must therefore be aware of these errors and their role in making statistical decisions.
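The sample-size step described above can be made concrete. In this hedged sketch, alpha and the target power fix the normal quantiles, which then determine the minimum per-group n for an assumed detectable difference:

```python
# Minimum sample size per group for a two-sided two-sample z-test, given
# alpha, desired power (1 - beta), and an assumed detectable difference.
import math
from scipy.stats import norm

def min_n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Smallest n per group achieving the desired power."""
    z_a = norm.ppf(1 - alpha / 2)   # type I error cutoff
    z_b = norm.ppf(power)           # controls type II error beta
    return math.ceil(2 * ((z_a + z_b) * sigma / delta) ** 2)

# Classic textbook case: delta = 0.5 SD at 80% power needs ~63 per group.
print(min_n_per_group(delta=0.5, sigma=1.0))
```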
In comparing or assessing methods of treatment it is vital that the appropriate number of patients is selected in order to ensure that the conclusions drawn are statistically viable. This annotation describes the relevance of a statistical power analysis in the context of hypothesis testing to the determination of the optimal sample size of a study. The power of the test indicates how likely it is that the test will correctly produce a statistically significant result.
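One common way to carry out such a power analysis is by simulation; the sketch below (with an invented effect size and candidate sample sizes, not taken from the annotation) estimates power as the fraction of simulated trials in which the test correctly reaches significance:

```python
# Monte Carlo power estimate: simulate many trials at a candidate sample
# size and count how often the t-test correctly detects the true effect.
import numpy as np
from scipy import stats

def simulated_power(n, delta=0.5, sigma=1.0, alpha=0.05, trials=2000, seed=0):
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        a = rng.normal(0.0, sigma, n)    # control group
        b = rng.normal(delta, sigma, n)  # treated group, true effect delta
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / trials

for n in (20, 40, 80):
    print(f"n per group = {n:>2}: power ~ {simulated_power(n):.2f}")
```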
Statistical hypothesis testing has recently come under fire, as invalid and inadequately powered tests have become more frequent and the results of statistical hypothesis testing have become harder to interpret. Following a review of the classic definition of a statistical test of a hypothesis, common ways in which this approach has gone awry are discussed. The review concludes with a description of the structure needed to support valid and powerful statistical testing, and of the context in which power calculations are properly done.
The Bonferroni procedure controls the risk of rejecting one or more true null hypotheses no matter how many significance tests are performed, but permits the risk of failing to reject false null hypotheses to grow with the number of tests. The loss of statistical power associated with the use of this procedure is demonstrated, and two options for alleviating the problem are explored. Setting a less stringent significance level for the set of tests is shown to be less effective than increasing the sample size.
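A short sketch of the power loss being demonstrated, under assumed numbers: holding the per-test effect and sample size fixed, the Bonferroni per-test level alpha/m drags per-test power down as the number of tests m grows:

```python
# Per-test power under a Bonferroni correction as the number of tests grows.
from scipy.stats import norm

def z_test_power(delta, sigma, n, alpha):
    """Two-sided one-sample z-test power at significance level alpha."""
    c = norm.ppf(1 - alpha / 2)
    lam = delta / (sigma / n ** 0.5)
    return (1 - norm.cdf(c - lam)) + norm.cdf(-c - lam)

# Same effect and sample size throughout; only the per-test level changes.
for m in (1, 5, 20, 100):
    p = z_test_power(delta=0.3, sigma=1.0, n=100, alpha=0.05 / m)
    print(f"m = {m:>3} tests: per-test power = {p:.2f}")
```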