Repeated Significance Tests on Accumulating Data

1969 
The general effect of performing repeated significance tests at different stages during the accumulation of a body of data is well known. If the null hypothesis is true and if each significance test is performed at the same nominal level, the probability that at some stage or another the test criterion is significant may be substantially greater than the nominal value. Feller (1940) discussed the possibility that some of the more significant results in card-guessing experiments in extra-sensory perception might be attributed to "optional stopping" at particularly favourable stages during an investigation. The law of the iterated logarithm shows that a test criterion which takes the form of a standardized cumulative sum of deviations from expectation divided by its standard error will, with probability one, eventually reach any preassigned value. Thus, in many common situations a result as highly significant as one desires can be obtained by sufficiently extensive sampling. Robbins (1952) and Anscombe (1954) provide further discussion of this point. The desire to control the error of the first kind, as well as the power of a test procedure, was of course one of the motivations of sequential analysis (Wald, 1947).

More recently the practical relevance of this phenomenon has been called into question. Anscombe (1954) had pointed out that inferences based on likelihoods or, through likelihoods, on posterior probabilities were unaffected by stopping rules. The contrast between this property and the extreme sensitivity of frequency-type inferences to the stopping rule explains why sequential analysis is a topic of such contention between adherents of different viewpoints (Birnbaum, 1964; Cornfield, 1966; Armitage, 1967).

The exchanges of opinion on these matters have been remarkable for the lack of quantitative information about the optional stopping effect. It has not, for example, been possible to answer questions such as the following.
(a) What is the probability of obtaining a result "significant" at a certain nominal level, within the first 50 tests?
(b) Does the enhancement of the probability of obtaining a significant result reach a noticeably high level only after a very large number of tests?
(c) What is the effect of repeated tests when the null hypothesis is not true?

The purpose of the present paper is to repair some of these gaps in our knowledge without indulging in further discussion of inferential problems. We consider sequential observations of three different distributional forms: binomial, normal and exponential. In the binomial case exact results are obtained by direct calculation of
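
The optional stopping effect raised in questions (a) and (b) can be illustrated with a small simulation. The sketch below is not the paper's method: it is a Monte Carlo approximation under assumptions chosen only for illustration, namely unit-variance normal observations with known variance, a two-sided test at the nominal 5% level (critical value 1.96) applied after every observation, and a true null hypothesis of zero mean. The function name `prob_any_significant` and its parameters are hypothetical.

```python
import math
import random


def prob_any_significant(max_tests, z_crit=1.96, n_sims=20000, seed=0):
    """Estimate P(|S_n| / sqrt(n) > z_crit for some n <= max_tests) under H0.

    S_n is the cumulative sum of n independent N(0, 1) observations, so
    S_n / sqrt(n) is the standardized test criterion after n observations.
    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        s = 0.0
        for n in range(1, max_tests + 1):
            s += rng.gauss(0.0, 1.0)
            # Test at the same nominal level after each new observation.
            if abs(s) > z_crit * math.sqrt(n):
                hits += 1
                break  # "optional stopping": stop at the first significant result
    return hits / n_sims


if __name__ == "__main__":
    for k in (1, 5, 10, 50):
        print(k, round(prob_any_significant(k), 3))
```

With a single test the estimate should sit close to the nominal level, and it should rise as the number of permitted tests grows. The simulation error shrinks as `n_sims` is increased.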