Two-sample Testing for Large, Sparse High-Dimensional Multinomials under Rare/Weak Perturbations.

David L. Donoho, Alon Kipnis

2020

Given two samples from possibly different discrete distributions over a common set of size $N$, consider the problem of testing whether these distributions are identical, vs. the following rare/weak perturbation alternative: the frequencies of $N^{1-\beta}$ elements are perturbed by $r(\log N)/2n$ in the Hellinger distance, where $n$ is the size of each sample. We adapt the Higher Criticism (HC) test to this setting using P-values obtained from $N$ exact binomial tests. We characterize the asymptotic performance of the HC-based test in terms of the sparsity parameter $\beta$ and the perturbation intensity parameter $r$. Specifically, we derive a region in the $(\beta,r)$-plane where the test asymptotically has maximal power, while having asymptotically no power outside this region. Our analysis distinguishes between the cases of dense ($N\gg n$) and sparse ($N\ll n$) contingency tables. In the dense case, the phase transition curve matches that of an analogous two-sample normal means model.
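As a rough illustration of the procedure described in the abstract, the following minimal sketch computes one exact binomial P-value per category and combines them with a Higher Criticism statistic. It is not the authors' implementation. The function names, the pooled null proportion $n_x/(n_x+n_y)$, the normalization by $i/N$, and the cutoff `gamma0 = 0.4` are all assumptions made for this example.

```python
from math import comb, sqrt


def binom_pvalue(x, m, p):
    """Two-sided exact binomial P-value P(|B - m*p| >= |x - m*p|), B ~ Binomial(m, p)."""
    if m == 0:
        return 1.0  # no observations in this category: no evidence either way
    dev = abs(x - m * p)
    total = 0.0
    for k in range(m + 1):
        # small tolerance guards against floating-point ties
        if abs(k - m * p) >= dev - 1e-12:
            total += comb(m, k) * p**k * (1 - p) ** (m - k)
    return min(1.0, total)


def higher_criticism(pvalues, gamma0=0.4):
    """Max over i <= gamma0*N of sqrt(N) * (i/N - p_(i)) / sqrt(i/N * (1 - i/N))."""
    n_tests = len(pvalues)
    ordered = sorted(pvalues)
    upper = max(1, int(gamma0 * n_tests))
    best = float("-inf")
    for i in range(1, upper + 1):
        frac = i / n_tests
        if frac >= 1.0:
            break  # the normalization is undefined at i = N
        z = sqrt(n_tests) * (frac - ordered[i - 1]) / sqrt(frac * (1 - frac))
        best = max(best, z)
    return best


def two_sample_hc(counts_x, counts_y, gamma0=0.4):
    """HC statistic from per-category binomial tests on two count vectors."""
    n_x, n_y = sum(counts_x), sum(counts_y)
    p = n_x / (n_x + n_y)  # assumed null proportion (pooled sample sizes)
    pvals = [binom_pvalue(x, x + y, p) for x, y in zip(counts_x, counts_y)]
    return higher_criticism(pvals, gamma0)
```

Large values of the statistic suggest that a few categories differ between the two samples. A rejection threshold would have to be calibrated separately, for example by simulation under the null.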
