Adaptive Storey's null proportion estimator
Abstract:
False discovery rate (FDR) is a commonly used criterion in multiple testing, and the Benjamini-Hochberg (BH) procedure is arguably the most popular approach with an FDR guarantee. To improve power, the adaptive BH procedure has been proposed, incorporating various null proportion estimators, among which Storey's estimator has gained substantial popularity. The performance of Storey's estimator hinges on a critical hyper-parameter: a pre-fixed configuration lacks power, while existing data-driven hyper-parameters compromise FDR control. In this work, we propose a novel class of adaptive hyper-parameters and establish the FDR control of the associated BH procedure using a martingale argument. Within this class of data-driven hyper-parameters, we present a specific configuration designed to maximize the number of rejections and characterize the convergence of this proposal to the optimal hyper-parameter under a commonly used mixture model. We evaluate our adaptive Storey's null proportion estimator and the associated BH procedure on extensive simulated data and a motivating protein dataset. Our proposal exhibits significant power gains when dealing with a considerable proportion of weak non-nulls or a conservative null distribution.
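For concreteness, here is a minimal sketch of the classical Storey estimator and the adaptive BH procedure it plugs into, i.e. the fixed-hyper-parameter baseline the paper improves on; lambda is held fixed here, whereas the paper selects it adaptively, and the function names are illustrative.

    import numpy as np

    def storey_pi0(pvals, lam=0.5):
        # Storey's null proportion estimate with the usual finite-sample
        # correction: pi0_hat(lam) = (1 + #{p > lam}) / (m * (1 - lam)).
        pvals = np.asarray(pvals)
        m = len(pvals)
        return min(1.0, (1.0 + np.sum(pvals > lam)) / (m * (1.0 - lam)))

    def adaptive_bh(pvals, alpha=0.05, lam=0.5):
        # Step-up BH at level alpha / pi0_hat: reject the k smallest
        # p-values, where k = max{i : p_(i) <= i * alpha / (m * pi0_hat)}.
        pvals = np.asarray(pvals)
        m = len(pvals)
        pi0 = storey_pi0(pvals, lam)
        order = np.argsort(pvals)
        below = np.nonzero(pvals[order] <= alpha * np.arange(1, m + 1) / (m * pi0))[0]
        reject = np.zeros(m, dtype=bool)
        if below.size:
            reject[order[:below[-1] + 1]] = True
        return reject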
Related Papers:
Two new estimators are proposed: the first for the proportion of true null hypotheses and the second for the false discovery rate (FDR) of one-step multiple testing procedures (MTPs). They generalize similar estimators developed jointly by Storey, Taylor and Siegmund and can be applied also to discrete $p$-values whose null distributions dominate the uniform distribution. For the new estimator of the FDR, we establish its simultaneous asymptotic conservativeness and justify formally the stopping time property of its threshold for $p$-values not necessarily independent or continuous. Our empirical studies show that, when applied to the aforementioned $p$-values, both of our estimators usually outperform their competitors (considered in this work) except that the first one may under-estimate the target when the needed tuning parameters are inappropriately chosen. The methodology of our work easily extends to other types of discrete $p$-values. We illustrate the improvement in power our estimators can induce by applying them to next-generation sequencing count data.
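As a point of reference, here is a sketch of the continuous-p-value FDR estimator of Storey, Taylor and Siegmund that this abstract says is being generalized; the discrete-p-value corrections, which exploit the attainable supports of the p-values, are the paper's contribution and are not reproduced here.

    import numpy as np

    def fdr_estimate(pvals, t, lam=0.5):
        # Storey-Taylor-Siegmund style estimate of the FDR of the one-step
        # procedure that rejects all p-values <= t:
        # FDR_hat(t) = pi0_hat(lam) * m * t / max(1, #{p <= t}).
        pvals = np.asarray(pvals)
        m = len(pvals)
        pi0 = min(1.0, (1.0 + np.sum(pvals > lam)) / (m * (1.0 - lam)))
        return pi0 * m * t / max(1, int(np.sum(pvals <= t)))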
Classical multiple testing theory prescribes the null distribution, which is often too stringent an assumption for today's large-scale experiments. This paper presents theoretical foundations for understanding the limitations caused by ignoring the null distribution, and how it can be properly learned from the (same) dataset, when possible. We explore this issue in the case where the null distributions are Gaussian with unknown rescaling parameters (mean and variance) and the alternative distribution is left arbitrary. While an oracle procedure in that case is the Benjamini-Hochberg procedure applied with the true (unknown) null distribution, we pursue the aim of building a procedure that asymptotically mimics the performance of the oracle (AMO in short). Our main result states that an AMO procedure exists if and only if the sparsity parameter $k$ (number of false nulls) is of order less than $n/\log(n)$, where $n$ is the total number of tests. Further sparsity boundaries are derived for general location models where the shape of the null distribution is not necessarily Gaussian. Given our impossibility results, we also pursue a weaker objective: finding a confidence region for the oracle. To this end, we develop a distribution-dependent confidence region for the null distribution. As practical by-products, this provides a goodness-of-fit test for the null distribution, as well as a visual method for assessing the reliability of empirical null multiple testing methods. Our results are illustrated with numerical experiments and a companion vignette \cite{RVvignette2020}.
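The paper's AMO construction is not reproduced here, but a naive plug-in along the lines the abstract discusses looks as follows: fit the null mean and variance robustly from the bulk (reasonable when the false nulls are sparse) and run BH on the rescaled p-values. A hedged sketch, not the authors' procedure.

    import numpy as np
    from scipy import stats

    def empirical_null_bh(x, alpha=0.05):
        # Fit N(mu, sigma^2) to the bulk of the test statistics via
        # median/MAD, which are driven by the nulls when false nulls are
        # sparse; then apply BH to the rescaled two-sided p-values.
        x = np.asarray(x)
        mu = np.median(x)
        sigma = stats.median_abs_deviation(x, scale="normal")
        pvals = 2 * stats.norm.sf(np.abs((x - mu) / sigma))
        m = len(pvals)
        order = np.argsort(pvals)
        below = np.nonzero(pvals[order] <= alpha * np.arange(1, m + 1) / m)[0]
        reject = np.zeros(m, dtype=bool)
        if below.size:
            reject[order[:below[-1] + 1]] = True
        return reject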
An important issue raised by Efron in the context of large-scale multiple comparisons is that in many applications, the usual assumption that the null distribution is known is incorrect, and seemingly negligible differences in the null may result in large differences in subsequent studies. This suggests that a careful study of estimation of the null is indispensable. In this article we consider the problem of estimating a null normal distribution, and a closely related problem, estimation of the proportion of nonnull effects. We develop an approach based on the empirical characteristic function and Fourier analysis. The estimators are shown to be uniformly consistent over a wide class of parameters. We investigate the numerical performance of the estimators using both simulated and real data. In particular, we apply our procedure to the analysis of breast cancer and human immunodeficiency virus microarray datasets. The estimators perform favorably compared with existing methods.
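To convey the idea (not the authors' estimators, which use a carefully chosen frequency growing with the sample size), here is a naive characteristic-function sketch: for null observations $X = \mu + \sigma Z$, $\varphi(t) = e^{i\mu t - \sigma^2 t^2/2}$, so when the non-null contribution at frequencies $t$ and $2t$ is negligible, the modulus and phase of the empirical characteristic function recover $\sigma^2$ and $\mu$. The choice of frequencies below is illustrative.

    import numpy as np

    def naive_ecf_null_fit(x, t=1.0):
        # Empirical characteristic function phi_hat(t) = mean(exp(i t X)).
        # For the null component, |phi(t)| ~ pi0 * exp(-sigma^2 t^2 / 2)
        # and arg phi(t) ~ mu * t; differencing -log|phi| at t and 2t
        # cancels the unknown pi0 factor.
        x = np.asarray(x)
        phi1 = np.mean(np.exp(1j * t * x))
        phi2 = np.mean(np.exp(1j * 2 * t * x))
        mu_hat = np.angle(phi1) / t
        sigma2_hat = 2.0 * (np.log(np.abs(phi1)) - np.log(np.abs(phi2))) / (3.0 * t ** 2)
        return mu_hat, sigma2_hat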
The current approaches to false discovery rate (FDR) control in multiple hypothesis testing are usually based on the null distribution of a test statistic. However, all types of null distributions, including the theoretical, permutation-based and empirical ones, have some inherent drawbacks. For example, the theoretical null might fail because of improper assumptions on the sample distribution. Here, we propose a null distribution-free approach to FDR control for large-scale two-groups hypothesis testing. This approach, named $\textit{target-decoy procedure}$, simply builds on the ordering of tests by some statistic/score, the null distribution of which is not required to be known. Competitive decoy tests are constructed by permutations of original samples and are used to estimate the false target discoveries. We prove that this approach controls the FDR when the statistics are independent between different tests. Simulation demonstrates that it is more stable and powerful than two existing popular approaches. Evaluation is also made on a real dataset of gene expression microarray.
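A minimal sketch of the competitive counting at the heart of such a target-decoy procedure, assuming larger scores are more significant; the +1 in the numerator is the usual finite-sample correction, and the exact form used by the authors may differ.

    import numpy as np

    def target_decoy_threshold(target_scores, decoy_scores, alpha=0.05):
        # At cutoff s, the decoy count estimates the false target
        # discoveries: FDR_hat(s) = (1 + #{decoy >= s}) / max(1, #{target >= s}).
        # Return the smallest cutoff with FDR_hat <= alpha, which
        # maximizes the number of target discoveries.
        target = np.asarray(target_scores)
        decoy = np.asarray(decoy_scores)
        for s in np.sort(np.concatenate([target, decoy])):
            fdr = (1 + np.sum(decoy >= s)) / max(1, int(np.sum(target >= s)))
            if fdr <= alpha:
                return s
        return np.inf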
This paper introduces the \texttt{FDR-linking} theorem, a novel technique for understanding \textit{non-asymptotic} FDR control of the Benjamini--Hochberg (BH) procedure under arbitrary dependence of the $p$-values. This theorem offers a principled and flexible approach to linking all $p$-values and the null $p$-values from the FDR control perspective, suggesting a profound implication that, to a large extent, the FDR of the BH procedure relies mostly on the null $p$-values. To illustrate the use of this theorem, we propose a new type of dependence only concerning the null $p$-values, which, while strictly \textit{relaxing} the state-of-the-art PRDS dependence (Benjamini and Yekutieli, 2001), ensures the FDR of the BH procedure below a level that is independent of the number of hypotheses. This level is, furthermore, shown to be optimal under this new dependence structure. Next, we present a concept referred to as \textit{FDR consistency} that is weaker but more amenable than FDR control, and the \texttt{FDR-linking} theorem shows that FDR consistency is completely determined by the joint distribution of the null $p$-values, thereby reducing the analysis of this new concept to the global null case. Finally, this theorem is used to obtain a sharp FDR bound under arbitrary dependence, which improves the $\log$-correction FDR bound (Benjamini and Yekutieli, 2001) in certain regimes.
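For reference, the BH step-up rule and the Benjamini-Yekutieli log-correction bound under arbitrary dependence that the abstract's final result sharpens, with $p_{(1)} \le \dots \le p_{(m)}$ the ordered $p$-values:

    k^* = \max\{\, i : p_{(i)} \le i q / m \,\}, \qquad
    \text{reject } H_{(1)}, \dots, H_{(k^*)}, \qquad
    \mathrm{FDR}_{\mathrm{BH}} \;\le\; q \sum_{i=1}^{m} \frac{1}{i} \;\approx\; q \log m .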