For multiple testing, we introduce Storey-type FDR procedures and the concept of a "regular estimator of the proportion of true nulls". We show that the rejection threshold of a Storey-type FDR procedure is a stopping time with respect to the backward filtration generated by the p-values, and that, when the estimator of the proportion of true nulls is regular, a Storey-type FDR estimator evaluated at this rejection threshold equals the pre-specified FDR level. These results hold regardless of the dependence among the p-values or the types of their distributions. They directly imply that a Storey-type FDR procedure is conservative when the null p-values are independent and uniformly distributed.
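As a rough illustration of the construction discussed above (a minimal sketch, not the paper's exact procedure or its regular estimator), the classical Storey-type recipe estimates the proportion of true nulls with a tuning parameter lambda, forms the FDR estimate FDR_hat(t) = pi0_hat * m * t / max(#{p <= t}, 1), and rejects at the largest candidate threshold where the estimate stays at or below the target level alpha. The function names below are illustrative choices.

```python
import numpy as np

def storey_pi0(pvals, lam=0.5):
    """Storey's estimator of the proportion of true nulls:
    pi0_hat = #{p > lam} / ((1 - lam) * m)."""
    p = np.asarray(pvals, dtype=float)
    return np.sum(p > lam) / ((1.0 - lam) * len(p))

def storey_fdr_procedure(pvals, alpha=0.05, lam=0.5):
    """Reject all p-values <= t*, where t* is the largest observed
    p-value t at which the Storey-type FDR estimate
        FDR_hat(t) = pi0_hat * m * t / max(#{p <= t}, 1)
    does not exceed alpha.  Returns a boolean rejection vector."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    pi0 = storey_pi0(p, lam)
    order = np.argsort(p)
    sorted_p = p[order]
    # at t = sorted_p[k-1], #{p <= t} = k (ignoring ties for simplicity)
    fdr_hat = pi0 * m * sorted_p / np.arange(1, m + 1)
    below = np.nonzero(fdr_hat <= alpha)[0]
    if below.size == 0:
        return np.zeros(m, dtype=bool)
    t_star = sorted_p[below[-1]]
    return p <= t_star
```

With pi0_hat = 1 this reduces to the Benjamini-Hochberg step-up procedure; smaller estimates of the null proportion enlarge the rejection region, which is the source of the adaptive power gain.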
We consider the problem of extracting a low-dimensional, linear latent variable structure from high-dimensional random variables. Specifically, we show that under mild conditions and when this structure manifests itself as a linear space that spans the conditional means, it is possible to consistently recover the structure using only information up to the second moments of these random variables. This finding, specialized to one-parameter exponential families whose variance function is quadratic in their means, allows for the derivation of an explicit estimator of such latent structure. This approach serves as a latent variable model estimator and as a tool for dimension reduction for a high-dimensional matrix of data composed of many related variables. Our theoretical results are verified by simulation studies and an application to genomic data.
False discovery rate (FDR) control in structured hypothesis testing is an important topic in simultaneous inference. Most existing methods that aim to exploit group structure among hypotheses either employ the groupwise mixture model or weight all p-values or hypotheses; consequently, their power can be improved when the groupwise mixture model is inappropriate or when most groups contain only true null hypotheses. Motivated by this, we propose a grouped, selectively weighted FDR procedure, which we refer to as "sGBH". Specifically, without employing the groupwise mixture model, sGBH identifies groups of hypotheses of interest, weights the p-values in those groups only, and tests only the selected hypotheses using the weighted p-values. sGBH subsumes a standard grouped, weighted FDR procedure, which we refer to as "GBH". We provide simple conditions ensuring the conservativeness of sGBH, together with empirical evidence of its much improved power over GBH. The new procedure is applied to a gene expression study.
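To make the grouped-weighting idea concrete, here is a generic sketch in the spirit of GBH (an assumption-laden illustration, not the exact GBH or sGBH weighting scheme): estimate the null proportion within each group, reweight each p-value by pi0_g / (1 - pi0_g), and run Benjamini-Hochberg on the weighted values. The function names and the "+1" finite-sample correction in the group-level estimator are choices made for this sketch.

```python
import numpy as np

def bh_reject(pvals, alpha):
    """Standard Benjamini-Hochberg step-up on a vector of p-values."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    below = np.nonzero(p[order] <= alpha * np.arange(1, m + 1) / m)[0]
    rej = np.zeros(m, dtype=bool)
    if below.size:
        rej[order[:below[-1] + 1]] = True
    return rej

def grouped_weighted_bh(pvals, groups, alpha=0.05, lam=0.5):
    """Hypothetical grouped, weighted BH sketch: estimate pi0 per group
    (Storey formula with a +1 correction), reweight p-values by
    pi0_g / (1 - pi0_g), then apply BH to the weighted values."""
    p = np.asarray(pvals, dtype=float)
    groups = np.asarray(groups)
    weighted = np.empty_like(p)
    for g in np.unique(groups):
        idx = groups == g
        m_g = idx.sum()
        pi0_g = min((np.sum(p[idx] > lam) + 1) / ((1.0 - lam) * m_g), 1.0)
        if pi0_g >= 1.0:
            weighted[idx] = np.inf  # group looks all-null: never reject there
        else:
            weighted[idx] = p[idx] * pi0_g / (1.0 - pi0_g)
    return bh_reject(weighted, alpha)
```

Groups whose estimated null proportion is 1 receive infinite weighted p-values and contribute no rejections, which mimics (crudely) the selective behavior that motivates sGBH.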
We consider multiple testing of the means of many dependent normal random variables that have a principal correlation structure and different variances. We extend Jin's estimator of the proportion of nonzero normal means to this setting and show that the extended estimator is consistent. We also show that the false discovery rate of the adaptive single-step multiple testing procedure employing this estimator can be consistently estimated by its false discovery proportion, and that the rejection threshold of the procedure can be determined explicitly to ensure its conservativeness. The extended estimator and adaptive procedure are applied to multiple testing in an association study based on brain imaging data.
The proportion of true null (or false null) hypotheses is a critical ingredient of an adaptive multiple testing procedure: it directly influences the procedure's false discovery rate, and a tighter estimator of this quantity usually increases the procedure's power. In general, however, it is very hard to estimate this proportion well under strong dependence. For the multiple testing scenario in which the test statistics are jointly normally distributed and we wish to assess which of them have zero means, we construct a consistent estimator of the proportion of test statistics with nonzero means (i.e., the proportion of nonzero normal means) when the correlation matrix of these statistics is known but the dependence among them can be fairly strong. Our estimator is perhaps the first estimator of this proportion with theoretically ensured consistency under strong dependence, and it provides a partial, positive answer to the question raised by Jianqing Fan and his colleagues of whether it is possible to estimate the proportion of nonzero normal means under arbitrary covariance dependence. We demonstrate via numerical studies that the new estimator is highly competitive with existing estimators of this proportion but performs much better in terms of precision and stability when the nonzero normal means are large, or rare and small.
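For intuition about the Fourier-analytic approach behind such estimators, here is a hedged numerical sketch of a Jin (2008)-style phase-function estimator for independent X_j ~ N(mu_j, 1); the extension to known, possibly strong correlation treated in the paper is more involved. The triangular weight, the value of tau, and the grid size are illustrative choices for this sketch. The key identity is that, for a null coordinate (mu_j = 0), the phase function phi(tau; X_j) has expectation exactly 1, while for a nonzero mean its expectation decays toward 0 as tau grows.

```python
import numpy as np

def phase_avg(x, tau, n_grid=2001):
    """phi(tau; x) = int_{-1}^{1} (1 - |xi|) exp(tau^2 xi^2 / 2)
                     cos(tau * xi * x) dxi,
    evaluated on a uniform grid.  The exp factor exactly cancels the
    Gaussian characteristic-function decay under the null mu = 0."""
    xi = np.linspace(-1.0, 1.0, n_grid)
    w = (1.0 - np.abs(xi)) * np.exp(0.5 * tau**2 * xi**2)
    # integrand for every (observation, grid point) pair
    integrand = w * np.cos(tau * np.outer(np.asarray(x, dtype=float), xi))
    dx = xi[1] - xi[0]
    # endpoints contribute 0 because the triangular weight vanishes there,
    # so a plain Riemann sum coincides with the trapezoid rule
    return integrand.sum(axis=1) * dx

def proportion_nonzero(x, tau=2.0):
    """Fourier-type estimate of the proportion of nonzero normal means
    for X_j ~ N(mu_j, 1): pi_hat = 1 - average phase function."""
    return 1.0 - phase_avg(x, tau).mean()
```

Larger tau sharpens the separation between null and non-null coordinates but inflates the variance through the exp(tau^2 xi^2 / 2) factor, which is the bias-variance trade-off such estimators must tune.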
Objective To investigate early postoperative changes in plasma albumin concentration and their correlation with blood inflammatory factors in patients undergoing intra-abdominal surgery. Methods From August 2008 to March 2009, 45 patients undergoing abdominal surgery were divided into three groups of 15 according to the type of operation: cholecystectomy (group A), cholecystectomy plus common bile duct exploration (group B), and radical resection of alimentary tract malignancy (group C). Plasma albumin content and serum IL-6 and TNF-α concentrations were measured before surgery and at 12, 24, 48, and 72 h after operation. Results Plasma albumin content did not change significantly after operation in group A (P > 0.05), whereas it decreased after operation in groups B and C (P < 0.01). Serum IL-6 and TNF-α concentrations increased in group A at 12, 24, and 48 h after operation (P < 0.01); in groups B and C, IL-6 and TNF-α increased at all tested time points after operation (P < 0.01). The postoperative alterations of IL-6 and TNF-α differed significantly among the three groups at all time points (P < 0.01). Plasma albumin content was negatively correlated with the concentrations of IL-6 and TNF-α (r = -0.376, P < 0.001; r = -0.772, P < 0.001). Conclusions Plasma albumin content decreased at the early stage after moderate and major abdominal surgery and was negatively correlated with the concentrations of inflammatory factors at the early stage after abdominal surgery.
Key words: Surgical procedures, operative; Albumins; Tumor necrosis factor-alpha; Interleukin-6
Multiple testing (MT) with false discovery rate (FDR) control has been widely conducted in the "discrete paradigm", where p-values have discrete and heterogeneous null distributions. However, in this scenario existing FDR procedures often lose power and may yield unreliable inference, and there does not seem to be an FDR procedure for it that partitions hypotheses into groups, employs data-adaptive weights, and is nonasymptotically conservative. We propose a weighted p-value-based FDR procedure, the "weighted FDR (wFDR) procedure" for short, for MT in the discrete paradigm that efficiently adapts to both the heterogeneity and the discreteness of p-value distributions. We theoretically justify the nonasymptotic conservativeness of the wFDR procedure under independence, and show via simulation studies that, for MT based on p-values of the binomial test or Fisher's exact test, it is more powerful than six other procedures. The wFDR procedure is applied to two examples based on discrete data, a drug safety study and a differential methylation study, where it makes more discoveries than two existing methods.
Two new estimators are proposed: the first for the proportion of true null hypotheses and the second for the false discovery rate (FDR) of one-step multiple testing procedures (MTPs). They generalize similar estimators developed jointly by Storey, Taylor and Siegmund and can also be applied to discrete p-values whose null distributions dominate the uniform distribution. For the new estimator of the FDR, we establish its simultaneous asymptotic conservativeness and formally justify the stopping-time property of its threshold for p-values that are not necessarily independent or continuous. Our empirical studies show that, when applied to such p-values, both of our estimators usually outperform their competitors (considered in this work), except that the first one may underestimate the target when the needed tuning parameters are inappropriately chosen. The methodology of this work extends easily to other types of discrete p-values. We illustrate the improvement in power our estimators can induce by applying them to next-generation sequencing count data.