Scalable Inference for Massive Data.

2017 
Abstract

With the availability of massive data sets, making accurate inference at lower computational cost is key to improving scalability. One important scenario is when both the sample size and the number of covariates are large, in contrast to the typical high-dimensional setting with relatively small sample size. In such cases, naive application of existing sparse modeling procedures can be computationally inefficient or even infeasible. To improve scalability, in this paper we suggest the method of inference with partitioned data (IPAD), which divides the entire sample into subsamples, corrects the bias on each subsample, and constructs confidence intervals by aggregating the subsample-based estimates. Compared with inference on the whole sample, such an approach can substantially reduce the computational cost. Furthermore, we establish confidence intervals for the bagging estimator used in the aggregation, a problem that remains largely unexplored in the literature due to the communication barriers between subsamples. Both the computational advantage and the theoretical guarantees of our new method are demonstrated through numerical examples.
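
The split-and-aggregate recipe described above can be illustrated with a short sketch. The snippet below is a minimal illustration, not the authors' exact IPAD procedure: it stands in for the bias correction by refitting least squares on the lasso-selected support within each subsample, and it aggregates the subsample estimates into a t-based confidence interval. The function name ipad_style_ci, the OLS-refit correction, and the across-subsample standard error are all illustrative assumptions.

import numpy as np
from scipy import stats
from sklearn.linear_model import Lasso

def ipad_style_ci(X, y, j, n_splits=10, alpha_level=0.05, lasso_alpha=0.1):
    """Split-and-aggregate confidence interval for coefficient j.

    Illustrative stand-in for IPAD: each subsample's lasso estimate is
    "debiased" by an OLS refit on the selected support (not the paper's
    exact correction), and the K subsample estimates are aggregated.
    """
    n = X.shape[0]
    idx = np.random.permutation(n)
    estimates = []
    for chunk in np.array_split(idx, n_splits):
        Xs, ys = X[chunk], y[chunk]
        lasso = Lasso(alpha=lasso_alpha).fit(Xs, ys)
        support = np.flatnonzero(lasso.coef_)
        support = np.union1d(support, [j])  # ensure the target covariate is kept
        # Crude bias correction: ordinary least squares on the selected support.
        coef, *_ = np.linalg.lstsq(Xs[:, support], ys, rcond=None)
        estimates.append(coef[np.searchsorted(support, j)])
    estimates = np.asarray(estimates)
    center = estimates.mean()
    se = estimates.std(ddof=1) / np.sqrt(n_splits)  # across-subsample standard error
    t = stats.t.ppf(1 - alpha_level / 2, df=n_splits - 1)
    return center - t * se, center + t * se

Because each fit touches only about n/K rows, the per-fit cost drops roughly by the number of subsamples (and the fits parallelize trivially), while the aggregation step needs to communicate only the K scalar estimates, echoing the communication point made in the abstract.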