Subsampling with K Determinantal Point Processes for Estimating Statistics in Large Data Sets

2018 
We present the use of kDPP subsampling for estimation purposes in large data sets. This generalizes the use of DPPs to subsamples of fixed size. Estimation requires the inclusion probabilities of the kDPP. Their evaluation relies on ratios of elementary symmetric polynomials, which adds computational complexity and is generally numerically unstable with traditional algorithms. We propose an approximation of kDPPs that leads to efficient and stable calculation of inclusion probabilities. We present their use in estimation and highlight the improvement over i.i.d. subsampling. An illustration on the estimation of correlation matrices is given.
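To show where the ratios of elementary symmetric polynomials come from, here is a minimal sketch. It assumes the common L-ensemble parameterization of a kDPP, where a k-subset S is drawn with probability proportional to det(L_S). In that setting the inclusion probability of item i is the sum over eigenpairs (λ_n, v_n) of L of λ_n · e_{k-1}(λ without λ_n) / e_k(λ) · v_n(i)². This is the exact computation with the textbook recursion, not the approximation proposed in the paper. That recursion is the kind of traditional algorithm that can overflow or underflow for large data sets. All function names are hypothetical. The Horvitz–Thompson estimator is shown as one standard way to use inclusion probabilities for estimation; it is an assumption, not necessarily the paper's exact estimator.

```python
import itertools
import numpy as np


def elementary_symmetric(lam, k):
    """Return [e_0, ..., e_k] of the values in lam via the standard recursion.

    This naive recursion can overflow/underflow for many or widely spread
    eigenvalues (illustrative only; not the paper's stable approximation).
    """
    e = np.zeros(k + 1)
    e[0] = 1.0
    for x in lam:
        # Update from high to low degree so each value is used once.
        for j in range(k, 0, -1):
            e[j] += x * e[j - 1]
    return e


def kdpp_inclusion_probabilities(L, k):
    """Exact first-order inclusion probabilities of a kDPP with L-ensemble kernel L."""
    lam, V = np.linalg.eigh(L)
    e_k = elementary_symmetric(lam, k)[k]
    # Weight of eigenvector n: lam_n * e_{k-1}(lam without lam_n) / e_k(lam).
    weights = np.empty_like(lam)
    for n in range(len(lam)):
        rest = np.delete(lam, n)
        weights[n] = lam[n] * elementary_symmetric(rest, k - 1)[k - 1] / e_k
    # pi_i = sum_n weights[n] * V[i, n]**2
    return (V ** 2) @ weights


def brute_force_inclusion(L, k):
    """Check by enumeration: P(S) is proportional to det(L_S) over all |S| = k."""
    N = L.shape[0]
    pi = np.zeros(N)
    total = 0.0
    for S in itertools.combinations(range(N), k):
        w = np.linalg.det(L[np.ix_(S, S)])
        total += w
        pi[list(S)] += w
    return pi / total


def horvitz_thompson_total(y_sample, pi_sample):
    """Unbiased estimate of sum_i y_i from a without-replacement sample."""
    return np.sum(np.asarray(y_sample) / np.asarray(pi_sample))


# Toy example: 8 items, assumed positive definite similarity kernel.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
L = X @ X.T + 0.1 * np.eye(8)
k = 3
pi = kdpp_inclusion_probabilities(L, k)
print(pi.sum())  # should be close to k, since every kDPP sample has exactly k items
```

Because the subsample has fixed size k, the inclusion probabilities sum to k. Dividing each sampled value by its inclusion probability makes the estimated total unbiased, whatever the repulsion between items. For a real data set the brute-force check is infeasible. The spectral formula is what makes the computation tractable, and its numerical stability is exactly the issue the paper addresses.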