Differentially Private Top-k Frequent Columns Publication for High-dimensional Data

2019 
Differential privacy (DP) is a promising scheme for releasing the results of statistical queries on sensitive data. This paper focuses on top- $k$ frequent columns publication on sensitive data, with high result utility under differential privacy. Existing works directly select frequent columns from all columns (called one-phase scheme), which is far from ideal due to the large privacy consumption or misjudgments for columns with frequencies close to the frequency of the $k$ th frequent column (called near- $k$ -fluctuation-column). This paper presents a new solution Two Phase Selection (TPS) to carefully choose frequent columns in two phases. The main idea is to classify columns into two distinct categories based on whether it is one near- $k$ -fluctuation-column or not. Frequent columns are chosen from the two categories using different techniques, which is totally different from existing solutions without classifying. Furthermore, by analyzing the distribution of near- $k$ -fluctuation-columns, we introduce a block-centric column-choosing method privacy-free-mechanism (PFM). By partitioning columns into blocks, PFM makes the privacy consumption proportional to the number of blocks, instead of frequent columns. Extensive experiments on real datasets show that our proposals outperform the state-of-the-art techniques for top- $k$ column publication.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    25
    References
    0
    Citations
    NaN
    KQI
    []