Privacy-preserving High-dimensional Data Publishing for Classification

2020 
Abstract With increasing amounts of personal information being collected by various organizations, many privacy models have been proposed for masking the collected data so that the data can be published without releasing individual privacy. However, most existing privacy models are not applicable to high-dimensional data, because of the sparseness of high-dimensional search space. In this paper, we present our solution to release high-dimensional data for privacy preservation and classification analysis. The challenge is how to reduce high dimensions from the perspective of privacy models while preserving as much information as possible for classification. Our proposed approach tackles it by using an idea of vertical partition, which is to vertically divide the raw data into different disjointed subsets of smaller dimensionality. Specifically, our partition metric considers both the correlation between attributes and the proportion of attributes in each subset. Then a generalization method based on local recoding is employed to each subset separately for achieving k-anonymity. Considering the hardness of the optimal implementation of k-anonymity, the local recoding method finds a near-optimal solution with the goal of improving efficiency. The proposed approach is evaluated using two datasets, and experiments show that it outperforms two related approaches in data utility at the same privacy level.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    36
    References
    1
    Citations
    NaN
    KQI
    []