Preserving Differential Privacy and Utility of Non-stationary Data Streams

2018 
Data publishing must balance two competing goals: preserving data privacy and maintaining high data utility. The field of Privacy-Preserving Data Publishing (PPDP) has emerged to address this trade-off, allowing data miners to analyze published data while providing a sufficient degree of privacy. Most existing anonymization platforms deal with static, stationary data, which can be scanned at least once before publishing. However, a growing number of real-world applications generate data streams that can be non-stationary, i.e., subject to concept drift. In this paper, we introduce the MiDiPSA (Microaggregation-based Differential Private Stream Anonymization) algorithm for non-stationary data streams, which aims to satisfy the constraints of k-anonymity, recursive (c, l)-diversity, and differential privacy while minimizing information loss and disclosure risk. The algorithm consists of four main steps: incremental clustering of the incoming tuples; incremental aggregation of the tuples in each cluster according to a pre-defined aggregation function; monitoring of the stream to detect possible concept drifts using the non-parametric Kolmogorov-Smirnov statistical test; and incremental publishing of the anonymized tuples. Whenever a concept drift is detected, the clustering system is updated to reflect the current changes in the stream without affecting the publishing process. In our empirical evaluation, we analyze the performance of various data stream classifiers on the anonymized data and compare it with their performance on the original data. We conduct experiments with seven benchmark data streams and show that our algorithm preserves privacy while providing higher utility than other state-of-the-art anonymization algorithms.
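The drift-monitoring step can be illustrated with a minimal sketch of a two-sample Kolmogorov-Smirnov check on a single numeric attribute. This is not the MiDiPSA implementation: the abstract does not specify window sizes, the monitored attribute, or the significance level, so the function names (`ks_statistic`, `drift_detected`), the two-window setup, and the default `alpha=0.05` are all assumptions made for illustration. The decision threshold uses the standard large-sample approximation for the two-sample test.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    if not reference or not current:
        raise ValueError("both windows must be non-empty")
    ref = sorted(reference)
    cur = sorted(current)
    n, m = len(ref), len(cur)
    d = 0.0
    # Both empirical CDFs are step functions, so the maximum gap
    # is attained at one of the observed values.
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / n
        f_cur = bisect_right(cur, x) / m
        d = max(d, abs(f_ref - f_cur))
    return d


def drift_detected(reference, current, alpha=0.05):
    """Flag a drift when the KS statistic exceeds the asymptotic critical value.

    alpha is a hypothetical significance level, not a value taken from the paper.
    """
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical windows: an older reference window and a recent one
    # whose mean has shifted, simulating a concept drift.
    reference = [rng.gauss(0.0, 1.0) for _ in range(500)]
    shifted = [rng.gauss(2.0, 1.0) for _ in range(500)]
    print("KS statistic:", ks_statistic(reference, shifted))
    print("drift detected:", drift_detected(reference, shifted))
```

In a streaming setting, a check of this kind would be rerun as the recent window slides forward, and a positive result would trigger the clustering update described above. Because the test is non-parametric, it makes no assumption about the shape of the attribute's distribution.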