How to hide the elephant- or the donkey- in the room: Practical privacy against statistical inference for large data

2013 
We propose a practical methodology to protect a user's private data when he wishes to publicly release data that is correlated with his private data, in the hope of deriving some utility. Our approach relies on a general statistical inference framework that captures the privacy threat under inference attacks, given utility constraints. Under this framework, data is distorted before it is released, according to a privacy-preserving probabilistic mapping. This mapping is obtained by solving a convex optimization problem that minimizes information leakage under a distortion constraint. We address a practical challenge encountered when applying this theoretical framework to real-world data: the optimization may become intractable and face scalability issues when the data takes values in a large alphabet or is high-dimensional. Our work makes two major contributions. First, we reduce the optimization size by introducing a quantization step, and show how to generate privacy mappings under quantization. Second, we evaluate our method on a dataset showing correlations between political views and TV viewing habits, and demonstrate that good privacy properties can be achieved with limited distortion, so as not to undermine the original purpose of the publicly released data, e.g., recommendations.
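The pipeline the abstract describes (quantize the public alphabet, then solve a convex program for the privacy mapping) can be sketched concretely. The following is a minimal illustration, not the authors' implementation: it assumes a toy joint distribution over a private attribute S and public data X, a crude posterior-based quantizer (the paper's quantizer may differ), a Hamming distortion, and the cvxpy library; all names (P_sx, Q, D, etc.) are hypothetical. It minimizes the leakage I(S; Y) over the randomized mapping p(y | q), which is convex because relative entropy is jointly convex in its arguments.

```python
# Sketch of the abstract's pipeline under the assumptions above; not the authors' code.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)

# Toy joint distribution p(S, X) over a private attribute S (e.g. political view)
# and public data X (e.g. a TV-viewing feature), random here for illustration.
n_s, n_x = 2, 16
P_sx = rng.random((n_s, n_x))
P_sx /= P_sx.sum()                            # joint distribution p(s, x)

# Quantization step: merge the X alphabet into k representative symbols to shrink
# the optimization, here by grouping symbols with similar posteriors p(S=0 | x).
k = 4
posterior = P_sx[0] / P_sx.sum(axis=0)       # p(S=0 | x) for each x
cells = np.array_split(np.argsort(posterior), k)
P_sq = np.stack([P_sx[:, c].sum(axis=1) for c in cells], axis=1)  # p(s, q)

# Privacy mapping: minimize I(S; Y) over p(y | q) subject to an expected-distortion
# budget D (Hamming distortion on the quantized symbols, an assumption).
n_q = n_y = k
p_q = P_sq.sum(axis=0)                        # marginal p(q)
p_s = P_sq.sum(axis=1)                        # marginal p(s)
D = 0.3                                       # distortion budget (assumed)
dist = 1.0 - np.eye(n_q)                      # d(q, y) = 1{q != y}

Q = cp.Variable((n_q, n_y), nonneg=True)      # mapping p(y | q)
P_sy = P_sq @ Q                               # joint p(s, y), affine in Q
p_y = p_q @ Q                                 # marginal p(y), affine in Q
prod = np.reshape(p_s, (n_s, 1)) @ cp.reshape(p_y, (1, n_y))

# I(S; Y) in nats, via the jointly convex relative entropy x*log(x/y).
leakage = cp.sum(cp.rel_entr(P_sy, prod))
constraints = [
    cp.sum(Q, axis=1) == 1,                           # rows are distributions
    cp.sum(cp.multiply(dist * p_q[:, None], Q)) <= D, # E[d(Q, Y)] <= D
]
cp.Problem(cp.Minimize(leakage), constraints).solve(solver=cp.SCS)
print(f"leakage I(S;Y) = {leakage.value:.4f} nats at distortion budget {D}")
```

Setting D = 0 forces the identity mapping (maximal leakage), while a large D lets the solver push I(S; Y) toward zero; sweeping D traces the privacy-utility trade-off the paper evaluates.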