Scalable, Efficient Anonymization with INCOGNITO - Framework & Algorithm

2017 
With the advent of "big data" processing and analytics, organizations and enterprises have increased the collection of data from individuals and are increasingly developing business models that rely on analytics to gain deep insights into the collected data. Often, it becomes essential to release and merge the data with third parties for more extensive analytics that the organization may lack the expertise to perform. Data often has to be anonymized prior to such release to safeguard the privacy of the individuals involved. While several algorithms, with varying privacy guarantees, have been proposed for anonymizing data, large-scale distributed anonymization remains an under-explored topic. In this paper, we propose Incognito, a distributed algorithm and framework for the anonymization of large data sets. Incognito as a framework is targeted at data center environments, both private data centers and public clouds, and is intended to be compatible with modern data analytics frameworks such as MapReduce and resilient distributed datasets (RDDs). Incognito as an algorithm aims to minimize identity-, similarity- and skewness-based attacks on anonymized data sets. This paper describes Incognito in detail, along with an empirical evaluation of its scalability and efficiency.