Data Swapping for Private Information Sharing of Web Search Logs

2017 
Abstract: With the increasing number of sophisticated cyber attacks on both government and private infrastructure, cybersecurity data sharing is critical for the advancement of collaborative research among entities in government, the private sector, and academia. Recently, the US Congress passed the Cyber Intelligence Sharing and Protection Act as a framework for data sharing between various entities. Nevertheless, this development raises the issue of trust between the collaborating parties, since shared data could be revealing. First, due to the sensitive and confidential nature of the data involved, entities would have to employ various anonymization techniques to comply with the confidentiality policies of their own organizations and with federal government requirements. Second, sharing the data without a privatization process could make the entities involved vulnerable to insider and inference attacks. For instance, an entity sharing data on cyber attacks might accidentally reveal a sensitive network topology to an untrusted collaborator. As a contribution, we propose a modest but effective data privacy enhancement heuristic: a targeted 2k basic data swapping of individual web search log records. In this heuristic, if an individual has a set of x records in their web search log set A, those records are first swapped within that individual's set A, then swapped again with another individual's y records in set B. Our preliminary results show that data swapping is effective for big data and that it would be demanding to trace the original issuer of the queries in a given large dataset of web search logs, thus providing some level of confidentiality.
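The two-stage swap described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the record layout, how records are chosen for swapping, or what the "2k" parameter denotes, so everything below is an assumption made for illustration: records are hypothetical `(timestamp, query)` pairs, the within-set swap permutes query values among one individual's records, and the between-set swap exchanges `k` randomly chosen records between sets A and B. The function names are hypothetical and this is not the authors' implementation.

```python
import random


def swap_within(records, rng):
    """Stage 1 (assumed): permute the query values among one individual's
    records, leaving each timestamp in its original position."""
    queries = [query for _, query in records]
    rng.shuffle(queries)
    return [(ts, query) for (ts, _), query in zip(records, queries)]


def swap_between(set_a, set_b, k, rng):
    """Stage 2 (assumed): exchange k randomly chosen whole records
    between two individuals' sets. Inputs are not modified."""
    k = min(k, len(set_a), len(set_b))
    idx_a = rng.sample(range(len(set_a)), k)
    idx_b = rng.sample(range(len(set_b)), k)
    out_a, out_b = list(set_a), list(set_b)
    for i, j in zip(idx_a, idx_b):
        out_a[i], out_b[j] = out_b[j], out_a[i]
    return out_a, out_b


def two_stage_swap(set_a, set_b, k, seed=None):
    """Swap within each individual's set, then swap k records across sets."""
    rng = random.Random(seed)
    shuffled_a = swap_within(set_a, rng)
    shuffled_b = swap_within(set_b, rng)
    return swap_between(shuffled_a, shuffled_b, k, rng)
```

Under these assumptions, the combined collection of queries and timestamps across A and B is unchanged, so aggregate statistics over the pooled log are preserved, while the link between any single query and its original issuer is weakened. A real deployment would need to choose swap partners and the swap count according to the paper's actual procedure.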