Data Coarsening and Data Swapping Algorithms

2014 
With increased concern about privacy and, at the same time, pressure to make survey data available, statistical disclosure control (SDC) treatments are performed on survey microdata to reduce disclosure risk prior to dissemination to the public. Making the time to conduct the necessary SDC treatments is all the more problematic in the push to provide data online for immediate user query. Two SDC approaches are data coarsening, which reduces the information collected, and data swapping, which is used to adjust data values. Data coarsening includes recodes, top/bottom codes, and variable suppression. Challenges related to creating a SAS® macro for data coarsening include providing flexibility for conducting different coarsening approaches and keeping track of the changes to the data so that variable and value labels can be assigned correctly. Data swapping includes selecting target records for swapping, finding swapping partners, and swapping data values for the target variables. With the goal of minimizing the impact on resulting estimates, challenges for data swapping are to find swapping partners that are close matches in terms of both unordered categorical and ordered categorical variables, to ensure that enough change is made to the target variables, to retain data consistency among variables, and to control the pool of potential swapping partners. An example is presented using each algorithm.
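The two approaches described above can be illustrated with a small sketch. This is not the SAS macro from the paper. It is a minimal Python illustration under stated assumptions: the function names (`top_bottom_code`, `recode`, `swap_targets`), the record layout, and the rule of choosing the nearest partner on a single ordered variable within the same unordered category are all hypothetical choices made for the example.

```python
import random


def top_bottom_code(value, low, high):
    """Bottom/top code: clamp a numeric value to the range [low, high]."""
    return max(low, min(high, value))


def recode(value, upper_bounds):
    """Recode a value into a category index.

    upper_bounds are ascending inclusive upper limits; values above the
    last limit fall into one final open-ended category.
    """
    for index, bound in enumerate(upper_bounds):
        if value <= bound:
            return index
    return len(upper_bounds)


def swap_targets(records, target_var, cell_var, order_var, rate, seed=0):
    """Swap target_var between selected records and close partners.

    Assumed rule (hypothetical, for illustration only): a partner must share
    the unordered category cell_var, must differ on target_var so the swap
    changes the data, and is the unused candidate nearest on order_var.
    Returns a modified copy of the records and the list of swapped pairs.
    """
    rng = random.Random(seed)
    out = [dict(r) for r in records]          # leave the input untouched
    n_targets = round(rate * len(out))
    targets = rng.sample(range(len(out)), n_targets)
    used = set()                               # each record is swapped at most once
    swaps = []
    for t in targets:
        if t in used:
            continue
        candidates = [
            j for j in range(len(out))
            if j != t
            and j not in used
            and out[j][cell_var] == out[t][cell_var]
            and out[j][target_var] != out[t][target_var]
        ]
        if not candidates:
            continue                           # no acceptable partner in the pool
        p = min(candidates, key=lambda j: abs(out[j][order_var] - out[t][order_var]))
        out[t][target_var], out[p][target_var] = out[p][target_var], out[t][target_var]
        used.update((t, p))
        swaps.append((t, p))
    return out, swaps


records = [
    {"region": "N", "age": 30, "income": 40},
    {"region": "N", "age": 32, "income": 55},
    {"region": "N", "age": 60, "income": 70},
    {"region": "S", "age": 41, "income": 38},
    {"region": "S", "age": 45, "income": 62},
    {"region": "S", "age": 70, "income": 90},
]
swapped, swaps = swap_targets(records, "income", "region", "age", rate=0.5)
```

Because values are only exchanged between records, the marginal distribution of the target variable is unchanged, which is one way such a swap can limit the impact on estimates. Restricting partners to the same category keeps the swap within a cell, and requiring a different target value ensures each swap actually alters the data.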