Identification Risks Evaluation of Partially Synthetic Data with the $\texttt{IdentificationRiskCalculation}$ R Package

Ryan Hornby, Jingchen Hu


2021

We propose a general approach to evaluating the identification risk of continuous synthesized variables in partially synthetic data. We introduce a radius $r$ into the construction of the identification risk probability of each target record, and we illustrate it with working examples for one or more continuous synthesized variables. We demonstrate our methods on a data sample from the Consumer Expenditure Surveys (CE) and discuss how risk and data utility are affected by 1) the choice of radius $r$, 2) the choice of synthesized variables, and 3) the number of synthetic datasets. We give recommendations to statistical agencies for synthesizing continuous variables and evaluating their identification risk. We created an R package, $\texttt{IdentificationRiskCalculation}$, that implements the proposed identification risk evaluation methods, and we include sample R scripts.
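
The role of the radius $r$ can be illustrated with a minimal Python sketch. It is not the API of the R package and not necessarily the authors' exact definition; every detail below is an assumption for illustration. It assumes that each synthetic dataset keeps the confidential records in the same order (as in partially synthetic data); that an intruder knows the un-synthesized variables and the confidential value $y_i$ of target record $i$; that a synthetic record matches when its known variables agree and its synthesized value lies within $r \cdot |y_i|$ of $y_i$ (a relative radius); and that the per-record risk is $T_i / c_i$, where $c_i$ counts matching records and $T_i$ indicates whether record $i$ is among them, averaged over the $m$ synthetic datasets. The names `matches` and `identification_risk` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical record layout: (known_key, synthesized_value), where known_key
# is a tuple of un-synthesized variables the intruder is assumed to know.
Record = Tuple[tuple, float]


def matches(candidate: Record, key: tuple, y_true: float, r: float) -> bool:
    """Assumed matching rule: same known key, and the synthetic value lies
    within a relative radius r * |y_true| of the confidential value."""
    cand_key, cand_y = candidate
    return cand_key == key and abs(cand_y - y_true) <= r * abs(y_true)


def identification_risk(i: int, key: tuple, y_true: float,
                        synthetic_sets: List[List[Record]], r: float) -> float:
    """Average over the m synthetic datasets of T_i / c_i (0 when c_i = 0)."""
    total = 0.0
    for data in synthetic_sets:
        # Indices of synthetic records that fall inside the radius.
        matched = [j for j, rec in enumerate(data)
                   if matches(rec, key, y_true, r)]
        c = len(matched)
        t = 1 if i in matched else 0  # is the true record among the matches?
        total += t / c if c > 0 else 0.0
    return total / len(synthetic_sets)


# Toy example with m = 2 synthetic datasets of three records each.
synthetic_sets = [
    [(("A",), 100.0), (("A",), 108.0), (("B",), 101.0)],
    [(("A",), 130.0), (("A",), 95.0), (("B",), 99.0)],
]
print(identification_risk(0, ("A",), 100.0, synthetic_sets, r=0.10))  # 0.25
print(identification_risk(0, ("A",), 100.0, synthetic_sets, r=0.35))  # 0.5
```

In this toy example, widening the radius from 0.10 to 0.35 pulls the synthetic value 130 into the match set for record 0 in the second dataset, which raises its estimated risk from 0.25 to 0.5. This shows why the choice of $r$ matters for the risk evaluation.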

