Identification Risks Evaluation of Partially Synthetic Data with the $\texttt{IdentificationRiskCalculation}$ R Package.
2021
We propose a general approach to evaluating identification risk of continuous synthesized variables in partially synthetic data. We introduce the use of a radius $r$ in the construction of identification risk probability of each target record, and illustrate with working examples for one or more continuous synthesized variables. We demonstrate our methods with applications to a data sample from the Consumer Expenditure Surveys (CE), and discuss the impacts on risk and data utility of 1) the choice of radius $r$, 2) the choice of synthesized variables, and 3) the choice of number of synthetic datasets. We give recommendations for statistical agencies for synthesizing and evaluating identification risk of continuous variables. An R package is created to perform our proposed methods of identification risk evaluation, and sample R scripts are included.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
17
References
1
Citations
NaN
KQI