Quantifying and Localizing Private Information Leakage from Neural Network Gradients

2021 
Empirical attacks on collaborative learning show that the gradients of deep neural networks can not only disclose private latent attributes of the training data but also be used to reconstruct the original data. While prior work has tried to quantify the privacy risk stemming from gradients, these measures do not establish a theoretically grounded understanding of gradient leakage, do not generalize across attackers, and can fail to fully explain what is observed through empirical attacks in practice. In this paper, we introduce theoretically motivated measures to quantify information leakage in both attack-dependent and attack-independent manners. Specifically, we present an adaptation of the $\mathcal{V}$-information, which generalizes the empirical attack success rate and allows quantifying the amount of information that can leak from any chosen family of attack models. We then propose attack-independent measures, which only require the shared gradients, for quantifying both original and latent information leakage. Our empirical results, on six datasets and four popular models, reveal that the gradients of the first layers contain the highest amount of original information, while the (fully-connected) classifier layers placed after the (convolutional) feature extractor layers contain the highest latent information. Further, we show how techniques such as gradient aggregation during training can mitigate information leakage. Our work paves the way for better defenses such as layer-based protection or strong aggregation.
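
For reference, the attack-dependent measure described above builds on $\mathcal{V}$-information (Xu et al., 2020). A sketch of the standard definition, read here with $X$ as the shared gradient, $Y$ as the private (original or latent) attribute, and $\mathcal{V}$ as the chosen family of attack models; the paper's exact adaptation may differ:

$$H_{\mathcal{V}}(Y) = \inf_{f \in \mathcal{V}} \mathbb{E}_{y \sim P_Y}\big[-\log f[\varnothing](y)\big], \qquad H_{\mathcal{V}}(Y \mid X) = \inf_{f \in \mathcal{V}} \mathbb{E}_{(x,y) \sim P_{X,Y}}\big[-\log f[x](y)\big],$$
$$I_{\mathcal{V}}(X \to Y) = H_{\mathcal{V}}(Y) - H_{\mathcal{V}}(Y \mid X).$$

Restricting $\mathcal{V}$ to the attack models an adversary can realistically train is what lets this quantity generalize the empirical attack success rate: a larger $I_{\mathcal{V}}(X \to Y)$ means that family of attackers can extract more usable information about $Y$ from the shared gradient.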