Dataset Inference: Ownership Resolution in Machine Learning

2021 
With increasing amounts of data and computation involved in their training, machine learning models constitute valuable intellectual property. This has spurred interest in model stealing attacks, which are made more practical by advances in learning with partial, little, or no supervision. Existing defenses focus on inserting unique watermarks into the model's decision surface, but this is insufficient: because the watermarks are not sampled from the training distribution, they are not always preserved during model stealing. In this paper, we make the key observation that the knowledge contained in the stolen model's training set is what is common to all stolen copies. The adversary's goal, irrespective of the attack employed, is always to extract this knowledge or its by-products. This gives the original model's owner a strong advantage over the adversary: model owners have access to the original training data. We thus introduce dataset inference, the process of identifying whether a suspected model copy has private knowledge from the original model's dataset, as a defense against model stealing. We develop an approach for dataset inference that combines statistical testing with the ability to estimate the distance of multiple data points to the decision boundary. Our experiments on CIFAR10 and CIFAR100 show that model owners can claim with confidence greater than 99% that their model (or, in fact, their dataset) was stolen, despite exposing only 50 of the stolen model's training points. Dataset inference defends against state-of-the-art attacks, even when the adversary is adaptive. Unlike prior work, it does not require retraining or overfitting the defended model.
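To make the abstract's description concrete, the sketch below illustrates the two ingredients it names: estimating the distance of data points to a suspect model's decision boundary, and running a statistical test on those distances for the owner's private training points versus public points. This is a minimal illustration under assumed details, not the paper's exact algorithm: the function names (estimate_margin, dataset_inference), the random-direction walk used as a distance estimator, and the Welch t-test are all assumptions chosen for brevity.

```python
# Sketch of the dataset-inference idea: if a suspect model was trained (directly
# or via stealing) on the owner's private data, those points tend to sit farther
# from its decision boundary than unseen public points. A hypothesis test over
# the estimated margins then yields a confidence that the model was stolen.
# Hypothetical helper names; the distance estimator is a crude stand-in.

import torch
import torch.nn as nn
from scipy import stats


def estimate_margin(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                    step: float = 0.01, max_steps: int = 100) -> torch.Tensor:
    """Crudely estimate each point's distance to the decision boundary by
    walking along one random direction until the predicted label changes."""
    model.eval()
    with torch.no_grad():
        direction = torch.randn_like(x)
        # Normalize per-example (assumes image-shaped inputs, e.g. CIFAR10).
        direction = direction / direction.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
        cap = max_steps * step
        margins = torch.full((x.size(0),), cap)
        for k in range(1, max_steps + 1):
            preds = model(x + k * step * direction).argmax(dim=1)
            newly_crossed = (preds != y) & (margins == cap)
            margins[newly_crossed] = k * step
    return margins


def dataset_inference(suspect: nn.Module,
                      private_x: torch.Tensor, private_y: torch.Tensor,
                      public_x: torch.Tensor, public_y: torch.Tensor,
                      alpha: float = 0.01) -> bool:
    """Flag the suspect model if margins on the owner's private training
    points are significantly larger than margins on public points."""
    m_private = estimate_margin(suspect, private_x, private_y)
    m_public = estimate_margin(suspect, public_x, public_y)
    # One-sided Welch t-test; H1: private margins exceed public margins.
    _, p_value = stats.ttest_ind(m_private.numpy(), m_public.numpy(),
                                 equal_var=False, alternative='greater')
    return p_value < alpha
```

In this framing, the number of revealed private points (50 in the experiments reported above) controls the power of the test: more revealed points give a smaller p-value when the suspect model truly contains knowledge of the private dataset.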