Your Model Trains on My Data? Protecting Intellectual Property of Training Data via Membership Fingerprint Authentication

2022 
In recent years, data has become the new oil that fuels various machine learning (ML) applications. Just as the oil refining, providing data to an ML model is a product of massive costs and expertise efforts. However, how to protect the intellectual property (IP) of the training data in ML remains largely open. In this paper, we present MeFA, a novel framework for detecting training data IP embezzlement via Membership Fingerprint Authentication, which is able to determine whether a suspect ML model is trained on the to be protected target data or not. The key observation is that a part of data has a similar influence on the prediction behavior of different ML models. On this basis, MeFA leverages membership inference techniques to extract these data as the fingerprints of the target data and constructs an authentication model to verify the data’s ownership by identifying the obtained membership fingerprints. MeFA has several salient features. It does not assume any knowledge of the suspect model except for its black-box prediction API, through which we can merely get the prediction output of a given input, and also does not require any modification to the dataset or the training process, since it takes advantage of the inherent membership property of the data. As a by-product, MeFA can also serve as a post-protection to verify the ownership of ML models, without modifying the training process of the model. Extensive experiments on three realistic datasets and seven types of ML models validate the effectiveness of MeFA, and demonstrate that it is also robust to scenarios when the training data is partially used or preprocessed with representative membership inference defenses.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    0
    Citations
    NaN
    KQI
    []