Data-Free Knowledge Distillation with Positive-Unlabeled Learning

2021 
In model compression, knowledge distillation is a popular algorithm that trains a lightweight network (student) by learning knowledge from a pre-trained, complicated network (teacher). Acquiring the training data that the teacher used is essential, since the knowledge is obtained by feeding that data to the teacher network. However, the data is often unavailable due to privacy concerns or storage costs, which makes existing data-driven knowledge distillation methods inapplicable in real-world settings. To solve these problems, in this paper we propose a data-free knowledge distillation method called DFPU, which introduces positive-unlabeled (PU) learning. To train a compact neural network without data, a generator is introduced to produce pseudo data under the supervision of the teacher network. By feeding the generated data into the teacher and student networks, attention features are extracted for knowledge transfer. The student network is encouraged by PU learning to produce features more similar to the teacher's. Without any data, the efficient student network trained by DFPU contains only half the parameters and computations of the teacher network, yet achieves accuracy similar to the teacher network.
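The abstract outlines three ingredients: a teacher-supervised generator for pseudo data, attention-feature extraction from both networks, and a PU objective that pushes student features toward the teacher's. The sketch below shows one way such a training step could look in PyTorch; it is an illustration under stated assumptions, not the paper's exact method. In particular, the model interface (networks returning a `(logits, feature)` pair), the attention-transfer feature summary, the DAFL-style generator loss, and the non-negative PU risk estimator are all assumptions introduced here.

```python
# Hypothetical DFPU-style training step (PyTorch). Architectures, losses,
# and interfaces are illustrative assumptions, not the paper's specification.
import torch
import torch.nn as nn
import torch.nn.functional as F

def attention_map(feat):
    # Channel-wise squared mean, flattened and L2-normalised
    # (a standard attention-transfer style feature summary).
    a = feat.pow(2).mean(dim=1).flatten(1)
    return F.normalize(a, dim=1)

class Generator(nn.Module):
    # Maps a latent vector to a 32x32 RGB pseudo image (assumed shape).
    def __init__(self, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.Upsample(scale_factor=2), nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

def nnpu_risk(d_pos, d_unl, prior=0.5):
    # Non-negative PU risk (Kiryo et al., 2017), used here as one plausible
    # PU objective: teacher features are "positive", student features "unlabeled".
    pos_loss = F.softplus(-d_pos).mean()     # sigmoid loss on positives
    neg_unl = F.softplus(d_unl).mean()       # unlabeled treated as negative
    neg_pos = F.softplus(d_pos).mean()
    return prior * pos_loss + torch.clamp(neg_unl - prior * neg_pos, min=0.0)

def dfpu_step(teacher, student, generator, discriminator,
              opt_s, opt_g, opt_d, batch=64, z_dim=100, T=4.0):
    # teacher/student are assumed to return (logits, feature_map);
    # the teacher is frozen (requires_grad=False) outside this function.
    z = torch.randn(batch, z_dim)
    x = generator(z)                          # pseudo data

    with torch.no_grad():
        t_logits, t_feat = teacher(x)
    s_logits, s_feat = student(x)

    # 1) Discriminator: PU learning on attention features.
    a_t, a_s = attention_map(t_feat), attention_map(s_feat)
    loss_d = nnpu_risk(discriminator(a_t), discriminator(a_s.detach()))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Student: match teacher logits (KD) and make its attention
    #    features look "positive" (teacher-like) to the PU discriminator.
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * T * T
    loss_s = kd + F.softplus(-discriminator(a_s)).mean()
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()

    # 3) Generator: teacher-supervised pseudo-data objective (DAFL-style
    #    confidence loss is assumed here for illustration).
    x_g = generator(torch.randn(batch, z_dim))
    g_logits, _ = teacher(x_g)
    loss_g = F.cross_entropy(g_logits, g_logits.argmax(dim=1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_s.item(), loss_g.item()
```

In this sketch the discriminator, student, and generator are updated in turn each iteration; any concrete run would still need the teacher, student, and discriminator modules plus their optimizers, which are left as placeholders here.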