The Optimization Method of Knowledge Distillation Based on Model Pruning

2020 
In recent years, deep neural networks have repeatedly set new performance records in computer vision, speech recognition, and other fields, and have been widely deployed. However, achieving better performance requires designing more complex networks, whose computation and storage costs grow accordingly, which is a serious obstacle to deploying models on embedded and mobile devices. Knowledge distillation, a technique based on transfer learning, is an effective means of model compression. In our research, model pruning is introduced into the design of the student network for knowledge distillation, and teacher-student distillation training is used to improve the student network's accuracy. In this paper, student networks produced by six pruning methods are evaluated on three networks (VGG13, VGG16, VGG19) and two datasets (CIFAR10, CIFAR100). Compared with the same pruned network trained without knowledge distillation, accuracy improves by up to 5.81%. The results show that, compared with commonly used pruning techniques alone, our method effectively improves network accuracy without increasing network size.
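The sketch below illustrates the overall scheme described in the abstract: an unpruned network acts as the teacher and a pruned copy acts as the student, which is then trained with a standard Hinton-style distillation loss. It is a minimal illustration only; the paper does not specify its framework, its six pruning methods, or its hyperparameters, so the PyTorch setup, the L1 unstructured pruning, and the values of T, alpha, and the pruning ratio here are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune
from torchvision.models import vgg16

# Teacher: unpruned VGG16. Student: a pruned copy of the same architecture.
# L1 unstructured pruning at 50% sparsity is only a stand-in for the paper's
# six (unspecified) pruning methods.
teacher = vgg16(num_classes=10)
student = vgg16(num_classes=10)
for module in student.modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        prune.l1_unstructured(module, name="weight", amount=0.5)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Soft-target KL term (scaled by T^2) plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

optimizer = torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9)

def train_step(images, labels):
    """One teacher-student distillation step on a batch of CIFAR-style data."""
    teacher.eval()
    with torch.no_grad():                 # teacher provides fixed soft targets
        teacher_logits = teacher(images)
    student_logits = student(images)
    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because pruning only masks weights rather than adding parameters, the student keeps the reduced size of the pruned network while the distillation term supplies the teacher's soft targets, which is the mechanism the abstract credits for the accuracy gain.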