MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation

2020 
Knowledge Distillation (KD) has been one of the most popular methods for learning a compact model. However, it still suffers from the high demand on time and computational resources caused by its sequential training pipeline. Furthermore, the soft targets from deeper models often do not serve as good cues for the shallower models because of the compatibility gap between them. In this work, we address both problems at the same time. Specifically, we propose to generate better soft targets with higher compatibility by using a label generator that fuses the feature maps from deeper stages in a top-down manner, and we employ a meta-learning technique to optimize this label generator. Using the soft targets learned from the intermediate feature maps of the model, we achieve better self-boosting of the network than the state-of-the-art. Experiments are conducted on two standard classification benchmarks, namely CIFAR-100 and ILSVRC2012, and we test various network architectures to show the generalizability of our MetaDistiller. The results on both datasets strongly demonstrate the effectiveness of our method.
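To make the top-down fusion idea concrete, the following is a minimal, hedged sketch (not the authors' code) of a label generator that fuses deeper-stage feature maps into per-stage soft targets. The module name `TopDownLabelGenerator`, the 128-channel projection width, the nearest-neighbor upsampling, and the temperature-scaled KL distillation loss are all illustrative assumptions; the meta-learning outer loop that would update the generator from a validation loss after an unrolled network update is omitted for brevity.

```python
# Hedged sketch of top-down soft-target generation; shapes and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownLabelGenerator(nn.Module):
    """Fuses feature maps from deep to shallow stages and emits per-stage soft targets."""
    def __init__(self, stage_channels, num_classes):
        super().__init__()
        # 1x1 convs project each stage's features to a common width before fusion.
        self.proj = nn.ModuleList(nn.Conv2d(c, 128, kernel_size=1) for c in stage_channels)
        self.heads = nn.ModuleList(nn.Linear(128, num_classes) for _ in stage_channels)

    def forward(self, feats):
        # feats: list of feature maps ordered shallow -> deep.
        fused, soft_targets = None, []
        for proj, head, f in zip(reversed(self.proj), reversed(self.heads), reversed(feats)):
            x = proj(f)
            if fused is not None:
                # Top-down: upsample the deeper fused signal to this stage's resolution and add it.
                fused = x + F.interpolate(fused, size=x.shape[-2:], mode="nearest")
            else:
                fused = x
            pooled = F.adaptive_avg_pool2d(fused, 1).flatten(1)
            soft_targets.append(head(pooled))
        return list(reversed(soft_targets))  # back to shallow -> deep order

def distill_loss(stage_logits, soft_targets, T=4.0):
    # Temperature-scaled KL between each stage's own logits and its generated soft target.
    # In the full method the generator would receive gradients only through the
    # meta-learning outer loop, which is not shown here.
    return sum(F.kl_div(F.log_softmax(s / T, dim=1),
                        F.softmax(t / T, dim=1),
                        reduction="batchmean") * T * T
               for s, t in zip(stage_logits, soft_targets))
```

As a usage sketch, one would collect the intermediate feature maps of the backbone, pass them through the generator, and add `distill_loss` to the usual cross-entropy on the ground-truth labels when training the network's auxiliary stage classifiers.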