TC3KD: Knowledge distillation via teacher-student cooperative curriculum customization

2022 
Knowledge distillation aims to improve the performance of a lightweight student network by transferring knowledge from a large-scale teacher network. Most existing knowledge distillation methods follow the traditional training strategy, which feeds a sequence of mini-batches sampled randomly from the training set. Inspired by curriculum learning, we propose a novel knowledge distillation method via teacher-student cooperative curriculum customization. Specifically, a weighted ensemble of the teacher and a snapshot student is designed to measure the difficulty of samples. The ensemble weights and the snapshot student in the difficulty measurer are updated dynamically, so that the measurer customizes appropriate curricula to guide the student network at different training stages. A "fetch and remove in balance" training scheduler is adopted to maintain training stability and reduce the ranking cost. Extensive experiments on CIFAR-100, CINIC-10 and ImageNet validate the effectiveness of our method. Because it is a training strategy independent of the distillation loss itself, the proposed teacher-student cooperative curriculum customization paradigm can also be combined with mainstream knowledge distillation approaches to improve their performance.
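To make the ensemble-based difficulty measurer concrete, here is a minimal PyTorch sketch. It assumes a scalar ensemble weight (called `alpha` here, an illustrative name) and uses per-sample cross-entropy on the ensemble output as the difficulty score; the exact difficulty metric and the schedules for updating `alpha` and refreshing the snapshot student are the paper's design choices and are not reproduced here.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def difficulty_scores(teacher, snapshot_student, images, labels, alpha):
    """Score sample difficulty with a weighted teacher/snapshot-student ensemble.

    A higher per-sample cross-entropy on the ensemble output is taken to mean
    a harder sample. `alpha` weights the teacher against the snapshot student;
    in the paper both `alpha` and the snapshot are updated dynamically as
    training progresses (schedule omitted in this sketch).
    """
    teacher.eval()
    snapshot_student.eval()
    ens_logits = alpha * teacher(images) + (1.0 - alpha) * snapshot_student(images)
    # Per-sample loss (no reduction) serves as the difficulty score.
    return F.cross_entropy(ens_logits, labels, reduction="none")
```

A curriculum scheduler could then rank training samples by these scores and decide, in each stage, which easy samples to fetch and which hard ones to hold back; scoring against a periodically refreshed snapshot rather than the live student avoids re-ranking at every step.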