Gradient Deconfliction-Based Training For Multi-Exit Architectures

2020 
Muiti-exit architectures, in which a sequence of intermediate classifiers are introduced at different depths of the feature layers, perform adaptive computation by early exiting “easy” samples to speed up the inference. In this paper, we propose a new gradient deconfliction-based training technique for multi-exit architectures. In particular, the conflicting between the gradients back-propagated from different classifiers is removed by projecting the gradient from one classifier onto the normal plane of the gradient from the other classifier. Experiments on CFAR-100 and ImageNet show that the gradient deconfliction-based training strategy significantly improves the performance of the state-of-the-art multi-exit neural networks. Moreover, this method does not require within architecture modifications and can be effectively combined with other previously-proposed training techniques and further boosts the performance.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    21
    References
    0
    Citations
    NaN
    KQI
    []