HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression
2021
Large pre-trained language models (PLMs) have achieved impressive performance on many natural language processing tasks compared with traditional neural network methods. Nevertheless, their huge model size and slow inference have hindered deployment on resource-limited devices in practice. In this paper, we aim to compress PLMs via knowledge distillation, and propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain-level relational information. Specifically, to enhance model capability and transferability, we leverage the idea of meta-learning and set up domain-relational graphs to capture relational information across different domains. To dynamically select the most representative prototypes for each domain, we further propose a hierarchical compare-aggregate mechanism to capture hierarchical relationships. Extensive experiments on public multi-domain datasets demonstrate the superior performance of HRKD as well as its strong few-shot learning ability. For reproducibility, we release the code at https://github.com/cheneydon/hrkd.
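To make the distillation setup concrete, the sketch below shows a minimal knowledge-distillation loss together with a softmax weighting of per-domain losses by domain similarity scores, loosely illustrating the idea of weighting domains by relational information. This is not the authors' implementation; all function names and the similarity-based weighting are hypothetical simplifications, and the actual HRKD method (domain-relational graphs, compare-aggregate mechanism) is more involved — see the released code for details.

```python
# Hedged sketch (NOT the authors' code): a minimal temperature-scaled
# distillation loss plus a softmax weighting of per-domain losses.
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    # Soften teacher and student outputs with a temperature, then
    # measure how far the student distribution is from the teacher's.
    p = softmax([z / temperature for z in teacher_logits])
    q = softmax([z / temperature for z in student_logits])
    return kl_divergence(p, q) * temperature ** 2

def weighted_multidomain_loss(per_domain_losses, domain_similarities):
    # Weight each domain's distillation loss by a softmax over
    # (hypothetical) domain-relational similarity scores.
    weights = softmax(domain_similarities)
    return sum(w * l for w, l in zip(weights, per_domain_losses))
```

When the student matches the teacher exactly, `distill_loss` is zero; the domain weighting then decides how much each domain's mismatch contributes to the total objective.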