Hierarchical BERT with an adaptive fine-tuning strategy for document classification

2022 
Pretrained language models (PLMs) have achieved impressive results and have become vital tools for various natural language processing (NLP) tasks. However, applying these PLMs to document classification is limited when the document length exceeds the maximum input length of the PLM, because the excess portion is truncated. If keywords fall in the truncated part, model performance declines. To address this problem, this paper proposes a hierarchical BERT with an adaptive fine-tuning strategy (HAdaBERT). It consists of a BERT-based model as the local encoder and an attention-based gated memory network as the global encoder. In contrast to existing PLMs that directly truncate documents, the proposed model treats each part of the document as a region, dividing the input document into several containers. The useful information in each container is extracted by the local encoder and composed by the global encoder according to its contribution to the classification. To further improve performance, the paper also proposes an adaptive fine-tuning strategy that dynamically decides which layers of BERT to fine-tune for each input text, instead of fine-tuning all layers. Experimental results on several corpora show that the proposed method outperforms existing neural networks for document classification.
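For concreteness, the following is a minimal PyTorch sketch of the pipeline the abstract describes: the document is split into fixed-size containers rather than truncated, a BERT local encoder produces one vector per container, and an attention-based gated pooling layer composes them into a document representation. The class and parameter names (HAdaBERTSketch, GatedAttentionPooling, container_len, freeze_lower_layers) are illustrative assumptions, not the authors' code, and the pooling layer and the layer-freezing helper are simplified stand-ins for the paper's gated memory network and adaptive fine-tuning strategy.

```python
# Minimal sketch of a hierarchical BERT document classifier.
# Assumptions: HuggingFace transformers + PyTorch; names are illustrative.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast


class GatedAttentionPooling(nn.Module):
    """Simplified global encoder: weights each container's local
    representation by its estimated contribution to the label."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.attn = nn.Linear(hidden_size, 1)            # per-container attention score
        self.gate = nn.Linear(hidden_size, hidden_size)  # per-feature gate

    def forward(self, container_vecs: torch.Tensor) -> torch.Tensor:
        # container_vecs: (num_containers, hidden)
        weights = torch.softmax(self.attn(container_vecs), dim=0)
        gated = torch.sigmoid(self.gate(container_vecs)) * container_vecs
        return (weights * gated).sum(dim=0)              # (hidden,)


class HAdaBERTSketch(nn.Module):
    def __init__(self, num_labels: int, container_len: int = 128,
                 bert_name: str = "bert-base-uncased"):
        super().__init__()
        self.tokenizer = BertTokenizerFast.from_pretrained(bert_name)
        self.local_encoder = BertModel.from_pretrained(bert_name)
        hidden = self.local_encoder.config.hidden_size
        self.global_encoder = GatedAttentionPooling(hidden)
        self.classifier = nn.Linear(hidden, num_labels)
        self.container_len = container_len

    def freeze_lower_layers(self, num_trainable: int) -> None:
        """Crude stand-in for adaptive fine-tuning: only the top
        `num_trainable` BERT layers receive gradients."""
        layers = self.local_encoder.encoder.layer
        for layer in layers[: len(layers) - num_trainable]:
            for p in layer.parameters():
                p.requires_grad = False

    def forward(self, document: str) -> torch.Tensor:
        # Split the document into containers instead of truncating it.
        ids = self.tokenizer(document, add_special_tokens=False)["input_ids"]
        containers = [ids[i: i + self.container_len]
                      for i in range(0, len(ids), self.container_len)]
        vecs = []
        for chunk in containers:
            chunk = [self.tokenizer.cls_token_id] + chunk + [self.tokenizer.sep_token_id]
            out = self.local_encoder(input_ids=torch.tensor([chunk]))
            vecs.append(out.last_hidden_state[0, 0])     # [CLS] vector of this container
        doc_vec = self.global_encoder(torch.stack(vecs))
        return self.classifier(doc_vec)                  # logits, shape (num_labels,)
```

In use, one would call something like `model.freeze_lower_layers(4)` before training to restrict fine-tuning to the upper layers; the paper's strategy goes further by choosing the trainable layers dynamically per input rather than fixing them once.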