Hierarchical Attention Transformer Networks for Long Document Classification

2021 
Benefiting from pre-trained language representation models such as BERT, recently proposed document classification methods have achieved considerable improvements. However, most of these methods model the document as a flat sequence of text and ignore its structure, which is especially evident in long documents composed of several sections with related content. To address this, we propose a novel Hierarchical Attention Transformer Network (HATN) for long document classification, which captures the structure of a long document through intra- and inter-section attention transformers and further strengthens feature interaction with two fusion gates: the Residual Fusion Gate (RFG) and the Feature Fusion Gate (FFG). The proposed method is evaluated on three long document datasets, and the experimental results show that our approach outperforms related state-of-the-art methods. The code will be available at https://github.com/TengfeiLiu966/HATN
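
As a rough illustration of the architecture outlined above, the sketch below shows one plausible way to arrange intra- and inter-section attention with gated fusion in PyTorch. The class names, layer sizes, mean pooling, and the exact gate formulas (the FusionGate stand-ins for RFG and FFG) are assumptions for illustration only; the abstract does not specify these details, and the sketch is not the authors' implementation.

```python
# Minimal sketch: hierarchical intra-/inter-section attention with gated fusion.
# All hyperparameters, the gate form, and the pooling scheme are assumptions.
import torch
import torch.nn as nn


class FusionGate(nn.Module):
    """Gated fusion of two feature vectors: g * a + (1 - g) * b (assumed form)."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([a, b], dim=-1)))
        return g * a + (1.0 - g) * b


class HierarchicalAttentionSketch(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4, num_classes: int = 5):
        super().__init__()
        # Intra-section transformer: attention over tokens within each section.
        self.intra = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True), num_layers=2
        )
        # Inter-section transformer: attention over section-level representations.
        self.inter = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True), num_layers=2
        )
        # Hypothetical stand-ins for the paper's RFG and FFG fusion gates.
        self.residual_fusion = FusionGate(dim)  # fuse section input with intra output
        self.feature_fusion = FusionGate(dim)   # fuse intra- and inter-section features
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_sections, tokens_per_section, dim) token embeddings,
        # e.g. produced by a BERT encoder applied per section (not shown here).
        b, s, t, d = x.shape
        tokens = x.reshape(b * s, t, d)
        intra_out = self.intra(tokens)                 # token-level attention
        section_in = tokens.mean(dim=1)                # pooled section input
        section_intra = intra_out.mean(dim=1)          # pooled intra-section features
        section_feat = self.residual_fusion(section_intra, section_in)  # RFG-like
        sections = section_feat.reshape(b, s, d)
        inter_out = self.inter(sections)               # section-level attention
        fused = self.feature_fusion(inter_out, sections)  # FFG-like
        doc = fused.mean(dim=1)                        # document representation
        return self.classifier(doc)


if __name__ == "__main__":
    model = HierarchicalAttentionSketch()
    dummy = torch.randn(2, 4, 32, 256)  # 2 docs, 4 sections, 32 tokens, dim 256
    print(model(dummy).shape)           # torch.Size([2, 5])
```

The two-level design keeps self-attention cost manageable for long documents: tokens only attend within their own section, and cross-section dependencies are handled at the much shorter section level.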