Hierarchical Attention Transformer Networks for Long Document Classification

2021 
Benefiting from pre-trained language representation models such as BERT, recently proposed document classification methods have achieved considerable improvements. However, most of these methods model the document as a flat sequence of text and ignore its structure, which is especially evident in long documents composed of several sections with related content. To address this, we propose a novel Hierarchical Attention Transformer Network (HATN) for long document classification, which captures the structure of a long document through intra- and inter-section attention transformers and further strengthens feature interaction with two fusion gates: the Residual Fusion Gate (RFG) and the Feature Fusion Gate (FFG). The proposed method is evaluated on three long document datasets, and the experimental results show that our approach outperforms related state-of-the-art methods. The code will be available at https://github.com/TengfeiLiu966/HATN
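
As a rough illustration of the architecture outlined above, the sketch below shows one plausible way to arrange intra- and inter-section attention with gated fusion in PyTorch. The class names, layer sizes, mean pooling, and the exact gate formulas (the FusionGate stand-ins for RFG and FFG) are assumptions for illustration only; the abstract does not specify these details, and the sketch is not the authors' implementation.

```python
# Minimal sketch: hierarchical intra-/inter-section attention with gated fusion.
# All hyperparameters, the gate form, and the pooling scheme are assumptions.
import torch
import torch.nn as nn


class FusionGate(nn.Module):
    """Gated fusion of two feature vectors: g * a + (1 - g) * b (assumed form)."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([a, b], dim=-1)))
        return g * a + (1.0 - g) * b


class HierarchicalAttentionSketch(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4, num_classes: int = 5):
        super().__init__()
        # Intra-section transformer: attention over tokens within each section.
        self.intra = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True), num_layers=2
        )
        # Inter-section transformer: attention over section-level representations.
        self.inter = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True), num_layers=2
        )
        # Hypothetical stand-ins for the paper's RFG and FFG fusion gates.
        self.residual_fusion = FusionGate(dim)  # fuse section input with intra output
        self.feature_fusion = FusionGate(dim)   # fuse intra- and inter-section features
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_sections, tokens_per_section, dim) token embeddings,
        # e.g. produced by a BERT encoder applied per section (not shown here).
        b, s, t, d = x.shape
        tokens = x.reshape(b * s, t, d)
        intra_out = self.intra(tokens)                 # token-level attention
        section_in = tokens.mean(dim=1)                # pooled section input
        section_intra = intra_out.mean(dim=1)          # pooled intra-section features
        section_feat = self.residual_fusion(section_intra, section_in)  # RFG-like
        sections = section_feat.reshape(b, s, d)
        inter_out = self.inter(sections)               # section-level attention
        fused = self.feature_fusion(inter_out, sections)  # FFG-like
        doc = fused.mean(dim=1)                        # document representation
        return self.classifier(doc)


if __name__ == "__main__":
    model = HierarchicalAttentionSketch()
    dummy = torch.randn(2, 4, 32, 256)  # 2 docs, 4 sections, 32 tokens, dim 256
    print(model(dummy).shape)           # torch.Size([2, 5])
```

The two-level design keeps self-attention cost manageable for long documents: tokens only attend within their own section, and cross-section dependencies are handled at the much shorter section level.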