Multimodal Attention-Based Learning for Imbalanced Corporate Documents Classification

2021 
Corporate document classification often relies on textual content alone, processed separately from image features. Conversely, some methods use only the visual content of documents and ignore their semantic information. Because semantics carry an important part of the information in corporate documents, ignoring them makes some document classes impossible to distinguish effectively. Recent state-of-the-art deep learning methods combine the textual content and the visual features in a multi-modal approach. In addition, corporate document classification poses a particular challenge for deep learning-based systems because the corpus is imbalanced. Indeed, neural network performance strongly depends on the training corpus, and an imbalanced set generally leads to poor final system performance. This paper proposes a multi-modal deep convolutional network with an attention model designed to classify a large variety of imbalanced corporate documents. We compare the proposed approach with several state-of-the-art document classification methods based on textual content, on visual content, and on multi-modal approaches. It achieves higher performance on both of our test datasets, with improvements of 2% on our private dataset and 3% on the public RVL-CDIP dataset.
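
To make the imbalance problem concrete, the sketch below computes inverse-frequency class weights, a common heuristic for keeping rare classes from being swamped during training. It is an illustration only: the abstract does not say how the proposed network handles imbalance, and the function name `balanced_class_weights` and the labels `invoice` and `contract` are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Assumption: this "balanced" heuristic is a generic remedy for
    imbalanced corpora, not the method described in the paper.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical toy corpus: 8 invoices and 2 contracts.
toy_labels = ["invoice"] * 8 + ["contract"] * 2
weights = balanced_class_weights(toy_labels)
print(weights)
```

With such weights, an error on a rare class contributes more to a weighted loss than an error on a frequent class, which is one way a training procedure might compensate for an imbalanced corpus.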