ABR-HIC: Attention Based Bidirectional RNN for Hierarchical Industry Classification

2019 
Accurate industry classification of national economic activities, as an important component in the construction of economic structure and as the basis for formulating economic policies and managing national economic activities, has been gaining increasing attention. However, owing to the rapid growth in the number of industries, it has become increasingly difficult for tax bureaus to classify registered taxpayers' industries. Conventional industry classification methods focus only on text features, so they cannot analyze and judge comprehensively according to the registration information, and they can only perform single-label classification because they neglect the primary and secondary relationships between the main and subsidiary industries; neither limitation meets application requirements. To address these challenges, this paper proposes a model called attention-based bidirectional RNN for hierarchical industry classification (ABR-HIC), which is, to the best of our knowledge, the first approach to simultaneously address comprehensive utilization of registration information and multi-label classification for the main and subsidiary industries. Our architecture establishes a bidirectional RNN with a word-attention mechanism, which captures and fully utilizes both the text and non-text registration information for feature representation. By separating the taxpayer's primary and secondary multi-label classification problem, corresponding to the main and subsidiary industries respectively, into two subtasks and training them jointly through multi-task learning, our model can provide comprehensive primary and secondary industry labels. Experiments were conducted on real tax datasets from Shaanxi Province, China, and the results demonstrate the outstanding performance of our architecture in terms of both classification effectiveness and training time compared with state-of-the-art approaches.
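As a rough illustration of the pipeline the abstract describes, the sketch below shows word-attention pooling over (bi)RNN hidden states followed by two task-specific softmax heads for the main- and subsidiary-industry subtasks. This is a minimal NumPy approximation only: the paper's actual recurrent cell, layer sizes, non-text feature fusion, and training procedure are not specified here, and all function names and dimensions are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def word_attention_pool(H, w):
    # H: (T, d) hidden states from a (bi)RNN over T words
    # w: (d,)  learned attention query vector (assumed form)
    scores = H @ w                # (T,) relevance score per word
    alpha = softmax(scores)       # attention weights over words, sums to 1
    return alpha @ H, alpha       # (d,) attention-pooled sentence vector

def two_head_classify(v, W_main, W_sub):
    # Multi-task heads: one label distribution for the main industry,
    # one for the subsidiary industry, sharing the pooled representation v
    return softmax(v @ W_main), softmax(v @ W_sub)
```

In a multi-task setup of this kind, the two heads would typically be trained jointly with a weighted sum of their cross-entropy losses, so the shared encoder learns features useful for both the primary and secondary labels.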