TDM-CFC: Towards Document-Level Multi-label Citation Function Classification

2021 
Citation function classification is an indispensable constituent of the citation content analysis, which has numerous applications, ranging from improving informative citation indexers to facilitating resource search. Existing research works primarily simply treat citation function classification as a sentence-level single-label task, ignoring some essential realistic phenomena thereby creating problems like data bias and noise information. For instance, one scientific paper contains many citations, and each citation context may contain rich discussions of the cited paper, which may reflect multiple citation functions. In this paper, we propose a novel task of Document-level Multi-label Citation Function Classification in a bid to considerably extend the previous research works from a sentence-level single-label task to a document-level multi-label task. Given the complicated nature of the document-level citation function analysis, we propose a novel two-stage fine-tuning approach of large scale pre-trained language model. Specifically, we represent a citation as an independent token and propose a novel two-stage fine-tuning approach to better represent it in the document context. To enable this task, we accordingly introduce a new benchmark, i.e., TDMCite, encompassing 9594 citations (annotated for their function) from online scientific papers by leveraging a three-aspect citation function annotation scheme. Experimental results suggest that our approach results in a considerable improvement in contrast to the state-of-the-art BERT classification fine-tuning approaches.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    0
    Citations
    NaN
    KQI
    []