The ACL FWS-RC: A Dataset for Recognition and Classification of Sentence about Future Works

2020 
Sentences about future work (FWS) in academic papers contain valuable information and can point researchers to new research topics or directions. At present, analysis of academic papers focuses mainly on citation content, bibliographic information, and the like, and little attention is paid to the FWS contained in the full text. This paper constructs a corpus of FWS based on the full-text content of academic papers and analyzes the characteristics and patterns of FWS. Taking 4,024 conference papers in Natural Language Processing (NLP) as the research object, we formulate three basic annotation specifications and extract 3,067 sentences about future work from 4,509 chapters by manual annotation. All FWS are then manually coded and classified into 6 main categories (method, resources, evaluation, application, problem, and other) and 17 sub-categories. Finally, we analyze the future work of each type. The results show that sentences mentioning methods account for the highest proportion, while the numbers of FWS in the other categories differ little. To the best of our knowledge, this is the first attempt at constructing a corpus of FWS; it provides a basis for the automatic extraction and classification of FWS and helps researchers conduct large-scale research on future work in academic papers. Our future work includes extending the scale of the FWS corpus and automatically extracting FWS.
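As a rough illustration of the coding scheme described in the abstract, the six main categories could be represented as an enumeration, with the per-category proportion computed from a list of manually assigned labels. This is only a minimal sketch: the names `FWSCategory` and `category_proportions` are hypothetical, the 17 sub-categories are omitted because the abstract does not name them, and the example labels are made up rather than taken from the corpus.

```python
from collections import Counter
from enum import Enum


class FWSCategory(Enum):
    """The six main categories named in the abstract (hypothetical encoding)."""
    METHOD = "method"
    RESOURCES = "resources"
    EVALUATION = "evaluation"
    APPLICATION = "application"
    PROBLEM = "problem"
    OTHER = "other"


def category_proportions(labels):
    """Return the share of each main category among the given labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        # No labeled sentences: report a zero share for every category.
        return {category: 0.0 for category in FWSCategory}
    return {category: counts[category] / total for category in FWSCategory}


# Made-up labels for illustration only; not drawn from the corpus.
example_labels = [
    FWSCategory.METHOD,
    FWSCategory.METHOD,
    FWSCategory.RESOURCES,
    FWSCategory.EVALUATION,
]
proportions = category_proportions(example_labels)
```

Applied to the real annotations, a tally of this kind would be one way to reproduce the reported comparison of category sizes.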