Ontology Based Approach for Annotating a Corpus of Computer Science Abstracts

2019 
About 2 million new scientific papers are published yearly, which makes it difficult to extract information from texts and to search them. This paper proposes creating “CSAC”, an annotated corpus of bibliographic information, such as titles and abstracts, from computer science research papers. The corpus uses an XML schema to apply two types of annotation: structuring and semantic. First, the IMRaD format is applied to structure the abstracts. Second, an ontology is used to semantically annotate the different properties of the algorithms mentioned in the abstracts. Applied linguistics is discussed to investigate the possibility of semi-automating the annotations. Because we claim that this methodology improves the writing and searching of computer science abstracts, we propose to evaluate the claim by developing an online application based on “CSAC” that allows users, first, to classify the sentences of their abstracts and, second, to search for a specific algorithm using one of its properties. Last, a survey will be conducted to measure users' satisfaction with the proposed application.
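The two annotation layers described above can be illustrated with a minimal sketch. The abstract does not give the actual CSAC schema, so every element and attribute name here (`abstract`, `sentence`, `imrad`, `algorithm`, `property`) is an assumption, and the sample sentences are invented for illustration only. The sketch tags each sentence with an IMRaD section, attaches algorithm properties to it, and then looks up algorithms by one property.

```python
import xml.etree.ElementTree as ET


def build_abstract(title, sentences):
    """Build an annotated abstract.

    sentences: list of (imrad_section, text, algorithms), where algorithms
    is a list of (algorithm_name, {property_name: value}) pairs.
    All tag and attribute names are hypothetical, not the CSAC schema.
    """
    root = ET.Element("abstract", {"title": title})
    for section, text, algorithms in sentences:
        # Structuring annotation: one IMRaD label per sentence.
        sent = ET.SubElement(root, "sentence", {"imrad": section})
        sent.text = text
        # Semantic annotation: algorithm properties mentioned in the sentence.
        for name, props in algorithms:
            algo = ET.SubElement(sent, "algorithm", {"name": name})
            for key, value in props.items():
                ET.SubElement(algo, "property", {"name": key, "value": value})
    return root


def find_algorithms(root, prop_name, prop_value):
    """Return names of algorithms that carry the given property value."""
    names = []
    for algo in root.iter("algorithm"):
        for prop in algo.findall("property"):
            if prop.get("name") == prop_name and prop.get("value") == prop_value:
                names.append(algo.get("name"))
                break
    return names


# Invented sample content, used only to exercise the functions.
doc = build_abstract(
    "Sample abstract",
    [
        ("introduction", "Sorting large files is costly.", []),
        ("methods", "We apply merge sort to external storage.",
         [("merge sort", {"time_complexity": "O(n log n)", "stable": "yes"})]),
    ],
)

matches = find_algorithms(doc, "time_complexity", "O(n log n)")
xml_text = ET.tostring(doc, encoding="unicode")
```

Keeping the IMRaD label as an attribute on each sentence and the algorithm properties as child elements keeps the two annotation types independent, so either can be queried without the other.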