The Canonical Model of Structure for Data Extraction in Systematic Reviews of Scientific Research Articles

2018 
Systematic reviews are time-consuming, error-prone and labour-intensive because of the manual processes involved, and data extraction is an especially difficult and cognitively demanding step. Automation can save a significant amount of time and reduce the workload. However, there is no unified approach to automatic data extraction in systematic reviews. This paper presents a canonical model of the structure of papers that serves as a unified approach and a foundation for the subsequent automatic extraction of information from scientific research articles. The model was developed by applying text mining and natural language processing techniques to one thousand (1000) published research papers. A novel approach was used to identify the section headings in the papers; it achieved an accuracy of 82%. A statistical analysis of the most frequent words and phrases in the section headings was then used to build the canonical model of structure.
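The frequency analysis of section headings mentioned in the abstract could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the stop-word list, the numbering patterns, the sample headings, and the function names `normalize_heading` and `heading_frequencies` are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the paper does not specify one.
STOP_WORDS = {"and", "of", "the", "a", "an", "in", "for", "to", "on"}


def normalize_heading(heading: str) -> str:
    """Lowercase a heading and strip leading numbering such as '1.', '3.1' or 'IV.'."""
    text = heading.strip().lower()
    # Remove a leading arabic section number or a dotted roman numeral.
    text = re.sub(r"^(\d+(\.\d+)*\.?|[ivxlc]+\.)\s+", "", text)
    # Replace anything that is not a letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    return " ".join(text.split())


def heading_frequencies(headings):
    """Return (word_counts, phrase_counts) over the normalized headings."""
    word_counts = Counter()
    phrase_counts = Counter()
    for heading in headings:
        phrase = normalize_heading(heading)
        if not phrase:
            continue
        phrase_counts[phrase] += 1
        word_counts.update(w for w in phrase.split() if w not in STOP_WORDS)
    return word_counts, phrase_counts


# Illustrative headings only; these are not data from the study.
sample_headings = [
    "1. Introduction",
    "2. Related Work",
    "3. Methodology",
    "4. Results and Discussion",
    "5. Conclusion",
    "I. INTRODUCTION",
    "II. Method",
    "III. Results",
    "IV. Conclusion",
]

word_counts, phrase_counts = heading_frequencies(sample_headings)
```

Under these assumptions, the highest-count entries in `word_counts` and `phrase_counts` would be the candidates for the canonical section names; how the paper actually thresholds or groups them is not stated in the abstract.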