Context-based extraction of concepts from unstructured textual documents

2022 
Summarizing a collection of unstructured textual documents, e.g., lecture slides or book chapters, by extracting the most relevant concepts helps learners see the connections among these concepts. However, existing methods neglect the context in which concepts are extracted, even though a concept might be irrelevant in one context but relevant in another. To address this, we propose a novel unsupervised method for extracting the relevant concepts from a collection of unstructured textual documents, assuming that the documents are related to a certain topic. Our two-step method first identifies candidate concepts from the textual documents; it then infers the context information for the input documents and ranks the candidates with respect to the inferred context. In the second step, this context information is enriched with more abstract information to improve the ranking process. In the experiments we demonstrate that our method outperforms seven supervised and unsupervised approaches on five datasets and is competitive on the other two. Furthermore, we release three new benchmark datasets that were created from books in the educational domain. Our code and datasets are available at: https://github.com/gulsaima/COBEC.
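The pipeline described above (candidate identification, context inference, context-based ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the stopword list, the n-gram candidate rule, the bag-of-words context, and the cosine scoring are all simplifying assumptions chosen for illustration, and the enrichment of the context with more abstract information is omitted. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not taken from the paper.
STOPWORDS = {"a", "an", "and", "the", "of", "is", "in", "to", "each"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_concepts(docs, max_len=2):
    """Step 1 (assumed rule): stopword-free n-grams are candidate concepts."""
    candidates = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if not any(tok in STOPWORDS for tok in gram):
                    candidates[" ".join(gram)] += 1
    return candidates


def context_vector(docs):
    """Step 2a (assumed): the context is the term-frequency vector of all docs."""
    counts = Counter()
    for doc in docs:
        counts.update(tok for tok in tokenize(doc) if tok not in STOPWORDS)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_concepts(docs, top_k=5):
    """Step 2b (assumed): rank candidates by similarity to the context."""
    context = context_vector(docs)
    scored = [
        (cand, cosine(Counter(cand.split()), context))
        for cand in candidate_concepts(docs)
    ]
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


if __name__ == "__main__":
    docs = [
        "A binary tree is a tree data structure.",
        "Tree traversal visits each node of the tree.",
    ]
    for concept, score in rank_concepts(docs):
        print(f"{concept}: {score:.3f}")
```

Because the context is built from the whole collection, the same candidate can score differently when the surrounding documents change, which is the behavior the abstract motivates. A realistic system would replace the count vectors with richer representations.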