Finding Entities and Related Facts in Newspaper

2020 
Information production is increasing very fast. Most of this information is in free-text format, and extracting meaningful knowledge from it is a difficult task. Many techniques can help with the problem of processing a large amount of data and its relations. One of these techniques is Relation Extraction (RE). RE is a Natural Language Processing (NLP) task and can be defined as the extraction of relations among two or more entities. Areas such as semantic relation extraction, sentiment analysis, opinion mining, and question answering may apply RE to ease their processing. In our work, we propose to use RE to find entities and related facts in newspaper articles. To carry out this task, we segment the text into sentences. Within each sentence, we tokenize the terms and extract their dependencies using the spaCy tool. We then apply Named Entity Recognition (NER) to extract some of the entity classes. Finally, we use an inductive logic programming-based model to represent some of the logical relations we find within sentences. To train and evaluate the model, we split the newspaper corpus into training and test portions and compare the relations extracted by the model against those annotated by a human on the same dataset. The results show a competitive model for Relation Extraction in Portuguese.
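The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Token` class, the `extract_relations` and `precision_recall` functions, the single hand-written rule, and the example sentence are all assumptions made for illustration. The rule is of the kind an inductive logic programming system might induce from annotated data; here it is simply written by hand. In practice the token fields would be filled from spaCy's output (`token.dep_`, `token.head`, `token.ent_type_`) rather than typed in manually.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for one parsed token (fields mirror what a parser provides)."""
    idx: int                        # position of the token in the sentence
    text: str                       # surface form
    dep: str                        # dependency label, e.g. "nsubj", "obj"
    head: int                       # index of the syntactic head
    ent_type: Optional[str] = None  # NER class such as "PER" or "LOC"; None if not an entity


Relation = Tuple[str, str, str]  # (subject entity, predicate, object entity)


def extract_relations(sentence: List[Token]) -> Set[Relation]:
    """Apply one illustrative logic rule to a parsed sentence:

        relation(S, V, O) :- nsubj(V, S), obj(V, O), entity(S), entity(O).
    """
    subjects = [t for t in sentence if t.dep == "nsubj" and t.ent_type]
    objects = [t for t in sentence if t.dep == "obj" and t.ent_type]
    relations: Set[Relation] = set()
    for s in subjects:
        for o in objects:
            if s.head == o.head:  # both attach to the same verb
                relations.add((s.text, sentence[s.head].text, o.text))
    return relations


def precision_recall(predicted: Set[Relation], gold: Set[Relation]) -> Tuple[float, float]:
    """Compare extracted relations with human-annotated ones."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up example sentence "Maria visitou Lisboa." with a hand-written parse.
sentence = [
    Token(0, "Maria", "nsubj", 1, "PER"),
    Token(1, "visitou", "ROOT", 1),
    Token(2, "Lisboa", "obj", 1, "LOC"),
    Token(3, ".", "punct", 1),
]

predicted = extract_relations(sentence)
gold = {("Maria", "visitou", "Lisboa")}  # illustrative human annotation
print(predicted, precision_recall(predicted, gold))
```

Keeping the rule separate from the parsing step means the same comparison against human annotations can be reused if the hand-written rule is replaced by rules learned from the training portion of the corpus.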