Creating a focused corpus of factual outcomes from biomedical experiments

2011 
The results of an experiment are often described in a series of textual statements, the most concise of which is the title of the article. Here we implemented a novel approach, using standard data mining techniques, to collect a set of concise 'factual' statements about a research area. We compare two standard text classification approaches to identify 'factual' and 'non-factual' sentences in article titles: the first uses a statistical language-modelling approach, and the second a more sophisticated semantic and grammatical approach. We find that the simple approach classifies titles more accurately, achieving 92% overall accuracy compared to 90% for the complex approach. We also implement a strategy to convert the phrasal dependencies in a 'factual' title into subject-predicate-object structures (triples). These triples can then be organised according to a schema provided by domain ontologies, by mapping URIs to entities found in the textual labels.
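As an illustration of what a statistical language-modelling classifier for titles might look like, the following is a minimal sketch and not the authors' implementation. It assumes a unigram model per class with add-one smoothing and whitespace tokenisation. The class name `UnigramTitleClassifier`, the toy titles, and their labels are all hypothetical and serve only to make the example runnable.

```python
import math
from collections import Counter


def tokenize(title):
    # Assumption: lowercase whitespace tokenisation; the abstract does not specify one.
    return title.lower().split()


class UnigramTitleClassifier:
    """Per-class unigram language model with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.counts = {}       # label -> Counter of token frequencies
        self.totals = {}       # label -> total number of tokens seen
        self.docs = Counter()  # label -> number of training titles
        self.vocab = set()

    def fit(self, titles, labels):
        for title, label in zip(titles, labels):
            tokens = tokenize(title)
            self.counts.setdefault(label, Counter()).update(tokens)
            self.docs[label] += 1
            self.vocab.update(tokens)
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def log_score(self, title, label):
        # Log prior plus the sum of smoothed log token probabilities.
        score = math.log(self.docs[label] / sum(self.docs.values()))
        v = len(self.vocab) + 1  # one extra slot for unseen tokens
        for tok in tokenize(title):
            score += math.log((self.counts[label][tok] + 1) / (self.totals[label] + v))
        return score

    def predict(self, title):
        return max(self.counts, key=lambda lab: self.log_score(title, lab))


# Hypothetical toy training data, invented for this sketch.
TITLES = [
    "Protein A inhibits kinase B in yeast",
    "Gene X regulates apoptosis in neurons",
    "Drug Y reduces inflammation in mice",
    "A study of kinase signalling in yeast",
    "Analysis of gene expression in neurons",
    "A review of inflammation models",
]
LABELS = ["factual"] * 3 + ["non-factual"] * 3

clf = UnigramTitleClassifier().fit(TITLES, LABELS)
print(clf.predict("Protein X inhibits apoptosis in mice"))
print(clf.predict("A review of gene expression"))
```

A model this small mostly picks up on cue words such as verbs of effect versus words like "study" or "review"; a realistic version would need a labelled corpus of titles and a held-out evaluation set.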