The Termolator: Terminology Recognition based on Chunking, Statistical and Search-based Scores.

2015 
The Termolator is a high-performing terminology extraction system, which will soon be available as open-source software. The Termolator combines several different approaches to achieve superior coverage and accuracy. The system identifies potential instances of terminology using a chunking procedure, similar to noun group chunking, but favoring chunks that contain out-of-vocabulary words, nominalizations, technical adjectives, and other specialized word classes. The system ranks such term chunks according to several metrics, including (a) a set of metrics that favor term chunks that are relatively more frequent in a “foreground” corpus about a single topic than in a “background” or multi-topic corpus, and (b) a relevance score that measures how often terms appear in articles and patents in a Yahoo web search. We analyse the contributions made by each of these metrics and show that all modules contribute to the system’s performance, both in terms of the number and quality of terms identified.

Workshop Topic: Terminology Extraction
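To make the foreground/background idea concrete, here is a minimal sketch of one such metric: a smoothed log ratio of a chunk's relative frequency in the foreground corpus to its relative frequency in the background corpus. This is an illustrative assumption, not the Termolator's actual set of metrics, which the abstract does not spell out. The function names `relative_frequency_scores` and `rank_terms` and the add-one `smoothing` parameter are hypothetical, and the input is assumed to be chunks that have already been extracted.

```python
import math
from collections import Counter


def relative_frequency_scores(foreground, background, smoothing=1.0):
    """Hypothetical metric: log ratio of a chunk's smoothed relative frequency
    in the foreground (single-topic) corpus vs. the background corpus."""
    fg = Counter(foreground)
    bg = Counter(background)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    vocab_size = len(set(fg) | set(bg))  # shared vocabulary for smoothing
    scores = {}
    for term, count in fg.items():
        # Additive smoothing keeps chunks unseen in the background finite.
        p_fg = (count + smoothing) / (fg_total + smoothing * vocab_size)
        p_bg = (bg[term] + smoothing) / (bg_total + smoothing * vocab_size)
        scores[term] = math.log(p_fg / p_bg)  # > 0 means topic-specific
    return scores


def rank_terms(foreground, background, top_n=None):
    """Sort foreground chunks by descending score (ties broken alphabetically)."""
    scores = relative_frequency_scores(foreground, background)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if top_n is None else ranked[:top_n]


if __name__ == "__main__":
    # Toy chunk lists standing in for chunker output on two corpora.
    fg = ["neural network", "neural network", "gradient descent",
          "the system", "the system"]
    bg = ["the system", "the system", "the system", "the system",
          "stock market", "neural network"]
    for term, score in rank_terms(fg, bg):
        print(f"{score:+.3f}  {term}")
```

In this toy run, "gradient descent" ranks first and the generic chunk "the system" ranks last with a negative score, since it is relatively more frequent in the background corpus. A full system along the lines described above would combine several such scores with the chunker's word-class preferences and the web-based relevance score rather than rely on one ratio.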