MUC-4 Test Results and Analysis

1992 
Abstract: LSI's overall natural language processing (NLP) objective is the development of a broad-coverage, reusable system that is readily transportable to additional domains, applications, and sublanguages in English, and that provides a foundation for our multilingual work. Our system, called DBG (Data Base Generator), consists of a set of NLP components that have been developed, extended, and rebuilt over a period of several years. The core of the system is an innovative principle-based parser, using ideas from [1], which we began developing during MUC-3 to replace our previous chart parser. Our approach thus treats powerful, robust parsing as the most crucial component of an NLP system. In applying our NLP system to text extraction, our ultimate objective is a high-quality text extraction system, where "high quality" is defined as scoring above 80%, a figure well beyond any current MUC score. In line with these NLP objectives, our major focus for MUC-4 was to follow up on our main "lesson learned" in MUC-3: to acquire a machine-readable dictionary (MRD) and integrate its content into the DBG system. When attempts to acquire the computer-friendly Longmans or one of the Oxford dictionaries were unsuccessful, we turned to the ACL CD-ROM containing the Collins English Dictionary (CED). The most accurate version of the CED on the ACL CD-ROM was apparently derived directly from material prepared for the typographer, and unfortunately it lacks any documentation of features, fonts, language, etc. Acquiring and integrating the CED was clearly worthwhile, since it allowed us to triple the number of entries in our lexicon in a relatively short time (see Table 1). The larger lexicon will benefit all of the applications LSI is currently working on.