Normalizing clinical terms using learned edit distance patterns

2016 
Background: Variations of clinical terms are common in clinical texts. Normalization methods that match clinical terms to standard terminologies using similarity measures or hand-coded approximation rules have limited accuracy and coverage.

Materials and Methods: This paper presents a novel method that automatically learns patterns of variation in clinical terms from known variations in a resource such as the Unified Medical Language System (UMLS). The patterns are first learned by computing edit distances between the known variations and are then generalized so that they can normalize previously unseen terms. The method was applied to the disease and disorder mention normalization task using the SemEval 2014 dataset, and was compared with the normalization ability of the MetaMap system and of a method based on cosine similarity.

Results: Excluding the mentions that already exactly match in UMLS and the training dataset, the proposed method obtained 64.7% accuracy on the rest of the test dataset. Accuracy was calculated as the number of mentions that were either matched to the correct gold-standard concept unique identifier (CUI) or correctly identified as having no CUI. In comparison, MetaMap's accuracy was 41.9% and cosine similarity's accuracy was 44.6%. When only the output CUIs were evaluated, the proposed method obtained a best F-measure of 54.4% (at 92.1% precision and 38.6% recall), while MetaMap obtained a best F-measure of 19.4% (at 38.0% precision and 13.0% recall) and cosine similarity obtained a best F-measure of 38.1% (at 70.3% precision and 26.1% recall).

Conclusions: The novel method performed much better than both the MetaMap system and the cosine similarity based method in normalizing disease mentions in clinical text that did not exactly match in UMLS. The method is also general and can be used to normalize clinical terms of other semantic types.
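To make the edit-distance step concrete, the following is a minimal sketch of how an edit script between two known variations of a term might be extracted. It is not the authors' implementation: the abstract does not say whether edits are computed over characters or words, how costs are assigned, or how patterns are generalized, so the function name `edit_script`, the unit costs, and the example terms are all assumptions made for illustration.

```python
def edit_script(source, target):
    """Return (distance, ops) for turning `source` into `target`.

    Hypothetical helper: standard Levenshtein distance with unit costs.
    Works on any pair of sequences, e.g. strings (character level)
    or lists of tokens (word level).
    """
    m, n = len(source), len(target)

    # dist[i][j] = edit distance between source[:i] and target[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i
    for j in range(1, n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # delete source[i-1]
                dist[i][j - 1] + 1,        # insert target[j-1]
                dist[i - 1][j - 1] + sub,  # match or substitute
            )

    # Walk back from the bottom-right cell to recover the non-matching edits.
    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if source[i - 1] == target[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + sub:
                if sub:
                    ops.append(("substitute", source[i - 1], target[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("delete", source[i - 1], None))
            i -= 1
        else:
            ops.append(("insert", None, target[j - 1]))
            j -= 1
    ops.reverse()
    return dist[m][n], ops


# Illustrative term pairs (not taken from the paper or from UMLS).
char_level = edit_script("tumour", "tumor")
word_level = edit_script("ca of breast".split(), "cancer of breast".split())
print(char_level)
print(word_level)
```

Edit scripts like these, collected over many known variation pairs, are the kind of raw material from which recurring patterns could be counted and generalized; how that generalization is done is specific to the paper and is not reproduced here.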