Inductive identification of functional status information and establishing a gold standard corpus: A case study on the Mobility domain

Thanh Thieu, Jonathan Camacho, Pei-Shu Ho, Julia Porcino, Min Ding, Lisa Nelson, Elizabeth K. Rasch, Chunxiao Zhou, Leighton Chan, Diane Brandt, Denis Newman-Griffis, Ao Yuan, Albert M. Lai

2017

The importance of functional status information (FSI) has become increasingly evident in recent years [1, 2]. However, the implementation, application, and normalization of FSI in health care and Electronic Health Records (EHRs) remain largely underexplored. The World Health Organization's International Classification of Functioning, Disability and Health (ICF) [3] is considered the international standard for describing and coding function and health states. Nevertheless, the ICF provides only a limited vocabulary for recognizing FSI descriptions, since its purpose is to organize concepts related to functioning rather than to provide a comprehensive terminology or a complete set of relations between concepts. While the free-text portion of EHRs might provide a more complete picture of health status, treatment, and progress, current Natural Language Processing (NLP) methods largely focus on extracting medical conditions (e.g., diagnoses and symptoms). The absence of a standardized functional terminology and the incompleteness of the ICF as a vocabulary source make it challenging to build an NLP system that extracts FSI from EHR free text. Our work takes the first step towards extraction of FSI from free text by systematically identifying the structure of FSI related to Mobility, a key domain of the ICF and an important domain in the determination of work disability. Our interdisciplinary research group inductively evaluated examples extracted from over 1,200 Physical Therapy (PT) notes from the Clinical Center of the National Institutes of Health (NIH). This extensive work resulted in a nested entity structure comprising 2 entities, 3 sub-entities, 8 attributes, and 21 attribute values. Furthermore, we have manually curated the first gold standard corpus of 200 double-annotated and 50 triple-annotated PT notes. Our inter-annotator agreement (IAA) averages 97% F1-score on partial textual span matching and ranges from 0.4 to 0.9 in Siegel & Castellan's kappa on attribute value matching. Such a rich semantic corpus of Mobility FSI is a valuable and promising resource for future statistical learning. Our method is also adaptable to other domains of the ICF.
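
The two agreement measures named above can be illustrated with a minimal Python sketch. It is not the authors' evaluation code, and the abstract does not give the exact matching rules, so several points are assumptions: spans are half-open character offsets, a "partial match" means any character overlap, one annotator is treated as the reference for precision and recall, and the kappa is computed for two annotators with chance agreement taken from the pooled label distribution (the form usually attributed to Siegel & Castellan). The function names and the example data are hypothetical.

```python
from collections import Counter


def overlaps(a, b):
    """True if half-open spans a=(start, end) and b=(start, end) share a character."""
    return a[0] < b[1] and b[0] < a[1]


def partial_match_f1(spans_a, spans_b):
    """F1 between two annotators, counting any overlap as a match (assumed rule).

    Annotator A is treated as the reference: recall is the share of A's spans
    overlapped by some span of B, precision is the share of B's spans
    overlapped by some span of A.
    """
    if not spans_a and not spans_b:
        return 1.0  # both annotators marked nothing: treated as full agreement
    if not spans_a or not spans_b:
        return 0.0
    matched_a = sum(any(overlaps(a, b) for b in spans_b) for a in spans_a)
    matched_b = sum(any(overlaps(b, a) for a in spans_a) for b in spans_b)
    recall = matched_a / len(spans_a)
    precision = matched_b / len(spans_b)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pooled_kappa(labels_a, labels_b):
    """Two-annotator kappa with expected agreement from pooled label counts."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    pooled = Counter(labels_a) + Counter(labels_b)
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())
    if expected == 1.0:
        return 1.0  # only one label was ever used, so agreement is trivial
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of one note by two annotators.
spans_a = [(0, 10), (20, 30), (40, 50)]
spans_b = [(5, 12), (22, 28), (60, 70)]
values_a = ["able", "able", "unable", "unable"]
values_b = ["able", "unable", "unable", "unable"]

f1 = partial_match_f1(spans_a, spans_b)
kappa = pooled_kappa(values_a, values_b)
```

Overlap-based matching is lenient about span boundaries, which is one reason span-level F1 can be high while kappa on attribute values, which corrects for chance agreement, varies more widely.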
