Diagnostic Radiology Text

2015 
annotated corpus of 83,452 chest x-ray reports. We show that the distribution of semantics is consistent with Zipfian distributions observed in other natural language corpora, and we quantify the semantic focus imparted by limiting a study by body area and modality. We demonstrate that within our semantically focused corpus, pairwise co-occurrence statistics can be used to accurately impute the semantic class for frequently occurring unknown entities, thereby reducing the number of semantically unclassified phrases by up to 25%. Finally, we show that our imputation approach is consistent across multiple reconstructions of the underlying text data.
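
The abstract does not spell out how pairwise co-occurrence statistics are turned into a semantic class, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes each report has already been reduced to a list of phrases, that some phrases carry a known semantic class, and that an unknown phrase takes the class whose members have the most similar co-occurrence profile (by cosine similarity). The function names, the `min_count` threshold, and the toy reports are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_profiles(reports):
    """Map each phrase to a Counter of the phrases it shares a report with."""
    profiles = defaultdict(Counter)
    for phrases in reports:
        unique = set(phrases)  # count each pair at most once per report
        for p in unique:
            for q in unique:
                if p != q:
                    profiles[p][q] += 1
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def impute_class(phrase, profiles, known_classes, min_count=2):
    """Return the best-matching class for an unknown phrase, or None.

    min_count is an assumed cutoff: phrases with too few co-occurrences
    are left unclassified, mirroring the abstract's restriction to
    frequently occurring unknown entities.
    """
    profile = profiles.get(phrase, Counter())
    if sum(profile.values()) < min_count:
        return None
    # Pool the profiles of all known members of each class.
    pooled = defaultdict(Counter)
    for member, cls in known_classes.items():
        pooled[cls].update(profiles.get(member, Counter()))
    scores = {cls: cosine(profile, vec) for cls, vec in pooled.items()}
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None


# Toy data (invented for illustration, not taken from the corpus).
reports = [
    ["left lung", "opacity", "pneumonia"],
    ["right lung", "opacity", "pneumonia"],
    ["left lung", "nodule"],
    ["lingula", "opacity", "pneumonia"],
    ["lingula", "nodule"],
]
known = {
    "left lung": "anatomy",
    "right lung": "anatomy",
    "opacity": "finding",
    "nodule": "finding",
    "pneumonia": "finding",
}
profiles = cooccurrence_profiles(reports)
print(impute_class("lingula", profiles, known))  # prints "anatomy"
```

In this toy example "lingula" appears alongside the same findings as the known anatomy phrases, so its profile sits closer to the pooled anatomy profile than to the pooled finding profile. A real pipeline would also need a confidence threshold and an evaluation against held-out annotations before any imputed class is trusted.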