Multi-combined Features Text Mining of TCM Medical Cases with CRF

2016 
Traditional Chinese medicine (TCM) medical cases are recorded as free text that contains much valuable data and many clinical terms, so recognizing and extracting these terms automatically is worthwhile. TCM medical records obtained from Guangdong Provincial Hospital of Chinese Medicine are segmented into individual words, labeled with five labeling features (words in sentence, grammatical property of words, words in clinical dictionary, set phrases acting on the neighboring context, and set phrases acting at a long distance), and divided into training sets and testing sets. The training sets are additionally annotated with output labels (symptoms or signs, TCM diagnosis, TCM syndrome type, Chinese medicines (drugs), and names of TCM prescriptions). To evaluate how well the labeling features improve clinical term recognition with a conditional random field (CRF), three indicators are defined (recognition precision (P), recognition recall (R), and F-score (F)), and three comparisons are made: individual labeling features, combined labeling features, and combined features across different diseases. The results show that "grammatical property of words" is the best of the individual labeling features, and that multi-combined features score higher than individual labeling features in clinical term recognition. The combination of "grammatical property of words", "words in sentence", and "words in clinical dictionary" may be the most suitable set of labeling features. Multi-combined labeling features can improve term recognition with a CRF model for text mining in TCM medical cases.
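The abstract names the three indicators (P, R, and F) but does not spell out how they are computed. The following is a minimal sketch, assuming the conventional exact-match definitions and a balanced F-score (F = 2PR / (P + R)); the paper may define them differently. The function name `precision_recall_f`, the `(start, end, label)` term representation, and the label strings are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical term representation: (start token index, end token index, label).
Term = Tuple[int, int, str]


def precision_recall_f(gold: Iterable[Term],
                       predicted: Iterable[Term]) -> Tuple[float, float, float]:
    """Return (P, R, F) under exact-match scoring.

    Assumption: a predicted term counts as correct only if its span and
    its label both match a gold-standard term exactly.
    """
    gold_set = set(gold)
    pred_set = set(predicted)
    correct = len(gold_set & pred_set)

    # P = correct / recognized terms; R = correct / gold terms.
    p = correct / len(pred_set) if pred_set else 0.0
    r = correct / len(gold_set) if gold_set else 0.0
    # Balanced F-score (harmonic mean of P and R); assumed, not confirmed by the source.
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f


# Toy data, not taken from the paper.
gold_terms = [(0, 1, "SYMPTOM"), (3, 4, "SYNDROME"),
              (6, 8, "DRUG"), (10, 11, "PRESCRIPTION")]
predicted_terms = [(0, 1, "SYMPTOM"), (3, 4, "DIAGNOSIS"), (6, 8, "DRUG")]

p, r, f = precision_recall_f(gold_terms, predicted_terms)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

In this toy case two of the three predicted terms match the four gold terms exactly, so precision is 2/3 and recall is 2/4. The same function could be applied per feature combination to reproduce the kind of comparison the abstract describes.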