Multi-label Text Classification of German Language Medical Documents

2007 
At nearly every patient visit, medical documents are produced and stored in a medical record, often in unstructured form as free text. The growing number of stored documents increases the need for effective and timely retrieval of information. We developed a multi-label text classification system to categorize free-text medical documents (e.g. discharge letters, clinical findings, reports) written in German into predefined classes. A random sample of 1,500 free-text medical documents was retrieved from a general hospital information system and manually assigned to 1 to 8 categories by a domain expert. This sample was used to train and evaluate the performance of 4 classification schemes: Naïve Bayes, k-NN, SVM, and J48. We also tested the effect of text preprocessing. In our study, preprocessing improved performance, and the best results were obtained with J48 classification.
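As an illustration of the multi-label setup described above, the following is a minimal sketch, not the system from the study. The abstract does not say how the multi-label problem was decomposed or which preprocessing steps were used, so the sketch makes its own assumptions: one yes/no classifier per category (binary relevance), a small from-scratch multinomial Naïve Bayes as the base classifier, and lowercasing plus letter-only tokenization as the preprocessing. The example texts, category names, and all function and class names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Assumed preprocessing: lowercase, keep runs of letters (incl. German umlauts and ß)."""
    return re.findall(r"[a-zäöüß]+", text.lower())


class BinaryNaiveBayes:
    """Multinomial Naive Bayes for a single yes/no category, with add-one smoothing."""

    def fit(self, docs, flags):
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = {True: 0, False: 0}
        for tokens, flag in zip(docs, flags):
            self.counts[flag].update(tokens)
            self.n_docs[flag] += 1
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def _log_score(self, tokens, flag):
        total_tokens = sum(self.counts[flag].values())
        total_docs = self.n_docs[True] + self.n_docs[False]
        # Smoothed class prior, then smoothed per-token likelihoods.
        score = math.log((self.n_docs[flag] + 1) / (total_docs + 2))
        for token in tokens:
            if token in self.vocab:  # tokens never seen in training are ignored
                score += math.log(
                    (self.counts[flag][token] + 1) / (total_tokens + len(self.vocab))
                )
        return score

    def predict(self, tokens):
        return self._log_score(tokens, True) > self._log_score(tokens, False)


def train_binary_relevance(texts, label_sets):
    """Train one independent binary classifier per category (binary relevance)."""
    docs = [tokenize(t) for t in texts]
    categories = sorted(set().union(*label_sets))
    return {
        cat: BinaryNaiveBayes().fit(docs, [cat in labels for labels in label_sets])
        for cat in categories
    }


def predict_labels(models, text):
    """Return every category whose classifier votes 'yes' for the document."""
    tokens = tokenize(text)
    return {cat for cat, model in models.items() if model.predict(tokens)}


# Hypothetical toy data; real documents would be full free-text letters and reports.
texts = [
    "Entlassungsbrief Kardiologie Herzinsuffizienz",
    "Befund Radiologie Thorax Röntgen",
    "Entlassungsbrief Radiologie Thorax",
    "Befund Kardiologie Echokardiographie",
]
label_sets = [
    {"discharge", "cardiology"},
    {"finding", "radiology"},
    {"discharge", "radiology"},
    {"finding", "cardiology"},
]

models = train_binary_relevance(texts, label_sets)
predicted = predict_labels(models, "Entlassungsbrief Kardiologie")
print(sorted(predicted))
```

Binary relevance is used here only because it is the simplest way to let a document receive several categories at once; any of the four classifier families named in the abstract could take the place of the Naïve Bayes class in this sketch.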