Multi-label Text Classification of German Language Medical Documents

Stephan Spat, Bruno Cadonna, Ivo Rakovac, Christian Gütl, Hubert Leitner, Günther Stark, Peter Beck

2007

At nearly every patient visit, medical documents are produced and stored in a medical record, often as unstructured free text. The growing number of stored documents increases the need for effective and timely information retrieval. We developed a multi-label text classification system that categorizes free-text medical documents written in German (e.g. discharge letters, clinical findings, reports) into predefined classes. A random sample of 1,500 free-text medical documents was retrieved from a general hospital information system, and a domain expert manually assigned each document to between 1 and 8 categories. This sample was used to train and evaluate four classification schemes: Naïve Bayes, k-NN, SVM, and J48. We also tested the effect of text preprocessing. In our study, preprocessing improved performance, and J48 classification gave the best results.
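To make the multi-label setting concrete, the following is a minimal sketch and not the system described in the abstract. It assumes a binary relevance decomposition (one yes/no classifier per category), a small multinomial Naïve Bayes classifier with Laplace smoothing, and a preprocessing step limited to lowercasing, tokenization, and stop-word removal. The abstract does not state how the multi-label problem was decomposed or which preprocessing steps were used, so all of these are assumptions. The stop-word list, the category names, and the four toy documents are hypothetical and are not taken from the study data.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not specify the preprocessing.
STOPWORDS = {"der", "die", "das", "und", "mit", "bei", "ohne", "ein", "eine"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens (including umlauts), drop stop words."""
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


class BinaryNaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing for a single yes/no label."""

    def fit(self, docs, flags):
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = {True: 0, False: 0}
        for tokens, flag in zip(docs, flags):
            self.counts[flag].update(tokens)
            self.n_docs[flag] += 1
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def _log_score(self, tokens, flag):
        total_docs = self.n_docs[True] + self.n_docs[False]
        # Smoothed class prior plus smoothed per-token log-likelihoods.
        score = math.log((self.n_docs[flag] + 1) / (total_docs + 2))
        denom = sum(self.counts[flag].values()) + len(self.vocab)
        for t in tokens:
            if t in self.vocab:  # tokens unseen in training are ignored
                score += math.log((self.counts[flag][t] + 1) / denom)
        return score

    def predict(self, tokens):
        return self._log_score(tokens, True) > self._log_score(tokens, False)


class BinaryRelevance:
    """Assumed decomposition: train one independent binary model per category."""

    def fit(self, texts, label_sets):
        docs = [preprocess(t) for t in texts]
        self.labels = sorted(set().union(*label_sets))
        self.models = {
            label: BinaryNaiveBayes().fit(docs, [label in s for s in label_sets])
            for label in self.labels
        }
        return self

    def predict(self, text):
        tokens = preprocess(text)
        return {label for label, m in self.models.items() if m.predict(tokens)}


# Toy documents and category names, invented for illustration only.
train_texts = [
    "Entlassungsbrief: Patient mit Diabetes mellitus, Insulin Therapie",
    "Befund Röntgen Thorax ohne Infiltrat",
    "Entlassungsbrief nach Röntgen Thorax bei Pneumonie",
    "Diabetes Kontrolle, Insulin Dosis angepasst",
]
train_labels = [
    {"discharge_letter", "diabetes"},
    {"radiology"},
    {"discharge_letter", "radiology"},
    {"diabetes"},
]

clf = BinaryRelevance().fit(train_texts, train_labels)
single = clf.predict("Röntgen Thorax Befund")
multiple = clf.predict("Entlassungsbrief Diabetes Insulin")
print(sorted(single))
print(sorted(multiple))
```

Because each category gets its own independent decision, a document can receive one, several, or no categories, which matches the 1 to 8 categories per document in the annotated sample. Swapping `BinaryNaiveBayes` for another binary classifier would leave the decomposition unchanged.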
