Data Mining Techniques for Prediction of Different Categories of Dermatology Diseases

2013 
INTRODUCTION
In medical science, diagnosing health conditions is a challenging task. A patient's medical history comprises a number of tests needed to diagnose a particular disease, and the diagnosis depends on the physician's experience, so a less experienced physician may diagnose a problem incorrectly. The health care industry can therefore benefit from data mining techniques that support a decision support system (DSS) able to diagnose problems consistently and intelligently. An effective and intelligent health care DSS for diagnosing different types of diseases is thus an essential requirement. Dermatology is the study of skin diseases, which can be extremely complex and difficult to diagnose and may ultimately be a leading cause of skin cancer. The six categories of erythemato-squamous disease share similar clinical features of erythema (Guvenir & Emeksiz, 2000; Elsayad, 2010b). Classification is a robust technique in medical data mining. Although many studies have applied classification to diagnose erythemato-squamous diseases, researchers are still working to find the best classifier for this kind of dataset (Ubeyli, 2008 & 2009; Elsayad, 2010b). Several authors (Guvenir, 1998; Guvenir & Emeksiz, 2000; Nanni, 2006; Elsayad, 2010b) have used data mining techniques to diagnose erythemato-squamous disease. Guvenir et al. (1998, 2000) pioneered this area and did extensive work on developing a classifier. In Guvenir and Emeksiz (2000), they developed a graphical user interface (GUI) that displays all relevant information, based on nearest neighbor, naive Bayesian and voting feature intervals-5 (VFI5) techniques, to assist physicians working in this domain.
A domain-expert physician can use this DSS to diagnose the disease, while intern doctors can use it to verify their knowledge; the model achieved a remarkably high classification accuracy of 99.2% on a dataset the authors collected themselves. Other authors (Bojarczuk, 2001; Ubeyli & Guler, 2005; Nanni, 2006; Polat & Gunes, 2009; Ubeyli & Dogdu, 2010; Barati et al., 2011) have also applied data mining techniques such as decision trees, neuro-fuzzy systems, k-means clustering and SVMs for the same purpose, achieving accuracies between 94.22% and 98.3%. Elsayad (2010b) investigated this problem and developed a data-mining-based ensemble model using a multilayer neural network, a decision tree and linear discriminant analysis (LDA), obtaining 98.23% classification accuracy. More recently, Xie and Wang (2011) applied a support vector machine with novel hybrid feature selection methods and achieved 98.61% classification accuracy. Among all the above authors, Guvenir and Emeksiz (2000) achieved the highest classification accuracy, 99.2%. In this study, we present classification techniques that ensemble a Support Vector Machine (SVM) and an Artificial Neural Network (ANN) to classify the different categories of erythemato-squamous disease. The dermatology dataset from the University of California at Irvine (UCI) machine learning repository (web source http://archive.ics.uci.edu/ml/datasets.html, last accessed January 2012) is used to demonstrate the techniques. The classification accuracy obtained with our ensemble model is remarkably close to that of Guvenir and Emeksiz (2000) and is the highest among all the other models suggested in the literature. The proposed model is the only ensemble model built from SVM and ANN techniques, and it reaches a classification accuracy of 98.99%. Hence, our model is competitive with the models developed by other authors using data mining techniques.
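The text does not say how the SVM and ANN outputs are combined into one prediction. One common rule is soft voting: average each model's per-class probability estimates and pick the class with the largest average. The following is a minimal sketch under that assumption; `soft_vote`, `accuracy` and the weight `w_svm` are hypothetical names for illustration, not the authors' implementation, and the probability vectors are made-up inputs rather than results from this study.

```python
from typing import List, Sequence


def soft_vote(prob_svm: Sequence[Sequence[float]],
              prob_ann: Sequence[Sequence[float]],
              w_svm: float = 0.5) -> List[int]:
    """Weighted average of two models' class probabilities, then argmax per sample.

    Hypothetical combiner: the paper does not specify its ensemble rule.
    """
    if len(prob_svm) != len(prob_ann):
        raise ValueError("both models must score the same number of samples")
    if not 0.0 <= w_svm <= 1.0:
        raise ValueError("w_svm must lie in [0, 1]")
    predictions = []
    for p_s, p_a in zip(prob_svm, prob_ann):
        if len(p_s) != len(p_a):
            raise ValueError("both models must score the same classes")
        combined = [w_svm * s + (1.0 - w_svm) * a for s, a in zip(p_s, p_a)]
        # max() returns the first maximal index, so ties go to the lower class index
        predictions.append(max(range(len(combined)), key=combined.__getitem__))
    return predictions


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy example with three classes (illustrative numbers only)
svm_probs = [[0.7, 0.2, 0.1], [0.1, 0.5, 0.4]]
ann_probs = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
preds = soft_vote(svm_probs, ann_probs)
print(preds, accuracy([0, 1], preds))  # prints: [0, 2] 0.5
```

In practice the probability vectors could come from scikit-learn's `SVC(probability=True)` and `MLPClassifier`, both of which expose `predict_proba`; scikit-learn's `VotingClassifier(voting="soft")` wraps the same averaging idea. Setting `w_svm` to 1.0 or 0.0 reduces the ensemble to a single model, which makes it easy to compare each component against the combination.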
DATA SET DESCRIPTION
Each sample in the dermatology dataset is classified into one of six categories: psoriasis, seborrheic dermatitis, lichen planus, pityriasis rosea, chronic dermatitis and pityriasis rubra pilaris. …
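A loader for the UCI data has to map each sample's class code to one of these six categories. The sketch below assumes the layout of the UCI `dermatology.data` file: one sample per line, 34 comma-separated integer attributes followed by a class code from 1 to 6 in the order the categories are listed above, with `?` marking a missing value. These layout details and the names `parse_row`, `CLASS_NAMES` and `N_ATTRIBUTES` are assumptions for illustration, not taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

# Assumed class coding (1-6), in the same order as the categories listed in the text.
CLASS_NAMES: Dict[int, str] = {
    1: "psoriasis",
    2: "seborrheic dermatitis",
    3: "lichen planus",
    4: "pityriasis rosea",
    5: "chronic dermatitis",
    6: "pityriasis rubra pilaris",
}

# Assumed file layout: 34 comma-separated integer attributes followed by the class code.
N_ATTRIBUTES = 34


def parse_row(line: str) -> Tuple[List[Optional[int]], str]:
    """Split one data line into its attribute values and its category name.

    A '?' field is treated as a missing value and returned as None.
    """
    fields = line.strip().split(",")
    if len(fields) != N_ATTRIBUTES + 1:
        raise ValueError(f"expected {N_ATTRIBUTES + 1} fields, got {len(fields)}")
    features = [None if f == "?" else int(f) for f in fields[:-1]]
    code = int(fields[-1])
    if code not in CLASS_NAMES:
        raise ValueError(f"unknown class code {code}")
    return features, CLASS_NAMES[code]


# Made-up row: 33 attributes set to 1, a missing final attribute, class code 3
example = ",".join(["1"] * 33 + ["?", "3"])
features, label = parse_row(example)
print(label, features[-1])  # prints: lichen planus None
```

Returning `None` for missing values leaves the imputation choice (for example, dropping the sample or filling with a column mean) to the later modelling step rather than hiding it inside the parser.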