Attribute Selection for Predicting Credit Default with Decision Trees

2015 
The large amount of data available in corporate databases creates a need for technologies that can extract information useful for decision making in business organizations. Data mining is the process of transforming a large amount of data into useful information. In the last several years, data mining techniques have been widely used in many business areas. Examples include customer relationship management, financial fraud and credit risk detection, healthcare management, churn management, and manufacturing. Data mining applies machine learning and statistical techniques to data described by a set of attributes. One important data mining application is the prediction of credit default. The accuracy of credit default prediction depends on the quality of the data mining process, as well as on the attributes selected for prediction. The goal of the paper is to investigate which approach to selecting attributes for predicting credit default yields the best classification accuracy: demographic attributes, behavioral attributes, algorithm-selected attributes, or a combination of demographic and behavioral attributes. To fulfill this goal, we used the German credit data set available in the UCI Machine Learning Repository, which contains a sample of 1000 debtors classified according to credit default. First, we created four datasets with different attributes (demographic, behavioral, algorithm-selected, and a combination of demographic and behavioral). Second, we applied the C4.5 algorithm to the four datasets using the Weka data mining tool. Third, we compared the results using several measures of classification efficiency.
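As an illustration of how a decision tree algorithm such as C4.5 ranks candidate attributes, the following minimal sketch computes the gain ratio, the split criterion C4.5 uses for categorical attributes. It is not the procedure used in the paper, which relied on Weka. The attribute names and the six toy records are hypothetical and are not taken from the German credit data set, and the sketch ignores continuous attributes, missing values, and pruning.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of one categorical attribute with respect to the class labels."""
    total = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Information gain: class entropy minus the weighted entropy after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records (assumption: invented for illustration only).
default = ["no", "no", "yes", "yes", "no", "yes"]
credit_history = ["good", "good", "bad", "bad", "good", "bad"]  # behavioral-style attribute
sex = ["m", "f", "m", "f", "m", "f"]                            # demographic-style attribute

for name, column in [("credit_history", credit_history), ("sex", sex)]:
    print(f"{name}: gain ratio = {gain_ratio(column, default):.3f}")
```

In this toy sample the behavioral-style attribute separates the classes perfectly and therefore scores higher than the demographic-style one; this outcome is an artifact of the invented records and says nothing about the paper's results.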