An ensemble of three classifiers for KDD Cup 2009: expanded linear model, heterogeneous boosting, and selective naïve Bayes

2009 
This paper describes our ensemble of three classifiers for the KDD Cup 2009 challenge. First, we transform the three binary classification tasks into a joint multi-class classification problem and solve an L1-regularized maximum entropy model under the LIBLINEAR framework. Second, we propose a heterogeneous base learner that handles different feature types and missing values, and improve it with AdaBoost. Third, we adopt a selective naïve Bayes classifier that automatically groups categorical features and discretizes numerical ones. All parameters are tuned on cross-validation results rather than on the 10% test-set feedback from the competition website. Based on the observation that the three positive labels are mutually exclusive, we add a post-processing step that uses a linear SVM to jointly adjust each classifier's prediction scores across the three tasks. Finally, we average these prediction scores, with careful validation, to produce the final outputs. Our final average AUC on the whole test set is 0.8461, ranking third in the slow track of KDD Cup 2009.
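The following is a minimal sketch, not the authors' code, of the joint multi-class reformulation: because at most one of the three positive labels (churn, appetency, up-selling) fires per customer, the three binary tasks can be merged into one four-class problem and solved with an L1-regularized maximum entropy (multinomial logistic regression) model. scikit-learn and the synthetic data are illustrative assumptions; the paper works within the LIBLINEAR framework.

```python
# Hedged sketch: joint multi-class maxent over the three exclusive tasks.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))      # stand-in for preprocessed features
y = rng.integers(0, 4, size=500)    # 0 = all-negative, 1/2/3 = one task positive

# L1-regularized multinomial logistic regression (maximum entropy).
maxent = LogisticRegression(penalty="l1", solver="saga", max_iter=5000)
maxent.fit(X, y)

# Per-task prediction scores fall out of the class posteriors:
# P(class k | x) serves as the score for task k.
proba = maxent.predict_proba(X)     # shape (n_samples, 4)
churn_score, appetency_score, upsell_score = proba[:, 1], proba[:, 2], proba[:, 3]
```

The post-processing step could likewise be sketched as a small stacking stage: a linear SVM takes a classifier's three raw task scores as joint input and re-predicts each task, exploiting the labels' mutual exclusivity. The function below is an assumed illustration of that idea, not the authors' exact formulation.

```python
# Hedged sketch: jointly adjust three task scores with per-task linear SVMs.
import numpy as np
from sklearn.svm import LinearSVC

def adjust_scores(train_scores, train_labels, test_scores):
    """train_scores/test_scores: (n, 3) raw [churn, appetency, up-sell] scores;
    train_labels: (n, 3) binary matrix. Returns jointly adjusted test scores."""
    adjusted = np.empty_like(test_scores, dtype=float)
    for k in range(3):                        # one linear SVM per task,
        svm = LinearSVC(C=1.0)                # fed all three scores jointly
        svm.fit(train_scores, train_labels[:, k])
        adjusted[:, k] = svm.decision_function(test_scores)
    return adjusted
```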