Supervised Massive Data Analysis for Telecommunication Customer Churn Prediction

2016 
Customer churn management has become increasingly critical for telecommunication companies in the competitive mobile market. To retain customers before they switch to competitors, an accurate churn analysis model is needed to predict which customers are likely to leave within the next two or three months. A two-month window is practical because it gives telecommunication companies time to design strategies for retaining these customers. However, predicting this far ahead introduces greater uncertainty and makes prediction more difficult. Customer churn prediction modeling faces three main difficulties. First, real-world churn data sets are substantially imbalanced. Second, the samples are relatively scattered in the feature space. Third, the feature space is high-dimensional, so dimension reduction is necessary for algorithmic efficiency. To overcome these difficulties, we propose a new supervised one-sided sampling technique to pre-process the imbalanced data set. The K-means method is applied to partition the data set into meaningful clusters, and one-sided sampling is then applied within each cluster to remove noisy and redundant negative samples. The random forest method is used for dimension reduction and for selecting important variables. A C5.0 decision tree is the classifier used in this study to predict customer churn in two or three months. Data from about 2.7 million fourth-generation (4G) telecommunication customers are used in the experiments. We obtain a precision of 80.42% with a recall of 52.43%. The proposed model provides satisfactory prediction results that can be used in practice to retain customers at risk of churning.
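The abstract does not spell out the sampling procedure in detail, so the following is only a minimal pure-Python sketch of cluster-wise one-sided sampling under stated assumptions. It assumes plain Lloyd's k-means for the clustering step and the classic one-sided selection scheme of Kubat and Matwin (1-NN condensing followed by Tomek-link removal) inside each cluster. Churners are label 1, and only negatives (non-churners) are ever dropped. The function names `kmeans`, `one_sided_selection`, and `clustered_oss` are hypothetical, and leaving single-class clusters untouched is an assumption rather than something the abstract states. The random forest feature selection and the C5.0 classifier are not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def kmeans(points: Sequence[Point], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns the cluster index of every point."""
    rng = random.Random(seed)
    centers = rng.sample(list(points), k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def nearest(i: int, pool: Sequence[int], X: Sequence[Point]) -> int:
    """Index in pool (excluding i itself) of the nearest neighbour of X[i]."""
    return min((j for j in pool if j != i), key=lambda j: dist(X[i], X[j]))


def one_sided_selection(X, y, idxs, rng) -> List[int]:
    """One-sided selection on the samples idxs; y == 1 marks churners (minority)."""
    pos = [i for i in idxs if y[i] == 1]
    neg = [i for i in idxs if y[i] == 0]
    if not pos or not neg:
        return list(idxs)  # assumption: single-class clusters are kept as-is
    # Step 1: seed set = all positives plus one randomly chosen negative.
    seed_set = pos + [rng.choice(neg)]
    # Step 2 (1-NN condensing): negatives that the 1-NN rule on the seed set
    # already labels correctly are redundant; only misclassified ones are added.
    added = [i for i in neg
             if i not in seed_set and y[nearest(i, seed_set, X)] != y[i]]
    keep = seed_set + added
    # Step 3 (Tomek links): a negative and a positive that are each other's
    # nearest neighbours are borderline or noisy; drop the negative member.
    nn = {i: nearest(i, keep, X) for i in keep}
    tomek = {i for i in keep if y[i] == 0 and y[nn[i]] == 1 and nn[nn[i]] == i}
    return [i for i in keep if i not in tomek]


def clustered_oss(X, y, k: int, seed: int = 0) -> List[int]:
    """Cluster with k-means, then run one-sided selection inside each cluster."""
    rng = random.Random(seed)
    labels = kmeans(X, k, seed=seed)
    kept: List[int] = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        kept.extend(one_sided_selection(X, y, members, rng))
    return sorted(kept)


if __name__ == "__main__":
    # Toy data: two well-separated groups, each holding a few
    # non-churners (label 0) and one churner (label 1).
    X = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.5, 0.5),
         (10, 10), (10.1, 10), (10, 10.1), (10.5, 10.5)]
    y = [0, 0, 0, 0, 1, 0, 0, 0, 1]
    print(clustered_oss(X, y, k=2))  # -> [4, 8]
```

On this tiny input every non-churner is either redundant (correctly labeled by the seed set) or part of a Tomek link, so only the two churners survive. On realistic data the condensing step also retains the negatives that the seed set would misclassify. Running the selection per cluster keeps each nearest-neighbour search confined to a smaller pool of samples, which is one plausible reason to cluster before sampling.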