A Multi Linear Regression Approach for Handling Missing Values with Unknown Dependent Variable (MLRMUD)

2018 
Many data applications are plagued by missing values. The Missing Value (MV) problem is the problem of predicting these values so that the data can be used in full. Simply deleting incomplete records wastes precious information. This work proposes a new approach, called MLRMUD, which uses Multiple Linear Regression to predict missing values in a data set whose dependent variable is unknown. It is applicable when at least 20% of the rows are complete. If fewer rows are complete, the Mean method is used to fill some rows until complete rows reach 20%, after which MLRMUD can be applied normally. The approach is composed of three algorithms: a splitting algorithm, a dependent variable selection algorithm, and a multi-linear regression algorithm. MLRMUD is compared with other methods from the literature and is shown to outperform all of them in the accuracy of the computed missing values, measured in terms of Root Mean Square Error (RMSE) and Mean Standard Error (MSE). A method to determine the unknown dependent variable from the training set is also proposed. The results show that this method selects the dependent variable successfully with an accuracy of 83% over all the datasets examined.
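The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fill_with_mean_until` and `impute_mlr` are hypothetical, the choice to mean-fill the rows with the fewest gaps first is an assumption, and the paper's dependent variable selection algorithm is replaced here by a simpler rule in which every column with a gap is treated as a dependent variable and regressed on the columns observed in that row.

```python
import numpy as np


def fill_with_mean_until(X, min_complete=0.2):
    """Mean-fill incomplete rows until the share of complete rows reaches min_complete.

    Assumption: rows with the fewest missing entries are filled first.
    """
    X = np.array(X, dtype=float)
    n = X.shape[0]
    col_means = np.nanmean(X, axis=0)
    missing_per_row = np.isnan(X).sum(axis=1)
    target = int(np.ceil(min_complete * n - 1e-9))
    needed = target - int((missing_per_row == 0).sum())
    if needed > 0:
        incomplete = np.flatnonzero(missing_per_row > 0)
        order = incomplete[np.argsort(missing_per_row[incomplete], kind="stable")]
        for i in order[:needed]:
            gaps = np.isnan(X[i])
            X[i, gaps] = col_means[gaps]
    return X


def impute_mlr(X, min_complete=0.2):
    """Fill remaining gaps by multiple linear regression fitted on complete rows.

    Assumption: each missing column is the dependent variable, and the columns
    observed in the same row are the predictors.
    """
    X = fill_with_mean_until(X, min_complete)
    complete = X[~np.isnan(X).any(axis=1)]
    out = X.copy()
    for i in np.flatnonzero(np.isnan(X).any(axis=1)):
        gaps = np.isnan(X[i])
        obs = ~gaps
        # Design matrix: intercept column plus the predictors observed in row i.
        A = np.column_stack([np.ones(len(complete)), complete[:, obs]])
        coef, *_ = np.linalg.lstsq(A, complete[:, gaps], rcond=None)
        out[i, gaps] = np.concatenate([[1.0], X[i, obs]]) @ coef
    return out
```

Calling `impute_mlr(data)` on a two-dimensional array with `np.nan` marking the gaps returns a copy with every gap filled. Fitting one regression per incomplete row keeps the sketch short; grouping rows by their missing-column pattern would avoid refitting the same model repeatedly.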