Simultaneous feature and instance selection in big noisy data using memetic variable neighborhood search

2021 
Abstract In smart factories, the data collected by Internet-of-things sensors is enormous and includes a lot of noise and missing values. To address this big data problem, metaheuristic is one of the main approaches to data preprocessing, i.e., instance selection or feature selection before training the model. Most previous works on metaheuristic approaches rarely considered simultaneous instance selection and feature selection, and rarely focused on addressing big noisy data. Consequently, this work proposes a hybrid memetic algorithm (MA) with variable neighborhood search (VNS) to simultaneously select instances and features, in which MA performs excellently in data selection; and VNS has been shown to perform well in local search. To evaluate the performance of the proposed algorithm, this work creates simulation data by combining the datasets from the UCI with noisy data. The proposed algorithm for simultaneous feature and instance selection is adopted to reduce the simulation data, and then the reduced data is adopted to train a predictive model for later performance evaluation of model testing. As compared with other metaheuristics, the proposed algorithm achieves a balance between exploration and exploitation. Additionally, the results show that the proposed algorithm is more robust than other feature selection methods.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    50
    References
    1
    Citations
    NaN
    KQI
    []