    In this paper, we study the influence of data preprocessing. We conclude that different preprocessing methods lead to different classification performance. Moreover, not all data preprocessing methods are necessary, and we give a criterion for determining which preprocessing methods are necessary and which are effective. Experiments on several real-world data sets confirm that different preprocessing methods have different effects. Further experiments combining several algorithms with different preprocessing methods also confirm that preprocessing strongly influences classifier performance.
    Data pre-processing
    Citations (22)
    In the real world, data is rarely available in a form suitable for mining or information extraction. Real-world data is generally incomplete, inconsistent and dirty, so it must be processed according to the requirements of the specific dataset. Preprocessing is one of the most crucial steps in data mining and accounts for most of the time spent, about 60%. Mining unprocessed data is slow, and the end user wastes a lot of time getting the desired result. Applying preprocessing techniques suited to the specific dataset reduces the overall mining time, so the end user gets the desired result faster. In this paper, preprocessing of missing values and discretisation is carried out. Missing values are handled by three techniques: deletion, replacement by the mean (average), and prediction. From these three, the user chooses the technique that gives the highest accuracy and takes the least preprocessing time. After the missing values are handled, discretisation is applied for data reduction, which minimises the preprocessing time.
    Data pre-processing
    Data Processing
    Value (mathematics)
    Citations (2)
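The abstract above names deletion and mean replacement for missing values, followed by discretisation. A minimal sketch of those steps in plain Python follows. The `ages` column, the function names and the choice of equal-width binning are assumptions made for illustration only; the abstract does not say which discretisation scheme the paper uses, and the prediction-based technique is omitted here.

```python
from statistics import mean

# Hypothetical toy column; None marks a missing value.
ages = [22.0, None, 35.0, 41.0, None, 58.0]

def drop_missing(values):
    """Deletion: discard records whose value is missing."""
    return [v for v in values if v is not None]

def fill_with_mean(values):
    """Replacement: substitute the mean of the observed values."""
    observed_mean = mean(v for v in values if v is not None)
    return [observed_mean if v is None else v for v in values]

def equal_width_bins(values, n_bins):
    """Discretisation (assumed scheme): map values to n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant column
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

deleted = drop_missing(ages)            # shorter column, no gaps
imputed = fill_with_mean(ages)          # same length, gaps filled
binned = equal_width_bins(imputed, 3)   # bin index per value
```

Deletion shrinks the data set, while mean replacement keeps every record at the cost of flattening the distribution. This is the kind of trade-off behind letting the user pick the technique per dataset.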
    A new logistics-oriented workflow for data preprocessing is proposed, based on the problems that exist in the spatial data of logistics. The paper introduces the advantages of using data preprocessing in logistics, analyzes how data preprocessing can reduce the disintegration and inconsistency of logistics data, and describes the whole preprocessing process. Finally, it points out that the quality of imported data improves after preprocessing, which in turn improves the efficiency of data mining.
    Data pre-processing
    Work flow
    Citations (0)
    Based on an analysis of the process and tasks of preprocessing oceanographic hydrological data, three main aspects are discussed: data preprocessing for the Acoustic Doppler Profiler (ADP), quality control of data preprocessing, and methods for recognizing and processing abnormal hydrological data. The work offers a theoretical reference for the post-processing of hydrological data, as well as technical support for the development and expansion of the specialty.
    Data pre-processing
    Data Processing
    Citations (0)
    This paper discusses the problem of data preprocessing in intrusion detection systems. In view of the low detection rate and high false alarm rate of present systems, we investigated an intrusion detection system based on data preprocessing and proposed an effective approach for the data preprocessing subsystem, which integrates basic processing of a data source with a data preprocessing cluster based on the TCM-KNN algorithm. Experiments show that the approach not only greatly decreases the amount of imperfect information and attack data, but also further increases the detection rate and reduces the false alarm rate.
    Data pre-processing
    False alarm
    Data Processing
    False positive rate
    Citations (1)
    In this study, we focus on the relationship between preprocessing and model accuracy. The performance of machine learning techniques depends on the quality of the data set. Preprocessing is not merely advantageous; it is a necessary preliminary step in building a predictive model. Our experiments found that preprocessing techniques improved model performance. To measure the effect, a support vector machine was applied before and after preprocessing: model accuracy increased from 68.7% to 88.5%.
    Data pre-processing
    Data set
    Training set
    Citations (14)
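The before/after comparison described in the abstract above can be sketched on a small scale. The following is an illustration only and makes several assumptions: a 1-nearest-neighbour classifier stands in for the support vector machine, min-max scaling stands in for the unspecified preprocessing steps, and the six-row `DATA` set is hypothetical, constructed so that a mismatch in feature scales matters. It does not reproduce the paper's figures.

```python
# Hypothetical toy data: (large-scale feature, small-scale feature), label.
# The label depends only on the second, small-scale feature.
DATA = [
    ((1000.0, 0.10), 0),
    ((3000.0, 0.20), 0),
    ((5000.0, 0.15), 0),
    ((2000.0, 0.90), 1),
    ((4000.0, 0.80), 1),
    ((6000.0, 0.85), 1),
]

def min_max_scale(rows):
    """Rescale every feature column to the range [0, 1].

    For brevity the scaling is fitted on all rows at once.
    """
    columns = list(zip(*(features for features, _ in rows)))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [
        (tuple((v - lo) / s for v, lo, s in zip(features, lows, spans)), label)
        for features, label in rows
    ]

def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def leave_one_out_accuracy(rows):
    """Classify each row by its nearest other row and report the hit rate."""
    hits = 0
    for i, (features, label) in enumerate(rows):
        others = rows[:i] + rows[i + 1:]
        _, predicted = min(others, key=lambda r: squared_distance(features, r[0]))
        hits += predicted == label
    return hits / len(rows)

raw_accuracy = leave_one_out_accuracy(DATA)
scaled_accuracy = leave_one_out_accuracy(min_max_scale(DATA))
```

On this deliberately contrived data, `raw_accuracy` is 0.0 and `scaled_accuracy` is 1.0. Without scaling, the large-scale feature dominates the distance and every row's nearest neighbour belongs to the other class. The size of the gap is an artefact of the toy data, not an estimate of what preprocessing achieves in practice.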