Data cleaning system and method based on IC card data characteristics

2021 
Bus IC card data systems often suffer from quality problems, such as irregular or missing timestamps, because card formats and usage patterns vary slightly across the country or because of equipment faults and transmission failures; the average error rate is 1.5%. As data volumes grow, the time needed for a single data cleaning pass is becoming excessive. This paper therefore proposes a data cleaning system that standardizes the cleaning procedure and guarantees that cleaning completes within a reasonable time frame. The first data cleaning result is obtained by standardizing and classifying the initial data format; the second is obtained by applying format correction to the first result; and the third is obtained by applying logic correction to the second result. Compared with conventional similar-duplicate data cleaning techniques, this method ensures high efficiency and accuracy when bus IC card data is cleaned and maintained on a regular basis, and it allows the source and destination of each dirty record to be located precisely, which has significant practical implications for big data processing.
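The three-stage flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names (`card_id`, `timestamp`), the timestamp layouts in `KNOWN_FORMATS`, and the specific logic checks (time-range check, duplicate swipe) are assumptions chosen for illustration, since the abstract does not specify them. Each stage consumes the previous stage's result and records, for every dirty record, its source row and the stage that rejected it.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical timestamp layouts; the abstract does not list the real ones.
KNOWN_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M:%S", "%Y%m%d%H%M%S")


@dataclass
class CleaningResult:
    clean: list = field(default_factory=list)
    # Each dirty entry is (source row index, stage name, reason).
    dirty: list = field(default_factory=list)


def standardize(rows):
    """Stage 1: normalise keys and whitespace, classify rows by completeness."""
    result = CleaningResult()
    for idx, row in enumerate(rows):
        record = {k.strip().lower(): "" if v is None else str(v).strip()
                  for k, v in row.items()}
        record["_source"] = idx  # remember where the record came from
        if record.get("card_id") and record.get("timestamp"):
            result.clean.append(record)
        else:
            result.dirty.append((idx, "standardize", "missing field"))
    return result


def correct_format(first):
    """Stage 2: rewrite timestamps into one representation (assumed rule)."""
    result = CleaningResult(dirty=list(first.dirty))
    for record in first.clean:
        parsed = None
        for fmt in KNOWN_FORMATS:
            try:
                parsed = datetime.strptime(record["timestamp"], fmt)
                break
            except ValueError:
                continue
        if parsed is None:
            result.dirty.append(
                (record["_source"], "format", "unparseable timestamp"))
        else:
            result.clean.append({**record, "timestamp": parsed})
    return result


def correct_logic(second, earliest, latest):
    """Stage 3: assumed logic checks (time range, repeated swipe)."""
    result = CleaningResult(dirty=list(second.dirty))
    seen = set()
    for record in second.clean:
        key = (record["card_id"], record["timestamp"])
        if not (earliest <= record["timestamp"] <= latest):
            result.dirty.append(
                (record["_source"], "logic", "timestamp out of range"))
        elif key in seen:
            result.dirty.append((record["_source"], "logic", "duplicate swipe"))
        else:
            seen.add(key)
            result.clean.append(record)
    return result
```

Chaining the stages as `correct_logic(correct_format(standardize(rows)), earliest, latest)` yields the third cleaning result; its `dirty` list keeps the originating row index and the rejecting stage for every discarded record, which is one simple way to make dirty data traceable.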