Improving iForest for Hydrological Time Series Anomaly Detection

2020 
As the number of installed hydrological sensors grows, the data they produce often contain a variety of abnormal values caused by network congestion, equipment failure, or environmental influence. A number of algorithms have been proposed to detect anomalies in large-scale hydrological sensor data. However, they are usually based on distance or classification, which typically incurs high time complexity. To solve this problem, a detection algorithm called AR-iForest is proposed. It is an algorithm for hydrological time series anomaly detection based on the isolation forest. Firstly, the features of the hydrological data are extracted and mapped to a high-dimensional space. Before the isolation forest is applied in the high-dimensional space, an Auto-Regressive model is used to predict the current data point and calculate a confidence interval; only data falling outside the confidence interval need to be checked by the isolation forest. Secondly, a measure of the effectiveness of the trees in the isolation forest is proposed. This method iteratively selects the tree with the best classification performance. Finally, the proposed algorithm is integrated into a window of the big data platform Flink for performance evaluation. The experimental results show that the proposed algorithm increases the AUC value from 90.60% to 96.72% and reduces the detection time by 52.23%.
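The Auto-Regressive pre-filtering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the AR(1) order, the Gaussian 95% interval (z = 1.96), the synthetic data, and the helper names `fit_ar1`, `candidate_indices`, and `simulate` are all assumptions made for illustration. The feature mapping, the isolation forest itself, and the tree selection measure are not reproduced here; the sketch only shows how a prediction interval can limit which points are passed on for further scoring.

```python
import math
import random


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] + e; returns (c, phi, sigma)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var
    c = mean_curr - phi * mean_prev
    # Residual standard deviation (two fitted parameters: c and phi).
    residuals = [q - (c + phi * p) for p, q in zip(prev, curr)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return c, phi, sigma


def candidate_indices(series, c, phi, sigma, z=1.96):
    """Indices whose value lies outside the one-step prediction interval."""
    flagged = []
    for t in range(1, len(series)):
        predicted = c + phi * series[t - 1]
        if abs(series[t] - predicted) > z * sigma:
            flagged.append(t)
    return flagged


def simulate(n, phi=0.8, noise=0.1):
    """Synthetic AR(1) series standing in for a sensor reading (assumption)."""
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + random.gauss(0.0, noise))
    return x


random.seed(0)
train = simulate(500)
test_series = simulate(200)
test_series[100] += 5.0  # injected spike, playing the role of a sensor fault

c, phi, sigma = fit_ar1(train)
candidates = candidate_indices(test_series, c, phi, sigma)
print(f"{len(candidates)} of {len(test_series)} points need further scoring")
```

Only the indices in `candidates` would then be handed to an isolation forest, which is where a saving in detection time of the kind the abstract reports could come from: most points fall inside the interval and are never scored.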