Bigdata logs analysis based on seq2seq networks for cognitive Internet of Things

2019 
Abstract While a bigdata system processes high-volume data at high speed, it also generates a large number of logs. However, it is hard for people to predict future events from massive, multi-source, heterogeneous bigdata logs. This paper proposes a comprehensive method for smart computation and prediction over massive logs in the Internet of Things (IoT). Traditional machine learning, Hidden Markov Model (HMM), and Autoregressive Integrated Moving Average (ARIMA) methods are not accurate enough for predicting time-series data. In this work, we first elaborate on the distributed collection and storage, event location, and vectorized representation of bigdata logs. Next, we present a log fusion algorithm that converts the logs (unstructured text data) of each bigdata component into structured data by removing noise and adding timestamps and classification labels. Then, we introduce a predictive model for the bigdata system. We use an attention mechanism to improve the sequence-to-sequence (seq2seq) algorithm and add an adjustor to globally fit the data distribution. Our experimental results show that the neural network model trained with our method performs well on real-world data. Compared with the previous predictive method, the root mean square error (RMSE) is reduced by 46.65% and the R-squared (R2) goodness of fit is improved by 14.28%.
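The two evaluation metrics named in the abstract, RMSE and R2, can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not the authors' evaluation code; the function names (`rmse`, `r_squared`, `relative_change_percent`) and the sample values are assumptions made for the example. The abstract does not say whether its percentages are relative or absolute changes, so the relative form shown here is also an assumption.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_change_percent(baseline, new):
    """Signed relative change from baseline to new, in percent (assumed form)."""
    return (new - baseline) / baseline * 100.0


# Hypothetical sample values, chosen only to exercise the functions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(rmse(observed, predicted))
print(r_squared(observed, predicted))
```

A lower RMSE and an R2 closer to 1 both indicate a better fit, which is why the abstract reports one metric as reduced and the other as improved.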