A two-stage ensemble of diverse models for recognition of abnormal data in raw wind data

2016 
Wind energy integration research generally relies on complex sensors located at remote sites. Any procedure that derives high-level synthetic information from databases holding large amounts of low-level data must therefore account for sensor failures and imperfect input data. Data-mining methods are widely used to model the relationship between wind farm power output and wind speed, which is important for wind power prediction, and incorrect or unnatural data strongly affect the results. To address this problem, the paper presents an empirical methodology that efficiently preprocesses and filters raw wind data using a two-stage ensemble of diverse models. First, abnormal features are extracted from the raw wind data, and the dataset is labeled according to the wind farm operation state records and the characteristics of typical abnormal data. Next, a two-stage classification model is built from Random Forest (RF) and Gradient Boosting Decision Tree (GBDT) classifiers. In the first stage, an RF classifier is trained with the labeled dataset as input. In the second stage, a GBDT classifier is trained with the labeled dataset and the RF classification result as input. Finally, the testing set is predicted by each of the two trained models, and the average of the RF and GBDT predicted values is taken as the final result. The methodology was tested successfully on data collected from a large wind farm in northeast China.
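The two-stage procedure described in the abstract can be illustrated with a minimal sketch using scikit-learn. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic wind speed and power data, the cubic `nominal_power` curve, the three features, the hyperparameters, the use of out-of-fold RF probabilities as the second-stage input (chosen to limit label leakage), and the averaging of predicted probabilities with a 0.5 threshold.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict, train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in for raw wind data (assumption, not the paper's dataset).
speed = rng.uniform(0.0, 25.0, n)  # wind speed in m/s


def nominal_power(v):
    """Hypothetical normalized power curve: cubic rise, capped at rated power."""
    return np.clip((v / 12.0) ** 3, 0.0, 1.0)


power = nominal_power(speed) + rng.normal(0.0, 0.03, n)

# Inject abnormal records: output well below the nominal curve.
is_abnormal = rng.random(n) < 0.15
power[is_abnormal] *= rng.uniform(0.0, 0.5, is_abnormal.sum())

# Features: raw measurements plus the deviation from the nominal curve.
X = np.column_stack([speed, power, power - nominal_power(speed)])
y = is_abnormal.astype(int)  # 1 = abnormal, 0 = normal

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Stage 1: RF trained on the labeled dataset.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
# Out-of-fold RF probabilities serve as the stage-2 training input.
rf_oof = cross_val_predict(
    rf, X_train, y_train, cv=5, method="predict_proba"
)[:, 1]
rf.fit(X_train, y_train)

# Stage 2: GBDT trained on the labeled dataset plus the RF result.
gbdt = GradientBoostingClassifier(random_state=0)
gbdt.fit(np.column_stack([X_train, rf_oof]), y_train)

# Predict the test set with both models and average their outputs.
rf_test = rf.predict_proba(X_test)[:, 1]
gbdt_test = gbdt.predict_proba(np.column_stack([X_test, rf_test]))[:, 1]
final_score = (rf_test + gbdt_test) / 2.0
final_label = (final_score >= 0.5).astype(int)

accuracy = float((final_label == y_test).mean())
print(f"hold-out accuracy: {accuracy:.3f}")
```

Records with `final_label == 1` would then be filtered out before fitting a power prediction model. Whether the paper feeds hard labels or probabilities into the second stage is not stated in the abstract, so the probability-based variant above is only one plausible reading.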