Applying Data Mining Methods for Forest Planning Data Validation

2008 
Decision making in forest planning is based mostly on simulated forest management scenarios. A fundamental tool in creating these scenarios is the forest planning system, which uses a set of models to project the future development of forests and to assess the effects of alternative management tasks, such as timber harvests (Burkhart 2003). Input data for forest planning are obtained from several sources, such as remote sensing, field measurements and visual assessments. All data collection systems introduce errors, from both human and technical sources, which eventually affect the quality of forest plans. Some of these errors are next to impossible to detect, but others appear as outliers and can be separated from the data. Statistical outlier detection methods have been used in data processing, although they do not work well for multi-dimensional data. Data mining offers some interesting possibilities for the outlier detection task. Data mining schemes for outlier detection include, for example, distance-, density- and clustering-based algorithms that have been shown to work with multi-dimensional data. In forest research, data mining methods have hardly been studied at all. In this study we compared three outlier detection schemes for finding outliers in a large forest inventory data set. The tested algorithms were Nested-Loop distance-based outlier detection (Knorr & Ng 1998), Simple-Pruning distance-based outlier detection (Bay & Schwabacher 2003) and Outlier Removal Clustering (Hautamaki et al. 2005). The data comprised a total of 5090 field-measured sample plots on 578 forest stands, with a number of stand-level aggregate attributes representing different characteristics of the growing stock, the forest site and the surrounding region. Each of the examined methods has a number of parameters that strongly affect the outlier detection result. The selection of the attributes used in the outlier detection also strongly affected the results. None of the three methods proved superior to the others in finding the outliers. The large natural variation in the forest attribute values made separating the outliers difficult. However, the examined data mining methods showed very promising results in finding outliers.
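
To illustrate the general idea behind distance-based outlier detection, the following is a minimal sketch in Python and not the implementation used in the study. It assumes the common definition in which a point is an outlier when at least a given fraction of the other points lie farther away than a given radius, and it uses a naive nested loop with early termination rather than the block-oriented Nested-Loop algorithm. The function name, the parameters `radius` and `min_fraction_far`, and the sample values are hypothetical, and the attributes are assumed to be already scaled to comparable ranges.

```python
import math


def distance_based_outliers(points, radius, min_fraction_far):
    """Return indices of points for which at least `min_fraction_far`
    of the other points lie farther away than `radius`."""
    n = len(points)
    # Largest number of other points allowed within `radius` of an outlier.
    max_neighbors = math.floor((1.0 - min_fraction_far) * (n - 1))
    outliers = []
    for i, p in enumerate(points):
        neighbors = 0
        is_outlier = True
        for j, q in enumerate(points):
            if i == j:
                continue
            if math.dist(p, q) <= radius:
                neighbors += 1
                if neighbors > max_neighbors:
                    # Too many close points: stop scanning early.
                    is_outlier = False
                    break
        if is_outlier:
            outliers.append(i)
    return outliers


# Hypothetical, already-scaled plot attributes (not taken from the study data).
plots = [(0.10, 0.20), (0.12, 0.22), (0.11, 0.18), (0.09, 0.21), (0.95, 0.90)]
print(distance_based_outliers(plots, radius=0.2, min_fraction_far=0.9))
```

Both parameters change which points are flagged, which is in line with the observation above that the parameter choices strongly affect the detection result.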