Multi-objective genetic programming for data visualization and classification

2011 
The process of knowledge discovery lies on a continuum ranging from human-driven (manual exploration) approaches to fully automatic data mining methods. As a hybrid approach, the emerging field of visual analytics aims to facilitate human-machine collaborative decision making by providing automated analysis of data via interactive visualizations. One area of interest in visual analytics is the development of data transformation methods that support visualization and analysis. In this thesis, we develop a multi-objective dimensionality reduction method, based on evolutionary computing, for visual data classification. The algorithm, called Genetic Programming Projection Pursuit (G3P), uses genetic programming to automatically create visualizations of higher-dimensional labeled datasets, which are assessed in terms of discriminative power and interpretability. We consider two forms of interpretability of the visualizations: clearly separated and compact class structures, and easily interpretable data transformation expressions relating the original data attributes to the visualization axes. The G3P algorithm incorporates a number of automated measures of interpretability that were found to align with human judgement in a user study we conducted. On a number of data mining problems, we show that G3P generates a large number of data transformations that are better, in terms of discriminative power and interpretability, than those generated by dimensionality reduction methods such as principal component analysis (PCA), multiple discriminant analysis (MDA) and targeted projection pursuit (TPP).