From Principal Curves to Granular Principal Curves

2014 
Principal curves, an essential construct in dimensionality reduction and data analysis, have recently attracted much attention from both theoretical and practical perspectives. In many real-world situations, however, the efficiency of existing principal curve algorithms is questionable, in particular when dealing with massive data, owing to their high computational complexity. A further drawback is that in several applications principal curves cannot fully capture essential problem-oriented facets of the data such as width, aspect ratio, and width change. Information granulation is a powerful tool for processing and interpreting massive data. In this paper, building on the underlying ideas of information granulation, we propose a granular principal curves approach, regarded as an extension of principal curve algorithms, to improve efficiency and achieve a sound accuracy–efficiency tradeoff. First, large amounts of numerical data are granulated into $C$ intervals (information granules) developed with the use of fuzzy C-means clustering and the two criteria of information granulation, which significantly reduces the amount of data to be processed in the later phases of the overall design. Granular principal curves are then constructed by determining the upper and lower bounds of the interval data. Finally, we develop an objective function based on the criteria of information confidence and specificity to evaluate the granular output formed by the principal curves. We also optimize the granular principal curves by adjusting the level of information granularity (the number of clusters), which is realized with the aid of particle swarm optimization. A number of numerical studies on synthetic and real-world datasets provide quantifiable insight into the effectiveness of the proposed algorithm.
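The granulation step described above can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so everything below is an assumption: a plain NumPy fuzzy C-means, hard assignment of points to the cluster with the highest membership, and interval bounds chosen per coordinate by maximizing the product of coverage and specificity. The function names `fuzzy_c_means`, `interval_granule`, and `granulate` are hypothetical, and the principal curve fitting and particle swarm optimization stages are not shown.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy C-means; returns (centers, membership matrix U)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def interval_granule(values, center):
    """Pick [lo, hi] around center; each bound maximizes coverage * specificity.

    Assumed criteria (not taken from the paper): coverage is the fraction of
    values enclosed on that side, specificity decreases linearly with width.
    """
    if values.size == 0:
        return center, center
    span = values.max() - values.min()
    if span == 0:
        return center, center

    def best_bound(side):
        best, best_score = center, -np.inf
        for b in np.unique(side):
            width = abs(b - center)
            coverage = np.sum(np.abs(side - center) <= width) / values.size
            specificity = 1.0 - width / span
            score = coverage * specificity
            if score > best_score:
                best, best_score = b, score
        return best

    lo = best_bound(values[values < center])
    hi = best_bound(values[values > center])
    return lo, hi


def granulate(X, c):
    """Return centers and an array of shape (c, n_features, 2) of [lo, hi]."""
    centers, U = fuzzy_c_means(X, c)
    labels = U.argmax(axis=1)  # hard assignment, an assumption of this sketch
    granules = np.empty((c, X.shape[1], 2))
    for k in range(c):
        members = X[labels == k]
        for j in range(X.shape[1]):
            granules[k, j] = interval_granule(members[:, j], centers[k, j])
    return centers, granules


# Toy data: noisy points along a sine-shaped curve.
rng = np.random.default_rng(1)
t = rng.uniform(0.0, 2.0 * np.pi, 300)
X = np.column_stack([t, np.sin(t)]) + rng.normal(scale=0.1, size=(300, 2))
centers, granules = granulate(X, c=5)
```

Here 300 points are reduced to 5 interval granules. In the spirit of the abstract, the lower and upper bounds in `granules[:, :, 0]` and `granules[:, :, 1]` would then serve as inputs to a principal curve fit, and `c` would be the quantity tuned by the optimizer.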