An Extended Neural Gas Model for Efficient Data Mining Tasks

2007 
This paper presents a numerical association rule extraction method based on original quality measures that evaluate to what extent a numerical classification model behaves as a natural symbolic classifier such as a Galois lattice. The proposed method copes with the usual problems of symbolic association rule extraction methods, namely computation time and rule selection.

Introduction

Symbolic association rule extraction models [1] suffer from very serious limitations. Rule generation is a highly time-consuming process that generates a huge number of rules, including a large proportion of redundant rules. This prohibits any kind of rule computation and selection as soon as the data are numerous and represented in a very high-dimensional description space. The latter situation is very often encountered with documentary data. In this paper we propose a new approach to knowledge extraction that consists in using a MultiGAS model as a front-end for unsupervised extraction of association rules. Our approach exploits both the generalization and the intercommunication mechanisms of the model. We also make use of our original recall and precision measures, which derive from Galois lattice theory and from the Information Retrieval (IR) domain.

Basic principles

The MultiGAS model is a neural network model that represents a viewpoint-oriented extension of the Neural Gas model. Its main principle is to consist of several gases generated from the same data. Each gas is itself derived from a specific data description subspace (i.e. viewpoint). The relation between gases is established through two main mechanisms: the inter-gas communication mechanism and the generalization mechanism. A detailed description of the model is given in [2].

Copyright © 2007, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
The classical evaluation measures for the quality of classification are based on the intra-class inertia and the inter-class inertia (see [3]). These measures are often strongly biased because they depend both on the preprocessing and on the classification methods. We have therefore proposed two new quality measures, Recall and Precision, derived from the Galois lattice and Information Retrieval (IR) domains. Both measures are based on the properties of class members [3]. The Precision criterion measures the extent to which the content of the classes generated by a classification method is homogeneous. The greater the Precision, the closer to one another the intensions of the data belonging to the same class, and consequently the more homogeneous the classes. In a complementary way, the Recall criterion measures the exhaustiveness of the content of these classes, evaluating to what extent single properties are associated with single classes. The Recall criterion can be regarded as a specific application of the statistical concept of sensitivity (i.e. true positive rate) to class properties. The Recall (Rec) and Precision (Prec) measures for a given property p of the class c are expressed as:
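The expressions themselves are not reproduced in this copy of the text. The sketch below is therefore only one reading that is consistent with the verbal description, not the authors' stated formulas. It assumes that Rec for property p in class c is the number of members of c carrying p divided by the number of data items carrying p over all classes (a true positive rate), and that Prec is the same count divided by the size of c (a homogeneity ratio). The function name `recall_precision`, the dictionary layout and the toy data are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: each class name maps to the list of its members,
# and each member is described by the set of properties it carries.
Classes = Dict[str, List[Set[str]]]


def recall_precision(classes: Classes, c: str, p: str) -> Tuple[float, float]:
    """Return (Rec, Prec) of property p for class c under the assumed reading.

    Rec  = |members of c carrying p| / |all data carrying p|
    Prec = |members of c carrying p| / |members of c|
    These definitions are assumptions; the source text omits the formulas.
    """
    members = classes[c]
    in_class_with_p = sum(1 for props in members if p in props)
    all_with_p = sum(
        1 for class_members in classes.values()
        for props in class_members if p in props
    )
    # Guard against empty denominators (property never seen, empty class).
    recall = in_class_with_p / all_with_p if all_with_p else 0.0
    precision = in_class_with_p / len(members) if members else 0.0
    return recall, precision


# Toy data (invented for illustration only).
toy_classes: Classes = {
    "c1": [{"a", "b"}, {"a"}, {"a", "c"}],
    "c2": [{"b"}, {"b", "c"}],
}

if __name__ == "__main__":
    for cls in toy_classes:
        for prop in ("a", "b", "c"):
            rec, prec = recall_precision(toy_classes, cls, prop)
            print(f"class={cls} property={prop} Rec={rec:.2f} Prec={prec:.2f}")
```

Under this reading, a property held by every member of one class and by no other data item scores 1 on both measures, which matches the idea of a class behaving like a symbolic concept with that property in its intension.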