
Cluster-weighted modeling

In data mining, cluster-weighted modeling (CWM) is an algorithm-based approach to non-linear prediction of outputs (dependent variables) from inputs (independent variables) based on density estimation using a set of models (clusters) that are each notionally appropriate in a sub-region of the input space. The overall approach works in the joint input-output space, and an initial version was proposed by Neil Gershenfeld.

The procedure for cluster-weighted modeling of an input-output problem can be outlined as follows. In order to construct predicted values for an output variable y from an input variable x, the modeling and calibration procedure arrives at a joint probability density function, p(y,x). Here the "variables" might be univariate, multivariate or time series. For convenience, model parameters are not indicated in the notation here; several different treatments of these are possible, including setting them to fixed values as a step in the calibration or treating them using a Bayesian analysis. The required predicted values are obtained by constructing the conditional probability density p(y|x), from which the prediction is obtained as the conditional expected value, with the conditional variance providing an indication of uncertainty.

The important step of the modeling is that p(y,x) is assumed to take the following form, as a mixture model:

$$p(y,x) = \sum_{j=1}^{n} w_j \, p_j(y,x),$$

where n is the number of clusters and {w_j} are weights that sum to one. The functions p_j(y,x) are joint probability density functions that relate to each of the n clusters.
These functions are modeled using a decomposition into a conditional and a marginal density:

$$p_j(y,x) = p_j(y|x) \, p_j(x).$$
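As a minimal sketch of how a prediction follows from a mixture of per-cluster conditional and marginal densities, the Python example below computes the conditional expected value and conditional variance of y given a scalar x. It rests on assumptions that are not part of the description above: each cluster's marginal density of x is taken to be Gaussian, each cluster's conditional density of y given x is taken to be Gaussian around a local linear model, and all parameters are fixed by hand rather than calibrated from data. The function names, dictionary keys and parameter values are hypothetical and chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def cwm_predict(x, clusters):
    """Return (conditional mean, conditional variance) of y given x.

    Each cluster is a dict with hypothetical keys:
      w               mixture weight w_j (the weights sum to one)
      mean_x, var_x   parameters of the assumed Gaussian marginal p_j(x)
      a, b            intercept and slope of the assumed local linear model
      var_y           noise variance of the assumed Gaussian p_j(y|x)
    """
    # Unnormalised cluster contributions w_j * p_j(x).
    scores = [c["w"] * gaussian_pdf(x, c["mean_x"], c["var_x"]) for c in clusters]
    total = sum(scores)  # this is the marginal density p(x)
    # Normalised weights: how strongly each cluster applies at this x.
    resp = [s / total for s in scores]

    # Per-cluster conditional means of y at this x.
    local = [c["a"] + c["b"] * x for c in clusters]

    # Conditional mean: weighted average of the local predictions.
    mean = sum(r * f for r, f in zip(resp, local))
    # Conditional second moment, then variance = E[y^2|x] - E[y|x]^2.
    second = sum(r * (c["var_y"] + f * f) for r, f, c in zip(resp, local, clusters))
    return mean, second - mean * mean


# Two illustrative clusters with made-up parameters.
clusters = [
    {"w": 0.5, "mean_x": -2.0, "var_x": 1.0, "a": 0.0, "b": 1.0, "var_y": 0.1},
    {"w": 0.5, "mean_x": 2.0, "var_x": 1.0, "a": 4.0, "b": -1.0, "var_y": 0.1},
]

if __name__ == "__main__":
    for x in (-2.0, 0.0, 2.0):
        mean, var = cwm_predict(x, clusters)
        print(f"x={x:+.1f}  E[y|x]={mean:.4f}  Var[y|x]={var:.4f}")
```

Near the centre of one cluster the prediction is dominated by that cluster's local model, while midway between the two clusters both contribute equally and the conditional variance grows, reflecting the disagreement between the local models. The sketch does not guard against numerical underflow when x lies far from every cluster.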
