
Additive smoothing

In statistics, additive smoothing, also called Laplace smoothing (not to be confused with Laplacian smoothing as used in image processing) or Lidstone smoothing, is a technique used to smooth categorical data. Given an observation $\mathbf{x} = \langle x_1, x_2, \ldots, x_d \rangle$ from a multinomial distribution with $N$ trials, a "smoothed" version of the data gives the estimator

$$\hat{\theta}_i = \frac{x_i + \alpha}{N + \alpha d} \qquad (i = 1, \ldots, d),$$

where the "pseudocount" $\alpha > 0$ is a smoothing parameter, with $\alpha = 0$ corresponding to no smoothing. (This parameter is explained in the Pseudocount section below.) Additive smoothing is a type of shrinkage estimator, as the resulting estimate lies between the empirical probability (relative frequency) $x_i / N$ and the uniform probability $1/d$. Invoking Laplace's rule of succession, some authors have argued that $\alpha$ should be 1 (in which case the term add-one smoothing is also used), though in practice a smaller value is typically chosen. From a Bayesian point of view, this corresponds to the expected value of the posterior distribution, using a symmetric Dirichlet distribution with parameter $\alpha$ as the prior. In the special case of two categories, this is equivalent to using a Beta distribution as the conjugate prior for the parameter of a binomial distribution.

Laplace arrived at this smoothing technique when he tried to estimate the chance that the sun will rise tomorrow. His rationale was that even given a large sample of days with a rising sun, we still cannot be completely sure that the sun will rise tomorrow (known as the sunrise problem).

Pseudocount

A pseudocount is an amount (not generally an integer, despite its name) added to the number of observed cases in order to change the expected probability in a model of those data, when that probability is not known to be zero. It is so named because, roughly speaking, a pseudocount of value $\alpha$ weighs into the posterior distribution similarly to each category having an additional count of $\alpha$. If the frequency of item $i$ is $x_i$ out of $N$ samples, the empirical probability of event $i$ is

$$p_{i,\mathrm{empirical}} = \frac{x_i}{N}.$$
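As a concrete illustration of the estimator above, the following Python sketch (the counts and the function name are made up for illustration, not taken from the article or any standard library) contrasts the empirical relative frequencies with their Laplace-smoothed counterparts when one category has a zero count.

```python
# Minimal sketch of additive (Laplace/Lidstone) smoothing for a count vector.
# Hypothetical data: the last category was never observed.

def additive_smoothing(counts, alpha=1.0):
    """Return smoothed probability estimates (x_i + alpha) / (N + alpha * d)."""
    N = sum(counts)   # total number of trials
    d = len(counts)   # number of categories
    return [(x + alpha) / (N + alpha * d) for x in counts]

counts = [8, 5, 3, 0]

empirical = [x / sum(counts) for x in counts]      # relative frequencies: the zero stays zero
smoothed = additive_smoothing(counts, alpha=1.0)   # add-one (Laplace) smoothing

print(empirical)   # [0.5, 0.3125, 0.1875, 0.0]
print(smoothed)    # [0.45, 0.3, 0.2, 0.05]
```

Note that every smoothed estimate lies between the corresponding empirical frequency and the uniform probability $1/d = 0.25$, illustrating the shrinkage property, and that the unseen category now receives nonzero probability.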

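The Bayesian interpretation mentioned above can also be made explicit by a short conjugacy argument (a standard derivation, not spelled out in the original text). With a symmetric Dirichlet prior $p(\theta) \propto \prod_i \theta_i^{\alpha - 1}$ and a multinomial likelihood $p(\mathbf{x} \mid \theta) \propto \prod_i \theta_i^{x_i}$,

$$p(\theta \mid \mathbf{x}) \propto \prod_{i=1}^{d} \theta_i^{x_i + \alpha - 1},$$

i.e. the posterior is $\mathrm{Dirichlet}(x_1 + \alpha, \ldots, x_d + \alpha)$, whose mean is

$$\mathbb{E}[\theta_i \mid \mathbf{x}] = \frac{x_i + \alpha}{\sum_{j=1}^{d} (x_j + \alpha)} = \frac{x_i + \alpha}{N + \alpha d},$$

which is exactly the additive-smoothing estimator.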
[ "Smoothing", "Estimator" ]
Parent Topic
Child Topic
    No Parent Topic