Mean shift is a non-parametric feature-space analysis technique for locating the maxima of a density function, a so-called mode-seeking algorithm. Application domains include cluster analysis in computer vision and image processing. The mean shift procedure was originally presented in 1975 by Fukunaga and Hostetler.

Mean shift is a procedure for locating the maxima (the modes) of a density function given discrete data sampled from that function. This is an iterative method, and we start with an initial estimate $x$. Let a kernel function $K(x_i - x)$ be given. This function determines the weight of nearby points for re-estimation of the mean. Typically a Gaussian kernel on the distance to the current estimate is used, $K(x_i - x) = e^{-c\|x_i - x\|^2}$. The weighted mean of the density in the window determined by $K$ is

$$m(x) = \frac{\sum_{x_i \in N(x)} K(x_i - x)\, x_i}{\sum_{x_i \in N(x)} K(x_i - x)},$$

where $N(x)$ is the neighborhood of $x$, the set of points for which $K(x_i - x) \neq 0$. The difference $m(x) - x$ is called the mean shift in Fukunaga and Hostetler. The mean-shift algorithm now sets $x \leftarrow m(x)$ and repeats the estimation until $m(x)$ converges.

Although the mean shift algorithm has been widely used in many applications, a rigorous proof of the convergence of the algorithm using a general kernel in a high-dimensional space is still not known. Aliyari Ghassabeh showed the convergence of the mean shift algorithm in one dimension with a differentiable, convex, and strictly decreasing profile function. However, the one-dimensional case has limited real-world applications. The convergence of the algorithm in higher dimensions with a finite number of (or isolated) stationary points has also been proved. However, sufficient conditions for a general kernel function to have finitely many (or isolated) stationary points have not been provided.

Let the data be a finite set $S$ embedded in the $n$-dimensional Euclidean space $X$. Let $K$ be a flat kernel that is the characteristic function of the $\lambda$-ball in $X$,

$$K(x) = \begin{cases} 1 & \text{if } \|x\| \leq \lambda \\ 0 & \text{if } \|x\| > \lambda. \end{cases}$$

In each iteration of the algorithm, $s \leftarrow m(s)$ is performed for all $s \in S$ simultaneously.

The first question, then, is how to estimate the density function given a sparse set of samples. One of the simplest approaches is to just smooth the data, e.g., by convolving it with a fixed kernel of width $h$,

$$f(x) = \sum_i K(x - x_i) = \sum_i k\left(\frac{\|x - x_i\|^2}{h^2}\right),$$

where the $x_i$ are the input samples and $k(r)$ is the kernel function (or Parzen window). Common choices of profile are the flat kernel

$$k(x) = \begin{cases} 1 & \text{if } x \leq \lambda \\ 0 & \text{if } x > \lambda \end{cases}$$

and the Gaussian kernel

$$k(x) = e^{-\frac{x}{2\sigma^2}}.$$

Here $h$ is the only parameter in the algorithm and is called the bandwidth.
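As a concrete illustration, here is a minimal sketch of this density estimate in Python/NumPy. The names `kde`, `samples`, and `profile` are illustrative, and the Gaussian profile default (with $\sigma = 1$) is just one of the choices above, not a requirement:

```python
import numpy as np

def kde(x, samples, h, profile=lambda r: np.exp(-r / 2.0)):
    """Kernel density estimate f(x) = sum_i k(||x - x_i||^2 / h^2).

    `profile` is the kernel profile k(r); the default is the Gaussian
    profile e^{-r/(2 sigma^2)} with sigma = 1 (an illustrative choice).
    """
    sq_dists = np.sum((np.asarray(samples) - np.asarray(x)) ** 2, axis=1)
    return np.sum(profile(sq_dists / h**2))
```

A flat profile can be swapped in the same way, e.g. `profile=lambda r: (r <= 1.0).astype(float)`.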
This approach is known as kernel density estimation or the Parzen window technique. Once we have computed $f(x)$ from the equation above, we can find its local maxima using gradient ascent or some other optimization technique. The problem with this "brute force" approach is that, for higher dimensions, it becomes computationally prohibitive to evaluate $f(x)$ over the complete search space. Instead, mean shift uses a variant of what is known in the optimization literature as multiple restart gradient descent. Starting at some guess for a local maximum, $y_k$, which can be a random input data point $x_1$, mean shift computes the gradient of the density estimate $f(x)$ at $y_k$ and takes an uphill step in that direction.
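The following Python/NumPy sketch of that procedure iterates $x \leftarrow m(x)$ with the Gaussian kernel $K(x_i - x) = e^{-c\|x_i - x\|^2}$ from above; the function name, the tolerance, and the iteration cap are illustrative assumptions rather than part of the original formulation:

```python
import numpy as np

def mean_shift_mode(start, samples, c=1.0, tol=1e-6, max_iter=500):
    """Seek a density mode by repeating x <- m(x), where m(x) is the
    K-weighted mean of the samples and K is a Gaussian kernel."""
    x = np.asarray(start, dtype=float)
    samples = np.asarray(samples, dtype=float)
    for _ in range(max_iter):
        weights = np.exp(-c * np.sum((samples - x) ** 2, axis=1))
        m = weights @ samples / weights.sum()  # weighted mean m(x)
        if np.linalg.norm(m - x) < tol:        # mean shift m(x) - x ~ 0
            break
        x = m
    return x
```

Running this from each input point in turn (the multiple-restart aspect described above) groups the points by the mode to which each one converges, which is how mean shift is used for clustering.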