Ensemble Kalman filter

The ensemble Kalman filter (EnKF) is a recursive filter suitable for problems with a large number of variables, such as discretizations of partial differential equations in geophysical models. The EnKF originated as a version of the Kalman filter for large problems (essentially, the covariance matrix is replaced by the sample covariance), and it is now an important data assimilation component of ensemble forecasting. The EnKF is related to the particle filter (in this context, a particle is the same thing as an ensemble member), but the EnKF makes the assumption that all probability distributions involved are Gaussian; when it is applicable, it is much more efficient than the particle filter.

The EnKF is a Monte Carlo implementation of the Bayesian update problem: given a probability density function (pdf) of the state of the modeled system (the prior, often called the forecast in geosciences) and the data likelihood, Bayes' theorem is used to obtain the pdf after the data likelihood has been taken into account (the posterior, often called the analysis). This is called a Bayesian update. The Bayesian update is combined with advancing the model in time, incorporating new data from time to time. The original Kalman filter, introduced in 1960, assumes that all pdfs are Gaussian (the Gaussian assumption) and provides algebraic formulas for the change of the mean and the covariance matrix by the Bayesian update, as well as a formula for advancing the covariance matrix in time provided the system is linear. However, maintaining the covariance matrix is not computationally feasible for high-dimensional systems. For this reason, EnKFs were developed. EnKFs represent the distribution of the system state using a collection of state vectors, called an ensemble, and replace the covariance matrix by the sample covariance computed from the ensemble. The ensemble is operated with as if it were a random sample, but the ensemble members are really not independent; the EnKF ties them together. One advantage of EnKFs is that advancing the pdf in time is achieved by simply advancing each member of the ensemble.

Let us first review the Kalman filter. Let $\mathbf{x}$ denote the $n$-dimensional state vector of a model, and assume that it has a Gaussian probability distribution with mean $\boldsymbol{\mu}$ and covariance $Q$, i.e., its pdf is

$$p(\mathbf{x}) \propto \exp\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\mathrm{T}} Q^{-1}(\mathbf{x}-\boldsymbol{\mu})\right).$$

Here and below, $\propto$ means proportional to; a pdf is always scaled so that its integral over the whole space is one. This $p(\mathbf{x})$, called the prior, was evolved in time by running the model and now is to be updated to account for new data.
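As a concrete illustration of the ensemble representation described above, the following minimal Python/NumPy sketch draws an ensemble from a Gaussian prior like the one just defined and forms the sample mean and sample covariance that the EnKF uses in place of $\boldsymbol{\mu}$ and $Q$. The names, the dimensions, and the model_step function mentioned in the final comment are illustrative assumptions, not from any particular EnKF implementation.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# A small Gaussian prior N(mu, Q) in n = 3 dimensions (illustrative values).
n = 3
mu = np.zeros(n)
Q = np.diag([1.0, 0.5, 0.25])

# Represent the prior by an ensemble of N state vectors, stored as columns of X.
N = 100
X = rng.multivariate_normal(mu, Q, size=N).T   # shape (n, N)

# The EnKF carries the ensemble instead of mu and Q, and works with the
# sample mean and sample covariance computed from it.
x_mean = X.mean(axis=1)
A = X - x_mean[:, None]          # ensemble anomalies
C = A @ A.T / (N - 1)            # sample covariance; approximates Q for large N

# Advancing the pdf in time means advancing each member, e.g.
#   X = np.column_stack([model_step(x) for x in X.T])
# for some model time-stepping function model_step (hypothetical here).
```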
It is natural to assume that the error distribution of the data is known; data have to come with an error estimate, otherwise they are meaningless. Here, the data $\mathbf{d}$ is assumed to have a Gaussian pdf with covariance $R$ and mean $H\mathbf{x}$, where $H$ is the so-called observation matrix. The covariance matrix $R$ describes the estimate of the error of the data: if the random errors in the entries of the data vector $\mathbf{d}$ are independent, $R$ is diagonal and its diagonal entries are the squares of the standard deviations ("error sizes") of the errors of the corresponding entries of $\mathbf{d}$. The value $H\mathbf{x}$ is what the data would be for the state $\mathbf{x}$ in the absence of data errors. Then the probability density $p(\mathbf{d}\mid\mathbf{x})$ of the data $\mathbf{d}$ conditional on the system state $\mathbf{x}$, called the data likelihood, is

$$p(\mathbf{d}\mid\mathbf{x}) \propto \exp\left(-\tfrac{1}{2}(\mathbf{d}-H\mathbf{x})^{\mathrm{T}} R^{-1}(\mathbf{d}-H\mathbf{x})\right).$$

The pdf of the state and the data likelihood are combined to give the new probability density of the system state $\mathbf{x}$ conditional on the value of the data $\mathbf{d}$ (the posterior) by Bayes' theorem,

$$p(\mathbf{x}\mid\mathbf{d}) \propto p(\mathbf{d}\mid\mathbf{x})\, p(\mathbf{x}).$$

The data $\mathbf{d}$ is fixed once it is received, so denote the posterior state by $\hat{\mathbf{x}}$ instead of $\mathbf{x}\mid\mathbf{d}$ and the posterior pdf by $p(\hat{\mathbf{x}})$. It can be shown by algebraic manipulations that the posterior pdf is also Gaussian, with posterior mean $\hat{\boldsymbol{\mu}}$ and covariance $\hat{Q}$ given by the Kalman update formulas

$$\hat{\boldsymbol{\mu}} = \boldsymbol{\mu} + K(\mathbf{d} - H\boldsymbol{\mu}), \qquad \hat{Q} = (I - KH)Q,$$

where

$$K = Q H^{\mathrm{T}} \left(H Q H^{\mathrm{T}} + R\right)^{-1}$$

is the so-called Kalman gain matrix.
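The update can be written as a short function. The following Python/NumPy sketch is a direct transcription of the Kalman update formulas above; the function name is an illustrative choice, and a production implementation would typically use a linear solver rather than an explicit matrix inverse.

```python
import numpy as np

def kalman_update(mu, Q, d, H, R):
    """Bayesian update of a Gaussian state estimate via the Kalman formulas.

    mu : (n,)   prior mean            Q : (n, n) prior covariance
    d  : (m,)   data vector           H : (m, n) observation matrix
    R  : (m, m) data error covariance
    Returns the posterior mean and posterior covariance.
    """
    # Kalman gain K = Q H^T (H Q H^T + R)^{-1}
    K = Q @ H.T @ np.linalg.inv(H @ Q @ H.T + R)
    mu_hat = mu + K @ (d - H @ mu)             # posterior mean
    Q_hat = (np.eye(len(mu)) - K @ H) @ Q      # posterior covariance
    return mu_hat, Q_hat
```

Substituting the ensemble sample covariance for $Q$ here is, up to details that differ between EnKF variants (such as how the data are handled for each member), exactly the replacement the EnKF makes.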

[ "Extended Kalman filter", "fuzzy kalman filter", "gaussian particle filter", "Auxiliary particle filter", "Unscented transform", "square root unscented kalman filter" ]