Maximum entropy probability distribution

In statistics and information theory, a maximum entropy probability distribution has entropy that is at least as great as that of all other members of a specified class of probability distributions. According to the principle of maximum entropy, if nothing is known about a distribution except that it belongs to a certain class (usually defined in terms of specified properties or measures), then the distribution with the largest entropy should be chosen as the least-informative default. The motivation is twofold: first, maximizing entropy minimizes the amount of prior information built into the distribution; second, many physical systems tend to move towards maximal entropy configurations over time.

If X is a discrete random variable with distribution given by

    \Pr(X = x_k) = p_k \quad \text{for } k = 1, 2, \ldots

then the entropy of X is defined as

    H(X) = -\sum_{k \ge 1} p_k \log p_k .

If X is a continuous random variable with probability density p(x), then the differential entropy of X is defined as

    H(X) = -\int p(x) \log p(x) \, dx .

The quantity p(x) log p(x) is understood to be zero whenever p(x) = 0.

This is a special case of more general forms described in the articles Entropy (information theory), Principle of maximum entropy, and Differential entropy. In connection with maximum entropy distributions, this is the only one needed, because maximizing H(X) will also maximize the more general forms.

The base of the logarithm is not important as long as the same one is used consistently: change of base merely results in a rescaling of the entropy. Information theorists may prefer to use base 2 in order to express the entropy in bits; mathematicians and physicists often prefer the natural logarithm, resulting in a unit of nats for the entropy.

The choice of the measure dx is, however, crucialial in determining the entropy and the resulting maximum entropy distribution, even though the usual recourse to the Lebesgue measure is often defended as "natural".

Many statistical distributions of applicable interest are those for which the moments or other measurable quantities are constrained to be constants. The following theorem by Ludwig Boltzmann gives the form of the probability density under these constraints.
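As a concrete illustration of the discrete definition above, the following Python sketch (not part of the original article; the function name `entropy` and the example distributions are purely illustrative) computes H(X) for a few distributions on four outcomes, applies the convention that p log p is zero when p = 0, and shows that changing the logarithm base from e to 2 merely rescales the result from nats to bits.

```python
import math

def entropy(p, base=math.e):
    """Shannon entropy of a discrete distribution p (a sequence of probabilities).

    Terms with p_k == 0 contribute nothing, matching the convention
    that p log p is taken to be zero whenever p = 0.
    """
    return -sum(pk * math.log(pk, base) for pk in p if pk > 0)

uniform    = [0.25, 0.25, 0.25, 0.25]   # largest entropy on 4 outcomes
skewed     = [0.70, 0.10, 0.10, 0.10]
degenerate = [1.00, 0.00, 0.00, 0.00]   # zero entropy: no uncertainty

for name, p in [("uniform", uniform), ("skewed", skewed), ("degenerate", degenerate)]:
    print(f"{name:10s}  H = {entropy(p):.4f} nats = {entropy(p, base=2):.4f} bits")

# The uniform distribution attains the largest entropy (log 4 nats = 2 bits),
# and each value in bits is the value in nats divided by ln 2.
```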

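For the continuous definition, this illustrative NumPy sketch (again an assumption of this text, not taken from the article) approximates the differential entropy of a Gaussian density with a simple Riemann sum and compares it with the known closed form (1/2) ln(2*pi*e*sigma^2); the Gaussian is a standard example of a moment-constrained maximum entropy distribution, having the largest differential entropy among all densities on the real line with a fixed variance.

```python
import numpy as np

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Differential entropy H(X) = -integral of p(x) log p(x) dx,
# approximated on a fine grid (the density is positive everywhere here,
# so the p log p = 0 convention is not needed).
sigma = 1.5
x = np.linspace(-12.0, 12.0, 200_001)
p = gaussian_pdf(x, sigma=sigma)
dx = x[1] - x[0]
h_numeric = -np.sum(p * np.log(p)) * dx

# Known closed form for a Gaussian: 0.5 * ln(2 * pi * e * sigma^2)
h_exact = 0.5 * np.log(2 * np.pi * np.e * sigma**2)

print(f"numerical   H = {h_numeric:.6f} nats")
print(f"closed form H = {h_exact:.6f} nats")
```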
[ "Principle of maximum entropy", "Entropy (information theory)", "Differential entropy", "maximum entropy formalism", "Entropy rate", "Typical set" ]