Cross entropy

In information theory, the cross entropy between two probability distributions p and q over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set if a coding scheme used for the set is optimized for an estimated probability distribution q, rather than the true distribution p.

The cross entropy for the distributions p and q over a given set is defined as follows:

H(p, q) = -\operatorname{E}_p[\log q]

where E_p[·] denotes the expected value with respect to the distribution p. The definition may be formulated using the Kullback–Leibler divergence D_KL(p ‖ q) of q from p (also known as the relative entropy of p with respect to q; note the reversal of emphasis):

H(p, q) = H(p) + D_{\mathrm{KL}}(p \,\|\, q)

where H(p) is the entropy of p.

For discrete probability distributions p and q with the same support \mathcal{X} this means

H(p, q) = -\sum_{x \in \mathcal{X}} p(x)\,\log q(x)     (Eq.1)

The situation for continuous distributions is analogous. We have to assume that p and q are absolutely continuous with respect to some reference measure r (usually r is a Lebesgue measure on a Borel σ-algebra). Let P and Q be probability density functions of p and q with respect to r. Then

H(p, q) = -\int_{\mathcal{X}} P(x)\,\log Q(x)\,dr(x)     (Eq.2)
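As a concrete illustration of Eq.1 and the decomposition H(p, q) = H(p) + D_KL(p ‖ q), here is a minimal Python sketch (assuming NumPy is available; the helper functions cross_entropy, entropy and kl_divergence are illustrative names, not from any particular library). It computes the discrete cross entropy in nats and checks the decomposition numerically on a small example.

import numpy as np

def cross_entropy(p, q):
    # H(p, q) = -sum_x p(x) * log q(x)   (Eq.1), in nats; use np.log2 for bits.
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                      # terms with p(x) = 0 contribute nothing
    return -np.sum(p[mask] * np.log(q[mask]))

def entropy(p):
    # H(p) = -sum_x p(x) * log p(x)
    return cross_entropy(p, p)

def kl_divergence(p, q):
    # D_KL(p || q) = sum_x p(x) * log(p(x) / q(x))
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# True distribution p and an estimated distribution q over four events.
p = [0.5, 0.25, 0.125, 0.125]
q = [0.25, 0.25, 0.25, 0.25]

print(cross_entropy(p, q))                 # about 1.386 nats
print(entropy(p) + kl_divergence(p, q))    # same value: H(p) + D_KL(p || q)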

[ "Principle of maximum entropy", "Algorithm", "Statistics", "Machine learning", "Entropy (information theory)" ]