# Kullback-Leibler Divergence

2021-01-14 10:25:51

http://alpopkes.com/files/kl_divergence.pdf

## Definition

The Kullback-Leibler (KL) divergence measures how similar (or different) two probability distributions are. For two discrete probability distributions $P$ and $Q$ defined on the same set $X$, the KL divergence is defined as:

$$D_{KL}(P \| Q) = \sum_{x \in X} P(x) \log\frac{P(x)}{Q(x)}$$

For probability distributions of a continuous variable, the sum becomes an integral:

$$D_{KL}(P \| Q) = \int_{-\infty}^{\infty} p(x) \log\frac{p(x)}{q(x)} \, dx$$

where $p$ and $q$ are the probability density functions of $P$ and $Q$.
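The discrete definition translates directly into code. A minimal sketch, using two small hypothetical distributions over three outcomes (the values are made up for illustration):

```python
import numpy as np

# Discrete KL divergence D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)).
# p and q are hypothetical example distributions over three outcomes.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

kl_pq = np.sum(p * np.log(p / q))  # D_KL(P || Q)
kl_qp = np.sum(q * np.log(q / p))  # D_KL(Q || P)

print(kl_pq, kl_qp)
```

Note that `kl_pq` and `kl_qp` differ: the KL divergence is not symmetric, which is why it is not a true distance metric.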

## Visual Example

The Wikipedia entry on the KL divergence contains a very nice schematic diagram.

On the left side of the image, we see two Gaussian probability distributions $p(x)$ and $q(x)$. The shaded area on the right side of the image corresponds to the value of the KL divergence integral between $p$ and $q$. Recall that

$$D_{KL}(P \| Q) = \int_{-\infty}^{\infty} p(x) \log\frac{p(x)}{q(x)} \, dx = \int_{-\infty}^{\infty} p(x) \left( \log p(x) - \log q(x) \right) dx$$

So for every point $x_i$ on the $x$-axis, we compute $\log p(x_i) - \log q(x_i)$ and multiply it by $p(x_i)$. Plotting this product against $x$ gives the curve on the right; the KL divergence corresponds to the shaded area under that curve.
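This pointwise construction can be reproduced numerically. A sketch with two hypothetical Gaussians (the means and standard deviations are chosen arbitrarily): we evaluate the integrand $p(x)(\log p(x) - \log q(x))$ on a grid and integrate it with the trapezoid rule, then compare against the known closed-form KL divergence between two Gaussians.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import norm

# Hypothetical example: p = N(0, 1), q = N(1, 1.5^2).
m1, s1 = 0.0, 1.0
m2, s2 = 1.0, 1.5

x = np.linspace(-6, 8, 2001)
p = norm.pdf(x, m1, s1)

# Pointwise integrand p(x) * (log p(x) - log q(x)); the area under
# this curve is D_KL(P || Q) -- the shaded region in the schematic.
integrand = p * (norm.logpdf(x, m1, s1) - norm.logpdf(x, m2, s2))
kl_numeric = trapezoid(integrand, x)

# Closed form for two univariate Gaussians, for comparison:
kl_exact = np.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2 * s2**2) - 0.5

print(kl_numeric, kl_exact)
```

Note that the integrand can be locally negative (where $q(x) > p(x)$), yet the total integral is always non-negative.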

## KL divergence in machine learning

In most machine learning problems, we have a dataset $X$ generated by some unknown probability distribution $P$. This $P$ is the target (true) distribution, and we use a distribution $Q$ to approximate $P$. The KL divergence can then be used to evaluate how good the approximation $Q$ is. In many settings, for example variational inference, the KL divergence serves as an optimization criterion for finding the best approximate distribution $Q$.

## Interpreting the KL divergence

Note: here we interpret the KL divergence from a probabilistic perspective, which is useful for machine learning.

### Expected value

The definition of the KL divergence is built on an expected value, so let's first review the definition of expectation. For a discrete variable $x$, the expected value of $f(x)$ is defined as

$$E[f(x)] = \sum_{x} f(x) p(x)$$

where $p(x)$ is the probability mass function of $x$. Likewise, for a continuous variable we have

$$E[f(x)] = \int_{-\infty}^{\infty} f(x) p(x) \, dx$$
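The discrete expectation is just a probability-weighted sum. A tiny sketch with made-up values, taking $f(x) = x^2$:

```python
import numpy as np

# E[f(x)] = sum_x f(x) * p(x) for a discrete distribution.
# Hypothetical example: x takes values 1, 2, 3 with the probabilities below.
x = np.array([1.0, 2.0, 3.0])
p = np.array([0.2, 0.5, 0.3])

def f(v):
    return v ** 2  # f(x) = x^2

expectation = np.sum(f(x) * p)
print(expectation)  # 0.2*1 + 0.5*4 + 0.3*9 = 4.9
```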

### The ratio $p(x)/q(x)$

Taking a closer look at the definition of the KL divergence, we can see that it is very similar to the definition of expectation. If we set $f(x) = \log\frac{p(x)}{q(x)}$, we get

$$E[f(x)] = E_{x \sim p(x)}\left[\log\frac{p(x)}{q(x)}\right] = \int_{-\infty}^{\infty} p(x) \log\frac{p(x)}{q(x)} \, dx = D_{KL}(P \| Q)$$

In other words, the KL divergence is simply the expected value, under $p$, of the log-ratio $\log\frac{p(x)}{q(x)}$.
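This expectation view suggests a Monte Carlo estimator: draw samples from $p$ and average the log-ratio. A sketch, reusing the same hypothetical Gaussians $p = N(0, 1)$ and $q = N(1, 1.5^2)$ (these parameters are made up for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Monte Carlo estimate of D_KL(P || Q) = E_{x~p}[log p(x) - log q(x)]:
# sample x ~ p, then average the log-ratio over the samples.
samples = rng.normal(loc=0.0, scale=1.0, size=200_000)
log_ratio = norm.logpdf(samples, 0.0, 1.0) - norm.logpdf(samples, 1.0, 1.5)
kl_mc = log_ratio.mean()

# Closed form for two univariate Gaussians, for comparison:
kl_exact = np.log(1.5) + (1.0 + 1.0) / (2 * 1.5**2) - 0.5

print(kl_mc, kl_exact)
```

This is exactly the kind of estimator used in practice (for example in variational inference) when the expectation cannot be computed in closed form but sampling from $p$ is cheap.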
