
The Csiszár-Kullback inequality

Let $f$, $g$ be nonnegative real functions in $L^1(\R^d)$. The relative entropy of $f$ with respect to $g$, denoted by $H[f|g]$, is defined by \begin{equation} \label{eq:rel} H[f|g] := \ird f \log \frac{f}{g}. \end{equation} It is also known as the Kullback relative entropy, and as the Kullback-Leibler divergence or information divergence in information theory, where it is also a particular case of a Csiszár $f$-divergence. At the points $x \in \R^d$ with $f(x) = 0$ we take the above integrand to be $0$, and at the points $x \in \R^d$ with both $g(x) = 0$ and $f(x) > 0$ we take it to be $+\infty$. These are natural choices, since $\lim_{x \to 0^+} x \log x = 0$ and $\lim_{x \to +\infty} \log x = +\infty$.

Note that, since $|x \log x| \leq 1/e$ for all $x \in [0,1]$, one can easily bound the negative part of the integral in \eqref{eq:rel}: \begin{equation*} \int_{\{f \leq g\}} f \left| \log \frac{f}{g} \right| = \int_{\{f \leq g\}} g \frac{f}{g} \left| \log \frac{f}{g} \right| \leq \frac{1}{e} \|g\|_1. \end{equation*} Hence the negative part of the integral is always finite, and one can safely set $H[f|g] := +\infty$ whenever the positive part of the integrand in \eqref{eq:rel} is not integrable.
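To make these conventions concrete, here is a minimal numerical sketch using discrete probability vectors in place of the integrals (the function name `relative_entropy` and the use of NumPy are our choices for illustration, not anything from the text):

```python
import numpy as np

def relative_entropy(f, g):
    """Discrete analogue of H[f|g] = sum_i f_i log(f_i / g_i).

    Conventions as in the text: terms with f_i = 0 contribute 0,
    and the result is +infinity if g_i = 0 while f_i > 0.
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    if np.any((g == 0) & (f > 0)):
        return np.inf
    mask = f > 0
    return float(np.sum(f[mask] * np.log(f[mask] / g[mask])))

# The negative part of the sum is bounded by ||g||_1 / e:
f = np.array([0.2, 0.8])
g = np.array([0.5, 0.5])
terms = np.zeros_like(f)
mask = f > 0
terms[mask] = f[mask] * np.log(f[mask] / g[mask])
print(relative_entropy(f, g))              # about 0.1927
print(-terms[terms < 0].sum(), 1 / np.e)   # negative part vs. the 1/e bound
```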

The Csiszár-Kullback inequality (or Csiszár-Kullback-Pinsker inequality, or Pinsker-type inequality), originally derived in papers by Csiszár and Kullback, building on previous work by Pinsker, bounds the $L^1$ distance between two functions by their relative entropy. Its simplest form is the following:

Theorem 1. Let $f$, $g$ be nonnegative real functions in $L^1(\R^d)$ with $\|f\|_1 = \|g\|_1 = 1$. It holds that \begin{equation} \label{ck} \|f-g\|_1^2 \leq 2 H[f | g]. \end{equation}
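Before going through the proof, one can sanity-check \eqref{ck} numerically. The sketch below (an illustration of ours, not part of the original argument) draws random discrete probability vectors, which play the role of $f$ and $g$ with $\|f\|_1 = \|g\|_1 = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(10_000):
    f = rng.random(10); f /= f.sum()   # ||f||_1 = 1
    g = rng.random(10); g /= g.sum()   # ||g||_1 = 1
    l1_sq = np.abs(f - g).sum() ** 2
    H = np.sum(f * np.log(f / g))      # all entries are positive here
    assert l1_sq <= 2 * H + 1e-12      # the Csiszár-Kullback inequality
print("no violations found")
```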

The following proof is essentially the one in Theorem 3 of a paper by Gilardoni:

Proof. Call $\Phi(r) := r \log r - r + 1$. Then, since $\|f\|_1 = \|g\|_1 = 1$, \begin{equation*} H[f|g] = \ird \Phi\left( \frac{f}{g} \right) g. \end{equation*} The function $\Phi$ is convex, nonnegative, and attains its minimum (which is $0$) at $r=1$. Using the lower bound on $\Phi$ from Lemma 2 below and the Cauchy-Schwarz inequality we have \begin{multline*} \|f-g\|_1^2 = \left( \ird \left|\frac fg- 1\right| g \right)^2 \leq \ird \frac{(\frac fg- 1)^2}{\frac fg + 2} g \ird \left(\frac fg + 2\right) g \\ = 3 \ird \frac{(\frac fg- 1)^2}{\frac fg + 2} g \leq 2 \ird \Phi\left( \frac fg \right) g = 2 H[f|g], \end{multline*} which gives the inequality. ∎

Lemma 2. It holds that \begin{equation} \label{Phi-bound-optimal} \Phi(r) := r \log r - r + 1 \geq \frac{3}{2} \frac{(r-1)^2}{r + 2} \quad \text{ for all $r \geq 0$.} \end{equation}

Proof. First, observe that the following inequality holds between the second derivatives of the two sides of \eqref{Phi-bound-optimal}: \begin{equation} \label{Phi-bo2} \Phi''(r) = \frac{1}{r} \geq \frac{3}{2} \frac{\mathrm{d}^2}{\mathrm{d}r^2} \left( \frac{(r-1)^2}{r + 2} \right) = \frac{27}{(r+2)^3}. \end{equation} This amounts to showing that $(r+2)^3 \geq 27 r$, which can be readily seen from the factorization $(r+2)^3 - 27r = (r-1)^2 (r+8) \geq 0$ (or, for example, by Taylor-expanding both polynomials around $r = 1$). Now, since both sides of \eqref{Phi-bound-optimal} coincide at $r=1$, and their first derivatives also coincide at $r=1$, integrating \eqref{Phi-bo2} twice from $r=1$ proves the result. ∎
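Both the bound \eqref{Phi-bound-optimal} and the polynomial inequality $(r+2)^3 \geq 27 r$ are easy to check on a grid; the following short script (ours, for illustration only) does so:

```python
import numpy as np

r = np.linspace(1e-9, 100, 200_001)
Phi = r * np.log(r) - r + 1
lower = 1.5 * (r - 1) ** 2 / (r + 2)
print(np.min(Phi - lower))             # >= 0 up to rounding: Lemma 2
print(np.min((r + 2) ** 3 - 27 * r))   # >= 0: equals (r-1)^2 (r+8)
```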

Though the above is a complete proof, the way to guess the right bound for $\Phi$ in the first place, as was done in Lemma 2, may not be obvious. Let’s see how to do this. One idea is to use a simple lower bound for $\Phi(r)$ that somehow involves $(r-1)^2$. Since $\Phi''(r) = 1/r$, one may directly obtain by Taylor’s theorem that, for $r > 0$, \begin{equation} \label{eq:Phi-simple} \Phi(r) = \frac{(1-r)^2}{2 \xi} \geq \frac{(1-r)^2}{2 \max\{1,r\}} \end{equation} for some $\xi \in [1,r]$ (where $[1,r]$ denotes the interval with endpoints $1$ and $r$, regardless of whether $r$ is above or below $1$). One can then try to repeat the proof above with this bound: \begin{multline*} \|f-g\|_1^2 = \left( \ird \left|\frac fg- 1\right| g \right)^2 \leq \ird \frac{(\frac fg- 1)^2}{\max\{\frac fg, 1\}} g \ird \left( \max\{\frac fg, 1\} \right) g \\ \leq 2 \ird \frac{(\frac fg- 1)^2}{\max\{\frac fg, 1\}} g \leq 4 \ird \Phi\left( \frac fg \right) g = 4 H[f|g], \end{multline*} where the second inequality uses that $\ird \max\{\frac fg, 1\} g = \ird \max\{f,g\} \leq \|f\|_1 + \|g\|_1 = 2$. Thus, the simple inequality \eqref{eq:Phi-simple} gives the Csiszár-Kullback inequality \eqref{ck} with a constant which is off the optimal one by a factor of $2$. This approach is used, for example, for proving generalized Csiszár-Kullback inequalities in this paper.
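The same kind of grid check (again, just an illustrative sketch of ours) confirms the simpler Taylor bound \eqref{eq:Phi-simple}, which as shown above yields the constant $4$ instead of the optimal $2$:

```python
import numpy as np

r = np.linspace(1e-9, 100, 200_001)
Phi = r * np.log(r) - r + 1
simple = (1 - r) ** 2 / (2 * np.maximum(1.0, r))
print(np.min(Phi - simple))   # nonnegative up to rounding: eq. (Phi-simple)
```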

Alternatively, one can use the following bound, for a given $\alpha > 0$: \begin{equation} \label{Phi-bound} \Phi(r) \geq C_\alpha \frac{(r-1)^2}{r + \alpha} \quad \text{ for all $r \geq 0$,} \end{equation} where $C_\alpha$ is the best possible constant, \begin{equation*} C_\alpha := \inf_{r > 0} \frac{\Phi(r) (r+\alpha)}{(r-1)^2}. \end{equation*} The fact that $C_\alpha > 0$ is easy to check, since the quotient $\Phi(r)(r+\alpha)/(r-1)^2$ tends to $(1+\alpha)/2$ as $r \to 1$ (because $\Phi$ behaves as $(r-1)^2/2$ close to $r=1$), tends to $\alpha$ as $r \to 0$ (because $\Phi(0) = 1$), and tends to $+\infty$ as $r \to +\infty$ (because $\Phi$ behaves as $r \log r$ there). Following the same strategy as before, inequality \eqref{Phi-bound} leads to \begin{equation} \label{eq:CK2} \|f-g\|_1^2 \leq \frac{1+\alpha}{C_\alpha} H[f|g], \end{equation} so one can optimize the inequality by finding the value of $\alpha$ that minimizes $(1+\alpha)/C_\alpha$, or equivalently maximizes $C_\alpha/(1+\alpha)$.

We know from above that $\alpha = 2$, $C_\alpha = 3/2$ will do, but is this the best possible value? If inequality \eqref{Phi-bound} holds, then by comparing the second derivatives of both sides at $r=1$ (where both sides attain their minimum) we obtain \begin{equation*} 1 = \Phi''(1) \geq C_\alpha \left. \frac{\mathrm{d}^2}{\mathrm{d}r^2} \left( \frac{(r-1)^2}{r+\alpha} \right) \right|_{r=1} = \frac{2 C_\alpha}{1+\alpha}, \end{equation*} which already shows that through inequality \eqref{Phi-bound} we cannot do better than Theorem 1.

How did we find the best value of $\alpha$ to use? Assume we can reach the optimal value, that is, assume that we can choose $\alpha$ so that \begin{equation*} \frac{C_\alpha}{1+\alpha} = \frac12. \end{equation*} Then equality holds for the second derivatives at $r=1$ of both sides of \eqref{Phi-bound}, so equality must also hold for the third derivatives at $r=1$ (since it is an odd-order derivative, the difference of the two sides would otherwise change sign at $r=1$). Using $C_\alpha = (1+\alpha)/2$, this implies that \begin{equation*} -1 = \Phi'''(1) = -\frac{6C_{\alpha}}{(1+\alpha)^2} = -\frac{3}{1+\alpha}, \end{equation*} that is, $\alpha = 2$. One then has to check that for $\alpha=2$ we do have $C_\alpha / (1+\alpha) = 1/2$ (this is what we did in Lemma 2), and then we are sure that this value of $\alpha$ is the one that gives the best result when deriving \eqref{eq:CK2}.
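One can also find the optimal $\alpha$ numerically, by approximating $C_\alpha$ as an infimum over a grid in $r$ and comparing the resulting constants $(1+\alpha)/C_\alpha$. The sketch below (our illustration; the helper name `C` is arbitrary) shows the minimum constant $2$ being attained at $\alpha = 2$:

```python
import numpy as np

r = np.linspace(1e-9, 1000, 2_000_001)
r = r[np.abs(r - 1) > 1e-6]        # avoid the removable 0/0 at r = 1
Phi = r * np.log(r) - r + 1

def C(alpha):
    """Grid approximation of C_alpha = inf_{r>0} Phi(r) (r+alpha) / (r-1)^2."""
    return np.min(Phi * (r + alpha) / (r - 1) ** 2)

for alpha in [1.0, 1.5, 2.0, 2.5, 3.0]:
    c = C(alpha)
    print(f"alpha = {alpha}: C_alpha ~ {c:.4f}, "
          f"constant (1+alpha)/C_alpha ~ {(1 + alpha) / c:.4f}")
```

For $\alpha = 2$ this prints $C_\alpha \approx 1.5$ and constant $\approx 2$, while the neighboring values of $\alpha$ give constants strictly above $2$, in agreement with the argument above.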