# The Csiszár-Kullback inequality

Let $f$, $g$ be nonnegative real functions in $L^1(\R^d)$. The
*relative entropy* of $f$ to $g$, denoted by $H[f|g]$,
is defined by
\begin{equation}
\label{eq:rel}
H[f|g] := \ird f \log \frac{f}{g}.
\end{equation}
It is also known by the name of *Kullback relative entropy*; and
by *Kullback-Leibler divergence* or *information divergence*
in the field of information theory, where it is also a particular case
of a *Csiszár $f$-divergence*. At the points $x \in \R^d$ with
$f(x) = 0$ we assume the above integrand to be equal to $0$ for
convenience and we assume it to be $+\infty$ at the points
$x \in \R^d$ with both $g(x) = 0$ and $f(x) > 0$. These are natural
choices, since $\lim_{x \to 0} x \log x = 0$ and
$\lim_{x \to +\infty} \log x = +\infty$. Note that, since
$|x \log x| \leq 1/e$ for all $x \in [0,1]$, one can easily bound the
negative part of the integral in \eqref{eq:rel}:
\begin{equation*}
\int_{\{f \leq g\}} f \left| \log \frac{f}{g} \right|
=
\int_{\{f \leq g\}} g \frac{f}{g} \left| \log \frac{f}{g} \right|
\leq
\frac{1}{e} \|g\|_1.
\end{equation*}Hence the negative part of the above integral must be finite, and one
can safely set $H[f|g] := +\infty$ whenever the positive part of the
integrand in \eqref{eq:rel} is not integrable.

The *Csiszár-Kullback inequality* (or
*Csiszár-Kullback-Pinsker inequality*, or *Pinsker-type
inequality*) originally derived in these papers
by
Csiszár and
Kullback
building on previous work
by Pinsker
, bounds the $L^1$ distance between two functions by
their relative entropy. Its simplest form is the following:

**Theorem 1**.
Let $f$, $g$ be nonnegative real functions in $L^1(\R^d)$ with $\|f\|_1 =
\|g\|_1 = 1$. It holds that
\begin{equation}
\label{ck}
\|f-g\|_1^2 \leq 2 H[f | g].
\end{equation}

The following proof is essentially the one in Theorem 3 of this paper by Gilardoni :

**Proof.**
Call $\Phi(r) := r \log r - r + 1$. Then obviously
\begin{equation*}
H[f|g] = \ird \Phi\left( \frac{f}{g} \right) g.
\end{equation*}
The function $\Phi$ is convex, nonnegative, and attains its minimum
(which is $0$) at $x=1$. By using the lower bound of $\Phi$ in Lemma
2 and the Cauchy-Schwarz inequality we
have
\begin{multline*}
\|f-g\|_1^2
=
\left( \ird \left|\frac fg- 1\right| g \right)^2
\leq
\ird \frac{(\frac fg- 1)^2}{\frac fg + 2} g
\ird \left(\frac fg + 2\right) g
\\
=
3
\ird \frac{(\frac fg- 1)^2}{\frac fg + 2} g
\leq
\frac{1}{2}
\ird \Phi\left( \frac fg \right) g
=
\frac{1}{2} H[f|g],
\end{multline*}
which gives the inequality.
∎

**Lemma 2**.
It holds that
\begin{equation}
\label{Phi-bound-optimal}
\Phi(r) := r \log r - r + 1 \geq \frac{3}{2} \frac{(r-1)^2}{r + 2}
\quad \text{ for all $r \geq 0$.}
\end{equation}

**Proof.**
First, observe that the following inequality holds between the
second-order derivatives of the terms in the inequality:
\begin{equation}
\label{Phi-bo2}
\Phi”(r) = \frac{1}{r}
\geq
\frac{3}{2} \frac{\mathrm{d^2}}{\mathrm{d}r^2}
\left( \frac{(r-1)^2}{r + 2} \right)
=
\frac{27}{(r+2)^3}.
\end{equation}
This amounts to showing that $(r+2)^3 \geq 27 r$, which can be
readily seen, for example, by Taylor-expanding both polynomials
around $r = 1$. Now, since both terms of the inequality
\eqref{Phi-bound-optimal} coincide at $r=1$, and their first
derivatives coincide at $r=1$, equation \eqref{Phi-bo2} proves the
result.
∎

Though the above is a complete proof, the way to guess the right bound for $\Phi$ in the first place, as was done in Lemma 2, may not be obvious. Let’s see how to do this. One idea is to use a simple lower bound for $\Phi(r)$ that somehow involves $(r-1)^2$. Since $\Phi”(r) = 1/r$, one may directly obtain by Taylor’s theorem that, for $r > 0$, \begin{equation} \label{eq:Phi-simple} \Phi(r) = \frac{(1-r)^2}{2 \xi} \geq \frac{(1-r)^2}{2 \max\{1,r\}} \end{equation} for some $\xi \in [1,r]$ (where $[1,r]$ is the interval with endpoints $1$ and $r$, regardless of whether $r$ is above or below $1$). One can then try to repeat the proof above with this bound: \begin{multline*} \|f-g\|_1^2 = \left( \ird \left|\frac fg- 1\right| g \right)^2 \leq \ird \frac{(\frac fg- 1)^2}{\max\{\frac fg, 1\}} g \ird \left( \max\{\frac fg, 1\} \right) g \\ \leq 2 \ird \frac{(\frac fg- 1)^2}{\max\{\frac fg, 1\}} g \leq 4 \ird \Phi\left( \frac fg \right) g = 4 H[f|g]. \end{multline*} Thus, the simple inequality \eqref{eq:Phi-simple} gives the Csiszár-Kullback inequality \eqref{ck} with a constant which is off the optimal one by a factor of $8$. This approach is used, for example, for proving generalized Csiszár-Kullback inequalities in this paper .

Alternatively, one can use the following bound, for a given $\alpha >
0$:
\begin{equation}
\label{Phi-bound}
\Phi(r) \geq C_\alpha \frac{(r-1)^2}{r + \alpha}
\quad \text{ for all $r \geq 0$.}
\end{equation}
where $C_\alpha$ is defined by
\begin{equation*}
C_\alpha := \sup_{r > 0} \frac{\Phi(r) (r+\alpha)}{(r-1)^2}.
\end{equation*}
The fact that $C_\alpha < +\infty$ is easy to check since $\Phi$
behaves as $(r-1)^2/2$ close to $r=1$ and as $x \log x$ close to $x =
+\infty$. Following the same strategy as before, inequality
\eqref{Phi-bound} leads to
\begin{equation}
\label{eq:CK2}
\|f-g\|_1^2
\leq \frac{C_\alpha}{1+\alpha} H[f|g],
\end{equation}
so one can optimize the inequality by finding the value of $\alpha$
that minimizes $C_\alpha/(1+\alpha)$. We know from above that $\alpha
= 2$, $C_\alpha = 3/2$ will do, but is this the best possible value?
If inequality \eqref{Phi-bound} holds, then by comparing the
second-order derivatives of both terms at $r=1$ we have (since both
terms have a minimum at $r=1$)
\begin{equation*}
1 \geq \frac{2 C_\alpha}{1+\alpha},
\end{equation*}
which already shows that through inequality \eqref{Phi-bound} we
cannot do better than Theorem 1. How did we find the best
value of $\alpha$ to use? Assume we *can* reach the optimal
value, this is, assume that we can choose $\alpha$ so that
\begin{equation*}
\frac{C_\alpha}{1+\alpha} = 1/2.
\end{equation*}
Then equality holds for the second derivatives at $r=1$ of both sides
of \eqref{Phi-bound}, so *equality* must also hold for the third
derivatives at $r=1$ (since it is an odd-order derivative). This
implies that
\begin{equation*}
-1 = -\frac{6C}{(1+\alpha)^2} = -\frac{3}{1+\alpha},
\end{equation*}
this is, $\alpha = 2$. One then has to check that for $\alpha=2$ we
have indeed $C_\alpha / (1+\alpha) = 1/2$ (this is what we did in
Lemma 2), and then we are sure that the
value of $\alpha$ is the one that will give us the best result when
deriving \eqref{eq:CK2}.