# Lévy's continuity theorem

Here are some notes on Lévy’s continuity theorem, which is a very clean result on a type of weak convergence of measures often used in probability. The discussion below is very standard in probability books for the $d=1$ case, but the proof in $\R^d$ is slightly harder to find.

Denote by $\mathcal{C}^b(\R^d)$ the set of real continuous and bounded
functions on $\R^d$. We recall that a sequence of probability measures
$(\mu_n)_{n \geq 1}$ is said to converge to a probability measure
$\mu$ *in the narrow sense* or *in the narrow topology* if
\begin{equation*}
\ird \varphi(x) \d \mu_n(x) \to \ird \varphi(x) \d \mu(x)
\qquad \text{for all $\varphi \in \mathcal{C}^b(\R^d)$.}
\end{equation*}
(In some places this is referred to as *weak convergence*, but
for me this conflicts with the notion of weak convergence in
Banach spaces, which is slightly different.) There is a topology on
the set of probability measures on $\R^d$ (or on the set of all finite
measures on $\R^d$), called the *narrow topology*, whose convergent
sequences are exactly those converging in the sense just defined.

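The *Fourier transform*, or *characteristic function*, of a
probability measure $\mu$ is given by
\begin{equation*}
\widehat{\mu}(\xi) := \ird e^{-i x \cdot \xi} \d \mu(x),
\qquad \xi \in \R^d.
\end{equation*}
With this normalization (no factor $(2\pi)^{-d/2}$) we have
$\widehat{\mu}(0) = 1$ and $|\widehat{\mu}(\xi)| \leq 1$ for all $\xi$,
which is the convention used in all the computations below.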
Of course, both of these concepts can be defined in more generality
(narrow convergence makes sense for signed finite measures, and the
Fourier transform makes sense for Schwartz distributions in general),
but they are often restricted to probability measures in probabilistic
results.
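To make the definition concrete, here is a minimal sketch for a finitely supported probability measure $\mu = \sum_k w_k \delta_{a_k}$ on $\R^d$, for which $\widehat{\mu}(\xi) = \sum_k w_k e^{-i a_k \cdot \xi}$. The atoms and weights are hypothetical values chosen for illustration; the sketch only checks the elementary facts $\widehat{\mu}(0) = 1$, $|\widehat{\mu}| \leq 1$ and $\widehat{\mu}(-\xi) = \overline{\widehat{\mu}(\xi)}$.

```python
import cmath

def char_fn(atoms, weights, xi):
    """Fourier transform of the discrete measure mu = sum_k w_k * delta_{a_k} on R^d,
    with the convention mu_hat(xi) = sum_k w_k * exp(-i a_k . xi) (no normalizing constant)."""
    total = 0j
    for a, w in zip(atoms, weights):
        dot = sum(ak * xk for ak, xk in zip(a, xi))  # the inner product a . xi
        total += w * cmath.exp(-1j * dot)
    return total

# Hypothetical example: a probability measure on R^2 with three atoms (weights sum to 1).
atoms = [(0.0, 0.0), (1.0, -2.0), (-0.5, 3.0)]
weights = [0.2, 0.5, 0.3]

for xi in [(0.0, 0.0), (0.3, -1.2), (2.0, 0.5)]:
    value = char_fn(atoms, weights, xi)
    # abs(value) never exceeds 1, and value equals 1 (up to rounding) at xi = 0
    print(xi, value, abs(value))
```

The symmetry $\widehat{\mu}(-\xi) = \overline{\widehat{\mu}(\xi)}$ holds because the weights are real; it is the reason why averages of $\widehat{\mu}$ over symmetric cubes are real numbers.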

Lévy’s continuity theorem is the following:

**Theorem 1**.
Let $\mu$ and $\mu_n$ be Borel probability measures on $\R^d$, for
integer $n \geq 1$. Then $\mu_n$ converges to $\mu$ in the narrow
topology if and only if $\widehat{\mu}_n$ converges pointwise to
$\widehat{\mu}$.

The case $d=1$ can be found, for example, in Theorem 26.3 of this book by Billingsley. The case for any $d$ is proved in section 18.6 of this book by considering the marginals of $\mu_n$ and using the $d=1$ case. We give a direct proof, since the general $d$ case is not really harder than $d=1$.
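As a quick illustration of how Theorem 1 is typically used (our own example, not taken from the references above), let $\mu_n$ be the law of $(X_1 + \dots + X_n)/\sqrt{n}$ with independent $X_k = \pm 1$, each with probability $1/2$. Then $\widehat{\mu}_n(\xi) = \cos(\xi/\sqrt{n})^n$, and the sketch below checks on a grid that it approaches $e^{-\xi^2/2}$, the Fourier transform of the standard Gaussian. A grid check is of course not a proof; the pointwise limit itself follows from $\log\cos t = -t^2/2 + O(t^4)$, and Theorem 1 then gives narrow convergence, which is the central limit theorem for this sequence.

```python
import math

def rademacher_sum_cf(n, xi):
    # mu_n = law of (X_1 + ... + X_n) / sqrt(n) with independent X_k = +1 or -1, each w.p. 1/2.
    # E exp(-i xi X_k / sqrt(n)) = cos(xi / sqrt(n)); independence turns the sum into a product.
    return math.cos(xi / math.sqrt(n)) ** n

def gaussian_cf(xi):
    # Fourier transform of the standard normal distribution N(0, 1).
    return math.exp(-xi * xi / 2)

grid = [k / 10 for k in range(-50, 51)]  # frequencies xi in [-5, 5]
errors = []
for n in (1, 10, 100, 1000, 10000):
    err = max(abs(rademacher_sum_cf(n, xi) - gaussian_cf(xi)) for xi in grid)
    errors.append(err)
    print(f"n = {n:>5}: max |mu_n_hat - mu_hat| on the grid = {err:.2e}")
```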

**Proof of Theorem 1.**
One implication is direct: if $\mu_n$ converges to $\mu$ in the
narrow topology, then $\widehat{\mu}_n(\xi) \to \widehat{\mu}(\xi)$
for every $\xi \in \R^d$, since both the real and imaginary parts of
$x \mapsto e^{-i x \cdot \xi}$ are in $\mathcal{C}^b(\R^d)$.
The other implication is a bit more subtle. Since $\widehat{\mu}_n$
converges pointwise to $\widehat{\mu}$ and $|\widehat{\mu}_n| \leq 1$
for all $n$, the dominated convergence
theorem shows that, for any $\delta > 0$,
\begin{equation*}
\frac{1}{(2\delta)^d}\int_{C_\delta} (1 - \widehat\mu_n(\xi)) \d \xi
\to
\frac{1}{(2\delta)^d}\int_{C_\delta} (1 - \widehat\mu(\xi)) \d \xi
\qquad \text{as $n \to +\infty$,}
\end{equation*}
where $C_\delta := [-\delta,\delta]^d$ is the cube of side $2\delta$
in $\R^d$. (Both sides are real: the imaginary part of the Fourier
transform of a probability measure is an odd function, so it
integrates to zero over the symmetric cube $C_\delta$.) Fix
$\epsilon > 0$. Since $\widehat{\mu}$ is continuous at $0$ and
$\widehat{\mu}(0)=1$, there is $\delta > 0$ such that
$|1 - \widehat{\mu}(\xi)| \leq \epsilon/2$ for all $\xi \in C_\delta$,
so the right hand side is at most $\epsilon/2$. For this fixed
$\delta$, the convergence above gives some $N \geq 1$ such that
\begin{equation*}
\frac{1}{(2\delta)^d}\int_{C_\delta} (1 - \widehat\mu_n(\xi)) \d \xi
\leq \epsilon
\qquad \text{for all $n \geq N$.}
\end{equation*}
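Lemma 3 then shows that
$\int_{\R^d \setminus C_{2/\delta}} \d \mu_n \leq 2\epsilon$ for all
$n \geq N$. Each of the finitely many measures
$\mu_1, \dots, \mu_{N-1}$ also gives mass at most $2\epsilon$ to the
complement of a large enough cube, so the sequence
$(\mu_n)_{n \geq 1}$ is tight, and by
Prokhorov’s
theorem it has a subsequence which converges narrowly to some
probability measure $\nu$. By the implication we proved first,
$\widehat{\mu}_n \to \widehat{\nu}$ pointwise along this subsequence,
so $\widehat{\nu} = \widehat{\mu}$; since a finite Borel measure on
$\R^d$ is determined by its Fourier transform, $\nu = \mu$. The same
reasoning applies to any subsequence of $(\mu_n)_{n \geq 1}$, so by
Lemma 2 the whole sequence converges narrowly to $\mu$.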
∎

In the previous proof we used the following well-known property, which holds in any topology and in particular in the narrow topology:

**Lemma 2**.
Let $\mu$ be a Borel probability measure on $\R^d$. Assume that a
sequence of probability measures $(\mu_n)_{n \geq 1}$ has the
property that any subsequence of $(\mu_n)_{n \geq 1}$ must have a
further subsequence that converges narrowly to $\mu$. Then the whole
sequence $(\mu_n)_{n \geq 1}$ converges narrowly to $\mu$.

The following lemma, used in the proof of Theorem 1, is an instance of the general principle that the regularity of the Fourier transform of a function is related to the decay of the tail of the function (and conversely, since the inverse Fourier transform has essentially the same properties). Here it says that the tails of a probability measure are controlled by how far its Fourier transform is from $1$ near $\xi=0$, that is, by the continuity of $\widehat{\mu}$ at $0$:

**Lemma 3**.
For a Borel probability measure $\mu$ on $\R$ we have
\begin{equation*}
\int_{|x| \geq 2/\delta} \d \mu(x)
\leq
\frac{1}{\delta}\int_{-\delta}^\delta (1 -
\widehat\mu(\xi)) \d \xi
\qquad \text{for all $\delta > 0$.}
\end{equation*}
For a Borel probability measure $\mu$ on $\R^d$, calling
$C_r := [-r,r]^d \subseteq \R^d$, we have
\begin{equation*}
\frac12 \int_{\R^d \setminus C_{2/\delta}} \d \mu(x)
\leq
\frac{1}{(2\delta)^d}\int_{C_\delta} (1 -
\widehat\mu(\xi)) \d \xi
\qquad \text{for all $\delta > 0$.}
\end{equation*}
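The $d$-dimensional inequality can be sanity-checked on an example where both sides have closed forms. A minimal sketch, assuming $\mu$ is the standard Gaussian $N(0, I_d)$, so that $\widehat{\mu}(\xi) = e^{-|\xi|^2/2}$ factorizes over coordinates and both the cube average and the mass outside the cube reduce to the error function; the case $d=1$ is the first statement multiplied by $1/2$. The values of $d$ and $\delta$ are arbitrary choices.

```python
import math

def cube_average_of_one_minus_cf(delta, d):
    # For mu = N(0, I_d), mu_hat(xi) = exp(-|xi|^2 / 2) is a product over coordinates, so
    # (2 delta)^(-d) * integral over C_delta of (1 - mu_hat) equals 1 - a^d, where
    # a = (2 delta)^(-1) * integral_{-delta}^{delta} exp(-t^2 / 2) dt = sqrt(2 pi) erf(delta / sqrt 2) / (2 delta).
    a = math.sqrt(2 * math.pi) * math.erf(delta / math.sqrt(2)) / (2 * delta)
    return 1 - a ** d

def mass_outside_cube(r, d):
    # mu(R^d minus C_r) = 1 - P(|Z| <= r)^d for d independent standard normal coordinates Z,
    # with P(|Z| <= r) = erf(r / sqrt 2).
    return 1 - math.erf(r / math.sqrt(2)) ** d

for d in (1, 2, 5):
    for delta in (0.1, 0.5, 1.0, 2.0, 10.0):
        lhs = 0.5 * mass_outside_cube(2 / delta, d)   # (1/2) * mu(R^d minus C_{2/delta})
        rhs = cube_average_of_one_minus_cf(delta, d)  # right hand side of Lemma 3
        print(f"d={d} delta={delta:<5} lhs={lhs:.4f} rhs={rhs:.4f}")
        assert lhs <= rhs
```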

**Proof.**
**The case $d=1$.** We first write the case in dimension $1$,
where the calculation is simpler and the argument is seen more
clearly. A good way to show the result is to notice that, by
Fubini’s theorem,
\begin{equation*}
\frac{1}{2\delta} \int_{-\delta}^\delta \widehat{\mu}(\xi) \d \xi
= \frac{1}{2\delta} \int_{-\infty}^\infty \widehat{\psi}_\delta(x) \d \mu(x)
\end{equation*}
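where $\psi_\delta := 𝟙_{[-\delta,\delta]}$. The left hand
side is the average of the Fourier transform on
$[-\delta,\delta]$, while the right hand side is roughly the mass
that $\mu$ gives to an interval of length of order $1/\delta$, since
$\frac{1}{2\delta}\widehat{\psi}_\delta(x)$ is close to $1$ for
$|x| \ll 1/\delta$ and small for $|x| \gg 1/\delta$. We can calculate
$\widehat{\psi}_\delta$ explicitly: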
\begin{equation*}
\widehat{\psi}_\delta(x) = \int_{-\delta}^\delta e^{-i x \xi} \d \xi
= \frac{2}{x} \sin(\delta x),
\qquad x \in \R \setminus \{0\},
\qquad \widehat{\psi}_\delta(0) = 2\delta.
\end{equation*}
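The closed form for $\widehat{\psi}_\delta$ is easy to sanity-check numerically. A minimal sketch, approximating $\int_{-\delta}^{\delta} e^{-i x \xi} \d \xi$ with a midpoint rule (the number of nodes `m` and the sample points are arbitrary choices) and comparing with $2\sin(\delta x)/x$, using the value $2\delta$ at $x = 0$.

```python
import cmath
import math

def psi_hat_numeric(x, delta, m=20000):
    # Midpoint-rule approximation of integral_{-delta}^{delta} exp(-i x xi) d(xi) with m nodes.
    h = 2 * delta / m
    return h * sum(cmath.exp(-1j * x * (-delta + (k + 0.5) * h)) for k in range(m))

def psi_hat_exact(x, delta):
    # Closed form 2 sin(delta x) / x, with the value 2 delta at x = 0.
    return 2 * delta if x == 0 else 2 * math.sin(delta * x) / x

for delta in (0.5, 1.5):
    for x in (0.0, 0.3, -2.0, 7.0):
        print(delta, x, psi_hat_numeric(x, delta), psi_hat_exact(x, delta))
```

The numerical values have (numerically) zero imaginary part, as expected since $\psi_\delta$ is even.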
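Since $\sin(y)/y \leq 1$ for all $y \neq 0$, the integrand on the
left below is nonnegative, so we may restrict the integral to the
region $|x| \geq 2/\delta$, where $|\sin(\delta x)| \leq 1$ and
$\delta |x| \geq 2$. This gives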
\begin{multline*}
\int_{-\infty}^\infty
\left(1 - \frac{1}{2\delta} \widehat{\psi}_\delta(x) \right) \d \mu(x)
\geq
\int_{|x| \geq 2/\delta}
\left(1 - \frac{|\sin(\delta x)|}{\delta|x|} \right) \d \mu(x)
\\
\geq
\int_{|x| \geq 2/\delta}
\left(1 - \frac{1}{\delta |x|} \right) \d \mu(x)
\geq
\frac12 \int_{|x| \geq 2/\delta} \d \mu(x).
\end{multline*}
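The chain above rests on two pointwise facts about $y = \delta x$: $1 - \sin(y)/y \geq 0$ for every $y$, and $1 - \sin(y)/y \geq 1/2$ whenever $|y| \geq 2$. A minimal sketch that spot-checks both on a grid (the grid range and step are arbitrary choices; this is an illustration, not a proof).

```python
import math

def one_minus_sinc(y):
    # 1 - sin(y)/y, extended by continuity to 0 at y = 0 (here y plays the role of delta * x).
    return 0.0 if y == 0 else 1 - math.sin(y) / y

ys = [k / 100 for k in range(-5000, 5001)]  # grid on [-50, 50] with step 0.01
assert all(one_minus_sinc(y) >= 0 for y in ys)
tail_min = min(one_minus_sinc(y) for y in ys if abs(y) >= 2)
assert tail_min >= 0.5
print(tail_min)  # about 0.5454, attained at |y| = 2
```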
Hence
\begin{multline*}
\frac12 \int_{|x| \geq 2/\delta} \d \mu(x)
\leq
1 - \frac{1}{2\delta} \int_{-\infty}^\infty \widehat{\psi}_\delta(x) \d \mu(x)
\\
=
1 - \frac{1}{2\delta} \int_{-\delta}^\delta
\widehat{\mu}(\xi) \d \xi
= \frac{1}{2\delta} \int_{-\delta}^\delta
\left( 1 - \widehat{\mu}(\xi) \right) \d \xi.
\end{multline*}
Multiplying by $2$ gives the inequality stated for $d = 1$.
**General proof for any dimension $d$.** The result can be
proved in $\R^d$ for any $d \geq 1$ with essentially the same
calculation, and we write a parallel proof. Calling
$C_\delta = [-\delta, \delta]^d$ we have, again by Fubini’s theorem,
\begin{equation*}
\frac{1}{(2\delta)^d} \int_{C_\delta} \widehat{\mu}(\xi) \d \xi
= \frac{1}{(2\delta)^d} \ird \widehat{\Psi}_\delta(x) \d \mu(x),
\end{equation*}
where now
\begin{equation*}
\Psi_\delta(x) := 𝟙_{C_\delta}(x) = \prod_{i=1}^d \psi_\delta(x_i)
\qquad \text{for $x = (x_1, \dots, x_d) \in \R^d$}.
\end{equation*}
Its Fourier transform is
\begin{equation*}
\widehat{\Psi}_\delta(x) = \prod_{i=1}^d
\widehat{\psi}_\delta(x_i)
= \prod_{i=1}^d \frac{2}{x_i} \sin(\delta x_i),
\qquad x \in \R^d,
\end{equation*}
where a factor with $x_i = 0$ is read as $\widehat{\psi}_\delta(0) = 2\delta$.
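Notice that
$(2\delta)^{-d} \widehat{\Psi}_\delta(x) = \prod_{i=1}^d \frac{\sin(\delta x_i)}{\delta x_i}$
is a product of factors of absolute value at most $1$, so
$(2\delta)^{-d} \widehat{\Psi}_\delta(x) \leq 1$ for
all $x \in \R^d$. Moreover
\begin{equation*}
(2\delta)^{-d} | \widehat{\Psi}_\delta(x) | \leq \frac12
\qquad
\text{for all $x \in \R^d \setminus C_{2/\delta}$},
\end{equation*}
since outside the cube $C_{2/\delta}$ at least one coordinate
satisfies $|x_i| > 2/\delta$, and for that coordinate
$|\sin(\delta x_i)|/(\delta |x_i|) \leq 1/(\delta |x_i|) < 1/2$. Then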
\begin{multline*}
\ird
\left(1 - \frac{1}{(2\delta)^d} \widehat{\Psi}_\delta(x) \right) \d \mu(x)
\geq
\int_{\R^d \setminus C_{2/\delta}}
\left(1 - \frac{1}{(2\delta)^d} |\widehat{\Psi}_\delta(x)| \right) \d \mu(x)
\\
\geq
\frac12 \int_{\R^d \setminus C_{2/\delta}} \d \mu(x),
\end{multline*}
so
\begin{multline*}
\frac12 \int_{\R^d \setminus C_{2/\delta}} \d \mu(x)
\leq
1 - \frac{1}{(2\delta)^d} \ird \widehat{\Psi}_\delta(x) \d \mu(x)
\\
=
1 - \frac{1}{(2\delta)^d} \int_{C_\delta}
\widehat{\mu}(\xi) \d \xi
= \frac{1}{(2\delta)^d} \int_{C_\delta}
\left( 1 - \widehat{\mu}(\xi) \right) \d \xi.
\end{multline*}
∎
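The two bounds on $(2\delta)^{-d} \widehat{\Psi}_\delta$ used in the general proof can be spot-checked by random sampling. A minimal sketch, assuming the hypothetical values $\delta = 0.7$ and $d = 4$, a fixed random seed, and sample points drawn uniformly from $[-15, 15]^d$; it checks that the normalized transform never exceeds $1$ and has absolute value at most $1/2$ outside $C_{2/\delta}$.

```python
import math
import random

def normalized_psi_hat(x, delta):
    # (2 delta)^(-d) * Psi_hat_delta(x) = prod_i sin(delta x_i) / (delta x_i), each factor being 1 when x_i = 0.
    p = 1.0
    for t in x:
        y = delta * t
        p *= 1.0 if y == 0 else math.sin(y) / y
    return p

rng = random.Random(0)  # fixed seed so the check is reproducible
delta, d = 0.7, 4       # hypothetical parameter values
outside = 0
for _ in range(20_000):
    x = [rng.uniform(-15.0, 15.0) for _ in range(d)]
    value = normalized_psi_hat(x, delta)
    assert value <= 1.0
    if max(abs(t) for t in x) > 2 / delta:  # x lies outside the cube C_{2/delta}
        outside += 1
        assert abs(value) <= 0.5
print(f"{outside} sampled points outside C_(2/delta), all with |value| <= 1/2")
```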

In Theorem 1 it is necessary to assume that the pointwise limit of $\widehat{\mu}_n$ is the Fourier transform of a probability measure $\mu$. To see this, take any nonnegative, continuous, compactly supported function $f \colon \R^d \to \R$ with $\ird f(x) \d x = 1$, and consider the probability measures $\mu_n$ with densities
\begin{equation*}
f_n(x) := \frac{1}{n^d} f \big( \frac{x}{n} \big) \qquad \text{for $x \in \R^d$, $n \geq 1$.}
\end{equation*}
Since $\widehat{\mu}_n(\xi) = \widehat{f}(n\xi)$, the Riemann–Lebesgue lemma shows that their Fourier transforms converge pointwise to $0$ everywhere except at $\xi=0$, where they are constantly equal to $1$. However, the sequence $(\mu_n)_{n \geq 1}$ does not converge in the narrow sense: $\ird \varphi \d \mu_n \to 0$ for every compactly supported $\varphi \in \mathcal{C}^b(\R^d)$, so a narrow limit would have to be the zero measure, which is impossible since $\ird \d \mu_n = 1$ for all $n$. (The sequence does converge to zero in the weak-$*$ sense of measures, that is, when tested against continuous functions vanishing at infinity.)

Observe that the pointwise limit of $\widehat{\mu}_n$ is not continuous at $0$. There is another version of Lévy’s continuity theorem which says that this is the only way the conclusion can fail. This version is sometimes more useful for checking whether a sequence of probability measures converges narrowly to a probability measure when we only have information about the Fourier transforms:

**Theorem 4**. Let
$(\mu_n)_{n \geq 1}$ be a sequence of probability measures on
$\R^d$. The following are equivalent:

- $(\widehat{\mu}_n)_{n \geq 1}$ converges pointwise to a function $\zeta$ which is continuous at $0$.
- $(\mu_n)_{n \geq 1}$ converges in the narrow sense to a probability measure $\mu$.
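The counterexample with densities $f_n(x) = n^{-d} f(x/n)$ discussed before Theorem 4 is exactly a case where the first condition fails. A minimal sketch in dimension $d = 1$, assuming the hypothetical choice $f(x) = \max(1 - |x|, 0)$, whose Fourier transform is $\widehat{f}(\xi) = \big(\sin(\xi/2)/(\xi/2)\big)^2$: it shows $\widehat{\mu}_n(0) = 1$ for all $n$, $\widehat{\mu}_n(\xi) \to 0$ for $\xi \neq 0$, and the mass of any fixed interval $[-R, R]$ tending to $0$, so the sequence is not tight.

```python
import math

def triangle_cf(xi):
    # Fourier transform of f(x) = max(1 - |x|, 0), a probability density on R:
    # f_hat(xi) = (sin(xi/2) / (xi/2))^2, equal to 1 at xi = 0.
    if xi == 0:
        return 1.0
    s = math.sin(xi / 2) / (xi / 2)
    return s * s

def mu_n_cf(n, xi):
    # mu_n has density f_n(x) = f(x/n) / n, so mu_n_hat(xi) = f_hat(n xi).
    return triangle_cf(n * xi)

def mu_n_mass_in(n, R):
    # mu_n([-R, R]) = integral of f over [-R/n, R/n] = 2s - s^2 with s = min(R/n, 1).
    s = min(R / n, 1.0)
    return 2 * s - s * s

for n in (1, 10, 100, 1000):
    print(n, mu_n_cf(n, 0.0), mu_n_cf(n, 0.1), mu_n_mass_in(n, 50.0))
```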

**Proof.**
The proof is essentially a repetition of the ideas which led to
Theorem 1. If the second statement holds, then
Theorem 1 shows that
$\widehat{\mu}_n \to \widehat{\mu}$ pointwise, so the first
statement holds with $\zeta = \widehat{\mu}$, which is continuous
(by dominated convergence, as is the Fourier transform of any finite
measure).
Conversely, if the first statement holds, then the same argument as in
the proof of Theorem 1 shows that the sequence $(\mu_n)_{n \geq 1}$
is tight, since in that part of the proof we only used that the
pointwise limit of $(\widehat{\mu}_n)_{n \geq 1}$ is continuous at $0$
and equal to $1$ there (and indeed, since the $\mu_n$ are
probabilities, $\widehat{\mu}_n(0)=1$ for all $n$, hence $\zeta(0)=1$).
Again by Prokhorov’s theorem, $(\mu_n)_{n \geq 1}$ has a subsequence
which converges in the narrow sense to some measure $\mu$. This $\mu$
is nonnegative, and it is in fact a probability measure
since $\ird \d \mu = \lim_{n \to +\infty} \ird \d \mu_n = 1$ along the
subsequence, by definition of narrow convergence (take
$\varphi \equiv 1$). Then $\widehat{\mu} = \zeta$, by the implication
$2 \Rightarrow 1$ we just proved, applied to the subsequence. This
reasoning holds for every subsequence of $(\mu_n)_{n \geq 1}$, and the
limit is always the same measure, since a probability measure is
determined by its Fourier transform. By Lemma 2, the whole sequence
converges in the narrow sense to the unique measure $\mu$ such that
$\widehat{\mu} = \zeta$, which is in addition a probability measure.
∎
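A classical application of Theorem 4 (our choice of example) is the Poisson limit theorem. If $\mu_n$ is the binomial distribution with parameters $n$ and $\lambda/n$, then $\widehat{\mu}_n(\xi) = (1 - \lambda/n + (\lambda/n) e^{-i\xi})^n$ converges pointwise to $\zeta(\xi) = \exp(\lambda(e^{-i\xi} - 1))$, which is continuous, so Theorem 4 gives narrow convergence to the probability measure with Fourier transform $\zeta$, namely the Poisson distribution with parameter $\lambda$. A minimal sketch, assuming the hypothetical value $\lambda = 3$, that checks the convergence on a grid and that $\zeta$ agrees with the Fourier transform of the Poisson distribution computed from its probability mass function.

```python
import cmath
import math

LAM = 3.0  # hypothetical rate parameter lambda

def binomial_cf(n, p, xi):
    # Binomial(n, p) is the law of a sum of n independent Bernoulli(p) variables B,
    # each with E exp(-i xi B) = 1 - p + p exp(-i xi).
    return (1 - p + p * cmath.exp(-1j * xi)) ** n

def zeta(xi):
    # Pointwise limit exp(lambda (exp(-i xi) - 1)); it is continuous with zeta(0) = 1.
    return cmath.exp(LAM * (cmath.exp(-1j * xi) - 1))

def poisson_cf_series(xi, terms=60):
    # Fourier transform of Poisson(lambda), summed from its probability mass function.
    return sum(math.exp(-LAM) * LAM**k / math.factorial(k) * cmath.exp(-1j * k * xi)
               for k in range(terms))

grid = [k / 20 for k in range(-100, 101)]  # xi in [-5, 5]
errors = []
for n in (10, 100, 1000, 10000):
    errors.append(max(abs(binomial_cf(n, LAM / n, xi) - zeta(xi)) for xi in grid))
    print(n, errors[-1])
series_gap = max(abs(zeta(xi) - poisson_cf_series(xi)) for xi in grid)
print(series_gap)  # zeta coincides with the Poisson Fourier transform up to rounding
```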