# Lévy’s continuity theorem

Here are some notes on Lévy’s continuity theorem, which is a very clean result on a type of weak convergence of measures often used in probability. The discussion below is very standard in probability books for the $d=1$ case, but the proof in $\R^d$ is slightly harder to find.

Denote by $\mathcal{C}^b(\R^d)$ the set of real continuous and bounded
functions on $\R^d$. We recall that a sequence of probability measures
$(\mu_n)_{n \geq 1}$ is said to converge to a probability measure
$\mu$ *in the narrow sense* or *in the narrow topology* if
\begin{equation*}
\ird \varphi(x) \d \mu_n(x) \to \ird \varphi(x) \d \mu(x)
\qquad \text{for all $\varphi \in \mathcal{C}^b(\R^d)$.}
\end{equation*}
(In some places this is referred to as *weak convergence*, not to
be confused with the
weak convergence in
Banach spaces.) There is a topology on the set of probability
measures on $\R^d$ (or on the set of all finite measures on $\R^d$),
called the *narrow topology*, for which the concept of
convergence is the one we just defined.

The *Fourier transform*, or *characteristic function* of a
probability measure $\mu$ is given by
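\begin{equation*}
\widehat{\mu}(\xi) := \ird e^{-i x \xi} \d \mu(x),
\qquad \xi \in \R^d,
\end{equation*}
where $x \xi$ denotes the Euclidean scalar product of $x$ and $\xi$ in $\R^d$.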
Of course, both of these concepts can be defined in more generality
(narrow convergence makes sense for signed finite measures, and the
Fourier transform makes sense for Schwartz distributions in general),
but they are often restricted to probability measures in probabilistic
results. We will need below the following basic properties of the
Fourier transform:

- The Fourier transform of a probability measure $\mu$ uniquely determines $\mu$; that is, if $\mu$, $\nu$ are probability measures with $\widehat{\mu}(\xi) = \widehat{\nu}(\xi)$ for all $\xi \in \R^d$, then $\mu = \nu$.
- The Fourier transform of any probability measure $\mu$ is continuous on $\R^d$ (as is easily seen by the dominated convergence theorem).
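To make the definition of the characteristic function concrete, here is a minimal numerical sketch in dimension $d=1$. It approximates $\widehat{\mu}(\xi)$ for the standard Gaussian measure by a trapezoid rule and compares it with the known closed form $e^{-\xi^2/2}$. The function names, the truncation interval and the grid size are choices made for this illustration only.

```python
import cmath
import math


def gaussian_density(x):
    """Density of the standard Gaussian measure on R."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def char_function(density, xi, half_width=12.0, steps=4000):
    """Trapezoid approximation of int e^{-i x xi} density(x) dx on [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0 + 0.0j
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0  # endpoints count half
        total += weight * cmath.exp(-1j * x * xi) * density(x)
    return total * h


for xi in (0.0, 0.5, 1.0, 2.0):
    approx = char_function(gaussian_density, xi)
    exact = math.exp(-0.5 * xi * xi)
    print(f"xi = {xi:3.1f}   numeric = {approx.real:.6f}   exact = {exact:.6f}")
```

The imaginary part of the numerical value is negligible here because the Gaussian density is even.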

The following is one of the “strong” versions of Lévy’s continuity theorem. It is often used to check whether a sequence of probability measures converges narrowly to some probability measure when we have information about their Fourier transforms:

**Theorem 1** (Lévy’s continuity theorem). Let
$(\mu_n)_{n \geq 1}$ be a sequence of probability measures on
$\R^d$. The following are equivalent:

- $(\widehat{\mu}_n)_{n \geq 1}$ converges pointwise to a function $\zeta$ which is continuous at $0$.
- $(\mu_n)_{n \geq 1}$ converges in the narrow sense to a probability measure $\mu$.
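As an illustration of how the theorem is typically used, take $\mu_n$ to be the law of $(X_1 + \dots + X_n)/\sqrt{n}$ with independent signs $X_i = \pm 1$ of probability $1/2$ each (an example chosen here, not taken from the discussion above). Then $\widehat{\mu}_n(\xi) = \cos(\xi/\sqrt{n})^n$, which converges pointwise to $e^{-\xi^2/2}$, a function continuous at $0$, so the theorem gives narrow convergence to the standard Gaussian. The sketch below only checks the pointwise convergence numerically at a few sample points; the function names are hypothetical.

```python
import math


def char_rademacher_sum(n, xi):
    """Characteristic function of (X_1 + ... + X_n) / sqrt(n) for independent signs X_i = +-1."""
    return math.cos(xi / math.sqrt(n)) ** n


def char_gaussian(xi):
    """Characteristic function of the standard Gaussian measure."""
    return math.exp(-0.5 * xi * xi)


for n in (1, 10, 100, 10000):
    gap = max(abs(char_rademacher_sum(n, xi) - char_gaussian(xi))
              for xi in (0.0, 0.5, 1.0, 2.0, 3.0))
    print(f"n = {n:6d}   largest gap on the sample points = {gap:.2e}")
```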

In this theorem the continuity of $\zeta$ at $0$ is needed, as the following example shows: take any nonnegative, continuous, compactly supported function $f \colon \R^d \to \R$ with $\ird f(x) \d x = 1$ and consider the measures $\mu_n$ with densities
\begin{equation*}
f_n(x) := \frac{1}{n^d} f \big( \frac{x}{n} \big) \qquad \text{for $x \in \R^d$, $n \geq 1$.}
\end{equation*}
Their Fourier transforms are $\widehat{\mu}_n(\xi) = \widehat{f}(n \xi)$, so they converge pointwise everywhere: by the Riemann-Lebesgue lemma they converge to $0$ at every $\xi \neq 0$, while at $\xi=0$ they are constantly equal to $1$. However, the sequence $(\mu_n)_{n \geq 1}$ does not converge in the narrow sense, since the mass of $\mu_n$ on any fixed bounded set tends to $0$ (it does converge to zero in the weak-$*$ sense of measures). Consistently with the theorem, the pointwise limit of $\widehat{\mu}_n$ is not continuous at $0$.
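The counterexample can be made explicit in dimension $d=1$. As an assumption for this sketch we pick the hat function $f(x) = \max(1-|x|, 0)$, which is nonnegative, continuous, compactly supported and has integral $1$; its Fourier transform is $\big(\sin(\xi/2)/(\xi/2)\big)^2$. The code evaluates $\widehat{\mu}_n(\xi) = \widehat{f}(n\xi)$ and the mass that $\mu_n$ gives to a fixed interval, both in closed form; all function names are hypothetical.

```python
import math


def hat_transform(xi):
    """Fourier transform of f(x) = max(1 - |x|, 0), namely (sin(xi/2) / (xi/2))^2."""
    if xi == 0.0:
        return 1.0
    half = 0.5 * xi
    return (math.sin(half) / half) ** 2


def dilated_transform(n, xi):
    """Fourier transform of f_n(x) = f(x / n) / n, which equals hat_transform(n * xi)."""
    return hat_transform(n * xi)


def mass_in_interval(n, radius):
    """mu_n([-radius, radius]) for the density f_n, computed in closed form."""
    r = min(radius / n, 1.0)
    return 2.0 * r - r * r


for n in (1, 10, 100, 1000):
    print(f"n = {n:5d}   transform at 0: {dilated_transform(n, 0.0):.3f}   "
          f"at 0.3: {dilated_transform(n, 0.3):.2e}   "
          f"mass in [-5, 5]: {mass_in_interval(n, 5.0):.4f}")
```

The transform stays equal to $1$ at $\xi = 0$ and decays at $\xi = 0.3$, while the mass in $[-5,5]$ tends to $0$: the mass escapes to infinity, which is exactly the failure of tightness.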

A proof of Lévy’s continuity theorem in the case $d=1$ can be found, for example, in Theorem 26.3 and Corollary 1 immediately after, in this book by Billingsley. The case for any $d$ is proved for example in section 18.6 of this book by considering the marginals of $\mu_n$ and using the $d=1$ case. The statement can also be found in these notes by Terry Tao, where it is proposed as an exercise. We give a direct proof, since the general $d$ case is not really harder than $d=1$.

**Proof of Theorem 1.**
One implication is easy: if $\mu_n$ converges to a probability
measure $\mu$ in the narrow topology, it is clear that its Fourier
transform converges pointwise to $\widehat{\mu}$, since both the
real and imaginary parts of $x \mapsto e^{-i x \xi}$ are in
$\mathcal{C}^b(\R^d)$. Of course, $\widehat{\mu}$ is continuous, as
remarked before.
The other implication is the interesting one. Since
$\widehat{\mu}_n$ converge pointwise to $\zeta$, the
dominated convergence theorem shows that, for any $\delta > 0$,
\begin{equation*}
\frac{1}{(2\delta)^d}\int_{C_\delta} (1 - \widehat\mu_n(\xi)) \d \xi
\to
\frac{1}{(2\delta)^d}\int_{C_\delta} (1 - \zeta(\xi)) \d \xi
\qquad \text{as $n \to +\infty$,}
\end{equation*}
where $C_\delta := [-\delta,\delta]^d$ is the cube of side $2\delta$
in $\R^d$ (these averages are real, since $C_\delta$ is symmetric and
$\widehat{\mu}_n(-\xi)$ is the complex conjugate of $\widehat{\mu}_n(\xi)$).
Since $\widehat{\mu}_n(0) = 1$ for all $n$ we have $\zeta(0) = 1$, and
since $\zeta$ is continuous at $0$ the right hand side can be made as
small as we wish by choosing $\delta$ small. Hence for all
$\epsilon > 0$ there exist $\delta > 0$ and $N \geq 1$ such that
\begin{equation*}
\frac{1}{(2\delta)^d}\int_{C_\delta} (1 - \widehat\mu_n(\xi)) \d \xi
\leq \epsilon
\qquad \text{for all $n \geq N$.}
\end{equation*}
Lemma 3 then shows that
$\mu_n(\R^d \setminus C_{2/\delta}) \leq 2 \epsilon$ for all
$n \geq N$. Since each of the finitely many measures
$\mu_1, \dots, \mu_{N-1}$ is tight, enlarging the cube if necessary we
see that the sequence $(\mu_n)_{n \geq 1}$ is tight, so by
Prokhorov’s
theorem it must have a subsequence which converges narrowly to a
probability measure $\mu$. Due to the implication we proved first,
it must happen that $\widehat{\mu} = \zeta$, so $\mu$ is uniquely
determined by $\zeta$. In fact, this reasoning applies to any
subsequence of $(\mu_n)_{n \geq 1}$, so by Lemma 2 the whole sequence
must converge narrowly to $\mu$.
∎
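To see what the averaged quantity in the proof detects, one can compare two families of centered Gaussian laws on $\R$, for which $\widehat{\mu}(\xi) = e^{-\sigma^2 \xi^2/2}$ and the average of $1 - \widehat{\mu}$ over $[-\delta,\delta]$ has a closed form in terms of the error function. In this sketch (an illustration under our own choice of examples, with hypothetical names) the family with variance $1 + 1/n$ is tight and the average stays small for small $\delta$, while for the family with standard deviation $n$ the average approaches $1$.

```python
import math


def averaged_defect(sigma, delta):
    """(1/(2 delta)) int_{-delta}^{delta} (1 - exp(-sigma^2 xi^2 / 2)) d xi for the law N(0, sigma^2)."""
    s = sigma * delta
    return 1.0 - math.sqrt(math.pi / 2.0) * math.erf(s / math.sqrt(2.0)) / s


delta = 0.1
for n in (1, 10, 100, 1000):
    tight = averaged_defect(math.sqrt(1.0 + 1.0 / n), delta)  # variance 1 + 1/n
    spreading = averaged_defect(float(n), delta)              # standard deviation n
    print(f"n = {n:5d}   tight family: {tight:.5f}   spreading family: {spreading:.5f}")
```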

In the previous proof we used the following well-known property, which holds for convergence in any topological space and in particular in the narrow topology:

**Lemma 2**.
Let $\mu$ be a Borel probability measure on $\R^d$. Assume that a
sequence of probability measures $(\mu_n)_{n \geq 1}$ has the
property that any subsequence of $(\mu_n)_{n \geq 1}$ must have a
further subsequence that converges narrowly to $\mu$. Then the whole
sequence $(\mu_n)_{n \geq 1}$ converges narrowly to $\mu$.

The following lemma, used in a crucial way in the proof of Theorem 1, is an expression of the general principle that the regularity of the Fourier transform of a function is related to the decay of the tail of the function (and conversely, since the inverse Fourier transform has essentially the same properties). The lemma says that the tail behavior of a probability measure can be estimated by the continuity of its Fourier transform at $\xi=0$:

**Lemma 3**.
For a Borel probability measure $\mu$ on $\R$ we have
\begin{equation*}
\int_{|x| \geq 2/\delta} \d \mu(x)
\leq
\frac{1}{\delta}\int_{-\delta}^\delta (1 -
\widehat\mu(\xi)) \d \xi
\qquad \text{for all $\delta > 0$.}
\end{equation*}
For a Borel probability measure $\mu$ on $\R^d$, calling
$C_r := [-r,r]^d \subseteq \R^d$, we have
\begin{equation*}
\frac12 \int_{\R^d \setminus C_{2/\delta}} \d \mu(x)
\leq
\frac{1}{(2\delta)^d}\int_{C_\delta} (1 -
\widehat\mu(\xi)) \d \xi
\qquad \text{for all $\delta > 0$.}
\end{equation*}
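As a sanity check of the one-dimensional inequality, one can test it on a measure for which both sides are explicit. The sketch below uses the standard Cauchy measure (our choice of example), for which $\widehat{\mu}(\xi) = e^{-|\xi|}$ and the tail is $1 - \frac{2}{\pi}\arctan(a)$; the function names are hypothetical.

```python
import math


def cauchy_tail(a):
    """mu({|x| >= a}) for the standard Cauchy measure, density 1 / (pi (1 + x^2))."""
    return 1.0 - (2.0 / math.pi) * math.atan(a)


def lemma_right_side(delta):
    """(1/delta) * int_{-delta}^{delta} (1 - e^{-|xi|}) d xi, evaluated in closed form."""
    # The integral equals 2 * (delta - (1 - e^{-delta})); expm1(-delta) = e^{-delta} - 1.
    return 2.0 + 2.0 * math.expm1(-delta) / delta


for delta in (0.01, 0.1, 1.0, 10.0):
    left = cauchy_tail(2.0 / delta)
    right = lemma_right_side(delta)
    print(f"delta = {delta:5.2f}   tail = {left:.5f}   bound = {right:.5f}")
```

For small $\delta$ the tail behaves like $\delta/\pi$ and the bound like $\delta$, so in this example the inequality captures the correct order of decay.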

**Proof.**
**The case $d=1$.** We first write the case in dimension $1$,
where the calculation is simpler and the argument is seen more
clearly. A good way to show the result is to notice that, by
Fubini’s theorem,
\begin{equation*}
\frac{1}{2\delta} \int_{-\delta}^\delta \widehat{\mu}(\xi) \d \xi
= \frac{1}{2\delta} \int_{-\infty}^\infty \widehat{\psi}_\delta(x) \d \mu(x)
\end{equation*}
where $\psi_\delta := 𝟙_{[-\delta,\delta]}$. The left hand
side is the average of the Fourier transform on
$[-\delta,\delta]$; and the right hand side is not far from the
integral of $\mu$ on a large set. We can calculate explicitly
$\widehat{\psi}_\delta$:
\begin{equation*}
\widehat{\psi}_\delta(x) = \int_{-\delta}^\delta e^{-i x \xi} \d \xi
= \frac{2}{x} \sin(\delta x),
\qquad x \in \R.
\end{equation*}
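For completeness, the Fubini step can be written out as follows. The exchange of integrals is justified because the integrand has modulus $1$, the interval $[-\delta,\delta]$ is bounded and $\mu$ is a probability measure:
\begin{equation*}
\int_{-\delta}^\delta \widehat{\mu}(\xi) \d \xi
= \int_{-\delta}^\delta \int_{-\infty}^\infty e^{-i x \xi} \d \mu(x) \d \xi
= \int_{-\infty}^\infty \left( \int_{-\delta}^\delta e^{-i x \xi} \d \xi \right) \d \mu(x)
= \int_{-\infty}^\infty \widehat{\psi}_\delta(x) \d \mu(x).
\end{equation*}
At $x = 0$ the expression $\frac{2}{x} \sin(\delta x)$ is understood as its limit $2\delta$.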
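Since $\sin(y)/y \leq 1$ for all $y \neq 0$, the integrand on the left below is nonnegative, so we may restrict the integral to the set $|x| \geq 2/\delta$, where $\delta |x| \geq 2$. Hence we have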
\begin{multline*}
\int_{-\infty}^\infty
\left(1 - \frac{1}{2\delta} \widehat{\psi}_\delta(x) \right) \d \mu(x)
\geq
\int_{|x| \geq 2/\delta}
\left(1 - \frac{|\sin(\delta x)|}{\delta|x|} \right) \d \mu(x)
\\
\geq
\int_{|x| \geq 2/\delta}
\left(1 - \frac{1}{\delta |x|} \right) \d \mu(x)
\geq
\frac12 \int_{|x| \geq 2/\delta} \d \mu(x).
\end{multline*}
Hence
\begin{multline*}
\frac12 \int_{|x| \geq 2/\delta} \d \mu(x)
\leq
1 - \frac{1}{2\delta} \int_{-\infty}^\infty \widehat{\psi}_\delta(x) \d \mu(x)
\\
=
1 - \frac{1}{2\delta} \int_{-\delta}^\delta
\widehat{\mu}(\xi) \d \xi
= \frac{1}{2\delta} \int_{-\delta}^\delta
\left( 1 - \widehat{\mu}(\xi) \right) \d \xi.
\end{multline*}
**General proof for any dimension $d$.** The result can be
proved in $\R^d$ for any $d \geq 1$ with essentially the same
calculation, and we write a parallel proof. Calling
$C_\delta = [-\delta, \delta]^d$ we have, again by Fubini’s theorem,
\begin{equation*}
\frac{1}{(2\delta)^d} \int_{C_\delta} \widehat{\mu}(\xi) \d \xi
= \frac{1}{(2\delta)^d} \ird \widehat{\Psi}_\delta(x) \d \mu(x),
\end{equation*}
where now
\begin{equation*}
\Psi_\delta(x) := 𝟙_{C_\delta}(x) = \prod_{i=1}^d \psi_\delta(x_i)
\qquad \text{for $x = (x_1, \dots, x_d) \in \R^d$}.
\end{equation*}
Its Fourier transform is
\begin{equation*}
\widehat{\Psi}_\delta(\xi) = \prod_{i=1}^d
\widehat{\psi}_\delta(\xi_i)
= \prod_{i=1}^d \frac{2}{\xi_i} \sin(\delta \xi_i),
\qquad \xi \in \R^d.
\end{equation*}
Notice that $(2\delta)^{-d} \widehat{\Psi}_\delta(x) \leq 1$ for
all $x \in \R^d$, and
\begin{equation*}
(2\delta)^{-d} | \widehat{\Psi}_\delta(x) | \leq \frac12
\qquad
\text{for all $x \in \R^d \setminus C_{2/\delta}$},
\end{equation*}
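since outside the cube $C_{2/\delta}$ at least one of the
coordinates must be larger than $2/\delta$ in absolute value, so the
corresponding factor is at most $1/2$ in absolute value while all
the other factors are at most $1$. Then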
\begin{multline*}
\ird
\left(1 - \frac{1}{(2\delta)^d} \widehat{\Psi}_\delta(x) \right) \d \mu(x)
\geq
\int_{\R^d \setminus C_{2/\delta}}
\left(1 - \frac{1}{(2\delta)^d} |\widehat{\Psi}_\delta(x)| \right) \d \mu(x)
\\
\geq
\frac12 \int_{\R^d \setminus C_{2/\delta}} \d \mu(x),
\end{multline*}
so
\begin{multline*}
\frac12 \int_{\R^d \setminus C_{2/\delta}} \d \mu(x)
\leq
1 - \frac{1}{(2\delta)^d} \ird \widehat{\Psi}_\delta(x) \d \mu(x)
\\
=
1 - \frac{1}{(2\delta)^d} \int_{C_\delta}
\widehat{\mu}(\xi) \d \xi
= \frac{1}{(2\delta)^d} \int_{C_\delta}
\left( 1 - \widehat{\mu}(\xi) \right) \d \xi.
\end{multline*}
∎
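The key bound in the $d$-dimensional proof, $(2\delta)^{-d} |\widehat{\Psi}_\delta(x)| \leq \frac12$ outside $C_{2/\delta}$, can be probed numerically. The following sketch samples random points outside the cube and records the largest value of the normalized product of sines that it finds. The sampling box, the sample size and the function names are assumptions made for the illustration, and random sampling is of course a check, not a proof.

```python
import math
import random


def sinc(t):
    """sin(t) / t, extended by its limit 1 at t = 0."""
    return 1.0 if t == 0.0 else math.sin(t) / t


def normalized_kernel(x, delta):
    """(2 delta)^{-d} times the Fourier transform of the indicator of C_delta, at the point x."""
    value = 1.0
    for coordinate in x:
        value *= sinc(delta * coordinate)
    return value


def worst_value_outside_cube(d, delta, samples=20000, seed=0):
    """Largest |normalized_kernel| found among random points lying outside C_{2/delta}."""
    rng = random.Random(seed)
    radius = 2.0 / delta
    worst = 0.0
    for _ in range(samples):
        x = [rng.uniform(-3.0 * radius, 3.0 * radius) for _ in range(d)]
        if max(abs(c) for c in x) <= radius:
            continue  # the point lies inside the cube C_{2/delta}: skip it
        worst = max(worst, abs(normalized_kernel(x, delta)))
    return worst


for d in (1, 2, 3):
    print(f"d = {d}   largest value found outside the cube: "
          f"{worst_value_outside_cube(d, 0.5):.4f}")
```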

## A weak version of Lévy’s continuity theorem

There is also a weaker version of Lévy’s continuity theorem which is commonly found in probability books, and which can be proved in a more elementary way. For example, it is easy to prove it without using Prokhorov’s theorem:

**Corollary 4** (Lévy’s continuity theorem, special case).
Let $\mu$ and $\mu_n$ be Borel probability measures on $\R^d$, for
integer $n \geq 1$. Then $\mu_n$ converges to $\mu$ in the narrow
topology if and only if $\widehat{\mu}_n$ converges pointwise to
$\widehat{\mu}$.

This is obviously a consequence of Theorem 1, since $\widehat{\mu}$ is continuous. The main difference is that we are assuming that $\widehat{\mu}_n$ converges pointwise to a function which is already known to be the Fourier transform of a probability measure, and not to any function $\zeta$.

This result can also be proved in the following way, which is an interesting alternative proof. We will need to know the following result on weak convergence in $\R^d$ (it is not too hard to show, and we leave it as an exercise):

**Exercise 5**.
Let $\mu$ and $\mu_n$ be Borel probability measures on $\R^d$, for
integer $n \geq 1$. The following are equivalent:
\begin{align*}
\ird \phi \d \mu_n \to \ird \phi \d \mu
\quad \text{as $n \to +\infty$,}
&\quad \text{for all $\phi \in \mathcal{C}^b(\R^d)$.}
\\
\ird \phi \d \mu_n \to \ird \phi \d \mu
\quad \text{as $n \to +\infty$,}
&\quad \text{for all $\phi \in \mathcal{C}^\infty_{\mathrm{c}}(\R^d)$.}
\end{align*}
That is: narrow convergence is equivalent to convergence against
smooth, compactly supported test functions (as long as we already
know the limit is a probability measure).

**Proof of Corollary 4.**
As usual, if $\mu_n$ converges to $\mu$ in the narrow sense, it is
clear that their Fourier transforms converge pointwise, so all we
need to show is the converse implication.
Take $\phi \in \mathcal{C}^\infty_{\mathrm{c}}(\R^d)$. Since its
Fourier transform is a Schwartz function, we can use Fourier’s
inversion theorem and Fubini’s theorem to see that
\begin{equation*}
\ird \phi \d \mu_n
= \frac{1}{(2\pi)^d} \ird \widehat{\phi}(\xi) \, \widehat{\mu_n}(-\xi) \d \xi.
\end{equation*}
Since $\widehat{\phi}$ is integrable and $|\widehat{\mu_n}|$ is
uniformly bounded by $1$, the dominated convergence theorem shows
that
\begin{equation*}
\ird \phi \d \mu_n
\to \frac{1}{(2\pi)^d} \ird \widehat{\phi}(\xi) \, \widehat{\mu}(-\xi) \d \xi
= \ird \phi \d \mu
\qquad \text{as $n \to +\infty$.}
\end{equation*}
By Exercise 5, this shows that $\mu_n$ converges to $\mu$ in the
narrow sense.
∎
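With the normalization $\widehat{\mu}(\xi) = \ird e^{-i x \xi} \d \mu(x)$ used in these notes, the identity behind this proof reads $\ird \phi \d \mu = (2\pi)^{-d} \ird \widehat{\phi}(\xi) \, \widehat{\mu}(-\xi) \d \xi$. The sketch below checks it numerically in dimension $1$ for a standard smooth bump function $\phi$ and the standard Gaussian measure. Both are our choices of example, and the quadrature parameters and function names are hypothetical.

```python
import math


def bump(x):
    """Smooth function supported in [-1, 1]: exp(-1 / (1 - x^2)) inside, 0 outside."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def trapezoid(func, a, b, steps):
    """Composite trapezoid rule for the integral of func over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for k in range(1, steps):
        total += func(a + k * h)
    return total * h


def bump_transform(xi, steps=400):
    """Fourier transform of the bump; it is real because the bump is even."""
    return trapezoid(lambda x: bump(x) * math.cos(x * xi), -1.0, 1.0, steps)


def gaussian_density(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def gaussian_char(xi):
    return math.exp(-0.5 * xi * xi)


# Left side: the integral of the test function against the measure.
space_side = trapezoid(lambda x: bump(x) * gaussian_density(x), -1.0, 1.0, 400)

# Right side: (2 pi)^{-1} int bump_transform(xi) * gaussian_char(-xi) d xi,
# truncated to [-10, 10], where the Gaussian factor is already negligible.
frequency_side = trapezoid(
    lambda xi: bump_transform(xi) * gaussian_char(-xi), -10.0, 10.0, 400
) / (2.0 * math.pi)

print(f"space side     = {space_side:.10f}")
print(f"frequency side = {frequency_side:.10f}")
```

The two printed values should agree up to quadrature error, which illustrates why pointwise convergence of $\widehat{\mu}_n$ is enough to pass to the limit on the left hand side.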