Lévy's continuity theorem
Here are some notes on Lévy’s continuity theorem, which is a very clean result on a type of weak convergence of measures often used in probability. The discussion below is very standard in probability books for the $d=1$ case, but the proof in $\R^d$ is slightly harder to find.
Denote by $\mathcal{C}^b(\R^d)$ the set of real continuous and bounded functions on $\R^d$. We recall that a sequence of probability measures $(\mu_n)_{n \geq 1}$ is said to converge to a probability measure $\mu$ in the narrow sense or in the narrow topology if \begin{equation*} \ird \varphi(x) \d \mu_n(x) \to \ird \varphi(x) \d \mu(x) \qquad \text{for all $\varphi \in \mathcal{C}^b(\R^d)$.} \end{equation*} (In some places this is referred to as weak convergence, but for me this conflicts with weak convergence in Banach spaces, which is slightly different.) There is a topology on the set of probability measures on $\R^d$ (or on the set of all finite measures on $\R^d$), called the narrow topology, for which the notion of convergence is the one we just defined.
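To make the definition concrete, here is a minimal numerical sketch (not part of the exposition; it assumes NumPy and SciPy, and the measures and the test function are chosen arbitrarily for illustration): it checks that $\ird \varphi \d \mu_n \to \ird \varphi \d \mu$ for $\mu_n = \mathcal{N}(1/n, 1 + 1/n)$, $\mu = \mathcal{N}(0,1)$ and one bounded continuous $\varphi$.

```python
import numpy as np
from scipy import integrate, stats

def integrate_against(phi, dist):
    # numerically compute the integral of phi against the law of `dist`
    val, _ = integrate.quad(lambda x: phi(x) * dist.pdf(x), -np.inf, np.inf)
    return val

phi = lambda x: np.cos(x) / (1.0 + x**2)   # a bounded continuous test function
mu = stats.norm(0.0, 1.0)                  # the narrow limit N(0, 1)
limit = integrate_against(phi, mu)

for n in [1, 10, 100, 1000]:
    mu_n = stats.norm(1.0 / n, np.sqrt(1.0 + 1.0 / n))   # mu_n = N(1/n, 1 + 1/n)
    print(n, abs(integrate_against(phi, mu_n) - limit))   # gap shrinks as n grows
```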
The Fourier transform, or characteristic function, of a probability measure $\mu$ is given by \begin{equation*} \widehat{\mu}(\xi) := \ird e^{-i x \xi} \d \mu(x). \end{equation*} (We omit the normalization factor $(2\pi)^{-d/2}$ sometimes used for the Fourier transform, so that $\widehat{\mu}(0) = 1$; this is the convention used in the calculations below.) Of course, both of these concepts can be defined in more generality (narrow convergence makes sense for signed finite measures, and the Fourier transform makes sense for Schwartz distributions in general), but they are often restricted to probability measures in probabilistic results.
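As a quick illustration of this convention (again just a sketch, assuming NumPy; the Gaussian parameters are arbitrary), the empirical average of $e^{-i \xi X}$ over samples of $X \sim \mathcal{N}(m, \sigma^2)$ approximates the closed form $\widehat{\mu}(\xi) = e^{-i m \xi - \sigma^2 \xi^2 / 2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma = 0.5, 2.0                      # arbitrary Gaussian parameters
samples = rng.normal(m, sigma, size=200_000)

def mu_hat_empirical(xi):
    # Monte Carlo estimate of the characteristic function at xi
    return np.mean(np.exp(-1j * xi * samples))

def mu_hat_exact(xi):
    # closed form for the N(m, sigma^2) characteristic function (this convention)
    return np.exp(-1j * m * xi - 0.5 * (sigma * xi) ** 2)

for xi in [0.0, 0.3, 1.0, 2.5]:
    print(xi, abs(mu_hat_empirical(xi) - mu_hat_exact(xi)))   # small sampling error
```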
Lévy’s continuity theorem is the following:
Theorem 1. Let $\mu$ and $\mu_n$ be Borel probability measures on $\R^d$, for integer $n \geq 1$. Then $\mu_n$ converges to $\mu$ in the narrow topology if and only if $\widehat{\mu}_n$ converges pointwise to $\widehat{\mu}$.
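Before turning to the references and the proof, here is a small numerical sanity check of the statement (an illustration only; the choice of $\mu_n = \mathrm{Binomial}(n, \lambda/n)$ and $\mu = \mathrm{Poisson}(\lambda)$ is mine, and NumPy/SciPy are assumed): the characteristic functions converge pointwise, and, consistently with the theorem, integrals of a bounded continuous function converge as well.

```python
import numpy as np
from scipy import stats

lam, xi = 3.0, 0.7                           # arbitrary rate and frequency
phi = lambda k: np.sin(k) / (1.0 + k)        # bounded continuous test function

# limit measure mu = Poisson(lam): characteristic function and test integral
mu_hat = np.exp(lam * (np.exp(-1j * xi) - 1.0))
k = np.arange(0, 200)
limit_integral = np.sum(phi(k) * stats.poisson(lam).pmf(k))

for n in [10, 100, 1000, 10000]:
    p = lam / n                              # mu_n = Binomial(n, lam / n)
    mu_n_hat = (1.0 - p + p * np.exp(-1j * xi)) ** n
    kn = np.arange(0, n + 1)
    integral_n = np.sum(phi(kn) * stats.binom(n, p).pmf(kn))
    print(n, abs(mu_n_hat - mu_hat), abs(integral_n - limit_integral))  # both shrink
```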
The case $d=1$ can be found, for example, in Theorem 26.3 of this book by Billingsley. The case for any $d$ is proved in section 18.6 of this book by considering the marginals of $\mu_n$ and using the $d=1$ case. We give a direct proof, since the general $d$ case is not really harder than $d=1$.
Proof (of Theorem 1). One implication is direct: if $\mu_n$ converges to $\mu$ in the narrow topology, then $\widehat{\mu}_n$ converges pointwise to $\widehat{\mu}$, since for each fixed $\xi$ both the real and imaginary parts of $x \mapsto e^{-i x \xi}$ are in $\mathcal{C}^b(\R^d)$. The other implication is more subtle. Since $\widehat{\mu}_n$ converges pointwise to $\widehat{\mu}$ (and $|1 - \widehat{\mu}_n| \leq 2$ everywhere), the dominated convergence theorem shows that, for any $\delta > 0$, \begin{equation*} \frac{1}{(2\delta)^d}\int_{C_\delta} (1 - \widehat\mu_n(\xi)) \d \xi \to \frac{1}{(2\delta)^d}\int_{C_\delta} (1 - \widehat\mu(\xi)) \d \xi \qquad \text{as $n \to +\infty$,} \end{equation*} where $C_\delta := [-\delta,\delta]^d$ is the cube of side $2\delta$ in $\R^d$ (these integrals are real, since $\widehat{\mu}_n(-\xi) = \overline{\widehat{\mu}_n(\xi)}$). Since $\widehat{\mu}$ is continuous at $0$ and $\widehat{\mu}(0)=1$, for any $\epsilon > 0$ the right-hand side is at most $\epsilon/2$ if $\delta$ is chosen small enough, and then the left-hand side is at most $\epsilon$ for all $n$ larger than some $N$. Lemma 3 then shows that the measures $\mu_n$ with $n \geq N$ have mass at most $2\epsilon$ outside the fixed cube $C_{2/\delta}$; since each of the finitely many measures $\mu_1, \dots, \mu_{N-1}$ is tight on its own, the whole sequence $(\mu_n)_{n \geq 1}$ is tight. By Prokhorov’s theorem it must have a subsequence which converges narrowly to a probability measure. This probability measure must be $\mu$, due to the implication we proved first together with the fact that the Fourier transform determines the measure. In fact, this reasoning applies to any subsequence of $(\mu_n)_{n \geq 1}$, so by Lemma 2 below the whole sequence must converge narrowly to $\mu$. ∎
In the previous proof we used the following well-known property, which holds for convergence in any topological space and in particular for the narrow topology:
Lemma 2. Let $\mu$ be a Borel probability measure on $\R^d$. Assume that a sequence of probability measures $(\mu_n)_{n \geq 1}$ has the property that any subsequence of $(\mu_n)_{n \geq 1}$ must have a further subsequence that converges narrowly to $\mu$. Then the whole sequence $(\mu_n)_{n \geq 1}$ converges narrowly to $\mu$.
The following lemma, used in the proof of Theorem 1, is an expression of the general principle that the regularity of the Fourier transform of a function is related to the decay of the tail of the function (and conversely, since the inverse Fourier transform has essentially the same properties). The lemma says that the tail behavior of a probability measure can be estimated by the continuity of its Fourier transform at $\xi=0$:
Lemma 3. For a Borel probability measure $\mu$ on $\R$ we have \begin{equation*} \int_{|x| \geq 2/\delta} \d \mu(x) \leq \frac{1}{\delta}\int_{-\delta}^\delta (1 - \widehat\mu(\xi)) \d \xi \qquad \text{for all $\delta > 0$.} \end{equation*} For a Borel probability measure $\mu$ on $\R^d$, calling $C_r := [-r,r]^d \subseteq \R^d$, we have \begin{equation*} \frac12 \int_{\R^d \setminus C_{2/\delta}} \d \mu(x) \leq \frac{1}{(2\delta)^d}\int_{C_\delta} (1 - \widehat\mu(\xi)) \d \xi \qquad \text{for all $\delta > 0$.} \end{equation*}
Proof. The case $d=1$. We first write the case in dimension $1$, where the calculation is simpler and the argument is seen more clearly. A good way to show the result is to notice that, by Fubini’s theorem, \begin{equation*} \frac{1}{2\delta} \int_{-\delta}^\delta \widehat{\mu}(\xi) \d \xi = \frac{1}{2\delta} \int_{-\infty}^\infty \widehat{\psi}_\delta(x) \d \mu(x), \end{equation*} where $\psi_\delta := \mathbb{1}_{[-\delta,\delta]}$. The left-hand side is the average of the Fourier transform on $[-\delta,\delta]$, and the right-hand side is not far from the integral of $\mu$ on a large set. We can calculate $\widehat{\psi}_\delta$ explicitly: \begin{equation*} \widehat{\psi}_\delta(x) = \int_{-\delta}^\delta e^{-i x \xi} \d \xi = \frac{2}{x} \sin(\delta x), \qquad x \in \R \setminus \{0\}, \end{equation*} with $\widehat{\psi}_\delta(0) = 2\delta$. Since $\sin(y)/y \leq 1$ for all $y$ we have \begin{multline*} \int_{-\infty}^\infty \left(1 - \frac{1}{2\delta} \widehat{\psi}_\delta(x) \right) \d \mu(x) \geq \int_{|x| \geq 2/\delta} \left(1 - \frac{|\sin(\delta x)|}{\delta|x|} \right) \d \mu(x) \\ \geq \int_{|x| \geq 2/\delta} \left(1 - \frac{1}{\delta |x|} \right) \d \mu(x) \geq \frac12 \int_{|x| \geq 2/\delta} \d \mu(x). \end{multline*} Hence \begin{multline*} \frac12 \int_{|x| \geq 2/\delta} \d \mu(x) \leq 1 - \frac{1}{2\delta} \int_{-\infty}^\infty \widehat{\psi}_\delta(x) \d \mu(x) \\ = 1 - \frac{1}{2\delta} \int_{-\delta}^\delta \widehat{\mu}(\xi) \d \xi = \frac{1}{2\delta} \int_{-\delta}^\delta \left( 1 - \widehat{\mu}(\xi) \right) \d \xi. \end{multline*}

General proof for any dimension $d$. The result can be proved in $\R^d$ for any $d \geq 1$ with essentially the same calculation, and we write a parallel proof. Calling $C_\delta = [-\delta, \delta]^d$ we have, again by Fubini’s theorem, \begin{equation*} \frac{1}{(2\delta)^d} \int_{C_\delta} \widehat{\mu}(\xi) \d \xi = \frac{1}{(2\delta)^d} \ird \widehat{\Psi}_\delta(x) \d \mu(x), \end{equation*} where now \begin{equation*} \Psi_\delta(x) := \mathbb{1}_{C_\delta}(x) = \prod_{i=1}^d \psi_\delta(x_i) \qquad \text{for $x = (x_1, \dots, x_d) \in \R^d$}. \end{equation*} Its Fourier transform is \begin{equation*} \widehat{\Psi}_\delta(x) = \prod_{i=1}^d \widehat{\psi}_\delta(x_i) = \prod_{i=1}^d \frac{2}{x_i} \sin(\delta x_i), \qquad x \in \R^d. \end{equation*} Notice that $(2\delta)^{-d} \widehat{\Psi}_\delta(x) \leq 1$ for all $x \in \R^d$, and \begin{equation*} (2\delta)^{-d} | \widehat{\Psi}_\delta(x) | \leq \frac12 \qquad \text{for all $x \in \R^d \setminus C_{2/\delta}$}, \end{equation*} since outside the cube $C_{2/\delta}$ at least one coordinate satisfies $|x_i| \geq 2/\delta$, so the corresponding factor $|\sin(\delta x_i)| / (\delta |x_i|)$ is at most $\frac12$, while every other factor is at most $1$ in absolute value. Then \begin{multline*} \ird \left(1 - \frac{1}{(2\delta)^d} \widehat{\Psi}_\delta(x) \right) \d \mu(x) \geq \int_{\R^d \setminus C_{2/\delta}} \left(1 - \frac{1}{(2\delta)^d} |\widehat{\Psi}_\delta(x)| \right) \d \mu(x) \\ \geq \frac12 \int_{\R^d \setminus C_{2/\delta}} \d \mu(x), \end{multline*} so \begin{multline*} \frac12 \int_{\R^d \setminus C_{2/\delta}} \d \mu(x) \leq 1 - \frac{1}{(2\delta)^d} \ird \widehat{\Psi}_\delta(x) \d \mu(x) \\ = 1 - \frac{1}{(2\delta)^d} \int_{C_\delta} \widehat{\mu}(\xi) \d \xi = \frac{1}{(2\delta)^d} \int_{C_\delta} \left( 1 - \widehat{\mu}(\xi) \right) \d \xi. \end{multline*} ∎
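The bound in Lemma 3 can be checked numerically in a case where everything is explicit (a sketch, assuming NumPy; the standard Cauchy distribution is chosen because both sides have closed forms): its characteristic function is $e^{-|\xi|}$, the left-hand side of the $d=1$ bound is $1 - \tfrac{2}{\pi}\arctan(2/\delta)$, and the right-hand side is $\tfrac{2}{\delta}(\delta - 1 + e^{-\delta})$.

```python
import numpy as np

# Check Lemma 3 (d = 1) for the standard Cauchy distribution, mu_hat(xi) = exp(-|xi|)
for delta in [0.05, 0.1, 0.5, 1.0, 2.0]:
    tail_mass = 1.0 - (2.0 / np.pi) * np.arctan(2.0 / delta)   # mu(|x| >= 2/delta)
    bound = (2.0 / delta) * (delta - 1.0 + np.exp(-delta))     # (1/delta) * integral of (1 - e^{-|xi|}) over [-delta, delta]
    print(f"delta={delta:5.2f}  tail mass={tail_mass:.4f} <= bound={bound:.4f}  {tail_mass <= bound}")
```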
In the continuity theorem 1 it is essential that the pointwise limit of the Fourier transforms is itself the Fourier transform of a probability measure: pointwise convergence of $\widehat{\mu}_n$ to an arbitrary function is not enough. This can be seen by taking any nonnegative, continuous, compactly supported function $f \colon \R^d \to \R$ with $\ird f(x) \d x = 1$ and considering the measures $\mu_n$ with densities \begin{equation*} f_n(x) := \frac{1}{n^d} f \big( \frac{x}{n} \big) \qquad \text{for $x \in \R^d$, $n \geq 1$.} \end{equation*} Their Fourier transforms satisfy $\widehat{\mu}_n(\xi) = \widehat{f}(n\xi)$, so they converge pointwise to $0$ everywhere except at $\xi=0$, where they are constantly equal to $1$ (the convergence to $0$ for $\xi \neq 0$ is the Riemann–Lebesgue lemma). However, the sequence $(\mu_n)_{n \geq 1}$ does not converge in the narrow sense, since its mass escapes to infinity (though it does converge to zero in the weak-$*$ sense of measures). Observe that the pointwise limit of $\widehat{\mu}_n$ is not continuous at $0$. There is another version of Lévy’s continuity theorem that says this is the only way this can fail. This version is sometimes more useful to check whether a sequence of probability measures converges narrowly to a probability measure, assuming that we only have information about its Fourier transform; it is stated as Theorem 4 below.
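Before stating it, here is a small numerical look at the counterexample above in dimension $d=1$ (a sketch only, assuming NumPy and SciPy; the bump $f(x) = \tfrac{3}{4}(1-x^2)$ on $[-1,1]$ and the test function are my choices): $\widehat{\mu}_n(\xi) = \widehat{f}(n\xi)$ shrinks at any fixed $\xi \neq 0$, while $\ird \varphi \d \mu_n \to 0$ for a bounded continuous $\varphi$, so the mass escapes and there is no narrow limit which is a probability measure.

```python
import numpy as np
from scipy import integrate

def f_hat(xi):
    # Fourier transform of f(x) = (3/4)(1 - x^2) on [-1, 1], valid for xi != 0
    # (f is even, so f_hat is real; f_hat(0) = 1)
    return 3.0 * (np.sin(xi) - xi * np.cos(xi)) / xi**3

phi = lambda x: 1.0 / (1.0 + x**2)           # bounded continuous test function

for n in [1, 10, 100, 1000]:
    density_n = lambda x, n=n: 0.75 * (1.0 - (x / n) ** 2) / n   # density of mu_n on [-n, n]
    integral_n, _ = integrate.quad(lambda x: phi(x) * density_n(x), -n, n)
    # mu_n-hat(xi) = f_hat(n * xi): small at fixed xi != 0 for large n,
    # while the integral of phi against mu_n also goes to 0 (mass escapes)
    print(n, abs(f_hat(n * 0.25)), abs(f_hat(n * 1.0)), integral_n)
```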
Theorem 4. Let $(\mu_n)_{n \geq 1}$ be a sequence of Borel probability measures on $\R^d$. The following are equivalent:

1. The Fourier transforms $\widehat{\mu}_n$ converge pointwise to some function $\zeta \colon \R^d \to \mathbb{C}$ which is continuous at $\xi = 0$.
2. The sequence $(\mu_n)_{n \geq 1}$ converges in the narrow sense to some Borel probability measure $\mu$ on $\R^d$.

If these equivalent statements hold, then $\widehat{\mu} = \zeta$.
Proof. The proof is essentially a repetition of the ideas which led to Theorem 1. If the second statement holds, then from Theorem 1 we already know that $\widehat{\mu}_n \to \widehat{\mu}$ pointwise, so the first statement holds with $\zeta = \widehat{\mu}$, since the Fourier transform of any probability measure is continuous (by dominated convergence). Conversely, if the first statement holds, then the same argument as in the proof of Theorem 1 shows that the sequence $(\mu_n)_{n \geq 1}$ is tight, since in that part of the proof we only used the continuity at $0$ of the pointwise limit of $(\widehat{\mu}_n)_{n \geq 1}$ (and notice that, since the $\mu_n$ are probabilities, $\widehat{\mu}_n(0)=1$ and hence $\zeta(0)=1$). Again by Prokhorov’s theorem, $(\mu_n)_{n \geq 1}$ has a subsequence which converges in the narrow sense to some measure $\mu$. This $\mu$ must be nonnegative, and it must in fact be a probability measure since $\ird \d \mu = \lim_{n \to +\infty} \ird \d \mu_n = 1$, by definition of narrow convergence. Of course, then $\widehat{\mu} = \zeta$, due to the implication $2 \Rightarrow 1$ we just proved. In fact, this reasoning holds for all subsequences of $(\mu_n)_{n \geq 1}$, and since the Fourier transform determines the measure, all subsequential limits coincide; by Lemma 2 the whole sequence must converge in the narrow sense to the unique probability measure $\mu$ such that $\widehat{\mu} = \zeta$. ∎
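As a typical use of Theorem 4 (an illustration going beyond the notes above; the example is the classical normalized sum of i.i.d. uniform variables, a special case of the central limit theorem): the characteristic functions of the normalized sums have a closed form and converge pointwise to $e^{-\xi^2/2}$, which is continuous at $0$, so Theorem 4 gives narrow convergence to the standard Gaussian. A minimal numerical sketch, assuming NumPy:

```python
import numpy as np

def mu_n_hat(n, xi):
    # characteristic function of (X_1 + ... + X_n) / sqrt(n / 12) with the X_i
    # i.i.d. uniform on [-1/2, 1/2]; it is real because the law is symmetric
    half_t = 0.5 * xi / np.sqrt(n / 12.0)
    return (np.sin(half_t) / half_t) ** n

for xi in [0.5, 1.0, 2.0]:
    limit = np.exp(-0.5 * xi**2)              # characteristic function of N(0, 1)
    gaps = [abs(mu_n_hat(n, xi) - limit) for n in [1, 10, 100, 1000]]
    print(xi, ["%.2e" % g for g in gaps])     # gaps shrink as n grows
```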