Lévy's continuity theorem
Here are some notes on Lévy’s continuity theorem, which is a very clean result on a type of weak convergence of measures often used in probability. The discussion below is very standard in probability books for the $d=1$ case, but the proof in $\R^d$ is slightly harder to find.
Denote by $\mathcal{C}^b(\R^d)$ the set of real continuous and bounded functions on $\R^d$. We recall that a sequence of probability measures $(\mu_n)_{n \geq 1}$ is said to converge to a probability measure $\mu$ in the narrow sense or in the narrow topology if \begin{equation*} \ird \varphi(x) \d \mu_n(x) \to \ird \varphi(x) \d \mu(x) \qquad \text{for all $\varphi \in \mathcal{C}^b(\R^d)$.} \end{equation*} (In some places this is referred to as weak convergence, but for me this conflicts with weak convergence in Banach spaces, which is slightly different.) There is a topology on the set of probability measures on $\R^d$ (or on the set of all finite measures on $\R^d$), called the narrow topology, for which the notion of convergence is the one we just defined.
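To make the definition concrete, here is a minimal numerical sketch (not part of the exposition; it assumes NumPy and SciPy, and the measures and the test function are chosen arbitrarily for illustration): it checks that $\ird \varphi \d \mu_n \to \ird \varphi \d \mu$ for $\mu_n = \mathcal{N}(1/n, 1 + 1/n)$, $\mu = \mathcal{N}(0,1)$ and one bounded continuous $\varphi$.

```python
import numpy as np
from scipy import integrate, stats

def integrate_against(phi, dist):
    # numerically compute the integral of phi against the law of `dist`
    val, _ = integrate.quad(lambda x: phi(x) * dist.pdf(x), -np.inf, np.inf)
    return val

phi = lambda x: np.cos(x) / (1.0 + x**2)   # a bounded continuous test function
mu = stats.norm(0.0, 1.0)                  # the narrow limit N(0, 1)
limit = integrate_against(phi, mu)

for n in [1, 10, 100, 1000]:
    mu_n = stats.norm(1.0 / n, np.sqrt(1.0 + 1.0 / n))   # mu_n = N(1/n, 1 + 1/n)
    print(n, abs(integrate_against(phi, mu_n) - limit))   # gap shrinks as n grows
```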
The Fourier transform, or characteristic function, of a probability measure $\mu$ is given by \begin{equation*} \widehat{\mu}(\xi) := \ird e^{-i x \xi} \d \mu(x). \end{equation*} (We omit the normalization factor $(2\pi)^{-d/2}$ sometimes used for the Fourier transform, so that $\widehat{\mu}(0) = 1$; this is the convention used in the calculations below.) Of course, both of these concepts can be defined in more generality (narrow convergence makes sense for signed finite measures, and the Fourier transform makes sense for Schwartz distributions in general), but they are often restricted to probability measures in probabilistic results.
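As a quick illustration of this convention (again just a sketch, assuming NumPy; the Gaussian parameters are arbitrary), the empirical average of $e^{-i \xi X}$ over samples of $X \sim \mathcal{N}(m, \sigma^2)$ approximates the closed form $\widehat{\mu}(\xi) = e^{-i m \xi - \sigma^2 \xi^2 / 2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma = 0.5, 2.0                      # arbitrary Gaussian parameters
samples = rng.normal(m, sigma, size=200_000)

def mu_hat_empirical(xi):
    # Monte Carlo estimate of the characteristic function at xi
    return np.mean(np.exp(-1j * xi * samples))

def mu_hat_exact(xi):
    # closed form for the N(m, sigma^2) characteristic function (this convention)
    return np.exp(-1j * m * xi - 0.5 * (sigma * xi) ** 2)

for xi in [0.0, 0.3, 1.0, 2.5]:
    print(xi, abs(mu_hat_empirical(xi) - mu_hat_exact(xi)))   # small sampling error
```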
Lévy’s continuity theorem is the following:
Theorem 1. Let $\mu$ and $\mu_n$ be Borel probability measures on $\R^d$, for integer $n \geq 1$. Then $\mu_n$ converges to $\mu$ in the narrow topology if and only if $\widehat{\mu}_n$ converges pointwise to $\widehat{\mu}$.
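Before turning to the references and the proof, here is a small numerical sanity check of the statement (an illustration only; the choice of $\mu_n = \mathrm{Binomial}(n, \lambda/n)$ and $\mu = \mathrm{Poisson}(\lambda)$ is mine, and NumPy/SciPy are assumed): the characteristic functions converge pointwise, and, consistently with the theorem, integrals of a bounded continuous function converge as well.

```python
import numpy as np
from scipy import stats

lam, xi = 3.0, 0.7                           # arbitrary rate and frequency
phi = lambda k: np.sin(k) / (1.0 + k)        # bounded continuous test function

# limit measure mu = Poisson(lam): characteristic function and test integral
mu_hat = np.exp(lam * (np.exp(-1j * xi) - 1.0))
k = np.arange(0, 200)
limit_integral = np.sum(phi(k) * stats.poisson(lam).pmf(k))

for n in [10, 100, 1000, 10000]:
    p = lam / n                              # mu_n = Binomial(n, lam / n)
    mu_n_hat = (1.0 - p + p * np.exp(-1j * xi)) ** n
    kn = np.arange(0, n + 1)
    integral_n = np.sum(phi(kn) * stats.binom(n, p).pmf(kn))
    print(n, abs(mu_n_hat - mu_hat), abs(integral_n - limit_integral))  # both shrink
```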
The case $d=1$ can be found, for example, in Theorem 26.3 of this book by Billingsley. The case for any $d$ is proved in section 18.6 of this book by considering the marginals of $\mu_n$ and using the $d=1$ case. We give a direct proof, since the general $d$ case is not really harder than $d=1$.
Proof (of Theorem 1). One implication is direct: if $\mu_n$ converges to $\mu$ in the narrow topology, then $\widehat{\mu}_n$ converges pointwise to $\widehat{\mu}$, since for each fixed $\xi$ both the real and imaginary parts of $x \mapsto e^{-i x \xi}$ are in $\mathcal{C}^b(\R^d)$. The other implication is more subtle. Since $\widehat{\mu}_n$ converges pointwise to $\widehat{\mu}$ (and $|1 - \widehat{\mu}_n| \leq 2$ everywhere), the dominated convergence theorem shows that, for any $\delta > 0$, \begin{equation*} \frac{1}{(2\delta)^d}\int_{C_\delta} (1 - \widehat\mu_n(\xi)) \d \xi \to \frac{1}{(2\delta)^d}\int_{C_\delta} (1 - \widehat\mu(\xi)) \d \xi \qquad \text{as $n \to +\infty$,} \end{equation*} where $C_\delta := [-\delta,\delta]^d$ is the cube of side $2\delta$ in $\R^d$ (these integrals are real, since $\widehat{\mu}_n(-\xi) = \overline{\widehat{\mu}_n(\xi)}$). Since $\widehat{\mu}$ is continuous at $0$ and $\widehat{\mu}(0)=1$, for any $\epsilon > 0$ the right-hand side is at most $\epsilon/2$ if $\delta$ is chosen small enough, and then the left-hand side is at most $\epsilon$ for all $n$ larger than some $N$. Lemma 3 then shows that the measures $\mu_n$ with $n \geq N$ have mass at most $2\epsilon$ outside the fixed cube $C_{2/\delta}$; since each of the finitely many measures $\mu_1, \dots, \mu_{N-1}$ is tight on its own, the whole sequence $(\mu_n)_{n \geq 1}$ is tight. By Prokhorov’s theorem it must have a subsequence which converges narrowly to a probability measure. This probability measure must be $\mu$, due to the implication we proved first together with the fact that the Fourier transform determines the measure. In fact, this reasoning applies to any subsequence of $(\mu_n)_{n \geq 1}$, so by Lemma 2 below the whole sequence must converge narrowly to $\mu$. ∎
In the previous proof we used the following well-known property, which holds for convergence in any topological space and in particular for the narrow topology:
Lemma 2. Let $\mu$ be a Borel probability measure on $\R^d$. Assume that a sequence of probability measures $(\mu_n)_{n \geq 1}$ has the property that any subsequence of $(\mu_n)_{n \geq 1}$ must have a further subsequence that converges narrowly to $\mu$. Then the whole sequence $(\mu_n)_{n \geq 1}$ converges narrowly to $\mu$.
The following lemma, used in the proof of Theorem 1, is an expression of the general principle that the regularity of the Fourier transform of a function is related to the decay of the tail of the function (and conversely, since the inverse Fourier transform has essentially the same properties). The lemma says that the tail behavior of a probability measure can be estimated by the continuity of its Fourier transform at $\xi=0$:
Lemma 3. For a Borel probability measure $\mu$ on $\R$ we have \begin{equation*} \int_{|x| \geq 2/\delta} \d \mu(x) \leq \frac{1}{\delta}\int_{-\delta}^\delta (1 - \widehat\mu(\xi)) \d \xi \qquad \text{for all $\delta > 0$.} \end{equation*} For a Borel probability measure $\mu$ on $\R^d$, calling $C_r := [-r,r]^d \subseteq \R^d$, we have \begin{equation*} \frac12 \int_{\R^d \setminus C_{2/\delta}} \d \mu(x) \leq \frac{1}{(2\delta)^d}\int_{C_\delta} (1 - \widehat\mu(\xi)) \d \xi \qquad \text{for all $\delta > 0$.} \end{equation*}
Proof. The case $d=1$. We first write the case in dimension $1$, where the calculation is simpler and the argument is seen more clearly. A good way to show the result is to notice that, by Fubini’s theorem, \begin{equation*} \frac{1}{2\delta} \int_{-\delta}^\delta \widehat{\mu}(\xi) \d \xi = \frac{1}{2\delta} \int_{-\infty}^\infty \widehat{\psi}_\delta(x) \d \mu(x), \end{equation*} where $\psi_\delta := \mathbb{1}_{[-\delta,\delta]}$. The left-hand side is the average of the Fourier transform on $[-\delta,\delta]$, and the right-hand side is not far from the integral of $\mu$ on a large set. We can calculate $\widehat{\psi}_\delta$ explicitly: \begin{equation*} \widehat{\psi}_\delta(x) = \int_{-\delta}^\delta e^{-i x \xi} \d \xi = \frac{2}{x} \sin(\delta x), \qquad x \in \R \setminus \{0\}, \end{equation*} with $\widehat{\psi}_\delta(0) = 2\delta$. Since $\sin(y)/y \leq 1$ for all $y$ we have \begin{multline*} \int_{-\infty}^\infty \left(1 - \frac{1}{2\delta} \widehat{\psi}_\delta(x) \right) \d \mu(x) \geq \int_{|x| \geq 2/\delta} \left(1 - \frac{|\sin(\delta x)|}{\delta|x|} \right) \d \mu(x) \\ \geq \int_{|x| \geq 2/\delta} \left(1 - \frac{1}{\delta |x|} \right) \d \mu(x) \geq \frac12 \int_{|x| \geq 2/\delta} \d \mu(x). \end{multline*} Hence \begin{multline*} \frac12 \int_{|x| \geq 2/\delta} \d \mu(x) \leq 1 - \frac{1}{2\delta} \int_{-\infty}^\infty \widehat{\psi}_\delta(x) \d \mu(x) \\ = 1 - \frac{1}{2\delta} \int_{-\delta}^\delta \widehat{\mu}(\xi) \d \xi = \frac{1}{2\delta} \int_{-\delta}^\delta \left( 1 - \widehat{\mu}(\xi) \right) \d \xi. \end{multline*}

General proof for any dimension $d$. The result can be proved in $\R^d$ for any $d \geq 1$ with essentially the same calculation, and we write a parallel proof. Calling $C_\delta = [-\delta, \delta]^d$ we have, again by Fubini’s theorem, \begin{equation*} \frac{1}{(2\delta)^d} \int_{C_\delta} \widehat{\mu}(\xi) \d \xi = \frac{1}{(2\delta)^d} \ird \widehat{\Psi}_\delta(x) \d \mu(x), \end{equation*} where now \begin{equation*} \Psi_\delta(x) := \mathbb{1}_{C_\delta}(x) = \prod_{i=1}^d \psi_\delta(x_i) \qquad \text{for $x = (x_1, \dots, x_d) \in \R^d$}. \end{equation*} Its Fourier transform is \begin{equation*} \widehat{\Psi}_\delta(x) = \prod_{i=1}^d \widehat{\psi}_\delta(x_i) = \prod_{i=1}^d \frac{2}{x_i} \sin(\delta x_i), \qquad x \in \R^d. \end{equation*} Notice that $(2\delta)^{-d} \widehat{\Psi}_\delta(x) \leq 1$ for all $x \in \R^d$, and \begin{equation*} (2\delta)^{-d} | \widehat{\Psi}_\delta(x) | \leq \frac12 \qquad \text{for all $x \in \R^d \setminus C_{2/\delta}$}, \end{equation*} since outside the cube $C_{2/\delta}$ at least one coordinate satisfies $|x_i| \geq 2/\delta$, so the corresponding factor $|\sin(\delta x_i)| / (\delta |x_i|)$ is at most $\frac12$, while every other factor is at most $1$ in absolute value. Then \begin{multline*} \ird \left(1 - \frac{1}{(2\delta)^d} \widehat{\Psi}_\delta(x) \right) \d \mu(x) \geq \int_{\R^d \setminus C_{2/\delta}} \left(1 - \frac{1}{(2\delta)^d} |\widehat{\Psi}_\delta(x)| \right) \d \mu(x) \\ \geq \frac12 \int_{\R^d \setminus C_{2/\delta}} \d \mu(x), \end{multline*} so \begin{multline*} \frac12 \int_{\R^d \setminus C_{2/\delta}} \d \mu(x) \leq 1 - \frac{1}{(2\delta)^d} \ird \widehat{\Psi}_\delta(x) \d \mu(x) \\ = 1 - \frac{1}{(2\delta)^d} \int_{C_\delta} \widehat{\mu}(\xi) \d \xi = \frac{1}{(2\delta)^d} \int_{C_\delta} \left( 1 - \widehat{\mu}(\xi) \right) \d \xi. \end{multline*} ∎
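The bound in Lemma 3 can be checked numerically in a case where everything is explicit (a sketch, assuming NumPy; the standard Cauchy distribution is chosen because both sides have closed forms): its characteristic function is $e^{-|\xi|}$, the left-hand side of the $d=1$ bound is $1 - \tfrac{2}{\pi}\arctan(2/\delta)$, and the right-hand side is $\tfrac{2}{\delta}(\delta - 1 + e^{-\delta})$.

```python
import numpy as np

# Check Lemma 3 (d = 1) for the standard Cauchy distribution, mu_hat(xi) = exp(-|xi|)
for delta in [0.05, 0.1, 0.5, 1.0, 2.0]:
    tail_mass = 1.0 - (2.0 / np.pi) * np.arctan(2.0 / delta)   # mu(|x| >= 2/delta)
    bound = (2.0 / delta) * (delta - 1.0 + np.exp(-delta))     # (1/delta) * integral of (1 - e^{-|xi|}) over [-delta, delta]
    print(f"delta={delta:5.2f}  tail mass={tail_mass:.4f} <= bound={bound:.4f}  {tail_mass <= bound}")
```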
In the continuity theorem 1 it is essential that the pointwise limit of the Fourier transforms is itself the Fourier transform of a probability measure: pointwise convergence of $\widehat{\mu}_n$ to an arbitrary function is not enough. This can be seen by taking any nonnegative, continuous, compactly supported function $f \colon \R^d \to \R$ with $\ird f(x) \d x = 1$ and considering the measures $\mu_n$ with densities \begin{equation*} f_n(x) := \frac{1}{n^d} f \big( \frac{x}{n} \big) \qquad \text{for $x \in \R^d$, $n \geq 1$.} \end{equation*} Their Fourier transforms satisfy $\widehat{\mu}_n(\xi) = \widehat{f}(n\xi)$, so they converge pointwise to $0$ everywhere except at $\xi=0$, where they are constantly equal to $1$ (the convergence to $0$ for $\xi \neq 0$ is the Riemann–Lebesgue lemma). However, the sequence $(\mu_n)_{n \geq 1}$ does not converge in the narrow sense, since its mass escapes to infinity (though it does converge to zero in the weak-$*$ sense of measures). Observe that the pointwise limit of $\widehat{\mu}_n$ is not continuous at $0$. There is another version of Lévy’s continuity theorem that says this is the only way this can fail. This version is sometimes more useful to check whether a sequence of probability measures converges narrowly to a probability measure, assuming that we only have information about its Fourier transform; it is stated as Theorem 4 below.
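Before stating it, here is a small numerical look at the counterexample above in dimension $d=1$ (a sketch only, assuming NumPy and SciPy; the bump $f(x) = \tfrac{3}{4}(1-x^2)$ on $[-1,1]$ and the test function are my choices): $\widehat{\mu}_n(\xi) = \widehat{f}(n\xi)$ shrinks at any fixed $\xi \neq 0$, while $\ird \varphi \d \mu_n \to 0$ for a bounded continuous $\varphi$, so the mass escapes and there is no narrow limit which is a probability measure.

```python
import numpy as np
from scipy import integrate

def f_hat(xi):
    # Fourier transform of f(x) = (3/4)(1 - x^2) on [-1, 1], valid for xi != 0
    # (f is even, so f_hat is real; f_hat(0) = 1)
    return 3.0 * (np.sin(xi) - xi * np.cos(xi)) / xi**3

phi = lambda x: 1.0 / (1.0 + x**2)           # bounded continuous test function

for n in [1, 10, 100, 1000]:
    density_n = lambda x, n=n: 0.75 * (1.0 - (x / n) ** 2) / n   # density of mu_n on [-n, n]
    integral_n, _ = integrate.quad(lambda x: phi(x) * density_n(x), -n, n)
    # mu_n-hat(xi) = f_hat(n * xi): small at fixed xi != 0 for large n,
    # while the integral of phi against mu_n also goes to 0 (mass escapes)
    print(n, abs(f_hat(n * 0.25)), abs(f_hat(n * 1.0)), integral_n)
```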
Theorem 4. Let $(\mu_n)_{n \geq 1}$ be a sequence of Borel probability measures on $\R^d$. The following are equivalent:

1. The Fourier transforms $\widehat{\mu}_n$ converge pointwise to some function $\zeta \colon \R^d \to \mathbb{C}$ which is continuous at $\xi = 0$.
2. The sequence $(\mu_n)_{n \geq 1}$ converges in the narrow sense to some Borel probability measure $\mu$ on $\R^d$.

If these equivalent statements hold, then $\widehat{\mu} = \zeta$.
Proof. The proof is essentially a repetition of the ideas which led to Theorem 1. If the second statement holds, then from Theorem 1 we already know that $\widehat{\mu}_n \to \widehat{\mu}$ pointwise, so the first statement holds with $\zeta = \widehat{\mu}$, since the Fourier transform of any probability measure is continuous (by dominated convergence). Conversely, if the first statement holds, then the same argument as in the proof of Theorem 1 shows that the sequence $(\mu_n)_{n \geq 1}$ is tight, since in that part of the proof we only used the continuity at $0$ of the pointwise limit of $(\widehat{\mu}_n)_{n \geq 1}$ (and notice that, since the $\mu_n$ are probabilities, $\widehat{\mu}_n(0)=1$ and hence $\zeta(0)=1$). Again by Prokhorov’s theorem, $(\mu_n)_{n \geq 1}$ has a subsequence which converges in the narrow sense to some measure $\mu$. This $\mu$ must be nonnegative, and it must in fact be a probability measure since $\ird \d \mu = \lim_{n \to +\infty} \ird \d \mu_n = 1$, by definition of narrow convergence. Of course, then $\widehat{\mu} = \zeta$, due to the implication $2 \Rightarrow 1$ we just proved. In fact, this reasoning holds for all subsequences of $(\mu_n)_{n \geq 1}$, and since the Fourier transform determines the measure, all subsequential limits coincide; by Lemma 2 the whole sequence must converge in the narrow sense to the unique probability measure $\mu$ such that $\widehat{\mu} = \zeta$. ∎
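As a typical use of Theorem 4 (an illustration going beyond the notes above; the example is the classical normalized sum of i.i.d. uniform variables, a special case of the central limit theorem): the characteristic functions of the normalized sums have a closed form and converge pointwise to $e^{-\xi^2/2}$, which is continuous at $0$, so Theorem 4 gives narrow convergence to the standard Gaussian. A minimal numerical sketch, assuming NumPy:

```python
import numpy as np

def mu_n_hat(n, xi):
    # characteristic function of (X_1 + ... + X_n) / sqrt(n / 12) with the X_i
    # i.i.d. uniform on [-1/2, 1/2]; it is real because the law is symmetric
    half_t = 0.5 * xi / np.sqrt(n / 12.0)
    return (np.sin(half_t) / half_t) ** n

for xi in [0.5, 1.0, 2.0]:
    limit = np.exp(-0.5 * xi**2)              # characteristic function of N(0, 1)
    gaps = [abs(mu_n_hat(n, xi) - limit) for n in [1, 10, 100, 1000]]
    print(xi, ["%.2e" % g for g in gaps])     # gaps shrink as n grows
```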