A Regeneration Proof of the Central Limit Theorem for Uniformly Ergodic Markov Chains
Witold Bednorz†
Krzysztof Latuszynski‡
† Institute of Mathematics, Warsaw University, 00-927 Warsaw, Poland.
‡ Department of Mathematical Statistics, Institute of Econometrics, Warsaw School of Economics, 02-554 Warsaw, Poland.
Keywords: Markov Chain, CLT, Uniform Ergodicity, Regeneration.
AMS: 60J05
Abstract
Central limit theorems for functionals of general state space Markov chains are of crucial importance for sensible implementation of Markov chain Monte Carlo algorithms, as well as of vital theoretical interest. Different approaches to proving this type of results under diverse assumptions led to a large variety of CLT versions. However, due to the recent development of the regeneration theory of Markov chains, many classical CLTs can be reproved using this intuitive probabilistic approach, avoiding the technicalities of the original proofs. In this paper we provide a regeneration proof of a CLT for functionals of uniformly ergodic Markov chains, thus solving the open problem posed in [8]. Moreover, we discuss the difference between the one-step and multiple-step small set conditions.
1. Introduction
Let $(X_n)_{n\geq 0}$ be a time homogeneous Markov chain on a measurable space $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$ with initial distribution $\pi_0$, transition kernel $P$ and a unique stationary distribution $\pi$. Let $g$ be a real valued Borel function on $\mathsf{X}$ and define $\bar{g}_n = \frac{1}{n}\sum_{i=0}^{n-1} g(X_i)$ and $E_\pi g = \int_{\mathsf{X}} g(x)\,\pi(dx)$. We say that a $\sqrt{n}$-CLT holds for $(X_n)_{n\geq 0}$ and $g$ if
\[
\sqrt{n}\,(\bar{g}_n - E_\pi g) \xrightarrow{d} N(0, \sigma_g^2), \qquad \text{as } n \to \infty, \tag{1}
\]
where $\sigma_g^2 := \mathrm{var}_\pi\, g(X_0) + 2\sum_{n=1}^{\infty} \mathrm{cov}_\pi\{g(X_0), g(X_n)\} < \infty$. Central limit theorems of this type are crucial for assessing the quality of Markov chain Monte Carlo estimation (see e.g. [5]) and are also of independent theoretical interest.
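For a concrete illustration of (1), the asymptotic variance $\sigma_g^2$ can be computed numerically in a toy setting where everything has a closed form. The two-state chain, the parameters $a$, $b$ and the function $g$ below are hypothetical choices made for this sketch only; they are not part of the results of this paper.

```python
import numpy as np

# Hypothetical two-state chain on {0, 1}: P(0,1) = a, P(1,0) = b.
a, b = 0.3, 0.2
P = np.array([[1 - a, a], [b, 1 - b]])
pi = np.array([b, a]) / (a + b)          # stationary distribution
g = np.array([0.0, 1.0])                 # g = indicator of state 1

# sigma_g^2 = var_pi g + 2 * sum_{n>=1} cov_pi{g(X_0), g(X_n)},
# truncating the series at a large N.
var_g = pi @ g**2 - (pi @ g) ** 2
sigma2 = var_g
Pn = P.copy()
for n in range(1, 2000):
    # cov_pi{g(X_0), g(X_n)} = sum_x pi(x) g(x) (P^n g)(x) - (pi g)^2
    sigma2 += 2 * ((pi * g) @ (Pn @ g) - (pi @ g) ** 2)
    Pn = Pn @ P

# For this chain cov_pi{g(X_0), g(X_n)} = pi_0*pi_1*lam^n, lam = 1-a-b,
# so the series sums to pi_0*pi_1*(1+lam)/(1-lam).
lam = 1 - a - b
closed = pi[0] * pi[1] * (1 + lam) / (1 - lam)
print(sigma2, closed)
```

The truncated series and the closed form agree, which makes the definition of $\sigma_g^2$ in (1) tangible before the regeneration machinery is introduced.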
Thus a large body of work on CLTs for functionals of Markov chains exists and a variety of results have been established under different assumptions and with different approaches to proofs (see [4] for a review). We state two classical CLT versions for geometrically ergodic and uniformly ergodic Markov chains.
Let $\|\mu_1(\cdot) - \mu_2(\cdot)\|_{tv} := 2\sup_{A \in \mathcal{B}(\mathsf{X})} |\mu_1(A) - \mu_2(A)|$ be the well known total variation distance between probability measures $\mu_1$ and $\mu_2$. We say that a Markov chain $(X_n)_{n\geq 0}$ with transition kernel $P$ and stationary distribution $\pi$ is geometrically ergodic if $\|P^n(x, \cdot) - \pi(\cdot)\|_{tv} \leq M(x)\rho^n$ for some $\rho < 1$ and $M(x) < \infty$ $\pi$-almost everywhere. We say it is uniformly ergodic if $\|P^n(x, \cdot) - \pi(\cdot)\|_{tv} \leq M\rho^n$ for some $\rho < 1$ and $M < \infty$.
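On a finite state space these definitions are easy to inspect numerically, since every irreducible aperiodic finite chain is uniformly ergodic. The sketch below, with a hypothetical $3 \times 3$ kernel chosen only for illustration, computes $\sup_x \|P^n(x, \cdot) - \pi(\cdot)\|_{tv}$ and exhibits its geometric decay.

```python
import numpy as np

# Hypothetical 3-state kernel (rows sum to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

def tv_sup(Pn, pi):
    # 2 * sup_A |mu1(A) - mu2(A)| equals the L1 distance of the densities.
    return max(np.abs(row - pi).sum() for row in Pn)

dists = []
Pn = np.eye(3)
for n in range(1, 11):
    Pn = Pn @ P
    dists.append(tv_sup(Pn, pi))
print(dists)
```

The sequence is nonincreasing (the total variation distance contracts under any transition kernel) and decays geometrically, at a rate governed here by the second-largest eigenvalue modulus of $P$.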
Theorem 1.1. If a Markov chain $(X_n)_{n\geq 0}$ with stationary distribution $\pi$ is geometrically ergodic and $\pi(|g|^{2+\delta}) < \infty$ for some $\delta > 0$, then a $\sqrt{n}$-CLT holds for $(X_n)_{n\geq 0}$ and $g$.

Theorem 1.2. If a Markov chain $(X_n)_{n\geq 0}$ with stationary distribution $\pi$ is uniformly ergodic and $\pi(g^2) < \infty$, then a $\sqrt{n}$-CLT holds for $(X_n)_{n\geq 0}$ and $g$.
Theorem 1.1, due to [3], has been reproved in [8] using the intuitive regeneration approach, avoiding the technicalities of the original proof (however, see our Section 4). Roberts and Rosenthal posed an open problem: whether Theorem 1.2, due to [2], can also be reproved using direct regeneration arguments.
The aim of this paper is to provide a regeneration proof of Theorem 1.2.
The outline of the paper is as follows. In Section 2 we describe the regeneration construction. In Section 3 we prove Theorem 1.2 and we discuss some of the difficulties of the regeneration approach in Section 4.
2. Small Sets and the Split Chain
The regeneration construction discovered independently by [7] and [1] is now a well established technique. A systematic development of the theory can be found in e.g. [6] which we exploit in this section.
Definition 2.1 (Small Set). A set $C \in \mathcal{B}(\mathsf{X})$ is $\nu_m$-small if there exist an integer $m \geq 1$, an $\varepsilon > 0$, and a nontrivial probability measure $\nu_m$ on $\mathcal{B}(\mathsf{X})$ such that for all $x \in C$,
\[
P^m(x, \cdot) \geq \varepsilon \nu_m(\cdot). \tag{2}
\]
Since ergodic Markov chains are $\pi$-irreducible, Theorem 5.2.2 of [6] implies that for an ergodic chain a small set $C$ with $\pi(C) > 0$ always exists. A small set $C$ with $\pi(C) > 0$ allows for constructing the split chain for $(X_n)_{n\geq 0}$, which is the central object of the approach (see Section 17.3 of [6] for a detailed description). Let $(X_{nm})_{n\geq 0}$ be the $m$-skeleton of $(X_n)_{n\geq 0}$, i.e. a Markov chain evolving according to the $m$-step transition kernel $P^m$. The small set condition allows us to write $P^m$ as a mixture of two distributions:
\[
P^m(x, \cdot) = \varepsilon I_C(x) \nu_m(\cdot) + [1 - \varepsilon I_C(x)] R(x, \cdot), \tag{3}
\]
where $R(x, \cdot) = [1 - \varepsilon I_C(x)]^{-1}[P^m(x, \cdot) - \varepsilon I_C(x) \nu_m(\cdot)]$. Now let $(X_{nm}, Y_n)_{n\geq 0}$ be the split chain of the $m$-skeleton, i.e. let the random variable $Y_n \in \{0, 1\}$ be the level of the split $m$-skeleton at time $nm$. The split chain $(X_{nm}, Y_n)_{n\geq 0}$ is a Markov chain that obeys the following transition rule $\check{P}$:
\[
\check{P}(Y_n = 1,\, X_{(n+1)m} \in dy \mid Y_{n-1}, X_{nm} = x) = \varepsilon I_C(x) \nu_m(dy), \tag{4}
\]
\[
\check{P}(Y_n = 0,\, X_{(n+1)m} \in dy \mid Y_{n-1}, X_{nm} = x) = (1 - \varepsilon I_C(x)) R(x, dy), \tag{5}
\]
and $Y_n$ can be interpreted as a coin toss indicating whether $X_{(n+1)m}$, given $X_{nm} = x$, should be drawn from $\nu_m(\cdot)$ (with probability $\varepsilon I_C(x)$) or from $R(x, \cdot)$ (with probability $1 - \varepsilon I_C(x)$).
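The decomposition (3) and the coin-toss rule (4)-(5) are straightforward to implement for a finite chain. The sketch below takes $m = 1$ and $C = \mathsf{X}$ for simplicity; the $3$-state kernel, $\varepsilon$ and $\nu$ are hypothetical hand-picked values satisfying (2). The point is only that the residual kernel $R$ is a genuine transition kernel and the mixture reproduces $P$ exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state kernel; m = 1 and C = X, so I_C(x) = 1 for all x.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
eps = 0.7
nu = np.array([0.2, 0.3, 0.2]) / 0.7   # columnwise minima of P, normalized

# Residual kernel from (3): R(x,.) = [P(x,.) - eps*nu(.)] / (1 - eps).
R = (P - eps * nu) / (1 - eps)

def split_step(x):
    """One transition of the split chain: toss Y, then draw the next state."""
    y = int(rng.random() < eps)          # Y ~ Bernoulli(eps), since x is in C
    probs = nu if y == 1 else R[x]       # draw from nu on {Y=1}, else from R
    return y, int(rng.choice(3, p=probs))

# The mixture recovers P exactly (up to rounding):
print(np.abs(eps * nu + (1 - eps) * R - P).max())
```

Marginally in the first coordinate, `split_step` moves exactly according to $P$; the extra coordinate $Y$ only records which mixture component was used.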
One obtains the split chain $(X_k, Y_n)_{k\geq 0, n\geq 0}$ of the initial Markov chain $(X_n)_{n\geq 0}$ by defining appropriate conditional probabilities. To this end let $X_0^{nm} = \{X_0, \ldots, X_{nm-1}\}$ and $Y_0^n = \{Y_0, \ldots, Y_{n-1}\}$:
\[
\check{P}(Y_n = 1,\, X_{nm+1} \in dx_1, \ldots, X_{(n+1)m-1} \in dx_{m-1},\, X_{(n+1)m} \in dy \mid Y_0^n, X_0^{nm};\, X_{nm} = x) = \frac{\varepsilon I_C(x)\nu_m(dy)}{P^m(x, dy)}\, P(x, dx_1) \cdots P(x_{m-1}, dy), \tag{6}
\]
\[
\check{P}(Y_n = 0,\, X_{nm+1} \in dx_1, \ldots, X_{(n+1)m-1} \in dx_{m-1},\, X_{(n+1)m} \in dy \mid Y_0^n, X_0^{nm};\, X_{nm} = x) = \frac{(1 - \varepsilon I_C(x)) R(x, dy)}{P^m(x, dy)}\, P(x, dx_1) \cdots P(x_{m-1}, dy). \tag{7}
\]
Note that the marginal distribution of (Xk)k≥0 in the split chain is that of the underlying Markov chain with transition kernel P.
For a measure $\lambda$ on $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$ let $\lambda^*$ denote the measure on $\mathsf{X} \times \{0, 1\}$ (with the product $\sigma$-algebra) defined by $\lambda^*(B \times \{1\}) = \varepsilon\lambda(B)$ and $\lambda^*(B \times \{0\}) = (1 - \varepsilon)\lambda(B)$. Now the crucial observation is that on the set $\{Y_n = 1\}$ the pre-$nm$ process $\{X_k, Y_i : k \leq nm, i \leq n\}$ and the post-$(n+1)m$ process $\{X_k, Y_i : k \geq (n+1)m, i \geq n+1\}$ are independent, and the post-$(n+1)m$ process has the same distribution as $\{X_k, Y_i : k \geq 0, i \geq 0\}$ with $\nu_m^*$ as the initial distribution of $(X_0, Y_0)$. This leads to Theorem 2.2, but first we need some more notation. Let $\sigma(n)$ denote the successive entrance times of the split chain to the set $C \times \{1\}$, i.e. $\sigma(0) = \min\{k \geq 0 : Y_k = 1\}$ and $\sigma(n) = \min\{k > \sigma(n-1) : Y_k = 1\}$ for $n \geq 1$. Also define $Z_n(g) = \sum_{k=0}^{m-1} g(X_{nm+k})$ and $g_c = g - \pi g$.
Theorem 2.2 (Theorem 17.3.6 of [6]). Suppose that $(X_n)_{n\geq 0}$ is ergodic and let $\nu_m$ be the measure satisfying (2). If the following conditions hold:
\[
\text{(i)} \quad \check{E}_{\nu_m^*}\Big(\sum_{n=0}^{\sigma(0)} Z_n(|g|)\Big)^2 < \infty, \qquad \text{(ii)} \quad \check{E}_{\nu_m^*}\, \sigma(0)^2 < \infty, \tag{8}
\]
then the $\sqrt{n}$-CLT holds for $(X_n)_{n\geq 0}$ and $g$, with
\[
\sigma_g^2 = \frac{\varepsilon\,\pi(C)}{m}\bigg\{ \check{E}_{\nu_m^*}\Big(\sum_{n=0}^{\sigma(0)} Z_n(g_c)\Big)^2 + 2\,\check{E}_{\nu_m^*}\Big(\sum_{n=0}^{\sigma(0)} Z_n(g_c)\Big)\Big(\sum_{n=\sigma(0)+1}^{\sigma(1)} Z_n(g_c)\Big) \bigg\}.
\]
3. A Proof
In view of Theorem 2.2, providing a regeneration proof of Theorem 1.2 amounts to establishing conditions (i) and (ii) of (8). To this end we need some additional facts about small sets for uniformly ergodic Markov chains.
Theorem 3.1. If $(X_n)_{n\geq 0}$, a Markov chain on $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$ with stationary distribution $\pi$, is uniformly ergodic, then $\mathsf{X}$ is $\nu_m$-small for some $\nu_m$.

Hence for uniformly ergodic chains (2) holds for all $x \in \mathsf{X}$. Theorem 3.1 is well known in the literature; in particular it results from Theorems 5.2.1 and 5.2.4 in [6] with their $\psi = \pi$.
We start by proving (ii) of (8), which is now straightforward. Integrating (6), together with the fact that $\mathsf{X}$ is small, yields
\[
\check{P}(Y_n = 1 \mid X_0^{nm}, Y_0^{n};\, X_{nm} = x) = \varepsilon,
\]
thus $Y_0, Y_1, \ldots$ are independent Bernoulli trials and the distribution of $\sigma(0)$ is geometric; in particular $\check{E}_{\nu_m^*}\sigma(0)^2 < \infty$.
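Since $Y_0, Y_1, \ldots$ are i.i.d. Bernoulli($\varepsilon$), $\sigma(0)$ is geometric on $\{0, 1, \ldots\}$ with $\check{P}(\sigma(0) = k) = (1-\varepsilon)^k \varepsilon$, and its second moment has the closed form $(1-\varepsilon)(2-\varepsilon)/\varepsilon^2$. A throwaway numerical check of that closed form against the truncated series (the value of $\varepsilon$ is hypothetical):

```python
# sigma(0) = min{k >= 0 : Y_k = 1} with Y_k iid Bernoulli(eps), so
# P(sigma(0) = k) = (1 - eps)^k * eps.  Truncated-series check of the
# second moment against the closed form (1-eps)(2-eps)/eps^2.
eps = 0.25
second_moment = sum(k**2 * (1 - eps) ** k * eps for k in range(10_000))
closed = (1 - eps) * (2 - eps) / eps**2
print(second_moment, closed)
```

Both evaluate to the same finite number, confirming condition (ii) of (8) in this setting.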
Establishing (i) of (8) is the essential part of the proof. Theorem 3.1 implies that for uniformly ergodic Markov chains (3) can be rewritten in operator notation as
\[
P^m = \varepsilon\nu_m + (1 - \varepsilon)R. \tag{9}
\]
The following mixture representation of $\pi$ will turn out to be very useful.
Lemma 3.2. If $(X_n)_{n\geq 0}$ is an ergodic Markov chain with transition kernel $P$ and (9) holds, then
\[
\pi = \varepsilon\mu := \varepsilon \sum_{n=0}^{\infty} \nu_m (1 - \varepsilon)^n R^n. \tag{10}
\]
Proof. Since $\big(\varepsilon \sum_{n=0}^{\infty} \nu_m (1-\varepsilon)^n R^n\big)(\mathsf{X}) = \varepsilon \sum_{n=0}^{\infty} (1-\varepsilon)^n (\nu_m R^n)(\mathsf{X}) = 1$, the measure in question is a probability measure. It is also invariant for $P^m$: by (9) we obtain
\[
\Big(\sum_{n=0}^{\infty} \nu_m (1-\varepsilon)^n R^n\Big) P^m = \varepsilon\mu(\mathsf{X})\,\nu_m + \sum_{n=1}^{\infty} \nu_m (1-\varepsilon)^n R^n = \sum_{n=0}^{\infty} \nu_m (1-\varepsilon)^n R^n,
\]
since $\varepsilon\mu(\mathsf{X}) = 1$. Hence by ergodicity $\varepsilon\mu = \varepsilon\mu P^{nm} \to \pi$ as $n \to \infty$. Thus $\varepsilon\mu = \pi$.
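Lemma 3.2 can be checked numerically on a finite chain. The sketch below reuses a hypothetical $3$-state kernel with $m = 1$ and $C = \mathsf{X}$ (so (9) holds for all $x$), truncates the series in (10), and compares the result with the stationary distribution obtained by linear algebra.

```python
import numpy as np

# Hypothetical finite chain with m = 1 and C = X, so P >= eps * nu rowwise.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
eps = 0.7
nu = np.array([0.2, 0.3, 0.2]) / 0.7
R = (P - eps * nu) / (1 - eps)          # residual kernel from (3)

# Truncate the series (10): mu = sum_n nu (1-eps)^n R^n.
mu = np.zeros(3)
term = nu.copy()
for n in range(200):
    mu += (1 - eps) ** n * term
    term = term @ R
pi_mix = eps * mu

# Stationary distribution directly, as the left eigenvector for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()
print(pi_mix, pi)
```

The two vectors coincide, as (10) predicts; truncating at 200 terms is harmless here because the tail of the series is of order $(1-\varepsilon)^{200}$.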
Corollary 3.3. The decomposition in Lemma 3.2 implies that
\[
\text{(i)} \quad \check{E}_{\nu_m^*} \sum_{n=0}^{\sigma(0)} I_{\{X_{nm} \in A\}} \;=\; \check{E}_{\nu_m^*} \sum_{n=0}^{\infty} I_{\{X_{nm} \in A\}}\, I_{\{Y_0 = 0, \ldots, Y_{n-1} = 0\}} \;=\; \varepsilon^{-1}\pi(A),
\]
\[
\text{(ii)} \quad \check{E}_{\nu_m^*} \sum_{n=0}^{\infty} f(X_{nm}, X_{nm+1}, \ldots; Y_n, Y_{n+1}, \ldots)\, I_{\{Y_0 = 0, \ldots, Y_{n-1} = 0\}} \;=\; \varepsilon^{-1}\, \check{E}_{\pi^*} f(X_0, X_1, \ldots; Y_0, Y_1, \ldots).
\]
Proof. (i) is a direct consequence of (10). To see (ii), note that $Y_n$ is a coin toss independent of $\{Y_0, \ldots, Y_{n-1}\}$ and of $X_{nm}$; this allows for $\pi^*$ instead of $\pi$ on the RHS of (ii). Moreover, the evolution of $\{X_{nm+1}, X_{nm+2}, \ldots; Y_{n+1}, Y_{n+2}, \ldots\}$ depends only (and explicitly, by (6) and (7)) on $X_{nm}$ and $Y_n$. Now use (i).
Our object of interest is
\[
I = \check{E}_{\nu_m^*}\Big(\sum_{n=0}^{\sigma(0)} Z_n(|g|)\Big)^2 = \check{E}_{\nu_m^*}\Big(\sum_{n=0}^{\infty} Z_n(|g|)\, I_{\{\sigma(0) \geq n\}}\Big)^2
= \check{E}_{\nu_m^*} \sum_{n=0}^{\infty} Z_n(|g|)^2\, I_{\{Y_0 = 0, Y_1 = 0, \ldots, Y_{n-1} = 0\}}
+ 2\,\check{E}_{\nu_m^*} \sum_{n=0}^{\infty} \sum_{k=n+1}^{\infty} Z_n(|g|)\, I_{\{\sigma(0) \geq n\}}\, Z_k(|g|)\, I_{\{\sigma(0) \geq k\}} = A + B. \tag{11}
\]
Now we can use Corollary 3.3 and then the inequality $2ab \leq a^2 + b^2$ to bound the term $A$ in (11):
\[
A = \frac{1}{\varepsilon}\,\check{E}_{\pi^*} Z_0(|g|)^2 = \frac{1}{\varepsilon}\, E_\pi\Big(\sum_{k=0}^{m-1} |g(X_k)|\Big)^2 \leq \frac{m}{\varepsilon}\, E_\pi\Big[\sum_{k=0}^{m-1} g^2(X_k)\Big] \leq \frac{m^2}{\varepsilon}\,\pi g^2 < \infty.
\]
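The inequality behind the bound on $A$, namely $\big(\sum_{k=0}^{m-1}|a_k|\big)^2 \leq m \sum_{k=0}^{m-1} a_k^2$, is just $2ab \leq a^2 + b^2$ applied pairwise (equivalently, Cauchy-Schwarz against the all-ones vector). A throwaway numerical check on random vectors, with a hypothetical $m$:

```python
import numpy as np

# (sum |a_k|)^2 <= m * sum a_k^2 for any vector a of length m
# (Cauchy-Schwarz applied to (|a_0|,...,|a_{m-1}|) and (1,...,1)).
rng = np.random.default_rng(1)
m = 5
ok = all(np.abs(a).sum() ** 2 <= m * (a**2).sum() + 1e-9
         for a in rng.normal(size=(1000, m)))
print(ok)
```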
We can proceed similarly with the term $B$:
\[
B = 2\,\check{E}_{\nu_m^*} \sum_{n=0}^{\infty} Z_n(|g|)\, I_{\{\sigma(0) \geq n\}} \sum_{k=1}^{\infty} Z_{n+k}(|g|)\, I_{\{\sigma(0) \geq n+k\}}
= \frac{2}{\varepsilon}\,\check{E}_{\pi^*}\Big[ Z_0(|g|) \sum_{k=1}^{\infty} Z_k(|g|)\, I_{\{\sigma(0) \geq k\}} \Big]
= \frac{2}{\varepsilon} \sum_{k=1}^{\infty} \check{E}_{\pi^*}\, I_{\{\sigma(0) \geq k\}}\, Z_0(|g|)\, Z_k(|g|). \tag{12}
\]
Let $C_k := \check{E}_{\pi^*}\, I_{\{\sigma(0) \geq k\}}\, Z_0(|g|)\, Z_k(|g|)$. By Cauchy-Schwarz,
\[
C_k \leq \sqrt{\check{E}_{\pi^*}\, I_{\{\sigma(0) \geq k\}}\, Z_0(|g|)^2}\, \sqrt{\check{E}_{\pi^*} Z_k(|g|)^2}
= \sqrt{\check{E}_{\pi^*}\, I_{\{Y_0 = 0\}}\, I_{\{Y_1 = 0, \ldots, Y_{k-1} = 0\}}\, Z_0(|g|)^2}\, \sqrt{\check{E}_{\pi^*} Z_0(|g|)^2}.
\]
Now observe that $\{Y_1, \ldots, Y_{k-1}\}$ and $\{X_0, \ldots, X_{m-1}\}$ are independent; moreover, we may drop $I_{\{Y_0 = 0\}}$ to obtain
\[
C_k \leq (1 - \varepsilon)^{\frac{k-1}{2}}\,\check{E}_{\pi^*} Z_0(|g|)^2 \leq (1 - \varepsilon)^{\frac{k-1}{2}}\, m^2 \pi g^2. \tag{13}
\]
Combining (12) and (13) yields $B < \infty$. This completes the proof.
4. The difference between $m = 1$ and $m \neq 1$
Assume the small set condition (2) holds and consider the split chain defined by (6) and (7). The tours
\[
\{X_{(\sigma(n)+1)m},\, X_{(\sigma(n)+1)m+1},\, \ldots,\, X_{(\sigma(n+1)+1)m-1}\}, \qquad n = 0, 1, \ldots,
\]
which start whenever $X_k \sim \nu_m$, are of crucial importance to the regeneration theory and are eagerly analyzed by researchers. In virtually every paper on the subject there is a claim that these objects are independent identically distributed random variables. This claim is usually considered obvious and no proof is provided. However, it is not true if $m > 1$.
In fact formulas (6) and (7) should be convincing enough, as $X_{nm+1}, \ldots, X_{(n+1)m}$ given $Y_n = 1$ and $X_{nm} = x$ are linked in a way described by $P(x, dx_1) \cdots P(x_{m-1}, dy)$.
In particular, consider a Markov chain on $\mathsf{X} = \{a, b, c, d, e\}$ with transition probabilities $P(a,b) = P(a,c) = P(b,b) = P(b,d) = P(c,c) = P(c,e) = 1/2$ and $P(d,a) = P(e,a) = 1$. Let $\nu_4(d) = \nu_4(e) = 1/2$ and $\varepsilon = 1/8$. Clearly $P^4(x, \cdot) \geq \varepsilon\nu_4(\cdot)$ for every $x \in \mathsf{X}$, hence (2) holds with $C = \mathsf{X}$ and $m = 4$. Note that in this simple example each tour can start with $d$ or with $e$. However, if it starts with $d$ or $e$, the previous tour must have ended with $b$ or $c$ respectively. This makes the tours dependent!
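The example is small enough to verify mechanically. The sketch below builds the transition matrix, confirms the minorization $P^4(x, \cdot) \geq \frac{1}{8}\nu_4(\cdot)$, and checks the structural fact behind the dependence: $d$ can only be entered from $b$, and $e$ only from $c$.

```python
import numpy as np

states = ['a', 'b', 'c', 'd', 'e']
P = {
    'a': {'b': 0.5, 'c': 0.5},
    'b': {'b': 0.5, 'd': 0.5},
    'c': {'c': 0.5, 'e': 0.5},
    'd': {'a': 1.0},
    'e': {'a': 1.0},
}
M = np.zeros((5, 5))
for i, x in enumerate(states):
    for y, p in P[x].items():
        M[i, states.index(y)] = p

# Minorization: P^4(x, d) and P^4(x, e) are at least eps * nu4 = (1/8)*(1/2).
P4 = np.linalg.matrix_power(M, 4)
print(P4[:, 3].min(), P4[:, 4].min())   # both >= 1/16

# The dependence mechanism: the only predecessor of d is b, and of e is c,
# so a tour starting at d forces the previous tour to have ended at b.
def predecessors(state):
    return sorted(x for x in states if P[x].get(state, 0) > 0)

print(predecessors('d'), predecessors('e'))
```

The minima over $x$ of $P^4(x, d)$ and $P^4(x, e)$ are exactly $1/16$, attained at $x = a$, so $\varepsilon = 1/8$ is in fact the largest constant for which (2) holds with this $\nu_4$.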
Similar examples with a general state space $\mathsf{X}$ and $C \neq \mathsf{X}$ can easily be provided. Hence Theorem 2.2 is critical for providing regeneration proofs of CLTs, and standard arguments that involve i.i.d. random variables are not valid.
5. Bibliography
[1] Athreya, K. B., Ney, P., 1978. A new approach to the limit theory of recurrent Markov chains. Trans. Amer. Math. Soc., 245: 493-501.
[2] Cogburn, R., 1972. The Central Limit Theorem for Markov Processes. In: Le Cam, L. E., Neyman, J., Scott, E. L. (Eds.), Proc. Sixth Ann. Berkeley Symp. Math. Statist. and Prob., 2, 458-512.
[3] Ibragimov, I. A., Linnik, Y. V., 1971. Independent and Stationary Sequences of Random Variables. Wolters-Noordhoff, Groningen.
[4] Jones, G. L., 2004. On the Markov chain central limit theorem. Probability Surveys, 1: 299-320.
[5] Jones, G. L., Haran, M., Caffo, B. S., Neath, R., 2006. Fixed-Width Output Analysis for Markov Chain Monte Carlo. Journal of the American Statistical Association, 101: 1537-1547.
[6] Meyn, S. P., Tweedie, R. L., 1993. Markov Chains and Stochastic Stability. Springer-Verlag.
[7] Nummelin, E., 1978. A splitting technique for Harris recurrent Markov chains. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 43: 309-318.
[8] Roberts, G. O., Rosenthal, J. S., 2004. General state space Markov chains and MCMC algorithms. Probability Surveys, 1: 20-71.