A Regeneration Proof of the Central Limit Theorem for Uniformly Ergodic Markov Chains
Witold Bednorz†
Krzysztof Latuszynski‡
† Institute of Mathematics, Warsaw University, 00-927 Warsaw, Poland.
‡ Department of Mathematical Statistics, Institute of Econometrics, Warsaw School of Economics, 02-554 Warsaw, Poland.
Keywords: Markov Chain, CLT, Uniform Ergodicity, Regeneration.
AMS: 60J05
Abstract
Central limit theorems for functionals of general state space Markov chains are of crucial importance for sensible implementation of Markov chain Monte Carlo algorithms, as well as of vital theoretical interest. Different approaches to proving this type of results under diverse assumptions led to a large variety of CLT versions. However, due to the recent development of the regeneration theory of Markov chains, many classical CLTs can be reproved using this intuitive probabilistic approach, avoiding the technicalities of the original proofs. In this paper we provide a regeneration proof of a CLT for functionals of uniformly ergodic Markov chains, thus solving the open problem posed in [8]. Moreover, we discuss the difference between the one-step and multiple-step small set conditions.
1. Introduction
Let $(X_n)_{n\geq 0}$ be a time homogeneous Markov chain on a measurable space $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$ with initial distribution $\pi_0$, transition kernel $P$ and a unique stationary distribution $\pi$. Let $g$ be a real valued Borel function on $\mathsf{X}$ and define $\bar{g}_n = \frac{1}{n}\sum_{i=0}^{n-1} g(X_i)$ and $E_\pi g = \int_{\mathsf{X}} g(x)\,\pi(dx)$. We say that a $\sqrt{n}$-CLT holds for $(X_n)_{n\geq 0}$ and $g$ if
\[
\sqrt{n}\,(\bar{g}_n - E_\pi g) \xrightarrow{d} N(0, \sigma_g^2), \qquad \text{as } n \to \infty, \tag{1}
\]
where $\sigma_g^2 := \mathrm{var}_\pi\, g(X_0) + 2\sum_{n=1}^{\infty} \mathrm{cov}_\pi\{g(X_0), g(X_n)\} < \infty$. Central limit theorems of this type are crucial for assessing the quality of Markov chain Monte Carlo estimation (see e.g. [5]) and are also of independent theoretical interest.
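For a concrete illustration of (1), the asymptotic variance $\sigma_g^2$ can be computed numerically in a toy setting where everything has a closed form. The two-state chain, the parameters $a$, $b$ and the function $g$ below are hypothetical choices made for this sketch only; they are not part of the results of this paper.

```python
import numpy as np

# Hypothetical two-state chain on {0, 1}: P(0,1) = a, P(1,0) = b.
a, b = 0.3, 0.2
P = np.array([[1 - a, a], [b, 1 - b]])
pi = np.array([b, a]) / (a + b)          # stationary distribution
g = np.array([0.0, 1.0])                 # g = indicator of state 1

# sigma_g^2 = var_pi g + 2 * sum_{n>=1} cov_pi{g(X_0), g(X_n)},
# truncating the series at a large N.
var_g = pi @ g**2 - (pi @ g) ** 2
sigma2 = var_g
Pn = P.copy()
for n in range(1, 2000):
    # cov_pi{g(X_0), g(X_n)} = sum_x pi(x) g(x) (P^n g)(x) - (pi g)^2
    sigma2 += 2 * ((pi * g) @ (Pn @ g) - (pi @ g) ** 2)
    Pn = Pn @ P

# For this chain cov_pi{g(X_0), g(X_n)} = pi_0*pi_1*lam^n, lam = 1-a-b,
# so the series sums to pi_0*pi_1*(1+lam)/(1-lam).
lam = 1 - a - b
closed = pi[0] * pi[1] * (1 + lam) / (1 - lam)
print(sigma2, closed)
```

The truncated series and the closed form agree, which makes the definition of $\sigma_g^2$ in (1) tangible before the regeneration machinery is introduced.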
Thus a large body of work on CLTs for functionals of Markov chains exists and a variety of results have been established under different assumptions and with different approaches to proofs (see [4] for a review). We state two classical CLT versions for geometrically ergodic and uniformly ergodic Markov chains.
Let $\|\mu_1(\cdot) - \mu_2(\cdot)\|_{tv} := 2\sup_{A \in \mathcal{B}(\mathsf{X})} |\mu_1(A) - \mu_2(A)|$ be the well known total variation distance between probability measures $\mu_1$ and $\mu_2$. We say that a Markov chain $(X_n)_{n\geq 0}$ with transition kernel $P$ and stationary distribution $\pi$ is geometrically ergodic if $\|P^n(x, \cdot) - \pi(\cdot)\|_{tv} \leq M(x)\rho^n$ for some $\rho < 1$ and $M(x) < \infty$ $\pi$-almost everywhere. We say it is uniformly ergodic if $\|P^n(x, \cdot) - \pi(\cdot)\|_{tv} \leq M\rho^n$ for some $\rho < 1$ and $M < \infty$.
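On a finite state space these definitions are easy to inspect numerically, since every irreducible aperiodic finite chain is uniformly ergodic. The sketch below, with a hypothetical $3 \times 3$ kernel chosen only for illustration, computes $\sup_x \|P^n(x, \cdot) - \pi(\cdot)\|_{tv}$ and exhibits its geometric decay.

```python
import numpy as np

# Hypothetical 3-state kernel (rows sum to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

def tv_sup(Pn, pi):
    # 2 * sup_A |mu1(A) - mu2(A)| equals the L1 distance of the densities.
    return max(np.abs(row - pi).sum() for row in Pn)

dists = []
Pn = np.eye(3)
for n in range(1, 11):
    Pn = Pn @ P
    dists.append(tv_sup(Pn, pi))
print(dists)
```

The sequence is nonincreasing (the total variation distance contracts under any transition kernel) and decays geometrically, at a rate governed here by the second-largest eigenvalue modulus of $P$.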
Theorem 1.1. If a Markov chain $(X_n)_{n\geq 0}$ with stationary distribution $\pi$ is geometrically ergodic and $\pi(|g|^{2+\delta}) < \infty$ for some $\delta > 0$, then a $\sqrt{n}$-CLT holds for $(X_n)_{n\geq 0}$ and $g$.

Theorem 1.2. If a Markov chain $(X_n)_{n\geq 0}$ with stationary distribution $\pi$ is uniformly ergodic and $\pi(g^2) < \infty$, then a $\sqrt{n}$-CLT holds for $(X_n)_{n\geq 0}$ and $g$.
Theorem 1.1, due to [3], has been reproved in [8] using the intuitive regeneration approach, avoiding the technicalities of the original proof (however, see our Section 4). Roberts and Rosenthal posed an open problem: whether Theorem 1.2, due to [2], can also be reproved using direct regeneration arguments.
The aim of this paper is to provide a regeneration proof of Theorem 1.2.
The outline of the paper is as follows. In Section 2 we describe the regeneration construction. In Section 3 we prove Theorem 1.2 and we discuss some of the difficulties of the regeneration approach in Section 4.
2. Small Sets and the Split Chain
The regeneration construction discovered independently by [7] and [1] is now a well established technique. A systematic development of the theory can be found in e.g. [6] which we exploit in this section.
Definition 2.1 (Small Set). A set $C \in \mathcal{B}(\mathsf{X})$ is $\nu_m$-small if there exist an integer $m \geq 1$, an $\varepsilon > 0$, and a nontrivial probability measure $\nu_m$ on $\mathcal{B}(\mathsf{X})$ such that for all $x \in C$,
\[
P^m(x, \cdot) \geq \varepsilon \nu_m(\cdot). \tag{2}
\]
Since ergodic Markov chains are $\pi$-irreducible, Theorem 5.2.2 of [6] implies that for an ergodic chain a small set $C$ with $\pi(C) > 0$ always exists. A small set $C$ with $\pi(C) > 0$ allows for constructing the split chain for $(X_n)_{n\geq 0}$, which is the central object of the approach (see Section 17.3 of [6] for a detailed description). Let $(X_{nm})_{n\geq 0}$ be the $m$-skeleton of $(X_n)_{n\geq 0}$, i.e. a Markov chain evolving according to the $m$-step transition kernel $P^m$. The small set condition allows us to write $P^m$ as a mixture of two distributions:
\[
P^m(x, \cdot) = \varepsilon I_C(x) \nu_m(\cdot) + [1 - \varepsilon I_C(x)] R(x, \cdot), \tag{3}
\]
where $R(x, \cdot) = [1 - \varepsilon I_C(x)]^{-1}[P^m(x, \cdot) - \varepsilon I_C(x) \nu_m(\cdot)]$. Now let $(X_{nm}, Y_n)_{n\geq 0}$ be the split chain of the $m$-skeleton, i.e. let the random variable $Y_n \in \{0, 1\}$ be the level of the split $m$-skeleton at time $nm$. The split chain $(X_{nm}, Y_n)_{n\geq 0}$ is a Markov chain that obeys the following transition rule $\check{P}$:
\[
\check{P}(Y_n = 1,\, X_{(n+1)m} \in dy \mid Y_{n-1}, X_{nm} = x) = \varepsilon I_C(x) \nu_m(dy), \tag{4}
\]
\[
\check{P}(Y_n = 0,\, X_{(n+1)m} \in dy \mid Y_{n-1}, X_{nm} = x) = (1 - \varepsilon I_C(x)) R(x, dy), \tag{5}
\]
and $Y_n$ can be interpreted as a coin toss indicating whether $X_{(n+1)m}$, given $X_{nm} = x$, should be drawn from $\nu_m(\cdot)$ (with probability $\varepsilon I_C(x)$) or from $R(x, \cdot)$ (with probability $1 - \varepsilon I_C(x)$).
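The decomposition (3) and the coin-toss rule (4)-(5) are straightforward to implement for a finite chain. The sketch below takes $m = 1$ and $C = \mathsf{X}$ for simplicity; the $3$-state kernel, $\varepsilon$ and $\nu$ are hypothetical hand-picked values satisfying (2). The point is only that the residual kernel $R$ is a genuine transition kernel and the mixture reproduces $P$ exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state kernel; m = 1 and C = X, so I_C(x) = 1 for all x.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
eps = 0.7
nu = np.array([0.2, 0.3, 0.2]) / 0.7   # columnwise minima of P, normalized

# Residual kernel from (3): R(x,.) = [P(x,.) - eps*nu(.)] / (1 - eps).
R = (P - eps * nu) / (1 - eps)

def split_step(x):
    """One transition of the split chain: toss Y, then draw the next state."""
    y = int(rng.random() < eps)          # Y ~ Bernoulli(eps), since x is in C
    probs = nu if y == 1 else R[x]       # draw from nu on {Y=1}, else from R
    return y, int(rng.choice(3, p=probs))

# The mixture recovers P exactly (up to rounding):
print(np.abs(eps * nu + (1 - eps) * R - P).max())
```

Marginally in the first coordinate, `split_step` moves exactly according to $P$; the extra coordinate $Y$ only records which mixture component was used.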
One obtains the split chain $(X_k, Y_n)_{k\geq 0, n\geq 0}$ of the initial Markov chain $(X_n)_{n\geq 0}$ by defining appropriate conditional probabilities. To this end let $X_0^{nm} = \{X_0, \ldots, X_{nm-1}\}$ and $Y_0^n = \{Y_0, \ldots, Y_{n-1}\}$:
\[
\check{P}(Y_n = 1,\, X_{nm+1} \in dx_1, \ldots, X_{(n+1)m-1} \in dx_{m-1},\, X_{(n+1)m} \in dy \mid Y_0^n, X_0^{nm};\, X_{nm} = x) = \frac{\varepsilon I_C(x)\nu_m(dy)}{P^m(x, dy)}\, P(x, dx_1) \cdots P(x_{m-1}, dy), \tag{6}
\]
\[
\check{P}(Y_n = 0,\, X_{nm+1} \in dx_1, \ldots, X_{(n+1)m-1} \in dx_{m-1},\, X_{(n+1)m} \in dy \mid Y_0^n, X_0^{nm};\, X_{nm} = x) = \frac{(1 - \varepsilon I_C(x)) R(x, dy)}{P^m(x, dy)}\, P(x, dx_1) \cdots P(x_{m-1}, dy). \tag{7}
\]
Note that the marginal distribution of (Xk)k≥0 in the split chain is that of the underlying Markov chain with transition kernel P.
For a measure $\lambda$ on $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$ let $\lambda^*$ denote the measure on $\mathsf{X} \times \{0, 1\}$ (with the product $\sigma$-algebra) defined by $\lambda^*(B \times \{1\}) = \varepsilon\lambda(B)$ and $\lambda^*(B \times \{0\}) = (1 - \varepsilon)\lambda(B)$. Now the crucial observation is that on the set $\{Y_n = 1\}$ the pre-$nm$ process $\{X_k, Y_i : k \leq nm, i \leq n\}$ and the post-$(n+1)m$ process $\{X_k, Y_i : k \geq (n+1)m, i \geq n+1\}$ are independent, and the post-$(n+1)m$ process has the same distribution as $\{X_k, Y_i : k \geq 0, i \geq 0\}$ with $\nu_m^*$ as the initial distribution of $(X_0, Y_0)$. This leads to Theorem 2.2, but first we need some more notation. Let $\sigma(n)$ denote the successive entrance times of the split chain to the set $C \times \{1\}$, i.e. $\sigma(0) = \min\{k \geq 0 : Y_k = 1\}$ and $\sigma(n) = \min\{k > \sigma(n-1) : Y_k = 1\}$ for $n \geq 1$. Also define $Z_n(g) = \sum_{k=0}^{m-1} g(X_{nm+k})$ and $g_c = g - \pi g$.
Theorem 2.2 (Theorem 17.3.6 of [6]). Suppose that $(X_n)_{n\geq 0}$ is ergodic and let $\nu_m$ be the measure satisfying (2). If the following conditions hold:
\[
\text{(i)} \quad \check{E}_{\nu_m^*}\Big(\sum_{n=0}^{\sigma(0)} Z_n(|g|)\Big)^2 < \infty, \qquad \text{(ii)} \quad \check{E}_{\nu_m^*}\, \sigma(0)^2 < \infty, \tag{8}
\]
then the $\sqrt{n}$-CLT holds for $(X_n)_{n\geq 0}$ and $g$, with
\[
\sigma_g^2 = \frac{\varepsilon\,\pi(C)}{m}\bigg\{ \check{E}_{\nu_m^*}\Big(\sum_{n=0}^{\sigma(0)} Z_n(g_c)\Big)^2 + 2\,\check{E}_{\nu_m^*}\Big(\sum_{n=0}^{\sigma(0)} Z_n(g_c)\Big)\Big(\sum_{n=\sigma(0)+1}^{\sigma(1)} Z_n(g_c)\Big) \bigg\}.
\]
3. A Proof
In view of Theorem 2.2, providing a regeneration proof of Theorem 1.2 amounts to establishing conditions (i) and (ii) of (8). To this end we need some additional facts about small sets for uniformly ergodic Markov chains.
Theorem 3.1. If $(X_n)_{n\geq 0}$, a Markov chain on $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$ with stationary distribution $\pi$, is uniformly ergodic, then $\mathsf{X}$ is $\nu_m$-small for some $\nu_m$.

Hence for uniformly ergodic chains (2) holds for all $x \in \mathsf{X}$. Theorem 3.1 is well known in the literature; in particular it results from Theorems 5.2.1 and 5.2.4 in [6] with their $\psi = \pi$.
We start by proving (ii) of (8), which is now straightforward. Integrating (6), together with the fact that $\mathsf{X}$ is small, yields
\[
\check{P}(Y_n = 1 \mid X_0^{nm}, Y_0^{n};\, X_{nm} = x) = \varepsilon,
\]
thus $Y_0, Y_1, \ldots$ are independent Bernoulli trials and the distribution of $\sigma(0)$ is geometric; in particular $\check{E}_{\nu_m^*}\sigma(0)^2 < \infty$.
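Since $Y_0, Y_1, \ldots$ are i.i.d. Bernoulli($\varepsilon$), $\sigma(0)$ is geometric on $\{0, 1, \ldots\}$ with $\check{P}(\sigma(0) = k) = (1-\varepsilon)^k \varepsilon$, and its second moment has the closed form $(1-\varepsilon)(2-\varepsilon)/\varepsilon^2$. A throwaway numerical check of that closed form against the truncated series (the value of $\varepsilon$ is hypothetical):

```python
# sigma(0) = min{k >= 0 : Y_k = 1} with Y_k iid Bernoulli(eps), so
# P(sigma(0) = k) = (1 - eps)^k * eps.  Truncated-series check of the
# second moment against the closed form (1-eps)(2-eps)/eps^2.
eps = 0.25
second_moment = sum(k**2 * (1 - eps) ** k * eps for k in range(10_000))
closed = (1 - eps) * (2 - eps) / eps**2
print(second_moment, closed)
```

Both evaluate to the same finite number, confirming condition (ii) of (8) in this setting.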
Establishing (i) of (8) is the essential part of the proof. Theorem 3.1 implies that for uniformly ergodic Markov chains (3) can be rewritten in operator notation as
\[
P^m = \varepsilon\nu_m + (1 - \varepsilon)R. \tag{9}
\]
The following mixture representation of $\pi$ will turn out to be very useful.
Lemma 3.2. If $(X_n)_{n\geq 0}$ is an ergodic Markov chain with transition kernel $P$ and (9) holds, then
\[
\pi = \varepsilon\mu := \varepsilon \sum_{n=0}^{\infty} \nu_m (1 - \varepsilon)^n R^n. \tag{10}
\]
Proof. Since $\big(\varepsilon \sum_{n=0}^{\infty} \nu_m (1-\varepsilon)^n R^n\big)(\mathsf{X}) = \varepsilon \sum_{n=0}^{\infty} (1-\varepsilon)^n (\nu_m R^n)(\mathsf{X}) = 1$, the measure in question is a probability measure. It is also invariant for $P^m$: by (9) we obtain
\[
\Big(\sum_{n=0}^{\infty} \nu_m (1-\varepsilon)^n R^n\Big) P^m = \varepsilon\mu(\mathsf{X})\,\nu_m + \sum_{n=1}^{\infty} \nu_m (1-\varepsilon)^n R^n = \sum_{n=0}^{\infty} \nu_m (1-\varepsilon)^n R^n,
\]
since $\varepsilon\mu(\mathsf{X}) = 1$. Hence by ergodicity $\varepsilon\mu = \varepsilon\mu P^{nm} \to \pi$ as $n \to \infty$. Thus $\varepsilon\mu = \pi$.
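Lemma 3.2 can be checked numerically on a finite chain. The sketch below reuses a hypothetical $3$-state kernel with $m = 1$ and $C = \mathsf{X}$ (so (9) holds for all $x$), truncates the series in (10), and compares the result with the stationary distribution obtained by linear algebra.

```python
import numpy as np

# Hypothetical finite chain with m = 1 and C = X, so P >= eps * nu rowwise.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
eps = 0.7
nu = np.array([0.2, 0.3, 0.2]) / 0.7
R = (P - eps * nu) / (1 - eps)          # residual kernel from (3)

# Truncate the series (10): mu = sum_n nu (1-eps)^n R^n.
mu = np.zeros(3)
term = nu.copy()
for n in range(200):
    mu += (1 - eps) ** n * term
    term = term @ R
pi_mix = eps * mu

# Stationary distribution directly, as the left eigenvector for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()
print(pi_mix, pi)
```

The two vectors coincide, as (10) predicts; truncating at 200 terms is harmless here because the tail of the series is of order $(1-\varepsilon)^{200}$.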
Corollary 3.3. The decomposition in Lemma 3.2 implies that
\[
\text{(i)} \quad \check{E}_{\nu_m^*} \sum_{n=0}^{\sigma(0)} I_{\{X_{nm} \in A\}} \;=\; \check{E}_{\nu_m^*} \sum_{n=0}^{\infty} I_{\{X_{nm} \in A\}}\, I_{\{Y_0 = 0, \ldots, Y_{n-1} = 0\}} \;=\; \varepsilon^{-1}\pi(A),
\]
\[
\text{(ii)} \quad \check{E}_{\nu_m^*} \sum_{n=0}^{\infty} f(X_{nm}, X_{nm+1}, \ldots; Y_n, Y_{n+1}, \ldots)\, I_{\{Y_0 = 0, \ldots, Y_{n-1} = 0\}} \;=\; \varepsilon^{-1}\, \check{E}_{\pi^*} f(X_0, X_1, \ldots; Y_0, Y_1, \ldots).
\]
Proof. (i) is a direct consequence of (10). To see (ii), note that $Y_n$ is a coin toss independent of $\{Y_0, \ldots, Y_{n-1}\}$ and of $X_{nm}$; this allows for $\pi^*$ instead of $\pi$ on the RHS of (ii). Moreover, the evolution of $\{X_{nm+1}, X_{nm+2}, \ldots; Y_{n+1}, Y_{n+2}, \ldots\}$ depends only (and explicitly, by (6) and (7)) on $X_{nm}$ and $Y_n$. Now use (i).
Our object of interest is
\[
I = \check{E}_{\nu_m^*}\Big(\sum_{n=0}^{\sigma(0)} Z_n(|g|)\Big)^2 = \check{E}_{\nu_m^*}\Big(\sum_{n=0}^{\infty} Z_n(|g|)\, I_{\{\sigma(0) \geq n\}}\Big)^2
= \check{E}_{\nu_m^*} \sum_{n=0}^{\infty} Z_n(|g|)^2\, I_{\{Y_0 = 0, Y_1 = 0, \ldots, Y_{n-1} = 0\}}
+ 2\,\check{E}_{\nu_m^*} \sum_{n=0}^{\infty} \sum_{k=n+1}^{\infty} Z_n(|g|)\, I_{\{\sigma(0) \geq n\}}\, Z_k(|g|)\, I_{\{\sigma(0) \geq k\}} = A + B. \tag{11}
\]
Now we can use Corollary 3.3 and then the inequality $2ab \leq a^2 + b^2$ to bound the term $A$ in (11):
\[
A = \frac{1}{\varepsilon}\,\check{E}_{\pi^*} Z_0(|g|)^2 = \frac{1}{\varepsilon}\, E_\pi\Big(\sum_{k=0}^{m-1} |g(X_k)|\Big)^2 \leq \frac{m}{\varepsilon}\, E_\pi\Big[\sum_{k=0}^{m-1} g^2(X_k)\Big] \leq \frac{m^2}{\varepsilon}\,\pi g^2 < \infty.
\]
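The inequality behind the bound on $A$, namely $\big(\sum_{k=0}^{m-1}|a_k|\big)^2 \leq m \sum_{k=0}^{m-1} a_k^2$, is just $2ab \leq a^2 + b^2$ applied pairwise (equivalently, Cauchy-Schwarz against the all-ones vector). A throwaway numerical check on random vectors, with a hypothetical $m$:

```python
import numpy as np

# (sum |a_k|)^2 <= m * sum a_k^2 for any vector a of length m
# (Cauchy-Schwarz applied to (|a_0|,...,|a_{m-1}|) and (1,...,1)).
rng = np.random.default_rng(1)
m = 5
ok = all(np.abs(a).sum() ** 2 <= m * (a**2).sum() + 1e-9
         for a in rng.normal(size=(1000, m)))
print(ok)
```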
We can proceed similarly with the term $B$:
\[
B = 2\,\check{E}_{\nu_m^*} \sum_{n=0}^{\infty} Z_n(|g|)\, I_{\{\sigma(0) \geq n\}} \sum_{k=1}^{\infty} Z_{n+k}(|g|)\, I_{\{\sigma(0) \geq n+k\}}
= \frac{2}{\varepsilon}\,\check{E}_{\pi^*}\Big[ Z_0(|g|) \sum_{k=1}^{\infty} Z_k(|g|)\, I_{\{\sigma(0) \geq k\}} \Big]
= \frac{2}{\varepsilon} \sum_{k=1}^{\infty} \check{E}_{\pi^*}\, I_{\{\sigma(0) \geq k\}}\, Z_0(|g|)\, Z_k(|g|). \tag{12}
\]
Let $C_k := \check{E}_{\pi^*}\, I_{\{\sigma(0) \geq k\}}\, Z_0(|g|)\, Z_k(|g|)$. By Cauchy-Schwarz,
\[
C_k \leq \sqrt{\check{E}_{\pi^*}\, I_{\{\sigma(0) \geq k\}}\, Z_0(|g|)^2}\, \sqrt{\check{E}_{\pi^*} Z_k(|g|)^2}
= \sqrt{\check{E}_{\pi^*}\, I_{\{Y_0 = 0\}}\, I_{\{Y_1 = 0, \ldots, Y_{k-1} = 0\}}\, Z_0(|g|)^2}\, \sqrt{\check{E}_{\pi^*} Z_0(|g|)^2}.
\]
Now observe that $\{Y_1, \ldots, Y_{k-1}\}$ and $\{X_0, \ldots, X_{m-1}\}$ are independent; moreover, we may drop $I_{\{Y_0 = 0\}}$ to obtain
\[
C_k \leq (1 - \varepsilon)^{\frac{k-1}{2}}\,\check{E}_{\pi^*} Z_0(|g|)^2 \leq (1 - \varepsilon)^{\frac{k-1}{2}}\, m^2 \pi g^2. \tag{13}
\]
Combining (12) and (13) yields $B < \infty$. This completes the proof.
4. The difference between $m = 1$ and $m \neq 1$
Assume the small set condition (2) holds and consider the split chain defined by (6) and (7). The tours
\[
\{X_{(\sigma(n)+1)m},\, X_{(\sigma(n)+1)m+1},\, \ldots,\, X_{(\sigma(n+1)+1)m-1}\}, \qquad n = 0, 1, \ldots,
\]
which start whenever $X_k \sim \nu_m$, are of crucial importance to the regeneration theory and are eagerly analyzed by researchers. In virtually every paper on the subject there is a claim that these objects are independent identically distributed random variables. This claim is usually considered obvious and no proof is provided. However, it is not true if $m > 1$.
In fact formulas (6) and (7) should be convincing enough, as $X_{nm+1}, \ldots, X_{(n+1)m}$ given $Y_n = 1$ and $X_{nm} = x$ are linked in a way described by $P(x, dx_1) \cdots P(x_{m-1}, dy)$.
In particular, consider a Markov chain on $\mathsf{X} = \{a, b, c, d, e\}$ with transition probabilities $P(a,b) = P(a,c) = P(b,b) = P(b,d) = P(c,c) = P(c,e) = 1/2$ and $P(d,a) = P(e,a) = 1$. Let $\nu_4(d) = \nu_4(e) = 1/2$ and $\varepsilon = 1/8$. Clearly $P^4(x, \cdot) \geq \varepsilon\nu_4(\cdot)$ for every $x \in \mathsf{X}$, hence (2) holds with $C = \mathsf{X}$ and $m = 4$. Note that in this simple example each tour can start with $d$ or with $e$. However, if it starts with $d$ or $e$, the previous tour must have ended with $b$ or $c$ respectively. This makes the tours dependent!
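The example is small enough to verify mechanically. The sketch below builds the transition matrix, confirms the minorization $P^4(x, \cdot) \geq \frac{1}{8}\nu_4(\cdot)$, and checks the structural fact behind the dependence: $d$ can only be entered from $b$, and $e$ only from $c$.

```python
import numpy as np

states = ['a', 'b', 'c', 'd', 'e']
P = {
    'a': {'b': 0.5, 'c': 0.5},
    'b': {'b': 0.5, 'd': 0.5},
    'c': {'c': 0.5, 'e': 0.5},
    'd': {'a': 1.0},
    'e': {'a': 1.0},
}
M = np.zeros((5, 5))
for i, x in enumerate(states):
    for y, p in P[x].items():
        M[i, states.index(y)] = p

# Minorization: P^4(x, d) and P^4(x, e) are at least eps * nu4 = (1/8)*(1/2).
P4 = np.linalg.matrix_power(M, 4)
print(P4[:, 3].min(), P4[:, 4].min())   # both >= 1/16

# The dependence mechanism: the only predecessor of d is b, and of e is c,
# so a tour starting at d forces the previous tour to have ended at b.
def predecessors(state):
    return sorted(x for x in states if P[x].get(state, 0) > 0)

print(predecessors('d'), predecessors('e'))
```

The minima over $x$ of $P^4(x, d)$ and $P^4(x, e)$ are exactly $1/16$, attained at $x = a$, so $\varepsilon = 1/8$ is in fact the largest constant for which (2) holds with this $\nu_4$.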
Similar examples with a general state space $\mathsf{X}$ and $C \neq \mathsf{X}$ can easily be provided. Hence Theorem 2.2 is critical for providing regeneration proofs of CLTs, and standard arguments that involve i.i.d. random variables are not valid.
5. Bibliography
[1] Athreya, K. B., Ney, P., 1978. A new approach to the limit theory of recurrent Markov chains. Trans. Amer. Math. Soc., 245: 493-501.
[2] Cogburn, R., 1972. The Central Limit Theorem for Markov Processes. In: Le Cam, L. E., Neyman, J., Scott, E. L. (Eds.), Proc. Sixth Ann. Berkeley Symp. Math. Statist. and Prob., 2, 458-512.
[3] Ibragimov, I. A., Linnik, Y. V., 1971. Independent and Stationary Sequences of Random Variables. Wolters-Noordhoff, Groningen.
[4] Jones, G. L., 2004. On the Markov chain central limit theorem. Probability Surveys, 1: 299-320.
[5] Jones, G. L., Haran, M., Caffo, B. S., Neath, R., 2006. Fixed-Width Output Analysis for Markov Chain Monte Carlo. Journal of the American Statistical Association, 101: 1537-1547.
[6] Meyn, S. P., Tweedie, R. L., 1993. Markov Chains and Stochastic Stability. Springer-Verlag.
[7] Nummelin, E., 1978. A splitting technique for Harris recurrent Markov chains. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 43: 309-318.
[8] Roberts, G. O., Rosenthal, J. S., 2004. General state space Markov chains and MCMC algorithms. Probability Surveys, 1: 20-71.