
A. S. NOWAK (Wrocław)

A GENERALIZATION OF UENO’S INEQUALITY

FOR n-STEP TRANSITION PROBABILITIES

Abstract. We provide a generalization of Ueno’s inequality for n-step transition probabilities of Markov chains in a general state space. Our result is relevant to the study of adaptive control problems and approximation problems in the theory of discrete-time Markov decision processes and stochastic games.

Let $(S, \mathcal{F})$ be a measurable space and let $P$ and $Q$ be transition probabilities from $S$ into $S$. The composition of $P$ and $Q$, denoted by $PQ$, is the transition probability defined by

$$PQ(s, B) = \int_S Q(z, B)\, P(s, dz),$$

where $s \in S$, $B \in \mathcal{F}$. For any integer $n \ge 2$, we write $Q^n$ to denote the $n$-step transition probability $QQ^{n-1}$ from $S$ into $S$, induced by $Q^1 = Q$.
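For a finite state space the integral defining the composition reduces to a sum, so the composition is an ordinary product of row-stochastic matrices. The following is a minimal sketch under that assumption; the function names `compose` and `n_step` and the two example matrices are illustrative and do not come from the paper.

```python
from fractions import Fraction as Fr

def compose(P, Q):
    """Finite-state composition: PQ[s][b] = sum over z of Q[z][b] * P[s][z]."""
    n = len(P)
    return [[sum(P[s][z] * Q[z][b] for z in range(n)) for b in range(n)]
            for s in range(n)]

def n_step(Q, n):
    """n-step kernel defined by Q^1 = Q and Q^n = Q Q^(n-1)."""
    result = Q
    for _ in range(n - 1):
        result = compose(Q, result)
    return result

# Two hypothetical transition matrices on a two-point state space.
P = [[Fr(1, 2), Fr(1, 2)], [Fr(1, 4), Fr(3, 4)]]
Q = [[Fr(1, 3), Fr(2, 3)], [Fr(1, 2), Fr(1, 2)]]

PQ = compose(P, Q)   # one step with P, then one step with Q
Q3 = n_step(Q, 3)    # three-step kernel of Q
```

Each row of `PQ` and `Q3` is again a probability vector, which is the finite-state counterpart of the statement that the composition is a transition probability.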

By $\|\cdot\|$, we denote the total variation norm in the vector space of all finite signed measures on $(S, \mathcal{F})$. Recall that if $\mu_1$ and $\mu_2$ are probability measures on $(S, \mathcal{F})$, then

$$\|\mu_1 - \mu_2\| = 2 \sup_{B \in \mathcal{F}} |\mu_1(B) - \mu_2(B)|.$$
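On a finite state space the total variation norm of a signed measure is the sum of the absolute values of its point masses, and the identity above can be checked by enumerating all subsets. The sketch below does this for two hypothetical probability vectors; the names `tv_norm` and `two_sup_over_sets` are illustrative.

```python
from fractions import Fraction as Fr
from itertools import combinations

def tv_norm(mu1, mu2):
    """Total variation norm of mu1 - mu2 on a finite space (sum of absolute differences)."""
    return sum(abs(a - b) for a, b in zip(mu1, mu2))

def two_sup_over_sets(mu1, mu2):
    """2 * sup over all subsets B of |mu1(B) - mu2(B)|, by brute-force enumeration."""
    states = range(len(mu1))
    best = Fr(0)
    for r in range(len(mu1) + 1):
        for B in combinations(states, r):
            diff = abs(sum(mu1[i] - mu2[i] for i in B))
            best = max(best, diff)
    return 2 * best

# Hypothetical probability vectors on three states.
mu1 = [Fr(1, 2), Fr(1, 3), Fr(1, 6)]
mu2 = [Fr(1, 4), Fr(1, 4), Fr(1, 2)]

lhs = tv_norm(mu1, mu2)
rhs = two_sup_over_sets(mu1, mu2)
```

Here both sides equal 2/3; the supremum is attained on the set of states where `mu1` exceeds `mu2`.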

1991 Mathematics Subject Classification: Primary 60J10, 60J35; Secondary 93C40, 93E20.

Key words and phrases: Markov chains, transition probabilities, adaptive control, stochastic control.

In the sequel, we prove the following result.

Theorem. Let $P$ and $Q$ be transition probabilities from $S$ into $S$ and let

$$\varepsilon = \sup_{s \in S} \|P(s, \cdot) - Q(s, \cdot)\|.$$

Then for $s, z \in S$ and $n \ge 1$ we have

$$\|P^n(s, \cdot) - Q^n(z, \cdot)\| \le \varepsilon (1 + \beta + \dots + \beta^{n-1}) + 2\beta^n, \tag{1}$$

where

$$\beta = \tfrac{1}{2} \sup_{x, y \in S} \|P(x, \cdot) - P(y, \cdot)\|. \tag{2}$$
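Inequality (1) can be checked numerically on a small example. The following is a minimal sketch, assuming a two-point state space and two hypothetical transition matrices `P` and `Q`; exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction as Fr

def compose(A, B):
    """Finite-state composition: AB[s][b] = sum over z of A[s][z] * B[z][b]."""
    n = len(A)
    return [[sum(A[s][z] * B[z][b] for z in range(n)) for b in range(n)]
            for s in range(n)]

def power(A, n):
    """n-step kernel A^n with A^1 = A."""
    result = A
    for _ in range(n - 1):
        result = compose(A, result)
    return result

def tv(mu, nu):
    """Total variation norm of mu - nu on a finite space."""
    return sum(abs(a - b) for a, b in zip(mu, nu))

# Hypothetical example kernels (not taken from the paper).
P = [[Fr(1, 2), Fr(1, 2)], [Fr(1, 4), Fr(3, 4)]]
Q = [[Fr(3, 5), Fr(2, 5)], [Fr(3, 10), Fr(7, 10)]]
states = range(len(P))

eps = max(tv(P[s], Q[s]) for s in states)
beta = Fr(1, 2) * max(tv(P[x], P[y]) for x in states for y in states)

def bound(n):
    """Right-hand side of (1)."""
    return eps * sum(beta**k for k in range(n)) + 2 * beta**n

def inequality_holds(n):
    """Check (1) for every pair of initial states s, z."""
    Pn, Qn = power(P, n), power(Q, n)
    return all(tv(Pn[s], Qn[z]) <= bound(n) for s in states for z in states)

checks = [inequality_holds(n) for n in range(1, 9)]
```

For these matrices `eps` is 1/5 and `beta` is 1/4, and every entry of `checks` is `True`, as the Theorem predicts.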

Remark 1. If ε = 0, then (1) is exactly Ueno’s inequality [9].

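Corollary 1. If $\beta < 1$ and $\varepsilon > 0$, then, since the right-hand side of (1) converges to $\varepsilon/(1 - \beta)$ as $n \to \infty$, for $n$ sufficiently large we have

$$\|P^n(s, \cdot) - Q^n(z, \cdot)\| \le \frac{2\varepsilon}{1 - \beta} \quad \text{for each } s, z \in S.$$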
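How large n must be depends only on ε and β. The sketch below, with the hypothetical helper `first_n_below_target`, finds the first n at which the right-hand side of (1) drops to 2ε/(1 − β). It assumes ε > 0 and 0 ≤ β < 1; under these assumptions successive values of the bound differ by β^n (ε − 2(1 − β)), which has constant sign, so the bound is monotone and stays below the target from that n onwards.

```python
from fractions import Fraction as Fr

def bound(eps, beta, n):
    """Right-hand side of (1): eps * (1 + beta + ... + beta^(n-1)) + 2 * beta^n."""
    return eps * sum(beta**k for k in range(n)) + 2 * beta**n

def first_n_below_target(eps, beta):
    """Smallest n >= 1 with bound(eps, beta, n) <= 2 * eps / (1 - beta).

    Assumes eps > 0 and 0 <= beta < 1, so that the loop terminates.
    """
    if not (eps > 0 and 0 <= beta < 1):
        raise ValueError("need eps > 0 and 0 <= beta < 1")
    target = 2 * eps / (1 - beta)
    n = 1
    while bound(eps, beta, n) > target:
        n += 1
    return n

# Illustrative values: eps = 1/5, beta = 1/4.
n0 = first_n_below_target(Fr(1, 5), Fr(1, 4))
```

With these values the bound is 7/10 at n = 1 and 3/8 at n = 2, while the target is 8/15, so `n0` is 2.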

Suppose that $S$ is the state space for Markov chains having transition probabilities $P$ and $Q$, respectively. If there exists a probability measure $\pi_P$ on $(S, \mathcal{F})$ such that

$$\sup_{s \in S} \|P^n(s, \cdot) - \pi_P(\cdot)\| \to 0 \quad \text{as } n \to \infty$$

at a geometric rate, then the Markov chain with transition probability $P$ is called uniformly ergodic and $\pi_P$ is the unique invariant probability measure for $P$.

Corollary 2. Let $\pi_P$ and $\pi_Q$ be the invariant probability measures for $P$ and $Q$, respectively. Assume that the Markov chains with transition probabilities $P$ and $Q$ are uniformly ergodic. If $\beta < 1$, then

$$\|\pi_P - \pi_Q\| \le \frac{\varepsilon}{1 - \beta}.$$
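The bound on the invariant measures can be illustrated on two-state chains, where the invariant distribution has a closed form. The sketch below assumes two hypothetical matrices with strictly positive entries (so both chains are uniformly ergodic); the helper names are illustrative.

```python
from fractions import Fraction as Fr

def tv(mu, nu):
    """Total variation norm of mu - nu on a finite space."""
    return sum(abs(a - b) for a, b in zip(mu, nu))

def invariant_two_state(M):
    """Invariant distribution of a two-state chain with off-diagonal entries a = M[0][1], b = M[1][0]."""
    a, b = M[0][1], M[1][0]
    return [b / (a + b), a / (a + b)]

def is_invariant(pi, M):
    """Check pi M = pi exactly."""
    return all(sum(pi[s] * M[s][t] for s in range(2)) == pi[t] for t in range(2))

# Hypothetical example kernels (not taken from the paper).
P = [[Fr(1, 2), Fr(1, 2)], [Fr(1, 4), Fr(3, 4)]]
Q = [[Fr(3, 5), Fr(2, 5)], [Fr(3, 10), Fr(7, 10)]]

eps = max(tv(P[s], Q[s]) for s in range(2))
beta = Fr(1, 2) * max(tv(P[x], P[y]) for x in range(2) for y in range(2))

pi_P = invariant_two_state(P)
pi_Q = invariant_two_state(Q)
distance = tv(pi_P, pi_Q)
limit_bound = eps / (1 - beta)
```

Here `pi_P` is (1/3, 2/3) and `pi_Q` is (3/7, 4/7), so `distance` is 4/21, which is below `limit_bound` = 4/15.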

It is well known that the Markov chain with transition probability $T$ is uniformly ergodic if and only if there exist a constant $c \in (0, 1)$ and a positive integer $m$ such that

$$\|T^m(s, \cdot) - T^m(z, \cdot)\| \le 2c \tag{3}$$

for every $s, z \in S$. For a proof see, e.g., [2].

Put $P = T^m$ and fix a transition probability $Q$. Define

$$\varepsilon = \sup_{s \in S} \|T^m(s, \cdot) - Q(s, \cdot)\|.$$

Assume that (3) holds and consider $\beta$ defined by (2). Then $\beta \le c < 1$, and using Corollary 1, we infer that for $n$ sufficiently large, we have

$$\|Q^n(s, \cdot) - Q^n(z, \cdot)\| \le \|Q^n(s, \cdot) - T^{mn}(s, \cdot)\| + \|T^{mn}(s, \cdot) - Q^n(z, \cdot)\| \le \frac{4\varepsilon}{1 - \beta}.$$

This enables us to state the following result.


Corollary 3. If (3) holds and $2\varepsilon/(1 - \beta) < 1$, then the Markov chain with transition probability $Q$ is also uniformly ergodic. Moreover,

$$\|\pi_T - \pi_Q\| \le \frac{\varepsilon}{1 - \beta}, \tag{4}$$

where $\pi_T$ ($\pi_Q$) is the unique invariant probability measure for the transition probability $T$ ($Q$).

Remark 2. Our main result and Corollaries 1–3 may have applications to approximation problems or adaptive control problems as studied in [3], [5], [6], [7] and [8]. A result closely related to Corollary 2 was proved by Stettner in [8], but our inequality (5) has a more elementary form. Also, our proof is quite elementary while the method of proof in [8] is based on the theory of bounded transition operators considered in [4]. However, Stettner’s proof [8] can be used for studying some uniform convergence problems of n-step transition probabilities in different norms on the state space [6].

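Proof of Theorem. We proceed by induction on $n$. It is easy to see that (1) holds for $n = 1$: indeed, $\|P(s, \cdot) - Q(z, \cdot)\| \le \|P(s, \cdot) - P(z, \cdot)\| + \|P(z, \cdot) - Q(z, \cdot)\| \le 2\beta + \varepsilon$. Suppose it holds for a positive integer $n$. Note that

$$\begin{aligned}
\|P^{n+1}(s, \cdot) - Q^{n+1}(z, \cdot)\| &= \|P^n P(s, \cdot) - Q^n Q(z, \cdot)\| \\
&\le \|P^n P(s, \cdot) - Q^n P(z, \cdot)\| + \|Q^n P(z, \cdot) - Q^n Q(z, \cdot)\| \\
&\le \|P^n P(s, \cdot) - Q^n P(z, \cdot)\| + \varepsilon.
\end{aligned} \tag{5}$$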

Moreover, we have

$$\|P^n P(s, \cdot) - Q^n P(z, \cdot)\| = 2 \sup_{B \in \mathcal{F}} |L(B)|, \tag{6}$$

where

$$L(B) = \int_S P(x, B)\, \mu(s, z)(dx)$$

for any $B \in \mathcal{F}$ and $\mu(s, z)(\cdot) = P^n(s, \cdot) - Q^n(z, \cdot)$.

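Fix $B \in \mathcal{F}$ and define

$$\varphi(x) = P(x, B) - \inf_{y \in S} P(y, B).$$

Note that $\varphi \ge 0$ on $S$ and, since $\mu(s, z)(S) = 0$,

$$L(B) = \int_S \varphi(x)\, \mu(s, z)(dx).$$

Without loss of generality, we can assume that $|L(B)| = L(B)$ (otherwise, use $-\mu(s, z)(dx)$ instead of $\mu(s, z)(dx)$). By the Hahn decomposition theorem [1], there exists a set $D \in \mathcal{F}$ such that

$$\mu(s, z)(E) \ge 0 \quad \text{for all } E \in \mathcal{F},\ E \subset D,$$

$$\mu(s, z)(E) \le 0 \quad \text{for all } E \in \mathcal{F},\ E \subset S \setminus D.$$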


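Note that

$$\begin{aligned}
|L(B)| = L(B) &= \int_D \varphi(x)\, \mu(s, z)(dx) + \int_{S \setminus D} \varphi(x)\, \mu(s, z)(dx) \\
&\le \int_D \varphi(x)\, \mu(s, z)(dx) \le \mu(s, z)(D) \sup_{x \in S} \varphi(x) \\
&\le \tfrac{1}{2}\, \mu(s, z)(D) \sup_{x, y \in S} 2\,|P(x, B) - P(y, B)|.
\end{aligned}$$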

Hence,

$$L(B) \le \mu(s, z)(D) \cdot \tfrac{1}{2} \sup_{x, y \in S} \|P(x, \cdot) - P(y, \cdot)\| = \mu(s, z)(D) \cdot \beta. \tag{7}$$

But

$$\mu(s, z)(D) = P^n(s, D) - Q^n(z, D) \le \tfrac{1}{2} \cdot 2 \sup_{F \in \mathcal{F}} |P^n(s, F) - Q^n(z, F)| = \tfrac{1}{2} \|P^n(s, \cdot) - Q^n(z, \cdot)\|.$$

This and (7) imply that

$$|L(B)| = L(B) \le \tfrac{1}{2} \|P^n(s, \cdot) - Q^n(z, \cdot)\| \cdot \beta. \tag{8}$$

By (6) and (8) we obtain

$$\|P^n P(s, \cdot) - Q^n P(z, \cdot)\| \le \|P^n(s, \cdot) - Q^n(z, \cdot)\| \cdot \beta.$$
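The contraction step just obtained says that applying P to a difference of two probability measures shrinks its total variation norm by at least the factor β. A minimal finite-state sketch follows; the matrix `P` and the vectors `mu1`, `mu2` are hypothetical, and `mu` plays the role of µ(s, z).

```python
from fractions import Fraction as Fr

def norm(mu):
    """Total variation norm of a signed measure on a finite space."""
    return sum(abs(m) for m in mu)

def apply_kernel(mu, P):
    """Signed measure mu P, with (mu P)[t] = sum over s of mu[s] * P[s][t]."""
    n = len(P)
    return [sum(mu[s] * P[s][t] for s in range(n)) for t in range(n)]

# Hypothetical three-state kernel.
P = [[Fr(1, 2), Fr(1, 4), Fr(1, 4)],
     [Fr(1, 4), Fr(1, 2), Fr(1, 4)],
     [Fr(1, 4), Fr(1, 4), Fr(1, 2)]]
states = range(3)
beta = Fr(1, 2) * max(norm([a - b for a, b in zip(P[x], P[y])])
                      for x in states for y in states)

# Difference of two hypothetical probability vectors (total mass zero).
mu1 = [Fr(3, 4), Fr(1, 12), Fr(1, 6)]
mu2 = [Fr(1, 4), Fr(1, 4), Fr(1, 2)]
mu = [a - b for a, b in zip(mu1, mu2)]

mu_P = apply_kernel(mu, P)
contracts = norm(mu_P) <= beta * norm(mu)
```

In this example `beta` is 1/4, the norm of `mu` is 1 and the norm of `mu_P` is 1/4, so the inequality holds with equality; the factor β cannot be improved in general.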

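Applying this inequality, (5) and our induction hypothesis, we finally get

$$\begin{aligned}
\|P^{n+1}(s, \cdot) - Q^{n+1}(z, \cdot)\| &\le \varepsilon + \|P^n(s, \cdot) - Q^n(z, \cdot)\| \cdot \beta \\
&\le \varepsilon + \beta(\varepsilon + \varepsilon\beta + \dots + \varepsilon\beta^{n-1} + 2\beta^n) \\
&= \varepsilon(1 + \beta + \dots + \beta^n) + 2\beta^{n+1},
\end{aligned}$$

which we wanted to prove.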
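The induction amounts to unrolling the recursion a_1 = ε + 2β, a_{n+1} = ε + β a_n. The sketch below compares the recursion with the closed form on the right-hand side of (1) for illustrative values of ε and β; the function names are hypothetical.

```python
from fractions import Fraction as Fr

def closed_form(eps, beta, n):
    """eps * (1 + beta + ... + beta^(n-1)) + 2 * beta^n."""
    return eps * sum(beta**k for k in range(n)) + 2 * beta**n

def by_recursion(eps, beta, n):
    """a_1 = eps + 2 * beta, a_(k+1) = eps + beta * a_k."""
    a = eps + 2 * beta
    for _ in range(n - 1):
        a = eps + beta * a
    return a

eps, beta = Fr(1, 5), Fr(1, 4)
agree = all(closed_form(eps, beta, n) == by_recursion(eps, beta, n)
            for n in range(1, 11))
```

The two computations agree exactly for every n tested, which mirrors the last equality in the proof.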

References

[1] R. B. Ash, Real Analysis and Probability, Academic Press, New York, 1972.

[2] J. P. Georgin, Contrôle de chaînes de Markov sur des espaces arbitraires, Ann. Inst. H. Poincaré Sér. B 14 (1978), 255–277.

[3] O. Hernandez-Lerma, Adaptive Markov Control Processes, Springer, New York, 1989.

[4] N. W. Kartashov, Criteria for uniform ergodicity and strong stability of Markov chains in general state space, Probab. Theory Math. Statist. 30 (1984), 65–81.

[5] G. B. Di Masi and L. Stettner, Bayesian ergodic adaptive control of discrete time Markov processes, Stochastics and Stochastics Reports 54 (1995), 301–316.

[6] A. S. Nowak and E. Altman, ε-Nash equilibria in stochastic games with uncountable state space and unbounded cost, Technical Report, Institute of Mathematics, Wrocław University of Technology, 1998.

[7] W. J. Runggaldier and L. Stettner, Approximations of Discrete Time Partially Observed Control Problems, Appl. Math. Monographs 6, C.N.R., Pisa, 1994.

[8] L. Stettner, On nearly self-optimizing strategies for a discrete-time uniformly ergodic adaptive model, Appl. Math. Optim. 27 (1993), 161–177.

[9] T. Ueno, Some limit theorems for temporally discrete Markov processes, J. Fac. Sci. Univ. Tokyo 7 (1957), 449–462.

Andrzej S. Nowak
Institute of Mathematics
Wrocław University of Technology
Wybrzeże Wyspiańskiego 27
50-370 Wrocław, Poland
E-mail: nowak@im.pwr.wroc.pl

Received on 9.12.1996; revised version on 15.12.1997
