
A. S. NOWAK (Wrocław)

A GENERALIZATION OF UENO’S INEQUALITY

FOR n-STEP TRANSITION PROBABILITIES

Abstract. We provide a generalization of Ueno's inequality for n-step transition probabilities of Markov chains in a general state space. Our result is relevant to the study of adaptive control problems and approximation problems in the theory of discrete-time Markov decision processes and stochastic games.

Let $(S,\mathcal{F})$ be a measurable space and let $P$ and $Q$ be transition probabilities from $S$ into $S$. The composition of $P$ and $Q$, denoted by $PQ$, is the transition probability defined by

$$PQ(s,B) = \int_S Q(z,B)\,P(s,dz),$$

where $s \in S$, $B \in \mathcal{F}$. For any integer $n \ge 2$, we write $Q^n$ to denote the $n$-step transition probability $QQ^{n-1}$ from $S$ into $S$, induced by $Q^1 = Q$.
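To make the definition concrete, here is a minimal sketch that assumes a finite state space $S = \{0, \dots, k-1\}$ with $\mathcal{F}$ the family of all subsets (an illustrative assumption; the paper works on a general measurable space). A transition probability is then a row-stochastic matrix, $PQ$ is the matrix product, and $Q^n = QQ^{n-1}$ is a matrix power. The helper names `compose` and `power` and the example matrices are hypothetical.

```python
# Illustrative sketch, assuming a finite state space S = {0, ..., k-1} with F = all subsets.
# A transition probability is stored as a row-stochastic matrix: P[s][z] = P(s, {z}).

def compose(P, Q):
    """Return PQ, where PQ(s, {b}) = sum over z of Q(z, {b}) * P(s, {z})."""
    k = len(P)
    return [[sum(P[s][z] * Q[z][b] for z in range(k)) for b in range(k)]
            for s in range(k)]


def power(Q, n):
    """Return Q^n = Q Q^(n-1), starting from Q^1 = Q (requires n >= 1)."""
    result = Q
    for _ in range(n - 1):
        result = compose(Q, result)
    return result


P = [[0.9, 0.1], [0.2, 0.8]]
Q = [[0.5, 0.5], [0.5, 0.5]]
PQ = compose(P, Q)   # every row equals (0.5, 0.5) up to rounding
P3 = power(P, 3)     # three-step transition matrix of P
```

Because composition is associative, `compose(P3, P)` and `power(P, 4)` agree, which is the identity $P^nP = P^{n+1}$ used later in the proof.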

By $\|\cdot\|$ we denote the total variation norm in the vector space of all finite signed measures on $(S,\mathcal{F})$. Recall that if $\mu_1$ and $\mu_2$ are probability measures on $(S,\mathcal{F})$, then

$$\|\mu_1 - \mu_2\| = 2\sup_{B\in\mathcal{F}} |\mu_1(B) - \mu_2(B)|.$$
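On a finite set (again an assumption made only for illustration), the total variation norm of a signed measure is the sum of the absolute values of its point masses, and the identity above can be checked by brute force over all events $B$. The function names below are hypothetical; the subset enumeration is exponential in the number of states and is meant only for tiny examples.

```python
from itertools import combinations


def tv_norm(mu1, mu2):
    """Total variation norm of mu1 - mu2 on a finite set: the sum of absolute masses."""
    return sum(abs(a - b) for a, b in zip(mu1, mu2))


def twice_sup_over_events(mu1, mu2):
    """2 * max over all events B of |mu1(B) - mu2(B)|, by brute force over subsets."""
    k = len(mu1)
    best = 0.0
    for r in range(k + 1):
        for B in combinations(range(k), r):
            best = max(best, abs(sum(mu1[i] - mu2[i] for i in B)))
    return 2 * best


mu1 = [0.2, 0.5, 0.3]
mu2 = [0.4, 0.4, 0.2]
print(tv_norm(mu1, mu2), twice_sup_over_events(mu1, mu2))  # both approximately 0.4
```

The identity relies on $\mu_1(S) = \mu_2(S) = 1$: the supremum is attained at the set where $\mu_1 - \mu_2$ is positive, and that positive part carries exactly half of the total absolute mass.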

In the sequel, we prove the following result.

Theorem. Let $P$ and $Q$ be transition probabilities from $S$ into $S$ and let

$$\varepsilon = \sup_{s\in S} \|P(s,\cdot) - Q(s,\cdot)\|.$$

1991 Mathematics Subject Classification: Primary 60J10, 60J35; Secondary 93C40, 93E20.

Key words and phrases: Markov chains, transition probabilities, adaptive control, stochastic control.


Then for $s, z \in S$ and $n \ge 1$ we have

$$\|P^n(s,\cdot) - Q^n(z,\cdot)\| \le \varepsilon(1 + \beta + \dots + \beta^{n-1}) + 2\beta^n, \tag{1}$$

where

$$\beta = \frac{1}{2}\sup_{x,y\in S} \|P(x,\cdot) - P(y,\cdot)\|. \tag{2}$$
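As a numerical sanity check of (1), the sketch below assumes a finite state space (the theorem itself concerns a general measurable space), draws a random strictly positive stochastic matrix $P$, forms a hypothetical perturbation $Q = 0.9P + 0.1R$, computes $\varepsilon$ and $\beta$ from their definitions, and tests (1) for $n \le 10$ and all $s, z$. All helper names are introduced here for illustration; a check of this kind illustrates the inequality but does not prove it.

```python
import random


def compose(P, Q):
    """PQ(s, {b}) = sum_z Q(z, {b}) P(s, {z}) on a finite state space."""
    k = len(P)
    return [[sum(P[s][z] * Q[z][b] for z in range(k)) for b in range(k)] for s in range(k)]


def tv(mu, nu):
    """Total variation norm ||mu - nu|| on a finite set."""
    return sum(abs(a - b) for a, b in zip(mu, nu))


def random_stochastic(k, rng):
    """A random row-stochastic k x k matrix with strictly positive entries."""
    rows = []
    for _ in range(k):
        w = [rng.random() + 0.1 for _ in range(k)]
        total = sum(w)
        rows.append([x / total for x in w])
    return rows


def eps_and_beta(P, Q):
    """epsilon = sup_s ||P(s,.) - Q(s,.)|| and beta as defined in (2)."""
    k = len(P)
    eps = max(tv(P[s], Q[s]) for s in range(k))
    beta = 0.5 * max(tv(P[x], P[y]) for x in range(k) for y in range(k))
    return eps, beta


def satisfies_inequality_1(P, Q, n_max, tol=1e-12):
    """Check (1) for all s, z in S and all n = 1, ..., n_max."""
    eps, beta = eps_and_beta(P, Q)
    k = len(P)
    Pn, Qn = P, Q
    for n in range(1, n_max + 1):
        bound = eps * sum(beta ** j for j in range(n)) + 2 * beta ** n
        lhs = max(tv(Pn[s], Qn[z]) for s in range(k) for z in range(k))
        if lhs > bound + tol:
            return False
        Pn, Qn = compose(P, Pn), compose(Q, Qn)
    return True


rng = random.Random(0)
P = random_stochastic(4, rng)
noise = random_stochastic(4, rng)
Q = [[0.9 * p + 0.1 * r for p, r in zip(prow, rrow)] for prow, rrow in zip(P, noise)]
print(eps_and_beta(P, Q), satisfies_inequality_1(P, Q, n_max=10))
```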

Remark 1. If ε = 0, then (1) is exactly Ueno’s inequality [9].

Corollary 1. If $\beta < 1$, then (1) implies that for $n$ sufficiently large we have

$$\|P^n(s,\cdot) - Q^n(z,\cdot)\| \le \frac{2\varepsilon}{1-\beta} \quad \text{for each } s, z \in S.$$
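The passage from (1) to Corollary 1 is a geometric-series estimate. Written out under the assumptions $\varepsilon > 0$ and $0 < \beta < 1$ (the explicit threshold on $n$ is one possible choice, not taken from the paper), it reads:

```latex
\begin{aligned}
\varepsilon(1+\beta+\dots+\beta^{n-1}) + 2\beta^n
  &= \varepsilon\,\frac{1-\beta^n}{1-\beta} + 2\beta^n
   \le \frac{\varepsilon}{1-\beta} + 2\beta^n, \\
2\beta^n \le \frac{\varepsilon}{1-\beta}
  &\iff n \ge \frac{\log\bigl(\varepsilon / (2(1-\beta))\bigr)}{\log \beta},
\end{aligned}
```

so for every such $n$ the right-hand side of (1) is at most $2\varepsilon/(1-\beta)$. If $\beta = 0$, the bound holds already for all $n \ge 1$.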

Suppose that $S$ is the state space for Markov chains having transition probabilities $P$ and $Q$, respectively. If there exists a probability measure $\pi_P$ on $(S,\mathcal{F})$ such that

$$\sup_{s\in S} \|P^n(s,\cdot) - \pi_P(\cdot)\| \to 0 \quad \text{as } n \to \infty$$

at a geometric rate, then the Markov chain with transition probability $P$ is called uniformly ergodic, and $\pi_P$ is the unique invariant probability measure for $P$.

Corollary 2. Let $\pi_P$ and $\pi_Q$ be the invariant probability measures for $P$ and $Q$, respectively. Assume that the Markov chains with transition probabilities $P$ and $Q$ are uniformly ergodic. If $\beta < 1$, then

$$\|\pi_P - \pi_Q\| \le \frac{\varepsilon}{1-\beta}.$$
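Corollary 2 follows from (1) by letting $n \to \infty$, since $P^n(s,\cdot) \to \pi_P$ and $Q^n(z,\cdot) \to \pi_Q$ in norm. The sketch below illustrates the bound on a finite state space (an assumption), with a random strictly positive $P$ and a hypothetical perturbation $Q = 0.95P + 0.05R$. The invariant measures are approximated by iterating $\mu \mapsto \mu P$ a fixed number of times, which is a convenience choice for the illustration and not a method from the paper.

```python
import random


def tv(mu, nu):
    """Total variation norm ||mu - nu|| on a finite set."""
    return sum(abs(a - b) for a, b in zip(mu, nu))


def random_stochastic(k, rng):
    """A random row-stochastic k x k matrix with strictly positive entries."""
    rows = []
    for _ in range(k):
        w = [rng.random() + 0.1 for _ in range(k)]
        total = sum(w)
        rows.append([x / total for x in w])
    return rows


def eps_and_beta(P, Q):
    """epsilon = sup_s ||P(s,.) - Q(s,.)|| and beta = (1/2) sup_{x,y} ||P(x,.) - P(y,.)||."""
    k = len(P)
    eps = max(tv(P[s], Q[s]) for s in range(k))
    beta = 0.5 * max(tv(P[x], P[y]) for x in range(k) for y in range(k))
    return eps, beta


def invariant(P, iters=2000):
    """Approximate the invariant probability measure by iterating mu <- mu P."""
    k = len(P)
    mu = [1.0 / k] * k
    for _ in range(iters):
        mu = [sum(mu[x] * P[x][b] for x in range(k)) for b in range(k)]
    return mu


rng = random.Random(7)
P = random_stochastic(4, rng)
R = random_stochastic(4, rng)
Q = [[0.95 * p + 0.05 * r for p, r in zip(prow, rrow)] for prow, rrow in zip(P, R)]
eps, beta = eps_and_beta(P, Q)
pi_P, pi_Q = invariant(P), invariant(Q)
print(tv(pi_P, pi_Q), eps / (1 - beta))
```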

It is well known that the Markov chain with transition probability $T$ is uniformly ergodic if and only if there exist a constant $c \in (0,1)$ and a positive integer $m$ such that

$$\|T^m(s,\cdot) - T^m(z,\cdot)\| \le 2c \quad \text{for every } s, z \in S. \tag{3}$$

For a proof see, e.g., [2].
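On a finite state space (an assumption), condition (3) can be tested by computing $\tfrac12 \sup_{s,z}\|T^m(s,\cdot) - T^m(z,\cdot)\|$ for $m = 1, 2, \dots$ and stopping at the first value below $1$. The search cutoff `m_max`, the helper names, and the example chain are hypothetical; the example is an aperiodic three-state chain for which $T$ and $T^2$ still have two rows with disjoint supports, so the smallest admissible power is $m = 3$.

```python
def compose(P, Q):
    """PQ(s, {b}) = sum_z Q(z, {b}) P(s, {z}) on a finite state space."""
    k = len(P)
    return [[sum(P[s][z] * Q[z][b] for z in range(k)) for b in range(k)] for s in range(k)]


def tv(mu, nu):
    """Total variation norm ||mu - nu|| on a finite set."""
    return sum(abs(a - b) for a, b in zip(mu, nu))


def half_sup_distance(P):
    """(1/2) sup_{x,y} ||P(x,.) - P(y,.)||, the quantity that (2) assigns to P."""
    k = len(P)
    return 0.5 * max(tv(P[x], P[y]) for x in range(k) for y in range(k))


def smallest_m_for_criterion_3(T, m_max=50):
    """First m <= m_max with sup ||T^m(s,.) - T^m(z,.)|| <= 2c for some c < 1.

    Returns (m, c) with the smallest such c, or None if no m <= m_max works.
    """
    Tm = T
    for m in range(1, m_max + 1):
        c = half_sup_distance(Tm)
        if c < 1:
            return m, c
        Tm = compose(T, Tm)  # T T^m = T^(m+1)
    return None


# Aperiodic 3-state chain: 0 -> 1 -> 2 -> {0, 1}
T = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.5, 0.5, 0.0]]
print(smallest_m_for_criterion_3(T))  # (3, 0.5)
```

For a periodic chain such as the two-state swap, every power has rows with disjoint supports, and the search returns `None`.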

Put $P = T^m$ and fix a transition probability $Q$. Define

$$\varepsilon = \sup_{s\in S} \|T^m(s,\cdot) - Q(s,\cdot)\|.$$

Assume that (3) holds and consider $\beta$ defined by (2). Then $\beta \le c < 1$, and using Corollary 1 (with $P^n = T^{mn}$), we infer that for $n$ sufficiently large we have

$$\|Q^n(s,\cdot) - Q^n(z,\cdot)\| \le \|Q^n(s,\cdot) - T^{mn}(s,\cdot)\| + \|T^{mn}(s,\cdot) - Q^n(z,\cdot)\| \le \frac{4\varepsilon}{1-\beta}.$$

This enables us to state the following result.


Corollary 3. If (3) holds and $2\varepsilon/(1-\beta) < 1$, then the Markov chain with transition probability $Q$ is also uniformly ergodic. Moreover,

$$\|\pi_T - \pi_Q\| \le \frac{\varepsilon}{1-\beta}, \tag{4}$$

where $\pi_T$ ($\pi_Q$) is the unique invariant probability measure for the transition probability $T$ ($Q$).
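A small numerical illustration of Corollary 3, under stated assumptions: a finite state space, the made-up aperiodic chain $T$ below (for which (3) holds with $m = 3$ and $c = 1/2$), and a hypothetical perturbation $Q = 0.9\,T^3 + 0.1\,U$ of $P = T^3$ toward the uniform kernel $U$. Invariant measures are approximated by iterating $\mu \mapsto \mu T$ (resp. $\mu \mapsto \mu Q$), a convenience choice for this sketch only.

```python
def compose(P, Q):
    """PQ(s, {b}) = sum_z Q(z, {b}) P(s, {z}) on a finite state space."""
    k = len(P)
    return [[sum(P[s][z] * Q[z][b] for z in range(k)) for b in range(k)] for s in range(k)]


def tv(mu, nu):
    """Total variation norm ||mu - nu|| on a finite set."""
    return sum(abs(a - b) for a, b in zip(mu, nu))


def half_sup_distance(P):
    """(1/2) sup_{x,y} ||P(x,.) - P(y,.)||, i.e. beta of (2) for the kernel P."""
    k = len(P)
    return 0.5 * max(tv(P[x], P[y]) for x in range(k) for y in range(k))


def invariant(P, iters=2000):
    """Approximate the invariant probability measure by iterating mu <- mu P."""
    k = len(P)
    mu = [1.0 / k] * k
    for _ in range(iters):
        mu = [sum(mu[x] * P[x][b] for x in range(k)) for b in range(k)]
    return mu


T = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]
P = compose(T, compose(T, T))                    # P = T^m with m = 3
U = [[1 / 3] * 3 for _ in range(3)]              # hypothetical uniform kernel
t = 0.1                                          # hypothetical perturbation size
Q = [[(1 - t) * p + t * u for p, u in zip(prow, urow)] for prow, urow in zip(P, U)]

eps = max(tv(P[s], Q[s]) for s in range(3))      # epsilon as defined for T^m and Q
beta = half_sup_distance(P)                      # beta from (2) with P = T^3
pi_T, pi_Q = invariant(T), invariant(Q)
print(2 * eps / (1 - beta) < 1, tv(pi_T, pi_Q), eps / (1 - beta))
```

Here $\varepsilon = 1/15$ and $\beta = 1/2$, so $2\varepsilon/(1-\beta) = 4/15 < 1$, and the computed distance between the invariant measures stays below $\varepsilon/(1-\beta) = 2/15$, as (4) predicts.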

Remark 2. Our main result and Corollaries 1–3 may have applications to approximation problems or adaptive control problems as studied in [3], [5], [6], [7] and [8]. A result closely related to Corollary 2 was proved by Stettner in [8], but our inequality (4) has a more elementary form. Also, our proof is quite elementary, while the method of proof in [8] is based on the theory of bounded transition operators considered in [4]. However, Stettner's proof [8] can be used for studying some uniform convergence problems of n-step transition probabilities in different norms on the state space [6].

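Proof of Theorem. We proceed by induction on $n$. For $n = 1$, (1) follows from the triangle inequality: $\|P(s,\cdot) - Q(z,\cdot)\| \le \|P(s,\cdot) - P(z,\cdot)\| + \|P(z,\cdot) - Q(z,\cdot)\| \le 2\beta + \varepsilon$. Suppose that (1) holds for a positive integer $n$. Note that

$$\begin{aligned}
\|P^{n+1}(s,\cdot) - Q^{n+1}(z,\cdot)\| &= \|P^nP(s,\cdot) - Q^nQ(z,\cdot)\| \\
&\le \|P^nP(s,\cdot) - Q^nP(z,\cdot)\| + \|Q^nP(z,\cdot) - Q^nQ(z,\cdot)\| \\
&\le \|P^nP(s,\cdot) - Q^nP(z,\cdot)\| + \varepsilon,
\end{aligned} \tag{5}$$

where the last step uses $\|Q^nP(z,\cdot) - Q^nQ(z,\cdot)\| \le \int_S \|P(x,\cdot) - Q(x,\cdot)\|\,Q^n(z,dx) \le \varepsilon$.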
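The last step of (5) says that averaging the kernels $P$ and $Q$ against any probability measure $\nu$ (here $\nu = Q^n(z,\cdot)$) cannot increase their distance beyond $\varepsilon$. The sketch below checks this on a finite state space (an assumption) for randomly drawn $\nu$, and also confirms that point masses $\nu = \delta_s$ recover $\varepsilon$ exactly. The helper `push` and the random example are hypothetical.

```python
import random


def tv(mu, nu):
    """Total variation norm ||mu - nu|| on a finite set."""
    return sum(abs(a - b) for a, b in zip(mu, nu))


def random_stochastic(k, rng):
    """A random row-stochastic k x k matrix with strictly positive entries."""
    rows = []
    for _ in range(k):
        w = [rng.random() + 0.1 for _ in range(k)]
        total = sum(w)
        rows.append([x / total for x in w])
    return rows


def push(nu, P):
    """(nu P)(b) = sum_x nu(x) P(x, {b}); for nu = Q^n(z, .) this is Q^n P(z, .)."""
    k = len(P)
    return [sum(nu[x] * P[x][b] for x in range(k)) for b in range(k)]


rng = random.Random(3)
k = 5
P, Q = random_stochastic(k, rng), random_stochastic(k, rng)
eps = max(tv(P[s], Q[s]) for s in range(k))

worst = 0.0
for _ in range(200):
    w = [rng.random() for _ in range(k)]
    total = sum(w)
    nu = [x / total for x in w]                  # a random probability measure on S
    worst = max(worst, tv(push(nu, P), push(nu, Q)))
print(worst, eps)
```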

Moreover, we have

$$\|P^nP(s,\cdot) - Q^nP(z,\cdot)\| = 2\sup_{B\in\mathcal{F}} |L(B)|, \tag{6}$$

where

$$L(B) = \int_S P(x,B)\,\mu(s,z)(dx) \quad \text{for any } B \in \mathcal{F}, \qquad \mu(s,z)(\cdot) = P^n(s,\cdot) - Q^n(z,\cdot).$$

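Define

$$\varphi(x) = P(x,B) - \inf_{y\in S} P(y,B).$$

Note that $\varphi \ge 0$ on $S$ and, since $\mu(s,z)(S) = 1 - 1 = 0$, subtracting the constant $\inf_{y\in S} P(y,B)$ does not change the integral, so that

$$L(B) = \int_S \varphi(x)\,\mu(s,z)(dx).$$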

Fix $B \in \mathcal{F}$. Without loss of generality, we can assume that $|L(B)| = L(B)$ (otherwise, use $-\mu(s,z)(dx)$ instead of $\mu(s,z)(dx)$). By the Hahn decomposition theorem [1], there exists a set $D \in \mathcal{F}$ such that

$$\mu(s,z)(E) \ge 0 \quad \text{for all } E \in \mathcal{F},\ E \subset D,$$

$$\mu(s,z)(E) \le 0 \quad \text{for all } E \in \mathcal{F},\ E \subset S \setminus D.$$


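Note that

$$\begin{aligned}
|L(B)| = L(B) &= \int_D \varphi(x)\,\mu(s,z)(dx) + \int_{S\setminus D} \varphi(x)\,\mu(s,z)(dx) \\
&\le \int_D \varphi(x)\,\mu(s,z)(dx) \le \mu(s,z)(D)\,\sup_{x\in S} \varphi(x) \\
&= \frac{1}{2}\,\mu(s,z)(D)\,\sup_{x,y\in S} 2|P(x,B) - P(y,B)|,
\end{aligned}$$

where the first inequality holds because $\varphi \ge 0$ and $\mu(s,z)$ is nonpositive on subsets of $S \setminus D$, and the second because $\mu(s,z)$ is nonnegative on subsets of $D$.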

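Hence, since $2|P(x,B) - P(y,B)| \le \|P(x,\cdot) - P(y,\cdot)\|$ for all $x, y \in S$,

$$L(B) \le \mu(s,z)(D) \cdot \frac{1}{2}\sup_{x,y\in S} \|P(x,\cdot) - P(y,\cdot)\| = \mu(s,z)(D) \cdot \beta. \tag{7}$$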

But

$$\mu(s,z)(D) = P^n(s,D) - Q^n(z,D) \le \frac{1}{2}\cdot 2\sup_{F\in\mathcal{F}} |P^n(s,F) - Q^n(z,F)| = \frac{1}{2}\|P^n(s,\cdot) - Q^n(z,\cdot)\|.$$

This and (7) imply that

$$|L(B)| = L(B) \le \frac{1}{2}\|P^n(s,\cdot) - Q^n(z,\cdot)\| \cdot \beta. \tag{8}$$

By (6) and (8) we obtain

$$\|P^nP(s,\cdot) - Q^nP(z,\cdot)\| \le \|P^n(s,\cdot) - Q^n(z,\cdot)\| \cdot \beta.$$
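The inequality just obtained says that $P$ contracts every signed measure of the form $\mu = \nu_1 - \nu_2$ (two probability measures, so $\mu(S) = 0$) by the factor $\beta$; this $\beta$ is often called Dobrushin's ergodicity coefficient, a name the paper does not use. The sketch below checks the contraction on a finite state space (an assumption) for random pairs $\nu_1, \nu_2$, and notes that the factor is attained by $\mu = \delta_x - \delta_y$ at a maximizing pair. Helper names are hypothetical.

```python
import random


def l1(v):
    """Total variation norm of a signed measure on a finite set."""
    return sum(abs(x) for x in v)


def random_prob(k, rng):
    """A random probability vector with strictly positive entries."""
    w = [rng.random() + 1e-9 for _ in range(k)]
    total = sum(w)
    return [x / total for x in w]


def random_stochastic(k, rng):
    """A random row-stochastic k x k matrix (rows are random probability vectors)."""
    return [random_prob(k, rng) for _ in range(k)]


def push(mu, P):
    """(mu P)(B) = integral of P(x, B) mu(dx), evaluated on singletons B = {b}."""
    k = len(P)
    return [sum(mu[x] * P[x][b] for x in range(k)) for b in range(k)]


def half_sup_distance(P):
    """beta = (1/2) sup_{x,y} ||P(x,.) - P(y,.)||."""
    k = len(P)
    return 0.5 * max(l1([P[x][b] - P[y][b] for b in range(k)]) for x in range(k) for y in range(k))


rng = random.Random(11)
k = 4
P = random_stochastic(k, rng)
beta = half_sup_distance(P)
ratios = []
for _ in range(500):
    p, q = random_prob(k, rng), random_prob(k, rng)
    mu = [a - b for a, b in zip(p, q)]           # mu(S) = 0, like mu(s, z) in the proof
    ratios.append(l1(push(mu, P)) / l1(mu))
print(max(ratios), beta)
```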

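Applying this inequality, (5) and our induction hypothesis, we finally get

$$\begin{aligned}
\|P^{n+1}(s,\cdot) - Q^{n+1}(z,\cdot)\| &\le \varepsilon + \|P^n(s,\cdot) - Q^n(z,\cdot)\| \cdot \beta \\
&\le \varepsilon + \beta(\varepsilon + \varepsilon\beta + \dots + \varepsilon\beta^{n-1} + 2\beta^n) \\
&= \varepsilon(1 + \beta + \dots + \beta^n) + 2\beta^{n+1},
\end{aligned}$$

which we wanted to prove.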
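The induction amounts to the scalar recursion $a_1 = \varepsilon + 2\beta$, $a_{n+1} = \varepsilon + \beta a_n$, whose solution is the right-hand side of (1). The sketch below unrolls the recursion and compares it with the closed form for a few illustrative (hypothetical) values of $\varepsilon$ and $\beta$.

```python
def bound_by_recursion(eps, beta, n):
    """Unroll the induction: a_1 = eps + 2*beta and a_{n+1} = eps + beta * a_n."""
    a = eps + 2 * beta
    for _ in range(n - 1):
        a = eps + beta * a
    return a


def bound_closed_form(eps, beta, n):
    """Right-hand side of (1): eps * (1 + beta + ... + beta^(n-1)) + 2 * beta^n."""
    return eps * sum(beta ** j for j in range(n)) + 2 * beta ** n


for eps, beta in [(0.1, 0.5), (0.0, 0.9), (0.3, 0.0)]:
    print([round(bound_by_recursion(eps, beta, n), 6) for n in range(1, 5)])
```

With $\varepsilon = 0$ both expressions reduce to $2\beta^n$, which is Ueno's bound.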

References

[1] R. B. Ash, Real Analysis and Probability, Academic Press, New York, 1972.

[2] J. P. Georgin, Contrôle de chaînes de Markov sur des espaces arbitraires, Ann. Inst. H. Poincaré Sér. B 14 (1978), 255–277.

[3] O. Hernández-Lerma, Adaptive Markov Control Processes, Springer, New York, 1989.

[4] N. W. Kartashov, Criteria for uniform ergodicity and strong stability of Markov chains in general state space, Probab. Theory Math. Statist. 30 (1984), 65–81.

[5] G. B. Di Masi and L. Stettner, Bayesian ergodic adaptive control of discrete time Markov processes, Stochastics and Stochastics Reports 54 (1995), 301–316.

[6] A. S. Nowak and E. Altman, ε-Nash equilibria in stochastic games with uncountable state space and unbounded cost, Technical Report, Institute of Mathematics, Wrocław University of Technology, 1998.


[7] W. J. Runggaldier and L. Stettner, Approximations of Discrete Time Partially Observed Control Problems, Appl. Math. Monographs 6, C.N.R., Pisa, 1994.

[8] L. Stettner, On nearly self-optimizing strategies for a discrete-time uniformly ergodic adaptive model, Appl. Math. Optim. 27 (1993), 161–177.

[9] T. Ueno, Some limit theorems for temporally discrete Markov processes, J. Fac. Sci. Univ. Tokyo 7 (1957), 449–462.

Andrzej S. Nowak
Institute of Mathematics
Wrocław University of Technology
Wybrzeże Wyspiańskiego 27
50-370 Wrocław, Poland
E-mail: nowak@im.pwr.wroc.pl

Received on 9.12.1996;

revised version on 15.12.1997
