T. E. DUNCAN (Lawrence, KS)

B. PASIK-DUNCAN (Lawrence, KS)

Ł. STETTNER (Warszawa)

ADAPTIVE CONTROL OF DISCRETE TIME MARKOV PROCESSES BY THE LARGE DEVIATIONS METHOD

Abstract. Some discrete time controlled Markov processes in a locally compact metric space whose transition operators depend on an unknown parameter are described. The adaptive controls are constructed using the large deviations of empirical distributions which are uniform in the parameter that takes values in a compact set. The adaptive procedure uses a finite family of continuous, almost optimal controls. Using the large deviations property it is shown that an adaptive control which is a fixed almost optimal control after a finite time is almost optimal with probability nearly 1.

2000 Mathematics Subject Classification: 93E20, 93E10, 93C40.

Key words and phrases: adaptive control, discrete time controlled Markov processes, large deviations.

Research supported in part by NSF Grant DMS 9623439 and KBN Grant 2 P03A 05309.

0. Introduction. Consider a controlled Markov process $(x_n, n\in\mathbb{N})$ on a probability space $(\Omega,\mathcal{F},P)$ taking values in a locally compact metric space $(E,\varrho_E)$ with the transition operator $P^{\alpha^0,v_n}(x_n,\cdot)$ at time $n$. The quantity $\alpha^0$ is an unknown parameter that is an element of a compact metric space $(A,\varrho_A)$, and the term $v_n$ that is the control is a $\sigma(x_0,\dots,x_n)$-adapted random variable with values in a compact metric space $(U,\varrho_U)$.

Let $c: E\times U\to\mathbb{R}_+$ be a continuous bounded function and let

$$J^{\alpha^0}((v_n, n\in\mathbb{N})) = \limsup_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} c(x_i,v_i). \tag{1}$$

The control problem is to minimize $J^{\alpha^0}$ over the admissible strategies $(v_n, n\in\mathbb{N})$ where $v_n$ is a $U$-valued, $\sigma(x_0,\dots,x_n)$-adapted random variable. If $\alpha^0$ is known and some ergodic assumptions are satisfied, then the family of admissible strategies can be restricted to Markovian ones, that is, $v_n = u(x_n)$, where $u\in\mathcal{A} = B(E,U)$, the family of Borel measurable functions from $E$ to $U$. Thus $J^{\alpha^0}((u(x_n), n\in\mathbb{N}))$ is minimized over $u\in\mathcal{A}$.

If $\alpha^0$ is unknown then an estimate of $\alpha^0$ is made for each $n\in\mathbb{N}$ using the state $x_n$, and a control $u\in\mathcal{A}$ is chosen that is almost optimal for the current value of the estimate of $\alpha^0$. While such a strategy can be shown to be almost self-optimizing (cf., e.g., [5, 9]), the procedure requires an estimate for each $n\in\mathbb{N}$ and a choice of a control. In the approach in this paper an adaptive control is fixed after a finite time and it is shown to be almost optimal with probability nearly 1. This approach uses the results on the large deviations of empirical distributions which are uniform in the parameter. These large deviation results are described in Section 2. In Section 3 it is shown that an adaptive control can be taken to be fixed after a finite time and almost optimal. In Section 4 two Markov models are given for which the results in Sections 2 and 3 can be applied.

Throughout the paper we denote by:

$C(E)$ the space of continuous bounded functions on $E$,

$\mathcal{P}(E)$ the space of all probability measures on $E$ endowed with the weak convergence topology and with the set of Borel subsets $\mathcal{B}(\mathcal{P}(E))$,

$\mathcal{B}(E)$ the set of Borel subsets of $E$.

2. Uniform large deviations of empirical distributions. In this section we consider an uncontrolled Markov process $(x_n, n\in\mathbb{N})$ with transition operators $P^{\alpha}(x_n,\cdot)$ where $\alpha\in A$. The following assumptions are made:

(B1) For $f\in C(E)$ the mapping

$$A\times E\ni(\alpha,x)\mapsto P^{\alpha}f(x) := \int_E f(y)\,P^{\alpha}(x,dy)$$

is continuous.

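For $B\in\mathcal{B}(E)$ and $C\in\mathcal{B}(\mathcal{P}(E))$, the empirical measure and its probability (see [2] and [3]) are defined as follows:

$$S_n(B) = \frac{1}{n}\sum_{i=0}^{n-1}\chi_B(x_i), \tag{2}$$

$$Q^{\alpha}_{n,x}(C) = P^{\alpha}_x\{S_n\in C\}, \tag{3}$$

where $P^{\alpha}_x$ stands for the conditional probability measure given that $(x_n, n\in\mathbb{N})$ starts from $x$ and the true parameter is $\alpha$. Furthermore let

$$\Gamma := \{f\in C(E): \exists a>0\ \forall x\in E\ f(x)\ge a\} \tag{4}$$

and for $\mu\in\mathcal{P}(E)$, define

$$I^{\alpha}(\mu) = \sup_{f\in\Gamma}\int_E \log\frac{f(x)}{P^{\alpha}f(x)}\,\mu(dx). \tag{5}$$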
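As an illustration of the definitions (2) and (5), the following is a minimal sketch for a finite state space, where measures are probability vectors and the transition operator is a matrix. The two-state matrix `P`, the helper names and the seed are hypothetical choices made for this example only; they are not taken from the paper.

```python
import math
import random

def simulate_chain(P, x0, n, rng):
    """Return x_0, ..., x_{n-1} for a finite-state chain with transition matrix P."""
    path = [x0]
    for _ in range(n - 1):
        row = P[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path

def empirical_measure(path, num_states):
    """S_n as a probability vector: S_n({y}) = (1/n) * #{i < n : x_i = y}."""
    n = len(path)
    counts = [0] * num_states
    for x in path:
        counts[x] += 1
    return [c / n for c in counts]

def dv_functional(mu, P, f):
    """Integral of log(f / Pf) with respect to mu, for a positive function f.

    The rate functional in (5) is the supremum of this quantity over f.
    """
    total = 0.0
    for x, weight in enumerate(mu):
        Pf_x = sum(P[x][y] * f[y] for y in range(len(f)))
        total += weight * math.log(f[x] / Pf_x)
    return total

# Hypothetical two-state example.
P = [[0.9, 0.1], [0.2, 0.8]]
rng = random.Random(0)
path = simulate_chain(P, 0, 1000, rng)
S_n = empirical_measure(path, 2)
```

For the invariant measure of `P` the quantity returned by `dv_functional` is nonpositive for every positive `f` (by Jensen's inequality), which is consistent with the rate functional vanishing there.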

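Theorem 1. If (B1) is satisfied then for compact subsets $C$ and $A_1$ of $\mathcal{P}(E)$ and $A$, respectively, we have

$$\limsup_{n\to\infty} n^{-1}\sup_{\alpha\in A_1}\sup_{x\in E}\log Q^{\alpha}_{n,x}(C) \le -\inf_{\alpha\in A_1}\inf_{\mu\in C} I^{\alpha}(\mu) \tag{6}$$

where $Q^{\alpha}_{n,x}$ is given by (3) and $I^{\alpha}$ is given by (5).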

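Proof. For $d>1$ let

$$\Gamma_d = \Big\{f\in\Gamma: \sup_{x\in E} f(x)\le d\inf_{x\in E} f(x)\Big\}$$

and for $\mu\in\mathcal{P}(E)$ let

$$I^{\alpha}_d(\mu) = \sup_{f\in\Gamma_d}\int_E\log\frac{f(x)}{P^{\alpha}f(x)}\,\mu(dx).$$

Clearly

$$E^{\alpha}_x\Big\{\exp\Big[\sum_{i=0}^{n-1}\log\frac{f(x_i)}{P^{\alpha}f(x_i)}\Big]\Big\} = E^{\alpha}_x\Big\{\exp\Big[n\int_E\log\frac{f(y)}{P^{\alpha}f(y)}\,S_n(dy)\Big]\Big\} = \int_{\mathcal{P}(E)}\exp\Big[n\int_E\log\frac{f(y)}{P^{\alpha}f(y)}\,\mu(dy)\Big]Q^{\alpha}_{n,x}(d\mu). \tag{7}$$

Since by the definition of the family $\Gamma_d$, for $n=1,2,\dots,$

$$E^{\alpha}_x\Big\{\exp\Big[\sum_{i=0}^{n-1}\log\frac{f(x_i)}{P^{\alpha}f(x_i)}\Big]\Big\}\le d, \tag{8}$$

by (7) it follows that

$$\int_{\mathcal{P}(E)}\exp\Big[n\int_E\log\frac{f(y)}{P^{\alpha}f(y)}\,\mu(dy)\Big]Q^{\alpha}_{n,x}(d\mu)\le d. \tag{9}$$

Consequently, for any Borel subset $\Lambda$ of $\mathcal{P}(E)$ and $\alpha\in A$,

$$\sup_{x\in E}Q^{\alpha}_{n,x}(\Lambda)\le d\exp\Big[-n\inf_{\mu\in\Lambda}\int_E\log\frac{f(y)}{P^{\alpha}f(y)}\,\mu(dy)\Big]. \tag{10}$$

Now let

$$\lambda_d = \inf_{\alpha\in A_1}\inf_{\mu\in C} I^{\alpha}_d(\mu).$$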

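Since each $f\in\Gamma$ belongs to $\Gamma_d$ with $d$ sufficiently large, for $f\in\Gamma$ the mapping

$$\mathcal{P}(E)\times A\ni(\mu,\alpha)\mapsto\mu\Big(\log\frac{f}{P^{\alpha}f}\Big) := \int_E\log\frac{f(y)}{P^{\alpha}f(y)}\,\mu(dy)$$

is continuous. Therefore, for $\varepsilon>0$ the set

$$\Big\{(\mu,\alpha)\in\mathcal{P}(E)\times A: \mu\Big(\log\frac{f}{P^{\alpha}f}\Big)>\lambda_d-\varepsilon\Big\}$$

is open and

$$C\times A_1\subset\bigcup_{f\in\Gamma_d}\Big\{(\mu,\alpha): \mu\Big(\log\frac{f}{P^{\alpha}f}\Big)>\lambda_d-\varepsilon\Big\}.$$

Since $C\times A_1$ is a compact subset of $\mathcal{P}(E)\times A$, there is a finite subset $\{f_1,\dots,f_k\}$ of $\Gamma_d$ such that

$$C\times A_1\subset\bigcup_{j=1}^{k}\Big\{(\mu,\alpha): \mu\Big(\log\frac{f_j}{P^{\alpha}f_j}\Big)>\lambda_d-\varepsilon\Big\}.$$

Consequently, for every $\alpha\in A_1$,

$$C\subset\bigcup_{f\in\Gamma_d}\Big\{\mu\in\mathcal{P}(E): \mu\Big(\log\frac{f}{P^{\alpha}f}\Big)>\lambda_d-\varepsilon\Big\}.$$

Now if we replace $\Lambda$ in (10) by the sets

$$K_j = \Big\{\mu\in\mathcal{P}(E): \mu\Big(\log\frac{f_j}{P^{\alpha}f_j}\Big)>\lambda_d-\varepsilon\Big\}\cap C$$

it follows that

$$\sup_{x\in E}Q^{\alpha}_{n,x}(K_j)\le d\,e^{-n(\lambda_d-\varepsilon)}. \tag{11}$$

Hence

$$\sup_{x\in E}\sup_{\alpha\in A_1}Q^{\alpha}_{n,x}(C)\le\sup_{x\in E}\sup_{\alpha\in A_1}\Big[\sum_{j=1}^{k}Q^{\alpha}_{n,x}(K_j)\Big]\le k\,d\,e^{-n(\lambda_d-\varepsilon)} \tag{12}$$

and

$$\limsup_{n\to\infty}\sup_{x\in E}\sup_{\alpha\in A_1}n^{-1}\log Q^{\alpha}_{n,x}(C)\le-\lambda_d+\varepsilon.$$

Since $\varepsilon>0$ can be chosen arbitrarily small, it follows that

$$\limsup_{n\to\infty}\sup_{x\in E}\sup_{\alpha\in A_1}n^{-1}\log Q^{\alpha}_{n,x}(C)\le-\lambda_d. \tag{13}$$

To complete the proof it remains to show that

$$\lim_{d\to\infty}\lambda_d = \inf_{\alpha\in A_1}\inf_{\mu\in C}I^{\alpha}(\mu). \tag{14}$$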

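Note that for $\varepsilon>0$,

$$C\times A_1\subset\bigcup_{f\in\Gamma}\Big\{(\mu,\alpha): \mu\Big(\log\frac{f}{P^{\alpha}f}\Big)>\inf_{\alpha\in A_1}\inf_{\mu\in C}I^{\alpha}(\mu)-\varepsilon\Big\}$$

and by the compactness of $C\times A_1$ there is a finite set $\{f_1,\dots,f_k\}\subset\Gamma$ such that

$$C\times A_1\subset\bigcup_{j=1}^{k}\Big\{(\mu,\alpha): \mu\Big(\log\frac{f_j}{P^{\alpha}f_j}\Big)>\inf_{\alpha\in A_1}\inf_{\mu\in C}I^{\alpha}(\mu)-\varepsilon\Big\}.$$

By the definitions of $\Gamma$ and $\Gamma_d$ there is $d>0$ such that $\{f_1,\dots,f_k\}\subset\Gamma_d$. Therefore, for each $\alpha\in A_1$ and $\mu\in C$,

$$\sup_{f\in\Gamma_d}\int_E\log\frac{f(y)}{P^{\alpha}f(y)}\,\mu(dy)\ge\max_{i=1,\dots,k}\int_E\log\frac{f_i(y)}{P^{\alpha}f_i(y)}\,\mu(dy)>\inf_{\alpha\in A_1}\inf_{\mu\in C}I^{\alpha}(\mu)-\varepsilon$$

and

$$\limsup_{d\to\infty}\lambda_d\ge\inf_{\alpha\in A_1}\inf_{\mu\in C}I^{\alpha}(\mu)-\varepsilon.$$

Since the last inequality holds for any $\varepsilon>0$, and

$$\lambda_d\le\inf_{\alpha\in A_1}\inf_{\mu\in C}I^{\alpha}(\mu),$$

the equality (14) follows, which completes the proof.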

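In the proof of Theorem 1 the compactness of the set $C\subset\mathcal{P}(E)$ is important. To relax this requirement an extra assumption on the transition operator $P^{\alpha}(x,\cdot)$ is made:

(B2) There is a continuous function $\psi: A\times E\to\mathbb{R}$ such that $\psi(\alpha,x)\ge1$ for $\alpha\in A$, $x\in E$, the mapping

$$E\ni x\mapsto\sup_{\alpha\in A}\int_E\psi(\alpha,y)\,P^{\alpha}(x,dy) \tag{15}$$

is bounded on compact subsets of $E$ and for each $m>0$ the set

$$K_m := \Big\{x\in E: \inf_{\alpha\in A}\frac{\psi(\alpha,x)}{\int_E\psi(\alpha,y)\,P^{\alpha}(x,dy)}\le m\Big\} \tag{16}$$

is compact.

Let

$$\varrho := \inf_{x\in E}\inf_{\alpha\in A}\frac{\psi(\alpha,x)}{\int_E\psi(\alpha,y)\,P^{\alpha}(x,dy)}. \tag{17}$$

If (B2) is satisfied then $\varrho>0$. The following two lemmas are easy adaptations ...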

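Lemma 1. If (B2) is satisfied then for each $\varepsilon>0$,

$$Q^{\alpha}_{n,x}(\{\mu\in\mathcal{P}(E): \mu(K_m^c)>\varepsilon\})\le\psi(\alpha,x)\exp\Big[-n\log\varrho-n\varepsilon\log\frac{m}{\varrho}\Big]. \tag{18}$$

Lemma 2. If (B2) is satisfied then for any real number $a$ and any compact set $W\subset E$ there exists a compact subset $C(a)\subset\mathcal{P}(E)$ such that

$$\limsup_{n\to\infty}n^{-1}\sup_{\alpha\in A}\sup_{x\in W}\log Q^{\alpha}_{n,x}(C(a)^c)\le-a. \tag{19}$$

Using Lemma 2, one can generalize Theorem 1 in the same way as in Theorem 4.4 of [3].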

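Theorem 2. If (B1) and (B2) are satisfied, then for any closed subsets $C$ and $A_1$ of $\mathcal{P}(E)$ and $A$, respectively, and compact subsets $W$ of $E$,

$$\limsup_{n\to\infty}n^{-1}\sup_{\alpha\in A_1}\sup_{x\in W}\log Q^{\alpha}_{n,x}(C)\le-\inf_{\alpha\in A_1}\inf_{\mu\in C}I^{\alpha}(\mu). \tag{20}$$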

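Subsequently, the following lemma is used.

Lemma 3. If (B1) and (B2) are satisfied then for each $m>0$ the set

$$C_m = \Big\{\mu\in\mathcal{P}(E): \inf_{\alpha\in A}I^{\alpha}(\mu)\le m\Big\} \tag{21}$$

is compact in $\mathcal{P}(E)$.

Proof. Since for each fixed $\mu\in C_m$ the mapping $\alpha\mapsto I^{\alpha}(\mu)$ is lower semicontinuous, there is an $\alpha(\mu)\in A$ such that $I^{\alpha(\mu)}(\mu)\le m$. By (B2) for $r>0$ the set

$$K^{\alpha}_r = \Big\{x\in E: \frac{\psi(\alpha,x)}{\int_E\psi(\alpha,y)\,P^{\alpha}(x,dy)}\le r\Big\}\subset K_r$$

is compact in $E$, where $K_r$ is defined in (16). Adapting the method of Lemma 4.2 of [3] it can be shown that for $r>\varrho$ and $\alpha=\alpha(\mu)$,

$$\mu((K^{\alpha}_r)^c)\le\frac{m-\log\varrho}{\log r-\log\varrho}.$$

Therefore, for $\varepsilon>0$ there is $r>0$ such that the right-hand side is less than $\varepsilon$, and hence for any $\mu\in C_m$,

$$\mu((K_r)^c)\le\mu((K^{\alpha(\mu)}_r)^c)\le\varepsilon.$$

Consequently, the family $C_m$ of measures is tight. Since the mapping

$$A\times\mathcal{P}(E)\ni(\alpha,\mu)\mapsto I^{\alpha}(\mu) \tag{22}$$

is lower semicontinuous, the set $C_m$ is compact.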

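In the next proposition the rate functional $I^{\alpha}(\mu)$ is shown to be positive.

Proposition 1. Assume (B1), (B2) and that for each $\alpha\in A$ there is a unique invariant measure $\pi_{\alpha}$ of the transition operator $P^{\alpha}(x,\cdot)$. Let $A_1$ and $C$ be closed subsets of $A$ and $\mathcal{P}(E)$ such that $\pi_{\alpha}\notin C$ for $\alpha\in A_1$. Then

$$\inf_{\alpha\in A_1}\inf_{\mu\in C}I^{\alpha}(\mu)>0. \tag{23}$$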

Proof. Note first that the mapping (15) is lower semicontinuous and therefore it attains its minimum on compact sets. Moreover $I^{\alpha}(\mu)\ge0$ for $\mu\in\mathcal{P}(E)$ and $\alpha\in A$ and, by Lemma 2.5 of [3], $I^{\alpha}(\mu)=0$ if and only if $\mu=\pi_{\alpha}$.

By the compactness of the set $C_m$, defined in (21), for each $m>0$ it follows that

$$\inf_{\alpha\in A_1}\inf_{\mu\in C\cap C_m}I^{\alpha}(\mu)>0.$$

Since by the definition (21),

$$\inf_{\alpha\in A}\inf_{\mu\in C_m^c}I^{\alpha}(\mu)\ge m>0,$$

(23) is verified.

An additional assumption is made.

(B3) For each $\alpha$ there is a unique invariant measure $\pi_{\alpha}$ for the transition operator $P^{\alpha}(x,\cdot)$ and for $c\in C(E)$ the mapping $A\ni\alpha\mapsto\pi_{\alpha}(c)$ is continuous.

Using the assumptions (B1)–(B3) we give a uniform estimate of the deviations of the running costs from the limit.

Theorem 3. If (B1)–(B3) are satisfied, then for $\varepsilon>0$, $c\in C(E)$ and any compact set $W\subset E$ there are a $p>0$ and a positive integer $N$ such that for $n\ge N$,

$$\sup_{\alpha\in A}\sup_{x\in W}P^{\alpha}_x\Big\{\Big|n^{-1}\sum_{j=0}^{n-1}c(x_j)-\pi_{\alpha}(c)\Big|\ge\varepsilon\Big\}\le e^{-np}. \tag{24}$$

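Proof. By (B3) for $\varepsilon>0$ and $\alpha\in A$ there exists $\delta_{\alpha}>0$ such that if $\alpha'\in B(\alpha,\delta_{\alpha}) := \{\beta\in A: \varrho_A(\alpha,\beta)\le\delta_{\alpha}\}$, then $|\pi_{\alpha}(c)-\pi_{\alpha'}(c)|\le\varepsilon$. Since $A$ is compact there is a finite set $\alpha_1,\dots,\alpha_k$ such that $A\subset\bigcup_{i=1}^{k}B(\alpha_i,\delta_{\alpha_i})$.

For $i=1,\dots,k$ let $C_{\varepsilon}(\alpha_i) := \{\mu\in\mathcal{P}(E): |\mu(c)-\pi_{\alpha_i}(c)|\ge2\varepsilon\}$. Clearly $C_{\varepsilon}(\alpha_i)$ is a closed subset of $\mathcal{P}(E)$ and if $\alpha\in B(\alpha_i,\delta_{\alpha_i})$ then $\pi_{\alpha}\notin C_{\varepsilon}(\alpha_i)$.

Therefore, by Proposition 1 and Theorem 2 there are $p>0$ and $N>0$ such that for $n>N$ and $i=1,\dots,k$,

$$\sup_{\alpha\in B(\alpha_i,\delta_{\alpha_i})}\sup_{x\in W}Q^{\alpha}_{n,x}(C_{\varepsilon}(\alpha_i))\le e^{-np}.$$

Equivalently this means that

$$\sup_{\alpha\in B(\alpha_i,\delta_{\alpha_i})}\sup_{x\in W}P^{\alpha}_x\Big\{\Big|n^{-1}\sum_{j=0}^{n-1}c(x_j)-\pi_{\alpha_i}(c)\Big|\ge2\varepsilon\Big\}\le e^{-np}$$

for $i=1,\dots,k$. By the definition of $B(\alpha_i,\delta_{\alpha_i})$ we have $|\pi_{\alpha}(c)-\pi_{\alpha_i}(c)|\le\varepsilon$ for $\alpha\in B(\alpha_i,\delta_{\alpha_i})$, so it follows that

$$\sup_{\alpha\in B(\alpha_i,\delta_{\alpha_i})}\sup_{x\in W}P^{\alpha}_x\Big\{\Big|n^{-1}\sum_{j=0}^{n-1}c(x_j)-\pi_{\alpha}(c)\Big|\ge3\varepsilon\Big\}\le e^{-np}$$

for $i=1,\dots,k$ and consequently, $\varepsilon>0$ being arbitrary, (24) is obtained.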

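The remaining part of this section is devoted to the study of the large deviations for empirical distributions of pairs of consecutive states. By analogy to (2) and (3), for $B_1,B_2\in\mathcal{B}(E)$ let

$$\bar S_n(B_1\times B_2) = \frac{1}{n}\sum_{i=0}^{n-1}\chi_{B_1}(x_i)\chi_{B_2}(x_{i+1}) \tag{25}$$

and for $C\in\mathcal{B}(\mathcal{P}(E\times E))$ let

$$\bar Q^{\alpha}_{n,x}(C) = P^{\alpha}_x\{\bar S_n\in C\}. \tag{26}$$

By analogy to (4) and (5) define

$$\bar\Gamma := \{f\in C(E\times E): \exists a>0\ \forall x,y\in E\ f(x,y)\ge a\} \tag{27}$$

and for $\mu\in\mathcal{P}(E\times E)$,

$$\bar I^{\alpha}(\mu) = \sup_{f\in\bar\Gamma}\int_{E\times E}\log\frac{f(x,y)}{\bar P^{\alpha}f(y)}\,\mu(dx,dy) \tag{28}$$

with $\bar P^{\alpha}f(y) = \int_E f(y,z)\,P^{\alpha}(y,dz)$.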

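Theorem 4. If (B1) holds then for any compact subsets $C\subset\mathcal{P}(E\times E)$ and $A_1\subset A$,

$$\limsup_{n\to\infty}n^{-1}\sup_{\alpha\in A_1}\sup_{x\in E}\log\bar Q^{\alpha}_{n,x}(C)\le-\inf_{\alpha\in A_1}\inf_{\mu\in C}\bar I^{\alpha}(\mu). \tag{29}$$

Proof. Note first that if $f\in\bar\Gamma_d$, where

$$\bar\Gamma_d := \Big\{f\in C(E\times E): \sup_{x,y\in E}f(x,y)\le d\inf_{x,y\in E}f(x,y)\Big\},$$

then

$$E^{\alpha}_x\Big\{\exp\Big[\sum_{i=0}^{n-1}\log\frac{f(x_i,x_{i+1})}{\bar P^{\alpha}f(x_{i+1})}\Big]\Big\}\le d.$$

If $\Lambda\in\mathcal{B}(\mathcal{P}(E\times E))$ and $\alpha\in A$ then

$$\sup_{x\in E}\bar Q^{\alpha}_{n,x}(\Lambda)\le d\exp\Big[-n\inf_{\mu\in\Lambda}\int_{E\times E}\log\frac{f(x,y)}{\bar P^{\alpha}f(y)}\,\mu(dx,dy)\Big].$$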

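Remark 1. If $\bar I^{\alpha}$ is defined as

$$\sup_{f\in\bar\Gamma}\int_{E\times E}\log\frac{f(x,y)}{\bar P^{\alpha}f(x)}\,\mu(dx,dy)$$

then following Section 1.3 of [7] an analogue of Theorem 4 can be obtained with a simpler proof because

$$E^{\alpha}_x\Big\{\exp\Big[\sum_{i=0}^{n-1}\log\frac{f(x_i,x_{i+1})}{\bar P^{\alpha}f(x_i)}\Big]\Big\}=1.$$

However, to adapt Theorem 4 to the case of a noncompact closed set $C$ the rate functional $\bar I^{\alpha}$ of the form defined in (28) is required.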

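An additional assumption is made now.

(B4) There is a continuous function $\bar\psi: A\times E\times E\to\mathbb{R}$ such that $\bar\psi(\alpha,x,y)\ge1$ for $\alpha\in A$ and $x,y\in E$, the mapping

$$E\ni x\mapsto\sup_{\alpha\in A}\int_E\bar\psi(\alpha,x,y)\,P^{\alpha}(x,dy)$$

is bounded on compact sets and for each $m>0$ the set

$$\bar K_m := \Big\{(x,y)\in E\times E: \inf_{\alpha\in A}\frac{\bar\psi(\alpha,x,y)}{\int_E\bar\psi(\alpha,y,z)\,P^{\alpha}(y,dz)}\le m\Big\}$$

is compact in $E\times E$.

Let

$$\bar\varrho := \inf_{x,y\in E}\inf_{\alpha\in A}\frac{\bar\psi(\alpha,x,y)}{\int_E\bar\psi(\alpha,y,z)\,P^{\alpha}(y,dz)}. \tag{30}$$

Adapting Lemmas 1–3 and Theorem 2 to the case of consecutive pairs easily yields the following results.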

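Lemma 4. If (B4) is satisfied, then for each $\varepsilon>0$,

$$\bar Q^{\alpha}_{n,x}(\{\mu\in\mathcal{P}(E\times E): \mu(\bar K_m^c)>\varepsilon\})\le\bar\psi(\alpha,x,y)\exp\Big[-n\log\bar\varrho-n\varepsilon\log\frac{m}{\bar\varrho}\Big]. \tag{31}$$

Lemma 5. If (B4) is satisfied, then for any real number $a$ and any compact set $W\subset E$ there exists a compact subset $C(a)\subset\mathcal{P}(E\times E)$ such that

$$\limsup_{n\to\infty}n^{-1}\sup_{\alpha\in A}\sup_{x\in W}\log\bar Q^{\alpha}_{n,x}(C(a)^c)\le-a. \tag{32}$$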

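Theorem 5. If (B1) and (B4) are satisfied, then for any closed subsets $C$ of $\mathcal{P}(E\times E)$ and $A_1$ of $A$, and any compact subset $W$ of $E$,

$$\limsup_{n\to\infty}n^{-1}\sup_{\alpha\in A_1}\sup_{x\in W}\log\bar Q^{\alpha}_{n,x}(C)\le-\inf_{\alpha\in A_1}\inf_{\mu\in C}\bar I^{\alpha}(\mu). \tag{33}$$

Lemma 6. If (B1) and (B4) are satisfied, then for each $m>0$,

$$\bar C_m = \Big\{\mu\in\mathcal{P}(E\times E): \inf_{\alpha\in A}\bar I^{\alpha}(\mu)\le m\Big\} \tag{34}$$

is a compact subset of $\mathcal{P}(E\times E)$.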

To prove an analogue of Proposition 1 the following lemma is used.

Lemma 7. Assume (B1) and (B4) and that for each $\alpha\in A$ there is a unique invariant measure $\pi_{\alpha}$ of the transition operator $P^{\alpha}(x,dy)$. Then for $\mu\in\mathcal{P}(E\times E)$ the following equivalence is satisfied:

$$\bar I^{\alpha}(\mu)=0\iff\mu(dx,dy)=P^{\alpha}(x,dy)\,\pi_{\alpha}(dx). \tag{35}$$

To verify this lemma the methods of the proof of Lemma 2.5 of [3] can be used. The details are left to the reader.

Using Lemma 7 we can state an analogue of Proposition 1 combined with Theorem 5.

Theorem 6. If (B1), (B3) and (B4) are satisfied, then for any compact sets $W\subset E$, $A_1\subset A$ and closed set $C\subset\mathcal{P}(E\times E)$ such that $P^{\alpha}(x,dy)\,\pi_{\alpha}(dx)\notin C$ for $\alpha\in A_1$ there are $p>0$ and a positive integer $N$ such that for $n\ge N$,

$$\sup_{\alpha\in A_1}\sup_{x\in W}P^{\alpha}_x\Big\{n^{-1}\sum_{i=0}^{n-1}\delta_{(x_i,x_{i+1})}(\cdot)\in C\Big\}\le e^{-np}. \tag{36}$$

Remark 2. If $E$ is compact then the assumptions (B2) and (B4) are clearly satisfied. An analysis of the proofs of Theorems 3 and 6 shows that the constants $p$ can be given in terms of the infima of the rate functions $I^{\alpha}$ and $\bar I^{\alpha}$, respectively.

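3. Adaptive control with observation of cost. Consider now a controlled Markov process $(x_n, n\in\mathbb{N})$ with transition operator $P^{\alpha^0,v_n}(x_n,\cdot)$. The following assumptions are made.

(A1) For $f\in C(E)$ the mapping

$$A\times U\times E\ni(\alpha,v,x)\mapsto P^{\alpha,v}f(x)$$

is continuous.

(A2) There is a continuous function $\psi: A\times E\to\mathbb{R}$ such that $\psi(\alpha,x)\ge1$ for $\alpha\in A$, $x\in E$, the mapping

$$E\ni x\mapsto\sup_{\alpha\in A}\sup_{v\in U}\int_E\psi(\alpha,y)\,P^{\alpha,v}(x,dy)$$

is bounded on compact subsets of $E$ and for each $m>0$ the set

$$K_m := \Big\{x\in E: \inf_{\alpha\in A}\inf_{v\in U}\frac{\psi(\alpha,x)}{\int_E\psi(\alpha,y)\,P^{\alpha,v}(x,dy)}\le m\Big\}$$

is compact.

(A3) For $u\in\mathcal{A}_c := \mathcal{A}\cap C(E,U)$ there is a unique invariant measure $\pi^u_{\alpha}$ for $P^{\alpha,u}(x,\cdot)$, and for $c\in C(E)$ the mapping $A\ni\alpha\mapsto\pi^u_{\alpha}(c)$ is continuous.

(A4) For $\alpha\in A$ the optimal value of the cost functional $J^{\alpha}$ defined in (1) coincides almost surely with that of

$$J^{\alpha}_x((v_n, n\in\mathbb{N})) = \limsup_{n\to\infty}n^{-1}E^{\alpha}_x\Big\{\sum_{i=0}^{n-1}c(x_i,v_i)\Big\}. \tag{37}$$

Moreover, for any $\varepsilon>0$ there is a finite class $U(\varepsilon)=\{u_1,\dots,u_r\}\subset\mathcal{A}_c$ of $\varepsilon$-optimal control functions for $J^{\alpha}_x$, that is, for any $\alpha\in A$ there is a $u_j\in U(\varepsilon)$ such that

$$J^{\alpha}_x((u_j(x_n)))\le\inf_{(v_n)}J^{\alpha}_x((v_n))+\varepsilon.$$

Furthermore, there is a compact set $W\subset E$ that is positive recurrent for the Markov process $(x_n, n\in\mathbb{N})$ controlled with any control function of the class $U(\varepsilon)$.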

An analogue of Theorem 3 is given now.

Theorem 7. Assume that (A1)–(A4) are satisfied and fix $\varepsilon>0$. Then there are $p>0$ and a positive integer $N$ such that for $n\ge N$,

$$\sup_{k=1,\dots,r}\sup_{\alpha\in A}\sup_{x\in W}P^{\alpha,u_k}_x\Big\{\Big|n^{-1}\sum_{j=0}^{n-1}c(x_j,u_k(x_j))-\int_E c(y,u_k(y))\,\pi^{u_k}_{\alpha}(dy)\Big|\ge\varepsilon\Big\}\le e^{-np}. \tag{38}$$

Moreover, the following adaptive strategy $(\hat v_j)$ is $\varepsilon$-optimal with probability $(1-e^{-np})^r$: choose $n\ge N$, use the control function $u_1$ for $i<T_1 := T(T(0)+n)$, then $u_2$ for $T_1\le i<T_2 := T(T_1+n)$, ..., and $u_r$ for $T_{r-1}\le i<T_r := T(T_{r-1}+n)$, where $T(\sigma)$ denotes the first hitting time to the positive recurrent compact set $W$ after the random time $\sigma$, determine $k\in\{1,\dots,r\}$ such that

$$\sum_{j=T_{k-1}}^{T_{k-1}+n-1}c(x_j,u_k(x_j)) = \min_{q=1,\dots,r}\sum_{j=T_{q-1}}^{T_{q-1}+n-1}c(x_j,u_q(x_j)) \tag{39}$$

and after $T_r$ use the control function $u_k$.
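The selection rule (39) can be sketched in code as follows. This is a simplified illustration under stated assumptions, not the authors' procedure in full: the state space is taken to be finite (so that $W=E$ and the hitting times $T(\cdot)$ reduce to fixed block boundaries), and the functions `step`, `cost` and the two constant controls are hypothetical stand-ins for the unknown dynamics, the cost $c$ and the class $U(\varepsilon)$.

```python
import random

def run_phase(step, cost, u, x, n, rng):
    """Apply control function u for n steps from state x; return (total cost, final state)."""
    total = 0.0
    for _ in range(n):
        v = u(x)
        total += cost(x, v)
        x = step(x, v, rng)
    return total, x

def select_control(step, cost, controls, x0, n, rng):
    """Test each control for n steps and return the index with the smallest observed cost."""
    x = x0
    totals = []
    for u in controls:
        total, x = run_phase(step, cost, u, x, n, rng)
        totals.append(total)
    # min() returns the first minimizing index, which resolves ties deterministically.
    return min(range(len(controls)), key=lambda q: totals[q])

# Hypothetical two-state example; alpha0 plays the role of the unknown parameter.
def step(x, v, rng, alpha0=0.3):
    p_one = alpha0 if v == 0 else 1.0 - alpha0  # probability of moving to state 1
    return 1 if rng.random() < p_one else 0

def cost(x, v):
    return float(x)  # state 1 is the expensive state

controls = [lambda x: 0, lambda x: 1]
k = select_control(step, cost, controls, 0, 2000, random.Random(1))
```

After the testing phases, the strategy would apply `controls[k]` forever; the theorem bounds the probability that this choice fails to be almost optimal.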

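Proof. Note that (38) follows from (24). Then for $x\in W$ and $n\ge N$,

$$\inf_{\alpha\in A}P^{\alpha}_x\Big\{\Big|n^{-1}\sum_{j=T_{q-1}}^{T_{q-1}+n-1}c(x_j,u_q(x_j))-\int_E c(z,u_q(z))\,\pi^{u_q}_{\alpha}(dz)\Big|\le\varepsilon\ \text{for } q=1,\dots,r\Big\}\ge(1-e^{-np})^r.$$

Consequently, by (39) for $x\in W$,

$$\inf_{\alpha\in A}P^{\alpha}_x\Big\{\int_E c(z,u_k(z))\,\pi^{u_k}_{\alpha}(dz)\le\min_{q=1,\dots,r}\int_E c(z,u_q(z))\,\pi^{u_q}_{\alpha}(dz)+2\varepsilon\Big\}\ge(1-e^{-np})^r.$$

Since by (A4), $J^{\alpha^0}((\hat v_n)) = \int_E c(z,u_k(z))\,\pi^{u_k}_{\alpha^0}(dz)$, $P^{\alpha^0}$-a.e., it follows that

$$J^{\alpha^0}((\hat v_n))\le3\varepsilon+\inf_{(v_i)}J^{\alpha^0}((v_i))$$

with probability $(1-e^{-np})^r$.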

4. Examples. Two models are described to demonstrate the applicability of the previous results that used large deviations.

Model I. $E$ is a compact metric space. There is a probability measure $\eta$ such that

$$P^{\alpha,v}(x,B) = \int_B p(x,y,\alpha,v)\,\eta(dy) \tag{40}$$

for $B\in\mathcal{B}(E)$, the mapping $E\times E\times A\times U\ni(x,y,\alpha,v)\mapsto p(x,y,\alpha,v)$ is continuous and $p(x,y,\alpha,v)>0$ for $x,y\in E$, $\alpha\in A$, $v\in U$. Clearly (A1)–(A3) are satisfied for this model. Furthermore, there is a finite family of almost optimal controls.
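One consequence of the strict positivity of the density $p$ on a compact space is uniform geometric ergodicity of the kind stated in (43). The sketch below checks that bound numerically in the simplest possible setting, which is an assumption of this example: a finite state space with $\eta$ the uniform probability measure, so that the density is the number of states times the transition probability. The matrix `P` and all helper names are hypothetical.

```python
def mat_vec_left(row, P):
    """Return the row vector row * P."""
    n = len(P)
    return [sum(row[x] * P[x][y] for x in range(n)) for y in range(n)]

def invariant_measure(P, iterations=10_000):
    """Approximate the invariant probability vector by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = mat_vec_left(pi, P)
    return pi

def tv_distance(mu, nu):
    """sup over sets B of |mu(B) - nu(B)| for two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))

def doeblin_constant(P):
    """d = infimum of the density of P with respect to the uniform probability measure."""
    n = len(P)
    return n * min(min(row) for row in P)

def check_bound(P, k_max=20):
    """Check |P^k(x, B) - pi(B)| <= (1 - d)^(k - 1) for all states x and k = 1..k_max."""
    pi = invariant_measure(P)
    d = doeblin_constant(P)
    ok = True
    for x in range(len(P)):
        row = [1.0 if y == x else 0.0 for y in range(len(P))]
        for k in range(1, k_max + 1):
            row = mat_vec_left(row, P)  # row now equals P^k(x, .)
            ok = ok and tv_distance(row, pi) <= (1.0 - d) ** (k - 1) + 1e-12
    return ok

# Hypothetical strictly positive transition matrix on three states.
P = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]]
bound_holds = check_bound(P)
```

The constant returned by `doeblin_constant` corresponds to $d$ in (43) for this toy setting; the closer it is to 1, the faster the convergence to the invariant measure.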

Theorem 8. For each $\varepsilon>0$ there is a finite class $U(\varepsilon)\subset\mathcal{A}_c$ of $\varepsilon$-optimal control functions for Model I with cost functionals $J^{\alpha}_x$, $\alpha\in A$.

Proof. The proof consists of three steps:

Step I. Initially it is shown that there is a finite set $\hat U(\varepsilon)=\{u_1,\dots,u_r\}$ of piecewise constant $\varepsilon$-optimal control functions such that the set of their discontinuity points is of $\eta$-measure zero.

Let $\{E^n_1,E^n_2,\dots,E^n_{d_n}\}$, $n=1,2,\dots,$ be a sequence of partitions of $E$ and $\{e^n_1,e^n_2,\dots,e^n_{d_n}\}$, $n=1,2,\dots,$ be a sequence of their representative elements such that

$$E=\bigcup_{i=1}^{d_n}E^n_i,\qquad E^n_i\cap E^n_j=\emptyset\ \text{ for } i\ne j,$$

$\eta(\partial E^n_i)=0$, the diameter of $E^n_i$ is not greater than $1/n$, $e^n_i\in E^n_i$ for $i=1,\dots,d_n$, $\{E^{n+1}_1,\dots,E^{n+1}_{d_{n+1}}\}$ is a subpartition of $\{E^n_1,\dots,E^n_{d_n}\}$ and

$$\{e^n_1,\dots,e^n_{d_n}\}\subset\{e^{n+1}_1,\dots,e^{n+1}_{d_{n+1}}\}.$$

Moreover, let $p^{\alpha,v}_n(e^n_i,e^n_j)=P^{\alpha,v}(e^n_i,E^n_j)$.

Consider now a controlled Markov process on $\{e^n_1,\dots,e^n_{d_n}\}$ with transition operator $p^{\alpha,v}_n(e^n_i,e^n_j)$. By Lemma 3 of [1] for given $\varepsilon>0$ there is a finite set $\{u^n_1,\dots,u^n_{r_n}\}$ of control functions such that

$$\sup_{\alpha\in A}\sup_{1\le i\le r_n}\sup_{1\le j\le d_n}\Big[w^{\alpha}_n(e^n_j)-\lambda^{\alpha}_n-\sum_{l=1}^{d_n}w^{\alpha}_n(e^n_l)\,p^{\alpha,u^n_i(e^n_j)}_n(e^n_j,e^n_l)+c(e^n_j,u^n_i(e^n_j))\Big]\le\frac{\varepsilon}{2} \tag{41}$$

where $\lambda^{\alpha}_n$ and $w^{\alpha}_n$ are the optimal value and the Bellman function of the corresponding cost functional $J^{\alpha}_x$, respectively.

Let $u^n_j(x)=u^n_j(e^n_l)$, $w^{\alpha}_n(x)=w^{\alpha}_n(e^n_l)$, and $P^{\alpha,v}_n(x,\cdot)=P^{\alpha,v}(e^n_l,\cdot)$ for $x\in E^n_l$, $l=1,\dots,d_n$. It follows that

$$\sup_{\alpha\in A}\min_{1\le i\le r_n}\sup_{x\in E}\big\{w^{\alpha}_n(x)-\lambda^{\alpha}_n-P^{\alpha,u^n_i(x)}_n w^{\alpha}_n(x)+c_n(x,u^n_i(x))\big\}\le\varepsilon/2 \tag{42}$$

with $c_n(x,v)=c(e^n_l,v)$ for $x\in E^n_l$ and $v\in U$. Note, moreover, that $\lambda^{\alpha}_n$ is also the optimal value of the cost functional $J^{\alpha}_x$ corresponding to the controlled Markov process on $E$ with transition operator $P^{\alpha,v}_n(x,\cdot)$.

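By Lemma 3.3.3 of [6] for $u\in\mathcal{A}$, $n=1,2,\dots,$ there exist probability measures $\pi^u_{\alpha}$ and $\pi^{u,n}_{\alpha}$ such that for $x\in E$ and $B\in\mathcal{B}(E)$,

$$|(P^{\alpha,u})^k(x,B)-\pi^u_{\alpha}(B)|\le(1-d)^{k-1} \tag{43}$$

and

$$|(P^{\alpha,u}_n)^k(x,B)-\pi^{u,n}_{\alpha}(B)|\le(1-d)^{k-1} \tag{43$'$}$$

with

$$d=\inf_{x,y\in E}\inf_{\alpha\in A}\inf_{v\in U}p(x,y,\alpha,v).$$

An analysis of the proof of Proposition 1 of [9] shows that there exists a constant $K>0$ that is independent of $u\in\mathcal{A}$, $\alpha\in A$ and $n\in\mathbb{N}$, for which

$$\|\pi^u_{\alpha}-\pi^{u,n}_{\alpha}\|_{\rm var}\le\sup_{x\in E}K\|P^{\alpha,u}_n(x,\cdot)-P^{\alpha,u}(x,\cdot)\|_{\rm var} \tag{44}$$

where $\|\cdot\|_{\rm var}$ denotes the variation norm.

Since by the continuity of the transition density $p(x,y,\alpha,v)$,

$$\lim_{n\to\infty}\sup_{\alpha\in A}\sup_{u\in\mathcal{A}}\sup_{x\in E}K\|P^{\alpha,u}_n(x,\cdot)-P^{\alpha,u}(x,\cdot)\|_{\rm var}=0, \tag{45}$$

for $\lambda^{\alpha}=\inf_{u\in\mathcal{A}}\int_E c(x,u(x))\,\pi^u_{\alpha}(dx)$ it follows that

$$\sup_{\alpha\in A}|\lambda^{\alpha}-\lambda^{\alpha}_n|\le\sup_{\alpha\in A}\|\pi^u_{\alpha}-\pi^{u,n}_{\alpha}\|_{\rm var}+\sup_{x\in E,\,v\in U}|c(x,v)-c_n(x,v)|\to0$$

as $n\to\infty$. Therefore, for sufficiently large $n$ from (42), (44), and (45) we obtain

$$\sup_{\alpha\in A}\min_{1\le i\le r_n}\sup_{x\in E}\big\{w^{\alpha}_n(x)-\lambda^{\alpha}-P^{\alpha,u^n_i(x)}w^{\alpha}_n(x)+c(x,u^n_i(x))\big\}\le\varepsilon, \tag{46}$$

which means (cf. the proof of Theorem 3.2.2 of [6]) that $\hat U(\varepsilon)=\{u^n_1,\dots,u^n_{r_n}\}$ is a set of $\varepsilon$-optimal control functions for the original Markov process. It follows from the construction that the set of discontinuity points of the above control functions is of $\eta$-measure 0.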

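Step II. It is clear that for $i=1,\dots,r$ there is a sequence $u_i(n)\in\mathcal{A}_c$, $n=1,2,\dots,$ such that

$$\lim_{n\to\infty}\eta(\{z\in E: u_i(z)\ne u_i(n)(z)\})=0. \tag{47}$$

We claim that for any bounded Borel function $f: E\to\mathbb{R}$ such that the set of discontinuity points of $f$ is of $\eta$-measure 0, and for $i=1,\dots,r$,

$$\lim_{n\to\infty}\sup_{\alpha\in A}|\pi^{u_i}_{\alpha}(f)-\pi^{u_i(n)}_{\alpha}(f)|=0. \tag{48}$$

Assume that contrary to (48) there is a sequence $(\alpha_n, n\in\mathbb{N})$ from $A$ such that $\alpha_n\to\alpha\in A$ and for some $i\in\{1,\dots,r\}$,

$$|\pi^{u_i}_{\alpha_n}(f)-\pi^{u_i(n)}_{\alpha_n}(f)|>\delta>0 \tag{49}$$

for sufficiently large $n$.

By the tightness, for a suitably chosen subsequence $n_k$ it follows that

$$\pi^{u_i(n_k)}_{\alpha_{n_k}}\Rightarrow\pi$$

where $\pi$ is a probability measure on $E$. We now show that $\pi$ is invariant for the transition operator $P^{\alpha,u_i}(x,\cdot)$. In fact, for $g\in C(E)$ it follows that

$$\begin{aligned}
|\pi(g)-\pi(P^{\alpha,u_i}g)|\le{}&|\pi(g)-\pi^{u_i(n_k)}_{\alpha_{n_k}}(g)|\\
&+|\pi^{u_i(n_k)}_{\alpha_{n_k}}(g)-\pi^{u_i(n_k)}_{\alpha_{n_k}}(P^{\alpha_{n_k},u_i(n_k)}g)|\\
&+|\pi^{u_i(n_k)}_{\alpha_{n_k}}(P^{\alpha_{n_k},u_i(n_k)}g)-\pi^{u_i(n_k)}_{\alpha_{n_k}}(P^{\alpha,u_i(n_k)}g)|\\
&+|\pi^{u_i(n_k)}_{\alpha_{n_k}}(P^{\alpha,u_i(n_k)}g)-\pi^{u_i(n_k)}_{\alpha_{n_k}}(P^{\alpha,u_i}g)|\\
&+|\pi^{u_i(n_k)}_{\alpha_{n_k}}(P^{\alpha,u_i}g)-\pi(P^{\alpha,u_i}g)|\\
={}&I_{1k}+I_{2k}+I_{3k}+I_{4k}+I_{5k}.
\end{aligned}$$

Clearly $I_{1k}\to0$ as $k\to\infty$, and $I_{2k}=0$. Moreover,

$$I_{3k}\le\|g\|\sup_{x,y\in E}\sup_{v\in U}|p(x,y,\alpha_{n_k},v)-p(x,y,\alpha,v)|\to0$$

as $k\to\infty$. Define

$$M=\sup_{x,y\in E}\sup_{\alpha\in A}\sup_{v\in U}p(x,y,\alpha,v). \tag{50}$$

Since $\pi^{u_i(n_k)}_{\alpha_{n_k}}$ is an invariant measure it follows that $\pi^{u_i(n_k)}_{\alpha_{n_k}}(\cdot)\le M\eta(\cdot)$. Therefore, $\pi(\cdot)\le M\eta(\cdot)$. Consequently, by (47), $I_{4k}\to0$ as $k\to\infty$, and since the set of discontinuity points of $P^{\alpha,u_i}g$ is of $\eta$-measure 0, $I_{5k}\to0$ as $k\to\infty$.

By the uniqueness of the invariant measure it follows that $\pi=\pi^{u_i}_{\alpha}$. Since $\pi$ does not depend on a particular subsequence it follows that

$$\pi^{u_i(n)}_{\alpha_n}\Rightarrow\pi^{u_i}_{\alpha} \tag{51}$$

as $n\to\infty$. Using similar arguments it also follows that

$$\pi^{u_i}_{\alpha_n}\Rightarrow\pi^{u_i}_{\alpha}$$

as $n\to\infty$, which together with (51) contradicts (49). Thus, (48) is satisfied.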

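Step III. The following inequality is elementary:

$$\begin{aligned}
&\sup_{\alpha\in A}\Big|\int_E c(x,u_i(x))\,\pi^{u_i}_{\alpha}(dx)-\int_E c(x,u_i(n)(x))\,\pi^{u_i(n)}_{\alpha}(dx)\Big|\\
&\quad\le\sup_{\alpha\in A}\Big[\Big|\int_E c(x,u_i(x))\,\pi^{u_i}_{\alpha}(dx)-\int_E c(x,u_i(x))\,\pi^{u_i(n)}_{\alpha}(dx)\Big|+\Big|\int_E\big(c(x,u_i(n)(x))-c(x,u_i(x))\big)\,\pi^{u_i(n)}_{\alpha}(dx)\Big|\Big]\\
&\quad=I_{1n}+I_{2n}.
\end{aligned}$$

By (48) clearly $I_{1n}\to0$ as $n\to\infty$. Since by (47) also

$$I_{2n}\le\|c\|\,M\,\eta(\{x\in E: u_i(n)(x)\ne u_i(x)\})\to0$$

as $n\to\infty$ with $M$ defined in (50), for sufficiently large $n$, for $\alpha\in A$ and $i=1,\dots,r$ it follows that

$$J^{\alpha}_x(u_i(n))\le J^{\alpha}_x(u_i)+\varepsilon\le\lambda^{\alpha}+2\varepsilon. \tag{52}$$

Thus, the set $\{u_1(n),\dots,u_r(n)\}$ contains $2\varepsilon$-optimal control functions for the cost functional $J^{\alpha}_x$ with $\alpha\in A$.

Therefore, (A4) is also satisfied.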

Model II. Let $E=\mathbb{R}^d$. Assume $(x_n, n\in\mathbb{N})$ satisfies the following recursive formula:

$$x_{n+1}=f(x_n,\alpha^0,v_n)+g(x_n)w_n \tag{53}$$

where $f:\mathbb{R}^d\times A\times U\to\mathbb{R}^d$ and $g:\mathbb{R}^d\to\mathbb{R}^d\times\mathbb{R}^d$ are continuous bounded functions, there is a continuous bounded inverse matrix $g^{-1}$, and $(w_n)$ is a sequence of i.i.d. $N(0,I)$ random variables. By the continuity of $f$, $g$, and $g^{-1}$ it follows that (A1) and (A3) (cf. [9]) are satisfied. In (A2) let $\psi(\alpha,x)=x^2+1$ so that

$$\frac{\psi(\alpha,x)}{\int_E\psi(\alpha,y)\,P^{\alpha,v}(x,dy)}=\frac{x^2+1}{f^2(x,\alpha,v)+g^2(x)+1}$$

and $K_m$ is compact if

$$\sup_{\alpha\in A}\sup_{v\in U}\,[f^2(x,\alpha,v)+g^2(x)]=o(x^2)$$

as $x^2\to\infty$, and, in particular, if $f$ and $g$ are bounded.

Hence (A2) is satisfied. Just as for Model I there is the following result.

Theorem 9. For given $\varepsilon>0$ there is a finite family $U(\varepsilon)$ of $\varepsilon$-optimal control functions for Model II with cost functionals $J^{\alpha}_x$, $\alpha\in A$.

Proof. By Lemma 1 of [9] there are a finite measure $\eta$ that is absolutely continuous with respect to Lebesgue measure in $\mathbb{R}^d$ and a constant $M$ such that for $x\in\mathbb{R}^d$, $\alpha\in A$ and $v\in U$,

$$\eta(\cdot)\le P^{\alpha,v}(x,\cdot)\le M\eta(\cdot).$$

Consider now a sequence of partitions $\{E^n_1,\dots,E^n_{d_n}\}$, $n=1,2,\dots,$ of $\mathbb{R}^d$ and representative elements $\{e^n_1,\dots,e^n_{d_n}\}$, $n=1,2,\dots,$ such that

$$\mathbb{R}^d=\bigcup_{i=1}^{d_n}E^n_i,\qquad E^n_i\cap E^n_j=\emptyset\ \text{ for } i\ne j,$$

$\eta(\partial E^n_i)=0$, $e^n_i\in E^n_i$ for $i=1,\dots,d_n$, the diameter of $E^n_i$ is not greater than $1/n$ for $i=1,\dots,d_n-1$, $E^n_{d_n}\subset\{x\in\mathbb{R}^d: \|x\|>n\}$, $\{E^{n+1}_1,\dots,E^{n+1}_{d_{n+1}}\}$ is a subpartition of $\{E^n_1,\dots,E^n_{d_n}\}$ and

$$\{e^n_1,\dots,e^n_{d_n}\}\subset\{e^{n+1}_1,\dots,e^{n+1}_{d_{n+1}}\}.$$

Now the methods of the proof of Theorem 8 can be used. Note only that the value of $d$ in (43) is now replaced by $\eta(\mathbb{R}^d)$.

Thus, (A1)–(A4) are satisfied. Moreover, for any $u\in\mathcal{A}$ and transition operator $P^{\alpha,u(x)}(x,\cdot)$, the assumption (B4) is satisfied with the function $\bar\psi(\alpha,x,y)=x^2+y^4+1$.

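5. Adaptive control with estimation. Consider first the case of Model I. For $u\in\mathcal{A}_c$ and $\alpha,\alpha'\in A$ let

$$K^u(\alpha,\alpha') := \int_E\int_E p(x,y,\alpha,u(x))\log p(x,y,\alpha',u(x))\,\eta(dy)\,\pi^u_{\alpha}(dx). \tag{54}$$

Fix $\varepsilon>0$. By the proofs of Propositions 7 and 4 in [5] there is $\delta>0$ such that if $K^u(\alpha,\alpha)-K^u(\alpha,\alpha')\le\delta$ then

$$\|\pi^u_{\alpha}-\pi^u_{\alpha'}\|_{\rm var}\le\varepsilon. \tag{55}$$

By the continuity of $K^u(\alpha,\alpha')$ and the compactness of $A$ there are a finite sequence $\alpha_1,\dots,\alpha_k$ and $\gamma>0$ such that for $\alpha\in A$ there is $\alpha_j$ such that $\alpha\in B_{\gamma}(\alpha_j)=\{\beta\in A: \varrho_A(\beta,\alpha_j)\le\gamma\}$ and for $u\in U(\varepsilon)$,

$$K^u(\alpha,\alpha)-K^u(\alpha,\alpha_j)\le\delta/2,\qquad\sup_{\beta\in A}|K^u(\alpha_j,\beta)-K^u(\alpha,\beta)|\le\delta/16. \tag{56}$$

For $u\in U(\varepsilon)$ let

$$\hat\alpha^u_n=\Big\{\alpha_j: \prod_{i=0}^{n-1}p(x_i,x_{i+1},\alpha_j,u(x_i))=\max_{q=1,\dots,k}\prod_{i=0}^{n-1}p(x_i,x_{i+1},\alpha_q,u(x_i))>\max_{q\le j-1}\prod_{i=0}^{n-1}p(x_i,x_{i+1},\alpha_q,u(x_i))\Big\}. \tag{57}$$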
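The estimator (57) selects the first point of the finite grid $\alpha_1,\dots,\alpha_k$ that maximizes the likelihood of the observed transitions. A minimal sketch follows; it works with sums of logarithms, which select the same maximizer as the products in (57), and the two-state `density` used in the example is a hypothetical stand-in for $p(x,y,\alpha,v)$ with $\eta$ uniform.

```python
import math

def discretized_mle(path, controls_used, density, alphas):
    """Return the first alpha_j in `alphas` maximizing the likelihood of the transitions.

    path          -- observed states x_0, ..., x_n
    controls_used -- controls v_0, ..., v_{n-1} applied at each step
    density       -- function (x, y, alpha, v) -> positive transition density
    alphas        -- finite grid alpha_1, ..., alpha_k
    """
    def log_likelihood(alpha):
        return sum(
            math.log(density(x, y, alpha, v))
            for x, y, v in zip(path[:-1], path[1:], controls_used)
        )

    scores = [log_likelihood(a) for a in alphas]
    # list.index returns the first maximizer, mirroring the strict inequality in (57).
    return alphas[scores.index(max(scores))]

# Hypothetical example: the next state is 1 with probability alpha, regardless of x and v.
def density(x, y, alpha, v):
    return 2.0 * alpha if y == 1 else 2.0 * (1.0 - alpha)

path = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]
controls_used = [0] * (len(path) - 1)
estimate = discretized_mle(path, controls_used, density, [0.2, 0.5, 0.8])
```

Working in the log domain avoids numerical underflow of the product for long trajectories, which is the main practical reason for this design choice.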

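Theorem 10. For Model I there exist a $p>0$ and a positive integer $N$ such that for $n\ge N$ and $u\in U(\varepsilon)$,

$$\inf_{\alpha\in A}\inf_{x\in E}P^{\alpha,u}_x\{\|\pi^u_{\alpha}-\pi^u_{\hat\alpha^u_n}\|_{\rm var}\le\varepsilon\}\ge1-ke^{-np}, \tag{58}$$

where $\hat\alpha^u_n$ is given by (57).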

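Proof. Let

$$C^u(\alpha_i,\alpha_j)=\Big\{l\in\mathcal{P}(E\times E): \Big|\int_{E\times E}\log p(x,y,\alpha_j,u(x))\,l(dx,dy)-K^u(\alpha_i,\alpha_j)\Big|\ge\delta/8\Big\}. \tag{59}$$

If $\alpha\in B_{\gamma}(\alpha_i)$ then $P^{\alpha,u}(x,dy)\,\pi^u_{\alpha}(dx)\notin C^u(\alpha_i,\alpha_j)$ because

$$\int_{E\times E}\log p(x,y,\alpha_j,u(x))\,P^{\alpha,u}(x,dy)\,\pi^u_{\alpha}(dx)=K^u(\alpha,\alpha_j)$$

and by (56), $|K^u(\alpha,\alpha_j)-K^u(\alpha_i,\alpha_j)|\le\delta/16$. Therefore, by Theorem 6 there are a $p>0$ and a positive integer $N$ such that for $n\ge N$, $i,j=1,\dots,k$ and $u\in U(\varepsilon)$,

$$\sup_{\alpha\in B_{\gamma}(\alpha_i)}\sup_{x\in E}P^{\alpha,u}_x\Big\{\Big|n^{-1}\sum_{m=0}^{n-1}\log p(x_m,x_{m+1},\alpha_j,u(x_m))-K^u(\alpha_i,\alpha_j)\Big|\ge\delta/8\Big\}\le e^{-np}.$$

Then from (56) for $n\ge N$ and $i,j=1,\dots,k$,

$$\sup_{\alpha\in B_{\gamma}(\alpha_i)}\sup_{x\in E}P^{\alpha,u}_x\Big\{\Big|n^{-1}\sum_{m=0}^{n-1}\log p(x_m,x_{m+1},\alpha_j,u(x_m))-K^u(\alpha,\alpha_j)\Big|\ge\delta/4\Big\}\le e^{-np}. \tag{60}$$

Since

$$n^{-1}\sum_{m=0}^{n-1}\log p(x_m,x_{m+1},\hat\alpha^u_n,u(x_m))\ge n^{-1}\sum_{m=0}^{n-1}\log p(x_m,x_{m+1},\alpha_j,u(x_m))$$

for $j=1,\dots,k$, using (60) it follows that

$$K^u(\alpha,\hat\alpha^u_n)+\delta/4\ge K^u(\alpha,\alpha_j)-\delta/4$$

for $j=1,\dots,k$ with probability $1-ke^{-np}$, for $\alpha\in A$ and $u\in U(\varepsilon)$.

Consequently, using (56) for $u\in U(\varepsilon)$ we get

$$\inf_{\alpha\in A}\inf_{x\in E}P^{\alpha,u}_x\{K^u(\alpha,\alpha)-K^u(\alpha,\hat\alpha^u_n)\le\delta\}\ge1-ke^{-pn}$$

and by (55) it follows that (58) is satisfied.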

Now consider the following adaptive strategy: Choose $n\ge N$ where $N$ is as in Theorem 10 and test each control function of the family $U(\varepsilon)$ for $n$ units of time. Then for $q=1,\dots,r$ determine

$$\hat\alpha_q=\Big\{\alpha_j: \prod_{i=n(q-1)}^{nq-1}p(x_i,x_{i+1},\alpha_j,u_q(x_i))=\max_{j'=1,\dots,k}\prod_{i=n(q-1)}^{nq-1}p(x_i,x_{i+1},\alpha_{j'},u_q(x_i))>\max_{j'\le j-1}\prod_{i=n(q-1)}^{nq-1}p(x_i,x_{i+1},\alpha_{j'},u_q(x_i))\Big\} \tag{61}$$

and find $p\in\{1,\dots,r\}$ such that

$$\int_E c(y,u_p(y))\,\pi^{u_p}_{\hat\alpha_p}(dy)=\min_{q=1,\dots,r}\int_E c(y,u_q(y))\,\pi^{u_q}_{\hat\alpha_q}(dy). \tag{62}$$

By Theorem 10 the following corollary easily follows:

Corollary 1. The control strategy $\hat v_j=u_p(x_j)$ for $j\ge nr$ is $2\varepsilon\|c\|$-optimal with probability $(1-ke^{-np})^r$.

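For Model II define, for $u\in\mathcal{A}_c$,

$$K^u(\alpha,\alpha')=\int_{\mathbb{R}^d}\int_{\mathbb{R}^d}\|(y-f(x,\alpha',u(x)))g^{-1}(x)\|^2\,P^{\alpha,u(x)}(x,dy)\,\pi^u_{\alpha}(dx) \tag{63}$$

and

$$K^u_m(\alpha,\alpha')=\int_{\mathbb{R}^d}\int_{\mathbb{R}^d}\big(\|(y-f(x,\alpha',u(x)))g^{-1}(x)\|^2\wedge m\big)\,P^{\alpha,u(x)}(x,dy)\,\pi^u_{\alpha}(dx). \tag{64}$$

By Proposition 1 of [9], $K^u(\alpha,\alpha')$ and $K^u_m(\alpha,\alpha')$ are continuous functions in their two variables. Moreover, the following continuity property is satisfied.

Lemma 8. If $u\in\mathcal{A}_c$ then

$$\sup_{\alpha,\alpha'\in A}|K^u(\alpha,\alpha')-K^u_m(\alpha,\alpha')|\to0$$

as $m\to\infty$.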

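Proof. Suppose that for $\alpha_m\to\alpha$ and $\alpha'_m\to\alpha'$ there is $\delta>0$ such that

$$|K^u(\alpha_m,\alpha'_m)-K^u_m(\alpha_m,\alpha'_m)|>\delta.$$

Since

$$K^u(\alpha,\alpha')=\int_{\mathbb{R}^d}E\big\{\|(f(x,\alpha,u(x))-f(x,\alpha',u(x)))g^{-1}(x)+\xi\|^2\big\}\,\pi^u_{\alpha}(dx) \tag{65}$$

and

$$K^u_m(\alpha,\alpha')=\int_{\mathbb{R}^d}E\big\{\min\{\|(f(x,\alpha,u(x))-f(x,\alpha',u(x)))g^{-1}(x)+\xi\|^2,\,m\}\big\}\,\pi^u_{\alpha}(dx), \tag{66}$$

where $\xi$ is an $N(0,1)$ random variable and $E$ is expectation, and by Proposition 1 of [9], $\|\pi^u_{\alpha_m}-\pi^u_{\alpha}\|_{\rm var}\to0$, it follows that $K^u_m(\alpha_m,\alpha'_m)\to K^u(\alpha,\alpha')$ and $K^u(\alpha_m,\alpha'_m)\to K^u(\alpha,\alpha')$ as $m\to\infty$. This is a contradiction.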

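Lemma 9. For $\varepsilon>0$ and $u\in\mathcal{A}_c$ there is a $\delta>0$ such that if $K^u(\alpha,\alpha')\le\delta$ then $\|\pi^u_{\alpha}-\pi^u_{\alpha'}\|_{\rm var}\le\varepsilon$.

Proof. If the lemma does not hold then $K^u(\alpha_n,\alpha'_n)\to0$, $\alpha_n\to\alpha$, $\alpha'_n\to\alpha'$ and $\|\pi^u_{\alpha_n}-\pi^u_{\alpha'_n}\|_{\rm var}\ge\varepsilon$ for $n=1,2,\dots$ and an $\varepsilon>0$. From Lemma 8 it follows that $K^u(\alpha,\alpha')=0$. By (65) and the proof of Theorem 9 it follows that $f(x,\alpha,u(x))=f(x,\alpha',u(x))$ for almost all $x\in\mathbb{R}^d$ with respect to $d$-dimensional Lebesgue measure. Since $f$ and $u$ are continuous functions, $f(x,\alpha,u(x))=f(x,\alpha',u(x))$ for all $x\in\mathbb{R}^d$, and consequently $\pi^u_{\alpha}=\pi^u_{\alpha'}$, a contradiction.

Combining Lemmas 9 and 8 gives the following corollary.

Corollary 2. For $\varepsilon>0$ there are $m>0$ and $\delta>0$ such that for $u\in U(\varepsilon)$, if $K^u_m(\alpha,\alpha')<\delta$ then $\|\pi^u_{\alpha}-\pi^u_{\alpha'}\|_{\rm var}\le\varepsilon$.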

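Fix $\varepsilon>0$ and take $m$ as in Corollary 2. There are a finite set $\alpha_1,\dots,\alpha_k\in A$ and $\gamma>0$ such that for $\alpha\in B_{\gamma}(\alpha_j)$ and $u\in U(\varepsilon)$,

$$K^u_m(\alpha,\alpha_j)\le\delta/2 \tag{67}$$

and

$$\sup_{\beta\in A}|K^u_m(\alpha_j,\beta)-K^u_m(\alpha,\beta)|\le\delta/16. \tag{68}$$

Let

$$\hat\alpha^u_n=\Big\{\alpha_j: \sum_{i=0}^{n-1}\|(x_{i+1}-f(x_i,\alpha_j,u(x_i)))g^{-1}(x_i)\|^2=\min_{q=1,\dots,k}\sum_{i=0}^{n-1}\|(x_{i+1}-f(x_i,\alpha_q,u(x_i)))g^{-1}(x_i)\|^2<\min_{q\le j-1}\sum_{i=0}^{n-1}\|(x_{i+1}-f(x_i,\alpha_q,u(x_i)))g^{-1}(x_i)\|^2\Big\}. \tag{69}$$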
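For Model II the estimator (69) is a least squares rule over the finite grid $\alpha_1,\dots,\alpha_k$. The sketch below treats the scalar case $d=1$ only, so that $g^{-1}(x)$ becomes division by $g(x)$. The particular `f`, `g`, `u`, the noise level 0.1 and the grid are hypothetical choices for this example and are not taken from the paper.

```python
import math
import random

def least_squares_estimate(path, u, f, g, alphas):
    """Return the first alpha_j in `alphas` minimizing the normalized squared residuals.

    path   -- observed scalar states x_0, ..., x_n
    u      -- control function applied along the path
    f, g   -- drift f(x, alpha, v) and noise scale g(x), with g(x) != 0
    alphas -- finite grid alpha_1, ..., alpha_k
    """
    def residual_sum(alpha):
        return sum(
            ((y - f(x, alpha, u(x))) / g(x)) ** 2
            for x, y in zip(path[:-1], path[1:])
        )

    scores = [residual_sum(a) for a in alphas]
    # list.index returns the first minimizer, mirroring the strict inequality in (69).
    return alphas[scores.index(min(scores))]

# Hypothetical scalar model: bounded drift, constant noise scale, constant control.
def f(x, alpha, v):
    return alpha * math.sin(x) + v

def g(x):
    return 1.0

def u(x):
    return 0.5

rng = random.Random(0)
path = [1.0]
for _ in range(200):
    x = path[-1]
    # Assumed noise level 0.1 (smaller than the standard Gaussian of the model).
    path.append(f(x, 0.7, u(x)) + g(x) * rng.gauss(0.0, 0.1))

estimate = least_squares_estimate(path, u, f, g, [0.3, 0.7, 0.9])
```

On a noise-free trajectory generated with a grid value, the residual sum for that value is exactly zero, so the rule recovers it; with noise the large deviations estimates control the probability of picking a grid point far from the true parameter.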

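Using Theorem 6, Corollary 2 and (67)–(69) as in the case of Model I yields the following theorem.

Theorem 11. Given Model II, for a given compact set $W\subset\mathbb{R}^d$ there are $p>0$ and a positive integer $N$ such that for $n\ge N$ and $u\in U(\varepsilon)$,

$$\inf_{\alpha\in A}\inf_{x\in W}P^{\alpha,u}_x\{\|\pi^u_{\alpha}-\pi^u_{\hat\alpha^u_n}\|_{\rm var}\le\varepsilon\}\ge1-ke^{-pn}.$$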

Let $W\subset\mathbb{R}^d$ be a compact set that is positive recurrent for each $u\in U(\varepsilon)$. Choose $n\ge N$ and use control $u_1$ for $i<T_1$, $u_2$ for $T_1\le i<T_2$, ..., and $u_r$ for $T_{r-1}\le i<T_r$ where $T_i$ is defined as in Theorem 7.

For $q=1,\dots,r$ let

$$\hat\alpha_q=\Big\{\alpha_j: \sum_{i=T_{q-1}}^{T_{q-1}+n}\|(x_{i+1}-f(x_i,\alpha_j,u_q(x_i)))g^{-1}(x_i)\|^2=\min_{j'=1,\dots,k}\sum_{i=T_{q-1}}^{T_{q-1}+n}\|(x_{i+1}-f(x_i,\alpha_{j'},u_q(x_i)))g^{-1}(x_i)\|^2<\min_{j'\le j-1}\sum_{i=T_{q-1}}^{T_{q-1}+n}\|(x_{i+1}-f(x_i,\alpha_{j'},u_q(x_i)))g^{-1}(x_i)\|^2\Big\}.$$

Find $p\in\{1,\dots,r\}$ such that

$$\int_E c(y,u_p(y))\,\pi^{u_p}_{\hat\alpha_p}(dy)=\min_{q=1,2,\dots,r}\int_E c(y,u_q(y))\,\pi^{u_q}_{\hat\alpha_q}(dy).$$

Corollary 3. The control $\hat v_j=u_p(x_j)$ for $j\ge T_r$ is $2\varepsilon\|c\|$-optimal with probability $(1-ke^{-np})^r$.

References

[1] G. B. Di Masi and Ł. Stettner, Bayesian ergodic adaptive control of discrete time Markov processes, Stochastics Stochastics Rep. 54 (1995), 301–316.

[2] M. D. Donsker and S. R. S. Varadhan, Asymptotic evaluation of certain Markov process expectations for large time, I, Comm. Pure Appl. Math. 28 (1975), 1–47.

[3] M. D. Donsker and S. R. S. Varadhan, Asymptotic evaluation of certain Markov process expectations for large time, III, ibid. 29 (1976), 389–461.

[4] M. Duflo, Formule de Chernoff pour des chaînes de Markov (d'après Donsker et Varadhan), in: Grandes déviations et applications statistiques, Séminaire Orsay 1977–78, Astérisque 68 (1979), 99–124.

[5] T. E. Duncan, B. Pasik-Duncan and Ł. Stettner, Discretized maximum likelihood and almost optimal control of ergodic Markov models, SIAM J. Control Optim. 36 (1998), 422–446.

[6] O. Hernández-Lerma, Adaptive Markov Control Processes, Springer, 1976.

[7] N. Maigret, Majorations de Chernoff pour des chaînes de Markov contrôlées, Z. Wahrsch. Verw. Gebiete 51 (1980), 133–151.

[8] N. Maigret, Statistiques des chaînes contrôlées Felleriennes, in: Grandes déviations et applications statistiques, Séminaire Orsay, 1977–1978, Astérisque 68 (1979), 143–169.

[9] Ł. Stettner, On nearly self-optimizing strategies for a discrete-time uniformly ergodic adaptive model, Appl. Math. Optim. 27 (1993), 161–177.

T. E. Duncan, B. Pasik-Duncan
Department of Mathematics
University of Kansas
Lawrence, KS 66045, U.S.A.
E-mail: duncan@math.ukans.edu
bozenna@kuhub.cc.ukans.edu

Łukasz Stettner
Department of Mathematics
University of Kansas
Lawrence, KS 66045, U.S.A.
E-mail: stettner@impan.gov.pl

Received on 19.1.1999