T. E. DUNCAN (Lawrence, KS)
B. PASIK-DUNCAN (Lawrence, KS)
L. STETTNER (Warszawa)
ADAPTIVE CONTROL OF DISCRETE TIME MARKOV
PROCESSES BY THE LARGE DEVIATIONS METHOD
Abstract. Some discrete time controlled Markov processes in a locally compact metric space whose transition operators depend on an unknown parameter are described. The adaptive controls are constructed using the large deviations of empirical distributions which are uniform in the parameter that takes values in a compact set. The adaptive procedure uses a finite family of continuous, almost optimal controls. Using the large deviations property it is shown that an adaptive control which is a fixed almost optimal control after a finite time is almost optimal with probability nearly 1.
0. Introduction. Consider a controlled Markov process $(x_n, n \in \mathbb{N})$ on a probability space $(\Omega, \mathcal{F}, P)$ taking values in a locally compact metric space $(E, \varrho_E)$ with the transition operator $P^{\alpha^0, v_n}(x_n, \cdot)$ at time $n$. The quantity $\alpha^0$ is an unknown parameter that is an element of a compact metric space $(A, \varrho_A)$, and the term $v_n$ that is the control is a $\sigma(x_0, \ldots, x_n)$-adapted random variable with values in a compact metric space $(U, \varrho_U)$.
Let $c : E \times U \to \mathbb{R}_+$ be a continuous bounded function and let
\[ J^{\alpha^0}((v_n, n \in \mathbb{N})) = \limsup_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} c(x_i, v_i). \tag{1} \]
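As an illustration of the average cost in (1), the following minimal sketch evaluates the running average for a two-state chain under a stationary Markovian control and compares it with the exact stationary value. The transition table `P`, the cost table `c` and the function names are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import random

# Hypothetical two-state, two-action model (illustrative values only).
# P[v][x] is the transition row from state x under control v.
P = {0: [[0.9, 0.1], [0.5, 0.5]],
     1: [[0.2, 0.8], [0.3, 0.7]]}
# c[x][v] is the bounded, nonnegative running cost c(x, v).
c = [[1.0, 2.0], [0.0, 3.0]]


def running_average_cost(policy, n, x0=0, seed=0):
    """Return (1/n) * sum_{i<n} c(x_i, v_i) with v_i = policy(x_i)."""
    rng = random.Random(seed)
    x, total = x0, 0.0
    for _ in range(n):
        v = policy(x)
        total += c[x][v]
        # Sample x_{i+1} from the row P[v][x].
        x = 0 if rng.random() < P[v][x][0] else 1
    return total / n


def stationary_cost(policy):
    """Exact long-run average cost of a stationary policy on two states."""
    a = P[policy(0)][0][1]  # probability of moving 0 -> 1
    b = P[policy(1)][1][0]  # probability of moving 1 -> 0
    pi0, pi1 = b / (a + b), a / (a + b)  # invariant distribution
    return pi0 * c[0][policy(0)] + pi1 * c[1][policy(1)]
```

For an ergodic chain the simulated running average approaches the stationary value as `n` grows, which is the quantity minimized over Markovian controls when the parameter is known.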
The control problem is to minimize $J^{\alpha^0}$ over the admissible strategies $(v_n, n \in \mathbb{N})$ where $v_n$ is a $U$-valued, $\sigma(x_0, \ldots, x_n)$-adapted random variable. If $\alpha^0$ is known and some ergodic assumptions are satisfied, then the family of admissible strategies can be restricted to Markovian ones, that is, $v_n = u(x_n)$, where $u \in \mathcal{A} = B(E, U)$, the family of Borel measurable functions from $E$ to $U$. Thus $J^{\alpha^0}((u(x_n), n \in \mathbb{N}))$ is minimized over $u \in \mathcal{A}$.
If $\alpha^0$ is unknown then an estimate of $\alpha^0$ is made for each $n \in \mathbb{N}$ using the state $x_n$, and a control $u \in \mathcal{A}$ is chosen that is almost optimal for the current value of the estimate of $\alpha^0$. While such a strategy can be shown to be almost self-optimizing (cf., e.g., [5, 9]), the procedure requires an estimate for each $n \in \mathbb{N}$ and a choice of a control. In the approach in this paper an adaptive control is fixed after a finite time and it is shown to be almost optimal with probability nearly 1. This approach uses the results on the large deviations of empirical distributions which are uniform in the parameter. These large deviation results are described in Section 2. In Section 3 it is shown that an adaptive control can be taken to be fixed after a finite time and almost optimal. In Section 4 two Markov models are given for which the results in Sections 2 and 3 can be applied.
2000 Mathematics Subject Classification: 93E20, 93E10, 93C40.
Key words and phrases: adaptive control, discrete time controlled Markov processes, large deviations.
Research supported in part by NSF Grant DMS 9623439 and KBN Grant 2 P03A 05309.
Throughout the paper we denote by:
$C(E)$ the space of continuous bounded functions on $E$,
$\mathcal{P}(E)$ the space of all probability measures on $E$ endowed with the weak convergence topology and with the set of Borel subsets $\mathcal{B}(\mathcal{P}(E))$,
$\mathcal{B}(E)$ the set of Borel subsets of $E$.
2. Uniform large deviations of empirical distributions. In this section we consider an uncontrolled Markov process $(x_n, n \in \mathbb{N})$ with transition operators $P^{\alpha}(x_n, \cdot)$ where $\alpha \in A$. The following assumptions are made:
(B1) For $f \in C(E)$ the mapping
\[ A \times E \ni (\alpha, x) \mapsto P^{\alpha} f(x) := \int_E f(y)\, P^{\alpha}(x, dy) \]
is continuous.
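For $B \in \mathcal{B}(E)$ and $C \in \mathcal{B}(\mathcal{P}(E))$, the empirical measure and its probability (see [2] and [3]) are defined as follows:
\[ S_n(B) = \frac{1}{n} \sum_{i=0}^{n-1} \mathbf{1}_B(x_i), \tag{2} \]
\[ Q^{\alpha}_{n,x}(C) = P^{\alpha}_x\{S_n \in C\}, \tag{3} \]
where $P^{\alpha}_x$ stands for the conditional probability measure given that $(x_n, n \in \mathbb{N})$ starts from $x$ and the true parameter is $\alpha$. Furthermore let
\[ \Phi := \{f \in C(E) : \exists a > 0\ \forall x \in E\ f(x) \ge a\} \tag{4} \]
and for $\nu \in \mathcal{P}(E)$, define
\[ I^{\alpha}(\nu) = \sup_{f \in \Phi} \int_E \log \frac{f(x)}{P^{\alpha} f(x)}\, \nu(dx). \tag{5} \]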
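To make the definitions (2) and (5) concrete, the sketch below computes the empirical measure of a path and approximates the rate functional for a two-state chain by a grid search over positive test functions. The transition matrix `P`, the grid parameters and the function names are assumptions made for illustration only, and the grid search yields a lower bound on the supremum rather than its exact value.

```python
import math

# Hypothetical two-state transition matrix (illustrative values only).
P = [[0.9, 0.1],
     [0.5, 0.5]]


def empirical_measure(path, num_states=2):
    """S_n from (2): fraction of time the path spends in each state."""
    n = len(path)
    return [sum(1 for x in path if x == s) / n for s in range(num_states)]


def rate_functional(nu, grid=2000, span=8.0):
    """Approximate I(nu) from (5) by a grid search over f = (1, t), t > 0.

    On two states the integrand is invariant under scaling of f, so one
    free parameter suffices. The grid gives a lower bound on the supremum.
    """
    best = 0.0  # f = (1, 1) always gives the value 0
    for k in range(grid + 1):
        t = math.exp(-span + 2.0 * span * k / grid)
        f = (1.0, t)
        value = 0.0
        for x in range(2):
            Pf = P[x][0] * f[0] + P[x][1] * f[1]
            value += nu[x] * math.log(f[x] / Pf)
        best = max(best, value)
    return best
```

In this example the invariant measure is $(5/6, 1/6)$; the approximation vanishes there and is strictly positive at other measures, in line with the role the rate functional plays below.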
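Theorem 1. If (B1) is satisfied then for compact subsets $C$ and $A_1$ of $\mathcal{P}(E)$ and $A$, respectively, we have
\[ \limsup_{n \to \infty} n^{-1} \sup_{\alpha \in A_1} \sup_{x \in E} \log Q^{\alpha}_{n,x}(C) \le -\inf_{\alpha \in A_1} \inf_{\nu \in C} I^{\alpha}(\nu) \tag{6} \]
where $Q$ is given by (3) and $I$ is given by (5).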
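Proof. For $d > 1$ let
\[ \Phi_d = \{f \in \Phi : \sup_{x \in E} f(x) \le d \inf_{x \in E} f(x)\} \]
and for $\nu \in \mathcal{P}(E)$ let
\[ I^{\alpha}_d(\nu) = \sup_{f \in \Phi_d} \int_E \log \frac{f(x)}{P^{\alpha} f(x)}\, \nu(dx). \]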
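Clearly
\[ E^{\alpha}_x \exp\Big\{\sum_{i=0}^{n-1} \log \frac{f(x_i)}{P^{\alpha} f(x_i)}\Big\} = E^{\alpha}_x \exp\Big\{n \int_E \log \frac{f(y)}{P^{\alpha} f(y)}\, S_n(dy)\Big\} = \int_{\mathcal{P}(E)} \exp\Big\{n \int_E \log \frac{f(y)}{P^{\alpha} f(y)}\, \nu(dy)\Big\}\, Q^{\alpha}_{n,x}(d\nu). \tag{7} \]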
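Since by the definition of the family $\Phi_d$, for $n = 1, 2, \ldots,$
\[ E^{\alpha}_x \exp\Big\{\sum_{i=0}^{n-1} \log \frac{f(x_i)}{P^{\alpha} f(x_i)}\Big\} \le d, \tag{8} \]
by (7) it follows that
\[ \int_{\mathcal{P}(E)} \exp\Big\{n \int_E \log \frac{f(y)}{P^{\alpha} f(y)}\, \nu(dy)\Big\}\, Q^{\alpha}_{n,x}(d\nu) \le d. \tag{9} \]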
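Consequently, for any Borel subset $\Gamma$ of $\mathcal{P}(E)$ and $\alpha \in A$,
\[ \sup_{x \in E} Q^{\alpha}_{n,x}(\Gamma) \le d \exp\Big\{-n \inf_{\nu \in \Gamma} \int_E \log \frac{f(y)}{P^{\alpha} f(y)}\, \nu(dy)\Big\}. \tag{10} \]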
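Now let
\[ \kappa_d = \inf_{\alpha \in A_1} \inf_{\nu \in C} I^{\alpha}_d(\nu). \]
Since each $f \in \Phi$ belongs to $\Phi_d$ with $d$ sufficiently large, for $f \in \Phi$ the mapping
\[ \mathcal{P}(E) \times A \ni (\nu, \alpha) \mapsto \nu\Big(\log \frac{f}{P^{\alpha} f}\Big) := \int_E \log \frac{f(y)}{P^{\alpha} f(y)}\, \nu(dy) \]
is continuous. Therefore, for $\varepsilon > 0$ the set
\[ \Big\{(\nu, \alpha) \in \mathcal{P}(E) \times A : \nu\Big(\log \frac{f}{P^{\alpha} f}\Big) > \kappa_d - \varepsilon\Big\} \]
is open and
\[ C \times A_1 \subset \bigcup_{f \in \Phi_d} \Big\{(\nu, \alpha) : \nu\Big(\log \frac{f}{P^{\alpha} f}\Big) > \kappa_d - \varepsilon\Big\}. \]
Since $C \times A_1$ is a compact subset of $\mathcal{P}(E) \times A$, there is a finite subset $\{f_1, \ldots, f_k\}$ of $\Phi_d$ such that
\[ C \times A_1 \subset \bigcup_{j=1}^{k} \Big\{(\nu, \alpha) : \nu\Big(\log \frac{f_j}{P^{\alpha} f_j}\Big) > \kappa_d - \varepsilon\Big\}. \]
Consequently, for every $\alpha \in A_1$,
\[ C \subset \bigcup_{f \in \Phi_d} \Big\{\nu \in \mathcal{P}(E) : \nu\Big(\log \frac{f}{P^{\alpha} f}\Big) > \kappa_d - \varepsilon\Big\}. \]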
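Now if we replace $\Gamma$ in (10) by the sets
\[ K_j = \Big\{\nu \in \mathcal{P}(E) : \nu\Big(\log \frac{f_j}{P^{\alpha} f_j}\Big) > \kappa_d - \varepsilon\Big\} \cap C \]
it follows that
\[ \sup_{x \in E} Q^{\alpha}_{n,x}(K_j) \le d\, e^{-n(\kappa_d - \varepsilon)}. \tag{11} \]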
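Hence
\[ \sup_{x \in E} \sup_{\alpha \in A_1} Q^{\alpha}_{n,x}(C) \le \sup_{x \in E} \sup_{\alpha \in A_1} \Big[\sum_{j=1}^{k} Q^{\alpha}_{n,x}(K_j)\Big] \le k d\, e^{-n(\kappa_d - \varepsilon)} \tag{12} \]
and
\[ \limsup_{n \to \infty} \sup_{x \in E} \sup_{\alpha \in A_1} n^{-1} \log Q^{\alpha}_{n,x}(C) \le -\kappa_d + \varepsilon. \]
Since $\varepsilon > 0$ can be chosen arbitrarily small, it follows that
\[ \limsup_{n \to \infty} \sup_{x \in E} \sup_{\alpha \in A_1} n^{-1} \log Q^{\alpha}_{n,x}(C) \le -\kappa_d. \tag{13} \]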
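To complete the proof it remains to show that
\[ \lim_{d \to \infty} \kappa_d = \inf_{\alpha \in A_1} \inf_{\nu \in C} I^{\alpha}(\nu). \tag{14} \]
Note that for $\varepsilon > 0$,
\[ C \times A_1 \subset \bigcup_{f \in \Phi} \Big\{(\nu, \alpha) : \nu\Big(\log \frac{f}{P^{\alpha} f}\Big) > \inf_{\alpha \in A_1} \inf_{\nu \in C} I^{\alpha}(\nu) - \varepsilon\Big\} \]
and by the compactness of $C \times A_1$ there is a finite set $\{f_1, \ldots, f_k\} \subset \Phi$ such that
\[ C \times A_1 \subset \bigcup_{j=1}^{k} \Big\{(\nu, \alpha) : \nu\Big(\log \frac{f_j}{P^{\alpha} f_j}\Big) > \inf_{\alpha \in A_1} \inf_{\nu \in C} I^{\alpha}(\nu) - \varepsilon\Big\}. \]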
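By the definitions of $\Phi$ and $\Phi_d$ there is $d > 0$ such that $\{f_1, \ldots, f_k\} \subset \Phi_d$. Therefore, for each $\alpha \in A_1$ and $\nu \in C$,
\[ \sup_{f \in \Phi_d} \int_E \log \frac{f(y)}{P^{\alpha} f(y)}\, \nu(dy) \ge \max_{i=1,\ldots,k} \int_E \log \frac{f_i(y)}{P^{\alpha} f_i(y)}\, \nu(dy) > \inf_{\alpha \in A_1} \inf_{\nu \in C} I^{\alpha}(\nu) - \varepsilon \]
and
\[ \limsup_{d \to \infty} \kappa_d \ge \inf_{\alpha \in A_1} \inf_{\nu \in C} I^{\alpha}(\nu) - \varepsilon. \]
Since the last inequality holds for any $\varepsilon > 0$, and
\[ \kappa_d \le \inf_{\alpha \in A_1} \inf_{\nu \in C} I^{\alpha}(\nu), \]
the equality (14) follows, which completes the proof.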
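The bound (8) used in the proof can be checked exactly on a finite chain, where the expectation is a finite backward recursion. The following sketch assumes a hypothetical three-state transition matrix and test function; the names `expected_product`, `P_example` and `f_example` are illustrative and not part of the paper.

```python
def expected_product(P, f, n):
    """Return E_x prod_{i<n} f(x_i) / Pf(x_i) for every start state x."""
    m = len(P)
    Pf = [sum(P[x][y] * f[y] for y in range(m)) for x in range(m)]
    g = [f[x] / Pf[x] for x in range(m)]
    h = [1.0] * m  # value of the empty product
    for _ in range(n):
        # One backward step: h_new(x) = g(x) * sum_y P(x, y) * h(y).
        h = [g[x] * sum(P[x][y] * h[y] for y in range(m)) for x in range(m)]
    return h


# Hypothetical data: a stochastic matrix and a positive function with
# sup f <= d * inf f for d = 4.
P_example = [[0.9, 0.1, 0.0],
             [0.2, 0.5, 0.3],
             [0.1, 0.1, 0.8]]
f_example = [1.0, 2.0, 4.0]
d_example = max(f_example) / min(f_example)
```

The reason the bound holds is that the product equals $f(x_0)/f(x_n)$ times a product whose expectation is 1, and $f(x_0)/f(x_n) \le d$ for $f \in \Phi_d$.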
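In the proof of Theorem 1 the compactness of the set $C \subset \mathcal{P}(E)$ is important. To relax this requirement an extra assumption on the transition operator $P^{\alpha}(x, \cdot)$ is made:
(B2) There is a continuous function $\psi : A \times E \to \mathbb{R}$ such that $\psi(\alpha, x) \ge 1$ for $\alpha \in A$, $x \in E$, the mapping
\[ E \ni x \mapsto \sup_{\alpha \in A} \int_E \psi(\alpha, y)\, P^{\alpha}(x, dy) \tag{15} \]
is bounded on compact subsets of $E$ and for each $m > 0$ the set
\[ K_m := \Big\{x \in E : \inf_{\alpha \in A} \frac{\psi(\alpha, x)}{\int_E \psi(\alpha, y)\, P^{\alpha}(x, dy)} \le m\Big\} \tag{16} \]
is compact.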
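Let
\[ \varrho := \inf_{x \in E} \inf_{\alpha \in A} \frac{\psi(\alpha, x)}{\int_E \psi(\alpha, y)\, P^{\alpha}(x, dy)}. \tag{17} \]
If (B2) is satisfied then $\varrho > 0$. The following two lemmas are easy adaptations [...]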
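Lemma 1. If (B2) is satisfied then for each $\varepsilon > 0$,
\[ Q^{\alpha}_{n,x}(\{\nu \in \mathcal{P}(E) : \nu(K_m^c) > \varepsilon\}) \le \psi(\alpha, x) \exp\Big\{-n \log \varrho - n \varepsilon \log \frac{m}{\varrho}\Big\}. \tag{18} \]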
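Lemma 2. If (B2) is satisfied then for any real number $a$ and any compact set $W \subset E$ there exists a compact subset $C(a) \subset \mathcal{P}(E)$ such that
\[ \limsup_{n \to \infty} n^{-1} \sup_{\alpha \in A} \sup_{x \in W} \log Q^{\alpha}_{n,x}(C(a)^c) \le -a. \tag{19} \]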
Using Lemma 2, one can generalize Theorem 1 in the same way as in Theorem 4.4 of [3].
Theorem 2. If (B1) and (B2) are satisfied, then for any closed subsets $C$ and $A_1$ of $\mathcal{P}(E)$ and $A$, respectively, and compact subsets $W$ of $E$,
\[ \limsup_{n \to \infty} n^{-1} \sup_{\alpha \in A_1} \sup_{x \in W} \log Q^{\alpha}_{n,x}(C) \le -\inf_{\alpha \in A_1} \inf_{\nu \in C} I^{\alpha}(\nu). \tag{20} \]
Subsequently, the following lemma is used.
Lemma 3. If (B1) and (B2) are satisfied then for each $m > 0$ the set
\[ C_m = \{\nu \in \mathcal{P}(E) : \inf_{\alpha \in A} I^{\alpha}(\nu) \le m\} \tag{21} \]
is compact in $\mathcal{P}(E)$.
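Proof. Since for each fixed $\nu \in C_m$ the mapping $\alpha \mapsto I^{\alpha}(\nu)$ is lower semicontinuous, there is an $\alpha(\nu) \in A$ such that $I^{\alpha(\nu)}(\nu) \le m$. By (B2) for $r > 0$ the set
\[ K^{\alpha}_r = \Big\{x \in E : \frac{\psi(\alpha, x)}{\int_E \psi(\alpha, y)\, P^{\alpha}(x, dy)} \le r\Big\} \subset K_r \]
is compact in $E$ where $K_r$ is defined in (16). Adapting the method of Lemma 4.2 of [3] it can be shown that for $r > \varrho$,
\[ \nu((K^{\alpha(\nu)}_r)^c) \le \frac{m - \log \varrho}{\log r - \log \varrho}, \]
and the right-hand side is less than $\varepsilon$ for $r$ sufficiently large. Therefore, for $\varepsilon > 0$ there is $r > 0$ such that for any $\nu \in C_m$,
\[ \nu((K_r)^c) \le \nu((K^{\alpha(\nu)}_r)^c) \le \varepsilon. \]
Consequently, the family $C_m$ of measures is tight. Since the mapping
\[ A \times \mathcal{P}(E) \ni (\alpha, \nu) \mapsto I^{\alpha}(\nu) \tag{22} \]
is lower semicontinuous, the set $C_m$ is compact.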
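In the next proposition the rate functional $I^{\alpha}(\nu)$ is shown to be positive.
Proposition 1. Assume (B1), (B2) and that for each $\alpha \in A$ there is a unique invariant measure $\pi_{\alpha}$ of the transition operator $P^{\alpha}(x, \cdot)$. Let $A_1$ and $C$ be closed subsets of $A$ and $\mathcal{P}(E)$ such that $\pi_{\alpha} \notin C$ for $\alpha \in A_1$. Then
\[ \inf_{\alpha \in A_1} \inf_{\nu \in C} I^{\alpha}(\nu) > 0. \tag{23} \]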
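Proof. Note first that the mapping (22) is lower semicontinuous and therefore it attains its minimum on compact sets. Moreover $I^{\alpha}(\nu) \ge 0$ for $\nu \in \mathcal{P}(E)$ and $\alpha \in A$ and, by Lemma 2.5 of [3], $I^{\alpha}(\nu) = 0$ if and only if $\nu = \pi_{\alpha}$.
By the compactness of the set $C_m$, defined in (21), for each $m > 0$ it follows that
\[ \inf_{\alpha \in A_1} \inf_{\nu \in C \cap C_m} I^{\alpha}(\nu) > 0. \]
Since by the definition (21),
\[ \inf_{\alpha \in A} \inf_{\nu \in C_m^c} I^{\alpha}(\nu) \ge m > 0, \]
(23) is verified.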
An additional assumption is made.
(B3) For each $\alpha \in A$ there is a unique invariant measure $\pi_{\alpha}$ for the transition operator $P^{\alpha}(x, \cdot)$ and for $\varphi \in C(E)$ the mapping $A \ni \alpha \mapsto \pi_{\alpha}(\varphi)$ is continuous.
Using the assumptions (B1)–(B3) we give a uniform estimate of the deviations of the running costs from the limit.
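Theorem 3. If (B1)–(B3) are satisfied, then for $\varepsilon > 0$, $\varphi \in C(E)$ and any compact set $W \subset E$ there are a $p > 0$ and a positive integer $N$ such that for $n \ge N$,
\[ \sup_{\alpha \in A} \sup_{x \in W} P^{\alpha}_x\Big\{\Big|n^{-1} \sum_{j=0}^{n-1} \varphi(x_j) - \pi_{\alpha}(\varphi)\Big| \ge \varepsilon\Big\} \le e^{-np}. \tag{24} \]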
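Proof. By (B3) for $\varepsilon > 0$ and $\alpha \in A$ there exists $\delta_{\alpha} > 0$ such that if $\alpha' \in B(\alpha, \delta_{\alpha}) := \{\alpha' \in A : \varrho_A(\alpha, \alpha') \le \delta_{\alpha}\}$, then $|\pi_{\alpha}(\varphi) - \pi_{\alpha'}(\varphi)| \le \varepsilon$. Since $A$ is compact there is a finite set $\alpha_1, \ldots, \alpha_k$ such that $A \subset \bigcup_{i=1}^{k} B(\alpha_i, \delta_{\alpha_i})$. For $i = 1, \ldots, k$ let $C_{\varepsilon}(\alpha_i) := \{\nu \in \mathcal{P}(E) : |\nu(\varphi) - \pi_{\alpha_i}(\varphi)| \ge 2\varepsilon\}$. Clearly $C_{\varepsilon}(\alpha_i)$ is a closed subset of $\mathcal{P}(E)$ and if $\alpha \in B(\alpha_i, \delta_{\alpha_i})$ then $\pi_{\alpha} \notin C_{\varepsilon}(\alpha_i)$. Therefore, by Proposition 1 and Theorem 2 there are $p > 0$ and $N > 0$ such that for $n > N$ and $i = 1, \ldots, k$,
\[ \sup_{\alpha \in B(\alpha_i, \delta_{\alpha_i})} \sup_{x \in W} Q^{\alpha}_{n,x}(C_{\varepsilon}(\alpha_i)) \le e^{-np}. \]
Equivalently this means that
\[ \sup_{\alpha \in B(\alpha_i, \delta_{\alpha_i})} \sup_{x \in W} P^{\alpha}_x\Big\{\Big|n^{-1} \sum_{j=0}^{n-1} \varphi(x_j) - \pi_{\alpha_i}(\varphi)\Big| \ge 2\varepsilon\Big\} \le e^{-np} \]
for $i = 1, \ldots, k$. By the definition of $B(\alpha_i, \delta_{\alpha_i})$ it follows that
\[ \sup_{\alpha \in B(\alpha_i, \delta_{\alpha_i})} \sup_{x \in W} P^{\alpha}_x\Big\{\Big|n^{-1} \sum_{j=0}^{n-1} \varphi(x_j) - \pi_{\alpha}(\varphi)\Big| \ge 3\varepsilon\Big\} \le e^{-np} \]
for $i = 1, \ldots, k$ and consequently, since $\varepsilon > 0$ is arbitrary, (24) is obtained.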
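The remaining part of this section is devoted to the study of the large deviations for empirical distributions of pairs of consecutive states. By analogy to (2) and (3), for $B_1, B_2 \in \mathcal{B}(E)$ let
\[ \tilde{S}_n(B_1 \times B_2) = \frac{1}{n} \sum_{i=0}^{n-1} \mathbf{1}_{B_1}(x_i)\, \mathbf{1}_{B_2}(x_{i+1}) \tag{25} \]
and for $C \in \mathcal{B}(\mathcal{P}(E \times E))$ let
\[ \tilde{Q}^{\alpha}_{n,x}(C) = P^{\alpha}_x\{\tilde{S}_n \in C\}. \tag{26} \]
By analogy to (4) and (5) define
\[ \tilde{\Phi} := \{f \in C(E \times E) : \exists a > 0\ \forall x, y \in E\ f(x, y) \ge a\} \tag{27} \]
and for $\nu \in \mathcal{P}(E \times E)$,
\[ \tilde{I}^{\alpha}(\nu) = \sup_{f \in \tilde{\Phi}} \int_{E \times E} \log \frac{f(x, y)}{P^{\alpha} f(y)}\, \nu(dx, dy) \tag{28} \]
with $P^{\alpha} f(y) = \int_E f(y, z)\, P^{\alpha}(y, dz)$.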
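Theorem 4. If (B1) holds then for any compact subsets $C \subset \mathcal{P}(E \times E)$ and $A_1 \subset A$,
\[ \limsup_{n \to \infty} n^{-1} \sup_{\alpha \in A_1} \sup_{x \in E} \log \tilde{Q}^{\alpha}_{n,x}(C) \le -\inf_{\alpha \in A_1} \inf_{\nu \in C} \tilde{I}^{\alpha}(\nu). \tag{29} \]
Proof. Note first that if $f \in \tilde{\Phi}_d$, where
\[ \tilde{\Phi}_d := \{f \in C(E \times E) : \sup_{x, y \in E} f(x, y) \le d \inf_{x, y \in E} f(x, y)\}, \]
then
\[ E^{\alpha}_x \exp\Big\{\sum_{i=0}^{n-1} \log \frac{f(x_i, x_{i+1})}{P^{\alpha} f(x_{i+1})}\Big\} \le d. \]
If $\Gamma \in \mathcal{B}(\mathcal{P}(E \times E))$ and $\alpha \in A$ then
\[ \sup_{x \in E} \tilde{Q}^{\alpha}_{n,x}(\Gamma) \le d \exp\Big\{-n \inf_{\nu \in \Gamma} \int_{E \times E} \log \frac{f(x, y)}{P^{\alpha} f(y)}\, \nu(dx, dy)\Big\}. \]
The remainder of the argument is the same as in the proof of Theorem 1.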
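Remark 1. If $\tilde{I}^{\alpha}$ is defined as
\[ \sup_{f \in \tilde{\Phi}} \int_{E \times E} \log \frac{f(x, y)}{P^{\alpha} f(x)}\, \nu(dx, dy) \]
then following Section 1.3 of [7] an analogue of Theorem 4 can be obtained with a simpler proof because
\[ E^{\alpha}_x \exp\Big\{\sum_{i=0}^{n-1} \log \frac{f(x_i, x_{i+1})}{P^{\alpha} f(x_i)}\Big\} = 1. \]
However, to adapt Theorem 4 to the case of a noncompact closed set $C$ the rate functional $\tilde{I}^{\alpha}$ of the form defined in (28) is required.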
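An additional assumption is made now.
(B4) There is a continuous function $\tilde{\psi} : A \times E \times E \to \mathbb{R}$ such that $\tilde{\psi}(\alpha, x, y) \ge 1$ for $\alpha \in A$ and $x, y \in E$, the mapping
\[ E \ni x \mapsto \sup_{\alpha \in A} \int_E \tilde{\psi}(\alpha, x, y)\, P^{\alpha}(x, dy) \]
is bounded on compact sets and for each $m > 0$ the set
\[ \tilde{K}_m := \Big\{(x, y) \in E \times E : \inf_{\alpha \in A} \frac{\tilde{\psi}(\alpha, x, y)}{\int_E \tilde{\psi}(\alpha, y, z)\, P^{\alpha}(y, dz)} \le m\Big\} \]
is compact in $E \times E$.
Let
\[ \tilde{\varrho} := \inf_{x, y \in E} \inf_{\alpha \in A} \frac{\tilde{\psi}(\alpha, x, y)}{\int_E \tilde{\psi}(\alpha, y, z)\, P^{\alpha}(y, dz)}. \tag{30} \]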
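Adapting Lemmas 1–3 and Theorem 2 to the case of consecutive pairs easily yields the following results.
Lemma 4. If (B4) is satisfied, then for each $\varepsilon > 0$,
\[ \tilde{Q}^{\alpha}_{n,x}(\{\nu \in \mathcal{P}(E \times E) : \nu(\tilde{K}_m^c) > \varepsilon\}) \le \tilde{\psi}(\alpha, x, y) \exp\Big\{-n \log \tilde{\varrho} - n \varepsilon \log \frac{m}{\tilde{\varrho}}\Big\}. \tag{31} \]
Lemma 5. If (B4) is satisfied, then for any real number $a$ and any compact set $W \subset E$ there exists a compact subset $\tilde{C}(a) \subset \mathcal{P}(E \times E)$ such that
\[ \limsup_{n \to \infty} n^{-1} \sup_{\alpha \in A} \sup_{x \in W} \log \tilde{Q}^{\alpha}_{n,x}(\tilde{C}(a)^c) \le -a. \tag{32} \]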
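Theorem 5. If (B1) and (B4) are satisfied, then for any closed subsets $C$ of $\mathcal{P}(E \times E)$ and $A_1$ of $A$, and any compact subset $W$ of $E$,
\[ \limsup_{n \to \infty} n^{-1} \sup_{\alpha \in A_1} \sup_{x \in W} \log \tilde{Q}^{\alpha}_{n,x}(C) \le -\inf_{\alpha \in A_1} \inf_{\nu \in C} \tilde{I}^{\alpha}(\nu). \tag{33} \]
Lemma 6. If (B1) and (B4) are satisfied, then for each $m > 0$,
\[ \tilde{C}_m = \{\nu \in \mathcal{P}(E \times E) : \inf_{\alpha \in A} \tilde{I}^{\alpha}(\nu) \le m\} \tag{34} \]
is a compact subset of $\mathcal{P}(E \times E)$.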
To prove an analogue of Proposition 1 the following lemma is used.
Lemma 7. Assume (B1) and (B4) and that for each $\alpha \in A$ there is a unique invariant measure $\pi_{\alpha}$ of the transition operator $P^{\alpha}(x, dy)$. Then for $\nu \in \mathcal{P}(E \times E)$ the following equivalence is satisfied:
\[ \tilde{I}^{\alpha}(\nu) = 0 \iff \nu(dx, dy) = P^{\alpha}(x, dy)\, \pi_{\alpha}(dx). \tag{35} \]
To verify this lemma the methods of the proof of Lemma 2.5 of [3] can be used. The details are left to the reader.
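Using Lemma 7 we can state an analogue of Proposition 1 combined with Theorem 5.
Theorem 6. If (B1), (B3) and (B4) are satisfied, then for any compact sets $W \subset E$, $A_1 \subset A$ and closed set $C \subset \mathcal{P}(E \times E)$ such that $P^{\alpha}(x, dy)\, \pi_{\alpha}(dx) \notin C$ for $\alpha \in A_1$ there are $p > 0$ and a positive integer $N$ such that for $n \ge N$,
\[ \sup_{\alpha \in A_1} \sup_{x \in W} P^{\alpha}_x\Big\{n^{-1} \sum_{i=0}^{n-1} \delta_{(x_i, x_{i+1})}(\cdot) \in C\Big\} \le e^{-np}. \tag{36} \]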
Remark 2. If $E$ is compact then the assumptions (B2) and (B4) are clearly satisfied. An analysis of the proofs of Theorems 3 and 6 shows that the constants $p$ can be given in terms of the infima of the rate functions $I^{\alpha}$ and $\tilde{I}^{\alpha}$, respectively.
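3. Adaptive control with observation of cost. Consider now a controlled Markov process $(x_n, n \in \mathbb{N})$ with transition operator $P^{\alpha^0, v_n}(x_n, \cdot)$. The following assumptions are made.
(A1) For $f \in C(E)$ the mapping
\[ A \times U \times E \ni (\alpha, v, x) \mapsto P^{\alpha, v} f(x) \]
is continuous.
(A2) There is a continuous function $\psi : A \times E \to \mathbb{R}$ such that $\psi(\alpha, x) \ge 1$ for $\alpha \in A$, $x \in E$, the mapping
\[ E \ni x \mapsto \sup_{\alpha \in A} \sup_{v \in U} \int_E \psi(\alpha, y)\, P^{\alpha, v}(x, dy) \]
is bounded on compact subsets of $E$ and for each $m > 0$ the set
\[ K_m := \Big\{x \in E : \inf_{\alpha \in A} \inf_{v \in U} \frac{\psi(\alpha, x)}{\int_E \psi(\alpha, y)\, P^{\alpha, v}(x, dy)} \le m\Big\} \]
is compact.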
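(A3) For $u \in \mathcal{A}_c := \mathcal{A} \cap C(E, U)$ there is a unique invariant measure $\pi^u_{\alpha}$ for $P^{\alpha, u}(x, \cdot)$, and for $\varphi \in C(E)$ the mapping $A \ni \alpha \mapsto \pi^u_{\alpha}(\varphi)$ is continuous.
(A4) For $\alpha \in A$ the optimal value of the cost functional $J^{\alpha}$ defined in (1) coincides almost surely with that of
\[ J^{\alpha}_x((v_n, n \in \mathbb{N})) = \limsup_{n \to \infty} n^{-1} E^{\alpha}_x\Big\{\sum_{i=0}^{n-1} c(x_i, v_i)\Big\}. \tag{37} \]
Moreover, for any $\varepsilon > 0$ there is a finite class $U(\varepsilon) = \{u_1, \ldots, u_r\} \subset \mathcal{A}_c$ of $\varepsilon$-optimal control functions for $J^{\alpha}_x$, that is, for any $\alpha \in A$ there is a $u_j \in U(\varepsilon)$ such that
\[ J^{\alpha}_x((u_j(x_n))) \le \inf_{(v_n)} J^{\alpha}_x((v_n)) + \varepsilon. \]
Furthermore, there is a compact set $W \subset E$ that is positive recurrent for the Markov process $(x_n, n \in \mathbb{N})$ controlled with any control function of the class $U(\varepsilon)$.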
An analogue of Theorem 3 is given now.
Theorem 7. Assume that (A1)–(A4) are satisfied and fix $\varepsilon > 0$. Then there are $p > 0$ and a positive integer $N$ such that for $n \ge N$,
\[ \sup_{k=1,\ldots,r} \sup_{\alpha \in A} \sup_{x \in W} P^{\alpha, u_k}_x\Big\{\Big|n^{-1} \sum_{j=0}^{n-1} c(x_j, u_k(x_j)) - \int_E c(y, u_k(y))\, \pi^{u_k}_{\alpha}(dy)\Big| \ge \varepsilon\Big\} \le e^{-np}. \tag{38} \]
Moreover, the following adaptive strategy $(\hat{v}_j)$ is $3\varepsilon$-optimal with probability $(1 - e^{-np})^r$: choose $n \ge N$, use the control function $u_1$ for $i < T_1 := T(T(0) + n)$, then $u_2$ for $T_1 \le i < T_2 := T(T_1 + n)$, ..., and $u_r$ for $T_{r-1} \le i < T_r := T(T_{r-1} + n)$, where $T(\tau)$ denotes the first hitting time to the positive recurrent compact set $W$ after the random time $\tau$ (with $T_0 := T(0)$), determine $k \in \{1, \ldots, r\}$ such that
\[ \sum_{j=T_{k-1}}^{T_{k-1}+n-1} c(x_j, u_k(x_j)) = \min_{q=1,\ldots,r} \sum_{j=T_{q-1}}^{T_{q-1}+n-1} c(x_j, u_q(x_j)) \tag{39} \]
and after $T_r$ use the control function $u_k$.
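Proof. Note that (38) follows from (24). Then for $x \in W$ and $n \ge N$,
\[ \inf_{\alpha \in A} P^{\alpha}_x\Big\{\Big|n^{-1} \sum_{j=T_{q-1}}^{T_{q-1}+n-1} c(x_j, u_q(x_j)) - \int_E c(z, u_q(z))\, \pi^{u_q}_{\alpha}(dz)\Big| \le \varepsilon \text{ for } q = 1, \ldots, r\Big\} \ge (1 - e^{-np})^r. \]
Consequently, by (39) for $x \in W$,
\[ \inf_{\alpha \in A} P^{\alpha}_x\Big\{\int_E c(z, u_k(z))\, \pi^{u_k}_{\alpha}(dz) \le \min_{q=1,\ldots,r} \int_E c(z, u_q(z))\, \pi^{u_q}_{\alpha}(dz) + 2\varepsilon\Big\} \ge (1 - e^{-np})^r. \]
Since by (A4), $J^{\alpha^0}((\hat{v}_n)) = \int_E c(z, u_k(z))\, \pi^{u_k}_{\alpha^0}(dz)$, $P^{\alpha^0}$-a.e., it follows that
\[ J^{\alpha^0}((\hat{v}_n)) \le 3\varepsilon + \inf_{(v_i)} J^{\alpha^0}((v_i)) \]
with probability $(1 - e^{-np})^r$.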
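The test-then-commit strategy of Theorem 7 can be sketched on a finite chain as follows. This is an illustrative sketch under simplifying assumptions, not the construction of the paper: the state space is finite and $W$ is taken to be the whole space, so that every hitting time $T(\tau)$ equals $\tau$; the tables `P` and `c`, the constant `p` passed to the helper functions, and all function names are hypothetical.

```python
import math
import random

# Hypothetical two-state, two-action model (illustrative values only).
P = {0: [[0.9, 0.1], [0.5, 0.5]],
     1: [[0.2, 0.8], [0.3, 0.7]]}
c = [[1.0, 2.0], [0.0, 3.0]]  # c[x][v]


def success_probability(n, p, r):
    """Lower bound (1 - exp(-n p))**r on the probability of success."""
    return (1.0 - math.exp(-n * p)) ** r


def min_block_length(p, r, target):
    """Smallest n with (1 - exp(-n p))**r >= target, for 0 < target < 1."""
    n = 1
    while success_probability(n, p, r) < target:
        n += 1
    return n


def run_test_phase(controls, n, x0=0, seed=0):
    """Apply each control for n steps in turn and record observed costs."""
    rng = random.Random(seed)
    x, blocks = x0, []
    for u in controls:
        block = []
        for _ in range(n):
            v = u(x)
            block.append(c[x][v])
            x = rng.choices(range(len(c)), weights=P[v][x])[0]
        blocks.append(block)
    return blocks


def choose_control(blocks):
    """Index k whose test block has the smallest summed cost, as in (39)."""
    sums = [sum(block) for block in blocks]
    return sums.index(min(sums))
```

After the test phase the control with index `choose_control(blocks)` is used forever; `min_block_length` shows how the block length must grow with the number `r` of candidate controls to keep the bound $(1 - e^{-np})^r$ above a prescribed level.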
4. Examples. Two models are described to demonstrate the applicability of the previous results that used large deviations.
Model I. $E$ is a compact metric space. There is a probability measure $\eta$ such that
\[ P^{\alpha, v}(x, B) = \int_B p(x, y, \alpha, v)\, \eta(dy) \tag{40} \]
for $B \in \mathcal{B}(E)$, the mapping $E \times E \times A \times U \ni (x, y, \alpha, v) \mapsto p(x, y, \alpha, v)$ is continuous and $p(x, y, \alpha, v) > 0$ for $x, y \in E$, $\alpha \in A$, $v \in U$. Clearly (A1)–(A3) are satisfied for this model. Furthermore, there is a finite family of almost optimal controls.
Theorem 8. For each $\varepsilon > 0$ there is a finite class $U(\varepsilon) \subset \mathcal{A}_c$ of $\varepsilon$-optimal control functions for Model I with cost functionals $J^{\alpha}$, $\alpha \in A$.
Proof. The proof consists of three steps:
Step I. Initially it is shown that there is a finite set $\hat{U}(\varepsilon) = \{u_1, \ldots, u_r\}$ of piecewise constant $\varepsilon$-optimal control functions such that the set of their discontinuity points is of $\eta$-measure zero.
Let $\{E^n_1, E^n_2, \ldots, E^n_{d_n}\}$, $n = 1, 2, \ldots,$ be a sequence of partitions of $E$ and $\{e^n_1, e^n_2, \ldots, e^n_{d_n}\}$, $n = 1, 2, \ldots,$ be a sequence of their representative elements such that
\[ E = \bigcup_{i=1}^{d_n} E^n_i, \qquad E^n_i \cap E^n_j = \emptyset \ \text{for } i \ne j, \]
$\eta(\partial E^n_i) = 0$, the diameter of $E^n_i$ is not greater than $1/n$, $e^n_i \in E^n_i$ for $i = 1, \ldots, d_n$, $\{E^{n+1}_1, \ldots, E^{n+1}_{d_{n+1}}\}$ is a subpartition of $\{E^n_1, \ldots, E^n_{d_n}\}$ and
\[ \{e^n_1, \ldots, e^n_{d_n}\} \subset \{e^{n+1}_1, \ldots, e^{n+1}_{d_{n+1}}\}. \]
Moreover, let $p^{\alpha, v}_n(e^n_i, e^n_j) = P^{\alpha, v}(e^n_i, E^n_j)$.
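The discretization just described can be sketched for $E = [0, 1]$ with Lebesgue measure as the reference measure. The density below, the dyadic partition and the left-endpoint representatives are hypothetical choices made only so that the cell masses have a closed form and the representatives are nested; a partition of level $\ell$ has cells of diameter $2^{-\ell}$, so the level has to be chosen with $2^{\ell} \ge n$ to meet the diameter requirement $1/n$.

```python
import math


def density(x, y):
    """Hypothetical continuous, strictly positive density p(x, y) on [0, 1]."""
    return 1.0 + 0.5 * math.cos(2.0 * math.pi * (y - x))


def cell_mass(x, a, b):
    """Integral of density(x, .) over [a, b], computed in closed form."""
    def antiderivative(y):
        return y + 0.25 * math.sin(2.0 * math.pi * (y - x)) / math.pi
    return antiderivative(b) - antiderivative(a)


def discretize(level):
    """Dyadic partition with 2**level cells and left-endpoint representatives.

    Returns the representatives e_i and the matrix with entries
    p_n(e_i, e_j) = P(e_i, E_j), the mass of the cell E_j seen from e_i.
    """
    m = 2 ** level
    reps = [i / m for i in range(m)]
    matrix = [[cell_mass(e, j / m, (j + 1) / m) for j in range(m)]
              for e in reps]
    return reps, matrix
```

Each row of the resulting matrix is a probability vector, so the finite chain on the representatives is well defined, and the cell boundaries form a finite set, which has Lebesgue measure zero.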
Consider now a controlled Markov process on $\{e^n_1, \ldots, e^n_{d_n}\}$ with transition operator $p^{\alpha, v}_n(e^n_i, e^n_j)$. By Lemma 3 of [1] for given $\varepsilon > 0$ there is a finite set $\{u^n_1, \ldots, u^n_{r_n}\}$ of control functions such that
\[ \sup_{\alpha \in A} \min_{1 \le i \le r_n} \sup_{1 \le j \le d_n} \Big[-w^n_{\alpha}(e^n_j) - \lambda^n_{\alpha} + \sum_{l=1}^{d_n} w^n_{\alpha}(e^n_l)\, p^{\alpha, u^n_i(e^n_j)}_n(e^n_j, e^n_l) + c(e^n_j, u^n_i(e^n_j))\Big] \le \frac{\varepsilon}{2} \tag{41} \]
where $\lambda^n_{\alpha}$ and $w^n_{\alpha}$ are the optimal value and the Bellman function of the corresponding cost functional $J^{\alpha}$, respectively.
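Let $u^n_j(x) = u^n_j(e^n_l)$, $w^n_{\alpha}(x) = w^n_{\alpha}(e^n_l)$, and $P^{\alpha, v}_n(x, \cdot) = P^{\alpha, v}(e^n_l, \cdot)$ for $x \in E^n_l$, $l = 1, \ldots, d_n$. It follows that
\[ \sup_{\alpha \in A} \min_{1 \le i \le r_n} \sup_{x \in E} \{-w^n_{\alpha}(x) - \lambda^n_{\alpha} + P^{\alpha, u^n_i(x)}_n w^n_{\alpha}(x) + c_n(x, u^n_i(x))\} \le \varepsilon/2 \tag{42} \]
with $c_n(x, v) = c(e^n_l, v)$ for $x \in E^n_l$ and $v \in U$. Note, moreover, that $\lambda^n_{\alpha}$ is also the optimal value of the cost functional $J^{\alpha}$ corresponding to the controlled Markov process on $E$ with transition operator $P^{\alpha, v}_n(x, \cdot)$.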
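By Lemma 3.3.3 of [6] for $u \in \mathcal{A}$, $n = 1, 2, \ldots,$ there exist probability measures $\pi^u_{\alpha}$ and $\pi^{u,n}_{\alpha}$ such that for $x \in E$ and $B \in \mathcal{B}(E)$,
\[ |(P^{\alpha, u})^k(x, B) - \pi^u_{\alpha}(B)| \le (1 - d)^{k-1} \tag{43} \]
and
\[ |(P^{\alpha, u}_n)^k(x, B) - \pi^{u,n}_{\alpha}(B)| \le (1 - d)^{k-1} \tag{43} \]
with
\[ d = \inf_{x, y \in E} \inf_{\alpha \in A} \inf_{v \in U} p(x, y, \alpha, v). \]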
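An analysis of the proof of Proposition 1 of [9] shows that there exists a constant $K > 0$ that is independent of $u \in \mathcal{A}$, $\alpha \in A$ and $n \in \mathbb{N}$, for which
\[ \|\pi^u_{\alpha} - \pi^{u,n}_{\alpha}\|_{\mathrm{var}} \le \sup_{x \in E} K \|P^{\alpha, u}_n(x, \cdot) - P^{\alpha, u}(x, \cdot)\|_{\mathrm{var}} \tag{44} \]
where $\|\cdot\|_{\mathrm{var}}$ denotes the variation norm.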
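Since by the continuity of the transition density $p(x, y, \alpha, v)$,
\[ \lim_{n \to \infty} \sup_{u \in \mathcal{A}} \sup_{\alpha \in A} \sup_{x \in E} K \|P^{\alpha, u}_n(x, \cdot) - P^{\alpha, u}(x, \cdot)\|_{\mathrm{var}} = 0, \tag{45} \]
for $\lambda_{\alpha} = \inf_{u \in \mathcal{A}} \int_E c(x, u(x))\, \pi^u_{\alpha}(dx)$ it follows that
\[ \sup_{\alpha \in A} |\lambda_{\alpha} - \lambda^n_{\alpha}| \le \|c\| \sup_{\alpha \in A} \sup_{u \in \mathcal{A}} \|\pi^u_{\alpha} - \pi^{u,n}_{\alpha}\|_{\mathrm{var}} + \sup_{x \in E, v \in U} |c(x, v) - c_n(x, v)| \to 0 \]
as $n \to \infty$. Therefore, for sufficiently large $n$ from (42), (44), and (45) we obtain
\[ \sup_{\alpha \in A} \min_{1 \le i \le r_n} \sup_{x \in E} \{-w^n_{\alpha}(x) - \lambda_{\alpha} + P^{\alpha, u^n_i(x)} w^n_{\alpha}(x) + c(x, u^n_i(x))\} \le \varepsilon, \tag{46} \]
which means (cf. the proof of Theorem 3.2.2 of [6]) that $\hat{U}(\varepsilon) = \{u^n_1, \ldots, u^n_{r_n}\}$ is a set of $\varepsilon$-optimal control functions for the original Markov process. It follows from the construction that the set of discontinuity points of the above control functions is of $\eta$-measure 0.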
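Step II. It is clear that for $i = 1, \ldots, r$ there is a sequence $u_i(n) \in \mathcal{A}_c$, $n = 1, 2, \ldots,$ such that
\[ \lim_{n \to \infty} \eta(\{z \in E : u_i(z) \ne u_i(n)(z)\}) = 0. \tag{47} \]
We claim that for any bounded Borel function $f : E \to \mathbb{R}$ such that the set of discontinuity points of $f$ is of $\eta$-measure 0, and for $i = 1, \ldots, r$,
\[ \lim_{n \to \infty} \sup_{\alpha \in A} |\pi^{u_i}_{\alpha}(f) - \pi^{u_i(n)}_{\alpha}(f)| = 0. \tag{48} \]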
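Assume that contrary to (48) there is a sequence $(\alpha_n, n \in \mathbb{N})$ from $A$ such that $\alpha_n \to \alpha \in A$ and for some $i \in \{1, \ldots, r\}$,
\[ |\pi^{u_i}_{\alpha_n}(f) - \pi^{u_i(n)}_{\alpha_n}(f)| > \delta > 0 \tag{49} \]
for sufficiently large $n$.
By tightness, for a suitably chosen subsequence $n_k$ it follows that
\[ \pi^{u_i(n_k)}_{\alpha_{n_k}} \Rightarrow \pi \]
where $\pi$ is a probability measure on $E$. We now show that $\pi$ is invariant for the transition operator $P^{\alpha, u_i}(x, \cdot)$. In fact, for $g \in C(E)$ it follows that
\[ \begin{aligned} |\pi(g) - \pi(P^{\alpha, u_i} g)| \le{} & |\pi(g) - \pi^{u_i(n_k)}_{\alpha_{n_k}}(g)| \\ & + |\pi^{u_i(n_k)}_{\alpha_{n_k}}(g) - \pi^{u_i(n_k)}_{\alpha_{n_k}}(P^{\alpha_{n_k}, u_i(n_k)} g)| \\ & + |\pi^{u_i(n_k)}_{\alpha_{n_k}}(P^{\alpha_{n_k}, u_i(n_k)} g) - \pi^{u_i(n_k)}_{\alpha_{n_k}}(P^{\alpha, u_i(n_k)} g)| \\ & + |\pi^{u_i(n_k)}_{\alpha_{n_k}}(P^{\alpha, u_i(n_k)} g) - \pi^{u_i(n_k)}_{\alpha_{n_k}}(P^{\alpha, u_i} g)| \\ & + |\pi^{u_i(n_k)}_{\alpha_{n_k}}(P^{\alpha, u_i} g) - \pi(P^{\alpha, u_i} g)| \\ ={} & I_{1k} + I_{2k} + I_{3k} + I_{4k} + I_{5k}. \end{aligned} \]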
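Clearly $I_{1k} \to 0$ as $k \to \infty$, and $I_{2k} = 0$. Moreover,
\[ I_{3k} \le \|g\| \sup_{x, y \in E} \sup_{v \in U} |p(x, y, \alpha_{n_k}, v) - p(x, y, \alpha, v)| \to 0 \]
as $k \to \infty$. Define
\[ M = \sup_{x, y \in E} \sup_{\alpha \in A} \sup_{v \in U} p(x, y, \alpha, v). \tag{50} \]
Since $\pi^{u_i(n_k)}_{\alpha_{n_k}}$ is an invariant measure it follows that $\pi^{u_i(n_k)}_{\alpha_{n_k}}(\cdot) \le M \eta(\cdot)$. Therefore, $\pi(\cdot) \le M \eta(\cdot)$. Consequently, by (47), $I_{4k} \to 0$ as $k \to \infty$, and since the set of discontinuity points of $P^{\alpha, u_i} g$ is of $\eta$-measure 0, $I_{5k} \to 0$ as $k \to \infty$.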
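By the uniqueness of the invariant measure it follows that $\pi = \pi^{u_i}_{\alpha}$. Since $\pi$ does not depend on a particular subsequence it follows that
\[ \pi^{u_i(n)}_{\alpha_n} \Rightarrow \pi^{u_i}_{\alpha} \tag{51} \]
as $n \to \infty$. Using similar arguments it also follows that
\[ \pi^{u_i}_{\alpha_n} \Rightarrow \pi^{u_i}_{\alpha} \]
as $n \to \infty$, which together with (51) contradicts (49). Thus, (48) is satisfied.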
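Step III. The following inequality is elementary:
\[ \begin{aligned} & \sup_{\alpha \in A} \Big|\int_E c(x, u_i(x))\, \pi^{u_i}_{\alpha}(dx) - \int_E c(x, u_i(n)(x))\, \pi^{u_i(n)}_{\alpha}(dx)\Big| \\ & \quad \le \sup_{\alpha \in A} \Big[\Big|\int_E c(x, u_i(x))\, \pi^{u_i}_{\alpha}(dx) - \int_E c(x, u_i(x))\, \pi^{u_i(n)}_{\alpha}(dx)\Big| + \Big|\int_E (c(x, u_i(n)(x)) - c(x, u_i(x)))\, \pi^{u_i(n)}_{\alpha}(dx)\Big|\Big] \\ & \quad = I_{1n} + I_{2n}. \end{aligned} \]
By (48) clearly $I_{1n} \to 0$ as $n \to \infty$. Since by (47) also
\[ I_{2n} \le \|c\| M \eta(\{x \in E : u_i(n)(x) \ne u_i(x)\}) \to 0 \]
as $n \to \infty$ with $M$ defined in (50), for sufficiently large $n$, for $\alpha \in A$ and $i = 1, \ldots, r$ it follows that
\[ J^{\alpha}_x(u_i(n)) \le J^{\alpha}_x(u_i) + \varepsilon \le \lambda_{\alpha} + 2\varepsilon. \tag{52} \]
Thus, the set $\{u_1(n), \ldots, u_r(n)\}$ contains $2\varepsilon$-optimal control functions for the cost functional $J^{\alpha}$ with $\alpha \in A$.
Therefore, (A4) is also satisfied.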
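Model II. Let $E = \mathbb{R}^d$. Assume $(x_n, n \in \mathbb{N})$ satisfies the following recursive formula:
\[ x_{n+1} = f(x_n, \alpha^0, v_n) + g(x_n) w_n \tag{53} \]
where $f : \mathbb{R}^d \times A \times U \to \mathbb{R}^d$ and $g : \mathbb{R}^d \to \mathbb{R}^d \times \mathbb{R}^d$ are continuous bounded functions, there is a continuous bounded inverse matrix $g^{-1}$, and $(w_n)$ is a sequence of independent standard Gaussian random variables. By the continuity of $f$, $g$, and $g^{-1}$ it follows that (A1) and (A3) (cf. [9]) are satisfied. In (A2) let $\psi(\alpha, x) = x^2 + 1$ so that
\[ \frac{\psi(\alpha, x)}{\int_E \psi(\alpha, y)\, P^{\alpha, v}(x, dy)} = \frac{x^2 + 1}{f^2(x, \alpha, v) + g^2(x) + 1} \]
and $K_m$ is compact if
\[ \sup_{\alpha \in A} \sup_{v \in U} [f^2(x, \alpha, v) + g^2(x)] = o(x^2) \]
as $x^2 \to \infty$, and, in particular, if $f$ and $g$ are bounded.
Hence (A2) is satisfied. Just as for Model I there is the following result.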
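Theorem 9. For given $\varepsilon > 0$ there is a finite family $U(\varepsilon)$ of $\varepsilon$-optimal control functions for Model II with cost functionals $J^{\alpha}$, $\alpha \in A$.
Proof. By Lemma 1 of [9] there are a finite measure $\eta$ that is absolutely continuous with respect to Lebesgue measure in $\mathbb{R}^d$ and a constant $M$ such that for $x \in \mathbb{R}^d$, $\alpha \in A$ and $v \in U$,
\[ \eta(\cdot) \le P^{\alpha, v}(x, \cdot) \le M \eta(\cdot). \]
Consider now a sequence of partitions $\{E^n_1, \ldots, E^n_{d_n}\}$, $n = 1, 2, \ldots,$ of $\mathbb{R}^d$ and representative elements $\{e^n_1, \ldots, e^n_{d_n}\}$, $n = 1, 2, \ldots,$ such that
\[ \mathbb{R}^d = \bigcup_{i=1}^{d_n} E^n_i, \qquad E^n_i \cap E^n_j = \emptyset \ \text{for } i \ne j, \]
$\eta(\partial E^n_i) = 0$, $e^n_i \in E^n_i$ for $i = 1, \ldots, d_n$, the diameter of $E^n_i$ is not greater than $1/n$ for $i = 1, \ldots, d_n - 1$, $E^n_{d_n} \subset \{x \in \mathbb{R}^d : \|x\| > n\}$, $\{E^{n+1}_1, \ldots, E^{n+1}_{d_{n+1}}\}$ is a subpartition of $\{E^n_1, \ldots, E^n_{d_n}\}$ and
\[ \{e^n_1, \ldots, e^n_{d_n}\} \subset \{e^{n+1}_1, \ldots, e^{n+1}_{d_{n+1}}\}. \]
Now the methods of the proof of Theorem 8 can be used. Note only that the value of $d$ in (43) is now replaced by $\eta(\mathbb{R}^d)$.
Thus, (A1)–(A4) are satisfied. Moreover, for any $u \in \mathcal{A}$ and transition operator $P^{\alpha, u(x)}(x, \cdot)$, the assumption (B4) is satisfied with the function $\tilde{\psi}(\alpha, x, y) = x^2 + y^4 + 1$.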
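5. Adaptive control with estimation. Consider first the case of Model I. For $u \in \mathcal{A}_c$ and $\alpha, \alpha' \in A$ let
\[ K^u(\alpha, \alpha') := \int_E \int_E p(x, y, \alpha, u(x)) \log p(x, y, \alpha', u(x))\, \eta(dy)\, \pi^u_{\alpha}(dx). \tag{54} \]
Fix $\varepsilon > 0$. By the proofs of Propositions 7 and 4 in [5] there is $\delta > 0$ such that if $K^u(\alpha, \alpha) - K^u(\alpha, \alpha') \le \delta$ for $u \in U(\varepsilon)$, then
\[ \|\pi^u_{\alpha} - \pi^u_{\alpha'}\|_{\mathrm{var}} \le \varepsilon. \tag{55} \]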
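By the continuity of $K^u(\alpha, \alpha')$ and the compactness of $A$ there are a finite sequence $\alpha_1, \ldots, \alpha_k$ and $\gamma > 0$ such that for $\alpha \in A$ there is $\alpha_j$ such that $\alpha \in B_{\gamma}(\alpha_j) = \{\alpha' \in A : \varrho_A(\alpha', \alpha_j) \le \gamma\}$ and for $u \in U(\varepsilon)$,
\[ K^u(\alpha, \alpha) - K^u(\alpha, \alpha_j) \le \delta/2, \qquad \sup_{\alpha' \in A} |K^u(\alpha_j, \alpha') - K^u(\alpha, \alpha')| \le \delta/16. \tag{56} \]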
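For $u \in U(\varepsilon)$ let
\[ \hat{\alpha}^u_n = \Big\{\alpha_j : \prod_{i=0}^{n-1} p(x_i, x_{i+1}, \alpha_j, u(x_i)) = \max_{q=1,\ldots,k} \prod_{i=0}^{n-1} p(x_i, x_{i+1}, \alpha_q, u(x_i)) > \max_{q \le j-1} \prod_{i=0}^{n-1} p(x_i, x_{i+1}, \alpha_q, u(x_i))\Big\}. \tag{57} \]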
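The estimator (57) is a maximum likelihood estimator restricted to the finite set $\alpha_1, \ldots, \alpha_k$, with ties broken in favour of the smallest index. A minimal sketch, working with log-likelihoods for numerical stability, is given below. The function names and the example density (a two-state chain that stays put with probability $\alpha$) are hypothetical and serve only as an illustration.

```python
import math


def discretized_mle(path, controls, candidates, log_density):
    """Index of the first candidate that maximizes the log-likelihood.

    path        : observed states x_0, ..., x_n
    controls    : applied controls u(x_0), ..., u(x_{n-1})
    candidates  : finite list of parameter values alpha_1, ..., alpha_k
    log_density : function (x, y, alpha, v) -> log p(x, y, alpha, v)
    """
    scores = []
    for alpha in candidates:
        scores.append(sum(log_density(x, y, alpha, v)
                          for x, y, v in zip(path[:-1], path[1:], controls)))
    # list.index returns the smallest maximizing index, which matches the
    # strict inequality over q <= j - 1 in the definition.
    return scores.index(max(scores))


def stay_switch_log_density(x, y, alpha, v):
    """Hypothetical example: stay with probability alpha, else switch."""
    return math.log(alpha if x == y else 1.0 - alpha)
```

Maximizing the sum of logarithms is equivalent to maximizing the product in (57) because the logarithm is increasing and the density is strictly positive.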
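Theorem 10. For Model I there exist a $p > 0$ and a positive integer $N$ such that for $n \ge N$ and $u \in U(\varepsilon)$,
\[ \inf_{\alpha \in A} \inf_{x \in E} P^{\alpha, u}_x\{\|\pi^u_{\alpha} - \pi^u_{\hat{\alpha}^u_n}\|_{\mathrm{var}} \le \varepsilon\} \ge 1 - k e^{-np}, \tag{58} \]
where $\hat{\alpha}^u_n$ is given by (57).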
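Proof. Let
\[ C^u(\alpha_i, \alpha_j) = \Big\{l \in \mathcal{P}(E \times E) : \Big|\int_{E \times E} \log p(x, y, \alpha_j, u(x))\, l(dx, dy) - K^u(\alpha_i, \alpha_j)\Big| \ge \delta/8\Big\}. \tag{59} \]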
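If $\alpha \in B_{\gamma}(\alpha_i)$ then $P^{\alpha, u}(x, dy)\, \pi^u_{\alpha}(dx) \notin C^u(\alpha_i, \alpha_j)$ because
\[ \int_{E \times E} \log p(x, y, \alpha_j, u(x))\, P^{\alpha, u}(x, dy)\, \pi^u_{\alpha}(dx) = K^u(\alpha, \alpha_j) \]
and by (56), $|K^u(\alpha, \alpha_j) - K^u(\alpha_i, \alpha_j)| \le \delta/16$. Therefore, by Theorem 6 there are a $p > 0$ and a positive integer $N$ such that for $n \ge N$, $i, j = 1, \ldots, k$ and $u \in U(\varepsilon)$,
\[ \sup_{\alpha \in B_{\gamma}(\alpha_i)} \sup_{x \in E} P^{\alpha, u}_x\Big\{\Big|n^{-1} \sum_{m=0}^{n-1} \log p(x_m, x_{m+1}, \alpha_j, u(x_m)) - K^u(\alpha_i, \alpha_j)\Big| \ge \delta/8\Big\} \le e^{-np}. \]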
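Then from (56) for $n \ge N$ and $i, j = 1, \ldots, k$,
\[ \sup_{\alpha \in B_{\gamma}(\alpha_i)} \sup_{x \in E} P^{\alpha, u}_x\Big\{\Big|n^{-1} \sum_{m=0}^{n-1} \log p(x_m, x_{m+1}, \alpha_j, u(x_m)) - K^u(\alpha, \alpha_j)\Big| \ge \delta/4\Big\} \le e^{-np}. \tag{60} \]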
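Since
\[ n^{-1} \sum_{m=0}^{n-1} \log p(x_m, x_{m+1}, \hat{\alpha}^u_n, u(x_m)) \ge n^{-1} \sum_{m=0}^{n-1} \log p(x_m, x_{m+1}, \alpha_j, u(x_m)) \]
for $j = 1, \ldots, k$, using (60) it follows that
\[ K^u(\alpha, \hat{\alpha}^u_n) + \delta/4 \ge K^u(\alpha, \alpha_j) - \delta/4 \]
for $j = 1, \ldots, k$ with probability at least $1 - k e^{-np}$, for $\alpha \in A$ and $u \in U(\varepsilon)$.
Consequently, using (56) for $u \in U(\varepsilon)$ we get
\[ \inf_{\alpha \in A} \inf_{x \in E} P^{\alpha, u}_x\{K^u(\alpha, \alpha) - K^u(\alpha, \hat{\alpha}^u_n) \le \delta\} \ge 1 - k e^{-pn} \]
and by (55) it follows that (58) is satisfied.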
Now consider the following adaptive strategy: Choose $n \ge N$ where $N$ is as in Theorem 10 and test each control function of the family $U(\varepsilon)$ for $n$ units of time. Then for $q = 1, \ldots, r$ determine
\[ \hat{\alpha}_q = \Big\{\alpha_j : \sum_{i=n(q-1)}^{nq-1} \log p(x_i, x_{i+1}, \alpha_j, u_q(x_i)) = \max_{j'=1,\ldots,k} \sum_{i=n(q-1)}^{nq-1} \log p(x_i, x_{i+1}, \alpha_{j'}, u_q(x_i)) > \max_{j' \le j-1} \sum_{i=n(q-1)}^{nq-1} \log p(x_i, x_{i+1}, \alpha_{j'}, u_q(x_i))\Big\} \tag{61} \]
and find $p \in \{1, \ldots, r\}$ such that
\[ \int_E c(y, u_p(y))\, \pi^{u_p}_{\hat{\alpha}_p}(dy) = \min_{q=1,\ldots,r} \int_E c(y, u_q(y))\, \pi^{u_q}_{\hat{\alpha}_q}(dy). \tag{62} \]
By Theorem 10 the following corollary easily follows:
Corollary 1. The control strategy $\hat{v}_j = u_p(x_j)$ for $j \ge nr$ is $2\varepsilon\|c\|$-optimal with probability $(1 - k e^{-np})^r$.
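For Model II define, for $u \in \mathcal{A}_c$,
\[ K^u(\alpha, \alpha') = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \|(y - f(x, \alpha', u(x)))\, g^{-1}(x)\|^2\, P^{\alpha, u(x)}(x, dy)\, \pi^u_{\alpha}(dx) \tag{63} \]
and
\[ K^u_m(\alpha, \alpha') = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} (\|(y - f(x, \alpha', u(x)))\, g^{-1}(x)\|^2 \wedge m)\, P^{\alpha, u(x)}(x, dy)\, \pi^u_{\alpha}(dx). \tag{64} \]
By Proposition 1 of [9], $K^u(\alpha, \alpha')$ and $K^u_m(\alpha, \alpha')$ are continuous functions in their two variables. Moreover, the following continuity property is satisfied.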
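Lemma 8. If $u \in \mathcal{A}_c$ then
\[ \sup_{\alpha, \alpha' \in A} |K^u(\alpha, \alpha') - K^u_m(\alpha, \alpha')| \to 0 \]
as $m \to \infty$.
Proof. Suppose that for $\alpha_m \to \alpha$ and $\alpha'_m \to \alpha'$ there is $\delta > 0$ such that
\[ |K^u(\alpha_m, \alpha'_m) - K^u_m(\alpha_m, \alpha'_m)| > \delta. \]
Since
\[ K^u(\alpha, \alpha') = \int_{\mathbb{R}^d} E\{\|(f(x, \alpha, u(x)) - f(x, \alpha', u(x)))\, g^{-1}(x) + \xi\|^2\}\, \pi^u_{\alpha}(dx) \tag{65} \]
and
\[ K^u_m(\alpha, \alpha') = \int_{\mathbb{R}^d} E\{\min\{\|(f(x, \alpha, u(x)) - f(x, \alpha', u(x)))\, g^{-1}(x) + \xi\|^2, m\}\}\, \pi^u_{\alpha}(dx), \tag{66} \]
where $\xi$ is an $N(0, 1)$ random variable and $E$ is expectation, and by Proposition 1 of [9], $\|\pi^u_{\alpha_m} - \pi^u_{\alpha}\|_{\mathrm{var}} \to 0$, it follows that $K^u_m(\alpha_m, \alpha'_m) \to K^u(\alpha, \alpha')$ and $K^u(\alpha_m, \alpha'_m) \to K^u(\alpha, \alpha')$ as $m \to \infty$. This is a contradiction.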
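Lemma 9. For $\varepsilon > 0$ and $u \in \mathcal{A}_c$ there is a $\delta > 0$ such that if $K^u(\alpha, \alpha') \le \delta$ then $\|\pi^u_{\alpha} - \pi^u_{\alpha'}\|_{\mathrm{var}} \le \varepsilon$.
Proof. If the lemma does not hold then $K^u(\alpha_n, \alpha'_n) \to 0$, $\alpha_n \to \alpha$, $\alpha'_n \to \alpha'$ and $\|\pi^u_{\alpha_n} - \pi^u_{\alpha'_n}\|_{\mathrm{var}} \ge \varepsilon$ for $n = 1, 2, \ldots$ and an $\varepsilon > 0$. From Lemma 8 it follows that $K^u(\alpha, \alpha') = 0$. By (65) and the proof of Theorem 9 it follows that $f(x, \alpha, u(x)) = f(x, \alpha', u(x))$ for almost all $x \in \mathbb{R}^d$ with respect to $d$-dimensional Lebesgue measure. Since $f$ and $u$ are continuous functions, $f(x, \alpha, u(x)) = f(x, \alpha', u(x))$ for all $x \in \mathbb{R}^d$, and consequently $\pi^u_{\alpha} = \pi^u_{\alpha'}$, a contradiction.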
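Combining Lemmas 9 and 8 gives the following corollary.
Corollary 2. For $\varepsilon > 0$ there are $m > 0$ and $\delta > 0$ such that for $u \in U(\varepsilon)$, if $K^u_m(\alpha, \alpha') < \delta$ then $\|\pi^u_{\alpha} - \pi^u_{\alpha'}\|_{\mathrm{var}} \le \varepsilon$.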
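Fix $\varepsilon > 0$ and take $m$ as in Corollary 2. There are a finite set $\alpha_1, \ldots, \alpha_k \in A$ and $\gamma > 0$ such that for $\alpha \in B_{\gamma}(\alpha_j)$ and $u \in U(\varepsilon)$,
\[ K^u_m(\alpha, \alpha_j) \le \delta/2 \tag{67} \]
and
\[ \sup_{\alpha' \in A} |K^u_m(\alpha_j, \alpha') - K^u_m(\alpha, \alpha')| \le \delta/16. \tag{68} \]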
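Let
\[ \hat{\alpha}^u_n = \Big\{\alpha_j : \sum_{i=0}^{n-1} \|(x_{i+1} - f(x_i, \alpha_j, u(x_i)))\, g^{-1}(x_i)\|^2 = \min_{q=1,\ldots,k} \sum_{i=0}^{n-1} \|(x_{i+1} - f(x_i, \alpha_q, u(x_i)))\, g^{-1}(x_i)\|^2 < \min_{q \le j-1} \sum_{i=0}^{n-1} \|(x_{i+1} - f(x_i, \alpha_q, u(x_i)))\, g^{-1}(x_i)\|^2\Big\}. \tag{69} \]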
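The estimator (69) is a least squares estimator over the finite set $\alpha_1, \ldots, \alpha_k$, again with ties resolved in favour of the smallest index. The sketch below treats the scalar case $d = 1$; the drift `example_f`, the noise coefficient `example_g` and the simulation helper are hypothetical examples chosen to be continuous and bounded, and are not taken from the paper.

```python
import math
import random


def least_squares_estimate(path, controls, candidates, f, g):
    """Index of the first candidate minimizing the normalized residuals."""
    scores = []
    for alpha in candidates:
        total = 0.0
        for x, y, v in zip(path[:-1], path[1:], controls):
            total += ((y - f(x, alpha, v)) / g(x)) ** 2
        scores.append(total)
    # list.index returns the smallest minimizing index.
    return scores.index(min(scores))


def example_f(x, alpha, v):
    """Hypothetical bounded drift (v is assumed to lie in a bounded set)."""
    return alpha * math.tanh(x) + 0.1 * v


def example_g(x):
    """Hypothetical noise coefficient, bounded with bounded inverse."""
    return 1.0


def simulate(alpha, control, n, x0=1.0, seed=0):
    """Generate x_0, ..., x_n from the scalar recursion with Gaussian noise."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n):
        x = path[-1]
        path.append(example_f(x, alpha, control(x))
                    + example_g(x) * rng.gauss(0.0, 1.0))
    return path
```

For Gaussian noise, minimizing the normalized squared residuals is equivalent to maximizing the likelihood over the candidate set, which is why this estimator plays the same role for Model II as the discretized maximum likelihood estimator does for Model I.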
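Using Theorem 6, Corollary 2 and (67)–(69) as in the case of Model I yields the following theorem.
Theorem 11. Given Model II, for a given compact set $W \subset \mathbb{R}^d$ there are $p > 0$ and a positive integer $N$ such that for $n \ge N$ and $u \in U(\varepsilon)$,
\[ \inf_{\alpha \in A} \inf_{x \in W} P^{\alpha, u}_x\{\|\pi^u_{\alpha} - \pi^u_{\hat{\alpha}^u_n}\|_{\mathrm{var}} \le \varepsilon\} \ge 1 - k e^{-pn}. \]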
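Let $W \subset \mathbb{R}^d$ be a compact set that is positive recurrent for each $u \in U(\varepsilon)$. Choose $n \ge N$ and use control $u_1$ for $i < T_1$, $u_2$ for $T_1 \le i < T_2$, ..., and $u_r$ for $T_{r-1} \le i < T_r$ where $T_i$ is defined as in Theorem 7.
For $q = 1, \ldots, r$ let
\[ \hat{\alpha}_q = \Big\{\alpha_j : \sum_{i=T_{q-1}}^{T_{q-1}+n-1} \|(x_{i+1} - f(x_i, \alpha_j, u_q(x_i)))\, g^{-1}(x_i)\|^2 = \min_{j'=1,\ldots,k} \sum_{i=T_{q-1}}^{T_{q-1}+n-1} \|(x_{i+1} - f(x_i, \alpha_{j'}, u_q(x_i)))\, g^{-1}(x_i)\|^2 < \min_{j' \le j-1} \sum_{i=T_{q-1}}^{T_{q-1}+n-1} \|(x_{i+1} - f(x_i, \alpha_{j'}, u_q(x_i)))\, g^{-1}(x_i)\|^2\Big\}. \]
Find $p \in \{1, \ldots, r\}$ such that
\[ \int_E c(y, u_p(y))\, \pi^{u_p}_{\hat{\alpha}_p}(dy) = \min_{q=1,2,\ldots,r} \int_E c(y, u_q(y))\, \pi^{u_q}_{\hat{\alpha}_q}(dy). \]
Corollary 3. The control $\hat{v}_j = u_p(x_j)$ for $j \ge T_r$ is $2\varepsilon\|c\|$-optimal with probability $(1 - k e^{-np})^r$.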
References
[1] G. B. Di Masi and Ł. Stettner, Bayesian ergodic adaptive control of discrete time Markov processes, Stochastics Stochastics Rep. 54 (1995), 301–316.
[2] M. D. Donsker and S. R. S. Varadhan, Asymptotic evaluation of certain Markov process expectations for large time, I, Comm. Pure Appl. Math. 28 (1975), 1–47.
[3] M. D. Donsker and S. R. S. Varadhan, Asymptotic evaluation of certain Markov process expectations for large time, III, ibid. 29 (1976), 389–461.
[4] M. Duflo, Formule de Chernoff pour des chaînes de Markov (d'après Donsker et Varadhan), in: Grandes déviations et applications statistiques, Séminaire Orsay 1977–78, Astérisque 68 (1979), 99–124.
[5] T. E. Duncan, B. Pasik-Duncan and Ł. Stettner, Discretized maximum likelihood and almost optimal control of ergodic Markov models, SIAM J. Control Optim. 36 (1998), 422–446.
[6] O. Hernández-Lerma, Adaptive Markov Control Processes, Springer, 1989.
[7] N. Maigret, Majorations de Chernoff pour des chaînes de Markov contrôlées, Z. Wahrsch. Verw. Gebiete 51 (1980), 133–151.
[8] N. Maigret, Statistiques des chaînes contrôlées Felleriennes, in: Grandes déviations et applications statistiques, Séminaire Orsay, 1977–1978, Astérisque 68 (1979), 143–169.
[9] Ł. Stettner, On nearly self-optimizing strategies for a discrete-time uniformly ergodic adaptive model, Appl. Math. Optim. 27 (1993), 161–177.
T. E. Duncan, B. Pasik-Duncan
Department of Mathematics
University of Kansas
Lawrence, KS 66045, U.S.A.
E-mail: duncan@math.ukans.edu
bozenna@kuhub.cc.ukans.edu
Łukasz Stettner
Department of Mathematics
University of Kansas
Lawrence, KS 66045, U.S.A.
E-mail: stettner@impan.gov.pl
Received on 19.1.1999