L. STETTNER (Warszawa)

ERGODIC CONTROL OF PARTIALLY OBSERVED MARKOV PROCESSES WITH EQUIVALENT TRANSITION PROBABILITIES

Abstract. Optimal control with long run average cost functional of a partially observed Markov process is considered. Under the assumption that the transition probabilities are equivalent, the existence of the solution to the Bellman equation is shown, with the use of which optimal strategies are constructed.

1. Introduction. Let (Ω, F, P) be a probability space and (x_n) a discrete time controlled Markov process on a compact state space E, endowed with the Borel σ-field E, with transition kernel P^v(x, dz) for v ∈ U, where (U, U) is a compact space of control parameters. Assume the only observations of x_n are R^d-valued random variables y_1, ..., y_n such that for Y_n = σ{y_1, ..., y_n} we have

(1) P{y_{n+1} ∈ A | x_{n+1}, Y_n} = P{y_{n+1} ∈ A | x_{n+1}} = ∫_A r(x_{n+1}, y) dy

for n = 0, 1, ..., with r : E × R^d → R_+ a measurable function, and A ∈ B(R^d), the family of Borel subsets of R^d.

The Markov process (x_n) is controlled by a sequence (a_n) of Y_n-measurable U-valued random variables. The best mean square approximation of x_n based on the available observation is given by a filtering process π_n, defined as a measure valued process such that for A ∈ E,

(2) π_n(A) = P{x_n ∈ A | Y_n} for n = 1, 2, ..., and π_0(A) = µ(A),

where µ is the initial law of (x_n).

1991 Mathematics Subject Classification: Primary 93E20; Secondary 93E11.

Key words and phrases: stochastic control, partial observation, long run average cost, Bellman equation.


The following lemma gives the most general formula for π_n. Its proof, unlike those in [5] and [8], which have more restrictive hypotheses, is not based on the reference probability method.

Lemma 1. Under (1), for n = 0, 1, ... and A ∈ E we have

(3) π_{n+1}(A) = [∫_A r(z_2, y_{n+1}) ∫_E P^{a_n}(z_1, dz_2) π_n(dz_1)] / [∫_E r(z_2, y_{n+1}) ∫_E P^{a_n}(z_1, dz_2) π_n(dz_1)].

P r o o f. Denote the right hand side of (3) by M^{a_n}(y_{n+1}, π_n)(A). Let F : (R^d)^n → R be a bounded measurable function, Y_n = (y_1, ..., y_n) and C ∈ B(R^d).

By (1), Fubini’s theorem and properties of conditional expectations we have

∫ M^{a_n}(y_{n+1}, π_n)(A) χ_C(y_{n+1}) F(Y_n) dP
= ∫ E[M^{a_n}(y_{n+1}, π_n)(A) χ_C(y_{n+1}) | x_{n+1}, Y_n] F(Y_n) dP
= ∫ ∫_C M^{a_n}(y, π_n)(A) r(x_{n+1}, y) dy F(Y_n) dP
= ∫ ∫_C M^{a_n}(y, π_n)(A) E[E[r(x_{n+1}, y) | Y_n, x_n] | Y_n] dy F(Y_n) dP
= ∫ ∫_C M^{a_n}(y, π_n)(A) E[ ∫_E r(z, y) P^{a_n}(x_n, dz) | Y_n ] dy F(Y_n) dP
= ∫ ∫_C M^{a_n}(y, π_n)(A) ∫_E ∫_E r(z, y) P^{a_n}(z_1, dz) π_n(dz_1) dy F(Y_n) dP
= ∫ ∫_C ∫_A r(z_2, y) ∫_E P^{a_n}(z_1, dz_2) π_n(dz_1) dy F(Y_n) dP
= ∫ ∫_E ∫_A ∫_C r(z_2, y) dy P^{a_n}(z_1, dz_2) π_n(dz_1) F(Y_n) dP
= ∫ E[ ∫_A ∫_C r(z_2, y) dy P^{a_n}(x_n, dz_2) | Y_n ] F(Y_n) dP
= ∫ E[ E[ ∫_C r(x_{n+1}, y) dy χ_A(x_{n+1}) | Y_n, x_n ] | Y_n ] F(Y_n) dP
= ∫ ∫_C r(x_{n+1}, y) dy χ_A(x_{n+1}) F(Y_n) dP
= ∫ E[χ_C(y_{n+1}) | Y_n, x_{n+1}] χ_A(x_{n+1}) F(Y_n) dP
= ∫ χ_C(y_{n+1}) χ_A(x_{n+1}) F(Y_n) dP = ∫ π_{n+1}(A) χ_C(y_{n+1}) F(Y_n) dP.

Therefore, by the definition of conditional expectation, (3) follows.
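For a finite state space with a finite set of observation symbols standing in for the density r, the recursion (3) reduces to a predict-then-correct Bayes update: propagate π_n through the controlled kernel, reweight by the observation likelihood, normalize. The sketch below is illustrative only; the kernel and observation law are made-up data, not from the paper.

```python
import numpy as np

def filter_step(pi_n, P_a, r, y_idx):
    """One step of the filter recursion (3) for finite E:
    predict with the controlled kernel, correct with the
    observation likelihood, then normalize."""
    predicted = pi_n @ P_a                  # plays the role of ∫_E P^{a_n}(z1, dz2) π_n(dz1)
    unnormalized = r[:, y_idx] * predicted  # weighting by r(z2, y_{n+1})
    return unnormalized / unnormalized.sum()

# Illustrative 2-state example with 2 observation symbols.
P_a = np.array([[0.7, 0.3],
                [0.4, 0.6]])                # P^a(z1, z2)
r = np.array([[0.8, 0.2],
              [0.3, 0.7]])                  # r(z, y) on a discrete grid
pi0 = np.array([0.5, 0.5])
pi1 = filter_step(pi0, P_a, r, y_idx=0)
print(pi1, pi1.sum())
```

The division by `unnormalized.sum()` is exactly the denominator of (3); it is what makes π_{n+1} a probability measure.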

The class of controls a_n = u(π_n), where u is a fixed measurable, U-valued function, is of special interest. Namely, we have

Lemma 2. Under (1), if additionally a_n = u(π_n) with u a fixed measurable function from the space P(E) of probability measures on E, endowed with the topology of weak convergence, into (U, U), then π_n is a Y_n-Markov process with transition operator

(4) Π^{u(ν)}(ν, F) = ∫_E ∫_{R^d} F(M^{u(ν)}(y, ν)) r(z, y) dy ∫_E P^{u(ν)}(z_1, dz) ν(dz_1)

where

(5) M^v(y, ν)(A) = [∫_A r(z, y) ∫_E P^v(z_1, dz) ν(dz_1)] / [∫_E r(z, y) ∫_E P^v(z_1, dz) ν(dz_1)]

for v ∈ U, ν ∈ P(E) and F : P(E) → R bounded measurable.

P r o o f. By (1) we easily obtain

E[F(π_{n+1}) | Y_n]
= E[F(M^{u(π_n)}(y_{n+1}, π_n)) | Y_n]
= E[E[F(M^{u(π_n)}(y_{n+1}, π_n)) | Y_n, x_{n+1}] | Y_n]
= E[ ∫_{R^d} F(M^{u(π_n)}(y, π_n)) r(x_{n+1}, y) dy | Y_n ]
= E[ ∫_{R^d} E[F(M^{u(π_n)}(y, π_n)) r(x_{n+1}, y) | Y_n, x_n] dy | Y_n ]
= E[ ∫_{R^d} ∫_E F(M^{u(π_n)}(y, π_n)) r(z, y) P^{u(π_n)}(x_n, dz) dy | Y_n ]
= ∫_E ∫_{R^d} F(M^{u(π_n)}(y, π_n)) ∫_E r(z, y) P^{u(π_n)}(z_1, dz) π_n(dz_1) dy
= Π^{u(π_n)}(π_n, F).

Thus (π_n) is Markov with transition operator of the form (4).


In this paper we are interested in minimizing the following long run average cost functional:

(6) J_µ((a_n)) = lim sup_{n→∞} n^{−1} E_µ{ Σ_{i=0}^{n−1} c(x_i, a_i) }

over all U-valued, Y_n-adapted processes a_n, with c : E × U → R_+ a given bounded measurable cost function.

By the very definition of a filtering process we have

(7) J_µ((a_n)) = lim sup_{n→∞} n^{−1} E_µ{ Σ_{i=0}^{n−1} ∫_E c(z, a_i) π_i(dz) }.
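The separated form (7) can be checked by simulation: the running cost is charged through the filter π_i rather than through the hidden state. A minimal Monte Carlo sketch with a single fixed control and made-up finite-state data (none of it from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative finite-state data; a single fixed control keeps it short.
P_a = np.array([[0.7, 0.3],
                [0.4, 0.6]])          # transition kernel under the control
r = np.array([[0.8, 0.2],
              [0.3, 0.7]])           # observation law r(z, y), discrete y
cost = np.array([1.0, 0.2])          # c(x, a) for the fixed control

def average_separated_cost(n_steps=20000):
    """Estimate (7): time-average the filtered cost ∫_E c(z, a) π_i(dz)."""
    x = rng.choice(2)                 # hidden state drawn from the initial law
    pi = np.array([0.5, 0.5])         # filter started at the same initial law
    total = 0.0
    for _ in range(n_steps):
        x = rng.choice(2, p=P_a[x])   # hidden transition
        y = rng.choice(2, p=r[x])     # observation of the new state
        u = r[:, y] * (pi @ P_a)      # filter recursion (3)
        pi = u / u.sum()
        total += pi @ cost            # cost charged through the filter
    return total / n_steps

avg = average_separated_cost()
print(avg)
```

The long-run average of the filtered cost agrees with the average of c(x_i, a_i) by the tower property of conditional expectation, which is exactly why (6) and (7) coincide.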

The optimal strategies for the cost functional J_µ are constructed with the use of a suitable Bellman equation, the solution of which is found as a limit of w_β(x) = ϑ_β(x) − inf_{z∈E} ϑ_β(z) as β → 1, where ϑ_β is the value function of the β-discounted cost functional. Since our limit results are based on compactness arguments, obtained via the Ascoli–Arzelà theorem, in Section 2 we show the continuity of ϑ_β. Then in Section 3 we prove the uniform boundedness of w_β. Using the concavity of w_β, obtained from the concavity of ϑ_β, proved in Section 2, we get equicontinuity of w_β, which allows us to use the Ascoli–Arzelà theorem.

The discrete time ergodic optimal control problem with partial observation was studied in [1], [2], [3], [6], [8], [9]. In [1] and [8] the observation was corrupted with white noise. In addition, in [1] there was a finite state space and a rich observation structure. In [8] the state space was general but there were some restrictions on controls. The papers [2] and [3] contain a general theory but the fundamental example used is a very simple maintenance-replacement model.

In [6] a model with a finite state space and almost steady state transition probabilities was studied. Finally, finite state space semi-Markov decision processes with a completely observable state were considered in [9]. Our paper generalizes [6] in various directions. Namely, we have a general, compact state space. Although the techniques to show the boundedness and the equicontinuity of w_β follow in some sense the arguments of [6], by a more detailed estimation we obtain the results under assumptions which are much less restrictive than the corresponding ones in [6], even when E is finite.

2. Discounted control problem. In this section we characterize the value function ϑ_β of the discounted cost functional J_{µβ} defined as follows:

(8) J_{µβ}((a_n)) := E_µ{ Σ_{i=0}^∞ β^i c(x_i, a_i) } = E_µ{ Σ_{i=0}^∞ β^i ∫_E c(z, a_i) π_i(dz) }

with β ∈ (0, 1).

The theorem below provides a complete solution to the discounted partially observed control problem.

Theorem 1. Assume (1) and

(A1) c : E × U → R_+ is continuous,

(H1) for F ∈ C(P(E)), the space of continuous functions on P(E), if µ_n ⇒ µ, i.e. µ_n converges weakly in P(E) to µ, we have

(9) sup_{a∈U} |Π^a(µ_n, F) − Π^a(µ, F)| → 0 as n → ∞,

(H2) for F ∈ C(P(E)), if U ∋ a_n → a we have

(10) Π^{a_n}(µ, F) → Π^a(µ, F).

Then

(11) ϑ_β(µ) := inf_{(a_n)} J_{µβ}((a_n))

is a continuous function of µ ∈ P(E) and is a unique solution to the Bellman equation

(12) ϑ_β(µ) = inf_{a∈U} [ ∫_E c(x, a) µ(dx) + βΠ^a(µ, ϑ_β) ].

There exists a measurable selector u_β : P(E) → (U, U) for which the infimum on the right hand side of (12) is attained. Moreover, we have

(13) ϑ_β(µ) = J_{µβ}((u_β(π_n))).

In addition, ϑ_β can be uniformly approximated from below by the sequence

(14) ϑ_{β0}(µ) ≡ 0, ϑ_{βn+1}(µ) = inf_{a∈U} [ ∫_E c(x, a) µ(dx) + βΠ^a(µ, ϑ_{βn}) ],

and each ϑ_{βn} is concave, i.e. for µ, ν ∈ P(E) and α ∈ [0, 1],

(15) ϑ_{βn}(αµ + (1 − α)ν) ≥ αϑ_{βn}(µ) + (1 − α)ϑ_{βn}(ν).

P r o o f. We only point out the main steps since the proof is more or less standard (for details see [4], Thm. 2.2).

Define, for ϑ ∈ C(P(E)),

Tϑ(µ) = inf_{a∈U} [ ∫_E c(x, a) µ(dx) + βΠ^a(µ, ϑ) ].

By (A1) and (H1), T is a contraction on C(P(E)). Thus, by the Banach principle there is a unique fixed point ϑ_β of T, which is a unique solution to the Bellman equation (12). Since by (A1) and (H2) the map

U ∋ a → ∫_E c(x, a) µ(dx) + βΠ^a(µ, ϑ_β)

is continuous, there exists a measurable selector u_β. The identity (13) is then almost immediate. Since T is monotonic and contractive, ϑ_{βn} is increasing and converges to ϑ_β. It remains to show the concavity of ϑ_{βn}. We prove this by induction. Clearly, ϑ_{β0} ≡ 0 is concave. Provided ϑ_{βn} is concave, by Jensen's lemma we have for α ∈ (0, 1),

Π^a(αµ + (1 − α)ν, ϑ_{βn}) ≥ αΠ^a(µ, ϑ_{βn}) + (1 − α)Π^a(ν, ϑ_{βn})

and therefore from (14),

ϑ_{βn+1}(αµ + (1 − α)ν) ≥ αϑ_{βn+1}(µ) + (1 − α)ϑ_{βn+1}(ν),

i.e. ϑ_{βn+1} is concave. By induction, ϑ_{βn} is concave for each n. The proof of the theorem is complete.
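For a finite state space with finitely many observation symbols in place of the density r, the approximating sequence (14) can be evaluated recursively: ϑ_{βn+1}(µ) takes the minimum over actions of the expected current cost plus the discounted average of ϑ_{βn} at the Bayes-updated beliefs. A hedged sketch with illustrative data (kernels, observation law and costs are made up, not from the paper):

```python
import numpy as np

# Discrete-observation analogue of the iteration (14): ϑ_{βn+1}(µ) =
# min_a [ Σ_x c(x,a) µ(x) + β Σ_y P(y | µ, a) ϑ_{βn}(M^a(y, µ)) ].
P = {0: np.array([[0.7, 0.3], [0.4, 0.6]]),
     1: np.array([[0.5, 0.5], [0.2, 0.8]])}   # kernels P^a
r = np.array([[0.8, 0.2], [0.3, 0.7]])        # observation law r(z, y)
c = np.array([[1.0, 0.4], [0.2, 1.5]])        # cost c(x, a)
beta = 0.9

def bayes_update(mu, a, y):
    u = r[:, y] * (mu @ P[a])                 # unnormalized M^a(y, µ)
    return u / u.sum(), u.sum()               # posterior, P(y | µ, a)

def theta(mu, n):
    """ϑ_{βn}(µ) computed by the recursion (14); ϑ_{β0} ≡ 0."""
    if n == 0:
        return 0.0
    best = np.inf
    for a in P:
        value = mu @ c[:, a]
        for y in range(r.shape[1]):
            post, py = bayes_update(mu, a, y)
            value += beta * py * theta(post, n - 1)
        best = min(best, value)
    return best

mu = np.array([0.5, 0.5])
vals = [theta(mu, n) for n in range(5)]
print(vals)
```

As the theorem asserts, the sequence is nondecreasing in n; the cost of the naive recursion grows like (|U|·|Y|)^n, so it is only a sketch, not a practical solver.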

Below we formulate sufficient conditions for (H1) and (H2).

Proposition 1. Assume

(A2) r ∈ C(E × R^d),

(A3) for fixed a ∈ U, P^a(x, ·) is Feller, i.e. for any ϕ ∈ C(E), if x_n ⇒ x we have

(16) P^a(x_n, ϕ) → P^a(x, ϕ),

(H3) if U ∋ a_n → a, then for each ϕ ∈ C(E),

(17) sup_{x∈E} |P^{a_n}(x, ϕ) − P^a(x, ϕ)| → 0,

(A4) for R(z, ψ) := ∫_{R^d} r(z, y)ψ(y) dy where ψ ∈ C(R^d), if E ∋ z_n → z we have

(18) R(z_n, ·) ⇒ R(z, ·).

Then (H1) and (H2) are satisfied.

P r o o f. Notice first that from (16) and (17), if U ∋ a_n → a and µ_n ⇒ µ, we have

(19) P^{a_n}(µ_n, ϕ) := ∫_E P^{a_n}(x, ϕ) µ_n(dx) → P^a(µ, ϕ)

as n → ∞, for ϕ ∈ C(E). Since U × P(E) is compact, to prove (H1) and (H2) it is sufficient to show that

U × P(E) ∋ (a, µ) → Π^a(µ, F) is continuous for F ∈ C(P(E)).


Therefore we shall show that

(20) Π^{a_n}(µ_n, F) → Π^a(µ, F)

for U ∋ a_n → a, P(E) ∋ µ_n ⇒ µ and F ∈ C(P(E)). We have

(21) |Π^{a_n}(µ_n, F) − Π^a(µ, F)|
≤ | ∫_E ∫_{R^d} (F(M^{a_n}(y, µ_n)) − F(M^a(y, µ))) r(z, y) dy P^{a_n}(µ_n, dz) |
+ | ∫_E ∫_{R^d} F(M^a(y, µ)) r(z, y) dy (P^{a_n}(µ_n, dz) − P^a(µ, dz)) |
= I_n + II_n.

From (19), II_n → 0, provided

(22) E ∋ z → ∫_{R^d} F(M^a(y, µ)) r(z, y) dy ∈ C(E).

By (A4), R^d ∋ y → M^a(y, µ) ∈ P(E) is continuous. Then, again by (A4), the map (22) is continuous, and consequently II_n → 0.

If

(23) sup_{z∈E} | ∫_{R^d} (F(M^{a_n}(y, µ_n)) − F(M^a(y, µ))) r(z, y) dy | → 0

then clearly I_n → 0.

By (A4), for each ε > 0 there exists a compact set K ⊂ R^d such that for any z ∈ E,

(24) R(z, K^c) < ε / (2‖F‖).

Therefore

| ∫_{R^d} (F(M^{a_n}(y, µ_n)) − F(M^a(y, µ))) r(z, y) dy | ≤ ∫_K |F(M^{a_n}(y, µ_n)) − F(M^a(y, µ))| r(z, y) dy + ε

and to obtain (23) it remains to show that

(25) M^{a_n}(y, µ_n)(ϕ) → M^a(y, µ)(ϕ) for any ϕ ∈ C(E), uniformly in y ∈ K.

Using the Stone–Weierstrass approximation theorem (see [7], Thm. 9.28, cf. also the proof of Lemma A.1.2 of [8]) and (19), we obtain

| ∫_E r(z, y)ϕ(z) ∫_E P^{a_n}(z_1, dz) µ_n(dz_1) − ∫_E r(z, y)ϕ(z) ∫_E P^a(z_1, dz) µ(dz_1) |
= | ∫_E r(z, y)ϕ(z) (P^{a_n}(µ_n, dz) − P^a(µ, dz)) | → 0

uniformly in y ∈ K. Thus, we have uniform convergence of the numerators and denominators in the formula defining M^{a_n}, and consequently convergence of the ratios, from which (25) follows.

The proof of Proposition 1 is complete.

R e m a r k 1. (A4) is satisfied when sup_{z∈E} r(z, y) is integrable.

Define

(26) w

β

(ν) = ϑ

β

(ν) − ϑ

β

β

) and w

βn

(ν) = ϑ

βn

(ν) − ϑ

βn

nβ

)

where µ

β

= arg min ϑ

β

and µ

nβ

= arg min ϑ

βn

. Clearly, w

β

is a solution to the equation

(27) w

β

(ν) + (1 − β)ϑ

β

β

) = inf

a∈U

h R

E

c(x, a) ν(dx) + βΠ

a

(ν, w

β

) i

and w

nβ

(ν) → w

β

(ν) uniformly in ν ∈ P(E). We would like to let β ↑ 1 in (27) and thus obtain a solution w(ν) to the long run average Bellman equation

(28) w(ν) + γ = inf

a∈U

h R

E

c(x, a) ν(dx) + Π

a

(ν, w) i .

Since we wish to apply the Ascoli–Arzel` a theorem, we have to show the boundedness and the equicontinuity of w

β

for β ∈ (0, 1), which are studied successively in the next sections.

3. Boundedness of w_β. We make the following assumption:

(29) (A5) inf_{z,z′∈E} inf_{a,a′∈U} inf_{C∈E, P^a(z,C)>0} P^{a′}(z′, C) / P^a(z, C) =: λ > 0.

We have

Proposition 2. Under (A5) and the assumptions of Theorem 1, the functions w_β(ν) are uniformly bounded for β ∈ (0, 1), ν ∈ P(E).

P r o o f. We improve the proof of Theorem 2 of [6]. Namely, we show by induction the uniform boundedness of w_{βn}(ν) for ν ∈ P(E), β ∈ (0, 1), n = 0, 1, ... For n = 0, w_{β0}(ν) ≡ 0.

Assume that for any β ∈ (0, 1), ν ∈ P(E), w_{βn}(ν) ≤ L where L ≥ ‖c‖λ^{−2}.

Let a, a′ ∈ U be such that for fixed ν ∈ P(E),

(30) w_{βn+1}(ν) = ∫_E c(x, a) ν(dx) − ∫_E c(x, a′) µ_{n+1β}(dx) + β[Π^a(ν, ϑ_{βn}) − Π^{a′}(µ_{n+1β}, ϑ_{βn})].

For y ∈ R^d, define

m(y)(B) = M^{a′}(y, µ_{n+1β})(B) − λ² M^a(y, ν)(B) for any B ∈ E.

By (29) we have

∫_B r(z, y) ∫_E P^{a′}(z_1, dz) µ_{n+1β}(dz_1) ≥ λ ∫_B r(z, y) ∫_E P^a(z_1, dz) ν(dz_1)
= λ M^a(y, ν)(B) ∫_E r(z, y) ∫_E P^a(z_1, dz) ν(dz_1)
≥ λ² M^a(y, ν)(B) ∫_E r(z, y) ∫_E P^{a′}(z_1, dz) µ_{n+1β}(dz_1)

and therefore m(y)(B) ≥ 0 for B ∈ E.

If λ = 1 we have a stationary, noncontrolled Markov chain with P^a(z, C) = η(C) for any a ∈ U, z ∈ E and some fixed η ∈ P(E), and consequently w_{βn} ≡ 0 for any n = 0, 1, ... Therefore we restrict ourselves to the case λ < 1.

Then (1 − λ²)^{−1} m(y) ∈ P(E). Since

M^{a′}(y, µ_{n+1β}) = λ² M^a(y, ν) + (1 − λ²)[(1 − λ²)^{−1} m(y)],

by concavity of ϑ_{βn} we obtain

(31) ϑ_{βn}(M^{a′}(y, µ_{n+1β})) ≥ λ² ϑ_{βn}(M^a(y, ν)) + (1 − λ²) ϑ_{βn}((1 − λ²)^{−1} m(y))

and from (30) we have

(32) w_{βn+1}(ν)
≤ ‖c‖ + β ∫_E ∫_{R^d} ϑ_{βn}(M^a(y, ν)) r(z, y) dy [ ∫_E P^a(z_1, dz) ν(dz_1) − λ² ∫_E P^{a′}(z_1, dz) µ_{n+1β}(dz_1) ]
− β(1 − λ²) ∫_E ∫_{R^d} ϑ_{βn}((1 − λ²)^{−1} m(y)) r(z, y) dy ∫_E P^{a′}(z_1, dz) µ_{n+1β}(dz_1)
= ‖c‖ + β ∫_E ∫_{R^d} (ϑ_{βn}(M^a(y, ν)) − ϑ_{βn}(µ_{nβ})) r(z, y) dy [ ∫_E P^a(z_1, dz) ν(dz_1) − λ² ∫_E P^{a′}(z_1, dz) µ_{n+1β}(dz_1) ]
− β(1 − λ²) ∫_E ∫_{R^d} (ϑ_{βn}((1 − λ²)^{−1} m(y)) − ϑ_{βn}(µ_{nβ})) r(z, y) dy ∫_E P^{a′}(z_1, dz) µ_{n+1β}(dz_1)
≤ ‖c‖ + βL var[ ∫_E P^a(z_1, ·) ν(dz_1) − λ² ∫_E P^{a′}(z_1, ·) µ_{n+1β}(dz_1) ].

By (A5) for any B ∈ E,

(33) ∫_E P^a(z_1, B) ν(dz_1) ≥ λ² ∫_E P^{a′}(z_1, B) µ_{n+1β}(dz_1).

Thus

(34) w_{βn+1}(ν) ≤ ‖c‖ + βL(1 − λ²) ≤ L

and the bound L is independent of ν ∈ P(E), β ∈ (0, 1). By induction w_{βn}(ν) ≤ L for any ν ∈ P(E), n = 0, 1, ..., β ∈ (0, 1). Since by the very definition w_{βn}(ν) ≥ 0, and for each β, w_{βn}(ν) → w_β(ν) as n → ∞, we finally obtain w_β(ν) ≤ L for ν ∈ P(E) and β ∈ (0, 1).

R e m a r k 2. One can easily see that in the case of a finite state space E, the assumption

(35) (A5′) inf_{z,z′∈E} inf_{a,a′∈U} inf_{x∈E, P^a(z,x)>0} P^{a′}(z′, x) / P^a(z, x) > 0

also implies the boundedness of w_β. Thus Proposition 2 significantly improves Theorem 2 of [6]. This was possible because of the choice of µ_{nβ} in (32) as the argument of minimum of ϑ_{βn}.
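For finite E the infimum over sets in (A5) is attained on singletons, since P^{a′}(z′, C) ≥ min_x [P^{a′}(z′, x)/P^a(z, x)] · P^a(z, C); so λ can be computed as the smallest entrywise ratio between rows of the transition matrices. An illustrative sketch (the kernels below are made up, not from the paper):

```python
import numpy as np

# Two illustrative controlled kernels on a 2-point state space.
kernels = [np.array([[0.7, 0.3], [0.4, 0.6]]),
           np.array([[0.5, 0.5], [0.2, 0.8]])]

def equivalence_constant(kernels):
    """λ from (A5) for finite E: the infimum over rows P^a(z, ·),
    P^{a'}(z', ·) and states x with P^a(z, x) > 0 of the ratio
    P^{a'}(z', x) / P^a(z, x)."""
    rows = np.vstack(kernels)          # every row P^a(z, ·)
    lam = np.inf
    for p in rows:
        mask = p > 0
        for q in rows:
            lam = min(lam, (q[mask] / p[mask]).min())
    return lam

lam = equivalence_constant(kernels)
print(lam)   # positive, so (A5) holds for these kernels
```

If any kernel entry were zero while the corresponding entry of another row were positive, λ would be 0 and (A5) would fail, which matches the mutual-equivalence reading of (A5) in Remark 3 below.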

R e m a r k 3. Assumption (A5) says that the transition probabilities for different controls and initial states are mutually equivalent, with Radon–Nikodym density bounded away from 0. In particular, in the case when P^a(z, C) = ∫_C g^a(z, x) η(dx) the assumption

(36) inf_{z,z′∈E} inf_{a,a′∈U} inf_{x∈E, g^a(z,x)>0} g^{a′}(z′, x) / g^a(z, x) > 0

is sufficient for (A5) to be satisfied.


4. Main theorem. Before we formulate and prove our main result, we show the equicontinuity of w_β for β ∈ (0, 1). For this purpose we need an extra assumption:

(A6) If P(E) ∋ µ_n ⇒ µ ∈ P(E) then sup_{a∈U} sup_{C∈E} |P^a(µ_n, C) − P^a(µ, C)| → 0

with P^a(µ, C) := ∫_E P^a(x, C) µ(dx). We have

Proposition 3. Under (A5), (A6) and the assumptions of Theorem 1, the family of functions w_β, β ∈ (0, 1), is equicontinuous, i.e.

(37) ∀ε>0 ∃δ>0 ∀µ,µ′∈P(E): ϱ(µ, µ′) < δ ⇒ ∀β∈(0,1) |w_β(µ) − w_β(µ′)| < ε

with ϱ standing for a metric compatible with the weak convergence topology of P(E).

P r o o f. For ν, µ ∈ P(E) let

(38) λ(ν, µ) := inf_{a∈U} inf_{C∈E, P^a(µ,C)>0} P^a(ν, C) / P^a(µ, C).

From (A5) and (A6), if ν ⇒ µ, then

(39) λ(ν, µ) → 1 and λ(µ, ν) → 1.

By (27) for ν, µ ∈ P(E) we have

(40) w_β(ν) − w_β(µ) ≤ sup_{a∈U} ∫_E c(x, a)(ν(dx) − µ(dx)) + β sup_{a∈U} (Π^a(ν, w_β) − Π^a(µ, w_β)).

By analogy with the proof of Proposition 2 define

m^a(y, µ, ν)(B) = M^a(y, µ)(B) − λ(µ, ν)λ(ν, µ) M^a(y, ν)(B) for B ∈ E.

Clearly, m^a(y, µ, ν)(B) ≥ 0 for B ∈ E, and λ(µ, ν)λ(ν, µ) ≤ 1.

If λ(µ, ν)λ(ν, µ) = 1, then w_β ≡ 0 for β ∈ (0, 1), and consequently the equicontinuity property is satisfied. Therefore assume λ² = λ(µ, ν)λ(ν, µ) < 1. Then by the concavity of w_β,

(41) w_β(M^a(y, µ)) ≥ λ² w_β(M^a(y, ν)) + (1 − λ²) w_β((1 − λ²)^{−1} m^a(y, µ, ν)).


From (40),

(42) w_β(ν) − w_β(µ) ≤ sup_{a∈U} ∫_E c(x, a)(ν(dx) − µ(dx))
+ β sup_{a∈U} { ∫_E ∫_{R^d} w_β(M^a(y, ν)) r(z, y) dy (P^a(ν, dz) − λ² P^a(µ, dz))
+ ∫_E ∫_{R^d} (λ² w_β(M^a(y, ν)) − w_β(M^a(y, µ))) r(z, y) dy P^a(µ, dz) }
= I + II + III.

Now

(43) II ≤ 2‖w_β‖ sup_{a∈U} sup_{B∈E} |P^a(ν, B) − λ² P^a(µ, B)| = 2‖w_β‖(1 − λ(µ, ν)λ(ν, µ))

and using (41) and the nonnegativity of w_β we have

(44) III ≤ sup_{a∈U} ∫_E ∫_{R^d} (λ² − 1) w_β((1 − λ²)^{−1} m^a(y, µ, ν)) r(z, y) dy P^a(µ, dz) ≤ 0.

Interchanging ν and µ in (40)–(44) we obtain the same estimates and therefore

(45) |w_β(ν) − w_β(µ)| ≤ sup_{a∈U} ∫_E c(x, a)(ν(dx) − µ(dx)) + 2‖w_β‖(1 − λ(µ, ν)λ(ν, µ)).

Since by the Stone–Weierstrass theorem (Thm. 9.28 of [7]) c(x, a) can be uniformly approximated on E × U by continuous functions of the form Σ_{i=1}^r c_i(x)d_i(a), from (39) we obtain

lim_{ν⇒µ} sup_{β∈(0,1)} |w_β(ν) − w_β(µ)| = 0.

Let us comment on the assumption (A6):

R e m a r k 4. (H3) clearly follows from (A6).

R e m a r k 5. In the case of a finite state space E = {1, ..., N}, (A6) can be written as

(46) sup_{a∈U} Σ_{k=1}^N | Σ_{i=1}^N (s_{ni} − s_i) P^a(i, k) | → 0

for s_n = (s_{n1}, ..., s_{nN}) → s = (s_1, ..., s_N), 0 ≤ s_{ni} ≤ 1, 0 ≤ s_i ≤ 1, Σ s_{ni} = 1, Σ s_i = 1, and this is satisfied since

sup_{a∈U} Σ_{k=1}^N | Σ_{i=1}^N (s_{ni} − s_i) P^a(i, k) | ≤ Σ_{i=1}^N |s_{ni} − s_i| → 0 as s_n → s.

R e m a r k 6. Assume P^a(z, C) = ∫_C g^a(z, x) η(dx) for C ∈ E and that the mapping

(47) U × E × E ∋ (a, z, x) → g^a(z, x)

is continuous. Then (A6) is satisfied.

In fact, by the Stone–Weierstrass theorem we can approximate g^a uniformly on U × E × E by continuous functions of the form Σ_{i=1}^k b_i(a)c_i(z)d_i(x) and

sup_{a∈U} sup_{C∈E} |P^a(µ_n, C) − P^a(µ, C)|
≤ sup_{a∈U} ∫_E | ∫_E g^a(z, x)(µ_n(dz) − µ(dz)) | η(dx)
≤ ε + Σ_{i=1}^k sup_{a∈U} |b_i(a)| ∫_E |d_i(x)| η(dx) | ∫_E c_i(z)(µ_n(dz) − µ(dz)) | → ε as n → ∞.

Now we can prove our main result:

Theorem 2. Assume (A1)–(A6). Then there exist w ∈ C(P(E)) and a constant γ which are solutions to the Bellman equation

(48) w(µ) + γ = inf_{a∈U} [ ∫_E c(x, a) µ(dx) + Π^a(µ, w) ].

Moreover, there exists u : P(E) → U for which the infimum on the right hand side of (48) is attained. The strategy a_n = u(π_n) is optimal for J_µ and

(49) J_µ((u(π_n))) = γ.

P r o o f. By Theorem 1, each ϑ_{βn} is concave. Therefore w_{βn} is concave and w_β as limit of w_{βn} is also concave. Since by Proposition 2 the w_β are uniformly bounded, and by Proposition 3 equicontinuous, from the Ascoli–Arzelà theorem the family w_β, β ∈ (0, 1), is relatively compact in C(P(E)). Moreover, |(1 − β)ϑ_β(µ_β)| ≤ ‖c‖. Therefore one can choose a subsequence β_k → 1 such that

(1 − β_k)ϑ_{β_k}(µ_{β_k}) → γ and w_{β_k} → w in C(P(E)) as k → ∞.

Letting β_k → 1 in (27) we obtain (48).
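The vanishing discount argument, (1 − β_k)ϑ_{β_k}(µ_{β_k}) → γ, can be illustrated numerically in the fully observed finite-state case, where ϑ_β is computable by value iteration (the contraction T of Theorem 1 restricted to point masses). The data below are made up, and the reduction to full observation is only for illustration:

```python
import numpy as np

# Illustrative fully observed 2-state, 2-action model.
P = {0: np.array([[0.7, 0.3], [0.4, 0.6]]),
     1: np.array([[0.5, 0.5], [0.2, 0.8]])}
c = np.array([[1.0, 0.4], [0.2, 1.5]])        # cost c(x, a)

def discounted_value(beta, tol=1e-12):
    """ϑ_β by value iteration: iterate the Bellman contraction."""
    v = np.zeros(2)
    while True:
        v_new = np.min(np.stack([c[:, a] + beta * P[a] @ v for a in P],
                                axis=1), axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new

# (1 - β) inf ϑ_β for β approaching 1 approximates the optimal
# average cost γ.
gammas = [(1 - beta) * discounted_value(beta).min()
          for beta in (0.9, 0.99, 0.999)]
print(gammas)
```

For this model the best stationary policy yields an average cost of 2.6/9 per step (stationary law (4/9, 5/9) under the minimizing policy), and the sequence above approaches that value as β ↑ 1.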

The remaining assertion of the theorem follows easily from Theorem 3.2.2 of [4].

References

[1] G. B. Di Masi and L. Stettner, On adaptive control of a partially observed Markov chain, Applicationes Math., to appear.
[2] E. Fernandez-Gaucherand, A. Arapostatis and S. J. Marcus, Adaptive control of a partially observed controlled Markov chain, in: Stochastic Theory and Adaptive Control, T. E. Duncan and B. Pasik-Duncan (eds.), Lecture Notes in Control and Inform. Sci. 184, Springer, 1992, 161–171.
[3] —, —, —, On partially observable Markov decision processes with an average cost criterion, Proc. 28th CDC, Tampa, Florida, 1989, 1267–1272.
[4] O. Hernandez-Lerma, Adaptive Markov Control Processes, Springer, New York, 1989.
[5] H. Korezlioglu and G. Mazziotto, Estimation récursive en transmission numérique, Proc. Neuvième Colloque sur le Traitement du Signal et ses Applications, Nice, 1983.
[6] M. Kurano, On the existence of an optimal stationary J-policy in non-discounted Markovian decision processes with incomplete state information, Bull. Math. Statist. 17 (1977), 75–81.
[7] H. L. Royden, Real Analysis, Macmillan, New York, 1968.
[8] W. J. Runggaldier and L. Stettner, Nearly optimal controls for stochastic ergodic problems with partial observation, SIAM J. Control Optim. 31 (1993), 180–218.
[9] K. Wakuta, Semi-Markov decision processes with incomplete state observation—average cost criterion, J. Oper. Res. Soc. Japan 24 (1981), 95–108.

LUKASZ STETTNER
INSTITUTE OF MATHEMATICS
POLISH ACADEMY OF SCIENCES
P.O. BOX 137
00-950 WARSZAWA, POLAND
E-mail: STETTNER@IMPAN.IMPAN.GOV.PL

Received on 22.6.1992
