Bayesian Control of a Discrete-Time Linear System with Uniformly Distributed Disturbances

Academic year: 2021

Dariusz Walczak (Houston)

Bayesian Control of a Discrete-Time Linear System with Uniformly Distributed Disturbances

Abstract The main objective of this article is to develop Bayesian optimal control for a class of linear stochastic discrete-time systems. Taking into consideration that the disturbances in the system are given by a random variable having a uniform distribution with a natural parameter, we prove that the control in the sense of Bayes is the solution of a linear system of algebraic equations for the conjugate priors.

2010 Mathematics Subject Classification: 60G40, 62L15.

Key words and phrases: Bayes control, optimal, singular system, disturbances, Pareto distribution.

1. Introduction Linear stochastic discrete-time systems are systems in which the variables take their values at instantaneous time points. Discrete-time systems differ from continuous-time ones in that their signals are in the form of sampled data. In real systems, a discrete-time system often appears as the result of sampling a continuous-time system, or when only discrete data are available for use. With the development of the digital computer, stochastic discrete-time system theory plays an important role in general control theory.

When considering such systems, the performance measure and the information available at the moments of control specification are two very important factors. Small deviations of the parameters can be treated as disturbances. Once random disturbances are admitted, the performance measure becomes the mean value of the deviation of the system state from the required behavior. When all the parameters of the system are known and the distribution of the disturbances is well defined, the optimal control can be determined. The extension of this model to an adaptive one means that the disturbances are uncertain. Adaptive control is the control method used by a controller which must adapt to a controlled system whose parameters vary or are initially uncertain (see Tesfatsion [14] for the history of adaptive control).

Based on the behavior of the system we can learn the details of the disturbances. It is assumed that the disturbance has a fixed probabilistic description determined by our modeling assumptions. In this paper it is assumed that the distribution function is known up to its parameters, and that the disturbances additively change the state of the system. This resembles the statistical problem of estimation.

The background of modern statistical decision theory was established in the seminal papers by Wald [19], [20]. The statistical decision theory approach to control problems was applied some years later (see the books by Sworder [12], Aoki [1], and Sage and Melsa [10]). The new class of control systems under uncertainty was called adaptive (see Tesfatsion [14]).

In these adaptive control problems Bayesian systems play an important role.

In this class of control models it is assumed that the preliminary knowledge of the disturbances is given by a priori distributions of their parameters.

In this work we find the optimal feedback control of a dynamic linear system with discrete time and additive disturbances. The disturbances are assumed to be independent and identically distributed (i.i.d.), with the distribution specified up to a set of parameters. The loss function is a positive semi-definite quadratic form that depends on the system state and the control applied. The control horizon is a bounded random variable with a known distribution, independent of the random disturbances in the system. In the Bayesian approach we assume knowledge of the prior distribution. Problems of this kind are classified in the literature as adaptive control problems. For the particular case of i.i.d. disturbances belonging to the exponential family of distributions, and using a dynamic programming approach that we also utilize here, optimal controls can be found in [16]; other results in related settings, including minimax control, are available in [16], [17], and [15].

The form of feedback controls considered here is straightforward to compute due to its recursive nature. We also allow incomplete information about the distributions involved by explicitly modeling uncertainty in the parameter, which is the setting often found in practice. Our solution approach, based on Bayes' theorem, is intuitive and explicitly handles this uncertainty via the theory of conjugate distributions (cf. [3], [13]). Bayesian methods are experiencing a resurgence in interest due to their applicability to pattern recognition and, more generally, to machine learning.

By implementing the dynamic programming approach we determine the analytical form of the optimal controls in the closed feedback loop: in Section 3 for disturbances distributed uniformly on [0, λ] and in Section 4 for disturbances uniformly distributed on [λ1, λ2].

2. Model Formulation The control system under consideration is formulated as follows:
$$x_{n+1} = \alpha_n x_n + u_n + \gamma_n v_n, \tag{1}$$
where $x_0 = e$, $n = 0, 1, \ldots, M$, and $u_n \in (-\infty, +\infty)$, where $n$ is the time index. Here $x_n$ is the state variable, $u_n$ is the control, and $v_0, \ldots, v_M$ are the i.i.d. random variables modeling system disturbances; $\alpha_n$ and $\gamma_n$ are given constants, and we assume that $\gamma_n \neq 0$. The control horizon $N$ is a random variable bounded by $M$ and independent of the disturbances and the controls; it is distributed with the given probabilities $p_k$, so that:
$$\Pr(N = k) = p_k, \quad k = 0, 1, \ldots, M, \qquad \sum_{k=0}^{M} p_k = 1, \quad p_M > 0.$$
We will use the following notation:
$$X_n = (x_0, x_1, \ldots, x_n), \qquad U_n = (u_0, u_1, \ldots, u_n).$$
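As a quick illustration of the notation above, the following Python sketch (my own, not from the paper) simulates one trajectory of System (1) under an arbitrary placeholder feedback law; all constants and names are made up:

```python
# Simulate x_{n+1} = alpha_n x_n + u_n + gamma_n v_n with v_n ~ U[0, lambda].
# The feedback u_n = -alpha_n x_n is only a placeholder, not the Bayes control.
import random

def simulate(M, alpha, gamma, lam, x0, seed=0):
    rng = random.Random(seed)
    x = [x0]
    for n in range(M):
        v = rng.uniform(0.0, lam)          # disturbance v_n ~ U[0, lambda]
        u = -alpha[n] * x[n]               # placeholder feedback law
        x.append(alpha[n] * x[n] + u + gamma[n] * v)
    return x

xs = simulate(M=5, alpha=[1.0] * 5, gamma=[1.0] * 5, lam=2.0, x0=3.0)
print(xs)
```

With this placeholder law the state collapses to the pure disturbance, so every state after the initial one lies in [0, λ].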

Our control policy is $U = U_M$, and given the policy the loss function is defined as:
$$L(U, X_N) = \sum_{i=0}^{N} \left( s_i x_i^2 + k_i u_i^2 \right),$$
where $s_i$ and $k_i$ are given positive numbers. We assume that $u_n$ depends on $X_n$ and $U_{n-1}$; because $\gamma_n \neq 0$, $u_n$ is then a function of $v_0, \ldots, v_{n-1}$. Let the distribution of the disturbances be parameterized by $\lambda$; then for a given initial state $x_0$ the risk $R(\lambda, U)$ under policy $U$ is defined as:
$$R(\lambda, U) = E_N \left[ E_\lambda L(U, X_N) \right] = E_N \left[ E_\lambda \left( \sum_{i=0}^{N} \left( s_i x_i^2 + k_i u_i^2 \right) \Big| X_0 \right) \right].$$
The expectations in the above formula are with respect to the distribution of the disturbances and the random horizon, respectively. For a given prior distribution $\pi$ of the parameters the corresponding risk has the following form:
$$H(\pi, U) = E_\pi R(\lambda, U) = E_N E_\pi \left[ \sum_{i=0}^{N} \left( s_i x_i^2 + k_i u_i^2 \right) \Big| X_0 \right].$$
We will call a control policy $U^{*}$ Bayesian if it satisfies this condition:
$$H(\pi, U^{*}) = \inf_{U \in \mathcal{U}_\Gamma} H(\pi, U),$$
where $\mathcal{U}_\Gamma$ is the set of policies for which the risk $H(\pi, U)$ exists.

We approach the problem by considering sub-problems obtained by conditioning on subsequent times (decision epochs) $n$; that is, for time $n$, we assume knowledge of $X_n$, $U_{n-1}$ and seek the optimal $(u_n, \ldots, u_M)$ from that time on. The corresponding expectation of risk is then determined as:
$$H_n(\pi, U) = E_N \left[ E_\pi \left( \sum_{i=n}^{N} \left( s_i x_i^2 + k_i u_i^2 \right) \Big| X_n, U_{n-1} \right) \Big| N \geq n \right].$$
The control policy which minimizes the expected risk is called the Bayes control (cf. [7], [8], [11], [18]).

3. Bayesian Control with Disturbances Uniformly Distributed on [0, λ] The distribution of the disturbance is absolutely continuous with respect to the Lebesgue measure with density of the form (where for any set $A$, $\mathbf{1}_A$ is its indicator function):
$$p(v, \lambda) = \frac{1}{\lambda} \mathbf{1}_{[0,\lambda]}(v), \qquad \Lambda = (0, +\infty), \quad \lambda \in \Lambda.$$
The parameter $\lambda$ is unknown but we assume that its prior distribution is Pareto:
$$g(\lambda; \beta, r) = \frac{\beta r^{\beta}}{\lambda^{\beta+1}} \mathbf{1}_{(r,+\infty)}(\lambda), \qquad r > 0, \ \beta > 2.$$

The family of Pareto distributions with parameters $\beta$ and $r$ (also known as the family of one-sided Pareto distributions) constitutes a conjugate family with respect to the uniform distributions of the type considered here (cf. [3]). A convenient property of a conjugate family is that the posterior distribution is of the same type as the prior, only with different parameters. Using this property we recover, for given $X_1$ and $U_0$, the disturbance $v_0$ and obtain via Bayes' rule that:
$$f(\lambda \mid X_1, U_0) = f(\lambda \mid v_0 = v) = \frac{p(v, \lambda)\, g(\lambda; \beta, r)}{\int_\Lambda p(v, \lambda)\, g(\lambda; \beta, r)\, d\lambda} = \frac{(\beta + 1)(r \vee v)^{\beta+1}}{\lambda^{\beta+2}}\, \mathbf{1}_{(r \vee v,\, +\infty)}(\lambda) =: g(\lambda; \beta_1, r_1),$$
where $r \vee v := \max\{r, v\}$. We thus see that the posterior distribution is also of one-sided Pareto type with new parameters:
$$\beta_1 = \beta + 1, \qquad r_1 = r \vee v.$$
Analogously, after observing $X_n$ and knowing $U_{n-1}$, the controls applied through time $n-1$, we obtain:
$$f(\lambda \mid X_n, U_{n-1}) = f(\lambda \mid v_{n-1} = v) = g(\lambda; \beta_n, r_n),$$
with $\beta_n = \beta_{n-1} + 1$, $r_n = r_{n-1} \vee v$. The conditional distribution of the random variable $v_n$ after observing $X_n$ has density

$$h(v \mid X_n, U_{n-1}) = \int_0^{+\infty} p(v, \lambda)\, g(\lambda; \beta_n, r_n)\, d\lambda = \int_0^{+\infty} \frac{1}{\lambda} \mathbf{1}_{(0,\lambda)}(v)\, \frac{\beta_n r_n^{\beta_n}}{\lambda^{\beta_n+1}} \mathbf{1}_{(r_n,\infty)}(\lambda)\, d\lambda = \frac{\beta_n r_n^{\beta_n}}{\beta_n + 1} \cdot \frac{1}{(r_n \vee v)^{\beta_n+1}}\, \mathbf{1}_{(0,+\infty)}(v).$$
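The conjugate update above is a one-liner in code. The following Python sketch (my own illustration; the true parameter and prior hyperparameters are made up) applies it repeatedly to simulated uniform observations:

```python
# Conjugate update of Section 3: v ~ U[0, lambda], prior lambda ~ Pareto(beta, r).
# Posterior after observing v: Pareto(beta + 1, max(r, v)).
import random

def pareto_update(beta, r, v):
    """One Bayes step for a uniform observation v under a Pareto(beta, r) prior."""
    return beta + 1, max(r, v)

def posterior_mean_v(beta, r):
    """Predictive mean E(v_n | data) = Q_n r_n with Q_n = beta_n / (2 (beta_n - 1))."""
    return beta / (2.0 * (beta - 1.0)) * r

random.seed(0)
lam = 2.0                       # true (unknown) parameter, for simulation only
beta, r = 3.0, 0.5              # prior hyperparameters (beta > 2, r > 0)
for _ in range(1000):
    v = random.uniform(0.0, lam)
    beta, r = pareto_update(beta, r, v)

# r tracks the running sample maximum (-> lam), so the one-step
# predictive mean approaches lam / 2.
print(beta, r, posterior_mean_v(beta, r))
```

Note that $r_n$ is exactly the maximum of the prior $r$ and all observed disturbances, so learning here amounts to tracking a running maximum.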

Lemma 3.1 The following equalities hold:
$$E(v_n \mid X_n, U_{n-1}) = \frac{1}{2} \frac{\beta_n}{\beta_n - 1} r_n = Q_n r_n, \qquad Q_n = \frac{\beta_n}{2(\beta_n - 1)},$$
$$E(v_n^2 \mid X_n, U_{n-1}) = Q_{n1} r_n^2, \qquad Q_{n1} = \frac{\beta_n}{3(\beta_n - 2)},$$
$$E(r_{n+1} \mid X_n, U_{n-1}) = Q_{n2} r_n, \qquad Q_{n2} = \frac{\beta_n^2}{\beta_n^2 - 1},$$
$$E(r_{n+1}^2 \mid X_n, U_{n-1}) = Q_{n3} r_n^2, \qquad Q_{n3} = \frac{\beta_n(\beta_n - 1)}{(\beta_n + 1)(\beta_n - 2)},$$
$$E(x_{n+1} \mid X_n, U_{n-1}) = \alpha_n x_n + u_n + \gamma_n Q_n r_n,$$
$$E(x_{n+1}^2 \mid X_n, U_{n-1}) = (\alpha_n x_n + u_n)^2 + 2(\alpha_n x_n + u_n)\gamma_n Q_n r_n + \gamma_n^2 Q_{n1} r_n^2,$$
$$E(x_{n+1} r_{n+1} \mid X_n, U_{n-1}) = (\alpha_n x_n + u_n) Q_{n2} r_n + \gamma_n Q_{n4} r_n^2, \qquad Q_{n4} = \frac{\beta_n^2}{2(\beta_n + 1)(\beta_n - 2)}.$$

Proof We show the explicit derivation for only the last two of the above equalities, as the remaining ones can be demonstrated analogously.
$$E(x_{n+1}^2 \mid X_n, U_{n-1}) = E\big( (\alpha_n x_n + u_n + \gamma_n v_n)^2 \mid X_n, U_{n-1} \big)$$
$$= (\alpha_n x_n + u_n)^2 + 2(\alpha_n x_n + u_n)\gamma_n E(v_n \mid X_n, U_{n-1}) + \gamma_n^2 E(v_n^2 \mid X_n, U_{n-1})$$
$$= (\alpha_n x_n + u_n)^2 + 2(\alpha_n x_n + u_n)\gamma_n Q_n r_n + \gamma_n^2 Q_{n1} r_n^2,$$
$$E(x_{n+1} r_{n+1} \mid X_n, U_{n-1}) = E\big( (\alpha_n x_n + u_n + \gamma_n v_n)(r_n \vee v_n) \mid X_n, U_{n-1} \big)$$
$$= (\alpha_n x_n + u_n) Q_{n2} r_n + \gamma_n \frac{\beta_n}{\beta_n + 1} \left( \frac{1}{2} + \frac{1}{\beta_n - 2} \right) r_n^2$$
$$= (\alpha_n x_n + u_n) Q_{n2} r_n + \gamma_n \frac{\beta_n^2}{2(\beta_n + 1)(\beta_n - 2)} r_n^2 = (\alpha_n x_n + u_n) Q_{n2} r_n + \gamma_n Q_{n4} r_n^2. \qquad \blacksquare$$
We are now ready for the main result of this section.
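The constants in Lemma 3.1 are easy to sanity-check numerically. The following Monte Carlo sketch (my own illustration; the sampler and all names are assumptions, not from the paper) draws $\lambda$ from the Pareto posterior, then $v \mid \lambda \sim U[0, \lambda]$, and compares sample means against $Q_n$ and $Q_{n2}$:

```python
# Monte Carlo check of E(v) = Q_n r and E(r OR v) = Q_{n2} r from Lemma 3.1.
import random

def sample_pareto(beta, r, u):
    # Inverse-CDF sampling: F(x) = 1 - (r/x)^beta for x > r.
    return r * (1.0 - u) ** (-1.0 / beta)

random.seed(1)
beta, r, n = 4.0, 1.5, 200_000
vs = []
for _ in range(n):
    lam = sample_pareto(beta, r, random.random())
    vs.append(random.uniform(0.0, lam))

q_n  = beta / (2.0 * (beta - 1.0))       # predicted E(v)     / r
q_n2 = beta ** 2 / (beta ** 2 - 1.0)     # predicted E(r v v) / r
mean_v  = sum(vs) / n
mean_rv = sum(max(r, v) for v in vs) / n
print(mean_v / r, q_n, mean_rv / r, q_n2)
```

With 200,000 draws both sample ratios agree with the predicted constants to well within Monte Carlo error.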

Theorem 3.2 Under the assumptions of this section concerning the disturbances and the prior distribution $\pi$ of the parameter, the optimal Bayesian control of System (1) takes the form
$$u_n = -\frac{\frac{\pi_{n+1}}{\pi_n} \alpha_n A_{n+1}}{k_n + \frac{\pi_{n+1}}{\pi_n} A_{n+1}}\, x_n - \frac{\frac{\pi_{n+1}}{\pi_n} \left( \gamma_n A_{n+1} Q_n + B_{n+1} Q_{n2} \right)}{k_n + \frac{\pi_{n+1}}{\pi_n} A_{n+1}}\, r_n, \tag{2}$$
with its corresponding risk of the form
$$W_n = A_n x_n^2 + 2 B_n x_n r_n + C_n r_n^2. \tag{3}$$
The $A_n$, $B_n$, and $C_n$ are functions of $\beta_n$, do not depend on $r_n$, and satisfy the recursive relationships shown below:
$$A_n = s_n + \frac{\frac{\pi_{n+1}}{\pi_n} k_n \alpha_n^2 A_{n+1}}{k_n + \frac{\pi_{n+1}}{\pi_n} A_{n+1}}, \tag{4}$$
$$B_n = k_n\, \frac{\frac{\pi_{n+1}}{\pi_n} \alpha_n \left( \gamma_n A_{n+1} Q_n + B_{n+1} Q_{n2} \right)}{k_n + \frac{\pi_{n+1}}{\pi_n} A_{n+1}}, \tag{5}$$
$$C_n = \frac{\pi_{n+1}}{\pi_n} \left( \gamma_n^2 Q_{n1} A_{n+1} + 2 B_{n+1} \gamma_n Q_{n4} + C_{n+1} Q_{n3} \right) - \frac{k_n + \frac{\pi_{n+1}}{\pi_n} A_{n+1}}{k_n^2 \alpha_n^2} \cdot B_n^2, \tag{6}$$
$$A_M = s_M, \qquad B_M = C_M = 0. \tag{7}$$
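The recursion (4)–(7) is a finite backward pass. The following Python sketch (my own; the function and argument names are hypothetical, and it assumes $\beta_n = \beta_0 + n$ with $\beta_0 > 2$ and $\alpha_n \neq 0$ so all constants are defined) computes $A_n, B_n, C_n$ and the feedback gains of (2):

```python
# Backward recursion of Theorem 3.2 (a sketch under the stated assumptions).
def bayes_gains(s, k, alpha, gamma, p, beta0):
    M = len(p) - 1
    pi = [sum(p[n:]) for n in range(M + 1)]          # pi_n = sum_{i>=n} p_i
    beta = [beta0 + n for n in range(M + 1)]         # beta_n = beta_0 + n
    A = [0.0] * (M + 1); B = [0.0] * (M + 1); C = [0.0] * (M + 1)
    A[M] = s[M]                                      # A_M = s_M, B_M = C_M = 0
    gains = [None] * M                               # (g_x, g_r): u_n = g_x x_n + g_r r_n
    for n in range(M - 1, -1, -1):
        rho = pi[n + 1] / pi[n]
        b = beta[n]
        Q  = b / (2.0 * (b - 1.0))
        Q1 = b / (3.0 * (b - 2.0))
        Q2 = b * b / (b * b - 1.0)
        Q3 = b * (b - 1.0) / ((b + 1.0) * (b - 2.0))
        Q4 = b * b / (2.0 * (b + 1.0) * (b - 2.0))
        D = k[n] + rho * A[n + 1]
        A[n] = s[n] + rho * k[n] * alpha[n] ** 2 * A[n + 1] / D
        G = gamma[n] * A[n + 1] * Q + B[n + 1] * Q2
        B[n] = k[n] * rho * alpha[n] * G / D
        C[n] = rho * (gamma[n] ** 2 * Q1 * A[n + 1] + 2.0 * B[n + 1] * gamma[n] * Q4
                      + C[n + 1] * Q3) - D * B[n] ** 2 / (k[n] ** 2 * alpha[n] ** 2)
        gains[n] = (-rho * alpha[n] * A[n + 1] / D, -rho * G / D)
    return A, B, C, gains
```

The pass costs $O(M)$ arithmetic (plus the $\pi_n$ prefix sums), and the gains can be stored offline; at run time only $r_n = r_{n-1} \vee v_{n-1}$ needs to be tracked.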

Proof Let $\pi_k = \sum_{i=k}^{M} p_i$. With this notation we can write the risk $H_n$ as
$$H_n = E_N \left[ E\left( \sum_{i=n}^{N} \left( s_i x_i^2 + k_i u_i^2 \right) \Big| X_n, U_{n-1} \right) \Big| N \geq n \right] \tag{8}$$
$$= \sum_{k=n}^{M} E\left[ \sum_{i=n}^{k} \left( s_i x_i^2 + k_i u_i^2 \right) \Big| X_n, U_{n-1} \right] \frac{p_k}{\pi_n} = E\left[ \sum_{i=n}^{M} \frac{\pi_i}{\pi_n} \left( s_i x_i^2 + k_i u_i^2 \right) \Big| X_n, U_{n-1} \right].$$
We now derive the Bayesian control $u_n$ and the risk associated with it:
$$W_n = \min_{u_n} H_n = \min_{u_i,\ n \leq i \leq M} E\left[ \sum_{i=n}^{M} \frac{\pi_i}{\pi_n} \left( s_i x_i^2 + k_i u_i^2 \right) \Big| X_n, U_{n-1} \right]. \tag{9}$$
From Bellman's Optimality Principle we obtain
$$W_n = \min_{u_n} \left\{ s_n x_n^2 + k_n u_n^2 + \min_{u_i,\ n+1 \leq i \leq M} \frac{\pi_{n+1}}{\pi_n}\, E\left[ E\left( \sum_{i=n+1}^{M} \frac{\pi_i}{\pi_{n+1}} \left( s_i x_i^2 + k_i u_i^2 \right) \Big| X_{n+1}, U_n \right) \Big| X_n, U_{n-1} \right] \right\}$$
and thus it follows that
$$W_n = \min_{u_n} \left\{ s_n x_n^2 + k_n u_n^2 + \frac{\pi_{n+1}}{\pi_n}\, E\left[ W_{n+1} \mid X_n, U_{n-1} \right] \right\}. \tag{10}$$
Since the integrand is bounded from below we can move the minimum inside the integral, which results in an equation that the Bayesian control $u_n$ has to satisfy:
$$2 k_n u_n + \frac{\partial}{\partial u_n} \frac{\pi_{n+1}}{\pi_n}\, E\left[ W_{n+1} \mid X_n, U_{n-1} \right] = 0, \tag{11}$$
and together with Equation (1) we further obtain
$$2 k_n u_n + \frac{\pi_{n+1}}{\pi_n}\, E\left[ \frac{\partial}{\partial x_{n+1}} W_{n+1} \Big| X_n, U_{n-1} \right] = 0. \tag{12}$$
Now, by means of backward induction, we show that $W_n$ has the desired form (3).
1. For $n = M$, $W_M = s_M x_M^2$ and thus $A_M = s_M$, $B_M = C_M = 0$.
2. Assume (inductively) that $W_{n+1}$ has the form as in (3). We obtain that
$$\frac{\partial}{\partial x_{n+1}} W_{n+1} = 2 A_{n+1} x_{n+1} + 2 B_{n+1} r_{n+1},$$
and also that
$$E\left[ \frac{\partial}{\partial x_{n+1}} W_{n+1} \Big| X_n, U_{n-1} \right] = 2 A_{n+1} (\alpha_n x_n + u_n) + 2 A_{n+1} \gamma_n Q_n r_n + 2 B_{n+1} Q_{n2} r_n.$$
Equation (12) gives us
$$2 k_n u_n + \frac{\pi_{n+1}}{\pi_n} \left( 2 A_{n+1} (\alpha_n x_n + u_n) + 2 A_{n+1} \gamma_n Q_n r_n + 2 B_{n+1} Q_{n2} r_n \right) = 0,$$
which can be converted into the expression (2) for the optimal control that we have been seeking.
Substituting into Equation (10) the expression for $E\left[ W_{n+1} \mid X_n, U_{n-1} \right]$ previously obtained in Lemma 3.1, as well as the just obtained expression for $u_n$, and equating the coefficients of $x_n^2$, $x_n r_n$, and $r_n^2$, we verify relationships (4)–(6). $\blacksquare$

4. Bayesian Control with Disturbances Uniformly Distributed on [λ1, λ2] In this section we consider a more complex model with two unknown parameters for the distribution of disturbances. Namely, the disturbances are uniformly distributed over $[\lambda_1, \lambda_2]$, and both $\lambda_1$ and $\lambda_2$ are unknown; for the sake of notation we will sometimes use $\lambda$ to denote the vector $(\lambda_1, \lambda_2)$. These i.i.d. disturbances have the following density:
$$p(v, \lambda) = \frac{1}{\lambda_2 - \lambda_1} \mathbf{1}_{[\lambda_1, \lambda_2]}(v), \qquad \lambda_2 > \lambda_1.$$
The prior distribution is assumed to be the two-sided Pareto distribution whose density with respect to the Lebesgue measure on the plane is
$$g(\lambda; \alpha, \beta, \gamma) = \frac{\gamma (\gamma + 1) (\beta - \alpha)^{\gamma}}{(\lambda_2 - \lambda_1)^{\gamma+2}} \cdot \mathbf{1}_{(-\infty,\alpha)}(\lambda_1)\, \mathbf{1}_{(\beta,+\infty)}(\lambda_2), \qquad \beta > \alpha, \ \gamma > 2. \tag{13}$$
We expand the notation slightly to accommodate the additional parameters and operators needed:
$$x \vee y := \max\{x, y\}, \qquad x \wedge y := \min\{x, y\}.$$
We write the system equation now as
$$x_{n+1} = a_n x_n + u_n + c_n v_n, \qquad n = 0, 1, \ldots, M. \tag{14}$$
In order to determine the optimal control in the Bayes sense we will follow the same approach as in the previous section. For given $X_1$, $U_0$ we thus have

$$f(\lambda \mid X_1, U_0) = f(\lambda \mid v_0 = v) = \frac{p(v, \lambda)\, g(\lambda; \alpha, \beta, \gamma)}{\int_\Lambda p(v, \lambda)\, g(\lambda; \alpha, \beta, \gamma)\, d\lambda}, \qquad \Lambda = \{(x, y) : y > x\}.$$
Integrating out $\lambda$ in the denominator gives us
$$\int_\Lambda p(v, \lambda)\, g(\lambda; \alpha, \beta, \gamma)\, d\lambda = \frac{\gamma (\beta - \alpha)^{\gamma}}{(\gamma + 2)(\beta \vee v - \alpha \wedge v)^{\gamma+1}}$$
and ultimately
$$f(\lambda \mid X_1, U_0) = \frac{(\gamma + 1)(\gamma + 2)}{(\lambda_2 - \lambda_1)^{\gamma+3}} \cdot \mathbf{1}_{(-\infty,\, \alpha \wedge v)}(\lambda_1) \cdot \mathbf{1}_{(\beta \vee v,\, +\infty)}(\lambda_2) \cdot (\beta \vee v - \alpha \wedge v)^{\gamma+1} = g(\lambda; \alpha_1, \beta_1, \gamma_1).$$
We can see that the posterior distribution of $\lambda$ has indeed the same form as the prior, with updated parameters
$$\alpha_1 = \alpha \wedge v, \qquad \beta_1 = \beta \vee v, \qquad \gamma_1 = \gamma + 1.$$
In an analogous manner, at the $n$-th stage, given $X_n$ and $U_{n-1}$, we obtain the posterior with parameters
$$\alpha_n = \alpha_{n-1} \wedge v, \qquad \beta_n = \beta_{n-1} \vee v, \qquad \gamma_n = \gamma_{n-1} + 1,$$
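As in Section 3, the two-parameter conjugate update reduces to tracking running extremes. A minimal sketch (mine, with made-up observations):

```python
# Conjugate update of Section 4: v ~ U[lambda1, lambda2], prior (lambda1, lambda2)
# ~ two-sided Pareto(alpha, beta, gamma). Posterior: (alpha ^ v, beta v v, gamma + 1).
def two_sided_update(alpha, beta, gamma, v):
    return min(alpha, v), max(beta, v), gamma + 1

alpha, beta, gamma = 0.0, 1.0, 3.0
for v in (0.4, -0.7, 1.9, 0.2):
    alpha, beta, gamma = two_sided_update(alpha, beta, gamma, v)
print(alpha, beta, gamma)
```

Here $\alpha_n$ is the running minimum of the prior $\alpha$ and the observations, $\beta_n$ the running maximum, and $\gamma_n$ simply counts the observations.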

and the conditional distribution of $v_n$ with density
$$h(v \mid X_n, U_{n-1}) = \frac{\gamma_n}{\gamma_n + 2} \cdot \frac{(\beta_n - \alpha_n)^{\gamma_n}}{(\beta_n \vee v - \alpha_n \wedge v)^{\gamma_n+1}}.$$
To help with the formalism we introduce some notation. Let
$$S_{1n} = \frac{1}{\gamma_n + 2}, \qquad S_{2n} = \frac{\gamma_n}{\gamma_n + 2}, \qquad S_{3n} = \frac{1}{(\gamma_n - 1)(\gamma_n + 2)}, \qquad S_{4n} = \frac{1}{(\gamma_n - 1)(\gamma_n + 2)(\gamma_n - 2)}.$$
We then write

$$E_0 = \int_{\alpha_n}^{\beta_n} h(v \mid X_n, U_{n-1})\, dv = S_{2n},$$
$$E_1 = \int_{-\infty}^{\alpha_n} v\, h(v \mid X_n, U_{n-1})\, dv = E_{1\alpha} \alpha_n + E_{1\beta} \beta_n, \qquad E_{1\alpha} = S_{1n} + S_{3n}, \quad E_{1\beta} = -S_{3n},$$
$$E_2 = \int_{\alpha_n}^{\beta_n} v\, h(v \mid X_n, U_{n-1})\, dv = E_{2\alpha} \alpha_n + E_{2\beta} \beta_n, \qquad E_{2\alpha} = E_{2\beta} = \tfrac{1}{2} S_{2n},$$
$$E_3 = \int_{\beta_n}^{+\infty} v\, h(v \mid X_n, U_{n-1})\, dv = E_{3\alpha} \alpha_n + E_{3\beta} \beta_n, \qquad E_{3\alpha} = -S_{3n}, \quad E_{3\beta} = S_{1n} + S_{3n},$$
$$E_4 = \int_{-\infty}^{\alpha_n} v^2\, h(v \mid X_n, U_{n-1})\, dv = E_{4\alpha^2} \alpha_n^2 + E_{4\alpha\beta} \alpha_n \beta_n + E_{4\beta^2} \beta_n^2,$$
$$E_{4\alpha^2} = S_{1n} + 2 S_{3n} + 2 S_{4n}, \qquad E_{4\alpha\beta} = -2 S_{3n} - 4 S_{4n}, \qquad E_{4\beta^2} = 2 S_{4n},$$
$$E_5 = \int_{\alpha_n}^{\beta_n} v^2\, h(v \mid X_n, U_{n-1})\, dv = \tfrac{1}{3} S_{2n} (\alpha_n^2 + \alpha_n \beta_n + \beta_n^2), \qquad E_{5\alpha^2} = E_{5\alpha\beta} = E_{5\beta^2} = \tfrac{1}{3} S_{2n},$$
$$E_6 = \int_{\beta_n}^{+\infty} v^2\, h(v \mid X_n, U_{n-1})\, dv = E_{6\alpha^2} \alpha_n^2 + E_{6\alpha\beta} \alpha_n \beta_n + E_{6\beta^2} \beta_n^2,$$
$$E_{6\alpha^2} = 2 S_{4n}, \qquad E_{6\alpha\beta} = -2 S_{3n} - 4 S_{4n}, \qquad E_{6\beta^2} = S_{1n} + 2 S_{3n} + 2 S_{4n},$$
$$E\left( v \mid X_n, U_{n-1} \right) = E_1 + E_2 + E_3 = \tfrac{1}{2} (\alpha_n + \beta_n),$$
$$E\left( v^2 \mid X_n, U_{n-1} \right) = E_4 + E_5 + E_6 = Q_{n1} (\alpha_n^2 + \beta_n^2) + Q_{n2} \alpha_n \beta_n,$$
$$Q_{n1} = S_{1n} + 2 S_{3n} + 4 S_{4n} + \tfrac{1}{3} S_{2n}, \qquad Q_{n2} = -4 S_{3n} - 8 S_{4n} + \tfrac{1}{3} S_{2n}.$$

Lemma 4.1 Under the assumptions of this section and utilizing the above notation, the following relationships hold:
$$E\left( \alpha_{n+1} \beta_{n+1} \mid X_n, U_{n-1} \right) = E_{3\alpha} (\alpha_n^2 + \beta_n^2) + (2 E_{1\alpha} + S_{2n})\, \alpha_n \beta_n,$$
$$E\left( \alpha_{n+1}^2 + \beta_{n+1}^2 \mid X_n, U_{n-1} \right) = \left( E_{4\alpha^2} + E_{6\alpha^2} + S_{1n} + S_{2n} \right)(\alpha_n^2 + \beta_n^2) + \left( E_{4\alpha\beta} + E_{6\alpha\beta} \right) \alpha_n \beta_n,$$
$$E\left( x_{n+1} (\alpha_{n+1} + \beta_{n+1}) \mid X_n, U_{n-1} \right) = (a_n x_n + u_n)(2 S_{1n} + S_{2n})(\alpha_n + \beta_n)$$
$$+\; c_n \left[ \left( E_{4\alpha^2} + E_{2\alpha} + E_{3\alpha} + E_{6\alpha^2} \right)(\alpha_n^2 + \beta_n^2) + \left( E_{4\alpha\beta} + E_{2\beta} + E_{3\beta} + E_{6\alpha\beta} + E_{1\alpha} + E_{2\alpha} \right) \alpha_n \beta_n \right],$$
$$E\left( x_{n+1}^2 \mid X_n, U_{n-1} \right) = (a_n x_n + u_n)^2 + 2 c_n (a_n x_n + u_n)\, \tfrac{1}{2} (\alpha_n + \beta_n) + c_n^2 \left[ Q_{n1} (\alpha_n^2 + \beta_n^2) + Q_{n2}\, \alpha_n \beta_n \right],$$
$$E\left( x_{n+1} \mid X_n, U_{n-1} \right) = a_n x_n + u_n + c_n\, \tfrac{1}{2} (\alpha_n + \beta_n). \tag{15}$$
Proof We explicitly show the derivation of only one of the equalities, as the remaining ones can be verified in an analogous manner.

$$E\left( x_{n+1} (\alpha_{n+1} + \beta_{n+1}) \mid X_n, U_{n-1} \right) = E\left( (a_n x_n + u_n + c_n v_n)(\alpha_n \wedge v_n + \beta_n \vee v_n) \mid X_n, U_{n-1} \right)$$
$$= (a_n x_n + u_n)(2 S_{1n} + S_{2n})(\alpha_n + \beta_n) + c_n \Big[ \left( E_{4\alpha^2} + E_{2\alpha} + E_{3\alpha} + E_{6\alpha^2} \right) \alpha_n^2$$
$$+ \left( E_{4\alpha\beta} + E_{2\beta} + E_{3\beta} + E_{6\alpha\beta} + E_{1\alpha} + E_{2\alpha} \right) \alpha_n \beta_n + \left( E_{4\beta^2} + E_{6\beta^2} + E_{1\beta} + E_{2\beta} \right) \beta_n^2 \Big]$$
$$= (a_n x_n + u_n)(\alpha_n + \beta_n) + c_n \left[ \left( E_{4\alpha^2} + E_{2\alpha} + E_{3\alpha} + E_{6\alpha^2} \right)(\alpha_n^2 + \beta_n^2) + \left( E_{4\alpha\beta} + E_{2\beta} + E_{3\beta} + E_{6\alpha\beta} + E_{1\alpha} + E_{2\alpha} \right) \alpha_n \beta_n \right],$$
since $2 S_{1n} + S_{2n} = 1$, $E_{4\alpha^2} = E_{6\beta^2}$, $E_{4\beta^2} = E_{6\alpha^2}$, $E_{2\beta} = E_{2\alpha}$, and $E_{3\alpha} = E_{1\beta}$. $\blacksquare$
In full analogy to the proof of Theorem 3.2 one can show a similar result under the distributional assumptions of this section, i.e. for disturbances that are i.i.d. uniform on $[\lambda_1, \lambda_2]$.

Theorem 4.2 Under the above assumptions on the distribution of the disturbances and the prior, the optimal Bayesian control of System (14) takes the form
$$u_n = -\frac{\frac{\pi_{n+1}}{\pi_n} A_{n+1} a_n}{k_n + \frac{\pi_{n+1}}{\pi_n} A_{n+1}}\, x_n - \frac{\frac{\pi_{n+1}}{\pi_n} \left( \tfrac{1}{2} c_n A_{n+1} + B_{n+1} \right)}{k_n + \frac{\pi_{n+1}}{\pi_n} A_{n+1}}\, (\alpha_n + \beta_n), \tag{16}$$
with its corresponding risk of the form
$$W_n = A_n x_n^2 + 2 B_n x_n (\alpha_n + \beta_n) + C_n (\alpha_n^2 + \beta_n^2) + 2 D_n \alpha_n \beta_n, \tag{17}$$
where $A_n$, $B_n$, $C_n$, and $D_n$ do not depend on $\alpha_n$ and $\beta_n$ and satisfy the following recursive relationships:
$$A_n = s_n + \frac{\frac{\pi_{n+1}}{\pi_n} A_{n+1} k_n a_n^2}{k_n + \frac{\pi_{n+1}}{\pi_n} A_{n+1}},$$
$$B_n = k_n\, \frac{\frac{\pi_{n+1}}{\pi_n} a_n \left( \tfrac{1}{2} c_n A_{n+1} + B_{n+1} \right)}{k_n + \frac{\pi_{n+1}}{\pi_n} A_{n+1}},$$
$$C_n = \frac{\pi_{n+1}}{\pi_n} \Big[ c_n^2 Q_{n1} A_{n+1} + 2 c_n B_{n+1} \left( E_{4\alpha^2} + E_{2\alpha} + E_{3\alpha} + E_{6\alpha^2} \right) + C_{n+1} \left( E_{4\alpha^2} + E_{6\alpha^2} + S_{1n} + S_{2n} \right) + 2 D_{n+1} E_{3\alpha} \Big] - \frac{k_n + \frac{\pi_{n+1}}{\pi_n} A_{n+1}}{k_n^2 a_n^2}\, B_n^2,$$
$$D_n = \frac{\pi_{n+1}}{\pi_n} \Big[ \tfrac{1}{2} c_n^2 Q_{n2} A_{n+1} + c_n B_{n+1} \left( E_{4\alpha\beta} + E_{2\beta} + E_{3\beta} + E_{6\alpha\beta} + E_{1\alpha} + E_{2\alpha} \right) + \tfrac{1}{2} C_{n+1} \left( E_{4\alpha\beta} + E_{6\alpha\beta} \right) + D_{n+1} \left( 2 E_{1\alpha} + S_{2n} \right) \Big] - \frac{k_n + \frac{\pi_{n+1}}{\pi_n} A_{n+1}}{k_n^2 a_n^2}\, B_n^2,$$
with $A_M = s_M$, $B_M = C_M = D_M = 0$.

It seems that one cannot obtain the results of Theorem 3.2 directly from Theorem 4.2, even though the model in this section is the most general as far as uniform distributions on the line are concerned. When we try to specialize the results of the latter to the uniform distribution on [0, λ] considered in Theorem 3.2, by setting $\alpha_n = 0$, $\gamma_n = \beta_n$, and $\beta_n = r_n$, we obtain a slightly different form of control. Namely, instead of $Q_n = \frac{\beta_n}{2(\beta_n - 1)}$ we have $\frac{1}{2}$, and instead of $Q_{n2} = \frac{\beta_n^2}{\beta_n^2 - 1}$ we have $1$; the general form of the risk is the same, but with different coefficients.

One can also use this general model to control a system with disturbances that are i.i.d. on $[\lambda - a, \lambda + a]$ with a known constant $a$ and unknown $\lambda$, but the optimality cannot be guaranteed a priori.

5. Conclusion By utilizing certain properties of conjugate distributions we have obtained analytical expressions for the adaptive feedback control in the sense of Bayes for a linear model with discrete time and a finite random horizon with additive i.i.d. disturbances. Two types of uniform distributions were considered, and for each type we show that the controls can be easily calculated numerically using a finite recursion.

The model can be further researched to determine conditions for a convenient form of feedback control under additive and independent disturbances that are distributed uniformly on [λ − a, λ + a] with unknown a and λ. One can also analyze the structure of the control when the number of disturbances is random, e.g. when for each time point we introduce an independent binary random variable that turns the disturbance on or off.

Extending the model to multiple dimensions is interesting in itself but seems to require considerably more effort. However, such an extension should also be more relevant in practice given various potential applications (see e.g. [4], [2], [5], [6], [9]).

6. Acknowledgment I would like to thank Professor Krzysztof Szajowski for the initial inspiration as well as discussions and support throughout this research project.

References

[1] M. Aoki. Optimization of stochastic systems. Topics in discrete-time systems. Mathematics in Science and Engineering, Vol. 32. Academic Press, New York-London, 1967. MR 0234749; Zbl 0168.15802.

[2] W. S. Black, P. Haghi, and K. B. Ariyur. Adaptive systems: History, techniques, problems, and perspectives. Systems, 2:606–660, 2014. doi: 10.3390/systems2040606.

[3] M. DeGroot. Optimal Statistical Decisions. McGraw-Hill Book Comp., New York, 1970. Zbl 1136.62011.

[4] T. E. Duncan, B. Pasik-Duncan, and L. Stettner. Adaptive control of a partially observed discrete time Markov process. Appl. Math. Optim., 37(3):269–293, 1998. doi: 10.1007/s002459900077; MR 1610799.

[5] J. I. González-Trejo, O. Hernández-Lerma, and L. F. Hoyos-Reyes. Minimax control of discrete-time stochastic systems. SIAM J. Control Optim., 41(5):1626–1659 (electronic), 2002. doi: 10.1137/S0363012901383837; MR 1971966.

[6] A. Grzybowski. Minimax control of a system with actuation errors. Zastos. Mat., 21(2):235–252, 1991. MR 1145478; Zbl 0756.93088.

[7] H. Kushner. Introduction to stochastic control. Holt, Rinehart and Winston, Inc., New York-Montreal, Que.-London, 1971. MR 0280248; Zbl 0293.93018.

[8] Z. Porosiński, K. Szajowski, and S. Trybuła. Bayes control for a multidimensional stochastic system. Systems Sci., 11(2):51–64 (1987), 1985. MR 919393; Zbl 0629.93073.

[9] W. J. Runggaldier. Concepts and methods for discrete and continuous time control under uncertainty. Insurance Math. Econom., 22(1):25–39, 1998. The interplay between insurance, finance and control (Aarhus, 1997). doi: 10.1016/S0167-6687(98)00006-7; MR 1625819; Zbl 0916.93085.


[10] A. P. Sage and J. L. Melsa. Estimation theory with applications to communications and control. McGraw-Hill Series in Systems Science. McGraw-Hill Book Co., New York-Düsseldorf-London, 1971. MR 0501447; Zbl 0255.62005.

[11] G. Sawitzki. Exact filtering in exponential families: discrete time. Math. Operationsforsch. Statist. Ser. Statist., 12(3):393–401, 1981. doi: 10.1080/02331888108801598; MR 640558.

[12] D. Sworder. Optimal adaptive control systems. Mathematics in Science and Engineering, Vol. 25. Academic Press, New York-London, 1966. MR 0211801; Zbl 0168.15801.

[13] K. Szajowski and S. Trybuła. Bayes control of a discrete time linear system with random disturbances. Random horizon case. Podstawy Sterowania, 14:109–115, 1984. Zbl 0552.93066.

[14] L. Tesfatsion. A dual approach to Bayesian inference and adaptive control. Theory and Decision, 14(2):177–194, 1982. doi: 10.1007/BF00133976; MR 665583; Zbl 0489.93059.

[15] S. Trybuła. Sterowanie dualne przy samoreprodukujących się rozkładach. In Prace V Krajowej Konferencji Automatyki, pages 163–169, Gdańsk, 1971. Sekcja 1. Teoria sterowania.

[16] S. Trybuła and K. Szajowski. Decision making in an incompletely known stochastic system. I. Zastos. Matem., 19:31–41, 1987. MR 897512; Zbl 0645.62008.

[17] S. Trybuła and K. Szajowski. Decision making in an incompletely known stochastic system. II. Zastos. Matem., 19:43–56, 1987. MR 897512; Zbl 0645.62009.

[18] D. Walczak. Bayes and minimax control of discrete time linear dynamical systems. Technical report, Wrocław University of Technology, Faculty of Fundamental Problems of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland, 1986. Master's Thesis (in Polish).

[19] A. Wald. Contributions to the theory of statistical estimation and testing hypotheses. Ann. Math. Statistics, 10:299–326, 1939. doi: 10.1214/aoms/1177732144; MR 0000932; Zbl 65.0585.03.

[20] A. Wald. Statistical Decision Functions. John Wiley & Sons, Inc., New York; Chapman & Hall, Ltd., London, 1950.


Bayesian control of a discrete-time linear system with uniform disturbances

Dariusz Walczak

Summary This paper considers the problem of optimal control of a linear dynamic system with discrete time and additive disturbances. The disturbances are independent, identically distributed random variables whose distribution is given up to a parameter. The control operates in a closed loop. The loss function is a positive semi-definite quadratic form depending on the system state and the applied control. The control horizon is a bounded random variable with a known distribution, independent of the disturbances, and the state measurements are error-free. Using the dynamic programming method, the analytical form of the Bayesian optimal closed-loop control algorithm is derived: for disturbances uniformly distributed on [0, λ] and for disturbances uniformly distributed on [λ1, λ2].

2010 AMS Mathematics Subject Classification: 60G40, 62L15.

Key words and phrases: Bayesian control, disturbances, uniform distribution, Pareto distribution, conjugate distributions.

Dariusz Walczak holds a PhD degree in Business Administration (Operations & Logistics) from the Sauder School of Business, University of British Columbia in Vancouver, along with an MSc degree in Mathematics from UBC and an MEng in Applied Mathematics/Engineering from Wrocław University of Technology.

He is a Principal Research Scientist at PROS Inc. in Houston, Texas, where he is involved in the design and deployment of revenue management (RM) and pricing optimization applications across a variety of industries. Dariusz currently chairs the Revenue Management and Pricing (RMP) Section of INFORMS.

Dariusz Walczak
PROS Inc.
3100 Main Street, Suite 900
Houston, TX 77002, U.S.A.

E-mail: dwalczak@pros.com URL: http://www.pros.com

Communicated by: Krzysztof Szajowski

(Received: 23rd of November 2015; revised: 21st of December 2015)
