
CONSTRAINED MINIMIZATION:

HANDLING OF LINEAR AND NONLINEAR CONSTRAINTS


CONSTRAINED MINIMIZATION:

HANDLING OF LINEAR AND NONLINEAR CONSTRAINTS

THESIS

SUBMITTED TO OBTAIN THE DEGREE OF DOCTOR IN THE TECHNICAL SCIENCES AT THE TECHNISCHE HOGESCHOOL TE DELFT, BY AUTHORITY OF THE RECTOR MAGNIFICUS, IR. H.B. BOEREMA, PROFESSOR IN THE DEPARTMENT OF ELECTRICAL ENGINEERING, TO BE DEFENDED BEFORE A COMMITTEE APPOINTED BY THE COLLEGE OF DEANS, ON WEDNESDAY 20 MARCH 1974 AT 14.00 HOURS

BY

CASPAR SCHWEIGMAN

MATHEMATICAL ENGINEER

BORN IN 'S-HERTOGENBOSCH

1974

PRINTED BY STICHTING STUDENTENPERS NIJMEGEN

This thesis has been approved by the promotor:

PROF. DR. R. TIMMAN


PREFACE

On the subject of constrained minimization many papers have been published, and to add yet another requires some motivation. This study was started because in practice - the author is working in the chemical industry - there was felt to be a serious lack of efficient computer programs to solve constrained nonlinear minimization problems. This type of problem is encountered with increasing frequency. The computer programs have to be safe, since the problems to be solved are sometimes very intricate and may require much computing time. Examples of such problems are parameter estimation problems, where the parameters occur in sets of differential equations or in sets of nonlinear algebraic equations.

It appeared that the variables x in nonlinear physical and chemical minimization problems were often subject to the simple type of constraints:

$$a \le x \le b,$$

where a and b are constants. Examples of this type of constraints are concentrations lying between 0 and 1, flows that are positive, reaction rate constants that are positive, etc. First, efforts were made to develop an efficient computer program specifically for this type of constraints. A simple method could be derived. The computer programs are successfully used in the solution of practical problems, like the interpretation of Gel Permeation Chromatographic curves and X-ray diffraction curves, the estimation of rate constants of polymerization and several other reactions, the estimation of parameters in rheological models, color matching problems, etc. Later, an extension of the method to linear constraints was developed. The computational results were promising, both in comparison with those of other techniques and in practice. From the theoretical point of view the extension to nonlinear constraints is the most interesting part of the study. The first results obtained with the recently developed computer programs are hopeful.

ACKNOWLEDGEMENTS

The author expresses his thanks to the Board of Management of Akzo and to the Management of Akzo Research Laboratories Arnhem for their consent to publish the results of this study, and to his colleagues of the Applied Mathematics Department of Akzo Research Laboratories for their interest and stimulating comment, especially to Mr. Ko Petiet and Mr. Piet de Mol for writing the computer programs and for valuable suggestions and criticism. Our discussions have contributed considerably to the results of this study. Special thanks are due to Mrs. M.M. Schippers and Mrs. Wilma Visscher for decoding the manuscript and typing it so carefully, and to Mr. W.J.A. van der Eijk for welcome linguistic criticism and advice.

CONTENTS

Preface 5
Acknowledgements 6

1. Introduction 9
   1.1 Constrained minimization 9
   1.2 Scope of the thesis 17
   1.3 Note on a notation 18

2. Bounded variables 19
   2.1 Treatment of the constraints 19
   2.2 Application to Gauss-Newton's method 28
   2.3 Application to a rank-one quasi-Newton method 29

3. Linear inequality constraints 35
   3.1 Introduction 35
   3.2 Treatment of the constraints 39
   3.3 Proof of convergence 43
   3.4 Application to Gauss-Newton's method 50
   3.5 Application to a rank-one quasi-Newton method 52

4. Nonlinear inequality constraints 55
   4.1 Introduction 55
   4.2 Solving nonlinear equations 58
   4.3 Treatment of the constraints 62

5. Equality constraints 68
6. Dependence of the normal vectors on the active boundaries 70
7. Illustrations of application of the algorithm 75
8. Computer programs 86
9. Literature 109
Appendix 115

1. INTRODUCTION

1.1 Constrained minimization

The problem is considered of how to minimize a nonlinear function $F(x)$, where the variables $x = (x_1, x_2, \dots, x_n)^T$ are subject to the inequality constraints:

$$h_i(x) \ge 0, \quad i = 1,2,\dots,m_1$$

and the equality constraints:

$$h_i(x) = 0, \quad i = m_1+1,\dots,m_2$$

The functions $F(x)$ and $h_i(x)$, $i = 1,2,\dots,m_2$ are assumed to be continuously differentiable.

Methods to solve constrained minimization problems are generally iterative processes. They contain a strategy to handle the constraints and use an unconstrained minimization technique to solve unconstrained minimization subproblems. Many unconstrained minimization techniques are available, such as direct search methods, Cauchy's steepest descent method, Newton's method and modifications of it, conjugate gradient methods, and variable metric or quasi-Newton methods. Some of these unconstrained minimization techniques have suitable properties, which will be called the "basic properties" of the unconstrained minimization technique. For instance, by Newton's method a convex quadratic function is minimized in one step, by a variable metric method in n+1 iteration steps, n being the number of variables. For recent surveys of unconstrained minimization techniques reference is made to Powell [37, 38], Fletcher [18] and Box, Davies and Swann [5].

The handling of the constraints is the main theme of this study. The published methods of handling constraints can be divided into two classes:

(i) penalty function techniques (Fiacco and McCormick [17]), solving a sequence of unconstrained minimization problems;

(ii) methods where from an iteration point $x^k$ a new iteration point $x^{k+1}$ is computed in the intersection of some of the constraints which are active in $x^k$ (a constraint i is active in $x^k$ if $h_i(x^k) = 0$).

Approaches (i) and (ii) differ widely. Methods (i) - for a recent survey of which reference is made to Lootsma [33] - have the advantage that linear and nonlinear constraints can be treated in the same way. However, these methods can be time-consuming and are generally not so efficient as some of the methods (ii). This follows from numerical experience, while for simple types of constraints, e.g. non-negativity constraints, it can also be shown by reasoning. As a result, considerable efforts have recently been made to develop methods of class (ii). Two approaches of (ii) are well known: the feasible direction methods and the projection methods. The feasible direction method (Zoutendijk [50]) determines in an iteration point $x^k$ a direction s by maximizing $s^T g$ subject to linear constraints; nonlinear constraints are locally linearized. Dependent on the normalization of s, the feasible direction method is a repetition of linear programming or of quadratic programming (Zoutendijk [50], Hadley [26]). Although the feasible direction method is attractive because of its elegance, the efficiency is questionable; if no constraints are considered, the method equals the steepest descent method, where the rate of convergence can be very low (cf. Powell [37]). Since there is no reason why the handling of the constraints should accelerate the process, the efficiency may not be expected to be better. Numerical experiences of the feasible direction method, published in the literature, are scarce. The main attention in recent years has been focussed on projection methods, originally presented by Rosen ([39, 40]). In an iteration point $x^k$ these methods project a direction s, computed with the aid of an unconstrained minimization technique, on the intersection of the active constraints. Rosen's gradient projection method combines projection with the steepest descent method. Afterwards several combinations of the projection technique with other unconstrained techniques have been published, e.g. combinations with the variable metric method of Davidon, Fletcher and Powell by Goldfarb [23] for linear constraints, with a modified Newton's method by Shanno [4, 45] for linear constraints, and with a rank-one variable metric method by Murtagh and Sargent [35, 44] for linear and nonlinear constraints. Handling of nonlinear constraints can be reduced to handling of linear constraints by introducing slack variables. It is difficult, however; numerical experience is scarce and no computer programs have been published. The handling of nonlinear constraints with the aid of projection methods seems not to be simple. It would be beyond the scope of this thesis to give a detailed survey of all the suggestions put forward to handle constraints. The interested reader is also referred to Wolfe [47] and Abadie [1] (the reduced gradient method), Fletcher [21] and Daniel [11, 12].

A general sketch of the proposed method is as follows. The method is a strategy of type (ii):

. simple criteria are derived to decide whether the iteration process has to be continued in the intersection of some of the active constraints or not;

. if linear or nonlinear constraints are active in an iteration point, a linear or nonlinear transformation of coordinates is introduced and the unconstrained minimization technique is applied to a lower-dimensional minimization problem within the transformed system of coordinates. The new iteration point is computed in such a way that the basic properties of the minimization technique are maintained.

The proposed method to handle the constraints can be applied to a wide class of minimization techniques. These minimization techniques have to satisfy some conditions, which will be summarized briefly. Minimization techniques will be considered,

which, when applied to unconstrained minimization problems, generate in an iteration point $x^k$ a direction $s^k = s(x^k)$ by:

$$s^k = -H^k g^k \qquad (1)$$

where $g^k$ is the gradient $g(x^k)$ and the $n \times n$ matrix $H^k$ is symmetric. Moreover, the following condition has to be satisfied:

$$s^{kT} g^k \le 0, \quad \text{and } s^{kT} g^k = 0 \text{ if, and only if, } g^k = 0$$

Many minimization techniques satisfy these conditions, especially if $H^k$ is positive definite, i.e. if $x^T H^k x > 0$ for all $x \ne 0$. The increment vector $d^k$ in the iteration point $x^k$ is given by:

$$d^k = \lambda^k s^k$$

and the new iteration point $x^{k+1}$ by:

$$x^{k+1} = x^k + d^k$$

To determine the scalar $\lambda^k$, two possibilities are distinguished in this study:

a. line minimization is applied; in this case $g^{k+1,T} s^k = 0$;

b. in the point $x^{k+1}$ it only holds that $F(x^{k+1}) < F(x^k)$.

Situation b) is the most important from the practical point of view.

The handling of the constraints will be applied to two efficient minimization techniques. The first is Newton's method to minimize functions of the type:

$$F(x) = \sum_{j=1}^{N} f_j^2(x) \qquad (2)$$

It is assumed that the number of functions N is greater than or equal to the number of variables n: $N \ge n$. This type of function frequently occurs in parameter estimation problems, especially when a least squares fit to experimental data has to be obtained. Newton's method to minimize functions of the type (2) is called Gauss-Newton's method. This method (Hartley [27]) is based on linearization of the functions $f_j(x)$, $j = 1,2,\dots,N$ in the neighbourhood of an iteration point $x^k$ and computes a direction $s^k$ defined by (1) with the aid of:

$$H^k = \left(G^{kT} G^k\right)^{-1} \qquad (3)$$

where $G^k$ is the $N \times n$ matrix with elements:

$$G_{ij}^k = \frac{\partial f_i(x^k)}{\partial x_j}, \quad i = 1,2,\dots,N;\ j = 1,2,\dots,n \qquad (4)$$

It will be assumed that the matrix $G^{kT} G^k$ is non-singular. This implies that $G^{kT} G^k$, and hence also $H^k$ defined by (3), are positive definite. Gauss-Newton's method has the property that the minimum of F(x) is found within one step if the functions $f_j(x)$, $j = 1,2,\dots,N$ are linear in x, so that F(x) is a convex quadratic function.
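As an illustration, a minimal sketch of one Gauss-Newton direction follows, written in modern Python with NumPy; the callables f and jac and the use of a linear solve instead of an explicit inverse are assumptions of this sketch, not the thesis' own programs (those are given in Chapter 8).

```python
import numpy as np

def gauss_newton_direction(f, jac, x):
    """One direction s^k = -H^k g^k of (1) with H^k = (G^kT G^k)^{-1} of (3).

    f   : callable returning the residual vector (f_1(x), ..., f_N(x))
    jac : callable returning the N x n matrix G^k of (4)
    Both interfaces are illustrative assumptions.
    """
    G = jac(x)                    # N x n, with G^T G assumed non-singular
    g = 2.0 * G.T @ f(x)          # gradient of F(x) = sum_j f_j(x)^2
    # Solving (G^T G) s = -g is equivalent to s = -(G^T G)^{-1} g;
    # the factor 2 is absorbed in the scalar lambda^k later on.
    return -np.linalg.solve(G.T @ G, g)
```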

The treatment of the constraints is also applied to a quasi-Newton or variable metric method. Quasi-Newton methods, for a survey of which reference is made to Powell [38], involve that in an iteration point $x^k$ the matrix $H^k$ is constructed as:

$$H^k = H^{k-1} + C^k \qquad (5)$$

where the correction matrix $C^k$ is constructed in such a way that the inverse of the matrix of second order partial derivatives is approximated. The variable metric methods generally have the property that for a quadratic convex function

$$F(x) = a^0 + a^T x + \tfrac{1}{2} x^T G x \qquad (6)$$

in the point $x^{n+1}$ the matrix $H^{n+1}$ equals $G^{-1}$. Important aspects are whether line minimization is applied and whether the matrix $H^k$ is positive definite. Examples of quasi-Newton methods are the method of Davidon-Fletcher-Powell [15, 19], where line minimization is applied and positive definiteness of $H^k$ can be guaranteed; a rank-one method suggested by Broyden [6], Davidon [16], Murtagh and Sargent [35] and others, where the matrix $C^k$ is of rank one; and a rank-two method of Fletcher [20], where $C^k$ has rank two. In the last two methods no line minimization has to be applied, but positive definiteness may be lost. Application of the treatment of the constraints is illustrated on the basis of Broyden's rank-one method, where the correction matrix $C^k$ in (5) is constructed as:

$$C^k = \frac{(d^{k-1} - H^{k-1} y^{k-1})(d^{k-1} - H^{k-1} y^{k-1})^T}{(d^{k-1} - H^{k-1} y^{k-1})^T\, y^{k-1}} \qquad (7)$$

with $d^{k-1} = x^k - x^{k-1}$ and $y^{k-1} = g^k - g^{k-1}$.
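In code, the correction (7) is a single outer-product update of the inverse-Hessian approximation. The sketch below assumes NumPy vectors; the small guard against a vanishing denominator is a practical assumption, not part of the thesis text.

```python
import numpy as np

def rank_one_update(H, d, y):
    """Broyden's rank-one correction (7): returns H + C with
    C = (d - H y)(d - H y)^T / ((d - H y)^T y),
    where d = x^k - x^{k-1} and y = g^k - g^{k-1}."""
    r = d - H @ y                 # the vector d^{k-1} - H^{k-1} y^{k-1}
    denom = r @ y
    if abs(denom) < 1e-12:        # assumed guard: skip an ill-defined update
        return H
    return H + np.outer(r, r) / denom
```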

The correction matrix $C^k$ has rank one. If the increment vectors $d^k$, $k = 1,2,\dots,n$ are chosen independent and a convex quadratic function of the form (6) is minimized, then in the point $x^{n+1}$ the matrix $H^{n+1}$ equals $G^{-1}$ and the minimum of the quadratic function is found in $x^{n+1} - H^{n+1} g^{n+1}$. It should be emphasized that for the proof of this property not only can line minimization be omitted, but also the way the independent vectors $d^k$ are chosen is immaterial. This last fact, already emphasized by Wolfe [48] and Powell [38], is of special importance for the treatment of the constraints. The fact is that even if the iteration process is continued on some boundaries, in n+1 steps the matrix $H^{n+1}$ can still be equal to $G^{-1}$ when the increment vectors are independent. This implies that the treatment of the constraints generally does not decrease the efficiency of the rank-one method in minimizing quadratic functions, as will be seen in the next chapters. The choice of the independent vectors $d^k$ is free; generally $d^k$ is computed as

$$d^k = -\lambda^k H^k g^k \qquad (8)$$

where $\lambda^k$ is a scalar. Bus [7] has shown that for a quadratic function the following holds: if $d^i - H^i y^i \ne 0$, $i = 1,2,\dots,k$, then the vectors $d^i$, $i = 1,2,\dots,k$ are independent. Computation of $d^k$ by means of (8) does not guarantee that $F(x^k + d^k) < F(x^k)$. For quadratic functions it is not necessary that this inequality be satisfied. For minimizing non-quadratic functions, however, one will prefer to generate a decreasing sequence of function values. If $-g^{kT} H^k g^k < 0$, the scalar $\lambda^k$ in (8) can be chosen so that $F(x^k + d^k) < F(x^k)$. Murtagh and Sargent [35] derive conditions which guarantee that the matrix $H^k$ is positive definite. In practice, however, these conditions are not necessarily satisfied.

If $-g^{kT} H^k g^k \ge 0$, several modifications can be applied, for instance the Greenstadt correction [25]. Greenstadt replaces the matrix $H^k$, which can be written as $H^k = X \Lambda X^T$, X consisting of the eigenvectors of $H^k$ and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$, where $\lambda_1, \lambda_2, \dots, \lambda_n$ are the eigenvalues, by the matrix $X |\Lambda| X^T$ with $|\Lambda| = \mathrm{diag}(|\lambda_1|, |\lambda_2|, \dots, |\lambda_n|)$. A simpler procedure replaces $s^k$ by $-s^k$. If $s^k = 0$ and $x^k$ is not a solution point, $H^k$ can be chosen to be the identity matrix. This simple correction procedure is chosen here and is satisfactory.
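A sketch of the Greenstadt correction, assuming a symmetric H so that a real eigendecomposition exists; the function name is illustrative. The simpler correction adopted in this study needs no decomposition at all, which is precisely its appeal.

```python
import numpy as np

def greenstadt_correction(H):
    """Replace H = X Lambda X^T by X |Lambda| X^T (Greenstadt [25])."""
    lam, X = np.linalg.eigh(H)    # eigenvalues/eigenvectors of symmetric H
    return X @ np.diag(np.abs(lam)) @ X.T
```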

1.2 Scope of the thesis

In Chapter 2 the constraints $x \ge 0$ are considered. The extension to linear inequality constraints is described in Chapter 3, where a proof of convergence is also given. For non-convex functions F(x) it is proved that a subsequence of the iteration process converges to a solution point; for convex functions the iteration process itself converges to the minimum point. For Gauss-Newton's method and the rank-one quasi-Newton method the minimum of a quadratic convex function subject to linear constraints can be found in a finite number of steps. In Chapter 4 the extension to nonlinear inequality constraints is considered, including the proof of convergence. Special attention is paid to the solution of sets of nonlinear equations when a new iteration point on a curved constraint has to be computed. To simplify the descriptions in Chapters 3 and 4, equality constraints are treated separately in Chapter 5. Throughout Chapters 3 and 4 the condition is required that the normal vectors on the active constraints are independent. In Chapter 6 it is shown that the method of handling the constraints can also be applied without stating this condition. This is illustrated for the situation where one normal vector is a linear combination of the other normal vectors. The idea can probably be extended to more general situations, but it is not elaborated here. Applications to several well-known test functions are illustrated in Chapter 7. For some applications of the presented algorithms to chemical problems, reference is made to [22, 28]. In Chapter 8 separate computer programs are given for:

(i) constraints of the type $a \le x \le b$
(ii) linear constraints
(iii) nonlinear constraints

(i) and (iii) contain Gauss-Newton's method as unconstrained minimization technique, (ii) the rank-one quasi-Newton method.

1.3 Note on a notation

In the following chapters many sets will be introduced. Frequent use will be made of the following definition. Consider three sets A, B and C. Let $A \subset C$ and $B \subset C$. Sets are defined by:

$$C - A = \{ i \in C \mid i \notin A \}$$

2. BOUNDED VARIABLES

2.1 Treatment of the constraints

This chapter deals with the problem of minimizing F(x) subject to the constraints

$$x_i \ge 0, \quad i = 1,2,\dots,n \qquad (9)$$

The way of handling this type of restrictions can easily be extended to restrictions of the type:

$$a_i \le x_i \le b_i, \quad i = 1,2,\dots,n \qquad (10)$$

where $a_i$ and $b_i$, $i = 1,2,\dots,n$ are constants.

A set I is defined by

$$I = \{ 1,2,\dots,n \}$$

The feasible region S is given by:

$$S = \{ x \mid x_i \ge 0,\ i \in I \}$$

The hyperplane $x_i = 0$ is called boundary i.

Let a feasible point $x \in S$ lie on some boundaries. Subsets $A(x) \subset I$ and $B(x) \subset I$ are introduced by:

$$A(x) = \{ i \in I \mid x_i = 0 \} \qquad (11)$$

$$B(x) = \{ i \in I \mid 0 \le x_i \le \delta \} \qquad (12)$$

where $\delta$ is a small number > 0. The boundaries $i \in A(x)$ are called active in point x, the boundaries $i \in B(x) - A(x)$ almost active in point x. The motivation to introduce $\delta$ is illustrated in Fig. 1.

Fig. 1  a) boundary i is active in x;  b) boundary i is almost active in x.

Let g(x) be the gradient of F(x). If x is a solution point, x necessarily satisfies:

$$g_i(x) \ge 0, \quad i \in A(x)$$
$$g_i(x) = 0, \quad i \in I - A(x) \qquad (13)$$

These conditions are the Kuhn-Tucker conditions. For convex functions these conditions are also sufficient to guarantee that x is a solution point (global minimum). For non-convex functions, (13) implies that x is a global minimum, a local minimum or a saddle point. A point x satisfying (13) is also called a Kuhn-Tucker point.
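Conditions (13) translate directly into a small test; the tolerance used to decide which variables count as active and when a gradient component counts as zero is a numerical assumption of this sketch, not part of the thesis text.

```python
import numpy as np

def is_kuhn_tucker_point(x, g, tol=1e-10):
    """Check (13) for the constraints x_i >= 0:
    g_i(x) >= 0 on A(x) = {i : x_i = 0} and g_i(x) = 0 elsewhere."""
    active = x <= tol
    return bool(np.all(g[active] >= -tol) and np.all(np.abs(g[~active]) <= tol))
```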

In the algorithm the following sets will be used:

$$A_1(x) = \{ i \in A(x) \mid g_i(x) > 0 \} \qquad (14)$$

$$B_1(x) = \{ i \in B(x) \mid g_i(x) > 0 \} \qquad (15)$$

Before the proposed algorithm is described in detail, a general sketch of the strategy of handling the constraints will be given for a situation where in an iteration point $x^k$ the boundaries $i \in B(x^k)$ are all active, so that $B(x^k) = A(x^k)$. First it is investigated which boundaries $i \in A(x^k)$ satisfy $g_i(x^k) > 0$. The iteration process is continued on the boundaries $A_1(x^k)$ and the unconstrained minimization technique is applied to the variables $x_i$, $i \in I - A_1(x^k)$. So a direction s is computed where $s_i = 0$, $i \in A_1(x^k)$. If the point $x^k + s$ is not feasible because one or more active boundaries i - say $i \in P$ - are crossed, the unconstrained minimization technique is applied to the variables $x_i$, $i \in I - A_1(x^k) - P$. This procedure is repeated until no active boundaries are crossed. If a new inactive boundary is crossed, the intersection with this boundary is determined.

A full description of the algorithm is given below. In an iteration point $x^k$ the procedure is as follows:

Step 1. Investigation of the sign of the gradient.

A subset $Q \subset I$ is introduced. First for Q is chosen:

$$Q := I - B_1(x^k) \qquad (16)$$

where $B_1(x^k)$ is defined by (15).

Step 2. Computation of a tentative direction s.

The number of elements of Q is called q. The elements of Q are called $j_1, j_2, \dots, j_q$. A q-dimensional vector $\tilde g^k$ is introduced by:

$$\tilde g_i^k = g_{j_i}^k, \quad i = 1,2,\dots,q$$

A q-dimensional vector $\tilde s$ is now computed by:

$$\tilde s = -H_q^k \tilde g^k \qquad (17)$$

The choice of the $q \times q$ matrix $H_q^k$ depends on the unconstrained minimization technique used and is dealt with in 2.2 and 2.3. An n-dimensional vector $s = s(Q; x^k)$ is defined by:

$$s_{j_i} = \tilde s_i, \quad i = 1,2,\dots,q; \qquad s_i = 0, \quad i \in I - Q \qquad (18)$$

Step 3. Investigation of the direction s.

A subset $P \subset Q$ is defined by:

$$P = \{ i \in Q \mid i \in B(x^k) - B_1(x^k),\ s_i < 0 \} \qquad (19)$$

where $B(x^k)$ and $B_1(x^k)$ are defined in (12) and (15). If P is empty go to step 4, else set

$$Q := Q - P$$

and repeat step 2.

Step 4. Computation of the final direction $s^k$.

The set Q is called $Q^k$. A set $T^k$ is introduced, defined by:

$$T^k = B_1(x^k) - A_1(x^k) \qquad (20)$$

The elements of $T^k$ have been dropped on the basis of (16). The last direction vector $s^k = s(Q^k; T^k; x^k)$ is given by:

$$s_i^k = s_i \text{ for } i \in Q^k; \qquad s_i^k = -x_i^k \text{ for } i \in T^k; \qquad s_i^k = 0 \text{ for } i \in I - Q^k - T^k \qquad (21)$$

Step 5. Crossing new boundaries.

It is now investigated whether new boundaries are crossed. A subset $\Gamma \subset I - B(x^k)$ is defined by:

$$\Gamma = \{ i \in I - B(x^k) \mid x_i^k + s_i^k < 0 \}$$

Positive numbers $\gamma_i \in (0,1]$, $i \in \Gamma$, and $\gamma^k = \gamma(x^k) \in (0,1]$ are introduced by:

$$\gamma_i = \frac{x_i^k}{-s_i^k}, \quad i \in \Gamma; \qquad \gamma^k = \min_{i \in \Gamma} \gamma_i \qquad (22)$$

If no new boundary is crossed, $\gamma^k = 1$.

Step 6. Determination of the new iteration point.

If $s^k \ne 0$, a number $\lambda^k = \lambda(x^k) \in (0,1]$ is determined by line minimization between $x^k$ and $x^k + \gamma^k s^k$, or on the basis of the requirement that $F(x^k + \lambda^k \gamma^k s^k) - F(x^k) \le -\alpha_2 \lambda^k \gamma^k\, |g(x^k)^T s(x^k)|$, where $\alpha_2$ is a small number > 0. The increment vector $d^k = d(x^k) = d(Q^k; T^k; x^k)$ is given by:

$$d^k = \lambda^k \gamma^k s^k \qquad (23)$$

and the new iteration point $x^{k+1}$ by:

$$x^{k+1} = x^k + d^k$$
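The six steps combine into one iteration as sketched below, under assumptions: the helper direction_H stands for the choice of $-H_q^k \tilde g^k$ in (17) (e.g. Gauss-Newton, 2.2), and $\lambda^k$ is found by simple bisection as in the decrease requirement above; the published programs of Chapter 8 are the authoritative implementation, not this sketch.

```python
import numpy as np

def bounded_variables_step(x, F, grad, direction_H, delta=1e-6, alpha2=1e-4):
    """One iteration of 2.1 (steps 1-6) for minimizing F(x) subject to x_i >= 0."""
    n, g = len(x), grad(x)
    A  = np.flatnonzero(x == 0.0)                 # active boundaries, (11)
    B  = np.flatnonzero(x <= delta)               # almost active boundaries, (12)
    A1 = A[g[A] > 0.0]                            # (14)
    B1 = B[g[B] > 0.0]                            # (15)

    # Steps 1-3: drop indices until no i in B - B1 has s_i < 0.
    Q = np.setdiff1d(np.arange(n), B1)            # (16)
    while True:
        s = np.zeros(n)
        if Q.size:
            s[Q] = direction_H(Q, g[Q])           # (17)-(18)
        P = [i for i in np.setdiff1d(B, B1) if i in Q and s[i] < 0.0]
        if not P:
            break
        Q = np.setdiff1d(Q, P)                    # (19)

    # Step 4: move onto the almost-active boundaries T^k = B1 - A1.
    T = np.setdiff1d(B1, A1)                      # (20)
    s[T] = -x[T]                                  # (21)

    # Step 5: largest gamma in (0,1] before a new boundary is crossed.
    crossed = [i for i in np.setdiff1d(np.arange(n), B) if x[i] + s[i] < 0.0]
    gamma = min((x[i] / -s[i] for i in crossed), default=1.0)   # (22)

    # Step 6: bisect lambda until the decrease requirement holds (cf. (67)).
    lam, drop = 1.0, alpha2 * gamma * abs(g @ s)
    while lam > 1e-12 and F(x + lam * gamma * s) - F(x) > -lam * drop:
        lam *= 0.5
    return x + lam * gamma * s                    # x^{k+1} = x^k + d^k, (23)
```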

The matrix $H_q^k$ in (17) will be chosen in such a way that the following conditions are satisfied. Let $g(Q; x^k)$ be defined by $g_i(Q; x^k) = g_i^k$, $i \in Q$ and $g_i(Q; x^k) = 0$, $i \in I - Q$, and let $s(Q; x^k)$ be defined by (18). $H_q^k$ is chosen so that

$$g(Q; x^k)^T s(Q; x^k) \le -\alpha_1 |g(Q; x^k)|^2 \le 0 \qquad (24)$$

where $\alpha_1 > 0$ is a small fixed number. For the last set $Q^k$ it follows from (21), (24) and the definition of $T^k$ that:

$$g(Q^k; x^k)^T s(Q^k; T^k; x^k) \le -\alpha_1 |g(Q^k; x^k)|^2 - \sum_{i \in T^k} x_i^k\, g_i(x^k) \le 0 \qquad (25)$$

implying that $g(x^k)^T s(Q^k; T^k; x^k) = 0$ if, and only if, $g_i(x^k) = 0$, $i \in Q^k$, $i \in T^k$.

The following theorem is essential to the proposed algorithm.

Theorem 1. $s^k$, defined by (21), equals 0 if, and only if, $x^k$ is a solution point.

Proof: a. Let $x^k$ be a solution point. Then (13) is satisfied and it follows that $B_1(x^k) = A_1(x^k)$. Consider the set $Q = I - A_1(x^k)$ in the first step of the algorithm. Because $g_i^k = 0$ for $i \in Q$, also $s_i(Q; x^k) = 0$, $i \in Q$, due to (17) and (18). Because $T^k$ in (20) is $\emptyset$, $Q^k = Q$ and $s^k = 0$.

b. Conversely, if all $s_i^k = 0$, $i \in I$, it follows from (20) and (21) that $T^k = \emptyset$. From (25) it follows that $g_i^k = 0$ for $i \in Q^k$. Assume that $x^k$ is not a solution point. Then there is at least one integer $j \notin Q^k$ satisfying:

$$j \in A(x^k) \text{ and } g_j^k < 0 \qquad (26)$$

or

$$j \in I - A(x^k) \text{ and } g_j^k \ne 0 \qquad (27)$$

Since $Q^k \supset I - B(x^k)$ and $g_i^k = 0$ for $i \in Q^k$, a j satisfying (27) necessarily belongs to $B(x^k)$, and since $T^k = B_1(x^k) - A_1(x^k) = \emptyset$, also $j \notin B_1(x^k)$. So the conditions (26) and (27) may be combined into:

$$j \in B(x^k) - B_1(x^k) \ \wedge\ g_j^k < 0 \qquad (28)$$

If a $j \in I$ satisfies (28), j has necessarily been dropped on the basis of (19). Consider the vector s corresponding with the last set $Q \supset Q^k$ from which one or more integers j are taken away on the basis of (19). It follows from (19) and (28) that

$$s_j < 0 \ \wedge\ g_j^k < 0, \quad j \in Q - Q^k$$

From this it follows, because $g_i^k = 0$ for $i \in Q^k$, that

$$g^{kT} s(Q; x^k) > 0$$

This, however, contradicts (24). So $x^k$ must be a solution point.

Corollary 1. Let $s^k$ be bounded. $d^k$, defined by (23), equals 0 if, and only if, $x^k$ is a solution point.

Proof: It is trivial that from $s^k = 0$ it follows that $d^k = \lambda^k \gamma^k s^k = 0$. Conversely, if $s_i^k \ne 0$ for at least one $i \in I$, it follows from (17), (21) and (25) that $g(x^k)^T s^k < 0$, so $\lambda^k > 0$. Because $|s^k| \le M$ one may write $\gamma^k \ge \delta/M$ (see (22)), and $d^k \ne 0$. So $d^k = 0$ if, and only if, $s(x^k) = 0$, i.e. $x^k$ is a solution point (Theorem 1).

Attention is drawn to the fact that a more direct approach like "remove all active elements i with $s_i < 0$ from Q and repeat the calculations until a feasible direction is computed" can be shown to be incorrect. The iteration process can stop at a point which is not a solution point. This is illustrated by the following example for two variables, where an iteration point $x^k$ lies on two boundaries. The feasible region is given by $x_1 \ge 0$, $x_2 \ge 0$.

Fig. 2  Iteration point $x^k$ on two boundaries.

The negative gradient $-g^k$ is shown in Fig. 2. $x^k$ is not a solution point because $g_1^k < 0$. Let first Q = I. The computed increment vector $s = s(Q; x^k)$ is also drawn in Fig. 2. If the elements $i \in Q$ satisfying $s_i < 0$ are eliminated from Q, both variables $x_1$ and $x_2$ are eliminated and the algorithm stops in $x^k$. Also if first $x_1$ is eliminated and the minimization technique is applied to only the variable $x_2$, a new direction $(0, s_2')$ with $s_2' < 0$ is computed, because $g_2^k > 0$ and $(0, s_2')^T g^k < 0$. So the variable $x_2$ is eliminated as well and the algorithm stops in $x^k$, although $x^k$ is not a solution point. In view of both the applicability of the algorithm in practice and the proof of Theorem 1, the order of the steps of the algorithm is essential.

Several variations of the algorithm are possible. For instance:

(i) start with $Q := I - A_1(x^k)$, replacing (16), and after step 2 repeat step 1 until Q does not change;

(ii) replace (19) by $P = \{ i \in Q \mid s_i g_i < 0 \}$.

For these variations analoga of Theorem 1 can be proved. In practice the difference between the proposed algorithm and variation (i) is not important. Since the proposed algorithm is simpler, only this one will be taken into account. Variation (ii) can have great influence on the iteration process, but generally does not accelerate it, because it disturbs the basic properties of the unconstrained minimization technique too much. This modification cannot be recommended and will be left out of consideration.

To conclude the description of the algorithm for handling constraints of the type (9), attention is drawn to the choice of $\delta > 0$ in (12). Both in the proof of Corollary 1 and in the proofs of convergence in the following chapters, the choice of $\delta > 0$ is essential. Moreover, also from a practical point of view the choice of $\delta > 0$ is useful, to prevent two iteration points $x^k$ and $x^{k+1}$ from lying too close together when $x^k$ lies very close to boundary i and boundary i is crossed. This fact has also been taken into consideration by Murtagh and Sargent for their projection method [44]. For every $\delta > 0$ the algorithm is applicable. But if $\delta$ is large there is no reason to prefer the iteration point computed with the aid of (21) to a normal intersection point. Therefore, in practice, $\delta$ is chosen small.

The extension to constraints of the type (10) is easy and will not be described in detail here. To this type of constraints the proof of convergence in the next chapter is applicable. As the algorithm for the constraints (10) is simpler than the algorithm for general linear constraints, and (10) frequently occurs in practice, separate computer programs have been written for this type of constraints (see Chapter 8).

2.2 Application to Gauss-Newton's method

For Gauss-Newton's method of minimizing functions of the type (2), the matrix $H_q^k$ in (17) is given by:

$$H_q^k = \left(G_q^{kT} G_q^k\right)^{-1}$$

where $G_q^k$ is the $N \times q$ gradient matrix consisting of the q columns $j_1, j_2, \dots, j_q$ of the matrix $G^k$ defined by (4). As the assumption was made that $G^{kT} G^k$ is not singular, $G_q^{kT} G_q^k$ is positive definite, as is $H_q^k$. It follows that (24) is satisfied.

If $f_j(x)$, $j = 1,2,\dots,N$ are linear, so that F(x) is a quadratic convex function, the method of handling the constraints (10) in combination with Gauss-Newton's method implies that the minimum can be found in a finite number of steps. This is proved in the next chapter.
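For this choice of $H_q^k$, the tentative direction (17) can be computed directly from the q selected columns of $G^k$, as in the sketch below, which is usable as the direction_H helper assumed in the sketch of 2.1; the interface is illustrative.

```python
import numpy as np

def gauss_newton_direction_H(G):
    """Return a direction_H helper computing -(G_q^T G_q)^{-1} g_Q from
    the columns Q of the N x n matrix G of (4)."""
    def direction_H(Q, gQ):
        Gq = G[:, Q]                        # N x q submatrix G_q^k
        return -np.linalg.solve(Gq.T @ Gq, gQ)
    return direction_H
```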

2.3 Application to a rank-one quasi-Newton method

In this section the construction of $H_q^k$ in (17) will be considered for the rank-one quasi-Newton method, which is briefly described in Chapter 1. Let an $n \times n$ symmetric matrix H be written as:

$$H = \begin{pmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{pmatrix}$$

where the dimensions of the matrices $H_{11}$, $H_{12} = H_{21}^T$ and $H_{22}$ are $q \times q$, $q \times (n-q)$ and $(n-q) \times (n-q)$. Let H and $H_{22}$ be nonsingular. A $q \times q$ matrix $P_q(H)$ is defined by:

$$P_q(H) = H_{11} - H_{12} H_{22}^{-1} H_{21} \qquad (29)$$

Writing

$$H^{-1} = \begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix}$$

it is easy to derive that $P_q(H) = D_{11}^{-1}$.

Let in the iteration point $x^{k-1}$ the $n \times n$ symmetric matrix $H^{k-1}$ be given. In the iteration point $x^k = x^{k-1} + d^{k-1}$ the matrix $H^k$ is $H^k = H^{k-1} + C^k$, where $C^k$ is given in (7). The construction of $H_q^k$ is given by:

$$H_q^k = P_q(H^k) \qquad (30)$$
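Equation (29) is the Schur complement of $H_{22}$ in H; a minimal NumPy sketch follows, assuming $H_{22}$ non-singular and that the q selected indices are the leading ones.

```python
import numpy as np

def partition_P(H, q):
    """P_q(H) = H11 - H12 H22^{-1} H21 of (29); equals the inverse of the
    leading q x q block D11 of H^{-1}."""
    H11, H12 = H[:q, :q], H[:q, q:]
    H21, H22 = H[q:, :q], H[q:, q:]
    return H11 - H12 @ np.linalg.solve(H22, H21)
```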

For minimizing a convex quadratic function of the form (6), where the symmetric matrix G will be written as:

$$G = \begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix} \qquad (31)$$

and where the dimensions of $G_{11}$, $G_{12} = G_{21}^T$ and $G_{22}$ are $q \times q$, $q \times (n-q)$ and $(n-q) \times (n-q)$, the construction (30) implies the following theorem.

Theorem 2: Consider the convex quadratic function (6). Let K = {1,2,...,k}. Let the iteration points $x^i$, $i \in K$ be given, and let $x^{i+1} = x^i + d^i$, $i \in K$. Consider a subset $Q \subset I$. Without loss of generality it may be assumed that Q = {1,2,...,q}. The n-dimensional vectors $d^i$, $i \in K$ will be written as $d^{iT} = (\tilde d^{iT}, \bar d^{iT})$, where the q-dimensional vector $\tilde d^i$ contains the first q elements of $d^i$ and the (n-q)-vector $\bar d^i$ the last (n-q) elements. If a subset $K_1 \subset K$ can be indicated with the following properties:

. $K_1$ contains exactly q elements
. $k \in K_1$
. $\bar d^i = 0$ for $i \in K_1$
. $\tilde d^i$ are independent, $i \in K_1$

then in point $x^{k+1}$ one may write: $H_q^{k+1} = P_q(H^{k+1}) = G_{11}^{-1}$.

Proof: The rank-one method has the property (see Broyden [6]), which can easily be proved by induction, that:

$$H^{k+1} G\, d^i = d^i, \quad i \in K$$

Because $\bar d^i = 0$, $i \in K_1$, it follows, when $H^{k+1}$ is written as

$$H^{k+1} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$$

that

$$(A_{11} G_{11} + A_{12} G_{21})\, \tilde d^i = \tilde d^i, \quad i \in K_1$$
$$(A_{21} G_{11} + A_{22} G_{21})\, \tilde d^i = 0, \quad i \in K_1 \qquad (32)$$

Consider $P_q(H^{k+1})\, G_{11}\, \tilde d^i = (A_{11} - A_{12} A_{22}^{-1} A_{21})\, G_{11}\, \tilde d^i$; from (32) it follows that

$$P_q(H^{k+1})\, G_{11}\, \tilde d^i = \tilde d^i, \quad i \in K_1 \qquad (33)$$

Since the vectors $\tilde d^i$, $i \in K_1$ are independent and the number of elements of $K_1$ is q, necessarily $H_q^{k+1} = P_q(H^{k+1}) = G_{11}^{-1}$.

If in the iteration points $x^i$, $i \in K$ the increment vectors $d^i$, $i \in K$ are computed with the aid of (17) and (23), it can easily be checked for a quadratic function whether the vectors $d^i$, $i \in K$ are independent. This follows from the property (see Bus [7]) that if $d^i - H^i y^i \ne 0$, $i \in K$ - so that (see (7)) the correction matrix $C^{i+1} \ne 0$ - then the vectors $d^i$, $i \in K$ are independent. An analogon of this property is given in the following theorem.

Theorem 3: Consider the iteration points $x^i$, $i \in K$ and the vectors $\tilde d^i$, $\bar d^i$, $i \in K$ as introduced in Theorem 2. If $\bar d^i = 0$ for $i \in K_1 \subset K$ and if $\tilde d^i - H_q^i \tilde y^i \ne 0$, $i \in K_1$, where $\tilde y^i$ contains the first q elements of $y^i$, then for quadratic functions the vectors $\tilde d^i$, $i \in K_1$ are independent.

Proof: Let $K' = \{ i \in K_1,\ i < k \}$. Assume that $\tilde d^i$, $i \in K_1$ are dependent, and let $\tilde d^k = \sum_{i \in K'} \alpha_i \tilde d^i$. Since for a quadratic function and $\bar d^i = 0$, $i \in K_1$, one has $\tilde y^i = G_{11} \tilde d^i$, it follows that $H_q^k \tilde y^k = \sum_{i \in K'} \alpha_i H_q^k G_{11} \tilde d^i = \tilde d^k$ due to (33). This contradicts, however, $\tilde d^k - H_q^k \tilde y^k \ne 0$. So the vectors $\tilde d^i$, $i \in K_1$ must be independent.

The matrix $H_q^k$ defined by (30) may also be written as:

$$H_q^k = P_q(H^k) = H_q^{k-1} + C_q^k \qquad (34)$$

where the matrix $H_q^{k-1} = P_q(H^{k-1})$. For $C_q^k$ the following theorem can be derived.

Theorem 4: Let the q-dimensional vectors $\tilde d^{k-1}$ and $\tilde y^{k-1}$ contain the first q elements of the vectors $d^{k-1}$ and $y^{k-1}$ respectively. Let the (n-q)-dimensional vector $\bar d^{k-1}$ contain the last n-q elements of $d^{k-1}$. If $\bar d^{k-1} = 0$, then the $q \times q$ matrix $C_q^k$ in (34) equals:

$$C_q^k = \frac{(\tilde d^{k-1} - H_q^{k-1} \tilde y^{k-1})(\tilde d^{k-1} - H_q^{k-1} \tilde y^{k-1})^T}{(\tilde d^{k-1} - H_q^{k-1} \tilde y^{k-1})^T\, \tilde y^{k-1}} \qquad (35)$$

where $H_q^{k-1} = P_q(H^{k-1})$.

Proof: the proof is given in the Appendix.

From Theorems 3 and 4 it follows that if $C_q^k \ne 0$ and $\bar d^i = 0$ for $i \in K_1 \subset K$, the vectors $\tilde d^i$, $i \in K_1$ are independent.

The construction of $H_q^k$ with the aid of (30) requires a partitioning of the full $n \times n$ dimensional matrix $H^k$. In some situations it is also possible to compute $H_q^k$ by partitioning lower-dimensional matrices. Let in the iteration point $x^{k-1}$ the set $Q^{k-1}$ be given. To simplify the notation we write $R = Q^{k-1}$ and assume R = {1,2,...,r}. Let in $x^{k-1}$ not only the $n \times n$ matrix $H^{k-1}$ be given, but also the $r \times r$ matrix $H_r^{k-1}$ defined by $H_r^{k-1} = P_r(H^{k-1})$. In the point $x^k$ the $r \times r$ matrix $H_r^k$ is computed as:

$$H_r^k = H_r^{k-1} + C_r^k$$

where $C_r^k$ is constructed analogously to (35). Let $d_i^{k-1} = 0$, $i \in I - R$; then Theorem 4 may be applied and $H_r^k = P_r(H^k)$. If in the point $x^k$ the set $Q^k$ satisfies $Q^k \subset R$, the matrix $H_q^k$ may also be computed as $H_q^k = P_q(H^k) = P_q(H_r^k)$. So when $Q^k \subset R$ the matrix $H_q^k$ can also be computed by partitioning the $r \times r$ matrix $H_r^k$, which generally is less time consuming than computing $P_q(H^k)$.

The treatment of the boundaries does not influence the efficiency of the rank-one method for minimizing quadratic functions. If the first n iteration points do not all lie on the same boundaries, then generally in $x^{n+1}$ the matrix $H^{n+1}$ equals $G^{-1}$. If, out of k iteration points $x^1, \dots, x^k$, $q < k$ iteration points all lie on a set of (n-q) boundaries $x_i = 0$, $i \in I - Q$, then in $x^{k+1}$ the matrix $H_q^{k+1}$ equals $G_{11}^{-1}$. In practice, the number of iteration steps to minimize a convex quadratic function subject to the constraints $x_i \ge 0$, $i \in I$ will be not more than the number of iterations needed to minimize the function without constraints.

$H_q^k$ is not necessarily positive definite. If (24) is not satisfied for non-quadratic functions, the same precautions are taken as described in Chapter 1 to guarantee that a monotonically decreasing sequence of function values is generated.

3. LINEAR INEQUALITY CONSTRAINTS

3.1 Introduction

In Chapters 3 and 4 the minimization of the function F(x) is considered, where the variables x are subject to the constraints:

$$h_i(x) \ge 0, \quad i = 1,2,\dots,m_1 \qquad (36)$$

$m_1$ being the number of inequality constraints. A set J is introduced by:

$$J = \{ 1,2,\dots,m_1 \}$$

The feasible region S is given by:

$$S = \{ x \mid h_i(x) \ge 0,\ i \in J \} \qquad (37)$$

The boundary $h_i(x) = 0$ is called boundary i. A point x is called feasible if $x \in S$. The point x is called feasible with regard to boundary i if x satisfies $h_i(x) \ge 0$.

Consider a feasible iteration point $x^k$, which may lie on some boundaries. Subsets $A(x^k) \subset J$ and $B(x^k) \subset J$ are introduced by:

$$A(x^k) = \{ i \in J \mid h_i(x^k) = 0 \} \qquad (38)$$

$$B(x^k) = \{ i \in J \mid 0 \le h_i(x^k) \le \delta \} \qquad (39)$$

The boundaries $i \in A(x^k)$ are active in the point $x^k$, the boundaries $i \in J - A(x^k)$ inactive. On the analogy of the choice of $\delta$ in (12), the number $\delta$ in (39) is chosen a small positive number. The boundaries $i \in B(x^k) - A(x^k)$ are almost active in $x^k$. Let in $x^k$ the elements of J be ordered in such a way that:

$$B(x^k) = \{ 1,2,\dots,m \}$$

It will be assumed here that $m \le n$. The situation where $m > n$ will be discussed in Chapter 6.

In the present chapter linear inequality constraints are considered. The constraint functions $h_i(x)$ in (39) are given by:

$$h_i(x) = n_i^T x + b_i, \quad i \in J \qquad (40)$$

The normal vector $n_i$ is directed inwards the feasible region S. A vector s in the point $x^k$ is called a feasible direction with regard to boundary $i \in J$ if a positive number $\Lambda_1$ exists so that $n_i^T(x^k + \lambda s) + b_i \ge 0$ for all $\lambda \in [0, \Lambda_1]$. Any direction s is feasible with regard to the inactive boundaries $i \in J - A(x^k)$. The function F(x) is minimal in a point $x^k$ if there exists a ball $B(x^k; r) = \{ x \in S \mid |x - x^k| \le r \}$, where $r > 0$, in such a way that for all $x \in B(x^k; r)$ holds $F(x) \ge F(x^k)$.

A transformation of coordinates is introduced by:

$$\xi_i = v_i^T (x - x^k), \quad i \in I \qquad (41)$$

where I = {1,2,...,n}. Let:

$$v_i = n_i \quad \text{for } i \in A(x^k) \qquad (42)$$

The transformation of coordinates (41) can also be written as:

$$\xi(x; x^k) = \xi = N(x^k)(x - x^k) \qquad (43)$$

where the rows of the $n \times n$ matrix $N(x^k)$ consist of the vectors $v_i^T$, $i \in I$. It will be assumed that $N(x^k)$ is non-singular. An active boundary $i \in A(x^k)$ can be written as:

$$\xi_i = 0, \quad i \in A(x^k)$$

A point $\xi$ is feasible with regard to boundary $i \in A(x^k)$ if

$$\xi_i \ge 0, \quad i \in A(x^k) \qquad (44)$$

A direction s, corresponding with $\sigma = N(x^k)\, s$ in the new system of coordinates, is feasible with regard to an active boundary $i \in A(x^k)$ if, and only if:

$$\sigma_i \ge 0, \quad i \in A(x^k) \qquad (45)$$

The gradient $\chi(x^k)$, defined by:

$$\chi_i(x^k) = \frac{\partial F}{\partial \xi_i}(x^k), \quad i \in I$$

is given by:

$$g(x^k) = N(x^k)^T \chi(x^k) \qquad (46)$$
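Numerically, (46) amounts to a single linear solve with $N^T$; a small sketch with illustrative names follows, where the rows of N are the vectors $v_i^T$ of (41)-(42).

```python
import numpy as np

def transformed_quantities(N, g, s):
    """chi and sigma in the coordinates of (43): g = N^T chi, sigma = N s."""
    chi = np.linalg.solve(N.T, g)   # gradient in the xi-coordinates, (46)
    sigma = N @ s                   # direction in the xi-coordinates
    return chi, sigma
```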

The Kuhn-Tucker conditions will be given in the following theorem.

Theorem 5.

Let in $x^k$ a transformation of coordinates (41) be given and let (42) be satisfied. Let $\chi(x^k)$ be defined by (46).

a. If F(x) is minimal in a feasible point $x^k$, the following conditions necessarily have to be satisfied:

$$\chi_i(x^k) \ge 0, \quad i \in A(x^k)$$
$$\chi_i(x^k) = 0, \quad i \in I - A(x^k) \qquad (47)$$

b. If F(x) is convex and (47) is satisfied, F(x) attains an absolute minimum in the feasible point $x^k$.

Proof:

a. Assume that F(x) is minimal in $x^k$ and that (47) is not satisfied. This means that for at least one $j \in A(x^k)$ holds that $\chi_j(x^k) < 0$, or that for at least one $j \in I - A(x^k)$: $\chi_j(x^k) \ne 0$. Assume one $j \in A(x^k)$ satisfies $\chi_j(x^k) < 0$. Let $\sigma$ be defined by:

$$\sigma_i = 0, \quad i \in I,\ i \ne j$$
$$\sigma_j = -\chi_j(x^k)$$

The direction s, defined by $\sigma = N(x^k)\, s$, is feasible because (45) is satisfied. Since $s^T g(x^k) = \sigma^T \chi(x^k) = -\chi_j^2(x^k) < 0$, $x^k$ cannot be a minimum. So the conditions (47) are necessary. If for one $j \in I - A(x^k)$ holds that $\chi_j(x^k) \ne 0$, the proof is analogous.

b. For convex functions the following property holds (see e.g. Zangwill [49]):

$$F(x) - F(x^k) \ge g(x^k)^T (x - x^k)$$

so - with (43) and (46):

$$F(x) - F(x^k) \ge \chi(x^k)^T \xi$$

For each point $x \in S$, (44) holds, so if (47) is satisfied, also $\chi(x^k)^T \xi \ge 0$, which means that F(x) is minimal in $x^k$.

The identity of the conditions (47) with the Kuhn-Tucker theorem - stating that in a minimum point necessarily the gradient $g(x^k)$ is a linear combination of the normal vectors $n_i$, $i \in A(x^k)$ with positive coefficients, or:

$$g(x^k) = \sum_{i \in A(x^k)} u_i\, n_i \qquad (48)$$

with:

$$u_i \ge 0, \quad i \in A(x^k) \qquad (49)$$

- can easily be shown. As

$$g(x^k) = \sum_{i \in A(x^k)} \chi_i(x^k)\, n_i + \sum_{i \in I - A(x^k)} \chi_i(x^k)\, v_i \qquad (50)$$

it follows from (48) and (50), due to the independence of the vectors $n_i$, $i \in A(x^k)$ and $v_i$, $i \in I - A(x^k)$, that:

$$u_i = \chi_i(x^k), \quad i \in A(x^k)$$
$$0 = \chi_i(x^k), \quad i \in I - A(x^k) \qquad (51)$$

Consequently, (49) and (51) imply (47).

3.2 Treatment of the constraints

The conditions (47) have the same form as the conditions (13) in Chapter 2, where constraints of the type $x_i \ge 0$, $i \in I$ were considered. On this analogy the treatment of the constraints is based. In an iteration point $x^k$ a transformation of coordinates (41) is introduced, where

$$v_i = n_i \quad \text{for } i \in B(x^k)$$

It will be assumed that the vectors $n_i$, $i \in B(x^k)$ are independent. The vectors $v_i$, $i \in I - B(x^k)$ are chosen so that also $v_i$, $i \in I$ are independent, so that the matrix $N(x^k)$ in (43) is non-singular. The vectors $v_i$, $i \in I - B(x^k)$ are assumed to be chosen in a unique way, i.e. if in two points $x^k$ and $x^\ell$ holds that $B(x^k) = B(x^\ell)$, then also $N(x^k) = N(x^\ell)$. This assumption will be used in the proof of convergence. On the analogy of (14) and (15) the sets $A_1(x^k)$ and $B_1(x^k)$ are defined by:

$$A_1(x^k) = \{ i \in A(x^k) \mid \chi_i(x^k) > 0 \} \qquad (52)$$

$$B_1(x^k) = \{ i \in B(x^k) \mid \chi_i(x^k) > 0 \} \qquad (53)$$

where $\chi(x^k)$ is defined in (46). On the analogy of the algorithm described in the previous chapter, in an iteration point $x^k$ the procedure is as follows.

Step 1. Investigation of the sign of the gradient.

A subset $Q \subset I$ is introduced. First for Q is chosen:

$$Q := I - B_1(x^k)$$

where $B_1(x^k)$ is defined by (53).

Step 2. Computation of a tentative direction $\sigma$.

The number of elements of Q is called q. The elements of Q are called $j_1, j_2, \dots, j_q$. A q-dimensional vector $\tilde\chi^k$ is introduced by:

$$\tilde\chi_i^k = \chi_{j_i}^k, \quad i = 1,2,\dots,q$$

A q-dimensional vector $\tilde\sigma$ is now computed by:

$$\tilde\sigma = -H_q^k \tilde\chi^k \qquad (54)$$

The choice of the $q \times q$ matrix $H_q^k$ depends on the unconstrained minimization technique used and is dealt with in 3.4 and 3.5. An n-dimensional vector $\sigma = \sigma(Q; x^k)$ is defined by:

$$\sigma_{j_i} = \tilde\sigma_i, \quad i = 1,2,\dots,q; \qquad \sigma_i = 0, \quad i \in I - Q \qquad (55)$$

Step 3. Investigation of the direction $\sigma$.

This step is the same as in the algorithm of Chapter 2, if $s_i$ is replaced by $\sigma_i$.

Step 4. Computation of the final direction $\sigma^k$.

The set Q is called $Q^k$. A set $T^k$ is introduced, defined by:

$$T^k = B_1(x^k) - A_1(x^k) \qquad (56)$$

The last direction vector $\sigma^k = \sigma(x^k) = \sigma(Q^k; T^k; x^k)$ is given by:

$$\sigma_i^k = \sigma_i \text{ for } i \in Q^k; \qquad \sigma_i^k = -h_i(x^k) \text{ for } i \in T^k; \qquad \sigma_i^k = 0 \text{ for } i \in I - Q^k - T^k \qquad (57)$$

Step 5. Crossing new boundaries.

In the original system of coordinates $s^k$ is computed by:

$$s^k = N^{-1}(x^k)\, \sigma^k \qquad (58)$$

It is ascertained whether new boundaries are crossed. A subset $\Gamma \subset J$ is defined by:

$$\Gamma = \{ i \in J - B(x^k) \mid n_i^T(x^k + s^k) + b_i < 0 \}$$

For all $i \in \Gamma$ a number $\gamma_i \in (0,1]$ is determined by $n_i^T(x^k + \gamma_i s^k) + b_i = 0$, or:

$$\gamma_i = -\frac{n_i^T x^k + b_i}{n_i^T s^k}, \quad i \in \Gamma \qquad (59)$$

The number $\gamma^k$ is given by:

$$\gamma^k = \min_{i \in \Gamma} \gamma_i \qquad (60)$$

If no new boundary is crossed, $\gamma^k = 1$.

Step 6. Determination of the new iteration point.

The new iteration point is $x^{k+1} = x^k + d^k$ with

$$d^k = d(x^k) = d(Q^k; T^k; x^k) = \lambda^k \gamma^k s^k \qquad (61)$$

computed in the same way as in step 6 of the algorithm of Chapter 2.
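Step 5 reduces to the ratio test (59)-(60); a sketch follows, with Nmat holding the normals $n_i^T$ as rows and the index set outside_B standing for $J - B(x^k)$ (both illustrative names).

```python
import numpy as np

def crossing_gamma(x, s, Nmat, b, outside_B):
    """Largest gamma in (0,1] keeping n_i^T (x + gamma s) + b_i >= 0 for
    the boundaries i in J - B(x^k), equations (59)-(60)."""
    gamma = 1.0
    for i in outside_B:
        if Nmat[i] @ (x + s) + b[i] < 0.0:       # boundary i would be crossed
            gamma = min(gamma, -(Nmat[i] @ x + b[i]) / (Nmat[i] @ s))  # (59)
    return gamma                                  # (60)
```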

Let $\chi(Q; x^k)$ be defined by $\chi_i(Q; x^k) = \chi_i(x^k)$, $i \in Q$ and $\chi_i(Q; x^k) = 0$, $i \in I - Q$, and let $\sigma(Q; x^k)$ be defined by (55). On the analogy of (24), $H_q^k$ is chosen so that

$$\chi(Q; x^k)^T \sigma(Q; x^k) \le -\alpha_1 |\chi(Q; x^k)|^2 \le 0$$

Moreover, on the analogy of (25):

$$\chi(Q^k; x^k)^T \sigma(Q^k; T^k; x^k) \le -\alpha_1 |\chi(Q^k; x^k)|^2 - \sum_{i \in T^k} h_i(x^k)\, \chi_i(x^k) \le 0 \qquad (62)$$

The following theorem is the analogon of Theorem 1.

Theorem 6: Let $s(x^k)$ be bounded. $d^k$, defined by (61), is 0 if, and only if, $x^k$ is a solution point.

Proof: Given the non-singularity of $N(x^k)$, it is trivial that from $\sigma^k = N(x^k)\, s^k = 0$ it follows that $d^k = 0$. Assume $\sigma^k \ne 0$. It follows from (54), (55) and (62) that $\chi(x^k)^T \sigma(x^k) = g(x^k)^T s^k < 0$, implying that $\lambda(x^k) > 0$. Because $|\sigma^k| \le M$ one may write $\gamma^k \ge \delta/M$ (see (59) and (60)). Thus from $d^k = 0$ it necessarily follows that $\sigma^k = 0$. The proof that $\sigma^k = 0$ if, and only if, $x^k$ is a solution point, is analogous to the proof of Theorem 1.

3.3 Proof of convergence

The proof of convergence for the algorithm in 3.2 will be given both for minimization techniques where in an iteration point $x^k$ the increment vector $s(x^k)$ is only a function of $x^k$, and for minimization techniques where $s(x^k)$ is a function of $x^i$, $0 \le i \le k$, such as the variable metric methods. In the proof of convergence some lemmas about calculating a lower function value on the line segment between $x^k$ and $x^k + \gamma(x^k)\, s(x^k)$ are used, which will be considered first.

Lemma 1. Let K = {0,1,2,...} and let:

$\{x^k\}$, $k \in K$ be convergent to $x^*$,
$\{s(x^k)\}$, $k \in K$ be convergent to $s^*$,
$\{\gamma(x^k)\}$, $k \in K$ be convergent to $\gamma^*$,
$\{\lambda(x^k)\}$, $k \in K$ be convergent to $\lambda^*$.

Let F(x) be continuously differentiable and let $F(x^{k+1}) = F(x^k + \lambda^k \gamma^k s^k)$. Let a number $\alpha > 0$ exist, so that $\gamma(x^k) \ge \alpha$, $k \in K$. Assume $g(x^k)^T s(x^k) < 0$, $k \in K$. If line minimization between $x^k$ and $x^k + \gamma(x^k)\, s(x^k)$ is applied, so that the number $\lambda(x^k) \in [0,1]$ is determined in such a way that:

$$F(x^k + \lambda(x^k)\, \gamma(x^k)\, s(x^k)) \le F(x^k + \lambda\, \gamma(x^k)\, s(x^k)) \quad \text{for all } \lambda \in [0,1] \qquad (63)$$

then:

$$g(x^*)^T s^* = 0 \qquad (64)$$

Proof: Owing to the continuity of F(x) and the assumptions made, it follows from (63) that:

$$F(x^* + \lambda^* \gamma^* s^*) - F(x^*) = 0 \le F(x^* + \lambda \gamma^* s^*) - F(x^*) \qquad (65)$$

for all $\lambda \in [0,1]$. Because $g(x^k)^T s(x^k) < 0$, $k \in K$, also $g(x^*)^T s^* \le 0$. Assume $g(x^*)^T s^* < 0$. This implies that there exist numbers $\lambda_1 \in (0,1]$ and $C > 0$ in such a way that

$$F(x^* + \lambda_1 \gamma^* s^*) - F(x^*) \le -C\, \lambda_1\, \gamma^*\, |g(x^*)^T s^*| \qquad (66)$$

Since it follows from $\gamma(x^k) \ge \alpha$, $k \in K$ that $\gamma^* \ge \alpha$, the right member of (66) is < 0, which, however, contradicts (65). So necessarily (64) is satisfied.

An analogous lemma will be of importance if no line minimization is applied but only a lower function value on the line segment between $x^k$ and $x^k + \gamma(x^k)\, s(x^k)$ is computed. The next lemma is derived for the situation where - if $x^k$ is not a solution point - the line segment between $x^k$ and $x^k + \gamma(x^k)\, s(x^k)$ is bisected until in the point $x^k + \lambda(x^k)\, \gamma(x^k)\, s(x^k)$, where $\lambda(x^k) \in (0,1]$, a function value is computed satisfying:

$$F(x^k + \lambda(x^k)\, \gamma(x^k)\, s(x^k)) - F(x^k) \le -\alpha_2\, \lambda(x^k)\, \gamma(x^k)\, |g(x^k)^T s(x^k)| \qquad (67)$$

The constant $\alpha_2$ may be any fixed number $\in (0,1)$, which in practice will be chosen very small. Since $g(x^k)^T s(x^k) < 0$, a $\lambda(x^k) \in (0,1]$ satisfying (67) can indeed be found. If bisection is replaced by other methods, such as the use of Fibonacci numbers, quadratic interpolation, etc., analogous proofs of the following lemma can be given.

Lemma 2.

Let the sequences and assumptions of Lemma 1 be given. If in an iteration point $x^k$ the value of $\lambda(x^k)$ is computed by means of bisection of the interval between $x^k$ and $x^k + \gamma(x^k)\, s(x^k)$ in such a way that:

$$\lambda(x^k) = 2^{-i_k}, \quad i_k = 0,1,2,\dots$$

and

$$F(x^k + \lambda(x^k)\, \gamma(x^k)\, s(x^k)) - F(x^k) \le -\alpha_2\, \lambda(x^k)\, \gamma(x^k)\, |g(x^k)^T s(x^k)| \qquad (68)$$

$$F(x^k + 2^{-j}\, \gamma(x^k)\, s(x^k)) - F(x^k) > -\alpha_2\, 2^{-j}\, \gamma(x^k)\, |g(x^k)^T s(x^k)| \qquad (69)$$

for all j with $0 \le j \le i_k - 1$, then (64) is satisfied.

Proof: Analogous to the derivation of equation (65), it follows from (68) that:

$$F(x^* + \lambda^* \gamma^* s^*) - F(x^*) = 0 \le -\alpha_2\, \lambda^*\, \gamma^*\, |g(x^*)^T s^*|$$

Since $\gamma^* \ge \alpha$, it follows that $\lambda^* = 0$ or $g(x^*)^T s^* = 0$. Assume $g(x^*)^T s^* < 0$. A number $\Lambda_1$ exists, so that

$$F(x^* + \lambda \gamma^* s^*) - F(x^*) \le -\alpha_2\, \lambda\, \gamma^*\, |g(x^*)^T s^*| \qquad (70)$$

for all $\lambda \in [0, \Lambda_1]$. Consider a fixed i with $2^{-i} \le \Lambda_1$. From (69) it follows, since $\lambda^* = 0$, that a number $k_1$ exists, so that for all $k \ge k_1$:

$$F(x^k + 2^{-i}\, \gamma(x^k)\, s(x^k)) - F(x^k) > -\alpha_2\, 2^{-i}\, \gamma(x^k)\, |g(x^k)^T s(x^k)|$$

Consequently also:

$$F(x^* + 2^{-i} \gamma^* s^*) - F(x^*) \ge -\alpha_2\, 2^{-i}\, \gamma^*\, |g(x^*)^T s^*| \qquad (71)$$

It follows from (70) and (71) that

$$F(x^* + 2^{-i} \gamma^* s^*) - F(x^*) = -\alpha_2\, 2^{-i}\, \gamma^*\, |g(x^*)^T s^*|$$

As this equation has to be satisfied for all i with $2^{-i} \le \Lambda_1$, necessarily

$$\gamma^*\, g(x^*)^T s^* = \lim_{i \to \infty} \frac{F(x^* + 2^{-i} \gamma^* s^*) - F(x^*)}{2^{-i}} = -\alpha_2\, \gamma^*\, |g(x^*)^T s^*|$$

which is impossible. So $g(x^*)^T s^* = 0$.
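The bisection rule of Lemma 2 is easily coded; in the sketch below the iteration cap is a practical assumption, gTs stands for $g(x^k)^T s(x^k) < 0$, and x and s are assumed to be NumPy arrays.

```python
def bisection_lambda(F, x, s, gamma, gTs, alpha2=1e-4, max_halvings=40):
    """Compute lambda(x^k) = 2^{-i_k} satisfying (68), halving while (69) holds."""
    F0, lam = F(x), 1.0
    for _ in range(max_halvings):
        if F(x + lam * gamma * s) - F0 <= -alpha2 * lam * gamma * abs(gTs):
            return lam                 # (68) reached
        lam *= 0.5                     # (69) held for this lambda; bisect
    return lam
```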

The proof of convergence is based on the following assumptions:

1. the feasible region S, given in (37), is closed and bounded (compact);
2. F(x) is continuously differentiable;
3. the starting point $x^0$ of the iteration process is feasible: $x^0 \in S$;
4. in an iteration point $x^k$, $k \in K = \{0,1,2,\dots\}$ a transformation of coordinates defined in (43) is given; the matrix $N(x^k)$ is non-singular;
5. equations (62) are satisfied;
6. the number $\lambda(x^k)$, $k \in K$ in (61) satisfies (63) or (67);
7. $\sigma(Q; x^k)$ given in (55) is bounded, i.e. there exists a constant M so that $|\sigma(Q; x^k)| \le M$ for all $k \in K$ and all $Q \subset I$.

If the matrix $H_q^k$ only depends on $x^k$ - so $H_q^k = H_q(Q; x^k)$ - and if $\sigma(Q; x^k)$ is continuous in $x^k$ for given Q, then condition 7 follows from the continuity.

Two situations can be distinguished:

a. the number of iterations is finite: the iteration process stops in a point $x^*$ because the increment vector $d(x^*) = 0$; so $x^*$ is a solution point (Theorem 6).

b. the number of iterations is infinite:

Due to condition 1, and because all iteration points $x^k$, $k \in K$ lie within S, there is a subsequence $\{x^\ell\}$, $\ell \in L \subset K$ convergent to the accumulation point $x^*$. Consider the sequence $\{x^\ell + d^\ell\}$, $\ell \in L \subset K$, where $d^\ell = d(Q^\ell; T^\ell; x^\ell) = \lambda(x^\ell)\, \gamma(x^\ell)\, s(Q^\ell; T^\ell; x^\ell)$. The sets $Q^\ell$ and $T^\ell$ have been defined in the algorithm. Consider the infinite subsequence $\{x^m\}$, $m \in M \subset L$ with the following properties:

$Q^m = Q'$, i.e. $Q^m$ contains the same elements for $m \in M$ (72)
$T^m = T'$
$B(x^m) = B'$
$\{\gamma(x^m)\}$, $m \in M$ is convergent to $\gamma^*$
$\{\lambda(x^m)\}$, $m \in M$ is convergent to $\lambda^*$
$\{\sigma(Q; x^m)\}$, $m \in M$ is convergent to $\sigma^*(Q)$ for all $Q \subset I$ (73)

Such a subsequence $\{x^m\}$, $m \in M$ exists because the number of realizations of $Q^m$, $T^m$ and $B(x^m)$ is finite, $\gamma(x^m) \in (0,1]$, $\lambda(x^m) \in (0,1]$, and because condition 7 is fulfilled.

$N(x^m)$, $m \in M$ is determined (see page 40) in such a way that from $B(x^m) = B'$, $m \in M$ it follows that $N(x^m) = N'$, $m \in M$. From (73) and the definition of $\sigma(Q'; T'; x^m)$ in (57) it follows that also $\sigma(Q'; T'; x^m)$ is convergent to a vector $\sigma^*(Q'; T')$.

If $s^*(Q'; T')$ is defined by $s^*(Q'; T') = N'^{-1} \sigma^*(Q'; T')$, then $\sigma^*(Q'; T') = N'\, s^*(Q'; T')$.

Because of condition 1 and the continuity of F(x), the function F(x) is bounded from below; the sequence $\{F(x^k)\}$, $k \in K$ decreases monotonically due to conditions 5 and 6, and converges to $F(x^*)$. Moreover, $F(x^* + \lambda^* \gamma^* s^*(Q'; T')) = F(x^*)$, since a bounded monotonically decreasing sequence has only one accumulation point. Since $\sigma(Q'; T'; x^m)$ is bounded, and so $\gamma(x^m) \ge \alpha = \delta/M$ as given in Theorem 6, Lemmas 1 and 2 may be applied, resulting in

$$g(x^*)^T s^*(Q'; T') = 0$$

Let $\chi(x^*)$ be defined by $g(x^*) = N'^T \chi(x^*)$; then also:

$$\chi(x^*)^T \sigma^*(Q'; T') = 0 \qquad (74)$$

Due to condition 5 and (62) one may write:

$$\chi(x^*)^T \sigma^*(Q'; T') \le -\alpha_1 |\chi(Q'; x^*)|^2 - \sum_{i \in T'} h_i(x^*)\, \chi_i(x^*)$$

implying - with (74) and the definition of T' by (56):

for $i \in Q'$: $\chi_i(x^*) = 0$ (75)
for $i \in T'$: $\chi_i(x^*) = 0$ or $i \in A_1(x^*)$ (76)

Consider the elements of the sequence $\{x^m\}$, $m \in M$ within the ball $B(x^*; r) = \{ x \in S \mid |x - x^*| \le r \}$, where r is chosen so small that:

if $h_i(x^*) > 0$, also $h_i(x^m) > 0$, $i \in J$ (77)
if $\chi_i(x^*) > 0$, also $\chi_i(x^m) > 0$, $i \in I$ (78)
if $\sigma_i^*(Q) > 0$, also $\sigma_i(Q; x^m) > 0$, $i \in Q$, for all $Q \subset I$ (79)

Consider in a point $x^m$ within this ball the sequence of sets $Q_j$, $j = 1,2,\dots,p$, where $p \le n$, from which elements are eliminated on the basis of step 3 of the algorithm. It should be remarked that $Q_1 \supset Q_2 \supset \dots \supset Q_p = Q'$. Consider $Q_p$. The elements eliminated from $Q_p$ satisfy:

$$\sigma_i(Q_p; x^m) < 0 \ \wedge\ \chi_i(x^m) \le 0, \quad i \in Q_p - Q'$$

Using (78) and (79) it follows that

$$\sigma_i^*(Q_p) \le 0 \ \wedge\ \chi_i(x^*) \le 0, \quad i \in Q_p - Q'$$

or - due to condition 5 and analogous to the derivation of (75):

$$\chi_i(x^*) = 0, \quad i \in Q_p$$

In the same way:

$$\chi_i(x^*) = 0, \quad i \in Q_j, \quad j = 1,2,\dots,p$$

Since $B_1(x^m) = A_1(x^m) \cup T'$, and for $i \in A_1(x^m)$ necessarily $i \in A_1(x^*)$ or $h_i(x^*) = 0 \wedge \chi_i(x^*) = 0$ - due to (77) -, and for $i \in T'$ (76) is satisfied, it follows that:

for $i \in B_1(x^m)$: $i \in A_1(x^*)$ or $\chi_i(x^*) = 0$ (80)
for $i \in I - B_1(x^m)$: $\chi_i(x^*) = 0$

(80) implies that $x^*$ is a solution point.

It has been proved that a subsequence $\{x^\ell\}$, $\ell \in L \subset K$ of the iteration process $\{x^k\}$, $k \in K$ converges to a solution point $x^*$.

For strictly convex functions the following corollary holds.

Corollary 2: For strictly convex functions the sequence $\{x^k\}$, $k \in K$ converges to the solution point $x^*$. For, assume $\{x^k\}$, $k \in K$ to be not convergent, or convergent to a point $x^{**} \ne x^*$. Then another subsequence $\{x^{\ell'}\}$, $\ell' \in L' \subset K$ exists so that $\lim_{\ell' \to \infty} x^{\ell'} = x^{**} \ne x^*$ and $x^{**} \in S$. Introduce the transformation of coordinates $\xi = N(x^*)(x - x^*)$, where the non-singular matrix $N(x^*)$ contains the rows $n_i^T$, $i \in A(x^*)$. Define $\xi^{**} = N(x^*)(x^{**} - x^*)$. Because F(x) is strictly convex,

$$F(x^{**}) - F(x^*) > g(x^*)^T (x^{**} - x^*) = \chi(x^*)^T \xi^{**} = \sum_{i \in A_1(x^*)} \chi_i(x^*)\, \xi_i^{**} \ge 0$$

because $x^*$ is a solution point and $x^{**} \in S$. This contradicts $F(x^{**}) = F(x^*)$, so $\{x^k\}$, $k \in K$ is convergent to $x^*$.

3.4 Application to Gauss-Newton's method

For Gauss-Newton's method to minimize functions of the type (2), the matrix $H_q^k$ in (54) is given by:

$$H_q^k = \left(\Gamma_q^{kT} \Gamma_q^k\right)^{-1}$$

where the $N \times q$ matrix $\Gamma_q^k$ consists of the columns $i \in Q$ of the $N \times n$ matrix $\Gamma^k$ defined by:

$$\Gamma^k = G^k N(x^k)^{-1}$$

where $G^k$ is defined by (4) and $N(x^k)$ is given in (46). This way of calculating $H_q^k$ guarantees that the minimum of F as a function of $\xi_i$, $i \in Q$ is found within one step if the functions $f_j(x)$ in (2) are linear in $\xi$, i.e. linear in x. If $f_j(x)$, $j = 1,2,\dots,N$ are linear functions in x and $G^T G$ is not singular, it follows that F(x) is a strictly convex quadratic function.

Minimization of a quadratic function subject to linear constraints is known as quadratic programming. Application of Newton's method as described above to solve quadratic programming problems implies that the minimum can be found in a finite number of steps. This will be proved for variation (i) of the algorithm as given on page 26. To solve quadratic programming problems, from a theoretical point of view this variation may be preferred to the described algorithm. In practice the difference is not important.

Theorem 7.

If Gauss-Newton's method and the algorithm described in this chapter - including variation (i) on page 26 - are applied to minimize a strictly convex quadratic function subject to linear inequality constraints, then the number of iterations is finite.

Proof: The sequence $\{x^k\}$, $k \in K$ converges to the solution point $x^*$ (Corollary 2). Those elements of $\{x^k\}$, $k \in K$ are taken into consideration which lie within the ball $B(x^*; r) = \{ x \in S \mid |x - x^*| \le r \}$, where r is so small that analogous expressions of (77) - (79) are satisfied. Consider the first point $x^{k_1}$, $k_1 \in K$, so that all $x^k$, $k \ge k_1$ lie within $B(x^*; r)$. Let $\xi = N(x^{k_1})(x^{k_1} + s - x^{k_1})$ be the point computed in the first step of the algorithm with the aid of Newton's method. Then:

$$\xi_i = 0, \quad i \in A_1(x^{k_1})$$
$$\chi_i(\xi) = 0, \quad i \in I - A_1(x^{k_1})$$

Call $\xi^* = N(x^{k_1})(x^* - x^{k_1})$. Since $A_1(x^*) \supset A_1(x^{k_1})$:

$$\xi_i^* = 0, \quad i \in A_1(x^{k_1})$$
$$\chi_i(\xi^*) = 0, \quad i \in A_1(x^*) - A_1(x^{k_1}) \quad \text{due to (80)}$$
$$\chi_i(\xi^*) = 0, \quad i \in I - A_1(x^*) \quad \text{due to (47)}$$

Since $N(x^{k_1})$ is non-singular, it follows that $\xi = \xi^*$ and $x^{k_1} + s = x^*$.

3.5 Application to a rank-one quasi-Newton method

For the rank one method in Chapter 1 and 2 the construction k .

H requires the following computations.

k-1 k-1 Let in x the n x n matrix H be given. In the point

k k-1 ,k-l ^ • „k • , „k-1 „k x = X + d the matrix H is constructed as H + C ,

k k

where C is computed with the aid of (7) and the matrix H is

defined as

H^ = N(x'') H*" N^(x'')

The matrix } ^ is given by ff*^ = P (« ) , where the

q "^ q q

definition of the partitioned matrix P is given in (29). The analogon of Theorem 3 can be formulated as:

Theorem 8.

Consider the convex quadratic function (6), which in the k k

transformed system of coordinates 5 = N(x )(x-x ) can be written

as F(5) = a° + a^5 + 5 ^ H , where f = N " ' (X"*) G N ' ^ X ' ^ ) .

Let K = { l,2,...,k }. Let the iteration points x , i £ K be given and let x = x + d , i £ K . Define 5 = N(x ) d , i e K. Consider the set Q c I. Let Q = { 1,2 q }. The q-dimensional vectors 6 , i £ K contain the first q elements of 6 , the (n-q) vector 5 , i £ K the last (n-q) elements of 6 , i £ K.

Let the elements of the set of boundaries J be ordered in such a way that the last (n-q) rows of the matrix N(x ) are the

T

(53)

If a subset K c K can be indicated in such a way that K contains exactly q elements

k £ K

X . i

n. IS a row of N(x ) , i £ K , j = q+1 n n.d = 0, j = q+1 n or 6 = 0, i £ K

6 are independent, i £ K

.-u • .-u . ^ k+1 .^ „k+1 I--1 , p-l

then in the point x one may write H = I .., where I ..

is the inverse of the q x q submatrix given in[= '^11 '^12

l'^21 '^22j k i i .

Proof: Since for the rank-one method H^k G d^i = d^i, i ∈ K̄, this may also be written as H̄^k H̄ δ^i = δ^i, i ∈ K̄. The proof is further analogous to the proof of Theorem 3.
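Written out with the definitions H̄^k = N H^k N^T, H̄ = N^{-T} G N^{-1} and δ^i = N d^i (all matrices evaluated at x^k), this transformation step is the direct computation:

\[
  \bar H^k \bar H \,\delta^i
  = \left(N H^k N^T\right)\left(N^{-T} G N^{-1}\right)\left(N d^i\right)
  = N H^k G\, d^i = N d^i = \delta^i ,
  \qquad i \in \bar K .
\]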

Let two successive iteration points x^{k-1} and x^k lie on the same boundaries and let N(x^{k-1}) = N(x^k). Let in x^{k-1} be given the set Q^{k-1}. To simplify the notation we set R = Q^{k-1} and assume R = { 1,2,...,r }. If in the point x^k the set Q satisfies Q ⊂ R, the matrix H_q^k can also be computed as H_q^k = P_q(H̄_r^k), where the r x r dimensional matrix H̄_r^k is constructed analogously to the construction of H̄^k in 2.3, page 33. This can be less time-consuming. For handling linear constraints this modification is not so important in practice, because frequently the requirement N(x^{k-1}) = N(x^k) is not satisfied.

For the rank-one quasi-Newton method the following Theorem can be derived:

Theorem 9.

Let the rank-one quasi-Newton method and the algorithm described in this Chapter - including variation (i) on page 26 - be applied to minimize a strictly convex quadratic function subject to linear inequality constraints. Let the minimum x* lie on (n-q) boundaries, which without loss of generality may be assumed to be the boundaries q+1, q+2,...,n. Define Q = { 1,2,...,q }. If the conditions of Theorem 8 are satisfied, the minimum is found within a finite number of steps.

Proof: Since it follows from Theorem 8 that after a finite number of steps the iteration process equals Newton's method, Theorem 9 follows from Theorem 7.


4. NONLINEAR INEQUALITY CONSTRAINTS

4.1 Introduction

In this chapter the minimization of the function F(x) is considered, where the variables x are subject to the nonlinear constraints:

h_i(x) ≥ 0, i ∈ J

The functions h_i(x), i ∈ J are assumed to be continuously differentiable. Let in a point x^k the sets A(x^k) and B(x^k) be defined by (38) and (39). A transformation of coordinates is now introduced by:

ξ_i = ℓ_i(x), i ∈ I    (81)

The origin of the new system of coordinates is chosen in x^k:

ℓ_i(x^k) = 0, i ∈ I

For i ∈ B(x^k) one chooses:

ℓ_i(x) = h_i(x) - h_i(x^k), i ∈ B(x^k)

The functions ℓ_i(x), i ∈ I-B(x^k) are chosen as continuously differentiable functions. It will be assumed that the gradient vectors ∇ℓ_i(x^k), i ∈ I are independent.
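As a small illustration of this construction (not taken from the thesis), let n = 2 with a single constraint h₁(x) = 1 - x₁² - x₂² ≥ 0 that is active at x^k = (1,0)^T, so that h₁(x^k) = 0. One admissible choice of the transformation is:

\[
  \ell_1(x) = h_1(x) - h_1(x^k) = 1 - x_1^2 - x_2^2 ,
  \qquad \nabla \ell_1(x^k) = (-2,\ 0)^T ,
\]
\[
  \ell_2(x) = x_2 ,
  \qquad \nabla \ell_2(x^k) = (0,\ 1)^T ,
\]
\[
  N(x^k) = \begin{pmatrix} -2 & 0 \\ 0 & 1 \end{pmatrix} .
\]

The two gradient vectors are independent, so N(x^k) is non-singular, as required.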

Consider a point σ in the new system of coordinates ξ₁, ξ₂,...,ξ_n with σ_i ≥ 0 for i ∈ A(x^k). The points tσ, where t ≥ 0, are all feasible with regard to the active boundaries i ∈ A(x^k). These points tσ generally correspond with the points of a curve x(t) in the original system of coordinates defined by:

tσ_i = ℓ_i(x(t)), i ∈ I    (82)

On the basis of the assumptions that:

a. the functions ℓ_i(x), i ∈ I are continuously differentiable,
b. the gradient vectors ∇ℓ_i(x^k), i ∈ I are independent,

the implicit function theorem (Kantorovic [31], Rudin [42]) may be applied, which states that there is a constant t₁ > 0 such that for all t ∈ [0,t₁] the solution x(t) of (82) is uniquely determined, and x(t) is continuously differentiable in t. It should be noted that x(t=0) = x^k.
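For a given t, the point x(t) can be computed numerically by solving the nonlinear system (82) with Newton's method. The following sketch is a generic illustration of this computation (the treatment used in this thesis is the subject of 4.2 below); all names are illustrative, and the transformation of the example above is reused.

```python
import numpy as np

def curve_point(ell, jac, x_k, sigma, t, tol=1e-10, max_iter=25):
    # Solve ell_i(x) = t * sigma_i, i in I (equation (82)) for x = x(t)
    # by Newton's method, starting from x(0) = x^k.  `ell` returns the
    # vector of l_i(x); `jac` returns the Jacobian whose rows are the
    # gradients of the l_i.  Generic illustration, not the scheme of 4.3.
    x = np.asarray(x_k, dtype=float).copy()
    sigma = np.asarray(sigma, dtype=float)
    for _ in range(max_iter):
        r = ell(x) - t * sigma                 # residual of (82)
        if np.linalg.norm(r) < tol:
            return x
        x -= np.linalg.solve(jac(x), r)
    raise RuntimeError("Newton iteration for x(t) did not converge")

# The transformation of the example above: l1 = 1 - x1^2 - x2^2, l2 = x2.
ell = lambda x: np.array([1.0 - x[0]**2 - x[1]**2, x[1]])
jac = lambda x: np.array([[-2.0 * x[0], -2.0 * x[1]], [0.0, 1.0]])
x_t = curve_point(ell, jac, [1.0, 0.0], [0.1, 0.2], t=1.0)
print(x_t)        # x(1) ~ (0.9274, 0.2), so that ell(x_t) = (0.1, 0.2)
```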

The gradient χ(x^k) is given by (46), where the rows of the n x n non-singular matrix N(x^k) are now defined as the vectors ∇ℓ_i(x^k)^T, i ∈ I. Due to the implicit function theorem the following lemma can be derived.

Lemma 3.

If x^k is a minimum point, then necessarily σ^T χ(x^k) ≥ 0 for each σ satisfying σ_i ≥ 0 for i ∈ A(x^k).

Proof: Define the ball-segment B(x^k;r) = { x | |x-x^k| < r, h_i(x) ≥ 0, i ∈ A(x^k) }. The implicit function theorem implies that B(x^k;r) contains an infinite number of points x. Due to the continuity of h_i(x), i ∈ J, a radius r = r₁ > 0 can be chosen so that x ∈ S for all x ∈ B(x^k;r₁). From the differentiability of F(x) and the continuity of x(t) for t ∈ [0,t₁] it follows that constants C > 0 and t₂ ∈ (0,t₁] exist such that for all t ∈ [0,t₂]:

F(x(t)) - F(x^k) ≤ t ( dF/dt(x(0)) + C t )

The derivative dF/dt(x(0)) equals g(x^k)^T dx/dt(x^k) = g(x^k)^T N^{-1}(x^k) σ = χ(x^k)^T σ, where dx/dt(x^k) follows from differentiating (82). Due to the continuity of x(t) and σ_i ≥ 0, i ∈ A(x^k), the number t₂ corresponds with a radius r₂, so that x(t) ∈ B(x^k;r₂) for all t ∈ [0,t₂]. Assume χ(x^k)^T σ < 0 and decrease t₂, if necessary, so that C t₂ < -χ(x^k)^T σ; then for all x(t) ∈ B(x^k;r), t ∈ (0,t₂] - where r is defined by r = min(r₁,r₂) - it holds that x(t) ∈ S and F(x(t)) - F(x^k) < 0, so that x^k cannot be a minimum point.

It should be remarked that the condition that N(x^k) is non-singular implies that the ball-segment B(x^k;r) introduced in the proof of Lemma 3 contains more elements than x^k alone. The non-singularity of N(x^k) - see also Lootsma [32] - may be called the 'constraint qualification', a concept introduced in almost all papers on nonlinear programming. Several conditions for the 'constraint qualification' have been formulated (see e.g. Daniel [12]; for a detailed treatise on this subject see Gould and Tolle [24]). The result of Lemma 3 will be used in the following Theorem.

Theorem 10.

a. If F(x) is minimal in a feasible point x^k, the Kuhn-Tucker conditions (47) have to be satisfied.

b. For a convex function F(x) and concave functions h_i(x), i ∈ A(x^k), the conditions (47) guarantee that F(x) attains an absolute minimum in x^k.

Proof:

a. Let (47) not be satisfied because χ_j(x^k) < 0 for one j ∈ A(x^k). The direction σ defined by σ_i = 0, i ∈ I, i ≠ j, and σ_j = -χ_j(x^k) is feasible because σ_i ≥ 0, i ∈ I. Moreover σ^T χ(x^k) = -χ_j(x^k)² < 0. Due to Lemma 3, x^k cannot be a minimum point. The proof is analogous if (47) is not satisfied because χ_j(x^k) ≠ 0 for one j ∈ I-A(x^k).

b. Let (47) be satisfied in x^k. Due to a property of convex functions and with the aid of (46):

F(x) - F(x^k) ≥ g(x^k)^T (x-x^k) = χ^T(x^k) N(x^k) (x-x^k)

or - see (47):

F(x) - F(x^k) ≥ Σ_{i ∈ A(x^k)} χ_i(x^k) ∇h_i(x^k)^T (x-x^k)

The functions h_i(x), i ∈ A(x^k) are concave, so it follows that

∇h_i(x^k)^T (x-x^k) ≥ h_i(x) - h_i(x^k) = h_i(x), i ∈ A(x^k),

since h_i(x^k) = 0 for i ∈ A(x^k). For each feasible point x ∈ S holds h_i(x) ≥ 0, so with (47) it follows that:

F(x) - F(x^k) ≥ 0

Accordingly x^k is a global minimum.
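As a small worked example of part b (not taken from the thesis), consider minimizing the convex function F(x) = x₁² + x₂² subject to the single constraint h₁(x) = x₁ - 1 ≥ 0, which is linear and therefore concave. At x^k = (1,0)^T the constraint is active. Choosing ℓ₁ = h₁ - h₁(x^k) and, say, ℓ₂(x) = x₂ gives N(x^k) = I, so that - using χ(x^k) = N^{-T}(x^k) g(x^k), as in the proof of Lemma 3:

\[
  \chi(x^k) = N^{-T}(x^k)\, g(x^k) = g(x^k) = (2,\ 0)^T .
\]

Hence χ₁(x^k) = 2 ≥ 0 on the active boundary and χ₂(x^k) = 0 off it, so conditions (47) hold, and Theorem 10b confirms that x^k is the global minimum.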

4.2 Solving nonlinear equations

The algorithm to treat nonlinear constraints, which will be described in detail in 4.3, computes in an iteration point x^k a direction σ within the system of coordinates ξ = ℓ(x). It has to be ascertained whether the points tσ, where t ∈ [0,1], correspond with a curve x(t) in the original Euclidean space. Whether this curve x(t) exists and, if it exists, whether it
