
AGH UNIVERSITY OF SCIENCE AND TECHNOLOGY
KRAKÓW, POLAND
Faculty of Electrical Engineering, Automatics, Computer Science and Electronics

Ph.D. Thesis

Omer S.M. Jomah

Global Optimization Methods for Solving Multicriteria Optimal Control Problems

Supervisor: prof. dr hab. Andrzej M.J. Skulimowski

Kraków, 2012

ACKNOWLEDGEMENT

Although my name appears on the cover of this dissertation, many great people have contributed to its completion and production. I owe my sincere and deep gratitude to everyone who has made this dissertation possible and because of whom my study experience has been something I will cherish forever. My deepest gratitude goes, first of all, to my supervisor, Prof. Dr. hab. Andrzej Skulimowski. I have been extremely lucky to have him as a supervisor; he gave me the freedom to explore on my own and, at the same time, the guidance to recover when my steps faltered. Professor Skulimowski has taught me how to question thoughts and ideas and to express them easily and spontaneously. His incredible patience and support helped me overcome many obstacles and crises until I finished the dissertation. I hope that one day I will become the kind of supervisor to my students that Prof. Skulimowski has always been to me and to other students. I have benefited greatly from his thoughtful criticism, advice, mentoring and supervision. I would also like to express my gratitude to the committee members and reviewers for the valuable time and effort they devoted to guiding and commenting on the thesis. My work on this dissertation has involved so many people that it would be impossible to thank them all adequately. Foremost among those who helped me is Mr. Zawada, who introduced me to Matlab programming, patiently corrected my writing and substantially supported my research; without his help, advice and encouragement, this dissertation could never have been produced. I would also like to thank my parents, who always supported and encouraged me with their best wishes. My wife, who put up with and endured my continuous absences, both physical and psychological, for a number of years while I was studying in Poland, deserves thanks and appreciation beyond measure. She and my children spent three very long years during which I was busy and totally wrapped up in the sheets of my dissertation; she, with the little kids, proved willing to endure and showed great forbearance, tolerance and, above all, willingness to reap the fruits of my success. All of them, but particularly my wife, have contributed to this dissertation in ways some of which they know, and most of which they cannot know.

DEDICATION

To My Parents, with Love and Gratitude
To My Wife
To My Children Khaula, Hamza and Saleh

TABLE OF CONTENTS

ACKNOWLEDGEMENT .... 2
DEDICATION .... 3
1. Introduction .... 8
2. The Mathematical Background .... 11
   2.1. Vector, Normed and Metric Spaces .... 11
   2.2. Topological Spaces .... 14
   2.3. The Hausdorff Distance .... 16
   2.4. Compactness of a Set .... 17
   2.5. Continuity of a Function .... 18
   2.6. Convex Hull of a Set .... 18
   2.7. Separation Theorems .... 19
3. An Introduction to Optimization .... 21
   3.1. Introduction .... 21
   3.2. The Nelder-Mead Method .... 22
   3.3. Algorithm Modification .... 24
      3.3.1. Multiple Reflections .... 24
      3.3.2. Convergence Criteria .... 24
   3.4. Optimization of the Coefficients for a Set of Functions .... 25
   3.5. Applications .... 27
   3.6. Tests Applied .... 29
      3.6.1. Searching for the Minimum .... 29
      3.6.2. Coefficient Optimization .... 37
   3.7. The Library of the Test Functions .... 40
   3.8. Database Creation .... 40
   3.9. Database Schema .... 42
   3.10. Application .... 43
4. Approaches to Solve Global Optimization Problems Occurring in Control .... 45
   4.1. Local Search Methods .... 45
      4.1.1. Random Local Search .... 46
      4.1.2. Conjugate Gradient .... 47
      4.1.3. Stochastic Approximation .... 48
   4.2. Global Optimization .... 48
      4.2.1. Methods with Guaranteed Accuracy .... 49
      4.2.2. Indirect Methods .... 50
      4.2.3. Direct Methods .... 50
   4.3. Introduction to Evolutionary Algorithms .... 51
   4.4. Evolutionary Approach .... 52
      4.4.1. The Main Ideas of Evolutionary Computation .... 52
      4.4.2. Genetic Algorithms .... 54
      4.4.3. Selection .... 55
      4.4.4. Crossing .... 56
      4.4.5. Mutation .... 56
   4.5. Non-dominated Sorting Genetic Algorithm (NSGA II) .... 57
      4.5.1. Short Description of the Algorithm .... 57
   4.6. Genetic Operators .... 59
   4.7. Differential Evolution Algorithm (DE) .... 60
5. Multicriteria Optimization and Optimal Control .... 62
   5.1. The Formulation of the Multicriteria Optimization Problem .... 62
      5.1.1. An Overview of the General Methodology of Multicriteria Optimization .... 63
   5.2. Solutions to the Global Multicriteria Optimization Problems .... 65
      5.2.1. A Selection of Approaches to Solve Multicriteria Problems .... 65
      5.2.2. Scalarization Methods and Algorithms .... 69
   5.3. Preference Modeling and its Applications to Selecting Compromise Solutions .... 72
      5.3.1. The Basics of Utility and Value Theory .... 72
   5.4. Multicriteria Optimal Control .... 75
      5.4.1. The Formulation of the Multicriteria Optimal Control Problem for Difference and Ordinary Differential Equations .... 75
      5.4.2. The Aggregation of Time Preferences .... 77
      5.4.3. An Approach to Approximate the Pareto Set in Optimal Control Problems .... 77
      5.4.4. Multicriteria Trajectory Optimization .... 78
   5.5. Basic Approaches to Solving Linear Multicriteria Optimal Control Problems .... 78
6. A Generalization of the Nelder-Mead Algorithm for Discrete and Discretized Optimal Control Problems .... 80
   6.1. Introduction .... 80
      6.1.1. An Introduction to Topological-Algebraic Ideas to Formalize the Optimization Process .... 81
   6.2. An Algebraic Structure in the Family of Simplices .... 82
   6.3. The Nelder-Mead-type Algorithms for Solving Global Constrained and Combinatorial Optimization Problems .... 85
      6.3.1. Delaunay Triangulation [170] .... 87
   6.4. Hybrid Nelder-Mead Algorithm for Switching Point Optimization .... 96
   6.5. Multicriteria Population-based Extensions of the Nelder-Mead Algorithm .... 101
7. Optimization of the Water Supply System in Libya – MMR (Man-Made River) .... 103
   7.1. Libyan Water Supply System .... 103
      7.1.1. Background .... 103
   7.2. Water Supply System's Optimal Control in a Discrete Time Space .... 109
8. Final Discussion and Conclusions .... 114
INDEX OF FIGURES .... 118
INDEX OF TABLES .... 120
BIBLIOGRAPHY .... 121
APPENDIX A: Matlab codes .... 131

1. Introduction

The main goal of this dissertation is to prove the following three theses:

1. The common Nelder-Mead algorithm for nondifferentiable optimization can be generalized, using simplicial complex theory, to a cooperative system of global optimization processes that imitates an evolutionary procedure but has strict mathematical properties and yields the global minimum.

2. The idea of cooperative optimization processes derived from the Nelder-Mead algorithm can be further generalized, yielding a procedure for solving global multicriteria optimization problems that finds all local Pareto minima.

3. Multicriteria optimal control problems governed by several specified classes of ordinary differential equations (linear stationary, with separable variables, and nonlinear locally linearizable) can be solved using a sequential extension of the global multicriteria Nelder-Mead algorithm, applied to the original optimal control problem after a discretization of the control switching times.

The above goal has been achieved by presenting the underlying mathematical and computational background in Chapters 2 to 5, then, in Chapter 6, by elaborating new discrete and hybrid discrete-continuous optimization methods based on the Nelder-Mead scheme, and by applying them in Chapter 7 to a multicriteria optimal control problem related to water supply. Specifically, the dissertation is structured as follows.

In Chapter 2 we provide a review of the main concepts and facts constituting the mathematical background for the results presented in the subsequent chapters of this dissertation. We give an overview of the definitions and methods of functional analysis that will be used throughout the thesis, including basic topology, ordered spaces and separation theorems.

In Chapter 3 we introduce the classical Nelder-Mead method and find its optimal parameters using a set of test functions. This method, created by Nelder and Mead [91], is a simple and elegant way to determine the minimum of a function of many real variables. The algorithm's coefficients (reflection, expansion, contraction, shrinking) have been optimized for an arbitrary number of functions. With the parameter modifications proposed in Chapter 3 the algorithm becomes even faster and more accurate. In this formulation several further improvements simplifying the process of finding the minimum by the algorithm have been applied.

A simultaneous determination of the optima of two functions appears to be an interesting application: when operating with two criteria (two functions), the algorithm determines optimal points for both criteria. The results of the numerical tests are also shown. It is also worth considering the stopping conditions that can be used in the Nelder-Mead method. Most of them compare the values of the optimized criterion achieved at the points of the current simplex iteration and take into account the size of this simplex. Any inconsistency in applying a stopping condition of this kind may considerably affect the accuracy of the end results. Summing up, the modified Nelder-Mead algorithms presented in Chapter 3 should be taken into consideration when solving multicriteria optimization problems. The results of the tests prove that this method is efficient.

In Chapter 4 we first overview the methods of global optimization that can be used in solving global optimal control problems, which is the main goal of this dissertation. Following the results of Chapter 3, we pay special attention to the direct methods, specifically to the Nelder-Mead algorithm, which was selected as most suitable for solving discretized control problems. Then we propose a method to optimize the parameters of the Nelder-Mead algorithm by minimizing the number of steps necessary to reach a close-to-optimum point of a linear combination of a family of test functions. The parameters are optimized in IR^4, by the Nelder-Mead method itself.

In Chapter 5 we will present the differential evolution method, which shows some relationships to the Nelder-Mead method and may serve as a benchmark for the new algorithms based on the Nelder-Mead scheme introduced in this dissertation. Differential evolution (DE) is a method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality [137]. DE is used for multidimensional real-valued functions but, like the Nelder-Mead method, it does not use the gradient of the problem being optimized, which means that DE does not require the optimization problem to be differentiable, as is the case with classic optimization methods such as gradient descent and quasi-Newton methods. DE can therefore also be used on optimization problems that are not even continuous, are noisy, change over time, etc. Being a typical representative of evolutionary algorithms, DE optimizes a problem by maintaining a population of candidate solutions and creating new candidate solutions by combining existing ones according to a simple formula. It then keeps whichever candidate solution has the best score (fitness) on the optimization problem at hand. In this way the optimization problem is treated as a black box that merely provides a measure of quality for a given candidate solution; the gradient is therefore not needed.
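To make this scheme concrete, the following Matlab sketch implements a basic DE variant (DE/rand/1/bin) of the kind described above. The function name, the control parameters NP, F and CR, and the box constraints lo, hi are our illustrative assumptions, not the specific settings of [137]:

    function [best, fbest] = de_sketch(f, lo, hi, NP, F, CR, gens)
    % A rough sketch of differential evolution; f is an objective handle
    % called on a row vector; lo, hi bound the initial population.
    n = numel(lo);
    P = repmat(lo, NP, 1) + rand(NP, n).*repmat(hi - lo, NP, 1);
    fP = zeros(NP, 1);
    for i = 1:NP, fP(i) = f(P(i, :)); end
    for g = 1:gens
        for i = 1:NP
            r = randperm(NP); r(r == i) = []; r = r(1:3); % three other members
            v = P(r(1), :) + F*(P(r(2), :) - P(r(3), :)); % combine by a simple formula
            mask = rand(1, n) < CR; mask(ceil(n*rand)) = true;
            u = P(i, :); u(mask) = v(mask);               % binomial crossover
            fu = f(u);                                    % one black-box evaluation
            if fu <= fP(i), P(i, :) = u; fP(i) = fu; end  % keep the better candidate
        end
    end
    [fbest, k] = min(fP); best = P(k, :);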

Such methods are commonly known as metaheuristics, as they make few or no assumptions about the problem being optimized and can search very large spaces of candidate solutions. However, metaheuristics such as DE do not guarantee that an optimal, or even a near-optimal, solution is ever found.

In Chapter 6 the well-known Nelder-Mead algorithm for continuous non-differentiable optimization is modified in such a way that it can be used to solve discretized optimal control problems. Let us recall that it uses N+1 starting points in an N-dimensional decision space, which need only be affinely independent, i.e. form a nondegenerate simplex; hence the other name of the algorithm: the downhill simplex method. The method, originally proposed by Nelder and Mead in 1965 [91], has become very popular and has appeared in many variants since then, despite its convergence deficiencies. In the present thesis we study the combinatorial properties of the generalized optimization algorithm based on the Nelder-Mead ideas. Namely, following [126], we study a Nelder-Mead-type procedure starting from a simplicial complex S rather than a simplex, to allow for parallel processing of mutually communicating search processes. This procedure can also be applied as a combinatorial search on the discrete set of vertices of all simplices from S. We propose an algebraic structure that describes the action of this algorithm for a given function f to be minimized and prove some properties of the proposed class of algorithms.

In Chapter 7 the algorithms elaborated in Chapter 6 are applied to solving multicriteria optimal control problems related to drinking water supply in Libya. The final Chapter 8 contains conclusions.

2. The Mathematical Background

2.1. Vector, Normed and Metric Spaces

We start by presenting basic notions that will be used throughout this thesis.

Definition 2.1.
A set V is called a vector space (over the field IK) if the following conditions are satisfied:
1) V is an abelian group (written additively), i.e.
   a) x + y = y + x for all x, y ∈ V,
   b) there exists a unique vector (denoted by 0) such that x + 0 = x for each x ∈ V,
   c) to each x ∈ V there corresponds a unique vector -x such that (-x) + x = 0;
2) a scalar multiplication is defined: to every element x ∈ V and each α ∈ IK there corresponds an element of V, denoted by αx, such that
   a) α(x + y) = αx + αy for all x, y ∈ V,
   b) (α + β)x = αx + βx for all x ∈ V and α, β ∈ IK,
   c) α(βx) = (αβ)x for all x ∈ V and α, β ∈ IK,
   d) 1x = x for all x ∈ V.

The elements of a vector space are called vectors. In what follows, we consider vector spaces over IK = IR, i.e. the real number field.

Definition 2.2.
A linear transformation of a vector space V into a vector space W is a mapping Λ: V → W such that Λ(αx + βy) = αΛx + βΛy for all x, y ∈ V and all scalars α and β. Λ is called a linear functional if W is the field of scalars.

Now we give some examples of linear functionals, which are considered in further sections.

1) Let IR^n be the n-dimensional real space whose elements are denoted by x = (x_1, ..., x_n). Given a = (a_1, ..., a_n) ∈ IR^n, the function F defined by

   F(x) = \sum_{i=1}^{n} a_i x_i    (2.1)

is a linear functional on IR^n.

2) Let

   F(f) := \int_a^b f(x) p(x) \, dx,    (2.2)

where a and b are real numbers such that a < b, p is a continuous positive-valued function on the interval [a, b], and f is a real measurable function on [a, b]. Such an F is a linear functional on the space of all integrable real-valued functions on [a, b], denoted by L([a, b]).

Definition 2.3.
Let V be a vector space. A nonnegative real-valued function ‖.‖: V → IR is called a norm on V if the following conditions hold:
a) ‖f‖ = 0 if and only if f = 0,
b) ‖αf‖ = |α|‖f‖ for all f ∈ V and all scalars α, where |α| denotes the absolute value of α,
c) ‖f + g‖ ≤ ‖f‖ + ‖g‖ for all f, g ∈ V.

A pair (V, ‖.‖) is called a normed space. Note that if all the above conditions are satisfied except for a), then the function ‖.‖ is called a seminorm on V.

The concept of a norm leads in a natural way to another important notion, which formally lets us measure the distance between two arbitrary vectors. Suppose that X is a nonempty set (not necessarily a vector space).

Definition 2.4.
A nonnegative function ρ: X × X → IR is called a metric on X if it has the following properties:
a) ρ(x, y) = 0 if and only if x = y,
b) ρ(x, y) = ρ(y, x) for all x, y ∈ X,
c) ρ(x, y) ≤ ρ(x, z) + ρ(z, y) for all x, y, z ∈ X (the triangle inequality).

A pair (X, ρ) is called a metric space. Given a metric ρ on X, the number ρ(x, y) is often called the distance between x and y in the metric ρ. If (V, ‖.‖) is a normed space, then a metric on V is given by the formula ρ(x, y) = ‖x − y‖. Let us consider some useful examples of such spaces.

Examples.

a) The most common norm in IR^n (the Euclidean norm) is defined by

   \|x\| = \sqrt{\sum_{i=1}^{n} x_i^2} \quad \text{for each } x = (x_1, \ldots, x_n).

The metric induced by this norm is given by

   \rho(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}.

In the case n = 1, the above formula takes the simpler well-known form ρ(x, y) = |x − y|.

b) Let C[a, b] be the set of all continuous real functions defined on the interval [a, b] (the notion of continuity will be discussed later). A norm in C[a, b] can be established by ‖f‖ = max_{a≤t≤b} |f(t)|. Hence the distance between two functions f and g is measured as ρ(f, g) = max_{a≤t≤b} |g(t) − f(t)|.

c) For p ≥ 1, let ℒ^p[a, b] denote the space of all real-valued functions f on [a, b] such that |f|^p is integrable (i.e. the integral of |f|^p over [a, b] exists and is finite). A seminorm ‖.‖ in ℒ^p[a, b] is given by the formula

   \|f\| = \left( \int_a^b |f(t)|^p \, dt \right)^{1/p}.

Note that if the values of two functions, say f and g, differ in at least one point, we may still have the equality ‖f‖ = ‖g‖. This leads to the conclusion that condition a) from the definition of a norm does not hold. To avoid this problem, we introduce an equivalence relation in ℒ^p[a, b] identifying all functions which are equal almost everywhere (i.e. everywhere except on a set of measure zero, which may be thought of as a "thin" set). We write [f] for the equivalence class of a function f ∈ ℒ^p[a, b] associated with this relation. It can be shown that for f, g ∈ ℒ^p[a, b], the equality [f] = [g] is equivalent to f = g almost everywhere. Finally, in the collection of all equivalence classes (denoted by L^p[a, b]) we may define a norm ‖[f]‖ in the same way as the above seminorm.
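As a quick numerical illustration of the seminorm in c), the value of ‖f‖ for a concrete f can be approximated in Matlab by quadrature; the function, the interval and p below are arbitrary choices of ours:

    % Approximate the L^p seminorm of f(t) = t^2 - t on [0, 1] for p = 2.
    p = 2; a = 0; b = 1;
    f = @(t) t.^2 - t;
    lp = quad(@(t) abs(f(t)).^p, a, b)^(1/p);  % quad: adaptive Simpson quadrature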

For simplicity, we will write ‖f‖ instead of ‖[f]‖ (keeping in mind that this notation signifies that f is determined only up to equality almost everywhere). Consequently, the distance between two elements (in what follows also called functions) of L^p[a, b] can be defined as

   \rho(f, g) = \left( \int_a^b |g(t) - f(t)|^p \, dt \right)^{1/p}.

The spaces L^1[a, b] and L^2[a, b] are particularly useful for our purposes. Remark that the normed spaces in b) and c) are referred to as function spaces, i.e. spaces whose elements are functions, regarded as single points of the space. Note also that the functions belonging to the L^p spaces described in c) are not necessarily continuous. We close this subsection with the following definitions.

Definition 2.5.
For x ∈ X and r > 0, we denote by K(x, r) the set of all elements y ∈ X such that ρ(x, y) < r and call it an open ball with the center at x and the radius r.

Definition 2.6.
We say that a subset B of a metric space (X, ρ) is bounded if there exist x ∈ X and r > 0 such that B ⊂ K(x, r).

2.2. Topological Spaces

A major role in the subsequent applications is played by both functionals and continuous mappings. In order to define the notion of continuity, one first has to introduce a topology. A topological space is a more general concept than a normed space or a metric space.

Definition 2.7.
A collection τ of subsets of a nonempty set X is called a topology on X if the following conditions hold:
a) ∅ ∈ τ and X ∈ τ,
b) if {U_i}_{i∈I} is a family of sets from τ, then ⋃_{i∈I} U_i ∈ τ,
c) if U, V ∈ τ, then U ∩ V ∈ τ.

An ordered pair (X, τ) consisting of a set X and a topology τ on X is called a topological space. Note that, by induction, the above condition c) can be equivalently written as
c') if U_1, ..., U_n ∈ τ, then U_1 ∩ U_2 ∩ ... ∩ U_n ∈ τ.

A subset U of X is called an open set if U ∈ τ. We say that a set U is a neighbourhood of an element x ∈ X if x ∈ U and U is an open set. A subset F of X is called a closed set if its complement in X is an open set. Therefore, the sets ∅ and X are closed, arbitrary intersections of closed sets are closed, and finite unions of closed sets are also closed.

An example of a topological space, which we shall frequently encounter, is the extended real line [-∞, ∞]. Its topology is defined by declaring the following sets to be open: (a, b), [-∞, a), (a, ∞], and any union of segments of this type.

When defining a topology on X, we do not have to be able to measure the distance between two arbitrary points of X. However, one of the most important and frequently used ways of imposing a topology is to define it by a metric (in case the latter is given on the considered set). More precisely, if (X, ρ) is a metric space, then the topology τ on X which is said to be induced by the metric ρ is defined as follows:

   U ∈ τ ⇔ ∀x ∈ U ∃r_x > 0: K(x, r_x) ⊂ U.

In other words, a set U is open in X if for every element x of U we can find an open ball with the center at x which is contained in U. It is worth noting that, in view of the definition of τ, each open ball in X is an open set. An example of such a topology, considered in this dissertation, is the so-called natural topology in IR^n, induced by the Euclidean metric.

Finally, let us recall three fundamental notions that frequently prove useful in formulating the mathematical background of optimal control. Let A be a subset of a topological space X. The smallest closed set in X containing A is called the closure of A. The largest open set contained in A is called the interior of A. The boundary of A is the intersection of the closures of A and X\A. We denote the closure, the interior and the boundary of A by cl(A), int(A) and ∂A, respectively.

2.3. The Hausdorff Distance

By a multifunction we mean a function from a topological space X into the family P(Y) of all subsets of another topological space Y. Closed-valued multifunctions with values included in a metric space Y play an important role in optimization and optimal control. The family of all closed subsets of Y will be denoted by Cl(Y) and endowed with the metric called the Hausdorff distance in Cl(Y), which is defined below (we utilize this notion in Chapter 5).

Definition 2.8.
Let X and Y be two non-empty subsets of a metric space (M, d). We define their Hausdorff distance d_H(X, Y) by

   d_H(X, Y) = \max \left\{ \sup_{x \in X} \inf_{y \in Y} d(x, y), \; \sup_{y \in Y} \inf_{x \in X} d(x, y) \right\}.    (2.3)

Figure 2.1. The Hausdorff distance of sets X and Y

It can be shown that d_H(X, Y) = inf{ε > 0: X ⊂ Y_ε and Y ⊂ X_ε}, where X_ε = ⋃_{x∈X} {z ∈ M : d(z, x) ≤ ε} (the set X_ε is sometimes called a generalized ball of radius ε around X). Informally, we can say that two sets are close in the Hausdorff distance if every point of either set is close to some point of the other set.

The Hausdorff distance is a common measure of convergence of vector optimization algorithms that aim at an approximation of the set of nondominated points (cf. Chapters 3 and 7). It is well known [126] that the subset of weakly nondominated points of a continuous multifunction is continuous in the Hausdorff distance, but the set of nondominated points is not. This fact must be considered when designing vector optimization algorithms.
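For finite point sets, Definition 2.8 can be transcribed into Matlab directly; in the following sketch (the function name is ours) the rows of X and Y are treated as points of IR^n with the Euclidean metric:

    function dH = hausdorff_dist(X, Y)
    % Hausdorff distance (2.3) between the finite sets given by the rows
    % of X (m-by-n) and Y (k-by-n).
    m = size(X, 1); k = size(Y, 1);
    D = zeros(m, k);
    for i = 1:m
        for j = 1:k
            D(i, j) = norm(X(i, :) - Y(j, :));  % d(x_i, y_j)
        end
    end
    % max of (sup_x inf_y) and (sup_y inf_x), as in (2.3):
    dH = max(max(min(D, [], 2)), max(min(D, [], 1)));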

2.4. Compactness of a Set

Definition 2.9.
A collection C of subsets of X is said to be a covering of X if the union of the elements of C is equal to X (in this case we say that C covers X). If the elements of C are open sets, then C is called an open covering of X.

Definition 2.10.
We say that a subset K of a topological space X is compact if every open covering of K contains a finite subcollection that also covers K. If X is itself compact, then we say that X is a compact space.

Remark that IR^n equipped with the natural topology is not a compact space, because no finite subcovering can be chosen, for example, from the family of balls K(0, n), where n ∈ IN. However, every point of IR^n has a neighbourhood whose closure is compact. A topological space having this property is called a locally compact space.

In general, it takes some effort to decide whether a given topological space is compact or not. In the case X = IR^n with the topology induced by the Euclidean metric, the compactness of a set can be equivalently described by the following elegant condition: a subset K of IR^n is compact if and only if it is closed and bounded. Since the definition of boundedness involves the notion of a distance, this useful characterization of compactness does not make sense in general (i.e. not necessarily metric) topological spaces.

2.5. Continuity of a Function

Definition 2.11.
Let (X, τ_X) and (Y, τ_Y) be topological spaces. A function f: X → Y is said to be continuous if for each open subset V of Y, the preimage of V under f is an open subset of X.

Theorem 2.1.
For f: X → Y, the following conditions are equivalent:
a) f is continuous,
b) for each x ∈ X and each neighbourhood V of f(x), there exists a neighbourhood U of x such that f(U) ⊂ V.

If the condition in b) holds for a point x of X, then we say that f is continuous at x. The continuity of a function can be expressed in a more convenient way if the topologies on X and Y are induced by some metrics. Namely, if (X, ρ) and (Y, d) are metric spaces, then the continuity of f: X → Y at the point x ∈ X is equivalent to the requirement that, given ε > 0, there exists δ > 0 such that for all y ∈ X we have

   ρ(x, y) < δ ⇒ d(f(x), f(y)) < ε.

We are now in a position to formulate the Weierstrass theorem, which plays an important role in mathematical analysis.

Theorem 2.2.
Let (X, τ) be a compact topological space. A real-valued continuous function on a nonempty compact set contained in X attains its maximum and minimum, each at least once.

The above result is posed in a very general setting. In particular, it holds true for differentiable functions (as well as for those of higher regularity), also considered in this thesis, defined on compact subsets of IR^n.

2.6. Convex Hull of a Set

Definition 2.12.
Let V be a vector space and X an arbitrary subset of V. X is said to be convex if and only if [x, y] ⊂ X for all x, y ∈ X, where [x, y] stands for the segment

   {tx + (1 − t)y : t ∈ [0, 1]}.

Definition 2.13.
A convex hull (or convex envelope) of a set of points X in a real vector space V is the minimal convex set containing X. In two-dimensional spaces the convex hull can be represented by the sequence of the vertices of the line segments forming the boundary of the polygon, ordered along that boundary [128].

To verify that the convex hull of a set X exists, notice that X is contained in at least one convex set (the whole space V, for example), and any intersection of convex sets containing X is also a convex set containing X. It is then clear that the convex hull is the intersection of all convex sets containing X. This can be used as an alternative definition of the convex hull. Algebraically, the convex hull H_convex(X) of X can be characterized as follows:

   H_{convex}(X) = \left\{ \sum_{i=1}^{k} \alpha_i x_i \;:\; k \in IN, \; x_i \in X, \; \alpha_i \in IR, \; \alpha_i \geq 0, \; \sum_{i=1}^{k} \alpha_i = 1 \right\}.

2.7. Separation Theorems

In this section we give a brief overview of so-called separation results, which can be formulated either in algebraic or in geometric form. For the purposes of this thesis, the latter is more convenient.

Definition 2.14.
Let V be a real vector space. A real-valued function f on V is called sublinear if
f(γx) = γf(x) for all γ ≥ 0 and x ∈ V (positive homogeneity),
f(x + y) ≤ f(x) + f(y) for all x, y ∈ V (subadditivity).

Note that every seminorm on V (and so every norm on V) is sublinear. First, let us recall a classical result of functional analysis obtained independently by H. Hahn and S. Banach.

Theorem 2.3.
Let V be a real vector space and p: V → IR a sublinear function. Suppose that λ: Y → IR is a linear functional defined on a linear subspace Y of V which satisfies λ(x) ≤ p(x) for all x ∈ Y.

Then there exists a linear functional Λ: V → IR such that Λ(x) ≤ p(x) for all x ∈ V, and Λ(x) = λ(x) for all x ∈ Y.

Remark that the above Hahn-Banach theorem can be extended to the case of a complex vector space. From its original form one can also derive its geometric version, formulated below, called the separation theorem.

Definition 2.15.
If λ is a real-valued linear functional defined on a vector space X and a ∈ IR, then the set {x ∈ X: λ(x) = a} is called a hyperplane.

Definition 2.16.
We say that two sets A and B are separated by a hyperplane if there exist a continuous real-valued linear functional λ and a ∈ IR such that λ(x) ≤ a for all x ∈ A and λ(x) ≥ a for all x ∈ B. If λ(x) < a for all x ∈ A and λ(x) > a for all x ∈ B, then A and B are strictly separated by a hyperplane.

Theorem 2.4.
Suppose that A and B are disjoint convex sets in a locally convex vector space X. Then the following conditions hold:
a) If A or B is open, then they can be separated by a hyperplane.
b) If A and B are both open, or A is compact and B is closed, then they can be strictly separated by a hyperplane.

Separation theorems are often used to prove sufficient optimality conditions, in both the single- and the multiple-criteria case.

3. An Introduction to Optimization

3.1. Introduction

In this chapter we will provide the basic notions of optimization in n-dimensional real spaces (multiple variable optimization) and of multicriteria optimization, and introduce the Nelder-Mead algorithm, also called the 'downhill simplex' or 'amoeba' method.

The methods of finding function extrema can be divided into stochastic and deterministic ones.

Stochastic Methods
The characteristic feature of these methods is the use of random mechanisms for finding the extrema of the target function. Stochastic methods include, e.g., simulated annealing and evolutionary computation.

Deterministic Methods
Among the deterministic methods we can distinguish gradient and non-gradient methods. The gradient methods require knowledge of the gradient, i.e. the first derivative, whereas in the non-gradient methods the gradient need not be known. The latter are often called direct search methods.

When can the non-gradient methods be employed? When the function is not differentiable but is continuous in the Lipschitz sense, the so-called direct search, or non-gradient, methods can be applied. They can be used to search along specific directions; some of the methods search for the most promising directions. Recall that if (X, ρ) and (Y, d) are metric spaces, we say that a function f: X → Y fulfils the Lipschitz condition if there exists a constant L > 0 such that

   d(f(x_1), f(x_2)) ≤ L ρ(x_1, x_2)

for all x_1, x_2 ∈ X. A function fulfilling the Lipschitz condition is uniformly continuous.

Examples of optimization algorithms using the non-gradient methods:
1. Nelder-Mead Method [91],
2. Hooke-Jeeves Method [68], also called Pattern Search,
3. Rosenbrock Method [87],
4. Gauss-Seidel Method [96],
5. Powell's Method [48],
6. Zangwill's Method [10].
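It is worth noting that Matlab's built-in fminsearch implements the Nelder-Mead method discussed in the next section, so the algorithm can be tried out immediately; minimizing Rosenbrock's well-known banana function (our choice of example) takes two lines:

    % Minimize Rosenbrock's function starting from the point (-1.2, 1);
    % the iterates approach the global minimum f(1, 1) = 0.
    banana = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;
    [xmin, fmin] = fminsearch(banana, [-1.2, 1])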

3.2. The Nelder-Mead Method

In this subsection we present the basics of the Nelder-Mead algorithm, which belongs to the class of direct search methods. It maintains a set of temporary points that form a simplex in the decision space. A simplex is a figure that has one more vertex than the dimension of the domain of the target function, and its vertices are affinely independent. The algorithm was created in 1965 by Nelder and Mead and is also known as the downhill simplex method. More on the Nelder-Mead algorithm is contained in Chapter 6; here we shall concentrate our attention on optimizing the method's parameters. The method works well for nonlinear functions but demands a great amount of numerical work, especially if there is a great number of decision variables. That is why it is recommended to use this algorithm for target functions with no more than 10 dimensions.

A one-dimensional simplex is a segment with two vertices, a two-dimensional simplex is a triangle, and generally an n-dimensional simplex with n+1 vertices is the set of all points specified by the vectors

   x = \sum_{j=1}^{n+1} x_j S_j, \quad \text{where} \quad \sum_{j=1}^{n+1} x_j = 1 \;\text{and}\; x_j \geq 0,    (3.1)

so this is a regular polyhedron with n+1 vertices coinciding with the n+1 basis vectors S_j. The coordinates of a point of the simplex are denoted by x_j [90].

The basic operations:

• Reflection of the point P_h through S_r:
   P* = (1 + a)S_r − aP_h

• Expansion of the point P* through S_r:
   P** = (1 + c)P* − cS_r

• Contraction of the point P_h towards S_r:
   P*** = bP_h + (1 − b)S_r

• Shrinking of the simplex towards P_L:
   P_j^{new} = P_L + d(P_j − P_L), j = 1, ..., n+1

  (with d = 0.5 this is the reduction used in step 12 below).

Notation:
• a – reflection coefficient (assumed 1),
• c – expansion coefficient (assumed 2),
• b – contraction coefficient (assumed 0.5),
• d – shrinking coefficient (assumed 0.5),
• P_L – the vertex of the simplex, among the n+1 vertices, at which the function attains its minimum,
• P_h – the vertex of the simplex, among the n+1 vertices, at which the function attains its maximum,
• S_r – the centre of symmetry of the simplex, not including the point P_h, defined as

   S_r = \frac{1}{n} \sum_{j=1,\, j \neq h}^{n+1} P_j,    (3.2)

• N – the iteration number.

Steps of the basic algorithm:
1. Calculate the values of the target function at the vertex points of the simplex, F_j = f(P_j) for j = 1, ..., n+1.
2. Determine h and L such that f(P_h) = max and f(P_L) = min over the set {F_j}.
3. Calculate the centre of symmetry S_r of the simplex.
4. Reflect the point P_h through S_r, obtaining P*.
5. Calculate the function values F_S = f(S_r) and F_0 = f(P*). If F_0 < min, then:
6. Calculate P** (expansion) and the function value F_e = f(P**).
7. If F_e < max, substitute P_h = P**, otherwise P_h = P*.
8. Repeat from the first step of the algorithm if the criterion for the minimum is not fulfilled.

If F_0 > min, then:
9. If F_0 ≥ f(P_j) for j = 1, ..., n+1 (not including j = h) and F_0 ≥ max, go to the next step; if F_0 < max, substitute P_h = P*.
10. Contract the point P_h towards S_r, obtaining P***.
11. Calculate F_k = f(P***).
12. If F_k ≥ max, reduce the simplex according to the formula P_j = 0.5(P_j + P_L) for j = 1, ..., n+1.
13. Whereas if F_k < max, substitute P_h = P*** and continue with step 11.
14. If F_0 < f(P_j) for j = 1, ..., n+1 (not including j = h), substitute P_h = P*.
15. Repeat from the first step of the procedure if the stop criterion is not fulfilled.

3.3. Algorithm Modification

3.3.1. Multiple Reflections

The reflection step of the algorithm, i.e. the reflection of P_h through S_r to obtain P*, is modified as follows. The reflection is made for two points, P_h1 and P_h2, through their respective symmetry centres S_r1 and S_r2, where P_h1 and P_h2 are vertices satisfying f(P_L) < f(P_h1) and f(P_L) < f(P_h2). Next, F_h1 = f(P_h1) and F_h2 = f(P_h2) are calculated. If F_h1 < F_h2, then the point P_h1 is kept for the next step, otherwise the point P_h2.

3.3.2. Convergence Criteria

In the original version of the algorithm the simplex is transformed as long as the distances between its vertices, close to the searched extremum, are larger than the assumed accuracy of calculation e > 0.

In the modified version various stop criteria are available (in each, e > 0 is the given accuracy):
• d ≤ e, where d is the sum of the lengths of all simplex edges,
• F_h − F_L ≤ e,
• |P_h^k − P_L^k| ≤ e, where k = 1, ..., N,
• |P_h^k − P_L^{k−1}| ≤ e, where k = 1, ..., N.
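The four basic operations of Section 3.2 translate directly into Matlab; in the sketch below (the naming is ours) the vertices are row vectors, Ph is the worst vertex, PL the best one, and Sr the centroid (3.2) of the vertices excluding Ph:

    % Basic simplex operations with the coefficients a, c, b, d as above.
    reflect  = @(Ph, Sr, a) (1 + a)*Sr - a*Ph;        % P*    (reflection)
    expand   = @(Pstar, Sr, c) (1 + c)*Pstar - c*Sr;  % P**   (expansion)
    contract = @(Ph, Sr, b) b*Ph + (1 - b)*Sr;        % P***  (contraction)
    shrink   = @(Pj, PL, d) PL + d*(Pj - PL);         % applied to every vertex;
    % with d = 0.5 this reproduces the reduction P_j = 0.5*(P_j + P_L) of step 12.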

3.4. Optimization of the Coefficients for a Set of Functions

In order to optimize the algorithm coefficients for a set of functions, the following steps should be taken:
• Create a table (cell array) with the handles of the functions.
• Create the initial simplex with which the optimization of the functions will be started.
• Create the simplex of the algorithm coefficients; there are four coefficients: reflection, expansion, contraction and shrinking.
• Run the function optimizing the coefficients, with the variables created in the previous steps as parameters.

Example. The coefficients will be optimized for four functions: ackley, beale, bh1, booth. These are functions of two variables; that is why the initial simplex with which the function values will be optimized needs three vertices, each of them with two coordinates. The algorithm uses four coefficients; that is why the coefficient simplex needs five vertices, each of them specified by four coordinates.

Definitions [169]:

Ackley function:
   f(x) = 20 + e - 20 e^{-\frac{1}{5}\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}} - e^{\frac{1}{n}\sum_{i=1}^{n} \cos 2\pi x_i}.

Beale function:
   f(x) = (1.5 - x_1 + x_1 x_2)^2 + (2.25 - x_1 + x_1 x_2^2)^2 + (2.625 - x_1 + x_1 x_2^3)^2.

Bohachevsky functions:
   f_1(x) = x_1^2 + 2x_2^2 - 0.3\cos(3\pi x_1) - 0.4\cos(4\pi x_2) + 0.7,
   f_2(x) = x_1^2 + 2x_2^2 - 0.3\cos(3\pi x_1)\cos(4\pi x_2) + 0.3,
   f_3(x) = x_1^2 + 2x_2^2 - 0.3\cos(3\pi x_1 + 4\pi x_2) + 0.3.

Booth function:
   f(x) = (x_1 + 2x_2 - 7)^2 + (2x_1 + x_2 - 5)^2.

Griewank function:
   f(x) = \sum_{i=1}^{n} \frac{x_i^2}{4000} - \prod_{i=1}^{n} \cos\left(\frac{x_i}{\sqrt{i}}\right) + 1.

Hump function:
   f(x) = 4x_1^2 - 2.1x_1^4 + \frac{1}{3}x_1^6 + x_1 x_2 - 4x_2^2 + 4x_2^4.

Rastrigin function:
   f(x) = 10n + \sum_{i=1}^{n} \left(x_i^2 - 10\cos(2\pi x_i)\right).

• Creation of a table with the handles of the functions:

  » functions = {@ackley, @beale, @bohachevsky, @booth}
  functions =
      @ackley    @beale    @bohachevsky    @booth

• Creation of the initial simplex with which the optimization of the functions will be started:

  » simplex_function = [0,0; 0,1; 1,0]
  simplex_function =
      0 0
      0 1
      1 0

• Creation of the simplex with the algorithm coefficients (four coefficients: reflection, expansion, contraction and shrinking):

  » simplex_coefficients = [ 1, 2, 0.5, 0.5;
                             0.8, 1.6, 0.4, 0.4;
                             1.2, 2, 0.6, 0.5;
                             1, 2.2, 0.5, 0.7;
                             1, 2, 0.3, 0.3 ]
  simplex_coefficients =
      1.0000 2.0000 0.5000 0.5000
      0.8000 1.6000 0.4000 0.4000
      1.2000 2.0000 0.6000 0.5000
      1.0000 2.2000 0.5000 0.7000
      1.0000 2.0000 0.3000 0.3000

• Running the function optimizing the coefficients, with the variables created in the previous steps as parameters:

  » nm_opt_cof(functions, simplex_function, simplex_coefficients)
  ans = 0.9500 500 0.3250 0.4750

3.5. Applications

We have developed a procedure aimed at finding the optimal point while considering the values of the two described functions. The functions introduced should be recognizable from the Matlab command line and should take arguments in the form of an array.

Optimization of two functions
1. Enter the first function by its name and argument, e.g. fun1(x).
2. Enter the second function by its name and argument, e.g. fun2(x).
3. Click the "optimize" button.
4. The optimal point for the two given functions will appear as the "optimal point".

Optionally:
2a. Set up the initial simplex.
2b. Set up the coefficients of the algorithm, or order the automatic optimization of the coefficients by clicking "optimize the coefficients".

Drawing graphs of the functions
1. Optimize the functions (as above).
2. Click "Draw functions".

Optionally:
2a. To be able to rotate the graphs, click "switch rotation on".
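The session above assumes that the test functions are available on the Matlab path as function files. Minimal implementations of two of them, consistent with the definitions of Section 3.4 (the vectorization details are our choices), could read:

    % File ackley.m:
    function y = ackley(x)
    n = numel(x);
    y = 20 + exp(1) - 20*exp(-0.2*sqrt(sum(x.^2)/n)) - exp(sum(cos(2*pi*x))/n);

    % File booth.m:
    function y = booth(x)
    y = (x(1) + 2*x(2) - 7)^2 + (2*x(1) + x(2) - 5)^2;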

The main window of this application is shown in Figure 3.1. After clicking "optimize coefficients for many functions" the program opens a new window, see Figure 3.2. In this window the program allows for the automatic optimization of the Nelder-Mead algorithm coefficients for any number of functions. The "functions" field should contain function handles (the function name following the @ sign), separated by commas.

Figure 3.1. The main window of the program

Optimization of the coefficients
1. Introduce the function handles in the "functions" field, e.g. "@ackley, @beale".

2. Type in the initial simplex which will be used for the optimization of the described functions, e.g. [0,0; 0,1; 1,0].
3. Introduce the simplex of the coefficients, in the order: reflection, expansion, contraction, shrinking.
4. Click "optimize" and wait until the algorithm finishes its work.
5. In the "optimal coefficients" fields the optimal coefficients will appear, i.e. those for which the optimization of the described functions is most efficient.

3.6. Tests Applied

The tests described below were applied to different sets of functions. They concern finding the minimum as well as optimizing the algorithm coefficients.

Figure 3.2. Window responsible for the optimization of the coefficients for many functions

3.6.1. Searching for the Minimum

Notation:

min1 – the minimum found when both functions are optimized jointly,
min2 – the minimum found when each function is optimized separately.

Test for the Ackley and Booth functions

Ackley function:
• number of variables: n
• global minimum: xmin = (0, ..., 0), f(xmin) = 0

Booth function:
• number of variables: 2
• global minimum: xmin = (1, 3), f(xmin) = 0

Test 1
Initial simplex: (3,3; 3,6; 6,3)
Results: the simplex movement is shown in Fig. 3.3.
Optimal parameters: (0, 3)

Table 3.1. Ackley and Booth minima (test 1)
          Ackley           Booth
  min1    6.9142           5
  min2    8.881784e-016    6.198883e-006

Figure 3.3. Simplex on the function contour (Ackley on the left, Booth on the right)

Test 2
Initial simplex: (0,0; 2,4; 4,2)
Results: the simplex movement is shown in Fig. 3.4.
Optimal parameters: (1, 2)

Table 3.2. Ackley and Booth minima (test 2)
          Ackley           Booth
  min1    5.42213          5
  min2    8.881784e-016    8.082975e-007

Figure 3.4. Simplex on the function contour (Ackley on the left, Booth on the right)

Test 3
Initial simplex: (-1,1; -2,3; -4,3)
Results: the simplex movement is shown in Fig. 3.5.
Optimal parameters: (1, 1)

Table 3.3. Ackley and Booth minima (test 3)
          Ackley           Booth
  min1    3.62538          20
  min2    2.579934e+000    1.022467e-005

Figure 3.5. Simplex on the function contour (Ackley on the left, Booth on the right)

Test for the Hump and Beale functions

Hump function:
• number of variables: 2
• global minima: xmin = (0.0898, -0.7126) and (-0.0898, 0.7126), f(xmin) = 0

Beale function:
• number of variables: 2
• global minimum: xmin = (3, 0.5), f(xmin) = 0

Test 1
Initial simplex: (3,3; 3,6; 6,3)
Results: the simplex movement is shown in Fig. 3.6.
Optimal parameters: (-1.84909, 1.37908)

Table 3.4. Hump and Beale minima (test 1)
          Hump             Beale
  min1    7.79282          1.11882
  min2    5.734565e-007    6.100249e-006

Figure 3.6. Simplex on the function contour (Hump on the left, Beale on the right)

Test 2
Initial simplex: (0,0; 2,4; 4,2)
Results: the simplex movement is shown in Fig. 3.7.
Optimal parameters: (0.296875, 0.546875)

Table 3.5. Hump and Beale minima (test 2)
          Hump             Beale
  min1    0.691925         11.6825
  min2    8.161705e-001    1.420313e+001

Test 3
Initial simplex: (-1,1; -2,3; -4,3)
Results: the simplex movement is shown in Fig. 3.8.
Optimal parameters: (0.25, 0.75)

Table 3.6. Hump and Beale minima (test 3)
          Hump             Beale
  min1    0.476632         12.8014
  min2    2.974622e-006    1.320323e+001

Figure 3.7. Simplex on the function contour (Hump on the left, Beale on the right)

Test for the Rastrigin and Griewank functions

Rastrigin function:
• number of variables: n
• global minimum: xmin = (0, ..., 0), f(xmin) = 0

Griewank function:
• number of variables: n
• global minimum: xmin = (0, ..., 0), f(xmin) = 0

Test 1
Initial simplex: (3,3; 3,6; 6,3)
Results: the simplex movement is shown in Fig. 3.9.
Optimal parameters: (3.1875, 3.09375)

Table 3.7. Rastrigin and Griewank minima (test 1)
          Rastrigin        Griewank
  min1    10.2344          0.427102
  min2    0                2.288969e-002

Figure 3.8. Simplex on the function contour (Hump on the left, Beale on the right)

Test 2
Initial simplex: (0,0; 2,4; 4,2)
Results: the simplex movement is shown in Fig. 3.10.
Optimal parameters: (0, 0)

Table 3.8. Rastrigin and Griewank minima (test 2)
          Rastrigin        Griewank
  min1    0                0
  min2    0                0

Test 3
Initial simplex: (-1,1; -2,3; -4,3)
Results: the simplex movement is shown in Fig. 3.11.
Optimal parameters: (0.000224113, 0.992478)

Table 3.9. Rastrigin and Griewank minima (test 3)
          Rastrigin        Griewank
  min1    0.996189         0.236557
  min2    9.94621e-001     9.94621e-001

Figure 3.9. Simplex on the function contour (Rastrigin on the left, Griewank on the right)

3.6.2. Coefficient Optimization

Test 1
Test for the Ackley, Beale, Hump and Booth functions
Initial simplex: (0,0; 0,1; 1,0)
Simplex of coefficients (reflection, expansion, contraction, shrinking):
(1, 2, 0.5, 0.5; 0.8, 1.6, 0.4, 0.4; 1.2, 2, 0.6, 0.5; 1, 2.2, 0.5, 0.7; 1, 2, 0.3, 0.3)

Optimal coefficients:

Table 3.10. Optimal coefficients (test 1)
  reflection    expansion    contraction    shrinking
  0.8626        1.9813       0.2625         0.39638

Test 2
Test for the Ackley, Beale, Hump and Booth functions
Initial simplex: (0,0; 0,1; 1,0)
Simplex of coefficients (reflection, expansion, contraction, shrinking):
(1.2, -2.3, 1.5, 3.5; 0.2, -1.6, 1, 2.4; -1.2, 1, 1.6, 3.5; -1, 1.2, 0.1, 3.7; -1.3, 2.3, 3.3, -0.3)

Figure 3.10. Simplex on the function contour (Rastrigin on the left, Griewank on the right)

Optimal coefficients:

Table 3.11. Optimal coefficients (test 2)
  reflection    expansion    contraction    shrinking
  1.6906        -1.7203      -6.125         10.7078

Test 3
Test for the Rastrigin, Griewank, Hump and Booth functions
Initial simplex: (0,0; 0,1; 1,0)
Simplex of coefficients (reflection, expansion, contraction, shrinking):
(1, 2, 0.5, 0.5; 0.8, 1.6, 0.4, 0.4; 1.2, 2, 0.6, 0.5; 1, 2.2, 0.5, 0.7; 1, 2, 0.3, 0.3)

Optimal coefficients:

Table 3.12. Optimal coefficients (test 3)
  reflection    expansion    contraction    shrinking
  0.9           1.55         0.3            0.125

Figure 3.11. Simplex on the function contour (Rastrigin on the left, Griewank on the right)

Test 4
Test for the Rastrigin, Griewank, Hump and Booth functions
Initial simplex: (0,0; 0,1; 1,0)
Simplex of coefficients (reflection, expansion, contraction, shrinking):
(1.2, -2.3, 1.5, 3.5; 0.2, -1.6, 1, 2.4; -1.2, 1, 1.6, 3.5; -1, 1.2, 0.1, 3.; -1.3, 2.3, 3.3, -0.3)

Optimal coefficients:

Table 3.13. Optimal coefficients (test 4)
  reflection    expansion    contraction    shrinking
  1.3125        -2.3063      -0.15          6.9062

3.7. The Library of the Test Functions

The program for the management of the test functions was created as a part of the multicriteria optimization project. The application was implemented in Matlab 2008b with the Database Toolbox.

Connecting to the database: adding the JDBC driver

For the connection with the database a JDBC driver is used. It is not active in Matlab by default, so its location must be specified; follow the instructions at http://www.mathworks.com/access/helpdesk/help/toolbox/database/gs/braiey2-1.html#braiey2-24. In this project the JDBC driver should be available after the following steps:
1. Open the classpath.txt file, which is located in the directory <matlabdir>/toolbox/local/.
2. At the end of the file paste the path to the JDBC driver, e.g. C:/java/mysql-connector-java-5.1.12-bin.jar. This driver is attached to the project; you can also download it from the internet.
3. Restart Matlab.

3.8. Database Creation

The test functions module uses a database whose structure is defined in the tf_db.sql file.
1. The MySQL database should be created by running the script from the tf_db.sql file. This can easily be done with the help of GUI tools for MySQL database management, e.g. MySQL Workbench or the MySQL Query Browser.

Configuration of the connection

For the test function management module to communicate correctly with the database, adequate connection parameters should be set:
• Open the tf_database.m file.
• Find the text lines with %% CONFIGURATION.
• Set the values of the variables containing the connection configuration:
  – databaseName – the name of the database which was created from the script tf_db.sql,
  – userName – the name of the user who can connect to the above-mentioned database,
  – userpassword – the password for that user.
Optionally:
  – host – the name / IP address of the server where the database is available.

Testing the connection

If everything was configured correctly, you can activate the test function management module by starting tf_gui_main.m. The easiest way to test whether the program can connect to the database is to issue the command tf_database. If everything is set properly, no error will be displayed. Otherwise an error message will appear, e.g.:

??? Error using ==> tf_database>tf_database.tf_database at 42
Access denied for user 'matlaba'@'localhost' (using password: YES)

in this case informing about a wrong username.
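For reference, the connection opened inside tf_database.m presumably boils down to a single Database Toolbox call of the following form; the variable names match the configuration settings above, while the URL layout is our assumption, not the project's verified code:

    % Open a JDBC connection to the MySQL database (Database Toolbox).
    conn = database(databaseName, userName, userpassword, ...
                    'com.mysql.jdbc.Driver', ...
                    ['jdbc:mysql://' host '/' databaseName]);
    if ~isconnection(conn)
        error('Connection failed: %s', conn.Message);
    end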

3.9. Database Schema

Figure 3.12. Database schema

3.10. Application

Chosen application windows are presented below.

Figure 3.13. The main window of the program

Figure 3.14. Window for editing the function categories

Figure 3.15. Window for editing the bibliography

4. Approaches to Solve Global Optimization Problems Occurring in Control

4.1. Local Search Methods

Let us begin by providing an overview of local optimization and by describing several local search algorithms. Next, we discuss the global optimization problem and review standard methods of global optimization; this review highlights the use of local search within these methods. The methods of local search have been broadly used in both theoretical computer science and numerical optimization. A classification of global optimization and local search methods is presented in Table 4.1.

Table 4.1. A classification of global optimization and local search methods

  Method                    | Class of methods      | Applicability
  Random Local Search       | Local search methods  | Local search on smooth functions without derivative information.
  Stochastic Approximation  | Local search methods  | Pattern recognition problems; makes changes to the current solution using randomly selected samples.
  Conjugate Gradient        | Local search methods  | Functions with narrow "valleys", for which the steepest descent method is inefficient.

A classification of global optimization methods must also take into account the presence of constraints that restrict the domain of the search [39]. In this dissertation we will both modify methods for unconstrained optimization so as to solve constrained problems, and use methods developed specifically for constrained problems.

This chapter focuses on local search methods used for minimizing continuous functions on compact sets. In the previous chapter we outlined several non-gradient methods. The other group of methods, the "gradient methods", is based on the use of derivative information such as gradients, f'(x), and Hessians, f''(x). Differentiable optimization algorithms can be classified by the highest order of derivatives that they use. Algorithms using derivative information of order greater than zero are somewhat more effective than those which use only function evaluations (zero-order information). However, derivative information demands additional computation, and these algorithms do not always generate good solutions quickly enough to compensate for the additional expense. First, we will discuss the non-derivative methods proposed by Solis and Wets [128]. Next, conjugate gradient methods [47], [53], which are used to minimize continuous functions using gradient information, will be presented. Finally, stochastic approximation, which is used in pattern recognition to find the optimal weights for parametric models of data [27], will be discussed.

4.1.1. Random Local Search

Solis and Wets [128] propose a number of random local search methods for local search on smooth functions without derivative information. These methods use normally distributed steps to generate new points in the search space. A new point is generated by adding zero-mean normal deviates to every dimension of the current point. If the value at the new point is worse than at the current point, the algorithm examines the point obtained by taking a step in the opposite direction. If neither point is better than the current point, another new point is generated. The algorithm has parameters which automatically increase or decrease the variance of the normal deviates in response to the rate at which better solutions are found. If new solutions are better often enough, the variance is increased to allow the algorithm to take larger steps. If poorer solutions are repeatedly generated, the variance is decreased to focus the search closer to the current solution.

It is important to remember, however, that this algorithm does not have a well-defined stopping condition. Solis and Wets examine several attempts to define stopping criteria for random search techniques, and come to the conclusion that "… the search for a good stopping criterion seems doomed to fail." In practice, the method is stopped after a fixed number of iterations, or when the step size becomes smaller than a given threshold.
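A minimal sketch of this scheme is given below; the adaptation constants (expansion/contraction factors of 2 and 0.5, thresholds of 5 consecutive successes or failures) are illustrative assumptions rather than the values from [128]:

    % Minimal sketch of a Solis-Wets style random local search
    % (random_local_search.m); adaptation constants are illustrative.
    function [x, fx] = random_local_search(f, x, maxIter)
        sigma = 1.0;                        % step standard deviation
        succ = 0; fail = 0;                 % counters for adaptation
        fx = f(x);
        for k = 1:maxIter
            step = sigma * randn(size(x));
            if f(x + step) < fx
                x = x + step; fx = f(x); succ = succ + 1; fail = 0;
            elseif f(x - step) < fx         % try the opposite direction
                x = x - step; fx = f(x); succ = succ + 1; fail = 0;
            else
                fail = fail + 1; succ = 0;
            end
            if succ >= 5, sigma = 2.0 * sigma; succ = 0; end   % expand
            if fail >= 5, sigma = 0.5 * sigma; fail = 0; end   % contract
        end
    end

For example, [x, fx] = random_local_search(@(x) sum(x.^2), [5; 5], 1000) drives the iterate toward the origin.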

4.1.2. Conjugate Gradient

Many local search algorithms have been designed to exploit gradient information; the conjugate gradient methods use it particularly efficiently. An analysis of the steepest descent method motivates the conjugate gradient methods. The steepest descent method iteratively performs line searches in the local downhill gradient direction −∇f(x). A line search performs a minimization along a one-dimensional slice of a function, specified by an initial point and a search direction. Thus, the steepest descent method iteratively minimizes the objective function in the gradient direction. Consider the path of the steepest descent method shown in Figure 4.1.

Figure 4.1. The steepest descent method

When applied to functions like this one, which have narrow "valleys", the steepest descent method is inefficient. One might expect that the first line minimization would reach the bottom of the valley, and that the second would finish the minimization. However, the new gradient at the minimum of the first line search is perpendicular to the first gradient.

Thus the new gradient does not, in general, point toward the local minimum. Conjugate gradient methods remedy this situation by using search directions that are conjugate to the previous search direction (the initial search direction is the downhill gradient). The notion of conjugacy attempts to preserve the minimization along the previous search direction by requiring that the change in the current gradient remain perpendicular to the previous search direction. A quadratic function can be expressed as

    f(x) = c + bᵀx + (1/2) xᵀAx,                                  (4.1)

where c is a scalar, b is a vector, and A is a symmetric matrix. For quadratic functions, using conjugate search directions guarantees that subsequent line searches preserve the results of the previous minimizations [108]. Since it uses gradient information, the conjugate gradient method has a well-defined stopping criterion: it stops when the algorithm has reached a critical point of the objective function [39], [108].

4.1.3. Stochastic Approximation

Both random local search and conjugate gradient methods can be used to minimize J(w), since gradient information is usually available for this function. An alternative method of minimizing J(w) is stochastic approximation. Unlike the other methods, stochastic approximation modifies the current solution using samples selected at random. To use the information from a single sample, suppose that (xi, yi) is randomly selected from the data set; the weights w are then adjusted along the negative gradient of the error on that single sample, so that J(w) decreases on average.
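The sketch below illustrates this for a least-squares error J(w) = (1/2) Σi (wᵀxi − yi)²; the squared-error model and the decaying step size ηk = η0/k are illustrative assumptions:

    % Minimal sketch of stochastic approximation (stochastic gradient
    % descent) for a least-squares objective; the model and step-size
    % schedule are illustrative.
    function w = stochastic_approximation(X, y, nPasses)
        [n, d] = size(X);                   % n samples, d features
        w = zeros(d, 1);
        eta0 = 0.1; k = 0;
        for pass = 1:nPasses
            for i = randperm(n)             % visit samples in random order
                k = k + 1;
                g = (X(i,:) * w - y(i)) * X(i,:)';  % single-sample gradient
                w = w - (eta0 / k) * g;             % decaying step size
            end
        end
    end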

4.2. Global Optimization

Methods of global optimization differ from methods of local optimization in that they attempt to find not just any local optimum, but the smallest (largest) local optimum in the search space D. Global optimization problems are by definition difficult, and few assumptions can be made about problems of practical interest. The methods described below assume only that the function is almost everywhere continuous over D. In general, methods that utilize a priori information about a problem will give better results than general-purpose methods that need less information. However, in many practical problems information beyond these basic assumptions will be inaccessible.

An important characteristic of a global optimization method is whether its estimates of the global optimum are guaranteed to converge. For a deterministic algorithm, the estimates xn of the global optimum converge if lim_{n→∞} xn = x*. A natural generalization of convergence can be defined for stochastic algorithms [149]. Unfortunately, convergence refers to a limit that is inaccessible in practical terms. A classification of global optimization methods is given in Table 4.2.

Table 4.2. Direct and indirect global optimization methods

  Methods with guaranteed accuracy:  a) Covering methods
  Indirect methods:                  a) Methods approximating the level sets
                                     b) Methods approximating the function
  Direct methods:                    a) Clustering methods
                                     b) Generalized descent methods
                                     c) Random search methods

Time typically precludes searching long enough to guarantee convergence to the optimum, so heuristics are often used to generate near-optimal solutions rapidly. We now review standard methods of global optimization. Since our interest concerns global search methods that use local search, we pay close attention to the role of local search techniques in these global optimization algorithms.

4.2.1. Methods with Guaranteed Accuracy

A. Covering Methods

The covering methods use a global search strategy in which regions of the search space are excluded based on estimates of how much the function can vary over small regions. For example, quasi-Monte Carlo methods [92], [93], [94] deterministically generate a sequence of points that spreads regularly across the search space. The accuracy of the estimated global optimum is then assessed using a measure of the uniformity of the sequence of points. Covering methods do not usually include local search strategies, though they could improve their final estimate by performing a local search from the best solution found. While covering methods have verifiable convergence properties, the user is often required to estimate quantities such as the Lipschitz constant.
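A minimal sketch of such a deterministic covering of a box [lo, hi] is given below, using a Halton low-discrepancy sequence; the choice of bases and the final refinement with fminsearch are illustrative assumptions:

    % Minimal sketch of a quasi-Monte Carlo covering search on a box;
    % the Halton bases and the final local refinement are illustrative.
    function [xbest, fbest] = qmc_covering(f, lo, hi, nPoints)
        d = numel(lo); bases = [2 3 5 7 11 13];   % first d bases are used
        xbest = lo; fbest = Inf;
        for i = 1:nPoints
            u = arrayfun(@(b) halton(i, b), bases(1:d));  % point in [0,1]^d
            x = lo + u .* (hi - lo);                      % scale to the box
            if f(x) < fbest, fbest = f(x); xbest = x; end
        end
        xbest = fminsearch(f, xbest);   % refine the best covering point
        fbest = f(xbest);
    end

    function h = halton(i, b)
        % i-th element of the van der Corput sequence in base b
        h = 0; f = 1 / b;
        while i > 0
            h = h + f * mod(i, b);
            i = floor(i / b);
            f = f / b;
        end
    end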

Unfortunately, these quantities are not easy to estimate, so the effectiveness of these algorithms is uncertain in many practical applications [68].

4.2.2. Indirect Methods

Indirect methods use local information, such as function evaluations, to construct a model of either the function or its level sets. This model is then used to select new samples. Since building and maintaining the model of the function can be quite expensive, these approaches are best suited to objective functions that are expensive to evaluate. None of the indirect approaches described above employ local search strategies. As Torn and Zilinskas observe, these methods are of special use in one-dimensional problems, though they have also been applied effectively to problems of dimensionality up to 15.

4.2.3. Direct Methods

The algorithm that is the subject of this dissertation can most naturally be classified as a direct method. Direct methods differ from indirect methods in that they do not perform expensive processing of the local information. Instead, they use the local information itself directly to guide the global and local search.

A. Generalized Descent

Generalized descent methods attempt to keep the basic functionality of standard local search procedures while performing a global search. Trajectory methods modify the trajectory of the local search routine so that it passes through all of the local optima [45]. For example, the method proposed in [32] is composed of three sub-algorithms that are used to (1) descend toward a local minimum, (2) ascend from a minimum to a saddle point, and (3) pass through a saddle point. Using these sub-algorithms, new local minima are identified by searches started from previously found local minima. Penalty methods modify the objective function with penalty terms that make the local search procedure avoid the local minima it has previously found. The tunneling method presented by Gomez and Levy [40] consists of two phases: local minimization and tunneling. The local minimization phase finds a local minimum x'. The tunneling phase minimizes a modified objective function to find a point x'' such that f(x'') < f(x'); the modified objective function is designed so that a local search procedure started from x' can be used to find x''. According to Torn and Zilinskas [141], the implementation of generalized descent techniques bears some similarity to the multistart procedure using non-local optimization techniques.
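One common form of the tunneling function (not necessarily the exact form used in [40]) divides the improvement f(x) − f(x') by a power of the distance from x', creating a pole that pushes the search away from the known minimum. A minimal sketch, with an illustrative pole strength lambda:

    % Minimal sketch of one tunneling step; the tunneling function form
    % and lambda are illustrative, not necessarily those of [40].
    function x2 = tunneling_step(f, x1, lambda)
        fstar = f(x1);
        T = @(x) (f(x) - fstar) / (norm(x - x1)^(2*lambda) + eps);
        x2 = fminsearch(T, x1 + 0.1 * randn(size(x1)));  % perturbed start
    end

If f(x2) < f(x1), a new local minimization is started from x2, and the two phases alternate.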

B. Clustering Methods

Clustering methods are among the most successful algorithms created for global optimization. These methods consist of several steps. First, they perform a Monte Carlo sampling of the search space. The samples are then concentrated into groups around the local minima and clustered, so that the clusters identify the local minima. Finally, a complete local search is applied to samples from each cluster. Various techniques can be used for each of these steps. The concentration of the samples is normally performed by improving the samples with a few steps of local search and retaining a fraction of the best samples. Torn and Zilinskas [141] present a number of clustering algorithms that have been used with these methods, among them standard hierarchical methods. Clustering methods are amenable to analysis because they use uniformly spread samples. Rinnooy Kan and Timmer [112], [113] describe a clustering method and give a condition under which all local minima are identified within a finite number of iterations with probability one.

A disadvantage of the clustering methods is their tendency to perform poorly on functions with many local minima. For such functions, many more samples are needed to find the local minima. It is not clear whether the rather poor performance on these types of functions results from inadequate stopping criteria or from a bias in the clustering step. These techniques have proved effective in applications to problems with as many as 40 dimensions [105].
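A minimal sketch of the sample, concentrate, cluster and refine cycle is shown below; the retained fraction (20%), the merge tolerance, and the use of fminsearch as the local solver are illustrative assumptions:

    % Minimal sketch of a clustering-style global search; the retention
    % fraction, merge tolerance and local solver are illustrative.
    function minima = clustering_search(f, lo, hi, nSamples)
        d = numel(lo);
        X = repmat(lo, nSamples, 1) + ...
            rand(nSamples, d) .* repmat(hi - lo, nSamples, 1);
        vals = zeros(nSamples, 1);
        for i = 1:nSamples, vals(i) = f(X(i,:)); end
        [ignore, idx] = sort(vals);               %#ok<ASGLU>
        X = X(idx(1:ceil(0.2 * nSamples)), :);    % keep the best 20%
        minima = zeros(0, d);
        for i = 1:size(X, 1)                      % refine each sample
            xm = fminsearch(f, X(i,:));
            % crude clustering: merge minima closer than a tolerance
            if isempty(minima) || ...
               min(sqrt(sum((minima - repmat(xm, size(minima, 1), 1)).^2, 2))) > 1e-3
                minima(end+1, :) = xm;            %#ok<AGROW>
            end
        end
    end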

4.3. Introduction to Evolutionary Algorithms

Many real-world problems that require the examination of several conflicting criteria can be reformulated so that the optimization involves a number of competing search processes looking for the best solution or for a compromise.

Figure 4.2. An illustration of the interdependence between two exemplary criteria, cost and size

The figure shows that a given point can be compared only with the points dominating it and with the points it dominates; all other points are incomparable [29]. The difficulty of multicriteria optimization is that it is often not enough to find just one solution to a problem. That is why the algorithms are expected to return a set of alternative solutions. The set of all such points is called a Pareto set, or a Pareto-optimal set [34], [88]. Traditional approaches to multicriteria optimization transformed the problem into the optimization of a single criterion, e.g. using a weighted sum of all the objective functions. Such an approach is insensitive to complicated shapes of the Pareto set, and it requires determining in advance which criterion is more important than the others [88].

4.4. Evolutionary Approach

4.4.1. The Main Ideas of Evolutionary Computation

Evolutionary algorithms search among alternative solutions to a problem in order to find the best, or potentially best, solutions. They belong to the class of heuristic algorithms. The search employs evolutionary mechanisms and natural selection. In 1984 Schaffer proposed the first evolutionary algorithm for multicriteria optimization. The algorithm, called VEGA, worked by dividing the population into k subpopulations, where k is the number of criteria. Selection took place within each of the subpopulations.
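Algorithms of this kind are expected to return a set of alternative, nondominated solutions, and the dominance relation illustrated in Figure 4.2 can be stated directly in code. A minimal sketch, assuming all criteria are minimized and each row of Y holds the criteria values of one point:

    % Minimal sketch of Pareto dominance (all criteria minimized) and a
    % filter extracting the nondominated rows of the criteria matrix Y.
    function P = pareto_filter(Y)
        n = size(Y, 1);
        keep = true(n, 1);
        for i = 1:n
            for j = 1:n
                if j ~= i && dominates(Y(j,:), Y(i,:))
                    keep(i) = false; break;   % point i is dominated
                end
            end
        end
        P = Y(keep, :);
    end

    function d = dominates(a, b)
        % a dominates b: no worse in every criterion, better in at least one
        d = all(a <= b) && any(a < b);
    end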
