
2. ARTIFICIAL INTELLIGENCE: FOUNDATIONS

2.2. Biologically inspired artificial intelligence methods

2.2.2. Evolutionary computing

Genetic algorithms have been developed as systems which simulate biological evolution (see section 4.1); however, their applications are focused mainly on optimization problems having little, if any, connection with biology. The scientific theory of artificial evolutionary systems was founded in the sixties of the twentieth century, when Holland (1967) developed genetic algorithms, Fogel et al. (1966) proposed evolutionary programming, and Schwefel (1965) introduced the idea of evolutionary strategies. Mimicking natural evolution results in the terminology of evolutionary computing, which uses such terms as genotypes, phenotypes, chromosomes, alleles, etc. Let us start with the definitions of these notions, as formalized by Radcliffe (1997).

Definition 2.2:5 (Search space, alleles)

Let S be a continuous or discrete, finite or infinite set of objects, known as the search space, and let A1, A2, ..., An be finite sets of elements aki ∈ Ai, called alleles.

Definition 2.2:6 (Representation space)

The representation space I is defined as the Cartesian product of the sets Ai

I = A1 × A2 × ... × An . (2.2:63)

Definition 2.2:7 (Decoding function)

The decoding function d is defined as a function which maps vectors from the representation space I to the search space S

d: I → S . (2.2:64)

Definition 2.2:8 (Representation)

The representation of S is defined as the ordered pair (I, d).

Definition 2.2:9 (Chromosome)

Chromosomes are defined as the elements of the representation space I. Alternatively, chromosomes are referred to as genotypes, since simple haploid genetic models with one chromosome per individual are considered.

Definition 2.2:10 (Genes)

Genes at a locus i are defined as the elements xi of the chromosome x ∈ I.

Using definitions 2.2:5 and 2.2:10, it is clear that the gene xi at a locus i can take one of the possible values (alleles) aki from the set of alleles Ai.
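These notions can be illustrated with a minimal Python sketch (the sets, the decoding, and all names below are illustrative assumptions, not part of Radcliffe's formalism):

```python
# A toy representation (I, d): chromosomes are pairs of alleles,
# decoded into a continuous search space S = [0, 1] x [0, 1].

A1 = [0, 1, 2, 3]          # alleles at locus 1
A2 = [0, 1, 2, 3]          # alleles at locus 2

def d(x):
    """Decoding function d: I -> S, mapping a chromosome to a point in S."""
    return (x[0] / 3.0, x[1] / 3.0)

chromosome = (2, 1)        # an element of I = A1 x A2
point = d(chromosome)      # the corresponding element of S
print(point)
```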

Definition 2.2:11 (Extended set of alleles)

For each set of alleles Ai, let us define the extended set of alleles A*i as

A*i = Ai ∪ {#} , (2.2:65)

where # denotes the wildcard ("don't care") symbol matching any allele.

Definition 2.2:12 (Schema)

Each element ξ of the set Ξ, defined as

Ξ = A*1 × A*2 × ... × A*n , (2.2:66)

is called a schema. A schema ξ describes the set Ξ(ξ) of chromosomes with alleles identical with ξ at all positions i for which ξi ≠ #

Ξ(ξ) = {x ∈ I : ∀ i ∈ {1, 2, ..., n} (ξi = # ∨ xi = ξi)} . (2.2:67)

Definition 2.2:13 (Defining positions)

All loci i of the schema ξ for which ξi ≠ # are called the defining positions.

Definition 2.2:14 (Order of the schema)

The order O(ξ) of the schema ξ is defined as the number of its defining positions.

Definition 2.2:15 (Defining length of the schema)

The defining length Δ(ξ) is the distance between the first and the last defining position of the schema ξ.
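Definitions 2.2:12 to 2.2:15 translate directly into code; a minimal sketch, assuming '#' as the wildcard allele (all names are illustrative):

```python
def matches(schema, x):
    """True if chromosome x belongs to the set described by the schema."""
    return all(s == '#' or s == g for s, g in zip(schema, x))

def order(schema):
    """O(xi): the number of defining (non-wildcard) positions."""
    return sum(1 for s in schema if s != '#')

def defining_length(schema):
    """Delta(xi): distance between first and last defining position."""
    defined = [i for i, s in enumerate(schema) if s != '#']
    return defined[-1] - defined[0] if defined else 0

schema = ['1', '#', '#', '0', '#']           # defining positions 0 and 3
print(matches(schema, ['1', '0', '1', '0', '1']))  # True
print(order(schema), defining_length(schema))      # 2 3
```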

Definition 2.2:16 (Fitness function)

Consider an objective function F: S → ℝ. Then the fitness function f: I → ℝ+ is defined as a function that satisfies

max {f(x) : x ∈ I} ⟺ opt {F(d(x)) : x ∈ I} , (2.2:68)

i.e., f attains its maximum at those chromosomes whose decoded forms are optimal with respect to F.

Definition 2.2:17 (Pareto-dominance)

In multi-objective optimization, the fitness function is defined as a vector function f = (f1, f2, ..., fn), where fi: I → ℝ. If each of the functions fi is to be minimized, then for x, y ∈ I, x dominates y in the Pareto sense if and only if

∀ i ≤ n: fi(x) ≤ fi(y) and ∃ j ≤ n: fj(x) < fj(y).

Definition 2.2:18 (Pareto-optimal solution)

A Pareto-optimal solution x ∈ I is a solution not dominated by any other solution.
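Pareto dominance and Pareto optimality (definitions 2.2:17 and 2.2:18) can be checked mechanically; a minimal sketch for minimized objectives (illustrative names):

```python
def dominates(fx, fy):
    """True if objective vector fx Pareto-dominates fy (minimization):
    no worse in every objective and strictly better in at least one."""
    return (all(a <= b for a, b in zip(fx, fy))
            and any(a < b for a, b in zip(fx, fy)))

def pareto_front(points):
    """Solutions not dominated by any other solution."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

pts = [(1, 4), (2, 2), (3, 1), (3, 3)]      # (3, 3) is dominated by (2, 2)
print(pareto_front(pts))                     # [(1, 4), (2, 2), (3, 1)]
```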

Definition 2.2:19 (Genetic operations)

The genetic operations: selection s, mutation m, and recombination c are defined by the functions s: I^λ → I^μ, m: I^κ → I^λ, and c: I^μ → I^κ, respectively. Note that these operators are defined for the whole population.

After presenting formal definitions of the notions present in evolutionary computation, the evolutionary algorithm itself will be introduced. Denote by μ and λ the sizes of the parent and the child population, respectively. Moreover, let P(t) = (a1(t), ..., aμ(t)) ∈ I^μ be a population, f(t) ∈ ℝ+^μ a fitness vector for this population, and the function Evaluate(t) the operation used for computation of the fitness in generation t. Specify also the sets of parameters Θs, Θm, Θc for the genetic operations s, m, and c, respectively, and denote by τ the termination criterion, which depends on the current population P(t) and the set of parameters Θτ.

Then, the optimization performed with the use of an evolutionary algorithm (EA) can be expressed in pseudo-code as (Bäck, 1997a)

Input: μ, λ, Θs, Θm, Θc, Θτ
Output: a* ∈ P* - the best found individual and/or P* - the best found population

t ← 0;
P(t) ← Initialize(μ);
f(t) ← Evaluate(P(t), μ);
while (τ(P(t), Θτ) ≠ true) do
    P′(t) ← c(P(t), Θc);
    P″(t) ← m(P′(t), Θm);
    f(t) ← Evaluate(P″(t), λ);
    P(t+1) ← s(P″(t), f(t), μ, Θs);
    t ← t + 1;
od
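The pseudo-code above can be instantiated, for example, as the following Python sketch (the toy objective, operator choices, and parameter values are illustrative assumptions, not Bäck's formulation):

```python
import random

def evolutionary_algorithm(mu=20, lam=40, generations=60):
    """Minimal (mu, lambda)-style EA maximizing f(x) = -(x - 3)^2 on reals.
    Recombination: midpoint of two random parents; mutation: Gaussian noise;
    selection: truncation keeping the mu best children."""
    fitness = lambda x: -(x - 3.0) ** 2
    P = [random.uniform(-10, 10) for _ in range(mu)]       # Initialize(mu)
    for t in range(generations):                           # termination: fixed horizon
        Pc = [(random.choice(P) + random.choice(P)) / 2.0  # c(P(t), Theta_c)
              for _ in range(lam)]
        Pm = [x + random.gauss(0, 0.5) for x in Pc]        # m(P'(t), Theta_m)
        f = [fitness(x) for x in Pm]                       # Evaluate(P''(t), lambda)
        ranked = sorted(zip(f, Pm), reverse=True)
        P = [x for _, x in ranked[:mu]]                    # s(P''(t), f(t), mu)
    return max(P, key=fitness)

best = evolutionary_algorithm()
print(round(best, 1))  # close to 3.0
```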

The probabilistic behavior of the evolving population P(t) in an EA can be modeled by a stochastic process which is a homogeneous Markov model. Therefore, the evolutionary process at step tk + 1 does not depend on the state of this process before step tk, provided the state of the process at step tk is known (Rudolf, 1997).

The specific operation of an EA is defined by the details of the three genetic operators, formally involved in the parameters Θs, Θm, Θc. These operators belong to two functionally different classes. Mutation and recombination are the operators responsible for generating new solutions, whereas selection is the operator responsible for choosing better-fitted solutions with a probability higher than that of less-fitted ones. Selection can be described by a coefficient called the selection pressure and a closely related coefficient called the takeover time, as defined below.

Definition 2.2:20 (Selection pressure, after Grefenstette 1997a)

The selection pressure is defined as the rate of increase of the best individual in a population in the absence of mutation and recombination.

Definition 2.2:21 (Takeover time, after Grefenstette 1997a)

The takeover time is defined as the time required for the population to be composed of copies of the best individual only, assuming that selection is the only operation and that there is exactly one best individual at the beginning.

It is evident that when the selection pressure increases, the takeover time decreases, and vice versa. Selection also depends on the choice of the fitness function, especially in multi-objective optimization, since the fitness value regulates the probability of survival of a particular individual during the evolution.

Formally, the fitness function f is described as a superposition of the scaling function s, the objective function F, and the decoding function d, i.e., f = s ∘ F ∘ d, and the fitness function is always maximized due to the existence of the scaling function s.

In multi-objective optimization the vector fitness function has to be scalarized. There exist several methods of scalarization which satisfy the condition that the final fitness of a given solution is not worse than the scalar fitness of all other solutions it dominates in the Pareto sense (Fonseca and Fleming 1997). Since such mappings are not unique, they require the specification of objective preferences. The most commonly used is a scalarization based on the weighted sum, where the preferences are introduced as the values of weights wk. Such an approach is given by (see Fonseca and Fleming 1997)

f(x) = Σk wk fk(x) .

Another strategy used for scalarization is applied in the MINI-MAX method (see Fonseca and Fleming 1997), where parameters wk and gk are responsible for the introduction of preferences.

Yet another method is used in Pareto-scalarization (Goldberg 1989, Fonseca and Fleming 1997), which is defined by recurrent equations: the non-dominated individuals receive the best rank and are removed from consideration, the individuals non-dominated among the remaining ones receive the next rank, and so on. This scalarization gives no possibility to introduce preferences; however, it guarantees that the Pareto postulates are automatically fulfilled. Obviously, after scalarization, the multi-objective optimization becomes a single-criterion optimization with a scalar fitness function.

Let us now consider the influence of the three genetic operators on the behavior of the evolutionary algorithm, starting with selection. The most natural is proportional selection, which resembles the selection occurring in biological evolution. In this type of selection, the probability distribution of survival is given by (Grefenstette 1997a)

p(i) = f(i) / Σj f(j) , j = 1, ..., μ,

where f(i) denotes the fitness of the i-th individual, and μ denotes the size of the population.
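Proportional selection can be sketched as a roulette wheel (illustrative names; fitness values are assumed positive):

```python
import random

def proportional_selection(population, fitness, mu):
    """Roulette-wheel (proportional) selection: individual i is drawn with
    probability f(i) / sum_j f(j). Fitness values must be positive."""
    total = sum(fitness)
    probs = [f / total for f in fitness]
    return random.choices(population, weights=probs, k=mu)

pop = ['a', 'b', 'c', 'd']
fit = [1.0, 1.0, 1.0, 7.0]     # 'd' is drawn with probability 0.7
print(proportional_selection(pop, fit, 6))
```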

The takeover time in proportional selection is larger than in many other selection types. For the fitness function f(x) = xc, the takeover time of proportional selection is of the order of (μ ln μ)/c (Goldberg and Deb 1991).

Another popular choice is tournament selection, which is additionally easy for parallel implementation. However, it should be taken into account that the takeover time of this selection is one of the shortest, i.e., this selection generates a very strong selection pressure. For a tournament of size q performed in a population of size μ, the takeover time is given by Goldberg and Deb (1991) approximately as

τ ≈ (ln μ + ln ln μ) / ln q .

It is clear that the takeover time decreases (and the selection pressure grows) with increasing q. Therefore, in tournament selection the user can easily control the values of these important parameters.
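A q-tournament can be sketched as follows (illustrative names):

```python
import random

def tournament_selection(population, fitness, mu, q=2):
    """q-tournament selection: each survivor is the best of q individuals
    drawn uniformly with replacement; larger q means stronger pressure."""
    selected = []
    for _ in range(mu):
        contestants = random.choices(range(len(population)), k=q)
        winner = max(contestants, key=lambda i: fitness[i])
        selected.append(population[winner])
    return selected

pop = list(range(10))
fit = pop                      # fitness equals the value itself
print(tournament_selection(pop, fit, 5, q=3))
```

Note that the procedure uses only comparisons of fitness values, which is why tournament selection is invariant to scaling and shifting of the fitness function.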

Yet another selection is the one based on the ranking of individuals in a population. It is possible to consider this type of selection with a linear or nonlinear probability distribution of survival, but in both cases these distributions are based on ranks (0 for the worst and μ - 1 for the best) and not on the fitness values of particular individuals. Hence, such selection, similarly to tournament selection, is invariant with respect to scaling and shifting of the fitness function. The linear probability distribution of survival can be written as (Grefenstette 1997b)

p(i) = (η− + (η+ − η−) · rank(i)/(μ − 1)) / μ ,

where η− and η+ denote the expected numbers of children of the worst and of the best-fitted individual, respectively.

For the linear distribution the takeover time is approximately of the order of ln μ (see Goldberg and Deb 1991). As nonlinear distributions, geometric or exponential distributions are often used.

Finally, let us consider the Boltzmann selection, based on simulated annealing (Mahfoud 1997). The key concept in this selection is the Boltzmann trial, i.e., a comparison of individuals i and j in which individual i wins with the logistic probability (Michalewicz 1992)

p = 1 / (1 + exp((f(j) − f(i))/T)) ,

where T is a temperature parameter decreased during the run; with a properly chosen annealing schedule the population can converge to the global optimum (Sarma and De Jong 1997).

The first studies concerning genetic algorithms supported the view that recombination is the fundamental operator and mutation is less important. With the advent of evolutionary computation with more complex representations, the role of mutation became more and more evident, and it was stressed that in principle this operator is able to operate without crossing-over (Bäck 1997a). In this context, Cyran et al. (1997) demonstrated that mutation used without recombination is able to train an ANN in stochastic evolutionary training. In the canonical form of the genetic algorithm (see below), the mutation operator is defined on binary vectors a = (a1, ..., al) ∈ {0,1}^l of length l. If we denote by pm the probability of mutation, then the mutation operator m: {0,1}^l → {0,1}^l generates a new vector a′ = m(a) by inverting each bit of a independently with probability pm. For more complex representations the mutation operator may be defined in many variants.
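The canonical bit-flip mutation can be sketched as (illustrative names):

```python
import random

def bitflip_mutation(a, pm):
    """Canonical mutation m: {0,1}^l -> {0,1}^l -- each bit of the vector
    is inverted independently with probability pm."""
    return [1 - bit if random.random() < pm else bit for bit in a]

a = [0, 1, 1, 0, 1, 0, 0, 1]
print(bitflip_mutation(a, pm=1.0 / len(a)))  # on average one bit flipped
```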

For real-valued vectors x ∈ ℝ^n, a new vector x′ = m(x) is produced by mutation m, which is most often defined as (Fogel 1997b)

x′ = x + M , (2.2:82)

where M is a vector of random variables with expected values equal to zero, i.e., E(x′) = x.

Michalewicz (1992) proposed a non-uniform mutation, changing in time, as described below. Let a real-valued chromosome be defined as a vector x, indexed by the time t expressed in the number of generations. The function Δ(t, y) takes values from the range [0, y], with the probability of Δ(t, y) being close to zero increasing with time t. Hence, the initial mutations have relatively large effects (in order to search the whole space), and then a local search is performed. Michalewicz (1992) proposed the function Δ(t, y) defined as

Δ(t, y) = y (1 − r^((1 − t/T)^b)) ,

where r is a random number from [0, 1], T is the maximum number of generations, and b is a parameter which describes the influence of the generation number on the result of the function.
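Assuming the form of Δ(t, y) given above, the non-uniform mutation can be sketched as (the names and the choice b = 5 are illustrative):

```python
import random

def delta(t, y, T, b=5.0):
    """Michalewicz's non-uniform step Delta(t, y) in [0, y]: values close
    to zero become more likely as generation t approaches the horizon T."""
    r = random.random()
    return y * (1.0 - r ** ((1.0 - t / T) ** b))

def nonuniform_mutation(x, t, T, lo, hi):
    """Mutate one randomly chosen gene of x toward its lower or upper bound;
    the step never leaves the interval [lo, hi]."""
    i = random.randrange(len(x))
    x = list(x)
    if random.random() < 0.5:
        x[i] += delta(t, hi - x[i], T)
    else:
        x[i] -= delta(t, x[i] - lo, T)
    return x

print(nonuniform_mutation([0.5, 0.5], t=1, T=100, lo=0.0, hi=1.0))
```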

Recombination is in general a binary operator defined on the Cartesian product of the chromosome representation spaces. It is a mapping r given by (Booker 1997)

r: {0,1}^l × {0,1}^l × {0,1}^l → {0,1}^l ,

where the third argument, the mask vector m, defines the form of the recombination as a one-point or multi-point crossing-over. It is also possible to define the uniform recombination, in which the number of crossing-over points is not constant, but each point is determined with probability px independently for each position on the chromosome. For chromosomes represented in ℝ^n it is possible to apply the arithmetic recombination, which does not exchange genes but averages the gene values. In this type of recombination, two parents, x1 and x2, create one child x′ according to (Fogel 1997c)

x′i = α x1i + (1 − α) x2i , (2.2:88)

where α is a number from [0, 1]. This operator can be generalized to arbitrarily many parents, as an n-ary operator defined by a convex combination of the parents, with nonnegative weights summing to one.
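The arithmetic recombination of equation (2.2:88) can be sketched as (illustrative names):

```python
import random

def arithmetic_recombination(x1, x2, alpha=None):
    """Child x'_i = alpha * x1_i + (1 - alpha) * x2_i, alpha in [0, 1];
    if alpha is not given, it is drawn uniformly at random."""
    if alpha is None:
        alpha = random.random()
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(x1, x2)]

print(arithmetic_recombination([0.0, 2.0], [4.0, 0.0], alpha=0.5))  # [2.0, 1.0]
```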

After presenting the details of the three genetic operators (selection, mutation, and recombination), let us now consider different types of chromosome representation. The classical representation of chromosomes, binary vectors a = (a1, ..., al) ∈ {0,1}^l, constitutes the canonical form of genetic algorithms. Genes of such chromosomes take values from the binary allele set. This representation is especially useful for pseudo-Boolean optimization problems F: {0,1}^l → ℝ. However, it is also possible to apply it to optimization of the type F: S → ℝ, where S is a search space having a different structure than the chromosome representation space I = {0,1}^l.

One of the most often encountered problem classes of this kind are problems defined as F: ℝ^n → ℝ, i.e., problems of optimization of continuous parameters. These problems require the discretization of the continuous variables xi ∈ ℝ restricted to [ui, vi] such that ui ≤ xi ≤ vi. Then, each such variable can be represented by a binary sequence of length lx, which is a sub-sequence of the sequence of length l. For n variables it follows that l = n·lx.

For decoding variables expressed in the Gray code (in which the representations of consecutive numbers are binary vectors with Hamming distance equal to one), the binary bits are first recovered as cumulative XORs of the Gray bits, and the resulting integer is scaled affinely onto [ui, vi] (Bäck 1997b). Such a decoding introduces nonlinear transformations into the effective objective function F′: {0,1}^l → ℝ, given as F′ = F ∘ d (Bäck 1997b), which may make this optimization harder than the original optimization of the objective function F: S → ℝ. Therefore, more complex representations of chromosomes are proposed, which are more similar to the representation of objects in the original search space.

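The Gray decoding described above can be sketched as follows (illustrative names; taking the most significant bit first is an assumption):

```python
from functools import reduce

def decode_gray(bits, u, v):
    """Decode a Gray-coded bit list into a real value in [u, v].
    Standard conversion: binary bit b_j is the XOR of Gray bits g_1..g_j
    (most significant first); the integer is then scaled affinely."""
    binary, acc = [], 0
    for g in bits:                 # cumulative XOR, MSB first
        acc ^= g
        binary.append(acc)
    k = reduce(lambda n, b: (n << 1) | b, binary, 0)
    return u + (v - u) * k / (2 ** len(bits) - 1)

print(decode_gray([1, 1, 0], 0.0, 7.0))  # Gray 110 -> binary 100 -> 4.0
```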
In the last two decades there has been growing interest in the real-valued representation of chromosomes. Many practical applications of parametric optimization use this representation and indicate its usefulness (Fogel 1997b, Cyran and Mrózek 2001) and its greater effectiveness as compared to the binary representation (Michalewicz 1992). However, such a conclusion, although confirmed by a number of experimental studies, contradicts the classical interpretation of the fundamental theorems about canonical genetic algorithms.

In particular, this is the case with the interpretation of the Schema Theorem. This theorem has been formulated for an arbitrary finite alphabet, and below this general version is presented.

Theorem 2.2:4 (The Schema Theorem – after Radcliffe 1997)

Let be the schema over the representation space I. Moreover, let this space be searched by evolutionary algorithm using proportional selection and classical operations of mutation and recombination. Let us also denote by N (t) the number of schemas  in the generation t.

Then the number N (t+1) of this schema in the next generation is given by

1 ( )



1 ( )

of the destructive effects on the number of elements belonging to the schema caused by the recombination and mutation, respectively.

Proof

It is clear that without mutation and recombination the expected number of representatives of the schema ξ in generation t+1, E[Nξ(t+1)], is equal to the number of schema representatives in generation t, Nξ(t), multiplied by the relative fitness of that schema, f̂ξ(t)/f̄(t). Mutation and recombination can destroy the schema; however, it is very hard to estimate the actual destructive effects of these operations. Instead, their easily computable upper bounds Dc(ξ) and Dm(ξ) are used, and therefore the actual E[Nξ(t+1)] can be larger than the expression on the right-hand side of formula (2.2:93), since the actual destructive effects are typically smaller than the upper bounds used in this formula.

The Schema Theorem expresses the fact that the number of those schemas which are short (i.e., with a low recombination destructive effect), of low order (i.e., with a low mutation destructive effect), and of over-average fitness increases exponentially in the population during evolution. Such schemas are referred to as building blocks. Therefore, the hypothesis has been formulated that evolutionary algorithms process not only the chromosomes, but also implicitly the schemas which represent chromosomes included in a population.

Assuming the same number of possible solutions represented in the chromosome, coding these solutions using the alphabet {0, 1} assures the maximum number of schemas as compared to any other alphabet A for which card(A) > 2. At first sight the binary alphabet thus seems the most efficient, since it assures the maximum number of schemas to be processed, and therefore the level of hidden parallelism is as large as theoretically possible (Bäck 1997b).

Currently, however, it is often argued that the building blocks hypothesis, being the foundation for the notion of hidden parallelism, does not correctly explain the mechanism of optimization with the use of genetic algorithms. Additionally, since practical experiments do not confirm the better efficiency of binary chromosomes (and even suggest the contrary), the implications of the Schema Theorem have to be carefully reconsidered. At the very least, these implications should not lead to the conclusion about the superiority of the search in binary representation spaces I = {0,1}^l (Radcliffe 1997).

Other representations used in evolutionary computation include permutations, finite state machines, trees, neural networks, and others. Permutations are predominantly used in combinatorial problems, which belong to NP-complete problems; therefore, heuristics are often used to solve them. An example is the traveling salesman problem, to which the evolutionary approach has been applied (Whitley 1997a). To make it effective, operations of mutation and recombination working in the permutation representation space have been proposed.

Whitley (1997b) shows that the mutation operator can be implemented as the so-called 2-opt operator used for local search. This operator chooses two points along the permutation chain and then reverses the sequence in the chosen segment (Fig. 2.2:3).

(A B C D E F) parent

reversed fragment

(A D C B E F) child

Fig. 2.2:3. Mutation for permutation representation, implemented as 2-opt operator Rys. 2.2:3. Mutacja dla reprezentacji permutacyjnej implementowana jako operator 2-opt
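The 2-opt operator of Fig. 2.2:3 can be sketched as (illustrative names):

```python
import random

def two_opt_mutation(perm):
    """2-opt operator: pick two cut points and reverse the segment between
    them, e.g. (A B C D E F) -> (A D C B E F)."""
    i, j = sorted(random.sample(range(len(perm)), 2))
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]

parent = ['A', 'B', 'C', 'D', 'E', 'F']
child = two_opt_mutation(parent)
print(child)  # same elements, one contiguous segment reversed
```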

The recombination can be implemented with the use of an operator referred to as crossing-over with ordering (Whitley 1997c). First, two cutting positions are randomly chosen, and the genes of the first parent that lie between the cutting points are copied to the child. Then, starting from the position directly after the second cutting point, the genes of the second parent are examined as to whether they are already present in the created part of the child. If not, these genes are copied onto the subsequent positions. After reaching the end of the chromosome, the process is continued from the first position of the second parent, until the first cutting point is reached. The recombination operator so defined inherits from the first parent information about the sequence, absolute positions, and adjacencies of genes; however, it inherits only the information about the sequence from the second parent.
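The crossing-over with ordering described above can be sketched as (illustrative names):

```python
import random

def order_crossover(p1, p2):
    """Crossing-over with ordering (order crossover): copy the segment
    between two cut points from the first parent, then fill the remaining
    positions with the genes of the second parent in the order they appear
    after the second cut point, wrapping around and skipping duplicates."""
    n = len(p1)
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]                 # segment from parent 1
    used = set(child[a:b + 1])
    # scan parent 2 starting right after the second cut, wrapping around
    fill = [p2[(b + 1 + k) % n] for k in range(n)]
    fill = [g for g in fill if g not in used]
    for k in range(n - (b - a + 1)):             # fill free slots,
        child[(b + 1 + k) % n] = fill[k]         # also wrapping around
    return child

print(order_crossover(list('ABCDEF'), list('FEDCBA')))
```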

Let us now consider the finite-state representations used in evolutionary computing.

Definition 2.2:22 (Finite state machines, after Fogel 1997d)