Coevolutionary Gradient Algorithms and their Application to Othello
Marcin Szubert
Krzysztof Krawiec
Institute of Computing Science, Poznan University of Technology
Outline
1. Starting Point – Previous Research
   - Coevolution and Reinforcement Learning
   - Coevolutionary Temporal Difference Learning
2. Flexible Learner Architecture
   - Topology and Weight Evolving ANNs
   - N-tuple Networks
3. Coevolutionary Gradient Algorithms
4. Experimental Results
5. Summary
Coevolutionary Algorithms
Bio-inspired methods that attempt to harness the Darwinian notions of heredity and survival of the fittest but, in contrast to traditional evolutionary algorithms, do not attempt to measure the fitness of individuals objectively. Instead, individuals are compared on the basis of the outcomes of their interactions with other individuals.
Natural evolution is coevolution, where the fitness of an individual is
defined with respect to its competitors and collaborators, as well as to
the environment.
Simon M. Lucas
Coevolutionary Algorithms
The outcome of evaluating an individual in a coevolutionary algorithm depends
upon the context of whom the individual interacts with. This context sensitivity
is characteristic of coevolutionary systems and responsible for the complex
dynamics for which coevolution is (in)famous.
Sevan G. Ficici
Single-population coevolutionary algorithm
Algorithm 1: Basic scheme of a generational evolutionary algorithm
1: P ← createRandomPopulation()
2: A ← initializeArchive()
3: evaluatePopulation(A, P)
4: while ¬terminationCondition() do
5:   S ← selectParents(P)
6:   P ← recombineAndMutate(S)
7:   evaluatePopulation(A, P)
8:   updateArchive(A, P)
9: end while
10: return getFittestIndividual(A, P)
Procedure evaluatePopulation(A, P)
1: E ← selectEvaluators(A, P)
2: performInteractions(P, E)
3: aggregateInteractionOutcomes(P, E)
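A minimal Python sketch of the scheme and evaluation procedure above, assuming a toy interaction function in place of a real Othello game; all names (play, mutate, evaluate) are illustrative, not the authors' implementation:

```python
import random

POP_SIZE, GENERATIONS, N_WEIGHTS, ARCHIVE_SAMPLE = 20, 50, 10, 5

def play(a, b):
    # Hypothetical interaction: a toy comparison of weight vectors stands in
    # for a real game (e.g., Othello) between two strategies.
    return 1.0 if sum(a) > sum(b) else 0.0

def mutate(ind, sigma=0.1):
    return [w + random.gauss(0.0, sigma) for w in ind]

def evaluate(pop, archive):
    # evaluatePopulation: select evaluators, perform the interactions, and
    # aggregate their outcomes into one fitness value per individual.
    evaluators = pop + random.sample(archive, min(ARCHIVE_SAMPLE, len(archive)))
    return [sum(play(ind, opp) for opp in evaluators if opp is not ind)
            for ind in pop]

pop = [[random.uniform(-1, 1) for _ in range(N_WEIGHTS)]
       for _ in range(POP_SIZE)]                          # createRandomPopulation
archive = []                                              # initializeArchive
fitness = evaluate(pop, archive)
for _ in range(GENERATIONS):
    paired = list(zip(pop, fitness))
    parents = [max(random.sample(paired, 2), key=lambda pf: pf[1])[0]
               for _ in range(POP_SIZE)]                  # selectParents (tournament)
    pop = [mutate(p) for p in parents]                    # recombineAndMutate
    fitness = evaluate(pop, archive)                      # evaluatePopulation
    archive.append(pop[fitness.index(max(fitness))])      # updateArchive
best = pop[fitness.index(max(fitness))]                   # getFittestIndividual
```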
The family of EA is composed of a few methods that differ slightly in technical details, but all match the basic scheme presented in Algorithm 1. The most important difference between these methods concerns the so-called representation, which defines a mapping from phenotypes onto a set of genotypes and specifies what data structures are employed in this encoding. Phenotypes are objects forming solutions to the original problem, i.e., points of the problem space of possible solutions. Genotypes, on the other hand, denote points in the evolutionary search space which are subject to genetic operations. The process of genotype-phenotype decoding is intended to model the natural phenomenon of embryogenesis. A more detailed description of these terms can be found in [Weise 09].
Returning to the different dialects of EA, candidate solutions are typically represented by strings over a finite (usually binary) alphabet in Genetic Algorithms (GA) [Holland 62], real-valued vectors in Evolution Strategies (ES) [Rechenberg 73], finite state machines in classical Evolutionary Programming (EP) [Fogel 95], and trees in Genetic Programming (GP) [Koza 92]. A certain representation might be preferable if it makes encoding solutions to a given problem more natural. Obviously, the genetic operations of recombination and mutation must be adapted to the chosen representation. For example, crossover in GP is usually based on exchanging subtrees between the combined individuals.
The most significant advantage of EA lies in their flexibility and adaptability to the given task. This may be explained by their metaheuristic, "black box" character that makes only a few assumptions about the underlying objective function being optimized. EA are claimed to be robust problem solvers showing roughly good performance over a wide range of problems, as reported by Goldberg [Goldberg 89]. In particular, combining EA with problem-specific heuristics, including local-search based techniques, often makes highly efficient optimization algorithms possible in many areas of application. Such hybridization of EA is becoming popular due to its capability of handling real-world problems involving noisy environments, imprecision, or uncertainty. The latest state-of-the-art methodologies in Hybrid Evolutionary Algorithms are reviewed in [Grosan 07].
Reinforcement Learning
Reinforcement Learning (RL)
Machine learning paradigm focused on solving problems in which
an agent interacts with an environment by taking actions and
receiving rewards at discrete time steps. The objective is to find
a decision policy that maximizes the cumulative reward.

[Diagram: the agent–environment interaction loop — (1) the agent observes state s_t, (2) takes action a_t, (3) receives reward r_t, and (4) learns on the basis of ⟨s_t, a_t, r_t, s_{t+1}⟩.]
In board games:
agent ⇒ player
environment ⇒ game
state ⇒ board state
action ⇒ legal move
reward ⇒ game result
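The loop above can be written down generically. Below is a sketch assuming a hypothetical environment object with a minimal reset()/step() interface and an agent with act()/learn() methods; none of these names come from the talk:

```python
def run_episode(env, agent):
    """One pass of the loop above. `env.step()` is assumed to return
    (next_state, reward, done); for a board game, states are board positions,
    actions are legal moves, and the reward is the game result."""
    state, done, total_reward = env.reset(), False, 0.0
    while not done:
        action = agent.act(state)                        # 2. action a_t
        next_state, reward, done = env.step(action)      # 3. reward r_t
        agent.learn(state, action, reward, next_state)   # 4. learn from <s_t, a_t, r_t, s_t+1>
        state = next_state                               # 1. next state s_t+1
        total_reward += reward
    return total_reward
```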
Temporal Difference Learning
Temporal Difference Learning (TDL)
RL method which attempts to estimate a value function by
observing the progression of states – the learner adjusts it to make
the value of the current state more like the value of the next state.
The value function V(b) can be represented as a neural network with a modifiable weight vector w.
The adjustment is based on a gradient-descent update, e.g.

∆w_i := η e b_i,   e = v′ − v,

where v = V(b_t) and v′ = V(b_{t+1}) are the values of the current and next state, and b_i is the i-th element of board b.
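A sketch of this update for a linear value function over board elements b_i (e.g., a weighted piece counter); the learning rate η and the feature encoding are illustrative assumptions:

```python
def td_update(w, board, next_board, eta=0.01):
    """One gradient-descent TD(0) step for a linear value function
    V(b) = sum_i w_i * b_i, so that dV/dw_i = b_i and the update reads
    delta_w_i = eta * e * b_i with TD error e = V(b') - V(b)."""
    v = sum(wi * bi for wi, bi in zip(w, board))
    v_next = sum(wi * bi for wi, bi in zip(w, next_board))
    e = v_next - v                      # temporal difference error
    for i, bi in enumerate(board):
        w[i] += eta * e * bi            # move V(b) toward V(b')
    return w
```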
Coevolutionary Temporal Difference Learning
Coevolutionary Temporal Difference Learning (CTDL)
A hybrid of coevolutionary search with reinforcement learning that
works by interlacing one-population competitive coevolution with
temporal difference learning.
Algorithm 1 extended with individual reinforcement learning (line 7):
1: P ← createRandomPopulation()
2: A ← initializeArchive()
3: evaluatePopulation(A, P)
4: while ¬terminationCondition() do
5:   S ← selectParents(P)
6:   P ← recombineAndMutate(S)
7:   individualReinforcementLearning(P)
8:   evaluatePopulation(A, P)
9:   updateArchive(A, P)
10: end while
11: return getFittestIndividual(A, P)
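A sketch of one CTDL generation, showing how line 7 interlaces individual TD learning with the coevolutionary loop; breed, td_self_play, and evaluate are assumed helper functions, not the authors' code:

```python
def ctdl_generation(pop, archive, breed, td_self_play, evaluate, td_games=100):
    """One CTDL generation. `breed` performs selectParents plus
    recombineAndMutate, `td_self_play(ind, n)` plays n TD-learning games that
    update the individual's weights (line 7), and `evaluate` returns a fitness
    list based on interactions, as in the coevolutionary scheme."""
    pop = breed(pop)
    for ind in pop:
        td_self_play(ind, td_games)                    # individualReinforcementLearning
    fitness = evaluate(pop, archive)                   # evaluatePopulation
    archive.append(pop[fitness.index(max(fitness))])   # updateArchive
    return pop, fitness
```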
Relative Performance of Methods Over Time
[Plot: points scored in tournaments vs. games played (×100,000) for CTDL + HoF, CTDL, TDL, CEL + HoF, and CEL.]

Observations and Motivation
Observations on learning Othello strategies
Temporal Difference Learning is much faster and, under most experimental settings, it is able to learn better strategies.
Coevolution can eventually produce better strategies if it is
supported by an archive which sustains progress.
CTDL benefits from these complementary characteristics.
Motivation for further research on CTDL
No need for human expertise – useful when the knowledge of
the problem domain is unavailable or expensive to obtain.
Potential for employing more complex learner architecture.
Interesting biological interpretation.
Evolution of Artificial Neural Networks
Typically, a network topology is chosen before the experiment and evolution searches the space of connection weights.
Can evolving topologies along with weights provide an advantage over evolving weights on a fixed topology?
Any continuous function can be approximated by a fully connected neural network
having only one internal hidden layer and with an arbitrary sigmoidal nonlinearity.
George V. Cybenko
Challenges of Topology and Weight Evolving ANNs (TWEANNs)
How to cross over disparate topologies in a meaningful way?
How can topological innovation that needs a few generations to be
optimized be protected so that it does not disappear prematurely?
How can topologies be minimized throughout evolution?
Evolvability and Neural Interference
Evolvability is an organism’s capacity to generate heritable phenotypic variation.
Marc Kirschner & John Gerhart
Evolvability of neural networks allows evolutionary algorithms to find weight
settings that produce a desired behavior or approximate a given function.
Julian Togelius
The topology of a neural network largely influences its evolvability; evolvability can be increased by removing single inputs or connections.
The availability of certain information at certain points in the
network can lead evolution into local optima.
Neural interference appears in nonmodular neural networks
that learn complex behavior consisting of multiple tasks.
Neuroevolution of Augmenting Topologies (NEAT)
Matching Topologies using Innovation Numbers
Different network structures (size and connection order) lead to the Competing Conventions Problem.
NEAT performs artificial synapsis based on historical markings
Figure 1: The competing conventions problem. The two networks compute exactly the same function even though their hidden units appear in a different order and are represented by different chromosomes, making them incompatible for crossover. The figure shows that the two single-point recombinations are both missing one of the three main components of each solution. The depicted networks are only 2 of the 6 possible permutations of hidden unit orderings.
We now turn to several specific problems with TWEANNs and address each in turn.
2.2 Competing Conventions
One of the main problems for NE is the Competing Conventions Problem (Montana and Davis, 1989; Schaffer et al., 1992), also known as the Permutations Problem (Radcliffe, 1993). Competing conventions means having more than one way to express a solution to a weight optimization problem with a neural network. When genomes representing the same solution do not have the same encoding, crossover is likely to produce damaged offspring.
Figure 1 depicts the problem for a simple 3-hidden-unit network. The three hidden neurons A, B, and C can represent the same general solution in 3! = 6 different permutations. When one of these permutations crosses over with another, critical information is likely to be lost. For example, crossing [A, B, C] and [C, B, A] can result in [C, B, C], a representation that has lost one third of the information that both of the parents had. In general, for n hidden units, there are n! functionally equivalent solutions. The problem can be further complicated with differing conventions, i.e., [A, B, C] and [D, B, E], which share functional interdependence on B.
An even more difficult form of competing conventions is present in TWEANNs, because TWEANN networks can represent similar solutions using entirely different topologies, or even genomes of different sizes. Because TWEANNs do not satisfy strict constraints on the kinds of topologies they produce, proposed solutions to the competing conventions problem for fixed or constrained topology networks, such as nonredundant genetic encoding (Thierens, 1996), do not apply. Radcliffe (1993) goes as far as calling an integrated scheme combining connectivity and weights the "Holy Grail in …"
[Figure: parents [A, B, C] and [C, B, A] and their single-point crossover offspring [A, B, A] and [C, B, C]; both offspring are missing information.]
Figure comes from "Evolving Neural Networks through Augmenting Topologies" by K. Stanley and R. Miikkulainen
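A tiny self-contained demonstration of the permutation problem from Figure 1: single-point crossover of two functionally identical parents loses one of the three hidden-unit components.

```python
def single_point_crossover(p1, p2, cut):
    """Single-point crossover on chromosomes of hidden-unit labels."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

# Two parents that compute the same function with permuted hidden units:
p1, p2 = ["A", "B", "C"], ["C", "B", "A"]
for cut in (1, 2):
    for child in single_point_crossover(p1, p2, cut):
        # Offspring such as [A, B, A] and [C, B, C] have lost one of the
        # three main components that both parents contained.
        missing = {"A", "B", "C"} - set(child)
        print(child, "missing:", missing or "nothing")
```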
Neuroevolution of Augmenting Topologies (NEAT)
Two types of structural mutations in NEAT:
2 NEUROEVOLUTION OF AUGMENTING TOPOLOGIES (NEAT)
The NEAT method of evolving artificial neural networks combines the usual search for appropriate network weights with complexification of the network structure. This approach is highly effective: NEAT outperforms other neuroevolution (NE) methods, e.g. on the benchmark double pole balancing task, by a factor of five (Stanley and Miikkulainen 2001, 2002b,c). The NEAT method consists of solutions to three fundamental challenges in evolving neural network topology: (1) What kind of genetic representation would allow disparate topologies to crossover in a meaningful way? (2) How can topological innovation that needs a few generations to optimize be protected so that it does not disappear from the population prematurely? (3) How can topologies be minimized throughout evolution so the most efficient solutions will be discovered? In this section, we explain how NEAT addresses each challenge.
2.1 GENETIC ENCODING
Evolving structure requires a flexible genetic encoding. In order to allow structures to complexify, their representations must be dynamic and expandable. Each genome in NEAT includes a list of connection genes, each of which refers to two node genes being connected. Each connection gene specifies the in-node, the out-node, the weight of the connection, whether or not the connection gene is expressed (an enable bit), and an innovation number, which allows finding corresponding genes during crossover.
Mutation in NEAT can change both connection weights and network structures. Connection weights mutate as in any NE system, with each connection either perturbed or not. Structural mutations, which form the basis of complexification, occur in two ways (figure 1). In the add connection mutation, a single new connection gene is added connecting two previously unconnected nodes. In the add node mutation an existing connection is split and the new node placed where the old connection used to be. The old connection is disabled and two new connections are added to the genome. This method of adding nodes was chosen in order to integrate new nodes immediately into the network. Through mutation, genomes of varying sizes are created, sometimes with completely different connections specified at the same positions.
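A sketch of this encoding and of the two structural mutations; the ConnectionGene fields follow the description above, while the function and field names themselves are illustrative assumptions:

```python
from dataclasses import dataclass, field
import itertools

_innovation = itertools.count(1)    # global innovation number counter

@dataclass
class ConnectionGene:
    in_node: int
    out_node: int
    weight: float
    enabled: bool = True                                        # the enable bit
    innovation: int = field(default_factory=lambda: next(_innovation))

def add_connection(genome, in_node, out_node, weight=1.0):
    """Add-connection mutation: one new gene linking two unconnected nodes."""
    genome.append(ConnectionGene(in_node, out_node, weight))

def add_node(genome, gene, new_node):
    """Add-node mutation: split an existing connection. The old gene is
    disabled and two new genes (in->new, new->out) are appended, each
    receiving a fresh innovation number."""
    gene.enabled = False
    genome.append(ConnectionGene(gene.in_node, new_node, 1.0))
    genome.append(ConnectionGene(new_node, gene.out_node, gene.weight))
```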
In order to perform crossover, the system must be able to tell which genes match up between any individuals in the population. The key observation is that two genes that have the same historical origin represent the same structure (although possibly with different weights), since they were both derived from the same ancestral gene from some point in the past. Thus, all a system needs to do to know which genes line up with which is to keep track of the historical origin of every gene in the system.

[Figure: example genomes illustrating the two structural mutations, Mutate Add Connection and Mutate Add Node.]

Figure 1: The two types of structural mutation in NEAT. Both types, adding a connection and adding a node, are illustrated with the genes above their phenotypes. The top number in each genome is the innovation number of that gene. The bottom two numbers denote the two nodes connected by that gene. The weight of the connection, also encoded in the gene, is not shown. The symbol DIS means that the gene is disabled, and therefore not expressed in the network. The figure shows how connection genes are appended to the genome when a new connection is added to the network and when a new node is added. Assuming the depicted mutations occurred one after the other, the genes would be assigned increasing innovation numbers as the figure illustrates, thereby allowing NEAT to keep an implicit history of the origin of every gene in the population.

[Footnote: A more comprehensive description of the NEAT method is given in Stanley and Miikkulainen (2001, 2002c).]
Tracking the historical origins requires very little computation. Whenever a new gene appears (through structural mutation), a global innovation number is incremented and assigned to that gene. The innovation numbers thus represent a chronology of every gene in the system. As an example, let us say the two mutations in figure 1 occurred one after another in the system. The new connection gene created in the first mutation is assigned the number 7, and the two new connection genes added during the new node mutation are assigned the numbers 8 and 9. In the future, whenever these genomes crossover, the offspring will inherit the same innovation numbers on each gene; innovation numbers are never changed. Thus, the historical origin of every gene in the system is known throughout evolution.
Through innovation numbers, the system now knows exactly which genes match up with which. Genes that do not match are either disjoint or excess, depending on whether they occur within or outside the range of the other parent's innovation numbers. When crossing over, the genes in both genomes with the same innovation numbers are lined up.
Figure comes from “Evolving Neural Networks through Augmenting Topologies” by K. Stanley and R. Miikkulainen
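Continuing the sketch above, gene alignment by innovation number might look as follows; matching, disjoint, and excess follow the definitions in the text:

```python
def align_genes(genome1, genome2):
    """Line up genes by innovation number. Genes present in both genomes are
    matching; unmatched genes inside the other genome's innovation range are
    disjoint, and those outside it are excess."""
    g1 = {g.innovation: g for g in genome1}
    g2 = {g.innovation: g for g in genome2}
    cutoff = min(max(g1), max(g2))            # end of the shorter chronology
    matching = sorted(g1.keys() & g2.keys())
    unmatched = g1.keys() ^ g2.keys()
    disjoint = sorted(i for i in unmatched if i <= cutoff)
    excess = sorted(i for i in unmatched if i > cutoff)
    return matching, disjoint, excess
```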
Neuroevolution of Augmenting Topologies (NEAT)
Protecting Innovation through Speciation
Changing the topology of a network is often very disruptive.
Structural innovation is unlikely to survive in the population.
NEAT divides the population into species that compete
primarily within their own niches.
Minimizing Dimensionality
Forcing minimal topologies could be achieved by incorporating network size into the fitness function.
NEAT instead biases the search towards minimal-dimensional spaces by starting from a population of networks with no hidden nodes.
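For illustration, a sketch of the NEAT-style compatibility distance that such speciation can be based on, reusing align_genes from the earlier sketch; the formula and coefficient values are the standard NEAT ones and are given here as an assumption, not as the exact setting used in this work:

```python
def compatibility(genome1, genome2, c1=1.0, c2=1.0, c3=0.4):
    """NEAT-style compatibility distance used to partition the population into
    species: delta = c1*E/N + c2*D/N + c3*Wbar, where E and D are the numbers
    of excess and disjoint genes, Wbar the mean weight difference of matching
    genes, and N the size of the larger genome."""
    matching, disjoint, excess = align_genes(genome1, genome2)
    g1 = {g.innovation: g for g in genome1}
    g2 = {g.innovation: g for g in genome2}
    n = max(len(genome1), len(genome2))
    w_bar = (sum(abs(g1[i].weight - g2[i].weight) for i in matching)
             / max(1, len(matching)))
    return c1 * len(excess) / n + c2 * len(disjoint) / n + c3 * w_bar
```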
N-tuple Network Architecture
A type of ANN that operates on a compound object (matrix, image) x whose elements can be easily indexed and retrieved.
Formed by a set of m tuples, each created by (randomly) sampling the input object at n locations.
[Figure: an example n-tuple sampling pixel values at several locations of a face image.]
Figure comes from “Face Recognition with the Continuous N-tuple Classifier” by S. M. Lucas
N-tuple Network Output Value
Each input location has v possible values – a single n-tuple represents an n-digit number in a base-v numeral system.
Each n-tuple has an associated look-up table (LUT) which contains parameters equivalent to weights in a standard ANN.
Locations a_ij, j = 0, …, n−1, specified by each n-tuple t_i are used to identify an address in its look-up table.
The output of the network is calculated by summing the LUT values indexed by the particular n-tuples:

f(x) = Σ_i f_i(x) = Σ_i LUT_i[ Σ_{j=0}^{n−1} x(a_ij) · v^j ],

where the outer sums run over the network's m tuples.
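In code, the output computation might look like this sketch; the board encoding (values 0..v−1 per location) and all names are assumptions:

```python
def ntuple_output(board, tuples, luts, v=3):
    """Output of an n-tuple network. `board` maps locations to values in
    0..v-1 (for Othello v = 3: empty, black, white); each tuple's sampled
    values form an n-digit base-v address into its look-up table, and the
    network output is the sum of the addressed LUT entries."""
    total = 0.0
    for locations, lut in zip(tuples, luts):
        address = 0
        for j, loc in enumerate(locations):
            address += board[loc] * v ** j    # x(a_ij) * v^j
        total += lut[address]                 # LUT_i indexed by the address
    return total
```

Note that for m tuples of length n, each LUT holds v**n entries, so the representation grows exponentially with tuple length but only linearly with the number of tuples.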
N-tuple Network for Othello
In the context of Othello, an n-tuple network acts as a state
evaluation function – it computes the utility of a given board state.
[Figure: two snake-shaped tuples placed on an Othello board, each with its own look-up table (LUT_1, LUT_2) mapping tuple contents to weights.]
2Snake-shaped inputs are randomly assigned and stay fixed
while learning affects weights in the look-up table.
N-tuple Network as TWEANN
Structural Genetic Operators
Mutation consists in changing the input assignment of a single
element of a tuple to one of its neighbouring locations.
Size of tuples remains constant throughout the evolution.
Crossover is restricted to exchanging whole tuples.
Each tuple represents an independent module that can be
easily combined with other modules.
Innovations are protected by applying intensive individual learning to newly created structures.
Size of the representation does not grow.
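A sketch of these two structural operators for an 8×8 board; the 4-neighbourhood definition and the representation of a network as a list of (locations, LUT) modules are assumptions:

```python
import random

def neighbours(loc, size=8):
    """4-neighbourhood of a location on a size x size board (locations 0..63)."""
    r, c = divmod(loc, size)
    cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [nr * size + nc for nr, nc in cand
            if 0 <= nr < size and 0 <= nc < size]

def mutate_tuple(locations):
    """Structural mutation: move one element of a tuple to a neighbouring
    board location; the tuple's length is preserved."""
    i = random.randrange(len(locations))
    out = list(locations)
    out[i] = random.choice(neighbours(out[i]))
    return out

def crossover(net1, net2):
    """Crossover restricted to exchanging whole tuples: a network is a list of
    (locations, LUT) modules, and each offspring module is taken verbatim
    from one of the parents."""
    return [random.choice(pair) for pair in zip(net1, net2)]
```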
Coevolutionary Gradient Search Process
Our approach is to analyse characteristics of the problem search space and
thence to identify the algorithms (within the class considered) which exploit
these characteristics – we pay for our lunch, one might say.
Lionel Barnett
We aim to search both spaces in parallel – discrete network
topology space and continuous weight space.
How should we move in these spaces to exploit their characteristics?
Coevolutionary Gradient Search
Directed gradient search – numerically estimates direction of
change in the vicinity of the current candidate solution.
Undirected coevolutionary search – stochastically jumps over
the search space starting from the fittest configurations.
Search Operators
Genetic Operators
The following genetic operators operate on the fittest individuals:
Weight mutation (m_w)
Topology mutation (m_t)
Topology crossover (x)
Gradient Operators
Gradient-based search operators work in the weight space and consist of a single gradient-descent TDL learning scenario.
How to create a competitive learning environment?
self-play scenario (s)
population opponent (p)
archival opponent (a)
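A sketch of how a single gradient-operator application might select its learning environment under the three schemes; td_game is an assumed helper that plays one game while applying TD updates to the learner's weights:

```python
import random

def gradient_operator(ind, population, archive, td_game, scheme="s"):
    """Apply one gradient operator: a single TD learning scenario in the
    weight space. The learning environment depends on the scheme:
    's' self-play, 'p' a random population member, 'a' a random archive
    member (falling back to self-play while the archive is empty)."""
    if scheme == "s" or (scheme == "a" and not archive):
        opponent = ind
    elif scheme == "p":
        opponent = random.choice([o for o in population if o is not ind])
    else:
        opponent = random.choice(archive)
    td_game(ind, opponent)   # play one game, TD-updating the learner's weights
```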
Guiding the Search Process
Interactions between candidate solutions are the only source of information that guides the search process.
Algorithm 1: Basic scheme of a generational evolutionary algorithm
P ← createRandomPopulation()
evaluatePopulation(P)
while ¬terminationCondition() do
S ← selectParents(P)
P ← recombineAndMutate(S)
evaluatePopulation(P)
end while
return getFittestIndividual(P)
1. Play round robin tournament between population members
2. Randomly select archival individuals to act as opponents
3. Select the best-of-generation individual and add it to the archive
Search operators use different types of interaction feedback.
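Taken together, steps 1–3 above might be sketched as follows; play is an assumed interaction function returning a score:

```python
import random

def evaluate_generation(population, archive, play, n_archival=5):
    """Steps 1-3 above: round-robin within the population, extra games
    against randomly drawn archival opponents, then the best-of-generation
    individual joins the archive. `play(a, b)` returns a's score against b."""
    scores = {id(ind): 0.0 for ind in population}
    for i, a in enumerate(population):                  # 1. round robin
        for b in population[i + 1:]:
            scores[id(a)] += play(a, b)
            scores[id(b)] += play(b, a)
    for ind in population:                              # 2. archival opponents
        for opp in random.sample(archive, min(n_archival, len(archive))):
            scores[id(ind)] += play(ind, opp)
    best = max(population, key=lambda ind: scores[id(ind)])
    archive.append(best)                                # 3. update the archive
    return scores
```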
Learning 7 x 4 N-tuple Networks
[Plot: average percentage score vs. games played (×1,000) for CTDL-sxmw + HoF, CTDL-sxmw, TDL, CEL + HoF, and CEL.]
Learning 9 x 5 N-tuple Networks
[Plot: average percentage score vs. games played (×1,000) for TDL, CTDL-sxmw + HoF, CTDL-sxmw, CEL + HoF, and CEL.]
Learning 12 x 6 N-tuple Networks
[Plot: average percentage score vs. games played (×1,000) for ETDL-sxmt, CTDL-sxmt + HoF, TDL, CTDL-sxmw, and CEL.]
Relative Performance of Self-play Methods
[Plot: points in tournaments vs. games played (×1,000) for TDL, PTDL, CTDL-s, CTDL-sx, CTDL-sxmt, and CTDL-sxmt + HoF.]

Relative Performance of Mutual-play Methods
[Plot: points in tournaments vs. games played (×1,000) for CTDL-p, CTDL-px, CTDL-pxmt, CTDL-ax + HoF, and CTDL-asxmt + HoF.]

Relative Performance of All Methods
[Plot: points in tournaments vs. games played (×1,000) for CTDL-px, ETDL-sxmt, CTDL-sxmt, CTDL-sxmt + HoF, and CTDL-asxmt + HoF.]

Evolutionary Player in the Othello League