Coevolutionary Gradient Algorithms and their Application to Othello
Marcin Szubert
Krzysztof Krawiec
Institute of Computing Science, Poznan University of Technology
Outline
1. Starting Point – Previous Research
   - Coevolution and Reinforcement Learning
   - Coevolutionary Temporal Difference Learning
2. Flexible Learner Architecture
   - Topology and Weight Evolving ANNs
   - N-tuple Networks
3. Coevolutionary Gradient Algorithms
4. Experimental Results
5. Summary
Coevolutionary Algorithms
Bio-inspired methods that attempt to harness the Darwinian notions of heredity and survival of the fittest but, in contrast to traditional evolutionary algorithms, do not attempt to measure the fitness of individuals objectively. Instead, individuals are compared on the basis of the outcomes of their interactions with other individuals.
Natural evolution is coevolution, where the fitness of an individual is
defined with respect to its competitors and collaborators, as well as to
the environment.
Simon M. Lucas
Coevolutionary Algorithms
The outcome of evaluating an individual in a coevolutionary algorithm depends
upon the context of whom the individual interacts with. This context sensitivity
is characteristic of coevolutionary systems and responsible for the complex
dynamics for which coevolution is (in)famous.
Sevan G. Ficici
Single-population coevolutionary algorithm
Algorithm 1: Basic scheme of a generational evolutionary algorithm
1: P ← createRandomPopulation()
2: A ← initializeArchive()
3: evaluatePopulation(A, P)
4: while ¬terminationCondition() do
5:   S ← selectParents(P)
6:   P ← recombineAndMutate(S)
7:   evaluatePopulation(A, P)
8:   updateArchive(A, P)
9: end while
10: return getFittestIndividual(A, P)
Procedure evaluatePopulation(A, P)
1: E ← selectEvaluators(A, P)
2: performInteractions(P, E)
3: aggregateInteractionOutcomes(P, E)
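A minimal Python sketch of the scheme and evaluation procedure above, assuming a toy interaction function in place of a real Othello game; all names (play, mutate, evaluate) are illustrative, not the authors' implementation:

```python
import random

POP_SIZE, GENERATIONS, N_WEIGHTS, ARCHIVE_SAMPLE = 20, 50, 10, 5

def play(a, b):
    # Hypothetical interaction: a toy comparison of weight vectors stands in
    # for a real game (e.g., Othello) between two strategies.
    return 1.0 if sum(a) > sum(b) else 0.0

def mutate(ind, sigma=0.1):
    return [w + random.gauss(0.0, sigma) for w in ind]

def evaluate(pop, archive):
    # evaluatePopulation: select evaluators, perform the interactions, and
    # aggregate their outcomes into one fitness value per individual.
    evaluators = pop + random.sample(archive, min(ARCHIVE_SAMPLE, len(archive)))
    return [sum(play(ind, opp) for opp in evaluators if opp is not ind)
            for ind in pop]

pop = [[random.uniform(-1, 1) for _ in range(N_WEIGHTS)]
       for _ in range(POP_SIZE)]                          # createRandomPopulation
archive = []                                              # initializeArchive
fitness = evaluate(pop, archive)
for _ in range(GENERATIONS):
    paired = list(zip(pop, fitness))
    parents = [max(random.sample(paired, 2), key=lambda pf: pf[1])[0]
               for _ in range(POP_SIZE)]                  # selectParents (tournament)
    pop = [mutate(p) for p in parents]                    # recombineAndMutate
    fitness = evaluate(pop, archive)                      # evaluatePopulation
    archive.append(pop[fitness.index(max(fitness))])      # updateArchive
best = pop[fitness.index(max(fitness))]                   # getFittestIndividual
```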
The family of EA is composed of a few methods that differ slightly in technical details, but all match the basic scheme presented in Algorithm 1. The most important difference between these methods concerns the so-called representation, which defines a mapping from phenotypes onto a set of genotypes and specifies what data structures are employed in this encoding. Phenotypes are objects forming solutions to the original problem, i.e., points of the problem space of possible solutions. Genotypes, on the other hand, denote points in the evolutionary search space which are subject to genetic operations. The process of genotype-phenotype decoding is intended to model the natural phenomenon of embryogenesis. A more detailed description of these terms can be found in [Weise 09].
Returning to the different dialects of EA, candidate solutions are typically represented by strings over a finite (usually binary) alphabet in Genetic Algorithms (GA) [Holland 62], real-valued vectors in Evolution Strategies (ES) [Rechenberg 73], finite state machines in classical Evolutionary Programming (EP) [Fogel 95], and trees in Genetic Programming (GP) [Koza 92]. A certain representation might be preferable if it makes encoding solutions to a given problem more natural. Obviously, the genetic operations of recombination and mutation must be adapted to the chosen representation. For example, crossover in GP is usually based on exchanging subtrees between the combined individuals.
The most significant advantage of EA lies in their flexibility and adaptability to the given task. This may be explained by their metaheuristic, "black box" character that makes only a few assumptions about the underlying objective function being optimized. EA are claimed to be robust problem solvers showing roughly good performance over a wide range of problems, as reported by Goldberg [Goldberg 89]. In particular, combining EA with problem-specific heuristics, including local-search based techniques, often makes highly efficient optimization algorithms possible in many areas of application. Such hybridization of EA is becoming popular due to its capability of handling real-world problems involving noisy environments, imprecision, or uncertainty. The latest state-of-the-art methodologies in Hybrid Evolutionary Algorithms are reviewed in [Grosan 07].
Reinforcement Learning
Reinforcement Learning (RL)
Machine learning paradigm focused on solving problems in which
an agent interacts with an environment by taking actions and
receiving rewards at discrete time steps. The objective is to find
a decision policy that maximizes the cumulative reward.

[Diagram: the agent–environment interaction loop — (1) the agent observes state s_t, (2) takes action a_t, (3) receives reward r_t, and (4) learns on the basis of ⟨s_t, a_t, r_t, s_{t+1}⟩.]
In board games:
agent ⇒ player
environment ⇒ game
state ⇒ board state
action ⇒ legal move
reward ⇒ game result
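The loop above can be written down generically. Below is a sketch assuming a hypothetical environment object with a minimal reset()/step() interface and an agent with act()/learn() methods; none of these names come from the talk:

```python
def run_episode(env, agent):
    """One pass of the loop above. `env.step()` is assumed to return
    (next_state, reward, done); for a board game, states are board positions,
    actions are legal moves, and the reward is the game result."""
    state, done, total_reward = env.reset(), False, 0.0
    while not done:
        action = agent.act(state)                        # 2. action a_t
        next_state, reward, done = env.step(action)      # 3. reward r_t
        agent.learn(state, action, reward, next_state)   # 4. learn from <s_t, a_t, r_t, s_t+1>
        state = next_state                               # 1. next state s_t+1
        total_reward += reward
    return total_reward
```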
Temporal Difference Learning
Temporal Difference Learning (TDL)
RL method which attempts to estimate a value function by
observing the progression of states – the learner adjusts it to make
the value of the current state more like the value of the next state.
The value function V(b) can be represented as a neural network with a modifiable weight vector w.
The adjustment is based on a gradient-descent update, e.g.

∆w_i := η e b_i,   e = v′ − v,

where v = V(b_t) and v′ = V(b_{t+1}) are the values of the current and next state, and b_i is the i-th element of board b.
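A sketch of this update for a linear value function over board elements b_i (e.g., a weighted piece counter); the learning rate η and the feature encoding are illustrative assumptions:

```python
def td_update(w, board, next_board, eta=0.01):
    """One gradient-descent TD(0) step for a linear value function
    V(b) = sum_i w_i * b_i, so that dV/dw_i = b_i and the update reads
    delta_w_i = eta * e * b_i with TD error e = V(b') - V(b)."""
    v = sum(wi * bi for wi, bi in zip(w, board))
    v_next = sum(wi * bi for wi, bi in zip(w, next_board))
    e = v_next - v                      # temporal difference error
    for i, bi in enumerate(board):
        w[i] += eta * e * bi            # move V(b) toward V(b')
    return w
```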
Coevolutionary Temporal Difference Learning
Coevolutionary Temporal Difference Learning (CTDL)
A hybrid of coevolutionary search with reinforcement learning that
works by interlacing one-population competitive coevolution with
temporal difference learning.
Algorithm 1 extended with individual reinforcement learning (line 7):
1: P ← createRandomPopulation()
2: A ← initializeArchive()
3: evaluatePopulation(A, P)
4: while ¬terminationCondition() do
5:   S ← selectParents(P)
6:   P ← recombineAndMutate(S)
7:   individualReinforcementLearning(P)
8:   evaluatePopulation(A, P)
9:   updateArchive(A, P)
10: end while
11: return getFittestIndividual(A, P)
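A sketch of one CTDL generation, showing how line 7 interlaces individual TD learning with the coevolutionary loop; breed, td_self_play, and evaluate are assumed helper functions, not the authors' code:

```python
def ctdl_generation(pop, archive, breed, td_self_play, evaluate, td_games=100):
    """One CTDL generation. `breed` performs selectParents plus
    recombineAndMutate, `td_self_play(ind, n)` plays n TD-learning games that
    update the individual's weights (line 7), and `evaluate` returns a fitness
    list based on interactions, as in the coevolutionary scheme."""
    pop = breed(pop)
    for ind in pop:
        td_self_play(ind, td_games)                    # individualReinforcementLearning
    fitness = evaluate(pop, archive)                   # evaluatePopulation
    archive.append(pop[fitness.index(max(fitness))])   # updateArchive
    return pop, fitness
```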
Relative Performance of Methods Over Time
[Plot: points scored in tournaments vs. games played (×100,000) for CTDL + HoF, CTDL, TDL, CEL + HoF, and CEL.]

Observations and Motivation
Observations on learning Othello strategies
Temporal Difference Learning is much faster and, under most experimental settings, it is able to learn better strategies.
Coevolution can eventually produce better strategies if it is
supported by an archive which sustains progress.
CTDL benefits from these complementary characteristics.
Motivation for further research on CTDL
No need for human expertise – useful when the knowledge of
the problem domain is unavailable or expensive to obtain.
Potential for employing more complex learner architecture.
Interesting biological interpretation.
Evolution of Artificial Neural Networks
Typically, a network topology is chosen before the experiment and evolution searches the space of connection weights.
Can evolving topologies along with weights provide an advantage over evolving weights on a fixed topology?
Any continuous function can be approximated by a fully connected neural network
having only one internal hidden layer and with an arbitrary sigmoidal nonlinearity.
George V. Cybenko
Challenges of Topology and Weight Evolving ANNs (TWEANNs)
How to cross over disparate topologies in a meaningful way?
How can topological innovation that needs a few generations to be
optimized be protected so that it does not disappear prematurely?
How can topologies be minimized throughout evolution?
Evolvability and Neural Interference
Evolvability is an organism’s capacity to generate heritable phenotypic variation.
Marc Kirschner & John Gerhart
Evolvability of neural networks allows evolutionary algorithms to find weight
settings that produce a desired behavior or approximate a given function.
Julian Togelius
The topology of a neural network largely influences its evolvability; evolvability can be increased by removing single inputs or connections.
The availability of certain information at certain points in the
network can lead evolution into local optima.
Neural interference appears in nonmodular neural networks
that learn complex behavior consisting of multiple tasks.
Neuroevolution of Augmenting Topologies (NEAT)
Matching Topologies using Innovation Numbers
Different network structures (size and connection order) lead to the Competing Conventions Problem.
NEAT performs artificial synapsis based on historical markings
Figure 1: The competing conventions problem. The two networks compute exactly the same function even though their hidden units appear in a different order and are represented by different chromosomes, making them incompatible for crossover. The figure shows that the two single-point recombinations are both missing one of the three main components of each solution. The depicted networks are only 2 of the 6 possible permutations of hidden unit orderings.
We now turn to several specific problems with TWEANNs and address each in turn.
2.2 Competing Conventions
One of the main problems for NE is the Competing Conventions Problem (Montana and Davis, 1989; Schaffer et al., 1992), also known as the Permutations Problem (Radcliffe, 1993). Competing conventions means having more than one way to express a solution to a weight optimization problem with a neural network. When genomes representing the same solution do not have the same encoding, crossover is likely to produce damaged offspring.
Figure 1 depicts the problem for a simple 3-hidden-unit network. The three hidden neurons A, B, and C can represent the same general solution in 3! = 6 different permutations. When one of these permutations crosses over with another, critical information is likely to be lost. For example, crossing [A, B, C] and [C, B, A] can result in [C, B, C], a representation that has lost one third of the information that both of the parents had. In general, for n hidden units, there are n! functionally equivalent solutions. The problem can be further complicated with differing conventions, i.e., [A, B, C] and [D, B, E], which share functional interdependence on B.
An even more difficult form of competing conventions is present in TWEANNs, because TWEANN networks can represent similar solutions using entirely different topologies, or even genomes of different sizes. Because TWEANNs do not satisfy strict constraints on the kinds of topologies they produce, proposed solutions to the competing conventions problem for fixed or constrained topology networks, such as nonredundant genetic encoding (Thierens, 1996), do not apply. Radcliffe (1993) goes as far as calling an integrated scheme combining connectivity and weights the "Holy Grail in …"
[Figure: parents [A, B, C] and [C, B, A] and their single-point crossover offspring [A, B, A] and [C, B, C]; both offspring are missing information.]
Figure comes from "Evolving Neural Networks through Augmenting Topologies" by K. Stanley and R. Miikkulainen
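A tiny self-contained demonstration of the permutation problem from Figure 1: single-point crossover of two functionally identical parents loses one of the three hidden-unit components.

```python
def single_point_crossover(p1, p2, cut):
    """Single-point crossover on chromosomes of hidden-unit labels."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

# Two parents that compute the same function with permuted hidden units:
p1, p2 = ["A", "B", "C"], ["C", "B", "A"]
for cut in (1, 2):
    for child in single_point_crossover(p1, p2, cut):
        # Offspring such as [A, B, A] and [C, B, C] have lost one of the
        # three main components that both parents contained.
        missing = {"A", "B", "C"} - set(child)
        print(child, "missing:", missing or "nothing")
```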
Neuroevolution of Augmenting Topologies (NEAT)
Two types of structural mutations in NEAT:
2 NEUROEVOLUTION OF AUGMENTING TOPOLOGIES (NEAT)
The NEAT method of evolving artificial neural networks combines the usual search for appropriate network weights with complexification of the network structure. This approach is highly effective: NEAT outperforms other neuroevolution (NE) methods, e.g. on the benchmark double pole balancing task, by a factor of five (Stanley and Miikkulainen 2001, 2002b,c). The NEAT method consists of solutions to three fundamental challenges in evolving neural network topology: (1) What kind of genetic representation would allow disparate topologies to crossover in a meaningful way? (2) How can topological innovation that needs a few generations to optimize be protected so that it does not disappear from the population prematurely? (3) How can topologies be minimized throughout evolution so the most efficient solutions will be discovered? In this section, we explain how NEAT addresses each challenge.
2.1 GENETIC ENCODING
Evolving structure requires a flexible genetic encoding. In order to allow structures to complexify, their representations must be dynamic and expandable. Each genome in NEAT includes a list of connection genes, each of which refers to two node genes being connected. Each connection gene specifies the in-node, the out-node, the weight of the connection, whether or not the connection gene is expressed (an enable bit), and an innovation number, which allows finding corresponding genes during crossover.
Mutation in NEAT can change both connection weights and network structures. Connection weights mutate as in any NE system, with each connection either perturbed or not. Structural mutations, which form the basis of complexification, occur in two ways (figure 1). In the add connection mutation, a single new connection gene is added connecting two previously unconnected nodes. In the add node mutation an existing connection is split and the new node placed where the old connection used to be. The old connection is disabled and two new connections are added to the genome. This method of adding nodes was chosen in order to integrate new nodes immediately into the network. Through mutation, genomes of varying sizes are created, sometimes with completely different connections specified at the same positions.
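A sketch of this encoding and of the two structural mutations; the ConnectionGene fields follow the description above, while the function and field names themselves are illustrative assumptions:

```python
from dataclasses import dataclass, field
import itertools

_innovation = itertools.count(1)    # global innovation number counter

@dataclass
class ConnectionGene:
    in_node: int
    out_node: int
    weight: float
    enabled: bool = True                                        # the enable bit
    innovation: int = field(default_factory=lambda: next(_innovation))

def add_connection(genome, in_node, out_node, weight=1.0):
    """Add-connection mutation: one new gene linking two unconnected nodes."""
    genome.append(ConnectionGene(in_node, out_node, weight))

def add_node(genome, gene, new_node):
    """Add-node mutation: split an existing connection. The old gene is
    disabled and two new genes (in->new, new->out) are appended, each
    receiving a fresh innovation number."""
    gene.enabled = False
    genome.append(ConnectionGene(gene.in_node, new_node, 1.0))
    genome.append(ConnectionGene(new_node, gene.out_node, gene.weight))
```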
In order to perform crossover, the system must be able to tell which genes match up between any individuals in the population. The key observation is that two genes that have the same historical origin represent the same structure (although possibly with different weights), since they were both derived from the same ancestral gene from some point in the past. Thus, all a system needs to do to know which genes line up with which is to keep track of the historical origin of every gene in the system.

[Figure: example genomes illustrating the two structural mutations, Mutate Add Connection and Mutate Add Node.]

Figure 1: The two types of structural mutation in NEAT. Both types, adding a connection and adding a node, are illustrated with the genes above their phenotypes. The top number in each genome is the innovation number of that gene. The bottom two numbers denote the two nodes connected by that gene. The weight of the connection, also encoded in the gene, is not shown. The symbol DIS means that the gene is disabled, and therefore not expressed in the network. The figure shows how connection genes are appended to the genome when a new connection is added to the network and when a new node is added. Assuming the depicted mutations occurred one after the other, the genes would be assigned increasing innovation numbers as the figure illustrates, thereby allowing NEAT to keep an implicit history of the origin of every gene in the population.

[Footnote: A more comprehensive description of the NEAT method is given in Stanley and Miikkulainen (2001, 2002c).]
Tracking the historical origins requires very little computation. Whenever a new gene appears (through structural mutation), a global innovation number is incremented and assigned to that gene. The innovation numbers thus represent a chronology of every gene in the system. As an example, let us say the two mutations in figure 1 occurred one after another in the system. The new connection gene created in the first mutation is assigned the number 7, and the two new connection genes added during the new node mutation are assigned the numbers 8 and 9. In the future, whenever these genomes crossover, the offspring will inherit the same innovation numbers on each gene; innovation numbers are never changed. Thus, the historical origin of every gene in the system is known throughout evolution.
Through innovation numbers, the system now knows exactly which genes match up with which. Genes that do not match are either disjoint or excess, depending on whether they occur within or outside the range of the other parent's innovation numbers. When crossing over, the genes in both genomes with the same innovation numbers are lined up.
Figure comes from “Evolving Neural Networks through Augmenting Topologies” by K. Stanley and R. Miikkulainen
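Continuing the sketch above, gene alignment by innovation number might look as follows; matching, disjoint, and excess follow the definitions in the text:

```python
def align_genes(genome1, genome2):
    """Line up genes by innovation number. Genes present in both genomes are
    matching; unmatched genes inside the other genome's innovation range are
    disjoint, and those outside it are excess."""
    g1 = {g.innovation: g for g in genome1}
    g2 = {g.innovation: g for g in genome2}
    cutoff = min(max(g1), max(g2))            # end of the shorter chronology
    matching = sorted(g1.keys() & g2.keys())
    unmatched = g1.keys() ^ g2.keys()
    disjoint = sorted(i for i in unmatched if i <= cutoff)
    excess = sorted(i for i in unmatched if i > cutoff)
    return matching, disjoint, excess
```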
Neuroevolution of Augmenting Topologies (NEAT)
Protecting Innovation through Speciation
Changing the topology of a network is often very disruptive.
Structural innovation is unlikely to survive in the population.
NEAT divides the population into species that compete
primarily within their own niches.
Minimizing Dimensionality
Forcing minimal topologies could be achieved by incorporating network size into the fitness function.
NEAT instead biases the search towards minimal-dimensional spaces by starting from a population of networks with no hidden nodes.
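For illustration, a sketch of the NEAT-style compatibility distance that such speciation can be based on, reusing align_genes from the earlier sketch; the formula and coefficient values are the standard NEAT ones and are given here as an assumption, not as the exact setting used in this work:

```python
def compatibility(genome1, genome2, c1=1.0, c2=1.0, c3=0.4):
    """NEAT-style compatibility distance used to partition the population into
    species: delta = c1*E/N + c2*D/N + c3*Wbar, where E and D are the numbers
    of excess and disjoint genes, Wbar the mean weight difference of matching
    genes, and N the size of the larger genome."""
    matching, disjoint, excess = align_genes(genome1, genome2)
    g1 = {g.innovation: g for g in genome1}
    g2 = {g.innovation: g for g in genome2}
    n = max(len(genome1), len(genome2))
    w_bar = (sum(abs(g1[i].weight - g2[i].weight) for i in matching)
             / max(1, len(matching)))
    return c1 * len(excess) / n + c2 * len(disjoint) / n + c3 * w_bar
```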
N-tuple Network Architecture
A type of ANN that operates on a compound object (matrix, image) x whose elements can be easily indexed and retrieved.
Formed by a set of m tuples, each created by (randomly) sampling the input object at n locations.
[Figure: an example n-tuple sampling pixel values at several locations of a face image.]
Figure comes from “Face Recognition with the Continuous N-tuple Classifier” by S. M. Lucas
N-tuple Network Output Value
Each input location has v possible values – a single n-tuple represents an n-digit number in a base-v numeral system.
Each n-tuple has an associated look-up table (LUT) which contains parameters equivalent to weights in a standard ANN.
Locations a_ij, j = 0, …, n−1, specified by each n-tuple t_i are used to identify an address in its look-up table.
The output of the network is calculated by summing the LUT values indexed by the particular n-tuples:

f(x) = Σ_i f_i(x) = Σ_i LUT_i[ Σ_{j=0}^{n−1} x(a_ij) · v^j ],

where the outer sums run over the network's m tuples.
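In code, the output computation might look like this sketch; the board encoding (values 0..v−1 per location) and all names are assumptions:

```python
def ntuple_output(board, tuples, luts, v=3):
    """Output of an n-tuple network. `board` maps locations to values in
    0..v-1 (for Othello v = 3: empty, black, white); each tuple's sampled
    values form an n-digit base-v address into its look-up table, and the
    network output is the sum of the addressed LUT entries."""
    total = 0.0
    for locations, lut in zip(tuples, luts):
        address = 0
        for j, loc in enumerate(locations):
            address += board[loc] * v ** j    # x(a_ij) * v^j
        total += lut[address]                 # LUT_i indexed by the address
    return total
```

Note that for m tuples of length n, each LUT holds v**n entries, so the representation grows exponentially with tuple length but only linearly with the number of tuples.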
N-tuple Network for Othello
In the context of Othello, an n-tuple network acts as a state
evaluation function – it computes the utility of a given board state.
[Figure: two snake-shaped tuples placed on an Othello board, each with its own look-up table (LUT_1, LUT_2) mapping tuple contents to weights.]
2Snake-shaped inputs are randomly assigned and stay fixed
while learning affects weights in the look-up table.
N-tuple Network as TWEANN
Structural Genetic Operators
Mutation consists in changing the input assignment of a single
element of a tuple to one of its neighbouring locations.
Size of tuples remains constant throughout the evolution.
Crossover is restricted to exchanging whole tuples.
Each tuple represents an independent module that can be
easily combined with other modules.
Innovations are protected by applying intensive individual learning to newly created structures.
Size of the representation does not grow.
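A sketch of these two structural operators for an 8×8 board; the 4-neighbourhood definition and the representation of a network as a list of (locations, LUT) modules are assumptions:

```python
import random

def neighbours(loc, size=8):
    """4-neighbourhood of a location on a size x size board (locations 0..63)."""
    r, c = divmod(loc, size)
    cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [nr * size + nc for nr, nc in cand
            if 0 <= nr < size and 0 <= nc < size]

def mutate_tuple(locations):
    """Structural mutation: move one element of a tuple to a neighbouring
    board location; the tuple's length is preserved."""
    i = random.randrange(len(locations))
    out = list(locations)
    out[i] = random.choice(neighbours(out[i]))
    return out

def crossover(net1, net2):
    """Crossover restricted to exchanging whole tuples: a network is a list of
    (locations, LUT) modules, and each offspring module is taken verbatim
    from one of the parents."""
    return [random.choice(pair) for pair in zip(net1, net2)]
```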
Coevolutionary Gradient Search Process
Our approach is to analyse characteristics of the problem search space and
thence to identify the algorithms (within the class considered) which exploit
these characteristics – we pay for our lunch, one might say.
Lionel Barnett
We aim to search both spaces in parallel – discrete network
topology space and continuous weight space.
How should we move in these spaces to exploit their characteristics?
Coevolutionary Gradient Search
Directed gradient search – numerically estimates direction of
change in the vicinity of the current candidate solution.
Undirected coevolutionary search – stochastically jumps over
the search space starting from the fittest configurations.
Search Operators
Genetic Operators
The following genetic operators operate on the fittest individuals:
Weight mutation (m_w)
Topology mutation (m_t)
Topology crossover (x)
Gradient Operators
Gradient-based search operators work in the weight space and consist of a single gradient-descent TDL learning scenario.
How to create a competitive learning environment?
self-play scenario (s)
population opponent (p)
archival opponent (a)
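A sketch of how a single gradient-operator application might select its learning environment under the three schemes; td_game is an assumed helper that plays one game while applying TD updates to the learner's weights:

```python
import random

def gradient_operator(ind, population, archive, td_game, scheme="s"):
    """Apply one gradient operator: a single TD learning scenario in the
    weight space. The learning environment depends on the scheme:
    's' self-play, 'p' a random population member, 'a' a random archive
    member (falling back to self-play while the archive is empty)."""
    if scheme == "s" or (scheme == "a" and not archive):
        opponent = ind
    elif scheme == "p":
        opponent = random.choice([o for o in population if o is not ind])
    else:
        opponent = random.choice(archive)
    td_game(ind, opponent)   # play one game, TD-updating the learner's weights
```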
Guiding the Search Process
Interactions between candidate solutions are the only source of information that guides the search process.
Algorithm 1: Basic scheme of a generational evolutionary algorithm
P ← createRandomPopulation()
evaluatePopulation(P)
while ¬terminationCondition() do
S ← selectParents(P)
P ← recombineAndMutate(S)
evaluatePopulation(P)
end while
return getFittestIndividual(P)
1. Play round robin tournament between population members
2. Randomly select archival individuals to act as opponents
3. Select the best-of-generation individual and add it to the archive
Search operators use different types of interaction feedback.
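Taken together, steps 1–3 above might be sketched as follows; play is an assumed interaction function returning a score:

```python
import random

def evaluate_generation(population, archive, play, n_archival=5):
    """Steps 1-3 above: round-robin within the population, extra games
    against randomly drawn archival opponents, then the best-of-generation
    individual joins the archive. `play(a, b)` returns a's score against b."""
    scores = {id(ind): 0.0 for ind in population}
    for i, a in enumerate(population):                  # 1. round robin
        for b in population[i + 1:]:
            scores[id(a)] += play(a, b)
            scores[id(b)] += play(b, a)
    for ind in population:                              # 2. archival opponents
        for opp in random.sample(archive, min(n_archival, len(archive))):
            scores[id(ind)] += play(ind, opp)
    best = max(population, key=lambda ind: scores[id(ind)])
    archive.append(best)                                # 3. update the archive
    return scores
```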
Learning 7 x 4 N-tuple Networks
[Plot: average percentage score vs. games played (×1,000) for CTDL-sxmw + HoF, CTDL-sxmw, TDL, CEL + HoF, and CEL.]
Learning 9 x 5 N-tuple Networks
[Plot: average percentage score vs. games played (×1,000) for TDL, CTDL-sxmw + HoF, CTDL-sxmw, CEL + HoF, and CEL.]
Learning 12 x 6 N-tuple Networks
[Plot: average percentage score vs. games played (×1,000) for ETDL-sxmt, CTDL-sxmt + HoF, TDL, CTDL-sxmw, and CEL.]
Relative Performance of Self-play Methods
[Plot: points in tournaments vs. games played (×1,000) for TDL, PTDL, CTDL-s, CTDL-sx, CTDL-sxmt, and CTDL-sxmt + HoF.]

Relative Performance of Mutual-play Methods
[Plot: points in tournaments vs. games played (×1,000) for CTDL-p, CTDL-px, CTDL-pxmt, CTDL-ax + HoF, and CTDL-asxmt + HoF.]

Relative Performance of All Methods
[Plot: points in tournaments vs. games played (×1,000) for CTDL-px, ETDL-sxmt, CTDL-sxmt, CTDL-sxmt + HoF, and CTDL-asxmt + HoF.]

Evolutionary Player in the Othello League