### and their Application to Othello

### Marcin Szubert

### Krzysztof Krawiec

Institute of Computing Science, Poznan University of Technology

### Outline

1. Starting Point – Previous Research
   - Coevolution and Reinforcement Learning
   - Coevolutionary Temporal Difference Learning
2. Flexible Learner Architecture
   - Topology and Weight Evolving ANNs
   - N-tuple Networks
3. Coevolutionary Gradient Algorithms
4. Experimental Results
5. Summary

### Coevolutionary Algorithms

Bio-inspired methods that attempt to harness Darwinian notions of heredity and survival of the fittest but, in contrast to traditional evolutionary algorithms, do not attempt to objectively measure the fitness of individuals. Instead, individuals are compared on the basis of their outcomes from interactions with other individuals.

“Natural evolution is coevolution, where the fitness of an individual is defined with respect to its competitors and collaborators, as well as to the environment.”

Simon M. Lucas

### Coevolutionary Algorithms

“The outcome of evaluating an individual in a coevolutionary algorithm depends upon the context of whom the individual interacts with. This context sensitivity is characteristic of coevolutionary systems and responsible for the complex dynamics for which coevolution is (in)famous.”

Sevan G. Ficici

### Single-population coevolutionary algorithm

**Algorithm 1: Basic scheme of a generational evolutionary algorithm**

*P ← createRandomPopulation()*

*A ← initializeArchive()*

*evaluatePopulation(A, P)*

**while** *¬terminationCondition()* **do**

*S ← selectParents(P)*

*P ← recombineAndMutate(S)*

*evaluatePopulation(A, P)*

*updateArchive(A, P)*

**end while**

**return** *getFittestIndividual(A, P)*

**Procedure evaluatePopulation(A, P)**

*E ← selectEvaluators(A, P)*

*performInteractions(P, E)*

*aggregateInteractionOutcomes(P, E)*
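The scheme above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy game in `interact` (the larger genome sum wins), the binary tournament selection, the uniform crossover with Gaussian mutation, and the "store the generation's best" archive policy are all hypothetical stand-ins for the abstract steps of the algorithm.

```python
import random

def interact(a, b):
    """Toy interaction (hypothetical game): 1 if a beats b, 0.5 for a draw, else 0."""
    sa, sb = sum(a), sum(b)
    return 1.0 if sa > sb else 0.5 if sa == sb else 0.0

def evaluate_population(archive, population):
    """Fitness is the summed outcome of interactions with the evaluators."""
    evaluators = archive + population              # selectEvaluators(A, P), assumed: everyone
    return [sum(interact(ind, e) for e in evaluators if e is not ind)
            for ind in population]                 # performInteractions + aggregate outcomes

def coevolve(pop_size=10, genome_len=4, generations=30, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]        # createRandomPopulation()
    archive = []                                   # initializeArchive()
    fitness = evaluate_population(archive, population)
    for _ in range(generations):
        # selectParents(P): binary tournament on interaction-based fitness
        parents = []
        for _ in range(pop_size):
            i, j = rng.randrange(pop_size), rng.randrange(pop_size)
            parents.append(population[i] if fitness[i] >= fitness[j] else population[j])
        # recombineAndMutate(S): uniform crossover plus Gaussian mutation
        population = [
            [rng.choice(pair) + rng.gauss(0, 0.1)
             for pair in zip(parents[k], parents[(k + 1) % pop_size])]
            for k in range(pop_size)
        ]
        fitness = evaluate_population(archive, population)
        # updateArchive(A, P): assumed policy, keep a copy of the generation's best
        best = max(range(pop_size), key=fitness.__getitem__)
        archive.append(list(population[best]))
    # getFittestIndividual(A, P): best current individual judged against A and P
    final = evaluate_population(archive, population)
    return population[max(range(pop_size), key=final.__getitem__)], archive

best, archive = coevolve()
print(len(best), len(archive))
```

Note that no objective fitness function appears anywhere: every fitness value is an aggregate of interaction outcomes, which is the defining property of the coevolutionary scheme.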

The family of EA is composed of a few methods that differ slightly in technical details, but all match the basic scheme presented in Algorithm 1. The most important difference between these methods concerns the so-called *representation*, which defines a mapping from phenotypes onto a set of genotypes and specifies what data structures are employed in this encoding. *Phenotypes* are objects forming solutions to the original problem, i.e., points of the problem space of possible solutions. *Genotypes*, on the other hand, are used to denote points in the evolutionary search space which are subject to genetic operations. The process of genotype-phenotype decoding is intended to model the natural phenomenon of *embryogenesis*. A more detailed description of these terms can be found in [Weise 09].

Returning to different dialects of EA, candidate solutions are typically represented by strings over a finite (usually binary) alphabet in Genetic Algorithms (GA) [Holland 62], real-valued vectors in Evolution Strategies (ES) [Rechenberg 73], finite state machines in classical Evolutionary Programming (EP) [Fogel 95] and trees in Genetic Programming (GP) [Koza 92]. A certain representation might be preferable if it makes encoding solutions to a given problem more natural. Obviously, genetic operations of recombination and mutation must be adapted to the chosen representation. For example, crossover in GP is usually based on exchanging subtrees between combined individuals.

The most significant advantage of EA lies in their flexibility and adaptability to the given task. This may be explained by their metaheuristic “black box” character, which makes only few assumptions about the underlying objective function that is the subject of optimization. EA are claimed to be robust problem solvers showing roughly good performance over a wide range of problems, as reported by Goldberg [Goldberg 89]. In particular, the combination of EA with problem-specific heuristics, including local-search based techniques, often makes possible highly efficient optimization algorithms for many areas of application. Such hybridization of EA is becoming popular due to its capabilities in handling real-world problems involving noisy environments, imprecision or uncertainty. The latest state-of-the-art methodologies in Hybrid Evolutionary Algorithms are reviewed in [Grosan 07].

### Reinforcement Learning

**Reinforcement Learning (RL)**

Machine learning paradigm focused on solving problems in which an agent interacts with an environment by taking actions and receiving rewards at discrete time steps. The objective is to find a decision policy that maximizes cumulative reward.

Interaction between the **Agent** and the **Environment**:

1. state $s_t$
2. action $a_t$
3. reward $r_t$
4. learn on the basis of $\langle s_t, a_t, r_t, s_{t+1} \rangle$

In board games:

- agent ⟹ player
- environment ⟹ game
- state ⟹ board state
- action ⟹ legal move
- reward ⟹ game result
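The four-step interaction cycle can be sketched as a plain loop. The environment below (`CoinGame`, a three-turn guessing game) and the `policy` and `learn` callables are hypothetical placeholders chosen only to keep the example self-contained; in a board game the state would be the board and the actions the legal moves.

```python
import random

class CoinGame:
    """Hypothetical toy environment: guess a hidden coin flip three times."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.turn = 0

    def state(self):
        return self.turn

    def legal_actions(self):
        return ["heads", "tails"]

    def step(self, action):
        reward = 1.0 if action == self.rng.choice(["heads", "tails"]) else 0.0
        self.turn += 1
        return reward, self.turn, self.turn >= 3   # r_t, s_{t+1}, episode finished?

def run_episode(env, policy, learn):
    """Run one episode and return the cumulative reward."""
    total, done = 0.0, False
    while not done:
        s = env.state()                        # 1. observe state s_t
        a = policy(s, env.legal_actions())     # 2. choose action a_t
        r, s_next, done = env.step(a)          # 3. receive reward r_t
        learn(s, a, r, s_next)                 # 4. learn from <s_t, a_t, r_t, s_{t+1}>
        total += r
    return total

transitions = []
total = run_episode(CoinGame(),
                    policy=lambda s, actions: actions[0],
                    learn=lambda *t: transitions.append(t))
print(total, len(transitions))
```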

### Temporal Difference Learning

**Temporal Difference Learning (TDL)**

RL method which attempts to estimate a value function by observing the progression of states: the learner adjusts it to make the value of the current state more like the value of the next state.

- The value function $V(b)$ can be represented as a neural network with a modifiable weight vector $w$.
- The adjustment is based on a gradient-descent update, e.g.

$$\Delta w_i := \eta\, e\, b_i, \qquad e = v_{t+1} - v_t$$

where $e$ is the difference between the value $v_{t+1}$ of the next state and the value $v_t$ of the current state.
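As a rough illustration of this update, the sketch below assumes the simplest possible value function, a linear one with $V(b) = \sum_i w_i b_i$, so that the gradient with respect to $w_i$ is just $b_i$. The function names and the learning rate default are hypothetical; the slides only say that the value function can be a neural network.

```python
def value(w, b):
    """Linear value function V(b) = sum_i w_i * b_i (a simplifying assumption)."""
    return sum(wi * bi for wi, bi in zip(w, b))

def td0_update(w, b_current, b_next, eta=0.1):
    """Return weights after delta w_i = eta * e * b_i, with e = V(b_next) - V(b_current)."""
    e = value(w, b_next) - value(w, b_current)
    return [wi + eta * e * bi for wi, bi in zip(w, b_current)]

w = [0.0, 1.0]
b_current, b_next = [1.0, 0.0], [0.0, 1.0]
w_new = td0_update(w, b_current, b_next)
# The value of the current state moves towards the value of the next state.
print(value(w, b_current), value(w_new, b_current), value(w_new, b_next))
```

With these numbers the error is $e = 1 - 0 = 1$, so only the first weight changes (by $\eta = 0.1$) and the current state's value rises from 0 towards the next state's value of 1.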

### Coevolutionary Temporal Difference Learning

**Coevolutionary Temporal Difference Learning (CTDL)**

A hybrid of coevolutionary search with reinforcement learning that works by interlacing one-population competitive coevolution with temporal difference learning.

**Algorithm 1: Basic scheme of a generational evolutionary algorithm**

*P ← createRandomPopulation()*

*A ← initializeArchive()*

*evaluatePopulation(A, P)*

**while** *¬terminationCondition()* **do**

*S ← selectParents(P)*

*P ← recombineAndMutate(S)*

*individualReinforcementLearning(P)*

*evaluatePopulation(A, P)*

*updateArchive(A, P)*

**end while**

**return** *getFittestIndividual(A, P)*
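The only change with respect to plain coevolution is the individual learning step inserted between variation and evaluation. The skeleton below makes that explicit by taking every step as a callable. All names are hypothetical, and the toy functions at the bottom (individuals are single numbers, the larger number wins, and "learning" is a fixed small improvement standing in for TDL) exist only to make the sketch runnable.

```python
import random

def ctdl(create, evaluate, select, vary, learn, generations, rng):
    """Coevolution interlaced with per-individual learning (hypothetical skeleton)."""
    population = create(rng)
    archive = []
    fitness = evaluate(archive, population)
    for _ in range(generations):
        parents = select(population, fitness, rng)
        population = vary(parents, rng)
        population = [learn(ind, rng) for ind in population]  # individualReinforcementLearning(P)
        fitness = evaluate(archive, population)
        best = max(range(len(population)), key=fitness.__getitem__)
        archive.append(population[best])                      # updateArchive(A, P), assumed policy
    best = max(range(len(population)), key=fitness.__getitem__)
    return population[best], archive

# Toy stand-ins, for illustration only.
def create(rng):
    return [rng.uniform(0, 1) for _ in range(6)]

def evaluate(archive, population):
    return [sum(x > y for y in archive + population) for x in population]

def select(population, fitness, rng):
    idx = range(len(population))
    return [population[max(rng.sample(idx, 2), key=fitness.__getitem__)]
            for _ in population]

def vary(parents, rng):
    return [p + rng.gauss(0, 0.05) for p in parents]

def learn(individual, rng):
    return individual + 0.01   # placeholder for a TDL training session

best, archive = ctdl(create, evaluate, select, vary, learn, generations=5, rng=random.Random(1))
print(best, len(archive))
```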


### Relative Methods Performance Over Time

[Figure: points in tournaments (y-axis, 4000 to 13000) against games played (x-axis, × 100 000, 0 to 40) for CTDL + HoF, CTDL, TDL, CEL + HoF and CEL.]

### Observations and Motivation

**Observations on learning Othello strategies**

- Temporal Difference Learning is much faster and under most experimental settings it is able to learn better strategies.
- Coevolution can eventually produce better strategies if it is supported by an archive which sustains progress.
- CTDL benefits from these complementary characteristics.

**Motivation for further research on CTDL**

- No need for human expertise: useful when the knowledge of the problem domain is unavailable or expensive to obtain.
- Potential for employing more complex learner architecture.
- Interesting biological interpretation.

### Evolution of Artificial Neural Networks

- Typically, a network topology is chosen before the experiment and evolution searches the space of connection weights.
- Can evolving topologies along with weights provide an advantage over evolving weights on a fixed topology?

“Any continuous function can be approximated by a fully connected neural network having only one internal hidden layer and with an arbitrary sigmoidal nonlinearity.”

George V. Cybenko

**Challenges of Topology and Weight Evolving ANNs (TWEANNs)**

- How to cross over disparate topologies in a meaningful way?
- How can topological innovation that needs a few generations to be optimized be protected so that it does not disappear prematurely?
- How can topologies be minimized throughout evolution?

### Evolvability and Neural Interference

“Evolvability is an organism’s capacity to generate heritable phenotypic variation.”

Marc Kirschner & John Gerhart

“Evolvability of neural networks allows evolutionary algorithms to find weight settings that produce a desired behavior or approximate a given function.”

Julian Togelius

- Topology of a neural network largely influences its evolvability: it can be increased by removing single inputs or connections.
- The availability of certain information at certain points in the network can lead evolution into local optima.
- Neural interference appears in nonmodular neural networks that learn complex behavior consisting of multiple tasks.

### Neuroevolution of Augmenting Topologies (NEAT)

**Matching Topologies using Innovation Numbers**

- Different network structures (size and connection order): the Competing Conventions Problem.
- NEAT performs artificial synapsis based on historical markings.

Figure 1: The competing conventions problem. The two networks compute the same exact function even though their hidden units appear in a different order and are represented by different chromosomes, making them incompatible for crossover. The figure shows that the two single-point recombinations are both missing one of the 3 main components of each solution. The depicted networks are only 2 of the 6 possible permutations of hidden unit orderings.

We now turn to several specific problems with TWEANNs and address each in turn.

**2.2 Competing Conventions**

One of the main problems for NE is the *Competing Conventions Problem* (Montana and Davis, 1989; Schaffer et al., 1992), also known as the *Permutations Problem* (Radcliffe, 1993). Competing conventions means having more than one way to express a solution to a weight optimization problem with a neural network. When genomes representing the same solution do not have the same encoding, crossover is likely to produce damaged offspring.

Figure 1 depicts the problem for a simple 3-hidden-unit network. The three hidden neurons A, B, and C can represent the same general solution in 3! = 6 different permutations. When one of these permutations crosses over with another, critical information is likely to be lost. For example, crossing [A, B, C] and [C, B, A] can result in [C, B, C], a representation that has lost one third of the information that both of the parents had. In general, for n hidden units, there are n! functionally equivalent solutions. The problem can be further complicated with differing conventions, i.e., [A, B, C] and [D, B, E], which share functional interdependence on B.
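The loss of information is easy to reproduce. The short sketch below, with a hypothetical `single_point_crossover` helper, recombines the two orderings from the example at both possible cut points and counts the equivalent orderings of three hidden units.

```python
from itertools import permutations

def single_point_crossover(p1, p2, point):
    """Return both offspring of a single-point crossover at index `point`."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

parent_a = ["A", "B", "C"]
parent_b = ["C", "B", "A"]

for point in (1, 2):
    child_1, child_2 = single_point_crossover(parent_a, parent_b, point)
    # Each child lacks one of the three hidden units.
    print(point, child_1, child_2)

# Number of functionally equivalent orderings of 3 hidden units.
print(len(list(permutations(parent_a))))
```

At either cut point the offspring are [A, B, A] and [C, B, C], so each child has lost one of the three units that both parents carried.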

An even more difficult form of competing conventions is present in TWEANNs, because TWEANN networks can represent similar solutions using entirely different topologies, or even genomes of different sizes. Because TWEANNs do not satisfy strict constraints on the kinds of topologies they produce, proposed solutions to the competing conventions problem for fixed or constrained topology networks such as nonredundant genetic encoding (Thierens, 1996) do not apply. Radcliffe (1993) goes as far as calling an integrated scheme combining connectivity and weights the “Holy Grail in

[Figure 1: two networks with hidden units ordered [A,B,C] and [C,B,A]; their crossovers [A,B,A] and [C,B,C] are both missing information.]

Figure comes from “Evolving Neural Networks through Augmenting Topologies” by K. Stanley and R. Miikkulainen (Evolutionary Computation, Volume 10, Number 2).

### Neuroevolution of Augmenting Topologies (NEAT)

### Two types of structural mutations in NEAT:

**2 NEUROEVOLUTION OF AUGMENTING TOPOLOGIES (NEAT)**

The NEAT method of evolving artificial neural networks combines the usual search for appropriate network weights with complexification of the network structure. This approach is highly effective: NEAT outperforms other neuroevolution (NE) methods, e.g. on the benchmark double pole balancing task by a factor of five (Stanley and Miikkulainen 2001, 2002b,c). The NEAT method consists of solutions to three fundamental challenges in evolving neural network topology: (1) What kind of genetic representation would allow disparate topologies to crossover in a meaningful way? (2) How can topological innovation that needs a few generations to optimize be protected so that it does not disappear from the population prematurely? (3) How can topologies be minimized *throughout evolution* so the most efficient solutions will be discovered? In this section, we explain how NEAT addresses each challenge.¹

¹ A more comprehensive description of the NEAT method is given in Stanley and Miikkulainen (2001, 2002c).

**2.1 GENETIC ENCODING**

Evolving structure requires a flexible genetic encoding. In order to allow structures to complexify, their representations must be dynamic and expandable. Each genome in NEAT includes a list of *connection genes*, each of which refers to two *node genes* being connected. Each connection gene specifies the in-node, the out-node, the weight of the connection, whether or not the connection gene is expressed (an enable bit), and an *innovation number*, which allows finding corresponding genes during crossover.

Mutation in NEAT can change both connection weights and network structures. Connection weights mutate as in any NE system, with each connection either perturbed or not. Structural mutations, which form the basis of complexification, occur in two ways (figure 1). In the *add connection* mutation, a single new connection gene is added connecting two previously unconnected nodes. In the *add node* mutation an existing connection is split and the new node placed where the old connection used to be. The old connection is disabled and two new connections are added to the genome. This method of adding nodes was chosen in order to integrate new nodes immediately into the network.

Through mutation, genomes of varying sizes are created, sometimes with completely different connections specified at the same positions.

In order to perform crossover, the system must be able to tell which genes match up between *any* individuals in the population. The key observation is that two genes that have the same historical origin represent the same structure (although possibly with different weights), since they were both derived from the same ancestral gene from some point in the past. Thus, all a system needs to do to know which genes line up with which is to keep track of the historical origin of every gene in the system.

[Figure 1 panels: Mutate Add Connection; Mutate Add Node.]

Figure 1: **The two types of structural mutation in NEAT.** Both types, adding a connection and adding a node, are illustrated with the genes above their phenotypes. The top number in each genome is the *innovation number* of that gene. The bottom two numbers denote the two nodes connected by that gene. The weight of the connection, also encoded in the gene, is not shown. The symbol DIS means that the gene is disabled, and therefore not expressed in the network. The figure shows how connection genes are appended to the genome when a new connection is added to the network and when a new node is added. Assuming the depicted mutations occurred one after the other, the genes would be assigned increasing innovation numbers as the figure illustrates, thereby allowing NEAT to keep an implicit history of the origin of every gene in the population.

Tracking the historical origins requires very little computation. Whenever a new gene appears (through structural mutation), a *global innovation number* is incremented and assigned to that gene. The innovation numbers thus represent a chronology of every gene in the system. As an example, let us say the two mutations in figure 1 occurred one after another in the system. The new connection gene created in the first mutation is assigned the number 7, and the two new connection genes added during the new node mutation are assigned the numbers 8 and 9. In the future, whenever these genomes crossover, the offspring will inherit the same innovation numbers on each gene; innovation numbers are never changed. Thus, the historical origin of every gene in the system is known throughout evolution.

Through innovation numbers, the system now knows exactly which genes match up with which. Genes that do not match are either *disjoint* or *excess*, depending on whether they occur within or outside the range of the other parent’s innovation numbers. When crossing over, the genes in both

Figure comes from “Evolving Neural Networks through Augmenting Topologies” by K. Stanley and R. Miikkulainen
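The matching of genes by innovation number can be sketched as below. This is an illustrative reading of the description above, not code from NEAT itself: the `ConnectionGene` fields follow the gene contents listed in the excerpt, while the class name, the `align_genomes` function and the example genomes are hypothetical. Both genomes are assumed to be non-empty.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConnectionGene:
    innovation: int      # historical marking, never changed after creation
    in_node: int
    out_node: int
    weight: float
    enabled: bool = True

def align_genomes(parent1, parent2):
    """Classify innovation numbers as matching, disjoint or excess."""
    g1 = {g.innovation: g for g in parent1}
    g2 = {g.innovation: g for g in parent2}
    lo1, hi1 = min(g1), max(g1)
    lo2, hi2 = min(g2), max(g2)
    matching, disjoint, excess = [], [], []
    for innov in sorted(set(g1) | set(g2)):
        if innov in g1 and innov in g2:
            matching.append(innov)
            continue
        # Compare against the range of the parent that lacks this gene.
        lo, hi = (lo2, hi2) if innov in g1 else (lo1, hi1)
        (disjoint if lo <= innov <= hi else excess).append(innov)
    return matching, disjoint, excess

def genome(innovations):
    """Build a genome with placeholder nodes and weights (illustration only)."""
    return [ConnectionGene(i, in_node=1, out_node=2, weight=0.5) for i in innovations]

print(align_genomes(genome([1, 2, 3, 4, 5, 8]), genome([1, 2, 3, 4, 5, 6, 7, 9, 10])))
```

Because alignment only needs a dictionary lookup per gene, no topological analysis of the two networks is required, which is the point made in the excerpt about historical markings being cheap to track.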

### Neuroevolution of Augmenting Topologies (NEAT)

**Protecting Innovation through Speciation**

- Changing the topology of a network is often very disruptive.
- Structural innovation is unlikely to survive in the population.
- NEAT divides the population into species that compete primarily within their own niches.

**Minimizing Dimensionality**

- Forcing minimal topologies could be achieved by incorporating network size into the fitness function.
- NEAT biases the search towards minimal-dimensional spaces by starting with a population with no hidden nodes.

### N-tuple Network Architecture

- Type of ANN that operates on a compound object $x$ (matrix, image) whose elements can be easily indexed and retrieved.
- Formed by a set of $m$ tuples, each created by (randomly) sampling the input object at $n$ locations.

[Figure: n-tuples sampling values from an input image.]

Figure comes from “Face Recognition with the Continuous N-tuple Classifier” by S. M. Lucas

### N-tuple Network Output Value

- Each input location has $v$ possible values, so a single n-tuple represents an n-digit number in the base-$v$ numeral system.
- Each n-tuple has an associated look-up table (LUT) which contains parameters equivalent to weights in a standard ANN.
- Locations $a_{ij}$, where $j = 0, \dots, n-1$, specified by each n-tuple $t_i$ are used to identify an address in a look-up table.
- The output of the network is calculated by summing LUT values indexed by particular n-tuples:

$$f(x) = \sum_{i=0}^{m-1} f_i(x) = \sum_{i=0}^{m-1} \mathrm{LUT}_i\!\left[\, \sum_{j=0}^{n-1} x(a_{ij})\, v^{j} \right]$$
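The output computation amounts to one table look-up per tuple. A minimal sketch, assuming the input object is a flat list of integers in the range 0 to v−1 and each tuple is a sequence of indices into it (the function name and the example values are hypothetical):

```python
def ntuple_output(x, tuples, luts, v):
    """Sum, over all tuples, the LUT entry addressed by the base-v number
    formed from the input values at the tuple's locations."""
    total = 0.0
    for locations, lut in zip(tuples, luts):
        address = sum(x[a] * v**j for j, a in enumerate(locations))
        total += lut[address]
    return total

# Two 2-tuples over a 4-element input with v = 3 possible values per location,
# so each LUT has 3**2 = 9 entries.
x = [2, 0, 1, 1]
tuples = [(0, 1), (2, 3)]
luts = [[0.0] * 9, [0.0] * 9]
luts[0][2] = 0.5     # address 2*3**0 + 0*3**1 = 2
luts[1][4] = -0.2    # address 1*3**0 + 1*3**1 = 4
print(ntuple_output(x, tuples, luts, v=3))
```

Each LUT needs $v^n$ entries, which is why the tuple length $n$ has to stay small even though the look-up itself is very cheap.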

### N-tuple Network for Othello

- In the context of Othello, an n-tuple network acts as a state evaluation function: it computes the utility of a given board state.

[Figure: board fragment with two n-tuples and their look-up tables LUT 1 and LUT 2, whose entries hold real-valued weights.]

- Snake-shaped inputs are randomly assigned and stay fixed while learning affects weights in the look-up table.

### N-tuple Network as TWEANN

**Structural Genetic Operators**

- Mutation consists in changing the input assignment of a single element of a tuple to one of its neighbouring locations.
- Size of tuples remains constant throughout the evolution.
- Crossover is restricted to exchanging whole tuples.
- Each tuple represents an independent module that can be easily combined with other modules.
- Innovations are protected by applying intensive individual learning to newly created structures.
- Size of the representation does not grow.
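The two structural operators could look roughly like this. Several details are assumptions made only for illustration: board squares are numbered 0 to 63 row by row, "neighbouring" means the 8-neighbourhood, a mutated element is not allowed to duplicate a location already in its tuple, and the look-up tables that accompany each tuple are left out entirely.

```python
import random

SIZE = 8  # side of the Othello board; squares are numbered row by row

def neighbours(loc):
    """Squares adjacent (including diagonally) to `loc` on the board."""
    r, c = divmod(loc, SIZE)
    return [nr * SIZE + nc
            for nr in (r - 1, r, r + 1) for nc in (c - 1, c, c + 1)
            if (nr, nc) != (r, c) and 0 <= nr < SIZE and 0 <= nc < SIZE]

def mutate_topology(tuples, rng):
    """Move one element of one tuple to a neighbouring square; sizes stay fixed."""
    child = [list(t) for t in tuples]
    t = rng.randrange(len(child))
    k = rng.randrange(len(child[t]))
    options = [n for n in neighbours(child[t][k]) if n not in child[t]]
    if options:
        child[t][k] = rng.choice(options)
    return child

def crossover_topology(parent1, parent2, rng):
    """Exchange whole tuples: every slot is inherited intact from one parent."""
    return [list(rng.choice(pair)) for pair in zip(parent1, parent2)]

rng = random.Random(0)
p1 = [[0, 1, 2], [10, 11, 12]]
p2 = [[7, 15, 23], [40, 41, 42]]
print(mutate_topology(p1, rng))
print(crossover_topology(p1, p2, rng))
```

Since tuples are never split or resized, the child always has the same number of tuples of the same length as its parents, which matches the remark that the size of the representation does not grow.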

### Coevolutionary Gradient Search Process

“Our approach is to analyse characteristics of the problem search space and thence to identify the algorithms (within the class considered) which exploit these characteristics – we pay for our lunch, one might say.”

Lionel Barnett

- We aim to search both spaces in parallel: discrete network topology space and continuous weight space.
- How to move in these spaces to gain from their character?

**Coevolutionary Gradient Search**

- Directed gradient search: numerically estimates the direction of change in the vicinity of the current candidate solution.
- Undirected coevolutionary search: stochastically jumps over the search space starting from the fittest configurations.

### Search Operators

**Genetic Operators**

The following genetic operators act on the fittest individuals:

- Weight mutation ($m_w$)
- Topology mutation ($m_t$)
- Topology crossover ($x$)

**Gradient Operators**

Gradient-based search operators work in the weight space and consist of a single gradient-descent TDL learning scenario. How to create a competitive learning environment?

- self-play scenario ($s$)
- population opponent ($p$)
- archival opponent ($a$)

### Guiding the Search Process

Interactions between candidate solutions are the only source of information that guides the search process.

**Algorithm 1: Basic scheme of a generational evolutionary algorithm**

*P ← createRandomPopulation()*

*evaluatePopulation(P)*

**while** *¬terminationCondition()* **do**

*S ← selectParents(P)*

*P ← recombineAndMutate(S)*

*evaluatePopulation(P)*

**end while**

**return** *getFittestIndividual(P)*

### The family of EA is composed of a few methods that differ slightly in technical

### de-tails, but all match the basic scheme presented in Algorithm 1. The most important

*difference between these methods concerns so called representation which defines a*

*mapping from phenotypes onto a set of genotypes and specifies what data structures*

### are employed in this encoding. Phenotypes are objects forming solutions to the

*original problem, i.e., points of the problem space of possible solutions. Genotypes,*

*on the other hand, are used to denote points in the evolutionary search space which*

### are subject to genetic operations. The process of genotype-phenotype decoding is

*intended to model natural phenomenon of embryogenesis. More detailed description*

### of these terms can be found in [Weise 09].

### Returning to different dialects of EA, candidate solutions are represented

### typi-cally by strings over a finite (usually binary) alphabet in Genetic Algorithms (GA)

### [Holland 62], real-valued vectors in Evolution Strategies (ES) [Rechenberg 73], finite

### state machines in classical Evolutionary Programming (EP) [Fogel 95] and trees in

### Genetic Programming (GP) [Koza 92]. A certain representation might be preferable

### if it makes encoding solutions to a given problem more natural. Obviously, genetic

### operations of recombination and mutation must be adapted to chosen

### representa-tion. For example, crossover in GP is usually based on exchanging subtrees between

### combined individuals.

### The most significant advantage of EA lies in their flexibility and adaptability to

### the given task. This may be explained by their metaheuristic character of “black

### box” that makes only few assumptions about the underlying objective function which

### is the subject of optimization. EA are claimed to be robust problem solvers showing

### roughly good performance over a wide range of problems, as reported by Goldberg

### [Goldberg 89]. Especially the combination of EA with problem-specific heuristics

### including local-search based techniques, often make possible highly efficient

### opti-mization algorithms for many areas of application. Such hybridization of EA is

### getting popular due to their capabilities in handling real-world problems involving

### noisy environment, imprecision or uncertainty. The latest state-of-the-art

### method-ologies in Hybrid Evolutionary Algorithms are reviewed in [Grosan 07].

### 18

*2 Coevolution*

**Algorithm 1 Basic scheme of a generational evolutionary algorithm**

*P ← createRandomPopulation()*

*evaluatePopulation(A)*

**while ¬terminationCondition() do**

**while ¬terminationCondition() do**

*S ← selectParents(P)*

*P ← recombineAndMutate(S)*

*evaluatePopulation(P)*

**end while**

**return getFittestIndividual(P)**

**return getFittestIndividual(P)**

The family of EA is composed of a few methods that differ slightly in technical details, but all match the basic scheme presented in Algorithm 1. The most important difference between these methods concerns the so-called *representation*, which defines a mapping from *phenotypes* onto a set of *genotypes* and specifies what data structures are employed in this encoding. Phenotypes are objects forming solutions to the original problem, i.e., points of the problem space of possible solutions. Genotypes, on the other hand, are used to denote points in the evolutionary search space which are subject to genetic operations. The process of genotype-phenotype decoding is intended to model the natural phenomenon of *embryogenesis*. A more detailed description of these terms can be found in [Weise 09].
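The distinction between the two spaces can be illustrated with a small sketch. It assumes a GA-style genotype (a list of bits) that decodes to a phenotype given by a single real number in a chosen interval; the function names and the interval bounds are hypothetical.

```python
def decode(genotype, lower=-1.0, upper=1.0):
    """Genotype-phenotype decoding: map a bit list to a real in [lower, upper]."""
    as_int = int("".join(str(bit) for bit in genotype), 2)
    max_int = 2 ** len(genotype) - 1
    return lower + (upper - lower) * as_int / max_int

def flip_bit(genotype, index):
    """A genetic operation: it acts on the genotype, never on the phenotype."""
    mutated = list(genotype)
    mutated[index] = 1 - mutated[index]
    return mutated

genotype = [1, 0, 0, 0]
phenotype = decode(genotype, lower=0.0, upper=15.0)        # 8.0
neighbour = decode(flip_bit(genotype, 0), 0.0, 15.0)       # 0.0
```

Here a single bit flip, the smallest possible step in the search space, moves the phenotype from 8.0 to 0.0. This is one reason why the choice of representation matters: neighbourhoods in the genotype space need not correspond to neighbourhoods in the problem space.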


1. Play a round-robin tournament between population members

2. Randomly select archival individuals to act as opponents

3. Select the best-of-generation individual and add it to the archive
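The three steps above can be sketched as a single evaluation routine. This is only an illustration under stated assumptions: `play` is a hypothetical toy game in which individuals are numbers and the larger one wins, each pair meets once, and the fitness is the plain sum of game results. None of these details is given on the slide.

```python
import random

def play(a, b):
    """Hypothetical toy game: the larger number wins (1), a tie scores 0.5."""
    return 1.0 if a > b else 0.5 if a == b else 0.0

def evaluate_generation(population, archive, archive_sample_size, rng):
    """Score a population by interactions and update the archive in place."""
    scores = [0.0] * len(population)
    # 1. Round-robin tournament between population members.
    for i in range(len(population)):
        for j in range(i + 1, len(population)):
            result = play(population[i], population[j])
            scores[i] += result
            scores[j] += 1.0 - result
    # 2. Randomly selected archival individuals act as additional opponents.
    opponents = rng.sample(archive, min(archive_sample_size, len(archive)))
    for i, individual in enumerate(population):
        for opponent in opponents:
            scores[i] += play(individual, opponent)
    # 3. The best-of-generation individual is added to the archive.
    best = max(range(len(population)), key=scores.__getitem__)
    archive.append(population[best])
    return scores

archive = [0]
scores = evaluate_generation([3, 1, 2], archive, 1, random.Random(0))
```

In this toy run the scores are `[3.0, 1.0, 2.0]` and the archive grows to `[0, 3]`. The fitness is relative: it depends on the current opponents and on the sampled archive members rather than on an objective measure.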

### Search operators use different types of interaction feedback.


### Learning 7 x 4 N-tuple Networks

[Figure: average percentage score (0 to 0.6) versus games played (x 1,000; 0 to 2000). Series: CTDL-sxmw + HoF, CTDL-sxmw, TDL, CEL + HoF, CEL.]

### Learning 9 x 5 N-tuple Networks

[Figure: average percentage score (0 to 0.6) versus games played (x 1,000; 0 to 2000). Series: TDL, CTDL-sxmw + HoF, CTDL-sxmw, CEL + HoF, CEL.]

### Learning 12 x 6 N-tuple Networks

[Figure: average percentage score (0 to 0.8) versus games played (x 1,000; 0 to 2000). Series: ETDL-sxmt, CTDL-sxmt + HoF, TDL, CTDL-sxmw, CEL.]

### Relative Performance of Self-play Methods

[Figure: points in tournaments (2000 to 12000) versus games played (x 1,000; 0 to 2000). Series: TDL, PTDL, CTDL-s, CTDL-sx, CTDL-sxmt, CTDL-sxmt + HoF.]

### Relative Performance of Mutual-play Methods

[Figure: points in tournaments (2000 to 13000) versus games played (x 1,000; 0 to 2000). Series: CTDL-p, CTDL-px, CTDL-pxmt, CTDL-ax + HoF, CTDL-asxmt + HoF.]

### Relative Performance of All Methods

[Figure: points in tournaments (2000 to 10000) versus games played (x 1,000; 0 to 2000). Series: CTDL-px, ETDL-sxmt, CTDL-sxmt, CTDL-sxmt + HoF, CTDL-asxmt + HoF.]

### Evolutionary Player in the Othello League
