
(1)

Artificial neural network

(2)

Genesis of ANN

Neural network (artificial neural network) - the common name for mathematical structures and their software or hardware models that perform computations or signal processing through rows of elements called artificial neurons, each performing a basic operation on its inputs. The original structure was inspired by the natural structure of neurons and nervous systems, particularly the brain.

The neural network is a type of computer system architecture. It processes data with neurons arranged in layers. Correct results are obtained through a learning process, which modifies the weights of the neurons responsible for the error.

The Definition of ANN

(3)

Where are neural networks being used?

Signal processing: suppressing line noise, adaptive echo canceling, blind source separation

Control: backing up a truck (cab position, rear position, and alignment with the dock are converted into steering instructions); controlling automated machines in manufacturing plants

Siemens successfully uses neural networks for process automation in basic industries; e.g., in rolling-mill control more than 100 neural networks do their job, 24 hours a day

Robotics: navigation, vision recognition

Pattern recognition: e.g. recognizing handwritten characters; the current version of Apple's Newton uses a neural net

Medicine: storing medical records based on case information

Speech production: reading text aloud (NETtalk)

Vision: face recognition, edge detection, visual search engines

Business: rules for mortgage decisions are extracted from past decisions made by experienced evaluators, resulting in a network that has a high level of agreement with human experts

Financial applications: time series analysis, stock market prediction

Data compression: speech signal, image (e.g. faces)

Game playing: chess, go, ...

(4)

The history of ANN

• 1943 - McCulloch and Pitts introduced the first neural network computing model.

• 1950's - Rosenblatt's work resulted in a two-layer network, the perceptron, which was capable of learning certain classifications by adjusting connection weights. Although the perceptron was successful in classifying certain patterns, it had a number of limitations. The perceptron was not able to solve the classic XOR (exclusive or) problem. Such limitations led to the decline of the field of neural networks. However, the perceptron had laid foundations for later work in neural computing.

• early 1980's - researchers showed renewed interest in neural networks. Recent work includes Boltzmann machines, Hopfield nets, competitive learning models, multilayer networks, and adaptive resonance theory models.

(5)

Neural networks versus conventional computers

• Neural networks take a different approach to problem solving than that of conventional computers. Conventional computers use an algorithmic approach i.e. the computer follows a set of instructions in order to solve a problem.

• A computer can solve only those problems for which the specific steps it needs to follow are known.

• Neural networks process information in a similar way the human brain does.

Neural networks learn by example. They cannot be programmed to perform a specific task. The examples must be selected carefully, otherwise useful time is wasted or, even worse, the network might function incorrectly. The disadvantage is that because the network finds out how to solve the problem by itself, its operation can be unpredictable.

• Conventional computers use a cognitive approach to problem solving; the way the problem is to be solved must be known, then converted to a high-level language program and into machine code that the computer can understand.

These machines are totally predictable; if anything goes wrong, it is due to a software or hardware fault.

• Neural networks do not perform miracles. But if used sensibly they can produce some amazing results.

(6)

Neural networks in medicine

• Artificial Neural Networks (ANN) are currently a 'hot' research area in medicine and it is believed that they will receive extensive application to biomedical systems in the next few years. At the moment, the research is mostly on modelling parts of the human body and recognising diseases from various scans (e.g. cardiograms, CAT scans, ultrasonic scans, etc.).

• Neural networks are ideal for recognising diseases using scans, since there is no need to provide a specific algorithm on how to identify the disease. Neural networks learn by example, so the details of how to recognise the disease are not needed. What is needed is a set of examples that are representative of all the variations of the disease. The quantity of examples is not as important as the quality. The examples need to be selected very carefully if the system is to perform reliably and efficiently.

(7)

Biologically Inspired

• Electro-chemical signals

• Threshold output firing

[Diagram: a biological neuron - dendrites, cell body, axon, and terminal branches of the axon]

(8)

The Perceptron

• Binary classifier functions

• Threshold activation function

[Diagram: the perceptron compared to a biological neuron - inputs x1, x2, x3, ..., xn with weights w1, w2, w3, ..., wn feeding a summation unit Σ]

(9)

The Perceptron: Threshold Activation Function

• Binary classifier functions

• Threshold activation function

Step Threshold
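A minimal sketch of the perceptron described above, assuming the usual convention of a bias term and a step threshold at zero (the weights and bias below are illustrative, not from the slides):

```python
# Minimal perceptron sketch: weighted sum of inputs passed through
# a step threshold. Weights and bias are illustrative values.

def step(u):
    """Threshold activation: fire (1) if the net input reaches 0, else 0."""
    return 1 if u >= 0 else 0

def perceptron(x, w, b):
    """Binary classifier: step(w . x + b)."""
    u = sum(wi * xi for wi, xi in zip(w, x))
    return step(u + b)

# Example: a perceptron computing logical AND (weights chosen by hand).
w, b = [1.0, 1.0], -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, w, b))
```

Note that no choice of w and b makes this single unit compute XOR, which is the limitation mentioned in the history slide.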

(10)

Linear Activation functions

• Output is scaled sum of inputs

y = u = Σ_{n=1}^{N} x_n w_n

(Linear)

(11)

Nonlinear Activation Functions

• Sigmoid Neuron unit function

y = 1 / (1 + e^(-u))

(Sigmoid)
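The three activation functions introduced above can be written compactly (a sketch using only the standard library):

```python
import math

def step(u):
    """Step threshold: binary output."""
    return 1 if u >= 0 else 0

def linear(u):
    """Linear (identity): the output is the weighted sum itself."""
    return u

def sigmoid(u):
    """Sigmoid: smooth, S-shaped squashing of u into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))

u = 0.2
print(step(u), linear(u), round(sigmoid(u), 2))  # sigmoid(0.2) ≈ 0.55
```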

(12)

• The ability to learn is a fundamental trait of intelligence.

• Although a precise definition of learning is difficult to formulate, a learning process in the ANN context can be viewed as the problem of updating network architecture and connection weights so that a network can efficiently perform a specific task.

• The network usually must learn the connection weights from available training patterns.

• Performance is improved over time by iteratively updating the weights in the network.

• ANNs' ability to automatically learn from examples makes them attractive and exciting.

• Instead of following a set of rules specified by human experts, ANNs appear to learn underlying rules (like input-output relationships) from the given collection of representative examples. This is one of the major advantages of neural networks over traditional expert systems.

(13)
(14)
(15)
(16)
(17)

Learning – what does it mean exactly?

• Learning is essential to most neural network architectures.

• Choice of a learning algorithm is a central issue in network development.

• What is really meant by saying that a processing element learns?

Learning implies that a processing unit is capable of changing its input/output behavior as a result of changes in the environment.

Since the activation rule is usually fixed when the network is constructed, and since the input vector cannot be changed, the weights corresponding to that input vector need to be adjusted in order to change the input/output behavior. A method is thus needed by which, at least during a training stage, weights can be modified in response to the input/output process.

• In a neural network, learning can be supervised, in which the network is provided with the correct answer for the output during training, or unsupervised, in which no external teacher is present.

(18)

The learning process…

• At each training step the network computes the direction in which each bias and link value can be changed to calculate a more correct output.

• The rate of improvement at that solution state is also known.

A learning rate is user-designated in order to determine how much the link weights and node biases can be modified based on the change direction and change rate.

• The higher the learning rate (max. 1.0), the faster the network is trained.

• However, the network then has a greater chance of being trained to a local minimum solution. A local minimum is a point at which the network stabilizes on a solution which is not the optimal global solution.

(19)

Learning rules

There are four basic types of learning rules:

• error correction,

• Boltzmann,

• Hebbian,

• and competitive learning.

(20)

Parameters for the quality of prediction

• Hidden layers: Both the number of hidden layers and the number of nodes in each hidden layer can influence the quality of the results. For example, too few layers and/or nodes may not be adequate to sufficiently learn and too many may result in overtraining the network.

• Number of cycles: A cycle is where a training example is presented and the weights are adjusted.

• The number of examples that get presented to the neural network during the learning process can be set. The number of cycles should be set to ensure that the neural network does not overtrain. The number of cycles is often referred to as the number of epochs.

• Learning rate: Prior to building a neural network, the learning rate should be set and this influences how fast the neural network learns.

(21)

Neural Network topologies

• In the previous section we discussed the properties of the basic processing unit in an artificial neural network. This section focuses on the pattern of connections between the units and the propagation of data. As for this pattern of connections, the main distinction we can make is between:

• Feed-forward neural networks, where the data flow from input to output units is strictly feedforward. The data processing can extend over multiple (layers of) units, but no feedback connections are present, that is, connections extending from outputs of units to inputs of units in the same layer or previous layers.

• Recurrent neural networks that do contain feedback connections. Contrary to feed-forward networks, the dynamical properties of the network are important.

In some cases, the activation values of the units undergo a relaxation process such that the neural network evolves to a stable state in which these activations no longer change. In other applications, the change of the activation values of the output neurons is significant, such that the dynamical behaviour constitutes the output of the neural network (Pearlmutter, 1990).

• Classical examples of feed-forward neural networks are the Perceptron and Adaline. Examples of recurrent networks have been presented by Anderson (Anderson, 1977), Kohonen (Kohonen, 1977), and Hopfield (Hopfield, 1982).

(22)
(23)

Volume: 1400 cm³

Area: 2000 cm²

Weight: 1.5 kg

The cerebral cortex covering the hemispheres contains 10^10 neurons.

The number of connections between cells: 10^15

The cells send and receive signals; speed of operation: 10^18 operations/sec

The neural network is a simplified model of the brain!

• FAULT-TOLERANT;

• FLEXIBLE - easily adapts to a changing environment;

• LEARNS - does not have to be programmed;

• Can deal with fuzzy, random, noisy, or inconsistent information;

• HIGHLY PARALLEL;

• SMALL, with very low power consumption.

(24)

Neurons and Synapses

The basic computational unit in the nervous system is the nerve cell, or

neuron. A neuron has:

1. Dendrites (inputs)

2. Cell body

3. Axon (output)

A neuron receives input from other neurons (typically many thousands).

Inputs sum (approximately). Once input exceeds a critical level, the neuron discharges a spike - an electrical pulse that travels from the body, down the axon, to the next neuron(s) (or other receptors). This spiking event is also called depolarization, and is followed by a refractory period, during which the neuron is unable to fire.

The axon endings (output zone) almost touch the dendrites or cell body of the next neuron. Transmission of an electrical signal from one neuron to the next is effected by neurotransmitters, chemicals which are released from the first neuron and which bind to receptors in the second. This link is called a synapse. The extent to which the signal from one neuron is passed on to the next depends on many factors, e.g. the amount of neurotransmitter available, the number and arrangement of receptors, the amount of neurotransmitter reabsorbed, etc.

(25)

A Simple Artificial Neuron

Basic computational element (model neuron) is often called a node or unit.

It receives input from some other units, or perhaps from an external source.

Each input has an associated weight w, which can be modified so as to model synaptic learning. The unit computes some function f of the weighted sum of its inputs.

Its output, in turn, can serve as input to other units.

• The weighted sum is called the net input to unit i, often written net_i. Note that w_ij refers to the weight from unit j to unit i (not the other way around).

• The function f is the unit's activation function.

• In the simplest case, f is the identity function, and the unit's output is just its net input. This is called a linear unit.

(26)

Features of an intelligent system

The ability to learn from examples and to generalize the acquired knowledge to solve problems posed in a new context:

• The ability to create rules (associations) binding together separate elements of the system (objects)

• The ability to recognize objects (image features) on the basis of incomplete information.

Data classification is one of the main tasks performed using neural networks.

What is it about?

The purpose of classification is to assign an object, based on its characteristics, to a certain category.

Data classification

(27)

Where do we use ANNs?

• NO:

for calculations, multiplication tables, word processing, etc. - applications where a well-known algorithm can easily be used.

• YES:

where an algorithmic procedure is very difficult to formulate, where data are incomplete or inaccurate, where the phenomena under study are non-linear, etc.; where there is a lot of data, but the methods of operation behind some results are not yet known.

(28)

Artificial Neuron schema:

The inputs are fed signals from the network's input layer or from neurons of the previous layer. Each signal is multiplied by a corresponding numerical value called a weight. The weight affects the perception of the input signal and its part in creating the neuron's output.

A weight can be excitatory (positive) or inhibitory (negative);

if there is no connection between neurons, the weight is zero. The summed products of signals and weights are the argument of the neuron's activation function.

(29)

A simplified model of a neuron, showing its similarity to the natural model

(30)

Formula that describe the neuron working

y = f(s), where s = Σ_{i=0}^{n} w_i x_i

The principle aim is to approximate a given function (in other words: learn the desired function by observing examples of its operation).

Approximation function

(31)

Number of layers

zero / one / more than one

(32)

Prediction

Input: X1, X2, X3   Output: Y   Model: Y = f(X1, X2, X3)

Example input: X1 = 1, X2 = -1, X3 = 2

Activation function: f(x) = e^x / (1 + e^x), e.g. f(0.2) = e^0.2 / (1 + e^0.2) = 0.55

Hidden node 1: net input 0.2 = 0.5 * 1 - 0.1 * (-1) - 0.2 * 2; output f(0.2) = 0.55

Hidden node 2: net input 0.9; output f(0.9) = 0.71

Output node: net input -0.087; output f(-0.087) = 0.478

Prediction: Y = 0.478

If the actual value is Y = 2, then the prediction error = (2 - 0.478) = 1.522

(Link weights shown in the diagram: 0.5, 0.6, -0.1, 0.1, -0.2, 0.7, 0.1, -0.2.)
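The forward pass above can be reproduced in code. The slide gives the first hidden node's weights explicitly (0.5, -0.1, -0.2); the weights assumed below for the second hidden node (0.6, -0.1, 0.1) and for the output node (0.1, -0.2) are one assignment of the diagram's values that reproduces every number shown:

```python
import math

def f(x):
    """Sigmoid activation, as on the slide: f(x) = e^x / (1 + e^x)."""
    return math.exp(x) / (1 + math.exp(x))

# Input pattern from the slide.
x = [1, -1, 2]                  # X1, X2, X3

# First hidden node's weights are given on the slide; the remaining
# weights are an assumed assignment that reproduces the slide's numbers.
w_hidden = [[0.5, -0.1, -0.2],  # hidden node 1 (given)
            [0.6, -0.1, 0.1]]   # hidden node 2 (assumed)
w_output = [0.1, -0.2]          # output node (assumed)

hidden = [f(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
y = f(sum(w * h for w, h in zip(w_output, hidden)))

print([round(h, 2) for h in hidden])  # [0.55, 0.71]
print(round(y, 3))                    # 0.478
print(round(2 - y, 3))                # prediction error ≈ 1.522
```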

(33)

Backpropagation

• One of the most popular techniques in learning processes for ANNs.

(34)

1. Randomly choose one of the observations.

2. Go through the appropriate procedures to determine the output value.

3. Compare the desired value with the one actually obtained from the network.

4. Adjust the weights by calculating the error.

Learning process

(35)

How to calculate the prediction error ?

Error_i = Actual_i - Output_i

where:

• Error_i is the error of the i-th node,

• Output_i is the value predicted by the network,

• Actual_i is the real value (which the network should learn).

(36)

Change the weights

L- is so called learning network ratio. (usually values are from [0,1]). The less is the value of this coefficient the slower the learning process is.

Often this ratio is set to the highest value initially, and then is

reduced by re-weighting network.
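The weight-change formula itself did not survive conversion; a common form in presentations of this kind is the delta rule, sketched here as an assumption rather than a reconstruction of the original slide:

```python
def update_weights(weights, inputs, error, L):
    """Delta-rule sketch: nudge each weight in proportion to the
    learning ratio L, the node's error, and the input on that link.
    The exact formula on the original slide is not recoverable;
    this is the standard textbook form."""
    return [w + L * error * x for w, x in zip(weights, inputs)]

w = [0.5, -0.1, -0.2]
x = [1, -1, 2]
error = 1.522                      # from the earlier prediction example
print(update_weights(w, x, error, L=0.1))
```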

(37)
(38)

Example

(39)


(40)

Example „step by step”

• Input layer: A, B, C

• 1 hidden layer: D, E

• Output layer: F

(41)

Randomly choosing one observation

(42)

Randomly choosing one observation

(43)
(44)
(45)
(46)
(47)
(48)
(49)
(50)

etc.…

(51)

For better understanding…

the backpropagation learning algorithm can be divided into two phases:

propagation and weight update.

Phase 1: Propagation which involves the following steps:

• Forward propagation of a training pattern's input through the neural network in order to generate the propagation's output activations.

• Backward propagation of the propagation's output activations through the neural network using the training pattern's target in order to generate the deltas of all output and hidden neurons.

Phase 2: Weight update For each weight-synapse:

• Multiply its output delta and input activation to get the gradient of the weight.

• Bring the weight in the opposite direction of the gradient by subtracting a ratio of it from the weight.

• This ratio influences the speed and quality of learning; it is called the learning rate. The sign of the gradient of a weight indicates where the error is increasing; this is why the weight must be updated in the opposite direction.

• Repeat phases 1 and 2 until the performance of the network is good enough.
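The two phases above can be sketched as a minimal runnable example for a network with one hidden layer of sigmoid units. The bias weights, network sizes, seed, learning rate, and the XOR toy task are all illustrative assumptions, not taken from the slides:

```python
import math, random

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def forward(x, w_hid, w_out):
    """Phase 1a: forward-propagate the input. The trailing weight in
    every row acts on a constant 1, i.e. it is the node's bias."""
    xb = list(x) + [1.0]
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hid]
    hb = hidden + [1.0]
    output = sigmoid(sum(w * v for w, v in zip(w_out, hb)))
    return hidden, output

def train_step(x, target, w_hid, w_out, rate):
    """One pass of the two phases described above."""
    hidden, output = forward(x, w_hid, w_out)
    xb = list(x) + [1.0]
    hb = hidden + [1.0]
    # Phase 1b: backward propagation - deltas of output and hidden neurons.
    delta_out = output * (1 - output) * (target - output)
    delta_hid = [h * (1 - h) * w * delta_out for h, w in zip(hidden, w_out)]
    # Phase 2: move each weight along its gradient, scaled by the rate.
    for j, v in enumerate(hb):
        w_out[j] += rate * delta_out * v
    for i, d in enumerate(delta_hid):
        for j, v in enumerate(xb):
            w_hid[i][j] += rate * d * v
    return w_hid, w_out

# Toy run: learn XOR with a 2-3-1 network.
random.seed(1)
w_hid = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
w_out = [random.uniform(-1, 1) for _ in range(4)]
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
for _ in range(20000):
    for x, t in data:
        w_hid, w_out = train_step(x, t, w_hid, w_out, rate=0.5)
for x, t in data:
    print(x, t, round(forward(x, w_hid, w_out)[1], 2))
```

XOR is the pattern a single perceptron cannot learn, so it makes a natural check that backpropagation through a hidden layer adds real power.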

(52)

The size of an ANN?

• A big NN has a few thousand neurons, or even more.

• The number of neurons should depend on the type of the network's task.

• The power of the network depends on the number of neurons, the density of the connections between neurons, and on properly chosen weight values.

(53)

How many hidden layers should there be?

• The number of hidden layers is usually not higher than 2. The fusion of the network's signals takes place in the hidden layers.

• The input layer is usually responsible only for the initial preparation of input data.

• The output layer is responsible for aggregating the final outputs of the hidden-layer neurons, and for presenting the final result of the network at the outputs of its neurons, which are at the same time the outputs of the whole network.

(54)

Advantages of ANN

1. They can work fine in case of incomplete information

2. They do not require knowledge of an algorithm solving the problem (automatic learning)

3. They process information in a highly parallel way

4. They can generalize (to unknown cases)

5. They are resistant to partial damage

6. They can act as associative memory (working like memory in humans), as opposed to the addressable memory typical of classical computers

(55)

• Advantages:

• A neural network can perform tasks that a linear program can not.

• When an element of the neural network fails, it can continue without any problem thanks to its parallel nature.

• A neural network learns and does not need to be reprogrammed.

• It can be implemented in any application.

• It can be implemented without any problem.

Disadvantages:

• The neural network needs training to operate.

• The architecture of a neural network is different from the architecture of microprocessors therefore needs to be emulated.

• Requires high processing time for large neural networks.

(56)

Advantages / disadvantages

Neural networks have a number of advantages:

• Linear and nonlinear models: Complex linear and nonlinear relationships can be derived using neural networks.

• Flexible input/output: Neural networks can operate using one or more descriptors and/or response variables. They can also be used with categorical and continuous data.

• Noise: Neural networks are less sensitive to noise than statistical regression models.

The major drawbacks with neural networks are:

• Black box: It is not possible to explain how the results were calculated in any meaningful way.

• Optimizing parameters: There are many parameters to be set in a neural network, and optimizing the network can be challenging, especially to avoid overtraining.

(57)

How many hidden layers we need ?

• The number of hidden layers is usually at most 2.

The user should decide how many hidden layers there will be and how many neurons each of them will contain.

• The size of the input layer is usually the same as the number of input data (the number of conditional attributes in the data set).

• The number of neurons in the output layer depends on the type of the classification problem (regression, or classification into categories).

(58)

• The more neurons in the hidden layer, the more memory the NN needs.

• More neurons can also overtrain the classification process, making it too good for the training set but too bad for new, unknown data.

• If you notice overtraining in your neural network, you should consider decreasing the number of neurons.

(59)

Regression type of classification

• Inputs: area, garage, age, heating, location, floor, ...

• Output: the estimated market price, e.g. 317 $

Classification into categories

• Inputs: incomes, insurance, age, marital status, employment, ...

• Output: a decision about granting the credit or not, e.g. Yes

(60)

Categorical data is a problem…unless…

• continent: {Asia, Europe, America}

3 neurons are necessary:

• Asia 1 0 0

• Europe 0 1 0

• America 0 0 1

• One variable, „continent”, creates 3 neurons!

• For such cases it would be better to consider merging some values into a smaller number of categories.

• Usually the number of weights should be 10 times smaller than the number of cases in the training data set.
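The encoding above (one input neuron per category, exactly one set to 1) is easy to sketch; the continent example follows the slide:

```python
def one_hot(value, categories):
    """Encode one categorical value as a 0/1 vector: one input
    neuron per category, exactly one of them set to 1."""
    return [1 if value == c else 0 for c in categories]

continents = ["Asia", "Europe", "America"]
print(one_hot("Asia", continents))    # [1, 0, 0]
print(one_hot("Europe", continents))  # [0, 1, 0]
```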

(61)
(62)

• The STATISTICA line of software provides a comprehensive and integrated set of tools and solutions for:

• Data analysis and reporting, data mining and predictive modeling, business intelligence, simple and multivariate QC, process monitoring, analytic optimization, simulation, and for applying a large number of statistical and other analytic techniques to address routine and advanced data analysis needs

• Data visualization, graphical data analysis, visual data mining, visual querying, and simple and advanced scientific and business graphing; in fact, STATISTICA has been acknowledged as the "king of data visualization software" (by the editors of "PC Graphics & Video")

(63)

Install Statistica 10 EN

• http://usnet.us.edu.pl/files/statsoft/STATISTICA_EN_10_0.zip

(64)

Neural networks in Statistica

• Classification analysis (creditRisk.sta)

• Regression analysis (cycling.sta)

(65)

Classification for creditRisk.sta

(66)
(67)
(68)
(69)

Custom neural network (CNN)

(70)
(71)
(72)
(73)

Increasing neurons from 11 to 20

(74)
(75)
(76)

Automated network search (ANS)

(77)
(78)
(79)
(80)

Regression for cycling.sta

(81)

Choosing the variables

(82)
(83)

5 different neural networks
