Some plots from
Basic terminology
Prediction: Least squares
Prediction: nearest neighbor classifier
Perfect classification?
Comparison
Bias-variance tradeoff
Where are the neural networks?
Neural networks
How do NNs work?
How do NNs learn?
Deep-learning Neural Network
TensorFlow™
MNIST example
Scientific application: Higgs CP measurement at LHC
Since 2010, a new era in Machine Learning: rapidly expanding areas of application
Neural network
Deep-Learning tutorial @ Udacity
https://www.udacity.com/course/deep-learning--ud730
Supervised Classifications
Classifications for Detection
Classifications for Ranking
Logistic classifier: Linear model
Softmax
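A minimal NumPy sketch of the softmax function (function and variable names are my own, not from the slides): it maps a vector of raw scores, the logits, to probabilities that sum to one; subtracting the maximum first keeps the exponentials numerically stable.

import numpy as np

def softmax(logits):
    """Map raw scores (logits) to probabilities that sum to 1."""
    z = logits - np.max(logits)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.659 0.242 0.099]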
"One-hot" encoding
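A small illustrative helper (names assumed) showing what "one-hot" encoding means: each integer class label becomes a vector of zeros with a single 1 at the class index.

import numpy as np

def one_hot(labels, num_classes):
    """Encode integer class labels as 0/1 indicator vectors."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

print(one_hot([0, 2, 1], num_classes=3))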
Optimisation: Cross-Entropy
Multinomial logistic classification
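For multinomial logistic classification the loss compares the softmax output S with the one-hot label L via the cross-entropy D(S, L) = -sum_i L_i log S_i. A NumPy sketch (the epsilon and all names are my choices):

import numpy as np

def cross_entropy(probs, target_one_hot):
    """D(S, L) = -sum_i L_i * log(S_i); small epsilon avoids log(0)."""
    return -np.sum(target_one_hot * np.log(probs + 1e-12))

probs = np.array([0.659, 0.242, 0.099])   # softmax output
target = np.array([1.0, 0.0, 0.0])        # one-hot label
print(cross_entropy(probs, target))       # ~0.417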
Optimisation of average loss
Gradient descent
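A toy sketch of plain gradient descent on a one-parameter quadratic loss (the learning rate and step count are illustrative choices): repeatedly step against the gradient of the loss.

# Minimal gradient-descent loop on a toy loss f(w) = (w - 3)^2.
w = 0.0
learning_rate = 0.1
for step in range(50):
    grad = 2.0 * (w - 3.0)     # analytic gradient of the loss
    w -= learning_rate * grad  # step against the gradient
print(w)  # converges toward the minimiser w = 3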
Normalised input and output
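One common recipe for normalised inputs, sketched in NumPy (the exact scheme used in the slides is not shown): shift each feature to zero mean and unit variance so all inputs live on comparable scales.

import numpy as np

def normalise(x):
    """Per-feature zero mean and unit variance."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-12)

x = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]])
print(normalise(x))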
Initialisation
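One standard initialisation scheme, given as an assumed example rather than the slides' exact choice: small zero-mean Gaussian weights scaled by 1/sqrt(n_in) so early activations keep a reasonable variance, with biases starting at zero.

import numpy as np

def init_weights(n_in, n_out, rng):
    """Gaussian weights scaled by 1/sqrt(n_in); zero biases."""
    W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
    return W, np.zeros(n_out)

rng = np.random.default_rng(0)
W, b = init_weights(784, 200, rng)
print(W.shape, W.std())  # std near 1/sqrt(784) ~ 0.036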
Training, validation, testing
Gradient Descent
Stochastic Gradient Descent
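Stochastic gradient descent estimates the gradient on a small random mini-batch instead of the full dataset. A toy linear-regression sketch (the data and hyper-parameters are invented for illustration):

import numpy as np

rng = np.random.default_rng(0)
# Toy data: y = 2x + noise.
X = rng.normal(size=(1000, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=1000)

w, lr, batch_size = 0.0, 0.05, 32
for step in range(500):
    idx = rng.integers(0, len(X), size=batch_size)  # random mini-batch
    xb, yb = X[idx, 0], y[idx]
    grad = -2.0 * np.mean(xb * (yb - w * xb))       # batch gradient of MSE
    w -= lr * grad
print(w)  # close to the true slope 2.0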
SGD: optimising with momentum
SGD: learning rate
SGD: "black magic"
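A sketch combining the two tricks above, with invented hyper-parameters: a momentum term accumulates a running update direction, and the learning rate decays exponentially over the run. `grad_fn` stands in for any mini-batch gradient.

def sgd_momentum(grad_fn, w, lr0=0.1, decay=0.99, mu=0.9, steps=100):
    velocity = 0.0
    lr = lr0
    for _ in range(steps):
        velocity = mu * velocity - lr * grad_fn(w)  # running direction
        w += velocity
        lr *= decay                                 # shrink the step size
    return w

print(sgd_momentum(lambda w: 2.0 * (w - 3.0), w=0.0))  # -> about 3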
Input – linear – output
Linear models are linear
Linear models are stable
• This is still linear
• Let's introduce non-linearity
Linear models are here to stay
ReLU: Rectified Linear Unit
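The ReLU itself is one line: zero for negative inputs, identity for positive ones.

import numpy as np

def relu(x):
    """Rectified Linear Unit: max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]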
Networks of ReLU
The Chain Rule
Back-propagation
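A hand-worked sketch of back-propagation on a tiny two-layer network (all numbers and names invented): the chain rule is applied once per layer, passing the upstream gradient backwards.

# Tiny network y = w2 * relu(w1 * x); gradients via the chain rule,
# which is exactly what back-propagation automates layer by layer.
x, w1, w2, target = 1.5, 0.4, -0.8, 1.0

h = max(0.0, w1 * x)            # forward: hidden activation
y = w2 * h                      # forward: output
loss = 0.5 * (y - target) ** 2

dy = y - target                 # dL/dy
dw2 = dy * h                    # dL/dw2 = dL/dy * dy/dw2
dh = dy * w2                    # back through the linear layer
dw1 = dh * (1.0 if w1 * x > 0 else 0.0) * x  # through ReLU, then w1
print(dw1, dw2)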
Optimisation tricks
Optimisation trick: dropout
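An illustrative "inverted dropout" sketch (the keep probability is an assumed example value): during training, random activations are zeroed and the survivors rescaled so the expected activation is unchanged; at evaluation time nothing is dropped.

import numpy as np

def dropout(activations, keep_prob, rng, training=True):
    """Zero random activations; rescale survivors by 1/keep_prob."""
    if not training:
        return activations
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
print(dropout(np.ones(8), keep_prob=0.5, rng=rng))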
Deep networks
tensorflow.org/paper/whitepaper2015.pdf
Hand-written digits: MNIST
Simple linear model
Slides from M. Gorner tutorial
http://www.youtube.com/watch?v=vq2nnJ4g6NO
TensorFlow full Python code
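A sketch of the simple linear MNIST model in the TensorFlow 1.x style of that era (my reconstruction, not the tutorial's verbatim code; TF 2 no longer has placeholders):

import tensorflow as tf  # TensorFlow 1.x API

# 28x28 grey-scale images flattened to 784 features; 10 digit classes.
X = tf.placeholder(tf.float32, [None, 784])
Y_ = tf.placeholder(tf.float32, [None, 10])      # one-hot labels

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
Y = tf.nn.softmax(tf.matmul(X, W) + b)           # single linear layer + softmax

cross_entropy = -tf.reduce_mean(tf.reduce_sum(Y_ * tf.log(Y), axis=1))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)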
Multi-layer connected network
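Extending the sketch above to a multi-layer fully connected network with ReLU activations (the layer sizes are illustrative assumptions):

import tensorflow as tf  # TensorFlow 1.x API

X = tf.placeholder(tf.float32, [None, 784])

# Two hidden layers (sizes assumed), ReLU in between, softmax on top.
W1 = tf.Variable(tf.truncated_normal([784, 200], stddev=0.1))
b1 = tf.Variable(tf.zeros([200]))
W2 = tf.Variable(tf.truncated_normal([200, 100], stddev=0.1))
b2 = tf.Variable(tf.zeros([100]))
W3 = tf.Variable(tf.truncated_normal([100, 10], stddev=0.1))
b3 = tf.Variable(tf.zeros([10]))

h1 = tf.nn.relu(tf.matmul(X, W1) + b1)
h2 = tf.nn.relu(tf.matmul(h1, W2) + b2)
Y = tf.nn.softmax(tf.matmul(h2, W3) + b3)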
All tricks count
• Use ReLU
• Exponentially reduce learning rates
• Add drop-out
• Accuracy is still noisy; can do better with a conv network
References
http://www.deeplearningbook.org
http://download.tensorflow.org/paper/whitepaper2015.pdf
https://www.tensorflow.org/
http://www.youtube.com/watch?v=vq2nnJ4g6NO
https://www.udacity.com/course/deep-learning--ud730
TensorFlow application
Defining the problem
• Multi-particle final state (cascade decays): 4-vectors
• CP information in correlations between decay planes and angles
• Physics intuition allowed construction of optimal 1D observables, but is there more to be explored?
Defining NN model
• 6 hidden layers, each with 300 nodes and a ReLU activation function
• Sigmoid function on the last layer
• Metric: negative log-likelihood of the true targets under a Bernoulli distribution
• Optimisation: SGD with the Adam algorithm
• Regularisation: batch normalisation, dropout
• Final score: weighted AUC
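A possible tf.keras reconstruction of the described architecture (an assumption; the slides do not show the actual code): 6 hidden layers of 300 ReLU nodes, a sigmoid output, binary cross-entropy (i.e. the negative Bernoulli log-likelihood) and the Adam optimiser.

import tensorflow as tf

# Sketch only: layer sizes per the slide, everything else assumed.
model = tf.keras.Sequential(
    [tf.keras.layers.Dense(300, activation="relu") for _ in range(6)]
    + [tf.keras.layers.Dense(1, activation="sigmoid")]
)
model.compile(optimizer="adam",             # Adam variant of SGD
              loss="binary_crossentropy")   # negative Bernoulli log-likelihood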
Results
• The NN can capture the "optimal variables", but can do better given a sample of 4-vectors.
• Given "simple" and "higher-level" features, it can still improve in the more complicated case.
• We will now try this on the experimental data.