Remarks on Bayesian Networks and Their Applications


ACTA UNIVERSITATIS LODZIENSIS

FOLIA OECONOMICA 194, 2005

Tomasz Kozdraj


Abstract

Bayesian networks are directed acyclic graphs that represent dependencies between variables in a probabilistic model. They are becoming an increasingly important area for research and applications across the entire field of Artificial Intelligence. This paper discusses Bayesian networks and their implications, beginning with an overview and comparison of inferential statistics and Bayes' theorem. It presents possible applications of Bayesian networks to economic problems and also focuses on the problem of learning.

Key words: Bayesian networks, probabilistic networks, learning Bayesian networks.

I. INTRODUCTION

Bayesian networks are becoming more and more popular in research on, and applications of, artificial intelligence. They play a significant role in decision processes and in knowledge representation in expert and decision support systems.

Considering the structure of an expert system or a decision support system (Fig. 1), one can notice that the inference procedures and the knowledge base, where Bayesian networks can be applied, play an extremely important role.

Returning to Bayesian methods, and particularly to Bayesian networks, it is common knowledge that they are classified as non-classical statistical methods, because the inference is based not only on a sample but also takes advantage of information from outside the sample. The information from outside the sample is called prior information and is the most controversial point of Bayesian theory.

The basis of the whole of Bayesian statistics is the conditional probability theorem published by Thomas Bayes in 1763.

Figure 1. The main elements of an expert system
Source: Mulawka J.J. (1996), Systemy ekspertowe, WNT, Warszawa.

The mathematical notation of this theorem is as follows (for a discrete random variable):

$$P(A_i \mid B) = \frac{P(B \mid A_i)\,P(A_i)}{\sum_{j=1}^{N} P(B \mid A_j)\,P(A_j)} \qquad (1)$$

where the events $A_1, A_2, \ldots, A_N$, called hypotheses, are mutually exclusive and exhaustive, and $P(A_1), P(A_2), \ldots, P(A_N)$ are prior (subjective) probabilities. The probability of event $A_i$ conditional on the occurrence of event $B$ (for $i = 1, 2, \ldots, N$, with $P(B) > 0$) is known as the posterior probability.

It should be noted that most of the probabilities in equation (1) are conditional probabilities. They express the degree of belief in some propositions on the assumption that other propositions are true.
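A minimal Python sketch of equation (1), assuming a finite set of mutually exclusive and exhaustive hypotheses. The function name `posterior` and all numbers are hypothetical, chosen only for illustration; exact fractions are used so that the normalisation by the total probability of $B$ is easy to check by hand.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Posterior probabilities P(A_i | B) from equation (1).

    priors[i]      : prior probability P(A_i) of hypothesis A_i
    likelihoods[i] : conditional probability P(B | A_i)
    The hypotheses are assumed mutually exclusive and exhaustive.
    """
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per hypothesis")
    # Denominator of (1): total probability P(B) = sum_j P(B | A_j) P(A_j).
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    if evidence == 0:
        raise ValueError("P(B) = 0, so the posterior is undefined")
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

# Hypothetical numbers: three hypotheses and the likelihood of B under each one.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 10), Fraction(1, 2), Fraction(1, 4)]
print(posterior(priors, likelihoods))
# → [Fraction(1, 5), Fraction(3, 5), Fraction(1, 5)]
```

Here $P(B) = 1/20 + 3/20 + 1/20 = 1/4$, so the second hypothesis, although a priori less likely than the first, becomes the most probable one after $B$ is observed.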


II. GENERAL NETWORK STRUCTURE AND INFERENCE

The concept of Bayesian networks is directly connected with conditional probability theory. It is easy to notice that in the real world there are many situations in which the occurrence of one event depends strongly on the occurrence of another event.

Applying Bayesian networks allows more precise modeling of uncertainty and makes it possible to predict the occurrence of certain situations by using additional information. Since knowledge about the problem under consideration is probabilistic in character and the methods are based on probabilistic concepts, Bayesian networks are also called probabilistic networks or belief networks.

Formally, Bayesian networks are graphical structures that represent dependencies between variables. They are directed acyclic graphs that encode the structure of a system and its uncertainty in a comprehensible way. Graphs of this sort are composed of nodes and edges, where the nodes correspond to the random variables.

Therefore, one can speak of the node corresponding to the random variable $X_i$ for $i = 1, 2, \ldots, n$. An edge directed from node $X_i$ to node $X_j$ can be intuitively interpreted as representing the direct dependence of variable $X_j$ on variable $X_i$.

The occurrence of such an edge is usually denoted symbolically by the expression $X_i \to X_j$. For all nodes we can introduce a succession relation, denoted $X_i \to^{*} X_j$, meaning that node $X_j$ is a successor of $X_i$ (or, equivalently, that node $X_i$ is a predecessor of node $X_j$) in the Bayesian network. This holds if one of the following conditions is true:

- there is a directed edge from node $X_i$ to node $X_j$, that is $X_i \to X_j$,
- there is a directed edge from node $X_i$ to some node $X_k$ and node $X_j$ is a successor of $X_k$, that is $X_i \to X_k$ and $X_k \to^{*} X_j$.

According to this definition, node $X_j$ is a successor of $X_i$ if there exists a path made up of directed edges from node $X_i$ to node $X_j$. If $X_i \to X_j$, then $X_i$ is a direct predecessor (parent) of node $X_j$, or equivalently node $X_j$ is a direct successor (child) of node $X_i$.
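The recursive definition of the succession relation translates directly into a graph search. The sketch below, using a hypothetical dictionary-of-children representation and hypothetical node names, collects every node reachable from a given node along directed edges (its successors) and, for comparison, its direct predecessors (parents).

```python
def successors(children, node):
    """All successors of `node`, i.e. every X_j with node ->* X_j.

    `children` maps each node to the list of its direct successors. The loop
    follows the recursive definition: a child is a successor, and so is every
    successor of a child.
    """
    found = set()
    stack = list(children.get(node, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children.get(current, []))
    return found

def parents(children, node):
    """Direct predecessors (parents) of `node`: all X_i with an edge X_i -> node."""
    return {src for src, kids in children.items() if node in kids}

# Hypothetical DAG: W -> Y, X -> Z, Y -> Z.
dag = {"W": ["Y"], "X": ["Z"], "Y": ["Z"], "Z": []}
print(sorted(successors(dag, "W")))   # → ['Y', 'Z']
print(sorted(parents(dag, "Z")))      # → ['X', 'Y']
```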

The introduction of the succession relation is important because it makes it possible to define the semantics of a Bayesian network, that is, the interpretation of all its edges, more precisely and formally. The network is interpreted as the assertion that each node is conditionally independent of all nodes that are not its successors, given the values of its direct predecessors (parents). The main idea of this approach is the decomposition of the system into simpler parts, exposing its modularity (graph theory) while ensuring its consistency (probability theory).


The usefulness of a Bayesian network with a correct structure lies in its ability to represent efficiently the joint probability distribution of all random variables of the model.

If we denote by $U_X$ the set of nodes that are parents of node $X$, then the effective rule for computing the joint probability distribution requires defining the conditional probability distribution $P(X_i \mid U_{X_i})$ for each node $X_i$, that is, the probability distribution of variable $X_i$ for each possible combination of the values of its parents. Obviously, for a node that has no parents the conditional distribution is equal to the marginal distribution $P(X_i)$. The basic assumption in graph-based models is that the joint probability distribution $P(X)$ is equal to the product of the marginal and conditional distributions of all random variables. Therefore such a distribution can be written in the following way (chain rule):

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid U_{X_i}) \qquad (2)$$
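To make the chain rule concrete, here is a small sketch for a hypothetical network with binary variables A → B and A → C; the structure and all probabilities are assumptions for illustration. Each node stores $P(X = 1 \mid \text{parents})$, and the joint probability of a full assignment is the product of one factor per node.

```python
from itertools import product

# Hypothetical network A -> B, A -> C over binary variables (values 0 and 1).
PARENTS = {"A": (), "B": ("A",), "C": ("A",)}
# CPT[X][parent values] = P(X = 1 | parents); P(X = 0 | parents) is the complement.
CPT = {
    "A": {(): 0.3},
    "B": {(0,): 0.2, (1,): 0.9},
    "C": {(0,): 0.5, (1,): 0.6},
}

def joint(assignment):
    """Chain rule: P(x_1, ..., x_n) = product over nodes of P(x_i | values of its parents)."""
    prob = 1.0
    for var, pa in PARENTS.items():
        p_one = CPT[var][tuple(assignment[p] for p in pa)]
        prob *= p_one if assignment[var] == 1 else 1.0 - p_one
    return prob

names = list(PARENTS)
total = sum(joint(dict(zip(names, vals))) for vals in product((0, 1), repeat=len(names)))
print(joint({"A": 1, "B": 1, "C": 0}))   # 0.3 * 0.9 * (1 - 0.6), up to rounding
print(total)                              # 1 up to floating-point rounding
```

Only $1 + 2 + 2 = 5$ numbers are stored, whereas a full table over three binary variables has $2^3 = 8$ entries; the gap widens quickly as the network grows.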

It should be easy to see that knowledge of the joint probability makes it possible to infer the values of any chosen variables when the values of other variables are known. Thus, if the joint probability distribution of the variables can be represented by a Bayesian network, it is also theoretically possible to use the network for probabilistic inference to answer any question of interest, assuming that its structure (topology) is correct.
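The following sketch shows inference by enumeration. A full joint table is built from hypothetical conditional probability tables (a classic rain/sprinkler/wet-grass example, not taken from this paper), and a conditional query is answered by summing the joint over the unobserved variables and normalising. The approach answers any query, but its cost grows exponentially with the number of variables.

```python
from itertools import product

VARS = ("Rain", "Sprinkler", "WetGrass")   # hypothetical binary variables

def make_joint():
    """Full joint table P(R, S, W) built from hypothetical CPTs with the chain rule."""
    p_rain = 0.2                                                   # P(R = 1)
    p_sprinkler = {0: 0.4, 1: 0.01}                                # P(S = 1 | R)
    p_wet = {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.99}  # P(W = 1 | R, S)
    table = {}
    for r, s, w in product((0, 1), repeat=3):
        pr = p_rain if r else 1 - p_rain
        ps = p_sprinkler[r] if s else 1 - p_sprinkler[r]
        pw = p_wet[(r, s)] if w else 1 - p_wet[(r, s)]
        table[(r, s, w)] = pr * ps * pw
    return table

def query(joint, target, evidence):
    """P(target = 1 | evidence): sum the joint over unobserved variables, then normalise."""
    idx = {name: i for i, name in enumerate(VARS)}
    def consistent(row):
        return all(row[idx[v]] == val for v, val in evidence.items())
    num = sum(p for row, p in joint.items() if consistent(row) and row[idx[target]] == 1)
    den = sum(p for row, p in joint.items() if consistent(row))
    return num / den

joint = make_joint()
print(round(query(joint, "Rain", {"WetGrass": 1}), 4))   # → 0.3577
```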

One can distinguish two types of inference (Murphy, 2001; Niedermayer, 1998): the first goes from effect to cause and is called bottom-up inference, and the second goes from cause to effect and is called top-down inference. In some cases approximate inference is used (Settimi, Smith, Gargoum, 1999).
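The two directions of inference can be illustrated on a hypothetical two-node network Cause → Effect, with numbers invented for illustration. Top-down inference propagates the prior on the cause to a prediction of the effect; bottom-up inference inverts the conditional probability table with Bayes' theorem to obtain the probability of the cause once the effect has been observed.

```python
# Hypothetical two-node network Cause -> Effect, both binary.
P_CAUSE = 0.1                         # P(C = 1)
P_EFFECT_GIVEN = {1: 0.8, 0: 0.05}    # P(E = 1 | C = c)

def top_down():
    """Cause to effect: predictive P(E = 1), marginalising over the uncertain cause."""
    return P_CAUSE * P_EFFECT_GIVEN[1] + (1 - P_CAUSE) * P_EFFECT_GIVEN[0]

def bottom_up():
    """Effect to cause: diagnostic P(C = 1 | E = 1) via Bayes' theorem."""
    return P_CAUSE * P_EFFECT_GIVEN[1] / top_down()

print(round(top_down(), 3), round(bottom_up(), 3))   # → 0.125 0.64
```

Observing the effect raises the probability of the cause from the prior 0.1 to 0.64 in this example.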

III. THE TYPES OF BAYESIAN NETWORKS AND LEARNING PROCEDURES

Generally, Bayesian networks can be divided into two groups: dynamic Bayesian networks and static Bayesian networks. Dynamic Bayesian networks are used in time-series modeling, for example in signal recognition. In this case the series, and hence the network, is usually represented by a first-order Markov process (Ghahramani, 1997).

If $Y_1, \ldots, Y_T$ are random variables representing a time series generated by a first-order Markov process, then the joint probability distribution is equal to:

$$P(Y_1, \ldots, Y_T) = P(Y_1) \prod_{t=2}^{T} P(Y_t \mid Y_{t-1}) \qquad (3)$$
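For a first-order Markov chain, the joint probability of an observed sequence is the initial probability multiplied by one transition probability per step. A minimal sketch with a hypothetical two-state chain (the state space and all probabilities are assumptions):

```python
# Hypothetical two-state Markov chain with states 0 and 1.
initial = [0.6, 0.4]              # P(Y_1 = s)
transition = [[0.7, 0.3],         # transition[a][b] = P(Y_t = b | Y_{t-1} = a)
              [0.2, 0.8]]

def sequence_probability(seq):
    """P(y_1, ..., y_T) = P(y_1) * prod_{t=2}^{T} P(y_t | y_{t-1})."""
    prob = initial[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        prob *= transition[prev][cur]
    return prob

print(sequence_probability([0, 0, 1, 1]))   # 0.6 * 0.7 * 0.3 * 0.8, up to rounding
```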


Figure 2. A Bayesian network representing a first-order Markov process

These models do not directly represent dependencies between observations more than one time step apart; therefore it is common to allow higher-order interactions between variables, i.e. $r$-th order Markov models.

Another way to extend Markov models is to posit that the observations depend on hidden variables, which we can call states, and that the sequence of states is a Markov process. A classical model of this kind is the Kalman filter model.

Figure 3. A Bayesian network specifying conditional independence relations for a Kalman filter model

Using short notation, the joint probability distribution in this case, for the sequence from $t = 1$ to $t = T$, is:

$$P(\{X_t, Y_t\}) = P(X_1)\,P(Y_1 \mid X_1) \prod_{t=2}^{T} P(X_t \mid X_{t-1})\,P(Y_t \mid X_t) \qquad (4)$$

The state transition probability $P(X_t \mid X_{t-1})$ can be decomposed into a deterministic and a stochastic component:

$$X_t = f_t(X_{t-1}) + w_t \qquad (5)$$

where $f_t$ is the deterministic transition function and $w_t$ is a zero-mean random noise vector.


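Similarly, the observation probability $P(Y_t \mid X_t)$ can be decomposed as:

$$Y_t = g_t(X_t) + v_t \qquad (6)$$

where $g_t$ is the deterministic observation function and $v_t$ is zero-mean observation noise.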
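In the classical Kalman filter model, $f_t$ and $g_t$ are linear and the noise terms are Gaussian, which keeps the filtered distribution of $X_t$ Gaussian. The sketch below assumes the simplest scalar case, $f_t(x) = a x$ and $g_t(x) = c x$ with constant noise variances $q$ and $r$ (all parameter values are hypothetical). It simulates a short series from that model and filters it with the standard predict/update recursion.

```python
import random

def kalman_filter(observations, a, c, q, r, m0, p0):
    """Scalar Kalman filter for X_t = a*X_{t-1} + w_t and Y_t = c*X_t + v_t,
    with w_t ~ N(0, q), v_t ~ N(0, r) and prior X_1 ~ N(m0, p0).

    Returns the filtered means E[X_t | y_1..y_t] and variances Var[X_t | y_1..y_t].
    """
    means, variances = [], []
    m, p = m0, p0
    for t, y in enumerate(observations):
        if t > 0:
            # Predict: propagate the previous estimate through the transition model.
            m, p = a * m, a * a * p + q
        # Update: correct the prediction with observation y_t.
        k = p * c / (c * c * p + r)       # Kalman gain
        m = m + k * (y - c * m)
        p = (1.0 - k * c) * p
        means.append(m)
        variances.append(p)
    return means, variances

# Simulate a hypothetical series from the model, then filter it.
random.seed(0)
a, c, q, r = 0.9, 1.0, 0.1, 0.5
x, xs, ys = 0.0, [], []
for _ in range(50):
    x = a * x + random.gauss(0.0, q ** 0.5)          # random.gauss takes a standard deviation
    xs.append(x)
    ys.append(c * x + random.gauss(0.0, r ** 0.5))
means, variances = kalman_filter(ys, a, c, q, r, m0=0.0, p0=1.0)
print(round(means[-1], 3), round(variances[-1], 3))
```

The filtered variance settles quickly to a steady value well below the observation noise variance $r$, which is why the filtered means track the hidden states more closely than the raw observations do.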

There are other representations commonly used in dynamic Bayesian networks, i.e. hidden Markov models, factorial hidden Markov models and switching state-space models (Bilmes, 2000; Ghahramani, 1997; Murphy, 2002).

Static Bayesian networks are usually used in medical diagnosis or as decision support tools in classification problems (for example in motor insurance). The structure of a static network does not differ from the general scheme (a directed acyclic graph) apart from the absence of dynamic variables.

If the network had, for example, four random variables $W$, $X$, $Y$ and $Z$ (Fig. 4), the joint probability distribution would be as follows:

$$P(W, X, Y, Z) = P(W)\,P(X)\,P(Y \mid W)\,P(Z \mid X, Y) \qquad (7)$$
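The efficiency of such a factorisation can be quantified by counting free parameters. Assuming, for illustration only, that $W$, $X$, $Y$ and $Z$ in equation (7) are binary, the factorised form needs one parameter for each of $P(W)$ and $P(X)$, two for $P(Y \mid W)$ and four for $P(Z \mid X, Y)$, whereas an unrestricted joint table over four binary variables needs $2^4 - 1 = 15$. A short sketch of the count:

```python
# Structure of equation (7): W and X have no parents, Y has parent W, Z has parents X and Y.
parents = {"W": [], "X": [], "Y": ["W"], "Z": ["X", "Y"]}
cardinality = {"W": 2, "X": 2, "Y": 2, "Z": 2}   # assumed binary, for illustration

def network_parameters(parents, cardinality):
    """Free parameters of the factorised model: sum_i (r_i - 1) * prod over parents of r_pa."""
    total = 0
    for var, pa in parents.items():
        rows = 1
        for p in pa:
            rows *= cardinality[p]
        total += (cardinality[var] - 1) * rows
    return total

def full_joint_parameters(cardinality):
    """Free parameters of an unrestricted joint table: prod_i r_i - 1."""
    size = 1
    for r in cardinality.values():
        size *= r
    return size - 1

print(network_parameters(parents, cardinality), full_joint_parameters(cardinality))  # → 8 15
```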

Regardless of the structure, the network can be subjected to a learning process. Such a process can concern either the parameters of the network or its structure, and can be associated with variable selection and edge specification. There are suitable algorithms for parameter learning that yield the best estimates (for example gradient methods or methods based on the maximum likelihood function). Generally, parameter learning is simply the updating of the conditional probability tables of each node of the network. A more complicated problem, however, is learning the correct structure. Structure learning procedures are based on searching among all possible and acceptable networks of interest to find one or several optimal networks.
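For complete data, maximum-likelihood parameter learning reduces to counting: each entry of a conditional probability table is the relative frequency of the child's value among the rows with the given parent configuration. A minimal sketch with hypothetical data; the optional pseudo-count `alpha` is an assumption (not part of the original text) that gives a smoothed estimate.

```python
from collections import Counter

def learn_cpt(data, child, parents, alpha=0.0, values=(0, 1)):
    """Estimate P(child | parents) from complete data by relative frequencies.

    data  : list of dicts mapping variable name to observed value
    alpha : pseudo-count added to every cell (alpha = 0 gives the maximum likelihood estimate)
    Only parent configurations that occur in the data get a row.
    """
    cell = Counter()
    row_total = Counter()
    for row in data:
        key = tuple(row[p] for p in parents)
        cell[(key, row[child])] += 1
        row_total[key] += 1
    return {
        key: {v: (cell[(key, v)] + alpha) / (n + alpha * len(values)) for v in values}
        for key, n in row_total.items()
    }

# Hypothetical observations of two binary variables with structure A -> B.
data = [{"A": 0, "B": 0}, {"A": 0, "B": 1}, {"A": 0, "B": 0},
        {"A": 1, "B": 1}, {"A": 1, "B": 1}, {"A": 1, "B": 0}, {"A": 1, "B": 1}]
print(learn_cpt(data, "B", ["A"]))
```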


In such cases the solution relies on complicated mathematical algorithms that use special scoring metrics, e.g. the K2 metric, and in some cases (for example when some variables are hidden) a satisfactory solution has not been found so far.
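As an illustration of such a metric, the sketch below computes the logarithm of the K2 score of Cooper and Herskovits for one node and a candidate parent set, assuming discrete, complete data and the uniform prior implicit in K2. Logarithms of factorials are computed with `math.lgamma` to avoid overflow; the data and names are hypothetical.

```python
from collections import Counter
from math import lgamma

def k2_log_score(data, child, parents, arity):
    """Log of the K2 metric for one node and a candidate parent set.

    g = prod_j (r - 1)! / (N_ij + r - 1)! * prod_k N_ijk!
    where r is the number of states of `child`, j runs over parent configurations
    and N_ijk counts rows with parent configuration j and child state k.
    Unobserved configurations (N_ij = 0) contribute a factor of 1 and are skipped.
    """
    r = arity[child]
    n_ijk = Counter()
    n_ij = Counter()
    for row in data:
        j = tuple(row[p] for p in parents)
        n_ijk[(j, row[child])] += 1
        n_ij[j] += 1
    score = 0.0
    for n in n_ij.values():
        score += lgamma(r) - lgamma(n + r)   # log (r-1)! - log (N_ij + r - 1)!
    for count in n_ijk.values():
        score += lgamma(count + 1)           # log N_ijk!
    return score

# Hypothetical data in which B always copies A.
data = [{"A": i % 2, "B": i % 2} for i in range(10)]
arity = {"A": 2, "B": 2}
print(k2_log_score(data, "B", [], arity), k2_log_score(data, "B", ["A"], arity))
```

A structure search compares such scores across candidate parent sets; here the parent set {A} scores higher than the empty set, as expected for data in which B is a copy of A.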

The search over all possible networks can be limited by taking into account prior knowledge about the problem of interest (expert knowledge) or by imposing additional conditions that restrict the structure of the Bayesian network. Usually the restriction concerns the order of interactions, that is, the maximum number of edges that can be directed into one node.
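The size of the unrestricted search space, and the effect of bounding the number of parents, can be shown numerically. The sketch uses Robinson's recurrence for the number of labelled DAGs on $n$ nodes and counts the parent sets available to a single node when at most `max_parents` incoming edges are allowed; the bound of 2 used below is an arbitrary example.

```python
from math import comb

def count_dags(n):
    """Number of labelled DAGs on n nodes (Robinson's recurrence)."""
    a = [1]                                  # a[0] = 1
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]

def candidate_parent_sets(n, max_parents):
    """Parent sets available to one node when at most `max_parents` edges may enter it."""
    return sum(comb(n - 1, j) for j in range(max_parents + 1))

for n in (3, 5, 10):
    # columns: n, number of DAGs, bounded parent sets per node, unrestricted parent sets per node
    print(n, count_dags(n), candidate_parent_sets(n, 2), 2 ** (n - 1))
```

Already for 10 variables the number of DAGs exceeds $10^{18}$, while a bound of two parents leaves each node only 46 candidate parent sets instead of 512.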

IV. THE POSSIBILITIES OF APPLICATION. CHANCES AND PROBLEMS

As noted above, a Bayesian network encodes the joint probability distribution of the random variables in a compressed way, and this distribution is sufficient for inference. The answer to any question can be obtained by computing the joint probability distribution from the network and using it for the appropriate calculations.

Unfortunately, such an approach means giving up one of the main advantages of the graphical representation of the joint probability distribution, namely its efficiency. Of course, Bayesian networks offer other advantages, in particular a legible and intuitively comprehensible graphical representation of knowledge about direct causal relations. However, because the size of the full joint distribution grows exponentially with the number of variables, it cannot be used in practice except in cases with a small number of variables.

Thus, other inference algorithms are needed in Bayesian networks. Unfortunately, in the general case this problem is NP-hard. The problem becomes easier for a special type of network called singly connected networks (polytrees), in which any two nodes are linked by at most one path composed of edges regardless of their direction; efficient exact algorithms exist for such networks. For general networks, approximate inference algorithms based on Monte Carlo methods are known, e.g. logic sampling or weighted sampling. Therefore, there are some practical limitations on the use of Bayesian networks, caused by the relative difficulty of achieving efficient inference.
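A minimal sketch of likelihood weighting, one common form of weighted sampling, on a hypothetical two-node network Rain → WetGrass with the evidence WetGrass = 1. Non-evidence nodes are sampled from their conditional distributions, the evidence node is clamped, and each sample is weighted by the probability of the evidence. The exact posterior is printed alongside for comparison; all numbers are invented.

```python
import random

# Hypothetical network Rain -> WetGrass (binary); query P(Rain = 1 | WetGrass = 1).
P_RAIN = 0.2                          # P(Rain = 1)
P_WET_GIVEN_RAIN = {1: 0.9, 0: 0.1}   # P(WetGrass = 1 | Rain)

def likelihood_weighting(n_samples, seed=0):
    """Approximate P(Rain = 1 | WetGrass = 1) by likelihood weighting."""
    rng = random.Random(seed)
    weighted_rain = 0.0
    total_weight = 0.0
    for _ in range(n_samples):
        rain = 1 if rng.random() < P_RAIN else 0   # sample the non-evidence node
        weight = P_WET_GIVEN_RAIN[rain]            # weight = P(evidence | sampled parents)
        weighted_rain += weight * rain
        total_weight += weight
    return weighted_rain / total_weight

exact = (P_RAIN * P_WET_GIVEN_RAIN[1]
         / (P_RAIN * P_WET_GIVEN_RAIN[1] + (1 - P_RAIN) * P_WET_GIVEN_RAIN[0]))
print(round(exact, 4), round(likelihood_weighting(100_000), 4))
```

The estimate converges to the exact value as the number of samples grows; in large networks the same scheme avoids enumerating the joint distribution at all.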

The field in which Bayesian networks are being developed most dynamically is medicine. The task of the network in such cases is usually to find the most probable cause of a patient's ailment. The network therefore has to answer the question: what does the patient suffer from, given that certain symptoms occur (which cannot be classified unambiguously)? Hence Bayesian networks are often used in classification problems.
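The diagnostic question can be sketched as choosing the disease with the highest posterior probability given the observed symptoms. The example below assumes a naive Bayes structure (one disease node whose symptom children are conditionally independent given the disease) and invented probabilities; real diagnostic networks are usually richer.

```python
from math import exp, log

# Hypothetical diagnostic network: Disease -> each symptom (naive Bayes structure).
PRIOR = {"flu": 0.1, "cold": 0.3, "healthy": 0.6}
P_SYMPTOM = {                          # P(symptom present | disease)
    "flu":     {"fever": 0.9, "cough": 0.8},
    "cold":    {"fever": 0.2, "cough": 0.7},
    "healthy": {"fever": 0.01, "cough": 0.05},
}

def diagnose(observed):
    """Most probable disease and the posterior over diseases.

    observed maps symptom -> True/False; symptoms not listed are treated as unobserved.
    """
    log_scores = {}
    for disease, prior in PRIOR.items():
        score = log(prior)
        for symptom, present in observed.items():
            p = P_SYMPTOM[disease][symptom]
            score += log(p if present else 1.0 - p)
        log_scores[disease] = score
    top = max(log_scores.values())     # subtract the maximum for numerical stability
    weights = {d: exp(s - top) for d, s in log_scores.items()}
    norm = sum(weights.values())
    posterior = {d: w / norm for d, w in weights.items()}
    return max(posterior, key=posterior.get), posterior

best, post = diagnose({"fever": True, "cough": True})
print(best, round(post[best], 3))   # → flu 0.63
```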


Although there is great interest in Bayesian networks in medicine, their use in the social and economic fields is much less common. The real problem is not only the efficiency issue mentioned above but also the problem of structure and learning, particularly when the economic variables are dynamic in character.

Let us consider a hypothetical network with binary nodes, shown in Figure 5. The aim is to estimate the probability of a fall in the price of KGHM shares in a future time period. This is a classification problem: the share should be classified either into the falling (or neutral) group or into the rising group. The network was built using the GeNIe software developed by the Decision Systems Laboratory at the University of Pittsburgh.

Let $X_1, \ldots, X_5$ respectively represent the nodes of the network. The meaning and possible states of the nodes are presented in Table 1.

Table 1. List of variables, symbols and categories

Variable                 Symbol   Categories
Exchange rate            $X_1$    increase, fall
Volume of trade          $X_2$    increase, fall
Stock exchange index     $X_3$    increase, fall
Positions in futures     $X_4$    increase, fall
Share price              $X_5$    increase, fall

If we denote the category fall by 0 and the category increase by 1, then the corresponding marginal and conditional distributions are as follows (for a learning sample of 47 elements: weekly data from 10.09.2001 to 09.09.2002):


Table 2. Marginal probability distribution $P(X_1 = Y)$ of variable $X_1$

Y = 0     Y = 1
0.5116    0.4884

Table 3. Conditional probability distribution $P(X_2 = Y \mid X_1 = Z)$ of variable $X_2$

          Y = 0     Y = 1
Z = 0     0.7619    0.2381
Z = 1     0.7273    0.2727

Table 4. Conditional probability distribution $P(X_3 = Y \mid X_2 = Z, X_1 = Q)$ of variable $X_3$

                 Y = 0     Y = 1
Z = 0, Q = 0     0.3750    0.6250
Z = 0, Q = 1     0.4375    0.5625
Z = 1, Q = 0     0.4000    0.6000
Z = 1, Q = 1     0.5000    0.5000

Table 5. Conditional probability distribution $P(X_4 = Y \mid X_3 = Z)$ of variable $X_4$

          Y = 0     Y = 1
Z = 0     0.5417    0.4583
Z = 1     0.5263    0.4737

Table 6. Conditional probability distribution $P(X_5 = Y \mid X_4 = Z, X_3 = Q, X_2 = R)$ of variable $X_5$

                        Y = 0     Y = 1
Z = 0, Q = 0, R = 0     0.5556    0.4444
Z = 0, Q = 0, R = 1     0.7500    0.2500
Z = 0, Q = 1, R = 0     0.2000    0.8000
Z = 0, Q = 1, R = 1     0.6000    0.4000
Z = 1, Q = 0, R = 0     0.5000    0.5000
Z = 1, Q = 0, R = 1     0.0001    0.9999
Z = 1, Q = 1, R = 0     0.5000    0.5000
Z = 1, Q = 1, R = 1     0.0001    0.9999
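The tables can be combined through the chain rule to answer queries about $X_5$. The sketch below assumes the structure implied by the conditioning variables in Tables 2 to 6 ($X_1 \to X_2$; $X_2, X_1 \to X_3$; $X_3 \to X_4$; $X_4, X_3, X_2 \to X_5$), with the parent order $(X_2, X_1)$ for $X_3$ being an assumption. It computes $P(X_5 = 0 \mid \text{evidence})$ by enumeration. The decision rule in the hypothetical `classify` function, including the "no decision" outcome at exactly 0.5, is an illustration only and not the author's documented procedure.

```python
from itertools import product

# P(node = 0 | parents), i.e. the "Y = 0" column of Tables 2 to 6 (0 = fall, 1 = increase).
P_X1 = 0.5116
P_X2 = {0: 0.7619, 1: 0.7273}                                   # key: X1
P_X3 = {(0, 0): 0.3750, (0, 1): 0.4375,                         # key: (X2, X1), assumed order
        (1, 0): 0.4000, (1, 1): 0.5000}
P_X4 = {0: 0.5417, 1: 0.5263}                                   # key: X3
P_X5 = {(0, 0, 0): 0.5556, (0, 0, 1): 0.7500, (0, 1, 0): 0.2000, (0, 1, 1): 0.6000,
        (1, 0, 0): 0.5000, (1, 0, 1): 0.0001, (1, 1, 0): 0.5000, (1, 1, 1): 0.0001}  # key: (X4, X3, X2)

def p(p_zero, value):
    """Probability of `value` for a binary node with P(node = 0) = p_zero."""
    return p_zero if value == 0 else 1.0 - p_zero

def joint(x1, x2, x3, x4, x5):
    """Chain-rule product for the assumed structure of the share-price network."""
    return (p(P_X1, x1) * p(P_X2[x1], x2) * p(P_X3[(x2, x1)], x3)
            * p(P_X4[x3], x4) * p(P_X5[(x4, x3, x2)], x5))

def prob_fall(evidence):
    """P(X5 = 0 | evidence); evidence maps a node index (1 to 4) to its observed value."""
    num = den = 0.0
    for values in product((0, 1), repeat=5):
        if any(values[i - 1] != v for i, v in evidence.items()):
            continue                      # row inconsistent with the evidence
        w = joint(*values)
        den += w
        if values[4] == 0:
            num += w
    return num / den

def classify(evidence, tol=1e-9):
    """Hypothetical decision rule: 'fall' above 0.5, 'increase' below, else no decision."""
    pf = prob_fall(evidence)
    if abs(pf - 0.5) < tol:
        return "no decision"
    return "fall" if pf > 0.5 else "increase"

print(round(prob_fall({}), 4))                 # unconditional probability of a fall
print(classify({1: 0, 2: 0, 3: 0, 4: 0}))      # → fall
```

When $X_2$, $X_3$ and $X_4$ are all observed, the answer is simply the corresponding row of Table 6, because $X_5$ is conditionally independent of $X_1$ given its parents; rows with probability 0.5000 then leave the rule undecided.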


The correctness of the decisions was verified over a period of 23 weeks. It turned out that in 48% of cases the network classified correctly (in 26% of cases it could not make a decision and in 26% it made a wrong one). Obviously, this cannot be considered a good result. The reason for this result could be an improper structure (topology) of the network and the selection of variables.

V. SUMMARY AND CONCLUSIONS

Bayesian networks can be an effective tool for statistical inference in computational expert systems. However, there are some barriers connected with the structure of a network, the dynamization of variables, and the learning or updating processes. These problems are mathematically complicated, which may explain why neural networks are still more popular than probabilistic networks. In particular, structure learning remains a subject of ongoing research. Moreover, software for neural networks is more accessible.

It should be noted that graphical knowledge representation has a great advantage over the rule-based knowledge representation used in expert and decision support systems. Each rule in a rule-based system is treated independently of the others, which is why the rule base may be inconsistent and redundant. In graph-based expert systems these problems do not arise.

REFERENCES

Bilmes J.A. (2000), Dynamic Bayesian multinets, [in:] Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers, Stanford, California.

Chickering D.M., Geiger D., Heckerman D. (1994), Learning Bayesian Networks is NP-hard, Microsoft Research MSR-TR-94-17, Redmond.

Diez F.J., Mira J. (1994), Distributed Reasoning and Learning in Bayesian Expert System, Dpto. Informatica y Automatica, UNED, Madrid.

Domański Cz., Pruska K., Wagner W. (1998), Wnioskowanie statystyczne przy nieklasycznych założeniach, Wyd. UŁ, Łódź.

Geiger D., Heckerman D. (1994), Learning Gaussian Networks, Microsoft Research MSR-TR-94-10, Redmond.

Ghahramani Z. (1997), Learning Dynamic Bayesian Networks, University of Toronto, Toronto.

Heckerman D., Geiger D. (1995), Learning Bayesian Networks, Microsoft Research MSR-TR-95-02, Redmond.

Krause P. (1998), Learning Probabilistic Networks, technical report, Philips Research Labs, Redhill.

Mulawka J.J. (1996), Systemy ekspertowe, WNT, Warszawa.

Murphy K. (2001), An introduction to graphical models, http://www.cs.berkeley.edu/murphyk/Bayes/bayes_tutorial.pdf

Murphy K. (2002), Dynamic Bayesian Networks: Representation, Inference and Learning, PhD Thesis, UC Berkeley, Computer Science Division.

Niedermayer D. (1998), An introduction to Bayesian networks and their contemporary applications, http://www.gprn.sk.ca/~daryle/papers/bayesian_networks/bayes.html

Normand S.L., Tritchler D. (1992), Parameter updating in a Bayes network, Journal of the American Statistical Association, 82, 420.

Russell S., Binder J., Koller D. (1994), Adaptive Probabilistic Networks, technical report UCB/CSD-94-824, University of California, Berkeley.

Settimi R., Smith J.Q., Gargoum A.S. (1999), Approximate Learning in Complex Dynamic Bayesian Networks, Engineering and Physical Sciences Research Council GR/K72254.

Pearl J., web page http://bayes.cs.ucla.edu/jp.home.html

Tomasz Kozdraj

REMARKS ON BAYESIAN NETWORKS AND THEIR APPLICATIONS

Summary

Bayesian networks are graphical structures, namely directed acyclic graphs, that represent dependencies between random variables. They are applied in the field of so-called intelligent software, and especially in expert systems. This article discusses the problems of Bayesian networks themselves, their learning and their applications. An attempt was also made to apply them to economic problems related to the capital market.