
Remarks on Bayesian Networks and Their Applications






FOLIA OECONOMICA 194, 2005

Tomasz Kozdraj*



Bayesian networks are directed acyclic graphs that represent dependencies between variables in a probabilistic model. They are becoming an increasingly important area for research and applications in the entire field of Artificial Intelligence. This paper explores the nature of implications for Bayesian networks, beginning with an overview and comparison of inferential statistics and Bayes' Theorem. It presents the possibilities of applying Bayesian networks to economic problems and also focuses on the problem of learning.

Key words: Bayesian networks, probabilistic networks, learning Bayesian networks.


Bayesian networks are becoming more and more popular in the field of research and applications of artificial intelligence. They play a significant role in decision processes and knowledge representation in expert and decision support systems.

Considering the structure of an expert system or a decision support system (Fig. 1), one can notice that the inference procedures and the knowledge base, where Bayesian networks can be applied, play an extremely important role.

Returning to Bayesian methods, and particularly to Bayesian networks, it is common knowledge that they are classified as non-classical statistical methods, because the inference is based not only on a sample but also takes advantage of information from outside the sample. The information from outside the sample is called prior information and is the most controversial point in Bayesian theory.

The basis of the whole theory of Bayesian statistics is the conditional probability theorem published by Thomas Bayes in 1763.

Figure 1. The main elements of an expert system. Source: Mulawka J.J. (1996), Systemy ekspertowe, WNT, Warszawa

The mathematical notation of this theorem is the following (for a discrete random variable):

$$P(A_i \mid B) = \frac{P(B \mid A_i)\,P(A_i)}{\sum_{j=1}^{N} P(B \mid A_j)\,P(A_j)} \qquad (1)$$

where the events $A_1, A_2, \ldots, A_N$ are mutually exclusive and exhaustive events called hypotheses and $P(A_1), P(A_2), \ldots, P(A_N)$ are prior probabilities or subjective probabilities. The probability of event $A_i$ conditional on the occurrence of event $B$, as given by (1), is known as the posterior probability.

It should be noted that most of the probabilities in equation (1) are conditional probabilities. They express the degree of belief in some propositions under the assumption that other propositions are true.
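Equation (1) can be illustrated with a short numerical sketch. The function name `posterior` and the prior and likelihood values below are hypothetical and serve only to show how the posterior probabilities follow from the priors and the conditional probabilities of the observed event.

```python
def posterior(priors, likelihoods):
    """Return P(A_i | B) for each hypothesis A_i, following equation (1).

    priors[i]      -- P(A_i), prior probability of hypothesis A_i
    likelihoods[i] -- P(B | A_i), probability of the observed event B under A_i
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    # Numerators of equation (1): P(B | A_i) * P(A_i)
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator of equation (1): sum over all hypotheses
    evidence = sum(joint)
    if evidence == 0:
        raise ValueError("event B has zero probability under every hypothesis")
    return [j / evidence for j in joint]


# Hypothetical numbers: three hypotheses with assumed priors and likelihoods.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.1, 0.4, 0.7]
post = posterior(priors, likelihoods)
```

With these assumed values the hypothesis with the smallest prior obtains the largest posterior, because the observed event is most probable under it.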



The concept of Bayesian networks is directly connected with conditional probability theory. It is easy to notice that in the real world there are many situations in which the occurrence of one event strictly depends on the occurrence of another event.

Applying Bayesian networks allows uncertainty to be modeled more precisely and the possibility of occurrence of some situations to be predicted through the use of additional information. Since the knowledge about the considered problem has a probabilistic character and the methods are based on probabilistic concepts, Bayesian networks are also commonly called probabilistic networks or belief networks.

Formally, one can say that Bayesian networks are graphical structures which represent dependencies between variables. They are directed acyclic graphs which encode the structure of a system, its uncertainty and our understanding of it. Graphs of this sort are composed of nodes and edges, where the nodes correspond to the random variables.

Therefore, one can speak of a node $X_i$ corresponding to the random variable $X_i$ for $i = 1, 2, \ldots, n$. The edge directed from node $X_i$ to node $X_j$ can be intuitively interpreted as a representation of the direct dependence of variable $X_j$ on variable $X_i$.

The occurrence of such an edge is usually denoted symbolically by the expression $X_i \to X_j$. For all nodes we can introduce a succession (aftermath) relation, written here as $X_i \to^{*} X_j$, meaning that node $X_j$ is a successor of $X_i$ (or node $X_i$ is a predecessor of node $X_j$) in the Bayesian network. The relation holds if one of the following conditions is true:

- there is a directed edge from node $X_i$ to node $X_j$, that is $X_i \to X_j$,

- there is a directed edge from node $X_i$ to some node $X_k$ and node $X_j$ is a successor of $X_k$, that is $X_i \to X_k$ and $X_k \to^{*} X_j$.

According to this definition, node $X_j$ is a successor of $X_i$ if there exists a path made up of directed edges from node $X_i$ to node $X_j$. If $X_i \to X_j$, it means that $X_i$ is a direct predecessor (parent) of node $X_j$, or that node $X_j$ is a direct successor (child) of node $X_i$.
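The recursive definition of a successor can be sketched as a simple graph search. The edge list and the function name `successors` are illustrative assumptions, not part of the original text; the graph is stored as a list of (parent, child) pairs.

```python
def successors(edges, node):
    """Return the set of all successors of `node` in a directed graph.

    edges -- iterable of (parent, child) pairs, one pair per directed edge
    """
    # Map every node to the list of its direct successors (children).
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    found = set()
    stack = list(children.get(node, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            # A successor of a successor is also a successor.
            stack.extend(children.get(current, []))
    return found


# Hypothetical network: X1 -> X2 -> X3 and X1 -> X4.
edges = [("X1", "X2"), ("X2", "X3"), ("X1", "X4")]
all_after_x1 = successors(edges, "X1")
all_after_x3 = successors(edges, "X3")
```

In this assumed graph every other node is a successor of X1, whereas X3 has no successors at all.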

The introduction of the succession (aftermath) relation is important because it makes it possible to define more precisely and formally the semantics of a Bayesian network, that is the interpretation of all edges. The network is interpreted as the assertion that each node is conditionally independent of all nodes which are not its successors, given the values of its direct predecessors (parents). The main idea of this approach is the decomposition of the system into simpler parts, showing its modularity (graph theory) and assuring cohesion (probability theory).



The usefulness of Bayesian networks with a correct structure lies in their ability to represent efficiently the joint probability distribution of all random variables of the model.

If we denote by $U_X$ the set of nodes which are parents of node $X$, then the effective rule for computing the joint probability distribution requires defining the conditional probability distribution $P(X_i \mid U_{X_i})$ for each node $X_i$, that is the probability of variable $X_i$ given the possible outcomes of its parents. Obviously, for a node which does not have parents the conditional distribution is equal to the marginal distribution $P(X_i)$. The basic assumption in graph-based models is that the joint probability distribution $P(X)$ is equal to the product of the marginal and conditional distributions of all random variables. Therefore such a distribution can be written in the following way (chain rule):

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid U_{X_i})$$

It is easy to see that knowledge of the joint probability makes it possible to draw inferences about the values of any chosen variables when the values of other variables are known. Thus, if the joint probability distribution of the variables can be represented by a Bayesian network, it is also theoretically possible to use the network for probabilistic inference to answer any question of interest, assuming that the structure (topology) is correct.

For this reason one can distinguish two types of inference (Murphy (2001), Niedermayer (1998)). The first proceeds from effect to cause and is called bottom-up inference; the second proceeds from cause to effect and is called top-down inference. In some cases approximate inference is used (Settimi, Smith, Gargoum, 1999).
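Both directions of inference can be sketched by brute-force enumeration of the joint distribution, computed as the product of each node's distribution given its parents. The two-node network A -> B, its probabilities and the helper names below are hypothetical; the sketch assumes binary variables and is practical only for very small networks.

```python
from itertools import product

# Hypothetical network A -> B with binary states 0/1.
# Each node maps to (parents, table), where the table gives
# P(node = 1 | parent values) for every combination of parent values.
network = {
    "A": ((), {(): 0.3}),
    "B": (("A",), {(0,): 0.2, (1,): 0.9}),
}


def joint_probability(network, assignment):
    """Product of P(X_i | parents of X_i) over all nodes."""
    prob = 1.0
    for node, (parents, table) in network.items():
        p_one = table[tuple(assignment[p] for p in parents)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


def query(network, target, evidence):
    """P(target = 1 | evidence), obtained by summing the joint distribution."""
    names = list(network)
    numerator = denominator = 0.0
    for values in product((0, 1), repeat=len(names)):
        assignment = dict(zip(names, values))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # skip assignments that contradict the evidence
        p = joint_probability(network, assignment)
        denominator += p
        if assignment[target] == 1:
            numerator += p
    return numerator / denominator


top_down = query(network, "B", {"A": 1})   # from cause to effect
bottom_up = query(network, "A", {"B": 1})  # from effect to cause
```

The number of enumerated assignments doubles with every added binary variable, which is why this direct approach does not scale.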


Generally, Bayesian networks can be divided into two groups, i.e. dynamic Bayesian networks and static Bayesian networks. Dynamic Bayesian networks are used in time series modeling, for example in signal recognition processes. In this case the series and the network are usually represented by a first-order Markov process (Ghahramani, 1997).

If $Y_1, \ldots, Y_T$ are random variables representing a time series that follows a first-order Markov process, then the joint probability distribution is equal to:

$$P(Y_1, \ldots, Y_T) = P(Y_1) \prod_{t=2}^{T} P(Y_t \mid Y_{t-1})$$




Figure 2. A Bayesian network representing a first-order Markov process

These models do not directly represent dependencies between observations over more than one time step; therefore it is common to allow higher-order interactions between variables, i.e. $\tau$th-order Markov models.
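For a first-order Markov process the joint probability of a sequence is the probability of the first observation multiplied by the one-step transition probabilities. A minimal sketch follows; the two states and all probability values are hypothetical.

```python
def markov_chain_probability(sequence, initial, transition):
    """Joint probability P(Y_1, ..., Y_T) of a first-order Markov chain.

    initial[s]       -- P(Y_1 = s)
    transition[r][s] -- P(Y_t = s | Y_{t-1} = r)
    """
    prob = initial[sequence[0]]
    # Each further observation depends only on its direct predecessor.
    for previous, current in zip(sequence, sequence[1:]):
        prob *= transition[previous][current]
    return prob


# Hypothetical two-state chain.
initial = {"up": 0.6, "down": 0.4}
transition = {
    "up": {"up": 0.7, "down": 0.3},
    "down": {"up": 0.5, "down": 0.5},
}
p = markov_chain_probability(["up", "up", "down"], initial, transition)
```

Here the result is the product 0.6 · 0.7 · 0.3; a higher-order model would condition each factor on several preceding observations instead of one.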

Another way to extend Markov models is to posit that the observations depend on hidden variables, which we can call the states, and that the sequence of states is a Markov process. A classical model of this kind is the Kalman filter.

Figure 3. A Bayesian network specifying conditional independence relations for a Kalman filter model

Using the short notation, the joint probability distribution in this case for the sequence from $t = 1$ to $t = T$ is:

$$P(\{X_t, Y_t\}) = P(X_1)\,P(Y_1 \mid X_1) \prod_{t=2}^{T} P(X_t \mid X_{t-1})\,P(Y_t \mid X_t)$$

The state transition probability $P(X_t \mid X_{t-1})$ can be decomposed into a deterministic and a stochastic component:

$$X_t = f_t(X_{t-1}) + \omega_t$$

where $f_t$ is the deterministic transition function and $\omega_t$ is a zero-mean random noise vector.

Similarly, the observation probability $P(Y_t \mid X_t)$ can be decomposed:

$$Y_t = g_t(X_t) + \xi_t$$


There are other representations commonly used in dynamic Bayesian networks, e.g. Hidden Markov Models, Factorial Markov Models and Switching State Models (Bilmes (2000), Ghahramani (1997), Murphy


Static Bayesian networks are usually used in medical diagnosis or can be applied as decision support tools in classification problems (for example in motor insurance). The structure of a static network does not differ from the general scheme (acyclic graph) apart from the lack of dynamics.


If the network had, for example, four random variables $X$, $Y$, $Z$ and $W$ (Fig. 4), the joint probability distribution would be as follows:

$$P(W, X, Y, Z) = P(W) \cdot P(X) \cdot P(Y \mid W) \cdot P(Z \mid X, Y)$$
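The saving offered by such a factorization can be made concrete by counting free parameters. The sketch below assumes that all four variables are binary, which the text does not state, and reads the parent sets off the factorization: W and X have no parents, Y has parent W, and Z has parents X and Y. The function name `free_parameters` is illustrative.

```python
def free_parameters(parents, cardinality):
    """Number of free parameters of the factorized distribution.

    Each node needs (number of states - 1) values for every
    combination of states of its parents.
    """
    total = 0
    for node, node_parents in parents.items():
        rows = 1
        for parent in node_parents:
            rows *= cardinality[parent]
        total += rows * (cardinality[node] - 1)
    return total


# Parent sets read off the factorization P(W) P(X) P(Y|W) P(Z|X,Y).
parents = {"W": [], "X": [], "Y": ["W"], "Z": ["X", "Y"]}
cardinality = {name: 2 for name in parents}  # assumption: binary variables

factored = free_parameters(parents, cardinality)

# An unrestricted joint table needs one value per state combination, minus one.
full = 1
for states in cardinality.values():
    full *= states
full -= 1
```

Under these assumptions the network needs 8 parameters instead of the 15 required by an unrestricted joint table, and the gap widens quickly as variables are added.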


Certainly, regardless of the structure, the network can be subject to a learning process. Such a process can concern either the parameters of the network or its structure, and can be associated with variable selection and edge specification. There are adequate algorithms for parameter learning which yield the best estimates (for example gradient methods or methods based on the maximum likelihood function). Generally, parameter learning simply means updating the conditional probability tables of each node of the network. A more complicated problem, however, is learning the correct structure. Structure learning procedures search among all possible and acceptable networks of interest to find one or several optimal networks.
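For complete discrete data, maximum likelihood parameter learning reduces to computing relative frequencies for each node given its parents. The sketch below illustrates this under that assumption; the records and the function name `estimate_cpt` are hypothetical, and no smoothing or prior information is used.

```python
from collections import Counter


def estimate_cpt(records, child, parents):
    """Maximum likelihood estimate of P(child | parents) from complete data.

    records -- list of dicts mapping variable names to observed states
    Returns a dict keyed by (parent_values, child_value).
    """
    joint_counts = Counter()
    parent_counts = Counter()
    for row in records:
        key = tuple(row[p] for p in parents)
        joint_counts[(key, row[child])] += 1
        parent_counts[key] += 1
    # Relative frequency of each child state within each parent configuration.
    return {
        (key, value): count / parent_counts[key]
        for (key, value), count in joint_counts.items()
    }


# Hypothetical learning sample for an edge X1 -> X2.
records = [
    {"X1": 0, "X2": 0},
    {"X1": 0, "X2": 0},
    {"X1": 0, "X2": 1},
    {"X1": 1, "X2": 0},
    {"X1": 1, "X2": 1},
]
cpt = estimate_cpt(records, "X2", ["X1"])
```

Parent configurations that never occur in the sample receive no estimate at all, which is one reason why small learning samples are problematic.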


In such cases the solution relies on complicated mathematical algorithms based on special metrics, e.g. the K2 metric, and in some cases (for example when some variables are hidden) a proper solution has not been found so far.

The search over all possible networks can be limited by taking into consideration prior knowledge about the problem of interest (expert knowledge) or by imposing additional conditions restricting the structure of the Bayesian network. Usually the restriction concerns the order of interactions, that is, the maximum number of edges which can be directed into one node.


As noted above, the Bayesian network encodes the joint probability distribution of the random variables in a compressed way, and this distribution is sufficient for inference. The answer to any question can be obtained by computing the joint probability distribution on the basis of the network and using it for the appropriate calculations.

Unfortunately, such an approach gives up one of the main advantages of the graphical representation of the joint probability distribution, namely its efficiency. Of course, Bayesian networks offer other advantages, particularly a legible and intuitively comprehensible graphical representation of knowledge about direct causalities, but for reasons of efficiency it is impossible to use the full joint distribution in practice, except in cases with a small number of variables.

Thus, there is a need for other inference algorithms in Bayesian networks. Unfortunately, in the general case such a problem is NP-hard. The problem becomes easier for a special type of networks called singly connected networks. In such networks any two nodes can be linked by at most one path composed of arbitrarily directed edges. Effective approximate inference algorithms based on Monte Carlo methods, e.g. logic sampling or likelihood weighting, are known for this kind of network. In practice, therefore, the use of Bayesian networks is limited by the difficulty of achieving efficient inference.
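The idea behind logic sampling can be sketched as follows: complete samples are drawn from the network in topological order, samples that contradict the evidence are rejected, and the query is answered from the remaining ones. The two-node network, its probabilities, the sample size and the function name are assumptions made for illustration only.

```python
import random


def logic_sampling(network, order, target, evidence, n_samples, rng):
    """Estimate P(target = 1 | evidence) by forward sampling with rejection.

    network -- node -> (parents, table), table gives P(node = 1 | parent values)
    order   -- node names listed so that parents precede their children
    """
    accepted = hits = 0
    for _ in range(n_samples):
        sample = {}
        for node in order:
            parents, table = network[node]
            p_one = table[tuple(sample[p] for p in parents)]
            sample[node] = 1 if rng.random() < p_one else 0
        # Keep only the samples that agree with the observed evidence.
        if all(sample[k] == v for k, v in evidence.items()):
            accepted += 1
            hits += sample[target]
    return hits / accepted if accepted else float("nan")


# Hypothetical binary network A -> B.
network = {
    "A": ((), {(): 0.3}),
    "B": (("A",), {(0,): 0.2, (1,): 0.9}),
}
rng = random.Random(0)  # fixed seed so that the run is reproducible
estimate = logic_sampling(network, ["A", "B"], "A", {"B": 1}, 20000, rng)
```

For this network the exact value of P(A = 1 | B = 1) is 0.27 / 0.41, so the estimate can be compared with it directly. When the evidence is improbable most samples are rejected, which is the motivation for weighted sampling schemes.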

The field in which Bayesian networks are being developed most dynamically is medicine. The task of the network in such cases is usually to find the most probable cause of a patient's ailment. The network therefore has to answer the question: what does the patient suffer from if certain symptoms occur (which cannot be classified unambiguously)? Hence Bayesian networks are often used in classification problems.


Although there is great interest in Bayesian networks in medicine, their use in social and economic fields is not so popular. The real problem is not only the aforementioned problem of efficiency but also the problem of structure and learning, particularly when the economic variables have a dynamic character.

Let us consider a hypothetical network with binary nodes, represented in Figure 5. The aim is to estimate the probability of a fall in the share price (KGHM) in a future time period.

It is a classification problem: the share should be classified either to a falling (or neutral) group or to an increasing group. The network was built with the GeNIe software developed by the Decision Systems Laboratory at the University of Pittsburgh.

Let $X_1, \ldots, X_5$ respectively represent the nodes of the network. The meaning and possible states of the nodes are presented in Table 1.

Table 1. List of variables, symbols and categories

Variable                Symbol    Categories
Exchange rate           X_1       increase, fall
Volume of trade         X_2       increase, fall
Stock Exchange index    X_3       increase, fall
Positions in futures    X_4       increase, fall
Share price             X_5       increase, fall

If we denote the category fall by 0 and the category increase by 1, then the corresponding marginal and conditional distributions are as follows (for a learning sample of 47 elements, weekly data from 10.09.2001 to 09.09.2002):


Table 2. Marginal probability distribution $P(X_1 = Y)$ of variable $X_1$

Y = 0     Y = 1
0.5116    0.4884

Table 3. Conditional probability distribution $P(X_2 = Y \mid X_1 = Z)$ of variable $X_2$

          Y = 0     Y = 1
Z = 0     0.7619    0.2381
Z = 1     0.7273    0.2727

Table 4. Conditional probability distribution $P(X_3 = Y \mid \ldots = Z, \ldots = Q)$ of variable $X_3$

                 Y = 0     Y = 1
Z = 0  Q = 0     0.3750    0.6250
Z = 0  Q = 1     0.4375    0.5625
Z = 1  Q = 0     0.4000    0.6000
Z = 1  Q = 1     0.5000    0.5000

Table 5. Conditional probability distribution $P(X_4 = Y \mid X_3 = Z)$ of variable $X_4$

          Y = 0     Y = 1
Z = 0     0.5417    0.4583
Z = 1     0.5263    0.4737

Table 6. Conditional probability distribution $P(X_5 = Y \mid X_4 = Z, X_3 = Q, X_2 = R)$ of variable $X_5$

                        Y = 0     Y = 1
Z = 0  Q = 0  R = 0     0.5556    0.4444
Z = 1  Q = 0  R = 0     0.5000    0.5000
Z = 0  Q = 1  R = 0     0.2000    0.8000
Z = 0  Q = 0  R = 1     0.7500    0.2500
Z = 1  Q = 1  R = 0     0.5000    0.5000
Z = 0  Q = 1  R = 1     0.6000    0.4000
Z = 1  Q = 0  R = 1     0.0001    0.9999
Z = 1  Q = 1  R = 1     0.0001    0.9999


The correctness of the decisions was verified over a period of 23 weeks. It turned out that the network classified correctly in 48% of cases (in 26% of cases it could not make a decision and in 26% it made a wrong one). Obviously, this cannot be regarded as a good result. The reason could lie in the structure (topology) of the network, which could be improper, and in the selection of variables.


Bayesian networks can be an effective tool for statistical inference in computational expert systems. However, there are some barriers connected with the structure of a network, the dynamization of variables, and learning or update processes. These problems are mathematically complicated, which may explain why neural networks are still more popular than probabilistic networks. The problem of structure learning in particular remains a matter of research. Besides, software for neural networks is more accessible.

It should be noted that graphical knowledge representation has a great advantage over the rule-based knowledge used in expert and decision support systems. Each rule in a rule-based system is treated independently of the others, which is why the rule base may be inconsistent and redundant. In graph-based expert systems these problems do not exist.


Bilmes J.A. (2000), Dynamic Bayesian multinets, [in:] Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers, Stanford, California.

Chickering D.M., Geiger D., Heckerman D. (1994), Learning Bayesian Networks is NP-hard, Microsoft Research MSR-TR-94-17, Redmond.

Diez F.J., Mira J. (1994), Distributed Reasoning and Learning in Bayesian Expert System, Dpto. Informatica y Automatica, UNED, Madrid.

Domański Cz., Pruska K., Wagner W. (1998), Wnioskowanie statystyczne przy nieklasycznych założeniach, Wyd. UŁ, Łódź.

Geiger D., Heckerman D. (1994), Learning Gaussian Networks, Microsoft Research MSR-TR-94-10, Redmond.

Ghahramani Z. (1997), Learning Dynamic Bayesian Networks, University of Toronto, Toronto.

Heckerman D., Geiger D. (1995), Learning Bayesian Networks, Microsoft Research MSR-TR-95-02, Redmond.

Krause P. (1998), Learning Probabilistic Networks, technical report, Philips Research Labs, Redhill.

Mulawka J.J. (1996), Systemy ekspertowe, WNT, Warszawa.

Murphy K. (2001), An introduction to graphical models, http://www.cs.berkeley.edu/murphyk/Bayes/bayes_tutorial.pdf

Murphy K. (2002), Dynamic Bayesian Networks: Representation, Inference and Learning, PhD Thesis, UC Berkeley, Computer Science Division.

Niedermayer D. (1998), An introduction to Bayesian networks and their contemporary applications, http://www.gprn.sk.ca/~daryle/papers/bayesian_networks/bayes.html

Normand S.L., Tritchler D. (1992), Parameter updating in a Bayes network, Journal of American Statistical Association, 82, 420.

Russell S., Binder J., Koller D. (1994), Adaptive Probabilistic Networks, technical report UCB/CSD-94-824, University of California, Berkeley.

Settimi R., Smith J.Q., Gargoum A.S. (1999), Approximate Learning in Complex Dynamic Bayesian Networks, Engineering and Physical Sciences Research Council GR/K72254.

Pearl J., web page http://bayes.cs.ucla.edu/jp.home.html

Tomasz Kozdraj

REMARKS ON BAYESIAN NETWORKS AND THEIR APPLICATIONS

Summary

Bayesian networks are graphical structures in the form of directed acyclic graphs that represent dependencies between random variables. They are applied in the field of so-called intelligent software, and especially in expert systems. This article addresses the problems of Bayesian networks themselves, their learning and their applications. An attempt is also made to apply them to economic problems related to the capital market.

