ACTA UNIVERSITATIS LODZIENSIS
FOLIA OECONOMICA 194, 2005
Tomasz Kozdraj*
REMARKS ON BAYESIAN NETWORKS AND THEIR APPLICATIONS
Abstract
Bayesian networks are directed acyclic graphs that represent dependencies between variables in a probabilistic model. They are becoming an increasingly important area of research and application across the field of Artificial Intelligence. This paper explores the nature of Bayesian networks and their implications, beginning with an overview and comparison of inferential statistics and Bayes' Theorem. It presents possible applications of Bayesian networks to economic problems and also focuses on the problem of learning.
Key words: Bayesian networks, probabilistic networks, learning Bayesian networks.
I. INTRODUCTION
Bayesian networks are becoming more and more popular in research on, and applications of, artificial intelligence. They play a significant role in decision processes and knowledge representation in expert and decision support systems.
Considering the structure of an expert system or decision support system (Fig. 1), one can see that the inference procedures and the knowledge base, where Bayesian networks can be applied, play an extremely important role.
Bayesian methods, and Bayesian networks in particular, are commonly classified as non-classical statistical methods, because inference is based not only on a sample but also on information from outside the sample. This out-of-sample information is called prior information and is the most controversial point of Bayesian theory.
The basis of the whole of Bayesian statistics is the conditional probability theorem published by Thomas Bayes in 1763.
Figure 1. The main elements of an expert system. Source: Mulawka J.J. (1996), Systemy ekspertowe, WNT, Warszawa
The mathematical notation of this theorem is the following (for a discrete random variable):
$$P(A_i \mid B) = \frac{P(B \mid A_i)\,P(A_i)}{\sum_{j=1}^{N} P(B \mid A_j)\,P(A_j)} \tag{1}$$
where the events $A_1, A_2, \dots, A_N$ are mutually exclusive and exhaustive events called hypotheses and $P(A_1), P(A_2), \dots, P(A_N)$ are prior probabilities or subjective probabilities. The probability of event $A_i$ conditional on the occurrence of event $B$ (for $i = 1, \dots, N$) is known as the posterior probability.
It should be noted that most of the probabilities in equation (1) are conditional probabilities. They express the degree of belief in some propositions on the assumption that other propositions are true.
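As an illustration of formula (1), the following minimal sketch computes posterior probabilities for a set of hypotheses. The function name `posterior` and the numbers are hypothetical and serve only as an example; they do not come from the paper.

```python
def posterior(priors, likelihoods):
    """Return P(A_i | B) for every hypothesis A_i using Bayes' theorem.

    priors[i]      -- P(A_i), assumed to sum to 1 over all hypotheses
    likelihoods[i] -- P(B | A_i)
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    # Numerators P(B | A_i) * P(A_i) of formula (1).
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator of formula (1): the total probability of B.
    evidence = sum(joint)
    if evidence == 0:
        raise ValueError("event B has zero probability under every hypothesis")
    return [j / evidence for j in joint]


# Hypothetical example: three hypotheses with illustrative numbers.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.1, 0.4, 0.8]
post = posterior(priors, likelihoods)
```

In this example the hypothesis with the smallest prior ends up with the largest posterior, because the observed event is far more likely under it.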
II. GENERAL NETWORK STRUCTURE AND INFERENCE
The concept of Bayesian networks is directly connected with conditional probability theory. In the real world there are many situations in which the occurrence of one event depends strictly on the occurrence of another event.
Applying Bayesian networks allows uncertainty to be modelled more precisely and makes it possible to predict the occurrence of certain situations by using additional information. Since knowledge about the problem under consideration is probabilistic in character and the methods are based on probabilistic concepts, Bayesian networks are also called probabilistic networks or belief networks.
Formally, Bayesian networks are graphical structures which represent dependencies between variables. They are directed acyclic graphs which encode the structure of a system, its uncertainty and our understanding of it. Such graphs are composed of nodes and edges, where the nodes correspond to all the random variables.
Therefore, one can speak of a node corresponding to the random variable $X_i$ for $i = 1, 2, \dots, n$. An edge directed from node $X_i$ to node $X_j$ can be intuitively interpreted as representing the direct dependence of variable $X_j$ on variable $X_i$.
The occurrence of such an edge is usually denoted symbolically by the expression $X_i \to X_j$. For all nodes we can introduce a successor (aftermath) relation, denoted $\to^{*}$, under which node $X_j$ is a successor of $X_i$ (equivalently, node $X_i$ is a predecessor of node $X_j$) in the Bayesian network, written $X_i \to^{*} X_j$, if one of the following conditions holds:
- there is a directed edge from node $X_i$ to node $X_j$, that is $X_i \to X_j$,
- there is a directed edge from node $X_i$ to some node $X_k$ and node $X_j$ is a successor of $X_k$, that is $X_i \to X_k$ and $X_k \to^{*} X_j$.
According to this definition, node $X_j$ is a successor of $X_i$ if there exists a path made up of directed edges from node $X_i$ to node $X_j$. If $X_i \to X_j$, then $X_i$ is a direct predecessor (parent) of node $X_j$ and node $X_j$ is a direct successor (child) of node $X_i$.
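The recursive definition of the successor relation can be sketched as a simple graph traversal. The function name `successors` and the edge list below are hypothetical illustrations, not part of the paper.

```python
def successors(edges, node):
    """Return the set of all successors of `node` in a directed graph.

    edges -- iterable of (parent, child) pairs, one per directed edge
    A node is a successor if it is a child of `node` or a successor of
    one of its children, which mirrors the two conditions in the text.
    """
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    found = set()
    stack = list(children.get(node, []))  # direct successors (children)
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            # successors of a child are successors of `node` as well
            stack.extend(children.get(current, []))
    return found


# Hypothetical four-node graph: X1 -> X2 -> X3 and X1 -> X4.
edges = [("X1", "X2"), ("X2", "X3"), ("X1", "X4")]
```

Here `successors(edges, "X1")` contains `X3` although there is no direct edge between the two nodes, because `X3` is reached through `X2`.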
The introduction of the successor (aftermath) relation is important because it makes it possible to define the semantics of a Bayesian network, that is the interpretation of its edges, more precisely and formally. The network is interpreted as asserting the conditional independence of each node from all nodes which are not its successors, given the values of its direct predecessors (parents). The main idea of this approach is to decompose the system into simpler parts, which exposes its modularity (graph theory) and ensures its consistency (probability theory).
The usefulness of a Bayesian network with a correct structure lies in its ability to represent efficiently the joint probability distribution of all the random variables of the model.
If we denote by $U_X$ the set of nodes which are parents of node $X$, then an efficient rule for computing the joint probability distribution requires a conditional probability distribution $P(X_i \mid U_{X_i})$ to be defined for each node $X_i$, that is the probability of variable $X_i$ given the possible outcomes of its parents. Obviously, for a node which has no parents the conditional distribution is equal to the marginal distribution $P(X_i)$. The basic assumption in graph-based models is that the joint probability distribution $P(X)$ is equal to the product of the marginal and conditional distributions of all random variables. Such a distribution can therefore be written in the following way (chain rule):
$$P(X) = P(X_1, X_2, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid U_{X_i}) \tag{2}$$
It is easy to see that knowledge of the joint probability distribution makes it possible to infer the values of any chosen variables when the values of other variables are known. Thus, if the joint probability distribution of the variables can be represented by a Bayesian network, it is also theoretically possible to use the network for probabilistic inference to answer any question of interest, assuming that the structure (topology) is correct.
For this reason one can distinguish two types of inference (Murphy (2001), Niedermayer (1998)). The first proceeds from effect to cause and is called bottom-up inference; the second proceeds from cause to effect and is called top-down inference. In some cases approximate inference is used (Settimi, Smith, Gargoum, 1999).
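Both directions of inference can be illustrated by brute-force enumeration of the joint distribution. The sketch below assumes a hypothetical chain network A -> B -> C with binary variables; the probabilities and the helper `query` are invented for the example and are not taken from the paper.

```python
from itertools import product

# Hypothetical chain A -> B -> C, all variables binary (0 or 1).
p_a = {1: 0.3, 0: 0.7}                                    # P(A = a)
p_b_given_a = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}  # [a][b] = P(B = b | A = a)
p_c_given_b = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}  # [b][c] = P(C = c | B = b)


def joint(a, b, c):
    """Joint probability as the product of the node distributions."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


def query(target, evidence):
    """Return P(target | evidence) by summing the joint distribution.

    Both arguments are dicts such as {"a": 1}.
    """
    numerator = denominator = 0.0
    for a, b, c in product((0, 1), repeat=3):
        world = {"a": a, "b": b, "c": c}
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(a, b, c)
            denominator += p
            if all(world[k] == v for k, v in target.items()):
                numerator += p
    return numerator / denominator


top_down = query({"c": 1}, {"a": 1})   # from cause to effect
bottom_up = query({"a": 1}, {"c": 1})  # from effect to cause
```

Enumeration is exact but visits every joint state, so it is practical only for very small networks.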
III. THE TYPES OF BAYESIAN NETWORKS AND LEARNING PROCEDURES
Generally, Bayesian networks can be divided into two groups: dynamic Bayesian networks and static Bayesian networks. Dynamic Bayesian networks are used in time series modelling, for example in signal recognition processes. In this case the series and the network are usually represented by a first-order Markov process (Ghahramani, 1997).
If $Y_1, \dots, Y_t, \dots, Y_T$ are random variables representing the time series of a first-order Markov process, then the joint probability distribution is equal to:
$$P(Y_1, Y_2, \dots, Y_T) = P(Y_1) \prod_{t=2}^{T} P(Y_t \mid Y_{t-1}) \tag{3}$$
Figure 2. A Bayesian network representing a first-order Markov process
These models do not directly represent dependencies between observations over more than one time step; it is therefore common to allow higher-order interactions between variables, i.e. $\tau$th-order Markov models.
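As an illustration of the first-order case, the joint probability of an observed sequence is the probability of the first observation multiplied by one transition probability per time step. The state names and numbers below are hypothetical.

```python
def markov_sequence_probability(sequence, initial, transition):
    """Joint probability of a sequence under a first-order Markov process.

    initial[s]       -- P(Y_1 = s)
    transition[r][s] -- P(Y_t = s | Y_{t-1} = r)
    """
    prob = initial[sequence[0]]
    # One factor P(Y_t | Y_{t-1}) for each consecutive pair of observations.
    for previous, current in zip(sequence, sequence[1:]):
        prob *= transition[previous][current]
    return prob


# Hypothetical two-state process.
initial = {"up": 0.6, "down": 0.4}
transition = {
    "up": {"up": 0.7, "down": 0.3},
    "down": {"up": 0.4, "down": 0.6},
}
p = markov_sequence_probability(["up", "up", "down"], initial, transition)
```

A higher-order model would condition each factor on several previous observations instead of one, at the cost of a larger transition table.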
Another way to extend Markov models is to posit that the observations depend on hidden variables, which we can call states, and that the sequence of states is a Markov process. A classical model of this kind is the Kalman filter.
Figure 3. A Bayesian network specifying conditional independence relations for a Kalman filter model
Using shorthand notation, the joint probability distribution in this case, for the sequence from $t = 1$ to $t = T$, is:
$$P(\{X_t, Y_t\}) = P(X_1)\,P(Y_1 \mid X_1) \prod_{t=2}^{T} P(X_t \mid X_{t-1})\,P(Y_t \mid X_t) \tag{4}$$
The state transition probability $P(X_t \mid X_{t-1})$ can be decomposed into a deterministic and a stochastic component:
$$X_t = f_t(X_{t-1}) + \omega_t \tag{5}$$
where $f_t$ is the deterministic transition function and $\omega_t$ is a zero-mean random noise vector.
Similarly, the observation probability $P(Y_t \mid X_t)$ can be decomposed:
$$Y_t = g_t(X_t) + \xi_t \tag{6}$$
where $g_t$ is the deterministic observation function and $\xi_t$ is a zero-mean random noise vector.
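A minimal sketch of this state-space model is given below for the scalar, linear, Gaussian case, which is the setting of the classical Kalman filter. It assumes the transition function is multiplication by a constant `a` and the observation function is multiplication by a constant `c`; these choices, the noise variances `q` and `r`, and all numbers are assumptions made for the example, not values from the paper.

```python
import random


def simulate(a, c, q, r, x0, steps, rng):
    """Draw states and observations from the linear-Gaussian model."""
    states, observations = [], []
    x = x0
    for _ in range(steps):
        x = a * x + rng.gauss(0.0, q ** 0.5)  # transition: f(x) = a * x plus noise
        y = c * x + rng.gauss(0.0, r ** 0.5)  # observation: g(x) = c * x plus noise
        states.append(x)
        observations.append(y)
    return states, observations


def kalman_filter(observations, a, c, q, r, mean0, var0):
    """Return the filtered means and variances of the state at each step.

    mean0 and var0 describe the belief about the state before the first
    transition, matching the starting point used in `simulate`.
    """
    means, variances = [], []
    mean, var = mean0, var0
    for y in observations:
        # Prediction step: push the current belief through the transition.
        mean_pred = a * mean
        var_pred = a * a * var + q
        # Update step: correct the prediction with the new observation.
        gain = var_pred * c / (c * c * var_pred + r)
        mean = mean_pred + gain * (y - c * mean_pred)
        var = (1.0 - gain * c) * var_pred
        means.append(mean)
        variances.append(var)
    return means, variances


a, c, q, r = 0.9, 1.0, 0.1, 0.5
states, observations = simulate(a, c, q, r, x0=0.0, steps=50, rng=random.Random(0))
means, variances = kalman_filter(observations, a, c, q, r, mean0=0.0, var0=1.0)
```

The filtered variance does not depend on the observed values, only on the model constants, which is why it settles to a fixed level after a few steps.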
There are other representations commonly used in dynamic Bayesian networks, e.g. Hidden Markov Models, Factorial Markov Models and Switching State Models (Bilmes (2000), Ghahramani (1997), Murphy (2002)).
Static Bayesian networks are usually used in medical diagnosis or applied as decision support tools in classification problems (for example in motor insurance). The structure of a static network does not differ from the general scheme (an acyclic graph), apart from the lack of dynamic variables.
If the network had, for example, four random variables $X$, $Y$, $Z$ and $W$ (Fig. 4), the joint probability distribution would be as follows:
$$P(W, X, Y, Z) = P(W) \cdot P(X) \cdot P(Y \mid W) \cdot P(Z \mid X, Y) \tag{7}$$
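The factorization in (7) can be checked numerically with a small sketch. The parent sets follow equation (7); the conditional probability values are hypothetical and chosen only so that the example runs.

```python
from itertools import product

# Parent sets corresponding to equation (7).
parents = {"W": (), "X": (), "Y": ("W",), "Z": ("X", "Y")}

# Hypothetical tables: P(node = 1 | parent states), keyed by a tuple of
# parent states in the order given in `parents`.
p_one = {
    "W": {(): 0.6},
    "X": {(): 0.3},
    "Y": {(0,): 0.2, (1,): 0.7},
    "Z": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
}


def joint(assignment):
    """Probability of a full assignment as a product over the nodes."""
    prob = 1.0
    for node, node_parents in parents.items():
        key = tuple(assignment[p] for p in node_parents)
        p1 = p_one[node][key]
        prob *= p1 if assignment[node] == 1 else 1.0 - p1
    return prob


names = list(parents)
# Summing over all 16 joint states should give 1 for a valid factorization.
total = sum(
    joint(dict(zip(names, states)))
    for states in product((0, 1), repeat=len(names))
)
```

Storing the parent sets in a dictionary keeps the function independent of the particular graph, so the same code works for any network with binary nodes.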
Regardless of its structure, a network can be subject to a learning process. Such a process can concern both the parameters of the network and its structure, and can be associated with variable selection and edge specification. There are suitable algorithms for parameter learning which yield the best estimates (for example gradient methods or methods based on the maximum likelihood function). Generally, parameter learning amounts to updating the conditional probability tables of each node of the network. Learning the correct structure, however, is a more complicated problem. Structure learning procedures search among all possible and acceptable networks of interest to find one or several optimal networks.
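For complete data, maximum likelihood parameter learning reduces to counting relative frequencies. The sketch below shows this for a single node; the function name `estimate_cpt` and the records are hypothetical.

```python
from collections import Counter


def estimate_cpt(records, child, parents):
    """Maximum likelihood estimate of P(child | parents) from complete data.

    records -- list of dicts mapping variable names to observed states
    Returns a dict keyed by (parent_states, child_state).
    """
    joint_counts = Counter()
    parent_counts = Counter()
    for record in records:
        parent_states = tuple(record[p] for p in parents)
        joint_counts[(parent_states, record[child])] += 1
        parent_counts[parent_states] += 1
    # Relative frequency of each child state within a parent configuration.
    return {
        (parent_states, state): count / parent_counts[parent_states]
        for (parent_states, state), count in joint_counts.items()
    }


# Hypothetical learning sample with two binary variables.
records = [
    {"X1": 0, "X2": 0}, {"X1": 0, "X2": 0}, {"X1": 0, "X2": 1},
    {"X1": 1, "X2": 0}, {"X1": 1, "X2": 1}, {"X1": 1, "X2": 1},
]
cpt = estimate_cpt(records, "X2", ["X1"])
```

Combinations that never occur in the sample are simply absent from the result; in practice a prior or smoothing term is often added to avoid zero estimates.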
In such cases the solution relies on complicated algorithms based on special metrics, e.g. the K2 metric, and in some cases (for example when some variables are hidden) a proper solution has not been found so far.
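To give an idea of how such a metric scores a candidate structure, the sketch below computes the logarithm of the K2 term for a single node, in the form in which the metric is usually stated in the literature: for every parent configuration the factor is (r - 1)! / (N_ij + r - 1)! multiplied by the product of N_ijk!, where r is the number of states of the node and N_ijk are counts from the data. The function name and the data are hypothetical, and the paper itself does not give the formula.

```python
from collections import Counter
from math import lgamma


def log_k2_node(records, child, parents, arity):
    """Logarithm of the K2 term of one node for a candidate parent set.

    arity -- number of states of the child variable (r in the formula)
    """
    counts = Counter()  # N_ijk: parent configuration j, child state k
    for record in records:
        key = (tuple(record[p] for p in parents), record[child])
        counts[key] += 1

    parent_totals = Counter()  # N_ij: total count per parent configuration
    for (parent_states, _), n in counts.items():
        parent_totals[parent_states] += n

    score = 0.0
    for n_ij in parent_totals.values():
        # log[(r - 1)! / (N_ij + r - 1)!], using lgamma(n + 1) = log(n!)
        score += lgamma(arity) - lgamma(n_ij + arity)
    for n_ijk in counts.values():
        score += lgamma(n_ijk + 1)  # log(N_ijk!)
    return score


# Hypothetical sample in which X2 always equals X1.
records = [{"X1": v, "X2": v} for v in (0, 0, 0, 0, 1, 1, 1, 1)]
with_parent = log_k2_node(records, "X2", ["X1"], arity=2)
without_parent = log_k2_node(records, "X2", [], arity=2)
```

Parent configurations that never occur contribute a factor of one, so they can be skipped. In this sample the structure with the edge from `X1` to `X2` receives the higher score, as expected for perfectly dependent variables.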
The search over all possible networks can be limited by taking into account prior knowledge about the problem of interest (expert knowledge) or by imposing additional conditions which restrict the structure of the Bayesian network. Usually the restriction concerns the order of interactions, that is the maximum number of edges which can be directed to one node.
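The effect of limiting the number of incoming edges can be illustrated by counting the candidate parent sets of a single node. The sketch ignores the acyclicity constraint, so it gives only an upper bound; the function name is hypothetical.

```python
from math import comb


def candidate_parent_sets(n_nodes, max_parents):
    """Number of possible parent sets of one node in an n-node network
    when at most `max_parents` edges may point to it.

    Acyclicity is not checked here, so this is an upper bound.
    """
    others = n_nodes - 1  # every other node is a potential parent
    return sum(comb(others, k) for k in range(max_parents + 1))


unrestricted = candidate_parent_sets(20, 19)  # any subset of the other nodes
restricted = candidate_parent_sets(20, 2)     # at most two parents per node
```

For twenty nodes the restriction to two parents reduces the candidates per node from 524288 to 191, which shows why such limits make the search tractable.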
IV. THE POSSIBILITIES OF APPLICATION. CHANCES AND PROBLEMS
As noted above, a Bayesian network encodes the joint probability distribution of the random variables in a compressed way, and this distribution is sufficient for inference. The answer to any question can be obtained by computing the joint probability distribution from the network and using it for the appropriate calculations.
Unfortunately, such an approach means giving up one of the greatest advantages of the graphical representation of the joint probability distribution, namely its efficiency. Of course, Bayesian networks offer other advantages, particularly a legible and intuitively comprehensible graphical representation of knowledge about direct causal relations, but for reasons of efficiency it is impossible to use the full joint distribution in practice, except in cases with a small number of variables, because the number of its entries grows exponentially with the number of variables.
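The difference in size between the full joint distribution and the network representation can be made concrete for binary variables. In the sketch below the first list of parent counts corresponds to the five-node share price network described by Tables 2 to 6; the twenty-node case is a hypothetical illustration.

```python
def full_joint_parameters(n_binary):
    """Free parameters of the full joint table of n binary variables."""
    return 2 ** n_binary - 1


def network_parameters(parent_counts):
    """Free parameters of a network with binary nodes.

    parent_counts -- number of parents of each node; a node with k
    parents needs one free probability per parent configuration.
    """
    return sum(2 ** k for k in parent_counts)


# Parent counts of X1..X5 as implied by Tables 2 to 6.
share_network = [0, 1, 2, 1, 3]
small_joint = full_joint_parameters(5)          # 31
small_network = network_parameters(share_network)  # 17

# Hypothetical larger case: 20 binary nodes with 3 parents each.
large_joint = full_joint_parameters(20)
large_network = network_parameters([3] * 20)
```

For five variables the saving is modest, but for twenty variables the full table has more than a million entries while the network needs 160 numbers.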
Thus, there is a need for other inference algorithms in Bayesian networks. Unfortunately, in the general case this problem is NP-hard. It becomes easier for a special type of network called singly connected networks. In such networks any two nodes can be linked by at most one path composed of edges of arbitrary direction. Effective algorithms of approximate inference based on Monte Carlo methods, e.g. logic sampling or weighted sampling (likelihood weighting), are known for this kind of network. Therefore, there are some practical limitations of use, because efficiency is relatively hard to achieve.
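A minimal sketch of weighted sampling (likelihood weighting) is shown below for a hypothetical chain A -> B -> C with binary variables. The probabilities, the sample size and the seed are assumptions made for the example; the exact answer is computed alongside for comparison.

```python
import random

# Hypothetical chain A -> B -> C; only the probabilities of state 1 are stored.
P_A1 = 0.3
P_B1_GIVEN_A = {0: 0.1, 1: 0.8}
P_C1_GIVEN_B = {0: 0.2, 1: 0.9}


def likelihood_weighting(n_samples, rng):
    """Estimate P(A = 1 | C = 1) by weighted sampling."""
    weighted_hits = total_weight = 0.0
    for _ in range(n_samples):
        # Non-evidence variables are sampled in topological order.
        a = 1 if rng.random() < P_A1 else 0
        b = 1 if rng.random() < P_B1_GIVEN_A[a] else 0
        # The evidence C = 1 is not sampled; the sample is weighted by
        # the probability of the evidence given its sampled parent.
        weight = P_C1_GIVEN_B[b]
        total_weight += weight
        if a == 1:
            weighted_hits += weight
    return weighted_hits / total_weight


estimate = likelihood_weighting(20000, random.Random(42))

# Exact value by direct summation, for comparison.
p_c1_given_a1 = 0.8 * 0.9 + 0.2 * 0.2
p_c1_given_a0 = 0.1 * 0.9 + 0.9 * 0.2
exact = P_A1 * p_c1_given_a1 / (P_A1 * p_c1_given_a1 + (1 - P_A1) * p_c1_given_a0)
```

Unlike logic sampling, which discards every sample that contradicts the evidence, weighted sampling keeps all samples, which matters when the evidence is unlikely.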
Medicine is a field in which Bayesian networks are developing dynamically. The task of the network in such cases is usually to find the most probable cause of a patient's ailment. The network therefore has to answer the question: what does the patient suffer from, given that certain symptoms occur (which cannot be classified in a clear-cut way)? Hence Bayesian networks are often used in classification problems.
Although there is great interest in Bayesian networks in medicine, their use in the social and economic fields is not so popular. The obstacle is not only the efficiency problem mentioned above but also the problem of structure and learning, particularly when the economic variables are dynamic in character.
Let us consider a hypothetical network with binary nodes, shown in Figure 5. The aim is to estimate the probability of a fall in the share price (KGHM) in a future time period.
This is a classification problem: should the share be classified into a falling group (or a neutral one) or into an increasing group? The network was built with the GeNIe software developed by the Decision Systems Laboratory at the University of Pittsburgh.
Let $X_1, \dots, X_5$ denote the respective nodes of the network. The meaning and possible states of the nodes are presented in Table 1.
Table 1. List of variables, symbols and categories

Variable               Symbol   Categories
Exchange rate          $X_1$    increase, fall
Volume of trade        $X_2$    increase, fall
Stock Exchange index   $X_3$    increase, fall
Positions in futures   $X_4$    increase, fall
Share price            $X_5$    increase, fall
If we denote the category fall by 0 and the category increase by 1, then the corresponding marginal and conditional distributions are as follows (for a learning sample of 47 elements, weekly data from 10.09.2001 to 09.09.2002):
Table 2. Marginal probability distribution $P(X_1 = Y)$ of variable $X_1$

Y = 0     Y = 1
0.5116    0.4884

Table 3. Conditional probability distribution $P(X_2 = Y \mid X_1 = Z)$ of variable $X_2$

         Y = 0     Y = 1
Z = 0    0.7619    0.2381
Z = 1    0.7273    0.2727
Table 4. Conditional probability distribution $P(X_3 = Y \mid X_2 = Z, X_1 = Q)$ of variable $X_3$

                 Y = 0     Y = 1
Z = 0   Q = 0    0.3750    0.6250
Z = 0   Q = 1    0.4375    0.5625
Z = 1   Q = 0    0.4000    0.6000
Z = 1   Q = 1    0.5000    0.5000

Table 5. Conditional probability distribution $P(X_4 = Y \mid X_3 = Z)$ of variable $X_4$

         Y = 0     Y = 1
Z = 0    0.5417    0.4583
Z = 1    0.5263    0.4737
Table 6. Conditional probability distribution $P(X_5 = Y \mid X_4 = Z, X_3 = Q, X_2 = R)$ of variable $X_5$

                         Y = 0     Y = 1
Z = 0   Q = 0   R = 0    0.5556    0.4444
Z = 1   Q = 0   R = 0    0.5000    0.5000
Z = 0   Q = 1   R = 0    0.2000    0.8000
Z = 0   Q = 0   R = 1    0.7500    0.2500
Z = 1   Q = 1   R = 0    0.5000    0.5000
Z = 0   Q = 1   R = 1    0.6000    0.4000
Z = 1   Q = 0   R = 1    0.0001    0.9999
Z = 1   Q = 1   R = 1    0.0001    0.9999

The correctness of the decisions was verified over a period of 23 weeks. It turned out that the network classified 48% of cases correctly (in 26% of cases it could not make a decision and in 26% it made a wrong one). Obviously, this cannot be regarded as a good result. The reason could lie in the structure (topology) of the network, which may be improper, and in the selection of variables.
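The tables above are enough to reproduce the kind of query the network answers. The sketch below encodes the probabilities of state 0 (fall) from Tables 2 to 6 and computes the probability of a fall in the share price by enumeration. It is not the author's GeNIe model: the helper names are hypothetical, and the assignment of the two conditioning variables in Table 4 to $X_2$ and $X_1$ is an assumption, since the caption of that table is damaged in the source.

```python
from itertools import product

NODES = ("X1", "X2", "X3", "X4", "X5")

# Probabilities of state 0 (fall) taken from Tables 2 to 6.
P_X1_0 = 0.5116
P_X2_0 = {0: 0.7619, 1: 0.7273}                    # key: X1
# Assumption: key is (X2, X1); the table caption is unreadable in the source.
P_X3_0 = {(0, 0): 0.3750, (0, 1): 0.4375, (1, 0): 0.4000, (1, 1): 0.5000}
P_X4_0 = {0: 0.5417, 1: 0.5263}                    # key: X3
P_X5_0 = {                                         # key: (X4, X3, X2)
    (0, 0, 0): 0.5556, (1, 0, 0): 0.5000, (0, 1, 0): 0.2000, (0, 0, 1): 0.7500,
    (1, 1, 0): 0.5000, (0, 1, 1): 0.6000, (1, 0, 1): 0.0001, (1, 1, 1): 0.0001,
}


def pick(p_zero, value):
    """Probability of `value` for a binary node given P(node = 0)."""
    return p_zero if value == 0 else 1.0 - p_zero


def joint(x1, x2, x3, x4, x5):
    """Joint probability as the product of the five node distributions."""
    return (
        pick(P_X1_0, x1)
        * pick(P_X2_0[x1], x2)
        * pick(P_X3_0[(x2, x1)], x3)
        * pick(P_X4_0[x3], x4)
        * pick(P_X5_0[(x4, x3, x2)], x5)
    )


def prob_fall(evidence=None):
    """P(X5 = 0 | evidence); evidence maps node names to observed states."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for states in product((0, 1), repeat=5):
        world = dict(zip(NODES, states))
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(*states)
        denominator += p
        if world["X5"] == 0:
            numerator += p
    return numerator / denominator


total = sum(joint(*states) for states in product((0, 1), repeat=5))
prior_fall = prob_fall()
fall_given_parents = prob_fall({"X2": 1, "X3": 0, "X4": 1})
```

When all three parents of $X_5$ are observed, the answer is read directly from Table 6; with partial or no evidence the remaining variables are summed out.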
V. SUMMARY AND CONCLUSIONS
Bayesian networks can be an effective tool for statistical inference in computational expert systems. However, there are some barriers connected with the structure of a network, with making the variables dynamic, and with learning or updating processes. These problems are mathematically complicated, which may explain why neural networks are still more popular than probabilistic networks. The problem of structure learning in particular remains a subject of research. Besides, software for neural networks is more accessible.
It should be noted that graphical knowledge representation has a great advantage over the rule-based knowledge used in expert and decision support systems. Each rule in a rule-based system is treated independently of the others, which is why the rule base may be inconsistent and redundant. In graph-based expert systems these problems do not exist.
REFERENCES
Bilmes J.A. (2000), Dynamic Bayesian multinets, [in:] Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers, Stanford, California.
Chickering D.M., Geiger D., Heckerman D. (1994), Learning Bayesian Networks is NP-hard, Microsoft Research MSR-TR-94-17, Redmond.
Diez F.J., Mira J. (1994), Distributed Reasoning and Learning in Bayesian Expert System, Dpto. Informatica y Automatica, UNED, Madrid.
Domański Cz., Pruska K., Wagner W. (1998), Wnioskowanie statystyczne przy nieklasycznych założeniach, Wyd. UŁ, Łódź.
Geiger D., Heckerman D. (1994), Learning Gaussian Networks, Microsoft Research MSR-TR-94-10, Redmond.
Ghahramani Z. (1997), Learning Dynamic Bayesian Networks, University of Toronto, Toronto.
Heckerman D., Geiger D. (1995), Learning Bayesian Networks, Microsoft Research MSR-TR-95-02, Redmond.
Krause P. (1998), Learning Probabilistic Networks, technical report, Philips Research Labs, Redhill.
Mulawka J.J. (1996), Systemy ekspertowe, WNT, Warszawa.
Murphy K. (2001), An introduction to graphical models, http://www.cs.berkeley.edu/murphyk/Bayes/bayes_tutorial.pdf
Murphy K. (2002), Dynamic Bayesian Networks: Representation, Inference and Learning, PhD Thesis, UC Berkeley, Computer Science Division.
Niedermayer D. (1998), An introduction to Bayesian networks and their contemporary applications, http://www.gprn.sk.ca/~daryle/papers/bayesian_networks/bayes.html
Normand S.L., Tritchler D. (1992), Parameter updating in a Bayes network, Journal of the American Statistical Association, 87, 420.
Russell S., Binder J., Koller D. (1994), Adaptive Probabilistic Networks, technical report UCB/CSD-94-824, University of California, Berkeley.
Settimi R., Smith J.Q., Gargoum A.S. (1999), Approximate Learning in Complex Dynamic Bayesian Networks, Engineering and Physical Sciences Research Council GR/K72254.
Pearl J., web page http://bayes.cs.ucla.edu/jp.home.html
Tomasz Kozdraj
UWAGI O SIECIACH BAYESOWSKICH I ICH ZASTOSOWANIACH (Remarks on Bayesian Networks and Their Applications)
Summary
Bayesian networks are graphical structures in the form of directed acyclic graphs which present dependencies between random variables. They are applied in the field of so-called intelligent software, especially in expert systems. This paper discusses Bayesian networks themselves, their learning and their applications. An attempt is also made to apply them to economic problems connected with the capital market.