
Classification into two Populations for Time Dependent Observations


ACTA UNIVERSITATIS LODZIENSIS
FOLIA OECONOMICA 152, 2000

I. STATISTICAL MODELS

Mirosław Krzyśko*, Waldemar Wołyński**

CLASSIFICATION INTO TWO POPULATIONS FOR TIME DEPENDENT OBSERVATIONS

Abstract. Optimal classification rules based on linear functions which maximize the area under the relative operating characteristic curve or which maximize the chosen probabilistic distance between two populations are studied here. We obtain an expression for the optimal linear discriminant function and show that the resulting procedure belongs to the Anderson-Bahadur admissible class. The asymptotic form of the discriminant function is also studied.

1. INTRODUCTION

A number of practical problems in the analysis of data reduce to classifying a realization of a stationary normal stochastic process as belonging to one or the other of two categories. Shumway (1982) has provided an extensive list of references and applications of discriminant analysis for time series. Applications listed there include discriminating between presumed earthquakes and underground nuclear explosions, the detection of a signal embedded in a noise series, discriminating between different classes of brain wave recordings, and discriminating between various speakers or speech patterns on the basis of recorded speech data.

The admissible procedures for classification provided by the Neyman-Pearson theory as well as the Bayes rule are based on the likelihood ratio. In the case of unequal covariance matrices this likelihood ratio depends on a quadratic function of the observations. Unfortunately, the distribution of the quadratic discriminant function is very complicated. It involves a linear combination of non-central chi-square random variables, so that computing the error rates resulting from its use seems difficult. Hence, following Anderson and Bahadur (1962), we will consider a linear discriminant function.

* Adam Mickiewicz University, Faculty of Mathematics and Computer Science; Technical University of Zielona Góra.

** Adam Mickiewicz University, Faculty of Mathematics and Computer Science.

2. THE RELATIVE OPERATING CHARACTERISTIC (ROC) CURVE AND ITS PROPERTIES

Let the T-dimensional time series $X = (X(0), X(1), \ldots, X(T-1))'$ be a realization of a stationary stochastic process from the population $\pi_1$, and let $Y = (Y(0), Y(1), \ldots, Y(T-1))'$ be an independent realization of a stationary stochastic process from the population $\pi_2$. Suppose that $X \sim N_T(\mu_1, \Sigma_1)$ and $Y \sim N_T(\mu_2, \Sigma_2)$, where $\mu_1 = (\mu_1(0), \mu_1(1), \ldots, \mu_1(T-1))'$, $\mu_2 = (\mu_2(0), \mu_2(1), \ldots, \mu_2(T-1))'$, and the covariance matrices $\Sigma_1 = (\sigma_1(|s-t|))$ and $\Sigma_2 = (\sigma_2(|s-t|))$, $s, t = 0, 1, \ldots, T-1$, are positive definite.

The parameters are assumed to be known. The phrase "X is stationary with mean $\mu$ and covariance matrix $\Sigma$" means that "$(X - \mu)$ is stationary with zero mean and covariance matrix $\Sigma$".

For each $a \in R^T$, $a \neq 0$, and each $c \in R$ let $R(a'X, c)$ denote the discriminant rule that assigns the time series X to the population $\pi_1$ if $a'X \le c$ and to the population $\pi_2$ if $a'X > c$.

We have

$a'X \sim N(a'\mu_1, a'\Sigma_1 a)$, $a'Y \sim N(a'\mu_2, a'\Sigma_2 a)$.

Let

$F_j(c) = \Phi\left(\frac{c - a'\mu_j}{(a'\Sigma_j a)^{1/2}}\right)$, $j = 1, 2$,

where $\Phi$ is the distribution function of a N(0, 1) random variable.

Each discriminant rule is characterized in terms of the two probabilities of misclassification or in terms of the two conditional probabilities of correct classification.

The probability of misclassifying an observation when it comes from the first population is

$P(\pi_2 | \pi_1) = P(a'X > c) = 1 - P(a'X \le c) = 1 - F_1(c)$

and the probability of misclassifying an observation when it comes from the second population is

$P(\pi_1 | \pi_2) = P(a'Y \le c) = F_2(c)$.

The corresponding conditional probabilities of correct classification are equal to

$P(\pi_1 | \pi_1) = P(a'X \le c) = F_1(c)$

and

$P(\pi_2 | \pi_2) = P(a'Y > c) = 1 - F_2(c)$.

The probability $P(\pi_1 | \pi_1)$ is called the specificity of the discriminant rule and the probability $P(\pi_2 | \pi_2)$ is called the sensitivity of the discriminant rule.
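The four probabilities above depend on the rule only through the two univariate normal laws of $a'X$ and $a'Y$. The following is a minimal sketch of that computation, assuming small hypothetical parameter values (the vectors, matrices and the names `quad_form` and `rule_summary` are illustrative and not part of the paper).

```python
from statistics import NormalDist


def quad_form(a, S):
    """Return a' S a for a vector a and a square matrix S (lists of floats)."""
    n = len(a)
    return sum(a[i] * S[i][j] * a[j] for i in range(n) for j in range(n))


def rule_summary(a, c, mu1, mu2, S1, S2):
    """Error rates, specificity and sensitivity of the rule 'pi_1 iff a'X <= c'."""
    phi = NormalDist().cdf
    m1 = sum(x * y for x, y in zip(a, mu1))        # mean of a'X
    m2 = sum(x * y for x, y in zip(a, mu2))        # mean of a'Y
    F1 = phi((c - m1) / quad_form(a, S1) ** 0.5)   # F_1(c) = P(a'X <= c)
    F2 = phi((c - m2) / quad_form(a, S2) ** 0.5)   # F_2(c) = P(a'Y <= c)
    return {"P(2|1)": 1.0 - F1, "P(1|2)": F2,
            "specificity": F1, "sensitivity": 1.0 - F2}


# Hypothetical T = 2 example (values chosen only for illustration).
mu1, mu2 = [0.0, 0.0], [1.0, 1.0]
S1 = [[1.0, 0.5], [0.5, 1.0]]
S2 = [[2.0, 0.4], [0.4, 2.0]]
summary = rule_summary([1.0, 1.0], 1.0, mu1, mu2, S1, S2)
print(summary)
```

Moving the threshold c trades specificity against sensitivity, which is what the ROC curve introduced next records.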

In the parametric representation, the curve of the form

$x = F_1(c)$, $y = 1 - F_2(c)$, $-\infty < c < \infty$

is called the Relative Operating Characteristic (ROC) curve of the class of rules $R(a'X, \cdot)$. A plot of the ROC curve is given in Fig. 1.

Fig. 1. A plot of the ROC curve

The area D(a) under the ROC curve is the index which evaluates the accuracy of a class of discriminant rules $R(a'X, c)$. A large area indicates that the linear combination $a'X$ discriminates well between the two populations being compared.

The area D(a) under the ROC curve has a simple probabilistic interpretation.

Theorem 1. The area D(a) under the ROC curve is equal to the probability that the random variable $a'Y$ is stochastically larger than the random variable $a'X$:

$D(a) = P(a'Y > a'X) = \Phi\left(\frac{a'\delta}{(a'(\Sigma_1 + \Sigma_2)a)^{1/2}}\right)$ (1)

where

$\delta = \mu_2 - \mu_1$.

A proof of the theorem is in Krzyśko (1998) and it will be published elsewhere.
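Formula (1) can be checked numerically by integrating the parametric curve $x = F_1(c)$, $y = 1 - F_2(c)$ directly. The sketch below does this with the trapezoidal rule; the parameter values, the integration range of ten standard deviations and the function names are assumptions made for illustration only.

```python
from statistics import NormalDist


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def quad_form(a, S):
    n = len(a)
    return sum(a[i] * S[i][j] * a[j] for i in range(n) for j in range(n))


def roc_area_closed_form(a, mu1, mu2, S1, S2):
    """D(a) = Phi(a'delta / sqrt(a'(S1 + S2)a)), with delta = mu2 - mu1."""
    delta = [m2 - m1 for m1, m2 in zip(mu1, mu2)]
    spread = (quad_form(a, S1) + quad_form(a, S2)) ** 0.5
    return NormalDist().cdf(dot(a, delta) / spread)


def roc_area_numeric(a, mu1, mu2, S1, S2, steps=20000):
    """Trapezoidal area under the curve x = F1(c), y = 1 - F2(c)."""
    d1 = NormalDist(dot(a, mu1), quad_form(a, S1) ** 0.5)  # law of a'X
    d2 = NormalDist(dot(a, mu2), quad_form(a, S2) ** 0.5)  # law of a'Y
    lo = min(d1.mean - 10 * d1.stdev, d2.mean - 10 * d2.stdev)
    hi = max(d1.mean + 10 * d1.stdev, d2.mean + 10 * d2.stdev)
    area, x_prev, y_prev = 0.0, d1.cdf(lo), 1.0 - d2.cdf(lo)
    for k in range(1, steps + 1):
        c = lo + (hi - lo) * k / steps
        x, y = d1.cdf(c), 1.0 - d2.cdf(c)
        area += (x - x_prev) * (y + y_prev) / 2.0
        x_prev, y_prev = x, y
    return area


# Hypothetical T = 2 example.
mu1, mu2 = [0.0, 0.0], [1.0, 1.0]
S1 = [[1.0, 0.5], [0.5, 1.0]]
S2 = [[2.0, 0.4], [0.4, 2.0]]
a = [1.0, 1.0]
closed = roc_area_closed_form(a, mu1, mu2, S1, S2)
numeric = roc_area_numeric(a, mu1, mu2, S1, S2)
print(closed, numeric)
```

The two values should agree up to the discretization error of the trapezoidal rule.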

3. THE CLASSIFICATION RULE

A comparison of the areas under different ROC curves may be used to determine which linear combination $a'X$ is best.

Hence, we want to find the linear discriminant function for which the area under the corresponding ROC curve is maximized. This maximal area is the ROC criterion, measuring how well the vector of characteristics distinguishes between the two populations.

Theorem 2. The vector a which maximizes the area D(a) given by (1) has the form

$a = (\Sigma_1 + \Sigma_2)^{-1}\delta$ (2)

For a of the form (2) we have

$D(a) = \Phi\left((\delta'(\Sigma_1 + \Sigma_2)^{-1}\delta)^{1/2}\right)$ (3)

A proof of the theorem is in Krzyśko (1998) and it will be published elsewhere.
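As an illustration of Theorem 2, the sketch below solves $(\Sigma_1 + \Sigma_2)a = \delta$ for a hypothetical two-dimensional example and compares the resulting area with the area obtained for a few other directions. The small Gaussian-elimination helper `solve` and all numerical values are assumptions introduced for this example.

```python
from statistics import NormalDist


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def quad_form(a, S):
    n = len(a)
    return sum(a[i] * S[i][j] * a[j] for i in range(n) for j in range(n))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def roc_area(a, delta, S1, S2):
    """Formula (1): D(a) = Phi(a'delta / sqrt(a'(S1 + S2)a))."""
    return NormalDist().cdf(dot(a, delta) / (quad_form(a, S1) + quad_form(a, S2)) ** 0.5)


# Hypothetical example with delta = mu2 - mu1 = (1, 0)'.
delta = [1.0, 0.0]
S1 = [[1.0, 0.5], [0.5, 1.0]]
S2 = [[2.0, 0.4], [0.4, 2.0]]
sigma_sum = [[S1[i][j] + S2[i][j] for j in range(2)] for i in range(2)]

a_opt = solve(sigma_sum, delta)                       # formula (2)
d_opt = NormalDist().cdf(dot(delta, a_opt) ** 0.5)    # formula (3)
print(a_opt, d_opt)
```

Because D(a) is invariant under positive rescaling of a, only the direction of `a_opt` matters.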

Remark 1. Since $\delta'(\Sigma_1 + \Sigma_2)^{-1}\delta > 0$ and $\Phi\left((\delta'(\Sigma_1 + \Sigma_2)^{-1}\delta)^{1/2}\right) > 1/2$, we have

$1/2 < D(a) < 1$.

An area D(a) close to 1 indicates that the T characteristics distinguish well between the populations $\pi_1$ and $\pi_2$, and D(a) close to 1/2 indicates that these two populations are not well separated.

4. THE SECOND CLASSIFICATION RULE

Now we would like to pass on to the second method of finding the optimal linear classification rule. We would like to find the linear classification procedure which maximizes the function

$\rho(g_1, g_2)$,

where $g_1$ is the density function of the random variable $a'X$, $g_2$ is the density function of the random variable $a'Y$ and $\rho$ is the Matusita (1956) distance, or the Morisita (1959) distance, or the Kullback distance (Kullback, Leibler, 1951).

The Matusita distance is defined as follows. Let $P_1$, $P_2$ be distributions defined on the p-dimensional Euclidean space $R^p$ and denote by $f_1$, $f_2$ their densities with respect to the Lebesgue measure in $R^p$. The Matusita distance has the following form

$\rho_1(f_1, f_2) = -\ln \int_{R^p} [f_1(x) f_2(x)]^{1/2}\, dx$.

If $f_1$ and $f_2$ are square integrable with respect to the Lebesgue measure in $R^p$, then the Morisita distance has the form

$\rho_2(f_1, f_2) = -\ln \frac{2\lambda(f_1, f_2)}{\lambda(f_1) + \lambda(f_2)}$,

where

$\lambda(f_1, f_2) = \int_{R^p} f_1(x) f_2(x)\, dx$,

$\lambda(f_i) = \lambda(f_i, f_i)$, $i = 1, 2$.

It is obvious that $\rho_2(f_1, f_2)$ is closely related to the usual distance

$\int_{R^p} [f_1(x) - f_2(x)]^2\, dx = \lambda(f_1) + \lambda(f_2) - 2\lambda(f_1, f_2)$.

The Kullback distance has the form:

$\rho_3(f_1, f_2) = \int_{R^p} [f_1(x) - f_2(x)] \ln \frac{f_1(x)}{f_2(x)}\, dx$.

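If $g_1$ is the probability density of $a'X$ and $g_2$ is the probability density of $a'Y$ then one can easily show that

$\rho_1(g_1, g_2) = \frac{1}{4}\frac{(a'\delta)^2}{a'(\Sigma_1 + \Sigma_2)a} + \frac{1}{2}\ln\frac{a'\Sigma_1 a + a'\Sigma_2 a}{2 (a'\Sigma_1 a)^{1/2}(a'\Sigma_2 a)^{1/2}}$,

$\rho_2(g_1, g_2) = \frac{1}{2}\frac{(a'\delta)^2}{a'(\Sigma_1 + \Sigma_2)a} + \ln\left[(a'\Sigma_1 a)^{-1/2} + (a'\Sigma_2 a)^{-1/2}\right] + \frac{1}{2}\ln\left[(a'\Sigma_1 a) + (a'\Sigma_2 a)\right] - \ln(2\sqrt{2})$,

$\rho_3(g_1, g_2) = \frac{1}{2}(a'\delta)^2\left[\frac{1}{a'\Sigma_1 a} + \frac{1}{a'\Sigma_2 a}\right] + \frac{1}{2}\left[\frac{a'\Sigma_1 a}{a'\Sigma_2 a} + \frac{a'\Sigma_2 a}{a'\Sigma_1 a}\right] - 1$.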

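These three distances are invariant under scalar multiplication of a.

Theorem 3. (Krzyśko, Wołyński, 1997). The vector a which maximizes these three distances has the form

$a = (\Sigma_1 + \theta\Sigma_2)^{-1}\delta$ (4)

where

$\theta = \frac{\Omega_{2i}(a)}{\Omega_{1i}(a)}$, $i = 1, 2, 3$ (5)

are such that the matrices $\Sigma_1 + \theta\Sigma_2$ are nonsingular and

$\Omega_{11}(a) = A_1 + \frac{1}{a'\Sigma_1 a}$, $\Omega_{21}(a) = A_1 + \frac{1}{a'\Sigma_2 a}$,

$A_1 = \frac{(a'\delta)^2}{[a'(\Sigma_1 + \Sigma_2)a]^2} - \frac{2}{a'(\Sigma_1 + \Sigma_2)a}$,

$\Omega_{12}(a) = A_2 + B_2 (a'\Sigma_1 a)^{-1/2}(a'\Sigma_1 a)^{-1}$, $\Omega_{22}(a) = A_2 + B_2 (a'\Sigma_2 a)^{-1/2}(a'\Sigma_2 a)^{-1}$,

$A_2 = \frac{(a'\delta)^2}{[a'(\Sigma_1 + \Sigma_2)a]^2} - [a'(\Sigma_1 + \Sigma_2)a]^{-1}$, $B_2 = \left((a'\Sigma_1 a)^{-1/2} + (a'\Sigma_2 a)^{-1/2}\right)^{-1}$,

$\Omega_{13}(a) = A_3 - (a'\Sigma_2 a)^{-1}$, $\Omega_{23}(a) = B_3 - (a'\Sigma_1 a)^{-1}$,

$A_3 = \frac{(a'\delta)^2 + a'\Sigma_2 a}{(a'\Sigma_1 a)^2}$, $B_3 = \frac{(a'\delta)^2 + a'\Sigma_1 a}{(a'\Sigma_2 a)^2}$.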

It is clear that equation (4) is an implicit equation in a. Hence an iterative procedure must be employed to solve for a.
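One way such an iterative procedure might look is sketched below for the Matusita case (i = 1). It assumes the reading $\Omega_{11}(a) = A_1 + 1/(a'\Sigma_1 a)$, $\Omega_{21}(a) = A_1 + 1/(a'\Sigma_2 a)$ with $A_1 = (a'\delta)^2/[a'(\Sigma_1+\Sigma_2)a]^2 - 2/(a'(\Sigma_1+\Sigma_2)a)$; these expressions should be checked against Krzyśko and Wołyński (1997). The starting value $\theta_0 = 1$, the stopping rule, the helper names and the test matrices are all hypothetical, and convergence is not guaranteed in general.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def quad_form(a, S):
    n = len(a)
    return sum(a[i] * S[i][j] * a[j] for i in range(n) for j in range(n))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def theta_update(theta, delta, S1, S2):
    """Map theta to Omega_21(a) / Omega_11(a), where a = (S1 + theta*S2)^{-1} delta."""
    n = len(delta)
    M = [[S1[i][j] + theta * S2[i][j] for j in range(n)] for i in range(n)]
    a = solve(M, delta)
    s1, s2, d = quad_form(a, S1), quad_form(a, S2), dot(a, delta)
    A1 = d * d / (s1 + s2) ** 2 - 2.0 / (s1 + s2)   # assumed form of A_1
    return (A1 + 1.0 / s2) / (A1 + 1.0 / s1), a


def solve_theta(delta, S1, S2, theta0=1.0, tol=1e-12, max_iter=500):
    """Successive approximations; returns (theta, a, converged flag)."""
    theta = theta0
    a = None
    for _ in range(max_iter):
        new_theta, a = theta_update(theta, delta, S1, S2)
        if abs(new_theta - theta) < tol:
            return new_theta, a, True
        theta = new_theta
    return theta, a, False


# Hypothetical example with unequal covariance matrices.
delta = [1.0, 1.0]
S1 = [[1.0, 0.0], [0.0, 1.0]]
S2 = [[1.0, 0.0], [0.0, 3.0]]
theta, a, converged = solve_theta(delta, S1, S2)

# With equal covariance matrices the fixed point is theta = 1.
theta_eq, a_eq, converged_eq = solve_theta(delta, S1, S1)
print(theta, converged, theta_eq)
```

The returned flag should always be inspected, since the map need not be a contraction for every choice of parameters.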

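Since $\Sigma_1$ and $\Sigma_2$ are positive definite matrices by assumption, there always exists a non-singular matrix P such that $\Sigma_1 = P'P$, $\Sigma_2 = P'\Lambda P$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ and the $\lambda_i$ ($i = 1, \ldots, p$) are the characteristic roots of $\Sigma_2\Sigma_1^{-1}$. Then $a'\Sigma_1 a = \beta'\beta$, $a'\Sigma_2 a = \beta'\Lambda\beta$, $a'\delta = \beta'\eta$, where $\beta = Pa$ and $\delta = P'\eta$. Now, $\beta$ can be written as

$\beta = Pa = P(\Sigma_1 + \theta\Sigma_2)^{-1}\delta = P(P'P + \theta P'\Lambda P)^{-1}P'\eta = (I + \theta\Lambda)^{-1}\eta$.

Thus equation (5) reduces to

$\theta = \Phi_i(\theta)$, $i = 1, 2, 3$ (6)

where

$A_1 = \frac{(\beta'\eta)^2}{[\beta'(I + \Lambda)\beta]^2} - \frac{2}{\beta'(I + \Lambda)\beta}$,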

^2(б) = [fj

(ß'ß)

2

~

(Ä2ß'ß) ~' ] 0

+

1

-

J W W )

* +

lAi(ß 'W ]

(ß 'v)2 / í - 4 л = [ f ? v + A j p \ 2 ~ ( ß v + A ) ß ) ♦ ß 2 = { ( ß,ß) + ( ß ' m 2 Фз = (0) = A 2(ß 'A ß )0 - B , ( ß 'A ß ) + (ß ' A ß ) ( ß ' ß ) ~ \ A _ ( ß ’r})2 + ß 'A ß _ ( ß ' r j ) 2 + ß'ß (ß'ß)2 ’ 3 (ß ’A ß ) 2

Theorem 4. (Vilenkin 1979, p. 69). Let the function $\theta = \psi(\theta)$ be a mapping of the interval [a, b] into itself and suppose that in this interval the inequality $|\psi'(\theta)| < q$, where $q < 1$, holds. Then for any point $\theta_0$ of the interval [a, b] the sequence of points $\theta_0, \theta_1, \ldots, \theta_n, \ldots$, where $\theta_{n+1} = \psi(\theta_n)$, converges to the root of the equation $\theta = \psi(\theta)$.

Roughly speaking, this theorem says that the process of successive approximations enables us to find those roots $\theta$ of the equation $\theta = \psi(\theta)$ for which the inequality $|\psi'(\theta)| < 1$ is satisfied.

In our case one can easily check on which interval or intervals of the real line the condition $|\psi'(\theta)| < 1$ is satisfied.
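The mechanics of Theorem 4 can be illustrated on a textbook map that is unrelated to the functions of the paper. The sketch below assumes $\psi(\theta) = \cos\theta$ on [0, 1], which maps the interval into itself with $|\psi'(\theta)| \le \sin 1 < 1$; the derivative bound is estimated crudely by central differences, and both helper names are hypothetical.

```python
import math


def successive_approximations(psi, theta0, tol=1e-12, max_iter=1000):
    """Iterate theta_{n+1} = psi(theta_n) until successive values differ by < tol."""
    theta = theta0
    for _ in range(max_iter):
        nxt = psi(theta)
        if abs(nxt - theta) < tol:
            return nxt
        theta = nxt
    raise RuntimeError("successive approximations did not converge")


def max_abs_derivative(psi, a, b, grid=1000, h=1e-6):
    """Crude grid estimate of sup |psi'| on [a, b] using central differences."""
    best = 0.0
    for k in range(grid + 1):
        t = a + (b - a) * k / grid
        best = max(best, abs(psi(t + h) - psi(t - h)) / (2.0 * h))
    return best


q = max_abs_derivative(math.cos, 0.0, 1.0)   # close to sin(1), below 1
root = successive_approximations(math.cos, 0.5)
print(q, root)
```

The same two steps, bounding the derivative on an interval and then iterating, are what one would apply to a concrete $\Phi_i$ from equation (6).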

5. ADMISSIBILITY OF PROCEDURES

Each classification procedure is characterized in terms of the two probabilities of misclassification. The probability of misclassifying an observation when it comes from the first population is

$P(\pi_2 | \pi_1) = 1 - F_1(c) = 1 - \Phi\left(\frac{c - a'\mu_1}{(a'\Sigma_1 a)^{1/2}}\right)$

and the probability of misclassifying an observation when it comes from the second population is

$P(\pi_1 | \pi_2) = F_2(c) = \Phi\left(\frac{c - a'\mu_2}{(a'\Sigma_2 a)^{1/2}}\right)$.

It is desired to make these probabilities small. One classification procedure is better than another if each probability of misclassification of the former is not greater than the corresponding one of the latter and at least one is less. A procedure is admissible if there is no other procedure which is better.

The following theorem is true.

Theorem 5. (Krzyśko, Wołyński, 1997). The linear classification procedure defined by (4) and

$c = a'\mu_1 + a'\Sigma_1 a = a'\mu_2 - \theta(a'\Sigma_2 a)$

for any $\theta$ such that $\Sigma_1 + \theta\Sigma_2$ is positive definite is admissible within the class of linear procedures.

This result follows from the Anderson-Bahadur theorem on the admissible class of linear procedures (Anderson, Bahadur, 1962).

Remark 2. The linear discriminant procedure for which the area under the corresponding ROC curve is maximized is admissible. In our case $\theta = 1$ and the matrix $\Sigma_1 + \Sigma_2$ is positive definite.

Remark 3. If $\Sigma_1 = \Sigma_2 = \Sigma$, then for the Matusita, Morisita and Kullback distances $\theta = 1$ and

$a'X - c = \frac{1}{2}(\mu_2 - \mu_1)'\Sigma^{-1}\left[X - \frac{1}{2}(\mu_1 + \mu_2)\right]$.

Hence all these distances and the ROC curve give the same well-known Fisher linear discriminant function.
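The procedure of Theorem 5 and the equal-covariance case of Remark 3 can be sketched as follows. For a given $\theta$ the code builds $a = (\Sigma_1 + \theta\Sigma_2)^{-1}\delta$ and $c = a'\mu_1 + a'\Sigma_1 a$, then evaluates both misclassification probabilities from $F_1$ and $F_2$. The function name `admissible_rule`, the helper `solve` and the numerical values are assumptions made for this illustration.

```python
from statistics import NormalDist


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def quad_form(a, S):
    n = len(a)
    return sum(a[i] * S[i][j] * a[j] for i in range(n) for j in range(n))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def admissible_rule(theta, mu1, mu2, S1, S2):
    """Return (a, c, P(2|1), P(1|2)) for the linear rule with parameter theta."""
    n = len(mu1)
    delta = [m2 - m1 for m1, m2 in zip(mu1, mu2)]
    M = [[S1[i][j] + theta * S2[i][j] for j in range(n)] for i in range(n)]
    a = solve(M, delta)
    c = dot(a, mu1) + quad_form(a, S1)
    phi = NormalDist().cdf
    p21 = 1.0 - phi((c - dot(a, mu1)) / quad_form(a, S1) ** 0.5)
    p12 = phi((c - dot(a, mu2)) / quad_form(a, S2) ** 0.5)
    return a, c, p21, p12


# Hypothetical equal-covariance example (Remark 3): theta = 1.
mu1, mu2 = [0.0, 0.0], [1.0, 0.0]
S = [[1.0, 0.5], [0.5, 1.0]]
a, c, p21, p12 = admissible_rule(1.0, mu1, mu2, S, S)
fisher = solve(S, [m2 - m1 for m1, m2 in zip(mu1, mu2)])   # Sigma^{-1} delta
midpoint = [(m1 + m2) / 2.0 for m1, m2 in zip(mu1, mu2)]
print(a, c, p21, p12)
```

In this equal-covariance example a is half the Fisher direction $\Sigma^{-1}\delta$ and the threshold equals $a'(\mu_1 + \mu_2)/2$, so the two error probabilities coincide.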

Remark 4. The two probabilities of misclassification resulting from the use of the linear admissible classification procedures have the following form

$P(\pi_2 | \pi_1) = 1 - \Phi\left(\sqrt{a'\Sigma_1 a}\right)$, $P(\pi_1 | \pi_2) = 1 - \Phi\left(\theta\sqrt{a'\Sigma_2 a}\right)$.

Remark 5. We have

$a = (\Sigma_1 + \theta\Sigma_2)^{-1}\delta$ or $(\Sigma_1 + \theta\Sigma_2)a = \delta$, or $a'(\Sigma_1 + \theta\Sigma_2)a = a'\delta$.

If $\Sigma_1 + \theta\Sigma_2 > 0$ then the discriminant procedure is admissible. The matrix $\Sigma_1 + \theta\Sigma_2$ is positive definite if $a'(\Sigma_1 + \theta\Sigma_2)a > 0$ for all $a \neq 0$. Hence $a'\delta \neq 0$ for all $a \neq 0$, or $a'\mu_1 \neq a'\mu_2$, or $\mu_1 \neq \mu_2$. This means that every admissible linear discriminant rule makes some use of the fact that $\mu_1 \neq \mu_2$.

6. THE ASYMPTOTIC FORMS OF THE LINEAR DISCRIMINANT FUNCTIONS

We now consider a spectral approximation to the linear discriminant functions under the following assumptions:

1. In the population $\pi_j$ the stationary process Z(t) has covariance function

$\sigma_j(t) = \int_{-\pi}^{\pi} e^{i\lambda t} h_j(\lambda)\, d\lambda$

with $h_j(\lambda)$ ($j = 1, 2$) assumed to be continuous, positive on $[-\pi, \pi]$, absolutely integrable spectral densities.

We note that for every admissible linear classification procedure the matrix $\Sigma_1 + \theta\Sigma_2$ is positive definite. For stationary processes, this implies that the spectral density

$h_\theta(\lambda) = h_1(\lambda) + \theta h_2(\lambda)$

is strictly positive for $\lambda \in [-\pi, \pi]$.

2. The covariance sequence $(\sigma_j(t))$ satisfies

$\sum_{t=-\infty}^{\infty} |t|^{1+\beta} |\sigma_j(t)| < \infty$

for $j = 1, 2$ and for some $\beta$, $0 < \beta < 1$.

3. The sequence of mean differences

$\delta(t) = \mu_2(t) - \mu_1(t)$

satisfies

(i) $\sup_t |\delta(t)| < \infty$

and

(ii) $\rho_T(\tau) = T^{-1} \sum_{t=0}^{T-|\tau|-1} \delta(t + |\tau|)\delta(t)$

has a limit given by

$\rho(\tau) = \lim_{T \to \infty} \rho_T(\tau) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{i\lambda\tau}\, dM(\lambda)$,

where $M(\lambda)$ is a monotone nondecreasing function uniquely defined by the conditions $M(-\pi) = 0$ and continuity from the right.

Under the assumptions stated above we have the following theorem.

Theorem 6. (Krzyśko, Wołyński, 1997). If $h_\theta(\lambda) = h_1(\lambda) + \theta h_2(\lambda) > 0$ for $\lambda \in [-\pi, \pi]$, then

lim T ~ lPi( 0 i, g2) = ^G (0),

lim T ~ 1'Р г(ви 92) = ^G (0),

lim T ~ l Р з(9 и g2) = 2 Я

(°)-where

The optimal vector a in the sense of maximizing the Matusita distance asymptotically or the Morisita distance asymptotically has the form (4), where $\theta$ is the value for which the function $G(\theta)$ has a global maximum.

The optimal vector a in the sense of maximizing the Kullback distance asymptotically has the form (4), where $\theta$ is the value for which the function $H(\theta)$ has a global maximum.

The following theorem characterizes the value of $\theta$ for which the functions $G(\theta)$ and $H(\theta)$ have a global maximum.

Theorem 7. (Krzyśko, Wołyński, 1997). The functions $G(\theta)$ and $H(\theta)$ defined in (7) and (8) have a global maximum at $\theta = 1$.

From Theorem 7, the asymptotically optimal vector a is given by

$a_0 = (\Sigma_1 + \Sigma_2)^{-1}\delta$.

The vector a which maximizes the area under the ROC curve has the same form.

Let $X(t) - \mu(t) = Z(t)$, where $E(X(t)) = \mu(t)$, and let $\{Z(t), t \ge 0\}$ be a stationary normal process with $E(Z(t)) = 0$ which satisfies assumption 1). Take $E_{\pi_2}(X(t)) = \cos(\pi t/2)$ and $E_{\pi_1}(X(t)) = 0$. Then $\delta(t) = \cos(\pi t/2)$ and

$\lim_{T \to \infty} T^{-1} \sum_{t=0}^{T-|\tau|-1} \delta(t + |\tau|)\delta(t) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{i\lambda\tau}\, dM(\lambda)$,

where $M(\lambda)$ is a step function having jumps at $\pm\pi/2$ of height $\pi/2$. Hence assumption 3) is satisfied.
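The limit in assumption 3) can be checked numerically for $\delta(t) = \cos(\pi t/2)$. The sketch below compares the finite-T average with the value implied by a step function M with jumps of height $\pi/2$ at $\pm\pi/2$, assuming the normalizing factor $1/(2\pi)$ in front of the integral, which gives $\rho(\tau) = \frac{1}{2}\cos(\pi\tau/2)$. The function names and the choice T = 4000 are illustrative.

```python
import math


def delta(t):
    """Mean difference delta(t) = cos(pi * t / 2)."""
    return math.cos(math.pi * t / 2.0)


def rho_T(tau, T):
    """Finite-sample average T^{-1} * sum_{t=0}^{T-|tau|-1} delta(t+|tau|) delta(t)."""
    tau = abs(tau)
    return sum(delta(t + tau) * delta(t) for t in range(T - tau)) / T


def rho_limit(tau):
    """(1/(2*pi)) * [jump * e^{i*pi*tau/2} + jump * e^{-i*pi*tau/2}], jump = pi/2."""
    jump = math.pi / 2.0
    return jump * 2.0 * math.cos(math.pi * tau / 2.0) / (2.0 * math.pi)


for tau in range(4):
    print(tau, rho_T(tau, 4000), rho_limit(tau))
```

For odd lags both quantities vanish up to rounding, and for even lags the finite-T value differs from the limit by a term of order 1/T.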

Let $\{e(t)\}$ be a normal process with $E(e(t)) = 0$ and

$\mathrm{Cov}(e(t), e(t+k)) = 1$ for $k = 0$, and $0$ for $k = 1, 2, \ldots$

Let now $\{Z(t), t \ge 0\}$ be the second order autoregressive AR(2) process, i.e. Z(t) satisfies

$Z(t) = \beta_1 Z(t-1) + \beta_2 Z(t-2) + e(t)$.

The AR(2) process is always invertible. The stationarity condition of the AR(2) process is given by the following inequalities

$\beta_2 + \beta_1 < 1$, $\beta_2 - \beta_1 < 1$, $-1 < \beta_2 < 1$.

The autocovariances of the AR(2) process are

$\sigma(k) = \frac{1 - \beta_2}{(1 + \beta_2)\left((1 - \beta_2)^2 - \beta_1^2\right)}$ for $k = 0$,

$\sigma(k) = \beta_1\sigma(k-1) + \beta_2\sigma(k-2)$ for $k \ge 1$.
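The recursion above can be turned into the Toeplitz covariance matrix $\Sigma = (\sigma(|s-t|))$ used throughout the paper. The sketch below assumes unit innovation variance and the symmetry $\sigma(-1) = \sigma(1)$, so that $\sigma(1) = \beta_1\sigma(0)/(1 - \beta_2)$; the function names are hypothetical, and the coefficients 0.6 and 0.3 are used only as an example of a stationary parameter pair.

```python
def ar2_autocov(b1, b2, n):
    """First n autocovariances of a stationary AR(2) process with unit innovation variance."""
    if not (b2 + b1 < 1 and b2 - b1 < 1 and -1 < b2 < 1):
        raise ValueError("coefficients violate the AR(2) stationarity conditions")
    g = [0.0] * n
    g[0] = (1 - b2) / ((1 + b2) * ((1 - b2) ** 2 - b1 ** 2))
    if n > 1:
        # k = 1 with sigma(-1) = sigma(1): sigma(1) = b1*sigma(0) + b2*sigma(1)
        g[1] = b1 * g[0] / (1 - b2)
    for k in range(2, n):
        g[k] = b1 * g[k - 1] + b2 * g[k - 2]
    return g


def toeplitz(g):
    """Covariance matrix with entries g[|s - t|]."""
    n = len(g)
    return [[g[abs(s - t)] for t in range(n)] for s in range(n)]


g = ar2_autocov(0.6, 0.3, 10)
Sigma = toeplitz(g)
print(g[:3])
```

A quick consistency check is the Yule-Walker identity $\sigma(0) = \beta_1\sigma(1) + \beta_2\sigma(2) + 1$, which the computed values should satisfy.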

Then it is easy to check that $(\sigma(k))_{k=0}^{\infty}$ satisfies assumption 2). Let

$\pi_1$: $Z(t) = 0.6Z(t-1) + 0.3Z(t-2) + e(t)$,

$\pi_2$: $Z(t) = 0.8Z(t-1) + 0.3Z(t-2) + e(t)$.

The solution of the implicit equation (6) for the Morisita distance and the value of the area under the ROC curve for these processes and T = 10, 11, ..., 25 are given in Tab. 1.

Table 1

The solutions θ of equation (6) for the Morisita distance and the values of the area D(a) under the ROC curve for AR(2) processes

T     θ          D(a)
10    0.786808   0.965627
11    0.778806   0.973469
12    0.817571   0.977832
13    0.808964   0.982712
14    0.839575   0.985532
15    0.844328   0.988685
16    0.857253   0.990494
17    0.852721   0.992534
18    0.871248   0.993716
19    0.867817   0.995051
20    0.882811   0.995826
21    0.879934   0.996705
22    0.892438   0.997217
23    0.890074   0.997799
24    0.900613   0.998139
25    0.898603   0.998525

It is clear from Tab. 1 that approximating the solution of (6) by $\theta = 1$ becomes increasingly accurate as T becomes larger. The Morisita distance gives the best results. From this table we see also that if T is increasing then $D(a) \to 1$.

REFERENCES

Anderson T. W., Bahadur R. R. (1962), Classification into Two Multivariate Normal Distributions with Different Covariance Matrices, Ann. Math. Statist., 33, 420-431.

Fisher R. A. (1936), The Use of Multiple Measurements in Taxonomic Problems, Ann. Eugen., 7, 179-188.

Krzyśko M. (1998), Linear Discriminant Functions which Maximize the Area under the ROC Curve, (unpublished manuscript).

Krzyśko M., Wołyński W. (1997), Linear Discriminant Functions for Stationary Time Series, "Biometrical Journal", 39, 955-973.

Kullback S., Leibler R. A. (1951), On Information and Sufficiency, Ann. Math. Statist., 22, 79-86.

Matusita K. (1956), Decision Rule, Based on Distance, for the Classification Problem, Ann. Inst. Statist. Math., 8, 67-77.

Morisita M. (1959), Measuring of Interspecific Association and Similarity between Communities, Mem. Fac. Sci. Kyushu Univ., Ser. E, 65-80.

Shumway R. H. (1982), Discriminant Analysis for Time Series, [in:] Handbook of Statistics, Vol. 2, P. R. Krishnaiah and L. N. Kanal (eds), North-Holland, Amsterdam, 1-46.

Van Belle G., Ahmad I. (1974), Measuring Affinity of Distributions, [in:] Reliability and Biometry. Statistical Analysis of Lifelength, F. Proschan and R. J. Serfling (eds), SIAM, Philadelphia, 651-668.
