
The Estimate of Power of Random Tests Based on Length of Runs


ACTA UNIVERSITATIS LODZIENSIS

FOLIA OECONOMICA 175, 2004

Czesław Domański


Abstract. Tests based on run length are used both in statistical inference and in statistical quality control. The paper presents the power of three tests based on:

- the maximum run length on one side of the median,
- the smaller of the maximum run lengths above and below the median,
- the larger of the maximum run lengths above and below the median.

Key words: run tests, distributions of the number and length of runs, power of tests based on run length.

1. INTRODUCTION

The analysis of the properties of statistical tests from the point of view of applications usually involves examining their power or robustness.

Examining the power of non-parametric tests is relatively difficult because there is no general theory for this branch of statistics.

For the last 25 years the power of non-parametric tests has rarely been studied analytically. Mostly, numerical simulation (Monte Carlo) methods have been used, or a combination of both approaches. Generally, we arbitrarily formulate a list of alternative hypotheses, usually the ones most common in the practice of statistical research. Then we calculate how often the null hypothesis is rejected across a number of generated samples that satisfy the assumptions of the alternative hypotheses. These frequencies are the empirical power of the given test.
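The simulation procedure described above can be illustrated with a short Python sketch for the statistic $S_A$. It is not the author's program, and the function names are ours. We assume that $q_0$ is the probability of moving from A to B and $q_1$ the probability of moving from B to A. For simplicity, the rejection rule is the non-randomized $S_A \ge$ critical value, with a purely hypothetical critical value.

```python
import random


def simulate_chain(n, q0, q1, rng):
    """Draw n observations from a stationary two-state Markov chain.

    Assumed convention: q0 = P(next = B | current = A),
    q1 = P(next = A | current = B), so that P(A) = q1 / (q0 + q1).
    """
    x = "A" if rng.random() < q1 / (q0 + q1) else "B"  # stationary start
    seq = [x]
    for _ in range(n - 1):
        if x == "A":
            x = "B" if rng.random() < q0 else "A"
        else:
            x = "A" if rng.random() < q1 else "B"
        seq.append(x)
    return seq


def longest_run(seq, symbol):
    """Length of the longest run of `symbol` in `seq` (0 if it never occurs)."""
    best = current = 0
    for x in seq:
        current = current + 1 if x == symbol else 0
        best = max(best, current)
    return best


def empirical_power(n, q0, q1, critical, reps=10_000, seed=0):
    """Share of simulated samples with S_A >= critical, i.e. with H0 rejected."""
    rng = random.Random(seed)
    rejections = sum(
        longest_run(simulate_chain(n, q0, q1, rng), "A") >= critical
        for _ in range(reps)
    )
    return rejections / reps


# n = 20, p = 0.5, autocorrelation 0.8 (q0 = q1 = 0.1); the critical value 6 is
# purely illustrative, not taken from the paper's tables.
print(empirical_power(20, 0.1, 0.1, critical=6))
```

The rejection frequency estimates the power at the chosen alternative. Repeating the call over a grid of $(q_0, q_1)$ pairs gives the general scheme of a Monte Carlo power study.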

The paper presents results concerning three randomness tests based on:

- the maximum run length on one side of the median,
- the smaller of the maximum run lengths above and below the median,
- the larger of the maximum run lengths above and below the median.

The aim of this paper is to formulate some conclusions about the power of randomness tests based on run length. The conclusions should be helpful in choosing a run test in practical applications. They are presented for the case of a stationary two-state Markov chain.

2. PROBLEM FORMULATION

Cz. Domański (1986) presented some conclusions concerning three tests of the hypothesis that consecutive observations in a sample are independent. They are based on:

- the maximum run length on one side of the median ($S_A$),
- the smaller of the maximum run lengths above and below the median ($S_D$),
- the larger of the maximum run lengths above and below the median ($S_G$).

To make the practical application of these tests possible, tables of critical values were given for the run tests $S_A$, $S_D$ and $S_G$. An attempt was also made to assess the association between the statistics considered.

The aim of this paper is to formulate some conclusions concerning the power of the most frequently used run tests. We hope these conclusions will be helpful in choosing run tests in practical applications. We confine ourselves to the case of a stationary Markov chain with two states, traditionally denoted by A and B, and the transition matrix

$$\begin{pmatrix} p_{AA} & p_{AB} \\ p_{BA} & p_{BB} \end{pmatrix} = \begin{pmatrix} 1 - q_0 & q_0 \\ q_1 & 1 - q_1 \end{pmatrix}. \qquad (1)$$

Let $P_{n,\theta}$ be the distribution of this chain for each

$$\theta \in \Theta = \{(q_0, q_1) : 0 < q_0 < 1,\ 0 < q_1 < 1\},$$

and let $\Omega_n = \{A, B\}^n$ be the set of all n-element sequences made of the elements A and B. We will consider the probability space

$$M_{n,\theta} = (\Omega_n, 2^{\Omega_n}, P_{n,\theta}) \quad \text{for } \theta \in \Theta.$$

The conclusions formulated in the final part of this paper are based on the power of the tests. This power was determined numerically for n = 1, 2, ..., 100 and a few dozen pairs chosen from the set Θ. Formulating the most suitable algorithm for these calculations was a necessary stage of this research.

Until now, runs of elements above and below the median have most often been considered, i.e. the case in which both kinds of elements have probability 0.5. This follows the conclusions of Bateman (1948), who showed that in this case the power of the independence test is the greatest. The results presented in this article refer to a Markov chain with arbitrary stationary probabilities $0 < p_A < 1$.

Combinatorial formulas for probabilities connected with the distribution of runs are very unsuitable for numerical calculations. Recurrent formulas are more effective, particularly when calculations are made for consecutive values of n (the sample size).

Recurrent formulas for the run-length distribution were presented by Domański (1986) for the case in which consecutive observations in a sample are generated by a stationary two-state Markov chain. Note that it is essential to compare the power of tests based on run length with the power of tests based on the number of runs. Therefore we give recurrent formulas for the joint probability function of the total number of runs and the run lengths. The studies made so far used only formulas for one-dimensional distributions.

3. RECURRENT FORMULA FOR THE THREE-DIMENSIONAL DISTRIBUTION OF RUNS

Let us assign to every sequence

$$\omega = (x_1, x_2, \dots, x_n) \in \Omega_n$$

the following numbers:

$N_A(\omega)$ - number of A elements in the sequence ω,
$L_A(\omega)$ - number of runs made of A elements,
$L(\omega)$ - total number of runs,
$S_A(\omega)$, $S_B(\omega)$ - maximum length of a run of A (respectively B) elements,
$K_A(\omega)$, $K_B(\omega)$ - number of A (respectively B) elements located at the end of the sequence ω,
$Z_A(\omega)$, $Z_B(\omega)$ - maximum length of a run of A (respectively B) elements, not counting the last run.

Let us assume that the sequences $\omega \in \Omega_n$ are realizations of a stationary Markov chain with the transition matrix given above.


Therefore the stationary probabilities are given by the formula

$$p_A = P(X_j = A) = \frac{p_{BA}}{p_{AB} + p_{BA}}, \qquad p_B = P(X_j = B) = \frac{p_{AB}}{p_{AB} + p_{BA}}, \qquad j = 1, 2, \dots, n. \qquad (2)$$
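Formula (2) can be checked numerically. The sketch below assumes the usual convention $p_{AB} = P(X_{j+1} = B \mid X_j = A)$ and uses exact fractions. The function names and the example transition probabilities are ours.

```python
from fractions import Fraction


def stationary(p_ab, p_ba):
    """Stationary probabilities (p_A, p_B) of a two-state chain, formula (2).

    Convention: p_ab = P(X_{j+1} = B | X_j = A), p_ba = P(X_{j+1} = A | X_j = B).
    """
    total = p_ab + p_ba
    return p_ba / total, p_ab / total


def autocorrelation(p_ab, p_ba):
    """Lag-one autocorrelation coefficient of the chain, 1 - p_AB - p_BA."""
    return 1 - p_ab - p_ba


p_ab, p_ba = Fraction(1, 5), Fraction(3, 10)
p_a, p_b = stationary(p_ab, p_ba)
print(p_a, p_b)                                  # 3/5 2/5
# One step of the chain leaves the stationary distribution unchanged.
print(p_a * (1 - p_ab) + p_b * p_ba == p_a)      # True
```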

Taking these assumptions into account, the probability distribution on the set $\Omega_n$ can be given by the formula

$$P(\omega) = \frac{1}{p_{AB} + p_{BA}}\, p_{BA}^{\,l_A}\, p_{AA}^{\,n_A - l_A}\, p_{AB}^{\,l - l_A}\, p_{BB}^{\,n - n_A - l + l_A}, \qquad (3)$$

where, for simplicity, $n_A = N_A(\omega)$, $l = L(\omega)$ and $l_A = L_A(\omega)$. Indeed,

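$$P(\omega) = P(X_1 = x_1)\, P(X_2 = x_2 \mid X_1 = x_1) \cdots P(X_n = x_n \mid X_{n-1} = x_{n-1}), \qquad (4)$$

and at the same time

$$P(X_1 = x_1) = \begin{cases} p_A & \text{if } x_1 = A,\\ p_B & \text{if } x_1 = B. \end{cases} \qquad (5)$$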

Let us observe that $l_A$ is the number of those A elements that start a new run, i.e. that follow B (possibly with the exception of the first element). That is why the right-hand side of (4) contains $l_A$ factors equal to $p_{BA}$. This count includes the factor (5), which takes the form (2) when $x_1 = A$. The number of A elements that do not start a new run, i.e. that follow A, equals $n_A - l_A$. The right-hand side of (4) therefore contains the same number of factors $p_{AA}$. Similarly, the numbers of factors $p_{BB}$ and $p_{AB}$ equal $n_B - l_B$ and $l_B$ respectively, where $n_B = n - n_A$ and $l_B = l - l_A$. This again includes the factor (5) in the form (2) if $x_1 = B$. Whether $x_1 = A$ or $x_1 = B$, the right-hand side contains exactly one factor

$$\frac{1}{p_{AB} + p_{BA}}.$$
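The counting argument above can be checked by brute force. For every sequence of a given length, the product (4) and the closed form (3) should coincide, and the probabilities should sum to one. Below is a minimal Python sketch; the function names and the example transition probabilities are hypothetical.

```python
from fractions import Fraction
from itertools import product


def prob_by_product(seq, p_ab, p_ba):
    """P(omega) as the product (4): stationary start times one-step transitions."""
    trans = {("A", "A"): 1 - p_ab, ("A", "B"): p_ab,
             ("B", "A"): p_ba, ("B", "B"): 1 - p_ba}
    total = p_ab + p_ba
    prob = p_ba / total if seq[0] == "A" else p_ab / total  # factor (5)
    for prev, cur in zip(seq, seq[1:]):
        prob *= trans[(prev, cur)]
    return prob


def prob_closed_form(seq, p_ab, p_ba):
    """P(omega) from formula (3), which depends only on n, n_A, l and l_A."""
    n, n_a = len(seq), seq.count("A")
    # first symbol of every run
    run_starts = [seq[0]] + [c for prev, c in zip(seq, seq[1:]) if c != prev]
    l, l_a = len(run_starts), run_starts.count("A")
    return (p_ba ** l_a * (1 - p_ab) ** (n_a - l_a) * p_ab ** (l - l_a)
            * (1 - p_ba) ** (n - n_a - l + l_a)) / (p_ab + p_ba)


p_ab, p_ba = Fraction(1, 5), Fraction(3, 10)
for n in range(1, 9):
    seqs = ["".join(s) for s in product("AB", repeat=n)]
    assert all(prob_by_product(s, p_ab, p_ba) == prob_closed_form(s, p_ab, p_ba)
               for s in seqs)
    assert sum(prob_closed_form(s, p_ab, p_ba) for s in seqs) == 1
print("formula (3) agrees with (4) for n = 1..8")
```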

Let us consider, for fixed n, the joint three-dimensional distribution of $(L, S_A, S_B)$. Here L is the number of runs, $S_A$ the maximum length of a run of A elements and $S_B$ the maximum length of a run of B elements.

Let us define

$$M(n, l, s, t, u) = \operatorname{card}\{\omega \in \Omega_n : l = L(\omega),\ s = Z_A(\omega),\ t = S_B(\omega),\ u = K_A(\omega)\}. \qquad (6)$$

$$M(n, l, s, t, u) = \begin{cases} M(n-1, l, s, t, u-1) & \text{for } u > 1,\\ M(n-1, l-1, s, t, 0) & \text{for } u = 1,\\ \displaystyle\sum_{v=0}^{t-1} M(n, l, v, s, t) + \sum_{w=1}^{t} M(n, l, t, s, w) & \text{for } u = 0. \end{cases} \qquad (7)$$

The first two formulas are obvious: we obtain them by appending an n-th element A to an (n − 1)-element sequence. In the case u = 0, interchanging the elements A and B gives $M(n, l, s, t, 0) = \sum_{v,w} M(n, l, v, s, w)$. The summation runs over those pairs (v, w) for which $\max\{v, w\} = t$. The initial conditions for formula (7) are as follows:

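$$M(1, l, s, t, u) = \begin{cases} 1 & \text{for } l = u = 1,\ s = t = 0,\\ 0 & \text{otherwise}, \end{cases} \qquad (8)$$

for $u \ge 1$; the values for $u = 0$ then follow from the third line of (7).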
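The recursion (7) with the initial condition (8) can be checked against direct enumeration of $\Omega_n$. The sketch below writes the first two lines of (7) in forward form, which is equivalent: each entry for length n − 1 is pushed to the entry it feeds at length n. The third line is applied at every length, including n = 1. `run_stats` and `m_table` are names we introduce.

```python
from collections import Counter
from itertools import product


def run_stats(seq):
    """(L, Z_A, S_B, K_A) of a sequence over {'A', 'B'}."""
    runs = []  # (symbol, length) pairs
    for x in seq:
        if runs and runs[-1][0] == x:
            runs[-1] = (x, runs[-1][1] + 1)
        else:
            runs.append((x, 1))
    k_a = runs[-1][1] if runs[-1][0] == "A" else 0
    z_a = max((r for sym, r in runs[:-1] if sym == "A"), default=0)
    s_b = max((r for sym, r in runs if sym == "B"), default=0)
    return len(runs), z_a, s_b, k_a


def m_table(n_max):
    """tables[n][(l, s, t, u)] = M(n, l, s, t, u), built with (7) and (8)."""
    tables = {}
    for n in range(1, n_max + 1):
        cur = Counter()
        if n == 1:
            cur[(1, 0, 0, 1)] = 1  # initial condition (8): the sequence "A"
        else:
            for (l, s, t, u), count in tables[n - 1].items():
                if u >= 1:
                    cur[(l, s, t, u + 1)] += count  # first line of (7)
                else:
                    cur[(l + 1, s, t, 1)] += count  # second line of (7)
        # Third line of (7): interchanging A and B maps a sequence ending in A
        # with (Z_A, S_B, K_A) = (v, s, w) to one ending in B with
        # (Z_A, S_B, K_A) = (s, max(v, w), 0).
        for (l, v, s, w), count in list(cur.items()):
            cur[(l, s, max(v, w), 0)] += count
        tables[n] = cur
    return tables


tables = m_table(10)
for n in range(1, 11):
    assert tables[n] == Counter(run_stats(s) for s in product("AB", repeat=n))
```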

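Cz. Domański (1986) proved:

Theorem 1. The joint distribution of the variables $(L, S_A, S_B)$ defined on the probability space $M_{n,\theta}$ takes the following form:

$$P(L = l, S_A = s, S_B = t) = Q_0(n, l, s, t) + Q_1(n, l, t, s), \qquad (9)$$

where, for $h = 0, 1$,

$$Q_h(n, l, s, t) = \sum_{v=0}^{t-1} Q_{1-h}(n - t, l - 1, v, s)\, q_h (1 - q_{1-h})^{t-1} + \sum_{w=1}^{t} Q_{1-h}(n - w, l - 1, t, s)\, q_h (1 - q_{1-h})^{w-1},$$

with the initial conditions, for $h = 0, 1$,

$$Q_h(0, l, s, t) = \begin{cases} \dfrac{1}{q_0 + q_1} & \text{for } l = s = t = 0,\\ 0 & \text{otherwise}. \end{cases}$$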
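Theorem 1 can be implemented by appending one whole run at a time, and the output can be compared with direct enumeration for small n. The following are our assumptions, not the author's code:

- $Q_h$ is read as the weight of sequences whose last run consists of A elements for $h = 1$ and of B elements for $h = 0$;
- the third argument of $Q_h$ is the longest run of the other symbol;
- the two sums are written in forward (push) form;
- the function name is ours.

```python
from collections import defaultdict
from fractions import Fraction


def joint_distribution(n, q0, q1):
    """{(l, s, t): P(L = l, S_A = s, S_B = t)} via the recursion of Theorem 1.

    Q[h][m] maps (l, x, y) to a weight; h = 1 if the last run consists of A,
    h = 0 if of B; x = longest run of the other symbol, y = longest run of
    symbol h.  Each run is appended in one step (forward form of the sums).
    """
    q = (q0, q1)
    Q = [[defaultdict(int) for _ in range(n + 1)] for _ in range(2)]
    Q[0][0][(0, 0, 0)] = Q[1][0][(0, 0, 0)] = 1 / (q0 + q1)  # initial conditions
    for m in range(n):
        for h in (0, 1):          # symbol of the appended run
            g = 1 - h             # symbol of the current last run
            for (l, x, y), w in Q[g][m].items():
                for r in range(1, n - m + 1):  # length of the appended run
                    weight = q[h] * (1 - q[g]) ** (r - 1)
                    Q[h][m + r][(l + 1, y, max(x, r))] += w * weight
    dist = defaultdict(int)
    for (l, x, y), w in Q[0][n].items():  # last run B: x = S_A, y = S_B
        dist[(l, x, y)] += w
    for (l, x, y), w in Q[1][n].items():  # last run A: x = S_B, y = S_A
        dist[(l, y, x)] += w
    return dict(dist)


q0, q1 = Fraction(1, 5), Fraction(3, 10)
print(joint_distribution(2, q0, q1))
```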

Theorem 1 yields recurrent formulas for the one-dimensional distributions of $S_A$, $S_B$ and $S_G$.

Theorem 2. The distribution of the variable $S_A$ defined on $M_{n,\theta}$ is given by the formula

$$P(S_A = s) = Q_0^A(n, s) + Q_1^A(n, s), \qquad (10)$$

where

$$Q_0^A(n, s) = \sum_{v=1}^{n} Q_1^A(n - v, s)\, q_0 (1 - q_1)^{v-1},$$

$$Q_1^A(n, s) = \sum_{v=0}^{s-1} Q_0^A(n - s, v)\, q_1 (1 - q_0)^{s-1} + \sum_{w=1}^{s} Q_0^A(n - w, s)\, q_1 (1 - q_0)^{w-1},$$

with the initial conditions

$$Q_0^A(0, 0) = Q_1^A(0, 0) = \frac{1}{q_0 + q_1}.$$

Interchanging A and B, and 0 and 1, we obtain the formula for the distribution of $S_B$.
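A direct transcription of recursion (10) follows. It again assumes that $Q^A_1$ refers to sequences ending with a run of A elements and $Q^A_0$ to those ending with a run of B elements. Exact fractions keep the arithmetic free of rounding. The function name is ours.

```python
from fractions import Fraction


def longest_a_run_distribution(n, q0, q1):
    """[P(S_A = s) for s = 0..n] from recursion (10).

    QA[h][m][s]: weight of sequences of length m with S_A = s whose last run
    consists of A (h = 1) or of B (h = 0).
    """
    QA = [[[0] * (n + 1) for _ in range(n + 1)] for _ in range(2)]
    QA[0][0][0] = QA[1][0][0] = 1 / (q0 + q1)  # initial conditions
    for m in range(1, n + 1):
        for s in range(m + 1):
            # Last run is B (length v): S_A is unchanged.
            QA[0][m][s] = sum(QA[1][m - v][s] * q0 * (1 - q1) ** (v - 1)
                              for v in range(1, m + 1))
            if s == 0:
                continue  # a sequence ending in A has S_A >= 1
            # Last run is A: it is the new maximum s, or has length w <= s.
            new_max = sum(QA[0][m - s][v] for v in range(s)) * q1 * (1 - q0) ** (s - 1)
            old_max = sum(QA[0][m - w][s] * q1 * (1 - q0) ** (w - 1)
                          for w in range(1, s + 1))
            QA[1][m][s] = new_max + old_max
    return [QA[0][n][s] + QA[1][n][s] for s in range(n + 1)]


print(longest_a_run_distribution(4, Fraction(1, 5), Fraction(3, 10)))
```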

Theorem 3. The distribution of the variable $S_G$ defined on $M_{n,\theta}$ is given by the recurrent formula

$$P(S_G = s) = Q_0^G(n, s) + Q_1^G(n, s), \qquad (11)$$

where, for $h = 0, 1$,

$$Q_h^G(n, s) = \sum_{v=0}^{s-1} Q_{1-h}^G(n - s, v)\, q_h (1 - q_{1-h})^{s-1} + \sum_{w=1}^{s} Q_{1-h}^G(n - w, s)\, q_h (1 - q_{1-h})^{w-1},$$

with the initial conditions

$$Q_0^G(0, 0) = Q_1^G(0, 0) = \frac{1}{q_0 + q_1}.$$

Note that the distribution of $S_D$ can be obtained from the identity

$$P(S_D < s) = P(S_A < s) + P(S_B < s) - P(S_G < s),$$

which holds because $S_D = \min\{S_A, S_B\}$ and $S_G = \max\{S_A, S_B\}$.
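Recursions (10) and (11) differ only in which runs update the tracked maximum. One function can therefore produce the distributions of $S_A$, $S_B$ and $S_G$, and the distribution of $S_D$ then follows from the identity above. This unified forward formulation and the function names are our own sketch, not the author's code.

```python
from fractions import Fraction


def longest_run_distribution(n, q0, q1, tracked):
    """[P(max run length over the symbols in `tracked` = s) for s = 0..n].

    Symbols are coded 1 = A, 0 = B: tracked={1} gives S_A, {0} gives S_B and
    {0, 1} gives S_G (recursion (11)).  Q[h][m][s] is the weight of sequences
    of length m whose last run has symbol h and whose tracked maximum is s.
    """
    q = (q0, q1)
    Q = [[[0] * (n + 1) for _ in range(n + 1)] for _ in range(2)]
    Q[0][0][0] = Q[1][0][0] = 1 / (q0 + q1)
    for m in range(n):
        for h in (0, 1):              # symbol of the appended run
            g = 1 - h                 # symbol of the current last run
            for s in range(m + 1):
                w = Q[g][m][s]
                if w == 0:
                    continue
                for r in range(1, n - m + 1):
                    new_s = max(s, r) if h in tracked else s
                    Q[h][m + r][new_s] += w * q[h] * (1 - q[g]) ** (r - 1)
    return [Q[0][n][s] + Q[1][n][s] for s in range(n + 1)]


def smaller_longest_run_distribution(n, q0, q1):
    """[P(S_D = s)] from P(S_D < s) = P(S_A < s) + P(S_B < s) - P(S_G < s)."""
    p_a, p_b, p_g = (longest_run_distribution(n, q0, q1, t) for t in ({1}, {0}, {0, 1}))

    def below(p, s):              # P(X < s)
        return sum(p[:s])

    cdf = [below(p_a, s) + below(p_b, s) - below(p_g, s) for s in range(n + 2)]
    return [cdf[s + 1] - cdf[s] for s in range(n + 1)]


print(smaller_longest_run_distribution(5, Fraction(1, 5), Fraction(3, 10)))
```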

Theorem 4. The distribution of the variable L defined on $M_{n,\theta}$ is given by

$$P(L = l) = Q_0^L(n, l) + Q_1^L(n, l), \qquad (12)$$

where, for $h = 0, 1$,

$$Q_h^L(n, l) = Q_h^L(n - 1, l)\,(1 - q_{1-h}) + Q_{1-h}^L(n - 1, l - 1)\, q_h,$$

with the initial conditions

$$Q_0^L(0, 0) = Q_1^L(0, 0) = \frac{1}{q_0 + q_1}$$

and $Q_h^L(n, 0) = 0$ for $n \ge 1$.
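Recursion (12) needs only the number of runs and the symbol of the last element. The sketch below keeps $Q^L_h(m, 0) = 0$ for $m \ge 1$ by starting the inner loop at $l = 1$. The function name is ours, and so is the reading that $h = 1$ stands for a final A element.

```python
from fractions import Fraction


def runs_number_distribution(n, q0, q1):
    """[P(L = l) for l = 0..n] from recursion (12).

    Q[h][m][l]: weight of sequences of length m with l runs whose last
    element is A (h = 1) or B (h = 0).  Q[h][m][0] = 0 for m >= 1.
    """
    q = (q0, q1)
    Q = [[[0] * (n + 1) for _ in range(n + 1)] for _ in range(2)]
    Q[0][0][0] = Q[1][0][0] = 1 / (q0 + q1)  # initial conditions
    for m in range(1, n + 1):
        for l in range(1, m + 1):
            for h in (0, 1):
                g = 1 - h
                stay = Q[h][m - 1][l] * (1 - q[g])    # same symbol: run continues
                switch = Q[g][m - 1][l - 1] * q[h]    # other symbol: new run
                Q[h][m][l] = stay + switch
    return [Q[0][n][l] + Q[1][n][l] for l in range(n + 1)]


print(runs_number_distribution(4, Fraction(1, 2), Fraction(1, 2)))
```

Under independence with $P(A) = 0.5$ (i.e. $q_0 = q_1 = 0.5$), $L - 1$ is binomial with parameters $n - 1$ and $1/2$, which gives a quick sanity check.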

4. ESTIMATION OF THE POWER OF RUN TESTS

Using the recurrent formulas, the power of the tests was computed for n = 1, 2, ..., 1000 and for some pairs $\theta = (q_0, q_1) \in \Theta$.

For a better interpretation, the pairs $(q_0, q_1)$ were chosen through the parameters

$$p = p_A = \frac{q_1}{q_0 + q_1} \quad \text{and} \quad \rho = 1 - q_0 - q_1,$$

which express the stationary probability of the Markov chain and its autocorrelation coefficient, respectively. The randomized test of the hypothesis $H_0: \rho = 0$ based on $S_A$ is as follows:

- if $S_A < s_A - 1$, the hypothesis $H_0$ is accepted,
- if $S_A \ge s_A$, the hypothesis $H_0$ is rejected (in favour of the alternative $H_1: \rho > 0$),
- if $S_A = s_A - 1$, the hypothesis $H_0$ is rejected with probability $r_A$ (and accepted with probability $1 - r_A$).

Randomized tests based on the statistics $S_B$, $S_D$ and $S_G$ are constructed analogously. The critical value for the test based on the number of runs is defined by the formula

$$s_L = \max\{s : F_L(s) < \alpha\},$$

where

$$F_L(s) = P(S_L < s).$$

For the statistic $S_L$ (the number of runs L) we took left-sided critical regions, and for the other statistics right-sided ones.

For fixed n, p, significance level α and ρ = 0, let

$$F_A(s) = P(S_A < s), \qquad s = 0, 1, \dots$$

The critical value of the test based on the statistic $S_A$ is

$$s_A = \min\{s : F_A(s) \ge 1 - \alpha\}.$$

This value corresponds to the randomization probability

$$r_A = \frac{F_A(s_A) - (1 - \alpha)}{F_A(s_A) - F_A(s_A - 1)},$$

which makes the size of the test exactly α.
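The critical value $s_A$ and the randomization probability $r_A$ can be computed from any null distribution of $S_A$. This sketch assumes $F_A(s) = P(S_A < s)$. It also assumes that $H_0$ is rejected with probability $r_A$ when $S_A = s_A - 1$, which gives a test of size exactly α. The example distribution `pmf0` is hypothetical and only illustrates the arithmetic.

```python
from fractions import Fraction


def critical_value(pmf, alpha):
    """Return (s_A, r_A) for the randomized right-sided test.

    pmf[s] = P(S_A = s) under H0; F_A(s) = P(S_A < s) = pmf[0] + ... + pmf[s-1].
    s_A = min{s : F_A(s) >= 1 - alpha}; r_A as in the formula above.
    """
    F = [sum(pmf[:s]) for s in range(len(pmf) + 1)]
    s_a = next(s for s in range(len(F)) if F[s] >= 1 - alpha)
    r_a = (F[s_a] - (1 - alpha)) / (F[s_a] - F[s_a - 1])
    return s_a, r_a


def decide(s_obs, s_a, r_a, u):
    """Randomized decision; u is a draw from Uniform(0, 1)."""
    if s_obs >= s_a:
        return "reject"
    if s_obs == s_a - 1:
        return "reject" if u < r_a else "accept"
    return "accept"


def rejection_probability(pmf, s_a, r_a):
    """P(reject H0) when S_A has distribution pmf: size under H0, power under H1."""
    return sum(pmf[s_a:]) + r_a * pmf[s_a - 1]


# Hypothetical null distribution of S_A, for illustration only.
pmf0 = [Fraction(1, 10), Fraction(3, 10), Fraction(3, 10), Fraction(2, 10), Fraction(1, 10)]
s_a, r_a = critical_value(pmf0, Fraction(1, 20))
print(s_a, r_a)                                  # 5 1/2
print(rejection_probability(pmf0, s_a, r_a))     # 1/20
```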

Consider fixed n, p and significance level α, and a simple alternative hypothesis of the form $H_1: \rho = \rho_1$. For this case the results allow us to determine, at least approximately, the power of the independence tests based on the statistics $S_A$, $S_B$, $S_D$, $S_G$ and L (which verify the hypothesis $H_0: \rho = 0$). At the same time they allow us to choose the strongest of these tests.
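Putting the pieces together, the power of the $S_A$ test against $H_1: \rho = \rho_1$ can be computed exactly. Inverting $p = q_1/(q_0 + q_1)$ and $\rho = 1 - q_0 - q_1$ gives $q_1 = p(1 - \rho)$ and $q_0 = (1 - p)(1 - \rho)$. The sketch below uses a forward form of recursion (10) together with the randomized rule described above; the function names are ours.

```python
from fractions import Fraction


def longest_a_run_pmf(n, q0, q1):
    """P(S_A = s), s = 0..n, by appending whole runs (forward form of (10))."""
    q = (q0, q1)
    Q = [[[0] * (n + 1) for _ in range(n + 1)] for _ in range(2)]  # Q[h][m][s]
    Q[0][0][0] = Q[1][0][0] = 1 / (q0 + q1)
    for m in range(n):
        for h in (0, 1):                      # 1 = A-run appended, 0 = B-run
            g = 1 - h
            for s in range(m + 1):
                for r in range(1, n - m + 1):
                    new_s = max(s, r) if h == 1 else s
                    Q[h][m + r][new_s] += Q[g][m][s] * q[h] * (1 - q[g]) ** (r - 1)
    return [Q[0][n][s] + Q[1][n][s] for s in range(n + 1)]


def chain_parameters(p, rho):
    """(q0, q1) with stationary P(A) = p and autocorrelation rho = 1 - q0 - q1."""
    return (1 - p) * (1 - rho), p * (1 - rho)


def power_s_a(n, p, rho1, alpha):
    """Exact power of the randomized S_A test of H0: rho = 0 at rho = rho1."""
    pmf0 = longest_a_run_pmf(n, *chain_parameters(p, 0))
    F = [sum(pmf0[:s]) for s in range(n + 2)]       # F_A(s) = P(S_A < s)
    s_a = next(s for s in range(n + 2) if F[s] >= 1 - alpha)
    r_a = (F[s_a] - (1 - alpha)) / (F[s_a] - F[s_a - 1])
    pmf1 = longest_a_run_pmf(n, *chain_parameters(p, rho1))
    return sum(pmf1[s_a:]) + r_a * pmf1[s_a - 1]


print(float(power_s_a(5, Fraction(1, 2), Fraction(3, 10), Fraction(1, 20))))  # about 0.118
```

For n = 5, p = 0.5, ρ₁ = 0.3 and α = 0.05 this sketch gives about 0.118 (118‰), the same as the first entry of Table 1.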

None of the tests considered is stronger than all the others for the general alternative hypothesis $H_1: \rho > 0$. Obviously, we must be careful with generalizations based on numerical results. Even at this stage, however, we can formulate the following conclusions, which are useful in choosing a run test in applications (see Tab. 1, 2).

1. The test based on the statistic $S_A$ is stronger than the test based on $S_B$ for p > 0.5. The exceptions are cases of strong asymmetry (p > 0.6) and very strong autocorrelation (ρ > 0.7).

2. The test $S_G$ proved to be stronger than the tests $S_A$ and $S_B$, except in cases of large asymmetry (p > 0.6) and very strong autocorrelation (ρ > 0.7). This test is also stronger than $S_D$, but only under three conditions: p is close to 0.5, the autocorrelation is not very strong, and the sample size is relatively small.

3. As p increases, the differences in power between the tests $S_A$ and $S_G$, and between $S_B$ and $S_D$, decrease very fast, except for very small n (n < 15). These differences are significant only in the case of strong autocorrelation (ρ > 0.5).

4. The test $S_L$ proved to be stronger than the tests $S_D$ and … in every case considered. It is also stronger than the tests $S_D$ and $S_B$, except in cases of strong asymmetry (p > 0.7) combined with autocorrelation that is not very strong (ρ < 0.5) in small samples (n < 80).


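Table 1. Power of run tests for p = 0.5, ρ = 0.3, 0.5, 0.7, 0.9, α = 0.05 (in ‰)

| n | ρ=0.3 $S_A=S_B$ | ρ=0.3 $S_G$ | ρ=0.3 $S_D$ | ρ=0.3 $S_L$ | ρ=0.5 $S_A=S_B$ | ρ=0.5 $S_G$ | ρ=0.5 $S_D$ | ρ=0.5 $S_L$ | ρ=0.7 $S_A=S_B$ | ρ=0.7 $S_G$ | ρ=0.7 $S_D$ | ρ=0.7 $S_L$ | ρ=0.9 $S_A=S_B$ | ρ=0.9 $S_G$ | ρ=0.9 $S_D$ | ρ=0.9 $S_L$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 118 | 143 | 59 | 143 | 190 | 253 | 56 | 253 | 289 | 418 | 43 | 418 | 420 | 625 | 18 | 652 |
| 10 | 145 | 179 | 120 | 215 | 243 | 344 | 175 | 430 | 360 | 579 | 202 | 712 | 467 | 862 | 124 | 956 |
| 15 | 165 | 209 | 159 | 291 | 292 | 419 | 261 | 598 | 436 | 701 | 335 | 888 | 524 | 954 | 234 | 997 |
| 20 | 184 | 233 | 197 | 362 | 336 | 480 | 350 | 723 | 505 | 780 | 468 | 959 | 583 | 980 | 350 | 1000 |
| 25 | 197 | 249 | 217 | 427 | 371 | 519 | 397 | 812 | 560 | 827 | 543 | 986 | 634 | 991 | 436 | 1000 |
| 30 | 212 | 267 | 235 | 487 | 406 | 562 | 443 | 875 | 613 | 869 | 614 | 995 | 682 | 996 | 517 | 1000 |
| 40 | 232 | 292 | 289 | 599 | 455 | 618 | 550 | 950 | 686 | 914 | 741 | 1000 | 756 | 999 | 654 | 1000 |
| 50 | 250 | 313 | 304 | 683 | 500 | 665 | 588 | 980 | 746 | 944 | 796 | 1000 | 814 | 1000 | 741 | 1000 |
| 60 | 268 | 337 | 333 | 753 | 542 | 711 | 646 | 992 | 795 | 966 | 853 | 1000 | 859 | 1000 | 812 | 1000 |
| 80 | 290 | 360 | 385 | 855 | 595 | 756 | 730 | 999 | 853 | 981 | 917 | 1000 | 916 | 1000 | 899 | 1000 |
| 100 | 313 | 387 | 409 | 918 | 644 | 802 | 773 | 1000 | 898 | 991 | 942 | 1000 | 950 | 1000 | 944 | 1000 |

Source: own calculations.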


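Table 2. Power of run tests for ρ = 0.7, p = 0.6, 0.7, 0.8, 0.9, α = 0.05 (in ‰)

| n | p=0.6 $S_A$ | p=0.6 $S_B$ | p=0.6 $S_G$ | p=0.6 $S_D$ | p=0.6 $S_L$ | p=0.7 $S_A$ | p=0.7 $S_B$ | p=0.7 $S_G$ | p=0.7 $S_D$ | p=0.7 $S_L$ | p=0.8 $S_A$ | p=0.8 $S_B$ | p=0.8 $S_G$ | p=0.8 $S_D$ | p=0.8 $S_L$ | p=0.9 $S_A$ | p=0.9 $S_B$ | p=0.9 $S_G$ | p=0.9 $S_D$ | p=0.9 $S_L$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 231 | 270 | 307 | 45 | 307 | 143 | 241 | 175 | 51 | 175 | 95 | 194 | 105 | 67 | 105 | 67 | 134 | 70 | 81 | 70 |
| 10 | 351 | 356 | 507 | 203 | 669 | 353 | 338 | 406 | 211 | 502 | 213 | 296 | 221 | 237 | 221 | 98 | 193 | 99 | 176 | 99 |
| 15 | 414 | 436 | 603 | 343 | 870 | 386 | 415 | 443 | 366 | 688 | 373 | 359 | 379 | 331 | 471 | 143 | 246 | 143 | 241 | 143 |
| 20 | 482 | 507 | 679 | 458 | 947 | 432 | 482 | 497 | 448 | 825 | 390 | 423 | 397 | 411 | 550 | 207 | 299 | 208 | 298 | 208 |
| 25 | 535 | 574 | 732 | 553 | 978 | 485 | 545 | 551 | 523 | 920 | 416 | 483 | 422 | 478 | 728 | 302 | 349 | 302 | 348 | 302 |
| 30 | 584 | 615 | 775 | 606 | 992 | 532 | 602 | 594 | 590 | 961 | 447 | 537 | 453 | 535 | 767 | 391 | 395 | 391 | 395 | 439 |
| 40 | 659 | 692 | 835 | 718 | 999 | 605 | 678 | 663 | 685 | 990 | 517 | 631 | 522 | 631 | 897 | 404 | 479 | 404 | 479 | 495 |
| 50 | 716 | 756 | 873 | 785 | 1000 | 664 | 734 | 717 | 737 | 998 | 576 | 689 | 581 | 689 | 960 | 428 | 552 | 428 | 552 | 657 |
| 60 | 763 | 804 | 903 | 826 | 1000 | 711 | 782 | 759 | 784 | 1000 | 623 | 733 | 626 | 733 | 982 | 458 | 612 | 458 | 612 | 707 |
| 80 | 828 | 861 | 938 | 894 | 1000 | 781 | 856 | 820 | 856 | 1000 | 696 | 806 | 696 | 806 | 997 | 526 | 682 | 526 | 682 | 861 |
| 100 | 875 | 984 | 961 | 929 | 1000 | 830 | 906 | 861 | 906 | 1000 | 751 | 861 | 753 | 861 | 1000 | 588 | 741 | 588 | 741 | 920 |

Source: own calculations.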


REFERENCES

Bateman G. (1948), On the Power Function of the Longest Run as a Test for Randomness in a Sequence of Alternatives, "Biometrika", 35, 97-112.

Domański Cz. (1986), Teoretyczne podstawy testów nieparametrycznych i ich zastosowanie w naukach ekonomiczno-społecznych, Uniwersytet Łódzki, Łódź.

Olmstead P. S. (1958), Runs Determined in a Sample by an Arbitrary Cut, "Bell System Technical Journal", 37, 55-82.

Czesław Domański

ASSESSMENT OF THE POWER OF RANDOMNESS TESTS BASED ON RUN LENGTH

Tests based on run length are used both in statistical inference and in statistical quality control.

The paper presents the power of three tests based on:

- the maximum run length on one side of the median,
- the smaller of the maximum run lengths above and below the median,
- the larger of the maximum run lengths above and below the median.
