B. Kopociński
Wrocław

On the distribution of the longest success-run in Bernoulli trials

(Received 14.11.1989)
1. Introduction. The success-run in a sequence of Bernoulli trials has been considered in a large number of papers. W. Feller [4] concentrates his attention on the number of runs, which finds its principal application in tests of randomness and tests of homogeneity. Many papers deal with the random variable Z_n introduced by Erdős and Rényi [2] (see also Erdős and Révész [3]), defined as the length of the longest head-run in n coin tossings; they give asymptotic estimates of that random variable as n tends to infinity. The asymptotic estimation of the distribution of Z_n was given by Antónia Földes [5]. Note that in the theory of extremes in random sequences it is proved that for a sequence of geometrically distributed random variables the linearly standardized maximum does not have a limiting distribution (see [7], p. 26). In [1] a multivariate extension of the problem is considered: the largest cube filled up by successes which may be found in a random lattice in a cube of range n. The problem has many practical implications.
Our purpose in this paper is to give recurrence formulas useful in the calculation of the distribution of the random variable Z_n, in the calculation of its expected value, and in testing the accuracy of the limiting estimations.
2. Distribution of Z_n. Suppose that the probability p of success is fixed and 0 < p < 1. Denote q = 1 - p and

    p(n,k) = P(Z_n = k),   P(n,k) = P(Z_n \le k),
    R(n,k) = P(Z_n > k),   \mu(n) = E Z_n,

where n = 0, 1, ..., k = 0, 1, .... In the sequel a sum over an empty set of indices may appear; it is then assumed equal to zero. Note that p(0,0) = 1,

    p(n,k) = 0  for k < 0 or k > n,
    P(n,k) = 0  for k < 0   and   P(n,k) = 1  for k \ge n,
    R(n,k) = 1  for k < 0   and   R(n,k) = 0  for k \ge n.

It is evident that

    P(n,k) = \sum_{i=0}^{k} p(n,i),
    R(n,k) = 1 - P(n,k),
    p(n,k) = P(n,k) - P(n,k-1) = R(n,k-1) - R(n,k).
By [x] we denote the integer part of x. The following theorem gives recurrence formulas for the distribution of the longest success-run in Bernoulli trials.
Theorem 1. For fixed k = 1, 2, ... we have

(1)    p(k,k) = p^k,

(2)    p(n,k) = \sum_{j=0}^{k-1} p^j q\,p(n-j-1,k) + p^k q P(n-k-1,k),   n = k+1, k+2, ...,

(3)    p(n,k) = p^k q P(n-k-1,k) + \sum_{j=0}^{n-2k-2} P(j,k-1)\,p^k q^2 P(n-j-k-2,k) + p^k q P(n-k-1,k-1),

(4)    R(k+1,k) = p^{k+1},

(5)    R(n,k) = \sum_{j=0}^{k} p^j q R(n-j-1,k) + p^{k+1},   n = k+1, k+2, ...,

(6)    R(n+1,k) = R(n,k) + p^{k+1} q (1 - R(n-k-1,k)),   n = k+1, k+2, ...
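The recurrences above translate directly into a short dynamic program. The sketch below (ours, not part of the paper; the function names are invented) computes R(n,k) from the initial condition (4) and recurrence (6), and recovers the probability function from p(n,k) = R(n,k-1) - R(n,k):

```python
def success_run_tail(n, k, p):
    """R(n, k) = P(Z_n > k): probability that the longest success-run in
    n Bernoulli(p) trials exceeds k.  Uses the initial condition (4),
    R(k+1, k) = p^(k+1), and recurrence (6),
    R(n+1, k) = R(n, k) + p^(k+1) q (1 - R(n-k-1, k))."""
    if k < 0:
        return 1.0
    if n <= k:
        return 0.0
    q = 1.0 - p
    R = [0.0] * (n + 1)          # R[m] holds R(m, k); R(m, k) = 0 for m <= k
    R[k + 1] = p ** (k + 1)
    for m in range(k + 1, n):
        R[m + 1] = R[m] + p ** (k + 1) * q * (1.0 - R[m - k - 1])
    return R[n]


def success_run_pmf(n, k, p):
    """p(n, k) = P(Z_n = k) = R(n, k-1) - R(n, k)."""
    return success_run_tail(n, k - 1, p) - success_run_tail(n, k, p)
```

For p = 1/2 this gives success_run_tail(8, 3, 0.5) = 0.1875, in agreement with Table 1 and with the closed form R(2n, n-1) = p^n(1 + nq) noted in Remark (a) below.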
Let N_k, k = 1, 2, ..., denote the number of the trial at which a success-run of length k is realized for the first time. Denote p_{nk} = P(N_k = n), n = 1, 2, ..., and assume that p_{00} = 1. Then

(7)    P(n,k-1) = P(Z_n < k) = P(N_k > n) = 1 - \sum_{i=k}^{n} p_{ik},   n = 1, 2, ...,  k = 1, 2, ...
Theorem 2. For fixed k = 1, 2, ... we have

(8)    p_{kk} = p^k,   p_{nk} = 0  for n < k,

(9)    p_{nk} = \sum_{j=0}^{k-1} p^j q\,p_{n-j-1,k},   n = k+1, k+2, ...
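Formulas (8) and (9) give a similarly direct computation of the distribution of N_k. The following sketch (our illustration; the names are invented) implements the recurrence and verifies identity (7) against a brute-force enumeration over all 2^n outcomes for a small n:

```python
from itertools import product


def first_passage_pmf(k, n_max, p):
    """f[n] = P(N_k = n): probability that a success-run of length k is
    completed for the first time at trial n.  Formula (8) gives the initial
    values; (9) is the recurrence p_{nk} = sum_{j<k} p^j q p_{n-j-1,k}."""
    q = 1.0 - p
    f = [0.0] * (n_max + 1)
    if k <= n_max:
        f[k] = p ** k
    for n in range(k + 1, n_max + 1):
        f[n] = sum(p ** j * q * f[n - j - 1] for j in range(k))
    return f


def brute_force_no_run(n, k, p):
    """P(Z_n < k) by enumerating all 2^n outcomes (small n only)."""
    total = 0.0
    for bits in product((0, 1), repeat=n):
        run = best = 0
        for b in bits:
            run = run + 1 if b else 0
            best = max(best, run)
        if best < k:
            prob = 1.0
            for b in bits:
                prob *= p if b else 1.0 - p
            total += prob
    return total
```

By (7), 1 minus the accumulated first-passage probabilities must equal the enumerated value of P(Z_n < k).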
Proof of Theorem 1. Clearly (1) and (4) hold. We prove formulas (2), (3) and (5) by the total probability formula. For the proof of (2) we calculate the probability of the event {Z_n = k}, conditioning on the length of the success-run which initiates the considered trials. For the proof of (3) we condition on the beginning of the first success-run of length k in the considered trials. For the proof of (5) we calculate the probability of the event {Z_n > k} under the same condition as in the proof of (2).
Now we pass to the proof of (6). Note that the probability of the event {Z_n > k} may be interpreted as the reliability of a coherent system which may be called a "k+1 consecutive among n" structure. This system is in the working state if at least k+1 consecutive elements among 1, 2, ..., n are in the working state. The reliability of the structure can be calculated using its minimal paths. Here it is convenient to introduce notation for the considered trials. Let X_1, X_2, ... denote a sequence of independent 0-1 random variables with P(X_i = 1) = p, P(X_i = 0) = q, i = 1, 2, .... Define the random events

    A_i = \{X_j = 1 for j = i, i+1, ..., i+k\},   i = 1, 2, ...

Then

    R(n,k) = P\Big(\bigcup_{i=1}^{n-k} A_i\Big).
Formula (6) may now be shown inductively. We have

    R(n+1,k) = P\Big(\bigcup_{i=1}^{n-k+1} A_i\Big)
             = P\Big(\bigcup_{i=1}^{n-k} A_i\Big) + P(A_{n-k+1}) - P\Big(\bigcup_{i=1}^{n-k} (A_i \cap A_{n-k+1})\Big)
             = R(n,k) + p^{k+1} - p^{k+1} P\Big(\bigcup_{i=1}^{n-k} B_i\Big),

where

    B_i = \{X_j = 1 for j = i, i+1, ..., \min(i+k,\,n-k)\},   i = 1, 2, ..., n-k,

since A_i \cap A_{n-k+1} = B_i \cap A_{n-k+1} and A_{n-k+1} is independent of every B_i. Because

    \bigcup_{i=1}^{n-k} B_i = \Big(\bigcup_{i=1}^{n-2k-1} A_i\Big) \cup B_{n-k},

and since the random events \bigcup_{i=1}^{n-2k-1} A_i and B_{n-k} = \{X_{n-k} = 1\} are independent,

    P\Big(\bigcup_{i=1}^{n-k} B_i\Big) = R(n-k-1,k) + p - R(n-k-1,k)\,p.

Substituting, R(n+1,k) = R(n,k) + p^{k+1}(1-p)(1 - R(n-k-1,k)) = R(n,k) + p^{k+1} q (1 - R(n-k-1,k)), which gives (6).
Remark.
(a) From (6) we obtain in particular R(2n, n-1) = p^n (1 + nq), which for p = 1/2 was given in [3], Lemma 1.
(b) The equivalence of (2) and (5) may be checked immediately:

    p(n,k) = R(n,k-1) - R(n,k)
           = \sum_{j=0}^{k-1} p^j q R(n-j-1,k-1) + p^k - \sum_{j=0}^{k} p^j q R(n-j-1,k) - p^{k+1}
           = \sum_{j=0}^{k-1} p^j q \big(R(n-j-1,k-1) - R(n-j-1,k)\big) - p^k q R(n-k-1,k) + p^k q
           = \sum_{j=0}^{k-1} p^j q\,p(n-j-1,k) + p^k q \big(1 - R(n-k-1,k)\big)
           = \sum_{j=0}^{k-1} p^j q\,p(n-j-1,k) + p^k q P(n-k-1,k).
(c) The equivalence of (5) and (6) may be checked immediately:

    R(n+1,k) = \sum_{j=0}^{k} p^j q R(n-j,k) + p^{k+1}
             = q R(n,k) + p \sum_{j=0}^{k-1} p^j q R(n-j-1,k) + p^{k+1}
             = q R(n,k) + p \big(R(n,k) - p^k q R(n-k-1,k) - p^{k+1}\big) + p^{k+1}
             = R(n,k) + p^{k+1} q \big(1 - R(n-k-1,k)\big).
(d) A direct proof of the equivalence of (3) and the other formulas requires more complicated calculations. We prove it in a remark in the next section, using the generating function of the distribution.
(e) Theorem 1 is useful in the calculation of the moments. We have

    \mu(n) = \sum_{k=0}^{\infty} P(Z_n > k) = \sum_{k=0}^{n-1} R(n,k).

Some reduction of the calculations gives the following recurrence formula, a consequence of (6):

    \mu(1) = p,
    \mu(n+1) = \mu(n) + p - q \sum_{i=1}^{[n/2]} R(n-i,\,i-1)\,p^i.
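The recurrence of Remark (e) is cheap to evaluate; the sketch below (ours, with invented names) implements it on top of recurrence (6) and can be checked against the direct sum \mu(n) = \sum_{k=0}^{n-1} R(n,k) and against Table 2:

```python
def run_tail(n, k, p):
    """R(n, k) = P(Z_n > k), computed from (4) and (6)."""
    if k < 0:
        return 1.0
    if n <= k:
        return 0.0
    q = 1.0 - p
    R = [0.0] * (n + 1)
    R[k + 1] = p ** (k + 1)
    for m in range(k + 1, n):
        R[m + 1] = R[m] + p ** (k + 1) * q * (1.0 - R[m - k - 1])
    return R[n]


def expected_longest_run(n_max, p):
    """mu[n] = E Z_n for n = 1..n_max via the recurrence of Remark (e):
    mu(1) = p,  mu(n+1) = mu(n) + p - q * sum_{i=1}^{[n/2]} R(n-i, i-1) p^i."""
    q = 1.0 - p
    mu = [0.0] * (n_max + 1)
    if n_max >= 1:
        mu[1] = p
    for n in range(1, n_max):
        s = sum(p ** i * run_tail(n - i, i - 1, p) for i in range(1, n // 2 + 1))
        mu[n + 1] = mu[n] + p - q * s
    return mu
```

The shortcut and the direct sum agree to machine precision, and for p = 1/2 the values reproduce Table 2.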
Proof of Theorem 2. Formula (9) follows from the total probability formula, where the condition is the length of the success-run which initiates the considered trials. Formula (8) clearly holds.
3. Generating functions. Let us introduce the generating functions

    R^*(s,k) = \sum_{n=k+1}^{\infty} s^n R(n,k),   P^*(s,k) = \sum_{n=0}^{\infty} s^n P(n,k),

    p_k^*(s) = \sum_{n=k}^{\infty} s^n p_{nk},   k = 0, 1, ...,

(10)    R^{**}(s,u) = \sum_{n=1}^{\infty} \sum_{k=0}^{n-1} s^n u^k R(n,k),

    \mu^*(s) = \sum_{n=1}^{\infty} s^n \mu(n).
Theorem 3. For k = 0, 1, ..., |u| < 1, |s| < 1 we have

(11)    R^*(s,k) = \frac{p^{k+1} s^{k+1} (1-ps)}{(1-s)(1 - s + p^{k+1} q s^{k+2})},

(12)    P^*(s,k) = \frac{1}{1-s} - R^*(s,k) = \frac{1 - p^{k+1} s^{k+1}}{1 - s + p^{k+1} q s^{k+2}},

(13)    R^{**}(s,u) = \frac{ps(1-ps)}{(1-s)^2} \sum_{j=0}^{\infty} \Big(\frac{-pqs^2}{1-s}\Big)^j \frac{1}{1 - (ps)^{j+1} u},

(14)    \mu^*(s) = R^{**}(s,1),

(15)    p_k^*(s) = \frac{p^k s^k (1-ps)}{1 - s + p^k q s^{k+1}}.
Proof. Multiplying (4) by s^{k+1}, multiplying (6) by s^{n+1} and adding side by side we obtain

    R^*(s,k) = p^{k+1} s^{k+1} + s R^*(s,k) + \frac{p^{k+1} q s^{k+2}}{1-s} - p^{k+1} q s^{k+2} R^*(s,k),

implying that

(16)    R^*(s,k)\,(1 - s + p^{k+1} q s^{k+2}) = \frac{p^{k+1} s^{k+1} (1-ps)}{1-s},   k = 0, 1, ...,

which implies (11) and (12).
Because from (10) we have

    R^{**}(s,u) = \sum_{k=0}^{\infty} u^k \sum_{n=k+1}^{\infty} s^n R(n,k) = \sum_{k=0}^{\infty} u^k R^*(s,k),

multiplying (16) by u^k and adding side by side we obtain

    R^{**}(s,u)(1-s) + pqs^2 R^{**}(s, psu) = \frac{ps(1-ps)}{(1-s)(1-psu)}.

The solution of this equation is (13).
We have

    \mu^*(s) = \sum_{n=1}^{\infty} s^n \sum_{k=0}^{n-1} R(n,k) = R^{**}(s,1),

which proves (14).
Multiplying (8) by s^k, multiplying (9) by s^n and adding side by side we obtain the equation

    p_k^*(s) = p^k s^k + \sum_{j=0}^{k-1} s^{j+1} p^j q\, p_k^*(s),

the solution of which is (15). From (7) we get

    P^*(s,k-1) = \sum_{n=0}^{\infty} s^n P(n,k-1) = \frac{1 - p_k^*(s)}{1-s} = \frac{1 - p^k s^k}{1 - s + p^k q s^{k+1}},

which is equivalent to (12).
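As an independent cross-check (our illustration, not from the paper), the power-series coefficients of the rational function (15) can be extracted by long division in exact rational arithmetic and compared with the first-passage probabilities generated by (8) and (9):

```python
from fractions import Fraction


def series_of_rational(num, den, n_terms):
    """First n_terms power-series coefficients of num(s)/den(s); num and den
    are coefficient lists (constant term first) with den[0] != 0."""
    work = list(num) + [Fraction(0)] * (n_terms + len(den))
    out = []
    for n in range(n_terms):
        c = work[n] / den[0]
        out.append(c)
        for i, d in enumerate(den):
            work[n + i] -= c * d
    return out


p, k, N = Fraction(1, 3), 2, 12
q = 1 - p

# (15): p_k^*(s) = p^k s^k (1 - p s) / (1 - s + p^k q s^{k+1})
num = [Fraction(0)] * k + [p ** k, -p ** (k + 1)]
den = [Fraction(1), Fraction(-1)] + [Fraction(0)] * (k - 1) + [p ** k * q]
coeffs = series_of_rational(num, den, N)

# first-passage probabilities from (8) and (9)
f = [Fraction(0)] * N
f[k] = p ** k
for n in range(k + 1, N):
    f[n] = sum(p ** j * q * f[n - j - 1] for j in range(k))
```

Both lists agree term by term, confirming that (15) generates the probabilities p_{nk}.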
Remark.

(a) Theorem 3 gives a practical possibility of calculating the moments. We have

    \mu^*(s) = R^{**}(s,1) = \frac{1-ps}{1-s} \sum_{k=0}^{\infty} \frac{(ps)^{k+1}}{1 - s + (ps)^{k+1} q s}.

As

    \frac{(ps)^{k+1}}{1 - s + (ps)^{k+1} q s} = \sum_{i=0}^{\infty} (-q)^i s^i \frac{(ps)^{(k+1)(i+1)}}{(1-s)^{i+1}},

expanding the powers of 1/(1-s) into binomial series and collecting the powers of s we get

    \mu^*(s) = (1-ps) \sum_{n=0}^{\infty} b_n s^n,

where, by the identity \sum_{m=0}^{M} \binom{m+i}{i} = \binom{M+i+1}{i+1} (see Feller [4], p. 64, formula (12.8)),

    b_n = \sum_{k=0}^{\infty} \sum_{i=0}^{\infty} \binom{n - (k+1)(i+1) + 1}{i+1} p^{(k+1)(i+1)} (-q)^i,   n = 0, 1, ...,

only the terms with (k+1)(i+1) + i \le n being different from zero. Hence

    \mu(n) = b_n - p\,b_{n-1},   n = 1, 2, ...
(b) Using the generating functions we prove the equivalence of (3) and (6). Formula (3) in terms of P(n,k) has the form

    P(n,k) - P(n,k-1) = p^k q \big(P(n-k-1,k-1) + P(n-k-1,k)\big) + \sum_{j=0}^{n-2k-2} P(j,k-1)\,p^k q^2 P(n-j-k-2,k).

Whence, and from (1), passing to the transforms yields

    P^*(s,k) - P^*(s,k-1) = p^k s^k + p^k q s^{k+1} \big(P^*(s,k) + P^*(s,k-1)\big) + p^k q^2 s^{k+2} P^*(s,k)\,P^*(s,k-1).

The solution of this equation is (12), which is equivalent to (6).
4. Asymptotic distribution of Z_n. The asymptotic distribution of the longest head-run was given by A. Földes [5]. For Bernoulli trials the following estimation was proved in [7].

Theorem 4. If n \to \infty then

    P(Z_n - [\log_{1/p} n] < k) = \exp(-q p^{k - \alpha(n)}) + o(1),   where \alpha(n) = \log_{1/p} n - [\log_{1/p} n].

Table 1 contains selected probability distributions P(Z_n > k) for coin tossing and their estimations. It allows one to verify the accuracy of the estimation along the subsequence n = 2^m, m = 1, 2, ..., for which the convergence can be proved as m \to \infty. Table 2 contains the expected values of Z_n for coin tossing for some n.
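In the form used for the last column of Table 1, the estimation amounts to the tail approximation P(Z_n > k) ≈ 1 - exp(-n q p^{k+1}) (our rearrangement; substituting k - [\log_{1/p} n] - \alpha(n) for the shift gives q p^{k+1-\log_{1/p} n} = n q p^{k+1}). The sketch below (ours, with invented names) compares it with the exact recurrence values:

```python
import math


def run_tail(n, k, p):
    """Exact R(n, k) = P(Z_n > k) from recurrences (4) and (6)."""
    if k < 0:
        return 1.0
    if n <= k:
        return 0.0
    q = 1.0 - p
    R = [0.0] * (n + 1)
    R[k + 1] = p ** (k + 1)
    for m in range(k + 1, n):
        R[m + 1] = R[m] + p ** (k + 1) * q * (1.0 - R[m - k - 1])
    return R[n]


def run_tail_approx(n, k, p):
    """Asymptotic estimate P(Z_n > k) ~ 1 - exp(-n q p^(k+1))."""
    return 1.0 - math.exp(-n * (1.0 - p) * p ** (k + 1))
```

For n = 128, p = 1/2, k = 6 the exact value is 0.390 and the approximation 0.393, reproducing the corresponding entries of Table 1.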
5. Estimation of the expected value. Theorem 4 suggests a formula for the estimation of the expected value \mu(n) for large n. Here we derive this fact from Theorem 3.

Theorem 5. If n \to \infty, then

    \mu(n) = \log_{1/p} n + O(1).
Proof. Let us introduce the function

    \mu(t) = \mu([t]),   t \ge 0,

and its Laplace transform

    \mu^{\oplus}(s) = \int_0^{\infty} \mu(t) e^{-st}\,dt,   \mathrm{Re}\,s > 0.

It is easy to verify that

    \mu^{\oplus}(s) = \frac{1 - e^{-s}}{s}\,\mu^*(e^{-s}) = \frac{1 - p e^{-s}}{s} \sum_{k=1}^{\infty} \frac{(p e^{-s})^k}{1 - e^{-s} + (p e^{-s})^k q e^{-s}}.

For real s > 0 and p = e^{-\lambda}, \lambda > 0, let

    \phi(x) = \frac{1 - p e^{-s}}{s} \cdot \frac{e^{-(s+\lambda)x}}{1 - e^{-s} + q e^{-s} e^{-(s+\lambda)x}},   x > 0,

so that \mu^{\oplus}(s) = \sum_{k=1}^{\infty} \phi(k). Because \phi(x) is decreasing for x > 0,

    \int_k^{k+1} \phi(x)\,dx \le \phi(k) \le \int_{k-1}^{k} \phi(x)\,dx,   k = 1, 2, ...,

hence

    \phi^{\oplus}(s) - \int_0^1 \phi(x)\,dx \le \mu^{\oplus}(s) \le \phi^{\oplus}(s),

where

    \phi^{\oplus}(s) = \int_0^{\infty} \phi(x)\,dx = \frac{1}{s+\lambda} \cdot \frac{1 - p e^{-s}}{s\,q e^{-s}} \log \frac{1 - p e^{-s}}{1 - e^{-s}}

and

    0 \le \phi^{\oplus}(s) - \mu^{\oplus}(s) \le \int_0^1 \phi(x)\,dx \le \phi(0) = \frac{1}{s}.

Let us introduce the function

    \mu_1(u) = \frac{1 - pu}{1 - u} \cdot \frac{1}{qu} \log \frac{1 - pu}{1 - u}.

It is easy to verify that

    \frac{1}{qu} \log \frac{1 - pu}{1 - u} = \frac{1}{qu} \sum_{k=1}^{\infty} \frac{(1 - p^k) u^k}{k} = \sum_{k=1}^{\infty} \frac{1 + p + \cdots + p^{k-1}}{k}\,u^{k-1},

whence, multiplying by (1-pu)/(1-u), \mu_1(u) is the generating function of the sequence

    c_k = \sum_{m=0}^{k} \Big( \frac{1 + p + \cdots + p^m}{m+1} - \frac{p + p^2 + \cdots + p^m}{m} \Big)
        = 1 + \frac{1}{2} + \cdots + \frac{1}{k+1} - \sum_{m=1}^{k} \frac{p + \cdots + p^m}{m(m+1)} = \ln k + O(1)

(the term for m = 0 is equal to 1). Hence

    \phi^{\oplus}(s) = \frac{1 - e^{-s}}{s}\,\mu_1(e^{-s}) \cdot \frac{1}{s + \lambda}

is the Laplace transform of the convolution of C(x) = c_{[x]} and F(x) = e^{-\lambda x}, x > 0:

    \phi_2(t) = \int_0^t C(t-x) e^{-\lambda x}\,dx = \frac{1}{\lambda} \int_0^t \big(1 - e^{-\lambda(t-x)}\big)\,dC(x) + O(1) = \frac{C(t)}{\lambda} + O(1) = \log_{1/p} t + O(1),   t \to \infty.

Since

    \frac{1 - e^{-s}}{s} \cdot \frac{1}{1 - q e^{-s}} = \frac{1 - e^{-s}}{s} \big(1 + q e^{-s} + q^2 e^{-2s} + \cdots\big)

is the Laplace transform of the bounded function

    \Delta(t) = 1 + q + \cdots + q^{[t]} \to \frac{1}{p},   t \to \infty,

while 1/s is the Laplace transform of the constant 1, the above bounds give \phi_2(t) - \mu(t) = O(1), t \to \infty, and this proves Theorem 5.
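Theorem 5 can also be observed numerically (our illustration, not part of the paper): for coin tossing the difference \mu(n) - \log_{1/p} n stays in a narrow band as n grows through powers of 2.

```python
import math


def run_tail(n, k, p):
    """R(n, k) = P(Z_n > k) from recurrences (4) and (6)."""
    if k < 0:
        return 1.0
    if n <= k:
        return 0.0
    q = 1.0 - p
    R = [0.0] * (n + 1)
    R[k + 1] = p ** (k + 1)
    for m in range(k + 1, n):
        R[m + 1] = R[m] + p ** (k + 1) * q * (1.0 - R[m - k - 1])
    return R[n]


def expected_longest_run(n, p):
    """mu(n) = E Z_n = sum_{k=0}^{n-1} R(n, k)  (Remark (e) after Theorem 1)."""
    return sum(run_tail(n, k, p) for k in range(n))


p = 0.5
deviations = [expected_longest_run(n, p) - math.log(n, 1 / p)
              for n in (32, 64, 128, 256, 512)]
```

With the values above the deviations all lie between -0.70 and -0.58, a bounded band consistent with \mu(n) = \log_{1/p} n + O(1).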
Table 1. The distribution function P(Z_n > k) for coin tossing

                                n                       approximation
   k       4      8     16     32     64    128           for 128

   0    0.937  0.996  0.999
   1     .500   .785   .960   .998
   2     .187   .418   .702   .922   .995   .999            .999
   3     .062   .187   .395   .664   .897   .990            .982
   4            .078   .196   .389   .648   .883            .865
   5            .031   .093   .204   .389   .640            .632
   6            .012   .043   .102   .211   .390            .393
   7            .004   .019   .050   .109   .215            .221
   8                   .009   .024   .054   .112            .118
   9                   .004   .011   .028   .057            .061
  10                   .002   .005   .013   .029            .031
  11                   .001   .002   .006   .014            .016
  12                          .001   .003   .007            .008
  13                                 .002   .004            .004
  14                                 .001   .002            .002
  15                                        .001            .001

 EZ_n   1.687  2.512  3.425  4.379  5.356  6.345          6.3328
Table 2. Expected values of Z_n for coin tossing

    n    \mu(n)       n    \mu(n)
    1    0.5000      15    3.3380
    2    1.0000      20    3.7292
    3    1.3750      25    4.0364
    4    1.6875      30    4.2895
    5    1.9375      35    4.5049
    6    2.1563      40    4.6923
    7    2.3437      45    4.8581
    8    2.5117      50    5.0068
    9    2.6621      75    5.5817
   10    2.7988     100    5.9918
References
[1] R. W. R. Darling, M. S. Waterman, Extreme value distribution for the largest cube in a random lattice, SIAM J. Appl. Math. 46, 1 (1986), 118-132.
[2] P. Erdős and A. Rényi, On a new law of large numbers, Journ. Analyse Math. 23 (1970), 103-111.
[3] P. Erdős and P. Révész, On the length of the longest head-run, Topics in Information Theory, Colloquia Math. Soc. J. Bolyai 16, Keszthely (Hungary) 1975, 219-228.
[4] W. Feller, An Introduction to Probability Theory and Its Applications, Vol. I, New York 1968.
[5] A. Földes, The limit distribution of the length of the longest head-run, Periodica Mathematica Hungarica 10, 4 (1979), 301-310.
[6] I. Kopocińska and B. Kopociński, On extreme gap in the renewal processes, Appl. Math. 21, 2 (in preparation).
[7] M. R. Leadbetter, G. Lindgren and H. Rootzén, Extremes and Related Properties of Random Sequences and Processes, New York 1983.