B. Kopociński
Wrocław

On the distribution of the longest success-run in Bernoulli trials

(Received 14.11.1989)
1. Introduction. The success-run in a sequence of Bernoulli trials has been considered in a large number of papers. W. Feller [4] concentrates his attention on the number of runs, which finds its principal application in tests of randomness and tests of homogeneity. Many papers deal with the random variable Z_n introduced by Erdős and Rényi [2] (see also Erdős and Révész [3]), defined as the length of the longest head-run in n coin tossings; they give asymptotic estimates of that random variable as n tends to infinity. The asymptotic estimation of the distribution of Z_n was given by Antónia Földes [5]. Note that in the theory of extremes in random sequences it is proved that for a sequence of geometrically distributed random variables the linearly standardized maximum does not have a limiting distribution (see [7], p. 26). In [1] a multivariate extension of the problem is considered: the largest cube filled up by successes which may be found in a random lattice in a cube of range n. The problem has many practical implications.
Our purpose in this paper is to give recurrence formulas useful in the calculation of the distribution of the random variable Z_n, in the calculation of its expected value, and in testing the accuracy of the limiting estimations.
2. Distribution of Z_n. Suppose that the probability p of success is fixed and 0 < p < 1. Denote q = 1 - p and

    p(n,k) = P(Z_n = k),   P(n,k) = P(Z_n \le k),
    R(n,k) = P(Z_n > k),   \mu(n) = E Z_n,

where n = 0, 1, ..., k = 0, 1, .... In the sequel a sum over an empty set of indices may appear; it is then assumed equal to zero. Note that p(0,0) = 1,

    p(n,k) = 0  for k < 0 or k > n,
    P(n,k) = 0  for k < 0   and   P(n,k) = 1  for k \ge n,
    R(n,k) = 1  for k < 0   and   R(n,k) = 0  for k \ge n.

It is evident that

    P(n,k) = \sum_{i=0}^{k} p(n,i),
    R(n,k) = 1 - P(n,k),
    p(n,k) = P(n,k) - P(n,k-1) = R(n,k-1) - R(n,k).
By [x] we denote the integer part of x. The following theorem gives recurrence formulas for the distribution of the longest success-run in Bernoulli trials.
Theorem 1. For fixed k = 1, 2, ... we have

(1)    p(k,k) = p^k,

(2)    p(n,k) = \sum_{j=0}^{k-1} p^j q\,p(n-j-1,k) + p^k q P(n-k-1,k),   n = k+1, k+2, ...,

(3)    p(n,k) = p^k q P(n-k-1,k) + \sum_{j=0}^{n-2k-2} P(j,k-1)\,p^k q^2 P(n-j-k-2,k) + p^k q P(n-k-1,k-1),

(4)    R(k+1,k) = p^{k+1},

(5)    R(n,k) = \sum_{j=0}^{k} p^j q R(n-j-1,k) + p^{k+1},   n = k+1, k+2, ...,

(6)    R(n+1,k) = R(n,k) + p^{k+1} q (1 - R(n-k-1,k)),   n = k+1, k+2, ...
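The recurrences above translate directly into a short dynamic program. The sketch below (ours, not part of the paper; the function names are invented) computes R(n,k) from the initial condition (4) and recurrence (6), and recovers the probability function from p(n,k) = R(n,k-1) - R(n,k):

```python
def success_run_tail(n, k, p):
    """R(n, k) = P(Z_n > k): probability that the longest success-run in
    n Bernoulli(p) trials exceeds k.  Uses the initial condition (4),
    R(k+1, k) = p^(k+1), and recurrence (6),
    R(n+1, k) = R(n, k) + p^(k+1) q (1 - R(n-k-1, k))."""
    if k < 0:
        return 1.0
    if n <= k:
        return 0.0
    q = 1.0 - p
    R = [0.0] * (n + 1)          # R[m] holds R(m, k); R(m, k) = 0 for m <= k
    R[k + 1] = p ** (k + 1)
    for m in range(k + 1, n):
        R[m + 1] = R[m] + p ** (k + 1) * q * (1.0 - R[m - k - 1])
    return R[n]


def success_run_pmf(n, k, p):
    """p(n, k) = P(Z_n = k) = R(n, k-1) - R(n, k)."""
    return success_run_tail(n, k - 1, p) - success_run_tail(n, k, p)
```

For p = 1/2 this gives success_run_tail(8, 3, 0.5) = 0.1875, in agreement with Table 1 and with the closed form R(2n, n-1) = p^n(1 + nq) noted in Remark (a) below.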
Let N_k, k = 1, 2, ..., denote the number of the trial at which a success-run of length k is realized for the first time. Denote p_{nk} = P(N_k = n), n = 1, 2, ..., and assume that p_{00} = 1. Then

(7)    P(n,k-1) = P(Z_n < k) = P(N_k > n) = 1 - \sum_{i=k}^{n} p_{ik},   n = 1, 2, ...,  k = 1, 2, ...
Theorem 2. For fixed k = 1, 2, ... we have

(8)    p_{kk} = p^k,   p_{nk} = 0  for n < k,

(9)    p_{nk} = \sum_{j=0}^{k-1} p^j q\,p_{n-j-1,k},   n = k+1, k+2, ...
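Formulas (8) and (9) give a similarly direct computation of the distribution of N_k. The following sketch (our illustration; the names are invented) implements the recurrence and verifies identity (7) against a brute-force enumeration over all 2^n outcomes for a small n:

```python
from itertools import product


def first_passage_pmf(k, n_max, p):
    """f[n] = P(N_k = n): probability that a success-run of length k is
    completed for the first time at trial n.  Formula (8) gives the initial
    values; (9) is the recurrence p_{nk} = sum_{j<k} p^j q p_{n-j-1,k}."""
    q = 1.0 - p
    f = [0.0] * (n_max + 1)
    if k <= n_max:
        f[k] = p ** k
    for n in range(k + 1, n_max + 1):
        f[n] = sum(p ** j * q * f[n - j - 1] for j in range(k))
    return f


def brute_force_no_run(n, k, p):
    """P(Z_n < k) by enumerating all 2^n outcomes (small n only)."""
    total = 0.0
    for bits in product((0, 1), repeat=n):
        run = best = 0
        for b in bits:
            run = run + 1 if b else 0
            best = max(best, run)
        if best < k:
            prob = 1.0
            for b in bits:
                prob *= p if b else 1.0 - p
            total += prob
    return total
```

By (7), 1 minus the accumulated first-passage probabilities must equal the enumerated value of P(Z_n < k).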
Proof of Theorem 1. Clearly (1) and (4) hold. We prove formulas (2), (3) and (5) by the total probability formula. For the proof of (2) we calculate the probability of the event {Z_n = k}, conditioning on the length of the success-run which initiates the considered trials. For the proof of (3) we condition on the beginning of the first success-run of length k in the considered trials. For the proof of (5) we calculate the probability of the event {Z_n > k} under the same condition as in the proof of (2).
Now we pass to the proof of (6). Note that the probability of the event {Z_n > k} may be interpreted as the reliability of a coherent system which may be called a "k+1 consecutive among n" structure. This system is in the working state if at least k+1 consecutive elements among 1, 2, ..., n are in the working state. The reliability of the structure can be calculated using its minimal paths. Here it is convenient to introduce notation for the considered trials. Let X_1, X_2, ... denote a sequence of independent 0-1 random variables with P(X_i = 1) = p, P(X_i = 0) = q, i = 1, 2, .... Define the random events

    A_i = \{X_j = 1 for j = i, i+1, ..., i+k\},   i = 1, 2, ...

Then

    R(n,k) = P\Big(\bigcup_{i=1}^{n-k} A_i\Big).
Formula (6) may now be shown inductively. We have

    R(n+1,k) = P\Big(\bigcup_{i=1}^{n-k+1} A_i\Big)
             = P\Big(\bigcup_{i=1}^{n-k} A_i\Big) + P(A_{n-k+1}) - P\Big(\bigcup_{i=1}^{n-k} (A_i \cap A_{n-k+1})\Big)
             = R(n,k) + p^{k+1} - p^{k+1} P\Big(\bigcup_{i=1}^{n-k} B_i\Big),

where

    B_i = \{X_j = 1 for j = i, i+1, ..., \min(i+k,\,n-k)\},   i = 1, 2, ..., n-k,

since A_i \cap A_{n-k+1} = B_i \cap A_{n-k+1} and A_{n-k+1} is independent of every B_i. Because

    \bigcup_{i=1}^{n-k} B_i = \Big(\bigcup_{i=1}^{n-2k-1} A_i\Big) \cup B_{n-k},

and since the random events \bigcup_{i=1}^{n-2k-1} A_i and B_{n-k} = \{X_{n-k} = 1\} are independent,

    P\Big(\bigcup_{i=1}^{n-k} B_i\Big) = R(n-k-1,k) + p - R(n-k-1,k)\,p.

Substituting, R(n+1,k) = R(n,k) + p^{k+1}(1-p)(1 - R(n-k-1,k)) = R(n,k) + p^{k+1} q (1 - R(n-k-1,k)), which gives (6).
Remark.
(a) From (6) we obtain in particular R(2n, n-1) = p^n (1 + nq), which for p = 1/2 was given in [3], Lemma 1.
(b) The equivalence of (2) and (5) may be checked immediately:

    p(n,k) = R(n,k-1) - R(n,k)
           = \sum_{j=0}^{k-1} p^j q R(n-j-1,k-1) + p^k - \sum_{j=0}^{k} p^j q R(n-j-1,k) - p^{k+1}
           = \sum_{j=0}^{k-1} p^j q \big(R(n-j-1,k-1) - R(n-j-1,k)\big) - p^k q R(n-k-1,k) + p^k q
           = \sum_{j=0}^{k-1} p^j q\,p(n-j-1,k) + p^k q \big(1 - R(n-k-1,k)\big)
           = \sum_{j=0}^{k-1} p^j q\,p(n-j-1,k) + p^k q P(n-k-1,k).
(c) The equivalence of (5) and (6) may be checked immediately:

    R(n+1,k) = \sum_{j=0}^{k} p^j q R(n-j,k) + p^{k+1}
             = q R(n,k) + p \sum_{j=0}^{k-1} p^j q R(n-j-1,k) + p^{k+1}
             = q R(n,k) + p \big(R(n,k) - p^k q R(n-k-1,k) - p^{k+1}\big) + p^{k+1}
             = R(n,k) + p^{k+1} q \big(1 - R(n-k-1,k)\big).
(d) A direct proof of the equivalence of (3) and the other formulas requires more complicated calculations. We prove it in a remark in the next section, using the generating function of the distribution.
(e) Theorem 1 is useful in the calculation of the moments. We have

    \mu(n) = \sum_{k=0}^{\infty} P(Z_n > k) = \sum_{k=0}^{n-1} R(n,k).

Some reduction of the calculations gives the following recurrence formula, a consequence of (6):

    \mu(1) = p,
    \mu(n+1) = \mu(n) + p - q \sum_{i=1}^{[n/2]} R(n-i,\,i-1)\,p^i.
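The recurrence of Remark (e) is cheap to evaluate; the sketch below (ours, with invented names) implements it on top of recurrence (6) and can be checked against the direct sum \mu(n) = \sum_{k=0}^{n-1} R(n,k) and against Table 2:

```python
def run_tail(n, k, p):
    """R(n, k) = P(Z_n > k), computed from (4) and (6)."""
    if k < 0:
        return 1.0
    if n <= k:
        return 0.0
    q = 1.0 - p
    R = [0.0] * (n + 1)
    R[k + 1] = p ** (k + 1)
    for m in range(k + 1, n):
        R[m + 1] = R[m] + p ** (k + 1) * q * (1.0 - R[m - k - 1])
    return R[n]


def expected_longest_run(n_max, p):
    """mu[n] = E Z_n for n = 1..n_max via the recurrence of Remark (e):
    mu(1) = p,  mu(n+1) = mu(n) + p - q * sum_{i=1}^{[n/2]} R(n-i, i-1) p^i."""
    q = 1.0 - p
    mu = [0.0] * (n_max + 1)
    if n_max >= 1:
        mu[1] = p
    for n in range(1, n_max):
        s = sum(p ** i * run_tail(n - i, i - 1, p) for i in range(1, n // 2 + 1))
        mu[n + 1] = mu[n] + p - q * s
    return mu
```

The shortcut and the direct sum agree to machine precision, and for p = 1/2 the values reproduce Table 2.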
Proof of Theorem 2. Formula (9) follows from the total probability formula, where the condition is the length of the success-run which initiates the considered trials. Formula (8) clearly holds.
3. Generating functions. Let us introduce the generating functions

    R^*(s,k) = \sum_{n=k+1}^{\infty} s^n R(n,k),   P^*(s,k) = \sum_{n=0}^{\infty} s^n P(n,k),

    p_k^*(s) = \sum_{n=k}^{\infty} s^n p_{nk},   k = 0, 1, ...,

(10)    R^{**}(s,u) = \sum_{n=1}^{\infty} \sum_{k=0}^{n-1} s^n u^k R(n,k),

    \mu^*(s) = \sum_{n=1}^{\infty} s^n \mu(n).
Theorem 3. For k = 0, 1, ..., |u| < 1, |s| < 1 we have

(11)    R^*(s,k) = \frac{p^{k+1} s^{k+1} (1-ps)}{(1-s)(1 - s + p^{k+1} q s^{k+2})},

(12)    P^*(s,k) = \frac{1}{1-s} - R^*(s,k) = \frac{1 - p^{k+1} s^{k+1}}{1 - s + p^{k+1} q s^{k+2}},

(13)    R^{**}(s,u) = \frac{ps(1-ps)}{(1-s)^2} \sum_{j=0}^{\infty} \Big(\frac{-pqs^2}{1-s}\Big)^j \frac{1}{1 - (ps)^{j+1} u},

(14)    \mu^*(s) = R^{**}(s,1),

(15)    p_k^*(s) = \frac{p^k s^k (1-ps)}{1 - s + p^k q s^{k+1}}.
Proof. Multiplying (4) by s^{k+1}, multiplying (6) by s^{n+1} and adding side by side we obtain

    R^*(s,k) = p^{k+1} s^{k+1} + s R^*(s,k) + \frac{p^{k+1} q s^{k+2}}{1-s} - p^{k+1} q s^{k+2} R^*(s,k),

implying that

(16)    R^*(s,k)\,(1 - s + p^{k+1} q s^{k+2}) = \frac{p^{k+1} s^{k+1} (1-ps)}{1-s},   k = 0, 1, ...,

which implies (11) and (12).
Because from (10) we have

    R^{**}(s,u) = \sum_{k=0}^{\infty} u^k \sum_{n=k+1}^{\infty} s^n R(n,k) = \sum_{k=0}^{\infty} u^k R^*(s,k),

multiplying (16) by u^k and adding side by side we obtain

    R^{**}(s,u)(1-s) + pqs^2 R^{**}(s, psu) = \frac{ps(1-ps)}{(1-s)(1-psu)}.

The solution of this equation is (13).
We have

    \mu^*(s) = \sum_{n=1}^{\infty} s^n \sum_{k=0}^{n-1} R(n,k) = R^{**}(s,1),

which proves (14).
Multiplying (8) by s^k, multiplying (9) by s^n and adding side by side we obtain the equation

    p_k^*(s) = p^k s^k + \sum_{j=0}^{k-1} s^{j+1} p^j q\, p_k^*(s),

the solution of which is (15). From (7) we get

    P^*(s,k-1) = \sum_{n=0}^{\infty} s^n P(n,k-1) = \frac{1 - p_k^*(s)}{1-s} = \frac{1 - p^k s^k}{1 - s + p^k q s^{k+1}},

which is equivalent to (12).
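As an independent cross-check (our illustration, not from the paper), the power-series coefficients of the rational function (15) can be extracted by long division in exact rational arithmetic and compared with the first-passage probabilities generated by (8) and (9):

```python
from fractions import Fraction


def series_of_rational(num, den, n_terms):
    """First n_terms power-series coefficients of num(s)/den(s); num and den
    are coefficient lists (constant term first) with den[0] != 0."""
    work = list(num) + [Fraction(0)] * (n_terms + len(den))
    out = []
    for n in range(n_terms):
        c = work[n] / den[0]
        out.append(c)
        for i, d in enumerate(den):
            work[n + i] -= c * d
    return out


p, k, N = Fraction(1, 3), 2, 12
q = 1 - p

# (15): p_k^*(s) = p^k s^k (1 - p s) / (1 - s + p^k q s^{k+1})
num = [Fraction(0)] * k + [p ** k, -p ** (k + 1)]
den = [Fraction(1), Fraction(-1)] + [Fraction(0)] * (k - 1) + [p ** k * q]
coeffs = series_of_rational(num, den, N)

# first-passage probabilities from (8) and (9)
f = [Fraction(0)] * N
f[k] = p ** k
for n in range(k + 1, N):
    f[n] = sum(p ** j * q * f[n - j - 1] for j in range(k))
```

Both lists agree term by term, confirming that (15) generates the probabilities p_{nk}.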
Remark.

(a) Theorem 3 gives a practical possibility of calculating the moments. We have

    \mu^*(s) = R^{**}(s,1) = \frac{1-ps}{1-s} \sum_{k=0}^{\infty} \frac{(ps)^{k+1}}{1 - s + (ps)^{k+1} q s}.

As

    \frac{(ps)^{k+1}}{1 - s + (ps)^{k+1} q s} = \sum_{i=0}^{\infty} (-q)^i s^i \frac{(ps)^{(k+1)(i+1)}}{(1-s)^{i+1}},

expanding the powers of 1/(1-s) into binomial series and collecting the powers of s we get

    \mu^*(s) = (1-ps) \sum_{n=0}^{\infty} b_n s^n,

where, by the identity \sum_{m=0}^{M} \binom{m+i}{i} = \binom{M+i+1}{i+1} (see Feller [4], p. 64, formula (12.8)),

    b_n = \sum_{k=0}^{\infty} \sum_{i=0}^{\infty} \binom{n - (k+1)(i+1) + 1}{i+1} p^{(k+1)(i+1)} (-q)^i,   n = 0, 1, ...,

only the terms with (k+1)(i+1) + i \le n being different from zero. Hence

    \mu(n) = b_n - p\,b_{n-1},   n = 1, 2, ...
(b) Using the generating functions we prove the equivalence of (3) and (6). Formula (3) in terms of P(n,k) has the form

    P(n,k) - P(n,k-1) = p^k q \big(P(n-k-1,k-1) + P(n-k-1,k)\big) + \sum_{j=0}^{n-2k-2} P(j,k-1)\,p^k q^2 P(n-j-k-2,k).

Whence, and from (1), passing to the transforms yields

    P^*(s,k) - P^*(s,k-1) = p^k s^k + p^k q s^{k+1} \big(P^*(s,k) + P^*(s,k-1)\big) + p^k q^2 s^{k+2} P^*(s,k)\,P^*(s,k-1).

The solution of this equation is (12), which is equivalent to (6).
4. Asymptotic distribution of Z_n. The asymptotic distribution of the longest head-run was given by A. Földes [5]. For Bernoulli trials the following estimation was proved in [7].

Theorem 4. If n \to \infty then

    P(Z_n - [\log_{1/p} n] < k) = \exp(-q p^{k - \alpha(n)}) + o(1),   where \alpha(n) = \log_{1/p} n - [\log_{1/p} n].

Table 1 contains selected probability distributions P(Z_n > k) for coin tossing and their estimations. It allows one to verify the accuracy of the estimation along the subsequence n = 2^m, m = 1, 2, ..., for which the convergence can be proved as m \to \infty. Table 2 contains the expected values of Z_n for coin tossing for some n.
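In the form used for the last column of Table 1, the estimation amounts to the tail approximation P(Z_n > k) ≈ 1 - exp(-n q p^{k+1}) (our rearrangement; substituting k - [\log_{1/p} n] - \alpha(n) for the shift gives q p^{k+1-\log_{1/p} n} = n q p^{k+1}). The sketch below (ours, with invented names) compares it with the exact recurrence values:

```python
import math


def run_tail(n, k, p):
    """Exact R(n, k) = P(Z_n > k) from recurrences (4) and (6)."""
    if k < 0:
        return 1.0
    if n <= k:
        return 0.0
    q = 1.0 - p
    R = [0.0] * (n + 1)
    R[k + 1] = p ** (k + 1)
    for m in range(k + 1, n):
        R[m + 1] = R[m] + p ** (k + 1) * q * (1.0 - R[m - k - 1])
    return R[n]


def run_tail_approx(n, k, p):
    """Asymptotic estimate P(Z_n > k) ~ 1 - exp(-n q p^(k+1))."""
    return 1.0 - math.exp(-n * (1.0 - p) * p ** (k + 1))
```

For n = 128, p = 1/2, k = 6 the exact value is 0.390 and the approximation 0.393, reproducing the corresponding entries of Table 1.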
5. Estimation of the expected value. Theorem 4 suggests a formula for the estimation of the expected value \mu(n) for large n. Here we derive this fact from Theorem 3.

Theorem 5. If n \to \infty, then

    \mu(n) = \log_{1/p} n + O(1).
Proof. Let us introduce the function

    \mu(t) = \mu([t]),   t \ge 0,

and its Laplace transform

    \mu^{\oplus}(s) = \int_0^{\infty} \mu(t) e^{-st}\,dt,   \mathrm{Re}\,s > 0.

It is easy to verify that

    \mu^{\oplus}(s) = \frac{1 - e^{-s}}{s}\,\mu^*(e^{-s}) = \frac{1 - p e^{-s}}{s} \sum_{k=1}^{\infty} \frac{(p e^{-s})^k}{1 - e^{-s} + (p e^{-s})^k q e^{-s}}.

For real s > 0 and p = e^{-\lambda}, \lambda > 0, let

    \phi(x) = \frac{1 - p e^{-s}}{s} \cdot \frac{e^{-(s+\lambda)x}}{1 - e^{-s} + q e^{-s} e^{-(s+\lambda)x}},   x > 0,

so that \mu^{\oplus}(s) = \sum_{k=1}^{\infty} \phi(k). Because \phi(x) is decreasing for x > 0,

    \int_k^{k+1} \phi(x)\,dx \le \phi(k) \le \int_{k-1}^{k} \phi(x)\,dx,   k = 1, 2, ...,

hence

    \phi^{\oplus}(s) - \int_0^1 \phi(x)\,dx \le \mu^{\oplus}(s) \le \phi^{\oplus}(s),

where

    \phi^{\oplus}(s) = \int_0^{\infty} \phi(x)\,dx = \frac{1}{s+\lambda} \cdot \frac{1 - p e^{-s}}{s\,q e^{-s}} \log \frac{1 - p e^{-s}}{1 - e^{-s}}

and

    0 \le \phi^{\oplus}(s) - \mu^{\oplus}(s) \le \int_0^1 \phi(x)\,dx \le \phi(0) = \frac{1}{s}.

Let us introduce the function

    \mu_1(u) = \frac{1 - pu}{1 - u} \cdot \frac{1}{qu} \log \frac{1 - pu}{1 - u}.

It is easy to verify that

    \frac{1}{qu} \log \frac{1 - pu}{1 - u} = \frac{1}{qu} \sum_{k=1}^{\infty} \frac{(1 - p^k) u^k}{k} = \sum_{k=1}^{\infty} \frac{1 + p + \cdots + p^{k-1}}{k}\,u^{k-1},

whence, multiplying by (1-pu)/(1-u), \mu_1(u) is the generating function of the sequence

    c_k = \sum_{m=0}^{k} \Big( \frac{1 + p + \cdots + p^m}{m+1} - \frac{p + p^2 + \cdots + p^m}{m} \Big)
        = 1 + \frac{1}{2} + \cdots + \frac{1}{k+1} - \sum_{m=1}^{k} \frac{p + \cdots + p^m}{m(m+1)} = \ln k + O(1)

(the term for m = 0 is equal to 1). Hence

    \phi^{\oplus}(s) = \frac{1 - e^{-s}}{s}\,\mu_1(e^{-s}) \cdot \frac{1}{s + \lambda}

is the Laplace transform of the convolution of C(x) = c_{[x]} and F(x) = e^{-\lambda x}, x > 0:

    \phi_2(t) = \int_0^t C(t-x) e^{-\lambda x}\,dx = \frac{1}{\lambda} \int_0^t \big(1 - e^{-\lambda(t-x)}\big)\,dC(x) + O(1) = \frac{C(t)}{\lambda} + O(1) = \log_{1/p} t + O(1),   t \to \infty.

Since

    \frac{1 - e^{-s}}{s} \cdot \frac{1}{1 - q e^{-s}} = \frac{1 - e^{-s}}{s} \big(1 + q e^{-s} + q^2 e^{-2s} + \cdots\big)

is the Laplace transform of the bounded function

    \Delta(t) = 1 + q + \cdots + q^{[t]} \to \frac{1}{p},   t \to \infty,

while 1/s is the Laplace transform of the constant 1, the above bounds give \phi_2(t) - \mu(t) = O(1), t \to \infty, and this proves Theorem 5.
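Theorem 5 can also be observed numerically (our illustration, not part of the paper): for coin tossing the difference \mu(n) - \log_{1/p} n stays in a narrow band as n grows through powers of 2.

```python
import math


def run_tail(n, k, p):
    """R(n, k) = P(Z_n > k) from recurrences (4) and (6)."""
    if k < 0:
        return 1.0
    if n <= k:
        return 0.0
    q = 1.0 - p
    R = [0.0] * (n + 1)
    R[k + 1] = p ** (k + 1)
    for m in range(k + 1, n):
        R[m + 1] = R[m] + p ** (k + 1) * q * (1.0 - R[m - k - 1])
    return R[n]


def expected_longest_run(n, p):
    """mu(n) = E Z_n = sum_{k=0}^{n-1} R(n, k)  (Remark (e) after Theorem 1)."""
    return sum(run_tail(n, k, p) for k in range(n))


p = 0.5
deviations = [expected_longest_run(n, p) - math.log(n, 1 / p)
              for n in (32, 64, 128, 256, 512)]
```

With the values above the deviations all lie between -0.70 and -0.58, a bounded band consistent with \mu(n) = \log_{1/p} n + O(1).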
Table 1. The distribution function P(Z_n > k) for coin tossing

                                n                       approximation
   k       4      8     16     32     64    128           for 128

   0    0.937  0.996  0.999
   1     .500   .785   .960   .998
   2     .187   .418   .702   .922   .995   .999            .999
   3     .062   .187   .395   .664   .897   .990            .982
   4            .078   .196   .389   .648   .883            .865
   5            .031   .093   .204   .389   .640            .632
   6            .012   .043   .102   .211   .390            .393
   7            .004   .019   .050   .109   .215            .221
   8                   .009   .024   .054   .112            .118
   9                   .004   .011   .028   .057            .061
  10                   .002   .005   .013   .029            .031
  11                   .001   .002   .006   .014            .016
  12                          .001   .003   .007            .008
  13                                 .002   .004            .004
  14                                 .001   .002            .002
  15                                        .001            .001

 EZ_n   1.687  2.512  3.425  4.379  5.356  6.345          6.3328
Table 2. Expected values of Z_n for coin tossing

    n    \mu(n)       n    \mu(n)
    1    0.5000      15    3.3380
    2    1.0000      20    3.7292
    3    1.3750      25    4.0364
    4    1.6875      30    4.2895
    5    1.9375      35    4.5049
    6    2.1563      40    4.6923
    7    2.3437      45    4.8581
    8    2.5117      50    5.0068
    9    2.6621      75    5.5817
   10    2.7988     100    5.9918
References
[1] R. W. R. Darling, M. S. Waterman, Extreme value distribution for the largest cube in a random lattice, SIAM J. Appl. Math. 46, 1 (1986), 118-132.
[2] P. Erdős and A. Rényi, On a new law of large numbers, Journ. Analyse Math. 23 (1970), 103-111.
[3] P. Erdős and P. Révész, On the length of the longest head-run, Topics in Information Theory, Colloquia Math. Soc. J. Bolyai 16, Keszthely (Hungary) 1975, 219-228.
[4] W. Feller, An Introduction to Probability Theory and Its Applications, Vol. I, New York 1968.
[5] A. Földes, The limit distribution of the length of the longest head-run, Periodica Mathematica Hungarica 10, 4 (1979), 301-310.
[6] I. Kopocińska and B. Kopociński, On extreme gap in the renewal processes, Appl. Math. 21, 2 (in preparation).
[7] M. R. Leadbetter, G. Lindgren and H. Rootzén, Extremes and Related Properties of Random Sequences and Processes, New York 1983.