
ANNALES

UNIVERSITATIS MARIAE CURIE-SKŁODOWSKA LUBLIN-POLONIA

VOL. XXVII, 8 SECTIO A 1973

Katedra Zastosowań Matematyki, Akademia Rolnicza, Lublin

HENRYK MIKOS

Variance Component Estimation in the Unbalanced N-way Nested Classification

Estymacja komponentów wariancyjnych w niezrównoważonej N-krotnej klasyfikacji hierarchicznej

Оценка компонент дисперсии по несбалансированным данным N-факторной иерархической классификации

Introduction. Estimates of the variance components for the unbalanced nested classification obtained by analytical methods are given in the papers of Gates and Shiue [3], Gower [5], Oktaba [11], Ahrens [1] and Gaylor and Hartwell [4]. Matrix methods for obtaining the variance component estimates are shown in the papers of Searle [13] and Mahamunulu [9]. This paper gives the estimates of the variance components for the unbalanced N-way nested classification obtained with the use of the properties of linear spaces.

Model and analysis of variance. The linear model for an observation $y_{i_1 i_2 \ldots i_{N+1}}$ is taken as

(1)  $y_{i_1 i_2 \ldots i_N i_{N+1}} = \mu + \alpha_{i_1} + \alpha_{i_1 i_2} + \cdots + \alpha_{i_1 i_2 \ldots i_N} + e_{i_1 i_2 \ldots i_N i_{N+1}},$

where $\mu$ is the general mean, $\alpha_{i_1}$ is the effect due to the $i_1$-th first stage class $A^1_{i_1}$, $\alpha_{i_1 i_2}$ is the effect due to the $i_2$-th second stage class $A^2_{i_1 i_2}$ within $A^1_{i_1}$, ..., $\alpha_{i_1 i_2 \ldots i_N}$ is the effect due to the $i_N$-th $N$-th stage class $A^N_{i_1 i_2 \ldots i_N}$ within $A^{N-1}_{i_1 i_2 \ldots i_{N-1}}$, and $e_{i_1 i_2 \ldots i_N i_{N+1}}$ is the residual error of the observation $y_{i_1 i_2 \ldots i_{N+1}}$. We assume that the number of the first stage classes $A^1_{i_1}$ is $a^1$, so that $i_1 = 1, 2, \ldots, a^1$. Within each $A^1_{i_1}$-class there are $a^2_{i_1}$ $A^2$-classes, so that $i_2 = 1, 2, \ldots, a^2_{i_1}$. Furthermore, within each $A^{p-1}_{i_1 i_2 \ldots i_{p-1}}$-class $(p = 2, 3, \ldots, N)$ there are $a^p_{i_1 i_2 \ldots i_{p-1}}$ $A^p$-classes, so that $i_p = 1, 2, \ldots, a^p_{i_1 i_2 \ldots i_{p-1}}$.

The number of observations in the $N$-th stage class $A^N_{i_1 i_2 \ldots i_N}$ is $n_{i_1 i_2 \ldots i_N}$, where $n_{i_1 i_2 \ldots i_N} > 0$. All terms of the model (except $\mu$) are assumed to be independent and normally distributed random variables with zero means and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2$ and $\sigma_e^2$, respectively. These are the variance components which are to be estimated.


Let $y$ be the column vector with the elements $y_{i_1 i_2 \ldots i_{N+1}}$, $a_p$ the column vector with the elements $\alpha_{i_1 i_2 \ldots i_p}$, and $e$ the column vector the elements of which are the residual errors $e_{i_1 i_2 \ldots i_{N+1}}$. Let furthermore $X_p$ $(p = 1, 2, \ldots, N)$ denote the $n \times a_p$ matrix, where

(2)  $n = \sum_{i_1} \sum_{i_2} \cdots \sum_{i_N} n_{i_1 i_2 \ldots i_N}, \qquad a_p = \sum_{i_1} \sum_{i_2} \cdots \sum_{i_{p-1}} a^p_{i_1 i_2 \ldots i_{p-1}},$

in which the element of the $q$-th row and the $(i_1, i_2, \ldots, i_p)$-th column is either zero or one: one if the $q$-th observation is in the $A^p_{i_1 i_2 \ldots i_p}$ class of the classification $A_p$, and zero otherwise.
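As a concrete illustration of the construction of the matrices $X_p$, the following minimal sketch (not from the paper; the small $N = 2$ layout with seven observations, the variable names and the helper `indicator_matrix` are all illustrative) builds the 0-1 matrices $X_1$ and $X_2$ from the stage labels of the observations.

```python
# Minimal sketch (illustrative data, N = 2): building the indicator matrices X_p.
import numpy as np

# Stage labels (i1, i2) of each observation; the class sizes are deliberately unequal.
labels = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 1), (2, 2)]
n = len(labels)

def indicator_matrix(labels, depth):
    """0-1 matrix whose (q, c) entry is 1 iff observation q lies in the
    c-th class of the classification A_depth (classes ordered lexicographically)."""
    classes = sorted(set(lab[:depth] for lab in labels))
    X = np.zeros((len(labels), len(classes)))
    for q, lab in enumerate(labels):
        X[q, classes.index(lab[:depth])] = 1.0
    return X

X1 = indicator_matrix(labels, 1)   # n x a_1 matrix (here 7 x 2)
X2 = indicator_matrix(labels, 2)   # n x a_2 matrix (here 7 x 4)
# Nesting: every column of X1 is a sum of columns of X2, so R[X1] is contained in R[X2].
print(X1.shape, X2.shape)
```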

Now we can express the linear model (1) as follows:

(3)  $y = J_n \mu + \sum_{p=1}^{N} X_p a_p + e,$

where $J_n' = [1, 1, \ldots, 1]$. It is easy to see that

(4)  $R[J_n] \subset R[X_1] \subset R[X_2] \subset \cdots \subset R[X_N],$

where $R[X]$ denotes the range space of the matrix $X$, i.e. the space spanned by the column vectors of $X$. It follows from this that each observation which belongs to the class $A^p_{i_1 i_2 \ldots i_p}$ of the classification $A_p$ belongs also to the class $A^{p-1}_{i_1 i_2 \ldots i_{p-1}}$ of the classification $A_{p-1}$.

It is easy to see that the random vectors $a_p$ $(p = 1, 2, \ldots, N)$ are distributed as $N[0, \sigma_p^2 I_{a_p}]$ and the random vector $e$ is distributed as $N[0, \sigma_e^2 I_n]$. Thus we have the following covariance matrix of the vector $y$ (c.f. [12]):

(5)  $\Sigma_y = \sum_{p=1}^{N} \sigma_p^2 X_p X_p' + \sigma_e^2 I_n.$

In the customary analysis of variance there are the following sums of squares (c.f. [10]):

(6)  $SS_1 = \sum_{i_1} \sum_{i_2} \cdots \sum_{i_{N+1}} (\bar y_{i_1} - \bar y)^2, \qquad SS_t = \sum_{i_1} \sum_{i_2} \cdots \sum_{i_{N+1}} (\bar y_{i_1 i_2 \ldots i_t} - \bar y_{i_1 i_2 \ldots i_{t-1}})^2 \quad (t = 2, 3, \ldots, N)$

and

$SS_e = \sum_{i_1} \sum_{i_2} \cdots \sum_{i_{N+1}} (y_{i_1 i_2 \ldots i_{N+1}} - \bar y_{i_1 i_2 \ldots i_N})^2,$

where

(7)  $\bar y_{i_1 i_2 \ldots i_t} = \frac{1}{n_{i_1 i_2 \ldots i_t}} \sum_{i_{t+1}} \sum_{i_{t+2}} \cdots \sum_{i_{N+1}} y_{i_1 i_2 \ldots i_{N+1}} \quad (t = 1, 2, \ldots, N),$

(8)  $\bar y = \frac{1}{n} \sum_{i_1} \sum_{i_2} \cdots \sum_{i_{N+1}} y_{i_1 i_2 \ldots i_{N+1}}$

and

(9)  $n_{i_1 i_2 \ldots i_t} = \sum_{i_{t+1}} \sum_{i_{t+2}} \cdots \sum_{i_N} n_{i_1 i_2 \ldots i_N}.$

It can be shown that the sums of squares (6) can be written in the following form:

(10)  $SS_t = y'(P[X_t] - P[X_{t-1}])y \quad (t = 1, 2, \ldots, N), \qquad SS_e = y'(I - P[X_N])y,$

where, if $t = 1$, $X_0$ should be replaced by $J_n$, and $P[X_t]$ denotes the orthogonal projection operator onto the range space of the matrix $X_t$ (c.f. [14]). It is worth noticing that from the relationship (4) it follows immediately that

(11)  $(I - P[X_N])(P[X_t] - P[X_{t-1}]) = 0 \quad (t = 1, 2, \ldots, N)$

and

(12)  $(P[X_t] - P[X_{t-1}])(P[X_r] - P[X_{r-1}]) = 0 \quad (r \neq t;\ r, t = 1, \ldots, N).$
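The equivalence of the classical sums of squares (6) and the projection form (10), together with the identities (11) and (12), can be checked numerically. The sketch below reuses the hypothetical $N = 2$ layout from the earlier sketch; `proj` and `class_means` are illustrative helpers, not part of the paper.

```python
# Sketch: sums of squares from class means (6)-(8) versus the projection form (10).
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
labels = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 1), (2, 2)]
n = len(labels)
y = rng.normal(size=n)

def class_means(depth):
    """For every observation, the mean of its class at the given stage."""
    groups = defaultdict(list)
    for q, lab in enumerate(labels):
        groups[lab[:depth]].append(y[q])
    return np.array([np.mean(groups[lab[:depth]]) for lab in labels])

def indicator(depth):
    classes = sorted(set(lab[:depth] for lab in labels))
    X = np.zeros((n, len(classes)))
    for q, lab in enumerate(labels):
        X[q, classes.index(lab[:depth])] = 1.0
    return X

def proj(X):
    """Orthogonal projector onto the range space R[X]."""
    return X @ np.linalg.pinv(X)

m0, m1, m2 = np.full(n, y.mean()), class_means(1), class_means(2)
SS1, SS2, SSe = np.sum((m1 - m0) ** 2), np.sum((m2 - m1) ** 2), np.sum((y - m2) ** 2)

P0, P1, P2 = np.full((n, n), 1 / n), proj(indicator(1)), proj(indicator(2))
assert np.isclose(SS1, y @ (P1 - P0) @ y)             # (10), t = 1
assert np.isclose(SS2, y @ (P2 - P1) @ y)             # (10), t = 2
assert np.isclose(SSe, y @ (np.eye(n) - P2) @ y)      # (10), error term
assert np.allclose((np.eye(n) - P2) @ (P1 - P0), 0)   # (11)
assert np.allclose((P2 - P1) @ (P1 - P0), 0)          # (12)
```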

Estimation of variance components. To obtain the unbiased estimators of the variance components $\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2, \sigma_e^2$ we must have the expected values of the sums of squares (10). According to the formula 2.1.24 in [1] we get

(13)  $E[SS_e] = \sum_{p=1}^{N} \sigma_p^2\, \mathrm{tr}[(I - P[X_N]) X_p X_p'] + \sigma_e^2\, \mathrm{tr}[I - P[X_N]],$

$E[SS_t] = \sum_{p=1}^{N} \sigma_p^2\, \mathrm{tr}[(P[X_t] - P[X_{t-1}]) X_p X_p'] + \sigma_e^2\, \mathrm{tr}[P[X_t] - P[X_{t-1}]] \quad (t = 1, 2, \ldots, N),$

where $\mathrm{tr}[A]$ denotes the trace of the matrix $A$. Let

$k_{rs} = \mathrm{tr}[P[X_r] X_s X_s'] \quad (r, s = 0, 1, 2, \ldots, N),$

with the convention that $X_0 = J_n$.


Then, since the range space of the matrix $X_p$ and the range space of the matrix $P[X_t] - P[X_{t-1}]$ are orthogonal if $p < t$, we have

(14)  $E[SS_e] = (n - a_N)\,\sigma_e^2,$

$E[SS_t] = (a_t - a_{t-1})\,\sigma_e^2 + \sum_{p=t}^{N} (k_{tp} - k_{t-1,p})\,\sigma_p^2 \quad (t = 1, 2, \ldots, N),$

where $a_t$ denotes the total number of the $t$-th stage classes defined in (2), with $a_0 = 1$.

Now we will find the coefficients $k_{rs}$. If $r = s = p$ $(p = 1, 2, \ldots, N)$ we get

(15)  $k_{pp} = \mathrm{tr}[P[X_p] X_p X_p'] = \mathrm{tr}[X_p X_p'] = \mathrm{tr}[X_p' X_p] = \sum_{i_1} \sum_{i_2} \cdots \sum_{i_p} n_{i_1 i_2 \ldots i_p} = n.$

Similarly

(16)  $k_{0p} = \mathrm{tr}[P[J_n] X_p X_p'] = \frac{1}{n}\, \mathrm{tr}[(J_n' X_p)'(J_n' X_p)] = \frac{1}{n} \sum_{i_1} \sum_{i_2} \cdots \sum_{i_p} n^2_{i_1 i_2 \ldots i_p}.$

In an analogous way we can obtain

(17)  $k_{pr} = \sum_{i_1} \sum_{i_2} \cdots \sum_{i_r} \frac{n^2_{i_1 i_2 \ldots i_r}}{n_{i_1 i_2 \ldots i_p}} \quad (p < r;\ p, r = 1, 2, \ldots, N).$
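The coefficients $k_{pr}$ can be obtained either from the trace definition or from the counting formulas (16)-(17); a small numerical comparison (same illustrative $N = 2$ layout; `k_trace` and `k_formula` are hypothetical helper names) is sketched below.

```python
# Sketch: k_{pr} as tr(P[X_p] X_r X_r') versus the counting formula (16)-(17).
import numpy as np
from collections import Counter

labels = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 1), (2, 2)]

def indicator_matrix(labels, depth):
    classes = sorted(set(lab[:depth] for lab in labels))
    X = np.zeros((len(labels), len(classes)))
    for q, lab in enumerate(labels):
        X[q, classes.index(lab[:depth])] = 1.0
    return X

def k_trace(p, r):
    """k_{pr} = tr(P[X_p] X_r X_r'); depth 0 gives X_0 = J_n, a single column of ones."""
    Xp, Xr = indicator_matrix(labels, p), indicator_matrix(labels, r)
    Pp = Xp @ np.linalg.pinv(Xp)
    return np.trace(Pp @ Xr @ Xr.T)

def k_formula(p, r):
    """k_{pr} = sum over r-th stage classes of n^2_{i1..ir} / n_{i1..ip}  (p <= r)."""
    n_r = Counter(lab[:r] for lab in labels)   # class sizes at stage r
    n_p = Counter(lab[:p] for lab in labels)   # class sizes at stage p (stage 0: whole sample)
    return sum(c ** 2 / n_p[cls[:p]] for cls, c in n_r.items())

for p in (0, 1, 2):
    for r in range(max(p, 1), 3):
        assert np.isclose(k_trace(p, r), k_formula(p, r))
```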

Henderson's first method (c.f. [8]) for estimating the variance components is to equate each of the sums of squares $SS_1, SS_2, \ldots, SS_N, SS_e$ to its expected value. Denoting the resulting estimates by $\hat\sigma_1^2, \hat\sigma_2^2, \ldots, \hat\sigma_N^2$ and $\hat\sigma_e^2$, the equations for obtaining them are

(18)  $SS_t = \sum_{p=t}^{N} (k_{tp} - k_{t-1,p})\,\hat\sigma_p^2 + (a_t - a_{t-1})\,\hat\sigma_e^2 \quad (t = 1, 2, \ldots, N),$

$SS_e = (n - a_N)\,\hat\sigma_e^2.$

The equations (18) can be written in the following matrix form

(19)  $S = K\hat\sigma^2,$

where $S = [SS_1, SS_2, \ldots, SS_N, SS_e]'$, $\hat\sigma^2 = [\hat\sigma_1^2, \ldots, \hat\sigma_N^2, \hat\sigma_e^2]'$ and $K$ is the triangular matrix of the coefficients in (18). Since none of the diagonal elements of the matrix $K$ equals zero, the matrix $K$ is nonsingular. Hence the equation (19) has the unique solution

$\hat\sigma^2 = K^{-1}S.$

The sampling covariance matrix of the vector $\hat\sigma^2$ can be found for unbalanced data by the method of Ahrens (c.f. [2]).
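As a sketch of how the system (18)-(19) is assembled and solved in practice (illustrative $N = 2$ data as above; all names are hypothetical, and this is only a sketch of Henderson's first method as described here, not code from the paper):

```python
# Sketch: Henderson's first method for N = 2, solving S = K sigma^2-hat.
import numpy as np

rng = np.random.default_rng(1)
labels = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 1), (2, 1), (2, 2)]
n = len(labels)
y = rng.normal(size=n)

def indicator(depth):
    classes = sorted(set(lab[:depth] for lab in labels))
    X = np.zeros((n, len(classes)))
    for q, lab in enumerate(labels):
        X[q, classes.index(lab[:depth])] = 1.0
    return X

def proj(X):
    return X @ np.linalg.pinv(X)

X1, X2 = indicator(1), indicator(2)
P0, P1, P2 = np.full((n, n), 1 / n), proj(X1), proj(X2)
S = np.array([y @ (P1 - P0) @ y,            # SS_1
              y @ (P2 - P1) @ y,            # SS_2
              y @ (np.eye(n) - P2) @ y])    # SS_e

def k(p, r):                                 # k_{pr} = tr(P[X_p] X_r X_r')
    Pp = {0: P0, 1: P1, 2: P2}[p]
    Xr = {1: X1, 2: X2}[r]
    return np.trace(Pp @ Xr @ Xr.T)

a1, a2 = X1.shape[1], X2.shape[1]            # total numbers of 1st and 2nd stage classes
K = np.array([[k(1, 1) - k(0, 1), k(1, 2) - k(0, 2), a1 - 1],
              [0.0,               k(2, 2) - k(1, 2), a2 - a1],
              [0.0,               0.0,               n - a2]])
sigma2_hat = np.linalg.solve(K, S)           # estimates of [sigma_1^2, sigma_2^2, sigma_e^2]
print(sigma2_hat)                            # unbiased, but single realizations may be negative
```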

Balanced data. When all the $n_{i_1 i_2 \ldots i_N}$ are equal, say $m$, and when all the $a^p_{i_1 i_2 \ldots i_{p-1}}$ are equal, say $a^p$ $(p = 1, 2, \ldots, N)$, i.e. when the data are balanced, we can obtain explicitly the estimates of the variance components as well as the sampling variances of their estimates. In this case for $p \leq r$ $(p, r = 0, 1, 2, \ldots, N)$

(20)  $k_{pr} = a^0 a^1 a^2 \cdots a^p a^{r+1} a^{r+2} \cdots a^{N+1},$

where, to simplify the notation, it is taken that $a^0 = 1$ and $a^{N+1} = m$.

Now the equations (18) can be expressed as

(21)  $MS_t = \hat\sigma_e^2 + \sum_{p=t}^{N} a^{p+1} a^{p+2} \cdots a^{N+1}\,\hat\sigma_p^2 \quad (t = 1, 2, \ldots, N),$

$MS_e = \hat\sigma_e^2,$

where $MS_e = (1/f_e)SS_e$ and $MS_t = (1/f_t)SS_t$ are the mean squares due to the error and to the $t$-th classification, respectively. The notations $f_e$, $f_t$ are used for the degrees of freedom due to the error and to the $t$-th classification, respectively. It can be shown that

(22)  $f_e = a^1 a^2 \cdots a^N (m - 1), \qquad f_t = a^0 a^1 a^2 \cdots a^{t-1}(a^t - 1) \quad (t = 1, 2, \ldots, N).$

We can readily see that

$MS_t = a^{t+1} a^{t+2} \cdots a^{N+1}\,\hat\sigma_t^2 + MS_{t+1}$

and hence

(23)  $\hat\sigma_t^2 = \frac{MS_t - MS_{t+1}}{a^{t+1} a^{t+2} \cdots a^{N+1}} \quad (t = 1, 2, \ldots, N),$

where, if $t = N$, $MS_{N+1}$ should be replaced by $MS_e$.
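In the balanced case the whole estimation procedure thus reduces to the closed forms (21)-(23). A short simulation sketch (hypothetical sizes $a^1 = 3$, $a^2 = 2$, $m = 4$ and arbitrary true components; all names illustrative) is given below.

```python
# Sketch: closed-form balanced-data estimates (23) for N = 2.
import numpy as np

rng = np.random.default_rng(2)
a1, a2, m = 3, 2, 4                               # a^1, a^2, a^{N+1} = m
sd1, sd2, sde = 1.0, 0.5, 0.8                     # true standard deviations of the effects
alpha1 = rng.normal(0, sd1, size=a1)
alpha2 = rng.normal(0, sd2, size=(a1, a2))
y = (alpha1[:, None, None] + alpha2[:, :, None]
     + rng.normal(0, sde, size=(a1, a2, m)))      # y[i1, i2, i3] follows model (1)

ybar2 = y.mean(axis=2)                            # cell means
ybar1 = y.mean(axis=(1, 2))                       # first-stage means
ybar = y.mean()                                   # grand mean
SS1 = a2 * m * np.sum((ybar1 - ybar) ** 2)
SS2 = m * np.sum((ybar2 - ybar1[:, None]) ** 2)
SSe = np.sum((y - ybar2[:, :, None]) ** 2)

f1, f2, fe = a1 - 1, a1 * (a2 - 1), a1 * a2 * (m - 1)   # degrees of freedom (22)
MS1, MS2, MSe = SS1 / f1, SS2 / f2, SSe / fe
sigma_e2_hat = MSe                                       # from (21): MS_e estimates sigma_e^2
sigma_22_hat = (MS2 - MSe) / m                           # (23), t = N
sigma_12_hat = (MS1 - MS2) / (a2 * m)                    # (23), t = 1
print(sigma_12_hat, sigma_22_hat, sigma_e2_hat)
```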

For balanced data the covariance matrix of the vector $y$ can be expressed as

(24)  $\Sigma_y = \sigma_e^2 I + \sum_{p=1}^{N} a^{p+1} a^{p+2} \cdots a^{N+1}\,\sigma_p^2\, P[X_p],$

for $P[X_p] = (a^{p+1} a^{p+2} \cdots a^{N+1})^{-1} X_p X_p'$ $(p = 1, 2, \ldots, N)$.

Now we can prove the following theorem:

Theorem 1. The projection operators $I - P[X_N]$, $P[X_t] - P[X_{t-1}]$ $(t = 1, 2, \ldots, N)$ and the covariance matrix $\Sigma_y$ satisfy the following conditions:

$(I - P[X_N])\,\Sigma_y = \varphi_e\,(I - P[X_N]), \qquad (P[X_t] - P[X_{t-1}])\,\Sigma_y = \varphi_t\,(P[X_t] - P[X_{t-1}]),$

where

$\varphi_e = \sigma_e^2, \qquad \varphi_t = \sigma_e^2 + \sum_{p=t}^{N} a^{p+1} a^{p+2} \cdots a^{N+1}\,\sigma_p^2.$

Proof. Since for $p < t$ the range space $R[X_p]$ and the range space $R[P[X_t] - P[X_{t-1}]]$ are orthogonal,

(25)  $(P[X_t] - P[X_{t-1}])\,P[X_p] = 0 \quad (p < t),$

and since for $p \geq t$ we have $R[P[X_t] - P[X_{t-1}]] \subset R[X_p]$, we have (c.f. § 76, Theorem 2 in [7])

(26)  $(P[X_t] - P[X_{t-1}])\,P[X_p] = P[X_t] - P[X_{t-1}] \quad (p \geq t).$

Thus for $t = 1, 2, \ldots, N$

$(P[X_t] - P[X_{t-1}])\,\Sigma_y = \sum_{p=1}^{N} a^{p+1} a^{p+2} \cdots a^{N+1}\,\sigma_p^2\,(P[X_t] - P[X_{t-1}])\,P[X_p] + \sigma_e^2\,(P[X_t] - P[X_{t-1}])$

$= \sum_{p=t}^{N} a^{p+1} a^{p+2} \cdots a^{N+1}\,\sigma_p^2\,(P[X_t] - P[X_{t-1}]) + \sigma_e^2\,(P[X_t] - P[X_{t-1}]) = \varphi_t\,(P[X_t] - P[X_{t-1}]).$

The first condition follows immediately from (4) and (24).
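The two conditions of Theorem 1 can also be verified numerically on a small balanced layout; the sketch below (all sizes, component values and names are illustrative) checks that the projector differences act on $\Sigma_y$ as multiplication by $\varphi_t$ and $\varphi_e$.

```python
# Sketch: numerical check of Theorem 1 for a balanced N = 2 layout.
import numpy as np

a1, a2, m = 3, 2, 4
s1, s2, se = 1.5, 0.7, 0.4                    # variance components sigma_1^2, sigma_2^2, sigma_e^2
n = a1 * a2 * m
labels = [(i1, i2) for i1 in range(a1) for i2 in range(a2) for _ in range(m)]

def indicator(depth):
    classes = sorted(set(lab[:depth] for lab in labels))
    X = np.zeros((n, len(classes)))
    for q, lab in enumerate(labels):
        X[q, classes.index(lab[:depth])] = 1.0
    return X

def proj(X):
    return X @ np.linalg.pinv(X)

X1, X2 = indicator(1), indicator(2)
Sigma = s1 * X1 @ X1.T + s2 * X2 @ X2.T + se * np.eye(n)   # covariance matrix (5)
P0, P1, P2 = np.full((n, n), 1 / n), proj(X1), proj(X2)

phi_e = se
phi_2 = se + m * s2
phi_1 = se + m * s2 + a2 * m * s1
assert np.allclose((np.eye(n) - P2) @ Sigma, phi_e * (np.eye(n) - P2))
assert np.allclose((P2 - P1) @ Sigma, phi_2 * (P2 - P1))
assert np.allclose((P1 - P0) @ Sigma, phi_1 * (P1 - P0))
```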

A straightforward conclusion from Theorem 1 is the following theorem:

Theorem 2. The quadratic forms $\frac{1}{\varphi_1}SS_1, \frac{1}{\varphi_2}SS_2, \ldots, \frac{1}{\varphi_N}SS_N, \frac{1}{\varphi_e}SS_e$ are independently distributed as $\chi^2$ (chi-square) with $f_1, f_2, \ldots, f_N, f_e$ degrees of freedom, respectively.

Proof. From Theorem 1 we have that the matrices $\frac{1}{\varphi_t}(P[X_t] - P[X_{t-1}])\,\Sigma_y$ $(t = 1, 2, \ldots, N)$ and $\frac{1}{\varphi_e}(I - P[X_N])\,\Sigma_y$ are idempotent. The expectation of the vector $y$ is the vector $J_n\mu$, which is orthogonal to each of the range spaces $R[I - P[X_N]]$, $R[P[X_t] - P[X_{t-1}]]$ $(t = 1, 2, \ldots, N)$. The application of Theorem 4.9 in [6] completes the first part of the proof.

The independence of the quadratic forms follows immediately from (11), (12), Theorem 1 and Theorem 4.21 in [6].

The sampling variance of any quadratic form $y'Ay$ of normally distributed random variables represented by the vector $y$ is $2\,\mathrm{tr}[(A\Sigma_y)^2]$, where $\Sigma_y$ is the covariance matrix of $y$. This well known formula, Theorem 1 and Theorem 2 will be applied to obtain the sampling variances of the estimates $\hat\sigma_e^2$ and $\hat\sigma_t^2$ $(t = 1, 2, \ldots, N)$. First we get the sampling variances of the mean squares $MS_e$ and $MS_t$ $(t = 1, 2, \ldots, N)$:

$\mathrm{var}(MS_e) = 2 f_e^{-2}\, \mathrm{tr}\bigl[\bigl(\varphi_e (I - P[X_N])\bigr)^2\bigr] = \frac{2\varphi_e^2}{f_e},$

$\mathrm{var}(MS_t) = 2 f_t^{-2}\, \mathrm{tr}\bigl[\bigl(\varphi_t (P[X_t] - P[X_{t-1}])\bigr)^2\bigr] = \frac{2\varphi_t^2}{f_t}.$

Hence, since by Theorem 2 the mean squares are independent, (23) gives

$\mathrm{var}(\hat\sigma_e^2) = \frac{2\varphi_e^2}{f_e}, \qquad \mathrm{var}(\hat\sigma_t^2) = \frac{2}{(a^{t+1} a^{t+2} \cdots a^{N+1})^2}\left(\frac{\varphi_t^2}{f_t} + \frac{\varphi_{t+1}^2}{f_{t+1}}\right) \quad (t = 1, 2, \ldots, N),$

where, if $t = N$, $\varphi_{N+1}$ and $f_{N+1}$ should be replaced by $\varphi_e$ and $f_e$.

On the basis of Theorem 2 we can say that the test function available to verify the hypothesis $H_t\colon \sigma_t^2 = 0$ is

$F_t = \frac{MS_t}{MS_{t+1}} \quad (t = 1, 2, \ldots, N),$

where, if $t = N$, $MS_{N+1}$ should be replaced by $MS_e$. If $H_t$ is true, the test function $F_t$ is distributed as $F(f_t, f_{t+1})$.
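Assuming SciPy is available for the $F$ distribution, the test can be sketched as follows; the mean squares and degrees of freedom below are hypothetical placeholders standing in for values computed from the data.

```python
# Sketch: F tests of H_t: sigma_t^2 = 0 via F_t = MS_t / MS_{t+1} (MS_e for t = N).
from scipy.stats import f

MS1, MS2, MSe = 9.89, 3.07, 0.61        # hypothetical mean squares
f1, f2, fe = 2, 3, 18                   # matching degrees of freedom
F1, F2 = MS1 / MS2, MS2 / MSe           # test statistics for t = 1 and t = 2 = N
p1 = f.sf(F1, f1, f2)                   # P(F(f_1, f_2) > F_1)
p2 = f.sf(F2, f2, fe)                   # P(F(f_2, f_e) > F_2)
print(F1, p1, F2, p2)
```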

Acknowledgement. The author is indebted to Professor Dr Victor Oktaba for suggesting the subject of this paper, and for his advice during its preparation.

REFERENCES

[1] Ahrens, H., Varianzanalyse, Berlin 1967.

[2] Ahrens, H., Standardfehler geschätzter Varianzkomponenten eines unbalanzierten Versuchsplanes in r-stufiger hierarchischer Klassifikation, Monatsb. Deutsch. Akad. Wiss. Berlin, 7 (2), 1965.

[3] Gates, C. E. and Shiue, C., The Analysis of Variance of the S-stage Hierarchical Classification, Biometrics, 18 (1962), 529-536.

[4] Gaylor, D. W. and Hartwell, T. D., Expected Mean Squares for Nested Classifications, Biometrics, 25 (1969), 427-430.

[5] Gower, J. C., Variance Component Estimation for Unbalanced Hierarchical Classifications, Biometrics, 18 (1962), 537-542.

[6] Graybill, F. A., An Introduction to Linear Statistical Models, New York 1961.

[7] Halmos, P. R., Finite-Dimensional Vector Spaces, New York 1958.

[8] Henderson, C. R., Estimation of Variance and Covariance Components, Biometrics, 9 (1953), 226-252.

[9] Mahamunulu, D. M., Sampling Variances of the Estimates of Variance Components in the Unbalanced 3-way Nested Classification, Ann. Math. Statist., 34 (1963), 521-527.

[10] Mikos, H., Orthogonality in the N-way Nested Classification, Ann. Univ. Mariae Curie-Skłodowska, Sect. A, 27 (1973), 55-63.

[11] Oktaba, W., Nieortogonalne modele losowe klasyfikacji hierarchicznej, Roczniki Nauk Rolniczych, 82-B-3, 417-435.

[12] Oktaba, W., Teoria układów eksperymentalnych. I. Modele stałe, PAN Wydz. V, Warszawa 1970.

[13] Searle, S. R., Variance Components in the Unbalanced 2-way Nested Classification, Ann. Math. Statist., 32 (1961), 1161-1166.

[14] Seber, G. A. F., The Linear Hypothesis, London 1966.


STRESZCZENIE (Summary)

In this paper unbiased estimators of the variance components are obtained for the random model of the unbalanced N-way nested classification. For the model with balanced data, discussed separately, the sampling variances of the obtained estimators and significance tests for verifying hypotheses about the model parameters are also given. All results are obtained by means of the properties of linear spaces.

РЕЗЮМЕ (Summary)

In this paper unbiased estimates of the variance components are obtained from unbalanced data of the N-way nested classification. For the balanced model, discussed separately, the sampling variances of the estimates and significance tests for verifying hypotheses about the effects of the factors under study are obtained as well. All results are derived using the properties of linear spaces.
