
FOLIA OECONOMICA 3(314)2015 http://dx.doi.org/10.18778/0208-6018.314.03

Oleksii Doronin*, Rostislav Maiboroda**

GEE ESTIMATORS IN MIXTURE MODEL WITH VARYING CONCENTRATIONS

Abstract. We discuss a semiparametric mixture model where some components are parameterized by a common Euclidean parameter and the others are fully unknown. We introduce a GEE (generalized estimating equations) approach and an adaptive GEE-based approach for parameter estimation. The derived estimators are consistent and asymptotically normal, and they are optimized in terms of their dispersion matrices. The proposed techniques are tested on simulated samples.

Keywords: mixture model, semiparametric estimation, GEE.

1. INTRODUCTION

The cumulative distribution function (CDF) of one observation in a mixture model is expressed by a linear combination of some CDFs $F_1,\dots,F_M$ with probabilities $p^1,\dots,p^M$, $\sum_{m=1}^M p^m = 1$ (i.e. $F(x) = \sum_{m=1}^M p^m F_m(x)$). Note that $F_m$ is called the CDF of the $m$-th mixture component, and $p^m$ the component concentration. In the mixture model with varying concentrations, $p^m$ depends on the observation index: $p^m = p_j^m$, $j=\overline{1,N}$. Thus,

$$F_j(x) = \sum_{m=1}^M p_j^m F_m(x), \quad j=\overline{1,N}.$$

We consider the case when some parametric model is known for the first $K$ components: $F_m(x) = F_m(x;t)$, $m=\overline{1,K}$. Parameter $t$ is assumed to be Euclidean: $t\in\Theta\subseteq\mathbb{R}^d$. The true value of $t$ we designate as $\vartheta$ and assume that it is unknown. The CDFs of the last $M-K$ mixture components are assumed to be fully unknown. We also assume that concentrations $p_j^m$ are known. Our goal is to estimate $\vartheta$. To do this, we derive consistent and asymptotically normal estimators, and optimize them in terms of their dispersion matrices.

* Ph.D. student, Department of Probability Theory, Statistics and Actuarial Mathematics, Mechanics and Mathematics Faculty, Taras Shevchenko National University of Kyiv.

** Ph.D., Department of Probability Theory, Statistics and Actuarial Mathematics, Mechanics and Mathematics Faculty, Taras Shevchenko National University of Kyiv.

2. NONPARAMETRIC ESTIMATE FOR DISTRIBUTION FUNCTION

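The CDF of the $m$-th component may be estimated through the weighted empirical distribution function

$$\hat F_m(x) := \frac{1}{N}\sum_{j=1}^N a_j^m\, I\{\xi_j < x\},$$

where $\xi_j$ is the $j$-th observation. Weights $a_j^m$ are taken as the solution of the minimization problem of maximal variance of unbiased estimates of $F_m(x)$ for all possible CDFs $F_m$ (i.e. $a^m = p\,\Gamma_N^{-1} e_m$, where $a^m := (a_j^m)_{j=\overline{1,N}}$, $p := (p_j^m)_{j=\overline{1,N},\,m=\overline{1,M}}$ is the $N\times M$ matrix of concentrations, $\Gamma_N := \frac{1}{N}p^Tp$ is an $M\times M$ matrix, and $e_m := (I\{i=m\})_{i=1,\dots,M}$). See Maiboroda et al. (2008) for details.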

Note that weights $a_j^m$ can be negative. Thus, we can improve $\hat F_m(x)$ by introducing the improved empirical distribution function (see Maiboroda et al. (2005)):

$$\hat F_m^+(x) := \min\Bigl(1, \max_{y<x}\hat F_m(y)\Bigr).$$

3. GEE ESTIMATE

Consider some set of measurable functions $g_1(\cdot;t),\dots,g_K(\cdot;t)$ with values in $\mathbb{R}^d$. The theoretical moment $\int g_k(x;t)F_k(dx)$ may be estimated by the weighted empirical moment

$$\hat g_k(t) := \frac{1}{N}\sum_{j=1}^N a_j^k g_k(\xi_j;t).$$

Define the joint weighted empirical moment of $\hat g_k(t)$ as

$$\hat g(t) := \sum_{k=1}^K \hat g_k(t).$$

Definition. GEE estimator $\hat\vartheta$ for $\vartheta$ is the measurable function of the sample $\xi_1,\dots,\xi_N$ such that $\hat g(\hat\vartheta) = 0$. Next we assume that $P[\exists t: \hat g(t) = 0] \to 1$ as $N\to\infty$.

Example. Moment estimators can be represented as GEE estimators. Let $h_1,\dots,h_K$ be the set of estimating functions. Denote the theoretical moment of $h_k(x)$ as $H_k(t) := \int h_k(x)F_k(dx;t)$, $k=\overline{1,K}$. Define estimating functions as $g_k(x;t) := h_k(x) - H_k(t)$, $k=\overline{1,K}$. The GEE estimator can be represented as

$$\hat\vartheta := H^{-1}\Bigl(\sum_{k=1}^K \hat h_k\Bigr),$$

where $H^{-1}$ is the inverse function to $H(t) := \sum_{k=1}^K H_k(t)$, and $\hat h_k := \int h_k(x)\hat F_k(dx)$. An analogous improved moment estimate can be introduced by using $\hat F_k^+$ in place of $\hat F_k$.

Consistency for moment estimators is shown in theorem 3.1 from Doronin (2014a).
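A minimal numerical sketch of a moment estimator of this kind, assuming a hypothetical two-component mixture ($M=2$, $K=1$): the first component is Gaussian with unknown mean and unit variance, the second is treated as fully unknown (simulated here from an exponential law), and $h_1(x)=x$, so that $H_1(t)=t$. All numerical values are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 20000
theta_true = 1.5                                  # hypothetical true mean of component 1

# Known concentrations p_j^m for M = 2 components (each row sums to 1).
p1 = rng.uniform(0.0, 1.0, size=N)
p = np.column_stack([p1, 1.0 - p1])

# Draw the component label of each observation, then the observation itself.
labels = rng.uniform(size=N) < p1                 # True means component 1
xi = np.where(labels,
              rng.normal(theta_true, 1.0, size=N),    # parametric component
              rng.exponential(3.0, size=N))           # "unknown" component

# Minimax weights for component 1: first column of p Gamma_N^{-1}.
gamma = p.T @ p / N
a1 = (p @ np.linalg.inv(gamma))[:, 0]

# With h_1(x) = x we have H_1(t) = t, so solving H(t) = h_hat gives theta_hat = h_hat.
theta_hat = np.mean(a1 * xi)

# For comparison: the unweighted sample mean mixes both components.
naive = xi.mean()
```

The weighted moment recovers the mean of the first component even though the second component is never modelled, while the unweighted mean is pulled towards the mean of the whole mixture.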

4. ASYMPTOTICS OF GEE ESTIMATOR

Assume that CDFs $F_1,\dots,F_M$ are absolutely continuous with respect to a sigma-finite measure $\mu$ on the space of observations. Denote densities of each component's distributions as $f_k(x) := dF_k(x;\vartheta)/d\mu(x)$, $k=\overline{1,K}$, and $f_k(x) := dF_k(x)/d\mu(x)$, $k=\overline{K+1,M}$.

Introduce the matrix of estimating functions

$$G(x) := \bigl(g_1(x;\vartheta),\dots,g_K(x;\vartheta)\bigr)\in\mathbb{R}^{d\times K}.$$

Expectation of $G(x)$ from the $m$-th component designate as $G_m := \int G(x)F_m(dx)$, $m=1,\dots,M$.

Introduce the following notations.

$$\alpha_{r,s} := \bigl(\alpha_{r,s}^{k,l}\bigr)_{k,l=1}^K := \Bigl(\lim_{N\to\infty}\frac{1}{N}\sum_{j=1}^N p_j^r p_j^s a_j^k a_j^l\Bigr)_{k,l=1}^K\in\mathbb{R}^{K\times K},\quad r,s=\overline{1,M}.$$

$$\beta_m := \bigl(\beta_m^{k,l}\bigr)_{k,l=1}^K := \Bigl(\lim_{N\to\infty}\frac{1}{N}\sum_{j=1}^N p_j^m a_j^k a_j^l\Bigr)_{k,l=1}^K\in\mathbb{R}^{K\times K},\quad m=\overline{1,M}.$$

$$R(x) := \sum_{m=1}^M \beta_m f_m(x)\in\mathbb{R}^{K\times K}.$$

$$Z := \int G(x)R(x)G^T(x)\,\mu(dx) - \sum_{r,s=1}^M G_r\,\alpha_{r,s}\,G_s^T\in\mathbb{R}^{d\times d}.$$

$$V := \sum_{k=1}^K\int \frac{\partial}{\partial t}g_k(x;t)\Big|_{t=\vartheta}F_k(dx)\in\mathbb{R}^{d\times d}.$$

Theorem 4.1. (Theorem 3.4 from Doronin (2014a)) Let $\hat\vartheta$ be a GEE estimator in the introduced definitions, and $U$ be some open neighborhood of the true parameter value $\vartheta$. Assume the following.

(i) $\hat\vartheta$ converges in probability to $\vartheta$ as $N\to\infty$.

(ii) Derivatives $g_k'(x;t) := \partial g_k(x;t)/\partial t^T$ exist and are integrable (i.e. $E_t[\|g_k'(\eta_m;t)\|]<\infty$) for $t\in U$, where $E_t$ denotes expectation under the condition that the true parameter value is $t$, and $\eta_m$ are formal random values with distributions $F_m$.

(iii) Functions $g_k^m(t) := E_\vartheta[g_k(\eta_m;t)]$ are continuous on $U$.

(iv) $E_\vartheta[\sup_{t\in U}\|g_k(\eta_m;t)\|]<\infty$.

(v) The limit matrix $\Gamma := \lim_{N\to\infty}\frac{1}{N}p^Tp$ exists and is nonsingular.

(vi) Matrices $\alpha_{r,s}$ and $\beta_m$ exist.

(vii) Matrix $V$ is nonsingular.

(viii) GEE is unbiased, i.e. $\sum_{k=1}^K E_t[g_k(\eta_k;t)] = 0$ for $t\in U$.

Then $\sqrt{N}(\hat\vartheta-\vartheta)$ converges in distribution to the Gaussian distribution with zero mean and covariance matrix $V^{-1}ZV^{-T}$.
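Once $Z$ and $V$ are available (or estimated), the covariance matrix in the theorem is a plain sandwich product. The following sketch uses hypothetical $2\times 2$ matrices and a hypothetical sample size; the function name `asymptotic_covariance` is an illustrative choice.

```python
import numpy as np

def asymptotic_covariance(V, Z):
    """Return the sandwich matrix V^{-1} Z V^{-T}."""
    V_inv = np.linalg.inv(V)
    return V_inv @ Z @ V_inv.T

# Hypothetical matrices for a parameter of dimension d = 2.
V = np.array([[2.0, 0.0],
              [1.0, 1.0]])
Z = np.array([[4.0, 1.0],
              [1.0, 2.0]])

cov = asymptotic_covariance(V, Z)

# Approximate standard errors of the estimate for a sample of size N,
# since sqrt(N) * (estimate - true value) has covariance close to cov.
N = 1000
std_err = np.sqrt(np.diag(cov) / N)
```

When $V$ is the identity matrix the sandwich reduces to $Z$ itself.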

5. LOWER BOUND OF DISPERSION MATRIX FOR GEE ESTIMATOR

Assume that the matrix $Z$ and nonsingular matrix $V$ exist. Without loss of generality we can assume that two conditions for GEE estimator $\hat\vartheta$ are fulfilled:

(i1) $\int g_k(x;\vartheta)F_k(dx) = 0$, $k=\overline{1,K}$ (unbiasedness);

Consider the minimization problem of dispersion matrix $Z$ in Loewner ordering (i.e. $A\le B$ if $B-A$ is positive semidefinite) over all $g_k(x;\vartheta)$ satisfying conditions (i1), (i2). Thus, we have to minimize $c^TZc$ for all $c\in\mathbb{R}^d$. The solution of this problem is the set of estimating functions $g_k(x;\vartheta)$ which give us the lower bound of dispersion matrix $Z$ (see theorem 4.1 from Doronin (2014a)).
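The Loewner ordering is only a partial order, which is why a lower bound in this sense is a strong statement. A small sketch of the comparison, with hypothetical matrices and an illustrative helper name `loewner_leq`:

```python
import numpy as np

def loewner_leq(A, B, tol=1e-10):
    """True if A <= B in the Loewner order, i.e. B - A is positive semidefinite."""
    diff = B - A
    diff = (diff + diff.T) / 2             # symmetrise to guard against round-off
    return bool(np.linalg.eigvalsh(diff).min() >= -tol)

# Hypothetical dispersion matrices.
Z_low = np.array([[1.0, 0.0], [0.0, 1.0]])
Z_high = np.array([[2.0, 0.5], [0.5, 1.5]])

# Two matrices that are not comparable: neither difference is positive semidefinite.
A = np.diag([1.0, 3.0])
B = np.diag([2.0, 1.0])
```

Here `loewner_leq(Z_low, Z_high)` holds, which is equivalent to $c^TZ_{low}c\le c^TZ_{high}c$ for every vector $c$, whereas `A` and `B` cannot be ordered in either direction.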

6. ADAPTIVE ESTIMATE

Unfortunately, it is impossible to use in practice the optimal estimating functions $g_k(x;\vartheta)$, $k=\overline{1,K}$, which give the lower bound of dispersion matrix. The first reason is that they depend on unknown densities $f_k(x)$. The second one is the difficulty to solve the GEE in the general case. Therefore, we consider the adaptive approach.

Each function $g_k(x;t)$ can be approximated as $B_ku_k(x;t)$, where $B_k\in\mathbb{R}^{d\times L_k}$ is some matrix of coefficients to be found, and $u_k(x;t)\in\mathbb{R}^{L_k}$ is the vector of some predefined basis functions (e.g. B-splines). Under conditions (i1), (i2), the equation $\sum_{k=1}^K\hat g_k(t) = 0$ can be approximated as

$$\sum_{k=1}^K \hat g_k(t) \approx \sum_{k=1}^K B_k\hat u_k(\vartheta) + (t-\vartheta) = 0,$$

where $\hat u_k(t) := \frac{1}{N}\sum_{j=1}^N a_j^k u_k(\xi_j;t)$ is the weighted empirical moment of $u_k$. The solution of this approximated equation is $t = \vartheta - \sum_{k=1}^K B_k\hat u_k(\vartheta)$. Thus, one can start with some consistent estimate $\tilde\vartheta$ and define the adaptive estimate as

$$\hat\vartheta := \tilde\vartheta - \sum_{k=1}^K B_k\hat u_k(\tilde\vartheta).$$

Consistency and asymptotic normality of the introduced adaptive estimate are shown in lemma 3.3 from Doronin (2014b).
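The adaptive estimate is a one-step correction of a pilot estimate. The sketch below illustrates only that mechanism, under loudly stated assumptions: the estimating function is normalised so that its derivative in $t$ is the identity, the correction is subtracted from the pilot, the basis consists of the single hypothetical function $u(x;t) = t - x$ rather than B-splines, and all weights equal one (a one-component sample). The helper names are illustrative.

```python
import numpy as np

def weighted_moment(u, sample, weights, t):
    """(1/N) * sum_j a_j^k * u(xi_j; t), returned as a vector of length L_k."""
    values = np.array([u(x, t) for x in sample])     # shape (N, L_k)
    return (weights[:, None] * values).mean(axis=0)

def one_step_update(theta_pilot, B_list, u_hat_list):
    """Pilot estimate minus the sum over k of B_k @ u_hat_k (assumed sign convention)."""
    correction = sum(B @ u_hat for B, u_hat in zip(B_list, u_hat_list))
    return theta_pilot - correction

# Toy data: one component, so every weight equals 1.
sample = np.array([1.0, 2.0, 3.0, 6.0])
weights = np.ones_like(sample)

# Hypothetical basis with L = 1 and coefficient matrix B of shape d x L = 1 x 1.
u = lambda x, t: np.array([t[0] - x])
B = np.array([[1.0]])

theta_pilot = np.array([np.median(sample)])          # pilot estimate: the median, 2.5
u_hat = weighted_moment(u, sample, weights, theta_pilot)
theta_hat = one_step_update(theta_pilot, [B], [u_hat])
```

In this toy case the update moves the pilot (the median, 2.5) to the sample mean (3.0), which is the exact root of the linear estimating equation.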

7. NUMERICAL RESULTS

We chose a three-component mixture model to simulate. All components are taken Gaussian, with parameter values $(m,\sigma)$ as (3.2), for each component, respectively. The first two components are assumed to be

), (0.2 2 . 3

parameterized with $\vartheta = (m_1, m_2, \sigma)^T$ (different means, common standard deviation). Distribution of the third component is assumed to be fully unknown. Concentrations were also generated as the pseudo-random values, derived by formula $p_j^m = s_j^m/(s_j^1+s_j^2+s_j^3)$, where $s_j^m$ is taken from uniform distribution on $[0,1]$. Series of samples with sizes 50, 100, 250, 500, 750, 1000, 2000, 5000 were simulated, 2000 samples in each series. Vectors of basis functions $u_k(x;t)$ for adaptive estimate were chosen as the set of uniform cubic B-splines with knots at points $m+i\sigma$, where $m$ and $\sigma$ are the mean and standard deviation of the $k$-th component, respectively, $i=-5,\dots,5$. Matrices $B_k$ were chosen to minimize dispersion matrix. Results are shown in Figure 1.
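The sampling scheme for one simulated sample can be sketched as follows. The component means and standard deviations below are hypothetical placeholders, not the values used in the paper, and the function name `simulate_mixture` is an illustrative choice.

```python
import numpy as np

def simulate_mixture(N, means, sigmas, rng):
    """Draw one sample of size N from a Gaussian mixture with varying concentrations."""
    M = len(means)
    s = rng.uniform(0.0, 1.0, size=(N, M))
    p = s / s.sum(axis=1, keepdims=True)            # p_j^m = s_j^m / (s_j^1 + ... + s_j^M)

    # Component label of each observation, drawn with probabilities p_j^1, ..., p_j^M.
    u = rng.uniform(size=N)
    labels = (u[:, None] > np.cumsum(p, axis=1)).sum(axis=1)
    labels = np.minimum(labels, M - 1)              # guard against round-off in the last cumulative sum

    xi = rng.normal(np.asarray(means)[labels], np.asarray(sigmas)[labels])
    return p, labels, xi

rng = np.random.default_rng(1)
# Placeholder parameters: two components share a standard deviation, the third differs.
p, labels, xi = simulate_mixture(500, means=(0.0, 3.0, 6.0), sigmas=(1.0, 1.0, 2.0), rng=rng)
```

The estimator sees only `p` and `xi`; the labels are kept here solely for checking the simulation.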

CONCLUSIONS

The mixture model with varying concentrations is considered. Several estimators for this model are introduced (moment, GEE, adaptive). The proposed estimators are consistent and asymptotically normal under some conditions. The performance of the moment and adaptive estimators is compared on simulated samples. The dispersion of the introduced estimators approaches its theoretical asymptotic value for samples with 1000 or more observations.

REFERENCES

Doronin O. (2014a), Lower bound of dispersion matrix for semiparametric estimation in mixture model. "Theory of Probability and Mathematical Statistics", no. 90, p. 64−76.

Doronin O. (2014b), Adaptive estimation in semiparametric model of mixture with varying concentrations. "Theory of Probability and Mathematical Statistics", no. 91, p. 27−38.

Doronin O. (2012), Robust Estimates for Mixtures with Gaussian Component. "Bulletin of Taras Shevchenko National University of Kyiv. Series: Physics & Mathematics" (in Ukrainian), vol. 1, p. 18–23.

Maiboroda R., Sugakova O. (2008), Estimation and classification by observations from mixtures. Kyiv University Publishers, Kyiv (in Ukrainian).

Maiboroda R., Kubaichuk O. (2005), Improved estimators for moments constructed from observations of a mixture. "Theory of Probability and Mathematical Statistics", no. 70, p. 83−92.

Maiboroda R., Sugakova O., Doronin A. (2013), Generalized estimating equations for mixtures with varying concentrations. "The Canadian Journal of Statistics", no. 41, vol. 2, p. 217−236.

Figure 1. Dispersion of estimates. Panels: MSE and RobVar of $\hat m_1$, $\hat m_2$ and $\hat\sigma$, plotted against the sample size $N$.

Here MSE is the mean squared error of the parameter estimate multiplied by the number of observations. RobVar is the robust estimate of MSE through the interquartile range of the parameter estimate. Symbol ■ indicates the moment estimates (lower line for improved and upper line for unimproved), and ▲ indicates the adaptive estimates. White symbols indicate theoretical dispersion. Symbol ○ indicates the lower bound.

Source: plots are generated by Wolfram Mathematica using our own script.
