
ACTA UNIVERSITATIS LODZIENSIS
FOLIA OECONOMICA 196, 2006

Aleksandra Baszczyńska*

CHOICE OF THE SMOOTHING PARAMETER IN KERNEL DENSITY ESTIMATION

Abstract. Kernel density estimation is one of the main methods available for univariate density estimation. The problems of choosing the kernel function and choosing the smoothing parameter are of crucial importance in density estimation. Various methods used in practice for choosing the smoothing parameter are discussed. Some of them are simple and some are computationally demanding, but it must be emphasized that the appropriate choice of method depends on the purpose for which the density estimate is to be used.

A Monte Carlo study is presented in which three “practical rules” and two forms of cross-validation (maximum likelihood CV and least-squares CV) are used in density estimation. The values of the smoothing parameters are compared with the “optimal” one, which is obtained by minimizing the mean squared error. In all of these studies the accuracy of the estimation, measured by the mean squared error, is considered.

Key words: density estimation, kernel function, smoothing parameter, practical rules, cross-validation.

1. INTRODUCTION

Kernel density estimation is one of the most widely used methods of nonparametric density estimation. The estimator is defined by:

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right), \qquad (1)$$

where $x_1, \ldots, x_n$ is a random sample, $K(u)$ is a kernel function satisfying $\int_{-\infty}^{+\infty} K(u)\,du = 1$, and $h$ is the smoothing parameter, also called the bandwidth or the window width.

In the process of constructing the estimator we have to choose two parameters of the method: the kernel function $K(u)$ and the smoothing parameter $h$ (bandwidth). The kernel functions used in practice are symmetric around 0 and integrate to 1, and since the kernel is a density, estimator (1) is also a density function. Some of the best known kernel functions are presented in Domański et al. (1998). The properties of some kernel functions are explored by Baszczyńska (2005). The smoothing parameter $h$ ($h > 0$) regulates the degree of smoothness of the estimated density function: for small values of $h$ we get a noisy, undersmoothed density estimate, whereas for large values of $h$ we get a very smooth one.
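The effect of $h$ on smoothness can be illustrated with a minimal sketch of estimator (1). The Gaussian kernel, the six data points, and the helper `roughness` (a crude wiggliness measure based on second differences) are illustrative assumptions and do not come from the study itself.

```python
import math

def gaussian_kernel(u):
    """Standard normal density: symmetric around 0 and integrates to 1."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, sample, h, kernel=gaussian_kernel):
    """Kernel density estimate at point x with bandwidth h > 0."""
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)

def roughness(sample, h, lo=-3.0, hi=3.0, steps=600):
    """Hypothetical helper: sum of absolute second differences on a grid."""
    step = (hi - lo) / steps
    vals = [kde(lo + k * step, sample, h) for k in range(steps + 1)]
    return sum(abs(vals[k - 1] - 2.0 * vals[k] + vals[k + 1])
               for k in range(1, steps))

sample = [-1.2, -0.4, -0.1, 0.3, 0.5, 1.4]   # illustrative data only
rough_small_h = roughness(sample, h=0.1)     # small h: wiggly estimate
rough_large_h = roughness(sample, h=1.0)     # large h: smooth estimate
```

Under these assumptions the small bandwidth yields the larger roughness value, which matches the qualitative role of $h$ described above.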

The appropriate choice of the smoothing parameter depends on the purpose for which the density estimate is to be used.

Subjective choice of the smoothing parameter is a natural method: several kernel estimates are made with different values of $h$, and the estimate that agrees best with prior ideas about the density is chosen. This method is used when the main purpose of density estimation is only to explore the data. Of course, it may lead to mistakes when an inexperienced user chooses the smoothing parameter in this way.

Automatic methods for choosing the bandwidth are used when the estimated density function is the basis for drawing conclusions, and as a starting point for subjective adjustment. They are also used when density estimation is carried out for a large number of data sets or when density estimation is only a part of a larger procedure.

2. BANDWIDTH CHOOSING

Assume that:

• given a sample $X_1, \ldots, X_n$, $\hat{f}(x)$ is the estimator of a continuous density function $f(x)$,

• the kernel function is of second order:

$$\int_{-\infty}^{+\infty} K(u)\,du = 1, \qquad \int_{-\infty}^{+\infty} uK(u)\,du = 0, \qquad \int_{-\infty}^{+\infty} u^2 K(u)\,du = k_2 \neq 0. \qquad (2)$$
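The second-order conditions can be checked numerically for a concrete kernel. The sketch below assumes the Epanechnikov kernel in its usual form $K(u) = 0.75(1 - u^2)$ on $[-1, 1]$; the helper `moment` and the midpoint rule are illustrative choices, not part of the original study.

```python
def epanechnikov(u):
    """Epanechnikov kernel, assumed here in the form 0.75 * (1 - u^2) on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def moment(kernel, order, lo=-1.0, hi=1.0, steps=20000):
    """Midpoint-rule approximation of the integral of u**order * K(u) over [lo, hi]."""
    step = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        u = lo + (k + 0.5) * step
        total += (u ** order) * kernel(u)
    return total * step

m0 = moment(epanechnikov, 0)   # should be close to 1
m1 = moment(epanechnikov, 1)   # should be close to 0
k2 = moment(epanechnikov, 2)   # second moment, close to 1/5 for this kernel
```

For this kernel the three conditions hold with $k_2 = 1/5$, so it is a second-order kernel in the sense used above.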

“Practical rules”, also known as “reference to a standard distribution”, are a simple automatic choice of the parameter $h$ in density estimation. The parameter $h$ is “optimal” when it minimizes the approximate mean integrated squared error. This optimal value depends on the unknown density, so for simplicity it is assumed that the unknown distribution is normal with parameters $\mu$ and $\sigma$.

The practical rules are the following:

Practical rule I: $h_I = 1.06\,\sigma n^{-1/5}$, where $\sigma$ is estimated from the sample.

Practical rule II: $h_{II} = 0.79\,R n^{-1/5}$, where $R$ is the interquartile range.

Practical rule III: $h_{III} = 0.9\,A n^{-1/5}$, where $A = \min(\sigma, R/1.34)$.
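The three rules can be sketched as follows. This is only an illustration under stated assumptions: the divisor 1.34 in $A$ follows the form usually given for this rule of thumb (e.g. in Silverman's book) and is not legible in the source text, and the simulated sample, its size of 128, and the quartile convention of `statistics.quantiles` are likewise choices made for the example.

```python
import random
import statistics

def practical_rules(sample):
    """Return (h_I, h_II, h_III) computed from a univariate sample."""
    n = len(sample)
    sigma = statistics.stdev(sample)                  # sample standard deviation
    q1, _, q3 = statistics.quantiles(sample, n=4)     # quartile cut points
    r = q3 - q1                                       # interquartile range R
    a = min(sigma, r / 1.34)                          # assumed form of A
    factor = n ** (-1.0 / 5.0)
    return 1.06 * sigma * factor, 0.79 * r * factor, 0.9 * a * factor

rng = random.Random(0)
sample = [rng.gauss(0.0, 0.2) for _ in range(128)]    # illustrative data
h1, h2, h3 = practical_rules(sample)
```

Because $A \leq \sigma$ and $0.9 < 1.06$, rule III never gives a larger bandwidth than rule I for the same sample.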

Cross-validation is a well-known method of choosing the smoothing parameter in kernel density estimation. There are two forms of cross-validation: maximum likelihood cross-validation and least-squares cross-validation.

The least-squares cross-validation method of choosing the smoothing parameter consists in minimizing the following:

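$$CV_{NK}(h) = \frac{1}{n^2 h} \sum_{i=1}^{n} \sum_{j=1}^{n} K * K\left(\frac{x_j - x_i}{h}\right) - \frac{2}{n(n-1)h} \sum_{i=1}^{n} \sum_{j \neq i} K\left(\frac{x_j - x_i}{h}\right), \qquad (3)$$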

where $K * K$ denotes the convolution of the kernel with itself.

For the Gaussian and Epanechnikov kernels the convolutions of the kernel with itself are, respectively:

$$K * K(u) = \frac{1}{2\sqrt{\pi}} \exp\left(-\frac{u^2}{4}\right),$$

$$K * K(u) = \begin{cases} \frac{3}{160}\left(32 - 40u^2 + 20|u|^3 - |u|^5\right) & \text{for } |u| \leq 2, \\ 0 & \text{otherwise.} \end{cases}$$
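A least-squares cross-validation search can be sketched for the Gaussian kernel, using its self-convolution given above. The criterion is written in its common textbook form (integral of the squared estimate minus twice the mean leave-one-out estimate), which is assumed here to match the one used in the study; the simulated sample, the candidate grid, and the function names are illustrative.

```python
import math
import random

def gauss(u):
    """Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def gauss_conv(u):
    """Convolution of the Gaussian kernel with itself."""
    return math.exp(-0.25 * u * u) / (2.0 * math.sqrt(math.pi))

def lscv(sample, h):
    """Least-squares CV score: lower is better."""
    n = len(sample)
    conv_sum = 0.0   # double sum of K*K over all pairs (i, j)
    loo_sum = 0.0    # double sum of K over pairs with i != j
    for i in range(n):
        for j in range(n):
            u = (sample[j] - sample[i]) / h
            conv_sum += gauss_conv(u)
            if i != j:
                loo_sum += gauss(u)
    return conv_sum / (n * n * h) - 2.0 * loo_sum / (n * (n - 1) * h)

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(60)]   # illustrative data
grid = [0.05 * k for k in range(2, 41)]             # candidates 0.10 .. 2.00
h_lscv = min(grid, key=lambda h: lscv(sample, h))
```

A plain grid search is used for transparency; in practice the score can have several local minima, so a fine grid over a plausible range is a cautious default.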

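Maximum likelihood cross-validation consists in choosing the smoothing parameter that maximizes the following:

$$CV_{NW}(h) = \frac{1}{n} \sum_{i=1}^{n} \log\left[\sum_{j \neq i} K\left(\frac{x_j - x_i}{h}\right)\right] - \log[(n-1)h]. \qquad (4)$$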
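The maximum likelihood criterion is the average log of the leave-one-out density estimates at the sample points. A minimal sketch with the Gaussian kernel follows; the simulated sample, the candidate grid, and the guard against a zero kernel sum are assumptions added for illustration.

```python
import math
import random

def gauss(u):
    """Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def mlcv(sample, h):
    """Maximum likelihood CV score: higher is better."""
    n = len(sample)
    total = 0.0
    for i in range(n):
        # kernel sum over all observations except the i-th one
        s = sum(gauss((sample[j] - sample[i]) / h) for j in range(n) if j != i)
        if s <= 0.0:
            return float("-inf")   # isolated point: log-likelihood undefined
        total += math.log(s)
    return total / n - math.log((n - 1) * h)

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(60)]   # illustrative data
grid = [0.05 * k for k in range(2, 41)]             # candidates 0.10 .. 2.00
h_mlcv = max(grid, key=lambda h: mlcv(sample, h))
```

With compactly supported kernels such as the Epanechnikov kernel the zero-sum guard matters more, because an observation farther than $h$ from all others makes the score minus infinity.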

3. MONTE CARLO STUDY

A Monte Carlo study was conducted to compare the values of the smoothing parameter $h$ in kernel density estimation. The properties of the estimator were analysed in three basic variants, depending on the distribution from which the data were drawn. These variants are the following:

• variant I: normal distribution $N(0, 0.2)$,

• variant II: mixture of normal distributions $f(x) = 0.25 f_1(x) + 0.75 f_2(x)$, where $f_1(x)$ is the density function of $N(0, 0.2)$ and $f_2(x)$ is the density function of $N(3, 0.5)$. Variant II represents bimodal distributions,

• variant III: mixture of normal distributions $f(x) = 0.5 f_1(x) + 0.25 f_2(x) + 0.25 f_3(x)$, where $f_1(x)$ is the density function of $N(0, 0.2)$, $f_2(x)$ is the density function of $N(3, 0.5)$ and $f_3(x)$ is the density function of $N(7, 0.5)$. Variant III represents trimodal distributions.
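The three variants can be simulated as finite normal mixtures. The sketch below assumes that the second parameter of $N(\cdot, \cdot)$ is the standard deviation, which the text does not state explicitly; the names `VARIANTS`, `draw` and `true_density` and the sample size are hypothetical.

```python
import math
import random

# (weight, mean, standard deviation) for each mixture component;
# reading the second parameter of N(., .) as a standard deviation is an assumption
VARIANTS = {
    "I":   [(1.0, 0.0, 0.2)],
    "II":  [(0.25, 0.0, 0.2), (0.75, 3.0, 0.5)],
    "III": [(0.5, 0.0, 0.2), (0.25, 3.0, 0.5), (0.25, 7.0, 0.5)],
}

def draw(variant, n, rng):
    """Draw n observations: pick a component by weight, then sample from it."""
    comps = VARIANTS[variant]
    weights = [w for w, _, _ in comps]
    out = []
    for _ in range(n):
        _, mu, sd = rng.choices(comps, weights=weights, k=1)[0]
        out.append(rng.gauss(mu, sd))
    return out

def true_density(variant, x):
    """Mixture density f(x) of the chosen variant."""
    return sum(w * math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
               for w, mu, sd in VARIANTS[variant])

sample_ii = draw("II", 128, random.Random(0))   # illustrative sample size
```

Keeping the true density next to the sampler makes it straightforward to compare any estimate with $f(x)$ on a grid.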

In the experiment we used the following measures:

• mean squared error, denoted BSK,

• $MR = \max_i |\hat{f}(x_i) - f(x_i)|$,

• $P$, the number of cases in which the value of the estimator is greater than the value of the density function at a given point (oversmoothing),

• $N$, the number of cases in which the value of the estimator is less than the value of the density function at a given point (undersmoothing).
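The four measures can be computed from paired values of the estimate and the true density at a set of evaluation points. In this sketch the mean squared error is taken as the average squared deviation over those points, which is an assumption, since the exact formula is not legible in the source; the function name is hypothetical.

```python
def accuracy_measures(estimated, true):
    """Return (BSK, MR, P, N) for paired estimate/density values."""
    pairs = list(zip(estimated, true))
    m = len(pairs)
    bsk = sum((e - t) ** 2 for e, t in pairs) / m   # mean squared error (assumed form)
    mr = max(abs(e - t) for e, t in pairs)          # maximum absolute deviation
    p = sum(1 for e, t in pairs if e > t)           # estimate above the density
    n_below = sum(1 for e, t in pairs if e < t)     # estimate below the density
    return bsk, mr, p, n_below
```

For example, `accuracy_measures([0.1, 0.5, 0.2], [0.2, 0.4, 0.2])` counts one point above and one point below the true density, while the tied third point contributes to neither count.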

Experiment A consists in using in density estimation the values of the smoothing parameter calculated from the “practical rules”. In this way the estimators (with Gaussian and Epanechnikov kernels and with smoothing parameters $h_I$, $h_{II}$, $h_{III}$) are compared with the true density functions described by variants I, II and III. The optimal smoothing parameter is determined by minimizing BSK in estimation. The results of this part of the study are presented in Tables 1, 2 and 3.
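Determining the optimal smoothing parameter by minimizing BSK can be sketched as a grid search against the known density. Everything concrete here is an assumption made for illustration: the Gaussian kernel, variant I with 0.2 read as a standard deviation, a sample of 128 observations, 128 evaluation points on $[-1, 1]$, and the candidate bandwidths.

```python
import math
import random

def gauss(u):
    """Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, sample, h):
    """Kernel density estimate at x with bandwidth h."""
    return sum(gauss((x - xi) / h) for xi in sample) / (len(sample) * h)

def normal_pdf(x, mu, sd):
    """True density of variant I (assumed parametrization)."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def bsk(sample, h, grid):
    """Mean squared error between the estimate and the true density on the grid."""
    return sum((kde(x, sample, h) - normal_pdf(x, 0.0, 0.2)) ** 2 for x in grid) / len(grid)

rng = random.Random(0)
sample = [rng.gauss(0.0, 0.2) for _ in range(128)]
grid = [-1.0 + 2.0 * k / 127 for k in range(128)]    # evaluation points
candidates = [0.01 * k for k in range(1, 41)]        # h from 0.01 to 0.40
h_opt = min(candidates, key=lambda h: bsk(sample, h, grid))
```

Such an "optimal" value is available only in a simulation, where the true density is known; it serves as a benchmark for the data-driven rules.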


Table 1. Values of smoothing parameters from “practical rules” for variant I

| Kernel function | Smoothing parameter | BSK | MR | P | N | Optimal smoothing parameter |
|---|---|---|---|---|---|---|
| Gaussian | $h_I$ = 0.085814 | 0.030399 | 0.397500 | 44 | 84 | 0.08 |
| | $h_{II}$ = 0.085716 | 0.030393 | 0.397796 | 44 | 84 | |
| | $h_{III}$ = 0.072368 | 0.034431 | 0.456595 | 71 | 57 | |
| Epanechnikov | $h_I$ = 0.085814 | 0.030365 | 0.307472 | 40 | 88 | 0.08 |
| | $h_{II}$ = 0.085716 | 0.030322 | 0.307761 | 39 | 89 | |
| | $h_{III}$ = 0.072368 | 0.029081 | 0.386202 | 55 | 73 | |

Table 2. Values of smoothing parameters from “practical rules” for variant II

| Kernel function | Smoothing parameter | BSK | MR | P | N | Optimal smoothing parameter |
|---|---|---|---|---|---|---|
| Gaussian | $h_I$ = 0.818362 | 0.046498 | 0.376785 | 25 | 103 | 0.18 |
| | $h_{II}$ = 0.560079 | 0.028571 | 0.329700 | 30 | 98 | |
| | $h_{III}$ = 0.472863 | 0.022256 | 0.302722 | 32 | 96 | |
| Epanechnikov | $h_I$ = 0.818362 | 0.053581 | 0.396041 | 25 | 103 | 0.16 |
| | $h_{II}$ = 0.560079 | 0.032578 | 0.352379 | 30 | 98 | |
| | $h_{III}$ = 0.472863 | 0.025426 | 0.327119 | 32 | 96 | |

Table 3. Values of smoothing parameters from “practical rules” for variant III

| Kernel function | Smoothing parameter | BSK | MR | P | N | Optimal smoothing parameter |
|---|---|---|---|---|---|---|
| Gaussian | $h_I$ = 1.267892 | 0.198267 | 0.833802 | 18 | 110 | 0.11 |
| | $h_{II}$ = 1.164849 | 0.191606 | 0.822093 | 19 | 109 | |
| | $h_{III}$ = 0.983456 | 0.176536 | 0.794683 | 19 | 109 | |
| Epanechnikov | $h_I$ = 1.267892 | 0.212725 | 0.858054 | 17 | 111 | 0.09 |
| | $h_{II}$ = 1.164849 | 0.207093 | 0.848407 | 16 | 112 | |
| | $h_{III}$ = 0.983456 | 0.193565 | 0.825899 | 19 | 109 | |


In variant I all values of the smoothing parameter obtained from the “practical rules” are close to the optimal value. In variants II and III the differences are large. This means that the “practical rules” can be used only when the distribution of the population is normal. In other cases the “practical rules” do not provide an appropriate estimate of the density function. This uncomplicated method of choosing the smoothing parameter, the one used most often in applications, turned out to be useless when the distribution is not normal.

In experiment B the density function is estimated using parameters calculated by methods based on cross-validation (maximum likelihood cross-validation and least-squares cross-validation). The values of the smoothing parameter obtained from the cross-validation methods and the optimal value of the parameter $h$ in density estimation (for seven different kernel functions) are presented in Table 4.

Table 4. Values of smoothing parameters determined by cross-validation methods for variants I, II and III

| Kernel function | Criterion | Variant I | Variant II | Variant III |
|---|---|---|---|---|
| Gaussian | min BSK | h = 0.0840 | h = 0.1340 | h = 0.1180 |
| | max $CV_{NW}$ | h = 0.1003 | h = 0.1949 | h = 0.1385 |
| | min $CV_{NK}$ | h = 0.0917 | h = 0.1935 | h = 0.1125 |
| Epanechnikov | min BSK | h = 0.1770 | h = 0.3100 | h = 0.2530 |
| | max $CV_{NW}$ | h = 0.2071 | h = 0.4110 | h = 0.3016 |
| | min $CV_{NK}$ | h = 0.2107 | h = 0.4030 | h = 0.2171 |
| Triangle | min BSK | h = 0.1916 | h = 0.3245 | h = 0.2757 |
| | max $CV_{NW}$ | h = 0.2167 | h = 0.4379 | h = 0.3464 |
| Uniform | min BSK | h = 0.1355 | h = 0.2410 | h = 0.2008 |
| | max $CV_{NW}$ | h = 0.1701 | h = 0.3429 | h = 0.2897 |
| Quartic | min BSK | h = 0.2075 | h = 0.3429 | h = 0.3018 |
| | max $CV_{NW}$ | h = 0.2425 | h = 0.4835 | h = 0.3693 |
| Triweight | min BSK | h = 0.2356 | h = 0.3873 | h = 0.3452 |
| | max $CV_{NW}$ | h = 0.2914 | h = 0.5534 | h = 0.4124 |
| Cosine | min BSK | h = 0.1783 | h = 0.3137 | h = 0.2586 |
| | max $CV_{NW}$ | h = 0.2113 | h = 0.4168 | h = 0.3105 |


In all examined cases (seven kernel functions and different distributions of the population, denoted as variants I, II and III) the value of the smoothing parameter derived from the cross-validation methods is bigger than the optimal value (last column in Tables 1, 2 and 3). This means that the cross-validation criterion leads to a smoother kernel density estimator than the optimal one, since a larger bandwidth smooths more strongly. Independently of the distribution of the population from which the sample is drawn, the Gaussian kernel is associated with a smaller parameter $h$, which may indicate some smoothing properties of this kernel.

REFERENCES

Baszczyńska A. (2005), Some Remarks on the Choice of the Kernel Function in Density Estimation, Acta Universitatis Lodziensis, Folia Oeconomica.

Domański Cz., Pruska K., Wagner W. (1998), Wnioskowanie statystyczne przy nieklasycznych założeniach, Wydawnictwo Uniwersytetu Łódzkiego, Łódź.

Härdle W. (1991), Smoothing Techniques. With Implementation in S, Springer-Verlag, New York.

Priestley M., Chao M. (1972), “Nonparametric Function Fitting”, Journal of the Royal Statistical Society B, 34, 385-392.

Rosenblatt M. (1956), “Remarks on Some Nonparametric Estimation of a Density Function”, Annals of Mathematical Statistics, 27, 832-837.

Silverman B. W. (1996), Density Estimation for Statistics and Data Analysis, Chapman and Hall, London.

Wand M., Jones M. (1995), Kernel Smoothing, Chapman and Hall, London.

Aleksandra Baszczyńska

CHOICE OF THE SMOOTHING PARAMETER IN KERNEL DENSITY ESTIMATION

(Summary)

Kernel estimation is one of the basic methods of nonparametric estimation of the density function. The choice of the kernel function and of an appropriate value of the smoothing parameter is treated as fundamental in density estimation. The paper considers various methods of choosing the smoothing parameter in kernel estimation, from the simplest to somewhat more complex ones. It should be emphasized, however, that the choice of the method of selecting the smoothing parameter depends on the purpose of the estimation of the functional characteristic.

The paper also presents the results of a Monte Carlo experiment in which three “practical rules” for choosing the smoothing parameter and two cross-validation methods (maximum likelihood and least squares) were considered. The values of the smoothing parameter obtained in this way are compared with the parameter obtained by minimizing the mean squared error, treated as the “optimal” parameter.
