Nonparametric estimation of quantile versions of the Lorenz curve.


Agnieszka Magdalena Siedlaczek (Opole)


Abstract. Estimators of quantile versions of the Lorenz curve are proposed. The pointwise consistency and asymptotic normality of the estimators are proved. The efficiency of the estimators is also studied in simulations.

2010 Mathematics Subject Classification: Primary: 62G05; Secondary: 62P10.

Key words and phrases: Lorenz curve, quantile version of the Lorenz curve, nonparametric estimation.

1. Introduction

The Lorenz curve was first introduced by Lorenz in [7]. It measures the degree of inequality in the distribution of some feature in a population. It is applied in many fields, such as economics, biology, medical sciences and industry.

Definition 1.1 Let $X$ be a random variable with cumulative distribution function (CDF) $F$ and $\mu = EX < \infty$. The Lorenz curve of $X$ is the function

$$L(p) = \frac{1}{\mu} \int_0^p F^{-1}(t)\,dt, \qquad p \in [0, 1],$$

where $F^{-1}(t) = \inf\{x : F(x) \ge t\}$.

Prendergast and Staudte in [9] redefined the basic concept of the Lorenz curve in terms of quantiles instead of moments. They introduced three quantile versions of the Lorenz curve. These versions are defined for the class of all CDFs $F$ with $F(0) = 0$, not only for those with a finite expected value.

Denote $x_p = F^{-1}(p)$.

Definition 1.2 Let $\mathcal{F}$ be the class of all CDFs $F$ with $F(0) = 0$. For $F \in \mathcal{F}$ the quantile versions $L_1$, $L_2$, $L_3$ of the Lorenz curve are defined by the functions

$$L_1(p) = p\,\frac{x_{p/2}}{x_{0.5}}, \qquad L_2(p) = p\,\frac{x_{p/2}}{x_{1-p/2}},$$

$$L_3(p) = 2p\,\frac{x_{p/2}}{x_{p/2} + x_{1-p/2}} = \frac{2}{1/p + 1/L_2(p)}$$

for $p \in [0, 1)$, and $L_i(1) = 1$, $i = 1, 2, 3$.
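As a minimal sketch of Definition 1.2, the three quantile versions can be evaluated for any quantile function. The helper names `quantile_lorenz` and `exp_quantile` are hypothetical, and the standard exponential distribution is used purely as an illustrative choice; the function assumes $0 < p \le 1$.

```python
import math

def quantile_lorenz(Q, p):
    """Return (L1, L2, L3) at p for a quantile function Q, with 0 < p <= 1."""
    if p == 1.0:
        return 1.0, 1.0, 1.0          # L_i(1) = 1 by definition
    lo = Q(p / 2)                     # x_{p/2}
    hi = Q(1 - p / 2)                 # x_{1-p/2}
    med = Q(0.5)                      # x_{0.5}
    L1 = p * lo / med
    L2 = p * lo / hi
    L3 = 2 * p * lo / (lo + hi)
    return L1, L2, L3

# Illustrative assumption: standard exponential, Q(t) = -ln(1 - t).
def exp_quantile(t):
    return -math.log1p(-t)

L1, L2, L3 = quantile_lorenz(exp_quantile, 0.4)
print(L1, L2, L3)
```

The second expression for $L_3$ in the definition gives a quick consistency check: `2 / (1 / p + 1 / L2)` should agree with `L3` up to rounding.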

The article is organized as follows. In Section 2 the estimators of the quantile versions of the Lorenz curve are proposed, and their pointwise consistency and asymptotic normality are proved. In Section 3 the proposed estimators are examined through a computer simulation study. Some real data are examined in Section 4. Finally, in Section 5, a brief discussion of the plug-in estimators of quantile versions of the Lorenz curve is provided.

2. Estimation of quantile versions of the Lorenz curve

Plug-in estimators of each quantile version of the Lorenz curve can be obtained by substituting quantile estimators for the unknown quantiles. In this section we recall known quantile estimators based on distribution function estimators and then apply them to the estimation of the quantile versions of the Lorenz curve.

Let $X_1, \dots, X_n$ be i.i.d. non-negative random variables with CDF $F$ and quantile function $Q = F^{-1}$. A traditional nonparametric estimator of the distribution function is the empirical distribution function (EDF) given by the formula

$$\hat F_n^E(t) := \frac{1}{n} \sum_{i=1}^{n} I_{(-\infty, t]}(X_i),$$

where $I_A(x) = 1$ if $x \in A$ and $0$ otherwise. Accordingly, a nonparametric estimator of $x_p$ is the empirical quantile

$$\hat x^E_{p,n} = X_{([np]+1):n}, \tag{1}$$

where $[x]$ denotes the greatest integer not greater than $x$.
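The empirical quantile in (1) amounts to picking one order statistic. The sketch below uses a hypothetical helper name `empirical_quantile`; returning the sample maximum at $p = 1$ is an assumption made only so that the function does not fail at the boundary.

```python
import math

def empirical_quantile(sample, p):
    """Empirical quantile X_{([np]+1):n} from equation (1), for 0 <= p < 1."""
    xs = sorted(sample)               # order statistics X_{1:n} <= ... <= X_{n:n}
    n = len(xs)
    rank = math.floor(n * p) + 1      # 1-based rank [np] + 1
    rank = min(rank, n)               # assumption: p = 1 maps to the maximum
    return xs[rank - 1]

print(empirical_quantile([5, 1, 4, 2, 3], 0.5))
```

Because `n * p` is computed in floating point, values of $p$ for which $np$ is an integer can in principle land on either side of that integer; exact rational arithmetic would avoid this if it matters.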

In a widely cited article [3], the authors analysed nine different sample quantile definitions commonly used in statistical packages. Most of them are based on quantile function estimators $\hat Q_n$ constructed by linear interpolation between the so-called plotting positions, i.e., the points $p_k$, $k = 1, \dots, n$, for which $\hat Q_n(p_k) = X_{k:n}$, where $X_{1:n}, \dots, X_{n:n}$ denote the order statistics of the sample. Of the nine estimators, the following three are recommended in the literature:

• $\hat x^{H}_{p,n}$, proposed by Hazen (1914) in [2], based on the plotting positions

$$p^{H}_k = \frac{k - 1/2}{n}, \tag{2}$$

• $\hat x^{HF}_{p,n}$, proposed by Johnson and Kotz (1970) in [4] and recommended by Hyndman and Fan in [3], based on the plotting positions

$$p^{HF}_k = \frac{k - 1/3}{n + 1/3}, \tag{3}$$

• $\hat x^{MP}_{p,n}$, proposed by Weibull (1939) in [11] and Gumbel (1939) in [1] and recommended by Makkonen and Pajari (2014) in [8], based on the plotting positions

$$p^{MP}_k = \frac{k}{n + 1}. \tag{4}$$
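The three estimators above differ only in their plotting positions, so one interpolation routine can serve all of them. The following is a sketch under stated assumptions: the names `plotting_positions` and `interpolated_quantile` are hypothetical, and for $p$ outside $[p_1, p_n]$ the value is clamped to the smallest or largest order statistic, a convention the text does not specify.

```python
from bisect import bisect_right

def plotting_positions(n, kind):
    """Plotting positions p_k, k = 1..n, from equations (2), (3) and (4)."""
    if kind == "H":                   # Hazen: (k - 1/2) / n
        return [(k - 0.5) / n for k in range(1, n + 1)]
    if kind == "HF":                  # Hyndman and Fan: (k - 1/3) / (n + 1/3)
        return [(k - 1 / 3) / (n + 1 / 3) for k in range(1, n + 1)]
    if kind == "MP":                  # Makkonen and Pajari: k / (n + 1)
        return [k / (n + 1) for k in range(1, n + 1)]
    raise ValueError(f"unknown kind: {kind}")

def interpolated_quantile(sample, p, kind="H"):
    """Linear interpolation between the points (p_k, X_{k:n})."""
    xs = sorted(sample)
    pk = plotting_positions(len(xs), kind)
    if p <= pk[0]:                    # assumption: clamp below p_1
        return xs[0]
    if p >= pk[-1]:                   # assumption: clamp above p_n
        return xs[-1]
    j = bisect_right(pk, p)           # pk[j-1] <= p < pk[j]
    w = (p - pk[j - 1]) / (pk[j] - pk[j - 1])
    return xs[j - 1] + w * (xs[j] - xs[j - 1])

for kind in ("H", "HF", "MP"):
    print(kind, interpolated_quantile([4, 1, 3, 2], 0.5, kind))
```

All three sets of plotting positions are symmetric about 1/2, so for a sample of even size the estimated median is the midpoint of the two central order statistics.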

Zieliński in [12] proposed a continuous estimator of a distribution function based on a kernel estimator with random bandwidth. The estimator is constant on some intervals of the real line $\mathbb{R}$, hence the quantile function estimator based on this function is not continuous. Therefore Jokiel-Rokita and Pulit in [5] proposed a continuous and easily invertible estimator $\hat F_n^{JP}$ of the distribution function. The quantile function estimator $\hat Q_n^{JP}$ based on the estimator $\hat F_n^{JP}$ is of the form

$$\hat Q_n^{JP}(p) = np\,\frac{X_{(k+1):n} - X_{(k-1):n}}{2} - (k - 1)\,\frac{X_{(k+1):n} - X_{(k-1):n}}{2} + \frac{X_{(k-1):n} + X_{k:n}}{2},$$

if $\frac{k-1}{n} < p \le \frac{k}{n}$ for $k = 1, \dots, n$, and

$$\hat x^{JP}_{p,n} = \hat Q_n^{JP}(p). \tag{5}$$

The $\hat x^{JP}_{p,n}$ estimator does not satisfy a symmetry property, which is desirable in quantile estimation [3]. For that reason Jokiel-Rokita and Siedlaczek in [6] proposed a modification $\hat F_n^{M}$ of the $\hat F_n^{JP}$ estimator, which leads to the following estimator of the quantile function:

$$\hat Q_n^{M}(p) =
\begin{cases}
X_{0:n} + \dfrac{D_1}{\hat F_n^{M}(X_{1:n})}\,p, & p \in [0, \hat F_n^{M}(X_{1:n})], \\[2ex]
X_{k:n} + \dfrac{D_{k+1}}{2\,[\frac{k}{n} - \hat F_n^{M}(X_{k:n})]}\,[p - \hat F_n^{M}(X_{k:n})], & p \in [\hat F_n^{M}(X_{k:n}), \frac{k}{n}], \\[2ex]
\dfrac{X_{k:n} + X_{(k+1):n}}{2} + \dfrac{D_{k+1}\,(p - \frac{k}{n})}{2\,[\hat F_n^{M}(X_{(k+1):n}) - \frac{k}{n}]}, & p \in [\frac{k}{n}, \hat F_n^{M}(X_{(k+1):n})], \\[2ex]
X_{n:n} + \dfrac{D_{n+1}}{1 - \hat F_n^{M}(X_{n:n})}\,[p - \hat F_n^{M}(X_{n:n})], & p \in [\hat F_n^{M}(X_{n:n}), 1],
\end{cases}$$

for $k = 1, \dots, n - 1$, where $D_i = X_{i:n} - X_{(i-1):n}$, $i = 1, \dots, n + 1$, $X_{0:n} = 0$, $X_{(n+1):n} = X_{n:n} + \frac{X_{n:n} - X_{(n-1):n}}{2} = \frac{3}{2} X_{n:n} - \frac{1}{2} X_{(n-1):n}$, and

$$\hat x^{M}_{p,n} = \hat Q_n^{M}(p). \tag{6}$$
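The estimator $\hat Q_n^{JP}$ in (5) is piecewise linear and can be sketched directly from its formula. The text gives the boundary values $X_{0:n}$ and $X_{(n+1):n}$ only for the modified estimator; the sketch below assumes the same values for $\hat Q_n^{JP}$, which may differ from the convention in [5]. The name `jp_quantile` is hypothetical, and the sample must contain at least two observations. The estimator $\hat Q_n^{M}$ is not sketched because $\hat F_n^{M}$ is defined in [6] and not here.

```python
import math

def jp_quantile(sample, p):
    """Piecewise-linear quantile estimator (5), for 0 <= p <= 1."""
    xs = sorted(sample)
    n = len(xs)
    # Padded order statistics x[0], ..., x[n+1].
    # Assumption: X_{0:n} = 0 and X_{(n+1):n} = 1.5 X_{n:n} - 0.5 X_{(n-1):n}.
    x = [0.0] + xs + [1.5 * xs[-1] - 0.5 * xs[-2]]
    # Index k with (k-1)/n < p <= k/n, restricted to 1..n.
    k = min(max(math.ceil(n * p), 1), n)
    slope = (x[k + 1] - x[k - 1]) / 2
    return (n * p - (k - 1)) * slope + (x[k - 1] + x[k]) / 2

print(jp_quantile([4, 1, 3, 2], 0.5))
```

At each knot $p = k/n$ the formula returns the midpoint $(X_{k:n} + X_{(k+1):n})/2$ from either side, so a rounding error in `math.ceil(n * p)` at a knot does not change the result.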

We denote by $\hat L_i^{E}(p)$, $\hat L_i^{H}(p)$, $\hat L_i^{HF}(p)$, $\hat L_i^{MP}(p)$, $\hat L_i^{JP}(p)$ and $\hat L_i^{M}(p)$, $i = 1, 2, 3$, the plug-in estimators obtained by substituting for the unknown quantiles the quantile estimators given by equations (1), (2), (3), (4), (5) and (6), respectively.

Theorem 2.1 (consistency) The proposed estimators $\hat L_i^{j}(p)$ are consistent for $L_i(p)$ for each $p \in (0, 1)$, where $i = 1, 2, 3$ and $j = E, H, HF, MP, JP, M$.

Proof Consistency of $\hat x^{E}_{p,n}$ is proved, for example, in [10] (Theorem 2.3.1). Consistency of the other quantile estimators can be concluded from Theorem 3 in [6]. By Slutsky's theorem we have consistency of $\hat L_i^{j}(p)$ for each $p \in (0, 1)$. □

Theorem 2.2 (asymptotic normality) The proposed estimators $\hat L_i^{j}(p)$ are asymptotically normal for $L_i(p)$ for each $p \in (0, 1)$, where $i = 1, 2, 3$ and $j = E, H, HF, MP, JP, M$.

Proof Asymptotic normality of $\hat x^{E}_{p,n}$ is proved, for example, in [10] (Theorem 2.3.3.A). Asymptotic normality of the other quantile estimators can be concluded from Theorem 4 in [6]. By Slutsky's theorem we have asymptotic normality of $\hat L_i^{j}(p)$ for each $p \in (0, 1)$. □
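To spell out the last step of the consistency argument for $i = 1$, the following is a sketch only, under the added assumption that $x_{0.5} > 0$ so that the ratio is continuous at the limit point; the cases $i = 2, 3$ would be analogous with $x_{1-p/2} > 0$.

```latex
% Assumption (not stated in the text): x_{0.5} > 0.
% If \hat{x}^{j}_{p/2,n} \to x_{p/2} and \hat{x}^{j}_{0.5,n} \to x_{0.5} in probability,
% then, since (u, v) \mapsto p\,u/v is continuous at (x_{p/2}, x_{0.5}),
\[
  \hat{L}^{j}_{1}(p)
  = p\,\frac{\hat{x}^{j}_{p/2,n}}{\hat{x}^{j}_{0.5,n}}
  \;\xrightarrow{\;P\;}\;
  p\,\frac{x_{p/2}}{x_{0.5}}
  = L_{1}(p),
  \qquad p \in (0, 1).
\]
```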

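3. Simulation study

Simulations were carried out for samples of size $n = 30$ from the generalized Pareto distribution with the following CDF

$$F_{\xi,\sigma}(t) =
\begin{cases}
\left(1 - \left(1 + \dfrac{\xi t}{\sigma}\right)^{-1/\xi}\right) I_{[0,\infty)}(t) & \text{for } \xi > 0, \\[2ex]
\left(1 - \left(1 + \dfrac{\xi t}{\sigma}\right)^{-1/\xi}\right) I_{[0,-\sigma/\xi]}(t) & \text{for } \xi < 0, \\[2ex]
\left(1 - \exp\left(-\dfrac{t}{\sigma}\right)\right) I_{[0,\infty)}(t) & \text{for } \xi = 0,
\end{cases}$$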

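where $\sigma > 0$. We consider the scale parameter $\sigma = 1$ and the shape parameter $\xi \in \{-1/4, 1/4, 2/3\}$. The quantile function is

$$Q_{\xi,\sigma}(p) =
\begin{cases}
\dfrac{\sigma}{\xi}\left((1 - p)^{-\xi} - 1\right) & \text{for } \xi \ne 0, \\[2ex]
-\sigma \ln(1 - p) & \text{for } \xi = 0.
\end{cases}$$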
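Because the quantile function is available in closed form, samples from the generalized Pareto distribution can be drawn by inverse-transform sampling. This is a sketch of one possible way to generate the simulated samples; the text does not say how they were actually generated, and the function names are hypothetical.

```python
import math
import random

def gpd_quantile(p, xi, sigma=1.0):
    """Quantile function Q_{xi,sigma}(p) for 0 <= p < 1."""
    if xi == 0:
        return -sigma * math.log1p(-p)
    return sigma / xi * ((1 - p) ** (-xi) - 1)

def gpd_cdf(t, xi, sigma=1.0):
    """CDF F_{xi,sigma}(t) for t inside the support."""
    if xi == 0:
        return 1 - math.exp(-t / sigma)
    return 1 - (1 + xi * t / sigma) ** (-1 / xi)

def gpd_sample(n, xi, sigma=1.0, rng=random):
    """Draw n values by applying Q to uniform random numbers on [0, 1)."""
    return [gpd_quantile(rng.random(), xi, sigma) for _ in range(n)]

rng = random.Random(0)                # fixed seed, for reproducibility only
sample = gpd_sample(30, xi=0.25, rng=rng)
print(min(sample), max(sample))
```

A round trip `gpd_cdf(gpd_quantile(p, xi), xi)` should return `p` up to rounding, which is a convenient check that the two formulas agree.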

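When $\xi \ge 1$ the mean is not finite, so the Lorenz curve is not defined. When $\xi < 1$ the Lorenz curve is

$$L_{\xi}(p) =
\begin{cases}
\dfrac{1 - (1 - p)^{1 - \xi} - (1 - \xi)p}{\xi} & \text{for } \xi \ne 0,\ \xi < 1, \\[2ex]
\ln(1 - p)(1 - p) + p & \text{for } \xi = 0.
\end{cases}$$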
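The closed form above can be checked against Definition 1.1 by integrating the quantile function numerically. The sketch below uses a plain midpoint rule and the standard fact that the generalized Pareto mean is $\sigma/(1 - \xi)$ for $\xi < 1$; the function names and the number of integration steps are illustrative choices.

```python
import math

def gpd_quantile(t, xi, sigma=1.0):
    if xi == 0:
        return -sigma * math.log1p(-t)
    return sigma / xi * ((1 - t) ** (-xi) - 1)

def gpd_lorenz(p, xi):
    """Closed-form Lorenz curve for xi < 1 and 0 <= p < 1."""
    if xi == 0:
        return (1 - p) * math.log1p(-p) + p
    return (1 - (1 - p) ** (1 - xi) - (1 - xi) * p) / xi

def gpd_lorenz_numeric(p, xi, sigma=1.0, steps=2000):
    """(1/mu) * integral of Q over [0, p], midpoint rule, mu = sigma / (1 - xi)."""
    h = p / steps
    integral = h * sum(gpd_quantile((i + 0.5) * h, xi, sigma) for i in range(steps))
    return integral / (sigma / (1 - xi))

for xi in (-0.25, 0, 0.25, 2 / 3):
    print(xi, gpd_lorenz(0.6, xi), gpd_lorenz_numeric(0.6, xi))
```

The scale parameter cancels in the ratio, which is why $L_{\xi}$ carries no $\sigma$ subscript.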


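The quantile versions of the Lorenz curve are given by

$$L_1^{GP}(p) =
\begin{cases}
p\,\dfrac{(1 - \frac{p}{2})^{-\xi} - 1}{(\frac{1}{2})^{-\xi} - 1} & \text{for } \xi \ne 0, \\[2ex]
-p\,\dfrac{\ln(1 - \frac{p}{2})}{\ln 2} & \text{for } \xi = 0,
\end{cases}$$

$$L_2^{GP}(p) =
\begin{cases}
p\,\dfrac{(1 - \frac{p}{2})^{-\xi} - 1}{(\frac{p}{2})^{-\xi} - 1} & \text{for } \xi \ne 0, \\[2ex]
p\,\dfrac{\ln(1 - \frac{p}{2})}{\ln\frac{p}{2}} & \text{for } \xi = 0,
\end{cases}$$

$$L_3^{GP}(p) =
\begin{cases}
2p\,\dfrac{(1 - \frac{p}{2})^{-\xi} - 1}{(1 - \frac{p}{2})^{-\xi} - 1 + (\frac{p}{2})^{-\xi} - 1} & \text{for } \xi \ne 0, \\[2ex]
2p\,\dfrac{\ln(1 - \frac{p}{2})}{\ln\left(\frac{p}{2} - \frac{p^2}{4}\right)} & \text{for } \xi = 0.
\end{cases}$$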
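These closed forms follow from Definition 1.2 after the common factor $\sigma/\xi$ (or $\sigma$ when $\xi = 0$) cancels. A small sketch, with hypothetical function names, compares them with a direct evaluation through the quantile function at an arbitrary scale $\sigma = 3$, chosen only to illustrate that the result does not depend on $\sigma$.

```python
import math

def gpd_quantile(t, xi, sigma=1.0):
    if xi == 0:
        return -sigma * math.log1p(-t)
    return sigma / xi * ((1 - t) ** (-xi) - 1)

def gpd_quantile_versions(p, xi):
    """Closed forms (L1^GP, L2^GP, L3^GP) for 0 < p < 1."""
    if xi == 0:
        a = -math.log1p(-p / 2)       # proportional to x_{p/2}
        b = -math.log(p / 2)          # proportional to x_{1-p/2}
        m = math.log(2)               # proportional to x_{0.5}
    else:
        a = (1 - p / 2) ** (-xi) - 1
        b = (p / 2) ** (-xi) - 1
        m = 0.5 ** (-xi) - 1
    return p * a / m, p * a / b, 2 * p * a / (a + b)

def by_definition(p, xi, sigma):
    """Definition 1.2 evaluated through the quantile function."""
    lo = gpd_quantile(p / 2, xi, sigma)
    hi = gpd_quantile(1 - p / 2, xi, sigma)
    med = gpd_quantile(0.5, xi, sigma)
    return p * lo / med, p * lo / hi, 2 * p * lo / (lo + hi)

for xi in (-0.25, 0, 0.25, 2 / 3):
    print(xi, gpd_quantile_versions(0.3, xi), by_definition(0.3, xi, sigma=3.0))
```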

As a measure of error of the estimators $\hat L_i$, $i = 1, 2, 3$, we use the integrated mean squared error (IMSE), which is defined as

$$IMSE = E\left[\int_0^1 \left(L_i(p) - \hat L_i(p)\right)^2 dp\right].$$

Table 1 presents the estimated IMSEs of the proposed estimators, multiplied by 100 for readability, based on 1000 replications.

Table 1: Simulation results for the generalized Pareto distribution

| | $q^{E}$ | $q^{H}$ | $q^{HF}$ | $q^{MP}$ | $q^{JP}$ | $q^{M}$ |
|---|---|---|---|---|---|---|
| $\xi = -1/4$ | | | | | | |
| $L_1$ | 0.4046 | 0.3074 | 0.3057 | 0.3046 | 0.2834 | 0.2939 |
| $L_2$ | 0.365 | 0.3072 | 0.3035 | 0.2986 | 0.2795 | 0.2829 |
| $L_3$ | 0.4599 | 0.4256 | 0.4095 | 0.38 | 0.4084 | 0.4108 |
| $\xi = 1/4$ | | | | | | |
| $L_1$ | 0.4189 | 0.3082 | 0.306 | 0.3038 | 0.2815 | 0.2924 |
| $L_2$ | 0.3622 | 0.2925 | 0.2884 | 0.2826 | 0.2595 | 0.2632 |
| $L_3$ | 0.4582 | 0.4154 | 0.399 | 0.3693 | 0.3937 | 0.397 |
| $\xi = 2/3$ | | | | | | |
| $L_1$ | 0.51 | 0.3688 | 0.3665 | 0.3644 | 0.3371 | 0.3505 |
| $L_2$ | 0.4056 | 0.3223 | 0.3188 | 0.3144 | 0.2843 | 0.2879 |
| $L_3$ | 0.4704 | 0.4181 | 0.4031 | 0.3761 | 0.3919 | 0.3958 |

In the considered cases of the generalized Pareto distribution, when estimating $L_1(p)$ and $L_2(p)$, the $\hat L^{JP}$ estimator has the smallest estimated IMSE. In the case of $L_3(p)$, the $\hat L^{MP}$ estimator has the smallest estimated IMSE. The worst estimator, in the sense of IMSE, is $\hat L^{E}$.
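An estimated IMSE of the kind reported above can be approximated by Monte Carlo simulation. The sketch below does this for $\hat L_1^{E}$ only, replacing the integral over $p$ by an average over a grid of midpoints. The grid size, the seed and all function names are assumptions, so the output is not expected to reproduce the table entries exactly.

```python
import math
import random

def gpd_quantile(t, xi, sigma=1.0):
    if xi == 0:
        return -sigma * math.log1p(-t)
    return sigma / xi * ((1 - t) ** (-xi) - 1)

def empirical_quantile_sorted(xs, p):
    """X_{([np]+1):n} for an already sorted sample xs."""
    n = len(xs)
    return xs[min(math.floor(n * p), n - 1)]

def L1_from(q, p):
    """L1(p) = p * q(p/2) / q(0.5) for a quantile function or estimator q."""
    return p * q(p / 2) / q(0.5)

def estimate_imse_L1(xi, n=30, reps=1000, grid=100, seed=0):
    rng = random.Random(seed)
    ps = [(j + 0.5) / grid for j in range(grid)]      # midpoints of (0, 1)
    true = [L1_from(lambda t: gpd_quantile(t, xi), p) for p in ps]
    total = 0.0
    for _ in range(reps):
        xs = sorted(gpd_quantile(rng.random(), xi) for _ in range(n))
        est = [L1_from(lambda t: empirical_quantile_sorted(xs, t), p) for p in ps]
        # Average of squared errors over the grid approximates the integral.
        total += sum((a - b) ** 2 for a, b in zip(true, est)) / grid
    return total / reps

print(100 * estimate_imse_L1(0.25))   # scaled by 100, as in the table
```

Swapping `empirical_quantile_sorted` for another quantile estimator would give the corresponding column, and replacing `L1_from` would give the rows for $L_2$ and $L_3$.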

Let us also consider the Weibull $We(\xi, \sigma)$ distribution with the following CDF

$$F_{\xi,\sigma}(x) = 1 - \exp\left(-\left(\frac{x}{\sigma}\right)^{\xi}\right),$$

where $x \ge 0$, $\sigma > 0$, $\xi > 0$. In the simulations we consider the scale parameter $\sigma = 1$ and the shape parameter $\xi \in \{1/2, 3/2, 3\}$. The quantile function of the Weibull $We(\xi, \sigma)$ distribution is

$$Q_{\xi,\sigma}(p) = \sigma\left(-\ln(1 - p)\right)^{1/\xi}.$$

The corresponding Lorenz curve is

$$L_{\xi}(p) = \frac{\gamma\left(1 + \frac{1}{\xi}, -\ln(1 - p)\right)}{\Gamma\left(1 + \frac{1}{\xi}\right)},$$

where $\Gamma(a)$ is the gamma function and $\gamma(a, x)$ is the lower incomplete gamma function.
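The Weibull formulas can be checked numerically in the same spirit. The sketch below verifies that the quantile function inverts the CDF and compares the Lorenz curve obtained from Definition 1.1 with the incomplete-gamma expression. Both integrals are approximated by a midpoint rule, since the Python standard library has no incomplete gamma function; the function names and step count are illustrative assumptions.

```python
import math

def weibull_cdf(x, xi, sigma=1.0):
    return 1 - math.exp(-((x / sigma) ** xi))

def weibull_quantile(p, xi, sigma=1.0):
    return sigma * (-math.log1p(-p)) ** (1 / xi)

def midpoint(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def lorenz_by_definition(p, xi, sigma=1.0):
    mu = sigma * math.gamma(1 + 1 / xi)               # mean of We(xi, sigma)
    return midpoint(lambda t: weibull_quantile(t, xi, sigma), 0.0, p) / mu

def lorenz_by_formula(p, xi):
    a = 1 + 1 / xi
    x = -math.log1p(-p)
    lower_gamma = midpoint(lambda u: u ** (a - 1) * math.exp(-u), 0.0, x)
    return lower_gamma / math.gamma(a)

for xi in (0.5, 1.5, 3):
    print(xi, lorenz_by_definition(0.6, xi), lorenz_by_formula(0.6, xi))
```

For $\xi = 1$ the Weibull distribution reduces to the exponential one, and the formula should then agree with $\ln(1 - p)(1 - p) + p$.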

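Quantile versions of the Lorenz curve of the Weibull $We(\xi, \sigma)$ distribution are given by

$$L_1(p) = p\,\sqrt[\xi]{\frac{\ln(1 - \frac{p}{2})}{\ln(\frac{1}{2})}}, \qquad L_2(p) = p\,\sqrt[\xi]{\frac{\ln(1 - \frac{p}{2})}{\ln(\frac{p}{2})}},$$

$$L_3(p) = 2p\,\frac{\sqrt[\xi]{-\ln(1 - \frac{p}{2})}}{\sqrt[\xi]{-\ln(1 - \frac{p}{2})} + \sqrt[\xi]{-\ln(\frac{p}{2})}}.$$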

Table 2 presents the estimated IMSEs for the Weibull distribution, multiplied by 100 for readability, based on 1000 replications.

Table 2: Simulation results for the Weibull distribution

| | $q^{E}$ | $q^{H}$ | $q^{HF}$ | $q^{MP}$ | $q^{JP}$ | $q^{M}$ |
|---|---|---|---|---|---|---|
| $\xi = 1/2$ | | | | | | |
| $L_1$ | 0.7515 | 0.5357 | 0.5298 | 0.5209 | 0.4784 | 0.4994 |
| $L_2$ | 0.5516 | 0.4156 | 0.4093 | 0.399 | 0.3546 | 0.3603 |
| $L_3$ | 0.494 | 0.4043 | 0.3996 | 0.3931 | 0.3607 | 0.3658 |
| $\xi = 3/2$ | | | | | | |
| $L_1$ | 0.291 | 0.2215 | 0.2213 | 0.2226 | 0.2063 | 0.2132 |
| $L_2$ | 0.2603 | 0.222 | 0.2207 | 0.2204 | 0.2034 | 0.2059 |
| $L_3$ | 0.1728 | 0.153 | 0.1534 | 0.1564 | 0.1429 | 0.1446 |
| $\xi = 3$ | | | | | | |
| $L_1$ | 0.1117 | 0.0855 | 0.086 | 0.0882 | 0.0802 | 0.0827 |
| $L_2$ | 0.1218 | 0.1069 | 0.1069 | 0.1083 | 0.0994 | 0.1005 |
| $L_3$ | 0.0594 | 0.0531 | 0.0536 | 0.0558 | 0.0499 | 0.0504 |

The $\hat L^{JP}$ estimator has the smallest estimated IMSE in all considered cases of the Weibull distribution. The worst estimator, in the sense of IMSE, is again $\hat L^{E}$.

4. Real data analysis

We applied our results to a medical problem of investigating the effectiveness of treatment. We used data from Jokiel-Rokita and Pulit [5]. The data concern the effectiveness of the combined treatment with interferon alpha and metronomic cyclophosphamide in patients with metastatic kidney cancer. There were 31 patients; we consider the hemoglobin level of those for whom a clinical response was observed at the 24th week of treatment (17 patients), i.e. 7.2, 8.6, 9.1, 9.5, 10.9, 10.9, 11.1, 11.5, 11.7, 11.9, 11.9, 12.7, 12.9, 13.9, 14.1, 14.5, 14.7. The estimated quantile versions of the Lorenz curve for these data are illustrated in Figure 1. We present only the curves for $\hat L^{JP}$, which had the smallest estimated IMSE in most of the cases considered, and, for comparison, the curves for the empirical estimator $\hat L^{E}$, which had the largest.

[Figure 1, panels (a) $L_1$, (b) $L_2$, (c) $L_3$: the estimators $\hat L^{E}$ and $\hat L^{JP}$ plotted against $p \in [0, 1]$, with the vertical axis running from 0 to 1.]

Figure 1: Estimators of the quantile versions of the Lorenz curve of the real data
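The curves in Figure 1 come from plugging quantile estimators into Definition 1.2. A sketch of that computation for the hemoglobin data is given below. The function names are hypothetical, and the boundary values used in `q_JP` are borrowed from the definition of $\hat Q_n^{M}$, which is an assumption, so the $\hat L^{JP}$ values may differ from those plotted.

```python
import math

# Hemoglobin levels of the 17 patients with a clinical response (Section 4).
hemoglobin = [7.2, 8.6, 9.1, 9.5, 10.9, 10.9, 11.1, 11.5, 11.7, 11.9,
              11.9, 12.7, 12.9, 13.9, 14.1, 14.5, 14.7]

def q_E(xs, p):
    """Empirical quantile (1); xs must be sorted."""
    n = len(xs)
    return xs[min(math.floor(n * p), n - 1)]

def q_JP(xs, p):
    """Quantile estimator (5); xs must be sorted.
    Assumption: X_{0:n} = 0 and X_{(n+1):n} = 1.5 X_{n:n} - 0.5 X_{(n-1):n}."""
    n = len(xs)
    x = [0.0] + list(xs) + [1.5 * xs[-1] - 0.5 * xs[-2]]
    k = min(max(math.ceil(n * p), 1), n)
    return (n * p - (k - 1)) * (x[k + 1] - x[k - 1]) / 2 + (x[k - 1] + x[k]) / 2

def plug_in(q, xs, p):
    """Plug-in estimates of (L1, L2, L3) at p, for 0 < p < 1."""
    lo, hi, med = q(xs, p / 2), q(xs, 1 - p / 2), q(xs, 0.5)
    return p * lo / med, p * lo / hi, 2 * p * lo / (lo + hi)

xs = sorted(hemoglobin)
for p in (0.2, 0.5, 0.8):
    print(p, plug_in(q_E, xs, p), plug_in(q_JP, xs, p))
```

Evaluating `plug_in` on a fine grid of $p$ values and plotting the results would give curves of the kind shown in the three panels.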

5. Conclusion and further remarks

New estimators of the quantile versions of the Lorenz curve have been proposed. The consistency and asymptotic normality of the estimators were proved. In the simulation study, the accuracy of the plug-in estimators of the quantile versions of the Lorenz curve, measured by the integrated mean squared error, was investigated for small samples generated from the Weibull and generalized Pareto distributions. The $\hat L^{JP}$ and $\hat L^{MP}$ estimators have the lowest estimated IMSEs. The real data analysis showed a considerable difference between the empirical estimator and the estimator $\hat L^{JP}$.

The considered estimators of the quantile versions of the Lorenz curve can be applied to the estimation of measures of inequality in distributions, such as the quantile versions of the Gini index.

References

[1] E. Gumbel. La probabilité des hypothèses. C. R. Acad. Sci., Paris, 209:645–647, 1939. ISSN 0001-4036. Zbl 0022.37203.

[2] A. Hazen. Storage to be provided in impounding reservoirs for municipal water supply. Transactions of the American Society of Civil Engineers, 77:1539–1669, 1914. URL https://catalog.hathitrust.org/Record/000506421.

[3] R. J. Hyndman and Y. Fan. Sample quantiles in statistical packages. The American Statistician, 50(4):361–365, 1996. ISSN 00031305. doi: 10.2307/2684934. URL http://www.jstor.org/stable/2684934.

[4] N. L. Johnson and S. Kotz. Distributions in statistics. Continuous univariate distributions. 2. Houghton Mifflin Co., Boston, Mass., 1970. MR 0270476.

[5] A. Jokiel-Rokita and M. Pulit. Nonparametric estimation of the ROC curve based on smoothed empirical distribution functions. Stat. Comput., 23(6):703–712, 2013. ISSN 0960-3174; 1573-1375/e. doi: 10.1007/s11222-012-9340-x. Zbl 1322.62122.

[6] A. Jokiel-Rokita and A. M. Siedlaczek. Quantile estimation via distribution fitting. Technical report, Politechnika Wrocławska, 2017.

[7] M. O. Lorenz. Methods of measuring the concentration of wealth. Publications of the American Statistical Association, 9(70):209–219, 1905. ISSN 15225437. doi: 10.2307/2276207. URL http://www.jstor.org/stable/2276207.

[8] L. Makkonen and M. Pajari. Defining sample quantiles by the true rank probability. J. Probab. Stat., pages Art. ID 326579, 6, 2014. ISSN 1687-952X. doi: 10.1155/2014/326579. URL https://doi.org/10.1155/2014/326579. MR 3293006.

[9] L. A. Prendergast and R. G. Staudte. Quantile versions of the Lorenz curve. Electron. J. Stat., 10(2):1896–1926, 2016. ISSN 1935-7524. doi: 10.1214/16-EJS1154. URL https://doi.org/10.1214/16-EJS1154.

[10] R. J. Serfling. Approximation theorems of mathematical statistics. John Wiley & Sons, Inc., New York, 1980. ISBN 0-471-02403-1. Wiley Series in Probability and Mathematical Statistics.

[11] W. Weibull. The phenomenon of rupture in solids, volume 153 of Ingeniörsvetenskapsakademiens handlingar. Generalstabens litografiska anstalts förlag, Stockholm, 1939. URL http://www.barringer1.com/wa_files/The-Phenomenon-Of-Rupture-In-Solids-Weibull-1939.pdf.

[12] R. Zieliński. Kernel estimators and the Dvoretzky-Kiefer-Wolfowitz inequality. Appl. Math., 34(4):401–404, 2007. ISSN 1233-7234; 1730-6280/e. doi: 10.4064/am34-4-3. Zbl 1130.62031.

Nonparametric estimation of quantile versions of the Lorenz curves.

Agnieszka Magdalena Siedlaczek

Summary Nonparametric estimators of quantile versions of the Lorenz curve are proposed. The pointwise consistency and asymptotic normality of the proposed estimators are proved. The integrated mean squared errors of selected nonparametric estimators of the quantile versions of the Lorenz curve are compared by means of computer simulations.

AMS subject classification (2010): Primary: 62G05; Secondary: 62P10.

Key words: Lorenz curve; quantile versions of the Lorenz curve; nonparametric estimation.

Agnieszka Magdalena Siedlaczek is a PhD student in Mathematics at the University of Opole. She received an MSc in Mathematics, specializing in mathematical modeling and data analysis, in 2016, a bachelor's degree in Mathematics, specializing in financial mathematics, in 2014, and an engineering degree in Computer Science in 2016. During her studies she received a scholarship for the best students. In 2017 she was awarded the Fidelius prize for young mathematicians for the best paper presented at the XLVI National Conference on the Applications of Mathematics. Her current scientific interests are estimation of the Lorenz curve, estimation of quantiles, and statistics in medicine in general.

Agnieszka Magdalena Siedlaczek
University of Opole
Institute of Mathematics and Computer Science
ul. Oleska 48, 45-052 Opole
E-mail: asiedlaczek@uni.opole.pl

Communicated by: Urszula Foryś

(Received: 22nd of February 2018; revised: 10th of March 2018)
