
A. BORATYŃSKA and R. ZIELIŃSKI (Warszawa)

BAYES ROBUSTNESS VIA THE KOLMOGOROV METRIC

Abstract. An upper bound for the Kolmogorov distance between the posterior distributions in terms of that between the prior distributions is given. For some likelihood functions the inequality is sharp. Applications to assessing Bayes robustness are presented.

1. Introduction and notations. Given a sample space $\mathcal{X}$ and a parameter space $\Theta$, let $l : \mathcal{X} \times \Theta \to \mathbb{R}^1$ be a given function such that $(\forall\theta)$ $l(\cdot, \theta)$ is a density of a probability distribution function on $\mathcal{X}$ and $(\forall x)$ $l_x(\cdot) = l(x, \cdot)$ is the likelihood function on $\Theta$. Throughout the paper we assume that $\Theta$ is an interval $(\theta_L, \theta_U)$ in $\mathbb{R}^1$, $-\infty \le \theta_L < \theta_U \le +\infty$, and that for every $x \in \mathcal{X}$ the likelihood function $l_x(\cdot)$ is of finite variation; also, we define $s(x) = \sup_{\theta\in\Theta} l_x(\theta)$.

All integrals are Lebesgue–Stieltjes integrals over $(\theta_L, \theta_U)$ unless stated otherwise. To avoid some technical difficulties we assume that the distribution functions $F$ and $G$ appearing below are continuous. Actually, it is enough to assume that points of discontinuity of $F$ and $G$ do not coincide with those of $l_x(\cdot)$.

Let $l_x(\cdot) = l_x^+(\cdot) - l_x^-(\cdot)$ be the Jordan decomposition of $l_x(\cdot)$ and let $\bar{l}_x(\cdot) = l_x^+(\cdot) + l_x^-(\cdot)$. We assume that for every $x \in \mathcal{X}$,
$$u(x) = \int d\bar{l}_x(\theta) < \infty\,.$$
Observe that if $l_x(\cdot)$ is differentiable and $\int |\partial l_x(\theta)/\partial\theta|\, d\theta < \infty$, then $u(x) = \int |\partial l_x(\theta)/\partial\theta|\, d\theta$.
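For instance, for the normal likelihood $l_x(\theta) = (\sigma\sqrt{2\pi})^{-1}\exp\bigl(-(x-\theta)^2/(2\sigma^2)\bigr)$, which increases to its maximum at $\theta = x$ and then decreases, one gets
$$u(x) = \int \Bigl|\frac{\partial l_x(\theta)}{\partial\theta}\Bigr|\, d\theta = 2\sup_{\theta\in\Theta} l_x(\theta) = \frac{2}{\sigma\sqrt{2\pi}}\,,$$
a worked instance (not stated explicitly in the paper) of the quantities $s(x)$ and $u(x)$ that reappears in Example 1 below.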

If $F$ and $G$ are any cdf's, then
$$\varrho(F, G) = \sup_{\theta\in\Theta} |F(\theta) - G(\theta)|$$
denotes the Kolmogorov distance between $F$ and $G$.

1991 Mathematics Subject Classification: Primary 62F15; Secondary 62C10, 62F35.

Key words and phrases: Bayes robustness, Kolmogorov metric, stability of Bayes procedures.

Supported by KBN Grant 2-1168-91-01 Gr-101.

For cdf's $F$ and $G$ of prior distributions on $\Theta$ and for a given $x \in \mathcal{X}$, let $F_x$ and $G_x$ be the cdf's of the corresponding posterior distributions.

2. The main result. The following theorem gives us an estimate for the Kolmogorov distance between the posterior distributions $F_x$ and $G_x$ in terms of the Kolmogorov distance between the appropriate prior cdf's $F$ and $G$.

Given a cdf $H$, let $m_x(H) = \int l_x(\theta)\, dH(\theta)$.

Theorem. Let $F$ and $G$ be given prior distributions and let $x \in \mathcal{X}$ be a fixed point in the sample space. For every likelihood function $l_x(\cdot)$,
$$(1)\qquad \varrho(F_x, G_x) \le \frac{\varrho(F, G)}{\max\{m_x(F),\, m_x(G)\}}\,(s(x) + u(x))\,.$$
There exists a likelihood function for which the inequality is sharp.

P r o o f. Since
$$F_x(\theta) - G_x(\theta) = \frac{\int_{\theta_L}^{\theta} l_x(t)\, dF(t)}{m_x(F)} - \frac{\int_{\theta_L}^{\theta} l_x(t)\, dG(t)}{m_x(G)}\,,$$
adding and subtracting $\int_{\theta_L}^{\theta} l_x(t)\, dF(t)/m_x(G)$, we obtain
$$F_x(\theta) - G_x(\theta) = \frac{1}{m_x(G)}\Bigl(\int_{\theta_L}^{\theta} l_x(t)\, d(F(t) - G(t)) - F_x(\theta)\int l_x(t)\, d(F(t) - G(t))\Bigr)\,.$$
Integrating by parts gives
$$F_x(\theta) - G_x(\theta) = \frac{1}{m_x(G)}\Bigl(l_x(\theta)(F(\theta) - G(\theta)) + \int \bigl(F_x(\theta) - \mathbf{1}_{(-\infty,\theta)}(t)\bigr)(F(t) - G(t))\, dl_x(t)\Bigr)\,.$$
Since $|F_x(\theta) - \mathbf{1}_{(-\infty,\theta)}(t)| \le 1$, bounding both terms in absolute value yields
$$|F_x(\theta) - G_x(\theta)| \le \frac{1}{m_x(G)}\,\varrho(F, G)(s(x) + u(x))\,.$$
Similarly, adding and subtracting $\int_{\theta_L}^{\theta} l_x(t)\, dG(t)/m_x(F)$ instead gives the same bound with $m_x(F)$ in place of $m_x(G)$:
$$|F_x(\theta) - G_x(\theta)| \le \frac{1}{m_x(F)}\,\varrho(F, G)(s(x) + u(x))\,,$$
which gives (1).

For the second statement, see Example 3 below.
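As a quick illustration of the theorem, the following sketch checks inequality (1) numerically for one concrete model: a normal likelihood with a normal prior $F$ and a perturbed prior $G$. All numerical choices (the grid, $\sigma$, $\tau$, $x$, $\varepsilon$, and the particular $G$) are illustrative assumptions, not taken from the paper.

```python
# Discretized check of inequality (1); model and numbers are illustrative.
import numpy as np
from scipy import stats

sigma, tau, x, eps = 1.0, 1.0, 0.7, 0.05

theta = np.linspace(-10.0, 10.0, 200001)         # grid over Theta
lx = stats.norm.pdf(x, loc=theta, scale=sigma)   # likelihood l_x(theta)

F = stats.norm.cdf(theta, scale=tau)             # prior cdf F = N(0, tau^2)
# G: a cdf with Kolmogorov distance at most eps from F (shifted, clipped,
# then forced to be nondecreasing)
G = np.maximum.accumulate(np.clip(F + eps * np.sign(theta), 0.0, 1.0))

dF, dG = np.diff(F), np.diff(G)                  # prior increments
lmid = 0.5 * (lx[:-1] + lx[1:])                  # l_x at cell midpoints
mF, mG = (lmid * dF).sum(), (lmid * dG).sum()    # m_x(F), m_x(G)

Fx = np.concatenate(([0.0], np.cumsum(lmid * dF) / mF))   # posterior cdfs
Gx = np.concatenate(([0.0], np.cumsum(lmid * dG) / mG))

s = lx.max()                                     # s(x) = sup_theta l_x(theta)
u = np.abs(np.diff(lx)).sum()                    # u(x): total variation of l_x

lhs = np.abs(Fx - Gx).max()                      # rho(F_x, G_x)
rhs = np.abs(F - G).max() * (s + u) / max(mF, mG)
print(f"LHS = {lhs:.5f}  <=  RHS = {rhs:.5f}: {lhs <= rhs}")
```

Refining the grid drives the discretized quantities toward their exact values; the check passes for any grid fine enough to resolve $l_x$, $F$, and $G$.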

3. How sharp is inequality (1)? The following three examples answer that question. In each of them $\varepsilon$ is a fixed positive number, $l_x(\cdot)$ is a fixed likelihood function, and $F$ is a fixed prior distribution. The prior distribution $G$ is chosen in such a way that $\varrho(F, G) = \varepsilon$.

Let RHS and LHS denote the right- and left-hand sides of (1), respectively.

Example 1. Suppose that $l_x(\cdot)$ is the likelihood function of the normal distribution $N(\theta, \sigma^2)$ and that $F$ is normal $N(0, \tau^2)$. Define
$$G(\theta) = \begin{cases} 0 & \text{if } \theta < F^{-1}(\varepsilon),\\ F(\theta) - \varepsilon & \text{if } F^{-1}(\varepsilon) \le \theta < 0,\\ F(\theta) + \varepsilon & \text{if } 0 \le \theta < F^{-1}(1-\varepsilon),\\ 1 & \text{if } \theta \ge F^{-1}(1-\varepsilon). \end{cases}$$
Then $|F(\theta) - G(\theta)| \equiv \varepsilon$ on the support of $G$ and
$$\mathrm{RHS} \le 3\varepsilon\,\frac{\sqrt{\sigma^2 + \tau^2}}{\sigma}\exp\Bigl(\frac{1}{2}\,\frac{x^2}{\sigma^2 + \tau^2}\Bigr)\,,$$
$$\mathrm{LHS} \ge \Phi\biggl(\sqrt{\tfrac{1}{\sigma^2} + \tfrac{1}{\tau^2}}\,\Bigl(\tau\Phi^{-1}(\varepsilon) - \frac{x/\sigma^2}{\tfrac{1}{\sigma^2} + \tfrac{1}{\tau^2}}\Bigr)\biggr)\,.$$
Both RHS and LHS, and of course their difference, tend to zero as $\varepsilon \to 0$. For large $\sigma^2$ and small $\tau^2$ the right-hand side is approximately $3\varepsilon$ and the left-hand side equals $\varepsilon$, and hence $\mathrm{RHS}/\mathrm{LHS} \approx 3$.
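The asymptotic ratio can be seen numerically. The sketch below simply evaluates the two closed-form bounds above for one illustrative choice of $\sigma$, $\tau$, $x$, $\varepsilon$; the particular numbers are assumptions, not taken from the paper.

```python
# Evaluate the Example 1 bounds; parameter values are illustrative only.
import numpy as np
from scipy.stats import norm

def example1_bounds(sigma, tau, x, eps):
    rhs = 3 * eps * np.sqrt(sigma**2 + tau**2) / sigma * \
          np.exp(x**2 / (2 * (sigma**2 + tau**2)))        # upper bound on RHS
    prec = 1 / sigma**2 + 1 / tau**2                       # posterior precision
    lhs = norm.cdf(np.sqrt(prec) * (tau * norm.ppf(eps) - (x / sigma**2) / prec))
    return rhs, lhs                                        # lower bound on LHS

rhs, lhs = example1_bounds(sigma=10.0, tau=0.1, x=0.5, eps=0.01)
print(rhs / lhs)   # approaches 3 for large sigma^2 and small tau^2
```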

Example 2. Let $l_x(\theta) = \binom{n}{x}\theta^x(1-\theta)^{n-x}$ and let $F(\theta) = \theta^\alpha$, $\alpha > 0$. Constructing $G$ as in Example 1 we obtain $\mathrm{RHS}/\mathrm{LHS} \approx 2$ for small $\varepsilon$.

Example 3. To see that the inequality is sharp, take $\mathbf{1}_{(\theta - 1/2,\, \theta + 1/2)}(x)$, $\theta \in \Theta = (0, 1)$, as a family of densities on the sample space $\mathcal{X} = \mathbb{R}^1$. Then $l_x(\theta) = \mathbf{1}_{(x - 1/2,\, x + 1/2)}(\theta)$ and for $x = 1/2$ one obtains $u(x) = 0$ and, for every cdf $F$ on $\Theta$, $m_x(F) = 1$. Now $F_x = F$ and $G_x = G$, $s(x) = 1$, and hence $\mathrm{LHS} = \mathrm{RHS}$.
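In code, the sharpness of Example 3 amounts to the observation that $l_x \equiv 1$ on $\Theta$ for $x = 1/2$, so the posteriors coincide with the priors. A minimal sketch, checking LHS = RHS for one illustrative pair $F$, $G$ (the particular $F$ and $G$ are assumptions):

```python
# Example 3 numerically: at x = 1/2 the likelihood is 1 on all of (0, 1),
# so F_x = F, G_x = G, s(x) = 1, u(x) = 0, and (1) holds with equality.
import numpy as np

theta = np.linspace(0.0, 1.0, 10001)
lx = np.ones_like(theta)                       # l_x(theta) = 1 on (0, 1)
s, u = lx.max(), np.abs(np.diff(lx)).sum()     # s(x) = 1.0, u(x) = 0.0

F = theta                                      # illustrative prior: uniform cdf
G = np.clip(theta + 0.1, 0.0, 1.0)             # any other cdf on (0, 1)
mF = mG = 1.0                                  # m_x(H) = 1 for every cdf H

lhs = np.abs(F - G).max()                      # rho(F_x, G_x) = rho(F, G)
rhs = np.abs(F - G).max() * (s + u) / max(mF, mG)
print(lhs == rhs)                              # True: the bound is attained
```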

4. Bayes robustness. For a given prior distribution $F$ consider the class of prior distributions $\mathcal{G}_\varepsilon = \{G : \varrho(G, F) \le \varepsilon\}$ (see, for example, the class $\Gamma_1$ in Berger (1985)) and the class of the corresponding posterior distributions.

As consequences of inequality (1) we can estimate the oscillation of posterior distributions under (small) violations of the assumed prior distribution, and we can conclude that the posterior distribution is infinitesimally robust (in the sense of, e.g., Mȩczarski and Zieliński (1991) and of the papers quoted therein) under misspecification of the prior distribution.

Corollary 1 (oscillation of the posterior distribution). For any given prior distribution $F$ and any sample point $x \in \mathcal{X}$,
$$(2)\qquad \sup_{G\in\mathcal{G}_\varepsilon} \varrho(F_x, G_x) \le \frac{\varepsilon}{\max\{m_x(F),\, \min_{G\in\mathcal{G}_\varepsilon} m_x(G)\}}\,(s(x) + u(x))\,,$$
where $s(x)$ and $u(x)$ depend on the likelihood function $l_x(\cdot)$ only, and $m_x(F)$ depends on the likelihood function and the prior distribution $F$.
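Since $F$ itself belongs to $\mathcal{G}_\varepsilon$, the denominator in (2) equals $m_x(F)$, so the oscillation bound is computable from $F$ and the likelihood alone. A minimal sketch, assuming the normal model of Example 1 with illustrative numbers:

```python
# Oscillation bound (2) for a normal likelihood and a N(0, tau^2) prior.
# Because F lies in G_eps, max{m_x(F), min_G m_x(G)} = m_x(F).
import numpy as np
from scipy.stats import norm

sigma, tau, x, eps = 1.0, 2.0, 0.3, 0.01

s = norm.pdf(0.0, scale=sigma)        # s(x) = sup_theta l_x(theta)
u = 2.0 * s                           # u(x) for this unimodal likelihood
mF = norm.pdf(x, scale=np.sqrt(sigma**2 + tau**2))   # m_x(F)

print(eps * (s + u) / mF)   # uniform bound on rho(F_x, G_x) over G_eps
```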

Corollary 2 (infinitesimal robustness). For a fixed $x \in \mathcal{X}$, for every prior distribution $F$ and for each $\varepsilon > 0$ there exists $\delta > 0$ such that for every distribution $G$ on $\Theta$,
$$\varrho(G, F) < \delta \;\Rightarrow\; \varrho(G_x, F_x) < \varepsilon\,.$$

Corollary 3 (uniform infinitesimal robustness). If there exist positive $\alpha$, $M_1$, and $M_2$ such that
$$m_x(F) > \alpha\,, \quad s(x) < M_1\,, \quad u(x) < M_2$$
for all $x \in \mathcal{X}$, then for each $\varepsilon > 0$ there exists $\delta > 0$ such that for all $x \in \mathcal{X}$ and for all distributions $G$ on $\Theta$,
$$\varrho(G, F) < \delta \;\Rightarrow\; \varrho(G_x, F_x) < \varepsilon\,.$$

Berger and Berliner (1986), Sivaganesan (1988), Sivaganesan and Berger (1989), and Gelfand and Dey (1991), to quote but a few, considered the class $\Gamma_\varepsilon = \{(1-\varepsilon)F + \varepsilon Q : Q \in \mathcal{Q}\}$ of distributions, with a given prior distribution $F$ and some specified class $\mathcal{Q}$, and discussed the oscillations of some functionals on the appropriate class of posterior distributions. Since $\Gamma_\varepsilon \subset \mathcal{G}_\varepsilon$, we conclude that if the prior distributions belong to a given $\varepsilon$-contamination class $\Gamma_\varepsilon$, then the posterior distributions do not differ substantially in the Kolmogorov metric. A similar conclusion holds if the prior distributions do not differ too much in the total variation metric. On the other hand, if the prior distributions do not differ much in the Kolmogorov metric, the appropriate posterior distributions do not differ substantially in the Lévy or Prohorov metric (see, e.g., Zolotarev (1986) or Rachev (1991)). Taking all this into account, one can say that under rather general conditions the Bayes inference is infinitesimally robust to small misspecifications of the prior distribution.

Acknowledgments. The authors are greatly indebted to Professor Lesław Gajek for his fruitful comments, which enabled us to improve inequality (1).

References

J. O. Berger (1985), Statistical Decision Theory and Bayesian Analysis, Springer.

J. O. Berger and L. M. Berliner (1986), Robust Bayes and empirical Bayes analysis with ε-contaminated priors, Ann. Statist. 14, 461–486.

A. E. Gelfand and D. K. Dey (1991), On Bayesian robustness of contaminated classes of priors, Statist. Decisions 9, 63–80.

M. Mȩczarski and R. Zieliński (1991), Stability of the Bayesian estimator of the Poisson mean under the inexactly specified gamma priors, Statist. Probab. Lett. 12, 329–333.

S. T. Rachev (1991), Probability Metrics and the Stability of Stochastic Models, Wiley, Chichester.

S. Sivaganesan (1988), Ranges of posterior measures for priors with arbitrary contamination, Comm. Statist. Theory Methods 17 (5), 1591–1612.

S. Sivaganesan and J. O. Berger (1989), Ranges of posterior measures for priors with unimodal contaminations, Ann. Statist. 17, 868–889.

V. M. Zolotarev (1986), Contemporary Theory of Summation of Independent Random Variables, Nauka, Moscow (in Russian).

AGATA BORATYŃSKA
FACULTY OF MATHEMATICS
UNIVERSITY OF WARSAW
BANACHA 2
02-097 WARSZAWA, POLAND

RYSZARD ZIELIŃSKI
INSTITUTE OF MATHEMATICS
POLISH ACADEMY OF SCIENCES
P.O. BOX 137
00-950 WARSZAWA, POLAND

Received on 1.7.1993
