
ANNALES POLONICI MATHEMATICI LXVII.3 (1997)

Smoothing a polyhedral convex function via cumulant transformation and homogenization

by Alberto Seeger (Avignon)

Abstract. Given a polyhedral convex function g : R^n → R ∪ {+∞}, it is always possible to construct a family {g_t}_{t>0} which converges pointwise to g and such that each g_t : R^n → R is convex and infinitely differentiable. The construction of such a family {g_t}_{t>0} involves the concept of cumulant transformation and a standard homogenization procedure.

1. Introduction. A broad class of nonsmooth optimization problems can be written in the composite form

(P)  Minimize {g(M(ξ)) : ξ ∈ Ξ},

where M is a mapping from some normed space Ξ to the Euclidean space R^n, and g : R^n → R ∪ {+∞} is a polyhedral convex function, i.e.

epi g := {(x, α) ∈ R^n × R : g(x) ≤ α}

is a polyhedral convex set [11, p. 172]. As a general rule M is a smooth mapping (say, of class C^k for some k ≥ 1), but the composite function g ∘ M is nonsmooth. This fact leads us to consider an approximate version

(P_t)  Minimize {g_t(M(ξ)) : ξ ∈ Ξ}

for the original problem (P). A fundamental question which is addressed in this note is thus:

(1.1)  How to construct a family {g_t}_{t>0} of smooth convex functions g_t : R^n → R such that lim_{t→∞} g_t(x) = g(x) for all x ∈ R^n?

The above-mentioned question has an interest which goes far beyond the context of the composite optimization problem (P). This note will concentrate on this question and will discuss some related issues.

1991 Mathematics Subject Classification: Primary 41A30; Secondary 52B70, 60E10.

Key words and phrases: polyhedral convex function, smooth approximation, Laplace transformation, cumulant transformation, homogenization, recession function.

As a first attempt to answer (1.1), one may consider the classical Moreau–Yosida approximation technique:

g_t(x) = inf_{u ∈ R^n} { g(x − u) + (t/2)‖u‖² }.

The disadvantage of such an approach is twofold: first of all, the evaluation of g_t(x) is not straightforward since it requires solving a minimization problem. Secondly, the convex function g_t is of class C^1, but its degree of smoothness is not higher than one (unless one imposes additional assumptions on g; cf. Lemaréchal and Sagastizábal [7]). The same remark applies to the rolling ball approximation technique [12]:

g_t(x) = inf_{‖u‖ ≤ t^{−1}} { g(x − u) − [t^{−2} − ‖u‖²]^{1/2} }.
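To illustrate the second point (a minimal numerical sketch, not taken from the paper): for g = |·| on R, the Moreau–Yosida infimum can be worked out in closed form and yields the Huber function, which is C^1 but not C^2.

```python
import numpy as np

def moreau_abs(x, t):
    """Moreau-Yosida envelope of g(u) = |u| with the scaling used above:
    g_t(x) = inf_u { |x - u| + (t/2) u^2 }.
    Closed form: (t/2) x^2 for |x| <= 1/t, and |x| - 1/(2t) otherwise
    (the Huber function); its second derivative jumps at |x| = 1/t,
    so g_t is C^1 but not C^2."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0 / t,
                    0.5 * t * x**2,
                    np.abs(x) - 0.5 / t)

xs = np.linspace(-2.0, 2.0, 9)
for t in (1.0, 10.0, 100.0):
    gap = np.max(np.abs(moreau_abs(xs, t) - np.abs(xs)))
    print(f"t = {t:6.1f}   max deviation from |x| on the grid = {gap:.5f}")  # ~ 1/(2t)
```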

The approach suggested in this note is completely different: it uses a homogenization technique applied to the Laplace transform and to the cumulant transform of some discrete measures associated with the function g.

2. Smoothing a polyhedral supporting function. To start with, consider the case in which g : R^n → R ∪ {+∞} is the supporting function of a nonempty polyhedral convex set Ω ⊂ R^n, i.e.

(2.1)  g(x) = Ψ_Ω(x) := sup_{w ∈ Ω} ⟨w, x⟩.

One may think of g as the recession function ([11, p. 66])

(2.2)  g(x) = [rec f](x) := lim_{t→∞} f(tx)/t

of some convex lower-semicontinuous function f : R^n → R ∪ {+∞} that is finite at 0 ∈ R^n. As observed by Ben-Tal and Teboulle [2], the usefulness of the approximation scheme (2.2) lies in the fact that frequently f is a smooth function, in which case the convex function

R^n ∋ x ↦ g_t(x) := f(tx)/t

is also smooth. Ben-Tal and Teboulle [2] provided the examples

(2.3)  g(x) = max{x_1, ..., x_n},   f(x) = log[Σ_{j=1}^n e^{x_j}]

and

(2.4)  g(x) = Σ_{j=1}^n |x_j|,   f(x) = Σ_{j=1}^n [1 + x_j²]^{1/2}

to justify their observation, but they did not give a method for finding a smooth function f in other cases. Examples (2.3) and (2.4) were inspired by particular approximation techniques suggested by Bertsekas [3] and El-Attar et al. [5], respectively.
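As a quick numerical check of (2.3) and (2.4) (an illustrative sketch, not part of the paper), one can watch f(tx)/t approach g(x) as t grows:

```python
import numpy as np
from scipy.special import logsumexp

x = np.array([0.3, -1.2, 2.5])

f_max = lambda y: logsumexp(y)                   # f of (2.3): log sum_j e^{y_j}
f_abs = lambda y: np.sum(np.sqrt(1.0 + y**2))    # f of (2.4): sum_j sqrt(1 + y_j^2)

for t in (1.0, 10.0, 1000.0):
    gap_max = f_max(t * x) / t - np.max(x)          # -> 0; g(x) = max_j x_j
    gap_abs = f_abs(t * x) / t - np.sum(np.abs(x))  # -> 0; g(x) = sum_j |x_j|
    print(f"t = {t:7.1f}   gap (2.3): {gap_max:.5f}   gap (2.4): {gap_abs:.5f}")
```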

The aim of this section is to provide the reader with a simple and elegant method for constructing a smooth function f in the case in which g is an arbitrary polyhedral supporting function. Our approximation mechanism relies on the following basic assumption:

(2.5)  Ω ⊂ R^n is a polyhedral convex set which admits at least one extreme point.

As is well known, such a set Ω can be represented in the form of a Minkowski sum:

(2.6) Ω = co[extr Ω] + rec Ω.

Here

(2.7)  extr Ω = {w_1, ..., w_k}

denotes the set of extreme points of Ω, and rec Ω refers to the recession cone of Ω ([11, p. 61]). Since Ω is a polyhedral convex set, rec Ω can be represented in terms of a set {a_1, ..., a_m} ⊂ R^n of generating directions:

(2.8)  rec Ω = { Σ_{i=1}^m λ_i a_i : λ_i ≥ 0 ∀i = 1, ..., m }.

Without loss of generality one may suppose that the set {a_1, ..., a_m} is minimal in the sense that none of these directions can be expressed as a nonnegative linear combination of the others. It is not difficult to show that g = Ψ_Ω takes the form

(2.9)  g(x) = { max{⟨w_1, x⟩, ..., ⟨w_k, x⟩} if x ∈ K; +∞ otherwise },

where

K = {v ∈ R^n : ⟨a_i, v⟩ ≤ 0 ∀i = 1, ..., m}.

Now we are ready to state:

Theorem 2.1. Let Ω be as in (2.5) and let g be the supporting function of Ω. Then there exists a convex function f : R^n → R of class C^∞ such that

(2.10)  lim_{t→∞} f(tx)/t = g(x) for all x ∈ R^n.

An example of such a function is given by

(2.11)  f(x) = log[∫_{R^n} e^{⟨w,x⟩} dµ(w)] + ∫_{R^n} e^{⟨w,x⟩} dν(w),


where µ is any discrete measure concentrated on extr Ω, and ν is any discrete measure concentrated on a minimal set of generating directions for rec Ω.

If the information (2.7)–(2.8) is available, then one has the more explicit example

(2.12)  f(x) = log[Σ_{j=1}^k e^{⟨w_j, x⟩}] + Σ_{i=1}^m e^{⟨a_i, x⟩}.

Proof. The function f in (2.11) involves the Laplace transform

R^n ∋ x ↦ L_ν(x) := ∫_{R^n} e^{⟨w,x⟩} dν(w)

of the measure ν, and the cumulant transform

R^n ∋ x ↦ K_µ(x) := log L_µ(x)

of the measure µ. Since µ and ν are discrete, both transforms are finite-valued. By invoking some classical results (cf. [1, Theorem 4.1], [6, Theorem 7.5.1]), one can show that L_ν and K_µ are C^∞ convex functions. It just remains to prove the convergence property (2.10). That µ is concentrated on extr Ω means simply

µ({w}) > 0 iff w ∈ extr Ω.

Denote by µ_j = µ({w_j}) the mass of the extreme point w_j. Then

K_µ(x) = log[Σ_{j=1}^k µ_j e^{⟨w_j, x⟩}]   and   lim_{t→∞} K_µ(tx)/t = max{⟨w_1, x⟩, ..., ⟨w_k, x⟩},

regardless of the values of the µ_j's. Similarly, denote by ν_i = ν({a_i}) the mass of the generating direction a_i. In this case

L_ν(x) = Σ_{i=1}^m ν_i e^{⟨a_i, x⟩}   and   lim_{t→∞} L_ν(tx)/t = { 0 if ⟨a_i, x⟩ ≤ 0 ∀i = 1, ..., m; +∞ otherwise },

regardless of the values of the ν_i's. This completes the proof of (2.10). Finally, observe that (2.12) corresponds to the particular case in which the masses of µ and ν are uniformly distributed.
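Formula (2.12) is easy to put to work. Here is a minimal numerical sketch (with made-up data, not from the paper): take Ω ⊂ R^2 with extreme points (1, 0), (0, 1) and the single recession direction (−1, −1), build f as in (2.12), and compare f(tx)/t with the canonical form (2.9).

```python
import numpy as np
from scipy.special import logsumexp

W = np.array([[1.0, 0.0], [0.0, 1.0]])   # rows: extreme points w_j (made-up data)
A = np.array([[-1.0, -1.0]])             # rows: generating directions a_i of rec(Omega)

def f(x):
    """Smooth f of (2.12): log-sum-exp over the extreme points plus a plain
    sum of exponentials over the recession directions."""
    return logsumexp(W @ x) + np.sum(np.exp(A @ x))

def g(x):
    """Supporting function in the canonical form (2.9)."""
    if np.all(A @ x <= 0.0):             # x in K = {v : <a_i, v> <= 0 for all i}
        return np.max(W @ x)
    return np.inf

x = np.array([0.5, 2.0])                 # this x lies in K for the data above
for t in (1.0, 10.0, 1000.0):
    print(f"t = {t:7.1f}   f(tx)/t = {f(t * x) / t:.6f}   g(x) = {g(x):.6f}")
```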

Remark 2.1. Instead of (2.10), one can write the equality

(2.13)  lim_{t→∞} [f(tx) − f(0)]/t = g(x) for all x ∈ R^n.


The advantage of the approximation scheme (2.13) over (2.10) is that the quotient [f(tx) − f(0)]/t converges monotonically upward to g(x) as t goes to ∞. Of course, one can always normalize f so that f(0) = 0. It suffices to subtract the constant ν(R^n) + log µ(R^n) from the expression appearing on the right-hand side of (2.11).

Remark 2.2. The measure µ used in (2.11) can be concentrated on a set which is larger than extr Ω, but it cannot assign a positive mass to a point which is outside the polytope co(extr Ω). Similarly, ν can be concentrated on a set which is larger than a minimal set of generating directions for rec Ω. However, ν should not assign a positive mass to a direction which is not in rec Ω.

Remark 2.3. The function f given by (2.11) can also be used to approximate

R^n ∋ x ↦ inf_{w ∈ Ω} ⟨w, x⟩.

Indeed,

lim_{t→−∞} f(tx)/t = lim_{t→∞} f(−tx)/(−t) = − lim_{t→∞} f(t(−x))/t = − sup_{w ∈ Ω} ⟨w, −x⟩ = inf_{w ∈ Ω} ⟨w, x⟩.

Of course, for t < 0, the function x ↦ f(tx)/t is concave.

Theorem 2.1 can be illustrated with an example.

Example 2.1. Consider the function g : R^8 → R given by

g(x) = max{x_1, x_2, x_3} + |x_4| + |x_5| + max{0, x_6} + max{|x_7|, |x_8|}.

The first term corresponds to the supporting function of the set {u ∈ R^3_+ : u_1 + u_2 + u_3 = 1}, whose extreme points are the canonical vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1). Thus, max{x_1, x_2, x_3} can be approximated with the help of log(e^{x_1} + e^{x_2} + e^{x_3}). The absolute value function |·| corresponds to the supporting function of the interval [−1, 1], whose extreme points are −1 and 1. Thus, |x_4| and |x_5| can be approximated by using log(cosh x_4) and log(cosh x_5), respectively. Similarly, max{0, ·} is the supporting function of the interval [0, 1], and therefore it can be approximated by using log(1 + e^{(·)}). The last term of g corresponds to the supporting function of a set whose extreme points are (1, 0), (−1, 0), (0, 1), and (0, −1). Thus, max{|x_7|, |x_8|} can be approximated with the help of log(cosh x_7 + cosh x_8).

Summarizing,

f(x) = log(e^{x_1} + e^{x_2} + e^{x_3}) + log(cosh x_4) + log(cosh x_5) + log(1 + e^{x_6}) + log(cosh x_7 + cosh x_8).


Of course, g can be regarded as the supporting function of some polytope in R^8. In this case, however, the identification of the extreme points is a more cumbersome task.
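A direct numerical check of Example 2.1 (an illustrative sketch; moderate values of t are used to avoid overflow in cosh):

```python
import numpy as np
from scipy.special import logsumexp

def g(x):
    """The polyhedral convex function of Example 2.1 on R^8."""
    return (np.max(x[:3]) + abs(x[3]) + abs(x[4])
            + max(0.0, x[5]) + max(abs(x[6]), abs(x[7])))

def f(x):
    """The smooth approximant built term by term in Example 2.1."""
    return (logsumexp(x[:3])
            + np.log(np.cosh(x[3])) + np.log(np.cosh(x[4]))
            + np.log1p(np.exp(x[5]))
            + np.log(np.cosh(x[6]) + np.cosh(x[7])))

x = np.array([0.3, -0.7, 1.1, -2.0, 0.4, -0.9, 1.5, -1.6])
for t in (1.0, 10.0, 100.0):
    print(f"t = {t:6.1f}   f(tx)/t = {f(t * x) / t:.6f}   g(x) = {g(x):.6f}")
```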

3. Smoothing a polyhedral convex function. The approximation technique developed in Section 2 can be extended to the case in which g : R^n → R ∪ {+∞} is an arbitrary polyhedral convex function. It suffices to use the representation

(3.1)  g(x) = Ψ_{epi g*}(x, −1) = sup_{(w,β) ∈ epi g*} {⟨w, x⟩ − β},

where g* : R^n → R ∪ {+∞} stands for the Legendre–Fenchel conjugate of g. Since g is a polyhedral convex function, it follows that the epigraph of g* is a polyhedral convex set ([11, Theorem 19.2]).

Theorem 3.1. Let g : R^n → R ∪ {+∞} be a polyhedral convex function such that epi g* has at least one extreme point. Then there exists a convex function F : R^n × R → R which is of class C^∞ and such that

(3.2)  lim_{t→∞} F(tx, −t)/t = g(x) for all x ∈ R^n.

An example of such a function is

(3.3)  F(x, α) = log[∫_{R^n×R} e^{⟨w,x⟩+βα} dµ(w, β)] + ∫_{R^n×R} e^{⟨w,x⟩+βα} dν(w, β),

where µ is any discrete measure concentrated on extr(epi g*), and ν is any discrete measure concentrated on a minimal set of generating directions for rec(epi g*).

Proof. Observe that the function F is given simply by

F(x, α) = K_µ(x, α) + L_ν(x, α).

According to Theorem 2.1, F : R^n × R → R is a convex function of class C^∞ such that

lim_{t→∞} F(t(x, α))/t = Ψ_{epi g*}(x, α).

This and the representation formula (3.1) yield the convergence result (3.2).

It should be clear that, for each t > 0, the function

R^n ∋ x ↦ g_t(x) = F(tx, −t)/t

is convex and of class C^∞. Also, g_t(x) converges toward g(x) as t goes to ∞. Thus, Theorem 3.1 answers completely the question stated in (1.1).

Of course, if one seeks a more explicit formula for the function F, then more information is needed regarding the structure of g, namely one needs to know the extreme points {(w_1, β_1), ..., (w_k, β_k)} of epi g* and a minimal set {(a_1, γ_1), ..., (a_m, γ_m)} of generating directions for rec(epi g*). This amounts to representing g in the following "canonical" form:

g(x) = { max{⟨w_1, x⟩ − β_1, ..., ⟨w_k, x⟩ − β_k} if x ∈ K; +∞ otherwise },

with

K = {v ∈ R^n : ⟨a_i, v⟩ ≤ γ_i ∀i = 1, ..., m}.

If this representation is available, then one can take F simply as

F(x, α) = log[Σ_{j=1}^k e^{⟨w_j, x⟩ + β_j α}] + Σ_{i=1}^m e^{⟨a_i, x⟩ + γ_i α}.

Example 3.1. If one wishes to approximate

R^2 ∋ x ↦ g(x) = { max{6x_1 − x_2 + 4, x_1 + x_2 − 2} if x_1 ≥ 3; +∞ otherwise },

then it suffices to take

F(x, α) = log[e^{6x_1 − x_2 − 4α} + e^{x_1 + x_2 + 2α}] + e^{−x_1 − 3α}.
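Numerically (a sketch using the same data as Example 3.1), F(tx, −t)/t indeed approaches g(x) at points of the domain:

```python
import numpy as np
from scipy.special import logsumexp

def F(x, alpha):
    """F(x, alpha) from Example 3.1."""
    x1, x2 = x
    return (logsumexp([6*x1 - x2 - 4*alpha, x1 + x2 + 2*alpha])
            + np.exp(-x1 - 3*alpha))

def g(x):
    """The polyhedral function of Example 3.1 (finite only where x_1 >= 3)."""
    x1, x2 = x
    return max(6*x1 - x2 + 4, x1 + x2 - 2) if x1 >= 3 else np.inf

x = np.array([4.0, 1.0])        # a point with x_1 >= 3
for t in (1.0, 10.0, 1000.0):
    print(f"t = {t:7.1f}   F(tx, -t)/t = {F(t * x, -t) / t:.6f}   g(x) = {g(x):.6f}")
```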

4. Application: smoothing a spectrally defined matrix function. Consider the case of a function Φ : S^n → R ∪ {+∞} defined over the space S^n of n × n real symmetric matrices. Such a function Φ is said to be spectral (or spectrally defined) if there is a symmetric function g : R^n → R ∪ {+∞} such that

(4.1)  Φ(A) = g(λ(A)) for all A ∈ S^n,

where λ(A) = (λ_1(A), ..., λ_n(A)) is the vector of eigenvalues of A in nonincreasing order. The symmetric function g is necessarily unique. In fact, it is given by

g(x) = Φ(diag x) for all x ∈ R^n,

where diag x stands for the diagonal matrix whose diagonal entries are the components of x. For a detailed account of spectral functions see, for instance, [13] (also [8], [9]). Examples of spectral functions include:

Φ(A) = λ_1(A) = largest eigenvalue of A;
Φ(A) = λ_1(A) + ... + λ_p(A) = sum of the p largest eigenvalues of A;
Φ(A) = λ_1(A) − λ_n(A) = width of the spectrum of A;
Φ(A) = max{λ_1(A), −λ_n(A)} = spectral radius of A.

In connection with these examples, two comments deserve to be made: first, none of the above functions is differentiable; and, second, all the above functions can be written in the form

Φ(A) = Ψ_Ω(λ(A)),

with Ω ⊂ R^n being a symmetric convex polytope. The symmetry property means that

w ∈ Ω ⇒ Πw ∈ Ω for any n × n permutation matrix Π.

These facts lead us to establish the following approximation result.

Theorem 4.1. Let Ω ⊂ R^n be a symmetric convex polytope. Then Φ = Ψ_Ω ∘ λ is a convex spectral function. Moreover, there exists a convex spectral function F : S^n → R of class C^∞ such that

(4.2)  Φ(A) = lim_{t→∞} F(tA)/t for all A ∈ S^n.

An example of such a function is F = f ∘ λ, where

(4.3)  f(x) = log[∫_{R^n} e^{⟨w,x⟩} dµ(w)]

is defined in terms of a discrete measure µ which distributes its total mass uniformly among all the extreme points of Ω.

Proof. Φ is a convex spectral function because Ψ_Ω is a symmetric convex function (cf. Davis [4]). Since the convex polytope Ω is symmetric, so is the set extr Ω. From this, and the fact that the discrete measure µ distributes its total mass uniformly over extr Ω, one deduces that the convex function f is symmetric. Hence, F = f ∘ λ is a convex spectral function.

Since f is of class C^∞, so is F (even if λ : S^n → R^n is not differentiable).

Finally, observe that, for all A ∈ S^n, one has

lim_{t→∞} F(tA)/t = lim_{t→∞} (f ∘ λ)(tA)/t = lim_{t→∞} f(tλ(A))/t = Ψ_Ω(λ(A)).

This proves the convergence property (4.2).

Example 4.1. The largest eigenvalue function S^n ∋ A ↦ λ_1(A) corresponds to the composition of R^n ∋ x ↦ g(x) = max{x_1, ..., x_n} and λ : S^n → R^n. Thus

λ_1(A) = lim_{t→∞} F(tA)/t for all A ∈ S^n,

with

F(A) = log[Σ_{j=1}^n e^{λ_j(A)}] = log[trace e^A].

Observe that the smallest eigenvalue function is given by

λ_n(A) = lim_{t→−∞} F(tA)/t.
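A short numerical sketch (not from the paper): evaluating F(tA)/t = log(trace e^{tA})/t through the eigenvalues, with a log-sum-exp to keep large t stable.

```python
import numpy as np
from scipy.special import logsumexp

def F_over_t(A, t):
    """F(tA)/t where F(A) = log(trace e^A) = log(sum_j e^{lambda_j(A)})."""
    lam = np.linalg.eigvalsh(A)          # eigenvalues of the symmetric matrix A
    return logsumexp(t * lam) / t

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2.0                      # a random symmetric test matrix
lam1 = np.linalg.eigvalsh(A)[-1]         # eigvalsh returns ascending order
for t in (1.0, 10.0, 1000.0):
    print(f"t = {t:7.1f}   F(tA)/t = {F_over_t(A, t):.6f}   lambda_1(A) = {lam1:.6f}")
```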


Example 4.2. Consider the function

S^n ∋ A ↦ Φ(A) = sum of the p largest eigenvalues of A.

In this case Φ = g ∘ λ, with

g(x) = sum of the p largest components of x.

According to Overton and Womersley [10], g is the supporting function of the set

Ω = { u ∈ [0, 1]^n : Σ_{j=1}^n u_j = p },

whose extreme points u ∈ Ω are given by

u_i = { 1 for exactly p of the indices 1, ..., n; 0 otherwise }.

If one denotes by w_1, ..., w_k the k = n!/(p!(n − p)!) extreme points of Ω, then one can approximate Φ(A) with the help of

F(A) = log[Σ_{j=1}^k e^{⟨w_j, λ(A)⟩}].
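For small n, the C(n, p) extreme points can simply be enumerated; a sketch (illustrative only):

```python
import itertools
import numpy as np
from scipy.special import logsumexp

def F_over_t(A, t, p):
    """F(tA)/t with F(A) = log(sum_j e^{<w_j, lambda(A)>}); each w_j is a
    0/1 vector with exactly p ones, so <w_j, lambda(A)> runs over all
    sums of p eigenvalues of A."""
    lam = np.linalg.eigvalsh(A)
    vals = [t * sum(c) for c in itertools.combinations(lam, p)]
    return logsumexp(vals) / t

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2.0
p = 2
Phi = np.sum(np.linalg.eigvalsh(A)[-p:])   # sum of the p largest eigenvalues
for t in (1.0, 10.0, 1000.0):
    print(f"t = {t:7.1f}   F(tA)/t = {F_over_t(A, t, p):.6f}   Phi(A) = {Phi:.6f}")
```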

Example 4.3. The spectral radius function

S^n ∋ A ↦ Φ(A) = max{λ_1(A), −λ_n(A)}

corresponds to the case Φ = Ψ_Ω ∘ λ with

Ω = { u ∈ R^n : Σ_{j=1}^n |u_j| ≤ 1 }.

The extreme points of Ω are the canonical vectors of R^n and their opposite vectors. Thus, Ψ_Ω(x) = max{|x_1|, ..., |x_n|} can be approximated with the help of

f(x) = log(cosh x_1 + ... + cosh x_n).

This leads to the expression

F(A) = log[Σ_{j=1}^n cosh λ_j(A)] = log[trace(cosh A)].
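Again as a numerical sketch: since Σ_j cosh y_j = ½ Σ_j (e^{y_j} + e^{−y_j}), the quantity F(tA)/t can be evaluated stably with a log-sum-exp over ±tλ_j(A).

```python
import numpy as np
from scipy.special import logsumexp

def F_over_t(A, t):
    """F(tA)/t with F(A) = log(trace cosh A) = log(sum_j cosh lambda_j(A))."""
    lam = np.linalg.eigvalsh(A)
    return (logsumexp(np.concatenate([t * lam, -t * lam])) - np.log(2.0)) / t

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2.0
rho = np.max(np.abs(np.linalg.eigvalsh(A)))   # spectral radius of the symmetric A
for t in (1.0, 10.0, 1000.0):
    print(f"t = {t:7.1f}   F(tA)/t = {F_over_t(A, t):.6f}   rho(A) = {rho:.6f}")
```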

References

[1] O. Barndorff-Nielsen, Exponential families: exact theory, Various Publ. Ser. 19, Inst. of Math., Univ. of Aarhus, Denmark, 1970.

[2] A. Ben-Tal and M. Teboulle, A smoothing technique for nondifferentiable optimization problems, in: Lecture Notes in Math. 1405, S. Dolecki (ed.), Springer, Berlin, 1989, 1–11.

[3] D. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Academic Press, New York, 1982.

[4] C. Davis, All convex invariant functions of hermitian matrices, Arch. Math. (Basel) 8 (1957), 276–278.

[5] R. A. El-Attar, M. Vidyasagar, and S. R. K. Dutta, An algorithm for ℓ1-norm minimization with application to nonlinear ℓ1-approximation, SIAM J. Numer. Anal. 16 (1979), 70–86.

[6] R. Ellis, Entropy, Large Deviations and Statistical Mechanics, Springer, Berlin, 1985.

[7] C. Lemaréchal and C. Sagastizábal, Practical aspects of the Moreau–Yosida regularization: theoretical preliminaries, SIAM J. Optim. 7 (1997), 367–385.

[8] A. S. Lewis, Convex analysis on the Hermitian matrices, ibid. 6 (1996), 164–177.

[9] J. E. Martinez-Legaz, On convex and quasiconvex spectral functions, in: Proc. 2nd Catalan Days on Appl. Math., M. Sofonea and J. N. Corvellec (eds.), Presses Univ. de Perpignan, Perpignan, 1995, 199–208.

[10] M. L. Overton and R. S. Womersley, Optimality conditions and duality theory for minimizing sums of the largest eigenvalues of symmetric matrices, Math. Programming 62 (1993), 321–357.

[11] R. T. Rockafellar, Convex Analysis, Princeton Univ. Press, Princeton, N.J., 1970.

[12] A. Seeger, Smoothing a nondifferentiable convex function: the technique of the rolling ball, Technical Report 165, Dept. of Mathematical Sciences, King Fahd Univ. of Petroleum and Minerals, Dhahran, Saudi Arabia, October 1994.

[13] A. Seeger, Convex analysis of spectrally defined matrix functions, SIAM J. Optim. 7 (1997), 679–696.

Department of Mathematics, University of Avignon
33, rue Louis Pasteur
84000 Avignon, France

E-mail: alberto.seeger@univ-avignon.fr

Received by the Editors 2.4.1996; revised 17.10.1996
