ANNALES POLONICI MATHEMATICI LXVII.3 (1997)
Smoothing a polyhedral convex function via cumulant transformation and homogenization
by Alberto Seeger (Avignon)
Abstract. Given a polyhedral convex function g : R^n → R ∪ {+∞}, it is always possible to construct a family {g_t}_{t>0} which converges pointwise to g and such that each g_t : R^n → R is convex and infinitely often differentiable. The construction of such a family {g_t}_{t>0} involves the concept of cumulant transformation and a standard homogenization procedure.
1. Introduction. A broad class of nonsmooth optimization problems can be written in the composite form
(P)  Minimize {g(M(ξ)) : ξ ∈ Ξ},
where M is a mapping from some normed space Ξ to the Euclidean space R^n, and g : R^n → R ∪ {+∞} is a polyhedral convex function, i.e.

epi g := {(x, α) ∈ R^n × R : g(x) ≤ α}

is a polyhedral convex set [11, p. 172]. As a general rule M is a smooth mapping (say, of class C^k for some k ≥ 1), but the composite function g ∘ M is nonsmooth. This fact leads us to consider an approximate version
(P_t)  Minimize {g_t(M(ξ)) : ξ ∈ Ξ}

for the original problem (P). A fundamental question which is addressed in this note is thus:

(1.1)  How to construct a family {g_t}_{t>0} of smooth convex functions g_t : R^n → R such that lim_{t→∞} g_t(x) = g(x) for all x ∈ R^n?
1991 Mathematics Subject Classification: Primary 41A30; Secondary 52B70, 60E10.
Key words and phrases: polyhedral convex function, smooth approximation, Laplace transformation, cumulant transformation, homogenization, recession function.

The above mentioned question has an interest which goes far beyond the context of the composite optimization problem (P). This note will concentrate on this question and will discuss some related issues.
As a first attempt to answer (1.1), one may consider the classical Moreau–Yosida approximation technique:

g_t(x) = inf_{u∈R^n} { g(x − u) + (t/2)‖u‖^2 }.
The disadvantage of such an approach is twofold: first of all, the evaluation of g_t(x) is not straightforward since it requires solving a minimization problem. Secondly, the convex function g_t is of class C^1, but its degree of smoothness is not higher than one (unless one imposes additional assumptions on g; cf. Lemaréchal and Sagastizábal [7]). The same remark applies to the rolling ball approximation technique [12]:
g_t(x) = inf_{‖u‖ ≤ t^{-1}} { g(x − u) − [t^{-2} − ‖u‖^2]^{1/2} }.
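The first drawback can be seen numerically. The sketch below (an illustration, not part of the paper; the helper `moreau_yosida` and its brute-force grid search are mine) evaluates the Moreau–Yosida envelope of g(x) = |x| in one dimension, making explicit that each evaluation of g_t amounts to solving a minimization problem:

```python
import numpy as np

def moreau_yosida(g, x, t, radius=10.0, n=200001):
    """Moreau-Yosida envelope g_t(x) = inf_u { g(x - u) + (t/2)*u**2 },
    approximated here (1-D case) by brute-force search over a fine grid."""
    u = np.linspace(-radius, radius, n)
    return float(np.min(g(x - u) + 0.5 * t * u**2))

g = np.abs  # the polyhedral convex function g(x) = |x|
# Each evaluation of g_t solves a minimization problem; g_t(x) -> g(x) as t grows:
print(moreau_yosida(g, 2.0, t=1.0))    # Huber-type value 1.5, below |2| = 2
print(moreau_yosida(g, 2.0, t=100.0))  # 1.995, close to 2
```

For this g the envelope is the Huber function, whose value |x| − 1/(2t) at |x| ≥ 1/t matches the printed numbers; the C^1-but-not-C^2 behaviour occurs at |x| = 1/t.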
The approach suggested in this note is completely different: it uses a homogenization technique applied to the Laplace transform and to the cumulant transform of some discrete measures associated with the function g.
2. Smoothing a polyhedral supporting function. To start with, consider the case in which g : R^n → R ∪ {+∞} is the supporting function of a nonempty polyhedral convex set Ω ⊂ R^n, i.e.

(2.1)  g(x) = Ψ*_Ω(x) := sup_{w∈Ω} ⟨w, x⟩.
One may think of g as the recession function ([11, p. 66])

(2.2)  g(x) = [rec f](x) := lim_{t→∞} f(tx)/t

of some convex lower-semicontinuous function f : R^n → R ∪ {+∞} that is finite at 0 ∈ R^n. As observed by Ben-Tal and Teboulle [2], the usefulness of the approximation scheme (2.2) lies in the fact that frequently f is a smooth function, in which case the convex function

R^n ∋ x ↦ g_t(x) := f(tx)/t
is also smooth. Ben-Tal and Teboulle [2] provided the examples (2.3) g(x) = max{x 1 , . . . , x n }, f (x) = log h X n
j=1
e x j i and
(2.4) g(x) =
n
X
j=1
|x j |, f (x) =
n
X
j=1
[1 + x 2 j ] 1/2
to justify their observation, but they did not give a method for finding a smooth function f in other cases. Examples (2.3) and (2.4) were inspired by particular approximation techniques suggested by Bertsekas [3] and El-Attar et al. [5], respectively.
The aim of this section is to provide the reader with a simple and elegant method for constructing a smooth function f in the case in which g is an arbitrary polyhedral supporting function. Our approximation mechanism relies on the following basic assumption:
(2.5)  Ω ⊂ R^n is a polyhedral convex set which admits at least one extreme point.
As is well known, such a set Ω can be represented in the form of a Minkowski sum:
(2.6) Ω = co[extr Ω] + rec Ω.
Here
(2.7)  extr Ω = {w_1, …, w_k}
denotes the set of extreme points of Ω, and rec Ω refers to the recession cone of Ω ([11, p. 61]). Since Ω is a polyhedral convex set, rec Ω can be represented in terms of a set {a_1, …, a_m} ⊂ R^n of generating directions:

(2.8)  rec Ω = { Σ_{i=1}^m λ_i a_i : λ_i ≥ 0 ∀i = 1, …, m }.
Without loss of generality one may suppose that the set {a_1, …, a_m} is minimal in the sense that none of these directions can be expressed as a nonnegative linear combination of the others. It is not difficult to show that g = Ψ*_Ω takes the form

(2.9)  g(x) = max{⟨w_1, x⟩, …, ⟨w_k, x⟩} if x ∈ K, and g(x) = +∞ otherwise,

where

K = {v ∈ R^n : ⟨a_i, v⟩ ≤ 0 ∀i = 1, …, m}.
Now we are ready to state:
Theorem 2.1. Let Ω be as in (2.5) and let g be the supporting function of Ω. Then there exists a convex function f : R^n → R of class C^∞ such that

(2.10)  lim_{t→∞} f(tx)/t = g(x) for all x ∈ R^n.

An example of such a function is given by

(2.11)  f(x) = log[ ∫_{R^n} e^{⟨w,x⟩} dµ(w) ] + ∫_{R^n} e^{⟨w,x⟩} dν(w),
where µ is any discrete measure concentrated on extr Ω, and ν is any discrete measure concentrated on a minimal set of generating directions for rec Ω.
If the information (2.7)–(2.8) is available, then one has the more explicit example
(2.12)  f(x) = log[ Σ_{j=1}^k e^{⟨w_j,x⟩} ] + Σ_{i=1}^m e^{⟨a_i,x⟩}.
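Formula (2.12) is easy to evaluate in practice. The following sketch (illustrative only; the helper name `smooth_f` and the sample set Ω are mine) implements it with a numerically stable log-sum-exp and checks that f(tx)/t approaches g(x) for large t:

```python
import numpy as np

def smooth_f(x, W, A):
    """f(x) = log(sum_j exp(<w_j, x>)) + sum_i exp(<a_i, x>), as in (2.12).
    Rows of W are the extreme points w_j of Omega; rows of A are the
    generating directions a_i of rec Omega (A may be empty)."""
    z = W @ x
    m = z.max()
    val = m + np.log(np.exp(z - m).sum())   # numerically stable log-sum-exp
    if len(A):
        val += np.exp(A @ x).sum()
    return float(val)

# Omega = [-1, 1] x [0, +oo): extreme points (1, 0) and (-1, 0),
# recession cone generated by (0, 1); hence g(x) = |x_1| when x_2 <= 0.
W = np.array([[1.0, 0.0], [-1.0, 0.0]])
A = np.array([[0.0, 1.0]])
x = np.array([3.0, -2.0])          # x lies in K since <(0,1), x> = -2 <= 0
t = 1e4
print(smooth_f(t * x, W, A) / t)   # close to g(x) = max{3, -3} = 3
```

For x outside K the Laplace term exp(⟨a_i, tx⟩) blows up with t, which reproduces the value +∞ of g in the limit.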
Proof. The function f in (2.11) involves the Laplace transform

R^n ∋ x ↦ L_ν(x) := ∫_{R^n} e^{⟨w,x⟩} dν(w)

of the measure ν, and the cumulant transform

R^n ∋ x ↦ K_µ(x) := log L_µ(x)
of the measure µ. Since µ and ν are discrete, both transforms are finite-valued. By invoking some classical results (cf. [1, Theorem 4.1], [6, Theorem 7.5.1]), one can show that L_ν and K_µ are C^∞ convex functions. It just remains to prove the convergence property (2.10). That µ is concentrated on extr Ω means simply

µ({w}) > 0 iff w ∈ extr Ω.

Denote by µ_j = µ({w_j}) the mass of the extreme point w_j. Then

K_µ(x) = log[ Σ_{j=1}^k µ_j e^{⟨w_j,x⟩} ]  and  lim_{t→∞} K_µ(tx)/t = max{⟨w_1, x⟩, …, ⟨w_k, x⟩},
regardless of the values of the µ_j's. Similarly, denote by ν_i = ν({a_i}) the mass of the generating direction a_i. In this case

L_ν(x) = Σ_{i=1}^m ν_i e^{⟨a_i,x⟩}  and  lim_{t→∞} L_ν(tx)/t = 0 if ⟨a_i, x⟩ ≤ 0 ∀i = 1, …, m, and +∞ otherwise,

regardless of the values of the ν_i's. This completes the proof of (2.10). Finally, observe that (2.12) corresponds to the particular case in which the masses of µ and ν are uniformly distributed.
Remark 2.1. Instead of (2.10), one can write the equality

(2.13)  lim_{t→∞} [f(tx) − f(0)]/t = g(x) for all x ∈ R^n.

The advantage of the approximation scheme (2.13) over (2.10) is that the quotient [f(tx) − f(0)]/t converges monotonically upward to g(x) as t goes to ∞. Of course, one can always normalize f so that f(0) = 0. It suffices to subtract the constant ν(R^n) + log µ(R^n) from the expression appearing on the right-hand side of (2.11).
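The monotone behaviour in (2.13) can be observed on a small example (a sketch of mine, not from the paper), with f(x) = log(cosh x), for which f(0) = 0 and the recession function is |·|:

```python
import math

def f(x):
    # f(x) = log(cosh x): a C-infinity convex function with f(0) = 0
    # whose recession function is g(x) = |x|.
    return math.log(math.cosh(x))

x = 1.0
ts = (1.0, 2.0, 5.0, 10.0, 50.0)
quotients = [(f(t * x) - f(0.0)) / t for t in ts]
print(quotients)  # increases monotonically toward g(x) = |x| = 1
```

The monotonicity is the standard fact that the difference quotient t ↦ [f(tx) − f(0)]/t of a convex function is nondecreasing.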
Remark 2.2. The measure µ used in (2.11) can be concentrated on a set which is larger than extr Ω, but it cannot assign a positive mass to a point which is outside the polytope co(extr Ω). Similarly, ν can be concentrated on a set which is larger than a minimal set of generating directions for rec Ω. However, ν should not assign a positive mass to a direction which is not in rec Ω.
Remark 2.3. The function f given by (2.11) can also be used to approximate

R^n ∋ x ↦ inf_{w∈Ω} ⟨w, x⟩.

Indeed,

lim_{t→−∞} f(tx)/t = lim_{t→∞} f(−tx)/(−t) = −lim_{t→∞} f(t(−x))/t = −sup_{w∈Ω} ⟨w, −x⟩ = inf_{w∈Ω} ⟨w, x⟩.

Of course, for t < 0, the function x ↦ f(tx)/t is concave.
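A quick numerical illustration of this negative-t limit (my sketch, not from the paper), again with f(x) = log(cosh x), the smoothing of the support function of Ω = [−1, 1]; the large-argument branch below is the exact asymptotic log(cosh z) = |z| − log 2 + log(1 + e^{−2|z|}):

```python
import math

def f(x):
    # f(x) = log(cosh x); for large |x| use the asymptotic
    # log(cosh x) ~ |x| - log 2 to avoid overflow in cosh.
    if abs(x) > 350.0:
        return abs(x) - math.log(2.0)
    return math.log(math.cosh(x))

x, t = 2.0, -1e4
print(f(t * x) / t)  # close to inf_{w in [-1,1]} w*x = -|x| = -2
```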
Theorem 2.1 can be illustrated with an example.
Example 2.1. Consider the function g : R^8 → R given by

g(x) = max{x_1, x_2, x_3} + |x_4| + |x_5| + max{0, x_6} + max{|x_7|, |x_8|}.

The first term corresponds to the supporting function of the set {u ∈ R^3_+ : u_1 + u_2 + u_3 = 1}, whose extreme points are the canonical vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1). Thus, max{x_1, x_2, x_3} can be approximated with the help of log(e^{x_1} + e^{x_2} + e^{x_3}). The absolute value function |·| corresponds to the supporting function of the interval [−1, 1], whose extreme points are −1 and 1. Thus, |x_4| and |x_5| can be approximated by using log(cosh x_4) and log(cosh x_5), respectively. Similarly, max{0, ·} is the supporting function of the interval [0, 1], and therefore it can be approximated by using log(1 + e^{(·)}). The last term of g corresponds to the supporting function of a set whose extreme points are (1, 0), (−1, 0), (0, 1), and (0, −1). Thus, max{|x_7|, |x_8|} can be approximated with the help of log(cosh x_7 + cosh x_8).
Summarizing,

f(x) = log(e^{x_1} + e^{x_2} + e^{x_3}) + log(cosh x_4) + log(cosh x_5) + log(1 + e^{x_6}) + log(cosh x_7 + cosh x_8).
Of course, g can be regarded as the supporting function of some polytope in R^8. In this case, however, the identification of the extreme points is a more cumbersome task.
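Example 2.1 can be verified numerically. The sketch below (illustrative only; the helper names `lse` and `logcosh` and the sample point are mine) implements f in a numerically stable way and compares f(tx)/t with g(x):

```python
import numpy as np

def lse(z):
    # numerically stable log(sum(exp(z)))
    m = np.max(z)
    return m + np.log(np.sum(np.exp(z - m)))

def logcosh(z):
    # numerically stable log(cosh z) = |z| - log 2 + log(1 + exp(-2|z|))
    return np.abs(z) - np.log(2.0) + np.log1p(np.exp(-2.0 * np.abs(z)))

def f(x):
    """The smooth approximation built in Example 2.1 (x in R^8)."""
    return (lse(x[:3])                                        # log(e^x1 + e^x2 + e^x3)
            + logcosh(x[3]) + logcosh(x[4])                   # log cosh x4 + log cosh x5
            + max(x[5], 0.0) + np.log1p(np.exp(-abs(x[5])))   # log(1 + e^x6)
            + lse(np.array([logcosh(x[6]), logcosh(x[7])])))  # log(cosh x7 + cosh x8)

def g(x):
    return (max(x[0], x[1], x[2]) + abs(x[3]) + abs(x[4])
            + max(0.0, x[5]) + max(abs(x[6]), abs(x[7])))

x = np.array([1.0, -2.0, 0.5, 3.0, -1.0, 2.0, -4.0, 1.5])
t = 1e3
print(f(t * x) / t, g(x))  # the two values agree up to O(1/t)
```

The residual gap of order (log 2)/t comes from the normalizing constants of the cosh terms, in line with Remark 2.1.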
3. Smoothing a polyhedral convex function. The approximation technique developed in Section 2 can be extended to the case in which g : R^n → R ∪ {+∞} is an arbitrary polyhedral convex function. It suffices to use the representation
(3.1)  g(x) = Ψ*_{epi g*}(x, −1) = sup_{(w,β)∈epi g*} {⟨w, x⟩ − β},

where g* : R^n → R ∪ {+∞} stands for the Legendre–Fenchel conjugate of g. Since g is a polyhedral convex function, it follows that the epigraph of g* is a polyhedral convex set ([11, Theorem 19.2]).
Theorem 3.1. Let g : R^n → R ∪ {+∞} be a polyhedral convex function such that epi g* has at least one extreme point. Then there exists a convex function F : R^n × R → R which is of class C^∞ and such that

(3.2)  lim_{t→∞} F(tx, −t)/t = g(x) for all x ∈ R^n.

An example of such a function is

(3.3)  F(x, α) = log[ ∫_{R^n×R} e^{⟨w,x⟩+βα} dµ(w, β) ] +