
Approximation and Complexity

(notes for students of the University of Warsaw)

Leszek Plaskota

Instytut Matematyki Stosowanej i Mechaniki Uniwersytet Warszawski

October 17, 2021


Contents

I Classical approximation

1 Preliminaries

2 Uniform approximation

3 The Weierstrass theorem

4 Fourier and Fejér operators

5 Quality of projections

6 Bernstein's 'lethargy' theorem

7 The Jackson theorems

II Information-based approximation

8 Information and its radius

9 Linear algorithms for linear problems

10 Optimality of spline algorithms

11 Optimal information

12 Adaptive information

13 Asymptotic setting

III Appendix

14 Completeness of the space C(D)

15 Banach-Steinhaus theorem

16 Hahn-Banach theorem


Part I

Classical approximation


Chapter 1

Preliminaries

A general formulation of the (classical) approximation problem is as follows. Let X be a normed linear space over K (where K = R or K = C) with a norm ‖·‖. Since we are primarily interested in function spaces, the elements of X will be denoted by f, g, . . . . Let V be a linear subspace of X with finite dimension, i.e.,

dim(V ) = n < +∞.

For f ∈ X we define

dist(f, V) := inf_{v∈V} ‖f − v‖   and   P_V(f) := {v ∈ V : ‖f − v‖ = dist(f, V)}.

P_V(f) is the set of optimal approximations for f with respect to V.

The approximation problem expresses a general wish to represent 'complicated' objects (the ones in X) with 'simpler' objects (the ones in V).

We first make the observation that finite dimensionality of V ensures non-emptiness of P_V(f).

Indeed, since we clearly have dist(f, V) ≤ ‖f‖, the set P_V(f) is contained in the ball B = {v ∈ V : ‖v‖ ≤ 2‖f‖}; otherwise, if v ∈ P_V(f) and ‖v‖ > 2‖f‖ then, by the triangle inequality,

‖f − v‖ ≥ ‖v‖ − ‖f‖ > ‖f‖ ≥ dist(f, V).

Since any closed and bounded ball in a finite-dimensional space is compact and the function w ↦ ‖f − w‖ is continuous, this function attains its minimal value on B.

The assumption dim(V) < +∞ is crucial for P_V(f) ≠ ∅. To see this, we give an example.

Example 1.1 Let X = ℓ1 be the space of all infinite, absolutely summable real sequences x = {x_n}_{n≥1} with norm ‖x‖ = ∑_{i=1}^∞ |x_i|. Let V be the subspace consisting of all sequences v for which only finitely many coefficients are nonzero. Then for any x ∈ X \ V (for instance, for x_i = i^{−2}) we have dist(x, V) = 0, but ‖x − v‖ > 0 for every v ∈ V; hence P_V(x) = ∅.
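A quick numerical illustration of this example (a sketch; the truncations v_N below are just one convenient choice of approximants from V):

    import numpy as np

    # x_i = i**(-2) is absolutely summable but has infinitely many nonzero terms,
    # so x belongs to l^1 but not to V (the finitely supported sequences).
    i = np.arange(1, 10**6 + 1, dtype=float)
    x = i**(-2)

    for N in [1, 10, 100, 1000]:
        # v_N keeps the first N coordinates of x and zeroes the rest; v_N is in V
        tail = x[N:].sum()          # ||x - v_N||_1 = sum_{i > N} i^(-2)
        print(f"N = {N:5d}   ||x - v_N||_1 ≈ {tail:.6f}")

    # The tail tends to 0, so dist(x, V) = 0, yet ||x - v||_1 > 0 for every v in V:
    # the infimum is not attained and P_V(x) is empty.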

We list the following more or less obvious, but useful, properties of the map f ↦ dist(f, V). (A simple proof is left to the reader.)

Lemma 1.1

(i) dist(f + g, V) ≤ dist(f, V) + dist(g, V), f, g ∈ X,

(ii) dist(f + v, V) = dist(f, V), f ∈ X, v ∈ V,

(iii) dist(αf, V) = |α| dist(f, V), f ∈ X, α ∈ K,

(iv) |dist(f, V) − dist(g, V)| ≤ ‖f − g‖, f, g ∈ X.

We now give sufficient conditions for uniqueness of the optimal approximations.

Definition 1.1 A normed space X is called uniformly convex iff for any ε > 0 there is δ > 0 such that for all f, g ∈ X the following holds:

if ‖f‖ = 1 = ‖g‖ and ‖(f + g)/2‖ > 1 − δ then ‖f − g‖ < ε.

Definition 1.2 A normed space X is called strictly convex iff for any f, g ∈ X the following holds:

if ‖f‖ = 1 = ‖g‖ and ‖(f + g)/2‖ = 1 then f = g.

Then we have the following result.

Theorem 1.1

(i) If X is uniformly convex then X is also strictly convex.

(ii) If dim(X) < +∞ and X is strictly convex then X is also uniformly convex.

Proof. (i) Let ‖f‖ = 1 = ‖g‖ and ‖(f + g)/2‖ = 1. Then uniform convexity implies that ‖f − g‖ < ε for any ε > 0, which in turn means that ‖f − g‖ = 0 and f = g.

(ii) Suppose X is strictly convex and dim(X) < +∞. For a given ε > 0, define the set A := {(f, g) ∈ X × X : ‖f‖ = 1 = ‖g‖ and ‖f − g‖ ≥ ε}.

This set is closed and bounded in X × X, hence it is compact. Furthermore, by strict convexity of X, the function

h(f, g) = 1 − ‖(f + g)/2‖

is positive and continuous on A. Then uniform convexity holds with δ = inf_{(f,g)∈A} h(f, g) > 0. Indeed, if ‖f‖ = 1 = ‖g‖ and ‖f − g‖ ≥ ε then we have ‖(f + g)/2‖ ≤ 1 − δ. □

Theorem 1.2 If X is strictly convex then the optimal approximation is uniquely determined for any f ∈ X.

Proof. Suppose v′, v″ ∈ V are different and both optimal for f ∈ X. Let d = dist(f, V). Then for v = (v′ + v″)/2 we have

‖f − v‖ = d ‖ (1/2) ( (f − v′)/d + (f − v″)/d ) ‖ < (1/2) ( ‖f − v′‖ + ‖f − v″‖ ) = dist(f, V),

where the inequality '<' follows from strict convexity of X. This contradicts the optimality of v′ and v″. □

Examples of uniformly convex, and consequently also strictly convex, spaces are provided by unitary spaces. Recall that X is a unitary space iff its norm is generated by an inner product ⟨·, ·⟩ : X × X → K (where K ∈ {R, C}), i.e.,

‖f‖ = √⟨f, f⟩.

Theorem 1.3 Any unitary space X is uniformly convex.

Proof. For a given ε > 0 we let ε₁ = min(2, ε) and

δ = 1 − √(1 − ε₁²/4) > 0.

Suppose that f, g ∈ X with ‖f‖ = 1 = ‖g‖ and ‖(f + g)/2‖ > 1 − δ. By the parallelogram identity ‖f + g‖² + ‖f − g‖² = 2(‖f‖² + ‖g‖²), we have

‖f − g‖² < 4(1 − (1 − δ)²) = ε₁² ≤ ε²,

as claimed. □

In a unitary space X, the optimal approximation v of f ∈ X with respect to V is just the orthogonal projection of f onto V; that is,

⟨f − v, w⟩ = 0 for all w ∈ V.

Having a basis of V, the optimal element can be expressed as follows. Let n = dim(V) and V = span(v_1, v_2, . . . , v_n). The orthogonality condition is equivalent to ⟨v, v_i⟩ = ⟨f, v_i⟩, 1 ≤ i ≤ n. Writing v = ∑_{j=1}^n a_j v_j, we then have that the unknown a_j's satisfy the following n × n system of linear equations:

∑_{j=1}^n a_j ⟨v_j, v_i⟩ = ⟨f, v_i⟩,   1 ≤ i ≤ n.

In particular, if the basis is orthonormal, i.e., ⟨v_j, v_i⟩ = 0 for j ≠ i and ‖v_i‖ = 1 for all i, then

v = ∑_{j=1}^n ⟨f, v_j⟩ v_j

and

dist(f, V)² = ‖f − v‖² = ‖f‖² − ∑_{j=1}^n |⟨f, v_j⟩|² = ∑_{j=n+1}^{+∞} |⟨f, v_j⟩|²,

where the last equality holds when {v_j}_{j≥1} extends the basis of V to a complete orthonormal system of X.
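The above Gram system is easy to solve numerically. Below is a small sketch (the function f(t) = eᵗ, the basis 1, t, t², and the crude Riemann-sum quadrature are all illustrative choices) computing a best L2(0, 1) approximation from a three-dimensional polynomial subspace:

    import numpy as np

    t = np.linspace(0.0, 1.0, 20001)
    dt = t[1] - t[0]

    def inner(u, w):
        # crude Riemann-sum approximation of the L2(0,1) inner product
        return np.sum(u * w) * dt

    f = np.exp(t)
    basis = [np.ones_like(t), t, t**2]          # V = span(1, t, t^2)

    # Gram system: sum_j a_j <v_j, v_i> = <f, v_i>
    G = np.array([[inner(vj, vi) for vj in basis] for vi in basis])
    b = np.array([inner(f, vi) for vi in basis])
    a = np.linalg.solve(G, b)

    v = sum(aj * vj for aj, vj in zip(a, basis))
    print("coefficients:", np.round(a, 4))
    print("dist(f, V) ≈", np.sqrt(inner(f - v, f - v)))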

The most popular unitary space is L2(a, b), where −∞ ≤ a < b ≤ +∞. It consists of (Lebesgue) measurable and square integrable (real or complex) functions defined on the interval (a, b), where the inner product is defined as

⟨f, g⟩ = ∫_a^b f(t) g(t) dt,

and the corresponding norm is

‖f‖_{L2} = ⟨f, f⟩^{1/2} = ( ∫_a^b |f(t)|² dt )^{1/2}.

It can be easily checked that the trigonometric polynomials

1/√(2π),  (1/√π) cos t,  (1/√π) sin t,  (1/√π) cos 2t,  (1/√π) sin 2t,  . . .

form an orthonormal system in L2(0, 2π). (Actually, the subspace spanned by all trigonometric polynomials is dense in L2(0, 2π).)
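As a quick numerical check of the orthonormal expansion (a sketch; the sample function and the grid-based inner products are only illustrative and approximate):

    import numpy as np

    t = np.linspace(0.0, 2.0 * np.pi, 40001)
    dt = t[1] - t[0]
    f = t * (2.0 * np.pi - t)              # a sample continuous function on (0, 2*pi)

    def inner(u, w):
        return np.sum(u * w) * dt          # Riemann-sum approximation of the L2(0, 2*pi) inner product

    def onb(n):
        # the orthonormal system 1/sqrt(2*pi), cos(kt)/sqrt(pi), sin(kt)/sqrt(pi), k = 1, ..., n
        fns = [np.full_like(t, 1.0 / np.sqrt(2.0 * np.pi))]
        for k in range(1, n + 1):
            fns.append(np.cos(k * t) / np.sqrt(np.pi))
            fns.append(np.sin(k * t) / np.sqrt(np.pi))
        return fns

    for n in [1, 2, 4, 8]:
        v = sum(inner(f, e) * e for e in onb(n))    # orthogonal projection onto V_{2n+1}
        print(f"n = {n}:  dist(f, V) ≈ {np.sqrt(inner(f - v, f - v)):.5f}")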

A generalization of L2(a, b) is provided by the Lp(a, b) spaces, where 1 ≤ p < +∞. They consist of all (Lebesgue) measurable functions on (a, b) such that |f|^p is integrable. The norm is defined as

‖f‖_{Lp} = ( ∫_a^b |f(t)|^p dt )^{1/p}.

It is known that the spaces Lp(a, b) are separable Banach spaces.

We also define the space L∞(a, b) of all measurable functions for which

‖f‖_{L∞} = ess sup_{a≤t≤b} |f(t)|

(which defines the norm) is finite.

Recall the important Hölder inequality: for any measurable functions f and g and any 1 ≤ p, q ≤ +∞ such that

1/p + 1/q = 1

(where q = +∞ if p = 1, and vice versa) we have

‖fg‖_{L1} ≤ ‖f‖_{Lp} ‖g‖_{Lq}.   (1.1)

For 1 < p, q < +∞, Hölder's inequality can be written as

∫_a^b |f(t) g(t)| dt ≤ ( ∫_a^b |f(t)|^p dt )^{1/p} ( ∫_a^b |g(t)|^q dt )^{1/q}.

Moreover, we have equality above if and only if the functions |f|^p and |g|^q are linearly dependent, meaning in particular that if f is not the zero function then there is c such that (note p/q = p − 1)

|g(t)| = c |f(t)|^{p−1} for a.e. t.   (1.2)

The spaces Lp(a, b) are uniformly convex for 1 < p < +∞, but the proof of this fact is far from trivial. We show only strict convexity which, by Theorem 1.2, is sufficient for uniqueness of the best approximation.

Theorem 1.4 The space Lp(a, b) is strictly convex for 1 < p < +∞.

Proof. The triangle inequality (which of course is one of the necessary conditions for ‖·‖_{Lp} to be a norm) says that

‖f + g‖_{Lp} ≤ ‖f‖_{Lp} + ‖g‖_{Lp}.   (1.3)

Recall the proof. We have

‖f + g‖_{Lp}^p = ∫_a^b |(f + g)(t)| |(f + g)(t)|^{p−1} dt
≤ ∫_a^b |f(t)| |(f + g)(t)|^{p−1} dt + ∫_a^b |g(t)| |(f + g)(t)|^{p−1} dt
≤ ‖f‖_{Lp} ‖(f + g)^{p−1}‖_{Lq} + ‖g‖_{Lp} ‖(f + g)^{p−1}‖_{Lq},

where the second inequality follows from Hölder's inequality applied to |f| and |f + g|^{p−1}, and to |g| and |f + g|^{p−1}. Dividing both sides by ‖(f + g)^{p−1}‖_{Lq} and using (p − 1)q = p we obtain (1.3).

From the proof above it follows (cf. (1.2)) that we have equality in (1.3) only if |g| = c₁|f + g|^{p−1} and |f| = c₂|f + g|^{p−1} a.e. For ‖f‖_{Lp} = ‖g‖_{Lp} = ‖(f + g)/2‖_{Lp} = 1, this can happen only when g = f a.e. □

The spaces L1(a, b) and L∞(a, b) are not strictly convex, and consequently not uniformly convex. To see this, we provide simple examples showing that in these spaces the approximation problem does not have a unique solution.

Example 1.2 Consider the approximation of f ≡ 1 with respect to V = {t ↦ at : a ∈ R}. Then, for X = L∞(0, 1) we have dist(f, V) = 1 and any t ↦ at with a ∈ [0, 1] is optimal, while for X = L1(−1, 1) we have dist(f, V) = 2 and any t ↦ at with a ∈ [−1, 1] is optimal.
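A crude numerical illustration of the L1 case (a sketch; the integral is approximated by a Riemann sum on a grid):

    import numpy as np

    t = np.linspace(-1.0, 1.0, 20001)
    dt = t[1] - t[0]

    def l1_error(a):
        # approximate || 1 - a*t ||_{L1(-1,1)} by a Riemann sum
        return np.sum(np.abs(1.0 - a * t)) * dt

    for a in [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]:
        print(f"a = {a:+.1f}   error ≈ {l1_error(a):.4f}")

    # every a in [-1, 1] gives (up to discretization) the same minimal error 2,
    # while a outside [-1, 1] gives a strictly larger error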

Finding optimal approximations with respect to the Lp norm with p ≠ 2 is in general a difficult problem. However, it is possible to give the following characterization of the optimal elements.

Theorem 1.5 Let V be a finite dimensional subspace of Lp(a, b), where 1 < p < +∞. An element v ∈ V is optimal for f ∈ Lp(a, b) with respect to V if and only if for every w ∈ V it holds that

∫_a^b w(t) |f(t) − v(t)|^{p−1} sgn(f(t) − v(t)) dt = 0.   (1.4)

For L1(a, b), the corresponding 'if and only if' condition reads

∫_a^b w(t) sgn(f(t) − v(t)) dt = 0 for every w ∈ V   (1.5)

(with the convention that sgn 0 = 0).

Proof. We first show necessity of the condition (1.4). We can assume that f ∉ V. We claim that there is a linear functional ℓ such that ℓ(f − v) = ‖f − v‖_{Lp}, ‖ℓ‖ = 1, and ℓ(w) = 0 for all w ∈ V.

Indeed, we first define the functional ℓ₁ on the space spanned by V and the function f − v as

ℓ₁(α(f − v) + w) = α ‖f − v‖_{Lp},   α ∈ K, w ∈ V.

Since the optimal approximation of f − v in V equals zero, we have dist(g, V) = |α| ‖f − v‖_{Lp} for any g = α(f − v) + w. Hence

|ℓ₁(g)| = |α| ‖f − v‖_{Lp} ≤ ‖α(f − v) + w‖_{Lp} = ‖g‖_{Lp},

which means that ℓ₁ has norm one. Next, by the Hahn-Banach theorem, cf. Chapter 16, this functional can be extended to a functional ℓ defined on Lp(a, b) preserving the norm.

It is known that the functional ℓ (as any other bounded functional on Lp(a, b) with 1 ≤ p < +∞) has the representation

ℓ(g) = ∫_a^b g(t) h(t) dt

for some h ∈ Lq(a, b) such that ‖h‖_{Lq} = ‖ℓ‖ = 1. By Hölder's inequality,

ℓ(f − v) = ∫_a^b (f − v)(t) h(t) dt ≤ ∫_a^b |(f − v)(t)| |h(t)| dt ≤ ‖f − v‖_{Lp} ‖h‖_{Lq} = ‖f − v‖_{Lp} = ℓ(f − v).

This means that we have equalities above, and so (cf. (1.2)) for some c > 0

|h(t)| = c |f(t) − v(t)|^{p−1} for a.e. t.   (1.6)

It also follows that (f − v)(t) h(t) = |(f − v)(t)| |h(t)| a.e. on (a, b), which together with (1.6) means that

sgn h(t) = sgn(f(t) − v(t)) for all t such that f(t) ≠ v(t).

Hence

h(t) = |h(t)| sgn h(t) = c |(f − v)(t)|^{p−1} sgn(f − v)(t),

which together with the fact that ℓ(w) = ∫_a^b w(t) h(t) dt = 0 for all w ∈ V completes the proof of the necessity of (1.4).

We now prove the sufficiency. Let v ∈ V satisfy (1.4). Then for any w ∈ V we have

‖f − v‖_{Lp}^p = ∫_a^b (f − v)(t) |(f − v)(t)|^{p−1} sgn(f − v)(t) dt
= ∫_a^b (f − w + w − v)(t) |(f − v)(t)|^{p−1} sgn(f − v)(t) dt
= ∫_a^b (f − w)(t) |(f − v)(t)|^{p−1} sgn(f − v)(t) dt
≤ ‖f − w‖_{Lp} ‖f − v‖_{Lp}^{p−1},

which implies ‖f − v‖_{Lp} ≤ ‖f − w‖_{Lp}.

The proof for p = 1 follows the same lines as for p > 1 and is therefore omitted. □

We add that for p = 2 the condition (1.4) means that f − v is orthogonal to the subspace V.
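A small numerical check of the characterization (1.4) for a one-dimensional subspace (a sketch; the function f, the subspace V = span{t}, the exponent p = 4, and the brute-force search are all illustrative assumptions):

    import numpy as np

    p = 4.0
    t = np.linspace(0.0, 1.0, 20001)
    dt = t[1] - t[0]
    f = np.exp(t)                      # function to approximate in L_p(0, 1)
    w = t                              # V = span{w}

    def lp_error(a):
        return (np.sum(np.abs(f - a * w) ** p) * dt) ** (1.0 / p)

    # brute-force search for the (approximately) optimal coefficient
    grid = np.linspace(0.0, 4.0, 4001)
    a_star = grid[np.argmin([lp_error(a) for a in grid])]

    r = f - a_star * w
    lhs = np.sum(w * np.abs(r) ** (p - 1.0) * np.sign(r)) * dt   # left-hand side of (1.4)
    print(f"a* ≈ {a_star:.4f},  condition (1.4) ≈ {lhs:.2e}  (close to 0)")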


Chapter 2

Uniform approximation

In this chapter we deal with approximation in C(D), where D ⊂ R^d is a compact set. (Recall that a set D ⊂ R^d is compact if and only if D is bounded and closed.) Specifically, C(D) is the space of all continuous and real-valued functions f : D → R with the (uniform) norm

‖f‖ := max_{t∈D} |f(t)|.

It is well known that C(D) is a Banach space, cf. Chapter 14.

Our aim is to give a characterization of optimal approximations. We first note that C(D) is not a strictly convex space and therefore optimal approximations are not always unique. (One can produce examples similar to those for L∞.)

Lemma 2.1 A function v ∈ V is optimal for f ∈ C(D) if and only if 0 is optimal for f − v.

Proof. If 0 is not optimal for f − v then there is v₁ ∈ V such that

‖f − v − v₁‖ < ‖f − v‖,

but this means that v + v₁ is a better approximation for f than v. The proof of the reverse implication is similar. □

Now we characterize functions for which 0 is optimal; that is, functions f such that

dist(f, V) = ‖f‖.   (2.1)

Define the critical set as

Crit(f) = {x ∈ D : |f(x)| = ‖f‖}.

Theorem 2.1 The equality (2.1) holds if and only if there is no element v ∈ V such that f(x)v(x) > 0 for all x ∈ Crit(f).

Proof. If there is v ∈ V such that

‖f − v‖ < ‖f‖,   (2.2)

then f(x) and v(x) are of equal signs for any x ∈ Crit(f). Indeed, if v(x) ≤ 0 < f(x) (the other cases are similar) then f(x) − v(x) ≥ f(x) = ‖f‖, which contradicts (2.2).


Suppose now that there is v ∈ V that takes the same signs as f on Crit(f). We can assume without loss of generality that ‖v‖ < ‖f‖. Let

A = {x ∈ D : f(x)v(x) ≤ 0}.

If A is an empty set then we clearly have ‖f − v‖ < ‖f‖. Otherwise A is compact and has empty intersection with Crit(f), so that

m := max_{x∈A} |f(x)| < ‖f‖.

Define

v₁ := (1 − m/‖f‖) v.

Then |f(x) − v₁(x)| < ‖f‖ for all x ∈ D and consequently ‖f − v₁‖ < ‖f‖. Indeed, this is clear for x ∉ A, and for x ∈ A we have

|f(x) − v₁(x)| ≤ |f(x)| + |v₁(x)| < m + (1 − m/‖f‖) ‖f‖ = ‖f‖. □



Corollary 2.1 An element v ∈ V is optimal for f ∈ C(D) if and only if there is no element w ∈ V such that

w(x)(f(x) − v(x)) > 0 for all x ∈ Crit(f − v).

To proceed further, we need two facts from convex analysis. Recall that the convex hull of a set S ⊂ Rⁿ is the set of all convex linear combinations of points in S, i.e.,

conv(S) = { ∑_{i=1}^k α_i s_i : k ∈ N, s_i ∈ S, α_i > 0, ∑_{i=1}^k α_i = 1 }.

Lemma 2.2 Let S ⊂ Rⁿ be a compact set. The vector 0⃗ does not belong to the convex hull of S if and only if there is z⃗ ∈ Rⁿ such that the inner product ⟨z⃗, u⃗⟩₂ > 0 for all u⃗ ∈ S.

Proof. Suppose first that 0⃗ ∉ conv(S). Let z⃗ be the element of conv(S) with minimal norm, i.e.,

‖z⃗‖₂ = min{‖w⃗‖₂ : w⃗ ∈ conv(S)}.

(Such an element exists by compactness of conv(S), and it is unique by convexity of conv(S) and strict convexity of the space Rⁿ.) For any u⃗ ∈ S and 0 ≤ α ≤ 1 we have αu⃗ + (1 − α)z⃗ ∈ conv(S) and

0 ≤ ‖αu⃗ + (1 − α)z⃗‖₂² − ‖z⃗‖₂² = α ( α‖u⃗ − z⃗‖₂² + 2⟨u⃗ − z⃗, z⃗⟩₂ ).

This may hold for all such α only if ⟨u⃗ − z⃗, z⃗⟩₂ ≥ 0, or equivalently

⟨u⃗, z⃗⟩₂ ≥ ‖z⃗‖₂² > 0.

Thus z⃗ is the desired element.

Suppose now that 0⃗ ∈ conv(S). Then 0⃗ = ∑_{i=0}^m λ_i s⃗_i for some s⃗_i ∈ S and λ_i > 0 with ∑_{i=0}^m λ_i = 1. Hence for any z⃗ ∈ Rⁿ we have ∑_{i=0}^m λ_i ⟨s⃗_i, z⃗⟩₂ = 0, which means that ⟨s⃗_i, z⃗⟩₂ ≤ 0 for at least one i, as claimed. □

Lemma 2.3 Every point of the convex hull of a set S ⊂ Rⁿ is a convex combination of at most n + 1 points of S.

Proof. Let x⃗ ∈ conv(S). Then x⃗ = ∑_{i=0}^m α_i s⃗_i for some s⃗_i ∈ S and α_i > 0 with ∑_{i=0}^m α_i = 1, where we assume that m is smallest possible. Suppose that m ≥ n + 1. Define y⃗_i = s⃗_i − x⃗ for 0 ≤ i ≤ m. Then ∑_{i=0}^m α_i y⃗_i = 0⃗. Since m > n, the elements y⃗_i for 1 ≤ i ≤ m are linearly dependent; hence ∑_{i=1}^m β_i y⃗_i = 0⃗ for some β_i's not all zero, where (replacing β by −β if necessary) at least one β_i is negative. Then for all real λ we have

∑_{i=0}^m (λα_i + β_i) y⃗_i = 0⃗,

where we additionally set β₀ = 0. Now, we set

λ := max_{0≤j≤m} (−β_j/α_j) > 0.

Then λα₀ + β₀ > 0, λα_i + β_i = 0 for at least one i, and these coefficients are nonnegative for all the remaining indices i. Using again the substitution y⃗_i = s⃗_i − x⃗ we have

( ∑_{i=0}^m (λα_i + β_i) ) x⃗ = ∑_{i=0}^m (λα_i + β_i) s⃗_i,

and dividing both sides by ∑_{i=0}^m (λα_i + β_i) we finally obtain that x⃗ can be represented as a convex combination of fewer than m + 1 points of S, which contradicts the minimality of m. □

We also introduce some notation. Let dim(V) = n and let (v_1, v_2, . . . , v_n) be a fixed basis of V. Then

v⃗(x) := (v_1(x), v_2(x), . . . , v_n(x)).

For x ∈ D, we denote by x̂ the linear functional on C(D) given by

x̂(g) = g(x),   g ∈ C(D).

Theorem 2.2 The following conditions are equivalent:

(i) ‖f‖ = dist(f, V).

(ii) No element of V has the same signs as f in the set Crit(f).

(iii) 0⃗ belongs to the convex hull of the set {f(x)v⃗(x) : x ∈ Crit(f)}.

(iv) There exists a functional of the form L = ∑_{i=1}^k λ_i x̂_i with k ≤ n + 1, such that x_i ∈ Crit(f) and λ_i f(x_i) > 0 for all 1 ≤ i ≤ k, and V ⊂ ker(L).

Proof. Equivalence of (i) and (ii) is proven in Theorem 2.1. To show (ii) ⇒ (iii), observe that (ii) implies that there are no numbers c_i such that

f(x) ∑_{i=1}^n c_i v_i(x) > 0 for all x ∈ Crit(f).

This condition can be written as ⟨c⃗, f(x)v⃗(x)⟩₂ > 0 (with c⃗ = (c_1, . . . , c_n)) and means, by Lemma 2.2, that 0⃗ is in the convex hull of the set {f(x)v⃗(x) : x ∈ Crit(f)}.

We show (iii) ⇒ (iv). From (iii) and Lemma 2.3 it follows that 0⃗ ∈ Rⁿ can be written as a convex combination of k ≤ n + 1 points from {f(x)v⃗(x) : x ∈ Crit(f)}, i.e.,

0⃗ = ∑_{i=1}^k α_i f(x_i) v⃗(x_i).

Then L(g) = ∑_{i=1}^k λ_i g(x_i), where λ_i = α_i f(x_i). Indeed, we clearly have λ_j f(x_j) > 0 and

0 = ∑_{i=1}^k λ_i v_j(x_i) = ( ∑_{i=1}^k λ_i x̂_i )(v_j) = L(v_j),   1 ≤ j ≤ n,

which means that L vanishes on V.

And finally, to show (iv) ⇒ (i), we check that (iv) implies that for any v ∈ V

‖f‖ ∑_{i=1}^k |λ_i| = ∑_{i=1}^k λ_i f(x_i) = ∑_{i=1}^k λ_i (f(x_i) − v(x_i)) ≤ ‖f − v‖ ∑_{i=1}^k |λ_i|,

i.e., ‖f‖ ≤ ‖f − v‖. The proof is complete. □

Now we deal with uniqueness of optimal approximations.

Definition 2.1 An n-dimensional linear space V ⊂ C(D) is a Haar space iff any nonzero function f ∈ V vanishes at at most n − 1 points of D. Any basis of a Haar space is called a Haar system.

Let us note that an equivalent condition for V to be a Haar space is that the interpolation problem:

for n distinct points x_i ∈ D and arbitrary numbers y_i, find v ∈ V such that

v(x_i) = y_i,   1 ≤ i ≤ n,   (2.3)

has a unique solution. Indeed, writing v = ∑_{j=1}^n a_j v_j, where {v_j}_{j=1}^n is a basis of V, we have that (2.3) is equivalent to the n × n system of linear equations

∑_{j=1}^n a_j v_j(x_i) = y_i,   1 ≤ i ≤ n.

It has a unique solution if and only if the zero function is the unique solution of the homogeneous system; the latter is ensured by the Haar condition.
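For the monomial basis, which forms a Haar system on any interval (see the first example below), this is the familiar Vandermonde system; a minimal sketch (the nodes and data are arbitrary choices):

    import numpy as np

    # Interpolate data at n distinct points from span(1, x, ..., x^{n-1}),
    # i.e. solve sum_j a_j v_j(x_i) = y_i with v_j(x) = x^(j-1).
    x = np.array([0.0, 0.3, 0.7, 1.0])        # distinct nodes
    y = np.exp(x)                              # data to interpolate
    n = len(x)

    V = np.vander(x, n, increasing=True)       # matrix {v_j(x_i)} = {x_i^(j-1)}
    a = np.linalg.solve(V, y)                  # unique solution by the Haar property

    residual = np.polynomial.polynomial.polyval(x, a) - y
    print("max interpolation residual:", np.max(np.abs(residual)))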

We now give some examples.

Probably the most natural Haar space is provided by the algebraic polynomials of degree at most n,

P_{n+1} := span(1, x, x², . . . , xⁿ) ⊂ C([a, b])

for any −∞ < a < b < +∞. Obviously dim P_{n+1} = n + 1. Another important Haar space is formed by the trigonometric polynomials,

V_{2n+1} := span(1, cos t, sin t, . . . , cos nt, sin nt) ⊂ C([a, b])   (2.4)

for any a, b with 0 < b − a < 2π. Indeed, using e^{iφ} = cos φ + i sin φ (i = √−1), any nontrivial function t ↦ h(t) := ∑_{k=0}^n (a_k cos kt + b_k sin kt) ∈ V_{2n+1} can be written as

h(t) = ∑_{|k|≤n} c_k e^{ikt} = z^{−n} ∑_{k=0}^{2n} c_{k−n} z^k,   (2.5)

where z = e^{it} and c₀ = a₀, c_{±k} = (a_k ∓ i b_k)/2, 1 ≤ k ≤ n. If c_{±k} = 0 for all k then also a_k = b_k = 0. This yields dim V_{2n+1} = 2n + 1. Furthermore, h vanishes at no more than 2n points in [a, b], since the algebraic polynomial on the right-hand side of (2.5) has at most 2n different zeros and the function t ↦ e^{it} is one-to-one on any real interval of length smaller than 2π.

Remark 2.1 Some important subspaces of V_{2n+1} are also Haar spaces. These include

V̂_{n+1} := span(1, cos t, cos 2t, . . . , cos nt) ⊂ C([0, π]),

and

Ṽ_n := span(sin t, sin 2t, . . . , sin nt) ⊂ C([ε, π − ε]) for any ε ∈ (0, π/2).

Indeed, any nontrivial function t ↦ h(t) := ∑_{k=0}^n a_k cos kt ∈ V̂_{n+1} can be written as h(t) = ∑_{k=0}^n a_k T_k(x), where x = cos t, t ∈ [0, π], and T_k is the kth Chebyshev polynomial (of the first kind). Since the change of variable is one-to-one, h has at most n zeros in [0, π].

To see that Ṽ_n is a Haar space on [ε, π − ε] it suffices to notice that if an odd function t ↦ h(t) := ∑_{k=1}^n b_k sin kt has n zeros ε ≤ t₁ < . . . < t_n ≤ π − ε, then it has 2n + 1 zeros in [−π + ε, π − ε], namely 0 and ±t_j, 1 ≤ j ≤ n. This is impossible since, as shown above, V_{2n+1} is a Haar space on the interval [−π + ε, π − ε].

Theorem 2.3 Let V ⊂ C(D) be a Haar space of dimension n, and let f ∈ C(D) be such that ‖f‖ = dist(f, V). Then there exists γ > 0 (depending on f) such that for any v ∈ V

‖f − v‖ ≥ ‖f‖ + γ ‖v‖.

Proof. We first observe that if V is an n-dimensional Haar space then k in the definition of the functional L of Theorem 2.2(iv) equals n + 1; otherwise we could choose v ∈ V such that v(x_i) = λ_i, 1 ≤ i ≤ k, and then 0 = L(v) = ∑_{i=1}^k λ_i v(x_i) = ∑_{i=1}^k λ_i² > 0.

We write L as

L(g) = ∑_{i=0}^n θ_i σ_i g(x_i),   σ_i := sgn f(x_i),

so that all θ_i's are positive. Let w ∈ V with ‖w‖ = 1. Since L(w) = 0 and, by the Haar condition, w does not vanish at all the points x_i, we have max_{0≤i≤n} σ_i w(x_i) > 0. Define

γ := inf_{‖w‖=1} max_{0≤i≤n} σ_i w(x_i) > 0

(the infimum is attained and positive by compactness of the unit sphere of V). Now let v ∈ V, v ≠ 0. Applying the above to w = −v/‖v‖, we find an index i with σ_i v(x_i) ≤ −γ ‖v‖, and then

‖f − v‖ ≥ σ_i (f(x_i) − v(x_i)) = ‖f‖ − σ_i v(x_i) ≥ ‖f‖ + γ ‖v‖,

as claimed. □

Corollary 2.2 If V ⊂ C(D) is a Haar space then any f ∈ C(D) has a unique optimal approximation with respect to V.

Proof. If v′ and v″ are both optimal for f then, applying Theorem 2.3 to f − v′ (for which 0 is an optimal approximation), we have

dist(f, V) = ‖f − v″‖ = ‖(f − v′) + (v′ − v″)‖ ≥ ‖f − v′‖ + γ ‖v′ − v″‖ = dist(f, V) + γ ‖v′ − v″‖,

which forces ‖v′ − v″‖ = 0 and v′ = v″. □

Another consequence of Theorem 2.3 is continuity of the best approximation.

Theorem 2.4 Let V ⊂ C(D) be a Haar space. Let A : C(D) → V be the mapping that associates with any element of C(D) its optimal approximation with respect to V. Then for any f ∈ C(D) there is κ such that for any other g ∈ C(D) we have

‖A(f) − A(g)‖ ≤ κ ‖f − g‖.

Proof. By Theorem 2.3, there is γ > 0 such that for any w ∈ V

‖(f − A(f)) − w‖ ≥ ‖f − A(f)‖ + γ ‖w‖,

which is equivalent to ‖f − v‖ ≥ ‖f − A(f)‖ + γ ‖A(f) − v‖ for all v ∈ V. Hence, taking v = A(g), we obtain

γ ‖A(f) − A(g)‖ ≤ ‖f − A(g)‖ − ‖f − A(f)‖
≤ ‖f − g‖ + ‖g − A(g)‖ − ‖f − A(f)‖
≤ ‖f − g‖ + ‖g − A(f)‖ − ‖f − A(f)‖
≤ ‖f − g‖ + ‖g − f‖ + ‖f − A(f)‖ − ‖f − A(f)‖
= 2 ‖f − g‖,

and the theorem holds with κ = 2/γ. □

Finally, we arrive at the (Chebyshev) alternation theorem for the domain D = [a, b].

Theorem 2.5 Let V be an n-dimensional Haar subspace of C([a, b]). An element v is optimal for f ∈ C([a, b]) with respect to V if and only if there exist σ ∈ {−1, +1} and points a ≤ x₀ < x₁ < · · · < x_n ≤ b such that

f(x_i) − v(x_i) = σ(−1)^i ‖f − v‖,   0 ≤ i ≤ n.

Proof. Let v be optimal for f. Then, by Theorem 2.2(iv) (applied to f − v), there are points x₀ < x₁ < · · · < x_n and numbers λ_i such that |f(x_i) − v(x_i)| = ‖f − v‖ and λ_i(f(x_i) − v(x_i)) > 0, and ∑_{i=0}^n λ_i w(x_i) = 0 for all w ∈ V. We show that the x_i are the alternation points. To that end, it suffices to show that λ_{j−1}λ_j < 0 for 1 ≤ j ≤ n. Indeed, for 1 ≤ j ≤ n, we choose w_j ∈ V interpolating the data w_j(x_j) = 1 and w_j(x_i) = 0, i ≠ j − 1, j. Then

0 = ∑_{i=0}^n λ_i w_j(x_i) = λ_{j−1} w_j(x_{j−1}) + λ_j.

This implies w_j(x_{j−1}) > 0, since otherwise w_j would have an nth zero in the interval [x_{j−1}, x_j), and consequently λ_{j−1}λ_j = −λ_{j−1}² w_j(x_{j−1}) < 0.

Suppose now that the n + 1 alternation points x_i exist. If there were a w ∈ V such that ‖f − w‖ < ‖f − v‖, then the function (f − v) − (f − w) = w − v ∈ V would assume alternately positive and negative values at the successive x_i. Hence the function w − v would have at least n different zeros, and since V is a Haar space, w − v = 0 and w = v, contradicting ‖f − w‖ < ‖f − v‖. Thus v is optimal. □

Remark 2.2 If V = P_{n+1} is the space of algebraic polynomials of degree at most n, then existence of the alternation points can be shown straightforwardly.

Indeed, let f ∈ C([a, b]) and let v be optimal for f in P_{n+1}. Define points {x_i} as follows. The first point x₁ is the smallest one in [a, b] such that |(f − v)(x₁)| = ‖f − v‖. We can assume without loss of generality that (f − v)(x₁) = −‖f − v‖. Then x₂ is the smallest point in [x₁, b] such that (f − v)(x₂) = ‖f − v‖, and generally, x_i is the smallest point in [x_{i−1}, b] such that (f − v)(x_i) = (−1)^i ‖f − v‖.

If we can choose in this way at least n + 2 points then {x_i}_{i=1}^{n+2} are the alternation points. Suppose that we can choose only k ≤ n + 1 points. Then we define z_i, for 2 ≤ i ≤ k, as the largest zero of f − v in the interval [x_{i−1}, x_i]. We obviously have z_i < x_i. Observe that the polynomial

x ↦ w(x) = (−1)^k (x − z₂)(x − z₃) · · · (x − z_k)

is in P_{n+1} and has the property that w(x)(f − v)(x) > 0 for all x such that |(f − v)(x)| = ‖f − v‖.

Now it suffices to use Corollary 2.1.
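The alternation pattern is easy to observe numerically. The sketch below (the test function f(x) = eˣ on [0, 1], the coefficient grids, and the brute-force minimax search are all illustrative assumptions, not the Remez algorithm) finds an approximately best line and prints the error at the three points where it should equioscillate:

    import numpy as np
    from itertools import product

    x = np.linspace(0.0, 1.0, 2001)
    f = np.exp(x)

    def sup_error(a0, a1):
        return np.max(np.abs(f - (a0 + a1 * x)))

    # crude brute-force minimax search over a grid of line coefficients
    err, a0, a1 = min(
        ((sup_error(a0, a1), a0, a1)
         for a0, a1 in product(np.linspace(0.5, 1.5, 201), np.linspace(1.0, 2.5, 301))),
        key=lambda triple: triple[0],
    )
    r = f - (a0 + a1 * x)
    i_mid = np.argmin(r)                     # interior extremum of the error
    print(f"best line ≈ {a0:.3f} + {a1:.3f} x,  sup error ≈ {err:.4f}")
    print("error at x=0, interior extremum, x=1:",
          round(r[0], 4), round(r[i_mid], 4), round(r[-1], 4))
    # the three values should have (nearly) equal magnitude and alternating signs,
    # i.e. n + 2 = 3 alternation points for n = 1, as Theorem 2.5 predicts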

Remark 2.3 Consider the space V_{2n+1} of trigonometric polynomials defined in (2.4). This is a subspace of C([0, 2π]), but not a Haar space, for any n ≥ 1, since the function x ↦ sin nx has 2n + 1 different zeros πk/n, 0 ≤ k ≤ 2n. As a consequence, there are functions f ∈ C([0, 2π]) for which the optimal approximation with respect to V_{2n+1} is not unique. (A simple example is n = 1, where t ↦ v(t) = −a sin t is optimal for t ↦ f(t) = t/π − 1, for all 0 ≤ a ≤ 1.) However, if we assume, in addition to f ∈ C([0, 2π]), that f(0) = f(2π), then all the results of this chapter that follow from V being a Haar space remain valid.

To see this, observe that for such f the points x_i defining the functional L in Theorem 2.2(iv) can be chosen such that

−π ≤ x₁ < x₂ < · · · < x_k < π.

(Indeed, the condition f(−π) = f(π) implies that the point π can be identified with −π.) Moreover, since V_{2n+1} is already a Haar subspace of C([−π, x_k]), proceeding as in the proof of Theorem 2.3 we have k = 2n + 2, and this theorem follows. Consequently, we also have uniqueness of the optimal approximation, and {x_i}_{i=1}^{2n+2} are the alternation points.

Remark 2.4 If V ⊂ C(D) is not a Haar space then one can construct a function f that possesses more than one optimal element with respect to V. The construction goes as follows.

We first choose a basis (v_1, . . . , v_n) of V and points x_1, . . . , x_n ∈ D such that the matrix {v_i(x_j)}_{i,j=1}^n is singular. Let a⃗ = (a_1, . . . , a_n) and b⃗ = (b_1, . . . , b_n) be nonzero vectors that are, respectively, orthogonal to the columns and rows of this matrix, i.e.,

∑_{i=1}^n a_i v_i(x_j) = 0, 1 ≤ j ≤ n,   and   ∑_{j=1}^n b_j v_i(x_j) = 0, 1 ≤ i ≤ n.

(Then obviously ∑_{j=1}^n b_j v(x_j) = 0 for all v ∈ V.) Let p = ∑_{i=1}^n a_i v_i. We can assume that ‖p‖ < 1. Let g ∈ C(D) be such that ‖g‖ = 1 and g(x_j) = sgn b_j for 1 ≤ j ≤ n, and let

f(x) = g(x)(1 − |p(x)|).

Then f(x_j) = g(x_j) = sgn b_j. We also have ‖f − v‖ ≥ 1 for all v ∈ V, since otherwise sgn v(x_j) = sgn f(x_j) = sgn b_j for all j with b_j ≠ 0, which contradicts ∑_{j=1}^n b_j v(x_j) = 0. We show that λp is optimal for f for all 0 ≤ λ ≤ 1. Indeed, for any x ∈ D we have

|f(x) − λp(x)| ≤ |f(x)| + λ|p(x)| = |g(x)|(1 − |p(x)|) + λ|p(x)| ≤ 1 − |p(x)| + λ|p(x)| ≤ 1.

Since dist(f, V) ≥ 1, this shows that λp is optimal for f for every λ ∈ [0, 1].


Chapter 3

The Weierstrass theorem

This chapter is devoted to the well-known Weierstrass theorem which establishes density of algebraic polynomials in the space C([a, b]). Among several proofs of this fact we choose the one that uses properties of positive operators.

For f, g ∈ C([a, b]) we write f ≥ g (or f ≤ g) iff f(x) ≥ g(x) (or f(x) ≤ g(x)) for all x ∈ [a, b]. By |f| we mean the function x ↦ |f(x)|, x ∈ [a, b].

Definition 3.1 A linear operator L : C([a, b]) → C([a, b]) is positive iff for all f ∈ C([a, b]) the condition f ≥ 0 implies that Lf ≥ 0.

Sometimes the term monotone operator is used instead of positive operator, since Definition 3.1 is obviously equivalent to the following: for any f, g ∈ C([a, b]), if f ≤ g then Lf ≤ Lg.

For positive operators we have in particular that |Lf | ≤ L(|f |).

Theorem 3.1 Let the functions h_i be defined as h_i(x) = x^i. Let {L_n}_{n≥1} be a sequence of positive linear operators, L_n : C([a, b]) → C([a, b]). If

lim_{n→∞} ‖h_i − L_n h_i‖ = 0 for i = 0, 1, 2,

then

lim_{n→∞} ‖f − L_n f‖ = 0 for all f ∈ C([a, b]).

Proof. Let f ∈ C([a, b]) and let ε > 0. Since continuity of f implies its uniform continuity, there is δ > 0 such that |f(x) − f(y)| < ε if |x − y| < δ. On the other hand, if |x − y| ≥ δ then

|f(x) − f(y)| ≤ 2‖f‖ ≤ 2‖f‖(x − y)²/δ².

Hence, for c := 2‖f‖/δ² we have |f(x) − f(y)| ≤ ε + c(x − y)², which can be written in terms of the h_i as

|f − f(y)h₀| ≤ εh₀ + c(h₂ − 2yh₁ + y²h₀),   (3.1)

where we treat both sides of (3.1) as functions of x. Applying the positive operator L_n we get

|L_n f − f(y) L_n h₀| ≤ ε L_n h₀ + c(L_n h₂ − 2y L_n h₁ + y² L_n h₀).


Then, denoting e_n^{(i)} := L_n h_i − h_i and taking x = y, we obtain

|(L_n f)(y) − f(y)(L_n h₀)(y)|
≤ ε(L_n h₀)(y) + c( (L_n h₂)(y) − 2y(L_n h₁)(y) + y²(L_n h₀)(y) )
= ε(1 + e_n^{(0)}(y)) + c( (y² + e_n^{(2)}(y)) − 2y(y + e_n^{(1)}(y)) + y²(1 + e_n^{(0)}(y)) )
= ε + ε e_n^{(0)}(y) + c e_n^{(2)}(y) − 2cy e_n^{(1)}(y) + cy² e_n^{(0)}(y)
≤ ε + ε‖e_n^{(0)}‖ + c‖e_n^{(2)}‖ + 2c‖h₁‖‖e_n^{(1)}‖ + c‖h₂‖‖e_n^{(0)}‖,

which is smaller than 2ε for n sufficiently large, say n ≥ m, with m independent of y. The proof is completed by the observation that

‖L_n f − f‖ ≤ ‖L_n f − f L_n h₀‖ + ‖f L_n h₀ − f h₀‖ ≤ 2ε + ‖f‖‖e_n^{(0)}‖,

which is smaller than 3ε for sufficiently large n. □

Theorem 3.1 yields, in particular, the Weierstrass theorem.

Theorem 3.2 For any function f ∈ C([a, b]) and any ε > 0, there exists an algebraic polynomial p such that ‖f − p‖ < ε.

Proof. Without loss of generality we can restrict ourselves to the interval [a, b] = [0, 1]. For a given f ∈ C([0, 1]) we define the Bernstein polynomials as

(B_n f)(x) := ∑_{k=0}^n f(k/n) \binom{n}{k} x^k (1 − x)^{n−k}.

It is clear that for all n, B_n f is a polynomial of degree at most n, and the operator f ↦ B_n f is linear and positive. Hence it is enough to show convergence of B_n h_i to h_i for i = 0, 1, 2. For i = 0,

(B_n h₀)(x) = ∑_{k=0}^n \binom{n}{k} x^k (1 − x)^{n−k} = 1,

i.e., B_n h₀ = h₀. For i = 1,

(B_n h₁)(x) = ∑_{k=0}^n (k/n) \binom{n}{k} x^k (1 − x)^{n−k} = ∑_{k=1}^n \binom{n−1}{k−1} x^k (1 − x)^{n−k}
= x ∑_{k=0}^{n−1} \binom{n−1}{k} x^k (1 − x)^{n−1−k} = x,

i.e., B_n h₁ = h₁. And finally, for i = 2,

(B_n h₂)(x) = ∑_{k=0}^n (k/n)² \binom{n}{k} x^k (1 − x)^{n−k} = ∑_{k=1}^n (k/n) \binom{n−1}{k−1} x^k (1 − x)^{n−k}
= ∑_{k=1}^n ( (n−1)/n · (k−1)/(n−1) + 1/n ) \binom{n−1}{k−1} x^k (1 − x)^{n−k}
= (n−1)/n · x² ∑_{k=2}^n \binom{n−2}{k−2} x^{k−2} (1 − x)^{n−k} + x/n
= (n−1)/n · x² + x/n,

which yields |(B_n h₂)(x) − h₂(x)| = x(1 − x)/n ≤ 1/(4n) and convergence (in norm) of B_n h₂ to h₂. □
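A short numerical illustration of how the Bernstein operators converge (a sketch; the test function and the degrees are arbitrary choices):

    import numpy as np
    from math import comb

    def bernstein(f, n, x):
        # (B_n f)(x) = sum_{k=0}^n f(k/n) * binom(n, k) * x^k * (1 - x)^(n - k)
        x = np.asarray(x, dtype=float)
        return sum(f(k / n) * comb(n, k) * x**k * (1.0 - x)**(n - k) for k in range(n + 1))

    f = lambda t: np.abs(t - 0.3)          # a continuous (non-smooth) test function on [0, 1]
    x = np.linspace(0.0, 1.0, 1001)

    for n in [4, 16, 64, 256]:
        err = np.max(np.abs(f(x) - bernstein(f, n, x)))
        print(f"n = {n:4d}   ||f - B_n f|| ≈ {err:.4f}")

The convergence is slow, but it holds for every continuous f, which is exactly what the Weierstrass theorem needs.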



Corollary 3.1 Let f ∈ C([a, b]). For any n ≥ 1 there exist points

a ≤ x₀^{(n)} < x₁^{(n)} < · · · < x_n^{(n)} ≤ b

such that the algebraic polynomial p_n ∈ P_{n+1} interpolating f at the points x_i^{(n)}, 0 ≤ i ≤ n, is optimal. Moreover, lim_{n→∞} ‖f − p_n‖ = 0.

Proof. This is a direct consequence of the Weierstrass Theorem 3.2 and the Chebyshev alternation Theorem 2.5. Indeed, let v_n be the optimal polynomial for f with respect to P_{n+1}. By the alternation theorem, f − v_n vanishes at some point between any two consecutive alternation points. Since there are n + 2 alternation points, v_n interpolates f at n + 1 points. Moreover, optimality of v_n and the Weierstrass theorem yield lim_{n→∞} ‖f − v_n‖ = 0. Hence the corollary holds with p_n = v_n. □

The 'problem' with Corollary 3.1 is that the points x_i^{(n)} depend on the particular function f. A natural question is whether it is possible to choose the points independently of f so that the corresponding interpolating polynomials converge to f for every f ∈ C([a, b]). Unfortunately, the answer is negative, as a consequence of the more general considerations in the following Chapters 4 and 5.

Theorem 3.1 has its counterpart in C(R). This space consists of all functions f : R → R that are continuous and 2π-periodic, i.e.,

f(x) = f(x + 2π) for all x ∈ R,

and the norm is

‖f‖ := max_{x∈R} |f(x)|.

Theorem 3.3 Let h₀(x) = 1, h₁(x) = cos x, h₂(x) = sin x. Let {L_n}_{n≥1} be a sequence of positive linear operators, L_n : C(R) → C(R). If

lim_{n→∞} L_n(h_i) = h_i for i = 0, 1, 2,

then lim_{n→∞} L_n(f) = f for all f ∈ C(R).

Proof. We proceed similarly to the proof of Theorem 3.1. Let f ∈ C(R) and M := ‖f‖. Choose an arbitrary ε > 0. By uniform continuity of f there is δ > 0 such that |x − y| ≤ δ implies |f(x) − f(y)| ≤ ε. We claim that for any α − δ < x ≤ 2π + α − δ we have

|f(x) − f(α)| ≤ ε + (2M / sin²(δ/2)) ψ(x),   where ψ(x) = sin²((x − α)/2).   (3.2)

Indeed, if |x − α| < δ then |f(x) − f(α)| ≤ ε; otherwise δ/2 ≤ (x − α)/2 ≤ π − δ/2, which implies sin²((x − α)/2) ≥ sin²(δ/2) and

|f(x) − f(α)| ≤ 2M ≤ (2M / sin²(δ/2)) ψ(x).
