Set-Valued Var. Anal DOI 10.1007/s11228-017-0415-x
Subgradient Projectors: Extensions, Theory, and Characterizations Heinz H. Bauschke1 · Caifang Wang2 · Xianfu Wang1 · Jia Xu1
Received: 4 July 2016 / Accepted: 3 May 2017 © Springer Science+Business Media Dordrecht 2017
Abstract Subgradient projectors play an important role in optimization and for solving convex feasibility problems. For every locally Lipschitz function, we can define a subgradient projector via generalized subgradients even if the function is not convex. The paper consists of three parts. In the first part, we study basic properties of subgradient projectors and give characterizations of when a subgradient projector is a cutter, a local cutter, or a quasi-nonexpansive mapping. We present global and local convergence analyses of subgradient projectors. Many examples are provided to illustrate the theory. In the second part, we investigate the relationship between the subgradient projector of a prox-regular function and the subgradient projector of its Moreau envelope. We also characterize when a mapping is the subgradient projector of a convex function. In the third part, we focus on linearity properties of subgradient projectors. We show that, under appropriate conditions, a linear operator is a subgradient projector of a convex function if and only if it is a convex combination of the identity operator and a projection operator onto a subspace. In general, neither a convex combination nor a composition of subgradient projectors of convex functions is a subgradient projector of a convex function.
Xianfu Wang: [email protected]
Heinz H. Bauschke: [email protected]
Caifang Wang: [email protected]
Jia Xu: [email protected]

1. Mathematics, University of British Columbia, Kelowna, BC V1V 1V7, Canada
2. Department of Mathematics, Shanghai Maritime University, Shanghai, China
Keywords Approximately convex function · Averaged mapping · Cutter · Essentially strictly differentiable function · Fixed point · Limiting subgradient · Local cutter · Local quasi-firmly nonexpansive mapping · Local quasi-nonexpansive mapping · Locally Lipschitz function · Linear cutter · Linear firmly nonexpansive mapping · Linear subgradient projection operator · Moreau envelope · Projection · Prox-bounded · Proximal mapping · Prox-regular function · Quasi-firmly nonexpansive mapping · Quasi-nonexpansive mapping · (C,ε)-firmly nonexpansive mapping · Subdifferentiable function · Subgradient projection operator

Mathematics Subject Classification (2010) Primary 49J52 · Secondary 49J53 · 47H04 · 47H05 · 47H09
1 Introduction

Studies of optimization problems and convex feasibility problems have led in recent years to the development of a theory of subgradient projectors, which are projections onto certain half-spaces. Rather than finding projections onto level sets of the original functions, the iterative algorithms find projections onto half-spaces that contain the 0-level set of the function. Polyak developed the subgradient projector iteration for convex functions [45–47]; it was further developed by Censor, Combettes, Fukushima, Kiwiel, Yamada and others, and applied to many kinds of optimization problems [7, 19, 21, 22, 24, 25, 29, 34, 35, 43, 55]. In [10], we gave a systematic study of subgradient projectors of convex functions. Convexity is often too strong an assumption for the needs of applications. In a recent work [42], Pang studied finitely convergent algorithms for nonconvex inequality problems involving approximately convex functions. The subgradient projector used by Pang is built from the Clarke subdifferential instead of the Mordukhovich limiting subdifferential. To this day, however, there is a lack of systematic theory on the subgradient projector when a function is possibly nonconvex. The goal of this paper is to develop the basic theory of subgradient projectors for possibly nonconvex functions on a finite dimensional space, aimed ultimately at applications to diverse problems of nonconvex optimization. Nondifferentiable and nonconvex functions arise in many optimization problems. As far as nonconvex functions are concerned, the cutter theory or T-class developed by Cegielski [20], Bauschke, Borwein and Combettes [5], and Bauschke, Wang, Wang and Xu [11] furnishes a new approach to subgradient projectors, without appealing to the existence theory of subgradient projectors for convex functions. Our study shows that subgradient projectors of nonconvex functions have many attractive analytical properties.
Among the results presented here, we find that while cutters and quasi-nonexpansive mappings on $\mathbb{R}^n$ are global notions, cutters and quasi-nonexpansive mappings on a neighborhood are more useful for functions that are only locally convex around the desired point, say a critical point or a feasible point. This paper not only includes some results from [54], but also many refinements and new advances. Since definitions and proofs are much simpler in finite dimensional spaces, and many technical complications do not even appear, we shall work in the finite dimensional setting only. For the convenience of readers, our main results are presented in three parts. In the first part, we study extensions and theory of subgradient projectors. In the second part, we consider subgradient projectors of Moreau envelopes and conditions under which a mapping is the subgradient projector of a convex function. The third part is devoted to linear subgradient projectors.
The remainder of this paper is organized as follows. Part I consists of Sections 2–6. Section 2 provides an extension of subgradient projectors from convex functions to possibly nonconvex functions; Section 3 is devoted to calculus of subgradient projectors; Section 4 deals with whether one can recover a function from its subgradient projector, and with the fixed point closed property of a subgradient projector; conditions on functions under which their subgradient projectors are cutters or local cutters are presented in Section 5. Section 6 is devoted to convergence analysis of subgradient projectors using the theory of cutters, local cutters, quasi-nonexpansive mappings, and local quasi-nonexpansive mappings. Under appropriate assumptions, we show that subgradient projectors are $(C,\varepsilon)$-firmly nonexpansive, a very useful concept introduced by Hesse and Luke for studying local linear convergence of a variety of algorithms. Part II consists of Sections 7 and 8. For prox-bounded and prox-regular functions, their Moreau envelopes are differentiable. Section 7 studies the subgradient projectors of Moreau envelopes of prox-bounded and prox-regular functions, and their connections to the subgradient projectors of the original functions. We show that if $f$ is proper, lsc, prox-bounded, and prox-regular on $\mathbb{R}^n$, then $f$ is a difference of convex functions (Corollary 7.9); and that if $f$ is $C^2$, $\min f = 0$, and $\nabla f(x) \neq 0$ for every $x \in \mathbb{R}^n \setminus \operatorname{argmin} f$, then the subgradient projector $G_f$ of $f$ is a cutter if and only if the subgradient projector $G_{e_\lambda f}$ of the Moreau envelope $e_\lambda f$ is a cutter for every $\lambda > 0$ (Propositions 7.22 and 7.24). Section 8 characterizes when a mapping is actually a subgradient projector of a convex function. Part III consists of Sections 9–11. It is interesting to ask when a subgradient projector is linear, and what special properties a linear subgradient projector possesses. To the best of our knowledge, this question has not been explored in the literature.
Section 9 studies linear subgradient projectors and their distinguished features. In particular, we give a nonlinear cutter which is nonexpansive but not firmly nonexpansive, and the example is much simpler than the one given by Cegielski [20]. In Section 10, using results from Section 9, we show that in general neither a convex combination nor a composition of subgradient projectors of convex functions is a subgradient projector of a convex function. Finally, in Section 11, we completely characterize linear subgradient projectors on $\mathbb{R}^2$, and give explicit formulae for the corresponding functions.

The notation that we employ is for the most part standard; however, a partial list is provided for the reader's convenience. Throughout this paper, $\mathbb{R}^n$ is the $n$-dimensional Euclidean space with inner product $\langle \cdot, \cdot \rangle$ and induced norm $\|\cdot\|$, i.e., $(\forall x \in \mathbb{R}^n)\ \|x\| := \sqrt{\langle x, x \rangle}$. The identity operator on $\mathbb{R}^n$ is $\operatorname{Id}$. For a mapping $T : \mathbb{R}^n \to \mathbb{R}^n$, its fixed point set is denoted by $\operatorname{Fix} T := \{x \in \mathbb{R}^n : Tx = x\}$; its kernel is $\ker T := \{x \in \mathbb{R}^n : Tx = 0\}$; its range is $\operatorname{ran} T := \{y \in \mathbb{R}^n : y = Tx \text{ for some } x \in \mathbb{R}^n\}$. For a function $f : \mathbb{R}^n \to (-\infty, +\infty]$, its $\alpha$-level set is denoted by $\operatorname{lev}_\alpha f := \{x \in \mathbb{R}^n : f(x) \le \alpha\}$; its effective domain is $\operatorname{dom} f := \{x \in \mathbb{R}^n : f(x) < +\infty\}$. For a set-valued mapping $F : \mathbb{R}^n \rightrightarrows \mathbb{R}^m$, the domain, range and fixed point set are given by $\operatorname{dom} F := \{x \in \mathbb{R}^n : F(x) \ne \emptyset\}$, $\operatorname{ran} F := \bigcup_{x \in \mathbb{R}^n} F(x)$, and $\operatorname{Fix} F := \{x \in \mathbb{R}^n : x \in F(x)\}$, respectively. We use $B(x, \delta)$ for the closed ball centered at $x \in \mathbb{R}^n$ with radius $\delta > 0$. $\mathbb{R}_+$ denotes the set of non-negative real numbers, and $\mathbb{N}$ denotes the set of non-negative integers $\{0, 1, 2, \ldots\}$. For a set $C \subseteq \mathbb{R}^n$, its distance function is $d_C : \mathbb{R}^n \to [0, +\infty) : x \mapsto \inf\{\|x - y\| : y \in C\}$, and the projection operator onto $C$ is
$$P_C : \mathbb{R}^n \rightrightarrows C : x \mapsto \{p \in C : \|x - p\| = d_C(x)\}.$$
The indicator function of $C$ is $\iota_C : \mathbb{R}^n \to (-\infty, +\infty]$, defined by $\iota_C(x) := 0$ if $x \in C$ and $\iota_C(x) := +\infty$ if $x \notin C$. We write $\operatorname{int} C$ for the interior and $\operatorname{bdry}(C) := C \setminus \operatorname{int} C$ for the boundary of $C$, respectively. For a subspace $L \subseteq \mathbb{R}^n$, its orthogonal complement is defined to be $L^\perp := \{y \in \mathbb{R}^n : \langle y, x \rangle = 0 \text{ for all } x \in L\}$. When $x, y \in \mathbb{R}^n$, the line segment between $x$ and $y$ is given by $[x, y] := \{(1-\lambda)x + \lambda y : 0 \le \lambda \le 1\}$.

Part I: Extensions to Possibly Nonconvex Functions and Basic Theory
2 An Extension of Subgradient Projectors via Limiting Subgradients

To introduce subgradient projectors for possibly nonconvex functions, we need the following generalized subgradients [31, 39, 40, 50].

Definition 2.1 Consider a function $f : \mathbb{R}^n \to (-\infty, +\infty]$ and a point $\bar{x} \in \mathbb{R}^n$ with $f(\bar{x})$ finite. For a vector $v \in \mathbb{R}^n$, one says that

(i) $v$ is a regular (or Fréchet) subgradient of $f$ at $\bar{x}$, written $v \in \hat{\partial} f(\bar{x})$, if $f(x) \ge f(\bar{x}) + \langle v, x - \bar{x} \rangle + o(\|x - \bar{x}\|)$;

(ii) $v$ is a limiting (or Mordukhovich) subgradient of $f$ at $\bar{x}$, written $v \in \partial f(\bar{x})$, if there are sequences $x_\nu \to \bar{x}$, $f(x_\nu) \to f(\bar{x})$ and $v_\nu \in \hat{\partial} f(x_\nu)$ with $v_\nu \to v$.
A locally Lipschitz function $f$ is subdifferentially regular at $\bar{x}$ with $f(\bar{x})$ finite if $\partial f(\bar{x}) = \hat{\partial} f(\bar{x})$; see [50, Corollary 8.11], [40]. It is well known that when $f$ is locally Lipschitz, $\partial f$ is nonempty-valued everywhere; when $f$ is lower semicontinuous (lsc), the set of points at which $\partial f$ is nonempty-valued is at least dense in the domain of $f$ [50, Corollary 8.10]. Furthermore, $\partial f$ is the usual Fenchel subdifferential when $f$ is convex. All of these results can be found in [17, 18, 39, 50]. A function $f : \mathbb{R}^n \to \mathbb{R}$ is called subdifferentiable if $\partial f(x) \ne \emptyset$ for every $x \in \mathbb{R}^n$. While every locally Lipschitz function on $\mathbb{R}^n$ is a subdifferentiable function, a subdifferentiable function need not be locally Lipschitz, e.g.,
$$f : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} 1 & \text{if } x \le 0, \\ 1 - \sqrt{x} & \text{if } x > 0; \end{cases}$$
see also [50, page 359].

The key concept we shall study is the subgradient projection operator.

Definition 2.2 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $s : \mathbb{R}^n \to \mathbb{R}^n$ be a selection of $\partial f$. The subgradient projector of $f$ is defined by
$$G_{f,s} : \mathbb{R}^n \to \mathbb{R}^n : x \mapsto \begin{cases} x - \dfrac{f(x)}{\|s(x)\|^2}\, s(x) & \text{if } f(x) > 0 \text{ and } 0 \notin \partial f(x), \\ x & \text{otherwise.} \end{cases} \tag{1}$$
When it is not necessary to emphasize the selection $s$, we will write $G_f$. It is also convenient to introduce the set-valued mapping associated with $f$ by
$$\mathcal{G}_f : \mathbb{R}^n \rightrightarrows \mathbb{R}^n : x \mapsto \{G_{f,s}(x) : s \text{ is a selection of } \partial f\}, \tag{2}$$
with $G_{f,s}$ being given in (1).
Although subgradient projectors have been well studied for convex functions [10, 20, 22, 24, 41, 43, 45, 47, 55], the extension to possibly nonconvex functions is new. When $f$ is convex and $\inf_{\mathbb{R}^n} f \le 0$, $G_{f,s}$ reduces to
$$G_{f,s} : \mathbb{R}^n \to \mathbb{R}^n : x \mapsto \begin{cases} x - \dfrac{f(x)}{\|s(x)\|^2}\, s(x) & \text{if } f(x) > 0, \\ x & \text{otherwise,} \end{cases}$$
where $s : \mathbb{R}^n \to \mathbb{R}^n$ is a selection of $\partial f$ with $s(x) \in \partial f(x)$. When $f$ is continuously differentiable on $\mathbb{R}^n \setminus \operatorname{lev}_0 f$, $G_f$ reduces to
$$G_f : \mathbb{R}^n \to \mathbb{R}^n : x \mapsto \begin{cases} x - \dfrac{f(x)}{\|\nabla f(x)\|^2}\, \nabla f(x) & \text{if } f(x) > 0 \text{ and } \nabla f(x) \ne 0, \\ x & \text{otherwise.} \end{cases}$$
The geometric interpretation and motivation of the subgradient projector come from the following:

Proposition 2.3 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $s$ be a selection of $\partial f$.

(i) Whenever $f(x) > 0$ and $0 \notin \partial f(x)$, we have $G_{f,s}(x) = P_{H_-(s(x), \beta(x))}(x)$, where the half-space
$$H_-(s(x), \beta(x)) := \{z \in \mathbb{R}^n : \langle s(x), z \rangle \le \beta(x)\}, \qquad \beta(x) := \langle s(x), x \rangle - f(x). \tag{3}$$

(ii) The fixed point set of $G_{f,s}$ is $\operatorname{Fix} G_{f,s} = \{x \in \mathbb{R}^n : 0 \in \partial f(x)\} \cup \operatorname{lev}_0 f = \operatorname{Fix} \mathcal{G}_f$. If $f$ is locally Lipschitz, then $\operatorname{Fix} G_{f,s}$ is closed.

(iii) If $f$ is convex and $\inf_{\mathbb{R}^n} f \le 0$, then $\operatorname{Fix} G_{f,s} = \operatorname{lev}_0 f$.

Proof (i). According to [6] or [20, page 133], for the half-space $H_-(a, \beta) := \{z \in \mathbb{R}^n : \langle a, z \rangle \le \beta\}$, where $a \in \mathbb{R}^n$, $a \ne 0$ and $\beta \in \mathbb{R}$, its metric projection is given by
$$P_{H_-(a,\beta)}(x) = \begin{cases} x - \dfrac{\langle a, x \rangle - \beta}{\|a\|^2}\, a & \text{if } \langle a, x \rangle > \beta, \\ x & \text{if } \langle a, x \rangle \le \beta. \end{cases} \tag{4}$$
Apply (4) with $a := s(x)$ and $\beta := \beta(x) = \langle s(x), x \rangle - f(x)$.

(ii). This follows from the definition of $\mathcal{G}_f$. When $f$ is locally Lipschitz, $\partial f$ is upper semicontinuous [50, Proposition 8.7], so $\{x \in \mathbb{R}^n : 0 \in \partial f(x)\}$ is closed. Being a union of two closed sets, $\operatorname{Fix} G_{f,s}$ is closed.

(iii). When $f$ is convex, $0 \in \partial f(x)$ gives $f(x) = \min_{\mathbb{R}^n} f$, so $f(x) \le 0$. Then $\{x \in \mathbb{R}^n : 0 \in \partial f(x)\} \subseteq \operatorname{lev}_0 f$. Thus (iii) follows from (ii).
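Proposition 2.3(i) is easy to check numerically for a smooth function. The following sketch is not from the paper; it assumes NumPy, and `f` is a hypothetical differentiable test function. `subgradient_projector` implements (1) at points where $f(x) > 0$ and $\nabla f(x) \ne 0$, while `project_halfspace` implements the metric projection formula (4).

```python
import numpy as np

def subgradient_projector(f, grad_f, x):
    """Subgradient projector G_f at x for a differentiable f, as in (1)."""
    g = grad_f(x)
    if f(x) > 0 and np.dot(g, g) > 0:
        return x - (f(x) / np.dot(g, g)) * g
    return x

def project_halfspace(a, beta, x):
    """Projection onto H_-(a, beta) = {z : <a, z> <= beta}, formula (4)."""
    if np.dot(a, x) > beta:
        return x - (np.dot(a, x) - beta) / np.dot(a, a) * a
    return x

# Hypothetical smooth test function f(x) = ||x||^2 - 1 (any smooth f works).
f = lambda x: np.dot(x, x) - 1.0
grad_f = lambda x: 2.0 * x

x = np.array([2.0, 1.0])            # f(x) = 4 > 0
a = grad_f(x)
beta = np.dot(a, x) - f(x)          # beta(x) = <s(x), x> - f(x), as in (3)
lhs = subgradient_projector(f, grad_f, x)
rhs = project_halfspace(a, beta, x)
print(np.allclose(lhs, rhs))        # True: G_f(x) = P_{H_-(s(x), beta(x))}(x)
```

Both routes give the same point, which is exactly the content of Proposition 2.3(i).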
Remark 2.4 (i) Proposition 2.3(i) uses the Euclidean distance. Following [5, 8, 9, 36], one may define Bregman subgradient projectors for lsc and subdifferentiable functions. This will be explored in future work. (ii) Proposition 2.3(ii) shows that for subdifferentiable functions, a fixed point of $G_f$ gives $x \in \mathbb{R}^n$ such that $0 \in \partial f(x)$ or $f(x) \le 0$. Proposition 2.3(iii) shows that for convex functions, a fixed point of $G_f$ gives $x \in \mathbb{R}^n$ such that $f(x) \le 0$.

We give two simple examples to illustrate the difference of subgradient projectors between convex and nonconvex functions.

Example 2.5 Consider $f : \mathbb{R}^n \to \mathbb{R} : x \mapsto \sqrt[k]{\|x\|} = \|x\|^{1/k}$ where $k > 0$. Then

(i) When $k \le 1$, $f$ is convex, and $G_f = (1-k)\operatorname{Id}$ is firmly nonexpansive.

(ii) When $k > 1$, $f$ is not convex, and $G_f = (1-k)\operatorname{Id}$ is not monotone and need not be nonexpansive, e.g., $k = 3$.

Proof For $x \ne 0$, we have
$$\nabla f(x) = \frac{1}{k}\, \|x\|^{1/k - 1}\, \frac{x}{\|x\|},$$
and the result follows from the definition of $G_f$.
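Example 2.5 can be verified numerically. The sketch below (assuming NumPy; not part of the original text) evaluates $G_f$ for $f(x) = \|x\|^{1/k}$ directly from the gradient formula in the proof and compares it with $(1-k)\operatorname{Id}$.

```python
import numpy as np

def Gf(x, k):
    """G_f for f(x) = ||x||^(1/k), computed from the gradient formula."""
    nx = np.linalg.norm(x)
    if nx == 0:
        return x                       # f(0) = 0, so 0 is a fixed point
    g = (1.0 / k) * nx ** (1.0 / k - 1.0) * (x / nx)   # gradient of f at x
    return x - (nx ** (1.0 / k) / np.dot(g, g)) * g

x = np.array([3.0, -4.0])              # ||x|| = 5
for k in (0.5, 1.0, 3.0):
    print(np.allclose(Gf(x, k), (1 - k) * x))   # True: G_f = (1 - k) Id
```

For $k = 3$ the output point is $-2x$, illustrating that $G_f$ fails to be nonexpansive in the nonconvex case.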
Let $B$ denote the closed unit ball of $\mathbb{R}^n$. According to [50, Exercise 8.14], for a nonempty $C \subseteq \mathbb{R}^n$, the normal cone and regular normal cone mappings are respectively defined by $N_C := \partial \iota_C$ and $\hat{N}_C := \hat{\partial} \iota_C$. Recall

Fact 2.6 ([50, Example 8.53]) For $f := d_C$ in the case of a closed set $C \ne \emptyset$ in $\mathbb{R}^n$, one has at any point $\bar{x} \in C$ that
$$\partial f(\bar{x}) = N_C(\bar{x}) \cap B, \qquad \hat{\partial} f(\bar{x}) = \hat{N}_C(\bar{x}) \cap B.$$
On the other hand, for any $\bar{x} \notin C$, one has
$$\partial f(\bar{x}) = \frac{\bar{x} - P_C(\bar{x})}{d_C(\bar{x})}, \qquad \hat{\partial} f(\bar{x}) = \begin{cases} \dfrac{\bar{x} - \tilde{x}}{d_C(\bar{x})} & \text{if } P_C(\bar{x}) = \{\tilde{x}\}, \\ \emptyset & \text{otherwise.} \end{cases}$$

Example 2.7 (subgradient projectors of distance functions) Let $C \ne \emptyset$ be a closed set in $\mathbb{R}^n$. Then $\mathcal{G}_{d_C} = P_C$.
Proof Let $s$ be a selection of $\partial d_C$. By Fact 2.6, $(\forall x \notin C)$
$$s(x) = \frac{x - p(x)}{d_C(x)},$$
where $p(x) \in P_C(x)$. We show $G_{d_C,s} = p$. When $x \in C$, we have $P_C(x) = \{x\}$ and $p(x) = x$. Because $d_C(x) = 0$ for $x \in C$, the definition of $G_{d_C,s}$ gives $G_{d_C,s}(x) = x$. Thus $G_{d_C,s}(x) = x = p(x)$ for $x \in C$.

When $x \notin C$, we have $d_C(x) > 0$ and $0 \notin \partial d_C(x)$, because every $x^* \in \partial d_C(x)$ has $\|x^*\| = 1$ by Fact 2.6. Then for $x \notin C$,
$$G_{d_C,s}(x) = x - d_C(x)\, \frac{x - p(x)}{d_C(x)} = p(x).$$
Altogether, $G_{d_C,s}(x) = p(x)$ for every $x \in \mathbb{R}^n$.

When $C$ is nonempty, closed and convex, the projection mapping $P_C$ is single-valued, and $d_C$ is continuously differentiable on $\mathbb{R}^n \setminus C$. Example 2.7 implies:

Fact 2.8 ([9, 25]) Let $C \subseteq \mathbb{R}^n$ be nonempty, closed and convex. Then $G_{d_C} = P_C$.

What happens if we take the subgradient projector of a distance function to a set where the distance is taken with respect to another norm? The following example illustrates that using the Euclidean norm for $d_C$ in Example 2.7 is essential.

Example 2.9 Define $f : \mathbb{R}^2 \to \mathbb{R} : (x_1, x_2) \mapsto |x_1| + |x_2|$, the distance function to $C := \{(0,0)\}$ in the $\ell_1$-norm. When $x_1 > 0$, $x_2 > 0$, $x_1 \ne x_2$, we have
$$G_f(x_1, x_2) = \left(\frac{x_1 - x_2}{2}, \frac{x_2 - x_1}{2}\right) \ne (0,0) = P_C(x_1, x_2).$$
Even using $\|\cdot\|_\infty$, the dual norm of $\|\cdot\|_1$, for $s(x_1, x_2)$, we have $G_f(x_1, x_2) = (-x_2, -x_1) \ne (0,0) = P_C(x_1, x_2)$.

Remark 2.10 Example 2.7 might lead the reader to believe that $G_f$ is a monotone operator. This holds for any twice differentiable convex function $f : \mathbb{R} \to \mathbb{R}$; see [10, Proposition 8.2]. However, this fails for $f : \mathbb{R}^2 \to \mathbb{R} : (x_1, x_2) \mapsto |x_1|^p + |x_2|^p$ when $1 < p < 2$; see [10, Proposition 10.1(iii)].

The following example shows that the assumption that $f$ be subdifferentiable is important in Definition 2.2. Example 2.11 The function defined by
$$f : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} 1/|x| & \text{if } x \ne 0, \\ 0 & \text{if } x = 0, \end{cases}$$
has $G_f = 2\operatorname{Id}$ on $\mathbb{R}$, so that $\operatorname{Fix} G_f = \operatorname{Fix} 2\operatorname{Id} = \{0\}$. However, the function defined by
$$g : \mathbb{R} \to (-\infty, +\infty] : x \mapsto \begin{cases} 1/|x| & \text{if } x \ne 0, \\ +\infty & \text{if } x = 0, \end{cases}$$
has $G_g = 2\operatorname{Id}$ on $\mathbb{R} \setminus \{0\}$. Because $G_g$ is not defined at $x = 0$, we have $\operatorname{Fix} G_g = \emptyset$ but $\operatorname{Fix} 2\operatorname{Id} = \{0\}$.
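The identity $G_f = 2\operatorname{Id}$ in Example 2.11 can be checked with a few lines of code. This sketch (not from the paper) evaluates $G_f$ from the derivative of $f(x) = 1/|x|$ away from the origin.

```python
import math

def Gf(x):
    """G_f for f(x) = 1/|x| (x != 0) and f(0) = 0, as in Example 2.11."""
    if x == 0:
        return 0.0                     # f(0) = 0 <= 0, so 0 is a fixed point
    fx = 1.0 / abs(x)                  # here f(x) > 0
    grad = -1.0 / (x * abs(x))         # f'(x) = -sign(x)/x^2 for x != 0
    return x - fx / grad ** 2 * grad

for x in (-2.0, 0.5, 3.0, 0.0):
    print(math.isclose(Gf(x), 2 * x, abs_tol=1e-12))   # True: G_f = 2 Id
```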
3 Calculus for Subgradient Projectors

In this section we obtain calculus results for the subgradient projectors defined in Section 2, concerning representations of subgradient projectors for max functions, compositions of functions with a linear operator, and positive powers of nonnegative functions. Subdifferential calculus is the main tool for proving these results.
A mapping $F : \mathbb{R}^n \to \mathbb{R}^k$ is called strictly differentiable at $\bar{x}$ if the Fréchet derivative $F'(\bar{x})$ exists and
$$\lim_{\substack{x, y \to \bar{x} \\ x \ne y}} \frac{F(x) - F(y) - F'(\bar{x})(x - y)}{\|x - y\|} = 0.$$
The following facts on subdifferentials are crucial to study the calculus of subgradient projectors.

Fact 3.1 ([40, Theorem 6.5], [39, Theorem 1.110(ii)]) Assume that $F : \mathbb{R}^n \to \mathbb{R}^k$ is locally Lipschitz at $\bar{x} \in \mathbb{R}^n$, and $g : \mathbb{R}^k \to \mathbb{R}$ is strictly differentiable at $F(\bar{x})$. Then for $f(x) = g(F(x))$, one has
$$\partial f(\bar{x}) = \partial \langle \nabla g(\bar{y}), F \rangle(\bar{x}) \quad \text{with } \bar{y} = F(\bar{x}).$$
For a matrix $A : \mathbb{R}^n \to \mathbb{R}^n$, let $A^*$ denote its transpose.

Fact 3.2 ([40, Theorem 6.7(i)], [39, Proposition 1.112(i)], or [50, Exercise 10.7]) Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be strictly differentiable at $\bar{x}$ with $F'(\bar{x})$ being invertible, and suppose that $f(x) = g(F(x))$ with $g : \mathbb{R}^n \to (-\infty, +\infty]$ being lsc around $\bar{y} = F(\bar{x})$ and $f$ being finite at $\bar{x}$. Then
$$\partial f(\bar{x}) = F'(\bar{x})^* \partial g(\bar{y}) \quad \text{with } \bar{y} = F(\bar{x}).$$

Fact 3.3 ([40, Theorem 7.5(ii)], [39, Theorem 3.46(ii)]) Let $f_1, f_2 : \mathbb{R}^n \to \mathbb{R}$ be locally Lipschitz at $\bar{x}$ and $J(\bar{x}) := \{j : f_j(\bar{x}) = \max\{f_1, f_2\}(\bar{x})\}$. Then
$$\partial \max\{f_1, f_2\}(\bar{x}) \subseteq \operatorname{conv}\{\partial f_j(\bar{x}) : j \in J(\bar{x})\},$$
where the equality holds and $\max\{f_1, f_2\}$ is subdifferentially regular at $\bar{x}$ if the function $f_j$ is subdifferentially regular at $\bar{x}$ for each $j \in J(\bar{x})$.

Proposition 3.4 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable.

(i) If $k > 0$, then $G_{kf} = G_f$.

(ii) Let $\alpha \in \mathbb{R}$ and let $s$ be a selection of $\partial f$. Define
$$G_{f,\alpha} : \mathbb{R}^n \to \mathbb{R}^n : x \mapsto \begin{cases} x - \dfrac{f(x) - \alpha}{\|s(x)\|^2}\, s(x) & \text{if } f(x) > \alpha \text{ and } 0 \notin \partial f(x), \\ x & \text{otherwise.} \end{cases}$$
Then $G_{f,\alpha} = G_{f-\alpha,s}$.

(iii) Let $\alpha > 0$ and let $s$ be a selection of $\partial f$. Then
$$G_{f-\alpha,s}(x) = \begin{cases} G_{f,s}(x) + \dfrac{\alpha\, s(x)}{\|s(x)\|^2} & \text{if } f(x) > \alpha \text{ and } 0 \notin \partial f(x), \\ x & \text{otherwise.} \end{cases}$$
Proof (i). By Fact 3.1, $\partial(kf) = k\partial f$. Note that $kf(x) > 0$ if and only if $f(x) > 0$, and $0 \notin \partial(kf)(x)$ if and only if $0 \notin \partial f(x)$. When $kf(x) > 0$ and $0 \notin \partial(kf)(x)$, for $s(x) \in \partial f(x)$ we have $ks(x) \in \partial(kf)(x)$, so that
$$G_{kf,ks}(x) = x - \frac{kf(x)}{\|ks(x)\|^2}\, ks(x) = x - \frac{f(x)}{\|s(x)\|^2}\, s(x) = G_{f,s}(x).$$
When $kf(x) \le 0$ or $0 \in \partial(kf)(x)$, we have $f(x) \le 0$ or $0 \in \partial f(x)$, so $G_{kf,ks}(x) = x = G_{f,s}(x)$.

(ii). It suffices to note that $\partial(f - \alpha) = \partial f$.
(iii). When $f(x) > \alpha$ and $0 \notin \partial f(x)$, we have $f(x) > 0$ and $0 \notin \partial f(x)$. Then
$$G_{f-\alpha,s}(x) = x - \frac{f(x)-\alpha}{\|s(x)\|^2}\, s(x) = x - \frac{f(x)}{\|s(x)\|^2}\, s(x) + \frac{\alpha}{\|s(x)\|^2}\, s(x) = G_{f,s}(x) + \frac{\alpha\, s(x)}{\|s(x)\|^2}.$$
When $f(x) \le \alpha$ or $0 \in \partial f(x)$, $G_{f-\alpha,s}(x) = x$ by the definition.

Proposition 3.5 Assume that $f_1, f_2 : \mathbb{R}^n \to \mathbb{R}$ are locally Lipschitz and subdifferentially regular. For the maximum function $g := \max\{f_1, f_2\}$, one has
$$\mathcal{G}_g(x) = \begin{cases} G_{f_1}(x) & \text{if } g(x) > \max\{f_2(x), 0\} \text{ and } 0 \notin \partial f_1(x), \\ G_{f_2}(x) & \text{if } g(x) > \max\{f_1(x), 0\} \text{ and } 0 \notin \partial f_2(x), \\ V(x) & \text{if } g(x) = f_1(x) = f_2(x) > 0 \text{ and } 0 \notin \operatorname{conv}(\partial f_1(x) \cup \partial f_2(x)), \\ x & \text{if } g(x) \le 0, \text{ or } 0 \in \operatorname{conv}(\partial f_1(x) \cup \partial f_2(x)), \end{cases}$$
where
$$V(x) := \left\{ x - \frac{f_i(x)}{\|s(x)\|^2}\, s(x) : s(x) \in \operatorname{conv}(\partial f_1(x) \cup \partial f_2(x)) \right\}.$$
Proof When $g(x) > 0$, we consider three cases: (i) $g(x) > f_2(x)$; (ii) $g(x) > f_1(x)$; (iii) $g(x) \le \min\{f_1(x), f_2(x)\}$, which forces $g(x) = f_1(x) = f_2(x)$. Also note that $\partial g(x) = \operatorname{conv}(\partial f_1(x) \cup \partial f_2(x))$ when $f_1(x) = f_2(x)$, by Fact 3.3.

Proposition 3.6 Assume that $f : \mathbb{R}^n \to \mathbb{R}$ is lsc and subdifferentiable, and that $g(x) := f(kx)$ with $0 \ne k \in \mathbb{R}$. Then
$$G_g(x) = \frac{1}{k}\, G_f(kx)$$
for every $x \in \mathbb{R}^n$. Moreover, $\operatorname{Fix} G_g = \frac{1}{k} \operatorname{Fix} G_f$.

Proof By Fact 3.2, $\partial g(x) = k \partial f(y)$ where $y = kx$, so $0 \notin \partial g(x)$ if and only if $0 \notin \partial f(y)$ with $y = kx$. Let $s$ be a selection of $\partial f$. When $g(x) > 0$ and $0 \notin \partial g(x)$, we have $f(kx) > 0$ and $0 \notin \partial f(kx)$, therefore
$$G_{g,ks(k\cdot)}(x) = x - \frac{f(kx)}{\|ks(y)\|^2}\, ks(y) = x - \frac{1}{k}\, \frac{f(kx)}{\|s(y)\|^2}\, s(y) \tag{5}$$
$$= \frac{1}{k}\left(kx - \frac{f(kx)}{\|s(y)\|^2}\, s(y)\right) = \frac{1}{k}\, G_{f,s}(kx), \tag{6}$$
where $s(y) \in \partial f(y)$ with $y = kx$. When $g(x) \le 0$ or $0 \in \partial g(x)$, we have $f(y) \le 0$ or $0 \in \partial f(y)$ with $y = kx$, thus
$$G_{g,ks(k\cdot)}(x) = x = \frac{1}{k}\, kx = \frac{1}{k}\, G_{f,s}(kx).$$
This establishes the result.

Proposition 3.7 Let $A : \mathbb{R}^n \to \mathbb{R}^n$ be unitary and $b \in \mathbb{R}^n$, and let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable. Define $g : \mathbb{R}^n \to \mathbb{R} : x \mapsto f(Ax + b)$. Then
$$G_g(x) = A^* \big( G_f(Ax + b) - b \big) \tag{7}$$
for every $x \in \mathbb{R}^n$. Furthermore,
$$\operatorname{Fix} G_g = A^* (\operatorname{Fix} G_f - b). \tag{8}$$
Proof Let $s$ be a selection of $\partial f$. By Fact 3.2, $\partial g(x) = A^* \partial f(y)$ where $y = Ax + b$. As $A$ is unitary, $\|A^* s(y)\| = \|s(y)\|$ for every $s(y) \in \partial f(y)$. When $g(x) > 0$ and $0 \notin \partial g(x)$, we have $f(Ax+b) > 0$ and $0 \notin \partial f(y)$ with $y = Ax + b$, therefore
$$G_{g,A^*s(A\cdot+b)}(x) = x - \frac{f(Ax+b)}{\|A^*s(y)\|^2}\, A^*s(y) = A^*\left(Ax + b - \frac{f(Ax+b)}{\|s(y)\|^2}\, s(y) - b\right) \tag{9}$$
$$= A^*\big(G_{f,s}(Ax+b) - b\big). \tag{10}$$
When $g(x) \le 0$ or $0 \in \partial g(x)$, we have $f(y) \le 0$ or $0 \in \partial f(y)$ with $y = Ax + b$, thus
$$G_{g,A^*s(A\cdot+b)}(x) = x = A^*(Ax + b - b) = A^*\big(G_{f,s}(Ax+b) - b\big).$$
Hence (7) holds. Finally, (8) follows from (7).

Corollary 3.8 Let $a \in \mathbb{R}^n$, let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $g(x) := f(x-a)$. Then $G_g(x) = G_f(x-a) + a$ for every $x \in \mathbb{R}^n$. Moreover, $\operatorname{Fix} G_g = a + \operatorname{Fix} G_f$.

Theorem 3.9 Assume that $f : \mathbb{R}^n \to \mathbb{R}_+$ is locally Lipschitz, and $g := f^k$ with $k > 0$. Then
$$G_g = \left(1 - \frac{1}{k}\right)\operatorname{Id} + \frac{1}{k}\, G_f.$$

Proof By Fact 3.1, $\partial g(x) = k f(x)^{k-1} \partial f(x)$ when $f(x) > 0$. Let $s$ be a selection of $\partial f$. When $g(x) > 0$ and $0 \notin \partial g(x)$, we have $f(x) > 0$ and $0 \notin \partial f(x)$, therefore
$$(\operatorname{Id} - G_{g,kf^{k-1}s})(x) = \frac{f(x)^k}{k^2 f(x)^{2k-2}\, \|s(x)\|^2}\, k f(x)^{k-1} s(x) = \frac{1}{k}\, \frac{f(x)}{\|s(x)\|^2}\, s(x) \tag{11}$$
$$= \frac{1}{k}\, (\operatorname{Id} - G_{f,s})(x). \tag{12}$$
When $g(x) = 0$ or $0 \in \partial g(x)$, we have $f(x) = 0$ or $0 \in \partial f(x)$, thus $(\operatorname{Id} - G_{g,kf^{k-1}s})(x) = 0 = \frac{1}{k}(\operatorname{Id} - G_{f,s})(x)$. Therefore, $\operatorname{Id} - G_{g,kf^{k-1}s} = \frac{1}{k}(\operatorname{Id} - G_{f,s})$, which gives $G_{g,kf^{k-1}s} = \left(1 - \frac{1}{k}\right)\operatorname{Id} + \frac{1}{k}\, G_{f,s}$.
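Theorem 3.9 is easy to test numerically for smooth data. In the sketch below (assuming NumPy; `f` is a hypothetical nonnegative test function, not from the paper), both sides of the identity $G_{f^k} = (1 - 1/k)\operatorname{Id} + (1/k) G_f$ are evaluated at a sample point.

```python
import numpy as np

def G(f, grad_f, x):
    """Subgradient projector (1) for a differentiable f at x."""
    g = grad_f(x)
    if f(x) > 0 and np.dot(g, g) > 0:
        return x - f(x) / np.dot(g, g) * g
    return x

# Hypothetical nonnegative test function f(x) = ||x||^2 + 1.
f = lambda x: np.dot(x, x) + 1.0
grad_f = lambda x: 2.0 * x

k = 3.0
fk = lambda x: f(x) ** k                               # g := f^k
grad_fk = lambda x: k * f(x) ** (k - 1) * grad_f(x)    # chain rule

x = np.array([1.0, -2.0])
lhs = G(fk, grad_fk, x)
rhs = (1 - 1 / k) * x + (1 / k) * G(f, grad_f, x)
print(np.allclose(lhs, rhs))    # True: Theorem 3.9
```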
Remark 3.10 While Theorem 3.9 says that this particular convex combination of $\operatorname{Id}$ and $G_{f,s}$ is a subgradient projector, the set of subgradient projectors is not a convex set; see Theorems 10.1 and 10.3 in Section 10. Note that if each $U_i : \mathcal{H} \to \mathcal{H}$ is a cutter (see [20] or Definition 5.1) with a common fixed point, $i \in I := \{1, 2, \ldots, m\}$, and $w = (w_i)_{i \in I}$ is an appropriate weight function, then the operator $U := \sum_{i \in I} w_i U_i$ is a cutter; cf. [20, Corollary 2.1.49].

Corollary 3.11 For $f := d_C^2$ in the case of a closed set $C \ne \emptyset$ in $\mathbb{R}^n$, one has
$$\mathcal{G}_f = \frac{\operatorname{Id} + P_C}{2}.$$
Proof Combine Theorem 3.9 and Example 2.7.

Example 3.12 (penalty function) Assume that $g : \mathbb{R}^n \to \mathbb{R}$ is locally Lipschitz. In optimization, for a direct constraint given by $C := \{x \in \mathbb{R}^n : g(x) \le 0\}$, one can define penalty substitutes. Two popular penalty functions associated with $g$ are the linear penalty $\theta_1 \circ g(x) = t_+(g(x))$ and the quadratic penalty $\theta_2 \circ g(x) = t_+^2(g(x))$, where $t_+ : \mathbb{R} \to \mathbb{R} : r \mapsto \max\{0, r\}$; cf. [50, page 4]. We have $(\forall x \in \mathbb{R}^n)\ G_{\theta_1 \circ g}(x) = G_g(x)$. Because $\theta_2 \circ g = (\theta_1 \circ g)^2$, by Theorem 3.9 we obtain
$$G_{\theta_2 \circ g} = \frac{\operatorname{Id} + G_{\theta_1 \circ g}}{2}.$$
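The penalty relation $G_{\theta_2 \circ g} = (\operatorname{Id} + G_{\theta_1 \circ g})/2$ can be illustrated with a concrete constraint. The sketch below (assuming NumPy) uses the hypothetical linear constraint $g(x) = x_1 + x_2 - 1$; any locally Lipschitz $g$ would do.

```python
import numpy as np

# Hypothetical constraint g(x) = <a, x> - 1 with a = (1, 1).
a = np.array([1.0, 1.0])
g = lambda x: np.dot(a, x) - 1.0

def G_linear_penalty(x):
    """G for theta1 o g = max{g, 0}; coincides with G_g where g > 0."""
    if g(x) > 0:
        return x - g(x) / np.dot(a, a) * a
    return x

def G_quadratic_penalty(x):
    """G for theta2 o g = max{g, 0}^2, computed directly from its gradient."""
    if g(x) > 0:
        grad = 2.0 * g(x) * a
        return x - g(x) ** 2 / np.dot(grad, grad) * grad
    return x

x = np.array([2.0, 3.0])                     # g(x) = 4 > 0
lhs = G_quadratic_penalty(x)
rhs = (x + G_linear_penalty(x)) / 2.0
print(np.allclose(lhs, rhs))                 # True
```

The quadratic penalty thus takes a half-step of the linear-penalty projector, matching Theorem 3.9 with $k = 2$.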
The following is immediate from the definition of subgradient projectors. Proposition 3.13 Let f, g : Rn → R be lsc and subdifferentiable such that f ≡ g on an open set O ⊆ Rn . Then Gf = Gg on O. Remark 3.14 For calculus of subgradient projectors of convex functions, see [10, 43].
4 Basic Properties of Subgradient Projectors

In this section, under appropriate conditions, we show that the subgradient projector determines a function uniquely up to a positive scalar multiple, that the subgradient projector enjoys the fixed point closedness property, and that the subgradient projector is continuous if the function is strictly differentiable. We start with some elementary properties of subgradient projectors.

Theorem 4.1 Let $f : \mathbb{R}^n \to \mathbb{R}$ be lsc and subdifferentiable, and let $G_{f,s}$ be given by (1). Then the following hold:

(i) We have
$$\|x - G_{f,s}(x)\| = \frac{f(x)}{\|s(x)\|} \tag{13}$$
and
$$\frac{s(x)}{f(x)} = \frac{x - G_{f,s}(x)}{\|x - G_{f,s}(x)\|^2} \tag{14}$$
for every $x$ satisfying $f(x) > 0$ and $0 \notin \partial f(x)$. In particular, when $f$ is locally Lipschitz, one has
$$\frac{x - G_{f,s}(x)}{\|x - G_{f,s}(x)\|^2} = \frac{s(x)}{f(x)} \in \partial(\ln f)(x); \tag{15}$$
when $f$ is continuously differentiable, one has
$$\frac{x - G_f(x)}{\|x - G_f(x)\|^2} = \nabla(\ln f)(x). \tag{16}$$
(ii) Set $g := \ln f$ where $f > 0$. Then whenever $f(x) > 0$ and $0 \notin \partial f(x)$, we have
$$G_{f,s}(x) = x - \frac{c(x)}{\|c(x)\|^2}, \tag{17}$$
where $c(x) = \frac{s(x)}{f(x)} \in \partial g(x)$. If $f$ is continuously differentiable on $\mathbb{R}^n \setminus \operatorname{lev}_0 f$, then
$$G_f(x) = x - \frac{\nabla g(x)}{\|\nabla g(x)\|^2} \tag{18}$$
whenever $f(x) > 0$ and $\nabla f(x) \ne 0$.

Proof (i). By the definition of $G_{f,s}$, when $f(x) > 0$ and $0 \notin \partial f(x)$, $x - G_{f,s}(x) = \frac{f(x)}{\|s(x)\|^2}\, s(x)$. Therefore,
$$\|x - G_{f,s}(x)\| = \frac{f(x)}{\|s(x)\|}.$$
It follows that
$$\frac{s(x)}{f(x)}\, \|x - G_{f,s}(x)\|^2 = \frac{s(x)}{f(x)}\, \frac{f(x)^2}{\|s(x)\|^2} = \frac{f(x)}{\|s(x)\|^2}\, s(x) = x - G_{f,s}(x), \tag{19}$$
equivalently, $\frac{x - G_{f,s}(x)}{\|x - G_{f,s}(x)\|^2} = \frac{s(x)}{f(x)}$. When $f$ is locally Lipschitz, (15) holds because Fact 3.1 gives $\partial(\ln f)(x) = \frac{\partial f(x)}{f(x)}$ when $f(x) > 0$. When $f$ is continuously differentiable, $s(x) = \nabla f(x)$, hence (16) follows from $\nabla(\ln f)(x) = \nabla f(x)/f(x)$ when $f(x) > 0$.

(ii). By Fact 3.1, we have $\partial g(x) = \frac{\partial f(x)}{f(x)}$ when $f(x) > 0$. Then $c(x) = \frac{s(x)}{f(x)}$ where $s(x) \in \partial f(x)$, and (17) follows since $\frac{c(x)}{\|c(x)\|^2} = \frac{f(x)}{\|s(x)\|^2}\, s(x)$ when $f(x) > 0$ and $0 \notin \partial f(x)$. When $f$ is continuously differentiable on $\mathbb{R}^n \setminus \operatorname{lev}_0 f$, the same holds for $g$, so (18) follows.
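Identity (16) admits a quick numerical check for a smooth positive function. In the sketch below (assuming NumPy; `f` is a hypothetical test function with $f > 0$ everywhere), both sides of (16) are evaluated at a sample point.

```python
import numpy as np

# Hypothetical smooth test function with f > 0: f(x) = ||x||^2 + 1.
f = lambda x: np.dot(x, x) + 1.0
grad_f = lambda x: 2.0 * x

def Gf(x):
    """Subgradient projector (1) at a point with f(x) > 0, grad f(x) != 0."""
    g = grad_f(x)
    return x - f(x) / np.dot(g, g) * g

x = np.array([1.5, -0.5])
d = x - Gf(x)
lhs = d / np.dot(d, d)            # (x - G_f(x)) / ||x - G_f(x)||^2
rhs = grad_f(x) / f(x)            # grad(ln f)(x) = grad f(x) / f(x)
print(np.allclose(lhs, rhs))      # True: identity (16)
```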
4.1 When is a Mapping T a Subgradient Projector?

Theorem 4.2 Given a mapping $T : \mathbb{R}^n \to \mathbb{R}^n$, the following are equivalent:

(i) $T$ is the subgradient projector of a locally Lipschitz function.

(ii) There exists a locally Lipschitz function $f : \mathbb{R}^n \to \mathbb{R}$ such that
$$\begin{cases} \dfrac{x - Tx}{\|x - Tx\|^2} \in \partial(\ln f)(x) & \text{whenever } f(x) > 0 \text{ and } 0 \notin \partial f(x), \\ Tx = x & \text{whenever } f(x) \le 0 \text{ or } 0 \in \partial f(x). \end{cases} \tag{20}$$

Proof (i)⇒(ii). Suppose that $T = G_{f,s}$ with $f$ locally Lipschitz. Apply Theorem 4.1(i) to obtain (20).

(ii)⇒(i). Assume that (20) holds. By Fact 3.1, $\partial \ln f = \frac{\partial f}{f}$ when $f > 0$. When $f(x) > 0$ and $0 \notin \partial f(x)$, (20) gives
$$\frac{1}{\|x - Tx\|} = \frac{\|s(x)\|}{f(x)}, \quad \text{i.e.,} \quad \|x - Tx\| = \frac{f(x)}{\|s(x)\|},$$
where $s(x) \in \partial f(x)$. Then using (20) again, we obtain $x - Tx = \|x - Tx\|^2\, \frac{s(x)}{f(x)}$, so that
$$Tx = x - \|x - Tx\|^2\, \frac{s(x)}{f(x)} = x - \frac{f(x)^2}{\|s(x)\|^2}\, \frac{s(x)}{f(x)} = x - \frac{f(x)}{\|s(x)\|^2}\, s(x),$$
as required.
Can the functions in Theorem 4.2(i) and (ii) be different? This is answered in the next subsection.
4.2 Recovering f from its Subgradient Projector Gf

Can one determine the function $f$ if $G_f$ is known? To this end, we recall the concept of essentially strictly differentiable functions due to Borwein and Moors [15, Section 4].

Definition 4.3 A locally Lipschitz function $f : \mathbb{R}^n \to \mathbb{R}$ is called essentially strictly differentiable on an open set $O \subseteq \mathbb{R}^n$ if $f$ is strictly differentiable everywhere on $O$ except possibly on a Lebesgue null set.

This class of functions has been extensively studied by Borwein and Moors [15]. It includes finite-valued convex functions, Clarke regular locally Lipschitz functions, semismooth locally Lipschitz functions, $C^1$ functions and others [15, pages 323–328]. If a locally Lipschitz function is essentially strictly differentiable, then $\partial f$ is single-valued almost everywhere. Moreover, the Clarke subdifferential $\partial_c f$, which can be written as $\operatorname{conv} \partial f$ (the convex hull of $\partial f$) when $f$ is locally Lipschitz [39, Theorem 3.57], can be recovered from every densely defined selection $s \in \partial f$; see, e.g., [15]. We refer the reader to [23] and [50] for details on the Clarke subdifferential.

Fact 4.4 Let $f, g$ be locally Lipschitz on a polygonally connected and open subset $O$ of $\mathbb{R}^n$. If $\nabla f = \nabla g$ almost everywhere on $O$, then $h := f - g$ is a constant on $O$.

Proof We prove this by contradiction. Rademacher's Theorem says that a locally Lipschitz function is differentiable almost everywhere; see, e.g., [28, page 81]. By the assumption, $h$ is locally Lipschitz, so $\nabla h = 0$ almost everywhere. Suppose that $x, y \in O$ and $h(x) \ne h(y)$. As $O$ is polygonally connected, there exists $z \in O$ such that either $[x, z] \subseteq O$ with $h(x) \ne h(z)$ or $[z, y] \subseteq O$ with $h(z) \ne h(y)$. Without loss of generality, assume $[z, y] \subseteq O$ and $h(z) \ne h(y)$. As $h$ is differentiable almost everywhere, by Fubini's Theorem [48, Theorem 6.2.2, page 110] we can choose $\tilde{z}$ near $z$ and $\tilde{y}$ near $y$ so that $h$ is differentiable and $\nabla h = 0$ almost everywhere on $[\tilde{z}, \tilde{y}] \subseteq O$, and $h(\tilde{z}) \ne h(\tilde{y})$. Then
$$h(\tilde{y}) - h(\tilde{z}) = \int_0^1 \langle \nabla h(\tilde{z} + t(\tilde{y} - \tilde{z})), \tilde{y} - \tilde{z} \rangle\, dt = \int_0^1 0\, dt = 0,$$
which contradicts $h(\tilde{z}) \ne h(\tilde{y})$.

Theorem 4.5 Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a subgradient projector. Suppose that there exist two essentially strictly differentiable functions $f, f_1 : \mathbb{R}^n \to \mathbb{R}$ such that $G_{f,s} = T = G_{f_1,s_1}$, with $s$ a selection of $\partial f$ and $s_1$ a selection of $\partial f_1$. Then on each polygonally connected component of $\mathbb{R}^n \setminus \operatorname{Fix} T$ there exists $k > 0$ such that $f = k f_1$.

Proof Assume that there exist two essentially strictly differentiable and locally Lipschitz functions $f, f_1$ such that $T = G_{f,s} = G_{f_1,s_1}$. Since $T$ has full domain, we have $\operatorname{dom} f = \operatorname{dom} f_1 = \mathbb{R}^n$. By Theorem 4.2, we have
$$\frac{x - T(x)}{\|x - T(x)\|^2} \in \partial(\ln f)(x) \quad \text{whenever } x \in \mathbb{R}^n \setminus \operatorname{Fix} T,$$
$$\frac{x - T(x)}{\|x - T(x)\|^2} \in \partial(\ln f_1)(x) \quad \text{whenever } x \in \mathbb{R}^n \setminus \operatorname{Fix} T.$$
As $f, f_1$ are locally Lipschitz, both $\ln f$ and $\ln f_1$ are locally Lipschitz on $\mathbb{R}^n \setminus \operatorname{Fix} T$. Then
$$\partial \ln f = \frac{1}{f}\, \partial f, \qquad \partial \ln f_1 = \frac{1}{f_1}\, \partial f_1$$
by Fact 3.1 or [23, Theorem 2.3.9(ii)]. Because $f, f_1$ are essentially strictly differentiable and locally Lipschitz, $\partial f$ and $\partial f_1$ are single-valued almost everywhere [15], thus
$$\frac{x - T(x)}{\|x - T(x)\|^2} = \nabla(\ln f)(x) = \nabla(\ln f_1)(x) \quad \text{almost everywhere on } \mathbb{R}^n \setminus \operatorname{Fix} T.$$
By Fact 4.4, on each polygonally connected component of $\mathbb{R}^n \setminus \operatorname{Fix} T$ there exists $c \in \mathbb{R}$ such that $\ln f - \ln f_1 = c$, which implies that $f = k f_1$ for $k = e^c > 0$.

Example 4.6 Define
$$f_1 : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} 2x & \text{if } x > 0, \\ 0 & \text{if } -1 \le x \le 0, \\ -3(x+1) & \text{if } x < -1, \end{cases}$$
and
$$f_2 : \mathbb{R} \to \mathbb{R} : x \mapsto \begin{cases} x & \text{if } x > 0, \\ 0 & \text{if } -1 \le x \le 0, \\ -(x+1) & \text{if } x < -1. \end{cases}$$
Then
$$(\forall x \in \mathbb{R}) \quad G_{f_1}(x) = G_{f_2}(x) = \begin{cases} 0 & \text{if } x \ge 0, \\ x & \text{if } -1 \le x \le 0, \\ -1 & \text{if } x < -1. \end{cases}$$
The set $\mathbb{R} \setminus [-1, 0]$ has two connected components, $(-\infty, -1)$ and $(0, +\infty)$. We have $f_1 = 3f_2$ on $(-\infty, -1)$, and $f_1 = 2f_2$ on $(0, +\infty)$.

The following example shows that Theorem 4.5 fails if one removes the assumption of essential strict differentiability.

Example 4.7 In [16], Borwein, Moors and Wang showed that generically nonexpansive Lipschitz functions have their limiting subdifferentials identically equal to the unit ball; see also [53]. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a locally Lipschitz function such that $\partial f(x) = B$ for every $x \in \mathbb{R}^n$. Since $0 \in \partial f(x)$ for every $x \in \mathbb{R}^n$, in view of Definition 2.2 we have $G_f = \operatorname{Id}$. As such, generically nonexpansive Lipschitz functions have a subgradient projector equal to the identity mapping.
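The claim of Example 4.6 that $G_{f_1} = G_{f_2}$ can be checked pointwise. The sketch below (not from the paper) uses the one-dimensional simplification $G_f(x) = x - f(x)/f'(x)$ at points of differentiability, which follows from (1) since $f/(f')^2 \cdot f' = f/f'$.

```python
def G(f, df, x):
    """One-dimensional subgradient projector at a point of differentiability."""
    if f(x) > 0 and df(x) != 0:
        return x - f(x) / df(x)       # f/(f')^2 * f' = f/f' in one dimension
    return x

# f1 and f2 from Example 4.6 (smooth away from the kinks at 0 and -1).
f1 = lambda x: 2 * x if x > 0 else (0.0 if x >= -1 else -3 * (x + 1))
df1 = lambda x: 2.0 if x > 0 else (0.0 if x >= -1 else -3.0)
f2 = lambda x: x if x > 0 else (0.0 if x >= -1 else -(x + 1))
df2 = lambda x: 1.0 if x > 0 else (0.0 if x >= -1 else -1.0)

for x in (-3.0, -1.5, -0.5, 0.0, 1.0, 4.0):
    print(G(f1, df1, x) == G(f2, df2, x))   # True at every sample point
```

Despite $f_1 \ne f_2$, the two subgradient projectors agree everywhere, and the scaling constant differs across the two components, as stated.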
4.3 Fixed Point Closed Property and Continuity

Definition 4.8 We say that an operator $T : D \to \mathbb{R}^n$ is fixed point closed at $x \in D$ if for every sequence $x_k \to x$ with $x_k - Tx_k \to 0$ one has $x = Tx$. If this holds for every $x \in D$, we say that $T$ has the fixed point closed property on $D$. In [20], Cegielski expresses the fixed point closed property of $T$ by saying that $\operatorname{Id} - T$ is closed at $0$.
Theorem 4.9 (fixed-point closed property) Let f : Rn → R be locally Lipschitz and Gf,s be given by Definition 2.2. Then Gf,s is fixed-point closed at every x ∈ Rn, i.e.,
y − Gf,s(y) → 0 and y → x  ⟹  x = Gf,s(x).  (21)

Proof Assume that a sequence (yn)n∈N in Rn satisfies
yn − Gf,s(yn) → 0 and yn → x.  (22)
Consider three cases.
Case 1. There exist infinitely many yn's, say (ynk)k∈N, such that 0 ∈ ∂f(ynk). Since ∂f is upper semicontinuous, taking the limit as k → ∞ gives 0 ∈ ∂f(x). Hence x = Gf(x).
Case 2. There exist infinitely many yn's, say (ynk)k∈N, such that f(ynk) ≤ 0. Taking the limit as k → ∞ and using the continuity of f at x gives f(x) = lim_{k→∞} f(ynk) ≤ 0. Hence x = Gf(x).
Case 3. There exists N ∈ N such that f(yn) > 0 and 0 ∉ ∂f(yn) when n > N. Then by (13),
f(yn) = ‖yn − Gf,s(yn)‖ ‖s(yn)‖.  (23)
As f is continuous at x, f is locally Lipschitz around x, so ∂f is locally bounded around x. Therefore,
f(x) = lim_{n→∞} f(yn) = lim_{n→∞} ‖yn − Gf,s(yn)‖ ‖s(yn)‖ = 0
since yn − Gf,s(yn) → 0. Hence x = Gf,s(x).
Altogether, x ∈ Fix Gf,s. This establishes (21) because (yn)n∈N was an arbitrary sequence satisfying (22).

The following result generalizes [10, Theorem 5.6].

Theorem 4.10 Let f : Rn → R be locally Lipschitz and essentially strictly differentiable, and let Gf,s be given by Definition 2.2. Suppose that x ∈ Rn \ Fix Gf,s. Then the following statements are equivalent:
(i) Gf,s is continuous at x.
(ii) f is strictly differentiable at x.
Consequently, Gf,s is continuous on Rn \ Fix Gf,s if and only if f is continuously differentiable on Rn \ Fix Gf,s.

Proof (ii)⇒(i). Assume that f is strictly differentiable at x ∈ Rn \ Fix Gf,s. Under this assumption, s : Rn → Rn is continuous at x and s(x) ≠ 0. The result follows from the definition of Gf,s:
y ↦ y − (f(y)/‖s(y)‖²) s(y).
(i)⇒(ii). Assume that Gf,s is continuous at x ∈ Rn \ Fix Gf,s. By (14),
s(y) = f(y) (y − Gf,s(y))/‖y − Gf,s(y)‖²,
so s is continuous at x. Because s is a selection of ∂f and f is essentially strictly differentiable, we conclude that f is strictly differentiable at x.
Note that Fix Gf,s is closed by Proposition 2.3(ii). The remaining result follows from the fact that, on an open set on which a function is finite, the function is continuously differentiable if and only if it is strictly differentiable; cf. [50, Corollary 9.19].

We illustrate Theorem 4.10 by two examples.

Example 4.11 Define
f : R → R : x ↦
  |x|     if x ≤ 1,
  2x − 1  if x > 1.
Then
Gf,s(x) =
  0           if x < 1,
  1 − 1/s(1)  if x = 1, where s(1) ∈ [1, 2],
  1/2         if x > 1,
is discontinuous at x = 1, because f is not differentiable at x = 1.

Proof When x < 0, Gf,s(x) = x − ((−x)/(−1)²)(−1) = 0. When x = 0, f(0) = 0, so Gf,s(0) = 0. When 0 < x < 1, Gf,s(x) = x − (x/1²)(1) = 0. When x > 1, Gf,s(x) = x − ((2x − 1)/2²)·2 = 1/2. When x = 1, ∂f(1) = [1, 2], so
Gf,s(1) = 1 − (1/s(1)²) s(1) = 1 − 1/s(1),
where s(1) ∈ [1, 2].

The next example gives a function that is differentiable but not strictly differentiable at 0, and whose subgradient projector is not continuous at 0.

Example 4.12 Define
f : R → R : x ↦
  x² sin(1/x) + x + 1  if x ≠ 0,
  1                    if x = 0.
Then f is differentiable everywhere, but not strictly differentiable at 0. The subgradient projector
Gf(x) =
  x − (x² sin(1/x) + x + 1)/(2x sin(1/x) − cos(1/x) + 1)  if f(x) > 0 and f′(x) ≠ 0,
  x                                                        otherwise,
is not continuous at 0.

Proof At x = 0, f(0) = 1 and f′(0) = 1. The function f is not strictly differentiable at 0 because f′ is not continuous at 0. Since lim_{x→0} Gf(x) does not exist, the subgradient projector is not continuous at 0.

How about the continuity of Gf,s on Fix Gf,s? Since Gf,s = Id on Fix Gf,s, it is always continuous at x ∈ int(Fix Gf,s). The following result deals with the case of x ∈ bdry(Fix Gf,s).
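Returning to Example 4.12, the discontinuity at 0 can be observed numerically. In this sketch (helper names are ours), we use f(0) = 1 and f′(0) = 1 from the proof: along the points 1/((2n+1)π) the projector tends to −1/2, while Gf(0) = −1.

```python
import math

# Numerical sketch of Example 4.12: Gf takes the value -1 at 0 but values
# near -1/2 on points arbitrarily close to 0, so Gf is not continuous at 0.

def f(x):
    return x * x * math.sin(1.0 / x) + x + 1.0 if x != 0 else 1.0

def df(x):  # derivative of f (f is differentiable everywhere)
    if x == 0:
        return 1.0
    return 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x) + 1.0

def Gf(x):
    if f(x) > 0 and df(x) != 0:
        return x - f(x) / df(x)   # in R: x - f(x)/f'(x)^2 * f'(x)
    return x

assert Gf(0.0) == -1.0            # value at 0
for n in (10, 100, 1000):
    x = 1.0 / ((2 * n + 1) * math.pi)   # here f'(x) ~ 2 and f(x) ~ 1
    assert abs(Gf(x) + 0.5) < 0.01      # Gf(x) ~ -1/2 near 0
```

The two limits disagree, which is exactly the failure of continuity asserted in the example.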
Theorem 4.13 Let f : Rn → R be locally Lipschitz, Gf,s be given by Definition 2.2, and x ∈ bdry(Fix Gf,s).
(i) Assume that f(x) > 0 and 0 ∈ ∂f(x). Then Gf,s is discontinuous at x.
(ii) Assume that f(x) ≤ 0. Suppose that one of the following holds:
(a) There exists α > 0 such that
(∀y : f(y) > 0, 0 ∉ ∂f(y))  αf(y) + ⟨s(y), x − y⟩ ≤ 0.  (24)
In particular, this is true when f is convex.
(b) 0 ∉ ∂f(x).  (25)
(c) lim inf_{y→x, f(y)>0, 0∉∂f(y)} ‖s(y)‖ > 0.  (26)
Then Gf,s is continuous at x.

Proof (i). As x ∈ bdry(Fix Gf), there exists a sequence (yk)k∈N such that yk → x, f(yk) > 0 and 0 ∉ ∂f(yk). Because f is locally Lipschitz and s(yk) ∈ ∂f(yk), the sequence (s(yk))k∈N is bounded. By taking a subsequence if necessary, we can assume that ‖s(yk)‖ → l ∈ R+. Taking the limit as k → ∞ yields
‖yk − Gf,s(yk)‖ = f(yk)/‖s(yk)‖ → f(x)/l,
which is +∞ if l = 0, or a positive number if l > 0. Because Gf,s(x) = x, this shows that Gf,s is not continuous at x.
(ii). To show that Gf,s is continuous at x, it suffices to show that
lim_{y→x, f(y)>0, 0∉∂f(y)} f(y)/‖s(y)‖ = 0.  (27)
Indeed, by Theorem 4.1, when f(y) > 0 and 0 ∉ ∂f(y), we have
‖y − Gf,s(y)‖ = f(y)/‖s(y)‖.
Then (27) gives lim_{y→x} Gf,s(y) = lim_{y→x}(Gf,s(y) − y) + lim_{y→x} y = x. When y ∈ Fix Gf,s, Gf,s(y) = y, so clearly lim_{y→x} Gf,s(y) = x. Hence Gf,s is continuous at x.
Now (24) gives
f(y) ≤ ⟨s(y), y − x⟩/α ≤ ‖s(y)‖ ‖y − x‖/α,
so that f(y)/‖s(y)‖ ≤ ‖y − x‖/α, which implies (27).
Next, we show that (25) implies (26). Note that (25) gives d_{∂f(x)}(0) > 0, since ∂f(x) is closed by [50, Theorem 8.6]. Because f is locally Lipschitz, in view of [50, Proposition 8.7], we have lim sup_{y→x} ∂f(y) ⊆ ∂f(x); hence 0 ∉ ∂f(y) for y sufficiently near x. Invoking [50, Corollary 4.7(b)], we obtain
lim inf_{y→x} d_{∂f(y)}(0) ≥ d_{∂f(x)}(0),
from which it follows that
lim inf_{y→x, f(y)>0, 0∉∂f(y)} ‖s(y)‖ ≥ lim inf_{y→x, f(y)>0, 0∉∂f(y)} d_{∂f(y)}(0)  (28)
≥ lim inf_{y→x} d_{∂f(y)}(0) ≥ d_{∂f(x)}(0) > 0,  (29)
and this gives (26).
Finally, (26) gives (27) because lim_{y→x} f(y) = 0 and
0 ≤ lim inf_{y→x, f(y)>0, 0∉∂f(y)} f(y)/‖s(y)‖ ≤ lim sup_{y→x, f(y)>0, 0∉∂f(y)} f(y)/‖s(y)‖ ≤ (lim_{y→x, f(y)>0} f(y)) / (lim inf_{y→x, f(y)>0, 0∉∂f(y)} ‖s(y)‖) = 0.
Here is an example illustrating Theorem 4.13(i).

Example 4.14 (1). Define f : R → R : x ↦ x³ + 1. Then
Gf(x) =
  2x/3 − 1/(3x²)  if x ≠ 0 and x > −1,
  x               if x = 0 or x ≤ −1,
has Fix Gf = (−∞, −1] ∪ {0}, and Gf is not continuous at x = 0.
(2). Define
f : R → R : x ↦
  x + 1  if x ≤ 0,
  1      if x ≥ 0.
Then
Gf(x) =
  −1  if −1 < x < 0,
  x   if x ≤ −1 or x ≥ 0,
has Fix Gf = (−∞, −1] ∪ [0, +∞), and Gf is not continuous at x = 0.
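Example 4.14(1) can be replayed numerically. In this sketch (our own helper), the projector is evaluated directly from Definition 2.2 with s = f′, and one sees Gf blow up as x → 0⁺ even though 0 is a fixed point:

```python
# Sketch for Example 4.14(1): f(x) = x^3 + 1 gives Gf(x) = 2x/3 - 1/(3x^2)
# off the fixed point set, which escapes to -infinity as x -> 0+.

def Gf(x):
    f, s = x ** 3 + 1.0, 3.0 * x * x      # s = f'(x)
    if f > 0 and s != 0:
        return x - f / s                   # = 2x/3 - 1/(3x^2)
    return x                               # f(x) <= 0 or f'(x) = 0: fixed

assert Gf(0.0) == 0.0                      # 0 is fixed since f'(0) = 0
assert Gf(-2.0) == -2.0                    # f(-2) <= 0: fixed
assert Gf(1e-3) < -1e4                     # huge values arbitrarily close to 0
```

The last assertion is the numerical footprint of the discontinuity at the boundary fixed point 0.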
4.4 The Family of Subgradient Projectors

Theorem 4.15 Let f : Rn → R be locally Lipschitz. Then the following are equivalent:
(i) Gf is single-valued.
(ii) f is strictly differentiable on Rn \ Fix Gf.

Proof (i)⇒(ii). By (14) in Theorem 4.1, when x ∈ Rn \ Fix Gf, we have
s(x) = f(x) (x − Gf,s(x))/‖x − Gf,s(x)‖²,
where s(x) ∈ ∂f(x). By assumption, Gf = T for an everywhere single-valued T : Rn → Rn, so
s(x) = f(x) (x − Tx)/‖x − Tx‖².
It follows that ∂f(x) is a singleton, so f is strictly differentiable at x by [50, Theorem 9.18]. Therefore, f is strictly differentiable on Rn \ Fix Gf.
(ii)⇒(i). Clear.
Theorem 4.16 Let C ⊆ Rn be a nonempty closed set. Then the following are equivalent:
(i) GdC is single-valued.
(ii) C is convex.

Proof According to Fact 2.6, we have Fix GdC = C.
(i)⇒(ii). By Theorem 4.15, dC is strictly differentiable on Rn \ C. Fact 2.6 shows that PC is single-valued at every x ∈ Rn \ C. Hence, C is convex; cf. [27, Theorem 12.7].
(ii)⇒(i). Apply Fact 2.8.
5 When is the Subgradient Projector Gf a Cutter or Local Cutter?

In this section we provide conditions for a subgradient projector to be a cutter or local cutter, and an explicit nonconvex function with a cutter subgradient projector. Along the way, some calculus for cutter subgradient projectors is also developed.
5.1 Cutters, Quasi-Firmly Nonexpansive Mappings, and Local Cutters

Recall the following well-known algorithmic operators.

Definition 5.1 ([20, page 53]) Let D be a nonempty subset of Rn and T : D → Rn. We say that T is a cutter if Fix T ≠ ∅ and
(∀x ∈ D)(∀u ∈ Fix T)  ⟨x − Tx, u − Tx⟩ ≤ 0.
(30)
Definition 5.2 ([20, page 56]) Let D be a nonempty subset of Rn and T : D → Rn. We say that T is quasi-firmly nonexpansive (quasi-fne) if Fix T ≠ ∅ and
(∀x ∈ D)(∀u ∈ Fix T)  ‖Tx − u‖² + ‖x − Tx‖² ≤ ‖x − u‖².
In [20, page 56], quasi-fne mappings are called strongly quasinonexpansive mappings.

The following fact says that a cutter is strongly Fejér monotone with respect to the set of its fixed points, and that cutters and quasi-fne mappings are the same; see [20, page 108].

Fact 5.3 ([20, Theorem 2.1.39, Lemma 2.1.36])
(i) A mapping T : D → Rn is a cutter if and only if T is quasi-fne.
(ii) Let T : D → Rn be a cutter. Then T is always continuous on Fix T.
(iii) Let T : D → Rn be a cutter. Then Fix T is closed and convex.

Definitions 5.1 and 5.2 require that T satisfy the inequalities for all x ∈ D and u ∈ Fix T. In practice, the sets D and Fix T might be too large to verify those inequalities. We now introduce local cutters and locally quasi-firmly nonexpansive mappings.

Definition 5.4 A mapping T : D → Rn is a local cutter at x̄ ∈ Fix T if Fix T ≠ ∅ and there exists δ > 0 such that
(∀x ∈ B(x̄, δ) ∩ D)(∀u ∈ B(x̄, δ) ∩ Fix T)  ⟨x − Tx, u − Tx⟩ ≤ 0.
(31)
Definition 5.5 A mapping T : D → Rn is locally quasi-firmly nonexpansive (locally quasi-fne) at x̄ ∈ Fix T if there exists δ > 0 such that
(∀x ∈ B(x̄, δ) ∩ D)(∀u ∈ B(x̄, δ) ∩ Fix T)  ‖Tx − u‖² + ‖Tx − x‖² ≤ ‖x − u‖².

A localized version of Fact 5.3(i) comes next.

Proposition 5.6 A mapping T : D → Rn is a local cutter at x̄ ∈ Fix T if and only if T is locally quasi-fne at x̄ ∈ Fix T.

Proof This follows from ‖x − u‖² = ‖Tx − u‖² + ‖x − Tx‖² + 2⟨x − Tx, Tx − u⟩.

Proposition 5.7 Assume that T : R → R and Fix T ≠ ∅. Then T is a cutter on R if and only if
(∀x ∈ R)  Tx ∈ [x, P_{Fix T} x].  (32)

Proof The sufficiency is clear. Conversely, when x ∈ Fix T, (32) clearly holds. Assume x ∉ Fix T and c ∈ Fix T. Because T maps R to R, there exists λ ∈ R such that Tx = (1 − λ)x + λc. As T is a cutter, we have
(x − Tx)(c − Tx) = −λ(1 − λ)(x − c)² ≤ 0,
which gives 0 ≤ λ ≤ 1, so that Tx ∈ [x, c]. Since c ∈ Fix T was arbitrary, it follows that Tx ∈ [x, P_{Fix T} x].

Remark 5.8 Compare Proposition 5.7 to Corollary 8.4, which characterizes the subgradient projector of a convex function on R.

For a nonempty convex set C ⊆ Rn, the recession cone of C is rec C := {x ∈ Rn | x + C ⊆ C}. The negative polar cone of K ⊆ Rn is K⁻ := {y ∈ Rn | ⟨y, x⟩ ≤ 0 for all x ∈ K}.

Proposition 5.9 Let T : Rn → Rn be a cutter. Then ran(Id − T) ⊆ (rec(Fix T))⁻. Consequently, when Fix T is a linear subspace, ran(Id − T) ⊆ (Fix T)⊥; in other words, ran(Id − T) ⊆ (ker(Id − T))⊥.

Proof Let x − Tx ∈ ran(Id − T) and v ∈ rec(Fix T). Then for every k > 0 and u ∈ Fix T, we have u + kv ∈ Fix T. The assumption that T is a cutter implies
⟨x − Tx, u + kv − Tx⟩ ≤ 0  ⟹  ⟨x − Tx, u/k + v − Tx/k⟩ ≤ 0.
Letting k → ∞ gives ⟨x − Tx, v⟩ ≤ 0. Since v ∈ rec(Fix T) was arbitrary, we have x − Tx ∈ (rec(Fix T))⁻. When Fix T is a linear subspace, Fix T = rec(Fix T) and (rec(Fix T))⁻ = (rec(Fix T))⊥.
5.2 Characterizations of Gf Being a Cutter or Local Cutter

Our first result characterizes the class of functions f for which Gf,s is a cutter.

Lemma 5.10 Let f : Rn → R be lsc and subdifferentiable, and let Gf,s be given by Definition 2.2. Suppose that f(x) > 0 and 0 ∉ ∂f(x). Then
⟨x − Gf,s(x), u − Gf,s(x)⟩ = (f(x)/‖s(x)‖²)(f(x) + ⟨s(x), u − x⟩).

Proof Let f(x) > 0, 0 ∉ ∂f(x). The definition of Gf,s gives
⟨x − Gf,s(x), u − Gf,s(x)⟩ = ⟨(f(x)/‖s(x)‖²) s(x), u − x + (f(x)/‖s(x)‖²) s(x)⟩
= (f(x)/‖s(x)‖²)⟨s(x), u − x⟩ + f²(x)/‖s(x)‖²
= (f(x)/‖s(x)‖²)(f(x) + ⟨s(x), u − x⟩).

Theorem 5.11 (level sets of tangent planes including the target set) Let f : Rn → R be lsc and subdifferentiable, let Gf,s be given by Definition 2.2, and
S := {u ∈ Rn | f(u) ≤ 0 or 0 ∈ ∂f(u)}.
Then the following hold:
(i) Gf,s is a cutter if and only if whenever x ∉ S and u ∈ S one has f(x) + ⟨s(x), u − x⟩ ≤ 0.
(ii) Let x̄ ∈ S and δ > 0. Gf,s is a cutter on B(x̄, δ) if and only if for all x ∈ B(x̄, δ) \ S and u ∈ S ∩ B(x̄, δ) one has f(x) + ⟨s(x), u − x⟩ ≤ 0.

Proof (i). When f(x) ≤ 0 or 0 ∈ ∂f(x), x = Gf,s(x), so (30) holds for T = Gf,s. Assume that f(x) > 0, 0 ∉ ∂f(x) and s(x) ∈ ∂f(x). By Lemma 5.10,
⟨x − Gf,s(x), u − Gf,s(x)⟩ = (f(x)/‖s(x)‖²)(f(x) + ⟨s(x), u − x⟩).
Since f(x) > 0, we deduce that
⟨x − Gf,s(x), u − Gf,s(x)⟩ ≤ 0  ⟺  f(x) + ⟨s(x), u − x⟩ ≤ 0.
Hence, the result follows from Definition 5.1.
(ii). Apply the same arguments as above with x ∈ B(x̄, δ) and u ∈ S ∩ B(x̄, δ).

One immediately obtains the following:

Fact 5.12 ([20, page 146]) Let f : Rn → R be convex with lev0 f ≠ ∅, and let Gf,s be given by Definition 2.2. Then Gf,s is a cutter. Consequently, Gf,s is continuous at every x ∈ lev0 f.
Proof As lev0 f ≠ ∅, Fix Gf,s = lev0 f. Assume that f(x) > 0. For u ∈ Fix Gf,s, f(u) ≤ 0. By the convexity of f we have f(x) + ⟨s(x), u − x⟩ ≤ f(u) ≤ 0. Theorem 5.11 shows that Gf,s is a cutter. The remaining result follows from Fact 5.3(ii).

In Fact 5.12, lev0 f ≠ ∅ is required, as the following example shows.

Example 5.13 (1). Let f : R → R be defined by (∀x ∈ R) f(x) := exp|x|. Then lev0 f = ∅ and
Gf(x) =
  x − 1  if x > 0,
  0      if x = 0,
  x + 1  if x < 0.
In particular, this Gf is discontinuous at x = 0 and not a cutter. Moreover, Gf is not monotone.
(2). Consider f : Rn → R : x ↦ exp(‖x‖²/2). We have lev0 f = ∅ and
Gf(x) =
  x − x/‖x‖²  if x ≠ 0,
  0           if x = 0.
In particular, Gf is not continuous at 0, so not a cutter.

Example 5.14 The nonconvex function
f : Rn → R : x ↦
  ‖x‖²  if ‖x‖ ≤ 1,
  1     if ‖x‖ > 1,
has Gf being a cutter on a neighborhood of 0, but not a cutter on Rn.

It is instructive to consider dC where C ⊆ Rn is closed and nonempty.

Proposition 5.15 Let C ⊆ Rn be closed and nonempty, and let s be a selection of ∂dC. Then GdC,s is a cutter if and only if the set C is convex.

Proof By Fact 2.6, 0 ∉ ∂dC(x̄) whenever x̄ ∉ C, because ‖v‖ = 1 for every v ∈ ∂dC(x̄). This implies that Fix GdC,s = C. Assume that GdC,s is a cutter. Then Fix GdC,s = C is convex. Conversely, assume that C is convex. Then dC is convex; consequently, GdC,s is a cutter.

Theorem 5.16 Let k ≥ 1, let f : Rn → [0, +∞) be locally Lipschitz, and let Gf,s be given by Definition 2.2. Suppose that Gf,s is a cutter and Fix Gf,s ≠ ∅. If g = f^k, then Gg,kf^{k−1}s is a cutter.

Proof By Theorem 3.9, Gg,kf^{k−1}s = (1 − 1/k) Id + (1/k) Gf,s. As Id and Gf,s are both cutters, and Fix Gf,s ∩ Fix Id = Fix Gf,s ≠ ∅, being a convex combination of cutters, Gg,kf^{k−1}s is a cutter by [20, Corollary 2.1.49, page 62].

In Corollary 11.6, we give an example showing that even though Gf²,2fs is a cutter, Gf,s might not be a cutter; so the converse of Theorem 5.16 is not true.
Theorem 5.17 Let A : Rn → Rn be unitary, b ∈ Rn, let f : Rn → R be lsc and subdifferentiable, and let Gf,s be given by Definition 2.2. Define g : Rn → R : x ↦ f(Ax + b). If Gf,s is a cutter, then Gg,A⊤s(A·+b) is a cutter.

Proof Let x ∈ Rn and u ∈ Fix Gg,A⊤s(A·+b). Proposition 3.7 gives Gg,A⊤s(A·+b)(x) = A⊤(Gf,s(Ax + b) − b) and Au + b ∈ Fix Gf,s. Since A is unitary and Gf,s is a cutter, we have
‖x − Gg,A⊤s(A·+b)(x)‖² = ‖x − A⊤(Gf,s(Ax + b) − b)‖² = ‖Ax + b − Gf,s(Ax + b)‖²  (33)
≤ ‖Ax + b − (Au + b)‖² − ‖Gf,s(Ax + b) − (Au + b)‖²  (34)
= ‖x − u‖² − ‖Gf,s(Ax + b) − Au − b‖²  (35)
= ‖x − u‖² − ‖A⊤(Gf,s(Ax + b) − b) − u‖²  (36)
= ‖x − u‖² − ‖Gg,A⊤s(A·+b)(x) − u‖².  (37)
Hence Gg,A⊤s(A·+b) is a cutter by Fact 5.3(i).

Corollary 5.18 Let B be an n × n symmetric matrix. Define f : Rn → R : x ↦ (1/2) x⊤Bx. Then
Gf(x) =
  x − (x⊤Bx/(2‖Bx‖²)) Bx  if x⊤Bx > 0 and Bx ≠ 0,
  x                        otherwise.  (38)
Moreover, the following are equivalent:
(i) Gf is a cutter.
(ii) B is positive semidefinite or negative semidefinite.
Proof (38) follows from Definition 2.2. Because B is symmetric, there exists an orthogonal matrix Q such that Q⊤BQ = D, where D is an n × n diagonal matrix whose diagonal entries are the eigenvalues of B. Using x = Qy, Theorem 5.17 shows that Gf is a cutter if and only if Gg is a cutter, where g : Rn → R : y ↦ f(Qy) = (1/2) y⊤Dy.
(i)⇒(ii). (i) implies that Gg is a cutter. This means that
(∀y ∈ Rn : y⊤Dy > 0)(∀u ∈ Rn : u⊤Du ≤ 0)  y⊤Du ≤ (1/2) y⊤Dy.  (39)
We will show that all nonzero diagonal entries of D have the same sign. Suppose to the contrary that there exist diagonal entries of D with λi > 0 and λj < 0. Put yk = 0, uk = 0 for k = 1, ..., n, k ≠ i, j. Then (39) reduces to: whenever
λi yi² + λj yj² > 0 and λi ui² + λj uj² ≤ 0,  (40)
we have
λi yi ui + λj yj uj ≤ (1/2)(λi yi² + λj yj²).  (41)
Fix (yi, yj) such that λi yi² + λj yj² > 0 and yj < 0. When ui = 0 and uj → +∞, (40) is verified but (41) fails to hold. This contradicts that Gg is a cutter. Hence all nonzero
diagonal entries of D must have the same sign, which implies that B is positive semidefinite if the sign is positive, and negative semidefinite if the sign is negative.
(ii)⇒(i). When B is positive semidefinite, f is convex, so we apply Fact 5.12. When B is negative semidefinite, Gg = Id is a cutter.

Theorem 5.19 Let f : Rn → R be lsc and subdifferentiable, and let Gf,s be given by Definition 2.2. Assume that k ∈ R \ {0} and g(x) = f(kx). If Gf,s is a cutter, then Gg,ks(k·) is a cutter.

Proof Proposition 3.6 gives Gg,ks(k·)(x) = (1/k) Gf,s(kx) and Fix Gg,ks(k·) = (1/k) Fix Gf,s. Let x ∈ Rn and u ∈ Fix Gg,ks(k·). We have
⟨x − Gg,ks(k·)(x), u − Gg,ks(k·)(x)⟩ = ⟨x − (1/k)Gf,s(kx), u − (1/k)Gf,s(kx)⟩  (42)
= (1/k²)⟨kx − Gf,s(kx), ku − Gf,s(kx)⟩ ≤ 0  (43)
since Gf,s is a cutter. Therefore, Gg,ks(k·) is a cutter.

One might ask: If each function fi : Rn → R has Gfi being a cutter, must the maximum g := max{f1, f2} have Gg being a cutter? The answer is negative, as the following example shows.

Example 5.20 Let f1, f2 : R → R be defined by f1(x) := 1 + x and f2(x) := 1 − x. Each Gfi is a cutter by Fact 5.12. The function g(x) := max{f1(x), f2(x)} has
Gg(x) =
  −1  if x > 0,
  0   if x = 0,
  1   if x < 0,
which is not continuous at x = 0, so Gg is not a cutter.
5.3 A Nonconvex Function Whose Gf is a Cutter

Example 5.21 If f is not convex, Gf,s need not be a cutter. Consider f : R → R : x ↦ 1 − exp(−x²). Then the subgradient projector of f is
Gf,s(x) =
  x − (1 − exp(−x²))/(2x exp(−x²))  if x ≠ 0,
  0                                  if x = 0,
and Fix Gf = {0}. However, Gf is not a cutter. Indeed, when |x| > √2 we have
f(x) + s(x)(0 − x) = 1 − exp(−x²) + (2x exp(−x²))(0 − x)
= (exp(x²) − (1 + 2x²))/exp(x²)
≥ (1 + x² + x⁴/2 − (1 + 2x²))/exp(x²)
= x²(x² − 2)/(2 exp(x²)) > 0.
By Theorem 5.11, Gf is not a cutter.
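The computation in Example 5.21 can be replayed numerically; the helper below (our naming) evaluates the cutter criterion of Theorem 5.11 with u = 0, the only fixed point:

```python
import math

# Sketch for Example 5.21: for f(x) = 1 - exp(-x^2), the cutter criterion
# f(x) + s(x)(u - x) <= 0 with u = 0 fails once |x| > sqrt(2).

def violation(x):
    f = 1.0 - math.exp(-x * x)
    s = 2.0 * x * math.exp(-x * x)           # f'(x)
    return f + s * (0.0 - x)                 # must be <= 0 for a cutter

assert violation(1.0) < 0                    # the inequality holds here...
assert violation(2.0) > 0                    # ...but fails for |x| > sqrt(2)
```

This matches the closed-form lower bound x²(x² − 2)/(2 exp(x²)) derived above.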
Example 5.22 Even though f is not convex, Gf,s may still be a cutter. Define
f : R → R : x ↦
  0           if x ≤ 0,
  x           if 0 ≤ x ≤ 20/7,
  8(x − 2.5)  if 20/7 ≤ x ≤ 3,
  2(x − 1)    if x > 3.
Then f is not convex, since its slope is not nondecreasing on [20/7, +∞). However, its subgradient projector
Gf,s(x) =
  x                   if x ≤ 0,
  0                   if 0 < x < 20/7,
  20/7 − (20/7)/s(x)  if x = 20/7, where s(x) ∈ [1, 8],
  2.5                 if 20/7 < x < 3,
  3 − 4/s(x)          if x = 3, where s(x) ∈ {2, 8},
  1                   if x > 3,
is a cutter. To see this, by Theorem 5.11, it suffices to consider zero level sets of tangent planes. Indeed, let f(u) ≤ 0, i.e., u ≤ 0. When x0 > 3, f(x0) + s(x0)(u − x0) = 2(u − 1) ≤ 0. When x0 = 3, f(x0) + s(x0)(u − x0) = 4 + s(3)(u − 3) ≤ 4 + 2(u − 3) ≤ 0, where 2 ≤ s(3) ≤ 8. When 20/7 < x0 < 3, f(x0) + s(x0)(u − x0) = 8(u − 2.5) ≤ 0. When x0 = 20/7, f(x0) + s(x0)(u − x0) = 20/7 + s(20/7)(u − 20/7) ≤ u ≤ 0, where 1 ≤ s(20/7) ≤ 8. When 0 < x0 < 20/7, f(x0) + s(x0)(u − x0) = u ≤ 0.
See Corollary 11.6(ii) for an example on R². Note that even if Gf is continuous, it does not follow that Gf is a cutter; see, e.g., Example 2.5(ii).

In [20], Cegielski developed a systematic theory for cutters. The theory of cutters can be used to study the class of functions (Theorem 5.11) whose subgradient projectors are cutters.

One might also ask: If f : Rn → R has Gf,s being a cutter, does g := f + r have Gg,s being a cutter for every r ∈ R? In general, the answer is negative. When f is convex and lev0 f ≠ ∅, it follows from Fact 5.12 that Gf−r,s is a cutter whenever r > 0. This might fail for r < 0, as the following example shows.

Example 5.23 Let f : R → R be defined by f(x) := e^|x| − 1. Then
Gf,s(x) =
  x − 1 + e^{−x}  if x > 0,
  x + 1 − e^{x}   if x < 0,
  0               if x = 0,
is a cutter by Fact 5.12. However, for g : R → R : x ↦ e^|x|, we have g = f + 1, but Gg,s is not a cutter by Example 5.13(1).
For a nonconvex function, although Gf,s is a cutter, Gf−r,s might not be a cutter even when r > 0.

Example 5.24 Let f be given by Example 5.22, and g := f − 20/7. Then
Gg,s(x) =
  x                              if x ≤ 20/7,
  20/7                           if 20/7 < x < 3,
  3 − (8/7)/s(x) ∈ {17/7, 20/7}  if x = 3, where s(x) ∈ {2, 8},
  17/7                           if x > 3.
As shown in Example 5.22, Gf,s is a cutter. However, Gg,s is not a cutter, by Proposition 5.7 or by direct calculation using Definition 5.1.
6 Convergence Analysis of Subgradient Projectors

In this section, we study the convergence of the sequence generated by a subgradient projector. When the function is convex, the convergence analysis is fairly well known; see, e.g., [46, Section 5.3], [7, 45], and [20]. For nonconvex functions, we demonstrate that the convergence results on cutters, local cutters, quasi-ne mappings, and local quasi-ne mappings can be effectively used. It turns out that local cutters and local quasi-ne mappings are more appropriate for nonconvex functions. In addition to cutters and local cutters (see Definitions 5.1 and 5.4), quasi-nonexpansive mappings and local quasi-nonexpansive mappings are also useful for the convergence analysis.
6.1 Quasi-Nonexpansive Mappings and Local Quasi-Nonexpansive Mappings

According to [6, page 59] and [20, page 47], we define:

Definition 6.1 Let D be a nonempty subset of Rn and T : D → Rn. We say that
(i) T is quasinonexpansive (quasi-ne) if
(∀x ∈ D)(∀y ∈ Fix T)  ‖Tx − y‖ ≤ ‖x − y‖.
(ii) A mapping T : D → D is asymptotically regular at x ∈ D if ‖T^{k+1}x − T^k x‖ → 0 as k → ∞; it is asymptotically regular on D if it is so at every x ∈ D.

Definition 6.1(i) requires that T satisfy the inequalities for all x ∈ D and y ∈ Fix T. In practice, the sets D and Fix T might be too large to verify those inequalities. We now introduce locally quasinonexpansive mappings.

Definition 6.2 A mapping T : D → Rn is locally quasinonexpansive (locally quasi-ne) at x̄ ∈ Fix T if there exists δ > 0 such that
(∀x ∈ B(x̄, δ) ∩ D)(∀y ∈ B(x̄, δ) ∩ Fix T)  ‖Tx − y‖ ≤ ‖x − y‖.

The connection between quasi-ne mappings and quasi-fne mappings is given by the following fact.
Fact 6.3 ([7, Proposition 2.3(v) ⇔ (vi)], [20, Corollary 2.1.33]) Let D be a nonempty subset of Rn, and let T : D → Rn with Fix T ≠ ∅. Then the following are equivalent:
(i) T is quasi-fne.
(ii) 2T − Id is quasi-ne.

The following result says that quasi-ne, nonexpansiveness, and local quasi-ne coincide for linear mappings. Although the equivalence of quasi-ne and nonexpansiveness for linear mappings has been given in [6, Exercise 4.4], the equivalence to local quasi-ne is new.

Proposition 6.4 Let T : Rn → Rn be a linear operator. Then the following are equivalent:
(i) T is quasi-ne.
(ii) T is nonexpansive.
(iii) There exist δ > 0 and x̄ ∈ Fix T such that T is quasi-ne on B(x̄, δ).

Proof (i)⇒(ii). Since 0 ∈ Fix T, we have ‖Tx‖ ≤ ‖x‖ for every x ∈ Rn. Hence T is nonexpansive. (ii)⇒(i). Clear. (ii)⇒(iii). Clear.
(iii)⇒(ii). The assumption means that there exist x̄ ∈ Fix T and δ > 0 such that
(∀x ∈ B(x̄, δ))(∀y ∈ B(x̄, δ) ∩ Fix T)  ‖Tx − y‖ ≤ ‖x − y‖.  (44)
Let v ∈ B(0, δ). Using Tx̄ = x̄, y = x̄, and the linearity of T, from (44) we obtain
‖Tv‖ = ‖T(x̄ + v) − Tx̄‖ = ‖T(x̄ + v) − x̄‖ ≤ ‖(x̄ + v) − x̄‖ = ‖v‖.
Since v ∈ B(0, δ) was arbitrary and T is linear, we have ‖Tv‖ ≤ ‖v‖ for every v ∈ Rn. Hence T is nonexpansive.

Remark 6.5 Fact 6.3 and Proposition 6.4 hold in Hilbert spaces. We formulate them only in Rn.

The following example illustrates that, for nonlinear T, quasinonexpansiveness and nonexpansiveness are different.
T : R → R : x →
|x| 2
sin x1 if x = 0, 0 if x = 0.
Then T is quasi-ne but not nonexpansive. Proof T is quasi-ne because that Fix T = {0} and |x| 1 |x| (∀x ∈ R) |T (x) − 0| = sin ≤ ≤ |x|. 2 x 2 T is not nonexpansive because for x > 0, we have T (x) =
1 1 1 1 sin − cos 2 x 2x x
and |T (1/(2nπ ))| = nπ > 1. For analogous results on linear cutters, see Proposition 9.1 in Section 9.
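Example 6.6 can also be checked numerically. The sketch below (function name is ours) confirms the quasi-ne inequality with respect to the unique fixed point 0, and exhibits a concrete pair of points violating the Lipschitz bound 1:

```python
import math

# Sketch for Example 6.6: T(x) = (|x|/2) sin(1/x), T(0) = 0, is
# quasinonexpansive but not nonexpansive.

def T(x):
    return abs(x) / 2.0 * math.sin(1.0 / x) if x != 0 else 0.0

# quasi-ne: distances to the fixed point 0 never increase
for x in (-3.0, -0.2, 0.01, 0.5, 7.0):
    assert abs(T(x)) <= abs(x)

# not nonexpansive: near 1/(2n*pi) the slope of T is about -n*pi
a = 1.0 / (2.0 * math.pi)        # T(a) = 0 up to rounding
b = 1.0 / (2.0 * math.pi + 0.1)
assert abs(T(a) - T(b)) > abs(a - b)
```

The pair (a, b) sits near 1/(2π), where the derivative computed in the proof has magnitude π > 1.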
Although we have developed calculus for Gf being a cutter in Section 5, most results also hold for quasi-ne mappings. We single out the two most important ones.

Theorem 6.7 Let k ≥ 1, let f : Rn → [0, +∞) be locally Lipschitz, and let Gf,s be given by Definition 2.2. Suppose that Gf,s is quasi-ne and Fix Gf,s ≠ ∅. If g = f^k, then Gg,kf^{k−1}s is quasi-ne.

Proof Apply Theorem 3.9 and [6, Exercise 4.11].

Theorem 6.8 Let A : Rn → Rn be unitary and b ∈ Rn, let f : Rn → R be lsc and subdifferentiable, and let Gf,s be given by Definition 2.2. Define g : Rn → R : x ↦ f(Ax + b). If Gf,s is quasi-ne, then Gg,A⊤s(A·+b) is quasi-ne.

Proof Apply Proposition 3.7 and Definition 6.1.

Corollary 6.9 Let f : Rn → [0, +∞) be locally Lipschitz with Gf,s given by Definition 2.2. Suppose that g = f² and Fix Gf,s ≠ ∅. Then Gf,s is quasi-ne if and only if Gg,2fs is quasi-fne.

Proof By Theorem 3.9, Gg,2fs = (Gf,s + Id)/2. The result then follows from Fact 6.3.
6.2 Convergence of Cutters, Local Cutters, Quasi-ne Mappings, and Local Quasi-ne Mappings

Proposition 6.10 (convergence of iterates of a cutter) Let D ⊆ Rn be a nonempty closed convex set, and let T : D → D be an operator with a fixed point such that T has the fixed point closed property on D. If T is a cutter, then for every x ∈ D, the sequence (T^k x)k∈N converges to a point z ∈ Fix T.

Proof Since T is a cutter, T is quasi-fne by Fact 5.3(i), hence quasi-ne. Moreover, T is asymptotically regular by [20, Theorem 3.4.3]. The result now follows from [20, Theorem 3.5.2].
Proposition 6.11 (convergence of iterates of a locally quasi-fne mapping) Let D be a nonempty closed convex subset of Rn, let T : D → D, and let Fix T ≠ ∅. Assume that
(i) there exist x̄ ∈ Fix T and δ > 0 such that T is locally quasi-fne (see Definition 5.5);
(ii) T has the fixed-point closed property.
Let x0 ∈ D ∩ B(x̄, δ). Set
(∀k ∈ N)  xk+1 = Txk.
Then (xk)k∈N converges to a point z ∈ B(x̄, δ) ∩ Fix T.

Proof By assumption (i),
(∀x ∈ B(x̄, δ) ∩ D)(∀y ∈ B(x̄, δ) ∩ Fix T)  ‖Tx − y‖² + ‖Tx − x‖² ≤ ‖x − y‖².  (45)
With x0 ∈ D ∩ B(x̄, δ), (45) gives
(∀y ∈ B(x̄, δ) ∩ Fix T)  ‖Tx0 − y‖² + ‖Tx0 − x0‖² ≤ ‖x0 − y‖²,
so ‖x1 − x̄‖ ≤ ‖x0 − x̄‖ ≤ δ. By induction,
(xk)k∈N is a sequence in B(x̄, δ).  (46)
Moreover, (45) gives
(∀k ∈ N)(∀y ∈ B(x̄, δ) ∩ Fix T)  ‖Txk − y‖² + ‖Txk − xk‖² ≤ ‖xk − y‖²,
so (xk)k∈N is Fejér monotone with respect to C := B(x̄, δ) ∩ Fix T, and ‖xk+1 − xk‖ → 0 as k → ∞. Let x be a cluster point of (xk)k∈N, say xkl → x. Since Txkl − xkl → 0 and T is fixed-point closed, we have Tx = x. Moreover, ‖x − x̄‖ ≤ δ because of (46). Thus, x ∈ C. Applying [6, Theorem 5.5], we conclude that xk → z ∈ C.

Proposition 6.12 (convergence of iterates of a locally quasi-ne mapping) Let D be a nonempty closed convex subset of Rn, let T : D → D, and let int(Fix T) ≠ ∅. Assume that
(i) there exist x̄ ∈ Fix T and δ > 0 such that T is locally quasi-ne (see Definition 6.2);
(ii) int(B(x̄, δ) ∩ Fix T) ≠ ∅;
(iii) T has the fixed-point closed property.
Let x0 ∈ D ∩ B(x̄, δ). Set
(∀k ∈ N)  xk+1 = Txk.
Then (xk)k∈N converges to a point z ∈ B(x̄, δ) ∩ Fix T.

Proof By assumption (i), there exists δ > 0 such that
(∀x ∈ B(x̄, δ) ∩ D)(∀y ∈ B(x̄, δ) ∩ Fix T)  ‖Tx − y‖ ≤ ‖x − y‖.  (47)
With x0 ∈ D ∩ B(x̄, δ), (47) gives
(∀y ∈ B(x̄, δ) ∩ Fix T)  ‖Tx0 − y‖ ≤ ‖x0 − y‖,
so ‖x1 − x̄‖ ≤ ‖x0 − x̄‖ ≤ δ. By induction,
(xk)k∈N is a sequence in B(x̄, δ).  (48)
Moreover, (47) gives
(∀k ∈ N)(∀y ∈ B(x̄, δ) ∩ Fix T)  ‖Txk − y‖ ≤ ‖xk − y‖,
so (xk)k∈N is Fejér monotone with respect to C := B(x̄, δ) ∩ Fix T. As int C ≠ ∅, we have xk → z ∈ Rn by [6, Proposition 5.10]. This implies that Txk − xk = xk+1 − xk → 0 as k → ∞. Since T is fixed-point closed, we have Tz = z. Moreover, ‖z − x̄‖ ≤ δ because of (48). Hence xk → z ∈ B(x̄, δ) ∩ Fix T.
6.3 Applications to Subgradient Projectors

Theorem 6.13 Let f : Rn → R be locally Lipschitz, Gf,s be given by Definition 2.2, and
S := {x ∈ Rn | 0 ∈ ∂f(x)} ∪ lev0 f ≠ ∅.
If the subgradient projector Gf,s is a cutter, then for every x ∈ Rn, the sequence (Gf,s^k x)k∈N converges to a point z such that either 0 ∈ ∂f(z) or f(z) ≤ 0.
Proof Combine Theorem 4.9 and Proposition 6.10.

To proceed, it will be convenient to single out:

Lemma 6.14 Let f : Rn → R be lsc and subdifferentiable, and Gf,s be given by Definition 2.2. When f(x) > 0, 0 ∉ ∂f(x), and y ∈ Rn, we have
‖Gf,s(x) − y‖² = ‖x − y‖² + (f(x)/‖s(x)‖²)(f(x) + 2⟨y − x, s(x)⟩).  (49)

Proof This follows from
‖Gf,s(x) − y‖² = ‖x − y − (f(x)/‖s(x)‖²) s(x)‖²
= ‖x − y‖² + f²(x)/‖s(x)‖² − 2(f(x)/‖s(x)‖²)⟨x − y, s(x)⟩.
Theorem 6.15 Let f : Rn → R be locally Lipschitz, Gf,s be given by Definition 2.2, and
S := {x ∈ Rn | 0 ∈ ∂f(x)} ∪ lev0 f.
Then the following hold:
(i) Gf,s is quasi-ne if and only if
(∀x ∉ S)(∀y ∈ S)  f(x) + 2⟨y − x, s(x)⟩ ≤ 0.  (50)
(ii) Assume that int S ≠ ∅ and (50) holds. Then for every x ∈ Rn, the sequence (Gf,s^k x)k∈N converges to a point z ∈ S.

Proof (i). By Lemma 6.14, when x ∉ S and y ∈ S, we have
‖Gf,s(x) − y‖² = ‖x − y‖² + (f(x)/‖s(x)‖²)(f(x) + 2⟨y − x, s(x)⟩).  (51)
In view of Definition 6.1, assumption (50) is equivalent to Gf,s being quasi-ne.
(ii). By (i), the sequence (Gf,s^k x)k∈N is Fejér monotone with respect to S. Since int S ≠ ∅, by [6, Proposition 5.10], the sequence (Gf,s^k x)k∈N converges to a point z ∈ Rn. Write xk = Gf,s^k x. Then (∀k ∈ N) xk+1 = Gf,s(xk). As f is locally Lipschitz at z and xk → z, the sequence (s(xk))k∈N is bounded. Since
xk+1 − xk = −(f(xk)/‖s(xk)‖²) s(xk)
and lim_{k→∞} xk = z, we have
|f(z)| = lim_{k→∞} |f(xk)| = lim_{k→∞} ‖xk+1 − xk‖ ‖s(xk)‖ = 0.
Hence z ∈ S.

Theorem 6.16 Let f : Rn → R be locally Lipschitz, Gf,s be given by Definition 2.2, and
S := {x ∈ Rn | 0 ∈ ∂f(x)} ∪ lev0 f ≠ ∅.
Assume that the subgradient projector Gf,s is locally quasi-fne at x̄ ∈ S; specifically, assume there exists δ > 0 such that
(∀x ∈ B(x̄, δ) \ S)(∀y ∈ B(x̄, δ) ∩ S)  f(x) + ⟨y − x, s(x)⟩ ≤ 0.  (52)
Then for every x0 ∈ B(x̄, δ), the sequence (xk)k∈N defined by (∀k ∈ N) xk+1 = Gf,s(xk) converges to a point z ∈ B(x̄, δ) ∩ S.

Proof (52) guarantees that Gf,s is locally quasi-fne at x̄ ∈ S. Indeed, when x ∉ S and y ∈ S ∩ B(x̄, δ), using Lemma 6.14, (52) and (13), we have
‖Gf,s(x) − y‖² = ‖x − y‖² + (f(x)/‖s(x)‖²)(f(x) + 2⟨y − x, s(x)⟩)  (53)
= ‖x − y‖² + (f(x)/‖s(x)‖²)(2(f(x) + ⟨y − x, s(x)⟩) − f(x))  (54)
≤ ‖x − y‖² − f²(x)/‖s(x)‖²  (55)
= ‖x − y‖² − ‖Gf,s(x) − x‖².  (56)
In view of Theorem 4.9, it suffices to apply Proposition 6.11.

Theorem 6.17 Let f : Rⁿ → R be locally Lipschitz, G_{f,s} be given by Definition 2.2, and S := {x ∈ Rⁿ | 0 ∈ ∂f(x)} ∪ lev_0 f with int S ≠ ∅. Assume that there exist x̄ ∈ S and δ > 0 such that

(∀x ∈ B(x̄, δ) \ S)(∀y ∈ B(x̄, δ) ∩ S)  f(x) + 2⟨y − x, s(x)⟩ ≤ 0.   (57)

Assume further that int(B(x̄, δ) ∩ S) ≠ ∅. Then for every x₀ ∈ B(x̄, δ), the sequence (x_k)_{k∈ℕ} defined by (∀k ∈ ℕ) x_{k+1} = G_{f,s}(x_k) converges to a point z ∈ B(x̄, δ) ∩ S.

Proof (57) guarantees that G_{f,s} is locally quasi-ne at x̄. Indeed, for every x ∈ B(x̄, δ) \ S and y ∈ B(x̄, δ) ∩ S, using Lemma 6.14 and (57), we have

‖G_{f,s}(x) − y‖² = ‖x − y‖² + (f(x)/‖s(x)‖²)(f(x) + 2⟨y − x, s(x)⟩)   (58)
≤ ‖x − y‖².   (59)
By Theorem 4.9, G_{f,s} has the fixed point closed property. Therefore, Proposition 6.12 applies.

Example 6.18 (1). Define

f : R → R : x ↦ { 1 − |x| if |x| ≤ 1;  0 otherwise. }

Because Fix G_{f,s} = {x ∈ R | |x| ≥ 1} is not convex, G_{f,s} is not a cutter. However, f satisfies the assumptions of both Theorems 6.16 and 6.17, so the local convergence theory applies.
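As a small numerical sanity check (ours, not part of the original example), the iteration x_{k+1} = G_{f,s}(x_k) for f(x) = 1 − |x| can be run directly. The selection s(x) = −sign(x) for x ≠ 0, with s(0) := 1, is one assumed choice of subgradient; every starting point then lands in Fix G_{f,s} = {x | |x| ≥ 1} after a single step:

```python
def G(x):
    # Subgradient projector of f(x) = 1 - |x| from Example 6.18(1).
    # s(x) below is one selection of the subdifferential (an assumption).
    fx = 1 - abs(x) if abs(x) <= 1 else 0.0
    if fx <= 0:
        return x                       # f(x) <= 0: G fixes x
    s = -1.0 if x > 0 else 1.0         # s(x), with s(0) := 1
    return x - fx / s**2 * s

for x0 in [0.0, 0.3, -0.7, 2.0]:
    x = G(x0)
    assert abs(x) >= 1                 # landed in Fix G = {|x| >= 1}
```

Here the iteration even terminates in one step, consistent with the finite-convergence phenomena discussed in Section 6.4.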
(2). Define

f : R → R : x ↦ { 0 if x ≤ 0;  x if 0 ≤ x ≤ 1;  1 if 1 ≤ x ≤ 2;  x − 1 if x ≥ 2. }

As Fix G_{f,s} = (−∞, 0] ∪ [1, 2], G_{f,s} is not a cutter. However, both Theorems 6.16 and 6.17 apply.
6.4 Finite Convergence and (C, ε)-Firm Nonexpansiveness

Finite termination algorithms for subgradient projectors of convex functions have been studied in [11, 29, 45]. Recently, in [42] Pang studied finitely convergent algorithms for subgradient projectors of locally Lipschitz functions defined in terms of the Clarke subdifferential. Naturally, one asks what his result implies about the subgradient projector defined by us. To this end, let us recall lower-C^k functions, defined by Rockafellar and Wets [50, Definition 10.29], and approximately convex functions, defined by Ngai, Luc, and Théra [52].

Definition 6.19 A function f : O → R, where O is an open subset of Rⁿ, is said to be lower-C^k on O if on some neighborhood V of each x̄ ∈ O there is a representation f(x) := max_{t∈T} f_t(x) in which each f_t is of class C^k on V, the index set T is compact, and f_t(x) and all its partial derivatives through order k depend continuously not just on x ∈ V but jointly on (t, x) ∈ T × V.

Definition 6.20 A function f : Rⁿ → R is approximately convex at x̄ ∈ Rⁿ if for every ε > 0 there exists δ > 0 such that

(∀x, y ∈ B(x̄, δ))(∀λ ∈ (0, 1))  f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) + ελ(1 − λ)‖x − y‖.

Fact 6.21 (See [2, Theorem 4.5], [26, Corollary 3]) Let f : Rⁿ → R be locally Lipschitz at x̄. Then the following are equivalent:

(i) f is lower-C¹ around x̄.
(ii) f is approximately convex at x̄.
(iii) For every ε > 0 there exists δ > 0 such that

(∀x, y ∈ B(x̄, δ))(∀x* ∈ ∂_c f(x))  f(y) ≥ f(x) + ⟨x*, y − x⟩ − ε‖x − y‖.
Theorem 6.22 (finite convergence for accelerated subgradient projectors) Let f : Rⁿ → R be locally Lipschitz, and let x̄ ∈ Rⁿ satisfy

(i) f(x̄) = 0;
(ii) 0 ∉ ∂f(x̄);
(iii) f is lower-C¹ around x̄.

Suppose that the strictly decreasing sequence (ε_k)_{k∈ℕ} converges to 0 at a sublinear rate. Then there exist δ > 0 and ε̄ > 0 such that for every x₀ ∈ B(x̄, δ) and ε₀ < ε̄, the sequence (x_k)_{k∈ℕ} defined by

(∀k ∈ ℕ)  x_{k+1} = x_k − ((ε_k + f(x_k))/‖s_k‖²) s_k,  where s_k ∈ ∂f(x_k),   (60)

converges in finitely many iterations, i.e., f(x_k) ≤ 0 for some k ∈ ℕ.
Proof Since f is lower-C¹ around x̄, f is Clarke regular around x̄; see [50, Theorem 10.31]. Thus, the Clarke subdifferential and the limiting subdifferential of f coincide around x̄. Because ∂f is upper semicontinuous, when δ is sufficiently small, (ii) guarantees that 0 ∉ ∂f(x) for every x ∈ B(x̄, δ), which implies that (60) is well defined. By Fact 6.21, (iii) is equivalent to f being approximately convex at x̄. The result then follows from [42, Theorem 3].

Remark 6.23 See [2, 51, 52] for more characterizations of lower-C¹ functions and approximately convex functions.

Let ε ≥ 0 and C ⊆ Rⁿ. In [30], Hesse and Luke studied (C, ε)-firmly nonexpansive mappings; see also [37].

Definition 6.24 Let C, D be nonempty subsets of Rⁿ and T : D → Rⁿ. T is called (C, ε)-firmly nonexpansive if

(∀x ∈ D)(∀y ∈ C)  ‖Tx − Ty‖² + ‖(x − Tx) − (y − Ty)‖² ≤ (1 + ε)‖x − y‖².

Theorem 6.25 ((C, ε)-firm nonexpansiveness of G_{f,s}) Let f : Rⁿ → R be locally Lipschitz, G_{f,s} be given by Definition 2.2, and S := {x ∈ Rⁿ | 0 ∈ ∂f(x)} ∪ lev_0 f. Suppose that x̄ ∈ Rⁿ satisfies

(i) f(x̄) = 0;
(ii) 0 ∉ ∂f(x̄);
(iii) f is lower-C¹ around x̄.

Then for every ε > 0 there exists δ > 0 such that on B(x̄, δ) the subgradient projector G_{f,s} is (S ∩ B(x̄, δ), ε̃)-firmly nonexpansive, in which ε̃ = 8Lε/d²_{∂f(x̄)}(0) and L is the Lipschitz modulus of f around x̄.

Proof Let α := d_{∂f(x̄)}(0)/2. Then α > 0 by (ii). For every ε > 0, we can find δ > 0 such that

• f(y) ≥ f(x) + ⟨x*, y − x⟩ − ε‖x − y‖ whenever x, y ∈ B(x̄, δ) and x* ∈ ∂f(x). This follows from (iii) and Fact 6.21.   (61)
• ‖s(x)‖ ≥ α whenever s(x) ∈ ∂f(x) and x ∈ B(x̄, δ). This is because ∂f(x̄) is compact, ∂f is upper semicontinuous, and (ii) holds.
• |f(x) − f(y)| ≤ L‖x − y‖ whenever x, y ∈ B(x̄, δ). This is possible because f is locally Lipschitz around x̄.

Since 0 ∉ ∂f(y) for every y ∈ B(x̄, δ), we must have f(y) ≤ 0 whenever y ∈ S ∩ B(x̄, δ). Thus, (61) gives

(∀x ∈ B(x̄, δ))(∀y ∈ S ∩ B(x̄, δ))(∀x* ∈ ∂f(x))  f(x) + ⟨x*, y − x⟩ ≤ ε‖x − y‖.   (62)
Put C := S ∩ B(x̄, δ). When f(x) > 0, 0 ∉ ∂f(x), and y ∈ S ∩ B(x̄, δ), using Lemma 6.14, (62), and (13), we have

‖G_{f,s}(x) − y‖² = ‖x − y‖² + (f(x)/‖s(x)‖²)(f(x) + 2⟨y − x, s(x)⟩)   (63)
= ‖x − y‖² + (2f(x)/‖s(x)‖²)(f(x) + ⟨y − x, s(x)⟩) − f²(x)/‖s(x)‖²   (64)
≤ ‖x − y‖² + (2f(x)/‖s(x)‖²) ε‖y − x‖ − ‖G_{f,s}(x) − x‖²   (65)
≤ ‖x − y‖² + (2(f(x) − f(y))/α²) ε‖x − y‖ − ‖G_{f,s}(x) − x‖²   (66)
≤ ‖x − y‖² + (2L‖x − y‖/α²) ε‖x − y‖ − ‖G_{f,s}(x) − x‖²   (67)
= ‖x − y‖² + (2Lε/α²)‖x − y‖² − ‖G_{f,s}(x) − x‖²   (68)
= (1 + ε̃)‖x − y‖² − ‖G_{f,s}(x) − x‖².   (69)

This completes the proof.

Remark 6.26 Observe that both Theorems 6.22 and 6.25 aim at solving nonconvex inequality problems, e.g., finding a point x such that f(x) ≤ 0 with f(x) = −e^{−x²} + 1/2 and f satisfying the assumptions at x̄ = √(ln 2). However, they do not apply to f = d_C.

This completes Part I. In Part II, we will study subgradient projectors of Moreau envelopes and their connections to subgradient projectors of the original functions.

Part II  Subgradient Projectors of Moreau Envelopes and Characterizations
7 Subgradient Projectors of Moreau Envelopes

When f : Rⁿ → (−∞, +∞] is lsc, ∂f(x) might be empty for some x ∈ Rⁿ. However, e_λf has much better properties when f is prox-bounded; see, e.g., Fact 7.5 below. This is the motivation for us to study subgradient projectors of Moreau envelopes. To do this, we need to study the relationship between G_{e_λf} and G_{f,s}. Recall that for a proper, lsc function f : Rⁿ → (−∞, +∞] and a parameter value λ > 0, the Moreau envelope e_λf and the proximal mapping P_λf are defined respectively by

e_λf : Rⁿ → (−∞, +∞] : x ↦ inf_w { f(w) + (1/(2λ))‖x − w‖² },  and
P_λf : Rⁿ ⇒ Rⁿ : x ↦ argmin_w { f(w) + (1/(2λ))‖x − w‖² }.

When f is proper, lsc, and convex, we refer the reader to [6, Chapter 12] and [49] for the properties of e_λf and P_λf. When f is a proper and lsc function, not necessarily convex, in [44] Poliquin and Rockafellar coined the notions of prox-boundedness and prox-regularity of functions; see also [50, page 610].
Definition 7.1 (i) A function f : Rⁿ → (−∞, +∞] is prox-bounded if there exists λ > 0 such that e_λf(x) > −∞ for some x ∈ Rⁿ. The supremum of the set of all such λ is the threshold λ_f of prox-boundedness for f.
(ii) A function f : Rⁿ → (−∞, +∞] is prox-regular at x̄ for v̄ if f is finite and locally lsc at x̄ with v̄ ∈ ∂f(x̄), and there exist ε > 0 and ρ ≥ 0 such that

f(x′) ≥ f(x) + ⟨v, x′ − x⟩ − (ρ/2)‖x′ − x‖²  for all ‖x′ − x̄‖ ≤ ε

when v ∈ ∂f(x), ‖v − v̄‖ < ε, ‖x − x̄‖ < ε, f(x) < f(x̄) + ε. When this holds for all v̄ ∈ ∂f(x̄), f is said to be prox-regular at x̄.

We give a simple example to illustrate the concepts of prox-regularity and prox-boundedness of functions.

Example 7.2 (1). The function f : R → R : x ↦ −|x| is prox-bounded with λ_f = +∞. However, f is not prox-regular at x = 0.
(2). The function f : R → R : x ↦ x³ is prox-regular on R. However, f is not prox-bounded.
(3). The function f : R → R : x ↦ −x²/2 is prox-regular on R, and prox-bounded with λ_f = 1.
(4). The function f : R → R : x ↦ x³ − |x| is not prox-regular at x = 0, and not prox-bounded.

In the sequel, we shall also need the following key concepts.

Definition 7.3 ([50, page 614]) Let O be a nonempty open subset of Rⁿ. We say that f : O → R is C^{1+} if f is differentiable with ∇f Lipschitz continuous. Set q : Rⁿ → R : x ↦ ‖x‖²/2.

Definition 7.4 ([50, page 567]) A proper, lsc function f : Rⁿ → (−∞, +∞] is μ-hypoconvex for some μ > 0 if f + μ⁻¹q is convex.
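Example 7.2(3) can be probed numerically (an illustrative brute-force sketch, not the paper's method): for f(x) = −x²/2, one has e_λf(x) = −x²/(2(1 − λ)) for λ < 1, while the infimum defining e_λf is −∞ for λ > 1, reflecting the threshold λ_f = 1:

```python
def env(lam, x, R=50.0, n=100001):
    """Grid approximation of e_λf(x) = inf_w f(w) + (x-w)^2/(2λ), f(w) = -w^2/2."""
    best = float("inf")
    for i in range(n):
        w = -R + 2 * R * i / (n - 1)
        best = min(best, -w * w / 2 + (x - w) ** 2 / (2 * lam))
    return best

# λ = 0.5 < λ_f = 1: matches the closed form -x^2/(2(1-λ))
assert abs(env(0.5, 1.0) - (-1.0)) < 1e-3
# λ = 2 > λ_f: the value keeps decreasing as the grid widens (infimum is -∞)
assert env(2.0, 0.0, R=50.0) < env(2.0, 0.0, R=10.0) < 0
```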
7.1 Fine Properties of Prox-Regular Functions

Two major facts about the Moreau envelopes of prox-bounded functions and prox-regular functions are:

Fact 7.5 ([50, Example 10.32]) Let f : Rⁿ → (−∞, +∞] be proper, lsc, and prox-bounded with threshold λ_f. Then for every λ ∈ (0, λ_f), the function −e_λf is lower-C², hence semidifferentiable, locally Lipschitz, Clarke regular, and

∂[−e_λf](x) = λ⁻¹[conv P_λf(x) − x],  ∅ ≠ ∂[e_λf](x) ⊆ λ⁻¹[x − P_λf(x)].

Fact 7.6 ([13, Proposition 5.3], [50, Proposition 13.37]) Let f : Rⁿ → (−∞, +∞] be lsc, proper, and prox-bounded with threshold λ_f. Suppose that f is prox-regular at x̄ for v̄ ∈ ∂f(x̄). Then for all λ ∈ (0, λ_f) there is a neighborhood U_λ of x̄ + λv̄ on which the following equivalent properties hold:

(i) e_λf is C^{1+} on U_λ.
(ii) P_λf is nonempty, single-valued, monotone, and Lipschitz continuous on U_λ. Further, ∇e_λf = (Id − P_λf)/λ on U_λ.
Proposition 7.7 Let f : Rⁿ → (−∞, +∞] be proper, lsc, and prox-bounded with threshold λ_f. Then for every λ ∈ (0, λ_f), one has dom P_λf = Rⁿ. Consequently, ran(Id + λ∂f) = Rⁿ.

Proof As 0 < λ < λ_f, we have dom P_λf = Rⁿ. To complete the proof, it suffices to apply [50, Example 10.2]: P_λf ⊆ (Id + λ∂f)⁻¹.

Proposition 7.8 (global prox-regularity implies hypoconvexity) Let f : Rⁿ → (−∞, +∞] be proper, lsc, and prox-bounded with threshold λ_f. Suppose that f is prox-regular on Rⁿ. Then for every λ ∈ (0, λ_f), the following hold:

(i) The function f + λ⁻¹q is convex.
(ii) P_λf = (Id + λ∂f)⁻¹ is single-valued and Lipschitz continuous on Rⁿ.
(iii) ∇e_λf = (Id − P_λf)/λ.

Proof When λ ∈ (0, λ_f), we have dom P_λf = Rⁿ. Since f is prox-regular, by Fact 7.6, for v ∈ ∂f(x) there exists an open neighborhood U_λ of x + λv such that P_λf is single-valued and locally Lipschitz. Proposition 7.7 implies that P_λf is single-valued and locally Lipschitz on Rⁿ. As P_λf is always monotone, cf. [50, Proposition 12.19], P_λf is maximally monotone by [50, Example 12.7]. Then (i) and (ii) follow from [50, Proposition 12.19]. To obtain (iii), one can apply Fact 7.6(ii).

Proposition 7.8(i) immediately implies:

Corollary 7.9 Let f : Rⁿ → (−∞, +∞] be proper, lsc, and prox-bounded. Suppose that f is prox-regular on Rⁿ. Then the function f is a difference of two convex functions.

Characterizations of prox-regularity on an open subset are given by

Fact 7.10 ([50, Theorem 10.33], [50, Proposition 13.33]) Let f : O → R, where O is a nonempty open set in Rⁿ. The following are equivalent:

(i) The function f is lower-C² on O.
(ii) Relative to some neighborhood of each point of O, there is an expression f = g − ρq in which g is a finite, convex function and ρ > 0.
(iii) f is prox-regular and locally Lipschitz on O.

Corollary 7.11 Let f : Rⁿ → (−∞, +∞] and let O be a nonempty open subset of Rⁿ. Suppose that f is prox-regular and locally Lipschitz on O. Then for every compact convex subset S of O, there exists ρ > 0 such that f + ρq is convex on S.

Proof Let x ∈ S. By Fact 7.10, there exist an open ball B(x, δ_x) ⊆ O and ρ_x such that f + ρ_x q is convex on B(x, δ_x). Select from the covering of S by the various balls B(x, δ_x) a finite covering, say B(x_i, δ_{x_i}) with i = 1, …, m. Let ρ := max{ρ_{x_1}, …, ρ_{x_m}}. As f + ρq is convex on each B(x_i, δ_{x_i}), and S ⊆ ⋃_i B(x_i, δ_{x_i}), we obtain that f + ρq is convex on S.
7.2 Relationship Among (∂e_λf)⁻¹(0), Fix P_λf, and (∂f)⁻¹(0)

Proposition 7.12 Let f : Rⁿ → (−∞, +∞] be proper, lsc, and prox-bounded with threshold λ_f. Then for every λ ∈ (0, λ_f), the following hold:

(i) For every α ∈ R, the level set lev_α f ≠ ∅ if and only if lev_α(e_λf) ≠ ∅. Moreover, lev_α(e_λf) ⊇ lev_α f.
(ii) 0 ∈ ∂e_λf(x) ⇒ x ∈ P_λf(x) ⇒ 0 ∈ ∂f(x).
(iii) If, in addition, f is prox-regular at x̄ for v̄ ∈ ∂f(x̄), then on a neighborhood U_λ of x̄ + λv̄ one has 0 = ∇e_λf(x) ⇔ x = P_λf(x). When f is prox-regular on Rⁿ, one has

(∀x ∈ Rⁿ)  0 = ∇e_λf(x)  ⇔  x = P_λf(x)  ⇔  0 ∈ ∂f(x).

Proof (i). Since inf f = inf e_λf and argmin f = argmin e_λf by [50, Example 1.46], lev_α f ≠ ∅ if and only if lev_α(e_λf) ≠ ∅ for every α ∈ R. The inclusion follows from e_λf ≤ f.
(ii). By Fact 7.5, we have ∂e_λf(x) ⊆ λ⁻¹[x − P_λf(x)]. This gives the first implication. By [50, Example 10.2], P_λf(x) ⊆ (Id + λ∂f)⁻¹(x) for all x ∈ Rⁿ. The second implication follows.
(iii). By Fact 7.6 or [44, Theorem 4.4], the Moreau envelope e_λf is C^{1+} on a neighborhood U_λ of x̄ + λv̄ with ∇e_λf = λ⁻¹[Id − P_λf] on U_λ. When f is prox-regular on Rⁿ, one has ∇e_λf = λ⁻¹[Id − P_λf], and P_λf = (Id + λ∂f)⁻¹ is single-valued on Rⁿ by Proposition 7.8.
Fact 7.13 ([50, Proposition 12.19]) For a proper, lsc function f : Rⁿ → (−∞, +∞], assume that f is μ-hypoconvex for some μ > 0. Then P_μf = (Id + μ∂f)⁻¹, and for all λ ∈ (0, μ) the mapping P_λf = (Id + λ∂f)⁻¹ is Lipschitz continuous with constant μ/(μ − λ).

Under the assumption that f is μ-hypoconvex for some μ > 0, when λ > 0 is sufficiently small, e_λf gives rise to a smooth regularization of f.

Proposition 7.14 For a proper, lsc function f : Rⁿ → (−∞, +∞], assume that f is μ-hypoconvex for some μ > 0. Then for every λ ∈ (0, μ), the following hold:

(i) e_λf is C^{1+} and ∇e_λf = λ⁻¹(Id − P_λf) on Rⁿ.
(ii) ∇e_λf(x) = 0 ⇔ 0 ∈ ∂f(x).

Proof As f is μ-hypoconvex, f is prox-regular and prox-bounded. By Fact 7.13 and Fact 7.6,

∇e_λf = λ⁻¹[Id − P_λf] = λ⁻¹[Id − (Id + λ∂f)⁻¹].

Remark 7.15 Proposition 7.14(ii) can also be obtained from [33, Theorem 4.4], in which the authors study the Bregman envelope and proximal mapping of proper, lsc, and prox-bounded functions.
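Proposition 7.14(i) can be illustrated on the 1-hypoconvex function f(x) = −x²/2 (here f + q ≡ 0 is convex). With λ = 0.5, the proximal point P_λf(x) = x/(1 − λ) solves w + λ∂f(w) = x in closed form, and λ⁻¹(x − P_λf(x)) agrees with a finite-difference gradient of a numerically computed envelope (a sketch with our choice of test point):

```python
lam = 0.5

def env(x, R=50.0, n=100001):
    # grid approximation of e_λf(x) for f(w) = -w^2/2
    best = float("inf")
    for i in range(n):
        w = -R + 2 * R * i / (n - 1)
        best = min(best, -w * w / 2 + (x - w) ** 2 / (2 * lam))
    return best

x = 0.7
prox = x / (1 - lam)                  # P_λf(x) = (Id + λ∂f)^{-1}(x), closed form
grad_formula = (x - prox) / lam       # ∇e_λf(x) = λ^{-1}(Id - P_λf)(x)
h = 1e-4
grad_fd = (env(x + h) - env(x - h)) / (2 * h)
assert abs(grad_formula - grad_fd) < 1e-2
```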
Proposition 7.16 For a proper, lsc function f : Rⁿ → (−∞, +∞], assume that f := max{f₁, …, f_m} with each f_i being C², and that f is prox-bounded. Then for every λ > 0 sufficiently small, one has

0 = ∇e_λf(x)  ⇔  x = P_λf(x)  ⇔  0 ∈ ∂f(x).

Proof By [50, Proposition 13.33] or [44, Example 2.9], f is prox-regular everywhere on Rⁿ. By Proposition 7.8, P_λf = (Id + λ∂f)⁻¹. By Fact 7.6, we have ∇e_λf = λ⁻¹[Id − P_λf]. It remains to apply Proposition 7.12(iii) and the fact that P_λf = (Id + λ∂f)⁻¹ is single-valued.

For a sequence (C_k)_{k∈ℕ} of subsets of Rⁿ, its limit and outer limit are denoted respectively by lim_{k→∞} C_k and lim sup_{k→∞} C_k; see [50, page 109].

Proposition 7.17 Let f : Rⁿ → (−∞, +∞] be proper, lsc, and prox-bounded with threshold λ_f > 0. Assume that C ⊆ Rⁿ is nonempty and closed. Then for every α ∈ R one has lim_{λ↓0} lev_α(e_λf + ι_C) = lev_α(f + ι_C).

Proof In view of [50, Theorem 7.4(d)], e_λf + ι_C converges epigraphically to f + ι_C as λ ↓ 0. By [50, Proposition 7.7], for every α ∈ R there exists α_λ ↓ α such that lim_{λ↓0} lev_{α_λ}(e_λf + ι_C) = lev_α(f + ι_C). Since lev_α(f + ι_C) ⊆ lev_α(e_λf + ι_C) ⊆ lev_{α_λ}(e_λf + ι_C), we obtain lim_{λ↓0} lev_α(e_λf + ι_C) = lev_α(f + ι_C).
7.3 The Subgradient Projector of e_λf

The following result extends [10, Proposition 3.1(viii)] and [9, Example 4.9(ii)] from convex functions to possibly nonconvex functions.

Theorem 7.18 (subgradient projector of Moreau envelopes of a prox-regular function) Suppose that f : Rⁿ → (−∞, +∞] is proper, lsc, and prox-bounded with threshold λ_f, and that f is prox-regular. Then for every λ ∈ (0, λ_f), the subgradient projector of e_λf is given by

G_{e_λf} : Rⁿ → Rⁿ : x ↦ { x − λ (e_λf(x)/‖x − P_λf(x)‖²)(x − P_λf(x))  if e_λf(x) > 0 and x ≠ P_λf(x);  x  otherwise, }

and Fix G_{e_λf} = lev_0(e_λf) ∪ {x ∈ Rⁿ | x = P_λf(x)}. When x = P_λf(x), we have 0 ∈ ∂f(x). Moreover, lim_{λ↓0} lev_0(e_λf) = lev_0 f.

Proof Apply Propositions 7.12, 7.8, and 7.17 with C = Rⁿ and α = 0.

The restriction of G_f to a subset D ⊆ Rⁿ is denoted by G_f|_D; it is the operator defined by G_f|_D : D → Rⁿ, G_f|_D(x) = G_f(x) for every x ∈ D.

Theorem 7.19 (functions being prox-regular at the critical point) Suppose that f : Rⁿ → (−∞, +∞] is proper, lsc, and prox-bounded with threshold λ_f, and that f is prox-regular at x̄ for 0 ∈ ∂f(x̄). Then for every λ ∈ (0, λ_f), there exists a closed neighborhood U_λ of x̄ for which

(∀x ∈ U_λ)  G_{e_λf}|_{U_λ}(x) = { x − λ (e_λf(x)/‖x − P_λf(x)‖²)(x − P_λf(x))  if e_λf(x) > 0 and x ≠ P_λf(x);  x  otherwise, }   (70)

Fix G_{e_λf}|_{U_λ} = [lev_0(e_λf) ∪ {x ∈ Rⁿ | x = P_λf(x)}] ∩ U_λ,   (71)

and

x = P_λf(x)  ⇒  0 ∈ ∂f(x).   (72)

Moreover, lim sup_{λ↓0} (lev_0(e_λf) ∩ U_λ) ⊆ lev_0 f.

Proof Apply Fact 7.6 and Proposition 7.12(iii) to obtain (70)–(72). Since lev_0(e_λf) ∩ U_λ ⊆ lev_0(e_λf), and

lim_{λ↓0} lev_0(e_λf) = ⋂_{k≥1} lev_0(e_{1/k}f)

by [50, Exercise 4.3(b)], it suffices to use Proposition 7.17 with C = Rⁿ and α = 0.

Theorem 7.20 (subgradient projector of Moreau envelopes of a hypoconvex function) Suppose that f : Rⁿ → (−∞, +∞] is proper and lsc, and that f is μ-hypoconvex for some μ > 0. Then for every λ ∈ (0, μ), the subgradient projector of e_λf is given by

G_{e_λf} : Rⁿ → Rⁿ : x ↦ { x − λ (e_λf(x)/‖x − P_λf(x)‖²)(x − P_λf(x))  if e_λf(x) > 0 and 0 ∉ ∂f(x);  x  otherwise, }

Fix G_{e_λf} = lev_0(e_λf) ∪ {x ∈ Rⁿ | 0 ∈ ∂f(x)}, and {x ∈ Rⁿ | 0 = ∇e_λf(x)} = {x ∈ Rⁿ | 0 ∈ ∂f(x)}. Moreover, lim_{λ↓0} lev_0(e_λf) = lev_0 f.

Proof Apply Propositions 7.14 and 7.17 with C = Rⁿ and α = 0.

Theorems 7.18 and 7.20 imply that if one can solve x_λ ∈ Fix(G_{e_λf}), then either 0 ∈ ∂f(x_λ) for some λ > 0, or the subsequential limits of (x_λ) lie in lev_0 f as λ ↓ 0.

Remark 7.21 Moreau envelopes of nonconvex functions in infinite-dimensional spaces have been intensively studied; see, e.g., [3, 13, 14, 32]. Thus, it is possible to have analogues of Theorems 7.18, 7.19, and 7.20 in infinite-dimensional spaces. However, this is beyond the scope of this paper.

Cutters are important for studying convergence of iterative methods; see, e.g., [7, 11, 20]. It is natural to ask whether G_{e_λf} is a cutter in the case that G_f is a cutter. Although we cannot answer this in general, the following special case is true.

Proposition 7.22 Let f : Rⁿ → (−∞, +∞] be proper, lsc, and prox-regular. Suppose that min f = 0, f is strictly differentiable at every x ∈ argmin f, and 0 ∉ ∂f(x) for every x ∈ Rⁿ \ argmin f. Then for every λ > 0 the following hold:

(i) Fix G_{e_λf} = Fix G_f.
(ii) If G_{f,s} is a cutter for every selection s of ∂f, then G_{e_λf} is a cutter.
Proof As min f > −∞, the function f is prox-bounded with threshold λ_f = +∞.
(i). Note that min f = min e_λf and argmin f = argmin e_λf. The assumption min f = 0 implies lev_0 f = lev_0 e_λf = argmin f. Because f is prox-regular on Rⁿ and λ_f = +∞, for every λ > 0 we have ∇e_λf = λ⁻¹(Id − P_λf) with P_λf = (Id + λ∂f)⁻¹ being single-valued by Proposition 7.8. This gives

{x ∈ Rⁿ | ∇e_λf(x) = 0} = {x ∈ Rⁿ | 0 ∈ ∂f(x)}.

Then

Fix G_{e_λf} = lev_0 e_λf ∪ {x ∈ Rⁿ | ∇e_λf(x) = 0} = lev_0 f ∪ {x ∈ Rⁿ | 0 ∈ ∂f(x)} = Fix G_f.

(ii). Assume that e_λf(x) > 0 and 0 ≠ ∇e_λf(x). Since f is prox-regular, ∇e_λf(x) = λ⁻¹(x − P_λf(x)) ≠ 0. By the definition of P_λf, we have

0 ≠ λ⁻¹(x − P_λf(x)) ∈ ∂f(P_λf(x)),   (73)

which implies that P_λf(x) ∉ argmin f. Indeed, if P_λf(x) ∈ argmin f, the assumption gives ∂f(P_λf(x)) = {0}, which contradicts (73). Thus, f(P_λf(x)) > 0. Because P_λf(x) ∉ argmin f, the assumption also gives 0 ∉ ∂f(P_λf(x)). These arguments, (i), Theorem 5.11, and G_{f,s} being a cutter altogether imply that

f(P_λf(x)) + ⟨λ⁻¹(x − P_λf(x)), u − P_λf(x)⟩ ≤ 0   (74)

if u ∈ Fix G_{f,s}, e_λf(x) > 0, and ∇e_λf(x) ≠ 0. Now we show that G_{e_λf} is a cutter. Let u ∈ Fix G_{e_λf}, e_λf(x) > 0, and ∇e_λf(x) ≠ 0. In view of (74) and (i), we calculate

e_λf(x) + ⟨λ⁻¹(x − P_λf(x)), u − x⟩
= e_λf(x) + ⟨λ⁻¹(x − P_λf(x)), u − P_λf(x)⟩ + ⟨λ⁻¹(x − P_λf(x)), P_λf(x) − x⟩
= f(P_λf(x)) + (1/(2λ))‖x − P_λf(x)‖² + ⟨λ⁻¹(x − P_λf(x)), u − P_λf(x)⟩ − λ⁻¹‖x − P_λf(x)‖²
= f(P_λf(x)) + ⟨λ⁻¹(x − P_λf(x)), u − P_λf(x)⟩ − (1/(2λ))‖x − P_λf(x)‖²
≤ −(1/(2λ))‖x − P_λf(x)‖² ≤ 0.

Theorem 5.11(i) concludes the proof.

A local version of Proposition 7.22 comes as follows.
Proposition 7.23 Let f : Rⁿ → (−∞, +∞] be proper, lsc, and prox-regular at x̄ for v̄ = 0, and let S := {x ∈ Rⁿ | 0 ∈ ∂f(x)} ∪ lev_0 f. Suppose that min f = 0 and there exists δ > 0 such that

(i) For every selection s of ∂f, G_{f,s} is a cutter on B(x̄, δ), i.e.,

(∀x ∈ B(x̄, δ) \ S)(∀u ∈ S ∩ B(x̄, δ))  f(x) + ⟨s(x), u − x⟩ ≤ 0.

(ii) f is strictly differentiable at every u ∈ argmin f ∩ B(x̄, δ), and 0 ∉ ∂f(x) for every x ∈ (Rⁿ \ argmin f) ∩ B(x̄, δ).

Then for every λ > 0 there is a neighborhood of x̄ on which G_{e_λf} is a cutter.

Proof Because min f > −∞, the function f is prox-bounded with threshold λ_f = +∞. Since min f = min e_λf and argmin f = argmin e_λf, the assumption min f = 0 implies lev_0 f = lev_0 e_λf = argmin f. Because f is prox-regular at x̄ for v̄ = 0 and λ_f = +∞, by Fact 7.6 for every λ > 0 there exists δ > δ₁ > 0 such that on B(x̄, δ₁)

P_λf(x̄) = x̄ and P_λf is Lipschitz continuous,   (75)

and

∇e_λf = λ⁻¹(Id − P_λf).   (76)

By (75) there exists δ₁ > δ₂ > 0 such that

P_λf(x) ∈ B(x̄, δ₁) when x ∈ B(x̄, δ₂).   (77)

Claim 1. For every u ∈ Fix G_{f,s} ∩ B(x̄, δ₂) we have

f(P_λf(x)) + ⟨λ⁻¹(x − P_λf(x)), u − P_λf(x)⟩ ≤ 0   (78)

if e_λf(x) > 0, ∇e_λf(x) ≠ 0, and x ∈ B(x̄, δ₂).

Indeed, let e_λf(x) > 0, 0 ≠ ∇e_λf(x), and x ∈ B(x̄, δ₂). In view of (76), ∇e_λf(x) = λ⁻¹(x − P_λf(x)) ≠ 0. By the definition of P_λf or [44, Proposition 4.3(b)], we have

0 ≠ λ⁻¹(x − P_λf(x)) ∈ ∂f(P_λf(x)).   (79)

This implies that P_λf(x) ∉ argmin f. Suppose to the contrary that P_λf(x) ∈ argmin f. Then assumption (ii) and (77) give ∂f(P_λf(x)) = {0}, which contradicts (79). Thus,

f(P_λf(x)) > 0.   (80)

Because P_λf(x) ∉ argmin f and (77) holds, assumption (ii) also ensures

0 ∉ ∂f(P_λf(x)).   (81)

Therefore, (78) follows from assumptions (i) and (ii).

Claim 2. G_{e_λf} is a cutter on B(x̄, δ₂).
To this end, let u ∈ Fix G_{e_λf} ∩ B(x̄, δ₂), x ∈ B(x̄, δ₂), e_λf(x) > 0, and ∇e_λf(x) ≠ 0. Then u ∈ Fix G_f ∩ B(x̄, δ₂), f(P_λf(x)) > 0, and 0 ∉ ∂f(P_λf(x)) by (80) and (81). Using (78) we calculate

e_λf(x) + ⟨λ⁻¹(x − P_λf(x)), u − x⟩
= e_λf(x) + ⟨λ⁻¹(x − P_λf(x)), u − P_λf(x)⟩ + ⟨λ⁻¹(x − P_λf(x)), P_λf(x) − x⟩
= f(P_λf(x)) + (1/(2λ))‖x − P_λf(x)‖² + ⟨λ⁻¹(x − P_λf(x)), u − P_λf(x)⟩ − λ⁻¹‖x − P_λf(x)‖²
= f(P_λf(x)) + ⟨λ⁻¹(x − P_λf(x)), u − P_λf(x)⟩ − (1/(2λ))‖x − P_λf(x)‖²
≤ −(1/(2λ))‖x − P_λf(x)‖² ≤ 0.

Hence, G_{e_λf} is a cutter on B(x̄, δ₂) by Theorem 5.11(ii).

Is it possible that G_{e_λf} is a cutter for every λ > 0 but G_f is not a cutter? This is partially answered by the following result.

Proposition 7.24 Let f : Rⁿ → R be C² and prox-bounded. If G_{e_λf} is a cutter for all sufficiently small λ > 0, then G_f is a cutter.

Proof Let λ > 0 be sufficiently small. Proposition 7.8 yields that e_λf is C^{1+}. Write

S := {x ∈ Rⁿ | 0 = ∇f(x)} ∪ lev_0 f,  S_λ := {x ∈ Rⁿ | ∇e_λf(x) = 0} ∪ lev_0 e_λf.

Using that ∇e_λf(x) = 0 ⇔ 0 ∈ ∂f(x) by Proposition 7.16 and that e_λf ≤ f, we have S ⊆ S_λ. Since G_{e_λf} is a cutter, by Theorem 5.11 we obtain

S_λ ⊆ {u ∈ Rⁿ | e_λf(x) + ⟨∇e_λf(x), u − x⟩ ≤ 0}

whenever e_λf(x) > 0 and ∇e_λf(x) ≠ 0. It follows that

S ⊆ {u ∈ Rⁿ | e_λf(x) + ⟨∇e_λf(x), u − x⟩ ≤ 0}   (82)

whenever e_λf(x) > 0 and ∇e_λf(x) ≠ 0. By [3, Theorem 3.10] or [32, Theorem 5.1],

∇f(x) = lim_{m→∞} ∇e_{λ_m}f(x_m)   (83)

in which x_m → x, e_{λ_m}f(x_m) → f(x), λ_m ↓ 0. Whenever f(x) > 0 and ∇f(x) ≠ 0, (83) implies that for sufficiently large m it holds that e_{λ_m}f(x_m) > 0 and ∇e_{λ_m}f(x_m) ≠ 0. Then by (82),

(∀u ∈ S)  e_{λ_m}f(x_m) + ⟨∇e_{λ_m}f(x_m), u − x_m⟩ ≤ 0.

Passing to the limit as m → ∞, we have

(∀u ∈ S)  f(x) + ⟨∇f(x), u − x⟩ ≤ 0.

Hence, G_f is a cutter by using Theorem 5.11 again.
7.4 The Subgradient Projector of d_C When C is Prox-Regular at a Point

In this subsection, instead of functions we shall consider sets which are prox-regular at some points. Recall that a set C ⊆ Rⁿ is prox-regular at x̄ ∈ C for v̄ ∈ N_C(x̄) when ι_C is prox-regular at x̄ for v̄; see [50, Exercise 13.31].

Example 7.25 Let C ⊆ Rⁿ be closed and x̄ ∈ C. If C is prox-regular at x̄ for v̄ = 0, then there exists a neighborhood U of x̄ on which

(i) P_C is single-valued and Lipschitz;
(ii) P_C = (Id + T)⁻¹ for some localization T of N_C around (x̄, 0);
(iii) d_C is strictly differentiable on U \ C with ∇d_C = (Id − P_C)/d_C;
(iv) G_{d_C} = P_C;
(v) G_{d_C²} = (Id + P_C)/2.

Proof (i), (ii), and (iii) are given in [50, page 618]. To see (iv), let x ∈ U \ C. Since d_C(x) > 0 and ∇d_C(x) = (x − P_C(x))/d_C(x) ≠ 0 with ‖∇d_C(x)‖ = 1, we have

G_{d_C}(x) = x − (d_C(x)/‖∇d_C(x)‖²) ∇d_C(x) = x − (x − P_C(x)) = P_C(x).

When x ∈ U ∩ C, G_{d_C}(x) = x = P_C(x). (v) follows from (iv) and Theorem 3.9.

Remark 7.26 Sets which satisfy the assumption on C in Example 7.25 include convex sets, strongly amenable sets, etc.; see, e.g., [50, page 442]. See also [1] for recent advances on prox-regular sets and uniformly prox-regular sets.

According to Example 7.25(iv), when C is prox-regular at x̄ for v̄ = 0, we have G_{d_C} = P_C on a neighborhood of x̄. What happens if, in addition, G_{d_C} is a cutter or quasi-nonexpansive on the neighborhood?

Proposition 7.27 Let C ⊆ Rⁿ be closed and x̄ ∈ C, and let C be prox-regular at x̄ for v̄ = 0. Suppose that there exists δ > 0 such that one of the following holds:
(i) P_C is a cutter on B(x̄, δ), i.e.,

(∀x ∈ B(x̄, δ))(∀u ∈ C ∩ B(x̄, δ))  ⟨x − P_C(x), u − P_C(x)⟩ ≤ 0.   (84)

(ii) P_C is quasi-ne on B(x̄, δ), i.e.,

(∀x ∈ B(x̄, δ))(∀u ∈ C ∩ B(x̄, δ))  ‖P_C(x) − u‖ ≤ ‖x − u‖.   (85)

Then C ∩ B(x̄, δ) is convex.

Proof By Example 7.25, there exists δ > 0 such that P_C is single-valued and Lipschitz on the closed ball B(x̄, δ).
(i). Assume that (84) holds. On the one hand, (84) gives that

C ∩ B(x̄, δ) ⊆ ⋂_{x ∈ B(x̄, δ)} {u ∈ B(x̄, δ) | ⟨x − P_C(x), u − P_C(x)⟩ ≤ 0}.
On the other hand, let

y ∈ ⋂_{x ∈ B(x̄, δ)} {u ∈ B(x̄, δ) | ⟨x − P_C(x), u − P_C(x)⟩ ≤ 0}.

Then y ∈ B(x̄, δ) and ⟨x − P_C(x), y − P_C(x)⟩ ≤ 0 for every x ∈ B(x̄, δ). Taking x = y, we have ⟨y − P_C(y), y − P_C(y)⟩ ≤ 0, which implies y = P_C(y), so y ∈ C. Therefore, y ∈ B(x̄, δ) ∩ C. Hence

C ∩ B(x̄, δ) = ⋂_{x ∈ B(x̄, δ)} {u ∈ B(x̄, δ) | ⟨x − P_C(x), u − P_C(x)⟩ ≤ 0},

and consequently C ∩ B(x̄, δ) is a convex set.
(ii). Similar arguments as in (i) show that

C ∩ B(x̄, δ) = ⋂_{x ∈ B(x̄, δ)} {u ∈ B(x̄, δ) | ‖P_C(x) − u‖ ≤ ‖x − u‖}.

To finish the proof, it suffices to observe that in the Euclidean space Rⁿ, for every x, y ∈ Rⁿ the set {u ∈ Rⁿ | ‖y − u‖ ≤ ‖x − u‖} is a half space when x ≠ y, and the whole space Rⁿ if x = y.
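A tiny numeric illustration of the local nature of this result (the set C = {−1, 1} is our choice and is not convex): the cutter inequality (84) holds near x̄ = 1, where C ∩ B(x̄, δ) = {1} is convex, but it fails for points far from x̄:

```python
def proj(x):
    # P_C for C = {-1, 1} (single-valued off the midpoint 0)
    return 1.0 if x >= 0 else -1.0

# locally around x̄ = 1 with δ = 0.4: the only point of C ∩ B(x̄, δ) is u = 1
for x in [0.7, 0.9, 1.3]:
    assert (x - proj(x)) * (1.0 - proj(x)) <= 0.0   # (84) holds

# globally (84) fails, e.g. x = -0.4 with u = 1, so P_C is not a cutter on R
x, u = -0.4, 1.0
assert (x - proj(x)) * (u - proj(x)) > 0
```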
Proposition 7.28 Let C ⊆ Rⁿ be closed and x̄ ∈ C. If there exists δ > 0 such that C ∩ B(x̄, δ) is convex, then there exists δ₁ > 0 such that P_C is a cutter on B(x̄, δ₁). Consequently, P_C is quasi-ne on B(x̄, δ₁).

Proof Observe that for p ∈ P_C(x),

‖p − x̄‖ ≤ ‖p − x‖ + ‖x − x̄‖ ≤ 2‖x − x̄‖.

Thus, we can choose 0 < δ₁ < δ sufficiently small, e.g., δ₁ < δ/2, such that ‖x − x̄‖ < δ₁ implies

(∀p ∈ P_C(x))  ‖p − x̄‖ < δ.

This implies that whenever x ∈ B(x̄, δ₁), we have P_C(x) ⊆ C ∩ B(x̄, δ). Then

(∀x ∈ B(x̄, δ₁))  P_C(x) = P_{C∩B(x̄,δ)}(x).   (86)

Because C ∩ B(x̄, δ) is closed and convex, P_{C∩B(x̄,δ)} is firmly nonexpansive on Rⁿ. From (86), we have that P_C is firmly nonexpansive on B(x̄, δ₁), that is,

(∀x, y ∈ B(x̄, δ₁))  ‖P_C(x) − P_C(y)‖² + ‖(Id − P_C)(x) − (Id − P_C)(y)‖² ≤ ‖x − y‖².

It follows that

(∀x ∈ B(x̄, δ₁))(∀y ∈ C ∩ B(x̄, δ₁))  ‖P_C(x) − y‖² + ‖x − P_C(x)‖² ≤ ‖x − y‖².

Hence P_C is a cutter on B(x̄, δ₁).
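Example 7.25(iv), G_{d_C} = P_C, can be checked directly for the convex set C = [−1, 1] ⊂ R (a minimal sketch; the formula ∇d_C = (Id − P_C)/d_C and the definition of the subgradient projector are applied verbatim):

```python
def proj(x):
    return max(-1.0, min(1.0, x))          # P_C for C = [-1, 1]

def dist(x):
    return max(abs(x) - 1.0, 0.0)          # d_C

for x in [-3.0, -1.5, 0.2, 2.5]:
    if dist(x) > 0:
        grad = (x - proj(x)) / dist(x)     # ∇d_C(x), a unit vector
        g = x - dist(x) / grad**2 * grad   # G_{d_C}(x)
    else:
        g = x                              # d_C(x) = 0: G fixes x
    assert abs(g - proj(x)) < 1e-12        # G_{d_C} = P_C
```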
8 Characterization of Subgradient Projectors of Convex Functions

Subgradient projectors of convex functions are quasi-fne, so the algorithms developed in [20] or [6] can be applied; see also Theorem 6.13. Therefore, in practice, it is useful to have results available for deciding whether a mapping is the subgradient projector of a convex function. This is the goal of this section. The results in this section provide some checkable conditions for convergence of the iterated subgradient projectors of Section 6. The following result is of independent interest.
Proposition 8.1 Let C ⊆ Rⁿ be closed and convex. Assume that the function f : Rⁿ \ C → R satisfies

(i) f ≥ 0 on Rⁿ \ C;
(ii) f is convex on every convex subset of Rⁿ \ C;
(iii) whenever x ∈ bdry(Rⁿ \ C), one has lim_{y→x, y∈Rⁿ\C} f(y) = 0; that is, lim_{i→∞} f(y_i) = 0 whenever (y_i)_{i∈ℕ} is a sequence in Rⁿ \ C converging to a boundary point x of Rⁿ \ C.

Define

g : Rⁿ → R : x ↦ { f(x) if x ∉ C;  0 if x ∈ C. }

Then g is convex on Rⁿ.
(87)
We consider three cases. (i). (ii).
If [x, y] ⊆ Rn \ C, g = f is convex on [x, y] by the assumption. If λx + (1 − λ)y ∈ C, then g(λx + (1 − λ)y) = 0 ≤ λg(x) + (1 − λ)g(y)
(iii).
since g(x), g(y) ≥ 0. λx + (1 − λ)y ∈ C and [x, y] ∩ C = ∅. In particular, x, y cannot both be in C. We consider two subcases.
Subcase 1. x ∈ C and y ∈ C. As y ∈ C, there exists z ∈ bdry(C) such that λx + (1 − λ)y ∈ [z, y] ⊆ X \ C
and
f (z) = 0.
Because λx + (1 − λ)y = αz + (1 − α)y
for some 0 ≤ α ≤ 1,
and f is convex on [z, y], we have f (λx + (1 − λ)y) = f (αz + (1 − α)y) ≤ αf (z) + (1 − α)f (y) = (1 − α)f (y). (88) Now z = βx + (1 − β)y
for some 0 ≤ β ≤ 1, and
λx + (1 − λ)y = αz + (1 − α)y = α(βx + (1 − β)y) + (1 − α)y = (αβ)x + (1 − αβ)y give λ = αβ. Therefore, by (88), g(x) = 0 and g(y) = f (y) ≥ 0, g(λx + (1 − λ)y) = f (λx + (1 − λ)y) ≤ (1 − αβ)f (y) = (1 − λ)g(y) + λg(x), which is (87). Subcase 2. x ∈ C and y ∈ C. By the assumption, there exists z ∈ bdry(C) such that λx + (1 − λ)y ∈ [z, y]
or
λx + (1 − λ)y ∈ [x, z],
(89) (90)
say λx + (1 − λ)y ∈ [z, y]. Then λx + (1 − λ)y = αz + (1 − α)y for some 0 ≤ α ≤ 1. As f is convex on [z, y] and f(z) = 0,

g(λx + (1 − λ)y) = f(αz + (1 − α)y) ≤ αf(z) + (1 − α)f(y) = (1 − α)f(y).   (91)

Now z = βx + (1 − β)y for some 0 ≤ β ≤ 1, and

λx + (1 − λ)y = αz + (1 − α)y = α(βx + (1 − β)y) + (1 − α)y = (αβ)x + (1 − αβ)y

gives λ = αβ. Then by (91), using g(x) = f(x) ≥ 0 and g(y) = f(y) ≥ 0, we obtain

g(λx + (1 − λ)y) ≤ (1 − αβ)f(y) = (1 − λ)f(y)   (92)
≤ (1 − λ)f(y) + λf(x)   (93)
= (1 − λ)g(y) + λg(x),   (94)

which is (87). Combining (i)–(iii), we conclude that g is convex on Rⁿ.

Theorem 8.2 Let T : Rⁿ → Rⁿ and let C := {x ∈ Rⁿ | Tx = x} be closed and convex. Then T is a subgradient projector of a convex function f : Rⁿ → R with lev_0 f = C if and only if there exists g : Rⁿ → [−∞, +∞) such that g : Rⁿ \ C → R is locally Lipschitz, g(x) = −∞ for every x ∈ C, and

(i) (x − Tx)/‖x − Tx‖² ∈ ∂g(x) for every x ∈ Rⁿ \ C;
(ii) the function defined by

f(x) := { exp(g(x)) if x ∉ C;  0 if x ∈ C }

is convex.

In this case, T = G_f.

Proof "⇒": Assume that T is a subgradient projector, say T = G_{f₁} with f₁ : Rⁿ → R being convex and lev_0 f₁ = C. Then f = max{0, f₁} is convex and G_f = G_{f₁}. Put g = ln f and C = lev_0 f. Since f is locally Lipschitz, g is locally Lipschitz on Rⁿ \ C. Note that

∂g(x) = ∂f(x)/f(x)

when f(x) > 0. Apply Theorem 4.1(i) to obtain (i).
"⇐": Assume that (i) and (ii) hold. When x ∉ C, (i) and (ii) give

‖x − Tx‖ = 1/‖c(x)‖,  where c(x) ∈ ∂g(x).

Using (i) again, we have

Tx = x − ‖x − Tx‖² c(x) = x − c(x)/‖c(x)‖² = G_f(x)   (95)

by Theorem 4.1(ii). Moreover, when x ∈ C, Tx = x = G_f(x). Hence T = G_f.

For an n × n symmetric matrix A, by A ⪰ 0 we mean that A is positive semidefinite.
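A worked instance of Theorem 8.2 (a sketch; the particular functions are our choices): for T x = x/2 with C = Fix T = {0}, the potential g(x) = 2 ln|x| satisfies (i), since (x − Tx)/‖x − Tx‖² = 2/x = g′(x) for x > 0, and f = exp ∘ g extends to the convex function x², whose subgradient projector is indeed T:

```python
import math

def T(x):
    return x / 2.0                        # candidate operator, Fix T = {0}

def g(x):
    return 2.0 * math.log(abs(x))         # potential with g(x) → -∞ as x → 0

def f(x):
    return math.exp(g(x)) if x != 0 else 0.0   # = x^2, convex, lev_0 f = {0}

for x in [0.5, -2.0, 3.0]:
    assert abs(f(x) - x * x) < 1e-9
    Gf = x - f(x) / (2.0 * x) ** 2 * (2.0 * x)   # subgradient projector of f
    assert abs(Gf - T(x)) < 1e-12
```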
Theorem 8.3 Let T : Rn → Rn and C := {x ∈ Rn : Tx = x}. Suppose that C is closed and convex, and that T is continuously differentiable on Rn \ C. Define
T1 : Rn \ C → Rn : x ↦ (x − Tx)/‖x − Tx‖².
Then T is a subgradient projector of a convex function f : Rn → R with lev0 f = C and differentiable on Rn \ C if and only if
(i) for every x ∈ Rn \ C, the matrix T1(x)(T1(x))* + ∇T1(x) ⪰ 0;
(ii) there exists a function g : Rn → [−∞, +∞) such that
(∀x ∈ Rn \ C) ∇g(x) = T1(x),
(∀x ∈ bdry(C)) lim_{y→x, y∈Rn\C} g(y) = −∞,
and (∀x ∈ C) g(x) = −∞.

Proof “⇒”: Assume that T = Gf with f convex and lev0 f = C. Theorem 4.10 shows that f is continuously differentiable on Rn \ C. By Theorem 4.1(i), we can put g = ln f to obtain (ii). Moreover, as f = exp(g), thanks to (16) in Theorem 4.1, for every x ∈ Rn \ C we have
∇f(x) = e^{g(x)}∇g(x) = e^{g(x)}T1(x),
∇²f(x) = e^{g(x)}T1(x)(T1(x))* + e^{g(x)}∇T1(x) = e^{g(x)}(T1(x)(T1(x))* + ∇T1(x)).
Since f is convex, ∇²f(x) ⪰ 0, and this is equivalent to
T1(x)(T1(x))* + ∇T1(x) ⪰ 0,
which is (i).
“⇐”: Assume that (i) and (ii) hold. Put f = exp(g). Then lev0 f = C, and for x ∈ Rn \ C,
∇f(x) = e^{g(x)}∇g(x) = e^{g(x)}T1(x),
∇²f(x) = e^{g(x)}(T1(x)(T1(x))* + ∇T1(x)).
Thus (i) and (ii) imply that f is differentiable and convex on convex subsets of Rn \ C, and f ≡ 0 on C. By Proposition 8.1, f is convex on Rn. Moreover, when x ≠ Tx we have
Gf(x) = x − (f(x)/‖∇f(x)‖²)∇f(x) = x − T1(x)/‖T1(x)‖²  (96)
= x − ((x − Tx)/‖x − Tx‖²)/(1/‖x − Tx‖)² = x − (x − Tx) = Tx.  (97) □
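To make the two conditions of Theorem 8.3 concrete, here is a small numerical sketch. The data are our own illustrative assumption, not from the paper: f(x) = ‖x‖² on R², so C = lev0 f = {0}, T = Gf = (1/2)Id, and T1(x) = (x − Tx)/‖x − Tx‖² = 2x/‖x‖² with potential g = ln f = 2 ln‖x‖.

```python
import numpy as np

# Assumed illustrative data: f(x) = ||x||^2 on R^2, so C = lev_0 f = {0},
# T = G_f = (1/2) Id, and T1(x) = (x - Tx)/||x - Tx||^2 = 2x/||x||^2.
def T1(x):
    return 2.0 * x / (x @ x)

def grad(fun, x, h=1e-5):
    # central-difference gradient of a scalar function
    return np.array([(fun(x + h * e) - fun(x - h * e)) / (2 * h)
                     for e in np.eye(len(x))])

def jac(F, x, h=1e-5):
    # Jacobian of a vector field; row i is the gradient of the i-th component
    return np.array([grad(lambda y, i=i: F(y)[i], x, h) for i in range(len(x))])

g = lambda y: 2.0 * np.log(np.linalg.norm(y))   # g = ln f

rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.normal(size=2)
    if np.linalg.norm(x) < 0.5:                 # stay away from C = {0}
        continue
    # condition (i): T1(x) T1(x)^* + grad T1(x) is positive semidefinite
    A = np.outer(T1(x), T1(x)) + jac(T1, x)
    assert np.linalg.eigvalsh((A + A.T) / 2).min() > -1e-6
    # condition (ii): g = ln f is a potential for T1
    assert np.allclose(grad(g, x), T1(x), atol=1e-6)
```

Analytically the matrix A above equals 2 Id/‖x‖², so its positive eigenvalues match the convexity of f.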
Corollary 8.4 Let T : R → R and C := {x ∈ R : Tx = x}. Suppose that C is a closed interval, and that T is continuously differentiable on R \ C. Then T is a subgradient projector of a convex function f : R → R with lev0 f = C and differentiable on R \ C if and only if
(i) T is monotonically increasing on convex subsets of R \ C;
(ii) the function
g(x) = ∫_a^x 1/(s − Ts) ds
satisfies lim_{x↓sup(C)} g(x) = −∞ for some a > sup(C), and lim_{x↑inf(C)} g(x) = −∞ for some a < inf(C).
Proof Define n : R → R : x ↦ x − Tx. Then for every x ∈ R \ C, T1(x) = 1/n(x), and Theorem 8.3(i) is equivalent to
1/n²(x) − n′(x)/n²(x) ≥ 0.
This is the same as n′(x) ≤ 1, which translates to T′(x) ≥ 0. □

Remark 8.5 Let f : Rn → R be continuously differentiable, lev0 f = Fix T, and Gf = T. Can one use T to decide whether f is convex? The proof of Theorem 8.3 implies that ∇f = f · T1, where
T1 : Rn \ lev0 f → Rn : x ↦ (x − Tx)/‖x − Tx‖².
If
f · T1 is monotone on convex subsets of Rn \ lev0 f,  (98)
then f is convex on convex subsets of Rn \ lev0 f. Using Proposition 8.1, we conclude that max{0, f} is convex on Rn. When T is continuously differentiable, (98) is equivalent to
(∀x ∈ Rn \ C) T1(x)(T1(x))* + ∇T1(x) ⪰ 0.  (99)
On R, (99) is equivalent to
T is monotonically increasing on convex subsets of R \ C.  (100)
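As a numerical illustration of Corollary 8.4, consider the assumed convex choice f(x) = x², for which T = Gf is T(x) = x − f(x)/f′(x) = x/2 for x ≠ 0 and T(0) = 0; the sketch below checks both conditions of the corollary for this example.

```python
import math

# Assumed example: f(x) = x^2 (convex), C = lev_0 f = {0}, T(x) = x/2.
def T(x):
    return x / 2 if x != 0 else 0.0

# condition (i): T is monotonically increasing on (0, +inf)
xs = [0.1 * k for k in range(1, 50)]
assert all(T(u) < T(v) for u, v in zip(xs, xs[1:]))

# condition (ii): g(x) = int_a^x ds/(s - T(s)) = int_a^x 2/s ds = 2 ln(x/a),
# which tends to -inf as x decreases to sup(C) = 0
a = 1.0
def g(x, n=20000):
    # midpoint-rule approximation of the integral from a to x
    h = (x - a) / n
    return sum(h / (s - T(s))
               for i in range(n)
               for s in [a + (i + 0.5) * h])

for x in (0.5, 0.1, 0.01):
    assert abs(g(x) - 2 * math.log(x / a)) < 1e-2
assert g(0.01) < -9          # already far down on its way to -inf
```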
Corollary 8.6 Let T : R → R and C := {x ∈ R : Tx = x}. Let C be a closed interval, and let T be continuously differentiable on R \ C. Define N : R → R : x ↦ x − Tx. Suppose that
(i) N is nonexpansive;
(ii) the function
g(x) = ∫_a^x 1/(s − Ts) ds
satisfies lim_{x↓sup(C)} g(x) = −∞ for some a > sup(C), and lim_{x↑inf(C)} g(x) = −∞ for some a < inf(C).
Then T is a subgradient projector of a convex function f : R → R with lev0 f = C and differentiable on R \ C. In particular, assumption (i) holds when T is firmly nonexpansive.

Proof It suffices to observe that T = Id − N. Since N is nonexpansive, T is monotone. Also note that T is firmly nonexpansive if and only if N is. □
We illustrate Corollary 8.4 with three examples. They demonstrate that both conditions (i) and (ii) in Corollary 8.4 are needed. More precisely, (i) accounts for the convexity of f, while (ii) accounts for lev0 f = C.

Example 8.7 Define T : R → R by
T(x) := x − √x + √x e^{−2√x} if x > 0, and T(x) := 0 if x ≤ 0.
Then T is a subgradient projector of the nonconvex function
f : R → R : x ↦ e^{2√x} − 1 if x > 0, and x ↦ 0 if x ≤ 0.
In this case, T fails to be monotone, but T verifies condition (ii) of Corollary 8.4.

Proof When x > 0, f′(x) = e^{2√x} x^{−1/2}, so that
f″(x) = e^{2√x}(1 − 1/(2√x))/x.
Since f″(x) < 0 when x < 1/4, f is not convex on R.
(i): T fails to be monotone. This is equivalent to verifying that for some x we have N′(x) > 1, where N(x) = x − Tx. Indeed,
N′(x) = 1/(2√x) + e^{−2√x}(1 − 1/(2√x)) = (e^{2√x} − 1)/(2√x e^{2√x}) + e^{−2√x}.
L'Hospital's rule gives
lim_{x→0+} (e^{2√x} − 1)/(√x e^{2√x}) = 2,
so lim_{x→0+} N′(x) = 1 + 1 = 2 > 1. Therefore, T is not monotone.
(ii): T satisfies condition (ii) of Corollary 8.4. For x > 0,
N(x) = (e^{2√x} − 1)/(e^{2√x} x^{−1/2}).
With a > 0, we have
g(x) = ∫_a^x 1/N(s) ds = ∫_a^x (e^{2√s} s^{−1/2})/(e^{2√s} − 1) ds = ln(e^{2√x} − 1) − ln(e^{2√a} − 1).
Clearly, lim_{x→0+} g(x) = −∞. Hence (ii) holds. □

Example 8.8 Define
T : R → R : x ↦ x − 1/(2x) if x ≠ 0, and x ↦ 0 if x = 0.
Then T = Gf where f : R → R : x ↦ e^{x²}. However, lev0 f = ∅ while Fix(T) = {0}. In this case, condition (i) of Corollary 8.4 holds but condition (ii) fails.
Proof We have N(x) = x − T(x) = 1/(2x) and N′(x) = −1/(2x²). Therefore, T is monotone on (0, +∞) and on (−∞, 0). This says that condition (i) of Corollary 8.4 holds. However, when a > 0, for x > 0 we have
g(x) = ∫_a^x 1/N(s) ds = ∫_a^x 2s ds = x² − a².
Then lim_{x→0+} g(x) = −a², so condition (ii) of Corollary 8.4 fails. □

Example 8.9 Define T : R → R by
x ↦ x − √x if x > 0, x ↦ 0 if x = 0, and x ↦ x − √(−x) if x < 0.
Then T = Gf where the nonconvex function
f : R → R : x ↦ e^{2√x} if x ≥ 0, and x ↦ e^{−2√(−x)} if x < 0.
However, lev0 f = ∅ while Fix T = {0}. In this case, both conditions (i) and (ii) in Corollary 8.4 fail.

Proof The function f(x) = e^{2√x} is nonconvex on [0, +∞); see Example 8.7. That Gf = T follows by direct calculation.
Condition (i) of Corollary 8.4 fails: T is not monotonically increasing on [0, +∞), since T′(x) = 1 − 1/(2√x) < 0 when x > 0 is sufficiently near 0.
Condition (ii) of Corollary 8.4 fails: indeed, N(x) = x − T(x) = √x when x ≥ 0. When a > 0, for x > 0 we have
g(x) = ∫_a^x 1/√s ds = 2√x − 2√a,
so that lim_{x→0+} g(x) = −2√a. □

For further properties of subgradient projectors of convex functions, we refer the reader to [10, 43, 54]. This completes Part II. We will investigate conditions under which a subgradient projector is linear in Part III.

Part III Linear subgradient projectors
9 Characterizations of Gf,s When Gf,s is Linear

We shall see in this section that, under appropriate conditions, a linear operator is a subgradient projector of a convex function if and only if it is a convex combination of the identity operator and a projection operator onto a subspace (Theorems 9.6 and 9.11). For subgradient projectors of convex functions, see [7, 10, 43, 45–47]. For a column vector x ∈ Rn and an n × n matrix U, we denote their transposes by x* and U* respectively. We begin with:
9.1 Linear Cutters are Precisely Linear Firmly Nonexpansive Mappings

Proposition 9.1 Let H be a Hilbert space, and let T : H → H be a linear operator. Then the following are equivalent:
(i) T is a cutter, i.e., quasi-firmly nonexpansive.
(ii) T is firmly nonexpansive.
(iii) There exist δ > 0 and x̄ ∈ Fix T such that T is a cutter on B(x̄, δ), i.e., a local cutter.

Proof (i)⇒(ii): Assume that T is a cutter. Then for every x ∈ H and u ∈ Fix T,
⟨x − Tx, u − Tx⟩ = ⟨Tx − x, Tx − u⟩ ≤ 0.
Put u = 0. We have ⟨Tx − x, Tx − 0⟩ ≤ 0, i.e.,
‖Tx‖² ≤ ⟨x, Tx⟩.
Hence T is firmly nonexpansive; see [6, Corollary 4.3].
(ii)⇒(i): Assume that T is firmly nonexpansive. Let u ∈ Fix T. Then Tu = u and
⟨Tx − x, Tx − u⟩ = ⟨Tx − x, Tx − Tu⟩  (101)
= ⟨Tx − Tu + Tu − x, Tx − Tu⟩  (102)
= ‖Tx − Tu‖² + ⟨Tu − x, Tx − Tu⟩  (103)
= ‖Tx − Tu‖² − ⟨x − u, Tx − Tu⟩ ≤ 0.  (104)
Hence T is a cutter.
(iii)⇒(ii): By the assumption,
(∀x ∈ B(x̄, δ))(∀u ∈ B(x̄, δ) ∩ Fix T) ⟨x − Tx, u − Tx⟩ ≤ 0.
As T x̄ = x̄ and T is linear, for x = x̄ + v with ‖v‖ ≤ δ we have
0 ≥ ⟨x − Tx, x̄ − Tx⟩ = ⟨Tx − x, Tx − x̄⟩ = ⟨T(x − x̄) − (x − x̄), T(x − x̄)⟩ = ‖Tv‖² − ⟨v, Tv⟩.
Since T is linear, it follows that ‖Tx‖² ≤ ⟨x, Tx⟩ for every x ∈ H, so T is firmly nonexpansive; see [6, Corollary 4.3].
Since (i)⇔(ii) and (i) implies (iii), the proof is done. □

The following example shows that Proposition 9.1 fails if T is not linear.

Example 9.2 Define the continuous nonlinear mapping
T : R → R : x ↦
x/2 if −2 ≤ x ≤ 2,
3 − x if 2 ≤ x ≤ 3,
−(3 + x) if −3 ≤ x ≤ −2,
0 otherwise.
Then T is a cutter and nonexpansive, but not firmly nonexpansive, as T is not monotone; cf. [6, Proposition 4.2(iv)].
Indeed, Fix T = {0}. This means that T is a cutter if and only if (T(x))² ≤ xT(x) for every x ∈ R. When −2 ≤ x ≤ 2, we have
(T(x))² = x²/4 ≤ x · (x/2) = xT(x);
when 2 ≤ x ≤ 3,
(T(x))² = (3 − x)² = (3 − x)(3 − x) ≤ x(3 − x);
when −3 ≤ x ≤ −2,
(T(x))² = [−(x + 3)]² = [−(x + 3)][−(x + 3)] ≤ x[−(3 + x)];
when |x| > 3,
(T(x))² = 0 = xT(x).
Hence T is a cutter. Clearly, T is nonexpansive. As T is not monotone, we conclude that T is not firmly nonexpansive.
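The verification above is easy to replicate numerically; the following sketch checks the cutter inequality, nonexpansiveness on sampled pairs, and the failure of monotonicity for the mapping T of Example 9.2.

```python
import random

# T from Example 9.2 (piecewise linear, continuous, Fix T = {0})
def T(x):
    if -2 <= x <= 2:
        return x / 2
    if 2 < x <= 3:
        return 3 - x
    if -3 <= x < -2:
        return -(3 + x)
    return 0.0

xs = [k / 100 for k in range(-500, 501)]
# cutter inequality with respect to the only fixed point 0: T(x)^2 <= x T(x)
assert all(T(x) ** 2 <= x * T(x) + 1e-12 for x in xs)

# nonexpansive: |T(x) - T(y)| <= |x - y| on random pairs
random.seed(0)
pairs = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]
assert all(abs(T(x) - T(y)) <= abs(x - y) + 1e-12 for x, y in pairs)

# not monotone (hence not firmly nonexpansive): T decreases on [2, 3]
assert T(2.2) > T(2.8)
```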
Remark 9.3 Observe that Example 9.2 is much simpler than the example on R2 constructed by Cegielski [20, Example 2.2.8, page 68].
9.2 Subgradient Projector of Powers of a Quadratic Function

It is natural to first investigate subgradient projectors of quadratic functions or their variants. In the following result, we assume B ≠ 0 because B = 0 gives Gf = Id with f ≡ 0.

Theorem 9.4 Let a > 0 and let B ≠ 0 be an n × n symmetric and positive semidefinite matrix. Consider the function f : Rn → R : x ↦ (x*Bx)^{1/(2a)}. Then the following hold:
(i) lev0 f = {x ∈ Rn : Bx = 0}.
(ii) We have
Gf(x) = x − a (x*Bx/‖Bx‖²) Bx if Bx ≠ 0, and Gf(x) = x if Bx = 0.
(iii) Gf is linear if and only if B = λPL, where λ > 0 and L ⊆ Rn is a subspace. In this case, ker B = L⊥,
f(x) = λ^{1/(2a)} (dL⊥(x))^{1/a}, and Gf = Id − aPL = (1 − a)Id + aPL⊥.
(iv) Assume that Gf is linear. Then Gf is a cutter if and only if 0 < a ≤ 1.
Proof (i): Since B is symmetric and positive semidefinite, there exists a matrix A such that B = A*A; see, e.g., [38, page 558]. Then Ax = 0 ⇔ Bx = 0. The result follows because f(x) = ‖Ax‖^{1/a}.
(ii): The formula for Gf follows from direct calculations.
(iii) “⇒”: Assume that Gf is linear. Then the mapping
T1(x) := a^{−1}(x − Gf(x)) = (x*Bx/‖Bx‖²) Bx if Bx ≠ 0, and T1(x) := 0 if Bx = 0,
is linear. Let λ1, λ2 > 0 be any two positive eigenvalues of B. We show that λ1 = λ2. Suppose that λ1 ≠ λ2. Take a unit eigenvector vi associated with λi. Note that ⟨v1, v2⟩ = 0, Bvi ≠ 0 and B(v1 + v2) = λ1v1 + λ2v2 ≠ 0. As T1 is linear, we have T1(v1 + v2) = T1v1 + T1v2. Now
T1(v1 + v2) = ((v1 + v2)*B(v1 + v2)/‖B(v1 + v2)‖²) B(v1 + v2)  (105)
= ((v1 + v2)*(λ1v1 + λ2v2)/‖λ1v1 + λ2v2‖²)(λ1v1 + λ2v2)  (106)
= ((λ1 + λ2)/(λ1² + λ2²))(λ1v1 + λ2v2),  (107)
while
T1v1 + T1v2 = (v1*Bv1/‖Bv1‖²) Bv1 + (v2*Bv2/‖Bv2‖²) Bv2  (108)
= (λ1‖v1‖²/‖λ1v1‖²) λ1v1 + (λ2‖v2‖²/‖λ2v2‖²) λ2v2  (109)
= v1 + v2.  (110)
As v1 and v2 are linearly independent, the above gives λ1 = λ2, which contradicts λ1 ≠ λ2. Therefore, all positive eigenvalues of B have to be equal, say to λ > 0. Hence we have
B = λ U* [Id 0; 0 0] U,
where U is an orthogonal matrix and Id is the m × m identity matrix with m = rank B. The matrix
U* [Id 0; 0 0] U
is idempotent and symmetric, so it is the matrix of an orthogonal projection onto a closed subspace L, that is, PL; see [38, pages 430 and 433]. Hence B = λPL, which implies that Bx = 0 if and only if PLx = 0, i.e., ker B = L⊥. Then, when PLx ≠ 0,
T1(x) = (x*Bx/‖Bx‖²) Bx = (λ x*PLx/‖λPLx‖²) λPLx = (λ² x*PLx/(λ² x*PLx)) PLx = PLx;
when PLx = 0, T1x = 0 = PLx. Hence T1 = PL. It follows that
Gf = Id − aT1 = Id − aPL = (1 − a)Id + a(Id − PL) = (1 − a)Id + aPL⊥.
We proceed to find the expression for f(x):
f(x) = (x*Bx)^{1/(2a)} = (x*λPLx)^{1/(2a)}  (111)
= λ^{1/(2a)} (x*PL PLx)^{1/(2a)} = λ^{1/(2a)} (‖PLx‖²)^{1/(2a)}  (112)
= λ^{1/(2a)} (‖x − PL⊥x‖²)^{1/(2a)} = λ^{1/(2a)} (dL⊥(x)²)^{1/(2a)}  (113)
= λ^{1/(2a)} (dL⊥(x))^{1/a}.  (114)
“⇐”: Assume that B = λPL for λ > 0 and some subspace L ⊆ Rn. The assumption gives
f(x) = λ^{1/(2a)} (dL⊥(x))^{1/a}.
By Proposition 3.4(i), Gf = G_{(dL⊥)^{1/a}}. By Theorem 3.9, Gf = (1 − a)Id + aG_{dL⊥}. By Fact 2.8,
Gf = (1 − a)Id + aPL⊥.
Hence Gf is linear.
(iv) “⇒”: Assume that Gf is linear and a cutter. By Proposition 9.1, Gf is firmly nonexpansive, and so is Id − Gf. By (iii), Id − Gf = aPL, so aPL has to be nonexpansive. Because B ≠ 0, we have L ≠ {0}. Take 0 ≠ x ∈ L. Nonexpansiveness requires ‖aPLx − aPL0‖ = a‖x‖ ≤ ‖x‖, so that a ≤ 1.
“⇐”: Assume that 0 < a ≤ 1. Since x ↦ (x*Bx)^{1/2} is convex, and the function [0, +∞) ∋ t ↦ t^{1/a} is convex and increasing when 0 < a ≤ 1, we have that x ↦ f(x) = ((x*Bx)^{1/2})^{1/a} is convex. Then Gf is a cutter by Fact 5.12. □

We illustrate Theorem 9.4(iv) with the following example.

Example 9.5 Let a > 1. Consider f : Rn → R : x ↦ (x*x)^{1/(2a)} = ‖x‖^{1/a}. Then f is not convex, and Gf(x) = (1 − a)x for every x ∈ Rn. Although Gf is linear, it is not a cutter since it is not monotone; see, e.g., Proposition 9.1.
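The identity Gf = (1 − a)Id + aPL⊥ in Theorem 9.4(iii) is easy to confirm numerically; the sketch below builds a random B = λPL (our assumed data) and compares formula (ii) with the linear map.

```python
import numpy as np

# Assumed data: B = lam * P_L with L a random 2-dimensional subspace of R^4.
rng = np.random.default_rng(1)
n, a, lam = 4, 0.7, 2.5
V = np.linalg.qr(rng.normal(size=(n, n)))[0]   # random orthonormal basis
P_L = V[:, :2] @ V[:, :2].T                    # orthogonal projection onto L
B = lam * P_L

def Gf(x):
    # Theorem 9.4(ii) for f(x) = (x^* B x)^{1/(2a)}
    Bx = B @ x
    if np.allclose(Bx, 0):
        return x
    return x - a * (x @ Bx) / (Bx @ Bx) * Bx

M = (1 - a) * np.eye(n) + a * (np.eye(n) - P_L)   # (1-a) Id + a P_{L-perp}
for _ in range(50):
    x = rng.normal(size=n)
    assert np.allclose(Gf(x), M @ x)
```

Since a = 0.7 ≤ 1 here, this Gf is also a cutter, in line with part (iv).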
9.3 Symmetric and Linear Subgradient Projectors

The following result completely characterizes symmetric and linear subgradient projectors.

Theorem 9.6 Assume that T : Rn → Rn is linear and symmetric. Then the following are equivalent:
(i) T is a subgradient projector of a convex function f : Rn → R with lev0 f ≠ ∅.
(ii) T = Gf where f : Rn → R is given by
f(x) = K(x*PLx)^{1/(2λ)} = K(dL⊥(x))^{1/λ}  (115)
where 0 < λ ≤ 1, K > 0, and L ⊆ Rn is a subspace such that L⊥ = Fix T.
In this case, Gf = (1 − λ)Id + λPL⊥.

Proof (i)⇒(ii): Assume that T = Gf for some convex function f. Since T is linear and a cutter, T is firmly nonexpansive by Proposition 9.1. Then T1 := Id − T is firmly nonexpansive by [6, Proposition 4.2]. We consider two cases.
Case 1: int lev0 f ≠ ∅. We have T1 ≡ 0 on an open ball B(x0, ε) ⊆ lev0 f, i.e., T1(x0 + b) = 0 whenever ‖b‖ < ε. As T1 is linear, T1(b) = T1(x0 + b) − T1(x0) = 0 − 0 = 0 when ‖b‖ < ε, so T1 ≡ 0 on Rn. Thus T = Id on Rn, and T = Gf with f ≡ 0. This means that (ii) holds with L = {0}, λ = 1 and any K > 0.
Case 2: int lev0 f = ∅. Since lev0 f is a proper subspace, it is an intersection of a finite collection of hyperplanes [49, Corollary 1.4.1], so Rn \ lev0 f is a union of a finite collection of open half-spaces. As T1 is continuous, we only need to consider
T1(x) = (f(x)/‖∇f(x)‖²) ∇f(x) when f(x) > 0.
Then
‖T1(x)‖ = f(x)/‖∇f(x)‖ and T1(x)/‖T1(x)‖² = ∇f(x)/f(x).
Since T is symmetric, T1 is symmetric, so there exists an orthogonal matrix Q such that Q*T1Q = D, where D is a diagonal matrix and Q* denotes the transpose of Q. Put g = ln f and x = Qy. When y ∉ Q*(Fix T), we have
(∇g)(Qy) = T1Qy/‖T1Qy‖².
Multiplying both sides by Q* and using that Q* is an isometry (i.e., ‖Q*z‖ = ‖z‖ for every z ∈ Rn) give
Q*(∇g)(Qy) = Q*T1Qy/‖T1Qy‖² = Dy/‖Q*T1Qy‖² = Dy/‖Dy‖².
If we put h = g ∘ Q, then ∇h(y) = Q*(∇g)(Qy) for every y ∈ Rn \ (Q* Fix T), so
(∀y ∈ Rn \ (Q* Fix T)) ∇h(y) = Dy/‖Dy‖².
Moreover, Rn \ (Q* Fix T) is a finite union of open half-spaces, because Q* Fix T is a proper subspace of Rn. Write
D = diag(λ1, λ2, . . . , λn).
When λ1 = · · · = λn = 0, this is covered in Case 1. We thus assume that T1 ≢ 0. As T1 is monotone, we can and do assume that λ1, . . . , λm > 0 and λ_{m+1} = · · · = λn = 0. Then
∇h(y) = (λ1y1/Σ_{k=1}^m λk²yk², . . . , λmym/Σ_{k=1}^m λk²yk², 0, . . . , 0).
Since h has continuous second-order derivatives on the nonempty open set Rn \ (Q* Fix T), it must hold that
∂²h/∂yi∂yj = ∂²h/∂yj∂yi,
which gives
−2λiλj² yiyj/(Σ_{k=1}^m λk²yk²)² = −2λjλi² yiyj/(Σ_{k=1}^m λk²yk²)²  (116)
when 1 ≤ i, j ≤ m and i ≠ j. As int lev0 f = int Fix T = ∅, (116) holds on the nonempty open set Rn \ (Q* Fix T), so we have λi = λj. Because 1 ≤ i, j ≤ m were arbitrary, we obtain λ1 = · · · = λm =: λ. Hence
T1 = Q [λ Idm 0; 0 0] Q* = λ Q [Idm 0; 0 0] Q* = λPL  (117)
where L ⊆ Rn is a linear subspace; see [38, page 430]. More precisely, T1 is a positive multiple of an orthogonal projector with Fix T = ker T1 = L⊥.
Now T1 is firmly nonexpansive and (T1 + T1*)/2 = T1, so
(T1 + T1*)/2 − T1*T1 = T1 − T1² = Q [(λ − λ²)Idm 0; 0 0] Q*
is positive semidefinite, whence 0 ≤ λ ≤ 1. Because T1 ≠ 0 in this case, we obtain 0 < λ ≤ 1. Therefore, when x ∉ Fix T,
∇ ln f(x) = T1x/‖T1x‖² = λPLx/‖λPLx‖² = (1/λ) PLx/‖PLx‖².
Note that PL* = PL, PL² = PL, and
∇ ln‖PLx‖ = (1/‖PLx‖) ∇‖PLx‖ = (1/‖PLx‖) PL* (PLx/‖PLx‖) = PLx/‖PLx‖².
It follows that
∇ ln f(x) = (1/λ) ∇ ln‖PLx‖ = ∇ ln‖PLx‖^{1/λ}.
On each connected and open component of Rn \ Fix T, this is equivalent to
ln f(x) = ln‖PLx‖^{1/λ} + c
for some constant c ∈ R. Taking exp on both sides gives
f(x) = K‖PLx‖^{1/λ} = K(‖PLx‖²)^{1/(2λ)} = K(x*PLx)^{1/(2λ)}  (118)
where K = exp(c) > 0. As PL = Id − PL⊥, we obtain
f(x) = K‖x − PL⊥x‖^{1/λ} = K(dL⊥(x))^{1/λ}.
Moreover,
T = Gf = Id − T1 = Id − λPL = Id − λ(Id − PL⊥) = (1 − λ)Id + λPL⊥,
where L⊥ = Fix T by (117).
One can apply the same argument on each connected and open component of Rn \ Fix T; one might obtain different constants K in (118), but λ will be the same. Indeed, suppose that there exist 0 < λ, λ1 ≤ 1 with λ ≠ λ1 such that
(1 − λ)Id + λP_{Fix T} = (1 − λ1)Id + λ1P_{Fix T}.
Then P_{Fix T} = Id, so that Fix T = Rn, which contradicts int Fix T = ∅. Using the same K > 0 for all connected and open components of Rn \ Fix T, one obtains (115).
(ii)⇒(i): Clear. □

Theorem 9.6 is proved under the assumption that the linear subgradient projector of a convex function is symmetric. We believe that the assumption of symmetry is superfluous; cf. Theorem 11.2.
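A quick numerical sanity check of formula (115), with assumed parameters K, λ and a randomly chosen line L ⊆ R³: the subgradient projector of f = K(dL⊥)^{1/λ}, computed from the gradient definition, agrees with (1 − λ)Id + λPL⊥.

```python
import numpy as np

# Assumed parameters for (115): lam in (0,1], K > 0, L a random line in R^3.
rng = np.random.default_rng(2)
n, lam, K = 3, 0.6, 1.7
q = rng.normal(size=n); q /= np.linalg.norm(q)
P_L = np.outer(q, q)                        # L = span{q}
P_Lperp = np.eye(n) - P_L

def f(x):
    return K * np.linalg.norm(P_L @ x) ** (1 / lam)   # d_{L-perp}(x) = ||P_L x||

def grad(fun, x, h=1e-6):
    return np.array([(fun(x + h * e) - fun(x - h * e)) / (2 * h)
                     for e in np.eye(len(x))])

M = (1 - lam) * np.eye(n) + lam * P_Lperp
for _ in range(30):
    x = rng.normal(size=n)
    if np.linalg.norm(P_L @ x) < 0.1:       # avoid lev_0 f = L_perp
        continue
    gf = grad(f, x)
    assert np.allclose(x - f(x) * gf / (gf @ gf), M @ x, atol=1e-5)
```

Note that the constant K cancels in the quotient, consistent with the appearance of arbitrary K > 0 in (115).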
Conjecture 9.7 If f : Rn → R is convex and its subgradient projector Gf,s is linear, then Gf,s must be symmetric. Note that when f is not convex, Gf,s can be nonsymmetric; see Corollary 11.6(ii).
9.4 Characterization of Linear Subgradient Projectors

In Section 9.3, we assumed that the linear operator is symmetric. What happens if the linear operator is not symmetric? For this purpose we need the following result.

Proposition 9.8 Let M : Rn → Rn be linear and monotone, and suppose that
(∀x ∈ Rn \ ker M) ∇h(x) = Mx/‖Mx‖²
for some function h : Rn \ ker M → R. If dim ran M ≠ 2, then M is symmetric.

Proof If dim ran M = 0, then M = 0, so it is symmetric. Let us assume that dim ran M > 0 and dim ran M ≠ 2. Since h has continuous mixed second-order derivatives at x whenever Mx ≠ 0, the Hessian matrix ∇²h(x) is symmetric. As
∇²h(x) = (‖Mx‖²M − Mx(∇‖Mx‖²)*)/‖Mx‖⁴ = (‖Mx‖²M − 2Mxx*M*M)/‖Mx‖⁴,
the symmetry property means that
‖Mx‖²M − 2Mxx*M*M = (‖Mx‖²M − 2Mxx*M*M)* = ‖Mx‖²M* − 2M*Mxx*M*
whenever Mx ≠ 0. Put y = Mx. The above simplifies to
(M − M*)/2 = (yy*/‖y‖²)M − M*(yy*/‖y‖²).
Denote by Py := yy*/‖y‖² the projection operator onto the line span(y). We have
(∀y ∈ ran M \ {0}) (M − M*)/2 = PyM − M*Py.  (119)
Since M is monotone, ran M = ran M*; see, e.g., [12, Theorem 3.2]. Let {ei : i = 1, . . . , m} be an orthonormal basis of ran M. Then
P_{ran M} = Σ_{i=1}^m ei ei*.  (120)
Note that
M*P_{ran M} = M*P_{ran M*} = M*(P_{ran M*} + P_{(ran M*)⊥}) = M*  (121)
because M*P_{(ran M*)⊥} = 0. To see this, let y ∈ (ran M*)⊥. For every z ∈ Rn,
⟨M*y, z⟩ = ⟨y, Mz⟩ = 0
because Mz ∈ ran M = ran M*. Because z ∈ Rn was arbitrary, we must have M*y = 0.
Since
(M − M*)/2 = P_{ei}M − M*P_{ei}  (122)
by (119), summing up (122) from i = 1 to i = m, followed by using (120) and (121), we obtain
(m/2)(M − M*) = Σ_{i=1}^m (P_{ei}M − M*P_{ei}) = P_{ran M}M − M*P_{ran M} = M − M*,
that is,
(m/2 − 1)(M − M*) = 0.
Hence M − M* = 0 because m ≠ 2, and so M is symmetric. □

That the proof of Proposition 9.8 requires dim ran M ≠ 2 may seem bizarre. However, the following examples show that Proposition 9.8 indeed fails when dim ran M = 2.

Example 9.9 When dim ran M = 2, although M : R2 → R2 is linear, monotone and
(∀x ∈ R2 \ ker M) ∇h(x) = Mx/‖Mx‖²,
one cannot guarantee that M is symmetric. To see this, let x = (x, y)* ∈ R2.
(1) Define
M := [0 −1; 1 0].
Then M is linear, monotone, dim ran M = 2 and
∇ arctan(y/x) = (−y, x)*/(x² + y²) = Mx/‖Mx‖²
whenever x ≠ 0. However, M is not symmetric.
(2) Define
M := [1/2 −1/2; 1/2 1/2].
Then M is linear, firmly nonexpansive and
∇((1/2) ln(x² + y²) + arctan(y/x)) = ((x − y)/(x² + y²), (y + x)/(x² + y²))* = Mx/‖Mx‖²
whenever x ≠ 0. However, dim ran M = 2 and M is not symmetric.

Conjecture 9.10 Let M : Rn → Rn be linear, monotone and
(∀x ∈ Rn \ ker M) ∇h(x) = Mx/‖Mx‖²
where h : Rn \ ker M → R. If dim ran M = 2 and exp(h) is convex on convex subsets of Rn \ ker M, then M is symmetric.

Combining Theorem 9.6 and Proposition 9.8, we obtain the following characterization of linear subgradient projectors.
Theorem 9.11 Assume that T : Rn → Rn is linear and dim ran(Id − T) ≠ 2. Then the following are equivalent:
(i) T is a subgradient projector of a convex function f : Rn → R with lev0 f ≠ ∅.
(ii) T = Gf where f : Rn → R is given by
f(x) := K(x*PLx)^{1/(2λ)} = K(dL⊥(x))^{1/λ}
where 0 < λ ≤ 1, K > 0, and L ⊆ Rn is a subspace such that L⊥ = Fix T.
In this case, Gf = (1 − λ)Id + λPL⊥.

Proof (i)⇒(ii): Assume that T = Gf for some convex function f : Rn → R. Then T is a cutter by Fact 5.12. As T is linear, in view of Proposition 9.1, T is firmly nonexpansive, so M := Id − T is firmly nonexpansive and, in particular, monotone. By Theorem 4.1(i),
∇h(x) = Mx/‖Mx‖²  (123)
where h(x) = ln f(x) and f(x) > 0. Since Fix T = lev0 f = ker M, (123) is equivalent to
∇h(x) = Mx/‖Mx‖² when Mx ≠ 0.
Proposition 9.8 shows that M is symmetric, and so is T = Id − M. It suffices to apply Theorem 9.6 to obtain (ii).
(ii)⇒(i): Clear. □
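The role of the dim ran M = 2 exception is easy to see numerically; this sketch replays Example 9.9(1), where the rotation M is monotone and gradient-generated yet nonsymmetric.

```python
import numpy as np

# Example 9.9(1): M is a quarter-turn rotation; h(x, y) = arctan(y/x).
M = np.array([[0.0, -1.0], [1.0, 0.0]])
assert not np.allclose(M, M.T)                 # M is not symmetric

def h(v):
    return np.arctan(v[1] / v[0])

def grad(fun, v, eps=1e-6):
    return np.array([(fun(v + eps * e) - fun(v - eps * e)) / (2 * eps)
                     for e in np.eye(2)])

rng = np.random.default_rng(3)
for _ in range(50):
    v = rng.normal(size=2)
    if abs(v[0]) < 1e-2:                       # stay off the line x = 0
        continue
    Mv = M @ v
    assert abs(Mv @ v) < 1e-12                 # monotone: <Mv, v> = 0 >= 0
    assert np.allclose(grad(h, v), Mv / (Mv @ Mv), atol=1e-5)
```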
10 Subgradient Projectors of Convex Functions are Not Closed Under Convex Combinations and Compositions

A convex combination of cutters is a cutter; see [20, Corollary 2.1.49] or [6, Proposition 4.34]. Convex combinations of a finite family of cutters with a common fixed point are effectively used in simultaneous cutter methods; see [20, Section 5.8] and [6, Corollary 5.18]. A question that naturally arises is whether the set of subgradient projectors of convex functions is convex. Theorem 9.6 allows us to show that the answer is negative. While Theorem 10.1 works only in R2, Theorem 10.3 works in Rn with n ≥ 2.

Theorem 10.1 In R2, a convex combination of subgradient projectors of convex functions need not be a subgradient projector of a convex function.

Proof Let L := {0} × R ⊆ R2 and M := {x = (x1, x2) ∈ R2 : x1 + x2 = 0}. Both L and M are proper linear subspaces of R2. Define f, g : R2 → R by
(∀x ∈ R2) f(x) := K1(dL⊥(x))^{1/λ1}, g(x) := K2(dM⊥(x))^{1/λ2}  (124)
where 0 < λ1 ≠ λ2 < 1 and K1, K2 > 0. By Theorem 9.6, we have
Gf = (1 − λ1)Id + λ1PL⊥ and Gg = (1 − λ2)Id + λ2PM⊥.  (125)
Now consider λ3Gf + (1 − λ3)Gg where 0 < λ3 < 1. Then
λ3Gf + (1 − λ3)Gg = λ3((1 − λ1)Id + λ1PL⊥) + (1 − λ3)((1 − λ2)Id + λ2PM⊥)  (126)
= (1 − λ2 + λ2λ3 − λ1λ3)Id + λ1λ3PL⊥ + λ2(1 − λ3)PM⊥.  (127)
We show by contradiction that λ3Gf + (1 − λ3)Gg is not a subgradient projector of a convex function. Suppose that it is. By Theorem 9.6, there are 0 < λ < 1 and a subspace S of R2 such that
λ3Gf + (1 − λ3)Gg = (1 − λ)Id + λPS⊥.  (129)
Therefore, we have
(1 − λ2 + λ2λ3 − λ1λ3)Id + λ1λ3PL⊥ + λ2(1 − λ3)PM⊥ = (1 − λ)Id + λPS⊥.  (130)
Naturally, the fixed point sets of both sides are equal; thus
Fix((1 − λ)Id + λPS⊥) = Fix((1 − λ2 + λ2λ3 − λ1λ3)Id + λ1λ3PL⊥ + λ2(1 − λ3)PM⊥).  (131)
By [6, Proposition 4.34], we have
Fix((1 − λ2 + λ2λ3 − λ1λ3)Id + λ1λ3PL⊥ + λ2(1 − λ3)PM⊥) = L⊥ ∩ M⊥.  (133)
Also,
Fix((1 − λ)Id + λPS⊥) = S⊥.  (134)
Hence, using the definitions of L and M together with (131), (133) and (134), it follows that
{(0, 0)} = L⊥ ∩ M⊥ = S⊥.
Therefore S⊥ = {(0, 0)}, which implies S = R2. In terms of matrices, we have
PL⊥ = [1 0; 0 0], PM⊥ = [1/2 1/2; 1/2 1/2], and PS⊥ = [0 0; 0 0].  (139)
In particular, PL⊥ and PS⊥ are diagonal matrices, but PM⊥ is not. Hence (130) cannot hold. Therefore, λ3Gf + (1 − λ3)Gg is not a subgradient projector of a convex function. □

Our next result needs averaged mappings.

Definition 10.2 (See [4] and [6, Definition 4.23]) Let λ ∈ (0, 1). An operator T : Rn → Rn is λ-averaged if there exists a nonexpansive operator N : Rn → Rn such that T = (1 − λ)Id + λN.

Theorem 10.3 Let n ≥ 2, 0 < λ1 < 1, 0 < λ2 < 1, 0 < λ < 1. Suppose that L and M are proper linear subspaces of Rn satisfying L = M⊥ and M = L⊥. Define f : Rn → R : x ↦ (dL⊥(x))^{1/λ1} and g : Rn → R : x ↦ (dM⊥(x))^{1/λ2}. If
(1 − λ)/λ ≠ λ2/λ1,
then (1 − λ)Gf + λGg is not a subgradient projector of a convex function.
Proof By Theorem 9.6, we have
Gf = (1 − λ1)Id + λ1PL⊥ and Gg = (1 − λ2)Id + λ2PM⊥.  (140)
Then
(1 − λ)Gf + λGg = (1 − λ)((1 − λ1)Id + λ1PL⊥) + λ((1 − λ2)Id + λ2PM⊥)  (141)
= ((1 − λ)(1 − λ1) + λ(1 − λ2))Id + λ1(1 − λ)PL⊥ + λλ2PM⊥  (142)
= βId + (1 − β)(γPL⊥ + (1 − γ)PM⊥),  (144)
where β := (1 − λ)(1 − λ1) + λ(1 − λ2) and γ := λ1(1 − λ)/(1 − (1 − λ)(1 − λ1) − λ(1 − λ2)). We observe that 0 < β < 1 and γ ≠ 1/2. Indeed,
0 < β = (1 − λ)(1 − λ1) + λ(1 − λ2) < (1 − λ) + λ = 1.  (145)
Also,
γ = 1/2  (146)
⟺ λ1(1 − λ)/(1 − (1 − λ)(1 − λ1) − λ(1 − λ2)) = λλ2/(1 − (1 − λ)(1 − λ1) − λ(1 − λ2))  (147)
⟺ λ1(1 − λ) = λλ2  (148)
⟺ (1 − λ)/λ = λ2/λ1.  (149)
By assumption, (1 − λ)/λ ≠ λ2/λ1, so γ ≠ 1/2.
Since Gf and Gg are linear and symmetric, so is (1 − λ)Gf + λGg. We show by contradiction that (1 − λ)Gf + λGg is not a subgradient projector of a convex function. If (1 − λ)Gf + λGg is a subgradient projector of a convex function, then by Theorem 9.6 we have
(1 − λ)Gf + λGg = (1 − α)Id + αPS⊥  (150)
where 0 < α < 1 and S is a subspace of Rn. Note that Gf and Gg are averaged mappings, and so is (1 − λ)Gf + λGg. Because
Fix Gf = L⊥ and Fix Gg = M⊥,  (151)
by [6, Proposition 4.34] we obtain
Fix((1 − λ)Gf + λGg) = Fix Gf ∩ Fix Gg = L⊥ ∩ M⊥ = {0}.  (152)
Because Fix((1 − α)Id + αPS⊥) = S⊥, using (150) and (152) we obtain S⊥ = {0}. Therefore, in view of (150), we have
(1 − λ)Gf + λGg = (1 − α)Id.  (153)
Combining (144) and (153) gives
βId + (1 − β)(γPL⊥ + (1 − γ)PM⊥) = (1 − α)Id.  (154)
We proceed to analyze α and β. Take x ∈ M \ {0}, which is possible since M ≠ {0}. Then PM⊥x = 0 and PL⊥x = PMx = x, so (154) gives
βx + (1 − β)γx = (1 − α)x,  (155)
which implies
β + (1 − β)γ = 1 − α.  (156)
Take x ∈ L \ {0}, which is possible since L ≠ {0}. Then PL⊥x = 0 and PM⊥x = PLx = x, so (154) gives
βx + (1 − β)(1 − γ)x = (1 − α)x,  (157)
which implies
β + (1 − β)(1 − γ) = 1 − α.  (158)
Subtracting (156) from (158), we have
(1 − β)(1 − 2γ) = 0,  (159)
which implies β = 1 or γ = 1/2.
If two nearest point projectors onto subspaces commute, then their composition is the projection onto the intersection of the subspaces; see [27, Lemma 9.2]. One referee asks whether there is an analogue when two linear subgradient projectors commute. The answer is negative. To this end, we need an auxiliary result. Lemma 10.4 Let L, M ⊆ Rn be two subspaces, and λi ∈ [0, 1) with i = 1, 2. Then the following are equivalent: (i) λ1 Id +(1−λ1)PL⊥ λ2 Id+(1−λ2)PM ⊥ = λ2 Id+(1−λ2)PM ⊥ λ1 Id+(1−λ1)PL⊥ . (ii) PL⊥ PM ⊥ = PM ⊥ PL⊥ . (iii) PL PM = PM PL . Proof (i)⇔(ii): This follows from λ1 Id +(1 − λ1 )PL⊥ λ2 Id +(1 − λ2 )PM ⊥ = λ1 λ2 Id +λ1 (1 − λ2 )PM ⊥ +(1−λ1 )λ2 PL⊥ +(1−λ1 )(1−λ2 )PL⊥ PM ⊥ , and
λ2 Id +(1 − λ2 )PM ⊥
λ1 Id +(1 − λ1 )PL⊥
(160) (161) (162)
= λ1 λ2 Id +λ2 (1 − λ1 )PL⊥ + (1 − λ2 )λ1 PM ⊥ + (1 − λ1 )(1 − λ2 )PM ⊥ PL⊥ . (163) (ii)⇔(iii): Since PL⊥ = Id −PL , PM ⊥ = Id −PM , (ii) is equivalent to (Id −PL )(Id −PM ) = (Id −PM )(Id −PL ), which is (iii) after simplifications. Theorem 10.5 In R2 , even though two linear subgradient projectors of convex functions commute, its composition need not be a subgradient projector of a convex function. Proof Let 0 < λ1 < λ2 < 1. Because 0 0 1 0 = P{0}×R , = PR×{0} , and 0 1 0 0 by Theorem 9.6, there exist two convex functions f, g : R2 → R such that 1 0 1 0 1 0 Gf = λ1 = + (1 − λ1 ) , and 0 0 0 λ1 0 1 λ2 0 0 0 1 0 = . + (1 − λ2 ) Gg = λ2 0 1 0 1 0 1 These two subgradient projectors are commutative by Lemma 10.4 or a direct calculation: λ2 0 G f Gg = G g Gf = . 0 λ1
We claim that T := GfGg is not a subgradient projector of a convex function. We prove this by contradiction. Suppose that T is a subgradient projector. Since T is symmetric and linear, by Theorem 9.6 there exists 0 ≤ λ ≤ 1 such that
T = λId + (1 − λ)P  (164)
where P is a projector onto a subspace of R2. We consider five cases.
Case 1: λ = 0. This gives P = T. Because P is a projector, its eigenvalues are 0 or 1. This is impossible, since 0 < λ1, λ2 < 1.
Case 2: λ = 1. This gives T = Id. This is impossible, since λ1 ≠ λ2.
Cases 1 and 2 imply that 0 < λ < 1. This gives
P = [(λ2 − λ)/(1 − λ) 0; 0 (λ1 − λ)/(1 − λ)].
Case 3: λ > λ1. Then λ1 − λ < 0. This is impossible, since the eigenvalues of P have to be nonnegative.
Case 4: λ = λ1. Since λ2 − λ > 0 and P has eigenvalues 0 or 1 only, we must have (λ2 − λ)/(1 − λ) = 1. It follows that λ2 = 1, which is impossible.
Case 5: 0 < λ < λ1. Then (λ1 − λ)/(1 − λ) > 0 and (λ2 − λ)/(1 − λ) > 0. Since P has eigenvalues 0 or 1 only, we must have
(λ2 − λ)/(1 − λ) = (λ1 − λ)/(1 − λ) = 1,
from which λ1 = λ2. This is impossible.
Altogether, (164) does not hold. Using Theorem 9.6 again, we conclude that T is not a subgradient projector of a convex function. □
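The matrix computation behind Theorem 10.5 can be replayed directly. The sketch below, with assumed values λ1 = 0.3 and λ2 = 0.8, confirms that the commuting product diag(λ2, λ1) admits no representation λId + (1 − λ)P with P an orthogonal projection, even though each factor does.

```python
import numpy as np

l1, l2 = 0.3, 0.8                 # assumed 0 < l1 < l2 < 1
Gf = np.diag([1.0, l1])
Gg = np.diag([l2, 1.0])
T = Gf @ Gg
assert np.allclose(Gf @ Gg, Gg @ Gf)          # the two projectors commute
assert np.allclose(T, np.diag([l2, l1]))

def representable(T, grid=2001):
    # search lam in [0, 1): P = (T - lam*Id)/(1 - lam) must be an orthogonal
    # projection, i.e., have eigenvalues in {0, 1}
    for lam in np.linspace(0.0, 1.0 - 1e-9, grid):
        P = (T - lam * np.eye(2)) / (1.0 - lam)
        ev = np.linalg.eigvalsh((P + P.T) / 2)
        if all(min(abs(e), abs(e - 1.0)) < 1e-6 for e in ev):
            return True
    return False

assert representable(Gf) and representable(Gg)   # each factor has form (164)
assert not representable(T)                      # ... but the product does not
```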
11 A Complete Analysis of Linear Subgradient Projectors on R2

In this section we turn our attention to linear operators on R2. One nice feature is that we are able not only to characterize when a linear operator is a subgradient projector but also to give explicit formulae for the corresponding functions.
Is every linear mapping from R2 to R2 a subgradient projector of an essentially strictly differentiable function (convex or nonconvex) on R2? The answer is no, by Theorem 11.2 below. Theorem 11.2(iii) also shows that Theorem 9.11 fails if the assumption dim ran(Id − T) ≠ 2 is removed. We start with a simple result about essentially strictly differentiable functions; see Definition 4.3.
Lemma 11.1 Let O ⊆ Rn be a nonempty open set and let f : O → R be an essentially strictly differentiable function. If there exists a continuous selection s : O → Rn with s(x) ∈ ∂f(x) for every x ∈ O, then f is strictly differentiable on O. Consequently, f is continuously differentiable on O.

Proof By [15, Theorem 2.4, Corollary 4.2], f has a minimal Clarke subdifferential ∂c f, and ∂c f can be recovered from every dense selection of ∂c f. Since s(x) ∈ ∂f(x) ⊆ ∂c f(x) and s is continuous on O, we have ∂c f(x) = ∂f(x) = {s(x)} for every x ∈ O, which implies that f is strictly differentiable at x; see, e.g., [50, page 362, Theorem 9.18] or [39, Theorem 3.54]. Hence f is strictly differentiable on O. □

We consider the linear operator T : R2 → R2 defined by
T := [1 − a, −b; −c, 1 − d]  (165)
where
a² + b² + c² + d² ≠ 0 (i.e., (a, b, c, d) ≠ (0, 0, 0, 0)).  (166)
Note that when a = b = c = d = 0, we have T = Id = Gf with f ≡ 0.

Theorem 11.2 Let T be given by (165). Then T is a subgradient projector of an essentially strictly differentiable function on R2 \ Fix T if and only if one of the following holds:
(i) a = b = c = 0, d ≠ 0: T = Gf where f(x1, x2) := K|x2|^{1/d} for some K > 0; or b = c = d = 0, a ≠ 0: T = Gf where f(x1, x2) := K|x1|^{1/a} for some K > 0.
(ii) a ≠ 0, d ≠ 0, b = c ≠ 0, ad = c²: T = Gf where f(x1, x2) := K|ax1 + cx2|^{a/(a²+c²)} for some K > 0.
(iii) a = d, b = −c, and a² + c² ≠ 0: T = Gf where
f(x1, x2) := K(x1² + x2²)^{a/(2(a²+c²))} exp(−(c/(a²+c²)) arctan(x1/x2)) if x2 ≠ 0,
f(x1, x2) := 0 if (x1, x2) = (0, 0),
f(x1, x2) := K|x1|^{a/(a²+c²)} exp(−(|c|/(a²+c²)) π/2) if x1 ≠ 0, x2 = 0,  (167)
for some K > 0, and f is lsc. In particular, when c ≠ 0, f is not convex.

Proof Observe that (166) implies that Fix T is a proper subspace of R2. Assume that T is a subgradient projector. By Theorem 4.1 and Lemma 11.1, we can find a differentiable function g : R2 \ Fix T → R such that
(∀x ∈ R2 \ Fix T) (x − Tx)/‖x − Tx‖² = ∇g(x).
Because
x − Tx = [a, b; c, d](x1, x2)* = (ax1 + bx2, cx1 + dx2)*,
we have
∂g/∂x1 = (ax1 + bx2)/((ax1 + bx2)² + (cx1 + dx2)²),
∂g/∂x2 = (cx1 + dx2)/((ax1 + bx2)² + (cx1 + dx2)²).
Since
\[
\frac{\partial^2}{\partial x_1 \partial x_2} g(x_1, x_2) = -\frac{(a^2 b - bc^2 + 2acd)x_1^2 + (b^3 + bd^2)x_2^2 + 2(ab^2 + ad^2)x_1 x_2}{\left((ax_1 + bx_2)^2 + (cx_1 + dx_2)^2\right)^2},
\]
\[
\frac{\partial^2}{\partial x_2 \partial x_1} g(x_1, x_2) = -\frac{(a^2 c + c^3)x_1^2 + (cd^2 - b^2 c + 2abd)x_2^2 + 2(c^2 d + a^2 d)x_1 x_2}{\left((ax_1 + bx_2)^2 + (cx_1 + dx_2)^2\right)^2}
\]
on the nonempty open set $\mathbb{R}^2 \setminus \operatorname{Fix} T$, we have
\[
\frac{\partial^2}{\partial x_1 \partial x_2} g(x_1, x_2) = \frac{\partial^2}{\partial x_2 \partial x_1} g(x_1, x_2).
\]
This leads to
\[
\begin{cases} a^2 b - bc^2 + 2acd = a^2 c + c^3, & (1)\\ b^3 + bd^2 = cd^2 - b^2 c + 2abd, & (2)\\ ab^2 + ad^2 = c^2 d + a^2 d. & (3) \end{cases}
\]
Now multiplying (2) by $a$ and subtracting (3) multiplied by $b$ gives $(ad - bc)(ab + cd) = 0$. It suffices to consider two cases:
• Case $ad = bc$. Then (1) implies $(b - c)(a^2 + c^2) = 0$. Observe that $b - c \neq 0$ is impossible by (2). Then the following two subcases can happen:

i. $b = c = 0$. Then (3) $\Rightarrow ad(a - d) = 0$. This means
\[
a = b = c = 0,\ d \neq 0, \tag{168}
\]
or
\[
b = c = d = 0,\ a \neq 0, \tag{169}
\]
or $a = d \neq 0$, which is subsumed by the case $b = -c$ below.

ii. $b = c \neq 0$, which implies $a \neq 0$, $d \neq 0$, and $ad = c^2$.

• Case $ab + cd = 0$. Then (1) implies $(b + c)(a^2 + c^2) = 0$. When $b = -c = 0$, (3) gives $ad(a - d) = 0$, which leads to (168), (169), or $a = d \neq 0$. It remains to consider the case $b = -c \neq 0$. Then (2) and (3) imply $a = d$. Moreover, we can and do assume $a^2 + c^2 \neq 0$, since $a = c = 0$ gives (168) by (2).
In summary, we only have the following three cases.

Case 1. $a = b = c = 0$, $d \neq 0$. Then we get
\[
g(x_1, x_2) = \frac{\ln |x_2|}{d} + C_1, \quad \text{if } x_2 \neq 0.
\]
Or $b = c = d = 0$, $a \neq 0$. Then we get
\[
g(x_1, x_2) = \frac{\ln |x_1|}{a} + C_1, \quad \text{if } x_1 \neq 0.
\]

Case 2. $a \neq 0$, $d \neq 0$, $b = c \neq 0$, $ad = c^2$. Then we get
\[
g(x_1, x_2) = \frac{a}{a^2 + c^2} \ln |ax_1 + cx_2| + C_2, \quad \text{if } ax_1 + cx_2 \neq 0.
\]

Case 3. $a = d$, $b = -c$, and $a^2 + c^2 \neq 0$. Then we get
\[
g(x_1, x_2) = \frac{a}{2(a^2 + c^2)} \ln\left(x_1^2 + x_2^2\right) - \frac{c}{a^2 + c^2} \arctan\frac{x_1}{x_2} + C_3, \quad \text{if } x_2 \neq 0.
\]
Since $g = \ln f$, we obtain $f = \exp(g)$ in Case 1 and Case 2. For Case 3, we obtain
\[
f(x_1, x_2) = K(x_1^2 + x_2^2)^{\frac{a}{2(a^2+c^2)}} \exp\left(-\frac{c}{a^2+c^2} \arctan\frac{x_1}{x_2}\right) \quad \text{if } x_2 \neq 0,
\]
for some $K > 0$. However, when $c \neq 0$, $f$ is not continuous at $(\bar{x}_1, 0)$ with $\bar{x}_1 \neq 0$, since
\[
\lim_{x_1 \to \bar{x}_1,\, x_2 \downarrow 0} \arctan\frac{x_1}{x_2} = -\lim_{x_1 \to \bar{x}_1,\, x_2 \uparrow 0} \arctan\frac{x_1}{x_2} = \pm\frac{\pi}{2}.
\]
The function given by (167) is lsc but not continuous at every such $(\bar{x}_1, 0)$. Moreover, $f$ is not convex on $\mathbb{R}^2$, since a finite-valued convex function on a finite-dimensional space is continuous; see, e.g., [6, Corollary 8.31].

It is interesting to ask for which selection $s \in \partial f$ we have $G_f = T$ on $\mathbb{R}^2$. On $\mathbb{R}^2 \setminus \{(x_1, x_2) \mid x_2 = 0\}$, one clearly chooses $s = \nabla f$. It remains to determine the subgradient of $f$ at $(\bar{x}_1, 0)$. Indeed, when $x_2 \neq 0$, $f(x_1, x_2) = \exp(g(x_1, x_2))$, so that $\nabla f(x_1, x_2) = f(x_1, x_2)\nabla g(x_1, x_2)$, i.e.,
\[
\nabla f(x_1, x_2) = K(x_1^2 + x_2^2)^{\frac{a}{2(a^2+c^2)}} \exp\left(-\frac{c}{a^2+c^2} \arctan\frac{x_1}{x_2}\right) \frac{(ax_1 - cx_2,\ ax_2 + cx_1)}{(a^2+c^2)(x_1^2 + x_2^2)}.
\]
When $(x_1, x_2) \to (\bar{x}_1, 0)$ with $cx_1/x_2 > 0$, we have
\[
f(x_1, x_2) \to K|\bar{x}_1|^{\frac{a}{a^2+c^2}} \exp\left(-\frac{|c|}{a^2+c^2} \cdot \frac{\pi}{2}\right) = f(\bar{x}_1, 0)
\]
and
\[
\nabla f(x_1, x_2) \to K|\bar{x}_1|^{\frac{a}{a^2+c^2}} \exp\left(-\frac{|c|}{a^2+c^2} \cdot \frac{\pi}{2}\right) \frac{1}{a^2+c^2} \left(\frac{a}{\bar{x}_1}, \frac{c}{\bar{x}_1}\right).
\]
Therefore, by the definition of limiting subdifferentials (see Definition 2.1),
\[
K|\bar{x}_1|^{\frac{a}{a^2+c^2}} \exp\left(-\frac{|c|}{a^2+c^2} \cdot \frac{\pi}{2}\right) \frac{1}{a^2+c^2} \left(\frac{a}{\bar{x}_1}, \frac{c}{\bar{x}_1}\right) \in \partial f(\bar{x}_1, 0). \tag{170}
\]
Hence, we can choose $s(\bar{x}_1, 0)$ to be the limiting subgradient given by (170).

Remark 11.3 Note that $\partial f(\bar{x}_1, 0)$ is not a singleton when $\bar{x}_1 \neq 0$ and $c \neq 0$ in Theorem 11.2(iii). Thus, in Theorem 11.2(iii), we only have $T \in G_f$ when $c \neq 0$. In order to make $f$ continuous on $\mathbb{R}^2$, we need $c = 0$, in which case (167) reduces to $f(x_1, x_2) = K\|(x_1, x_2)\|^{1/a}$ and $G_f = (1 - a)\operatorname{Id}$. Clearly, $f$ is not convex when $a > 1$. This has been discussed in Example 2.5.

Corollary 11.4 Let $T$ be given by (165). Suppose that one of the following holds:

(i) $|b| \neq |c|$.
(ii) $b = c = 0$, $a \neq 0$, $d \neq 0$, and $a \neq d$.

Then there exists no essentially strictly differentiable function $f : \mathbb{R}^2 \to \mathbb{R}$ such that $T = G_f$.

Corollary 11.5 Let $T$ be given by (165). Suppose that $b = c = 0$, $0 < a < 1$, $0 < d < 1$, and $a \neq d$. Then $T$ is firmly nonexpansive, and there exists no essentially strictly differentiable function $f : \mathbb{R}^2 \to \mathbb{R}$ such that $T = G_f$.
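The closed form in Theorem 11.2(ii) can be sanity-checked numerically: pick parameters with $b = c \neq 0$ and $ad = c^2$, form $f(x_1, x_2) = K|ax_1 + cx_2|^{a/(a^2+c^2)}$, and confirm that the induced projector reproduces $T$ at a point where $ax_1 + cx_2 \neq 0$. The sketch below uses a finite-difference gradient and our own helper names; it is an illustration of the formula, not part of the paper's argument.

```python
import numpy as np

# Parameters following the pattern of Theorem 11.2(ii): b = c != 0, a*d = c^2.
a, c = 2.0, 2.0
b, d = c, c**2 / a
K = 1.7                                   # any K > 0 works
T = np.array([[1 - a, -b], [-c, 1 - d]])  # the operator in (165)

f = lambda x: K * abs(a * x[0] + c * x[1]) ** (a / (a**2 + c**2))

def num_grad(f, x, h=1e-6):
    """Central finite-difference gradient (illustration only)."""
    x = np.asarray(x, dtype=float)
    return np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])

def Gf(x):
    """Subgradient projector of f at a point with f(x) > 0, via num_grad."""
    x = np.asarray(x, dtype=float)
    g = num_grad(f, x)
    return x - f(x) / np.dot(g, g) * g

x = np.array([0.3, -1.1])                 # a point with a*x1 + c*x2 != 0
print(np.allclose(Gf(x), T @ x, atol=1e-4))  # → True
```

Note that the agreement is independent of the chosen $K > 0$, in line with the "for some $K > 0$" phrasing of the theorem.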
Corollary 11.6
(i) The skew linear mapping
\[
T := \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}
\]
is not firmly nonexpansive, so not a cutter. However, $T$ is a subgradient projector of a nonconvex, discontinuous but lsc function $f_1$ given by
\[
f_1(x, y) := \begin{cases} (x^2 + y^2)^{1/4} \exp\left(-(1/2) \arctan(x/y)\right) & \text{if } y \neq 0,\\ 0 & \text{if } (x, y) = (0,0),\\ |x|^{1/2} \exp(-\pi/4) & \text{if } x \neq 0,\ y = 0. \end{cases} \tag{171}
\]
(ii) The linear mapping
\[
T := \begin{pmatrix} 1/2 & 1/2 \\ -1/2 & 1/2 \end{pmatrix}
\]
is firmly nonexpansive and a cutter. However, $T$ is a subgradient projector of a nonconvex, discontinuous but lsc function $f_2$ given by
\[
f_2(x, y) := \begin{cases} (x^2 + y^2)^{1/2} \exp\left(-\arctan(x/y)\right) & \text{if } y \neq 0,\\ 0 & \text{if } (x, y) = (0,0),\\ |x| \exp(-\pi/2) & \text{if } x \neq 0,\ y = 0. \end{cases} \tag{172}
\]
Note that $f_2 = f_1^2$ in Corollary 11.6 and $G_{f_2} = (\operatorname{Id} + G_{f_1})/2$. See Figs. 1 and 2 for plots of $f_1$ and $f_2$.

Remark 11.7 Corollary 11.5 and Corollary 11.6 together show that although the set of cutters and the set of subgradient projectors have a nonempty intersection, they are different because neither one contains the other. By Theorem 4.5, there exists no continuous convex function $f$ such that $G_f = T$ in either case of Corollary 11.6. Corollary 11.6 says that $T = G_f$ being linear and firmly nonexpansive does not imply that $f$ is convex. A key point below is that if $T = G_f$ is linear and $f$ is convex on $\mathbb{R}^2$, then Theorem 11.2 implies that $T$ has to be firmly nonexpansive and symmetric.

Fig. 1 Plot of the function given by (171)
Fig. 2 Plot of the function given by (172)
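The claims of Corollary 11.6 can likewise be checked numerically away from the line $y = 0$, where $f_1$ is smooth: the projector induced by $f_1$ reproduces the skew mapping in (i), and since $f_2 = f_1^2$, squaring halves the displacement, giving $G_{f_2} = (\operatorname{Id} + G_{f_1})/2$. A sketch under those smoothness assumptions (helper names are ours):

```python
import numpy as np

# f1 from (171) on the region y != 0; T1 is the skew mapping of Corollary 11.6(i).
f1 = lambda x: (x[0]**2 + x[1]**2) ** 0.25 * np.exp(-0.5 * np.arctan(x[0] / x[1]))
T1 = np.array([[0.0, 1.0], [-1.0, 0.0]])
T2 = (np.eye(2) + T1) / 2                 # the matrix of Corollary 11.6(ii)

def num_grad(f, x, h=1e-6):
    """Central finite-difference gradient (illustration only)."""
    x = np.asarray(x, dtype=float)
    return np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])

def Gf(f, x):
    """Subgradient projector of f at a point with f(x) > 0, via num_grad."""
    x = np.asarray(x, dtype=float)
    g = num_grad(f, x)
    return x - f(x) / np.dot(g, g) * g

x = np.array([0.8, 0.6])                  # y != 0, away from the discontinuity line
print(np.allclose(Gf(f1, x), T1 @ x, atol=1e-4))   # → True

f2 = lambda x: f1(x) ** 2                 # squaring halves the displacement
print(np.allclose(Gf(f2, x), T2 @ x, atol=1e-4))   # → True
```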
Corollary 11.8 Let $T$ be given by (165). Then $T$ is a subgradient projector of a convex function if and only if one of the following holds:

(i) $a = b = c = 0$, $0 < d \leq 1$: $T = G_f$ where $f(x_1, x_2) = K|x_2|^{1/d}$ for some $K > 0$; or $b = c = d = 0$, $0 < a \leq 1$: $T = G_f$ where $f(x_1, x_2) = K|x_1|^{1/a}$ for some $K > 0$.
(ii) $a \neq 0$, $d \neq 0$, $b = c \neq 0$, $ad = c^2$, $a \geq a^2 + c^2$: $T = G_f$ where $f(x_1, x_2) = K|ax_1 + cx_2|^{a/(a^2+c^2)}$ for some $K > 0$.
(iii) $a = d$, $b = c = 0$, $0 < a \leq 1$: $T = G_f$ where $f(x_1, x_2) = K(x_1^2 + x_2^2)^{\frac{1}{2a}}$ for some $K > 0$.
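For the convex case (iii), the computation is fully explicit: with $a = d$, $b = c = 0$, the function $f(x_1, x_2) = K(x_1^2 + x_2^2)^{1/(2a)}$ yields $G_f = (1 - a)\operatorname{Id}$ for any $K > 0$. A small sketch with a closed-form gradient (variable names ours, $a = 1/2$ chosen as an example):

```python
import numpy as np

# Corollary 11.8(iii) pattern: a = d, b = c = 0, f(x) = K * ||x||^(1/a).
a = 0.5
K = 3.0
f = lambda x: K * (x[0]**2 + x[1]**2) ** (1 / (2 * a))
grad = lambda x: (K / a) * (x[0]**2 + x[1]**2) ** (1 / (2 * a) - 1) * np.asarray(x, dtype=float)

def Gf(x):
    """Subgradient projector of f at a point with f(x) > 0."""
    x = np.asarray(x, dtype=float)
    g = grad(x)
    return x - f(x) / np.dot(g, g) * g

x = np.array([2.0, -4.0])
print(Gf(x))      # → [ 1. -2.]  — i.e., (1 - a) * x, independently of K
```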
Acknowledgments The authors thank two anonymous referees for careful reading and constructive suggestions on the paper. HHB was partially supported by a Discovery Grant of the Natural Sciences and Engineering Research Council of Canada (NSERC) and by the Canada Research Chair Program. CW was partially supported by the National Natural Science Foundation of China (11401372). XW was partially supported by a Discovery Grant of the Natural Sciences and Engineering Research Council of Canada (NSERC). JX was supported by the NSERC grants of HHB and XW.
References

1. Adly, S., Nacry, F., Thibault, L.: Preservation of prox-regularity of sets with applications to constrained optimization. SIAM J. Optim. 26(1), 448–473 (2016)
2. Aussel, D., Daniilidis, A., Thibault, L.: Subsmooth sets: functional characterizations and related concepts. Trans. Amer. Math. Soc. 357(4), 1275–1301 (2005)
3. Bačák, M., Borwein, J.M., Eberhard, A., Mordukhovich, B.S.: Infimal convolutions and Lipschitzian properties of subdifferentials for prox-regular functions in Hilbert spaces. J. Convex Anal. 17(3-4), 737–763 (2010)
4. Baillon, J.B., Bruck, R.E., Reich, S.: On the asymptotic behavior of nonexpansive mappings and semigroups in Banach spaces. Houston J. Math. 4, 1–9 (1978)
5. Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Bregman monotone optimization algorithms. SIAM J. Control Optim. 42, 596–636 (2003)
6. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer (2011)
7. Bauschke, H.H., Combettes, P.L.: A weak-to-strong convergence principle for Fejér-monotone methods in Hilbert spaces. Math. Oper. Res. 26, 248–264 (2001)
8. Bauschke, H.H., Combettes, P.L., Noll, D.: Joint minimization with alternating Bregman proximity operators. Pac. J. Optim. 2(3), 401–424 (2006)
9. Bauschke, H.H., Chen, J., Wang, X.: A Bregman projection method for approximating fixed points of quasi-Bregman nonexpansive mappings. Appl. Anal. 94, 75–84 (2015)
10. Bauschke, H.H., Wang, C., Wang, X., Xu, J.: On subgradient projectors. SIAM J. Optim. 25(2), 1064–1082 (2015)
11. Bauschke, H.H., Wang, C., Wang, X., Xu, J.: On the finite convergence of a projected cutter method. J. Optim. Theory Appl. 165(3), 901–916 (2015)
12. Bauschke, H.H., Wang, X., Yao, L.: Monotone linear relations: maximality and Fitzpatrick functions. J. Convex Anal. 16, 673–686 (2009)
13. Bernard, F., Thibault, L.: Prox-regular functions in Hilbert spaces. J. Math. Anal. Appl. 303(1), 1–14 (2005)
14. Bernard, F., Thibault, L.: Prox-regularity of functions and sets in Banach spaces. Set-Valued Anal. 12(1-2), 25–47 (2004)
15. Borwein, J.M., Moors, W.B.: Essentially smooth Lipschitz functions. J. Funct. Anal. 149(2), 305–351 (1997)
16. Borwein, J.M., Moors, W.B., Wang, X.: Generalized subdifferentials: a Baire categorical approach. Trans. Amer. Math. Soc. 353(10), 3875–3893 (2001)
17. Borwein, J.M., Lewis, A.S.: Convex Analysis and Nonlinear Optimization: Theory and Examples, 2nd edn. Springer, New York (2006)
18. Borwein, J.M., Zhu, Q.J.: Techniques of Variational Analysis. Springer-Verlag, New York (2005)
19. Capricelli, T.D.: Algorithmes de Projections Convexes Généralisées et Applications en Imagerie Médicale. Ph.D. dissertation, University Pierre & Marie Curie (Paris 6), Paris (2008)
20. Cegielski, A.: Iterative Methods for Fixed Point Problems in Hilbert Spaces. Lecture Notes in Mathematics 2057. Springer, Heidelberg (2012)
21. Censor, Y., Chen, W., Pajoohesh, H.: Finite convergence of a subgradient projections method with expanding controls. Appl. Math. Optim. 64(2), 273–285 (2011)
22. Censor, Y., Lent, A.: Cyclic subgradient projections. Math. Programming 24(1), 233–235 (1982)
23. Clarke, F.H.: Optimization and Nonsmooth Analysis, 2nd edn. Classics in Applied Mathematics 5. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA (1990)
24. Combettes, P.L.: Convex set theoretic image recovery by extrapolated iterations of parallel subgradient projections. IEEE Trans. Image Process. 6(4), 493–506 (1997)
25. Combettes, P.L., Luo, J.: An adaptive level set method for nondifferentiable constrained image recovery. IEEE Trans. Image Process. 11(11), 1295–1304 (2002)
26. Daniilidis, A., Georgiev, P.: Approximate convexity and submonotonicity. J. Math. Anal. Appl. 291(1), 292–301 (2004)
27. Deutsch, F.: Best Approximation in Inner Product Spaces. Springer (2001)
28. Evans, L.C., Gariepy, R.F.: Measure Theory and Fine Properties of Functions. Studies in Advanced Mathematics. CRC Press, Boca Raton, FL (1992)
29. Fukushima, M.: A finitely convergent algorithm for convex inequalities. IEEE Trans. Automat. Control 27(5), 1126–1127 (1982)
30. Hesse, R., Luke, D.R.: Nonconvex notions of regularity and convergence of fundamental algorithms for feasibility problems. SIAM J. Optim. 23(4), 2397–2419 (2013)
31. Ioffe, A.D.: Approximate subdifferentials and applications, I: the finite-dimensional theory. Trans. Amer. Math. Soc. 281(1), 389–416 (1984)
32. Jourani, A., Thibault, L., Zagrodny, D.: Differential properties of the Moreau envelope. J. Funct. Anal. 266(3), 1185–1237 (2014)
33. Kan, C., Song, W.: The Moreau envelope function and proximal mapping in the sense of the Bregman distance. Nonlinear Anal. 75(3), 1385–1399 (2012)
34. Kiwiel, K.C.: The efficiency of subgradient projection methods for convex optimization. I. General level methods. SIAM J. Control Optim. 34(2), 660–676 (1996)
35. Kiwiel, K.C.: The efficiency of subgradient projection methods for convex optimization. II. Implementations and extensions. SIAM J. Control Optim. 34(2), 677–697 (1996)
36. Kiwiel, K.C.: A Bregman-projected subgradient method for convex constrained nondifferentiable minimization. In: Operations Research Proceedings 1996 (Braunschweig), pp. 26–30. Springer, Berlin (1997)
37. Luke, D.R., Thao, N.H., Tam, M.K.: Quantitative convergence analysis of iterated expansive, set-valued mappings. arXiv:1605.05725 (2016)
38. Meyer, C.: Matrix Analysis and Applied Linear Algebra. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA (2000)
39. Mordukhovich, B.S.: Variational Analysis and Generalized Differentiation I. Springer-Verlag (2006)
40. Mordukhovich, B.S., Shao, Y.: Nonsmooth sequential analysis in Asplund spaces. Trans. Amer. Math. Soc. 348, 1235–1280 (1996)
41. Ogura, N., Yamada, I.: A deep outer approximating half space of the level set of certain quadratic functions. J. Nonlinear Convex Anal. 6(1), 187–201 (2005)
42. Pang, C.H.: Finitely convergent algorithm for nonconvex inequality problems. arXiv:1405.7280 (2014)
43. Pauwels, B.: Subgradient projection operators. arXiv:1403.7237v1 (2014)
44. Poliquin, R.A., Rockafellar, R.T.: Prox-regular functions in variational analysis. Trans. Amer. Math. Soc. 348(5), 1805–1838 (1996)
45. Polyak, B.T.: Minimization of unsmooth functionals. USSR Comput. Math. Math. Phys. 9, 14–29 (1969). (The original version appeared in Akademija Nauk SSSR. Žurnal Vyčislitel'noĭ Matematiki i Matematičeskoĭ Fiziki 9 (1969), 509–521.)
46. Polyak, B.T.: Introduction to Optimization. Optimization Software (1987)
47. Polyak, B.T.: Random algorithms for solving convex inequalities. In: Butnariu, D., Censor, Y., Reich, S. (eds.) Inherently Parallel Algorithms in Feasibility and Optimization and Their Applications, pp. 409–422. Elsevier (2001)
48. Richardson, L.F.: Measure and Integration: A Concise Introduction to Real Analysis. Wiley (2009)
49. Rockafellar, R.T.: Convex Analysis. Princeton University Press (1970)
50. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, corrected 3rd printing (2009)
51. Spingarn, J.E.: Submonotone subdifferentials of Lipschitz functions. Trans. Amer. Math. Soc. 264(1), 77–89 (1981)
52. Van Ngai, H., Luc, D.T., Théra, M.: Approximate convex functions. J. Nonlinear Convex Anal. 1(2), 155–176 (2000)
53. Wang, X.: Subdifferentiability of real functions. Real Anal. Exchange 30(1), 137–171 (2004/05)
54. Xu, J.: Subgradient Projectors: Theory, Extensions, and Algorithms. Ph.D. dissertation, University of British Columbia, Kelowna (2016)
55. Yamada, I., Ogura, N.: Adaptive projected subgradient method for asymptotic minimization of sequence of nonnegative convex functions. Numer. Funct. Anal. Optim. 25(7-8), 593–617 (2004)