Multidim Syst Sign Process https://doi.org/10.1007/s11045-018-0554-8
Sparsity and incoherence in orthogonal matching pursuit Yi Shen1 · Ruifang Hu2
Received: 27 August 2017 / Revised: 9 January 2018 / Accepted: 29 January 2018 © Springer Science+Business Media, LLC, part of Springer Nature 2018
Abstract Recovery of sparse signals via approximation methods has been extensively studied in recent years. We consider the nonuniform recovery of orthogonal matching pursuit (OMP) from fewer noisy random measurements. Rows of sensing matrices are assumed to be drawn independently from a probability distribution obeying the isotropy property and the incoherence property. Our models not only include the standard sensing matrices in the compressed sensing context, but also cover other new sensing matrices such as random convolutions and subsampled tight or continuous frames. Given m admissible random measurements of a fixed s-sparse signal x ∈ R^n, we show that OMP can recover the support of x exactly after s iterations with overwhelming probability provided that m = O(s(s + log(n − s))). It follows that the approximation order of OMP is ‖x − x_j‖ = O(η^j), where 0 < η < 1 and x_j denotes the recovered signal at the j-th iteration. As a byproduct of the proof, the necessary number of measurements to ensure sparse recovery by l1-minimization with random partial circulant or Toeplitz matrices is proved to be optimal.

Keywords Sparsity · Orthogonal matching pursuit · Isotropy property · Incoherence property · Support recovery
Ruifang Hu (corresponding author)
[email protected]

Yi Shen
[email protected]

1 Department of Mathematics, Zhejiang Sci–Tech University, Hangzhou 310028, China

2 Nanhu College, Jiaxing University, Jiaxing 314001, China
1 Introduction

An n-dimensional signal x which has at most s nonzero entries at completely unknown locations is called s-sparse. The general theory of compressed sensing shows that sparse signals can be recovered from far fewer measurements than the ambient dimension (Candès and Tao 2005; Donoho 2006). The proposed linear model is formulated as

y = Ax
(1.1)
where A is a given m × n matrix and y is a given vector. The goal is to recover x from the observations y and the sensing matrix A. If the number of measurements m is less than n, then the linear system (1.1) is underdetermined. Instead of applying least squares directly, we solve the linear system via Orthogonal Matching Pursuit (OMP). The major advantages of OMP are its ease of implementation and its speed. The central problem considered in this paper is: how many measurements are necessary to reconstruct sparse signals via OMP? A further problem is: what is the approximation order of OMP if a true index can be selected at each iteration? This paper answers these two questions under the assumption that the sensing matrices are generated from a probability distribution.

Accurate recovery of the support of a signal is crucial in compressed sensing. Once the support of the signal is known, the signal itself can be recovered by solving a least squares problem. Theoretical analysis of OMP for sparse support recovery has concentrated primarily on two types. One is uniform recovery, i.e., the support of every sparse signal can be recovered under some conditions. Uniform recovery through the restricted isometry property or the mutual coherence property can be found in Cai and Wang (2011), Shen and Li (2015), Wen et al. (2017), Tropp (2004), Tropp and Gilbert (2007), Mo (2015), Mo and Shen (2012), Xu (2012), Zhang (2011) and many references therein. Mutual coherence based conditions are considered in Kunis and Rauhut (2008) and Tropp and Gilbert (2007). A matrix A is said to satisfy the restricted isometry property (RIP) of order s with restricted isometry constant δ_s (Candès and Tao 2006) if δ_s is the smallest constant such that

(1 − δ_s)‖x‖² ≤ ‖Ax‖² ≤ (1 + δ_s)‖x‖²
(1.2)
holds for all s-sparse signals x. It is an NP-hard problem to verify whether a given matrix satisfies the RIP with a given restricted isometry constant. Many types of random measurement matrices are proved to satisfy the restricted isometry property with high probability. For example, Gaussian or subgaussian random matrices have RIP constant δ_s with overwhelming probability provided m = O(s log(n/s)), and partial random Fourier matrices have RIP constant δ_s with overwhelming probability provided m = O(s log⁴(n/s)) (Candès and Tao 2006). Sharp conditions on the RIP that guarantee exact recovery of sparse signals by l1-minimization have been given in Cai and Zhang (2014) and Zhang (2011). If the restricted isometry constant satisfies δ_{s+1} < 1/√(s+1), then OMP is guaranteed to exactly recover every s-sparse signal x in s iterations; this condition is sharp (Mo 2015). In Zhang (2011), the author shows that OMP can recover s-sparse signals whenever the measurement matrix satisfies the RIP of order cs for some constant c, and that OMP is also stable in l2 under measurement noise. This work is extended to general Hilbert spaces in Cohen et al. (2017). An alternative is to consider nonuniform recovery, i.e., sufficient conditions for a given signal. Nonuniform support recovery via OMP with Gaussian random measurements was first studied in Tropp (2004). Lin and Li (2013) then extended the results in Tropp (2004) to the noisy case. For the random partial Fourier transform, Kunis and Rauhut (2008) proved that the first iteration of OMP chooses a correct index provided O(s log n)
measurements. Following this line, Lin and Li (2013) proved that, under a condition on the minimum magnitude of the nonzero coordinates, OMP can find all the correct nonzero entries of a given s-sparse signal with overwhelming probability provided that m = O(s(s + log(n − s))).
1.1 A general setting

In this paper, sensing vectors are assumed to be generated independently at random from a probability distribution. This model includes all standard sensing matrices, such as random convolution matrices, the random partial discrete Fourier transform, the random partial noiselet transform, and so on. All we require from the probability distribution is that the measurement matrices obey the isotropy property and the incoherence property. We write a_k ∼ F, k = 1, . . . , m, if the vectors a_k are independently sampled from a population F. For a given sparse vector x, the observation vector ỹ = (ỹ_1, ỹ_2, . . . , ỹ_m) is created from a series of inner products

ỹ_k = ⟨a_k, x⟩, k = 1, . . . , m. (1.3)

Dividing both sides of (1.3) by √m, we rewrite the statistical model as (1.1), where y = (1/√m) ỹ and

A = (1/√m) (a_1, a_2, . . . , a_m)^T, (1.4)

that is, the rows of A are the vectors a_k/√m. The normalization (1.4) implies that the columns of A are approximately unit-normed. The rows of A are said to be normalized independent samples from F. Our work is motivated by the analysis of nonuniform recovery via convex minimization programs studied in Candès and Romberg (2007) and Candès and Tao (2005). The isotropy property and the incoherence property defined in Candès and Plan (2011) are given below.

• Isotropy property: A population F is said to obey the isotropy property if E aa^* = I, a ∼ F.
• Incoherence property: The coherence parameter μ(F) is the smallest number such that, for a = (a_1, a_2, . . . , a_n) ∼ F,

max_{1≤t≤n} |a_t|² ≤ μ(F)
holds either deterministically or stochastically. The smaller μ(F), i.e., the more incoherent the sensing vectors, the fewer samples are needed for accurate recovery. The incoherence and isotropy properties together guarantee that sparse vectors lie away from the nullspace of the sensing matrix. The role of the isotropy condition is to keep the measurement matrix from being rank deficient when sufficiently many measurements are taken. If F is the Hadamard matrix or the discrete Fourier transform matrix, then μ(F) = 1. Since digital images are usually sparse in the wavelet domain, we must design a measurement system which is incoherent with the wavelet representation. In this case, the
matrix F is decomposed as a product of a sparsity basis Ψ and an orthogonal measurement system A. It follows that the parameter μ(F) is given by

μ(F) = max_{i,j} |⟨φ_i, ψ_j⟩| / (‖φ_i‖ ‖ψ_j‖), (1.5)

where the φ_i^* are rows of A and the ψ_j^* are rows of Ψ. The parameter μ(F) defined in (1.5) is equal to the mutual coherence constant which measures the correlation between two bases (Donoho 2006). The cluster coherence modifies the definition of mutual coherence to take into account clustering of the coefficients arising from the geometry of the situation (Donoho and Kutyniok 2013; King et al. 2011, 2012). The inpainting problem considered in King et al. (2011, 2012) can be formulated as a minimization problem in which a partial wavelet transform or a partial shearlet transform is used. The cluster coherence then plays a key role in the estimation of asymptotic error bounds and makes precise the superior behavior of directional representation systems over wavelets. Another typical example is when Ψ is an orthonormal system of Haar wavelets and A is the orthogonal noiselet system with the normalization A^* A = nI. Noiselets, introduced in Coifman et al. (2001), are perfectly incoherent with the Haar wavelet orthogonal basis, i.e., μ(F) = 1. Noiselets have two additional advantages which make them ideal for sparse coding and decoding: one is their multiscale structure, and the other is their binary structure, which makes noiselets easy to implement in an actual acquisition system. Measurement systems which have low coherence with bases corresponding to given biorthogonal bases are constructed in Pereira et al. (2014). Many other incoherent sensing matrices, including random convolutions and subsampled tight or continuous frames, can be found in Candès and Plan (2011).
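The coherence in (1.5) is straightforward to evaluate numerically. The following sketch (our own illustration, not code from the paper; the basis choices are assumptions) computes it for the spike basis against the orthonormal discrete Fourier basis, for which the coherence equals 1/√n, the smallest possible value.

```python
import numpy as np

def mutual_coherence(Phi, Psi):
    """Coherence (1.5): max over i, j of |<phi_i, psi_j>| / (||phi_i|| ||psi_j||),
    where the rows of Phi and Psi are the two systems."""
    G = Phi @ Psi.conj().T                           # all pairwise inner products
    norms = np.outer(np.linalg.norm(Phi, axis=1),
                     np.linalg.norm(Psi, axis=1))
    return np.max(np.abs(G) / norms)

n = 64
spikes = np.eye(n)                                   # canonical (spike) basis
fourier = np.fft.fft(np.eye(n)) / np.sqrt(n)         # orthonormal DFT basis

mu = mutual_coherence(spikes, fourier)
print(mu, 1 / np.sqrt(n))                            # spikes vs Fourier: mu = 1/sqrt(n)
```

A basis compared with itself gives the maximal coherence 1, while the spike/Fourier pair is maximally incoherent.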
1.2 Random convolutions

In compressed sensing, various measurement matrices have been investigated, most of them random. Among these are Bernoulli and Gaussian matrices as well as partial Fourier matrices. In many applications, the use of an i.i.d. Gaussian random matrix A is either impossible or overly expensive. Compared to Bernoulli or Gaussian matrices, random Toeplitz and circulant matrices have the advantage that they require fewer random numbers to be generated. More importantly, there are fast matrix-vector multiplication routines which can be exploited in recovery algorithms. The use of random convolution matrices in compressed sensing was first studied in Bajwa et al. (2007). The work in Haupt et al. (2010) improves the sufficient condition in Bajwa et al. (2007) and proves that certain random convolution matrices satisfy the RIP condition with high probability provided

m = O(s² log n).
(1.6)
Therefore, the work in Bajwa et al. (2007) and Haupt et al. (2010) provides uniform recovery guarantees for l1-minimization when the number of measurements satisfies (1.6). Note that the estimated number of measurements m grows with the square of the sparsity in (1.6). Instead, Rauhut (2009) studies nonuniform recovery via convex optimization and reduces the number of measurements to

m = O(s log³ n). (1.7)

However, log-factors remain in (1.7). This motivates us to investigate the structure of random convolution matrices in a later section. Our comparative result in Sect. 5 provides
an optimal requirement on the number of measurements, and it closes the gap discussed in Rauhut (2009) completely. Nonuniform recovery with random convolution matrices was also considered in Romberg (2008).
1.3 Our contributions

We consider nonuniform recovery via OMP from fewer noisy random measurements. Rows of sensing matrices are assumed to be normalized independent samples from a probability distribution F obeying the isotropy property and the incoherence property. The main contribution of this paper is to provide general models which not only include all the standard compressed sensing matrices, but also cover other new sensing matrices such as random convolutions and subsampled tight or continuous frames. The results show that, under a condition on the minimum magnitude of the nonzero components of the signal, OMP can recover the support of the signal exactly after s iterations with overwhelming probability with a minimal number of measurements. We then analyze the approximation property of OMP in the noiseless case. We also investigate the use of partial random circulant matrices and partial Toeplitz matrices and their connection with recovery by l1-minimization. The necessary number of measurements is proved to be optimal.
1.4 Organization

The rest of the paper is organized as follows. In Sect. 2, we introduce notation and some useful lemmas. In Sect. 3, we focus on nonuniform support recovery of OMP from noisy measurements. In Sect. 4, under the assumption that OMP finds a true index at each iteration, we characterize the approximation rate of OMP. In Sect. 5, we first prove that certain random Toeplitz and random circulant matrices satisfy the isotropy property and the incoherence property; we then investigate the use of partial random circulant and Toeplitz matrices and their connection with l1-minimization.
2 Preliminaries

Let us state several pieces of notation that are used throughout the paper. The support of a given vector x = (x_1, . . . , x_n)^* is defined by supp(x) := {i : x_i ≠ 0}. As usual, ‖x‖_1 = Σ_{i=1}^n |x_i| denotes the l1-norm, ‖x‖ = (Σ_{i=1}^n x_i²)^{1/2} denotes the l2-norm, and ‖x‖_∞ = max_i |x_i| denotes the l∞-norm. The operator norm of a matrix A is denoted by ‖A‖. Let e_i denote the i-th coordinate unit vector in R^n. Let |T| denote the cardinality of the set T. For an m × n matrix A and a subset T ⊂ {1, 2, . . . , n}, let A_T denote the submatrix of A that contains only the columns indexed by T, and let A_j denote the j-th column of A. Let T^c denote the complement of T in {1, 2, . . . , n}. Likewise, x_T is the restriction of a given vector x ∈ R^n to the indices in T. Let A^* denote the conjugate transpose of a matrix A. The smallest and largest eigenvalues of A^*A are denoted by λ_min(A^*A) and λ_max(A^*A), respectively. Throughout, C denotes a positive constant whose value may change from instance to instance.

The framework of OMP is shown in Algorithm 1. In Step 4 of OMP, one can compute

x_j = (A_{Λ_j}^* A_{Λ_j})^{-1} A_{Λ_j}^* y and r_j = (I − P_j) y,

where P_j = A_{Λ_j} (A_{Λ_j}^* A_{Λ_j})^{-1} A_{Λ_j}^* denotes the projection onto the linear space spanned by the columns of A_{Λ_j}. Therefore, no column is selected twice and the index set of selected columns grows at each iteration. There are many stopping rules for OMP; we use ‖r‖ ≤ b as
Algorithm 1 Orthogonal Matching Pursuit (OMP) in Pati et al. (1993)
Input: A, y
1: Λ_0 = ∅, r_0 = y, j = 1.
2: while not converged do
3:   Λ_j = Λ_{j−1} ∪ arg max_i |⟨A_i, r_{j−1}⟩|
4:   x_j = arg min_w ‖A_{Λ_j} w − y‖
5:   r_j = y − A_{Λ_j} x_j
6:   j = j + 1
7: end while
8: x̂_{Λ_j} = x_j, x̂_{Λ_j^c} = 0
Output: x̂.
the stopping rule for l2-bounded noise; it is also reasonable in the noiseless case. The following lemmas will be used in Sect. 3.

Lemma 2.1 (Cai and Wang 2011, Lemma 5) Let T ⊂ {1, . . . , n}, Λ_t ⊂ T and u_t = T \ Λ_t. Denote P_t = A_{Λ_t} (A_{Λ_t}^* A_{Λ_t})^{-1} A_{Λ_t}^*. Then the minimum eigenvalue of A_T^* A_T is less than or equal to the minimum eigenvalue of A_{u_t}^* (I − P_t) A_{u_t}, and the maximum eigenvalue of A_T^* A_T is greater than or equal to the maximum eigenvalue of A_{u_t}^* (I − P_t) A_{u_t}.

Lemma 2.2 (Lin and Li 2013, Lemma 2.2) Let y = Ax + z and T = supp(x) with |T| = s. Assume that in the first t iterations (t < s), OMP selects t correct indices, that is, Λ_t ⊂ T. If for some ρ ∈ (0, 1),

‖A_{T^c}^* (I − P_t) A_T x_T‖_∞ < ρ ‖A_T^* (I − P_t) A_T x_T‖_∞

and

‖x_{T\Λ_t}‖ ≥ 2(s − t)^{1/2} ‖A^* (I − P_t) z‖_∞ / (λ_min(A_T^* A_T)(1 − ρ)),

then ‖A_{T^c}^* r_t‖_∞ < ‖A_T^* r_t‖_∞. Furthermore, if r_t does not satisfy the stopping rule, then OMP will select a true index from T at iteration t + 1.

Lemma 2.3 (Tropp 2012, Matrix Bernstein inequality) Let {X_k} ⊂ R^{d×d} be a finite sequence of independent random self-adjoint matrices. Suppose that E(X_k) = 0 and ‖X_k‖ ≤ B a.s., and put

σ² := ‖Σ_k E(X_k²)‖.

Then for all t ≥ 0,

P( ‖Σ_k X_k‖ ≥ t ) ≤ 2d exp( −(t²/2) / (σ² + Bt/3) ).
3 Support recovery

In this section, we analyze the support recovery of OMP in the noisy setting. Some useful inequalities are from Candès and Romberg (2007), Lin and Li (2013), Tropp (2012). Throughout this paper, the parameter δ is assumed to be fixed; it is always less than or equal to one. Recall that the rows of A are normalized independent samples from a population F that obeys the isotropy property and the incoherence property. The following result shows that a submatrix A_T of A is well conditioned under mild conditions on the number of measurements m and the cardinality of T.

Lemma 3.1 (Candès and Romberg 2007, Lemma 2.1) Suppose that the rows of A are normalized independent samples from F. Let T be a fixed index set of cardinality s. Fix β and δ in (0, 1). If m ≥ C μ(F) δ^{-2} · s log(s/β), then the local isometry condition

1 − δ ≤ λ_min(A_T^* A_T) ≤ λ_max(A_T^* A_T) ≤ 1 + δ
(3.1)
holds with probability at least 1 − β.

Note that the local isometry condition (3.1) is weaker than the restricted isometry property defined in (1.2), since it holds only on the given support set T. The proofs of Lemmas 3.2 and 3.3 are motivated by the work in Lin and Li (2013).

Lemma 3.2 Suppose that the rows of A are normalized independent samples from F. Let T be a given index set with |T| = s and let v be a vector supported on T. Then for each j ∈ T^c and δ > 0,

P( |⟨A_j, Av⟩| ≥ δ ) ≤ 2 exp( − m/(2μ(F)) · δ² / (‖v‖² + (1/3)√s ‖v‖ δ) ).

Proof We write

⟨A_j, Av⟩ = ⟨e_j, A^* A v⟩ = (1/m) Σ_{k=1}^m ω_k,

where ω_k := ⟨e_j, a_k⟩⟨a_k, v⟩ for each j ∈ T^c. Since j ∈ T^c, the isotropy property implies that E ω_k = 0. Next, the Cauchy–Schwarz inequality leads to

|ω_k| = |⟨e_j, a_k⟩⟨a_k, v⟩| ≤ |⟨e_j, a_k⟩| · ‖a_{k,T}‖ ‖v‖.

Since |⟨e_j, a_k⟩| ≤ √μ(F) and ‖a_{k,T}‖ ≤ √(μ(F) s) for |T| = s, we have

|ω_k| ≤ μ(F) √s ‖v‖.

For the total variance, we have

E ω_k² ≤ μ(F) E ⟨a_{k,T}, v⟩² = μ(F) ‖v‖²,
where the equality follows from the isotropy property. Hence

σ² = Σ_{k=1}^m E ω_k² ≤ m μ(F) ‖v‖²,

and the matrix Bernstein inequality with d = 1 gives

P( |⟨A_j, Av⟩| ≥ δ ) = P( |Σ_{k=1}^m ω_k| ≥ mδ ) ≤ 2 exp( − (m²δ²/2) / (m μ(F) ‖v‖² + (1/3)√s μ(F) ‖v‖ mδ) ),

that is,

P( |⟨A_j, Av⟩| ≥ δ ) ≤ 2 exp( − m/(2μ(F)) · δ² / (‖v‖² + (1/3)√s ‖v‖ δ) ).
The following lemma is a generalization of the uniform off-support incoherence inequality obtained in Lin and Li (2013). The covering number, a standard technique in the compressed sensing context, plays an important role in the proof. We denote by X_T the set of all vectors in R^n that vanish outside T; endowed with the l2 norm, X_T is an s-dimensional linear space.

Lemma 3.3 Suppose that the rows of A are normalized independent samples from F. For any set T with cardinality at most s and any δ > 0, we have

‖A_{T^c}^* A v‖_∞ ≤ δ ‖v‖ for all v ∈ X_T (3.2)

with probability exceeding

1 − 2(n − s) · 5^s exp( − m/(2μ(F)) · δ² / (4 + (2/3)√s δ) ). (3.3)

In particular, if m ≥ C δ^{-2} μ(F) · s(s + log(n − s) + log β^{-1}) for some β ∈ (0, 1), then

‖A_{T^c}^* A v‖_∞ ≤ (δ/√s) ‖v‖ for all v ∈ X_T (3.4)

holds with probability exceeding 1 − β.

Proof Note that A_{T^c}^* A v = A_{T^c}^* A_T v for all v ∈ X_T and A_{T^c}^* A_T is linear, so it suffices to prove (3.2) in the case ‖v‖ = 1. We first choose a finite 1/2-covering of the unit sphere in X_T, i.e., a set of points Q ⊂ X_T with ‖q‖ = 1 for all q ∈ Q, such that for all v ∈ X_T with ‖v‖ = 1 we have

min_{q∈Q} ‖v − q‖ ≤ 1/2.

According to Mendelson et al. (2011, Lemma 2.2), there exists such a Q with |Q| ≤ 5^s. Using Lemma 3.2, one gets
P( max_{v∈Q} ‖A_{T^c}^* A v‖_∞ ≥ δ/2 ) ≤ Σ_{v∈Q} Σ_{j∈T^c} P( |⟨A_j, Av⟩| ≥ δ/2 ) ≤ 5^s (n − s) · 2 exp( − m/(2μ(F)) · δ² / (4 + (2/3)√s δ) ).

It thus follows that, with probability exceeding (3.3),

‖A_{T^c}^* A q‖_∞ ≤ (δ/2) ‖q‖ for all q ∈ Q.

Now define B as the smallest number such that

‖A_{T^c}^* A v‖_∞ ≤ B ‖v‖ for all v ∈ X_T.

Recall that for all v ∈ X_T with ‖v‖ = 1, one can choose q ∈ Q such that ‖v − q‖ ≤ 1/2, and get

‖A_{T^c}^* A v‖_∞ ≤ ‖A_{T^c}^* A (v − q)‖_∞ + ‖A_{T^c}^* A q‖_∞ ≤ B/2 + δ/2.

Therefore B ≤ δ, which leads to the result.
Now we state the sufficient condition for support recovery by OMP in the noisy case.

Theorem 3.1 Suppose that the rows of A are normalized independent samples from F. Let x ∈ R^n be an arbitrary fixed s-sparse vector. Given the noisy model y = Ax + z, where the noise vector z is independent of A with ‖z‖ ≤ b, fix β in (0, 1) and δ in (0, 0.5), and choose m ≥ C δ^{-2} μ(F) · s(s + log(n − s) + log β^{-1}). If all nonzero coefficients x_i satisfy

|x_i| ≥ 2√μ(F) b / (1 − 2δ), (3.5)

then OMP with the stopping rule ‖r_t‖ ≤ b can recover the support of x in s iterations with probability exceeding 1 − β.

Proof We begin with some notation and a few simplifying assumptions. Let T = supp(x). For a vector r ∈ R^m, we denote

ρ(r) = ‖A_{T^c}^* r‖_∞ / ‖A_T^* r‖_∞. (3.6)

When r is the residual vector of OMP, OMP chooses a column from A_T provided ρ(r) < 1. By the hypotheses of the theorem and Lemmas 3.1 and 3.3, the events (3.1) and (3.4) occur simultaneously with probability at least 1 − β. Under these assumptions, we will show that OMP with the stopping rule ‖r_t‖ ≤ b recovers the support T in s iterations. We define the ratio

ρ(A, T) = max_{v∈X_T, v≠0} ‖A_{T^c}^* A v‖_∞ / ‖A_T^* A v‖_∞.

Note that for all v ∈ X_T, by (3.1),

‖A_T^* A v‖_∞ ≥ ‖A_T^* A_T v_T‖/√s ≥ λ_min(A_T^* A_T) ‖v_T‖/√s ≥ (1 − δ)‖v‖/√s.
It thus follows from the above inequality, δ ∈ (0, 1/2) and (3.4) that

ρ(A, T) ≤ δ/(1 − δ) < 1. (3.7)

Assume that in the first t (t < s) iterations OMP selects t correct indices, that is, Λ_t ⊂ T. Denote u_t = T \ Λ_t. We will prove that ρ(r_t) < 1 and ‖r_t‖ > b; consequently, OMP selects another true index from T at the current iteration. Since ‖z‖ ≤ b, we have

‖A^* (I − P_t) z‖_∞ ≤ max_{j∈[n]} ‖A_j‖ ‖(I − P_t) z‖ ≤ √μ(F) ‖z‖ ≤ √μ(F) b.

It thus follows that

2(s − t)^{1/2} ‖A^* (I − P_t) z‖_∞ / (λ_min(A_T^* A_T)(1 − ρ(A, T))) ≤ 2√μ(F) (s − t)^{1/2} b / ((1 − δ)(1 − ρ(A, T))) ≤ 2√μ(F) (s − t)^{1/2} b / (1 − 2δ) ≤ ‖x_{u_t}‖.

By the definition of ρ(A, T) and Lemma 2.2, one has ρ(r_t) < 1. The bound ‖r_t‖ > b is proved as follows:

‖r_t‖ = ‖(I − P_t) A_T x_T + (I − P_t) z‖
≥ ‖(I − P_t) A_T x_T‖ − ‖(I − P_t) z‖
≥ ‖(I − P_t) A_{u_t} x_{u_t}‖ − b
≥ √(λ_min(A_{u_t}^* (I − P_t) A_{u_t})) ‖x_{u_t}‖ − b
≥ √(λ_min(A_T^* A_T)) ‖x_{u_t}‖ − b
≥ √(λ_min(A_T^* A_T)) · 2√μ(F) (s − t)^{1/2} b / ((1 − δ)(1 − ρ(A, T))) − b
≥ 2√μ(F) b / (1 − ρ(A, T)) − b ≥ b,

where the fourth inequality is implied by Lemma 2.1 and the next-to-last inequality follows from

√λ_min(A_T^* A_T) / (1 − δ) ≥ 1/√(1 − δ) > 1 and (s − t)^{1/2} ≥ 1.

When all s correct indices have been selected, OMP stops, since ‖r_s‖ = ‖(I − P_T) z‖ ≤ ‖z‖ ≤ b.
4 Approximation property

In this section, we first analyze the approximation property of OMP in the noiseless case. The convergence rate of OMP is given in the following.

Theorem 4.1 For a given s-sparse vector x, let the measurements be y = Ax. Suppose that the columns of the sensing matrix A are normalized to be unit vectors, i.e., ‖A_i‖ = 1, i = 1, . . . , n, and that A satisfies the local isometry condition (3.1) with δ < 1. If OMP finds a correct index at each iteration, then the residual r_j computed by OMP satisfies

‖r_j‖ ≤ η^j ‖y‖,
where η = √(1 − (1 − δ)/s) < 1. Moreover,

‖x − x_j‖ ≤ (η^j/(1 − δ)) ‖x‖.
Proof For any s-sparse vector c whose support is contained in T, we denote

S = max_{i∈T} |⟨Ac, A_i⟩|.

A lower bound for S is obtained as follows:

‖Ac‖² = ⟨Ac, Ac⟩ = Σ_{i∈T} c_i ⟨Ac, A_i⟩ ≤ ‖c‖_1 S ≤ √s ‖c‖ S ≤ √s (‖Ac‖/√(1 − δ)) S,

where the last step uses ‖Ac‖² ≥ (1 − δ)‖c‖². In summary, we have that

max_{i∈T} |⟨Ac, A_i⟩| ≥ √((1 − δ)/s) ‖Ac‖ (4.1)

holds for all s-sparse vectors c. For a given positive integer j, the residual of OMP at the j-th iteration is r_j = y − A_{Λ_j} x_j = y − P_j y, where P_j is the orthogonal projection. Since P_j y is the best approximation to y from the subspace spanned by {A_i}_{i∈Λ_j}, we have

‖r_j‖² = ‖y − P_j y‖² ≤ ‖y − P_{j−1} y − ⟨r_{j−1}, A_{λ_j}⟩ A_{λ_j}‖² = ‖r_{j−1} − ⟨r_{j−1}, A_{λ_j}⟩ A_{λ_j}‖².

Let c̃ be the s-sparse vector such that r_{j−1} = A c̃. We have

‖r_{j−1} − ⟨r_{j−1}, A_{λ_j}⟩ A_{λ_j}‖² = ‖r_{j−1}‖² − |⟨r_{j−1}, A_{λ_j}⟩|² = ‖r_{j−1}‖² − |⟨A c̃, A_{λ_j}⟩|².

Since λ_j = arg max_{i∈{1,...,n}} |⟨r_{j−1}, A_i⟩|, we have

‖r_j‖² ≤ ‖r_{j−1}‖² − |⟨A c̃, A_{λ_j}⟩|² ≤ (1 − (1 − δ)/s) ‖r_{j−1}‖²,

where the last inequality follows from (4.1). By induction, we conclude that the approximation error of OMP is

‖r_j‖ ≤ η^j ‖y‖

with η = √(1 − (1 − δ)/s) < 1. Using the fact that r_j = Ax − A_{Λ_j} x_j = A(x − x̂_j), where x − x̂_j is supported on T, the approximation error of the recovered vector at each iteration is given by

‖x − x̂_j‖ ≤ (1/√(1 − δ)) ‖Ax − A_{Λ_j} x_j‖ ≤ (η^j/√(1 − δ)) ‖y‖ ≤ η^j √((1 + δ)/(1 − δ)) ‖x‖ ≤ (η^j/(1 − δ)) ‖x‖,

where the last two steps use ‖y‖ = ‖Ax‖ ≤ √(1 + δ)‖x‖ and (1 + δ)(1 − δ) ≤ 1.
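The geometric decay of Theorem 4.1 can be observed numerically. The sketch below (our own illustration; all parameter values are assumptions) runs the greedy iteration on a small noiseless instance with unit-norm columns, computes δ from the eigenvalues of A_T^* A_T on the true support, and records the residual norms, which fall at least as fast as η^j ‖y‖ when a correct index is chosen at every step.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, s = 60, 100, 4
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)            # unit-norm columns, as assumed in Theorem 4.1
x = np.zeros(n)
x[:s] = [2.0, -1.0, 1.5, 0.7]
y = A @ x

# local isometry constant delta on the true support, from eigenvalues of A_T^* A_T
eigs = np.linalg.eigvalsh(A[:, :s].T @ A[:, :s])
delta = max(1.0 - eigs.min(), eigs.max() - 1.0)
eta = np.sqrt(1.0 - (1.0 - delta) / s)

support, r = [], y.copy()
norms = [np.linalg.norm(r)]
for _ in range(s):                        # greedy steps of Algorithm 1
    support.append(int(np.argmax(np.abs(A.T @ r))))
    coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    r = y - A[:, support] @ coef
    norms.append(np.linalg.norm(r))
print(delta, eta, norms)
```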
We now consider general signals and the approximation rate of the residual in OMP. The error of the best s-term approximation to y from the measurement matrix A is defined by

σ_s(y) := min_{‖c‖_0 ≤ s} ‖y − Ac‖.
(4.2)
Theorem 4.2 Suppose that the measurement matrix satisfies the local isometry condition with δ < 1/2 and let y be a completely arbitrary signal. Assume that in the first j iterations OMP selects columns from the support of the best s-term approximation of y. At the (j + 1)-th iteration, if the residual of OMP satisfies

‖r_j‖ > √(1 + s(1 − δ)/(1 − 2δ)²) σ_s(y), (4.3)

then OMP will recover another correct index.

Proof Let y^* be the best s-term approximation of y, i.e., ‖y − y^*‖ = σ_s(y). Without loss of generality, we assume that y^* = A_I c, where I ⊂ {1, . . . , s}. We define

S_i(x) := ⟨A_i, Ax⟩, i = 1, . . . , n,

and denote S_g(x) := max_{i∈I} |S_i(x)| and S_b(x) := max_{i∈I^c} |S_i(x)|. Let A c_j := y − y_j and A_I c_j^* := y^* − y_j. At the j-th iteration, a sufficient condition for choosing an index from {1, . . . , s} is

S_g(c_j) > S_b(c_j). (4.4)

By calculation, we have

S_g(c_j) = max_{i∈I} |⟨A_i, y − y_j⟩| = max_{i∈I} |⟨A_i, y − y^*⟩ + ⟨A_i, y^* − y_j⟩| = max_{i∈I} |⟨A_i, y^* − y_j⟩|

and

S_b(c_j) ≤ max_{i∈I^c} |⟨A_i, y − y^*⟩| + max_{i∈I^c} |⟨A_i, y^* − y_j⟩|.

Therefore, we can see that

max_{i∈I^c} |⟨A_i, y − y^*⟩| / max_{i∈I} |⟨A_i, y^* − y_j⟩| + max_{i∈I^c} |⟨A_i, y^* − y_j⟩| / max_{i∈I} |⟨A_i, y^* − y_j⟩| < 1 (4.5)

implies (4.4). It follows from the assumption ‖A_i‖ = 1 and (4.1) that

max_{i∈I^c} |⟨A_i, y − y^*⟩| / max_{i∈I} |⟨A_i, y^* − y_j⟩| ≤ max_{i∈I^c} ‖A_i‖ ‖y − y^*‖ / max_{i∈I} |⟨A_i, y^* − y_j⟩| ≤ √(s/(1 − δ)) · ‖y − y^*‖ / ‖y^* − y_j‖. (4.6)

It follows from inequality (3.7) that

max_{i∈I^c} |⟨A_i, y^* − y_j⟩| / max_{i∈I} |⟨A_i, y^* − y_j⟩| = max_{i∈I^c} |⟨A_i, A_I c_j^*⟩| / max_{i∈I} |⟨A_i, A_I c_j^*⟩| ≤ δ/(1 − δ).
(4.7)
It follows from (4.5), (4.6) and (4.7) that

(√(s(1 − δ))/(1 − 2δ)) ‖y − y^*‖ < ‖y_j − y^*‖
is a sufficient condition for (4.4). Since ‖y − y_j‖² = ‖y − y^*‖² + ‖y^* − y_j‖², we conclude that

‖y − y_j‖ = ‖r_j‖ > √(1 + s(1 − δ)/(1 − 2δ)²) σ_s(y) (4.8)

implies (4.4). Therefore, OMP selects another correct index at the (j + 1)-th iteration.
5 Random convolution

In this section, we study the isotropy property and the incoherence property of the random Toeplitz matrices and random circulant matrices constructed in Bajwa et al. (2007), Haupt et al. (2010), Krahmer et al. (2014) and Rauhut (2009). These matrices arise naturally from the convolutional structure inherent to linear system identification. We then apply our results to OMP and l1-minimization. For a vector b = (b_0, b_1, . . . , b_{n−1}) in n-dimensional space, the associated circulant matrix S^b ∈ R^{n×n} is defined by

S^b = [ b_0      b_1      b_2      · · ·  b_{n−1}
        b_{n−1}  b_0      b_1      · · ·  b_{n−2}
        b_{n−2}  b_{n−1}  b_0      · · ·  b_{n−3}
        ·        ·        ·        · · ·  ·
        b_1      b_2      b_3      · · ·  b_0 ],

that is, each row is the cyclic right-shift of the row above.
Similarly, for a vector c = (c_{−n+1}, c_{−n+2}, . . . , c_{n−1}), the associated Toeplitz matrix T^c ∈ R^{n×n} is defined by

T^c = [ c_0       c_1       c_2       · · ·  c_{n−1}
        c_{−1}    c_0       c_1       · · ·  c_{n−2}
        c_{−2}    c_{−1}    c_0       · · ·  c_{n−3}
        ·         ·         ·         · · ·  ·
        c_{−n+1}  c_{−n+2}  c_{−n+3}  · · ·  c_0 ].
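As mentioned in Sect. 1.2, circulant structure admits fast matrix-vector products. The sketch below (our own illustration using NumPy's FFT; not part of the paper) builds S^b explicitly with the row layout above and checks it against the O(n log n) circular cross-correlation formula.

```python
import numpy as np

def circulant_S(b):
    """S^b with each row the cyclic right-shift of the previous one:
    S[i, j] = b[(j - i) mod n]."""
    return np.stack([np.roll(b, i) for i in range(len(b))])

def circulant_matvec(b, x):
    """S^b x in O(n log n): (S^b x)_i = sum_k b_k x_{(k+i) mod n},
    a circular cross-correlation, computed via the FFT."""
    return np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(b))).real

rng = np.random.default_rng(3)
n = 128
b = rng.choice([-1.0, 1.0], size=n)       # Rademacher symbol, mu = 1
x = rng.standard_normal(n)
S = circulant_S(b)
print(np.max(np.abs(S @ x - circulant_matvec(b, x))))
```

A partial circulant measurement then amounts to keeping only the m rows indexed by the chosen subset.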
Now we choose an arbitrary subset Ω ⊂ {1, 2, . . . , n} of cardinality m < n and let the partial circulant matrix S^b_Ω ∈ R^{m×n} be the submatrix of S^b consisting of the rows indexed by Ω. Similarly, we define the partial Toeplitz matrix T^c_Ω ∈ R^{m×n} to be the submatrix of T^c consisting of the rows indexed by Ω. In various physical domains, it is easy to compute S^b_Ω x or T^c_Ω x. In Rauhut (2009), the measurement matrix is a draw of a partial random circulant matrix generated by a Rademacher vector. In this section, we consider three types of random convolution matrices without the Rademacher vector, as follows:

1. b or c is a sequence of independent Bernoulli random variables taking values ±1 with equal probability.
2. x_i ∼ unif[−√3, √3] i.i.d., where x_i denotes b_i or c_i.
3. For a fixed parameter q ∈ (0, 1), x_i i.i.d. with

x_i = 1/√q with probability q/2, x_i = 0 with probability 1 − q, x_i = −1/√q with probability q/2,

where x_i denotes b_i or c_i.

In the rest of this section, b and c are assumed to follow the distributions above. In summary, the circulant matrix S^b satisfies the incoherence property with coherence parameter

μ(S^b) = max_{0≤i≤n−1} |b_i|²,

and the Toeplitz matrix T^c satisfies the incoherence property with coherence parameter

μ(T^c) = max_{−n+1≤i≤n−1} |c_i|².
Let F be S^b or T^c. If the vector b or c is a sequence of independent Bernoulli random variables taking values ±1 with equal probability, then μ(F) = 1. If x_i ∼ unif[−√3, √3] i.i.d., then μ(F) = 3. If x_i follows the three-point distribution above with parameter q,
then μ(F) = 1/q.

Theorem 5.1 Both the circulant matrix S^b and the Toeplitz matrix T^c satisfy the isotropy property.

Proof For the vector b = (b_0, b_1, . . . , b_{n−1}), the matrix b^* b has entries (b^* b)_{ij} = b_i b_j. Since the b_i are i.i.d. and obey the distributions discussed above, we have E b_i² = 1 and E b_i b_j = E b_i E b_j = 0, i ≠ j, for all i, j = 0, 1, . . . , n − 1. In fact:

1. If b or c is a sequence of independent Bernoulli random variables taking values ±1 with equal probability, then
E x_i = 1 × 1/2 + (−1) × 1/2 = 0, E x_i² = 1² × 1/2 + (−1)² × 1/2 = 1, and E x_i x_j = E x_i E x_j = 0, i ≠ j.

2. If x_i ∼ unif[−√3, √3] i.i.d., then
E x_i = ∫_{−√3}^{√3} x (1/(2√3)) dx = 0, E x_i² = ∫_{−√3}^{√3} x² (1/(2√3)) dx = 1, and E x_i x_j = E x_i E x_j = 0, i ≠ j.

3. For a fixed parameter q ∈ (0, 1), since x_i takes the values 1/√q, 0, −1/√q with probabilities q/2, 1 − q, q/2, we have
E x_i = (1/√q) × q/2 + 0 × (1 − q) + (−1/√q) × q/2 = 0, E x_i² = (1/√q)² × q/2 + 0² × (1 − q) + (−1/√q)² × q/2 = 1, and E x_i x_j = E x_i E x_j = 0, i ≠ j.

We conclude that E b^* b = I. Choose any column of the circulant matrix S^b and denote it by a. The components of a are a cyclic arrangement of the vector b^* = (b_0, b_1, . . . , b_{n−1})^*, where the b_i are i.i.d. Therefore, the components of a have unit variance and are uncorrelated. It follows that the circulant matrix S^b obeys the isotropy property, i.e., E aa^* = I, a ∼ S^b. Choose any column of T^c and denote it by ξ; its components are part of a cyclic arrangement of the sequence (c_{−n+1}, c_{−n+2}, . . . , c_{n−1}), and in particular are n distinct entries of c. Since the c_i are i.i.d., the components of ξ satisfy E c_i² = 1 and E c_i c_j = E c_i E c_j = 0, i ≠ j. Therefore, the components of ξ have unit variance and are uncorrelated. We conclude that the matrix T^c obeys the isotropy property, i.e., E ξ ξ^* = I, ξ ∼ T^c.
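The moment computations in the proof are easy to confirm by simulation. The sketch below (our own illustration; the sample sizes and the choice q = 0.25 are assumptions) draws each of the three symbol distributions, checks the first two moments and the coherence parameter, and estimates E aa^* for a row of S^b.

```python
import numpy as np

rng = np.random.default_rng(4)
N, q = 200_000, 0.25

samples = {
    "bernoulli": rng.choice([-1.0, 1.0], size=N),
    "uniform": rng.uniform(-np.sqrt(3), np.sqrt(3), size=N),
    "three-point": rng.choice([1 / np.sqrt(q), 0.0, -1 / np.sqrt(q)],
                              size=N, p=[q / 2, 1 - q, q / 2]),
}
mu = {"bernoulli": 1.0, "uniform": 3.0, "three-point": 1 / q}

for name, xs in samples.items():
    # E x = 0, E x^2 = 1, and max |x|^2 bounded by the coherence parameter
    print(name, xs.mean(), (xs ** 2).mean(), (xs ** 2).max(), mu[name])

# empirical isotropy: average outer product of a row of S^b over fresh draws of b
n, trials = 16, 20_000
G = np.zeros((n, n))
for _ in range(trials):
    b = rng.choice([-1.0, 1.0], size=n)
    row = np.roll(b, 1)                  # a row of the circulant matrix S^b
    G += np.outer(row, row)
G /= trials
print(np.max(np.abs(G - np.eye(n))))     # close to 0: E aa^* is close to I
```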
The following nonuniform recovery property of l1-minimization is implied by Theorem 5.1 and the dual certification in Candès and Tao (2006).

Theorem 5.2 Let F be either the circulant matrix S^b or the Toeplitz matrix T^c. Assume that the rows of A are normalized independent samples from F. Let x ∈ R^n be a given s-sparse vector. If the number of measurements satisfies

m ≥ C_β · μ(F) · s log n, (5.1)

then the solution to the l1-minimization problem

min_{x̄∈R^n} ‖x̄‖_1 subject to A x̄ = y

coincides with x with probability at least 1 − 5/n − e^{−β}.

Theorem 5.2 provides an optimal requirement on the number of measurements. In Rauhut (2009), the author states: "the power 3 at the log-term [in (1.7)] can very likely be improved to 1." Therefore, Theorem 5.2 closes the gap discussed in Rauhut (2009) completely. We can also apply Theorem 5.1 to OMP. Here we present only one result for illustration.

Theorem 5.3 Let x ∈ R^n be a given s-sparse vector. Let F be either the circulant matrix S^b or the Toeplitz matrix T^c. Assume that the rows of A are normalized independent samples from F and that the noise vector z satisfies ‖z‖ ≤ b. Fix β in (0, 1) and δ in (0, 0.5). Choose m ≥ C δ^{-2} μ(F) · s(s + log(n − s) + log β^{-1}). If all nonzero coefficients x_i satisfy

|x_i| ≥ 2√μ(F) b / (1 − 2δ),

then OMP with the stopping rule ‖r_t‖ ≤ b can recover the support of x in s iterations with probability exceeding 1 − β.
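The l1-minimization in Theorem 5.2 can be tried directly, since basis pursuit is a linear program. The sketch below (our own illustration using SciPy's `linprog`, not the authors' code; all problem sizes are assumptions) applies it to a normalized partial random circulant matrix via the standard splitting x̄ = u − v with u, v ≥ 0. We assert only feasibility and l1-optimality relative to the true signal; with m of order s log n the minimizer typically coincides with x.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(5)
n, m = 64, 40
b = rng.choice([-1.0, 1.0], size=n)                 # Rademacher symbol
S = np.stack([np.roll(b, i) for i in range(n)])     # full circulant S^b
rows = rng.choice(n, size=m, replace=False)         # random row subset
A = S[rows] / np.sqrt(m)                            # normalized partial circulant

x = np.zeros(n)
x[[7, 30]] = [1.0, -2.0]                            # sparse signal
y = A @ x

# basis pursuit min ||z||_1 s.t. Az = y, as an LP in z = u - v with u, v >= 0
res = linprog(c=np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=y,
              bounds=(0, None))
x_hat = res.x[:n] - res.x[n:]
print(res.status, np.abs(x_hat).sum(), np.abs(x).sum())
```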
6 Conclusion

In this paper, we have studied sufficient conditions for the nonuniform recovery of sparse signals from noisy measurements by OMP. The rows of the measurement matrix are sampled independently from a distribution obeying the isotropy property and the incoherence property. Such measurement matrices include the classical random measurement matrices in compressed sensing, as well as random convolutions and subsampled tight or continuous frames. Given m admissible random measurements of a fixed s-sparse signal x ∈ Rn, we have shown that if m = O(s(s + log(n − s))), then OMP can recover the support of x exactly after s iterations with overwhelming probability. We have also derived the approximation order of OMP.

Acknowledgements This work is partially supported by the NSF of China under Grant 11671358, the NSAF of China under Grant U1630116, and the key project of the NSF of China under Grant 11531013. Ruifang Hu is also partially supported by the university visiting scholars program under Grant FX2017049. The authors are grateful to the editor and anonymous reviewers for their constructive suggestions and comments to improve the paper.
References

Bajwa, W. U., Haupt, J., Raz, G., Wright, S. J., & Nowak, R. (2007). Toeplitz-structured compressed sensing matrices. In IEEE/SP 14th workshop on statistical signal processing (pp. 294–298). Madison, WI, USA.
Cai, T., & Wang, L. (2011). Orthogonal matching pursuit for sparse signal recovery with noise. IEEE Transactions on Information Theory, 57(11), 4680–4688.
Cai, T., & Zhang, A. (2014). Sparse representation of a polytope and recovery of sparse signals and low-rank matrices. IEEE Transactions on Information Theory, 60, 122–132.
Candès, E. J., & Plan, Y. (2011). A probabilistic and RIPless theory of compressed sensing. IEEE Transactions on Information Theory, 57(11), 7235–7254.
Candès, E. J., & Romberg, J. (2007). Sparsity and incoherence in compressive sampling. Inverse Problems, 23(3), 969–985.
Candès, E. J., & Tao, T. (2005). Decoding by linear programming. IEEE Transactions on Information Theory, 51(12), 4203–4215.
Candès, E. J., & Tao, T. (2006). Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Transactions on Information Theory, 52(12), 5406–5425.
Cohen, A., Dahmen, W., & DeVore, R. (2017). Orthogonal matching pursuit under the restricted isometry property. Constructive Approximation, 45(1), 113–127.
Coifman, R., Geshwind, F., & Meyer, Y. (2001). Noiselets. Applied and Computational Harmonic Analysis, 10, 27–44.
Donoho, D. L. (2006). Compressed sensing. IEEE Transactions on Information Theory, 52, 1289–1306.
Donoho, D. L., & Kutyniok, G. (2013). Microlocal analysis of the geometric separation problem. Communications on Pure and Applied Mathematics, 66, 1–47.
Haupt, J., Bajwa, W. U., Raz, G., & Nowak, R. (2010). Toeplitz compressed sensing matrices with applications to sparse channel estimation. IEEE Transactions on Information Theory, 56(11), 5862–5875.
King, E. J., Kutyniok, G., & Zhuang, X. (2011). Analysis of data separation and recovery problems using clustered sparsity. In SPIE proceedings: Wavelets and sparsity XIV (Vol. 8138).
King, E. J., Kutyniok, G., & Zhuang, X. (2012). Analysis of inpainting via clustered sparsity and microlocal analysis. Journal of Mathematical Imaging and Vision, 48(2), 205–234.
Krahmer, F., Mendelson, S., & Rauhut, H. (2014). Suprema of chaos processes and the restricted isometry property. Communications on Pure and Applied Mathematics, 67(11), 1877–1904.
Kunis, S., & Rauhut, H. (2008). Random sampling of sparse trigonometric polynomials II: Orthogonal matching pursuit versus basis pursuit. Foundations of Computational Mathematics, 8, 737–763.
Lin, J. H., & Li, S. (2013). Nonuniform support recovery from noisy random measurements by orthogonal matching pursuit. Journal of Approximation Theory, 165, 20–40.
Mendelson, S., Pajor, A., & Tomczak-Jaegermann, N. (2011). Uniform uncertainty principle for Bernoulli and subgaussian ensembles. Constructive Approximation, 28, 277–289.
Mo, Q. (2015). A sharp restricted isometry constant bound of orthogonal matching pursuit. arXiv:1501.01708.
Mo, Q., & Shen, Y. (2012). A remark on the restricted isometry property in orthogonal matching pursuit. IEEE Transactions on Information Theory, 58(6), 3654–3656.
Pati, Y. C., Rezaiifar, R., & Krishnaprasad, P. S. (1993). Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Proceedings of the 27th annual Asilomar conference on signals, systems, and computers (Vol. 1, pp. 40–44). IEEE, Pacific Grove, CA.
Pereira, P. M., Lovisolo, L., da Silva, E. A. B., & De Campos, M. L. R. (2014). On the design of maximally incoherent sensing matrices for compressed sensing using orthogonal bases and its extension for biorthogonal bases case. Digital Signal Processing, 27, 12–22.
Rauhut, H. (2009). Circulant and Toeplitz matrices in compressed sensing. arXiv:0902.4394.
Romberg, J. (2008). Compressive sensing by random convolution. SIAM Journal on Imaging Sciences, 2(4), 1098–1128.
Shen, Y., & Li, S. (2015). Sparse signals recovery from noisy measurements by orthogonal matching pursuit. Inverse Problems and Imaging, 9(1), 231–238.
Tropp, J. A. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10), 2231–2242.
Tropp, J. A. (2012). User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4), 389–434.
Tropp, J. A., & Gilbert, A. C. (2007). Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory, 53(12), 4655–4666.
Wen, J., Zhou, Z., Wang, J., Tang, X., & Mo, Q. (2017). A sharp condition for exact support recovery with orthogonal matching pursuit. IEEE Transactions on Signal Processing, 65(6), 1370–1382.
Xu, Z. (2012). A remark about orthogonal matching pursuit algorithm. Advances in Adaptive Data Analysis, 4(4), 1250026.
Zhang, T. (2011). Sparse recovery with orthogonal matching pursuit under RIP. IEEE Transactions on Information Theory, 57(9), 6215–6221.

Yi Shen received his B.S. and Ph.D. degrees in mathematics from the Department of Mathematics, Zhejiang University, in 2004 and 2009, respectively. He worked as a PIMS postdoctoral fellow at the University of Alberta and the University of Calgary from 2012 to 2014. He is currently a professor of mathematics at Zhejiang Sci-Tech University. His research interests include compressed sensing and wavelet analysis with its applications in image restoration.
Ruifang Hu was born in 1979 in China. She received a B.S. degree in mathematics from Anhui University in 2002 and an M.A. degree in mathematics from Zhejiang University in 2005. She is currently a lecturer in mathematics at Jiaxing University. Her research interests include wavelet analysis, harmonic analysis, and compressed sensing.