J Sign Process Syst DOI 10.1007/s11265-015-1065-6
Fields of Experts Based Multichannel Compressed Sensing

Jingbo Wei¹·² · Yukun Huang³ · Ke Lu⁴ · Lizhe Wang²·⁵

Received: 24 July 2015 / Revised: 23 September 2015 / Accepted: 14 October 2015
© Springer Science+Business Media New York 2015

Abstract Image reconstruction from under-sampled data had long been challenging due to its implicitly ill-posed nature until the emergence of compressed sensing. This paper proposes a new compressed sensing-based model and an efficient algorithm for image reconstruction from sparse observations, built on a generalized framework of multiple sparse restrictions that describes both local and nonlocal similarities at the regularization stage. An efficient split Bregman method is employed to decouple the nonlinear optimization problem and iteratively approach the optimal solution. To obtain better restoration of multiband images, principal component analysis is applied to improve the signal-to-noise ratio of the auxiliary parameters, and the noise variance in the wavelet domain is estimated adaptively. A comparison between filters showed that eight 3 × 3 sized filters learned from the Fields of Experts model are by far the best filters for achieving high peak signal-to-noise ratio. The proposed split Bregman compressed sensing algorithm with Fields of Experts trained filters is compared with alternative methods, namely Orthogonal Matching Pursuit, Compressive Sampling Matching Pursuit, Group-based Sparse Representation, and Compressed Sensing based on Bregman Split and principal component analysis. Experimental results on nine images show that the new method is competitive for single-band images and outperforms all the other methods for color images, making it an appropriate model for reconstructing multichannel images.

Keywords Compressed sensing · Markov random fields · Total variation · Image reconstruction

Corresponding author: Lizhe Wang ([email protected])

1 Institute of Space Science and Technology, Nanchang University, Nanchang, People’s Republic of China
2 Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100096, People’s Republic of China
3 Jiangxi University of Finance and Economics, Nanchang, People’s Republic of China
4 University of Chinese Academy of Sciences, Beijing, People’s Republic of China
5 School of Computer Science, China University of Geosciences, Wuhan 430074, People’s Republic of China
1 Introduction

The era of big data brings incomplete measurements from numerous but randomly distributed observation sources, which calls for restoring the complete data of interest in order to make reliable decisions. This reconstruction task can be abstracted as recovering an image from partial observations, and the inverse process is by nature an ill-posed problem. Compressed sensing, the theory of sparsity-promoted regularization for signal recovery from highly under-sampled data, has enjoyed enormous popularity due to its ability to reconstruct signals from a limited or compressed number of samples far below the Nyquist sampling rate. Though construction of the measurement process is one of the main issues in compressed sensing, most applications rely on sparsity regularization to reconstruct signals. To this end, signals have to be sparsely represented in certain transform domains, and the reconstruction process
is formulated as a constrained optimization problem and solved using ℓ1-relaxed convex minimization, greedy, or iterative strategies such as split Bregman, orthogonal matching pursuit, Bayesian estimation, iterative thresholding, compressive sampling matching pursuit, gradient-based algorithms, and so on. As one of the most fundamental kinds of prior knowledge, sparsifying representations of images had been investigated long before compressed sensing. Prior knowledge in the spatial domain includes total variation for structural coherence and random fields for textural consistency. Building on compressed sensing, new algorithms such as dictionary learning have also been developed to capture sparseness. Total variation (TV) is a low-order finite difference transformation popular for how simply it integrates into regularization and for theoretically preserving edges. Accordingly, Shiqian et al. [7] applied total variation in compressed sensing to recover MR images. However, as is generally known, total variation-based reconstruction methods tend to produce over-smoothed image edges and blocky texture details, owing to their piecewise constant solutions. Therefore, Xuan et al. [3] introduced directional total variation into compressed sensing for better reconstructive performance. Recently, Peng and Eom [5] noted the recovery superiority of multichannel images over single-channel images, and proposed an optimization solution in both the spatial domain and the wavelet transform domain for total variation integrated compressed sensing denoising of multispectral remote sensing images. Overlapped patches extracted from images may exhibit nonlocal similarity, which compressed sensing exploits to develop sparse representations through dictionary learning.
Dictionary learning builds on the observation that a bank of fixed transform bases in compressed sensing is not universally optimal for all images, though this flexibility comes at the cost of non-convexity and non-linearity [6]. Therefore, dictionary learning, typified by the k-SVD, online dictionary learning, or RLS-DLA algorithms, uses an iterative two-step procedure to identify the optimal dictionary: fixing the atoms to compute approximation coefficients, and then optimizing the atoms to reduce the fitting error. Another solution is Bayesian rule-based dictionary learning. Mingyuan Zhou et al. [18] considered a nonparametric Bayesian method (BPFA) employing a truncated beta-Bernoulli process to infer an appropriate dictionary for restoring images. After the dictionary is trained, an image is sparsely represented with the learned atoms, which act as the transform kernels capturing sparseness. Markov random fields (MRFs) were initially motivated by Gibbs random fields to describe neighborhood statistical correlation in textural images. In the low-order MRF framework, the smoothness energy comes from the negative
log likelihood of priors, and penalizes two adjacent pixels with harmonic functions of their intensity difference. This is quite similar to total variation, which uses a linear potential on gradients. As for high-order MRFs defined over large clique systems, Zhu et al. [19] combined filtering theory and Markov random field modeling through the maximum entropy principle, and chose the least number of filters that best extracted texture features, which essentially exploited the principle behind compressed sensing. Fields of Experts [11] is a high-order MRF model that learns filters and expert functions from a patch set, and the filters learned by this model are similar to those obtained using non-parametric independent component analysis (ICA) or standard sparse coding. The integration of compressed sensing and total variation for denoising multispectral remote sensing images in [5] performed favorably in comparison with other approaches, which encouraged us to identify further integration strategies. However, since [5] focused on remote sensing images and optimized for multispectral rather than panchromatic images, it remains unclear whether its admirable performance comes from the model or from the optimization effort that harnesses the inner relationship between the bands of a multispectral image. Furthermore, as mentioned above, total variation is universally used because of its simplicity, whereas integrating compressed sensing with more specific filters for sparser representation may yield better performance. Lastly, total variation only restricts the local similarity among adjacent pixels. Recently, group-based sparse representation, which employs local similarity and nonlocal similarity separately to recover images, has attracted the interest of researchers.
Jian Zhang [16] criticized traditional patch-based sparse representation modeling for suffering high computational complexity in large-scale dictionary learning optimization while ignoring the relationships between patches, and therefore exploited the concept of the group, composed of nonlocal patches with similar structures, as the basic unit of sparse representation. On the basis of group sparsifying, Weisheng Dong et al. [2] proposed a spatially adaptive iterative singular-value thresholding algorithm by taking a low-rank approach toward simultaneous sparse coding and providing a conceptually simple interpretation from a bilateral variance estimation perspective. This suggests that local similarities and nonlocal similarities could be coupled for image regularization. In this paper, we propose to integrate additional kinds of sparseness priors into compressed sensing to form more powerful constraints for ill-posed image reconstruction, wherein both local similarity and nonlocal similarity are included in the optimization. At the same time, we hope to integrate local and nonlocal similarities while avoiding the block-matching procedure of group-sparse representation
that may reduce the variety of image blocks. This idea is implemented with a generalized optimization model that represents sparseness in multiple spatial transform domains simultaneously. Specifically, we solve the optimization problem with the split Bregman technique and give the iteration formulations. To reconstruct an image of multiple bands, an improvement technique is adopted to denoise with principal component analysis (PCA) as in [5]. Since our model is constrained to be even sparser by filters that have been validated separately in much previous research, it is reasonable to expect that the integrated constraints lead to higher reconstruction performance. The rest of this paper is organized as follows. Section 2 reviews previous results on compressed sensing, sparseness representation with total variation and the Fields of Experts model, and the split Bregman iteration. The proposed model and the corresponding optimization formulations are detailed in Section 3. Section 4 introduces the improvement strategy for multiband images. Section 5 describes how to set the parameters of the proposed algorithm when the wavelet transform is used. Section 6 compares the considered filters and demonstrates the performance of the proposed model and optimization algorithm on single-band and multiband natural images. Conclusions and future work are presented in Section 7.
2 Background Knowledge
In this section, we review some classical models for image recovery in the context of sparseness representation, followed by details of the split Bregman iterative optimization technique. The following notational conventions are used throughout the paper: \|x\|_q is the \ell_q norm of an arbitrary vector x with m elements, defined as \|x\|_q = (\sum_{i=1}^{m} |x(i)|^q)^{1/q}.

2.1 Theory of Compressed Sensing

Compressed sensing for imaging is generally based on the fact that most images are compressible. For a single-band image x ∈ R^n, compressed sensing computes linear measurements of the image with m ≪ n, represented by the following equation:

y = \Phi x  (1)

where Φ is an m × n sized sparse measurement matrix and y ∈ R^m is the observed data. Φ maps R^n to R^m and thus represents a dimensionality reduction since m ≪ n. The theory of compressed sensing shows that perfect reconstruction of the original image x from the observation y is possible if x is sparse. Assume that the image x is compressible by an orthogonal transform Ψ [e.g., the discrete wavelet transform (DWT)]. Then, the measurement can be rewritten as

y = \Phi \Psi^{-1} \alpha  (2)

where Ψ⁻¹ is the inverse of Ψ, and α holds the transform coefficients of the original image. Consider the more general case in which an image has N bands, for example a color image of 3 bands or a Hyperion-1 image of 220 bands. By defining

Y = \left[ y_1^T \; y_2^T \; \dots \; y_N^T \right]^T  (3)

X = \left[ \alpha_1^T \; \alpha_2^T \; \dots \; \alpha_N^T \right]^T  (4)

the sensing process can be rewritten as the following equation when additive noise E enters the measurement process:

Y = MRX + E  (5)

where the sampling matrix M and the orthogonal transform matrix R are block diagonal matrices defined as

M = \mathrm{Diag}\left[ \Phi_1, \Phi_2, \cdots, \Phi_N \right]  (6)

R = \mathrm{Diag}\left[ \Psi^{-1}, \Psi^{-1}, \cdots, \Psi^{-1} \right]  (7)

To reconstruct an image with compressed sensing from a noisy measurement, the objective

T_{CS}(X) = \frac{\mu}{2}\|Y - MRX\|_2^2 + \|X\|_1  (8)

is minimized.

2.2 Some Other Constraints in the Spatial Domain

Several a priori restrictions predate compressive sensing. For example, total variation (TV) minimization originates from the observation that the total variation of a clean image is usually lower than that of its noisy counterpart, so it helps remove noise while preserving sharp features. Markov random fields were initially motivated by texture description, but their models have been extended to natural images for global image restoration. The total variation is defined here through the anisotropic discretization of the TV seminorm

\|u\|_{TV} = \|\nabla u\|_1 = \sum_i \left( |(\nabla_h u)_i| + |(\nabla_v u)_i| \right)  (9)

Here u represents an image; ∇h and ∇v are gradient operators in the horizontal and vertical directions, respectively. For natural-image-oriented high-order Markov random fields such as Fields of Experts (FoE) [12], the neighborhood constraint is defined as

p(x; \Theta) = \frac{1}{Z(\Theta)} \prod_{n=1}^{N} \prod_{k=1}^{K} \phi(J_k \otimes x_n; \alpha_k)  (10)

where x_n gives the location of pixel n; J_k is the k-th convolution filter; φ(·) is an expert function; α_k is the parameter of the k-th filter; N is the number of pixels; K is the number of filters; and Z(Θ) is the partition function for normalization. FoE training uses the principle of maximum likelihood to obtain the optimal filters, exploiting contrastive divergence learning and a hybrid Monte Carlo sampler for iterative parameter convergence. The filters identified by Roth in [12] number eight for a 3 × 3 and twenty-four for a 5 × 5 sized window, so the training process amounts to a sparse approximation in line with the marginal distributions of the training set.
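To make the anisotropic TV seminorm of Eq. 9 concrete, the following sketch (our own illustration, not code from the paper) sums the absolute horizontal and vertical forward differences over a small image stored as a list of lists:

```python
def tv_seminorm(u):
    """Anisotropic TV seminorm of Eq. 9: sum of absolute horizontal
    and vertical forward differences over a 2D image (list of lists)."""
    h, w = len(u), len(u[0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            if j + 1 < w:               # horizontal gradient (nabla_h u)
                total += abs(u[i][j + 1] - u[i][j])
            if i + 1 < h:               # vertical gradient (nabla_v u)
                total += abs(u[i + 1][j] - u[i][j])
    return total

flat = [[5, 5], [5, 5]]        # constant image: TV = 0
edge = [[0, 1], [0, 1]]        # one vertical edge crossed by two rows
print(tv_seminorm(flat))       # 0.0
print(tv_seminorm(edge))       # 2.0
```

A constant image has zero total variation, while each pixel pair straddling the edge contributes one unit, which is why TV regularization favors piecewise constant solutions.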
2.3 Split Bregman Iteration Algorithm

For an image u, an ℓ1-norm constrained problem

u^* = \arg\min_u |\Phi(u)| + \|Ku - f\|_2^2  (11)

is reformulated by the split Bregman technique [4] into the iteration

(u^{k+1}, d^{k+1}) = \arg\min_{u,d} \|d\|_1 + H(u) + \frac{\lambda}{2}\|d - \Phi(u) - b^k\|_2^2, \qquad b^{k+1} = b^k + \Phi(u^{k+1}) - d^{k+1}  (12)

The algorithm can be divided into three steps:

Step 1: u^{k+1} = \arg\min_u H(u) + \frac{\lambda}{2}\|d^k - \Phi(u) - b^k\|_2^2  (13)

Step 2: d^{k+1} = \arg\min_d \|d\|_1 + \frac{\lambda}{2}\|d - \Phi(u^{k+1}) - b^k\|_2^2  (14)

Step 3: b^{k+1} = b^k + \Phi(u^{k+1}) - d^{k+1}

Then the solution of the nonlinear optimization problem is obtained iteratively. Step 1 is a differentiable optimization problem that can be solved directly with tools such as the Fourier transform, Gauss-Seidel, or conjugate gradient methods. Step 2 can be solved efficiently using the shrinkage function

shrink(d, 1/\lambda) = \max(|d| - 1/\lambda, 0) \cdot d/|d|

3 A Generalized Model and its Optimization by Split Bregman

In this work, we propose to reconstruct the final image by iteratively recovering the compressed sensing coefficients via a generalized model that integrates multiple sparse representations. The proposed method builds on the different prior sparse constraints discussed above, which motivates us to combine them in a single model and then solve it.

3.1 A Generalized Model

A generalized optimization model is defined as

T_E(X) = \frac{1}{2}\|Y - MRX\|_2^2 + \frac{\lambda_0}{2}\|X\|_1 + \sum_{k=1}^{K} \frac{\lambda_k}{2}\|J_k \otimes RX\|_1 = \frac{1}{2}\|Y - MRX\|_2^2 + \frac{\lambda_0}{2}\|X\|_1 + \sum_{k=1}^{K} \frac{\lambda_k}{2}\|H_k RX\|_1  (15)

where λ0 and λk are adjustable regularization parameters, Hk is the block circulant matrix corresponding to the k-th convolution or transform kernel Jk, and the other symbols are as defined above. The transform matrix R is preferably derived from a bank of multi-scale orthogonal bases. A convolution kernel has to be a small-sized zero-mean filter to meet the elementary requirement of sparse representation. Since such a filter is usually far smaller than the image, any small-sized zero-mean filter tends to impose local similarity, smoothing the image locally. On the other hand, if a filter applied in different areas of an image produces large sparse coefficients, it partly reveals the similarities between those areas, analogous to the response to a specific band-pass filter in the frequency domain. Following this idea, nonlocal similarity can be observed at the macro level. Typical filters may take the form of gradient operators, filters learned by high-order Markov random fields, or the main atoms of dictionary learning. The gradient operators, also known as total variation regularization, are criticized in [16] for carving only local similarity, partly owing to their small size and their rigidity across all images. Generally, a filter regarded as capturing global similarities should have the largest number of non-zero coefficients among the filtered images.

3.2 Filters Selection
The filters in Eq. 15 have an impact on the optimization tradeoff between local similarity and nonlocal similarity,
which may dominate the achievable performance of the proposed model. Intuitively, the more globally a filter adapts, the better it carves a wide sparse representation to regularize recovered images. Gradient operators have proved useful in [5] for noisy remote sensing images, apart from the over-smoothing produced by these low-order filters. Besides the gradient operators, we considered three other types of candidate filters: atoms from dictionary learning, transform vectors from SVD decomposition, and filters learned by the FoE model. Choosing suitable atoms from dictionary learning is particularly difficult. For example, we extracted 2,000,000 periodically positioned non-overlapping 3 × 3 sized patches from the training set of the Berkeley Segmentation Dataset 300 (BSDS300) and trained with online dictionary learning [8] to find the optimal dictionary and corresponding coefficients using C++ implemented code. We then counted the usage rates of all 9 atoms as [0.6073, 0.4969, 0.4683, 0.4669, 0.4620, 0.4511, 0.4047, 0.3958, 0.3139]. This is quite typical, since dictionary learning tends to produce atoms that are all widely used due to its sub-optimal greedy strategy, which makes it hard to select the main atoms that are most commonly observed in every patch. PCA produces concentrated components by SVD decomposition and projection onto a group of orthogonal bases. We used the patches extracted above for the PCA transform and calculated the impacts of the transform vectors as [0.3034, 0.2548, 0.1175, 0.1136, 0.0907, 0.0505, 0.0477, 0.0218, 0.0000]. The first two vectors account for 56 % of the related information. Accordingly, all vectors except the last should be included in the filter bank to recover images with Eq. 15. However, PCA- or gradient-based filters may not produce the best reconstruction result. To be more specific, a bank of suitable filters could be selected from an over-complete dictionary rather than an orthogonal dictionary.
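Since each kernel Jk corresponds to a block circulant matrix Hk, applying Hk to an image is a wrap-around (circular) convolution. The sketch below (our own illustration; `response_sparsity` is a hypothetical scoring heuristic, not the paper's selection procedure) applies a zero-mean filter circularly and measures how sparse its response is:

```python
def circ_conv2(img, ker):
    """Circular 2D convolution: each filter Jk corresponds to a block
    circulant matrix Hk, so applying Hk to the image equals this
    wrap-around convolution."""
    h, w = len(img), len(img[0])
    kh, kw = len(ker), len(ker[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    s += ker[a][b] * img[(i - a) % h][(j - b) % w]
            out[i][j] = s
    return out

def response_sparsity(img, ker, tol=1e-6):
    """Fraction of near-zero responses: a crude, hypothetical score for how
    sparsely a zero-mean filter represents a given image."""
    r = circ_conv2(img, ker)
    flat = [v for row in r for v in row]
    return sum(1 for v in flat if abs(v) <= tol) / len(flat)

grad_h = [[0.0, 0.0, 0.0], [0.0, -1.0, 1.0], [0.0, 0.0, 0.0]]  # horizontal gradient
flat_img = [[7.0] * 8 for _ in range(8)]
print(response_sparsity(flat_img, grad_h))   # 1.0 on a constant image
```

On a constant image the gradient filter responds with all zeros, i.e. a perfectly sparse representation; an image with edges yields nonzero responses only where the structure is.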
An overcomplete dictionary characterizes images along multiple scales or redundant directions that are considered significant for visual recognition, typically the Gabor functions. 2D Gabor functions are a bank of band-pass filters with multiple directions and spatial frequencies, defined as a Gaussian envelope modulated by a harmonic carrier. A complex Gabor filter is defined as

G_{\lambda,\theta,\varphi,\gamma,\sigma}(x, y) = e^{-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}} \cdot e^{i(2\pi x'/\lambda + \varphi)}  (16)

with real part

G^{re}_{\lambda,\theta,\varphi,\gamma,\sigma}(x, y) = e^{-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}} \cdot \cos\left(2\pi \frac{x'}{\lambda} + \varphi\right)  (17)

and imaginary part

G^{im}_{\lambda,\theta,\varphi,\gamma,\sigma}(x, y) = e^{-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}} \cdot \sin\left(2\pi \frac{x'}{\lambda} + \varphi\right)  (18)

x' = x \cos\theta - y \sin\theta, \qquad y' = x \sin\theta + y \cos\theta  (19)
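A minimal sketch of generating one zero-mean real Gabor filter per Eqs. 16-19 (our own illustration; the parameter coupling σ = a·λ with a = 0.56 follows the setting described in the text below):

```python
import math

def real_gabor(size, lam, theta, phi, gamma=1.0, a=0.56):
    """Zero-mean real Gabor filter (Eqs. 16-19): Gaussian envelope times a
    cosine carrier, with the mean subtracted afterwards as the text suggests.
    sigma is tied to the wavelength via sigma = a * lam."""
    sigma = a * lam
    half = size // 2
    g = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) - y * math.sin(theta)   # Eq. 19 rotation
            yr = x * math.sin(theta) + y * math.cos(theta)
            env = math.exp(-(xr * xr + gamma * gamma * yr * yr)
                           / (2.0 * sigma * sigma))
            row.append(env * math.cos(2.0 * math.pi * xr / lam + phi))  # Eq. 17
        g.append(row)
    mean = sum(map(sum, g)) / (size * size)
    return [[v - mean for v in row] for row in g]   # enforce the zero mean

f = real_gabor(3, lam=2.0, theta=0.0, phi=0.0)
print(abs(sum(map(sum, f))) < 1e-12)   # True: the filter is zero-mean
```

Sweeping θ and ϕ over the grids given in the experiments section produces the full bank of directional band-pass filters.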
Here (x, y) is a 2D location, λ is the wavelength, θ is a direction limited to [0, π), ϕ is a phase offset in [−π, π), σ is the standard deviation of the Gaussian envelope, and γ is the aspect ratio of the filter. Instead of using the standard Gabor filters, we choose zero-mean real Gabor filters as the alternative filters in [4], obtained by removing the mean value from each real Gabor filter. Some parameters of the Gabor functions need manual setting. σ and λ are not independent, being related by σ = a·λ where a lies between 0.3 and 0.6. In our tests, σ = 0.56λ and γ = 1, while λ, θ, and ϕ are left for manual setting. FoE trained filters also have the potential to be suitable filters; they aim at identifying common independent components of different images and capturing the marginal distributions of filtered images. In an FoE model, the parameters Θ = {θ1, · · · , θN}, θi = {αi, Ji}, which include the expert parameters {αi} and the elements of the filters {Ji}, can be learned from a training patch set by maximizing the likelihood, equivalent to minimizing the Kullback-Leibler divergence between the model and the data distribution, so that the model distribution agrees as closely as possible with the data distribution. The training process is a gradient ascent on the log-likelihood, updating

\Delta\theta_i = \eta \left( \left\langle \frac{\partial E_{FoE}}{\partial \theta_i} \right\rangle_p - \left\langle \frac{\partial E_{FoE}}{\partial \theta_i} \right\rangle_X \right)  (20)

where ⟨·⟩_X denotes the expectation over the training set X and ⟨·⟩_p the expectation over the actual model distribution. Note that the filters J = {Ji} are not directly defined in the parameter set but are represented as J = A^T J̃, where A is a fixed basis in the form of an inverse whitening transformation. To approximate the expectation over the model distribution computationally, FoE training draws samples repeatedly from p(x; Θ) using the hybrid Monte Carlo sampler, which takes a very long time for the Markov chain to converge approximately under random walking. As an acceleration strategy, contrastive divergence learning is advised, which initializes the sampler at the data points and runs it for only a small, fixed number of steps. The filters should be learned from reference images because the proposed reconstruction model is sensitive to the filters employed, under the practical constraint that only a few atoms or filters of a large dictionary are chosen for reconstruction, which implicitly assumes that the major atoms are versatile across different images. If a filter or atom is learned from heavily corrupted images, it usually deviates significantly from the ideal atoms, producing mainly local rather than nonlocal similarities and reconstructing over-smoothed images.
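The shape of the Eq. 20 update can be illustrated with a toy sketch (entirely our own stand-in: the energy gradient and both sample sets are illustrative, and the hybrid Monte Carlo sampler that would produce the model-side samples in FoE is assumed to exist elsewhere):

```python
def cd_update(theta, dE_dtheta, data_samples, model_samples, eta):
    """One Eq. 20-style step: move the parameter along the difference
    between the model-side and data-side averages of dE/dtheta."""
    avg = lambda xs: sum(dE_dtheta(theta, x) for x in xs) / len(xs)
    return theta + eta * (avg(model_samples) - avg(data_samples))

# Toy energy gradient dE/dtheta = theta - x (purely illustrative):
dE = lambda th, x: th - x
theta1 = cd_update(0.0, dE, [1.0, 2.0, 3.0], [0.0, 0.0], eta=0.1)
print(theta1)   # the parameter moves toward the data mean
```

In contrastive divergence, `model_samples` would come from a short Markov chain initialized at the data points rather than from a fully converged sampler.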
3.3 Solving the Proposed Model

Since the matrix X in Eqs. 5 and 15 is the coefficient matrix after the orthogonal transform R, RX is the expected original image obtained by the inverse orthogonal transform. Therefore, the regularization that preserves local features such as sharpness or texture in Eq. 15 can be implemented in the spatial domain. The optimization that minimizes the objective function in Eq. 15 benefits from the split Bregman method. Following the split Bregman technique, a new energy function is defined as

T_E(X) = \frac{1}{2}\|Y - MRX\|_2^2 + \|D_X\|_1 + \frac{\lambda_0}{2}\|D_X - X - B_X\|_2^2 + \sum_{k=1}^{K}\left( \|D_k\|_1 + \frac{\lambda_k}{2}\|D_k - H_k RX - B_k\|_2^2 \right)  (21)

This function can be minimized iteratively by

X = \arg\min_X \frac{1}{2}\|Y - MRX\|_2^2 + \frac{\lambda_0}{2}\|D_X - X - B_X\|_2^2 + \sum_{k=1}^{K} \frac{\lambda_k}{2}\|D_k - H_k RX - B_k\|_2^2  (22)

D_X = \arg\min_{D_X} \|D_X\|_1 + \frac{\lambda_0}{2}\|D_X - X - B_X\|_2^2  (23)

B_X = B_X + X - D_X  (24)

and, for k = 1, 2, · · · , K,

D_k = \arg\min_{D_k} \|D_k\|_1 + \frac{\lambda_k}{2}\|D_k - H_k RX - B_k\|_2^2  (25)

B_k = B_k + H_k RX - D_k  (26)

With the shrink function, we have

D_X = shrink(B_X + X, 1/\lambda_0)  (27)

D_k = shrink(B_k + H_k RX, 1/\lambda_k)  (28)

Only X remains unsolved. Treating Eq. 22 as a simple least squares problem, we can update X with its analytic solution, which satisfies

(MR)^T (MRX - Y) - \lambda_0 (D_X - X - B_X) - \sum_{k=1}^{K} \lambda_k (H_k R)^T (D_k - H_k RX - B_k) = 0  (29)

By rearranging the equation, we have

R^T \left( M^T M + \lambda_0 I + \sum_{k=1}^{K} \lambda_k H_k^T H_k \right) R X = (MR)^T Y + \lambda_0 (D_X - B_X) + R^T \sum_{k=1}^{K} \lambda_k H_k^T (D_k - B_k)  (30)

Equation 30 can be solved efficiently with various gradient optimization methods; here the quasi-Newton method is advised, with gradient

G = R^T \left( M^T M + \lambda_0 I + \sum_{k=1}^{K} \lambda_k H_k^T H_k \right) R X - (MR)^T Y - \lambda_0 (D_X - B_X) - R^T \sum_{k=1}^{K} \lambda_k H_k^T (D_k - B_k)  (31)

and Hessian

H = R^T \left( M^T M + \lambda_0 I + \sum_{k=1}^{K} \lambda_k H_k^T H_k \right) R  (32)

The update direction ΔX satisfies

-H \cdot \Delta X = G  (33)

Equation 33 can be solved with the standard conjugate gradient algorithm to obtain the new iterate

X = X + \Delta X  (34)

The full reconstruction algorithm is summarized in Algorithm 1.
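The auxiliary updates of Eqs. 27 and 24 reduce to elementwise soft thresholding, sketched below on a toy coefficient vector (our own illustration; the values and λ0 are arbitrary):

```python
def shrink(v, t):
    """Soft-thresholding operator used in Eqs. 27-28:
    shrink(v, t) = max(|v| - t, 0) * sign(v)."""
    m = abs(v) - t
    return 0.0 if m <= 0 else m * (1.0 if v > 0 else -1.0)

# One pass of the auxiliary-variable updates (Eqs. 27 and 24):
lam0 = 2.0
X = [0.9, -0.1, 0.0, 2.5]
BX = [0.0, 0.0, 0.0, 0.0]
DX = [shrink(b + x, 1.0 / lam0) for b, x in zip(BX, X)]   # Eq. 27
BX = [b + x - d for b, x, d in zip(BX, X, DX)]            # Eq. 24
print(DX)   # entries below the 0.5 threshold are zeroed, the rest shrunk
```

The Bregman variables BX accumulate exactly the part of X that the thresholding removed, which is what drives the constraint DX ≈ X to hold at convergence.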
4 Improvement for Multiband Images

Nowadays, natural and remote sensing images occur mainly in the form of multiple bands, so an optimization policy for multiband images is of practical importance. Peng Liu et al. [5] designed an acceleration policy that uses PCA to remove the correlation between bands transformed by the gradient operators. In our optimization process, the same technique can be applied to Bk + HkRX − Dk when updating Bk in Eq. 26. This is helpful because the measurement process usually carries noise, or the initial iterate of Bk deviates from its real value, and the zero-mean filters generally used to obtain heavy tails for sparse representation [13] amplify the noise. PCA transforms the filtered images and performs traditional threshold denoising with the shrinking function, exploiting the fact that after PCA the noise of an image is evenly distributed across all components while most of the information is concentrated in a few principal components. After the inverse PCA transform, the updated Bk is more accurate, which indirectly improves convergence and the reconstructed images. The operation proceeds as follows. Let the matrix S contain the multiple bands

S = B_k + H_k RX - D_k  (35)

Each band of S is aligned into a column vector, and the per-band averages form a row vector written as mean(S). Subtracting the mean value band by band gives

S' = S - mean(S)  (36)

The decorrelated components s_k, which form the matrix S'', are obtained by

S'' = S' U^T  (37)

where U is the orthogonal matrix of principal directions obtained from the SVD of S'. S'' is shrunk by updating each column component s_k with the threshold shrinking function

s_k = shrink(s_k, 1/\eta)  (38)

After shrinking all the components of S'', the multiband matrix S is updated with

S = S'' U + mean(S)  (39)
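The four steps of Eqs. 35-39 can be sketched for the two-band case, where the PCA rotation has a closed form (our own illustration: the input samples, the 2×2 eigen-angle formula, and η are stand-ins, not the paper's implementation):

```python
import math

def pca_shrink_2band(S, eta):
    """Sketch of the Section 4 PCA denoising step (Eqs. 36-39) for two
    bands. S is a list of (s1, s2) samples; each decorrelated component
    is soft-thresholded at 1/eta (Eq. 38)."""
    n = len(S)
    m1 = sum(p[0] for p in S) / n
    m2 = sum(p[1] for p in S) / n
    Z = [(p[0] - m1, p[1] - m2) for p in S]          # Eq. 36: remove band means
    a = sum(z[0] * z[0] for z in Z) / n              # 2x2 covariance entries
    b = sum(z[0] * z[1] for z in Z) / n
    c = sum(z[1] * z[1] for z in Z) / n
    t = 0.5 * math.atan2(2.0 * b, a - c)             # principal direction angle
    ct, st = math.cos(t), math.sin(t)
    def shrink(v):                                   # Eq. 38 soft threshold
        m = abs(v) - 1.0 / eta
        return 0.0 if m <= 0 else m * (1.0 if v > 0 else -1.0)
    out = []
    for z1, z2 in Z:
        u1 = ct * z1 + st * z2                       # Eq. 37: rotate into PCA axes
        u2 = -st * z1 + ct * z2
        u1, u2 = shrink(u1), shrink(u2)
        out.append((ct * u1 - st * u2 + m1,          # Eq. 39: rotate back,
                    st * u1 + ct * u2 + m2))         # restore band means
    return out
```

With a large threshold everything collapses to the band means, while a tiny threshold leaves the data nearly untouched; in between, low-energy (noise-dominated) components are suppressed while the principal component survives.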
5 Parameter Setting for the Proposed Algorithm

The orthogonal transform matrix R can typically be a wavelet transform, which offers a good tradeoff between transform efficiency and representation effectiveness; in the experiments we use minimum-phase Daubechies scaling filters as the basis to decompose the images into three levels.

Considering the shrink function for DX in Eq. 27, which amounts to threshold denoising in the wavelet domain, λ0 can be identified with the noise root variance and updated adaptively in each iteration. Let α(LH1), α(HL1), and α(HH1) denote the level-1 vertical, horizontal, and diagonal detail wavelet subbands of a single-band image, respectively. The root variance is calculated by

\lambda_0 = \mathrm{median}\left( \alpha(LH_1)^2, \alpha(HL_1)^2 \right) / 0.6745  (40)

For an image with multiple bands, the noise variance of the i-th band is computed as

\lambda_0^2(i) \approx \mathrm{var}\left(\alpha_i(HH_1)\right) - \max_{j \neq i} \mathrm{cov}\left( \alpha_i(HH_1), \alpha_j(HH_1) \right)  (41)

where i and j are band indices, var(·) is the variance of the wavelet subband coefficients, and cov(·) is the covariance between two subbands. The regularization parameters {λk} have to be set manually. Rationally, if the convolution filters are of the same type and size, their λk can share the same value, which lessens the burden of identifying suitable parameters for different images, i.e., λ1 = λ2 = · · · = λK = λJ so that all filters in the filter set J share the same weight. Our tests indicated that the proposed method is quite robust to λJ, so we empirically set λJ = 0.1 in all numerical experiments. The basic regularization parameter λ0 is adaptively updated in the wavelet transform domain once the first iteration finishes, and so is given a small initial value (λ0 = 0.3).
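Both estimators are straightforward to compute; the sketch below (ours) implements the classic median-based rule that Eq. 40 adapts, in its standard form σ = median(|c|)/0.6745 over detail coefficients, together with the variance-minus-covariance approximation of Eq. 41:

```python
import statistics

def mad_sigma(detail_coeffs):
    """Standard robust wavelet-domain noise estimate,
    sigma = median(|c|) / 0.6745 -- the classic form of the rule that
    Eq. 40 adapts to the LH1/HL1 subbands."""
    return statistics.median(abs(c) for c in detail_coeffs) / 0.6745

def band_noise_var(hh_i, hh_others):
    """Eq. 41 sketch: noise variance of band i approximated by the variance
    of its HH1 subband minus its maximum covariance with the other bands'
    HH1 subbands."""
    n = len(hh_i)
    mi = sum(hh_i) / n
    var_i = sum((v - mi) ** 2 for v in hh_i) / n
    def cov(a, b):
        ma = sum(a) / len(a)
        mb = sum(b) / len(b)
        return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return var_i - max(cov(hh_i, h) for h in hh_others)
```

The covariance subtraction removes the signal energy shared across bands, leaving the per-band noise power, which is why the multiband variant can be sharper than the single-band rule.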
6 Experiments and Validation

The experiments seek answers to the questions raised above: 1) Which class of filters best fits the proposed model? 2) How does the method compare with other state-of-the-art reconstruction algorithms? To find out, we first compared the filters' performance on several images to get a general impression, and then used the new algorithm with the best-fitting filter bank to reconstruct natural images and compared it with several state-of-the-art algorithms.

6.1 Identifying Suitable Filters for the Proposed Model

The suitability of the filters of gradient operators (TV), atoms from online dictionary learning (DL), SVD-decomposed eigenvectors (SVD), zero-mean real Gabor filters (Gabor),
Table 1 Single-band reconstruction PSNR comparison on textural images.

Filters | Brick   | Stone   | Fabric
TV      | 20.9415 | 21.4813 | 27.0223
DL      | 16.7268 | 17.9163 | 21.3449
SVD     | 21.7788 | 21.8694 | 28.9709
Gabor   | 20.6887 | 22.3569 | 29.3883
FoE     | 22.9810 | 22.8968 | 30.2529

Table 3 Single-band reconstruction PSNR comparison on classic images.

Method  | Lena    | Barbara | Peppers
OMP     | 27.3044 | 24.0418 | 27.6179
CoSaMP  | 28.1681 | 24.8538 | 28.5216
GSR     | 27.5619 | 29.9887 | 28.7132
BCS-FoE | 27.2903 | 23.3629 | 28.0587
and FoE trained filters (FoE) is compared in this section. The size of all filters is fixed to 3 × 3. The dictionary length is 9 for DL and 8 for SVD. The vectors of DL and SVD are learned from BSDS300 and mirrored to suit convolution. The parameters of the Gabor filters are set to λ = {2}, θ = {0, π/4, 2π/4, 3π/4}, and ϕ = {0, −π/2}. Roth trained eight 3 × 3 sized filters and twenty-four 5 × 5 sized filters with the FoE model and the corresponding training method, on 20,000 image regions randomly cropped from images of the BSDS300. Since FoE training is extremely time-consuming, we did not retrain the FoE filters and simply chose the eight 3 × 3 sized filters as the filter set J. The twenty-four 5 × 5 sized filters were ignored because they take more time while offering performance similar to the 3 × 3 filters. Two types of images, textural images and remote sensing images, were used to compare the reconstruction differences between the filters. The textural images, named "Brick", "Stone", and "Fabric", were picked from the Brodatz Dataset and processed as in Table 1. The remote sensing images processed in Table 2 were snapped from satellite images of Spot-5 (band 3/2/1), LandSat-7 (band 3/2/1), and Ikonos (band 3/2/1) to form color images. The peak signal-to-noise ratios (PSNRs) in Tables 1 and 2 show that the filters' performance in the proposed model ranked consistently as FoE > Gabor > SVD > TV > DL. Clearly, the FoE trained filters perform best among all the filters that we considered.

6.2 Scheme of Experimental Comparison Between Reconstruction Algorithms
Natural images of two types, classic images and specific natural images, were selected to verify the performance of the proposed method. The classic images are "Lena", "Barbara", and "Peppers" in single-band and multiband form. The other natural images include "Parthenon", "Sydney", and "Dinosaur". We name the proposed algorithm Bregman-based Compressed sensing with FoE constraints (BCS-FoE) for simplicity and compare it with other image reconstruction algorithms derived from compressed sensing theory, namely Orthogonal Matching Pursuit (OMP) proposed by Joel A. Tropp and Anna C. Gilbert [14], Compressive Sampling Matching Pursuit (CoSaMP) proposed by D. Needell and J. A. Tropp [9], and Group-based Sparse Representation (GSR), a state-of-the-art method proposed by Jian Zhang et al. [16]. The performance of the CS-BS-PCA algorithm [5] was also compared when multiband images were tested. The sparsity was set to 10 % for all algorithms with a noiseless measurement, and all regularization parameters of the compared algorithms were kept as advised by their authors. The peak signal-to-noise ratio (PSNR) of each algorithm on the various images was measured for performance comparison, as shown in Tables 3 and 4. As PSNR is not the only index reflecting reconstruction quality, other commonly used metrics are also adopted: the correlation coefficient and UIQI [15] for single-band evaluation, and the spectral angle mapper (SAM), RASE [17], ERGAS [10], and Q4 [1] for
Table 2 Multispectral reconstruction PSNR comparison on remote sensing images.

Filters   Spot-5    LandSat-7   Ikonos
TV        23.4464   17.7596     19.1369
DL        21.8879   17.5259     19.1847
SVD       24.4147   18.8783     20.6927
Gabor     24.9211   19.1897     20.0755
FoE       25.4299   19.9282     21.0497

Table 4 Multiband reconstruction PSNR comparison on classic and other natural images.

Method      Lena      Barbara   Parthenon   Sydney    Dinosaur
OMP         25.3599   26.6886   25.7766     23.8445   23.3785
CoSaMP      26.1467   27.5168   26.6317     24.7292   24.3252
GSR         24.5174   27.2754   23.9005     24.0539   21.6362
CS-BS-PCA   26.2137   27.1732   25.5605     26.3677   23.1457
BCS-FoE     27.1723   28.9999   27.0479     27.3019   24.7083
Table 5 Multiband reconstruction PSNR comparison on the pepper image.

Method      PSNR      Correlation Coefficient   UIQI     SAM      RASE     ERGAS    Q4       Processing time (s)
OMP         27.9549   0.9846                    0.9845   0.0468   0.0922   0.0270   0.9832   5.92
CoSaMP      28.9780   0.9874                    0.9873   0.0413   0.0820   0.0241   0.9867   47.1
GSR         29.7903   0.9921                    0.9891   0.0368   0.0744   0.0222   0.9891   1211
CS-BS-PCA   27.7371   0.9780                    0.9759   0.0681   0.1046   0.0329   0.9751   102.8
BCS-FoE     29.8328   0.9903                    0.9898   0.0469   0.0857   0.0222   0.9885   72.3
multi-band evaluation (Table 5). To evaluate the correlation coefficient and UIQI, all the bands of an image are averaged equally to form a new band. Note that the ideal value of the correlation coefficient, UIQI and Q4 is 1, while it is 0 for SAM, RASE and ERGAS.
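To make the headline metrics concrete, the sketch below computes PSNR and a mean spectral angle (SAM) for multiband arrays. The normalization details (peak value of 255, averaging the angle over pixels) are our assumptions and may differ from the exact definitions in the cited papers.

```python
import numpy as np

def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio in dB between reference and reconstruction."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def sam(ref, rec, eps=1e-12):
    """Mean spectral angle (radians) between per-pixel band vectors.
    ref, rec: arrays of shape (rows, cols, bands). Ideal value is 0."""
    ref = ref.reshape(-1, ref.shape[-1]).astype(np.float64)
    rec = rec.reshape(-1, rec.shape[-1]).astype(np.float64)
    dot = np.sum(ref * rec, axis=1)
    norms = np.linalg.norm(ref, axis=1) * np.linalg.norm(rec, axis=1) + eps
    return float(np.mean(np.arccos(np.clip(dot / norms, -1.0, 1.0))))

# Quick sanity check on a synthetic 3-band image with mild Gaussian noise
rng = np.random.default_rng(0)
img = rng.uniform(0, 255, (32, 32, 3))
noisy = np.clip(img + rng.normal(0, 5, img.shape), 0, 255)
print(psnr(img, img))        # inf for a perfect reconstruction
print(sam(img, img) < 1e-6)  # True: zero angle against itself
```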
6.3 Visual and Metrics Comparison

From the PSNR comparison of the single-band images in Table 3, it can be seen that the PSNR of the proposed BCS-FoE algorithm is only moderate; GSR and CoSaMP produced higher scores. In general, however, the BCS-FoE algorithm produces competitive results even with a bank of fixed simple filters. The PSNR comparison of multiband images in Table 4 is quite different from that in Table 3. BCS-FoE ranks highest in all tests. CS-BS-PCA produced good results, although its parameters were originally designed for remote sensing images rather than natural images. CoSaMP produced results similar to CS-BS-PCA. GSR produced moderate results for "Barbara" and "Sydney", but the poorest results for "Lena", "Parthenon", and "Dinosaur".

The comparison between algorithms is also performed on the "Peppers" image, with the results shown in Table 5. Generally speaking, the other metrics, including the correlation coefficient, UIQI, SAM, RASE, ERGAS, and Q4, are in line with the conclusions drawn from the PSNR evaluation. Our method, BCS-FoE, produced the best results in a fairly short time. GSR shows a PSNR value quite close to that of our method, but its processing time is prohibitively long.

Figures 1 and 2 demonstrate the reconstruction results of "Peppers" and "Sydney", respectively, shown partly for visual judgment. From the outcomes, we notice that OMP and CoSaMP produced serious block artifacts, saw-like details or color speckles that deviate from the original images, even though their numerical evaluation is good. The reconstructed "Sydney" image of GSR is so over-smoothed that details can hardly be recognized, while its "Peppers" image has obviously wrong contours. CS-BS-PCA and BCS-FoE give visually similar results that strike a good balance between smoothness and detail enhancement. In general, BCS-FoE produces less noise and retains more details than CS-BS-PCA does, which shows that our model and optimization solution are effective.
Figure 1 Visual comparison of locally manifested reconstruction images from 10 % sparse observations of the color “Peppers” image. a–b Reconstructions using OMP and CoSaMP. c Original image. d–f Reconstructions using GSR, CS-BS-PCA, and BCS-FoE.
Figure 2 Visual comparison of locally manifested reconstruction images from 10 % sparse observations of the color "Sydney" image. a–b Reconstructions using OMP and CoSaMP. c Original image. d–f Reconstructions using GSR, CS-BS-PCA, and BCS-FoE.

7 Conclusion and Future Work

In this paper, we propose a new compressed sensing model and an efficient algorithm for image reconstruction from sparse observations, based upon a generalized multiple sparse restrictions framework and a penalized splitting approach. The model describes both local similarities and nonlocal similarities at the regularization stage for better reconstruction, with the convolution of specific filters as an auxiliary restriction in addition to compressed sensing. An efficient split Bregman method is employed to decouple the multiple nonlinear norms and iteratively approach the optimum solution. To obtain better restoration for multiband images, the proposed algorithm uses PCA to update some parameters adaptively as well as accurately, and the noise variance in the wavelet domain is also calculated adaptively.

During validation, the eight 3 × 3 sized filters learned by the FoE model were used as the model kernels to reconstruct single-band and multiband natural images at 0.1 sparsity. The algorithm was compared with state-of-the-art algorithms such as Orthogonal Matching Pursuit, Compressive Sampling Matching Pursuit, Group-based Sparse Representation, and Compressive Sensing based on the split Bregman method and PCA. The experimental results show that our method produces competitive results for single-band images and outperforms all the other methods for multiband images, so it is a good model for reconstructing natural color images. Still, there is room for further performance improvement until the optimal filters are found. Although our experiments use established filters to demonstrate the feasibility of the proposed model, it has to be admitted that the available evaluation metrics and training strategies are still unable to identify the optimal group of filters. Nevertheless, since the new model and its iteration algorithm have been shown to be effective with specific filters, our underlying assumption, that nonlocal similarities and local similarities can be incorporated under a single constraint, is shown to be feasible. Future work will focus on refining the theory of this integration and on training better filters.

Acknowledgments This work is supported by the National Natural Science Foundation of China (No. 41471368 and No. 41571413).
References

1. Alparone, L., Baronti, S., Garzelli, A., & Nencini, F. (2004). A global quality measurement of pan-sharpened multispectral imagery. IEEE Geoscience and Remote Sensing Letters, 1(4), 313–317.
2. Dong, W., Shi, G., & Li, X. (2013). Nonlocal image restoration with bilateral variance estimation: a low-rank approach. IEEE Transactions on Image Processing, 22(2), 700–711.
3. Fei, X., Wei, Z., & Xiao, L. (2013). Iterative directional total variation refinement for compressive sensing image reconstruction. IEEE Signal Processing Letters, 20(11), 1070–1073.
4. Goldstein, T., & Osher, S. (2009). The split Bregman method for L1-regularized problems. SIAM Journal on Imaging Sciences, 2(2), 323–343.
5. Liu, P., & Eom, K. B. (2014). Compressive sensing of noisy multispectral images. IEEE Geoscience and Remote Sensing Letters, 11(11), 1931–1935.
6. Liu, Q., Wang, S., Ying, L., Peng, X., Zhu, Y., & Liang, D. (2013). Adaptive dictionary learning in sparse gradient domain for image recovery. IEEE Transactions on Image Processing, 22(12), 4652–4663.
7. Ma, S., Yin, W., Zhang, Y., & Chakraborty, A. (2008). An efficient algorithm for compressed MR imaging using total variation and wavelets. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008) (pp. 1–8). IEEE.
8. Mairal, J., Bach, F., Ponce, J., & Sapiro, G. (2010). Online learning for matrix factorization and sparse coding. The Journal of Machine Learning Research, 11, 19–60.
9. Needell, D., & Tropp, J. A. (2009). CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis, 26(3), 301–321.
10. Ranchin, T., & Wald, L. (2000). Fusion of high spatial and spectral resolution images: the ARSIS concept and its implementation. Photogrammetric Engineering and Remote Sensing, 66(1), 49–61.
11. Roth, S., & Black, M. J. (2005). Fields of experts: A framework for learning image priors. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005) (Vol. 2, pp. 860–867). IEEE.
12. Roth, S., & Black, M. J. (2009). Fields of experts. International Journal of Computer Vision, 82(2), 205–229.
13. Schmidt, U., Gao, Q., & Roth, S. (2010). A generative perspective on MRFs in low-level vision. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2010) (pp. 1751–1758). IEEE.
14. Tropp, J. A., & Gilbert, A. C. (2007). Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory, 53(12), 4655–4666.
15. Wang, Z., & Bovik, A. C. (2002). A universal image quality index. IEEE Signal Processing Letters, 9(3), 81–84.
16. Zhang, J., Zhao, D., & Gao, W. (2014). Group-based sparse representation for image restoration. Available at arXiv:1405.3351.
17. Zhang, Y. (2008). Methods for image fusion quality assessment - a review, comparison and analysis. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 37, 1101–1109.
18. Zhou, M., Chen, H., Paisley, J., Ren, L., Li, L., Xing, Z., Dunson, D., Sapiro, G., & Carin, L. (2012). Nonparametric Bayesian dictionary learning for analysis of noisy and incomplete images. IEEE Transactions on Image Processing, 21(1), 130–144.
19. Zhu, S. C., Wu, Y., & Mumford, D. (1998). Filters, random fields and maximum entropy (FRAME): Towards a unified theory for texture modeling. International Journal of Computer Vision, 27(2), 107–126.
Jingbo Wei received the B.S. degree in Communication Engineering from Beijing Broadcasting Institute, Beijing, China, in 2000, the M.S. degree in Control Theory and Control Engineering from Tongji University, Shanghai, China, in 2008, and the Ph.D. degree in Signal and Information Processing from the Chinese Academy of Sciences (CAS), Beijing, China, in 2013. He is currently an associate professor with the Institute of Space Science and Technology, Nanchang University, Nanchang, China. His research interests include remote sensing image processing and high-performance geocomputation.
Yukun Huang received the B.S. degree in Computer Science and Technology from Nanchang University, Nanchang, China, in 2000, and the M.S. and Ph.D. degrees in Computer Software and Theory from Tongji University, Shanghai, China, in 2008 and 2015, respectively. She is currently a lecturer with the Department of Computer Science, Jiangxi University of Finance and Economics, Nanchang, China.
Ke Lu is a professor at the College of Engineering and Information Technology, University of the Chinese Academy of Sciences, Beijing, China. His research interests include remote sensing image processing, signal processing, and medical image processing. Lu received his Ph.D. in computer science from Northwest University.
Lizhe Wang is a "ChuTian" Chair Professor at the School of Computer Science, China University of Geosciences (CUG), and a Professor at the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences (CAS). Prof. Wang received the B.E. and M.E. degrees from Tsinghua University and the Doctor of Engineering from the University of Karlsruhe (Magna Cum Laude), Germany. Prof. Wang is a Fellow of the IET and a Fellow of the British Computer Society. He serves as an Associate Editor of IEEE Transactions on Computers and IEEE Transactions on Cloud Computing. His main research interests include high performance computing, e-Science, and spatial data processing.