Int J Comput Vis DOI 10.1007/s11263-014-0755-z
Image Deblurring with Coupled Dictionary Learning
Shiming Xiang · Gaofeng Meng · Ying Wang · Chunhong Pan · Changshui Zhang
Received: 3 January 2014 / Accepted: 2 August 2014 © Springer Science+Business Media New York 2014
Abstract Image deblurring is a challenging problem in vision computing. Traditionally, this task is addressed as an inverse problem that is enclosed within the image itself. This paper presents a learning-based framework in which the knowledge hidden in huge amounts of available data is explored and exploited for image deblurring. To this end, our algorithm is developed under the conceptual framework of coupled dictionary learning. Specifically, given pairs of blurred image patches and their corresponding clear ones, a learning model is constructed to learn a pair of dictionaries. Among them, one dictionary is responsible for the representation of the clear images, while the other is responsible for that of the blurred images. Theoretically, the learning model is analyzed with coupled sparse representations of the training samples. As the atoms of these dictionaries are coupled together one by one, the reconstruction information can be transmitted between the clear and blurry images. In the application phase, the blurry dictionary is employed to linearly reconstruct the blurry image to be restored. Then, the reconstruction coefficients are kept unchanged and used with the clear dictionary to restore the final result. The main advantage of our approach is that it works in the case of unknown blur kernels. Comparative experiments indicate the validity of our approach.

Keywords Image deblurring · Coupled dictionary learning · Sparse representation · Multiple kernel deblurring
1 Introduction

As an open problem, image deblurring has drawn worldwide attention over the past decades. Efforts have been made to develop various models, and the road has been paved with techniques of deconvolution, regularization, multi-scale decomposition, and learning. Although the literature is rich, there does not yet exist a general framework that works well for diverse images.

1.1 The State-of-the-Art Methods
Communicated by Julien Mairal, Francis Bach and Michael Elad.

S. Xiang (B) · G. Meng · Y. Wang · C. Pan
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

C. Zhang
Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Automation, Tsinghua University, Beijing 100084, China
Most early deblurring methods were developed in terms of deconvolution (Richardson 1972; Lucy 1974). Later, a large number of extensions were introduced (Bronstein et al. 2005; Levin et al. 2009). However, their performance is largely determined by the quality of the supplied blur kernel. Even when the kernel is exactly known, deconvolution-based algorithms may generate ringing artifacts due to image noise. Regularization techniques have been widely introduced to image restoration (Li et al. 2007; Dong et al. 2011), and the meaning of regularization has been greatly widened with different kinds of prior information. For example, Bar et al. (2007) developed a variational regression method for deblurring images whose different parts are blurred by different kernels. Perhaps one of the most famous types of prior knowledge
is the Total Variation (TV) (Rudin et al. 1992); recent example methods can be found in Beck and Teboulle (2009a, b), Goldluecke and Cremers (2010), Chantas et al. (2010), Kamilov et al. (2012), and Ma et al. (2013). The main drawback of the TV regularizer, however, is the staircasing effect on the restored results. Another type of prior knowledge is natural image statistics. Specifically, Fergus et al. (2006) chose to restore images via a sparseness prior on the heavy-tailed distributions of image gradients. Danielyan et al. (2012) developed a block-matching framework for image deblurring, motivated by analyzing and synthesizing patches for nonlocal patchwise image modeling. In the analysis phase, similar image blocks are collected in groups, from which a 3-D transform is estimated; in the synthesis phase, the filtered blockwise spectra are employed to restore their corresponding original patches (Danielyan et al. 2012). Dong et al. (2013) developed a framework of non-locally centralized sparse representation (SR) for image restoration; the core idea is to suppress the sparse coding noise by exploiting image non-local self-similarity to obtain good estimates of the sparse coefficients of the original image. In parallel, some other types of knowledge, such as the hyper-Laplacian distribution of gradients (Levin et al. 2007), sparsity (Levin et al. 2007; Portilla 2009; Zeyde et al. 2010; Yang et al. 2010; Zhang et al. 2013), color statistics (Joshi et al. 2009), multi-image hints (Yuan et al. 2007), and a geometric model of the rotational motion of the camera (Whyte et al. 2012), are employed for modeling. In general, prior knowledge is utilized via regularization or, similarly, Bayesian estimation (Fergus et al. 2006; Shan et al. 2008; Joshi et al. 2010), which reduces the degrees of freedom of the problem.
Along with the deconvolution and regularization methods, multi-scale decomposition has also been considered for deblurring. The popularly used tools are those constructed on wavelet packets, the motivation being to exploit the advantages of multi-scale features and spectral representations. However, these approaches suffer from a few disadvantages. In particular, the wavelet decomposition is not shift-invariant and tends to produce pseudo-Gibbs phenomena along edges. To remedy these drawbacks, some regularization techniques have been introduced to improve the deblurring quality (Ma et al. 2008; Wen et al. 2012).

Beyond the knowledge deduced from rules, observations, or statistics, learning has recently been identified as one of the most flexible ways to discover relations between variables or knowledge hidden in data. This is prompted largely by the variety of available learning methods as well as the richness of image data. Specifically, Su et al. (2002) developed a learning system for image deblurring, where the relationship between the original images and their blurred versions is
learned via a radial-basis-function neural network. Takeda et al. (2008) proposed a deblurring framework with local kernel regression. Later, Kenig et al. (2010) proposed a subspace learning approach to reduce the possible space of point spread functions, employing principal component analysis to achieve this goal. Their approach shows satisfactory deblurring for three-dimensional images acquired by a wide-field fluorescence microscope. Recently, another important line of learning for image deblurring has been built on dictionary learning with sparse representation. Indeed, sparse representation provides an effective yet flexible way to describe images (Mairal et al. 2008, 2010, 2011; Elad et al. 2010; Rubinstein et al. 2010; Jenatton et al. 2011; Peleg et al. 2013; Xu et al. 2013). In this process, the dictionary plays a fundamental role in representation. Beyond discrete cosine transform bases, Haar wavelets, Gabor wavelets, curvelets, contourlets, and bandlets, which are all mathematically designed with highly-structural patterns, over-complete dictionaries learned from data have proven to be more adaptive for the sparse representation of images (Elad and Aharon 2006; Aharon et al. 2006; Mairal et al. 2008, 2009, 2012; Hu et al. 2010; Zeyde et al. 2010; Yang et al. 2012; Rubinstein et al. 2013; Dong et al. 2013), demonstrating promising results on traditionally difficult problems. Specifically, Lou et al. (2011) proposed a deblurring framework based on dictionary replacement. In their approach, the dictionary atoms corresponding to the clear images are first convolved with the blurring kernel to generate blurred atoms, and these blurred atoms are then employed to reconstruct the blurred image (Lou et al. 2011). Zhang et al. (2011) presented a deblurring algorithm by combining kernel estimation and dictionary learning, where a dictionary is supplied to exploit the sparsity prior of natural images.
In addition, Couzinie-Devy et al. (2011) modeled the clear image patches and blurred ones with a linear mapping function and learned a dictionary to represent the lost sharp details. Recently, Ma et al. (2013) proposed a dictionary learning approach for Poisson image deblurring. In their approach, the blurring kernel is given in advance, and the dictionary is learned from the degraded image itself for sparsely reconstructing its patches.

In summary, the existing approaches fall into two lines. In one, the dictionary works together with a deconvolution operation or a linear mapping between the blurred patches and the clear ones. In the other, the dictionary is unilaterally learned from the clear images or from the image to be restored, without considering the collaboration between the data. Accordingly, this unilateral dictionary learning mechanism suffers from one main limitation: the goodness of the dictionary is largely determined by the quality of the estimated (or given) blur kernel or linear mapping. This technical line has not been totally isolated from the traditional ones, and estimating a proper blur kernel has proven to be another open problem within this issue.

Finally, we point out that, in vision computing, some tasks are addressed by learning pairs of dictionaries, such as super-resolution (Yang et al. 2012; Wang et al. 2012; He et al. 2013; Jia et al. 2013), photo-sketch synthesis (Wang et al. 2012), and cross-space image transformation (Jia et al. 2013). The constraints on the atoms of the dictionaries in these optimization models are all of one type: "||·||2 ≤ 1" (see Yang et al. 2012; Wang et al. 2012; He et al. 2013; Jia et al. 2013). Differently, this paper constructs equality constraints for deblurring.

1.2 Our Work

This paper presents a learning-based framework for image deblurring. Our algorithm is developed on the conceptual framework of coupled dictionary learning (CDL). In this framework, we do not estimate the blur kernel, but instead learn a pair of dictionaries simultaneously from the clear and blurry images. To this end, a CDL model is developed in terms of sparse representation. This model is trained with pairs of clear and blurry image patches, generating a pair of clear and blurry dictionaries. With these dictionaries, the reconstruction coefficients of the blurry patches are estimated first via the blurry dictionary and then kept unchanged to restore the image. Comparative experiments illustrate the effectiveness of the proposed method. The advantages and details of our method can be highlighted as follows:

1. Beyond the blur identification and deconvolution used in existing algorithms, our algorithm is tailored around a learning-based framework. A pair of dictionaries, called the clear dictionary and the blurry dictionary, are learned from the training data. In the application phase, the blurry dictionary is first employed to linearly reconstruct the blurry image to be restored. The reconstruction coefficients are then kept unchanged and used with the clear dictionary for deblurring.
This operation can be done because the atoms of the two dictionaries are coupled together with one-by-one correspondences.

2. The CDL model is formulated as an optimization problem. A gradient descent method is developed to solve it, composed of two phases: estimating the sparse reconstruction coefficients and updating the atoms of the coupled dictionaries. With enough flexibility, the sparse coefficients can be estimated with any existing sparse representation method. Meanwhile, in each iteration, the atoms of the coupled dictionaries are updated with an analytically optimal solution.

3. Our approach can be naturally extended to image deblurring with unknown blur kernels, without any additional modification of the algorithmic framework. This is favorable for real-world applications in two respects. First, the approach need not be supplied with the blur kernel after the dictionaries are learned. Second, it can work in cases where the images are blurred with different blur kernels. Actually, a new mechanism appears in our framework due to the fact that the blurry image is evaluated by way of data reconstruction. This provides us a way to assess the reconstruction quality among the dictionaries. Thus, the dictionary with the minimum reconstruction error can be identified, and its clear partner can be picked out for deblurring.

4. Our approach can be naturally extended to multi-phase deblurring. That is, the image can be restored a few times with different pairs of dictionaries. To this end, the first group of dictionaries is learned from the clear/blurry patches, and the second group is learned from the clear patches and the patches restored with the previous dictionaries.

The remainder of this paper is organized as follows. Section 2 addresses the problem formulation. Section 3 presents the CDL model. The algorithm is given in Sect. 4, and the experimental results are reported in Sect. 5. Conclusions are drawn in Sect. 6.
2 Problem Formulation

For image deblurring, it is standard to assume that the observed image G is formed by subjecting the clear image I to convolution with a blurring function k:

$$G = k \ast I + v, \qquad (1)$$
where v is the pixel-wise additive noise and "∗" is the convolution operator. Our task is to restore I from G. Note that this task is ill-posed, as k and I in (1) are both unknown. Most traditional methods follow the line of deconvolution in signal processing. Here we instead introduce a learning-based framework that exploits the information hidden in rich image data for image deblurring. Without loss of generality, and for convenience in explaining our idea, we first assume that the blur kernel is known. Thus, images can be convolved with the kernel into blurry ones, and our task becomes learning from the collected clear-blurry pairs of data. To this end, we assume that both the clear and the blurry patches can be linearly reconstructed with their own over-complete dictionaries, so the learning model can be constructed under sparse representation. We further assume that the sparse coefficients of a clear patch and of its blurry counterpart, with respect to their own dictionaries, are identical. Technically, this provides a way to deblur new images with the learned dictionaries.
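The degradation model of Eq. (1) can be sketched as follows. This is an illustrative NumPy-only implementation (the function names are ours, not from the paper), using a separable Gaussian kernel of the kind used in the later experiments:

```python
import numpy as np

def gaussian_kernel1d(size=11, sigma=1.3):
    """1-D Gaussian, normalized to sum to 1; the 2-D kernel is its outer product."""
    ax = np.arange(size) - size // 2
    k = np.exp(-(ax**2) / (2.0 * sigma**2))
    return k / k.sum()

def blur(I, size=11, sigma=1.3, noise_sigma=0.0, seed=0):
    """G = k * I + v  (Eq. 1): separable Gaussian convolution plus noise.
    Rows are convolved first, then columns, exploiting separability."""
    k = gaussian_kernel1d(size, sigma)
    G = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, I)
    G = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, G)
    if noise_sigma > 0:
        G = G + np.random.default_rng(seed).normal(0.0, noise_sigma, I.shape)
    return G

I = np.random.default_rng(1).random((64, 64))
G = blur(I, noise_sigma=0.01)
```

Any boundary handling (here, zero-padded `mode="same"`) is an implementation choice; the paper's experiments do not prescribe one.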
Now we can formally state our task as follows. Let $X_c = \{x_i^c\}_{i=1}^{n}$ and $X_b = \{x_i^b\}_{i=1}^{n}$ collect $n$ image patches which are randomly selected from the clear and blurry images at the same places. We assume that each image patch contains h × w pixels, where h and w stand for its height and width in pixels. For computation, the pixels in each patch are concatenated into a vector in $\mathbb{R}^m$, with m = h × w. That is, $x_i^c$ and $x_i^b$ will be viewed as two vectors in $\mathbb{R}^m$. Our task is to learn two dictionaries $D_c, D_b \in \mathbb{R}^{m \times d}$ from $X_c$ and $X_b$, where d is the number of atoms in each dictionary. Two goals need to be achieved in this process. One is that the atoms of the two dictionaries should be aligned one by one, and the other is that the magnitudes of the atoms should be identical to each other. Under these two conditions, the reconstruction coefficients can be transferred from blurry images to clear ones.
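The patch-pair sampling described above can be sketched as follows (a minimal NumPy illustration; the sampler's name and interface are our own, not the paper's):

```python
import numpy as np

def sample_patch_pairs(I_clear, I_blur, n=1000, h=11, w=11, seed=0):
    """Randomly sample n co-located h x w patches from a clear/blurry
    image pair and stack them column-wise as vectors in R^m, m = h*w."""
    rng = np.random.default_rng(seed)
    R, C = I_clear.shape
    Xc = np.empty((h * w, n))
    Xb = np.empty((h * w, n))
    for i in range(n):
        r = rng.integers(0, R - h + 1)
        c = rng.integers(0, C - w + 1)
        Xc[:, i] = I_clear[r:r + h, c:c + w].ravel()
        Xb[:, i] = I_blur[r:r + h, c:c + w].ravel()  # same location
    return Xc, Xb

rng = np.random.default_rng(2)
Ic, Ib = rng.random((64, 64)), rng.random((64, 64))
Xc, Xb = sample_patch_pairs(Ic, Ib, n=100)
```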
3 Coupled Dictionary Learning

3.1 The Model

Here we develop the model to learn the dictionaries $D_c$ and $D_b$ simultaneously from $X_c$ and $X_b$. This task cannot be replaced by learning from the two sets separately, as the atoms should be well aligned. Let $D_c = [d_1^c, d_2^c, \ldots, d_d^c]$ and $D_b = [d_1^b, d_2^b, \ldots, d_d^b]$ collect the $d$ atoms $\{d_j^c\} \subset \mathbb{R}^m$ and $\{d_j^b\} \subset \mathbb{R}^m$. Now we introduce the following learning model:

$$\min_{D_c, D_b, \{a_i\}} \ \sum_{i=1}^{n} \|x_i^c - D_c a_i\|_2^2 + \sum_{i=1}^{n} \|x_i^b - D_b a_i\|_2^2 \qquad (2)$$
$$\text{s.t. } \|a_i\|_0 \le s, \ i = 1, \ldots, n; \quad \|d_j^c\|_2 = \|d_j^b\|_2 = 1, \ j = 1, \ldots, d,$$
where $a_i \in \mathbb{R}^d$ is the coefficient vector, $s$ denotes the maximum number of non-zero entries in $a_i$, $\|\cdot\|_2$ is the $L_2$-norm, and $\|\cdot\|_0$ stands for the $L_0$-norm. We see that (2) yields a coupled dictionary learning (CDL) model, as $x_i^c$ and $x_i^b$ share the same coefficient vector $a_i$. Again, it should be mentioned that introducing the equality constraints on the magnitudes of the dictionary atoms is necessary for transferring the coefficient vector from a blurry image to a clear one. For compactness, let $X_c = [x_1^c, x_2^c, \ldots, x_n^c] \in \mathbb{R}^{m \times n}$, $X_b = [x_1^b, x_2^b, \ldots, x_n^b] \in \mathbb{R}^{m \times n}$, and $A = [a_1, a_2, \ldots, a_n] \in \mathbb{R}^{d \times n}$. Then, we have

$$\min_{D_c, D_b, A} \ \|X_c - D_c A\|_F^2 + \|X_b - D_b A\|_F^2 \qquad (3)$$
$$\text{s.t. } \|a_i\|_0 \le s, \ i = 1, \ldots, n; \quad \|d_j^c\|_2 = \|d_j^b\|_2 = 1, \ j = 1, \ldots, d,$$

where $\|\cdot\|_F$ denotes the Frobenius norm of a matrix.
It is obvious that the atoms in $D_c$ and $D_b$ are naturally aligned one by one. Let $X = [X_c^T, X_b^T]^T \in \mathbb{R}^{2m \times n}$ and $D = [D_c^T, D_b^T]^T \in \mathbb{R}^{2m \times d}$; then it holds that

$$\|X_c - D_c A\|_F^2 + \|X_b - D_b A\|_F^2 = \|X - D A\|_F^2. \qquad (4)$$
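The identity in Eq. (4) is easy to verify numerically. The following NumPy snippet, with arbitrary random data, checks that the two Frobenius terms equal the single term on the stacked matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, n = 8, 12, 20
Xc, Xb = rng.random((m, n)), rng.random((m, n))
Dc, Db = rng.random((m, d)), rng.random((m, d))
A = rng.random((d, n))

# Left side of Eq. (4): the two separate Frobenius terms.
lhs = (np.linalg.norm(Xc - Dc @ A, "fro")**2
       + np.linalg.norm(Xb - Db @ A, "fro")**2)

# Right side: stack X = [Xc; Xb] and D = [Dc; Db], one term.
X = np.vstack([Xc, Xb])
D = np.vstack([Dc, Db])
rhs = np.linalg.norm(X - D @ A, "fro")**2
```

This is why the coupled objective can be treated as ordinary dictionary learning on the stacked data, with the alignment of atoms coming for free.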
Thus, the atoms $d_j^c$ and $d_j^b$ are now aligned into the $j$-th column of the matrix $D$. As mentioned in the introduction, the term "coupled dictionary learning" has recently appeared in vision computing (Yang et al. 2012; Wang et al. 2012; He et al. 2013; Jia et al. 2013). There, however, the constraints on the dictionary atoms are all of one type: "||·||2 ≤ 1". Here we use equality constraints, which are necessary for deblurring.

Finally, we point out that Zeyde et al. (2010) developed a dictionary learning framework for image scale-up consisting of two phases. In their framework, a dictionary is first learned from low-resolution patches via KSVD (Aharon et al. 2006). Then, another dictionary is sought such that each high-resolution patch is linearly approximated with this dictionary and the sparse reconstruction coefficients of its low-resolution patch under the low-resolution dictionary. The high-resolution dictionary can therefore be obtained via linear regression (Zeyde et al. 2010). For convenience, we abbreviate their approach as "KSVD-Dictionary Regression" (KSVD-DR).

Technically, we can transfer their idea from image scale-up to image deblurring, since we are also given two sets of patches.¹ Figure 1 gives an example. Figure 1a, b show the clear and blurry dictionaries learned via "KSVD-DR" from 25,000 patches of 11×11 pixels sampled from the bear data set (described further in Sect. 5.1). Figure 1c, d demonstrate the two dictionaries learned by our approach. By contrast, our approach captures more sharp details (zoom in for details). We also tested the deblurring behavior of these two groups of dictionaries. Figure 1e illustrates a test image, which is blurred by a Gaussian blur kernel with standard deviation 1.3 and window size 11 × 11 (see Fig. 1f). The deblurred results are given in Fig. 1g, h. As can be seen, our approach achieves much better restoration. This also motivates us to develop our framework.
3.2 Solving the Optimization Model

Note that in (3) there are two groups of variables, $A$ and $\{D_c, D_b\}$, to be solved. Accordingly, we solve them alternately, fixing one group while updating the other.
¹ KSVD-DR refines the high-resolution dictionary by using the overlapped patches. Here we cannot apply this treatment, since the patches are sampled from different images.
Fig. 1 a The clear dictionary obtained by dictionary regression; b the blurry dictionary learned by KSVD; c the clear dictionary learned by our approach; d the blurry dictionary learned by our approach; e a test image; f the blurred image; g the result restored by "KSVD-DR"; h the result restored by our approach (CDL without using multi-phase learning)

First, $A$ is solved by fixing $D_c$ and $D_b$. We have

$$\min_{A} \ \|X_c - D_c A\|_F^2 + \|X_b - D_b A\|_F^2, \quad \text{s.t. } \|a_i\|_0 \le s, \ i = 1, 2, \ldots, n. \qquad (5)$$

Obviously, this task can be decoupled into $n$ independent subproblems:

$$\min_{a_i} \ \|x_i^c - D_c a_i\|_2^2 + \|x_i^b - D_b a_i\|_2^2, \quad \text{s.t. } \|a_i\|_0 \le s. \qquad (6)$$

Problem (6) can be solved with $L_1$-norm convex approximation approaches. Here we employ the orthogonal matching pursuit (OMP) (Tropp 2004) and the focal under-determined system solver (FOCUSS) (Gorodnitsky and Rao 1997), largely due to their computational efficiency and representation quality in contrast to other methods (Aharon et al. 2006).

Next, we consider solving $D_c$ and $D_b$ by fixing $A$. Formally, this can be divided into two subproblems:

$$\min_{D_c} \ \|X_c - D_c A\|_F^2, \quad \text{s.t. } \|d_j^c\|_2 = 1, \ j = 1, 2, \ldots, d, \qquad (7)$$

and

$$\min_{D_b} \ \|X_b - D_b A\|_F^2, \quad \text{s.t. } \|d_j^b\|_2 = 1, \ j = 1, 2, \ldots, d. \qquad (8)$$
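By the stacking identity (4), the per-patch subproblem (6) is standard sparse coding over the stacked dictionary $D = [D_c^T, D_b^T]^T$ with stacked patch $x = [x_c^T, x_b^T]^T$. A minimal OMP in the spirit of Tropp (2004) can be sketched as follows (an illustrative implementation, not the authors' code):

```python
import numpy as np

def omp(D, x, s):
    """Orthogonal matching pursuit: greedily select at most s atoms of D
    (columns assumed unit-norm) and least-squares fit x on the support."""
    residual = x.copy()
    support = []
    a = np.zeros(D.shape[1])
    for _ in range(s):
        j = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    a[support] = coef
    return a

rng = np.random.default_rng(3)
D = rng.standard_normal((20, 40))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
x = D[:, [3, 17]] @ np.array([1.5, -2.0])  # a 2-sparse test signal
a = omp(D, x, s=2)
```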
Actually, (7) and (8) can be solved with the Method of Optimal Directions (MOD) (Engan et al. 2000). As mentioned by Aharon et al. (2006), the main drawback of MOD is its second-order computational complexity. In addition, the dictionary is always updated before the re-evaluation of the coefficients in $A$; such a fully-separated procedure largely affects the training speed (Aharon et al. 2006). Thus, we select to update the dictionaries and the coefficients simultaneously. It is worth pointing out that such an update of the coefficients does not replace the task in (5), where the sparsity pattern of $A$ is identified and then kept unchanged when solving (7) and (8).

We follow the line of dictionary updating introduced in Aharon et al. (2006) to solve our model. However, a new need in our task is to keep the atoms well aligned. Suppose the $j$-th atoms of $D_c$ and $D_b$ are to be updated. Then, the objective functions in (7) and (8) can be rewritten as

$$\|X_c - D_c A\|_F^2 = \left\|X_c - D_c^{-j} A^{-j} - d_j^c a^j\right\|_F^2, \qquad (9)$$

and

$$\|X_b - D_b A\|_F^2 = \left\|X_b - D_b^{-j} A^{-j} - d_j^b a^j\right\|_F^2, \qquad (10)$$
where $D_c^{-j} \in \mathbb{R}^{m \times (d-1)}$ is the sub-matrix of $D_c$ obtained by deleting its $j$-th column, $D_b^{-j} \in \mathbb{R}^{m \times (d-1)}$ is the sub-matrix of $D_b$ obtained by deleting its $j$-th column, $A^{-j} \in \mathbb{R}^{(d-1) \times n}$ is the sub-matrix of $A$ obtained by deleting its $j$-th row, and $a^j \in \mathbb{R}^{1 \times n}$ is the $j$-th row of $A$.

Now, we mask the zero entries in $a^j$ to keep the sparsity of matrix $A$ identified via (5). In other words, during the current iteration, the zero entries in $a^j$ are always kept unchanged. Let $a^j$ contain $p$ non-zero entries. We then remove the $n - p$ zero entries in $a^j$ and obtain a new vector $\bar{a}^j \in \mathbb{R}^{1 \times p}$. Further, let $T_c^j = X_c - D_c^{-j} A^{-j}$ and $T_b^j = X_b - D_b^{-j} A^{-j}$. The corresponding $n - p$ columns in $T_c^j$ and $T_b^j$ should also be deleted; for clarity, we denote the results by $\bar{T}_c^j \in \mathbb{R}^{m \times p}$ and $\bar{T}_b^j \in \mathbb{R}^{m \times p}$. Then, our task is to solve the following two subproblems (Aharon et al. 2006):

$$\min_{d_j^c} \ \left\|\bar{T}_c^j - d_j^c \bar{a}^j\right\|_F^2, \quad \text{s.t. } \|d_j^c\|_2 = 1, \qquad (11)$$

and

$$\min_{d_j^b} \ \left\|\bar{T}_b^j - d_j^b \bar{a}^j\right\|_F^2, \quad \text{s.t. } \|d_j^b\|_2 = 1. \qquad (12)$$

According to matrix theory, (11) and (12) can be solved by performing the Singular Value Decomposition (SVD). With rank-one approximation, we have $\bar{T}_c^j \approx u_c \sigma_c v_c^T$ and $\bar{T}_b^j \approx u_b \sigma_b v_b^T$, where $u_c, u_b \in \mathbb{R}^m$ are the left-singular vectors, $v_c, v_b \in \mathbb{R}^p$ are the right-singular vectors, and $\sigma_c$ and $\sigma_b$ are the largest singular values of $\bar{T}_c^j$ and $\bar{T}_b^j$, respectively. Accordingly, we have $d_j^c = u_c$ and $d_j^b = u_b$; naturally, $\|d_j^c\|_2 = 1$ and $\|d_j^b\|_2 = 1$.

Note that updating $d_j^c$ and $d_j^b$ in the above way may cause a sign problem, as we need to transfer the coefficients directly between blurry/clear patches for deblurring. Here, we fix $u_c$ and select to modify the sign of $u_b$. This treatment can be done as there is only a unique $\bar{a}^j$ in (11) and (12), namely, $\bar{a}^j \approx \sigma_c v_c^T \approx \sigma_b v_b^T$. Then, we can perform the following sign change operation:

$$d_j^b = -u_b, \quad \text{if } \|\sigma_c v_c + \sigma_b v_b\|_2 < \|\sigma_c v_c - \sigma_b v_b\|_2. \qquad (13)$$

If the condition in (13) holds, the sign of $v_b$ is also changed as $v_b \leftarrow -v_b$ for computation.

Finally, we need to update $\bar{a}^j \in \mathbb{R}^{1 \times p}$. Given $d_j^c$ and $d_j^b$, it can be solved from the following model:

$$\min_{\bar{a}^j} \ \left\|\bar{T}_c^j - d_j^c \bar{a}^j\right\|_F^2 + \left\|\bar{T}_b^j - d_j^b \bar{a}^j\right\|_F^2. \qquad (14)$$

Denoting the objective function by $G(\bar{a}^j)$ and taking its derivative with respect to $\bar{a}^j$, we have

$$\frac{\partial G(\bar{a}^j)}{\partial \bar{a}^j} = 4\bar{a}^j - 2(d_j^c)^T \bar{T}_c^j - 2(d_j^b)^T \bar{T}_b^j, \qquad (15)$$

where (15) holds as $(d_j^c)^T d_j^c = 1$ and $(d_j^b)^T d_j^b = 1$. Setting the derivative to zero, it follows that

$$\bar{a}^j = \frac{1}{2}\left((d_j^c)^T \bar{T}_c^j + (d_j^b)^T \bar{T}_b^j\right). \qquad (16)$$

Substituting the above rank-one approximations of $\bar{T}_c^j$ and $\bar{T}_b^j$ into (16), we obtain

$$\bar{a}^j = \frac{1}{2}\left(\sigma_c v_c^T + \sigma_b v_b^T\right). \qquad (17)$$

After $\bar{a}^j$ is solved, we can assign its $p$ entries to replace the corresponding $p$ elements in the $j$-th row of $A$. In this way, the $j$-th row of $A$ is updated. For clarity, we summarize the steps of our approach in Algorithm 1. In steps 7–13, the $d$ atoms can be updated in natural or random order. In addition, the dictionaries can be initialized via the k-means clustering approach.
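The coupled atom update of Eqs. (11)–(13) and (16)–(17) can be sketched as follows (an illustrative NumPy version; the function name and interface are ours):

```python
import numpy as np

def update_atom_pair(Tc, Tb):
    """Rank-one SVD updates for one coupled atom pair (Eqs. 11-13, 16-17).
    Tc, Tb are the m x p masked residual matrices restricted to the
    patches that use atom j; returns unit-norm atoms and the shared
    coefficient row a_bar."""
    uc, sc, vtc = np.linalg.svd(Tc, full_matrices=False)
    ub, sb, vtb = np.linalg.svd(Tb, full_matrices=False)
    dc = uc[:, 0]
    db = ub[:, 0]
    vc = sc[0] * vtc[0]          # sigma_c * v_c^T
    vb = sb[0] * vtb[0]          # sigma_b * v_b^T
    # Sign fix of Eq. (13): flip db (and vb) if that better aligns the codes.
    if np.linalg.norm(vc + vb) < np.linalg.norm(vc - vb):
        db, vb = -db, -vb
    a_bar = 0.5 * (vc + vb)      # Eq. (17)
    return dc, db, a_bar

rng = np.random.default_rng(4)
Tc = rng.standard_normal((25, 10))
Tb = rng.standard_normal((25, 10))
dc, db, a_bar = update_atom_pair(Tc, Tb)
```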
Finally, we point out that model (3) is non-convex with respect to $A$, $D_c$, and $D_b$. Thus, the algorithm can only converge to a local optimum. As analyzed above, (11) and (12) have analytically optimal solutions, so each atom is updated individually in closed form. However, (6) has no analytical solution, and the convergence therefore largely relies on how (6) is solved. Here, we employ OMP and FOCUSS: OMP is a greedy solver, fast but not accurate, while FOCUSS is slower than OMP. Our experiments indicate that the value of the objective function in (3) oscillates if OMP is used, whereas in all of our experiments it strictly decreases if FOCUSS is employed. Figure 8 illustrates this fact.
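The overall alternation (sparse coding, then atom-by-atom rank-one updates, tracking the objective) can be sketched on the stacked data X = [Xc; Xb]. This is a sketch only, not the paper's Algorithm 1: the coding step below is a simplified thresholded least-squares stand-in for OMP/FOCUSS, and the sign-alignment step of (13) is omitted since a single stacked dictionary is learned:

```python
import numpy as np

def cdl_train(X, d=16, s=4, iters=5, seed=0):
    """Skeleton of the alternation on stacked data X = [Xc; Xb]:
    (i) sparse-code each column, (ii) update atoms one by one by a
    rank-one SVD of the masked residual, (iii) record the objective."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], d))
    D /= np.linalg.norm(D, axis=0)
    objective = []
    for _ in range(iters):
        # Coding: keep the s most correlated atoms per column (simplified).
        A = np.zeros((d, X.shape[1]))
        for i in range(X.shape[1]):
            S = np.argsort(-np.abs(D.T @ X[:, i]))[:s]
            A[S, i], *_ = np.linalg.lstsq(D[:, S], X[:, i], rcond=None)
        # Atom update: rank-one SVD over the columns that use atom j.
        for j in range(d):
            used = np.flatnonzero(A[j])
            if used.size == 0:
                continue
            T = X[:, used] - D @ A[:, used] + np.outer(D[:, j], A[j, used])
            u, sv, vt = np.linalg.svd(T, full_matrices=False)
            D[:, j] = u[:, 0]
            A[j, used] = sv[0] * vt[0]
        objective.append(np.linalg.norm(X - D @ A, "fro")**2)
    return D, A, objective

X = np.random.default_rng(5).standard_normal((32, 60))
D, A, obj = cdl_train(X)
```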
4 Image Deblurring with Coupled Dictionaries

4.1 Deblurring with Known Blur Kernel

Here we assume the blur kernel of the image to be restored is known. In this case, we can collect images similar to it and obtain blurry images by convolution with this kernel. Then, a training set can be obtained by randomly sampling pairs of clear/blurry patches, and Algorithm 1 can be employed to train a pair of dictionaries $D_c$ and $D_b$.

Let each training patch contain h × w pixels, where h and w denote its height and width in pixels. For the pixel q located at the i-th row and j-th column of the image I to be restored, we pick out the patch of h × w pixels centered at q. Denote the concatenated pixels by the vector $x_q^b \in \mathbb{R}^m$, with m = h × w. It can be reconstructed as

$$x_q^b \approx D_b a_q, \qquad (18)$$

and $a_q$ can be solved via the following model:

$$\min_{a_q} \ \left\|x_q^b - D_b a_q\right\|_2^2, \quad \text{s.t. } \|a_q\|_0 \le s. \qquad (19)$$

Note that problem (19) must be solved for each pixel, which takes much time for a large image. Alternatively, we can employ the full reconstruction model to replace it:

$$\min_{a_q} \ \left\|x_q^b - D_b a_q\right\|_2^2 + \lambda \|a_q\|_2^2, \quad \text{s.t. } a_q^T e = 1, \qquad (20)$$

where $\lambda$ is a regularization parameter and $e \in \mathbb{R}^d$ is the vector of all ones. Problem (20) can be easily solved with matrix operations and has the following closed-form solution (Xiang et al. 2010):

$$a_q = \frac{(\lambda I + X_q^T X_q)^{-1} e}{e^T (\lambda I + X_q^T X_q)^{-1} e}, \qquad (21)$$

where $X_q = [d_1^b - x_q, d_2^b - x_q, \ldots, d_d^b - x_q] \in \mathbb{R}^{m \times d}$ and $I$ is the $d \times d$ identity matrix. Next, we keep $a_q$ unchanged and obtain the restoration of $x_q^b$ as

$$x_q^c = D_c a_q. \qquad (22)$$

Finally, we average all the results of the overlapped patches to fulfill the restoration task. Algorithm 2 lists the above steps in detail. Note that as the window patch moves one pixel at a time and its center runs from ([w/2], [h/2]) to (C − [w/2], R − [h/2]), the boundary regions are covered and thus treated naturally.
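The closed-form solution (21) and the coefficient transfer of (22) can be sketched as follows (illustrative NumPy; function names are ours):

```python
import numpy as np

def reconstruct_coeffs(Db, xq, lam=0.1):
    """Closed-form coefficients of Eq. (21) for the full reconstruction
    model (20): min ||xq - Db a||^2 + lam ||a||^2  s.t.  a^T e = 1.
    Uses Xq with columns d_j^b - xq, valid because e^T a = 1."""
    Xq = Db - xq[:, None]
    M = np.linalg.inv(lam * np.eye(Db.shape[1]) + Xq.T @ Xq)
    e = np.ones(Db.shape[1])
    return M @ e / (e @ M @ e)

rng = np.random.default_rng(6)
Db = rng.random((121, 40))
Dc = rng.random((121, 40))
xq = rng.random(121)
aq = reconstruct_coeffs(Db, xq)
xq_clear = Dc @ aq        # Eq. (22): transfer the coefficients unchanged
```

Note that the constraint $a_q^T e = 1$ is what makes the substitution $x_q = x_q e^T a_q$ legal, which is how the difference matrix $X_q$ arises.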
4.2 Deblurring with Multi-phase Dictionary Learning

It is worth pointing out that Algorithm 1 can be called several times to construct a multi-phase training framework. Given the training data sets $\{X_c, X_b\}$, a pair of dictionaries $\{D_c, D_b\}$ can be learned with Algorithm 1. In turn, we can employ them to restore the blurry training images by running Algorithm 2, and new samples can be obtained from the restored results. Collecting them into a new set $\bar{X}_b$, we get a new training data set $\{X_c, \bar{X}_b\}$, from which a new pair of dictionaries $\{\bar{D}_c, \bar{D}_b\}$ can be learned via Algorithm 1. In this way, we can run the CDL algorithm several times and obtain a few pairs of dictionaries. In application, a blurry image is deblurred first with the dictionaries $\{D_c, D_b\}$ and then with $\{\bar{D}_c, \bar{D}_b\}$. This treatment can improve the quality of deblurring.

4.3 Deblurring with Unknown Blur Kernel

Suppose we are given g groups of training samples, each of which is generated with a different blurring function. Accordingly, we can employ Algorithm 1 to learn g groups of dictionaries $\{(D_i^c, D_i^b)\}_{i=1}^{g}$, where $D_i^c$ is the clear dictionary and $D_i^b$ is the blurry dictionary. Now each group of dictionaries is used to restore each patch of the blurry image I, and the one with the minimum reconstruction error is finally selected:

$$\arg\min_i \ \left\|x_q^b - D_i^b a_{i,q}^b\right\|_2, \qquad (23)$$
Fig. 2 Demos of the pairs of the learned dictionaries. a and b, the clear and blurry dictionaries learned from the goose data set; c and d, the clear and blurry dictionaries learned from the tulip data set

where $x_q^b$ is the q-th blurry patch in I, and $a_{i,q}^b$ is the coefficient vector obtained with the dictionary $D_i^b$. After the optimal dictionary is identified, we pick out its clear partner and use the linear reconstruction criterion of (22) to restore the patches. This treatment gives our approach new, distinct characteristics compared with the traditional methods developed, implicitly or explicitly, along the line of kernel estimation.
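The dictionary-selection rule of Eq. (23) can be sketched as follows. Here the per-patch coefficients are computed with the regularized closed form of Eq. (21), one hedged choice among the possible coders; the function name and interface are ours:

```python
import numpy as np

def restore_patch_unknown_kernel(dict_pairs, xq, lam=0.1):
    """Eq. (23): among g learned (Dc_i, Db_i) pairs, pick the blurry
    dictionary with minimum reconstruction error on patch xq, then
    restore via its clear partner (Eq. 22)."""
    best = None
    for Dc, Db in dict_pairs:
        Xq = Db - xq[:, None]                       # closed form of (21)
        M = np.linalg.inv(lam * np.eye(Db.shape[1]) + Xq.T @ Xq)
        e = np.ones(Db.shape[1])
        a = M @ e / (e @ M @ e)
        err = np.linalg.norm(xq - Db @ a)
        if best is None or err < best[0]:
            best = (err, Dc, a)
    _, Dc, a = best
    return Dc @ a                                   # restored patch

rng = np.random.default_rng(7)
pairs = [(rng.random((121, 30)), rng.random((121, 30))) for _ in range(3)]
xq = rng.random(121)
patch = restore_patch_unknown_kernel(pairs, xq)
```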
5 Experiments

5.1 Parameter Setting for the CDL Algorithm

Apart from the number of training samples, Algorithm 1 has two parameters: the number of atoms in each dictionary (d) and the sparsity indicator of the coefficients (s). Here we employ six image data sets to evaluate their influence, including four data sets from the CMU-Cornell iCoseg database (Batra et al. 2010, 2011), one data set from the Corel5K image database,² and one data set constructed by ourselves. Specifically, the four from the CMU-Cornell iCoseg database are the "Alaskan Brown Bear" (bear), the "Goose-Riverside" (goose), the "Kite-kitekid" (kite), and the "Monks-LAO" (monk), containing 19, 31, 10, and 17 images, respectively. The data set from Corel5K is the "plantart", which comprises 100 images. The data set constructed by ourselves includes 13 tulip flower images. Half of the images in each data set are employed as training samples, and the remaining images are used as test samples.

First, seven training sets are constructed to evaluate the influence of the number of training samples. Specifically, each training image (grayscale) is blurred by a Gaussian kernel with parameters (11, 1.3), where the first value stands for the kernel size in pixels and the second denotes its standard deviation (σ). Then, seven sets of 10,000, 15,000, 20,000, 25,000, 30,000, 35,000 and 40,000 patches
² Available at: http://msm.cais.ntu.edu.sg/LSCBIR.
of 11 × 11 pixels are randomly sampled from the clear and blurry images at the same positions. Each patch is concatenated into a vector, so that m = 121 in Algorithm 1. Finally, a pair of dictionaries is learned from each of the above seven sets, with d = 400 and s = 200 fixed.

Figure 2 shows two pairs of dictionaries learned from the 25,000 pairs of clear and blurry patches randomly sampled from the "goose" and "tulip" data sets, respectively. The clear atoms are shown in Fig. 2a, c, while the blurry atoms are illustrated in Fig. 2b, d. We see that the atoms are well aligned one by one. That is, given an atom in the clear dictionary, one can find its partner at the same position in the blurry dictionary, and vice versa. Such a property provides us with the desired mechanism to transfer the reconstruction coefficients between blurry/clear patches. In addition, Fig. 2 also indicates that the atoms in the clear dictionary retain more sharp details. Thus, we can first reconstruct the blurry image with the blurry dictionary and then restore the image by employing the clear dictionary to render sharp details.

To evaluate the performance of the learned dictionaries, the test images are blurred with the same Gaussian kernel. Figures 3 and 4 show six images restored by the dictionaries learned from the six training sets with 25,000 samples; they are all significantly restored. The peak signal-to-noise ratio (PSNR) and the Structural SIMilarity (SSIM) index (Wang and Li 2011) are used to evaluate the performance quantitatively. The results obtained from one test image per data set are reported; these test images are shown in the first column of Figs. 3 and 4. Figure 5a reports the PSNR curves of the deblurred results, while Fig. 5b shows the SSIM curves. We see that the curves remain roughly horizontal.

Next, we evaluate the influence of the parameter d by taking d = 250, 300, 350, 400, and 450, respectively.
In the experiments, the number of training samples is kept at 25,000, and s is set to 200. Each training set is learned five times with a different d. Then, the dictionaries are used respectively to restore the images. Figure 6a, b show
Fig. 3 Demo I: Three deblurred results with the dictionaries learned from the bear, goose and kite data sets
the PSNR and SSIM curves of the six images used in Figs. 3 and 4, indicating no significant changes within the curves. Finally, we evaluate the influence of the parameter s. To this end, we take s = 120, 160, 200, 240, and 280, respectively. The number of training samples is kept at 25,000, and d is set to 400. In total, five groups of dictionaries are learned from each training set. Then, the six blurry images used in Figs. 3 and 4 are restored with the corresponding dictionaries. Figure 7a, b show their PSNR and SSIM curves. As can be seen from Fig. 7, there are again no significant changes within the curves.
Fig. 4 Demo II: Three deblurred results with the dictionaries learned from the monk, plantart and tulip data sets
Fig. 5 Image accuracy. The dictionaries are learned with different numbers of samples. a PSNR curves; b SSIM curves
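For reference, the PSNR measure used throughout this evaluation follows the standard definition sketched below; the SSIM index (Wang and Li 2011) is more involved and is not reproduced here:

```python
import numpy as np

def psnr(ref, est, peak=255.0):
    """Peak signal-to-noise ratio in dB between a ground-truth image `ref`
    and a restored image `est` (assumed to share shape and intensity range)."""
    mse = np.mean((ref.astype(np.float64) - est.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```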
[Fig. 5 plots: PSNR (a) and SSIM (b) versus the number of training samples (×10^4) for the bear, goose, kite, monk, plantart and tulip data sets.]
Fig. 6 Image accuracy. The dictionaries are learned with different d. a PSNR curves; b SSIM curves
[Fig. 6 plots: PSNR (a) and SSIM (b) versus the number of atoms in each dictionary (×10^2).]
Fig. 7 Image accuracy. The dictionaries are learned with different s. a PSNR curves; b SSIM curves
[Fig. 7 plots: PSNR (a) and SSIM (b) versus the number of non-zero entries in SR (×10^2) for the bear, goose, kite, monk, plantart and tulip data sets. Fig. 8 plot: value of the objective function versus the number of iterations (up to 200).]
Fig. 8 The values of the objective function in (2) with respect to the number of iterations. The algorithm is convergent at the 138th, 110th and 87th iteration for “bear”, “goose” and “kite”, due to the condition at step 16 in Algorithm 1
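The early stops noted in the caption come from the condition at step 16 of Algorithm 1, which is not reproduced here; a common relative-decrease test consistent with such behavior can be sketched as follows (the exact condition is an assumption):

```python
def converged(history, tol=1e-4):
    """Return True when the relative decrease of the objective between the
    last two iterations drops below `tol` (assumed stopping criterion; the
    exact test at step 16 of Algorithm 1 is not reproduced here)."""
    if len(history) < 2:
        return False
    prev, curr = history[-2], history[-1]
    return abs(prev - curr) <= tol * max(abs(prev), 1e-12)
```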
5.2 Convergence
As pointed out in Sect. 3.2, the optimization model in (3) is not convex, and its convergence depends on how the sparse representation subproblem is solved. We tried both OMP and FOCUSS. As reported by Aharon et al. (2006), OMP is fast but its performance is not as good as that of FOCUSS. We observed that the value of the objective function fluctuates during the iterations when OMP is executed, while it decreases when FOCUSS is employed. Figure 8 illustrates the curves of the objective function in the case that FOCUSS is used. Here, 10,000 training samples for each data set are employed, with d = 400 and s = 120. The maximum number of iterations is set to 200 when running Algorithm 1. It can be seen that the curves decrease as the iteration number increases.
5.3 Comparisons: Deblurring Images
As a classical deblurring algorithm, the Richardson–Lucy (RL) algorithm (Lucy 1974; Richardson 1972) is taken
as the baseline for comparison. In addition, five popularly used algorithms, namely, the Weighted Least Squares (WLS), the Sparse Prior (SP) (Levin et al. 2007), the Iteratively Reweighted Norm (IRN) minimization approach for the generalized TV (Rodriguez and Wohlberg 2009), the Alternating Direction Method of Multipliers (ADMM) (Afonso et al. 2010), and the Two-step Iterative Shrinkage/Thresholding (TwIST) (Bioucas-Dias and Figueiredo 2007), are also compared. Here, the source image is blurred by a Gaussian kernel with a kernel size of 17 × 17 and a standard deviation of 2.1. The true blur kernel is supplied to the RL, WLS, SP, IRN, ADMM, and TwIST algorithms to guarantee that they generate their best results. In addition, when running the SP algorithm, the regularization parameter λ was supplied respectively with the values in {0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.008, 0.01, 0.02, 0.04, 0.06, 0.08}. When running the IRN, ADMM, and TwIST algorithms, the regularization parameter λ was supplied respectively with the values in {0.0001, 0.001, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, 3, 4, 5, 6, 7, 8, 9}. The result with the highest PSNR is output for comparison. To run our algorithm, we sampled 90,000 pairs of patches of 17 × 17 pixels from the clear and blurry images in each training set. In the experiments, we set the number of atoms in each dictionary to 700 and the number of non-zero coefficients to 350 for dictionary learning. A relatively high number of non-zero coefficients is used here so that the pair of dictionaries can approximate each training sample as accurately as possible. Figure 9 illustrates the deblurred results of six images, each of which belongs to the set of test samples. From Fig. 9, we see that the results obtained by our algorithm are clearer than those obtained by RL, WLS and SP. In contrast, TwIST, ADMM and IRN generate sharper edges.
However, the results obtained by IRN exhibit obvious staircase effects. This can be perceived from the results of the “bear”, “kite” and “monk” images in Fig. 9 (zoom in). In addition, Table 1 gives a quantitative comparison in terms of PSNR
Fig. 9 Results deblurred from blurry images with known blur kernels by running RL, WLS, SP, TwIST, ADMM, IRN and our algorithm CDL (with the three learned pairs of dictionaries). The sizes of the bear, goose, kite, monk, and tulip images are 300 × 300 pixels, while the size of the plantart image is 256 × 320 pixels
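The known-kernel degradation used for Fig. 9 is a standard normalized Gaussian blur; a sketch of the 17 × 17, σ = 2.1 kernel follows (the helper name is ours):

```python
import numpy as np

def gaussian_kernel(size=17, sigma=2.1):
    """Isotropic Gaussian blur kernel of `size` x `size`, normalized to sum
    to 1, matching the degradation used in this comparison."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()
```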
Table 1 PSNR and SSIM results by applying different methods to the images in Fig. 9

PSNR       RL       WLS      SP       TwIST    ADMM     IRN      CDL      CDL-2    CDL-3
Bear       21.1170  26.1910  26.4830  32.5686  28.3716  26.5221  26.4597  26.6106  26.5994
Goose      19.2928  33.2779  34.9224  33.8655  38.5345  35.9660  33.2371  34.2106  34.4956
Kite       18.7363  22.7197  23.4443  33.5136  25.2456  24.0023  22.7890  23.2994  23.4556
Monk       22.2029  32.7168  33.1911  25.6948  35.9823  32.8605  32.4100  33.0866  33.3239
Plantart   21.0015  24.1207  24.8084  22.0525  24.5984  25.1008  24.3880  24.8382  24.9286
Tulip      22.0369  25.1273  26.1101  29.1558  30.2230  26.2440  25.8293  26.0643  26.0745

SSIM       RL       WLS      SP       TwIST    ADMM     IRN      CDL      CDL-2    CDL-3
Bear       0.6747   0.6889   0.6958   0.9311   0.8240   0.7031   0.7103   0.7221   0.7212
Goose      0.8057   0.9426   0.9545   0.9802   0.9760   0.9582   0.9425   0.9464   0.9478
Kite       0.6453   0.7090   0.7319   0.9300   0.8006   0.7462   0.6881   0.7033   0.7076
Monk       0.8091   0.9258   0.9288   0.9600   0.9603   0.9263   0.9229   0.9301   0.9318
Plantart   0.7222   0.7627   0.7882   0.7648   0.7922   0.8001   0.7779   0.7963   0.7983
Tulip      0.8106   0.8062   0.8380   0.9736   0.9406   0.8479   0.8368   0.8432   0.8425

Bold values indicate the largest ones
Fig. 10 Demos of the pairs of the learned dictionaries. a and b, the clear and “blurry + noisy” dictionaries learned from the goose data set; c and d, the clear and “blurry + noisy” dictionaries learned from the tulip data set
and SSIM measures. The numbers in Table 1 are obtained by taking the ground truth as a reference. In the experiments, we also evaluated the performance of the multi-phase dictionary learning described in Sect. 4.2. In Table 1, “CDL-2” means that two pairs of dictionaries are learned, while “CDL-3” means that three pairs of dictionaries are learned in total. We see that both “CDL-2” and “CDL-3” achieve better results than the approach with a single pair of dictionaries (CDL). In addition, our algorithm achieves more accurate results than RL, WLS and SP. Comparatively, TwIST, ADMM and IRN outperform our algorithm, as they actually employ more prior information to address the deblurring problem with known blur kernels. As for computation time, for an image of 300 × 300 pixels, RL, WLS, SP, TwIST, ADMM, and IRN take about 5.8, 53.1, 20.5, 22.2, 5.1, and 41.1 s, respectively, using Matlab 7.0 on a PC with a 3.0 GHz CPU and 4.0 GB of RAM. In contrast, our algorithm takes a few hours (if running FOCUSS for (19)) or about fifty minutes (if solving (20)). The main reason is that our algorithm needs to restore each local patch via linear reconstruction.
5.4 Comparisons: Restoring Blurry and Noisy Images
In this subsection, we address the challenging problem of restoring blurry and noisy images. Here, the source clear images used for training are first blurred by a Gaussian kernel with a kernel size of 11 × 11 and a standard deviation of 1.3, and then further degraded by additive Gaussian noise with variance equal to 0.001. For each of the six data sets used in Sect. 5.3, 150,000 samples are randomly selected from the clear and “blurry+noisy” training images. Then, Algorithm 1 is employed to learn the dictionaries. Figure 10a, b show the clear and “blurry+noisy” dictionaries learned from the goose data set; Fig. 10c, d illustrate the clear and “blurry+noisy”
Fig. 11 Demo I: results restored from blurry and noisy images
dictionaries learned from the tulip data set. Compared with the atoms shown in Fig. 2, the “blurry+noisy” atoms contain more noise. For test images degraded with the same blur kernel and Gaussian noise, Algorithm 2 is employed. In the experiments, we use FOCUSS to solve problem (19). We observed that the noise can be removed well when the maximum number of non-zero entries is taken between 5 and 10. Unfortunately, blur still exists in the restored results. To remedy this drawback, the results of the training images restored with the previously learned dictionaries are collected as new samples. Then, a new pair of dictionaries can be learned. It has been observed that this pair of dictionaries is very similar to the one obtained in the case where no noise is considered. Thus, one alternative is to employ the existing dictionaries learned from the original clear/blurry patches for this task. Finally, these two pairs of dictionaries are utilized to restore the image.
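The “blurry + noisy” degradation described above (an 11 × 11 Gaussian kernel with σ = 1.3 followed by additive Gaussian noise of variance 0.001) can be sketched as follows; the direct loop computes a correlation, which coincides with convolution for the symmetric kernels used here (function names are ours):

```python
import numpy as np

def degrade(img, kernel, noise_var=0.001, seed=0):
    """Blur `img` (values in [0, 1]) with `kernel` via a direct sliding-window
    correlation with edge padding, then add zero-mean Gaussian noise of
    variance `noise_var` and clip back to [0, 1]."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    rng = np.random.default_rng(seed)
    noisy = out + rng.normal(0.0, np.sqrt(noise_var), img.shape)
    return np.clip(noisy, 0.0, 1.0)
```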
For the image denoising task, we compared our algorithm with the fast Projected Newton (PN) dual method for TV regularization (Barbero and Sra 2011), IRN, TwIST, ADMM, the Fast Iterative Shrinkage Thresholding Algorithm (FISTA) (Beck and Teboulle 2009b) for TV-based restoration, and the Block-Matching and 3D-filtering (BM3D) approach (Dabov et al. 2007; Danielyan et al. 2012). In addition, the Nonlocally Centralized Sparse Representation (NCSR) developed by Dong et al. (2013) is also compared. To conduct a fair comparison, when running the PN algorithm, the regularization parameter λ changes in {0.2, 0.4, 0.6, 0.8, 1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20}. For the IRN, TwIST and ADMM algorithms, the regularization parameter λ was supplied respectively with the values in {0.0001, 0.001, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, 3, 4, 5, 6, 7, 8, 9}. FISTA was tested with the values in {0.0001, 0.001, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1, 0.5, 1.0} × √σ. For the PN, IRN and FISTA algorithms,
Fig. 12 Demo II: results restored from blurry and noisy images
among the denoised images obtained with different parameters, the one with the highest PSNR is reported here for comparison. The default parameters are employed to run NCSR. When running the BM3D algorithm, besides the noise level parameter σ, all the default parameters are employed. Figures 11, 12, and 13 demonstrate the restored results of the images used in Fig. 9. Table 2 lists the PSNR and SSIM values, indicating that IRN, BM3D and ADMM outperform our algorithm. By contrast, our algorithm achieves better results on most images compared with PN, FISTA, and TwIST.
5.5 Comparisons: Deblurring Images with Multiple Blur Kernels
Here the test images are divided into four regions, which are illustrated via the upper-right image in each panel in
Figs. 14, 15, and 16 and labeled as “Region”. The green region is blurred by a Gaussian kernel with standard deviation σ = 0.9 and a window size of 9 × 9 pixels. The red region is blurred by a Gaussian kernel with σ = 1.3 and a window size of 11 × 11 pixels. The blue region is blurred by a Gaussian kernel with σ = 1.7 and a window size of 15 × 15 pixels. The gray region is blurred by a Gaussian kernel with σ = 2.1 and a window size of 17 × 17 pixels. The blurred images are shown in the middle column of the first row in each panel in Figs. 14, 15, and 16. Accordingly, four training data sets are constructed by blurring the training images in the six data sets with the four blur kernels. Each training set includes 25,000 samples of clear/blurry image patches. As the clear and blurry patches are concatenated into a vector, the dimensionality of the training data points in model (3) is equal to 162, 242, 450, and 578, respectively. We set the number of atoms in each dictionary to
Fig. 13 Demo III: results restored from blurry and noisy images Table 2 PSNR and SSIM results by applying different methods to the images in Figs. 11, 12, and 13 PN
IRN
FISTA
BM3D
ADMM
TwIST
NCSR
CDL
Bear
25.4883
25.8071
24.6075
26.5542
26.3497
25.9836
26.2675
25.7934
Goose
31.8328
34.3065
23.4159
33.2297
34.4106
28.5170
31.1629
33.2450
Kite
22.3106
23.7964
16.8168
24.1342
24.0525
23.5406
24.4435
23.0739
Monk
30.8561
31.6322
21.8776
32.7133
32.1612
21.5589
30.9527
31.3550
Plantart
23.5536
24.7570
17.5007
25.5684
21.8319
19.4256
25.4474
24.4055
Tulip
24.2615
25.7896
23.0669
26.7044
26.5246
24.0089
26.4043
25.7399
Bear
0.6155
0.6657
0.6211
0.6919
0.6825
0.6836
0.6746
0.6356
Goose
0.9159
0.9306
0.8968
0.9192
0.9252
0.9145
0.6963
0.8864
Kite
0.6128
0.6556
0.5312
0.7271
0.7003
0.6750
0.6669
0.6202
Monk
0.8585
0.8846
0.8256
0.8982
0.8873
0.7763
0.7853
0.8761
Plantart
0.6829
0.7349
0.5741
0.8018
0.5740
0.7101
0.7423
0.7265
Tulip
0.7273
0.7966
0.7178
0.8350
0.8300
0.8177
0.8128
0.7786
PSNR
SSIM
The bold values mean the largest ones
Fig. 14 Demo I: results deblurred from images degraded with multiple blur kernels. The size of the bear and goose images is 500 × 300 pixels
about 1.2 times the dimensionality of the training data. Specifically, the numbers of dictionary atoms learned from the four training data sets are equal to 200, 300, 550, and 700, respectively. In the experiments, we
take the sparsity indicator s to be equal to half of the number of dictionary atoms, namely, s = 100, 200, 275 and 350. In this way, four groups of dictionaries were learned.
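The patch dimensionalities quoted in this subsection follow directly from the four kernel window sizes: concatenating a clear and a blurry k × k patch gives a 2k²-dimensional training vector:

```python
# Dimensionality of a concatenated clear/blurry patch pair for each of the
# four kernel window sizes used in the multiple-kernel experiments.
dims = [2 * k * k for k in (9, 11, 15, 17)]
print(dims)  # [162, 242, 450, 578]
```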
Fig. 15 Demo II: results deblurred from images degraded with multiple blur kernels. The sizes of the kite and monk images are 426 × 278 and 500 × 310 pixels, respectively
Finally, the approach described in Sect. 4.3 is used to deblur the images. As RL, WLS, SP, TwIST, BM3D, ADMM, and IRN need to know the blur kernel, we run them four times, supplying one of the four kernels each time. The same parameters as in Sect. 5.3 are employed, and the result with the best PSNR is reported here. The PSNR and SSIM values are given in Tables 3, 4, 5, and 6. As BM3D and TwIST generate unsatisfactory results, we do not report them here. The averaged PSNR and SSIM values are listed in Table 7. We see that our algorithm achieves the best performance in most cases. To further compare these algorithms, the average kernel is employed. In this case, it is taken as a Gaussian blur kernel with σ = 1.5 and a window size of 15 × 15 pixels. Figures 14, 15, and 16 illustrate the deblurred results, and
Fig. 16 Demo III: results deblurred from images degraded with multiple blur kernels. The sizes of the plantart and tulip images are 300 × 286 and 500 × 362 pixels, respectively
Table 3 PSNR and SSIM results by applying different methods to the images in Figs. 14, 15, and 16

PSNR     RL      WLS     SP      ADMM    IRN     CDL
Bear     25.468  25.481  25.975  24.823  26.220  25.510
Goose    28.194  28.347  28.545  25.587  28.398  28.878
Kite     20.293  20.287  20.390  20.043  20.406  20.385
Monk     27.140  27.330  27.383  26.130  27.212  28.465
Plant    25.178  25.975  25.918  25.130  25.770  26.818
Tulip    22.202  22.221  22.221  21.736  22.157  23.549

SSIM     RL      WLS     SP      ADMM    IRN     CDL
Bear     0.646   0.631   0.645   0.631   0.659   0.659
Goose    0.848   0.849   0.851   0.813   0.840   0.868
Kite     0.569   0.562   0.568   0.553   0.574   0.565
Monk     0.810   0.810   0.812   0.783   0.797   0.860
Plant    0.832   0.844   0.838   0.809   0.829   0.864
Tulip    0.735   0.733   0.733   0.720   0.726   0.809

Bold values indicate the largest ones. The Gaussian blur kernel with σ = 0.9 and window size 9 × 9 pixels is supplied to RL, WLS, SP, ADMM, and IRN.

Table 4 PSNR and SSIM results by applying different methods to the images in Figs. 14, 15, and 16
PSNR     RL      WLS     SP      ADMM    IRN     CDL
Bear     24.450  24.841  24.576  23.187  24.404  25.510
Goose    27.700  28.375  28.367  23.477  28.258  28.878
Kite     20.288  20.494  20.667  19.661  20.678  20.385
Monk     27.215  27.768  27.676  25.357  27.628  28.465
Plant    25.195  26.593  26.503  24.584  26.239  26.818
Tulip    22.897  22.902  22.880  21.700  22.727  23.549

SSIM     RL      WLS     SP      ADMM    IRN     CDL
Bear     0.645   0.640   0.600   0.552   0.589   0.659
Goose    0.850   0.855   0.847   0.743   0.847   0.868
Kite     0.579   0.581   0.596   0.522   0.576   0.565
Monk     0.827   0.831   0.822   0.759   0.817   0.860
Plant    0.838   0.859   0.849   0.776   0.844   0.864
Tulip    0.774   0.770   0.770   0.716   0.759   0.809

Bold values indicate the largest ones. The Gaussian blur kernel with σ = 1.3 and window size 11 × 11 pixels is supplied to RL, WLS, SP, ADMM, and IRN.

Table 5 PSNR and SSIM results by applying different methods to the images in Figs. 14, 15, and 16
PSNR     RL      WLS     SP      ADMM    IRN     CDL
Bear     23.207  23.352  23.531  21.582  23.254  25.510
Goose    26.450  27.338  27.560  22.015  27.337  28.878
Kite     19.746  20.189  20.219  18.664  20.281  20.385
Monk     26.925  27.483  27.477  24.406  27.311  28.465
Plant    24.131  25.665  25.975  23.555  25.633  26.818
Tulip    23.227  23.255  23.079  21.011  22.862  23.549

SSIM     RL      WLS     SP      ADMM    IRN     CDL
Bear     0.618   0.617   0.553   0.473   0.439   0.659
Goose    0.839   0.846   0.836   0.677   0.796   0.868
Kite     0.549   0.560   0.548   0.451   0.562   0.565
Monk     0.832   0.837   0.816   0.697   0.823   0.860
Plant    0.805   0.838   0.837   0.718   0.786   0.864
Tulip    0.794   0.791   0.780   0.669   0.778   0.809

Bold values indicate the largest ones. The Gaussian blur kernel with σ = 1.7 and window size 15 × 15 pixels is supplied to RL, WLS, SP, ADMM, and IRN.

Table 6 PSNR and SSIM results by applying different methods to the images in Figs. 14, 15, and 16
PSNR     RL      WLS     SP      ADMM    IRN     CDL
Bear     22.338  21.800  22.344  20.457  22.954  25.510
Goose    25.057  25.655  26.198  20.547  26.679  28.878
Kite     18.997  19.375  19.581  17.742  19.770  20.385
Monk     26.002  26.211  26.749  23.325  26.260  28.465
Plant    22.437  23.452  24.308  22.230  4.313   26.818
Tulip    22.642  22.534  22.518  20.004  21.955  23.549

SSIM     RL      WLS     SP      ADMM    IRN     CDL
Bear     0.586   0.575   0.526   0.424   0.421   0.659
Goose    0.811   0.814   0.819   0.599   0.785   0.868
Kite     0.507   0.513   0.510   0.372   0.469   0.565
Monk     0.812   0.815   0.809   0.634   0.716   0.860
Plant    0.719   0.760   0.797   0.628   0.313   0.864
Tulip    0.778   0.775   0.765   0.606   0.767   0.809

Bold values indicate the largest ones. The Gaussian blur kernel with σ = 2.1 and window size 17 × 17 pixels is supplied to RL, WLS, SP, ADMM, and IRN.
Table 7 Averaged PSNR and SSIM results with the four kernels by applying different methods to the images in Figs. 14, 15, and 16

PSNR     RL      WLS     SP      ADMM    IRN     CDL
Bear     23.866  23.869  24.106  22.512  24.208  25.510
Goose    26.850  27.429  27.667  22.907  27.668  28.878
Kite     19.831  20.086  20.214  19.028  20.284  20.385
Monk     26.820  27.198  27.321  24.805  27.103  28.465
Plant    24.235  25.422  25.676  23.875  20.489  26.818
Tulip    22.742  22.728  22.674  21.113  22.425  23.549

SSIM     RL      WLS     SP      ADMM    IRN     CDL
Bear     0.624   0.616   0.581   0.520   0.527   0.659
Goose    0.837   0.841   0.838   0.708   0.817   0.868
Kite     0.551   0.554   0.556   0.474   0.545   0.565
Monk     0.820   0.823   0.815   0.718   0.788   0.860
Plant    0.798   0.825   0.830   0.733   0.693   0.864
Tulip    0.770   0.767   0.762   0.678   0.757   0.809

Bold values indicate the largest ones

Table 8 PSNR and SSIM results by applying different methods to the images in Figs. 14, 15, and 16
PSNR     RL      WLS     SP      ADMM    IRN     CDL
Bear     23.786  24.140  24.077  22.186  23.696  25.510
Goose    27.109  27.972  28.032  22.668  27.716  28.878
Kite     20.065  20.417  20.473  19.144  20.566  20.385
Monk     27.128  27.745  27.610  24.953  27.599  28.465
Plant    24.799  26.354  26.374  24.219  26.020  26.818
Tulip    23.151  23.170  23.078  21.453  22.895  23.549

SSIM     RL      WLS     SP      ADMM    IRN     CDL
Bear     0.633   0.632   0.563   0.515   0.578   0.659
Goose    0.846   0.853   0.840   0.709   0.846   0.868
Kite     0.567   0.576   0.573   0.486   0.573   0.565
Monk     0.832   0.837   0.818   0.732   0.823   0.860
Plant    0.830   0.854   0.843   0.750   0.845   0.864
Tulip    0.788   0.784   0.781   0.704   0.772   0.809

Bold values indicate the largest ones. The averaged Gaussian blur kernel with σ = 1.5 and window size 15 × 15 pixels is supplied to RL, WLS, SP, ADMM, and IRN.
Table 8 reports the PSNR and SSIM values. We see that better performance is achieved by our algorithm. It is worth pointing out that our approach does not need to take the blur kernel as an input parameter once the dictionaries are learned. Considering this fact, we further evaluate the performance via blind deconvolution. Here we employ one of the most sophisticated blind deconvolution algorithms, developed by Shan et al. (2008),3 for comparison. To run their algorithm, the size of the blur kernel changes in {7 × 7, 9 × 9, . . . , 27 × 27}. Among these runs, the result with the highest PSNR is reported. Figures 17 and 18 show the restorations. Table 9 lists the PSNR and SSIM values. We see that in most cases our approach achieves better results. Finally, Table 10 summarizes the comparisons between our approach and the state-of-the-art algorithms. From this table, we see that ADMM, BM3D and NCSR outperform our algorithm in the cases of single blur kernels, while our algorithm is comparable to FISTA, IRN, and TwIST as a whole. A good characteristic of our approach is that it can generate more satisfactory results in the cases of multiple blur kernels. Furthermore, if a deblurring model is explicitly or implicitly constructed on the blurring function, or the blur kernel needs to be supplied as a parameter during restoration, we divide it into the category of Signal Processing (SP) based algorithms. By contrast, our algorithm is constructed on a learning-based framework where the training phase and the test phase are separated from each other.

3 The executable file is downloaded from the webpage: http://www.cse.cuhk.edu.hk/~leojia/programs/deblurring/deblurring.htm.
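The coefficient-transfer step that separates CDL from these SP-based methods can be sketched with a greedy OMP-style coder; this is a simplified stand-in for the FOCUSS/OMP solvers actually used (function and variable names are ours):

```python
import numpy as np

def restore_patch(y_blur, D_blur, D_clear, s):
    """Sketch of the CDL application phase: sparse-code a blurry patch over
    the blurry dictionary with a greedy OMP-style loop, then reuse the SAME
    coefficients with the coupled clear dictionary to restore the patch."""
    d = D_blur.shape[1]
    support, residual = [], y_blur.copy()
    coef = np.zeros(d)
    for _ in range(s):
        # pick the blurry atom most correlated with the current residual
        j = int(np.argmax(np.abs(D_blur.T @ residual)))
        if j not in support:
            support.append(j)
        sub = D_blur[:, support]
        c, *_ = np.linalg.lstsq(sub, y_blur, rcond=None)
        residual = y_blur - sub @ c
    coef[support] = c
    return D_clear @ coef  # transfer coefficients to the clear dictionary
```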
6 Conclusion
This paper has developed a learning-based algorithm for image deblurring, which is addressed in terms of coupled dictionary learning. With this model, a pair of dictionaries is learned from the clear/blurry patches. Among these dictionaries, one is able to describe clear patches, and the other can represent blurry ones. Given a blurred image, it is first reconstructed linearly with the blurry dictionary, and then the reconstruction coefficients are kept unchanged for deblurring by taking the clear dictionary as input. We have demonstrated the influence of the parameters included in the model of coupled dictionary learning, and compared our algorithm with the state-of-the-art methods on images that are blurred with a single blur kernel, with noise, and with multiple blur kernels for different image regions. Comparative experiments indicate the validity of our method. In the future, we would like to introduce parallel computing techniques to speed up our algorithm. In addition, we
Fig. 17 Demo I: Results restored by blind deconvolution (Shan et al. 2008) and our approach
Fig. 18 Demo II: Results restored by blind deconvolution (Shan et al. 2008) and our approach
Table 9 PSNR and SSIM results with Shan et al.’s (2008) method and CDL on the images in Figs. 14, 15, and 16

           PSNR               SSIM
           Shan's   CDL       Shan's   CDL
Bear       25.671   25.510    0.649    0.659
Goose      27.244   28.878    0.842    0.868
Kite       20.404   20.385    0.583    0.565
Monk       26.703   28.465    0.810    0.860
Plantart   24.899   26.818    0.841    0.864
Tulip      22.255   23.549    0.751    0.809

Bold values indicate the largest ones
Table 10 Comparison of our approach with the state-of-the-art approaches

Method   Accu. of    Accu. of deblurring   Can deblur with   Can do multiphase   Need to input the     Category of approach
         deblurring  with noise            multi-kernels     deblurring          blurring parameters
RL       Low         –                     ×                 ×                   Yes                   Signal processing (SP)
WLS      Low         –                     ×                 ×                   Yes                   SP + regularizer
SP       Medium      Medium                ×                 ×                   Yes                   SP + sparsity prior
IRN      Medium      Medium                ×                 ×                   Yes                   SP + TV prior
FISTA    –           Medium                ×                 ×                   Yes                   SP + TV prior
TwIST    High        Medium                ×                 ×                   Yes                   SP + regularizer
ADMM     High        High                  ×                 ×                   Yes                   SP + regularizer
BM3D     High        High                  ×                 ×                   Yes                   SP + sparse representation
NCSR     High        High                  ×                 ×                   Yes                   SP + sparse representation
Our      Medium      Medium                √                 √                   No                    Machine learning

Here “–” means it was not evaluated
would also like to integrate some prior knowledge into our framework to reconstruct the patches in a batch mode and develop nonlinear mappings between sparse representations to enhance the quality. Acknowledgments This work was supported by the National Natural Science Foundation of China (Grant Nos. 61272331, 91338202, 61370039, and 91120301).
References
Afonso, M. V., Bioucas-Dias, J. M., & Figueiredo, M. (2010). Fast image recovery using variable splitting and constrained optimization. IEEE Transactions on Image Processing, 19(9), 2345–2356. Aharon, M., Elad, M., & Bruckstein, A. M. (2006). K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11), 4311–4322. Bar, L., Sochen, N., & Kiryati, N. (2007). Restoration of images with piecewise space-variant blur. In International conference on scale space and variational methods in computer vision (pp. 533–544). Ischia, Italy. Barbero, A., & Sra, S. (2011). Fast Newton-type methods for total variation regularization. In International conference on machine learning (pp. 313–320). Bellevue, Washington, USA.
Batra, D., Kowdle, A., Parikh, D., Luo, J., & Chen, T. (2010). icoseg: Interactive co-segmentation with intelligent scribble guidance. In IEEE conference on computer vision and pattern recognition (pp. 3169–3176). San Francisco, CA, USA. Batra, D., Kowdle, A., Parikh, D., Luo, J., & Chen, T. (2011). Interactively co-segmentating topically related images with intelligent scribble guidance. International Journal of Computer Vision, 93(3), 273–292. Beck, A., & Teboulle, M. (2009a). Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Transactions on Image Processing, 18(11), 2419–2434. Beck, A., & Teboulle, M. (2009b). A fast iterative shrinkagethresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 183–202. Bioucas-Dias, J. M., & Figueiredo, M. (2007). A new twist: Two-step iterative shrinkage/thresholding algorithms for image restoration. IEEE Transactions on Image Processing, 16(12), 2992–3004. Bronstein, M. M., Bronstein, A. M., Zibulevsky, M., & Zeevi, Y. Y. (2005). Blind deconvolution of images using optimal sparse representations. IEEE Transactions on Image Processing, 14(6), 726–736. Chantas, G. K., Galatsanos, N. P., Molina, R., & Katsaggelos, A. K. (2010). Variational bayesian image restoration with a product of spatially weighted total variation image priors. IEEE Transactions on Image Processing, 19(2), 351–362. Couzinie-Devy, F., Mairal, J., Bach, F., & Ponce, J. (2011). Dictionary learning for deblurring and digital zoom. CoRR. arXiv:1110.0957.
Int J Comput Vis Dabov, K., Foi, A., Katkovnik, V., & Egiazarian, K. (2007). Image denoising by sparse 3d transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8), 2080–2095. Danielyan, A., Katkovnik, V., & Egiazarian, K. (2012). Bm3d frames and variational image deblurring. IEEE Transactions on Image Processing, 21(4), 1715–1728. Dong, W., Zhang, L., Shi, G., & Li, X. (2013). Nonlocally centralized sparse representation for image restoration. IEEE Transactions on Image Processing, 22(4), 1620–1630. Dong, W., Zhang, L., Shi, G., & Wu, X. (2011). Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization. IEEE Transactions on Image Processing, 20(7), 1838–1857. Elad, M., & Aharon, M. (2006). Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12), 3736–3745. Elad, M., Figueiredo, M., & Ma, Y. (2010). On the role of sparse and redundant representations in image processing. Proceedings of the IEEE, 98(6), 972–982. Engan, K., Aase, S. O., & Husoy, J. H. (2000). Multi-frame compression: Theory and design. EURASIP Signal Processing, 80(10), 2121– 2140. Fergus, R., Singh, B., Hertzmann, A., Roweis, S. T., & Freeman, W. T. (2006). Removing camera shake from a single photograph. ACM Transactions on Graphics (SIGGRAPH), 25(3), 787–794. Goldluecke, B., & Cremers, D. (2010). An approach to vectorial total variation based on geometric measure theory. In IEEE international conference on computer vision and pattern recognition (pp. 327– 333). San Francisco, CA, USA. Gorodnitsky, I. F., & Rao, B. D. (1997). Sparse signal reconstruction from limited data using focuss: A re-weighted norm minimization algorithm. IEEE Transactions on Signal Processing, 45(3), 600–616. He, L., Qi, H., & Zaretzki, R. (2013). Beta process joint dictionary learning for coupled feature spaces with application to single image super-resolution. 
In IEEE conference on computer vision and pattern recognition (pp. 345–352). Portland, Oregon, USA.
Hu, Z., Huang, J. B., & Yang, M. H. (2010). Single image deblurring with adaptive dictionary learning. In IEEE international conference on image processing (pp. 1169–1172). Hong Kong, China.
Jenatton, R., Mairal, J., Obozinski, G., & Bach, F. (2011). Proximal methods for hierarchical sparse coding. Journal of Machine Learning Research, 12, 2297–2334.
Jia, K., Wang, X., & Tang, X. (2013). Image transformation based on learning dictionaries across image spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(2), 367–380.
Joshi, N., Kang, S. B., Zitnick, C. L., & Szeliski, R. (2010). Image deblurring using inertial measurement sensors. In SIGGRAPH (pp. 1–8). Los Angeles, CA, USA.
Joshi, N., Zitnick, C. L., Szeliski, R., & Kriegman, D. J. (2009). Image deblurring and denoising using color priors. In IEEE conference on computer vision and pattern recognition (pp. 1–8). Miami, FL, USA.
Kamilov, U., Bostan, E., & Unser, M. (2012). Wavelet shrinkage with consistent cycle spinning generalizes total variation denoising. IEEE Signal Processing Letters, 19(4), 187–190.
Kenig, T., Kam, Z., & Feuer, A. (2010). Blind image deconvolution using machine learning for three-dimensional microscopy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(12), 2191–2204.
Levin, A., Fergus, R., Durand, F., & Freeman, W. T. (2007). Coded exposure photography: Motion deblurring using fluttered shutter. ACM Transactions on Graphics (SIGGRAPH), 26(3), 70:1–70:9.
Levin, A., Weiss, Y., Durand, F., & Freeman, W. T. (2009). Understanding and evaluating blind deconvolution algorithms. In IEEE conference on computer vision and pattern recognition (pp. 1–8). Miami, FL, USA.
Li, D., Mersereau, R. M., & Simske, S. (2007). Blind image deconvolution through support vector regression. IEEE Transactions on Neural Networks, 18(3), 931–935.
Lou, Y., Bertozzi, A. L., & Soatto, S. (2011). Direct sparse deblurring. Journal of Mathematical Imaging and Vision, 39(1), 1–12.
Lucy, L. (1974). An iterative technique for the rectification of observed distributions. The Astronomical Journal, 79(6), 745–754.
Ma, L., Moisan, L., Yu, J., & Zeng, T. (2013). A dictionary learning approach for Poisson image deblurring. IEEE Transactions on Medical Imaging, 32(7), 1277–1289.
Ma, S., Yin, W., Zhang, Y., & Chakraborty, A. (2008). An efficient algorithm for compressed MR imaging using total variation and wavelets. In IEEE conference on computer vision and pattern recognition (pp. 1–8). Anchorage, Alaska, USA.
Mairal, J., Bach, F., & Ponce, J. (2012). Task-driven dictionary learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4), 791–804.
Mairal, J., Bach, F., Ponce, J., & Sapiro, G. (2010). Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 11, 19–60.
Mairal, J., Bach, F., Ponce, J., Sapiro, G., & Zisserman, A. (2009). Non-local sparse models for image restoration. In IEEE international conference on computer vision (pp. 2272–2279). Kyoto, Japan.
Mairal, J., Elad, M., & Sapiro, G. (2008). Sparse representation for color image restoration. IEEE Transactions on Image Processing, 17(1), 53–69.
Mairal, J., Jenatton, R., Obozinski, G., & Bach, F. (2011). Convex and network flow optimization for structured sparsity. Journal of Machine Learning Research, 12, 2681–2720.
Peleg, T., Eldar, Y., & Elad, M. (2013). Exploiting statistical dependencies in sparse representations for signal recovery. IEEE Transactions on Signal Processing, 60(5), 2286–2303.
Portilla, J. (2009). Image restoration through l0 analysis-based sparse optimization in tight frames. In IEEE international conference on image processing (pp. 3909–3912). Cairo, Egypt.
Richardson, W. H. (1972). Bayesian-based iterative method of image restoration. Journal of the Optical Society of America, 62(1), 55–59.
Rodriguez, P., & Wohlberg, B. (2009). Efficient minimization method for a generalized total variation functional. IEEE Transactions on Image Processing, 18(2), 322–332.
Rubinstein, R., Peleg, T., & Elad, M. (2013). Analysis K-SVD: A dictionary-learning algorithm for the analysis sparse model. IEEE Transactions on Signal Processing, 61(3), 661–677.
Rubinstein, R., Zibulevsky, M., & Elad, M. (2010). Double sparsity: Learning sparse dictionaries for sparse signal approximation. IEEE Transactions on Signal Processing, 58(3), 1553–1564.
Rudin, L. I., Osher, S., & Fatemi, E. (1992). Nonlinear total variation based noise removal algorithms. Physica D, 60(1), 259–268.
Shan, Q., Jia, J., & Agarwala, A. (2008). High-quality motion deblurring from a single image. ACM Transactions on Graphics (SIGGRAPH), 27(3), 73:1–73:10.
Su, M., & Basu, M. (2002). A hybrid learning system for image deblurring. Pattern Recognition, 35(12), 2881–2894.
Takeda, H., Farsiu, S., & Milanfar, P. (2008). Deblurring using regularized locally adaptive kernel regression. IEEE Transactions on Image Processing, 17(4), 550–563.
Tropp, J. A. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10), 2231–2242.
Wang, S., Zhang, L., Liang, Y., & Pan, Q. (2012). Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis. In IEEE conference on computer vision and pattern recognition (pp. 2216–2223). Providence, Rhode Island, USA.
Wang, Z., & Li, Q. (2011). Information content weighting for perceptual image quality assessment. IEEE Transactions on Image Processing, 20(5), 1185–1198.
Wen, Y. W., Chan, R. H., & Yip, A. M. (2012). A primal-dual method for total-variation-based wavelet domain inpainting. IEEE Transactions on Image Processing, 21(1), 106–114.
Whyte, O., Sivic, J., Zisserman, A., & Ponce, J. (2012). Non-uniform deblurring for shaken images. International Journal of Computer Vision, 98(2), 168–186.
Xiang, S., Nie, F., & Zhang, C. (2010). Semi-supervised classification via local spline regression. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(11), 2039–2053.
Xu, L., Zheng, S., & Jia, J. (2013). Unnatural l0 sparse representation for natural image deblurring. In IEEE conference on computer vision and pattern recognition (pp. 1107–1114). Portland, Oregon, USA.
Yang, J., Wang, Z., Lin, Z., Cohen, S., & Huang, T. S. (2012). Coupled dictionary training for image super-resolution. IEEE Transactions on Image Processing, 21(8), 3467–3478.
Yang, J., Wright, J., Huang, T. S., & Ma, Y. (2010). Image super-resolution via sparse representation. IEEE Transactions on Image Processing, 19(11), 2861–2873.
Yuan, L., Sun, J., Quan, L., & Shum, H. Y. (2007). Image deblurring with blurred/noisy image pairs. ACM Transactions on Graphics (SIGGRAPH), 26(3), 1–10.
Zeyde, R., Elad, M., & Protter, M. (2010). On single image scale-up using sparse-representations. In Proceedings of the 7th international conference on curves and surfaces (pp. 711–730). Avignon, France.
Zhang, H., Wipf, D. P., & Zhang, Y. (2013). Multi-image blind deblurring using a coupled adaptive sparse prior. In IEEE conference on computer vision and pattern recognition (pp. 1051–1058). Portland, Oregon, USA.
Zhang, H., Yang, J., Zhang, Y., & Huang, T. S. (2011). Sparse representation based blind image deblurring. In IEEE international conference on multimedia and expo (pp. 1–6). Barcelona, Spain.