Int J Comput Vis DOI 10.1007/s11263-014-0755-z
Image Deblurring with Coupled Dictionary Learning
Shiming Xiang · Gaofeng Meng · Ying Wang · Chunhong Pan · Changshui Zhang
Received: 3 January 2014 / Accepted: 2 August 2014 © Springer Science+Business Media New York 2014
Abstract Image deblurring is a challenging problem in vision computing. Traditionally, this task is addressed as an inverse problem that is enclosed within the image itself. This paper presents a learning-based framework in which the knowledge hidden in huge amounts of available data is explored and exploited for image deblurring. To this end, our algorithm is developed under the conceptual framework of coupled dictionary learning. Specifically, given pairs of blurred image patches and their corresponding clear ones, a learning model is constructed to learn a pair of dictionaries. Among them, one dictionary is responsible for the representation of the clear images, while the other is responsible for that of the blurred images. Theoretically, the learning model is analyzed with coupled sparse representations of the training samples. As the atoms of these dictionaries are coupled together one by one, the reconstruction information can be transmitted between the clear and blurry images. In the application phase, the blurry dictionary is employed to linearly reconstruct the blurry image to be restored. Then, the reconstruction coefficients are kept unchanged and used with the clear dictionary to restore the final result. The main advantage of our approach is that it works in the case of unknown blur kernels. Comparative experiments indicate the validity of our approach.

Keywords Image deblurring · Coupled dictionary learning · Sparse representation · Multiple kernel deblurring
1 Introduction

As an open problem, image deblurring has drawn worldwide attention over the past decades. Efforts have been made to develop various models, and the road has been paved with techniques of deconvolution, regularization, multi-scale decomposition, and learning. Although the literature is rich, there does not yet exist a general framework that works well for diverse images.

1.1 The State-of-the-Art Methods
Communicated by Julien Mairal, Francis Bach and Michael Elad.

S. Xiang (B) · G. Meng · Y. Wang · C. Pan
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

C. Zhang
Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Automation, Tsinghua University, Beijing 100084, China
Most early deblurring methods were developed in terms of deconvolution (Richardson 1972; Lucy 1974). Later, a large number of extensions were introduced (Bronstein et al. 2005; Levin et al. 2009). However, their performance is largely determined by the quality of the supplied blur kernel. Even when the kernel is exactly known, deconvolution-based algorithms may generate ringing artifacts due to image noise. Regularization techniques have been widely introduced to image restoration (Li et al. 2007; Dong et al. 2011), and the meaning of regularization has been greatly widened with different kinds of prior information. For example, Bar et al. (2007) developed a variational regression method for deblurring images whose different parts are blurred by different kernels. Perhaps one of the most famous types of prior knowledge
is the Total Variation (TV) (Rudin et al. 1992); recent example methods can be found in Beck and Teboulle (2009a, b), Goldluecke and Cremers (2010), Chantas et al. (2010), Kamilov et al. (2012), and Ma et al. (2013). The main drawback of the TV regularizer, however, is the staircasing effect on the restored results. Another type of prior knowledge is natural image statistics. Specifically, Fergus et al. (2006) chose to restore images via a sparseness prior on the heavy-tailed distributions of image gradients. Danielyan et al. (2012) developed a block-matching framework for image deblurring, motivated by analyzing and synthesizing patches for nonlocal patchwise image modeling. In the analysis phase, similar image blocks are collected in groups, from which a 3-D transform is estimated; in the synthesis phase, the filtered blockwise spectra are employed to restore their corresponding original patches (Danielyan et al. 2012). Dong et al. (2013) developed a framework of non-locally centralized sparse representation (SR) for image restoration; the core idea is to suppress the sparse coding noise by exploiting image non-local self-similarity to obtain good estimates of the sparse coefficients of the original image. In parallel, some other types of knowledge, such as the hyper-Laplacian distribution of gradients (Levin et al. 2007), sparsity (Levin et al. 2007; Portilla 2009; Zeyde et al. 2010; Yang et al. 2010; Zhang et al. 2013), color statistics (Joshi et al. 2009), multi-image hints (Yuan et al. 2007), and a geometric model of the rotational motion of the camera (Whyte et al. 2012), are employed for modeling. In general, prior knowledge is utilized via regularization or, similarly, Bayesian estimation (Fergus et al. 2006; Shan et al. 2008; Joshi et al. 2010), which reduces the degrees of freedom of the problem.
Along with the deconvolution and regularization methods, multi-scale decomposition has also been considered for deblurring. The popularly used tools are those constructed on wavelet packets, the motivation being to exploit the advantages of multi-scale features and spectral representations. However, these approaches suffer from a few disadvantages. In particular, the wavelet decomposition is not shift-invariant and tends to produce pseudo-Gibbs phenomena along edges. To remedy these drawbacks, some regularization techniques have been introduced to improve the deblurring quality (Ma et al. 2008; Wen et al. 2012).

Beyond the knowledge deduced from rules, observations, or statistics, learning has recently been identified as one of the most flexible ways to discover relations between variables or knowledge hidden in data. This is prompted largely by the variety of available learning methods as well as the richness of image data. Specifically, Su et al. (2002) developed a learning system for image deblurring, where the relationship between the original images and their blurred versions is
learned via a radial-basis-function neural network. Takeda et al. (2008) proposed a deblurring framework with local kernel regression. Later, Kenig et al. (2010) proposed a subspace learning approach to reduce the possible space of point spread functions, employing principal component analysis to achieve this goal. Their approach shows satisfactory deblurring for three-dimensional images acquired by a wide-field fluorescence microscope. Recently, another important line of learning for image deblurring has been built on dictionary learning with sparse representation. Indeed, sparse representation provides an effective yet flexible way to describe images (Mairal et al. 2008, 2010, 2011; Elad et al. 2010; Rubinstein et al. 2010; Jenatton et al. 2011; Peleg et al. 2013; Xu et al. 2013). In this process, the dictionary plays a fundamental role in representation. Beyond discrete cosine transform bases, Haar wavelets, Gabor wavelets, curvelets, contourlets, and bandlets, which are all mathematically designed with highly-structural patterns, over-complete dictionaries learned from data have proven to be more adaptive for the sparse representation of images (Elad and Aharon 2006; Aharon et al. 2006; Mairal et al. 2008, 2009, 2012; Hu et al. 2010; Zeyde et al. 2010; Yang et al. 2012; Rubinstein et al. 2013; Dong et al. 2013), demonstrating promising results on traditionally difficult problems. Specifically, Lou et al. (2011) proposed a deblurring framework based on dictionary replacement. In their approach, the dictionary atoms corresponding to the clear images are first convolved with the blurring kernel to generate blurred atoms, and these blurred atoms are then employed to reconstruct the blurred image (Lou et al. 2011). Zhang et al. (2011) presented a deblurring algorithm by combining kernel estimation and dictionary learning, where a dictionary is supplied to exploit the sparsity prior of natural images.
In addition, Couzinie-Devy et al. (2011) modeled the clear image patches and blurred ones with a linear mapping function and learned a dictionary to represent the lost sharp details. Recently, Ma et al. (2013) proposed a dictionary learning approach for Poisson image deblurring. In their approach, the blurring kernel is given in advance, and the dictionary is learned from the degraded image itself for sparsely reconstructing its patches.

In summary, the existing approaches fall into two lines. In one, the dictionary works together with a deconvolution operation or a linear mapping between the blurred patches and the clear ones. In the other, the dictionary is unilaterally learned from the clear images or from the image to be restored, without considering the collaboration between the data. Accordingly, this unilateral dictionary learning mechanism suffers from one main limitation: the goodness of the dictionary is largely determined by the quality of the estimated (or given) blur kernel or linear mapping. This technical line has not been totally isolated from the traditional ones, and estimating a proper blur kernel has proven to be another open problem within this issue.

Finally, we point out that, in vision computing, some tasks are addressed by learning pairs of dictionaries, such as super-resolution (Yang et al. 2012; Wang et al. 2012; He et al. 2013; Jia et al. 2013), photo-sketch synthesis (Wang et al. 2012), and cross-space image transformation (Jia et al. 2013). The constraints on the atoms of the dictionaries in these optimization models are all of one type: "||·||2 ≤ 1" (see Yang et al. 2012; Wang et al. 2012; He et al. 2013; Jia et al. 2013). Differently, this paper constructs equality constraints for deblurring.

1.2 Our Work

This paper presents a learning-based framework for image deblurring. Our algorithm is developed on the conceptual framework of coupled dictionary learning (CDL). In this framework, we do not estimate the blur kernel, but instead learn a pair of dictionaries simultaneously from the clear and blurry images. To this end, a CDL model is developed in terms of sparse representation. This model is trained with pairs of clear and blurry image patches, generating a pair of clear and blurry dictionaries. With these dictionaries, the reconstruction coefficients of the blurry patches are estimated first via the blurry dictionary and then kept unchanged to restore the image. Comparative experiments illustrate the effectiveness of the proposed method. The advantages and details of our method can be highlighted as follows:

1. Beyond the blur identification and deconvolution used in existing algorithms, our algorithm is tailored around a learning-based framework. A pair of dictionaries, called the clear dictionary and the blurry dictionary, are learned from the training data. In the application phase, the blurry dictionary is first employed to linearly reconstruct the blurry image to be restored. The reconstruction coefficients are then kept unchanged and used with the clear dictionary for deblurring.
This operation can be done because the atoms of the two dictionaries are coupled together with one-by-one correspondences.

2. The CDL model is formulated as an optimization problem. A gradient descent method is developed to solve it, composed of two phases: estimating the sparse reconstruction coefficients and updating the atoms of the coupled dictionaries. With enough flexibility, the sparse coefficients can be estimated with any existing sparse representation method. Meanwhile, in each iteration, the atoms of the coupled dictionaries are updated with an analytically optimal solution.

3. Our approach can be naturally extended to image deblurring with unknown blur kernels, without any additional modification of the algorithmic framework. This is favorable for real-world applications in two respects. First, the approach need not be supplied with the blur kernel after the dictionaries are learned. Second, it can work in cases where the images are blurred with different blur kernels. Actually, a new mechanism appears in our framework due to the fact that the blurry image is evaluated by way of data reconstruction. This provides us a way to assess the reconstruction quality among the dictionaries. Thus, the dictionary with the minimum reconstruction error can be identified, and its clear partner can be picked out for deblurring.

4. Our approach can be naturally extended to multi-phase deblurring. That is, the image can be restored a few times with different pairs of dictionaries. To this end, the first group of dictionaries is learned from the clear/blurry patches, and the second group is learned from the clear patches and the patches restored with the previous dictionaries.

The remainder of this paper is organized as follows. Section 2 addresses the problem formulation. Section 3 presents the CDL model. The algorithm is given in Sect. 4, and the experimental results are reported in Sect. 5. Conclusions are drawn in Sect. 6.
2 Problem Formulation

For image deblurring, it is standard to assume that the observed image G is formed by subjecting the clear image I to convolution with a blurring function k:

$$G = k \ast I + v, \qquad (1)$$
where v is the pixel-wise additive noise and "∗" is the convolution operator. Our task is to restore I from G. Note that this task is ill-posed, as k and I in (1) are both unknown. Most traditional methods follow the line of deconvolution in signal processing. Here we instead introduce a learning-based framework that exploits the information hidden in rich image data for image deblurring. Without loss of generality, and for convenience in explaining our idea, we first assume that the blur kernel is known. Thus, images can be convolved with the kernel into blurry ones, and our task becomes learning from the collected clear-blurry pairs of data. To this end, we assume that both the clear and the blurry patches can be linearly reconstructed with their own over-complete dictionaries, so the learning model can be constructed under sparse representation. We further assume that the sparse coefficients of a clear patch and of its blurry counterpart, with respect to their own dictionaries, are identical. Technically, this provides a way to deblur new images with the learned dictionaries.
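The degradation model of Eq. (1) can be sketched as follows. This is an illustrative NumPy-only implementation (the function names are ours, not from the paper), using a separable Gaussian kernel of the kind used in the later experiments:

```python
import numpy as np

def gaussian_kernel1d(size=11, sigma=1.3):
    """1-D Gaussian, normalized to sum to 1; the 2-D kernel is its outer product."""
    ax = np.arange(size) - size // 2
    k = np.exp(-(ax**2) / (2.0 * sigma**2))
    return k / k.sum()

def blur(I, size=11, sigma=1.3, noise_sigma=0.0, seed=0):
    """G = k * I + v  (Eq. 1): separable Gaussian convolution plus noise.
    Rows are convolved first, then columns, exploiting separability."""
    k = gaussian_kernel1d(size, sigma)
    G = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, I)
    G = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, G)
    if noise_sigma > 0:
        G = G + np.random.default_rng(seed).normal(0.0, noise_sigma, I.shape)
    return G

I = np.random.default_rng(1).random((64, 64))
G = blur(I, noise_sigma=0.01)
```

Any boundary handling (here, zero-padded `mode="same"`) is an implementation choice; the paper's experiments do not prescribe one.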
Now we can formally state our task as follows. Let $X_c = \{x_i^c\}_{i=1}^{n}$ and $X_b = \{x_i^b\}_{i=1}^{n}$ collect $n$ image patches which are randomly selected from the clear and blurry images at the same places. We assume that each image patch contains h × w pixels, where h and w stand for its height and width in pixels. For computation, the pixels in each patch are concatenated into a vector in $\mathbb{R}^m$, with m = h × w. That is, $x_i^c$ and $x_i^b$ will be viewed as two vectors in $\mathbb{R}^m$. Our task is to learn two dictionaries $D_c, D_b \in \mathbb{R}^{m \times d}$ from $X_c$ and $X_b$, where d is the number of atoms in each dictionary. Two goals need to be achieved in this process. One is that the atoms of the two dictionaries should be aligned one by one, and the other is that the magnitudes of the atoms should be identical to each other. Under these two conditions, the reconstruction coefficients can be transferred from blurry images to clear ones.
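The patch-pair sampling described above can be sketched as follows (a minimal NumPy illustration; the sampler's name and interface are our own, not the paper's):

```python
import numpy as np

def sample_patch_pairs(I_clear, I_blur, n=1000, h=11, w=11, seed=0):
    """Randomly sample n co-located h x w patches from a clear/blurry
    image pair and stack them column-wise as vectors in R^m, m = h*w."""
    rng = np.random.default_rng(seed)
    R, C = I_clear.shape
    Xc = np.empty((h * w, n))
    Xb = np.empty((h * w, n))
    for i in range(n):
        r = rng.integers(0, R - h + 1)
        c = rng.integers(0, C - w + 1)
        Xc[:, i] = I_clear[r:r + h, c:c + w].ravel()
        Xb[:, i] = I_blur[r:r + h, c:c + w].ravel()  # same location
    return Xc, Xb

rng = np.random.default_rng(2)
Ic, Ib = rng.random((64, 64)), rng.random((64, 64))
Xc, Xb = sample_patch_pairs(Ic, Ib, n=100)
```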
3 Coupled Dictionary Learning

3.1 The Model

Here we develop the model to learn the dictionaries $D_c$ and $D_b$ simultaneously from $X_c$ and $X_b$. This task cannot be replaced by learning from the two sets separately, as the atoms should be well aligned. Let $D_c = [d_1^c, d_2^c, \ldots, d_d^c]$ and $D_b = [d_1^b, d_2^b, \ldots, d_d^b]$ collect the $d$ atoms $\{d_j^c\} \subset \mathbb{R}^m$ and $\{d_j^b\} \subset \mathbb{R}^m$. Now we introduce the following learning model:

$$\min_{D_c, D_b, \{a_i\}} \ \sum_{i=1}^{n} \|x_i^c - D_c a_i\|_2^2 + \sum_{i=1}^{n} \|x_i^b - D_b a_i\|_2^2 \qquad (2)$$
$$\text{s.t. } \|a_i\|_0 \le s, \ i = 1, \ldots, n; \quad \|d_j^c\|_2 = \|d_j^b\|_2 = 1, \ j = 1, \ldots, d,$$
where $a_i \in \mathbb{R}^d$ is the coefficient vector, $s$ denotes the maximum number of non-zero entries in $a_i$, $\|\cdot\|_2$ is the $L_2$-norm, and $\|\cdot\|_0$ stands for the $L_0$-norm. We see that (2) yields a coupled dictionary learning (CDL) model, as $x_i^c$ and $x_i^b$ share the same coefficient vector $a_i$. Again, it should be mentioned that introducing the equality constraints on the magnitudes of the dictionary atoms is necessary for transferring the coefficient vector from a blurry image to a clear one. For compactness, let $X_c = [x_1^c, x_2^c, \ldots, x_n^c] \in \mathbb{R}^{m \times n}$, $X_b = [x_1^b, x_2^b, \ldots, x_n^b] \in \mathbb{R}^{m \times n}$, and $A = [a_1, a_2, \ldots, a_n] \in \mathbb{R}^{d \times n}$. Then, we have

$$\min_{D_c, D_b, A} \ \|X_c - D_c A\|_F^2 + \|X_b - D_b A\|_F^2 \qquad (3)$$
$$\text{s.t. } \|a_i\|_0 \le s, \ i = 1, \ldots, n; \quad \|d_j^c\|_2 = \|d_j^b\|_2 = 1, \ j = 1, \ldots, d,$$

where $\|\cdot\|_F$ denotes the Frobenius norm of a matrix.
It is obvious that the atoms in $D_c$ and $D_b$ are naturally aligned one by one. Let $X = [X_c^T, X_b^T]^T \in \mathbb{R}^{2m \times n}$ and $D = [D_c^T, D_b^T]^T \in \mathbb{R}^{2m \times d}$; then it holds that

$$\|X_c - D_c A\|_F^2 + \|X_b - D_b A\|_F^2 = \|X - D A\|_F^2. \qquad (4)$$
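The identity in Eq. (4) is easy to verify numerically. The following NumPy snippet, with arbitrary random data, checks that the two Frobenius terms equal the single term on the stacked matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, n = 8, 12, 20
Xc, Xb = rng.random((m, n)), rng.random((m, n))
Dc, Db = rng.random((m, d)), rng.random((m, d))
A = rng.random((d, n))

# Left side of Eq. (4): the two separate Frobenius terms.
lhs = (np.linalg.norm(Xc - Dc @ A, "fro")**2
       + np.linalg.norm(Xb - Db @ A, "fro")**2)

# Right side: stack X = [Xc; Xb] and D = [Dc; Db], one term.
X = np.vstack([Xc, Xb])
D = np.vstack([Dc, Db])
rhs = np.linalg.norm(X - D @ A, "fro")**2
```

This is why the coupled objective can be treated as ordinary dictionary learning on the stacked data, with the alignment of atoms coming for free.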
Thus, the atoms $d_j^c$ and $d_j^b$ are now aligned into the $j$-th column of the matrix $D$. As mentioned in the introduction, the term "coupled dictionary learning" has recently appeared in vision computing (Yang et al. 2012; Wang et al. 2012; He et al. 2013; Jia et al. 2013). There, however, the constraints on the dictionary atoms are all of one type: "||·||2 ≤ 1". Here we use equality constraints, which are necessary for deblurring.

Finally, we point out that Zeyde et al. (2010) developed a dictionary learning framework for image scale-up consisting of two phases. In their framework, a dictionary is first learned from low-resolution patches via KSVD (Aharon et al. 2006). Then, another dictionary is sought such that each high-resolution patch is linearly approximated with this dictionary and the sparse reconstruction coefficients of its low-resolution patch under the low-resolution dictionary. The high-resolution dictionary can therefore be obtained via linear regression (Zeyde et al. 2010). For convenience, we abbreviate their approach as "KSVD-Dictionary Regression" (KSVD-DR).

Technically, we can transfer their idea from image scale-up to image deblurring, since we are also given two sets of patches.¹ Figure 1 gives an example. Figure 1a, b show the clear and blurry dictionaries learned via "KSVD-DR" from 25,000 patches of 11×11 pixels sampled from the bear data set (described further in Sect. 5.1). Figure 1c, d demonstrate the two dictionaries learned by our approach. By contrast, our approach captures more sharp details (zoom in for details). We also tested the deblurring behavior of these two groups of dictionaries. Figure 1e illustrates a test image, which is blurred by a Gaussian blur kernel with standard deviation 1.3 and window size 11 × 11 (see Fig. 1f). The deblurred results are given in Fig. 1g, h. As can be seen, our approach achieves much better restoration. This also motivates us to develop our framework.
3.2 Solving the Optimization Model

Note that in (3) there are two groups of variables, $A$ and $\{D_c, D_b\}$, to be solved. Accordingly, we solve them alternately, fixing one group while updating the other.
¹ KSVD-DR refines the high-resolution dictionary by using the overlapped patches. Here we cannot apply this treatment, since the patches are sampled from different images.
Fig. 1 a The clear dictionary obtained by dictionary regression; b the blurry dictionary learned by KSVD; c the clear dictionary learned by our approach; d the blurry dictionary learned by our approach; e a test image; f the blurred image; g the result restored by "KSVD-DR"; h the result restored by our approach (CDL without using multi-phase learning)

First, $A$ is solved by fixing $D_c$ and $D_b$. We have

$$\min_{A} \ \|X_c - D_c A\|_F^2 + \|X_b - D_b A\|_F^2, \quad \text{s.t. } \|a_i\|_0 \le s, \ i = 1, 2, \ldots, n. \qquad (5)$$

Obviously, this task can be decoupled into $n$ independent subproblems:

$$\min_{a_i} \ \|x_i^c - D_c a_i\|_2^2 + \|x_i^b - D_b a_i\|_2^2, \quad \text{s.t. } \|a_i\|_0 \le s. \qquad (6)$$

Problem (6) can be solved with $L_1$-norm convex approximation approaches. Here we employ the orthogonal matching pursuit (OMP) (Tropp 2004) and the focal under-determined system solver (FOCUSS) (Gorodnitsky and Rao 1997), largely due to their computational efficiency and representation quality in contrast to other methods (Aharon et al. 2006).

Next, we consider solving $D_c$ and $D_b$ by fixing $A$. Formally, this can be divided into two subproblems:

$$\min_{D_c} \ \|X_c - D_c A\|_F^2, \quad \text{s.t. } \|d_j^c\|_2 = 1, \ j = 1, 2, \ldots, d, \qquad (7)$$

and

$$\min_{D_b} \ \|X_b - D_b A\|_F^2, \quad \text{s.t. } \|d_j^b\|_2 = 1, \ j = 1, 2, \ldots, d. \qquad (8)$$
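By the stacking identity (4), the per-patch subproblem (6) is standard sparse coding over the stacked dictionary $D = [D_c^T, D_b^T]^T$ with stacked patch $x = [x_c^T, x_b^T]^T$. A minimal OMP in the spirit of Tropp (2004) can be sketched as follows (an illustrative implementation, not the authors' code):

```python
import numpy as np

def omp(D, x, s):
    """Orthogonal matching pursuit: greedily select at most s atoms of D
    (columns assumed unit-norm) and least-squares fit x on the support."""
    residual = x.copy()
    support = []
    a = np.zeros(D.shape[1])
    for _ in range(s):
        j = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    a[support] = coef
    return a

rng = np.random.default_rng(3)
D = rng.standard_normal((20, 40))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
x = D[:, [3, 17]] @ np.array([1.5, -2.0])  # a 2-sparse test signal
a = omp(D, x, s=2)
```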
Actually, (7) and (8) can be solved with the Method of Optimal Directions (MOD) (Engan et al. 2000). As mentioned by Aharon et al. (2006), the main drawback of MOD is its second-order computational complexity. In addition, the dictionary is always updated before the re-evaluation of the coefficients in $A$; such a fully-separated procedure largely affects the training speed (Aharon et al. 2006). Thus, we select to update the dictionaries and the coefficients simultaneously. It is worth pointing out that such an update of the coefficients does not replace the task in (5), where the sparsity pattern of $A$ is identified and then kept unchanged when solving (7) and (8).

We follow the line of dictionary updating introduced in Aharon et al. (2006) to solve our model. However, a new need in our task is to keep the atoms well aligned. Suppose the $j$-th atoms of $D_c$ and $D_b$ are to be updated. Then, the objective functions in (7) and (8) can be rewritten as

$$\|X_c - D_c A\|_F^2 = \left\|X_c - D_c^{-j} A^{-j} - d_j^c a^j\right\|_F^2, \qquad (9)$$

and

$$\|X_b - D_b A\|_F^2 = \left\|X_b - D_b^{-j} A^{-j} - d_j^b a^j\right\|_F^2, \qquad (10)$$
where $D_c^{-j} \in \mathbb{R}^{m \times (d-1)}$ is the sub-matrix of $D_c$ obtained by deleting its $j$-th column, $D_b^{-j} \in \mathbb{R}^{m \times (d-1)}$ is the sub-matrix of $D_b$ obtained by deleting its $j$-th column, $A^{-j} \in \mathbb{R}^{(d-1) \times n}$ is the sub-matrix of $A$ obtained by deleting its $j$-th row, and $a^j \in \mathbb{R}^{1 \times n}$ is the $j$-th row of $A$.

Now, we mask the zero entries in $a^j$ to keep the sparsity of matrix $A$ identified via (5). In other words, during the current iteration, the zero entries in $a^j$ are always kept unchanged. Let $a^j$ contain $p$ non-zero entries. We then remove the $n - p$ zero entries in $a^j$ and obtain a new vector $\bar{a}^j \in \mathbb{R}^{1 \times p}$. Further, let $T_c^j = X_c - D_c^{-j} A^{-j}$ and $T_b^j = X_b - D_b^{-j} A^{-j}$. The corresponding $n - p$ columns in $T_c^j$ and $T_b^j$ should also be deleted; for clarity, we denote the results by $\bar{T}_c^j \in \mathbb{R}^{m \times p}$ and $\bar{T}_b^j \in \mathbb{R}^{m \times p}$. Then, our task is to solve the following two subproblems (Aharon et al. 2006):

$$\min_{d_j^c} \ \left\|\bar{T}_c^j - d_j^c \bar{a}^j\right\|_F^2, \quad \text{s.t. } \|d_j^c\|_2 = 1, \qquad (11)$$

and

$$\min_{d_j^b} \ \left\|\bar{T}_b^j - d_j^b \bar{a}^j\right\|_F^2, \quad \text{s.t. } \|d_j^b\|_2 = 1. \qquad (12)$$

According to matrix theory, (11) and (12) can be solved by performing the Singular Value Decomposition (SVD). With rank-one approximation, we have $\bar{T}_c^j \approx u_c \sigma_c v_c^T$ and $\bar{T}_b^j \approx u_b \sigma_b v_b^T$, where $u_c, u_b \in \mathbb{R}^m$ are the left-singular vectors, $v_c, v_b \in \mathbb{R}^p$ are the right-singular vectors, and $\sigma_c$ and $\sigma_b$ are the largest singular values of $\bar{T}_c^j$ and $\bar{T}_b^j$, respectively. Accordingly, we have $d_j^c = u_c$ and $d_j^b = u_b$; naturally, $\|d_j^c\|_2 = 1$ and $\|d_j^b\|_2 = 1$.

Note that updating $d_j^c$ and $d_j^b$ in the above way may cause a sign problem, as we need to transfer the coefficients directly between blurry/clear patches for deblurring. Here, we fix $u_c$ and select to modify the sign of $u_b$. This treatment can be done as there is only a unique $\bar{a}^j$ in (11) and (12), namely, $\bar{a}^j \approx \sigma_c v_c^T \approx \sigma_b v_b^T$. Then, we can perform the following sign change operation:

$$d_j^b = -u_b, \quad \text{if } \|\sigma_c v_c + \sigma_b v_b\|_2 < \|\sigma_c v_c - \sigma_b v_b\|_2. \qquad (13)$$

If the condition in (13) holds, the sign of $v_b$ is also changed as $v_b \leftarrow -v_b$ for computation.

Finally, we need to update $\bar{a}^j \in \mathbb{R}^{1 \times p}$. Given $d_j^c$ and $d_j^b$, it can be solved from the following model:

$$\min_{\bar{a}^j} \ \left\|\bar{T}_c^j - d_j^c \bar{a}^j\right\|_F^2 + \left\|\bar{T}_b^j - d_j^b \bar{a}^j\right\|_F^2. \qquad (14)$$

Denoting the objective function by $G(\bar{a}^j)$ and taking its derivative with respect to $\bar{a}^j$, we have

$$\frac{\partial G(\bar{a}^j)}{\partial \bar{a}^j} = 4\bar{a}^j - 2(d_j^c)^T \bar{T}_c^j - 2(d_j^b)^T \bar{T}_b^j, \qquad (15)$$

where (15) holds as $(d_j^c)^T d_j^c = 1$ and $(d_j^b)^T d_j^b = 1$. Setting the derivative to zero, it follows that

$$\bar{a}^j = \frac{1}{2}\left((d_j^c)^T \bar{T}_c^j + (d_j^b)^T \bar{T}_b^j\right). \qquad (16)$$

Substituting the above rank-one approximations of $\bar{T}_c^j$ and $\bar{T}_b^j$ into (16), we obtain

$$\bar{a}^j = \frac{1}{2}\left(\sigma_c v_c^T + \sigma_b v_b^T\right). \qquad (17)$$

After $\bar{a}^j$ is solved, we can assign its $p$ entries to replace the corresponding $p$ elements in the $j$-th row of $A$. In this way, the $j$-th row of $A$ is updated. For clarity, we summarize the steps of our approach in Algorithm 1. In steps 7–13, the $d$ atoms can be updated in natural or random order. In addition, the dictionaries can be initialized via the k-means clustering approach.
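The coupled atom update of Eqs. (11)–(13) and (16)–(17) can be sketched as follows (an illustrative NumPy version; the function name and interface are ours):

```python
import numpy as np

def update_atom_pair(Tc, Tb):
    """Rank-one SVD updates for one coupled atom pair (Eqs. 11-13, 16-17).
    Tc, Tb are the m x p masked residual matrices restricted to the
    patches that use atom j; returns unit-norm atoms and the shared
    coefficient row a_bar."""
    uc, sc, vtc = np.linalg.svd(Tc, full_matrices=False)
    ub, sb, vtb = np.linalg.svd(Tb, full_matrices=False)
    dc = uc[:, 0]
    db = ub[:, 0]
    vc = sc[0] * vtc[0]          # sigma_c * v_c^T
    vb = sb[0] * vtb[0]          # sigma_b * v_b^T
    # Sign fix of Eq. (13): flip db (and vb) if that better aligns the codes.
    if np.linalg.norm(vc + vb) < np.linalg.norm(vc - vb):
        db, vb = -db, -vb
    a_bar = 0.5 * (vc + vb)      # Eq. (17)
    return dc, db, a_bar

rng = np.random.default_rng(4)
Tc = rng.standard_normal((25, 10))
Tb = rng.standard_normal((25, 10))
dc, db, a_bar = update_atom_pair(Tc, Tb)
```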
Finally, we point out that model (3) is non-convex with respect to $A$, $D_c$, and $D_b$. Thus, the algorithm can only converge to a local optimum. As analyzed above, (11) and (12) have analytically optimal solutions, so each atom is updated individually in closed form. However, (6) has no analytical solution, and the convergence therefore largely relies on how (6) is solved. Here, we employ OMP and FOCUSS: OMP is a greedy solver, fast but not accurate, while FOCUSS is slower than OMP. Our experiments indicate that the value of the objective function in (3) oscillates if OMP is used, whereas in all of our experiments it strictly decreases if FOCUSS is employed. Figure 8 illustrates this fact.
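The overall alternation (sparse coding, then atom-by-atom rank-one updates, tracking the objective) can be sketched on the stacked data X = [Xc; Xb]. This is a sketch only, not the paper's Algorithm 1: the coding step below is a simplified thresholded least-squares stand-in for OMP/FOCUSS, and the sign-alignment step of (13) is omitted since a single stacked dictionary is learned:

```python
import numpy as np

def cdl_train(X, d=16, s=4, iters=5, seed=0):
    """Skeleton of the alternation on stacked data X = [Xc; Xb]:
    (i) sparse-code each column, (ii) update atoms one by one by a
    rank-one SVD of the masked residual, (iii) record the objective."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], d))
    D /= np.linalg.norm(D, axis=0)
    objective = []
    for _ in range(iters):
        # Coding: keep the s most correlated atoms per column (simplified).
        A = np.zeros((d, X.shape[1]))
        for i in range(X.shape[1]):
            S = np.argsort(-np.abs(D.T @ X[:, i]))[:s]
            A[S, i], *_ = np.linalg.lstsq(D[:, S], X[:, i], rcond=None)
        # Atom update: rank-one SVD over the columns that use atom j.
        for j in range(d):
            used = np.flatnonzero(A[j])
            if used.size == 0:
                continue
            T = X[:, used] - D @ A[:, used] + np.outer(D[:, j], A[j, used])
            u, sv, vt = np.linalg.svd(T, full_matrices=False)
            D[:, j] = u[:, 0]
            A[j, used] = sv[0] * vt[0]
        objective.append(np.linalg.norm(X - D @ A, "fro")**2)
    return D, A, objective

X = np.random.default_rng(5).standard_normal((32, 60))
D, A, obj = cdl_train(X)
```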
4 Image Deblurring with Coupled Dictionaries

4.1 Deblurring with Known Blur Kernel

Here we assume the blur kernel of the image to be restored is known. In this case, we can collect images similar to it and obtain blurry images by convolution with this kernel. Then, a training set can be obtained by randomly sampling pairs of clear/blurry patches, and Algorithm 1 can be employed to train a pair of dictionaries $D_c$ and $D_b$.

Let each training patch contain h × w pixels, where h and w denote its height and width in pixels. For the pixel q located at the i-th row and j-th column of the image I to be restored, we pick out the patch of h × w pixels centered at q. Denote the concatenated pixels by the vector $x_q^b \in \mathbb{R}^m$, with m = h × w. It can be reconstructed as

$$x_q^b \approx D_b a_q, \qquad (18)$$

and $a_q$ can be solved via the following model:

$$\min_{a_q} \ \left\|x_q^b - D_b a_q\right\|_2^2, \quad \text{s.t. } \|a_q\|_0 \le s. \qquad (19)$$

Note that problem (19) must be solved for each pixel, which takes much time for a large image. Alternatively, we can employ the full reconstruction model to replace it:

$$\min_{a_q} \ \left\|x_q^b - D_b a_q\right\|_2^2 + \lambda \|a_q\|_2^2, \quad \text{s.t. } a_q^T e = 1, \qquad (20)$$

where $\lambda$ is a regularization parameter and $e \in \mathbb{R}^d$ is the vector of all ones. Problem (20) can be easily solved with matrix operations and has the following closed-form solution (Xiang et al. 2010):

$$a_q = \frac{(\lambda I + X_q^T X_q)^{-1} e}{e^T (\lambda I + X_q^T X_q)^{-1} e}, \qquad (21)$$

where $X_q = [d_1^b - x_q, d_2^b - x_q, \ldots, d_d^b - x_q] \in \mathbb{R}^{m \times d}$ and $I$ is the $d \times d$ identity matrix. Next, we keep $a_q$ unchanged and obtain the restoration of $x_q^b$ as

$$x_q^c = D_c a_q. \qquad (22)$$

Finally, we average all the results of the overlapped patches to fulfill the restoration task. Algorithm 2 lists the above steps in detail. Note that as the window patch moves one pixel at a time and its center runs from ([w/2], [h/2]) to (C − [w/2], R − [h/2]), the boundary regions are covered and thus treated naturally.
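The closed-form solution (21) and the coefficient transfer of (22) can be sketched as follows (illustrative NumPy; function names are ours):

```python
import numpy as np

def reconstruct_coeffs(Db, xq, lam=0.1):
    """Closed-form coefficients of Eq. (21) for the full reconstruction
    model (20): min ||xq - Db a||^2 + lam ||a||^2  s.t.  a^T e = 1.
    Uses Xq with columns d_j^b - xq, valid because e^T a = 1."""
    Xq = Db - xq[:, None]
    M = np.linalg.inv(lam * np.eye(Db.shape[1]) + Xq.T @ Xq)
    e = np.ones(Db.shape[1])
    return M @ e / (e @ M @ e)

rng = np.random.default_rng(6)
Db = rng.random((121, 40))
Dc = rng.random((121, 40))
xq = rng.random(121)
aq = reconstruct_coeffs(Db, xq)
xq_clear = Dc @ aq        # Eq. (22): transfer the coefficients unchanged
```

Note that the constraint $a_q^T e = 1$ is what makes the substitution $x_q = x_q e^T a_q$ legal, which is how the difference matrix $X_q$ arises.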
4.2 Deblurring with Multi-phase Dictionary Learning

It is worth pointing out that Algorithm 1 can be called several times to construct a multi-phase training framework. Given the training data sets $\{X_c, X_b\}$, a pair of dictionaries $\{D_c, D_b\}$ can be learned with Algorithm 1. In turn, we can employ them to restore the blurry training images by running Algorithm 2, and new samples can be obtained from the restored results. Collecting them into a new set $\bar{X}_b$, we get a new training data set $\{X_c, \bar{X}_b\}$, from which a new pair of dictionaries $\{\bar{D}_c, \bar{D}_b\}$ can be learned via Algorithm 1. In this way, we can run the CDL algorithm several times and obtain a few pairs of dictionaries. In application, a blurry image is deblurred first with the dictionaries $\{D_c, D_b\}$ and then with $\{\bar{D}_c, \bar{D}_b\}$. This treatment can improve the quality of deblurring.

4.3 Deblurring with Unknown Blur Kernel

Suppose we are given g groups of training samples, each of which is generated with a different blurring function. Accordingly, we can employ Algorithm 1 to learn g groups of dictionaries $\{(D_i^c, D_i^b)\}_{i=1}^{g}$, where $D_i^c$ is the clear dictionary and $D_i^b$ is the blurry dictionary. Now each group of dictionaries is used to restore each patch of the blurry image I, and the one with the minimum reconstruction error is finally selected:

$$\arg\min_i \ \left\|x_q^b - D_i^b a_{i,q}^b\right\|_2, \qquad (23)$$
Fig. 2 Demos of the pairs of the learned dictionaries. a and b, the clear and blurry dictionaries learned from the goose data set; c and d, the clear and blurry dictionaries learned from the tulip data set

where $x_q^b$ is the q-th blurry patch in I, and $a_{i,q}^b$ is the coefficient vector obtained with the dictionary $D_i^b$. After the optimal dictionary is identified, we pick out its clear partner and use the linear reconstruction criterion of (22) to restore the patches. This treatment gives our approach new, distinct characteristics compared with the traditional methods developed, implicitly or explicitly, along the line of kernel estimation.
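The dictionary-selection rule of Eq. (23) can be sketched as follows. Here the per-patch coefficients are computed with the regularized closed form of Eq. (21), one hedged choice among the possible coders; the function name and interface are ours:

```python
import numpy as np

def restore_patch_unknown_kernel(dict_pairs, xq, lam=0.1):
    """Eq. (23): among g learned (Dc_i, Db_i) pairs, pick the blurry
    dictionary with minimum reconstruction error on patch xq, then
    restore via its clear partner (Eq. 22)."""
    best = None
    for Dc, Db in dict_pairs:
        Xq = Db - xq[:, None]                       # closed form of (21)
        M = np.linalg.inv(lam * np.eye(Db.shape[1]) + Xq.T @ Xq)
        e = np.ones(Db.shape[1])
        a = M @ e / (e @ M @ e)
        err = np.linalg.norm(xq - Db @ a)
        if best is None or err < best[0]:
            best = (err, Dc, a)
    _, Dc, a = best
    return Dc @ a                                   # restored patch

rng = np.random.default_rng(7)
pairs = [(rng.random((121, 30)), rng.random((121, 30))) for _ in range(3)]
xq = rng.random(121)
patch = restore_patch_unknown_kernel(pairs, xq)
```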
5 Experiments

5.1 Parameter Setting for the CDL Algorithm

Apart from the number of training samples, Algorithm 1 has two parameters: the number of atoms in each dictionary (d) and the sparsity indicator of the coefficients (s). Here we employ six image data sets to evaluate their influence, including four data sets from the CMU-Cornell iCoseg database (Batra et al. 2010, 2011), one data set from the Corel5K image database,² and one data set constructed by ourselves. Specifically, the four from the CMU-Cornell iCoseg database are the "Alaskan Brown Bear" (bear), the "Goose-Riverside" (goose), the "Kite-kitekid" (kite), and the "Monks-LAO" (monk), containing 19, 31, 10, and 17 images, respectively. The data set from Corel5K is the "plantart", which comprises 100 images. The data set constructed by ourselves includes 13 tulip flower images. Half of the images in each data set are employed as training samples, and the remaining images are used as test samples.

First, seven training sets are constructed to evaluate the influence of the number of training samples. Specifically, each training image (grayscale) is blurred by a Gaussian kernel with parameters (11, 1.3), where the first value stands for the kernel size in pixels and the second denotes its standard deviation (σ). Then, seven sets of 10,000, 15,000, 20,000, 25,000, 30,000, 35,000 and 40,000 patches
² Available at: http://msm.cais.ntu.edu.sg/LSCBIR.
of 11 × 11 pixels are randomly sampled from the clear and blurry images at the same positions. Each patch is concatenated into a vector, so that m = 121 in Algorithm 1. Finally, a pair of dictionaries is learned from each of the above seven sets, with d = 400 and s = 200 fixed.

Figure 2 shows two pairs of dictionaries learned from the 25,000 pairs of clear and blurry patches randomly sampled from the "goose" and "tulip" data sets, respectively. The clear atoms are shown in Fig. 2a, c, while the blurry atoms are illustrated in Fig. 2b, d. We see that the atoms are well aligned one by one. That is, given an atom in the clear dictionary, one can find its partner at the same position in the blurry dictionary, and vice versa. Such a property provides us with the desired mechanism to transfer the reconstruction coefficients between blurry/clear patches. In addition, Fig. 2 also indicates that the atoms in the clear dictionary retain more sharp details. Thus, we can first reconstruct the blurry image with the blurry dictionary and then restore the image by employing the clear dictionary to render sharp details.

To evaluate the performance of the learned dictionaries, the test images are blurred with the same Gaussian kernel. Figures 3 and 4 show six images restored by the dictionaries learned from the six training sets with 25,000 samples; they are all significantly restored. The peak signal-to-noise ratio (PSNR) and the Structural SIMilarity (SSIM) index (Wang and Li 2011) are used to evaluate the performance quantitatively. The results obtained from one test image per data set are reported; these test images are shown in the first column of Figs. 3 and 4. Figure 5a reports the PSNR curves of the deblurred results, while Fig. 5b shows the SSIM curves. We see that the curves remain roughly horizontal.

Next, we evaluate the influence of the parameter d by taking d = 250, 300, 350, 400, and 450, respectively.
In the experiments, the number of training samples is kept at 25,000, and s is set to 200. Each training set is learned five times with a different d. Then, the dictionaries are used respectively to restore the images. Figure 6a, b show
Fig. 3 Demo I: Three deblurred results with the dictionaries learned from the bear, goose and kite data sets
the PSNR and SSIM curves of the six images used in Figs. 3 and 4, indicating no significant changes within the curves. Finally, we evaluate the influence of the parameter s. To this end, we take s = 120, 160, 200, 240, and 280, respectively. The number of training samples is kept at 25,000, and d is set to 400. In total, five groups of dictionaries are learned from each training set. Then, the six blurry images used in Figs. 3 and 4 are restored with the corresponding dictionaries. Figure 7a, b show their PSNR and SSIM curves. As can be seen from Fig. 7, there are again no significant changes within the curves.
Fig. 4 Demo II: Three deblurred results with the dictionaries learned from the monk, plantart and tulip data sets
Fig. 5 Image accuracy. The dictionaries are learned with different numbers of samples. a PSNR curves; b SSIM curves
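For reference, the PSNR measure used throughout this evaluation follows the standard definition sketched below; the SSIM index (Wang and Li 2011) is more involved and is not reproduced here:

```python
import numpy as np

def psnr(ref, est, peak=255.0):
    """Peak signal-to-noise ratio in dB between a ground-truth image `ref`
    and a restored image `est` (assumed to share shape and intensity range)."""
    mse = np.mean((ref.astype(np.float64) - est.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```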
[Fig. 5 plots: PSNR (a) and SSIM (b) versus the number of training samples (×10^4) for the bear, goose, kite, monk, plantart and tulip data sets.]
Fig. 6 Image accuracy. The dictionaries are learned with different d. a PSNR curves; b SSIM curves
[Fig. 6 plots: PSNR (a) and SSIM (b) versus the number of atoms in each dictionary (×10^2).]
Fig. 7 Image accuracy. The dictionaries are learned with different s. a PSNR curves; b SSIM curves
[Fig. 7 plots: PSNR (a) and SSIM (b) versus the number of non-zero entries in SR (×10^2) for the bear, goose, kite, monk, plantart and tulip data sets. Fig. 8 plot: value of the objective function versus the number of iterations (up to 200).]
Fig. 8 The values of the objective function in (2) with respect to the number of iterations. The algorithm is convergent at the 138th, 110th and 87th iteration for “bear”, “goose” and “kite”, due to the condition at step 16 in Algorithm 1
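The early stops noted in the caption come from the condition at step 16 of Algorithm 1, which is not reproduced here; a common relative-decrease test consistent with such behavior can be sketched as follows (the exact condition is an assumption):

```python
def converged(history, tol=1e-4):
    """Return True when the relative decrease of the objective between the
    last two iterations drops below `tol` (assumed stopping criterion; the
    exact test at step 16 of Algorithm 1 is not reproduced here)."""
    if len(history) < 2:
        return False
    prev, curr = history[-2], history[-1]
    return abs(prev - curr) <= tol * max(abs(prev), 1e-12)
```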
5.2 Convergence
As pointed out in Sect. 3.2, the optimization model in (3) is not convex, and its convergence depends on how the sparse representation subproblem is solved. We tried both OMP and FOCUSS. As reported by Aharon et al. (2006), OMP is fast but its performance is not as good as that of FOCUSS. We observed that the value of the objective function fluctuates during the iterations when OMP is executed, while it decreases when FOCUSS is employed. Figure 8 illustrates the curves of the objective function in the case that FOCUSS is used. Here, 10,000 training samples for each data set are employed, with d = 400 and s = 120. The maximum number of iterations is set to 200 when running Algorithm 1. It can be seen that the curves decrease as the iteration number increases.
5.3 Comparisons: Deblurring Images
As a classical deblurring algorithm, the Richardson–Lucy (RL) algorithm (Lucy 1974; Richardson 1972) is taken
as the baseline for comparison. In addition, five popularly used algorithms, namely, the Weighted Least Squares (WLS), the Sparse Prior (SP) (Levin et al. 2007), the Iteratively Reweighted Norm (IRN) minimization approach for the generalized TV (Rodriguez and Wohlberg 2009), the Alternating Direction Method of Multipliers (ADMM) (Afonso et al. 2010), and the Two-step Iterative Shrinkage/Thresholding (TwIST) (Bioucas-Dias and Figueiredo 2007), are also compared. Here, the source image is blurred by a Gaussian kernel with a kernel size of 17 × 17 and a standard deviation of 2.1. The true blur kernel is supplied to the RL, WLS, SP, IRN, ADMM, and TwIST algorithms to guarantee that they generate their best results. In addition, when running the SP algorithm, the regularization parameter λ was supplied respectively with the values in {0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.008, 0.01, 0.02, 0.04, 0.06, 0.08}. When running the IRN, ADMM, and TwIST algorithms, the regularization parameter λ was supplied respectively with the values in {0.0001, 0.001, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, 3, 4, 5, 6, 7, 8, 9}. The result with the highest PSNR is output for comparison. To run our algorithm, we sampled 90,000 pairs of patches of 17 × 17 pixels from the clear and blurry images in each training set. In the experiments, we set the number of atoms in each dictionary to 700 and the number of non-zero coefficients to 350 for dictionary learning. A relatively high number of non-zero coefficients is used here so that the pair of dictionaries can approximate each training sample as accurately as possible. Figure 9 illustrates the deblurred results of six images, each of which belongs to the set of test samples. From Fig. 9, we see that the results obtained by our algorithm are clearer than those obtained by RL, WLS and SP. In contrast, TwIST, ADMM and IRN generate sharper edges.
However, the results obtained by IRN exhibit obvious staircase effects. This can be perceived from the results of the “bear”, “kite” and “monk” images in Fig. 9 (zoom in). In addition, Table 1 gives a quantitative comparison in terms of PSNR
Fig. 9 Results deblurred from blurry images with known blur kernels by running RL, WLS, SP, TwIST, ADMM, IRN and our algorithm CDL (with the three learned pairs of dictionaries). The sizes of the bear, goose, kite, monk, and tulip images are 300 × 300 pixels, while the size of the plantart image is 256 × 320 pixels
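The known-kernel degradation used for Fig. 9 is a standard normalized Gaussian blur; a sketch of the 17 × 17, σ = 2.1 kernel follows (the helper name is ours):

```python
import numpy as np

def gaussian_kernel(size=17, sigma=2.1):
    """Isotropic Gaussian blur kernel of `size` x `size`, normalized to sum
    to 1, matching the degradation used in this comparison."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()
```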
Table 1 PSNR and SSIM results by applying different methods to the images in Fig. 9

PSNR       RL       WLS      SP       TwIST    ADMM     IRN      CDL      CDL-2    CDL-3
Bear       21.1170  26.1910  26.4830  32.5686  28.3716  26.5221  26.4597  26.6106  26.5994
Goose      19.2928  33.2779  34.9224  33.8655  38.5345  35.9660  33.2371  34.2106  34.4956
Kite       18.7363  22.7197  23.4443  33.5136  25.2456  24.0023  22.7890  23.2994  23.4556
Monk       22.2029  32.7168  33.1911  25.6948  35.9823  32.8605  32.4100  33.0866  33.3239
Plantart   21.0015  24.1207  24.8084  22.0525  24.5984  25.1008  24.3880  24.8382  24.9286
Tulip      22.0369  25.1273  26.1101  29.1558  30.2230  26.2440  25.8293  26.0643  26.0745

SSIM       RL       WLS      SP       TwIST    ADMM     IRN      CDL      CDL-2    CDL-3
Bear       0.6747   0.6889   0.6958   0.9311   0.8240   0.7031   0.7103   0.7221   0.7212
Goose      0.8057   0.9426   0.9545   0.9802   0.9760   0.9582   0.9425   0.9464   0.9478
Kite       0.6453   0.7090   0.7319   0.9300   0.8006   0.7462   0.6881   0.7033   0.7076
Monk       0.8091   0.9258   0.9288   0.9600   0.9603   0.9263   0.9229   0.9301   0.9318
Plantart   0.7222   0.7627   0.7882   0.7648   0.7922   0.8001   0.7779   0.7963   0.7983
Tulip      0.8106   0.8062   0.8380   0.9736   0.9406   0.8479   0.8368   0.8432   0.8425

Bold values indicate the largest ones
Fig. 10 Demos of the pairs of the learned dictionaries. a and b, the clear and “blurry + noisy” dictionaries learned from the goose data set; c and d, the clear and “blurry + noisy” dictionaries learned from the tulip data set
and SSIM measures. The numbers in Table 1 are obtained by taking the ground truth as a reference. In the experiments, we also evaluated the performance of the multi-phase dictionary learning described in Sect. 4.2. In Table 1, “CDL-2” means that two pairs of dictionaries are learned, while “CDL-3” means that three pairs of dictionaries are learned in total. We see that both “CDL-2” and “CDL-3” achieve better results than the approach with a single pair of dictionaries (CDL). In addition, our algorithm achieves more accurate results than RL, WLS and SP. Comparatively, TwIST, ADMM and IRN outperform our algorithm, as they actually employ more prior information to address the deblurring problem with known blur kernels. As for computation time, for an image of 300 × 300 pixels, RL, WLS, SP, TwIST, ADMM, and IRN take about 5.8, 53.1, 20.5, 22.2, 5.1, and 41.1 s, respectively, using Matlab 7.0 on a PC with a 3.0 GHz CPU and 4.0 GB of RAM. In contrast, our algorithm takes a few hours (if running FOCUSS for (19)) or about fifty minutes (if solving (20)). The main reason is that our algorithm needs to restore each local patch via linear reconstruction.
5.4 Comparisons: Restoring Blurry and Noisy Images
In this subsection, we address the challenging problem of restoring blurry and noisy images. Here, the source clear images used for training are first blurred by a Gaussian kernel with a kernel size of 11 × 11 and a standard deviation of 1.3, and then further degraded by additive Gaussian noise with variance equal to 0.001. For each of the six data sets used in Sect. 5.3, 150,000 samples are randomly selected from the clear and “blurry+noisy” training images. Then, Algorithm 1 is employed to learn the dictionaries. Figure 10a, b show the clear and “blurry+noisy” dictionaries learned from the goose data set; Fig. 10c, d illustrate the clear and “blurry+noisy”
Fig. 11 Demo I: results restored from blurry and noisy images
dictionaries learned from the tulip data set. Compared with the atoms shown in Fig. 2, the “blurry+noisy” atoms contain more noise. For test images degraded with the same blur kernel and Gaussian noise, Algorithm 2 is employed. In the experiments, we use FOCUSS to solve problem (19). We observed that the noise can be removed well when the maximum number of non-zero entries is taken between 5 and 10. Unfortunately, blur still exists in the restored results. To remedy this drawback, the results of the training images restored with the previously learned dictionaries are collected as new samples. Then, a new pair of dictionaries can be learned. It has been observed that this pair of dictionaries is very similar to the one obtained in the case where no noise is considered. Thus, one alternative is to employ the existing dictionaries learned from the original clear/blurry patches for this task. Finally, these two pairs of dictionaries are utilized to restore the image.
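The “blurry + noisy” degradation described above (an 11 × 11 Gaussian kernel with σ = 1.3 followed by additive Gaussian noise of variance 0.001) can be sketched as follows; the direct loop computes a correlation, which coincides with convolution for the symmetric kernels used here (function names are ours):

```python
import numpy as np

def degrade(img, kernel, noise_var=0.001, seed=0):
    """Blur `img` (values in [0, 1]) with `kernel` via a direct sliding-window
    correlation with edge padding, then add zero-mean Gaussian noise of
    variance `noise_var` and clip back to [0, 1]."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    rng = np.random.default_rng(seed)
    noisy = out + rng.normal(0.0, np.sqrt(noise_var), img.shape)
    return np.clip(noisy, 0.0, 1.0)
```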
For the image denoising task, we compared our algorithm with the fast Projected Newton (PN) dual method for TV regularization (Barbero and Sra 2011), IRN, TwIST, ADMM, the Fast Iterative Shrinkage Thresholding Algorithm (FISTA) (Beck and Teboulle 2009b) for TV-based restoration, and the Block-Matching and 3D-filtering (BM3D) approach (Dabov et al. 2007; Danielyan et al. 2012). In addition, the Nonlocally Centralized Sparse Representation (NCSR) developed by Dong et al. (2013) is also compared. To conduct a fair comparison, when running the PN algorithm, the regularization parameter λ changes in {0.2, 0.4, 0.6, 0.8, 1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20}. For the IRN, TwIST and ADMM algorithms, the regularization parameter λ was supplied respectively with the values in {0.0001, 0.001, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, 3, 4, 5, 6, 7, 8, 9}. FISTA was tested with the values in {0.0001, 0.001, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1, 0.5, 1.0} × √σ. For the PN, IRN and FISTA algorithms,
Fig. 12 Demo II: results restored from blurry and noisy images
among the denoised images obtained with different parameters, the one with the highest PSNR is reported here for comparison. The default parameters are employed to run NCSR. When running the BM3D algorithm, besides the noise level parameter σ, all the default parameters are employed. Figures 11, 12, and 13 demonstrate the restored results of the images used in Fig. 9. Table 2 lists the PSNR and SSIM values, indicating that IRN, BM3D and ADMM outperform our algorithm. By contrast, our algorithm achieves better results on most images compared with PN, FISTA, and TwIST.
5.5 Comparisons: Deblurring Images with Multiple Blur Kernels
Here the test images are divided into four regions, which are illustrated via the upper-right image in each panel in
Figs. 14, 15, and 16 and labeled as “Region”. The green region is blurred by a Gaussian kernel with standard deviation σ = 0.9 and a window size of 9 × 9 pixels. The red region is blurred by a Gaussian kernel with σ = 1.3 and a window size of 11 × 11 pixels. The blue region is blurred by a Gaussian kernel with σ = 1.7 and a window size of 15 × 15 pixels. The gray region is blurred by a Gaussian kernel with σ = 2.1 and a window size of 17 × 17 pixels. The blurred images are shown in the middle column of the first row in each panel in Figs. 14, 15, and 16. Accordingly, four training data sets are constructed by blurring the training images in the six data sets with the four blur kernels. Each training set includes 25,000 samples of clear/blurry image patches. As the clear and blurry patches are concatenated into a vector, the dimensionality of the training data points in model (3) is equal to 162, 242, 450, and 578, respectively. We set the number of atoms in each dictionary to
Fig. 13 Demo III: results restored from blurry and noisy images Table 2 PSNR and SSIM results by applying different methods to the images in Figs. 11, 12, and 13 PN
IRN
FISTA
BM3D
ADMM
TwIST
NCSR
CDL
Bear
25.4883
25.8071
24.6075
26.5542
26.3497
25.9836
26.2675
25.7934
Goose
31.8328
34.3065
23.4159
33.2297
34.4106
28.5170
31.1629
33.2450
Kite
22.3106
23.7964
16.8168
24.1342
24.0525
23.5406
24.4435
23.0739
Monk
30.8561
31.6322
21.8776
32.7133
32.1612
21.5589
30.9527
31.3550
Plantart
23.5536
24.7570
17.5007
25.5684
21.8319
19.4256
25.4474
24.4055
Tulip
24.2615
25.7896
23.0669
26.7044
26.5246
24.0089
26.4043
25.7399
Bear
0.6155
0.6657
0.6211
0.6919
0.6825
0.6836
0.6746
0.6356
Goose
0.9159
0.9306
0.8968
0.9192
0.9252
0.9145
0.6963
0.8864
Kite
0.6128
0.6556
0.5312
0.7271
0.7003
0.6750
0.6669
0.6202
Monk
0.8585
0.8846
0.8256
0.8982
0.8873
0.7763
0.7853
0.8761
Plantart
0.6829
0.7349
0.5741
0.8018
0.5740
0.7101
0.7423
0.7265
Tulip
0.7273
0.7966
0.7178
0.8350
0.8300
0.8177
0.8128
0.7786
PSNR
SSIM
The bold values mean the largest ones
Fig. 14 Demo I: results deblurred from images degraded with multiple blur kernels. The size of the bear and goose images is 500 × 300 pixels
about 1.2 times the dimensionality of the training data. Specifically, the numbers of dictionary atoms learned from the four training data sets are equal to 200, 300, 550, and 700, respectively. In the experiments, we
take the sparsity indicator s to be equal to half of the number of dictionary atoms, namely, s = 100, 200, 275 and 350. In this way, four groups of dictionaries were learned.
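The patch dimensionalities quoted in this subsection follow directly from the four kernel window sizes: concatenating a clear and a blurry k × k patch gives a 2k²-dimensional training vector:

```python
# Dimensionality of a concatenated clear/blurry patch pair for each of the
# four kernel window sizes used in the multiple-kernel experiments.
dims = [2 * k * k for k in (9, 11, 15, 17)]
print(dims)  # [162, 242, 450, 578]
```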
Fig. 15 Demo II: results deblurred from images degraded with multiple blur kernels. The sizes of the kite and monk images are 426 × 278 and 500 × 310 pixels, respectively
Finally, the approach described in Sect. 4.3 is used to deblur the images. As RL, WLS, SP, TwIST, BM3D, ADMM, and IRN need to know the blur kernel, we run them four times, supplying one of the four kernels each time. The same parameters as in Sect. 5.3 are employed, and the result with the best PSNR is reported here. The PSNR and SSIM values are given in Tables 3, 4, 5, and 6. As BM3D and TwIST generate unsatisfactory results, we do not report them here. The averaged PSNR and SSIM values are listed in Table 7. We see that our algorithm achieves the best performance in most cases. To further compare these algorithms, the average kernel is employed. In this case, it is taken as a Gaussian blur kernel with σ = 1.5 and a window size of 15 × 15 pixels. Figures 14, 15, and 16 illustrate the deblurred results, and
Fig. 16 Demo III: results deblurred from images degraded with multiple blur kernels. The sizes of the plantart and tulip images are 300 × 286 and 500 × 362 pixels, respectively
Table 3 PSNR and SSIM results by applying different methods to the images in Figs. 14, 15, and 16

PSNR     RL      WLS     SP      ADMM    IRN     CDL
Bear     25.468  25.481  25.975  24.823  26.220  25.510
Goose    28.194  28.347  28.545  25.587  28.398  28.878
Kite     20.293  20.287  20.390  20.043  20.406  20.385
Monk     27.140  27.330  27.383  26.130  27.212  28.465
Plant    25.178  25.975  25.918  25.130  25.770  26.818
Tulip    22.202  22.221  22.221  21.736  22.157  23.549

SSIM     RL      WLS     SP      ADMM    IRN     CDL
Bear     0.646   0.631   0.645   0.631   0.659   0.659
Goose    0.848   0.849   0.851   0.813   0.840   0.868
Kite     0.569   0.562   0.568   0.553   0.574   0.565
Monk     0.810   0.810   0.812   0.783   0.797   0.860
Plant    0.832   0.844   0.838   0.809   0.829   0.864
Tulip    0.735   0.733   0.733   0.720   0.726   0.809

Bold values indicate the largest ones. The Gaussian blur kernel with σ = 0.9 and window size 9 × 9 pixels is supplied to RL, WLS, SP, ADMM, and IRN.

Table 4 PSNR and SSIM results by applying different methods to the images in Figs. 14, 15, and 16
PSNR     RL      WLS     SP      ADMM    IRN     CDL
Bear     24.450  24.841  24.576  23.187  24.404  25.510
Goose    27.700  28.375  28.367  23.477  28.258  28.878
Kite     20.288  20.494  20.667  19.661  20.678  20.385
Monk     27.215  27.768  27.676  25.357  27.628  28.465
Plant    25.195  26.593  26.503  24.584  26.239  26.818
Tulip    22.897  22.902  22.880  21.700  22.727  23.549

SSIM     RL      WLS     SP      ADMM    IRN     CDL
Bear     0.645   0.640   0.600   0.552   0.589   0.659
Goose    0.850   0.855   0.847   0.743   0.847   0.868
Kite     0.579   0.581   0.596   0.522   0.576   0.565
Monk     0.827   0.831   0.822   0.759   0.817   0.860
Plant    0.838   0.859   0.849   0.776   0.844   0.864
Tulip    0.774   0.770   0.770   0.716   0.759   0.809

Bold values indicate the largest ones. The Gaussian blur kernel with σ = 1.3 and window size 11 × 11 pixels is supplied to RL, WLS, SP, ADMM, and IRN.

Table 5 PSNR and SSIM results by applying different methods to the images in Figs. 14, 15, and 16
PSNR     RL      WLS     SP      ADMM    IRN     CDL
Bear     23.207  23.352  23.531  21.582  23.254  25.510
Goose    26.450  27.338  27.560  22.015  27.337  28.878
Kite     19.746  20.189  20.219  18.664  20.281  20.385
Monk     26.925  27.483  27.477  24.406  27.311  28.465
Plant    24.131  25.665  25.975  23.555  25.633  26.818
Tulip    23.227  23.255  23.079  21.011  22.862  23.549

SSIM     RL      WLS     SP      ADMM    IRN     CDL
Bear     0.618   0.617   0.553   0.473   0.439   0.659
Goose    0.839   0.846   0.836   0.677   0.796   0.868
Kite     0.549   0.560   0.548   0.451   0.562   0.565
Monk     0.832   0.837   0.816   0.697   0.823   0.860
Plant    0.805   0.838   0.837   0.718   0.786   0.864
Tulip    0.794   0.791   0.780   0.669   0.778   0.809

Bold values indicate the largest ones. The Gaussian blur kernel with σ = 1.7 and window size 15 × 15 pixels is supplied to RL, WLS, SP, ADMM, and IRN.

Table 6 PSNR and SSIM results by applying different methods to the images in Figs. 14, 15, and 16
PSNR     RL      WLS     SP      ADMM    IRN     CDL
Bear     22.338  21.800  22.344  20.457  22.954  25.510
Goose    25.057  25.655  26.198  20.547  26.679  28.878
Kite     18.997  19.375  19.581  17.742  19.770  20.385
Monk     26.002  26.211  26.749  23.325  26.260  28.465
Plant    22.437  23.452  24.308  22.230  4.313   26.818
Tulip    22.642  22.534  22.518  20.004  21.955  23.549

SSIM     RL      WLS     SP      ADMM    IRN     CDL
Bear     0.586   0.575   0.526   0.424   0.421   0.659
Goose    0.811   0.814   0.819   0.599   0.785   0.868
Kite     0.507   0.513   0.510   0.372   0.469   0.565
Monk     0.812   0.815   0.809   0.634   0.716   0.860
Plant    0.719   0.760   0.797   0.628   0.313   0.864
Tulip    0.778   0.775   0.765   0.606   0.767   0.809

Bold values indicate the largest ones. The Gaussian blur kernel with σ = 2.1 and window size 17 × 17 pixels is supplied to RL, WLS, SP, ADMM, and IRN.
Table 7 Averaged PSNR and SSIM results with the four kernels by applying different methods to the images in Figs. 14, 15, and 16

PSNR     RL      WLS     SP      ADMM    IRN     CDL
Bear     23.866  23.869  24.106  22.512  24.208  25.510
Goose    26.850  27.429  27.667  22.907  27.668  28.878
Kite     19.831  20.086  20.214  19.028  20.284  20.385
Monk     26.820  27.198  27.321  24.805  27.103  28.465
Plant    24.235  25.422  25.676  23.875  20.489  26.818
Tulip    22.742  22.728  22.674  21.113  22.425  23.549

SSIM     RL      WLS     SP      ADMM    IRN     CDL
Bear     0.624   0.616   0.581   0.520   0.527   0.659
Goose    0.837   0.841   0.838   0.708   0.817   0.868
Kite     0.551   0.554   0.556   0.474   0.545   0.565
Monk     0.820   0.823   0.815   0.718   0.788   0.860
Plant    0.798   0.825   0.830   0.733   0.693   0.864
Tulip    0.770   0.767   0.762   0.678   0.757   0.809

Bold values indicate the largest ones

Table 8 PSNR and SSIM results by applying different methods to the images in Figs. 14, 15, and 16
PSNR     RL      WLS     SP      ADMM    IRN     CDL
Bear     23.786  24.140  24.077  22.186  23.696  25.510
Goose    27.109  27.972  28.032  22.668  27.716  28.878
Kite     20.065  20.417  20.473  19.144  20.566  20.385
Monk     27.128  27.745  27.610  24.953  27.599  28.465
Plant    24.799  26.354  26.374  24.219  26.020  26.818
Tulip    23.151  23.170  23.078  21.453  22.895  23.549

SSIM     RL      WLS     SP      ADMM    IRN     CDL
Bear     0.633   0.632   0.563   0.515   0.578   0.659
Goose    0.846   0.853   0.840   0.709   0.846   0.868
Kite     0.567   0.576   0.573   0.486   0.573   0.565
Monk     0.832   0.837   0.818   0.732   0.823   0.860
Plant    0.830   0.854   0.843   0.750   0.845   0.864
Tulip    0.788   0.784   0.781   0.704   0.772   0.809

Bold values indicate the largest ones. The averaged Gaussian blur kernel with σ = 1.5 and window size 15 × 15 pixels is supplied to RL, WLS, SP, ADMM, and IRN.
Table 8 reports the PSNR and SSIM values. We see that better performance is achieved by our algorithm. It is worth pointing out that our approach does not need to take the blur kernel as an input parameter once the dictionaries are learned. Considering this fact, we further evaluate the performance via blind deconvolution. Here we employ one of the most sophisticated blind deconvolution algorithms, developed by Shan et al. (2008),3 for comparison. To run their algorithm, the size of the blur kernel changes in {7 × 7, 9 × 9, . . . , 27 × 27}. Among these runs, the result with the highest PSNR is reported. Figures 17 and 18 show the restorations. Table 9 lists the PSNR and SSIM values. We see that in most cases our approach achieves better results. Finally, Table 10 summarizes the comparisons between our approach and the state-of-the-art algorithms. From this table, we see that ADMM, BM3D and NCSR outperform our algorithm in the cases of single blur kernels, while our algorithm is comparable to FISTA, IRN, and TwIST as a whole. A good characteristic of our approach is that it can generate more satisfactory results in the cases of multiple blur kernels. Furthermore, if a deblurring model is explicitly or implicitly constructed on the blurring function, or the blur kernel needs to be supplied as a parameter during restoration, we divide it into the category of Signal Processing (SP) based algorithms. By contrast, our algorithm is constructed on a learning-based framework where the training phase and the test phase are separated from each other.

3 The executable file is downloaded from the webpage: http://www.cse.cuhk.edu.hk/~leojia/programs/deblurring/deblurring.htm.
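The coefficient-transfer step that separates CDL from these SP-based methods can be sketched with a greedy OMP-style coder; this is a simplified stand-in for the FOCUSS/OMP solvers actually used (function and variable names are ours):

```python
import numpy as np

def restore_patch(y_blur, D_blur, D_clear, s):
    """Sketch of the CDL application phase: sparse-code a blurry patch over
    the blurry dictionary with a greedy OMP-style loop, then reuse the SAME
    coefficients with the coupled clear dictionary to restore the patch."""
    d = D_blur.shape[1]
    support, residual = [], y_blur.copy()
    coef = np.zeros(d)
    for _ in range(s):
        # pick the blurry atom most correlated with the current residual
        j = int(np.argmax(np.abs(D_blur.T @ residual)))
        if j not in support:
            support.append(j)
        sub = D_blur[:, support]
        c, *_ = np.linalg.lstsq(sub, y_blur, rcond=None)
        residual = y_blur - sub @ c
    coef[support] = c
    return D_clear @ coef  # transfer coefficients to the clear dictionary
```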
6 Conclusion
This paper has developed a learning-based algorithm for image deblurring, which is addressed in terms of coupled dictionary learning. With this model, a pair of dictionaries is learned from the clear/blurry patches. Among these dictionaries, one is able to describe clear patches, and the other can represent blurry ones. Given a blurred image, it is first reconstructed linearly with the blurry dictionary, and then the reconstruction coefficients are kept unchanged for deblurring by taking the clear dictionary as input. We have demonstrated the influence of the parameters included in the model of coupled dictionary learning, and compared our algorithm with the state-of-the-art methods on images that are blurred with a single blur kernel, with noise, and with multiple blur kernels for different image regions. Comparative experiments indicate the validity of our method. In the future, we would like to introduce parallel computing techniques to speed up our algorithm. In addition, we
Fig. 17 Demo I: Results restored by blind deconvolution (Shan et al. 2008) and our approach
Fig. 18 Demo II: Results restored by blind deconvolution (Shan et al. 2008) and our approach
Table 9 PSNR and SSIM results with Shan et al.’s (2008) method and CDL on the images in Figs. 14, 15, and 16

           PSNR               SSIM
           Shan's   CDL       Shan's   CDL
Bear       25.671   25.510    0.649    0.659
Goose      27.244   28.878    0.842    0.868
Kite       20.404   20.385    0.583    0.565
Monk       26.703   28.465    0.810    0.860
Plantart   24.899   26.818    0.841    0.864
Tulip      22.255   23.549    0.751    0.809

Bold values indicate the largest ones
Table 10 Comparison of our approach with the state-of-the-art approaches

Method   Accu. of    Accu. of deblurring   Can deblur with   Can do multiphase   Need to input the     Category of approach
         deblurring  with noise            multi-kernels     deblurring          blurring parameters
RL       Low         –                     ×                 ×                   Yes                   Signal processing (SP)
WLS      Low         –                     ×                 ×                   Yes                   SP + regularizer
SP       Medium      Medium                ×                 ×                   Yes                   SP + sparsity prior
IRN      Medium      Medium                ×                 ×                   Yes                   SP + TV prior
FISTA    –           Medium                ×                 ×                   Yes                   SP + TV prior
TwIST    High        Medium                ×                 ×                   Yes                   SP + regularizer
ADMM     High        High                  ×                 ×                   Yes                   SP + regularizer
BM3D     High        High                  ×                 ×                   Yes                   SP + sparse representation
NCSR     High        High                  ×                 ×                   Yes                   SP + sparse representation
Our      Medium      Medium                √                 √                   No                    Machine learning

Here “–” means it was not evaluated
would also like to integrate some prior knowledge into our framework to reconstruct the patches in a batch mode and develop nonlinear mappings between sparse representations to enhance the quality. Acknowledgments This work was supported by the National Natural Science Foundation of China (Grant Nos. 61272331, 91338202, 61370039, and 91120301).
References
Afonso, M. V., Bioucas-Dias, J. M., & Figueiredo, M. (2010). Fast image recovery using variable splitting and constrained optimization. IEEE Transactions on Image Processing, 19(9), 2345–2356. Aharon, M., Elad, M., & Bruckstein, A. M. (2006). K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11), 4311–4322. Bar, L., Sochen, N., & Kiryati, N. (2007). Restoration of images with piecewise space-variant blur. In International conference on scale space and variational methods in computer vision (pp. 533–544). Ischia, Italy. Barbero, A., & Sra, S. (2011). Fast Newton-type methods for total variation regularization. In International conference on machine learning (pp. 313–320). Bellevue, Washington, USA.
Batra, D., Kowdle, A., Parikh, D., Luo, J., & Chen, T. (2010). icoseg: Interactive co-segmentation with intelligent scribble guidance. In IEEE conference on computer vision and pattern recognition (pp. 3169–3176). San Francisco, CA, USA. Batra, D., Kowdle, A., Parikh, D., Luo, J., & Chen, T. (2011). Interactively co-segmentating topically related images with intelligent scribble guidance. International Journal of Computer Vision, 93(3), 273–292. Beck, A., & Teboulle, M. (2009a). Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Transactions on Image Processing, 18(11), 2419–2434. Beck, A., & Teboulle, M. (2009b). A fast iterative shrinkagethresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 183–202. Bioucas-Dias, J. M., & Figueiredo, M. (2007). A new twist: Two-step iterative shrinkage/thresholding algorithms for image restoration. IEEE Transactions on Image Processing, 16(12), 2992–3004. Bronstein, M. M., Bronstein, A. M., Zibulevsky, M., & Zeevi, Y. Y. (2005). Blind deconvolution of images using optimal sparse representations. IEEE Transactions on Image Processing, 14(6), 726–736. Chantas, G. K., Galatsanos, N. P., Molina, R., & Katsaggelos, A. K. (2010). Variational bayesian image restoration with a product of spatially weighted total variation image priors. IEEE Transactions on Image Processing, 19(2), 351–362. Couzinie-Devy, F., Mairal, J., Bach, F., & Ponce, J. (2011). Dictionary learning for deblurring and digital zoom. CoRR. arXiv:1110.0957.
Int J Comput Vis Dabov, K., Foi, A., Katkovnik, V., & Egiazarian, K. (2007). Image denoising by sparse 3d transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8), 2080–2095. Danielyan, A., Katkovnik, V., & Egiazarian, K. (2012). Bm3d frames and variational image deblurring. IEEE Transactions on Image Processing, 21(4), 1715–1728. Dong, W., Zhang, L., Shi, G., & Li, X. (2013). Nonlocally centralized sparse representation for image restoration. IEEE Transactions on Image Processing, 22(4), 1620–1630. Dong, W., Zhang, L., Shi, G., & Wu, X. (2011). Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization. IEEE Transactions on Image Processing, 20(7), 1838–1857. Elad, M., & Aharon, M. (2006). Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12), 3736–3745. Elad, M., Figueiredo, M., & Ma, Y. (2010). On the role of sparse and redundant representations in image processing. Proceedings of the IEEE, 98(6), 972–982. Engan, K., Aase, S. O., & Husoy, J. H. (2000). Multi-frame compression: Theory and design. EURASIP Signal Processing, 80(10), 2121– 2140. Fergus, R., Singh, B., Hertzmann, A., Roweis, S. T., & Freeman, W. T. (2006). Removing camera shake from a single photograph. ACM Transactions on Graphics (SIGGRAPH), 25(3), 787–794. Goldluecke, B., & Cremers, D. (2010). An approach to vectorial total variation based on geometric measure theory. In IEEE international conference on computer vision and pattern recognition (pp. 327– 333). San Francisco, CA, USA. Gorodnitsky, I. F., & Rao, B. D. (1997). Sparse signal reconstruction from limited data using focuss: A re-weighted norm minimization algorithm. IEEE Transactions on Signal Processing, 45(3), 600–616. He, L., Qi, H., & Zaretzki, R. (2013). Beta process joint dictionary learning for coupled feature spaces with application to single image super-resolution. 
In IEEE conference on computer vision and pattern recognition (pp. 345–352). Portland, Oregon, USA.
Hu, Z., Huang, J. B., & Yang, M. H. (2010). Single image deblurring with adaptive dictionary learning. In IEEE international conference on image processing (pp. 1169–1172). Hong Kong, China.
Jenatton, R., Mairal, J., Obozinski, G., & Bach, F. (2011). Proximal methods for hierarchical sparse coding. Journal of Machine Learning Research, 12, 2297–2334.
Jia, K., Wang, X., & Tang, X. (2013). Image transformation based on learning dictionaries across image spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(2), 367–380.
Joshi, N., Kang, S. B., Zitnick, C. L., & Szeliski, R. (2010). Image deblurring using inertial measurement sensors. In SIGGRAPH (pp. 1–8). Los Angeles, CA, USA.
Joshi, N., Zitnick, C. L., Szeliski, R., & Kriegman, D. J. (2009). Image deblurring and denoising using color priors. In IEEE conference on computer vision and pattern recognition (pp. 1–8). Miami, FL, USA.
Kamilov, U., Bostan, E., & Unser, M. (2012). Wavelet shrinkage with consistent cycle spinning generalizes total variation denoising. IEEE Signal Processing Letters, 19(4), 187–190.
Kenig, T., Kam, Z., & Feuer, A. (2010). Blind image deconvolution using machine learning for three-dimensional microscopy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(12), 2191–2204.
Levin, A., Fergus, R., Durand, F., & Freeman, W. T. (2007). Coded exposure photography: Motion deblurring using fluttered shutter. ACM Transactions on Graphics (SIGGRAPH), 26(3), 70:1–70:9.
Levin, A., Weiss, Y., Durand, F., & Freeman, W. T. (2009). Understanding and evaluating blind deconvolution algorithms. In IEEE conference on computer vision and pattern recognition (pp. 1–8). Miami, FL, USA.
Li, D., Mersereau, R. M., & Simske, S. (2007). Blind image deconvolution through support vector regression. IEEE Transactions on Neural Networks, 18(3), 931–935.
Lou, Y., Bertozzi, A. L., & Soatto, S. (2011). Direct sparse deblurring. Journal of Mathematical Imaging and Vision, 39(1), 1–12.
Lucy, L. (1974). An iterative technique for the rectification of observed distributions. The Astronomical Journal, 79(6), 745–754.
Ma, L., Moisan, L., Yu, J., & Zeng, T. (2013). A dictionary learning approach for Poisson image deblurring. IEEE Transactions on Medical Imaging, 32(7), 1277–1289.
Ma, S., Yin, W., Zhang, Y., & Chakraborty, A. (2008). An efficient algorithm for compressed MR imaging using total variation and wavelets. In IEEE conference on computer vision and pattern recognition (pp. 1–8). Anchorage, Alaska, USA.
Mairal, J., Bach, F., & Ponce, J. (2012). Task-driven dictionary learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4), 791–804.
Mairal, J., Bach, F., Ponce, J., & Sapiro, G. (2010). Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 11, 19–60.
Mairal, J., Bach, F., Ponce, J., Sapiro, G., & Zisserman, A. (2009). Non-local sparse models for image restoration. In IEEE international conference on computer vision (pp. 2272–2279). Kyoto, Japan.
Mairal, J., Elad, M., & Sapiro, G. (2008). Sparse representation for color image restoration. IEEE Transactions on Image Processing, 17(1), 53–69.
Mairal, J., Jenatton, R., Obozinski, G., & Bach, F. (2011). Convex and network flow optimization for structured sparsity. Journal of Machine Learning Research, 12, 2681–2720.
Peleg, T., Eldar, Y., & Elad, M. (2013). Exploiting statistical dependencies in sparse representations for signal recovery. IEEE Transactions on Signal Processing, 60(5), 2286–2303.
Portilla, J. (2009). Image restoration through l0 analysis-based sparse optimization in tight frames. In IEEE international conference on image processing (pp. 3909–3912). Cairo, Egypt.
Richardson, W. H. (1972). Bayesian-based iterative method of image restoration. Journal of the Optical Society of America, 62(1), 55–59.
Rodriguez, P., & Wohlberg, B. (2009). Efficient minimization method for a generalized total variation functional. IEEE Transactions on Image Processing, 18(2), 322–332.
Rubinstein, R., Peleg, T., & Elad, M. (2013). Analysis K-SVD: A dictionary-learning algorithm for the analysis sparse model. IEEE Transactions on Signal Processing, 61(3), 661–677.
Rubinstein, R., Zibulevsky, M., & Elad, M. (2010). Double sparsity: Learning sparse dictionaries for sparse signal approximation. IEEE Transactions on Signal Processing, 58(3), 1553–1564.
Rudin, L. I., Osher, S., & Fatemi, E. (1992). Nonlinear total variation based noise removal algorithms. Physica D, 60(1), 259–268.
Shan, Q., Jia, J., & Agarwala, A. (2008). High-quality motion deblurring from a single image. ACM Transactions on Graphics (SIGGRAPH), 27(3), 73:1–73:10.
Su, M., & Basu, M. (2002). A hybrid learning system for image deblurring. Pattern Recognition, 35(12), 2881–2894.
Takeda, H., Farsiu, S., & Milanfar, P. (2008). Deblurring using regularized locally adaptive kernel regression. IEEE Transactions on Image Processing, 17(4), 550–563.
Tropp, J. A. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10), 2231–2242.
Wang, S., Zhang, L., Liang, Y., & Pan, Q. (2012). Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis. In IEEE conference on computer vision and pattern recognition (pp. 2216–2223). Providence, Rhode Island, USA.
Wang, Z., & Li, Q. (2011). Information content weighting for perceptual image quality assessment. IEEE Transactions on Image Processing, 20(5), 1185–1198.
Wen, Y. W., Chan, R. H., & Yip, A. M. (2012). A primal-dual method for total-variation-based wavelet domain inpainting. IEEE Transactions on Image Processing, 21(1), 106–114.
Whyte, O., Sivic, J., Zisserman, A., & Ponce, J. (2012). Non-uniform deblurring for shaken images. International Journal of Computer Vision, 98(2), 168–186.
Xiang, S., Nie, F., & Zhang, C. (2010). Semi-supervised classification via local spline regression. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(11), 2039–2053.
Xu, L., Zheng, S., & Jia, J. (2013). Unnatural l0 sparse representation for natural image deblurring. In IEEE conference on computer vision and pattern recognition (pp. 1107–1114). Portland, Oregon, USA.
Yang, J., Wang, Z., Lin, Z., Cohen, S., & Huang, T. S. (2012). Coupled dictionary training for image super-resolution. IEEE Transactions on Image Processing, 21(8), 3467–3478.
Yang, J., Wright, J., Huang, T. S., & Ma, Y. (2010). Image super-resolution via sparse representation. IEEE Transactions on Image Processing, 19(11), 2861–2873.
Yuan, L., Sun, J., Quan, L., & Shum, H. Y. (2007). Image deblurring with blurred/noisy image pairs. ACM Transactions on Graphics (SIGGRAPH), 26(3), 1–10.
Zeyde, R., Elad, M., & Protter, M. (2010). On single image scale-up using sparse-representations. In Proceedings of the 7th international conference on curves and surfaces (pp. 711–730). Avignon, France.
Zhang, H., Wipf, D. P., & Zhang, Y. (2013). Multi-image blind deblurring using a coupled adaptive sparse prior. In IEEE conference on computer vision and pattern recognition (pp. 1051–1058). Portland, Oregon, USA.
Zhang, H., Yang, J., Zhang, Y., & Huang, T. S. (2011). Sparse representation based blind image deblurring. In IEEE international conference on multimedia and expo (pp. 1–6). Barcelona, Spain.