Multimed Tools Appl DOI 10.1007/s11042-016-3519-7
Statistical modeling in the shearlet domain for blind image quality assessment
Wen Lu, Tianjiao Xu, Yuling Ren, Lihuo He
Received: 28 January 2016 / Revised: 20 March 2016 / Accepted: 5 April 2016
© Springer Science+Business Media New York 2016
Abstract State-of-the-art blind image quality assessment (BIQA) metrics usually require a large number of human-scored images to train a regression model that judges image quality, which makes the results heavily dependent on the size of the training data. In this paper, we present an efficient BIQA algorithm based on the shearlet transform that uses no human-scored images. The method rests on the observation that image degradation leads to significant variation in the spread discontinuities in all directions, and that the shearlet transform has a strong ability to localize such distributed discontinuities. The natural scene statistics (NSS) of shearlet coefficients can therefore indicate the variation of image quality. Experimental results on benchmark databases illustrate that the proposed method is highly consistent with the subjective assessment of human beings.

Keywords Blind image quality assessment · Human scores free · Natural scene statistics · Shearlet transform
1 Introduction

Blind image quality assessment (BIQA) algorithms are designed to evaluate the quality of distorted images without any prior information from the reference image [24], and they have become a topic of high interest. The vast majority of BIQA algorithms
* Lihuo He, [email protected]
Wen Lu, [email protected]
Tianjiao Xu, [email protected]
Yuling Ren, [email protected]
1 School of Electronic Engineering, Xidian University, Xi'an 710071, People's Republic of China
have appeared during the past several years. By characterizing the unnaturalness of distorted images using natural scene statistics, DIIVINE [17] employs a two-stage framework to assess image quality. Given natural scene statistics (NSS) features, a simple Bayesian inference model is introduced in BLIINDS-II [20] to predict image quality scores. Scene statistics of locally normalized luminance coefficients are utilized in BRISQUE [15] to quantify possible losses of "naturalness" in the image due to the presence of distortions, thereby leading to a holistic measure of quality. SHANIA [11] relies on a sparse auto-encoder to predict the mean of shearlet coefficient amplitudes (MSCA) in fine scales from the coarse scale of the distorted image alone; the predicted MSCA acts as a reference, and the difference between the reference and distorted parts is used as an indicator of image quality. BoWSF [14] applies an improved bag-of-words model to encode NSS features, which are mapped to quality scores with a linear combination. NIQE [16] estimates NSS model parameters from a corpus of pristine images and takes a simple distance metric between the model statistics and those of the distorted image as a measure of image quality. BIQES [1] calculates the dissimilarity of high- and low-pass versions of the image to capture multi-scale information reflecting image quality. The BIQA methods in [4, 5] use learning to rank to predict perceptual image quality scores from preference image pairs. A contrast-specific BIQA method [3] characterizes unnaturalness by the degree of deviation of intensity mean, standard deviation, skewness, kurtosis and entropy from NSS models. The BIQA method in [12] combines a convolutional neural network with Prewitt-magnitude weights of segmented images to obtain the image quality score.
In the medical field, CRVM and MKCRVM [8] are proposed for evaluating single-photon emission computed tomography (SPECT) image quality in a cardiac perfusion-defect detection task. The above-mentioned BIQA algorithms [11, 15, 17, 20, 26–28] all typically require plenty of human-scored images to train the regression model, which limits their performance to the training database and makes it heavily dependent on the size of the training data. Furthermore, these methods usually map the extracted image features into a quality score by learning a mapping function, which leaves the relationship between features and quality score ambiguous and the BIQA process opaque [25]. NIQE estimates the NSS model parameters without using human-scored images. Nevertheless, the low-pass filter adopted in that method fails to capture the multi-scale behavior of images: the degradation of an image leads to significant variation in the spread discontinuities in all directions at every scale, yet this significant directional information is absent in NIQE. While BIQES computes multi-scale information, it uses a wavelet decomposition to obtain finer scales. Since wavelets, with their isotropic support, cannot efficiently encode anisotropic features such as edges, a great deal of directional information goes undetected [13]. Based on the fact that the shearlet transform [2, 13] has a strong ability to localize distributed discontinuities, overcoming this limitation of wavelets, an efficient blind image quality assessment algorithm based on the shearlet transform is proposed that uses no human-scored images. In the proposed framework, the NSS features of distorted images are first extracted in the shearlet domain. Next, patches with high information content are selected, and the distribution of the extracted features is fitted with a multivariate Gaussian (MVG) model to obtain its mean and covariance.
Finally, a simple distance metric between the feature distribution of a corpus of natural images and that of the distorted image is employed to measure the distorted image's quality.
The remainder of the paper is organized as follows. Section 2 details the implementation of the proposed method. Section 3 presents the experimental results and a thorough analysis. Finally, conclusions are drawn in Section 4.
2 Blind image quality assessment in the shearlet domain

The framework of the proposed method is summarized in Fig. 1. An image entering the BIQA flow is first subjected to local shearlet coefficient computation. This stage partitions the image into equally sized n × n blocks and computes a local shearlet transform on each block to capture the multi-scale, directional variation in the spread discontinuities of the image. The second stage applies a classical NSS model to extract quality-aware features from each block of shearlet coefficients. Next, we select patches with rich information, whose contrasts are higher than a threshold, and fit the distribution of the extracted features with an MVG model [16] to obtain the mean and covariance. Finally, a distance metric predicts the distorted image's quality by measuring the distance between the mean/covariance of the corpus of natural images and that of the distorted image.
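The first stage, block partitioning, can be sketched as below. This is a minimal sketch: the function name and the choice to discard any remainder at the image edges are ours, not the paper's.

```python
import numpy as np

def partition_into_patches(img, n=128):
    """Split a grayscale image into non-overlapping n x n blocks,
    scanning row-major and discarding any remainder at the edges."""
    H, W = img.shape
    return [img[i:i + n, j:j + n]
            for i in range(0, H - n + 1, n)
            for j in range(0, W - n + 1, n)]
```

Each returned block is then decomposed independently by the local shearlet transform described next.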
2.1 Shearlet transform

Multivariate problems in applied mathematics are typically governed by anisotropic phenomena such as singularities concentrated on lower-dimensional embedded manifolds, or edges in digital images. The proposed BIQA algorithm is based on the shearlet transform [2, 13], which resolves anisotropic and directional information at different scales by applying a so-called shearing operator together with an anisotropic scaling operator. The shearlet transform used in the proposed method computes the shearlet coefficients of a signal f:

DST_{s,a,t}(f) = \langle \psi_{s,a,t}, f \rangle, \quad a > 0,\ s \in \mathbb{R},\ t \in \mathbb{R}^2,\ f \in \ell^2(\mathbb{Z}^2), \qquad (1)
Fig. 1 Framework for the proposed BIQA
where ψ_{s,a,t} is a cone-adapted discrete shearlet [13], defined as

\psi_{s,a,t}(x) = a^{-3/4}\, \psi\big(D_{a,s}^{-1}(x - t)\big), \qquad (2)

where

D_{a,s} = \begin{pmatrix} a & a^{1/2} s \\ 0 & a^{1/2} \end{pmatrix},

with a_j = 2^j (j ∈ ℤ), s_{j,k} = k a_j^{1/2} = k 2^{j/2} (k ∈ ℤ) and t_{j,k,m} = D_{a_j, s_{j,k}} m, m ∈ ℤ². Here a is the scaling parameter, s the shear parameter and t the translation parameter. With these parameters, shearlet systems are designed at different scales, orientations and locations to efficiently encode anisotropic features such as singularities concentrated on lower-dimensional embedded manifolds, or edges in digital images, thereby overcoming the weakness of the wavelet transform. The shearlet transform has many good properties [2, 13]. For instance, shearlets show high directional sensitivity, doubling the number of orientations at each finer scale; they are well localized and have fast decay in the spatial domain; and the basis elements, with their high directional sensitivity and varied shapes, are able to capture the intrinsic geometric features of an image. The degradation of an image leads to significant variation in the spread discontinuities in all directions, and the shearlet transform has a strong ability to localize such distributed discontinuities. With these properties, the shearlet transform is very well suited to describing the variation of an image caused by distortion.
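As an illustration, the anisotropic scaling-plus-shearing matrix D_{a,s} of Eq. (2) can be formed directly. This is only a sketch of the operator itself; a full shearlet implementation would build the directional filters from it.

```python
import numpy as np

def shear_scale_matrix(a, s):
    """D_{a,s}: parabolic scaling diag(a, sqrt(a)) composed with a
    shearing by s, the operator at the heart of Eq. (2)."""
    return np.array([[a, np.sqrt(a) * s],
                     [0.0, np.sqrt(a)]])
```

Because the two axes scale as a and sqrt(a), the support of each shearlet becomes increasingly elongated at fine scales, which is what lets the system resolve edges along many orientations.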
2.2 NSS feature extraction in the shearlet domain

Assume that the size of the input image is 2^k × 2^k (k ≥ 7 a positive integer). We divide the image into non-overlapping patches of size 128 × 128 and apply the shearlet transform to each patch to obtain its shearlet coefficients. Each patch is decomposed into directional subbands over 4 scales; the structure of the shearlet coefficients of each patch is shown in Table 1. The obtained subband coefficients are then used to extract a series of statistical features. A process of local mean removal and divisive normalization is applied to the subband coefficients before the classical NSS model [19, 23] is introduced:

\hat{S}(m,n) = \frac{S(m,n) - \mu(m,n)}{\sigma(m,n) + 1}, \qquad (3)

where m ∈ {1, 2, …, M}, n ∈ {1, 2, …, N} are spatial indices, M and N are the subband dimensions, and

\mu(m,n) = \sum_{k=-K}^{K} \sum_{l=-L}^{L} \omega_{k,l}\, S(m+k, n+l), \qquad (4)
Table 1 Structure of the shearlet coefficients of each patch

Scale index   Orientation number   Matrix form
S1            10                   (128 × 128) × 10
S2            10                   (128 × 128) × 10
S3            18                   (128 × 128) × 18
S4            18                   (128 × 128) × 18
\sigma(m,n) = \sqrt{\sum_{k=-K}^{K} \sum_{l=-L}^{L} \omega_{k,l}\, \big[S(m+k, n+l) - \mu(m,n)\big]^2} \qquad (5)

are the local mean and local contrast of the subband coefficients, where ω = {ω_{k,l} | k = −K, …, K, l = −L, …, L} is a unit-volume Gaussian window. The coefficients \hat{S}(m,n) of natural images follow a Gaussian distribution [19]; this distribution is disturbed by distortions occurring in the image, and the degree of modification is indicative of perceptual distortion severity. A generalized Gaussian distribution (GGD) is used to capture the distortion of the subband coefficients:

f(x; \mu, \sigma^2, \gamma) = \frac{b\gamma}{2\Gamma(1/\gamma)} \exp\big(-(b(x-\mu))^{\gamma}\big), \qquad (6)

where μ, σ² and γ are the mean, variance and shape parameter of the distribution,

b = \frac{1}{\sigma}\sqrt{\Gamma(3/\gamma)/\Gamma(1/\gamma)}, \qquad (7)

and Γ(·) is the gamma function

\Gamma(x) = \int_{0}^{\infty} t^{x-1} e^{-t}\,dt, \quad x > 0. \qquad (8)
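Before the GGD fit, the local normalization of Eqs. (3)–(5) can be sketched with a Gaussian smoothing kernel. The window width below is our choice; the paper does not specify K and L.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(subband, sigma=7.0 / 6.0):
    """Local mean removal and divisive normalization, Eqs. (3)-(5).
    gaussian_filter with a normalized kernel plays the role of the
    unit-volume window omega_{k,l}."""
    mu = gaussian_filter(subband, sigma)                  # Eq. (4)
    var = gaussian_filter(subband ** 2, sigma) - mu ** 2
    std = np.sqrt(np.clip(var, 0.0, None))                # Eq. (5)
    return (subband - mu) / (std + 1.0)                   # Eq. (3)
```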
Since the shearlet subband responses are zero-mean, only two parameters (σ², γ) need to be estimated for each subband, using the moment-matching approach of [21]. These form the first set of features used to capture image distortion:

f = \big[f_{\sigma^2}, f_{\gamma}\big]. \qquad (9)

The distribution of the normalized shearlet coefficients (3) jointly with their neighboring coefficients has been observed to follow a fairly regular structure, and this correlation structure is violated by distortions. The deviation can be captured by analyzing the sample distribution of products of pairs of adjacent coefficients computed along the horizontal (H), vertical (V) and diagonal (D1, D2) orientations. The distribution of neighboring coefficients is well modeled by a zero-mode asymmetric generalized Gaussian distribution (AGGD) [10]:

f(x; \lambda, \sigma_l^2, \sigma_r^2) =
\begin{cases}
\dfrac{\lambda}{(\beta_l + \beta_r)\,\Gamma(1/\lambda)} \exp\!\big(-(-x/\beta_l)^{\lambda}\big), & x < 0, \\
\dfrac{\lambda}{(\beta_l + \beta_r)\,\Gamma(1/\lambda)} \exp\!\big(-(x/\beta_r)^{\lambda}\big), & x \ge 0,
\end{cases} \qquad (10)

\mu = (\beta_l - \beta_r)\,\Gamma(2/\lambda)/\Gamma(1/\lambda). \qquad (11)

The parameters of the AGGD (λ, β_l, β_r, μ) can be efficiently estimated using the moment-matching approach in [21].
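A sketch of the moment-matching GGD fit behind Eq. (9) is given below. The grid search is our simplification of the estimator in [21], which matches the generalized Gaussian ratio function to a sample moment ratio.

```python
import numpy as np
from scipy.special import gamma

def fit_ggd(x):
    """Estimate (variance, shape) of a zero-mean GGD by matching the
    generalized Gaussian ratio r(g) = Gamma(1/g)Gamma(3/g)/Gamma(2/g)^2
    against the sample ratio E[x^2] / E[|x|]^2."""
    rho = np.mean(x ** 2) / (np.mean(np.abs(x)) ** 2 + 1e-12)
    grid = np.arange(0.2, 10.0, 0.001)
    r = gamma(1.0 / grid) * gamma(3.0 / grid) / gamma(2.0 / grid) ** 2
    shape = grid[np.argmin(np.abs(r - rho))]
    return float(np.var(x)), float(shape)
```

For Gaussian data the sample ratio tends to π/2, which the ratio function attains at shape γ = 2, so the estimator recovers the Gaussian as a special case.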
Table 2 Summary of extracted features

Feature ID   Feature description                          Computation procedure
f_σ², f_γ    Shape and variance                           Fit GGD to subband coefficients
f_H          Shape, left variance, right variance, mean   Fit AGGD to H pairwise products
f_V          Shape, left variance, right variance, mean   Fit AGGD to V pairwise products
f_D1         Shape, left variance, right variance, mean   Fit AGGD to D1 pairwise products
f_D2         Shape, left variance, right variance, mean   Fit AGGD to D2 pairwise products
Thus for each subband, 16 pairwise-product parameters (4 parameters/orientation × 4 orientations) are generated, yielding the next four sets of features:

f_H = \big[f_{\lambda}^{H}, f_{\beta_l}^{H}, f_{\beta_r}^{H}, f_{u}^{H}\big], \qquad (12)

f_V = \big[f_{\lambda}^{V}, f_{\beta_l}^{V}, f_{\beta_r}^{V}, f_{u}^{V}\big], \qquad (13)

f_{D1} = \big[f_{\lambda}^{D1}, f_{\beta_l}^{D1}, f_{\beta_r}^{D1}, f_{u}^{D1}\big], \qquad (14)

f_{D2} = \big[f_{\lambda}^{D2}, f_{\beta_l}^{D2}, f_{\beta_r}^{D2}, f_{u}^{D2}\big]. \qquad (15)

All the NSS-based features and their relationships with various distortions are now listed in Table 2. Let f_{SNSS} denote all the features extracted from an image.
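The four orientations of adjacent-coefficient products can be computed with simple array shifts, as sketched below on the normalized coefficient map.

```python
import numpy as np

def pairwise_products(s):
    """Products of adjacent normalized coefficients along the four
    orientations used for the AGGD features (H, V, D1, D2)."""
    return {
        "H":  s[:, :-1] * s[:, 1:],      # horizontal neighbors
        "V":  s[:-1, :] * s[1:, :],      # vertical neighbors
        "D1": s[:-1, :-1] * s[1:, 1:],   # main diagonal
        "D2": s[:-1, 1:] * s[1:, :-1],   # anti-diagonal
    }
```

Each of the four product maps is then fitted with the AGGD of Eq. (10) to produce the feature sets (12)–(15).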
Table 3 PLCC and SROCC of different metrics on the LIVE II database

             JPEG2K           JPEG             WN
Metric       PLCC    SROCC    PLCC    SROCC    PLCC    SROCC
DIIVINE      0.9220  0.9319   0.9210  0.9483   0.9880  0.9821
BLIINDS-II   0.9386  0.9323   0.9426  0.9331   0.9635  0.9463
BRISQUE      0.9229  0.9139   0.9734  0.9647   0.9851  0.9786
NIQE         0.9370  0.9172   0.9564  0.9382   0.9773  0.9662
QAC          0.8452  0.8388   0.8828  0.8524   0.9161  0.9484
Proposed     0.9043  0.8967   0.9054  0.8544   0.9809  0.9805

             Gblur            FF               All
Metric       PLCC    SROCC    PLCC    SROCC    PLCC    SROCC
DIIVINE      0.9230  0.9210   0.8680  0.8714   0.9170  0.9160
BLIINDS-II   0.8994  0.8912   0.8790  0.8519   0.9232  0.9202
BRISQUE      0.9506  0.9511   0.9030  0.8768   0.9424  0.9208
NIQE         0.9525  0.9341   0.9128  0.8594   0.9147  0.9135
QAC          0.9088  0.9118   0.8166  0.8143   0.8273  0.8274
Proposed     0.9393  0.9420   0.8879  0.8833   0.7957  0.7884
Table 4 Mean PLCC and SROCC of different metrics on the TID2013 database

Metric   DIIVINE   BLIINDS-II   BRISQUE   NIQE    QAC     Proposed
PLCC     0.654     0.628        0.651     0.426   0.495   0.445
SROCC    0.549     0.536        0.573     0.317   0.390   0.406
f_{SNSS} = \big[f_{\sigma^2}, f_{\gamma}, f_{V}, f_{H}, f_{D1}, f_{D2}\big]_{J \times K}, \qquad (16)

where J denotes the number of directional subbands of each patch and K is the number of 128 × 128 patches contained in the image.
2.3 Quality pooling

To predict image quality effectively and efficiently, it is necessary to select significant patches according to human perceptual characteristics. Humans tend to be more sensitive to high-contrast regions in an image and weight their judgments of image quality more heavily toward sharp image regions [6]. From the perspective of information theory, sharp regions also have high contrast and rich information. Hence, we select patches with high contrast to simulate the perceptual process of salient attention. The local contrast is calculated by Eq. (5) at each pixel position, and the mean contrast of a patch indicates the patch contrast; patches whose contrast exceeds a threshold are retained. The NSS features of the selected patches are then fitted with an MVG model [16], and we calculate the mean and covariance of the distorted image's features:
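The contrast-based selection can be sketched as follows. The paper fixes a contrast threshold but does not state its value; the fraction-of-maximum rule below is our illustrative choice.

```python
import numpy as np

def select_sharp_patches(features, patch_contrasts, frac=0.75):
    """Keep the feature vectors of patches whose mean local contrast
    (Eq. 5, averaged over the patch) exceeds frac * max contrast."""
    c = np.asarray(patch_contrasts, dtype=float)
    keep = c > frac * c.max()
    return np.asarray(features)[keep]
```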
[Figure: scatter plots of DMOS versus the predicted score Q with logistic-function fits for (a) JPEG2000, (b) JPEG, (c) Gaussian blur, (d) White noise, (e) Fast-fading]
Fig. 2 Nonlinear scatter plots of MOS versus the proposed method, tested on the LIVE II database
m = \mathrm{mean}(f_{SNSS}), \qquad (17)

c = \mathrm{covariance}(f_{SNSS}). \qquad (18)

The natural mean and covariance are learned from a varied corpus of 154 undistorted images, with sizes ranging from 480 × 320 to 1280 × 720, and serve as the ground truth. The quality of the distorted image is then expressed as the distance between the distorted and natural models, using their means and covariances:

Q = \left[(m_n - m_d)^{T} \left(\frac{c_n + c_d}{2}\right)^{-1} (m_n - m_d)\right]^{\alpha}, \qquad (19)

where m_n, m_d and c_n, c_d are the mean vectors and covariance matrices of the natural and distorted models, and α = 0.25.
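Eq. (19) reduces to a few lines of linear algebra. The pseudo-inverse below is our safeguard against a singular pooled covariance, not something the paper specifies.

```python
import numpy as np

def quality_score(m_n, c_n, m_d, c_d, alpha=0.25):
    """Distance of Eq. (19) between the natural model (m_n, c_n) and the
    distorted model (m_d, c_d); larger Q means lower predicted quality."""
    d = np.asarray(m_n) - np.asarray(m_d)
    pooled = (np.asarray(c_n) + np.asarray(c_d)) / 2.0
    return float(d @ np.linalg.pinv(pooled) @ d) ** alpha
```

When the distorted model coincides with the natural one the score is exactly zero, and it grows as the feature distribution drifts away from natural-image statistics.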
3 Experimental results

In this section, the proposed method is tested on four benchmark databases: LIVE II [22], TID2013 [18], CSIQ [9] and LIVE Multiply Distorted [7]. The LIVE II database contains 29 high-resolution reference images and 779 corresponding distorted images covering 5 distortion types. The TID2013 database contains 3000 distorted images derived from 25 reference images, with 24 types of distortion at 5 levels each. The CSIQ database contains 30 high-quality images and 866 distorted versions with 6 distortion types. The LIVE Multiply Distorted IQA database contains 2 types of hybrid distortion, and we treat each type respectively as a single database, denoted LIVE MD1 and LIVE MD2. The criteria considered in the experiments are the Spearman rank-order correlation coefficient (SROCC) and the Pearson linear correlation coefficient (PLCC); values close to 1 indicate superior correlation with human perception.
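Both criteria are available in scipy. Note that, as is standard practice, PLCC is reported after a nonlinear logistic mapping of the objective scores; this sketch uses the raw scores.

```python
from scipy.stats import pearsonr, spearmanr

def correlation_criteria(objective, subjective):
    """PLCC and SROCC between predicted and subjective quality scores."""
    plcc, _ = pearsonr(objective, subjective)
    srocc, _ = spearmanr(objective, subjective)
    return plcc, srocc
```

SROCC is invariant to any monotone mapping of the scores, while PLCC rewards a linear relationship, which is why the logistic regression step matters for the latter.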
3.1 Consistency experiment

In this subsection, we compare the proposed BIQA framework with the well-known blind image quality assessment metrics DIIVINE, BLIINDS-II, BRISQUE, NIQE and QAC [25]. The evaluation results for all compared IQA methods are given in Tables 3 and 4. Figure 2 presents the scatter plots of mean opinion score (MOS) on the LIVE II database versus the score predicted by the objective metrics after nonlinear mapping. The proposed method achieves performance competitive with the state-of-the-art techniques [11, 15, 17, 20], which heavily exploit human subjective scores in training, and performs better on the TID2013 database than NIQE, whose IQA model is likewise trained without human-scored images. Furthermore, the proposed method shows very good consistency with human perception of image quality.
3.2 Rationality experiment

To verify the rationality of the proposed framework, our BIQA method is tested on the Einstein image under different distortions: blurring (with a smoothing window of W × W), additive Gaussian noise (mean = 0, variance = V), JPEG compression (compression ratio = R) and
[Figure: the predicted score Q plotted against the distortion parameter for (a) Blurring (window size W), (b) Gaussian noise (variance V), (c) JPEG Compression (ratio R), (d) Impulsive Salt-Pepper Noise (density D)]
Fig. 3 Results of rationality experiment
impulsive salt-pepper noise (density = D). Figure 3 shows the metric's prediction trend for the Einstein image under each type of distortion. The predicted score is found to rise as the degree of distortion increases. The proposed score thus has the same sense as DMOS, taking higher values for lower visual quality, and is well consistent with the actual decrease in image quality.
3.3 Cross-database performance evaluation

Since the BIQA algorithms in the preceding experiments are trained and tested on splits of a single database, that evaluation of generalization capability is insufficient. In real applications, a BIQA system is very likely to encounter distortions that do not exist in the training

Table 5 Evaluation results when trained on the LIVE II database

             TID2013         CSIQ            MD1             MD2
Metric       PLCC   SROCC    PLCC   SROCC    PLCC   SROCC    PLCC   SROCC
DIIVINE      0.545  0.355    0.697  0.596    0.767  0.708    0.702  0.602
BLIINDS-II   0.470  0.393    0.724  0.577    0.710  0.665    0.302  0.015
BRISQUE      0.475  0.367    0.742  0.557    0.866  0.791    0.459  0.299
NIQE         0.398  0.311    0.716  0.627    0.909  0.871    0.848  0.795
QAC          0.437  0.372    0.708  0.490    0.538  0.396    0.672  0.471
Proposed     0.445  0.406    0.677  0.699    0.854  0.770    0.783  0.678
Table 6 Weighted mean of the results in Table 5

Metric   DIIVINE   BLIINDS-II   BRISQUE   NIQE    QAC     Proposed
PLCC     0.595     0.525        0.548     0.512   0.509   0.531
SROCC    0.435     0.424        0.424     0.429   0.402   0.503
database. Therefore, it is essential to evaluate the generalization capability of the algorithms. In this experiment, we train DIIVINE, BLIINDS-II and BRISQUE on one database and test them on the others, since they require human opinion scores for training; for NIQE, QAC and the proposed algorithm, the training stage can be omitted. We first train on the LIVE II database and test on the other databases; the results are shown in Table 5, and a weighted-average performance based on Table 5 is shown in Table 6, where the weight of each database is linearly proportional to the number of distorted images it contains. We also train on the TID2013 database and test on the other databases, with results in Tables 7 and 8. The results in Tables 5, 6, 7 and 8 lead to the following conclusions. First, the human-score-free methods NIQE and our algorithm outperform the other algorithms (QAC predicts only 4 types of distortion well, so its performance is limited); especially when trained on TID2013 and tested on other databases, the generalization of the human-score-free methods is clearly better. Second, the results of our method are close to those of NIQE, and on TID2013 its performance is better than NIQE's. Third, our method performs best or nearly best in most scenarios. From these results, we conclude that our algorithm has strong generalization compared with the other methods.
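The weighted averaging behind Tables 6 and 8 is simply a database-size-weighted mean:

```python
import numpy as np

def weighted_mean(scores, db_sizes):
    """Average per-database scores, weighting each database by the
    number of distorted images it contains (as done for Tables 6 and 8)."""
    w = np.asarray(db_sizes, dtype=float)
    return float(np.sum(np.asarray(scores, dtype=float) * w) / w.sum())
```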
3.4 Multiscale experiment

It is well understood that images are naturally multiscale, and early IQA systems already involved decompositions over scales. Consequently, a multiscale feature extraction approach based on the shearlet transform is implemented in this paper. Comparisons of prediction results on two benchmark databases for feature extraction over 2, 3 and 4 scales are shown in Tables 9 and 10. In addition, to better evaluate the multiscale performance on the TID2013 database, we train and test our algorithm on some common distortions, namely JPEG2K, JPEG, additive white Gaussian noise and Gaussian blur, rather than on all images. The performance of the
Table 7 Evaluation results when trained on the TID2013 database

             LIVE            CSIQ            MD1             MD2
Metric       PLCC   SROCC    PLCC   SROCC    PLCC   SROCC    PLCC   SROCC
DIIVINE      0.093  0.042    0.255  0.146    0.669  0.639    0.367  0.252
BLIINDS-II   0.089  0.076    0.527  0.456    0.690  0.507    0.222  0.032
BRISQUE      0.108  0.088    0.728  0.639    0.807  0.625    0.591  0.184
NIQE         0.904  0.906    0.716  0.627    0.909  0.871    0.848  0.795
QAC          0.863  0.868    0.708  0.490    0.538  0.396    0.672  0.471
Proposed     0.796  0.788    0.677  0.699    0.854  0.770    0.783  0.678
Table 8 Weighted mean of the results in Table 7

Metric   DIIVINE   BLIINDS-II   BRISQUE   NIQE    QAC     Proposed
PLCC     0.251     0.349        0.491     0.821   0.744   0.752
SROCC    0.172     0.275        0.384     0.775   0.618   0.738
Table 9 PLCC and SROCC on the LIVE II database

         Two scales       Three scales     Four scales
Type     PLCC    SROCC    PLCC    SROCC    PLCC    SROCC
JPEG2K   0.8962  0.9014   0.8944  0.8974   0.9043  0.8967
JPEG     0.8647  0.8574   0.9044  0.8629   0.9054  0.8544
WN       0.9805  0.9787   0.9807  0.9788   0.9809  0.9805
Gblur    0.8992  0.9111   0.9230  0.9397   0.9393  0.9420
FF       0.8408  0.8376   0.8700  0.8648   0.8879  0.8833
proposed method generally improves as more image scales are used in feature extraction, with the optimal results achieved at the fourth scale.
3.5 Performance across distortion types

Although our method is not designed for specific distortion types, we still examine its performance on different distortion types, testing on the TID2013 database. The results are shown in Table 11. First, our algorithm achieves good performance on common distortions such as "additive Gaussian noise", "spatially correlated noise", "high frequency noise", "Gaussian blur", "JPEG compression", "JPEG2000 compression" and "sparse sampling and reconstruction". Second, for some special distortion types, such as "quantization noise", "image denoising", "contrast change", "multiplicative Gaussian noise", "comfort noise", "JPEG transmission errors" and "JPEG2000 transmission errors", our method does not achieve promising results. A possible reason is that the shearlet features cannot capture the essence of those distortions. In future work, we will investigate how to handle these special distortion types better.
Table 10 PLCC and SROCC on the TID2013 database

         Two scales       Three scales     Four scales
Type     PLCC    SROCC    PLCC    SROCC    PLCC    SROCC
JPEG2K   0.8690  0.8633   0.8880  0.8698   0.8926  0.8713
JPEG     0.8932  0.8608   0.8964  0.8394   0.8974  0.8396
WN       0.8833  0.8906   0.8382  0.8463   0.8152  0.8260
Gblur    0.8529  0.8529   0.8529  0.8454   0.8687  0.8623
Table 11 PLCC and SROCC on the TID2013 database

Number   Distortion type                          PLCC      SROCC
#1       Additive Gaussian noise                  0.8908    0.8862
#2       Additive noise in color components       0.7648    0.7177
#3       Spatially correlated noise               0.8371    0.8300
#4       Masked noise                             0.8004    0.7355
#5       High frequency noise                     0.9305    0.8914
#6       Impulse noise                            0.7973    0.8011
#7       Quantization noise                       -0.6758   0.7285
#8       Gaussian blur                            0.8882    0.8685
#9       Image denoising                          -0.7414   0.7137
#10      JPEG compression                         0.8975    0.8193
#11      JPEG2000 compression                     0.9010    0.8822
#12      JPEG transmission errors                 -0.3339   0.4022
#13      JPEG2000 transmission errors             -0.4796   0.5829
#14      Non eccentricity pattern noise           0.1681    0.0481
#15      Local block-wise distortions             0.2710    -0.2270
#16      Mean shift (intensity shift)             0.2390    -0.1809
#17      Contrast change                          -0.2201   0.2859
#18      Change of color saturation               0.2506    -0.1508
#19      Multiplicative Gaussian noise            -0.5947   0.6082
#20      Comfort noise                            -0.1293   0.2079
#21      Lossy compression of noisy images        0.7878    0.7873
#22      Image color quantization with dither     0.7099    0.7108
#23      Chromatic aberrations                    0.8263    0.5706
#24      Sparse sampling and reconstruction       0.8689    0.8758
3.6 Evaluation of feature performance

To verify the effectiveness and rationality of each type of feature in the shearlet domain, we assess the performance of each feature type on the LIVE II database. Figure 4 shows the
[Figure: PLCC of each individual feature type (f_γ, f_σ², f_μ, f_βl, f_βr) on the JPEG, JP2K, WN, Gblur and FF subsets]
Fig. 4 Feature effectiveness analysis experiment
PLCC values of each type of feature on the different distortion datasets. Different types of features show different degrees of correlation with image distortion, reflecting their different abilities to predict image quality, and it is evident that the integrated features give our algorithm better performance than any single feature type.
4 Conclusions

A novel blind image quality assessment method based on the shearlet transform is proposed, which is completely free of human subjective scores during learning. The "natural mean and covariance" are obtained from a corpus of natural images without using human scores, and the quality of a distorted image is expressed as a distance between the natural and distorted models using their means and covariances. Experimental results illustrate that the proposed method is highly consistent with the subjective assessment of human beings. Furthermore, the potential of NSS in the shearlet domain for the IQA task deserves further exploration.

Acknowledgments This research was supported partially by the National Natural Science Foundation of China (Nos. 61372130, 61432014, 61501349, 61571343), the Fundamental Research Funds for the Central Universities (Nos. BDY081426, JB140214, XJS14042), the Program for New Scientific and Technological Star of Shaanxi Province (No. 2014KJXX-47), and the Project Funded by the China Postdoctoral Science Foundation (No. 2014M562378).
References

1. Ashirbani S, Wu QMJ (2015) Utilizing image scales towards totally training free blind image quality assessment. IEEE Trans Image Process 24(6):1879–1892
2. Easley G, Labate D, Lim WQ (2008) Sparse directional image representations using the discrete shearlet transform. Appl Comput Harmon Anal 25(1):25–46
3. Fang Y, Ma K, Wang Z, Lin W, Fang Z, Zhai G (2015) No-reference quality assessment of contrast-distorted images based on natural scene statistics. IEEE Signal Process Lett 22(7):838–842
4. Gao F, Tao D, Gao X, Li X (2015) Learning to rank for blind image quality assessment. IEEE Trans Neural Netw Learn Syst 26(10):2275–2290
5. Gao F, Yu J (2016) Biologically inspired image quality assessment. Signal Process 124:210–219
6. Hassen R, Wang Z, Salama M (2010) No-reference image sharpness assessment based on local phase coherence measurement. In: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 2434–2437
7. Jayaraman D, Mittal A, Moorthy AK, Bovik AC (2012) Objective quality assessment of multiply distorted images. In: 2012 46th Asilomar Conference on Signals, Systems and Computers (ASILOMAR). IEEE, pp 1693–1697
8. Kalayeh MM, Marin T, Brankov JG (2013) Generalization evaluation of machine learning numerical observers for image quality assessment. IEEE Trans Nucl Sci 60(3):1609–1618
9. Larson EC, Chandler DM (2010) Most apparent distortion: full-reference image quality assessment and the role of strategy. J Electron Imaging 19(1):143–153
10. Lasmar NE, Stitou Y, Berthoumieu Y (2009) Multiscale skewed heavy tailed model for texture analysis. In: 2009 IEEE International Conference on Image Processing (ICIP). IEEE, pp 2281–2284
11. Li Y, Po LM, Xu X, Feng L (2014) No-reference image quality assessment using statistical characterization in the shearlet domain. Signal Process Image Commun 29(7):748–759
12. Li J, Zou L, Yan J, Deng D, Qu T, Xie G (2015) No-reference image quality assessment using Prewitt magnitude based on convolutional neural networks. SIViP 1–8
13. Lim WQ (2013) Nonseparable shearlet transform. IEEE Trans Image Process 22(5):2056–2065
14. Lu Y, Xie F, Liu T, Jiang Z, Tao D (2015) No reference quality assessment for multiply-distorted images based on an improved bag-of-words model. IEEE Signal Process Lett 22(10):1811–1815
15. Mittal A, Moorthy AK, Bovik AC (2012) No-reference image quality assessment in the spatial domain. IEEE Trans Image Process 21(12):4695–4708
16. Mittal A, Soundararajan R, Bovik AC (2013) Making a completely blind image quality analyzer. IEEE Signal Process Lett 20(3):209–212
17. Moorthy AK, Bovik AC (2011) Blind image quality assessment: from natural scene statistics to perceptual quality. IEEE Trans Image Process 20(12):3350–3364
18. Ponomarenko N, Ieremeiev O, Lukin V, Egiazarian K, Jin L, Astola J, Vozel B, Chehdi K, Carli M, Battisti F, Jay Kuo C-C (2013) Color image database TID2013: peculiarities and preliminary results. In: 2013 4th European Workshop on Visual Information Processing (EUVIP). IEEE, pp 106–111
19. Ruderman DL (1994) The statistics of natural images. Netw Comput Neural Syst 5(4):517–548
20. Saad MA, Bovik AC, Charrier C (2012) Blind image quality assessment: a natural scene statistics approach in the DCT domain. IEEE Trans Image Process 21(8):3339–3352
21. Sharifi K, Leon-Garcia A (1995) Estimation of shape parameter for generalized Gaussian distributions in subband decompositions of video. IEEE Trans Circuits Syst Video Technol 5(1):52–56
22. Sheikh HR, Sabir MF, Bovik AC (2006) A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Trans Image Process 15(11):3440–3451
23. Srivastava A, Lee AB, Simoncelli EP, Zhu SC (2003) On advances in statistical modeling of natural images. J Math Imaging Vision 18(1):17–33
24. Wang Z, Bovik AC (2006) Modern image quality assessment. Morgan and Claypool Publishing Company, New York
25. Xue W, Zhang L, Mou X (2013) Learning without human scores for blind image quality assessment. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, pp 995–1002
26. Yang J, Ding Z, Guo F, Wang H, Nick H (2015) A novel multivariate performance optimization method based on sparse coding and hyper-predictor learning. Neural Netw 71:45–54
27. Yang J, He S, Lin Y, Lv Z (2015) Multimedia cloud transmission and storage system based on internet of things. Multimed Tools Appl 1–16
28. Yang J, Liu Y, Meng Q, Chu R (2015) Objective evaluation criteria for stereo camera shooting quality under different shooting parameters and shooting distances. IEEE Sensors J 15(8):4508–4521
Wen Lu received the BSc, MSc and PhD degrees in signal and information processing from Xidian University, China, in 2002, 2006 and 2009, respectively. He is currently an assistant professor at Xidian University and a postdoctoral researcher in the Department of Electrical Engineering at Stanford University, USA. His research interests include image and video quality metrics, the human vision system, and computational vision. He has published 2 books and around 30 technical articles in refereed journals and proceedings including IEEE TIP, TSMC, Neurocomputing, and Signal Processing.
Tianjiao Xu received the BSc degree in electronic information engineering from Xidian University, China, in 2015. He is currently a graduate student at Xidian University. His research interest is image quality metrics.
Yuling Ren received the BSc degree in communication engineering from Xi'an University of Posts & Telecommunications, China, in 2013. She is currently a postgraduate student at Xidian University. Her research interests include image and video quality metrics and visual quality assessment.
Lihuo He is currently a postdoctoral fellow at Xidian University. He received the BSc degree in electronic and information engineering and PhD degree in pattern recognition and intelligent systems from Xidian University, Xi’an, China, in 2008 and 2013. His research interests focus on image/video quality assessment, cognitive computing, and computational vision.