SIViP
DOI 10.1007/s11760-016-0994-2

ORIGINAL PAPER

Sample adaptive color space transform for screen content video coding

Je-Won Kang (1) · Woo-Shik Kim (2) · Kei Kawamura (3)

Received: 28 March 2016 / Revised: 26 September 2016 / Accepted: 28 September 2016
© Springer-Verlag London 2016
Abstract In this paper, an in-loop color space transform is proposed for screen content video coding to improve coding efficiency. The transform converts the color space of an input block to a better one, improving rate-distortion performance by decorrelating the color components. Specifically, to derive the optimal color transform, principal component analysis is performed on spatially or temporally adjacent pixels of each block, and the derived transform is applied to the residual samples after intra or inter prediction. Rate-distortion optimization then selects the better color space between the original color space of the input signal and the derived one. Experimental results demonstrate that the proposed method provides significant coding gains.

Keywords Color space transform · Screen content video coding · Color component analysis
This work was developed in part while J.-W. Kang was with Qualcomm, Inc. This research was also partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2014R1A1A2056587).
Je-Won Kang (corresponding author)
[email protected]

Woo-Shik Kim
[email protected]

Kei Kawamura
[email protected]

(1) Department of Electronics Engineering, Ewha W. University, Seoul, Korea
(2) Qualcomm, Inc., San Diego, CA, USA
(3) KDDI R&D Laboratories, Inc., Saitama, Japan

1 Introduction

Screen content video has drawn increasing attention for its use in emerging multimedia applications such as screencast, screen sharing, and augmented reality with text overlays. Screen content video is often represented in a 4:4:4 RGB format to satisfy high color fidelity requirements, because chroma subsampling or a color transform can incur significant visual artifacts due to color distortion. However, if the original RGB color space is coded directly to maintain the high color fidelity, the bit-rate increases substantially because of the correlation among the color components. Color space transforms (CST) have been studied extensively for video coding [1–6] to improve coding efficiency by reducing the inter-component correlation while maintaining the color fidelity. In this paper, a sample adaptive color space transform for video coding is proposed. The transform is derived from a component analysis of previously coded local samples. Note that this paper presents an extension of our previous work in [1,7]. The detailed implementation of the proposed method is presented along with theoretical background and analytic studies. The rest of the paper is organized as follows. In Sect. 2, previous works are reviewed. In Sect. 3, we describe the proposed method. Experimental results are given in Sect. 4. We conclude with remarks in Sect. 5.
2 Previous works

The key coding tools incorporated into the HEVC main profile [8] were developed mainly for coding natural videos. The Moving Picture Experts Group (MPEG) and the Video Coding Experts Group (VCEG) have continued developing extensions of HEVC to support various types of videos. HEVC RExt
supports extended chroma formats. On top of HEVC RExt, the HEVC Screen Content Coding (HEVC SCC) standard [9] is being developed to provide coding tools dedicated to screen content videos. Several coding techniques in HEVC RExt and HEVC SCC yield improved coding efficiency when applied to screen content coding. The cross-component prediction (CCP) [10] in HEVC RExt improves coding efficiency by reducing inter-component redundancies. The intra block copy (IBC) [11] in HEVC SCC finds a block in the previously decoded regions of the same picture and uses it as a reference block to predict the current coding block, as in conventional motion compensation. We now review previous works on color transform coding. A color space transform (CST) converts an RGB component vector to a new coordinate system. Because transforms such as YCgCo, YIQ, and HSV have been shown to be efficient, they have attracted much attention in various image and video applications [12,13]. For video coding, the YCbCr color space transform has been widely used. The transform converts RGB components to one luma (Y) and two chroma (Cb and Cr) components. However, the YCbCr conversion distributes the power of the original signal unequally among the three transformed components and may cause color distortion when the signal is converted back to RGB space for display. YCgCo (or YCgCo-R) [2] has been employed in the H.264/AVC Fidelity Range Extension [14] to improve coding gain, and can achieve a better transform coding gain than YCbCr for 4:4:4 video coding [2]. These transforms are applied outside the coding loop; that is, the forward transform is applied before encoding, and the backward transform is applied after decoding. However, the decoder would have a critical rendering problem if it has no knowledge of the transform used in the encoder.
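For concreteness, the reversible YCgCo-R transform can be written as a few lifting steps with integer shifts. The sketch below follows the published lifting structure [2], though the function names are ours:

```python
def rgb_to_ycgco_r(r, g, b):
    """Forward YCgCo-R lifting transform (losslessly reversible in integers)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co

def ycgco_r_to_rgb(y, cg, co):
    """Inverse YCgCo-R transform; undoes the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b

assert ycgco_r_to_rgb(*rgb_to_ycgco_r(100, 150, 50)) == (100, 150, 50)
```

Because each lifting step is exactly invertible in integer arithmetic, the round trip is lossless for any input bit depth, which is what makes YCgCo-R attractive for lossless and high-fidelity coding.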
Compared with these out-of-loop approaches, an in-loop color space transform converts the original color space of an input signal to a desired color space during encoding and converts it back to the original color space during decoding. Zhang et al. propose an in-loop YCgCo transform in [3,15] that is currently part of the HEVC SCC standard. Whether applied inside or outside the coding loop, these methods involve a fixed linear or nonlinear operation, so that the same transform is applied to all blocks. Therefore, we refer to them as fixed color space transform (F-CST) methods in this paper.

Fig. 1 Block diagram of the encoder in the proposed method. The SA-CST is added into the HEVC RExt framework
Although F-CST methods can improve coding efficiency, they may not fit generic color image/video contents. Even within the same frame or sequence, each local area of a screen content video can have different color characteristics. Hence, a CST can be more efficient if it adapts to the video source. Bordes et al. use independent component analysis (ICA) to align a basis color axis to the color characteristics of an input video [16]. Block-based adaptive CST methods are proposed in [5] and [3]. The idea is to divide a picture into variable-sized blocks and to use a different transform in each block, chosen among several pre-defined color transforms. However, an encoder may choose a suboptimal color space if none of the pre-defined transforms suits the varying statistical characteristics of the video content. The preliminary work of the proposed method can be found in [1,7,17], where the KLT is used to adaptively construct a color transform for each block by considering local color characteristics, and it shows significantly improved coding gain over the HEVC main profile. However, it is observed that only a small coding gain is retained with the later versions of the HEVC extensions [9,18]. One of the reasons is that all blocks are coded in the transformed color space, although the KLT is not theoretically optimal and sometimes incurs a coding loss for a non-Gaussian source [19].
3 Proposed method

3.1 Overview

We show a block diagram of the proposed method integrated into the HEVC SCC framework in Fig. 1, where the modification is highlighted in gray. First, intra or inter prediction is performed to produce a residual signal for each color component. Then, a 3 × 3 color space transform matrix $\mathbf{T}$ is generated using the reconstructed samples in the spatially or temporally neighboring blocks. The derived transform is applied to the residual signal $\mathbf{s}_r \in \mathbb{R}^3$, consisting of the R, G, and B components, to convert the original RGB color space into the new color space. The transformed vector $\mathbf{z} \in \mathbb{R}^3$ is given as
Fig. 2 Samples used for deriving the SA-CST a in the intra coded CU and b in the inter coded CU
$\mathbf{z} = \mathbf{T}\mathbf{s}_r, \qquad (1)$
where the vector $\mathbf{z}$ has three components in the new color space. The proposed technique uses two color spaces, i.e., the original RGB color space and the converted color space, so that an image block can choose the better color space for coding efficiency. To this end, the proposed technique uses a binary flag to indicate which color space is applied to a CU. If the flag is set to zero, the color transform is bypassed for the CU; otherwise, the color space transform is used for residual coding. The flag is signaled for each CU in the bitstream and transmitted to the decoder. However, the flag is signaled only if the block contains nonzero DCT/DST coefficients, because the color transform has no effect on all-zero coefficients. The decision on the flag is made by rate-distortion optimization at the encoder, as described in Sect. 3.3.

3.2 Sample adaptive color space transform
3.2.1 Selection of local samples

The proposed method derives the transform from spatially or temporally neighboring samples that have been previously coded. The training samples are chosen according to the luma prediction mode. For intra coding, we use training samples on the boundary of an N × N CU, as shown in the gray region of Fig. 2a. The left and top regions are selected depending on the prediction direction. Specifically, samples in the left columns are used when the intra prediction direction is one of the horizontal modes, ranging from the right-up diagonal mode (i.e., mode 2) to the straight horizontal mode (i.e., mode 10) in HEVC, as shown in Fig. 2a. Similarly, samples in the above rows are used when the prediction direction is one of the vertical modes, ranging from the straight vertical mode (i.e., mode 26) to the left-down diagonal mode (i.e., mode 34). Otherwise, all the samples in both gray regions are used. If the CU has the minimum CU size, the luma CU can be divided into four sub-blocks that may have different intra prediction modes. In this case, the prediction mode of the sub-block containing the top-left pixel of the CU is used for the sample selection. The number of samples depends on the block size: if N is equal to 8, two lines of samples are used in the selected region (i.e., n = 2 in Fig. 2a); if N is larger than 8, one line of samples is used. For inter coding and intra BC coding, samples in the reference block are used to derive the color space transform. For example, in Fig. 2b, the CU is partitioned into two rectangular prediction units (PU), denoted PU 0 and PU 1, and the reference samples in each PU are used to train the color transform. The number of training samples is always a power of two in both the intra and inter coding cases, which facilitates computing the transform with bit-shift operations. In asymmetric motion prediction (AMP) blocks for inter coding, the number of samples may not be a power of two, so the samples are subsampled to a power of two.

3.2.2 Color component analysis

We develop the sample adaptive color space transform (SA-CST) using spatially or temporally neighboring color samples. The problem is to find a set of orthonormal basis vectors spanning the three color dimensions that reduces the inter-channel correlation. In other words, the distance between a color sample vector and its projection onto the new space should be minimal. To this end, we perform principal component analysis (PCA) on the reference color samples. Denote by $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M$ the M training local color sample vectors residing in a three-dimensional color space, and denote by $\bar{\mathbf{x}}_i$ the centered vectors, given as
$\bar{\mathbf{x}}_i = \mathbf{x}_i - m(\mathbf{x}), \qquad (2)$
where the mean vector is $m(\mathbf{x}) = \frac{1}{M}\sum_{i=1}^{M} \mathbf{x}_i$. We form the matrix $\mathbf{X}_O = (\bar{\mathbf{x}}_1, \bar{\mathbf{x}}_2, \ldots, \bar{\mathbf{x}}_M)$, whose columns are the centered vectors, and obtain the covariance matrix $\mathbf{C}_X$ as

$\mathbf{C}_X = \frac{1}{M}\, \mathbf{X}_O \mathbf{X}_O^{T}. \qquad (3)$
The covariance matrix $\mathbf{C}_X$ can be eigen-decomposed to provide the orthonormal eigenvectors $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$, such that

$\mathbf{C}_X \mathbf{e}_i = \lambda_i \mathbf{e}_i, \qquad (4)$

where $\lambda_i$ is the corresponding eigenvalue, $i = 1, 2, 3$.
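The derivation so far (centering the neighboring color samples, forming the 3 × 3 covariance, and eigen-decomposing it) can be sketched numerically as follows. This is a floating-point illustration with NumPy; the names and the synthetic training data are ours, and the standard-track implementation instead uses power-of-two sample counts and bit-shift arithmetic:

```python
import numpy as np

def derive_sa_cst(samples):
    """Derive a 3x3 SA-CST matrix by PCA of neighboring color samples.

    samples: (M, 3) array of reconstructed R, G, B training vectors.
    Returns T whose rows are the eigenvectors of the sample covariance,
    ordered by decreasing eigenvalue (normalization omitted here).
    """
    x = samples - samples.mean(axis=0)       # center: x_i - m(x)
    c = (x.T @ x) / len(samples)             # 3x3 covariance C_X
    eigvals, eigvecs = np.linalg.eigh(c)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to decreasing eigenvalues
    u = eigvecs[:, order]                    # U = (e1, e2, e3)
    return u.T                               # T_PCA up to the scalar normalization

# Strongly correlated R/G/B samples, as in typical screen content residues.
rng = np.random.default_rng(0)
base = rng.normal(size=(256, 1))
rgb = np.hstack([base, 0.9 * base, 0.8 * base]) + 0.05 * rng.normal(size=(256, 3))
T = derive_sa_cst(rgb)
z = (rgb - rgb.mean(axis=0)) @ T.T           # transformed (decorrelated) samples
```

With strongly correlated samples, almost all of the energy is compacted into the first transformed component, which is exactly the decorrelation that the SA-CST exploits.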
Fig. 3 Example of the color component analysis using the SA-CST enabled by the PCA. The samples are residues in a 32 × 32 block after the prediction
We define $\mathbf{U} = (\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3)$ and $\lambda = \sum_{i=1}^{3} \lambda_i$, where the $\lambda_i$ are the eigenvalues of the corresponding eigenvectors in decreasing order; $\lambda$ is used for normalization. We construct the matrix $\mathbf{T}_{\mathrm{PCA}} = \lambda \mathbf{U}^{T} \in \mathbb{R}^{3 \times 3}$ such that

$\mathbf{U}^{T} \mathbf{C}_X \mathbf{U} = \mathbf{\Lambda}, \qquad (5)$
where $\mathbf{\Lambda} = \mathrm{diag}\{\lambda_1, \lambda_2, \lambda_3\}$, and the transform removes the second-order correlation. Figure 3 shows an example of the SA-CST from the RGB domain to the transformed domain. In Fig. 3, the original residual samples after an intra prediction and their transformed counterparts are shown in the left and right columns, respectively; darker pixels indicate smaller absolute values. The residual samples contain many redundancies in the RGB domain, which are reduced after the SA-CST, whose elements are denoted $t_{ij}$ in the figure. For example, the letters "H" in the R channel (i = 0) and the G channel (i = 1) are almost the same. A typical RGB video contains considerable amounts of such redundancy, which degrades coding performance. In comparison, the correlation among the letters is reduced in the right column. We briefly review independent component analysis (ICA) [20] alongside $\mathbf{T}_{\mathrm{PCA}}$ because our experiments include a performance comparison. ICA has been used effectively in many image processing applications, but there have been few studies in video coding. A color space transform derived by ICA requires an additional transform, such as a rotation, beyond the transform derived by PCA. ICA may provide a better approximation than PCA in some applications [21] at the expense of complexity.

3.3 Transform selection based on RD analysis

We adopt PCA in the proposed method because it is effective in image and video coding applications [22] and requires less complexity than ICA. However, PCA is sometimes suboptimal for two reasons. First, a transform derived from neighboring samples is suboptimal if
a training sample is significantly different from the current sample. Second, the transform can degrade coding efficiency for non-Gaussian sources. Feng et al. show that the same transform can yield coding gains differing by about 1.5 dB when applied to different non-Gaussian sources [19,23]. Screen content videos include many non-Gaussian sources whose correlation between adjacent samples is low. Therefore, a rate-distortion (RD) optimized in-loop transform selection is developed to manage the suboptimality of a transform. The idea is to choose the transform that minimizes the Lagrangian cost function

$C(\mathbf{T}) = |\mathbf{s} - \mathbf{T}^{-1}\mathbf{z}_q|^2 + \lambda(R_C + R_H), \qquad (6)$
where the first term represents the distortion, i.e., the mean squared error (MSE) between the original signal $\mathbf{s}$ and the reconstruction from the quantized coefficient vector $\mathbf{z}_q$. The second term represents the bit-rate: $R_C$ is the bit-rate of the quantized coefficients, and $R_H$ is the overhead bit-rate for signaling the flag that controls the transform. The Lagrangian multiplier $\lambda$ is the same constant used for the mode decision in HEVC. We choose the optimal transform by solving

$\mathbf{T}^{*} = \arg\min_{\mathbf{T} \in \mathcal{T}_{3 \times 3}} C(\mathbf{T}), \qquad (7)$
where $\mathcal{T}_{3 \times 3}$ refers to a set of 3 × 3 transforms. The encoder chooses the transform in the set that minimizes the cost. Coding efficiency may improve with a larger set of transforms in (7) if they complement one another, at the expense of computational complexity. In the proposed method, we include $\mathbf{T}_{\mathrm{PCA}}$ and the identity matrix in the set, considering the trade-off between coding efficiency and computational complexity. As mentioned earlier, we will evaluate the coding performance when $\mathbf{T}_{\mathrm{PCA}}$ is replaced with $\mathbf{T}_{\mathrm{ICA}}$ in Sect. 4. We show distributions of the inputs to the entropy coder to reveal why the proposed method is efficient in an HEVC framework. The HEVC main profile encodes one luma component (e.g., Y) and two chroma components (e.g., Cb/Cr). In RGB video coding, the G component is coded as the luma component, and the B/R components are coded as the chroma components; accordingly, the luma and chroma coefficients correspond to coefficients of the G and B/R components, respectively. Figure 4a shows a distribution of coefficients when the proposed method is disabled in residual coding. Note that the distributions of the chroma components have a long tail, opposing the scenario for which the HEVC entropy encoder is tuned. In contrast, the distribution of the chroma coefficients obtained with the proposed method, shown in Fig. 4b, is compact and therefore better tailored to the HEVC entropy coder. Accordingly, around 17.4 % of the blocks in the "Programming" sequence use the color transform to improve the coding performance.

Fig. 4 Distributions of coefficients a when the color space transform is not applied and b when the color space transform is applied. The samples are from the "Programming" sequence

Similar behaviors of the distributions are observed in other screen content videos, where a considerable number of blocks are coded with the transform, although we show only the example from "Programming" in Fig. 4a, b.
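As a toy sketch of the RD selection in (6) and (7), the following compares the Lagrangian cost of the identity matrix and a PCA-derived transform for a block of residuals. The uniform quantizer and the count of nonzero levels used as a rate proxy are our simplifying assumptions, not the HEVC entropy coder:

```python
import numpy as np

def rd_select(s, t_pca, step=8.0, lam=0.5, flag_bits=1):
    """Choose between the identity and T_PCA by the Lagrangian cost C(T).

    s: (N, 3) residual samples; step: quantizer step size;
    lam: Lagrangian multiplier; flag_bits: overhead R_H for the CU flag.
    The rate proxy (nonzero quantized levels) is an illustrative assumption.
    """
    best = None
    for name, t in (("identity", np.eye(3)), ("pca", t_pca)):
        z = s @ t.T                                    # forward transform z = T s_r
        levels = np.round(z / step)                    # uniform quantization
        recon = (levels * step) @ np.linalg.inv(t).T   # inverse transform of z_q
        dist = np.sum((s - recon) ** 2)                # distortion term
        rate = np.count_nonzero(levels) + flag_bits    # R_C + R_H proxy
        cost = dist + lam * rate                       # Lagrangian cost C(T)
        if best is None or cost < best[1]:
            best = (name, cost)
    return best[0]
```

For strongly correlated residuals, the PCA transform compacts the energy into one component and lowers the rate term enough to win the comparison; for residuals that are already cheap to code in the original space, the identity is kept.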
4 Experimental results

We show the coding efficiency of the proposed method with extensive experimental results in this section. The proposed method is implemented on top of the SCM1.0 reference software [24]. Our previous work in [1] is also ported to the same software for comparison. The reference software includes both CCP and IBC, which are enabled in the experiments. Coding tools such as RDPCM and transform skip are also turned on, as specified in the common test conditions [25]. The
test sequences are 1080p ("Mission control clip3," "Social Network," "Console," "Flying Graphics," and "Desktop") and 720p HD ("ppt and xls," "Map," "Programming," "Slide show," "Viking," "Robot," and "Web browsing") screen content videos. The coding performance of the proposed method is compared with SCM1.0, the reference software of HEVC SCC, Zhang's method [3], and our previous work in [1]. Note that Zhang's method is part of the current HEVC SCC. In the comparison, a negative BD-rate value indicates a bit-rate reduction. We show the BD-rates of the three color components separately. In [3], the three color components are coded with different QP offsets, and the Lagrangian parameters are modified as an encoder optimization. The proposed method outperforms all the tested methods in the RD performance of the G component, as shown in Table 1. The BD-rate saving of the proposed method is about 17.5 % in the AI configuration, 16.4 % in the RA configuration, and 16.4 % in the LD configuration. In Zhang's method, the YCgCo color space conversion is used as an F-CST. The coding gain over Zhang's method is about 0.6 % in the AI, RA, and LD configurations. This gain is not very significant; however, Zhang's method uses QP offsets allocating more bits to the G component as an encoder-only optimization. In our previous work [1], the KLT is applied to all the blocks in a frame, while the proposed method uses a block-based color space transform adapted to local samples. As shown in Table 1, the coding gain of [1] over the SCM1.0 reference software is not significant. This is caused
Table 1 BD-rate reduction of the proposed method vs. SCM1.0, Zhang's method [3], and previous work [1] in All Intra (AI), Random Access (RA), and Low Delay (LD) configurations

| Seq. name           | vs. SCM1.0 AI/RA/LD (%) | vs. Zhang's method [3] AI/RA/LD (%) | vs. previous work [1] AI/RA/LD (%) |
|---------------------|-------------------------|-------------------------------------|------------------------------------|
| MissionControlClip3 | −18.1 / −18.8 / −19.3   | −1.9 / −1.8 / −0.1                  | −19.6 / −19.1 / −18.9              |
| Desktop             | −11.8 / −7.8 / −6.2     | −0.5 / −0.2 / −0.1                  | −26.4 / −25.0 / −24.2              |
| Console             | −10.5 / −8.0 / −7.6     | +2.4 / +1.1 / −1.2                  | −28.2 / −27.9 / −27.8              |
| Social network      | −16.1 / −18.9 / −18.6   | +0.7 / +1.3 / +3.4                  | −19.8 / −17.8 / −17.5              |
| Flying graphics     | −14.8 / −12.9 / −11.3   | +0.5 / +0.7 / +1.6                  | −19.2 / −18.4 / −18.1              |
| ppt and xls         | −11.9 / −9.3 / −9.5     | −1.3 / −1.5 / −1.8                  | −13.1 / −12.6 / −12.5              |
| Map                 | −16.9 / −18.2 / −17.8   | −0.6 / −0.5 / −0.9                  | −7.9 / −7.4 / −7.0                 |
| Programming         | −17.3 / −20.5 / −20.9   | +0.8 / +1.1 / +1.1                  | −17.4 / −19.6 / −21.3              |
| Slide show          | −25.7 / −23.6 / −23.2   | −4.4 / −4.0 / −3.5                  | −4.5 / −5.8 / −6.1                 |
| Viking              | −25.2 / −22.1 / −22.3   | −1.9 / −1.0 / −0.9                  | −7.1 / −6.0 / −5.8                 |
| Robot               | −26.3 / −22.6 / −23.9   | −2.1 / −1.9 / −0.6                  | −10.8 / −8.9 / −8.2                |
| Web browsing        | −15.0 / −15.1 / −16.3   | +0.4 / −1.0 / −2.5                  | −10.2 / −9.2 / −9.0                |
| Average BD-rate     | −17.5 / −16.4 / −16.4   | −0.6 / −0.6 / −0.5                  | −15.4 / −14.0 / −14.7              |
| Encoding time       | 130 / 134 / 134         | 104 / 105 / 105                     | 127 / 132 / 130                    |
| Decoding time       | 121 / 121 / 120         | 117 / 117 / 118                     | 102 / 102 / 103                    |
Table 2 BD-rate reduction (%) of the proposed method versus SCM1.0 as the anchor when intra BC [26] or CCP [10] is disabled

| Seq. name           | Prop. vs. anchor AI/RA/LD (%) | Intra BC off AI/RA/LD (%) | CCP off AI/RA/LD (%)    |
|---------------------|-------------------------------|---------------------------|-------------------------|
| MissionControlClip3 | −15.8 / −18.1 / −17.4         | −17.2 / −18.5 / −17.7     | −28.2 / −26.3 / −25.9   |
| Desktop             | −9.6 / −5.8 / −5.4            | −11.2 / −6.3 / −5.9       | −12.5 / −10.0 / −7.3    |
| Console             | −8.4 / −6.0 / −4.1            | −9.8 / −6.2 / −4.4        | −10.1 / −6.7 / −4.8     |
| Social network      | −16.2 / −19.0 / −18.9         | −17.3 / −19.1 / −19.2     | −22.7 / −22.5 / −22.9   |
| Flying graphics     | −14.9 / −12.3 / −11.5         | −16.4 / −12.5 / −11.6     | −23.9 / −20.1 / −19.8   |
| ppt and xls         | −10.3 / −8.3 / −8.0           | −10.8 / −8.7 / −8.3       | −12.5 / −9.3 / −8.9     |
| Map                 | −14.5 / −15.1 / −15.2         | −15.6 / −15.3 / −15.4     | −20.3 / −20.7 / −21.3   |
| Programming         | −16.3 / −17.0 / −21.4         | −16.7 / −17.5 / −21.8     | −24.4 / −25.0 / −24.3   |
| Slide show          | −23.4 / −23.1 / −21.7         | −23.8 / −23.3 / −22.0     | −23.8 / −23.3 / −22.0   |
| Viking              | −24.6 / −22.8 / −19.7         | −24.7 / −22.8 / −20.3     | −24.7 / −22.8 / −20.3   |
| Robot               | −25.1 / −23.7 / −20.2         | −25.2 / −24.1 / −20.4     | −25.2 / −24.1 / −20.4   |
| Web browsing        | −13.8 / −12.8 / −15.3         | −14.6 / −13.5 / −15.7     | −14.6 / −13.5 / −15.7   |
| Average BD-rate     | −16.2 / −15.3 / −15.0         | −16.9 / −15.7 / −15.2     | −23.2 / −21.4 / −20.7   |
| Encoding time       | 130 / 134 / 134               | 134 / 135 / 135           | 133 / 132 / 132         |
| Decoding time       | 121 / 121 / 120               | 122 / 120 / 121           | 121 / 122 / 121         |
Table 3 BD-rate increments (%) of the ICA used for the SA-CST as compared to the PCA

| Seq. name           | AI (%) | RA (%) | LD (%) |
|---------------------|--------|--------|--------|
| MissionControlClip3 | 1.2    | 0.6    | 0.5    |
| Desktop             | 1.4    | 0.6    | 0.5    |
| Console             | 1.4    | 0.6    | 0.6    |
| Social network      | 0.3    | 0.1    | 0.1    |
| Flying graphics     | 0.4    | 0.2    | 0.2    |
| ppt and xls         | 0.5    | 0.2    | 0.1    |
| Map                 | 1.4    | 0.6    | 0.7    |
| Programming         | 1.5    | 0.6    | 0.6    |
| Slide show          | 1.2    | 0.5    | 0.5    |
| Viking              | 1.9    | 0.9    | 0.8    |
| Robot               | 1.7    | 0.8    | 0.7    |
| Web browsing        | 0.8    | 0.4    | 0.3    |
| Encoding time       | 104    | 103    | 104    |
| Decoding time       | 94     | 95     | 95     |

Note: the tested method is the ICA, and the reference is the PCA
by the suboptimality of the KLT for non-Gaussian sources and by competing coding tools in the reference software, such as CCP, which exploit the same inter-component redundancies. Compared with the previous work, the proposed method can adaptively select the color space transform and provides a significant coding gain. As for the complexity assessment, the proposed method shows an encoding time similar to Zhang's method, because both perform the RD optimization in every coding unit. The measured times of [1] and SCM1.0 are lower than that
of the proposed technique because they use a single color space for coding. However, the PCA-based transform does not affect the encoding time significantly, as seen by comparing the measured times of Zhang's method and the proposed technique. The decoding time of Zhang's method is lower than that of the proposed method because the PCA-based transform requires more computation than the fixed transform; however, the decoding time is similar to [1] because the proposed technique and [1] use the same PCA-based transform. IBC is a key coding tool providing a significant coding gain in HEVC SCC [26], and CCP is designed to reduce the inter-component correlation [10], as is the proposed method. Thus, we investigate how these coding tools affect the RD performance of the proposed method. Specifically, we disable IBC and CCP one at a time and measure the change in the coding performance. Average PSNR is used for computing the BD-rate reduction in this test. The coding gain of the proposed method over SCM1.0 is about 16.2, 15.3, and 15.0 %, respectively, in the AI, RA, and LD configurations, as shown in the first test in Table 2. When IBC is disabled, the coding gain changes to 16.9, 15.7, and 15.2 %. These results imply that the gain of the proposed method is almost additive to that of IBC. However, when CCP is disabled, the coding gain improves greatly, by about 5–7 %, as shown in the last test results. The color space transform and CCP both exploit inter-component correlation, and thus this 5–7 % of coding gain disappears when they are used at the same time. Similar results are observed when the other color transforms are used under the CCP on/off conditions.
Fig. 5 The grid indicates the quantization applied to the source depicted by dots. The basis is changed by the transforms derived by the PCA and the ICA, respectively
The coding efficiency of PCA relative to an alternative color component analysis, ICA, is evaluated; the results are summarized in Table 3. The tested method is ICA, and the anchor is PCA in the BD-rate calculation, so positive numbers represent a coding loss. ICA may adapt to the orientations of the color axes owing to the anisotropy of its basis functions. However, the experimental results run against this intuition. We offer an explanation. The source density is illustrated in Fig. 5 on a 2D grid. The two figures on the left-hand side, below the original signal, show the densities of the transformed coefficients, and the grid indicates the cell boundaries of a uniform scalar quantizer applied to the densities. On the right-hand side, we show the reconstructed sources returned to the original coordinates together with the linearly transformed quantization partition. As can be seen, the grids remain aligned and orthogonal after the inverse PCA; that is, the metric is preserved. In contrast, the grid cells of the ICA become parallelograms, so that the cells can fit the anisotropic source density more compactly. However, for a given cell size, the reconstruction quality depends on the cell shape, which should minimize the average distance to the cell center; a parallelogram, unlike a square, suffers from this cell-shape distortion in the inverse transform and inverse quantization (i.e., for a parallelogram of a given area |x1| × |x2|, the distortion is minimized when the grids are orthogonal). Thus, PCA provides the better trade-off between reducing the color correlation and minimizing the reconstruction error [27]. In terms of computational complexity, the ICA-based method requires more encoding time because it needs extra steps.

We compare the perceptual quality provided by the proposed method and the anchor software in Fig. 6. The bit-rates used for the proposed method are slightly lower than those for the anchor in the comparisons. The proposed method yields much better perceptual quality, as shown in the figures. For example, the texture of the floor is severely blurred in Fig. 6b, while the proposed method reconstructs many details. In Fig. 6d–f, the "Map" content includes several paths between the "Onex" and "Petit Lancy" streets; in Fig. 6e, the paths are blurred, while they are well preserved with the proposed method in Fig. 6f.
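The cell-shape argument above can be checked with a small numerical sketch (our own illustration with assumed parameters, not the paper's experiment): quantize a 2D source with the same step size in an orthonormal basis and in a sheared, non-orthogonal basis, then compare the signal-domain MSE after the inverse transform.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(scale=10.0, size=(2, 100000))          # 2D source, std 10

theta = np.pi / 6
t_orth = np.array([[np.cos(theta), np.sin(theta)],
                   [-np.sin(theta), np.cos(theta)]])   # orthonormal basis (rotation)
t_shear = np.array([[1.0, 0.8],
                    [0.0, 1.0]])                       # non-orthogonal basis (shear)

def recon_mse(t, x, step=1.0):
    """Quantize coefficients uniformly, invert the transform, measure MSE."""
    zq = np.round(t @ x / step) * step                 # uniform scalar quantizer
    xr = np.linalg.inv(t) @ zq                         # back to original coordinates
    return np.mean((x - xr) ** 2)

mse_orth = recon_mse(t_orth, x)    # metric preserved: roughly step^2 / 12
mse_shear = recon_mse(t_shear, x)  # inverse shear stretches the cells: larger
```

The rotation preserves the metric, so the reconstruction MSE stays near the scalar-quantizer value of step squared over 12 per component, while the inverse shear stretches the square quantization cells into parallelograms and inflates the error, matching the geometric argument for PCA over ICA.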
Fig. 6 Subjective quality comparison of reconstructed images in “Robot” sequence in similar bit-rates range: a the original video frame, b conventional method with the anchor software, and c the proposed method
5 Conclusion

We proposed an adaptive in-loop color space transform, applied to inter and intra coded residual signals, to reduce the inter-color-component correlation for efficient screen content video coding. The transform is derived by principal component analysis (PCA) using spatially or temporally neighboring samples, and the selected color space transform is then used for residual coding. The proposed method provides significantly improved coding efficiency over the reference software of the screen content video coding standard.
References

1. Kawamura, K., Yoshino, T., Naito, S.: RCE1: results of in-loop color-space transformation of residual signals, Document JCTVC-M0411. In: ISO/IEC/JTC1/SC29/WG11 and ITU-T SG16 Q.6 (2013)
2. Malvar, H., Sullivan, G.: YCoCg-R: a color space with RGB reversibility and low dynamic range, Document JVT-I014. In: ISO/IEC/JTC1/SC29/WG11 and ITU-T SG16 Q.6 (2003)
3. Zhang, L., Chen, J., Sole, J., Karczewicz, M.: In-loop color-space transform, Document JCTVC-Q0112. In: ISO/IEC/JTC1/SC29/WG11 and ITU-T SG16 Q.6 (2014)
4. Kim, H.M., Kim, W.-S., Cho, D.: A new color transform for RGB coding. In: Proceedings of IEEE International Conference on Image Processing (ICIP), pp. 107–111 (2004)
5. Marpe, D., Kirchhoffer, H., George, V., Kauff, P., Wiegand, T.: An adaptive color transform approach and its application in 4:4:4 video coding. In: Proceedings of EUSIPCO, pp. 2005–2008 (2006)
6. Khan, A., Khan, A.: Lossless colour image compression using RCT for bi-level BWCA. Signal Image Video Process. 10, 601–607 (2016)
7. Kawamura, K., Yoshino, T., Naito, S.: In-loop color-space transformation of residual signals for range extensions, Document JCTVC-L0371. In: ISO/IEC/JTC1/SC29/WG11 and ITU-T SG16 Q.6 (2013)
8. Sullivan, G., Ohm, J., Han, W.-J., Wiegand, T.: Overview of the high efficiency video coding standard. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1649–1668 (2012)
9. Joshi, R., Liu, S., Xu, J., Ye, Y.: HEVC screen content coding draft text 3, Document JCTVC-T1005. In: ISO/IEC/JTC1/SC29/WG11 and ITU-T SG16 Q.6 (2015)
10. Pu, W., Kim, W.-S., Chen, J., Rapaka, K., Guo, L., Sole, J., Karczewicz, M.: Inter color component residual prediction, Document JCTVC-N0266. In: ISO/IEC/JTC1/SC29/WG11 and ITU-T SG16 Q.6 (2013)
11. Kwon, D., Budagavi, M.: RCE3: intra motion compensation, Document JCTVC-N0205 (2013)
12. Gunes, A., Kalkan, H., Durmus, E.: Optimizing the color-to-grayscale conversion for image classification. Signal Image Video Process. 10, 853–860 (2016)
13. Sowmya, V., Govind, D., Soman, K.P.: Significance of incorporating chrominance information for effective color-to-grayscale image conversion. Signal Image Video Process. (online)
14. Sullivan, G., Topiwala, P., Luthra, A.: The H.264/AVC advanced video coding standard: overview and introduction to the fidelity range extensions. In: Proceedings of SPIE on Applications of Digital Image Processing XXVII, pp. 107–111 (2004)
15. Zhang, L., Chen, J., Sole, J., Karczewicz, M., Xiu, X., Xu, J.: Adaptive color-space transform for HEVC screen content coding. In: Data Compression Conference (2015)
16. Bordes, P., Andrivon, P.: Content-adaptive color transform for HEVC. In: Picture Coding Symposium (2013)
17. Kawamura, K., Kato, H., Naito, S.: In-loop colour-space-transform coding based on integered SVD for HEVC range extensions. In: Proceedings of Picture Coding Symposium, pp. 241–244 (2013)
18. Sullivan, G., Boyce, J.M., Chen, Y., Ohm, J.-R., Segall, C.A., Vetro, A.: Standardized extensions of high efficiency video coding (HEVC). IEEE J. Sel. Top. Signal Process. 7(6), 1001–1016 (2013)
19. Effros, M., Feng, H., Zeger, K.: Suboptimality of the Karhunen-Loeve transform for transform coding. IEEE Trans. Inf. Theory 50(8), 1605–1619 (2004)
20. Hyvärinen, A., Oja, E.: Independent component analysis: algorithms and applications. Neural Netw. 13(4–5), 411–430 (2000)
21. Buciu, I., Kotropoulos, C., Pitas, I.: Comparison of ICA approaches for facial expression recognition. Signal Image Video Process. 3, 345–361 (2009)
22. Feng, H., Effros, M.: On the rate-distortion performance and computational efficiency of the K-L transform for lossy data compression. IEEE Trans. Image Process. 11(2), 113–122 (2002)
23. Majumdar, A.: Image compression by sparse PCA coding in curvelet domain. Signal Image Video Process. 3, 27–34 (2009)
24. Joshi, R., Xu, J., Cohen, R., Liu, S., Ma, Z., Ye, Y.: Screen content coding test model 1 (SCM 1), Document JCTVC-Q1014. In: ISO/IEC/JTC1/SC29/WG11 and ITU-T SG16 Q.6 (2014)
25. Rosewarne, C., Sharman, K., Flynn, D.: Common test conditions and software reference configurations for HEVC range extensions, Document JCTVC-P1006. In: ISO/IEC/JTC1/SC29/WG11 and ITU-T SG16 Q.6 (2014)
26. Pang, C., Sole, J., Guo, L., Karczewicz, M., Joshi, R.: Intra motion compensation with 2-D MVs, Document JCTVC-N0256. In: ISO/IEC/JTC1/SC29/WG11 and ITU-T SG16 Q.6 (2013)
27. Goyal, V.K.: Theoretical foundations of transform coding. IEEE Signal Process. Mag. 18(5), 9–21 (2001)