JOURNAL OF ELECTRONICS (CHINA), Vol.25 No.6, November 2008
CODING-ORIENTED MULTI-VIEW VIDEO COLOR CORRECTION¹

Shao Feng*    Jiang Gangyi*,**    Yu Mei*,**    Chen Xiexiong***

*(Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China)
**(National Key Lab of Software New Technology, Nanjing University, Nanjing 210093, China)
***(Department of Information and Electronics Engineering, Zhejiang University, Hangzhou 310027, China)
Abstract  Color inconsistency between views is an important problem to be solved in multi-view video applications, such as free viewpoint television and other three-dimensional video systems. In this paper, a coding-oriented multi-view video color correction method, which is combined with multi-view video coding, is proposed. We first separate foreground and background in the first Group Of Pictures (GOP) by using the SKIP coding mode. Then, by transferring the means and standard deviations of the backgrounds, color correction is performed for each frame in the GOP, after which multi-view video coding is performed and used to renew the backgrounds. Experimental results show that the proposed method obtains better performance in both color correction and multi-view video coding.

Key words  Multi-view video; Color correction; Multi-view video coding; Background

CLC index  TN 919.81
DOI 10.1007/s11767-008-0017-8
¹ Manuscript received: February 1, 2008; revised: June 6, 2008. Supported by the National Natural Science Foundation of China (No.60672073, No.60872094), the Program for New Century Excellent Talents in University (NCET-06-0537), the Scientific Research Fund of Zhejiang Provincial Education Department (No.20070962), and the Natural Science Foundation of Ningbo (No.2008A610016). Communication author: Jiang Gangyi, born in 1964, male, Professor. Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China. Email: [email protected].

I. Introduction

Multi-view video is a new type of natural video media that expands the user's sensation far beyond what is offered by traditional media[1]. The user can choose his or her own viewpoint and viewing direction within a visual scene, creating an interactive free viewpoint experience. Although multi-view video can provide an exciting viewing experience, it is challenging to put it into practical applications. Usually, to provide a smooth multi-perspective viewing experience, content producers need to capture the same scene with ideal quality from multiple viewpoints. Because of the extremely large amount of data involved, transmission of multi-view video requires much larger bandwidth than traditional video. To reduce the redundancy of multi-view video, Multi-view Video Coding (MVC) has been identified as one of the most challenging issues associated with such new applications as Free Viewpoint Television (FTV)[2].

Although standard geometric calibration methods exist for calibrating arrays of cameras[3], much less attention has been paid to color correction for multiple cameras. In practical imaging, camera parameters in a multi-camera capturing system may be inconsistent, and exposure or focus may vary between views. The heterogeneous cameras may lead to global or local mismatches across different views when disparity estimation and compensation are performed at the FTV encoder. In addition, it is often impossible to capture an object under perfectly constant lighting conditions at different spatial positions within an imaging environment. These variations pose a serious challenge for the realization of FTV and degrade the performance of subsequent MVC or virtual view synthesis. In this paper, a coding-oriented multi-view video color correction method is proposed.

The rest of the paper is organized as follows. First, previous work is reviewed in Section II. Then, the proposed coding-oriented multi-view video color correction method is described in Section III and experimental results are shown in Section IV. Finally, conclusions and future work are given.
II. Previous Work
For Charge Coupled Device (CCD) or Complementary Metal Oxide Semiconductor (CMOS) camera arrays, it is difficult to guarantee color consistency across all cameras when capturing the same objects. Color inconsistency means that the same point on an object may have different colors when observed from different viewpoints. To eliminate this color inconsistency, several color correction methods have been proposed. Yamamoto, et al. proposed to obtain a color correspondence map with a color pattern board[4] or without one[5] (detecting the correspondences by scale invariant feature transform). Chen, et al. proposed to perform luminance and chrominance correction using a simplified color error model[6]. Fecker, et al. used histogram matching to compensate luminance and chrominance variations in a pre-filtering step[7]. Shao, et al. proposed a content-adaptive color correction method[8]. However, as pre-processing steps, these methods are all independent of MVC, and the influence of color inconsistency on MVC has not been studied in depth. When illumination changes occur between views, the motion vectors and disparity vectors in MVC cannot be accurately estimated, so the corresponding prediction error increases and, consequently, coding efficiency decreases. Several Illumination Compensation (IC) methods based on weighted prediction were proposed to reduce the influence of color inconsistency[9–11]. Weighted prediction already exists in H.264/AVC (Advanced Video Coding), which supports a multiplicative weighting factor and an additive offset[9]. The 2D Lloyd-Max algorithm was also used to represent the scale and offset factors efficiently[10]. Recently, a macroblock-based adaptive illumination change compensation method was proposed[11] and adopted into the Joint Multi-view Video Model (JMVM)[12] for MVC.
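To make the weighted-prediction idea behind these IC methods concrete, the following minimal Python sketch scales and offsets a reference block so that it matches the current block. The estimation by mean and standard deviation matching and the function names are illustrative assumptions, not the procedure specified in H.264/AVC or JMVM.

```python
import numpy as np

def illumination_compensated_prediction(ref_block, cur_block):
    """Toy weighted prediction: derive a multiplicative weight w and an
    additive offset o so that the compensated reference block matches the
    mean and standard deviation of the current block (an assumed, simple
    estimator; real encoders choose w and o by rate-distortion criteria)."""
    ref = ref_block.astype(np.float64)
    cur = cur_block.astype(np.float64)
    w = cur.std() / max(ref.std(), 1e-6)   # multiplicative weighting factor
    o = cur.mean() - w * ref.mean()        # additive offset
    pred = np.clip(w * ref + o, 0, 255)    # compensated prediction block
    return pred, w, o
```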
III. Coding-oriented Multi-view Video Color Correction Method

The current JMVM adopts illumination change-adaptive motion compensation and a prediction structure based on Hierarchical Bi-predictive Pictures (HBP), as shown in Fig.1[13]. The first pictures of a sequence, which are I-coded or P-coded, are called key pictures (white in Fig.1) and are coded at regular intervals. The views with key pictures are coded using a simulcast prediction structure with hierarchical B pictures, i.e., temporal prediction only. The remaining pictures in one GOP are coded with hierarchical B pictures using temporal and spatial prediction simultaneously.
Fig.1 HBP prediction structure
As is well known, a color image is the result of a complex interaction among three components: the optical properties of the scene, the illumination source, and the sensor. The Lambertian reflectance assumption is not always satisfied in real scenes; under this assumption, an illuminated region of a surface reflects the incident light equally in all directions. Furthermore, whenever RGB reflectances are measured, the measurements are valid only for the particular spectral power distribution of the light source. In addition, if multiple cameras are used, the variations between their filter sets should be compensated. Therefore, it is necessary to acquire a uniform reference surface to serve in place of the actual sample being viewed. There are nine macroblock coding modes in the JMVM codec, namely SKIP, Motion SKIP, 16×16, 16×8, 8×16, 8×8, Intra16, Intra8, and Intra4. An important feature of the SKIP coding mode is that
the motion vector is predicted from the motion of neighboring macroblocks and all quantized transform coefficients are zero. Based on this feature, a macroblock coded with the SKIP mode lies in a static region and is regarded as belonging to the background of the video. Therefore, the SKIP coding mode provides an important clue for foreground-background separation.

In order to separate the background and foreground of a view, a probabilistic model based on a Hidden Markov Model (HMM) is used. With temporal continuity constraints, once a macroblock is inferred to be in a background region, it is expected to remain within a background region for some time. The coding mode over time for one specific macroblock is therefore modeled as an HMM. Let $T_{\rm mode}$ denote one of the nine coding modes for one macroblock, $P(A\,|\,B)$ denote the conditional probability that coding mode $A$ will be selected given that coding mode $B$ has already been selected, and $\delta(\cdot)$ denote the impulse response function. The initial model parameter of the HMM is defined as

$$v = \begin{cases} 1, & \text{if } T_{\rm mode} = \text{SKIP} \\ 0, & \text{otherwise} \end{cases}$$

and the HMM between adjoining frames is expressed as

$$P\left(T^{t+k+1} \,\middle|\, T^{t+k}\right) = \delta\left(T^{t+k+1}, T^{t+k}\right) \tag{1}$$

where $t$ denotes the location of the key frame in the video sequence, and $k$ indexes the P-coded or B-coded frames in one GOP, since the SKIP coding mode only exists in these frames. Then the final coding state $P_n$ for one macroblock throughout one GOP can be written as

$$P_n = v \prod_{k} P\left(T^{t+k+1} \,\middle|\, T^{t+k}\right) \tag{2}$$

where $v$ denotes the initial coding state of the macroblock. If $P_n = 1$, the corresponding macroblock belongs to the background; otherwise, it is considered foreground.

After the initial foreground-background separation, some macroblocks in the foreground or background may be isolated. A state smoothing operation is therefore performed to obtain a continuous background contour, described as

$$P_n(i, j) = \frac{1}{M} \sum_{(i', j') \in N(i, j)} P_n(i', j') \tag{3}$$

where $N(i, j)$ denotes the set of neighborhood macroblocks of macroblock $(i, j)$, and $M$ is the total number of neighborhood macroblocks. Let $T_1$ and $T_2$ be two thresholds on the coding state. If $P_n(i, j) \geq T_1$, the current macroblock belongs to the background; if $P_n(i, j) \leq T_2$, it is considered foreground. Here the 4-neighborhood is used, and $T_1$ and $T_2$ are therefore set to 0.75 and 0.25, respectively. It should be noted that MVC and foreground-background separation are performed synchronously: the coding state in Eq.(2) is updated continuously while one GOP is being coded, and once the coding is finished, the background information for each view is available.
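The following Python sketch illustrates Eqs.(1)–(3) under stated assumptions: the per-macroblock coding modes of one GOP are assumed to be already available from the encoder, the function names are hypothetical, and macroblocks whose smoothed state falls between $T_2$ and $T_1$ (a case left unspecified above) are simply marked as undecided.

```python
import numpy as np

SKIP = "SKIP"  # one of the nine JMVM macroblock coding modes

def background_state(modes_over_gop):
    """Eqs.(1)-(2): v is 1 only if the first P-/B-coded frame uses SKIP, and
    the product of delta terms is 1 only if the mode never changes afterwards,
    so P_n = 1 exactly when the macroblock stays in SKIP mode for the whole GOP.
    `modes_over_gop` is an array of mode strings, shape (frames, rows, cols)."""
    skip = (np.asarray(modes_over_gop) == SKIP)
    v = skip[0].astype(float)                    # initial coding state v
    unchanged = skip.all(axis=0).astype(float)   # product of delta(T^{t+k+1}, T^{t+k}) with v
    return v * unchanged                         # P_n per macroblock

def smooth_state(Pn, T1=0.75, T2=0.25):
    """Eq.(3): average P_n over the 4-neighborhood (M = number of valid
    neighbors), then threshold into background / foreground labels."""
    rows, cols = Pn.shape
    smoothed = np.zeros_like(Pn, dtype=float)
    labels = np.empty(Pn.shape, dtype=object)
    for i in range(rows):
        for j in range(cols):
            nbrs = [Pn[r, c]
                    for r, c in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= r < rows and 0 <= c < cols]
            smoothed[i, j] = sum(nbrs) / len(nbrs)
            if smoothed[i, j] >= T1:
                labels[i, j] = "background"
            elif smoothed[i, j] <= T2:
                labels[i, j] = "foreground"
            else:
                labels[i, j] = "undecided"   # not specified by Eqs.(1)-(3)
    return smoothed, labels
```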
Since color correction is a pre-processing operation carried out before coding, the backgrounds obtained in the previous GOP are used as the uniform reference planes. The mean $\mu_i$ and standard deviation $\sigma_i$ of the background in one view are expressed as

$$\mu_i = \frac{\displaystyle\sum_{(x,y) \in B_{l-1}} I_i(x, y)}{\displaystyle\sum_{(x,y) \in B_{l-1}} 1} \tag{4}$$

$$\sigma_i = \sqrt{\frac{\displaystyle\sum_{(x,y) \in B_{l-1}} \bigl(I_i(x, y) - \mu_i\bigr)^2}{\displaystyle\sum_{(x,y) \in B_{l-1}} 1}} \tag{5}$$
where $l$ denotes the GOP index and $B_{l-1}$ denotes the background of the previous GOP. By transferring the means and standard deviations of the backgrounds from the reference view to the current view, the corrected result for each frame of the current view is given by[14]

$$I_i^{\rm corr}(x, y) = \frac{\sigma_i^{\rm ref}}{\sigma_i^{\rm cur}} \bigl(I_i^{\rm cur}(x, y) - \mu_i^{\rm cur}\bigr) + \mu_i^{\rm ref} \tag{6}$$
where $\mu_i^{\rm ref}$ and $\mu_i^{\rm cur}$ are the background means of the reference view and the current view, respectively, and $\sigma_i^{\rm ref}$ and $\sigma_i^{\rm cur}$ are the corresponding background standard deviations. $I_i^{\rm cur}(x, y)$ is the color value of the current view at $(x, y)$ and $I_i^{\rm corr}(x, y)$ is its corrected color value.

The processing flow of the proposed coding-oriented multi-view video color correction method is illustrated in Fig.2 and can be carried out by the following steps:

Step 1  For the first GOP of the multi-view video, MVC and foreground-background separation are performed.

Step 2  For the next GOP, using the backgrounds of the previous GOP as the uniform reference planes, color correction is first performed, and then MVC and foreground-background separation are performed.

Step 3  Repeat the same operations as in Step 2 until the coding is finished.
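As an illustration of the correction stage of Steps 1–3, the sketch below applies Eqs.(4)–(6) to a single color channel, assuming the background masks of the previous GOP are already available from the foreground-background separation; the function and parameter names are hypothetical.

```python
import numpy as np

def background_stats(channel, bg_mask):
    """Eqs.(4)-(5): mean and standard deviation computed over the background
    pixels B_{l-1} of a single color channel (H x W array, boolean mask)."""
    pixels = channel[bg_mask].astype(np.float64)
    return pixels.mean(), pixels.std()

def correct_channel(channel, mu_ref, sigma_ref, mu_cur, sigma_cur):
    """Eq.(6): transfer the reference view's background mean/std to the
    current view's channel."""
    shifted = (sigma_ref / max(sigma_cur, 1e-6)
               * (channel.astype(np.float64) - mu_cur) + mu_ref)
    return np.clip(shifted, 0, 255).astype(np.uint8)

def correct_gop(cur_gop, ref_bg_channel, ref_bg_mask, cur_bg_channel, cur_bg_mask):
    """Correct one channel of every frame in the current view's GOP, using
    background statistics taken from the previous GOP of both views."""
    mu_ref, sigma_ref = background_stats(ref_bg_channel, ref_bg_mask)
    mu_cur, sigma_cur = background_stats(cur_bg_channel, cur_bg_mask)
    return [correct_channel(frame, mu_ref, sigma_ref, mu_cur, sigma_cur)
            for frame in cur_gop]
```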
Fig.2 The processing flow of the proposed method

IV. Experimental Results and Analyses

Two representative multi-view video sequences, ‘flamenco1’ and ‘golf2’ (320×240, 4:2:0 YUV format, 8 viewpoints)[15], are selected to evaluate the performance of the proposed method. The method is implemented on JMVM 4.0, the MVC reference software used at the MPEG/JVT meetings. The main coding parameters are set as follows: four quantization strengths (baseQP = 22, 27, 32, 37) are used, the temporal GOP size is 15, the total number of encoded frames for each view is 620, and the IC function is switched on (IC-On) or off (IC-Off) in the configuration file of the JMVM codec. The first view, in which the I-coded frames are located, is regarded as the reference view, and the other views are regarded as current views.

The first experiment tests the performance of the proposed foreground-background separation method. Fig.3 shows the separation results for the original first and second viewpoint images of ‘flamenco1’ and ‘golf2’ at baseQP = 22; the regions marked with cross boxes denote the background. From the figure, it is clear that the moving objects can be effectively excluded from the background. The foreground-background separation depends, however, on baseQP: with higher quantization strength, more quantized transform coefficients become zero and the SKIP coding mode is selected more often.

Fig.3 Foreground-background separation results of ‘flamenco1’ and ‘golf2’ at baseQP = 22
In the second experiment, we compare the rate-distortion performances of the JMVM codec with IC-On, the JMVM codec with IC-Off, and the proposed method, as shown in Figs.4 and 5. For ‘flamenco1’, under the same bit rate, the JMVM codec with IC-On achieves about a 0.1 dB Peak Signal-to-Noise Ratio (PSNR) gain on the luminance component Y compared with the JMVM codec with IC-Off, and almost identical performance on the chrominance components U and V, whereas the proposed method achieves 0.2~0.3 dB and 0.25 dB PSNR gains on the U and V components, respectively. For ‘golf2’, even though the coding performances are almost identical for the Y component, 0.6~0.7 dB and 0.3 dB PSNR gains are achieved with the proposed method for the U and V components, respectively. Evidently, IC mainly improves the coding performance of the Y component, while the proposed method achieves higher coding performance on the U and V components simultaneously.

We then show the decoded images of ‘flamenco1’ at the 570-th frame and of ‘golf2’ at the 260-th frame. Figs.6(a) and 7(a) show the decoded first viewpoint images. Figs.6(b) and 7(b) show the decoded second viewpoint images of the JMVM codec with IC-Off. Figs.6(c) and 7(c) show the decoded second viewpoint images of the JMVM
codec with IC-On. Figs.6(d) and 7(d) show the decoded second viewpoint images of the proposed method. It is clear that IC only removes the mismatch in motion vector and disparity vector estimation, while the reconstructed images remain color-inconsistent. This color inconsistency is not conducive to the subsequent virtual view synthesis or 3D display in FTV.
Fig.4 Rate-distortion performance comparison results of ‘flamenco1’

Fig.5 Rate-distortion performance comparison results of ‘golf2’
Finally, in order to objectively evaluate the color consistency between decoded images, color differences are calculated. The Scale Invariant Feature Transform (SIFT) is first applied to extract matching keypoints between two images, and then the CIEDE2000 color difference between each pair of matching keypoints is calculated. Fig.8 shows the color difference comparison results for the first and second decoded viewpoint images in Figs.6 and 7. The average color differences of the three coding methods are 11.7854, 11.8056, and 5.9127 for ‘flamenco1’, and 6.6762, 6.8638, and 4.3612 for ‘golf2’, respectively. Clearly, the proposed method achieves the lowest overall color difference, which is consistent with the subjective color appearance in Figs.6 and 7.
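A minimal Python sketch of this evaluation procedure is given below; the use of OpenCV's SIFT, brute-force cross-check matching, and scikit-image's CIEDE2000 implementation are assumptions made for illustration, not the authors' exact tooling.

```python
import cv2
import numpy as np
from skimage.color import rgb2lab, deltaE_ciede2000

def average_color_difference(img_a, img_b):
    """Match SIFT keypoints between two decoded views, then average the
    CIEDE2000 difference of the colors at the matched pixel positions.
    `img_a` and `img_b` are RGB uint8 images of the same scene."""
    sift = cv2.SIFT_create()
    gray_a = cv2.cvtColor(img_a, cv2.COLOR_RGB2GRAY)
    gray_b = cv2.cvtColor(img_b, cv2.COLOR_RGB2GRAY)
    kp_a, des_a = sift.detectAndCompute(gray_a, None)
    kp_b, des_b = sift.detectAndCompute(gray_b, None)
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des_a, des_b)

    lab_a = rgb2lab(img_a / 255.0)   # convert to CIELAB for CIEDE2000
    lab_b = rgb2lab(img_b / 255.0)
    diffs = []
    for m in matches:
        xa, ya = map(int, kp_a[m.queryIdx].pt)
        xb, yb = map(int, kp_b[m.trainIdx].pt)
        diffs.append(deltaE_ciede2000(lab_a[ya, xa], lab_b[yb, xb]))
    return float(np.mean(diffs))
```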
Fig.6 Decoded images of ‘flamenco1’ at baseQP = 22

Fig.7 Decoded images of ‘golf2’ at baseQP = 22

Fig.8 Color difference comparison results of the three coding methods for ‘flamenco1’ and ‘golf2’
V. Conclusions

Color correction is an important issue for MVC and virtual view synthesis in FTV. In this paper, a coding-oriented multi-view video color correction method is proposed: the SKIP coding mode is used to separate foreground and background, and color correction is performed before coding. Experimental results show the effectiveness of the proposed method in both color correction and MVC. However, the SKIP coding mode is dependent on baseQP. In future work, we will further research how to introduce other clues into the foreground-background separation. Besides, virtual view synthesis between decoded images is also worth studying.
References

[1] Jiang-Guang Lou, Hua Cai, and Jiang Li. A real-time interactive multi-view video system. The 13th ACM International Conference on Multimedia, Singapore, November 2005, 161–170.
[2] JVT of ISO/IEC MPEG & ITU-T VCEG. Prediction structure for free viewpoint TV. Doc. JVT-V097, Marrakech, Morocco, January 2007.
[3] Z. Y. Zhang. A flexible new technique for camera calibration. IEEE Trans. on Pattern Analysis and Machine Intelligence, 22(2000)11, 1330–1334.
[4] K. Yamamoto, T. Yendo, T. Fujii, et al. Colour correction for multiple-camera system by using correspondences. The Journal of the Institute of Image Information and Television Engineers, 61(2007)2, 213–222.
[5] K. Yamamoto, M. Kitahara, H. Kimata, et al. Multiview video coding using view interpolation and color correction. IEEE Trans. on Circuits and Systems for Video Technology, 17(2007)11, 1436–1449.
[6] Y. Chen, J. Chen, and C. Cai. Luminance and chrominance correction for multi-view video using simplified color error model. In Proc. Picture Coding Symposium, Beijing, China, April 2006.
[7] U. Fecker, M. Barkowsky, and A. Kaup. Improving the prediction efficiency for multi-view video using histogram matching. In Proc. Picture Coding Symposium, Beijing, China, April 2006.
[8] Feng Shao, Gangyi Jiang, Mei Yu, et al. A content-adaptive multi-view video color correction algorithm. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, Honolulu, Hawaii, April 2007, 969–972.
[9] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, et al. Overview of the H.264/AVC video coding standard. IEEE Trans. on Circuits and Systems for Video Technology, 13(2003)7, 560–576.
[10] MPEG of ISO/IEC JTC1/SC29/WG11. Results on CE2 using IBDE for multi-view video coding. Doc. M13543, Klagenfurt, Austria, July 2006.
[11] J. H. Hur, S. Cho, and Y. L. Lee. Adaptive local illumination change compensation method for H.264/AVC-based multi-view video coding. IEEE Trans. on Circuits and Systems for Video Technology, 17(2007)11, 1496–1505.
[12] JVT of ISO/IEC JTC1/SC29/WG11. AHG Report: JMVM & JD text editing. Doc. JVT-W013, San Jose, USA, April 2007.
[13] H. Schwarz, D. Marpe, and T. Wiegand. Analysis of hierarchical B pictures and MCTF. In Proc. IEEE International Conference on Multimedia and Expo, Toronto, Canada, July 2006, 1929–1932.
[14] Y. Chang, S. Saito, and M. Nakajima. Example-based color transformation of image and video using basic color categories. IEEE Trans. on Image Processing, 16(2007)2, 329–336.
[15] MPEG of ISO/IEC JTC1/SC29/WG11. KDDI multi-view video sequences for MPEG 3DAV use. Doc. M10533, Munich, Germany, March 2004.