Multimed Tools Appl DOI 10.1007/s11042-013-1791-3
Fast 3D face reconstruction based on uncalibrated photometric stereo Yujuan Sun & Junyu Dong & Muwei Jian & Lin Qi
# Springer Science+Business Media New York 2013
Abstract This paper proposes a fast algorithm for three-dimensional face reconstruction using uncalibrated photometric stereo. With a reference face model, lighting parameters are estimated from input face images lit by unknown illumination; these parameters can then be used in classical photometric stereo to estimate the surface normals and albedo. The estimated results are used in turn to refine the lighting parameters until an optimal estimate of the surface normals is achieved. Unlike traditional optimization algorithms, the iteration used in this paper is a unified process and thus yields accurate lighting estimation. The proposed method relaxes the lighting constraints and simplifies the image acquisition procedure. Reconstruction results on the YaleB and BU3D databases show the effectiveness of our method.

Keywords Photometric stereo . Face surface normal . Face albedo . Lambertian model
Y. Sun · J. Dong (*) · L. Qi
Department of Computer Science and Technology, Ocean University of China, Qingdao, China
e-mail: [email protected]

Y. Sun
Department of Information and Electrical Engineering, Ludong University, Yantai, China

M. Jian
Centre for Signal Processing, Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Kowloon, Hong Kong

1 Introduction

Three-dimensional (3D) face reconstruction is one of the most active research fields in computer vision. The geometry and reflectance characteristics of a reconstructed human face are illumination invariant. With specularity and shading removed, the reconstructed 3D results are robust features of human faces that can be used in practical applications [29]. However, classical photometric stereo (PS) imposes many constraints when reconstructing the 3D shape and albedo of an object. For example, refs. [18, 27] reconstructed the 3D shape of an object from several input images captured under a fixed camera with accurately recorded lighting directions. Recording the positions of light sources cannot always be achieved as easily as in a laboratory; in a conference room or a restaurant, for example, the lighting conditions are complex. Refs. [3, 8] overcame this drawback by using matrix decomposition and an optimization algorithm to estimate the surface normals of an object based on uncalibrated PS. However, the running time of such optimization algorithms is much longer than that of PS.

In this paper, a fast 3D face reconstruction method based on uncalibrated PS is presented. First, PS is used to calculate face surface normals and albedo to build a reference face model, which is used to estimate the lighting parameters of the input images. Then PS is combined with an optimization algorithm to iteratively reconstruct the 3D shape from input face images with unknown lighting parameters. Our iteration method differs from the commonly used optimization algorithms [3, 8, 13], which alternately optimize the light, shape and albedo of the input image. In [3, 8, 13], the lighting, albedo and shape parameters must all be estimated iteratively through the optimization process, so the computational cost is high. In our approach, iteration is performed only to estimate the lighting parameters, by solving an over-constrained system of linear equations, which simplifies the computation. Then, without going through another optimization process to estimate the surface normals as in [13], our method obtains the normals immediately by inverting the lighting matrix. Since there are only a small number of lighting parameters (compared with those of the albedo or shape), the computational complexity is very low. To handle the ill-posedness of the optimization, the reference face model is used as the initial value. According to the characteristics of face shape and albedo, smoothness and symmetry constraints are added to our objective function. By discarding the reference face model and using the newly estimated one to refine the lighting parameters, the reconstructed 3D face shape can be further refined. Experiments on the YaleB and BU3D databases verify the effectiveness of the proposed method.
The paper is organized as follows. Section 2 discusses related work. Section 3 introduces the proposed fast algorithm for reconstructing face surface normals and 3D shape. Experiments and results are presented in Section 4. The last section concludes this paper.
2 Related work

Three-dimensional reconstruction has been studied for a long time in computer vision, and PS is considered an accurate method of 3D reconstruction from images. However, many problems in this field remain unsolved: reconstructing the 3D shape of an object still bears many constraints, and high-precision equipment is still needed to reconstruct accurate 3D shapes [20, 23, 31]. In recent years, PS [15, 24, 27] has been extended to uncalibrated PS [3, 8, 13], example-based PS [10, 28] and learning-based PS [6, 16, 21]. In uncalibrated PS, ref. [3] presented an effective method to resolve the generalized bas-relief (GBR) ambiguity [4] by combining Singular Value Decomposition (SVD) and an optimization algorithm; the accuracy of the reconstructed 3D shape relied on many input images. Ref. [8] transformed the GBR ambiguity from the surface normals to the lighting parameters: four input images under unknown lighting were used to estimate the low-order spherical harmonics of the illumination via Schur decomposition, and the GBR ambiguity in the matrix decomposition was resolved by an optimization algorithm to obtain accurate face surface normals and albedo. The authors of [13] proposed a novel method for 3D face reconstruction that used only a single input image, matched to a 3D reference model by rotation and alignment; accurate results were obtained, but the computational complexity is also large. Almost all optimization-based PS algorithms need to iteratively calculate the surface normal and albedo per pixel, so real-time 3D reconstruction was difficult to achieve.
In example-based PS, e.g. Woodham [28], a reference object with known shape and color was imaged together with the target object. The key idea was to determine the surface normal at a point p of the target object by finding a point on the reference object that reflected the incident illumination in the same way as p. In [10], many reference objects with known geometry (small balls) and materials and color similar to the object were imaged together with the object under the same lighting conditions, and the surface normals of the object were estimated from the shapes of the reference objects using feature mapping. Although this method does not need to consider the ambiguity problem, it requires many reference balls with similar materials. In learning-based algorithms, common information is shared by the 3D shape and 2D image subspaces through coupled training sets that combine pairs of 2D face images and corresponding 3D shapes. Refs. [6, 16, 21, 25] presented coupled statistical models for face shape recovery from brightness images that ignored the variance of the lighting conditions. These algorithms are therefore sensitive to changes of illumination and perform poorly in practice when the illumination varies widely. In [1], an original evolutionary-based method was proposed for 3D reconstruction from an uncalibrated stereovision system composed of five cameras located on an arc of a circle around the object; three-dimensional coordinates were obtained directly by jointly computing the transformation matrix between two consecutive images. Ref. [11] presented a 2D-to-3D conversion scheme that reconstructs a 3D human model from a single depth image and several color images; this method can deal with the self-occlusion problem. Among the existing literature, there are several papers on fast 3D shape reconstruction. Ref. [17] achieved fast capture of a surface view under various lighting conditions by controlling 16 lights synchronized with a high-speed video camera, exploiting graphics hardware to compute the surface normals. Ref. [26] used controlled set-ups to achieve reflectance transformation for theatrical performances; the lighting set-ups were computer-controlled and could achieve fast 3D reconstruction. Nevertheless, auxiliary equipment was required to control the variation of the light sources.
3 Reconstruction of the reference face model

We assume orthographic projection with the projection axis pointing toward the viewer, so the surface of the object is projected onto the 2D image plane. A set of face images can be obtained by varying the direction of a light source, and the reference face model can then be estimated from the captured images. PS [27] is used to estimate the reference face model under the assumption of Lambertian reflection:

  I_i(x, y) = ρ_j (n_j · l_i)    (1)

where I_i is the ith captured image, an m×n matrix; (x, y) is the Cartesian coordinate of a pixel in the face image; and ρ_j and n_j are the face albedo and surface normal at that pixel. To simplify notation, let j denote the pixel location of the two-dimensional coordinate (x, y), computed as j = (x−1)·m + y. A point light source is assumed to be infinitely distant from the face, so the lighting parameters are the same for all pixels of a captured image; l_i denotes the lighting parameter of the ith input face image. Since each face image contains Ω (Ω = m×n) pixels in total, we scan the input image either horizontally or vertically and concatenate the pixels into a row vector of dimension 1×Ω. Let I_si denote this scanned row vector, where 1 ≤ i ≤ r. The Lambertian model for multiple images can then be expressed as:

  I = [I_s1; I_s2; …; I_sr] = L · (ρn)    (2)

where I is the r×Ω matrix whose ith row is I_si; L is the r×3 lighting matrix whose ith row is l_i = [l_i1, l_i2, l_i3], the lighting parameter of the ith input image; and ρn is the intrinsic feature of the human face, the 3×Ω matrix whose jth column is ρ_j n_j. Each column of n is the unit surface normal of one pixel:

  n_j = [n_jx, n_jy, n_jz]ᵀ = [−p_j, −q_j, 1]ᵀ / √(p_j² + q_j² + 1)

where p_j and q_j are the partial derivatives of the surface height in the x and y directions respectively. Because the light source is controllable in PS, at least three non-coplanar lighting directions can be set, so the rank of the matrix L in Eq. (2) is not less than 3 and ρn has a linear solution. The intrinsic feature of the human face is estimated according to Eq. (3), and the face albedo and surface normal are computed according to Eq. (4):

  ρn = L⁻¹ I    (3)

  ρ = √((ρn_x)² + (ρn_y)² + (ρn_z)²),  n = ρn / ρ    (4)

where L⁻¹ denotes the pseudo-inverse when r > 3.
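As a concrete illustration of Eqs. (2)-(4), the following minimal numpy sketch (our own illustrative code, not the authors'; the function and variable names are assumptions) recovers per-pixel albedo and unit normals from images stacked as rows, assuming noise-free Lambertian shading and no shadows:

```python
import numpy as np

def photometric_stereo(I, L):
    """Classical PS, Eqs. (2)-(4): solve I = L (rho n) for albedo and normals.

    I: (r, omega) matrix of row-scanned images
    L: (r, 3) lighting matrix, one direction per row (rank 3 required)
    Returns rho (omega,) and unit normals n (3, omega).
    """
    # Eq. (3): rho*n = L^{-1} I; least squares also covers r > 3 lights.
    rho_n, *_ = np.linalg.lstsq(L, I, rcond=None)
    # Eq. (4): the albedo is the length of each column, the normal its direction.
    rho = np.linalg.norm(rho_n, axis=0)
    n = rho_n / np.where(rho > 0.0, rho, 1.0)
    return rho, n

# Round-trip check on synthetic data: render with known rho and n, then recover.
rng = np.random.default_rng(0)
L = np.array([[0.0, 0.0, 1.0], [0.5, 0.0, 1.0],
              [0.0, 0.5, 1.0], [0.3, 0.3, 1.0]])   # 4 non-coplanar lights
n_true = rng.normal(size=(3, 50))
n_true[2] = np.abs(n_true[2]) + 1.0                # normals face the camera
n_true /= np.linalg.norm(n_true, axis=0)
rho_true = rng.uniform(0.2, 1.0, size=50)
I = L @ (rho_true * n_true)                        # Eq. (2)
rho_est, n_est = photometric_stereo(I, L)
```

With noise-free data the recovery is exact up to floating point; with real images the least-squares solve simply returns the best fit.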
To make the model more general, the average over different faces is used as the reference face model, computed according to:

  ρ_ref = (1/n) Σ_{i=1}^{n} √((ρ_i n_ix)² + (ρ_i n_iy)² + (ρ_i n_iz)²),  n_ref = (1/n) Σ_{i=1}^{n} ρ_i n_i / ρ_i    (5)

where ρ_ref and n_ref are the reference face albedo and surface normal respectively; ρ_i and n_i are computed according to Eq. (4); and i (i = 1, 2, …, n; n = 10) indexes the ten randomly selected face samples from YaleB or BU3D. Figure 1 shows the estimated reference face model: from left to right, the mean face albedo, mean face surface normal and mean face 3D shape.

Fig. 1 The reference face model
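The averaging of Eq. (5) can be sketched as follows (our own illustrative code, not the authors'; note that since the n_i are unit vectors, the ρ_ref term reduces to the mean albedo, and we additionally re-normalize the averaged normals, a step the equation leaves implicit):

```python
import numpy as np

def reference_face(albedos, normals):
    """Build the reference model of Eq. (5) from k sample faces.

    albedos: (k, omega) per-pixel albedo of each sample face
    normals: (k, 3, omega) unit surface normals of each sample face
    """
    rho_ref = albedos.mean(axis=0)    # mean of ||rho_i n_i|| = mean albedo
    n_ref = normals.mean(axis=0)      # mean of rho_i n_i / rho_i = mean normal
    lengths = np.linalg.norm(n_ref, axis=0, keepdims=True)
    n_ref = n_ref / np.where(lengths > 0.0, lengths, 1.0)  # keep unit length
    return rho_ref, n_ref

# Example: ten identical flat faces give back the same model.
alb = np.full((10, 6), 0.7)
nrm = np.zeros((10, 3, 6))
nrm[:, 2, :] = 1.0                    # all normals along +z
rho_ref, n_ref = reference_face(alb, nrm)
```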
3.1 Illumination calibration for input face images

In a global sense, human faces are similar: even different people share characteristics such as the eyes, nose, mouth and ears, and the positions and scales of these features are roughly the same. By exploiting this similarity, the illumination parameters of input face images can be estimated using the reference face model. The surface reflectance and illumination conditions are simulated by the Lambertian model with an ambient term, which corresponds to the lighting on a cloudy day or a constant background illumination. Based on the rough similarity of human faces, the albedo and surface normals of the reference model are substituted for those of the new input face. The Lambertian model is then modified as:

  I_j(i, n_j) = ρ_jref (n_jref · l_i) + a_i    (6)

where I_j(i, n_j) is the jth pixel intensity of the ith input image; n_j is the jth surface normal of the new face; ρ_jref and n_jref are the jth albedo and surface normal of the reference face model respectively; and a_i is the ambient term, assumed independent of n_j. The image intensities I(i, n) can then be written as a matrix product:

  I(i, n) = L_i · B_i    (7)

where L_i = [l_i1, l_i2, l_i3, a_i]; I(i, n) is a 1×Ω matrix; and B_i is the 4×Ω matrix whose jth column is

  [−ρ_jref p_jref, −ρ_jref q_jref, ρ_jref]ᵀ / √(p_jref² + q_jref² + 1)

stacked over a constant 1, i.e. the first three entries of column j are the albedo-scaled reference surface normal and the fourth entry is 1. Here p_jref and q_jref (j = 1, 2, …, Ω) are the partial derivatives of the surface height in the x and y directions respectively. All Ω pixels of the ith face image satisfy Eq. (7), so the light L_i can be obtained by solving the over-constrained linear equations [14], [10].

3.2 3D face reconstruction based on uncalibrated PS

After calibrating the lighting parameters, PS can be used to reconstruct the 3D shape of the new face. Assume the number of input images is r (r ≥ 3), captured under different illumination conditions with at least three mutually independent illumination parameters. All the image intensity data can then be written as in [12]:
  I = A N L    (8)

where I is the Ω×r image matrix: each column of I holds one image captured under a certain illumination direction, and each row holds the intensity values of one pixel location under the different illumination directions. A = diag(ρ_1, ρ_2, …, ρ_Ω) is the surface albedo matrix, which represents the reflectance characteristics of the new human face. N is the Ω×4 face surface normal matrix whose jth row is n_j = [n_jx, n_jy, n_jz, 1], the components of the jth unit surface normal in the 3D Cartesian coordinate system. L is the 4×r lighting matrix whose ith column is [l_i1, l_i2, l_i3, a_i]ᵀ, since each of the r (r ≥ 3) input images has its own lighting parameter. The intrinsic face features can be estimated according to:

  AN = I L⁻¹    (9)

Then the new face albedo and surface normals (the new face model) can be calculated according to:

  ρ_new = √((AN_x)² + (AN_y)² + (AN_z)²),  n_new = AN / ρ_new    (10)

where AN_x, AN_y and AN_z denote the first three columns of AN.
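Sections 3.1-3.2 amount to two linear solves. The sketch below is our own illustrative code with hypothetical names, not the authors' implementation: it first estimates each image's light [l_i1, l_i2, l_i3, a_i] from the reference model via the over-constrained system of Eq. (7), then recovers the new face model via Eqs. (9)-(10). The synthetic round trip uses the true model as the "reference", which also pins down the GBR ambiguity, so the recovery is exact; with a real reference face the estimates are only approximate and are refined iteratively as in Section 3.3.

```python
import numpy as np

def estimate_light(I_row, B_ref):
    """Eq. (7): least-squares solve of I = L_i B for one image.

    I_row: (omega,) row-scanned input image
    B_ref: (4, omega) reference matrix; column j is the albedo-scaled
           reference normal [rho_j n_j] stacked over a constant 1.
    Returns L_i = [l_i1, l_i2, l_i3, a_i].
    """
    L_i, *_ = np.linalg.lstsq(B_ref.T, I_row, rcond=None)
    return L_i

def reconstruct(I_cols, L):
    """Eqs. (9)-(10): AN = I L^{-1}, then split albedo from normals.

    I_cols: (omega, r) image matrix, one image per column
    L:      (4, r) lighting matrix, column i = [l_i1, l_i2, l_i3, a_i]^T
    """
    AN = I_cols @ np.linalg.pinv(L)      # pseudo-inverse also covers r > 4
    rho = np.linalg.norm(AN[:, :3], axis=1)
    n = AN[:, :3] / np.where(rho[:, None] > 0.0, rho[:, None], 1.0)
    return rho, n

# Synthetic round trip with r = 4 images, following Eq. (7)'s convention
# (fourth row of B is 1, i.e. an unscaled ambient term).
rng = np.random.default_rng(1)
omega = 40
n_true = rng.normal(size=(omega, 3))
n_true[:, 2] = np.abs(n_true[:, 2]) + 1.0
n_true /= np.linalg.norm(n_true, axis=1, keepdims=True)
rho_true = rng.uniform(0.2, 1.0, size=omega)
B_true = np.vstack([rho_true * n_true.T, np.ones(omega)])   # (4, omega)
L_true = np.array([[0.1, 0.6, 0.2, 0.3],
                   [0.2, 0.1, 0.7, 0.3],
                   [0.9, 0.7, 0.6, 0.8],
                   [0.1, 0.2, 0.1, 0.4]])                   # 4 x 4, invertible
I_cols = B_true.T @ L_true                                  # (omega, 4) images
L_est = np.column_stack([estimate_light(I_cols[:, i], B_true)
                         for i in range(4)])
rho_est, n_est = reconstruct(I_cols, L_est)
```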
3.3 Refining the reconstructed results

Because the reference face model is used to estimate the lighting conditions of the input images, the estimated lighting parameters contain errors, which affect the accuracy of the reconstructed 3D shape of the new face. We therefore substitute the estimated new face model (ρ_new and n_new) for the reference face model (ρ_ref and n_ref) and estimate the lighting parameters iteratively. The detailed flowchart is shown in Fig. 2.
Fig. 2 Refined flowchart for reconstructing the 3D face shape: the reference face model (built from the database) and the input face images are used to estimate the lighting parameters; the albedo and surface normals of the new face are then computed; the new face model, with added constraints, replaces the reference model, and the loop repeats until the termination conditions are met
Input face images The input images must be aligned, frontal-view human faces with a scale similar to that of the reference face model. The optical flow algorithm [5] can be used to achieve finer alignment and scale matching.

The objective function and constraints A large deviation between the reference face model and the new face model produces inaccurate estimates of the lighting parameters, and the deviation of the lighting parameters in turn generates errors in the estimated face normals. This kind of error can be adjusted with a symmetry constraint, which, however, also introduces noise into the face normals and albedo. Hence a smoothness constraint is added after the new face model is reconstructed. The objective energy function is built as:

  E = ∬_{(x,y)∈Ω} (I(x,y) − ρ_new(x,y) n_new(x,y) · L_est)² dx dy + ∬_{(x,y)∈Ω} ψ(ρ_new(x,y), n_new(x,y)) dx dy    (11)

where the first term is the data constraint; I is the input image matrix; and L_est is the estimated lighting matrix. The second term ψ(ρ_new, n_new) is the symmetry constraint, defined as:

  ψ(ρ_new, n_new) = λ1 (ρ_new(x,y) − ρ_new(−x,y))² + λ2 ((n_newx(x,y) + n_newx(−x,y))² + (n_newy(x,y) − n_newy(−x,y))² + (n_newz(x,y) − n_newz(−x,y))²)    (12)

where (x, y) and (−x, y) are symmetric about the longitudinal center line of the face (the x-component of the normal changes sign under mirror symmetry, hence the plus sign in its term), and λ1 and λ2 are the symmetry weights for the face albedo and surface normal respectively. Because the symmetry of the face shape is more stable than that of the albedo, λ1 and λ2 are set to 0.15 and 0.3 respectively in our experiments. The new face model (ρ_new and n_new) is then updated according to:
  ρ_new⁽ⁱ⁺¹⁾(x,y) = (1−λ1) ρ_new⁽ⁱ⁾(x,y) + λ1 (ρ_new⁽ⁱ⁾(x,y) − ρ_new⁽ⁱ⁾(−x,y))
  n_newx⁽ⁱ⁺¹⁾(x,y) = (1−λ2) n_newx⁽ⁱ⁾(x,y) + λ2 (n_newx⁽ⁱ⁾(x,y) + n_newx⁽ⁱ⁾(−x,y))    (13)
  n_newy⁽ⁱ⁺¹⁾(x,y) = (1−λ2) n_newy⁽ⁱ⁾(x,y) + λ2 (n_newy⁽ⁱ⁾(x,y) − n_newy⁽ⁱ⁾(−x,y))
  n_newz⁽ⁱ⁺¹⁾(x,y) = (1−λ2) n_newz⁽ⁱ⁾(x,y) + λ2 (n_newz⁽ⁱ⁾(x,y) − n_newz⁽ⁱ⁾(−x,y))

where i is the iteration number. The smoothness constraint (Eqs. (14) and (15)) is applied after the new face model is reconstructed:

  ρ_new(x,y) = ∬_{(α,β)∈Ω(δ)} ρ_new(α,β) · (1/(√(2π) δ)) e^{−((x−α)² + (y−β)²)/(2δ²)} dα dβ    (14)

  n_newζ(x,y) = ∬_{(α,β)∈Ω(δ)} n_newζ(α,β) · (1/(√(2π) δ)) e^{−((x−α)² + (y−β)²)/(2δ²)} dα dβ,  ζ ∈ {x, y, z}    (15)
In Eqs. (14) and (15), (α, β) is the Cartesian coordinate of a pixel inside the Gaussian window; ζ indexes the x, y and z components of the surface normal; the variable δ controls the size of the Gaussian convolution kernel; and Ω(δ) is the support of the kernel. Because of the large deviation between the reference and the new face model at the first iteration, a larger Gaussian kernel is adopted: we set the maximum kernel width to w (w = 5 in the proposed method) and δ = w/iter, where iter is the iteration number, so δ gradually shrinks as the iterations proceed. As the flowchart in Fig. 2 shows, the imposed constraints simply make the reference face model more general for accurate estimation of the lighting parameters and do not affect the accuracy of the reconstructed new face albedo and surface normals.

Optimization Our iterative method differs from the optimization algorithms used in [3, 8, 13]. We only estimate the lighting parameters (four parameters per input image, e.g. 12 parameters for three input images) by linear programming, after which the albedo and surface normals of the input images are obtained directly. The maximum number of iterations is set to 6; the algorithm terminates when the value of Eq. (11) no longer decreases or the number of iterations exceeds the maximum. Our experiments on the YaleB and BU3D databases show that stable results can be obtained within 2-3 iterations, and the computational complexity is linear in that of the classical PS algorithm. We tested our algorithm on a desktop computer (Intel(R) Core(TM) Quad CPU Q9550 @ 2.83 GHz and 3 GB RAM). The average reconstruction times for the face surface normals are 0.310 s and 0.443 s on the YaleB and BU3D databases respectively, whereas the approach in [13] requires 9 s to reconstruct a human face in YaleB.
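The shrinking-kernel smoothing and the termination rule described above can be sketched as follows (our own illustrative code, not the authors'; it assumes scipy is available for the Gaussian filtering of Eqs. (14)-(15)):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

W, MAX_ITER = 5.0, 6   # maximum kernel width and iteration cap from the paper

def smooth_model(albedo, normals, iteration, w=W):
    """Eqs. (14)-(15): Gaussian-smooth the albedo and each normal component
    with delta = w / iteration, so the kernel shrinks as iterations proceed."""
    delta = w / iteration
    albedo_s = gaussian_filter(albedo, sigma=delta)
    normals_s = np.stack([gaussian_filter(normals[..., k], sigma=delta)
                          for k in range(3)], axis=-1)
    lengths = np.linalg.norm(normals_s, axis=-1, keepdims=True)
    return albedo_s, normals_s / np.where(lengths > 0.0, lengths, 1.0)

def should_stop(energies, iteration, max_iter=MAX_ITER):
    """Stop when the energy of Eq. (11) no longer decreases or the cap is hit."""
    if iteration >= max_iter:
        return True
    return len(energies) >= 2 and energies[-1] >= energies[-2]
```

In the full loop, each iteration would re-estimate the lights from the current model, reconstruct the albedo and normals, apply the symmetry update of Eq. (13) and this smoothing, and then evaluate Eq. (11) for the stopping test.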
4 Experiments and discussion

To verify our algorithm, face images from the YaleB database [9] are used as inputs to reconstruct 3D face shapes. Because the images of the 14th face are damaged, the number of test faces from YaleB is 38. The image resolution of each face is 192×168, and all images in the database are monochromatic, frontal face images. To avoid the influence of the shadow cast by the nose, the slant angles of the selected input images are all less than 45°. Many error criteria [13, 19] are used to evaluate the performance of such algorithms; we use the error criterion defined in [13] to assess the performance of our method:
  error_map(x,y) = |Z(x,y) − Z_gt(x,y)| / Z_gt(x,y),  (x,y) ∈ Ω    (16)

where error_map is a relative error image; Z is the reconstructed height map of the new face; and Z_gt is the ground truth, estimated using PS with input face images of known lighting parameters. The overall_mean and overall_devi of the error_map are then defined by Eqs. (17) and (18) respectively:

  overall_mean = (1/(m·n)) Σ_{x=1}^{m} Σ_{y=1}^{n} error_map(x,y) × 100%    (17)

  overall_devi = √( (1/(m·n)) Σ_{x=1}^{m} Σ_{y=1}^{n} (error_map(x,y) − overall_mean/100)² ) × 100%    (18)
where overall_mean is the overall average percentage of the error_map, a measure of the average relative error between the reconstructed height map and the ground truth, and overall_devi is the overall deviation percentage of the error_map, a measure of the average relative variance between them. The M-estimator integration algorithm in [7] is used to integrate the estimated face surface normals into the reconstructed 3D face. Table 1 compares the 3D shapes reconstructed by ref. [2] and by our algorithm. Four input face images were used in our experiments, and the first column shows one of them. The second column is the 3D shape reconstructed by PS, which serves as the ground truth. The third and
Table 1 Comparison of 3D shapes between [2] and our algorithm
(The image columns of the original table cannot be reproduced in text; the overall_mean ± overall_devi values are retained.)

Input face    [2]              Proposed
No. 1         11.05 ± 9.80%    4.49 ± 7.07%
No. 3         10.56 ± 8.71%    4.42 ± 5.90%
No. 4         6.42 ± 8.20%     4.88 ± 6.32%
No. 7         9.19 ± 7.65%     5.79 ± 7.68%
No. 8         12.21 ± 10.16%   5.67 ± 8.26%
No. 9         9.95 ± 8.64%     6.39 ± 9.01%
fourth columns are the reconstruction results of ref. [2] and ours respectively, and the fifth and sixth columns are the error maps of ref. [2] and ours respectively. The number under each error
Table 2 Comparison of error maps of differences between state-of-the-art methods and PS

Input face    PS-[7]         PS-[29]         PS-Proposed
No. 1         5.9 ± 4.6%     12.6 ± 14.7%    2.4 ± 3.6%
No. 3         6.9 ± 5.6%     8.6 ± 10.5%     2.3 ± 3.1%
No. 4         6.2 ± 5.9%     6.7 ± 8.2%      2.1 ± 2.8%
No. 7         9.3 ± 5.2%     7.4 ± 8.1%      2.2 ± 2.8%
No. 8         5.3 ± 4.9%     10.9 ± 12.6%    5.1 ± 4.6%
No. 9         4.8 ± 3.9%     4.8 ± 5.2%      5.5 ± 4.3%
map represents its overall_mean and overall_devi in percent, calculated by Eqs. (17) and (18) respectively. The authors of refs. [2, 13] also used the PS result as the ground truth, so the requirement on the input data is the same as in our method, i.e. images captured under unknown lighting conditions. Table 2 compares the error maps of the differences between state-of-the-art methods and PS; our reconstructed results are more accurate than those of refs. [2, 13]. In addition, all faces in the YaleB database, which includes people of different races and genders, were reconstructed with our method. Figure 3 compares the statistical characteristics of the error maps between ref. [2] and ours: the horizontal axis shows the overall_mean/overall_devi comparison for ref. [2] and ours, and the vertical axis indexes the different faces in the YaleB database. The overall_mean and overall_devi of our error maps are mostly under 6 %, and the accuracy of the 3D face shapes reconstructed by the proposed method is higher than that of ref. [2]. To show the generality of our algorithm, we further verify it on the BU3D database [30], which provides ground-truth 3D faces of different races. The input images of this database were rendered under a point light source with ambient light, at a resolution of 512×512. We reconstruct the first nine human faces and compare our results with ref. [2] in Table 3; every column in Table 3 has the same denotation as in Table 1. In Table 3, the first five faces were rendered under a point light source with ambient light and the last four under multiple light sources with ambient light. Because
Fig. 3 Statistical characteristic comparison of error maps between ref. [2] and ours
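The error statistics reported in these comparisons (Eqs. (16)-(18)) can be computed as in the following sketch (our own illustrative code, not the authors'):

```python
import numpy as np

def error_stats(Z, Z_gt):
    """Relative error map and its statistics, Eqs. (16)-(18).

    Z, Z_gt: (m, n) reconstructed and ground-truth height maps (Z_gt > 0)
    Returns (error_map, overall_mean in %, overall_devi in %).
    """
    error_map = np.abs(Z - Z_gt) / Z_gt                           # Eq. (16)
    overall_mean = error_map.mean() * 100.0                       # Eq. (17)
    overall_devi = np.sqrt(
        ((error_map - overall_mean / 100.0) ** 2).mean()) * 100.0  # Eq. (18)
    return error_map, overall_mean, overall_devi

# Example: a reconstruction uniformly 10% above the ground truth.
Z_gt = np.full((4, 4), 100.0)
_, mean_pct, devi_pct = error_stats(1.10 * Z_gt, Z_gt)
```

For the uniform-offset example, the mean relative error is 10% and the deviation is essentially zero, matching the "overall_mean ± overall_devi" entries quoted in the tables.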
Table 3 Comparison of reconstructed 3D human faces between ref. [2] and the proposed method

(The image columns of the original table cannot be reproduced in text; the overall_mean ± overall_devi values are retained.)

Input face    [2]               Proposed
No. 1         10.16 ± 9.08%     5.52 ± 6.91%
No. 2         12.65 ± 11.48%    5.58 ± 6.26%
No. 3         12.60 ± 11.94%    4.88 ± 6.32%
No. 4         11.17 ± 11.63%    5.61 ± 6.10%
No. 5         11.03 ± 11.71%    6.14 ± 7.31%
No. 6         11.31 ± 10.15%    7.00 ± 8.22%
No. 7         14.95 ± 11.58%    7.19 ± 8.81%
No. 8         11.55 ± 11.89%    8.11 ± 9.49%
No. 9         11.32 ± 11.16%    8.41 ± 7.59%
Fig. 4 The rendered images. The images in the first row were rendered using materials with different reflectance properties; the images in the second row were rendered under different lighting conditions
the lighting model used in our paper is the first-order spherical harmonics [22], the lighting parameters of an input image with multiple illumination sources are estimated as one main lighting direction plus an ambient term. Hence the reconstruction accuracy of the last four faces is lower than that of the first five. However, compared with [2], the results reconstructed by the proposed method are closer to the ground truth. Figure 4 shows rendered images, which can be used in practical multimedia applications such as 3D games and animations to provide vivid and realistic scene effects. The images in the first row of Fig. 4 were produced by rendering the model using materials with different reflectance properties; the images in the second row were generated by rendering under different lighting conditions.
5 Conclusions

A fast uncalibrated algorithm for 3D face reconstruction is presented by combining classical PS with lighting calibration. The computational complexity of the proposed algorithm is only linear in that of the PS algorithm, while the constraints on the lighting conditions are relaxed. In our approach, a reference face model, used to estimate the lighting parameters of the input face images, is first built from the YaleB and BU3D databases; PS then reconstructs the 3D shape of the new face. By discarding the reference face model and using the newly estimated face model to refine the lighting parameters, the reconstructed 3D face shape is further optimized. Experimental results on the YaleB and BU3D databases show that our algorithm produces more accurate results. At present the fast reconstruction algorithm applies only to human faces; future work may investigate fast reconstruction of general objects using similar ideas.
References 1. Alain K, Dipanda A, Claire Bourgeois R (2012) Evolutionary-based 3D Reconstruction Using an uncalibrated stereovision system: application of building a panoramic object view. Multimed Tools Appl 57(3):565–586
2. Alldrin NG, Mallick SP, Kriegman DJ (2007) Resolving the generalized bas-relief ambiguity by entropy minimization. Conf on Comp Vision and Pattern Recognition (CVPR):1–7
3. Basri R, Jacobs D, Kemelmacher I (2007) Photometric stereo with general, unknown lighting. Int J Comp Vision 72(3):239–257
4. Belhumeur PN, Kriegman DJ, Yuille AL (1999) The bas-relief ambiguity. Int J Comp Vision 35(1):33–44
5. Black MJ, Anandan P (1996) The robust estimation of multiple motions: parametric and piecewise-smooth flow fields. Comp Vis Image Underst 63(1):75–104
6. Castelan M, Smith WAP, Hancock ER (2007) A coupled statistical model for face shape recovery from brightness images. IEEE Trans Image Process 16(4):1139–1151
7. Chellappa R, Raskar R (2005) An algebraic approach to surface reconstruction from gradient fields. Proc Tenth IEEE Int Conf Comput Vis 01:174–181
8. Chen C-P, Chen C-S (2006) The 4-source photometric stereo under general unknown lighting. ECCV, pp:72–83
9. Georghiades AS, Belhumeur PN, Kriegman DJ (2001) From few to many: illumination cone models for face recognition under variable lighting and pose. IEEE Trans Pattern Anal Mach Intell 23(6):643–660
10. Hertzmann A, Seitz SM (2003) Shape and materials by example: a photometric stereo approach. Comp Vision Pattern Recognit, pp:533–540
11. Jang IY, Cho J-H, Lee KH (2012) 3D human modeling from a single depth image dealing with self-occlusion. Multimed Tools Appl 58(1):267–288
12. Jian M, Dong J (2011) Capture and fusion of 3D surface texture. Multimed Tools Appl 53(1):237–251
13. Kemelmacher-Shlizerman I, Basri R (2011) 3D face reconstruction from a single image using a single reference face shape. IEEE Trans Pattern Anal Mach Intell 33(2):394–405
14. Kim S-J, Koh K, Lustig M et al (2007) An interior-point method for large-scale ℓ1-regularized least squares. IEEE J Sel Top Sign Process 1(4):606–617
15.
Lee SW, Wang PSP, Yanushkevich SN (2008) Noniterative 3D face reconstruction based on photometric stereo. Int J Pattern Recog Artif Intell 22(3):389–410
16. Li A, Shan S, Chen X, Chai X, Gao W (2008) Recovering 3-D facial shape via coupled 2D/3D space learning. IEEE Int Conf Autom Face Gesture Recognit, pp:1–6
17. Malzbender T, Wilburn B, Gelb D et al (2006) Surface enhancement using real-time photometric stereo and reflectance transformation. Proc EGSR 245–250
18. McGunnigle G, Dong J (2011) Augmenting photometric stereo with coaxial illumination. Comp Vision IET 5(1):33–49
19. Metz CE, Herman BA, Shen JH (1998) Maximum-likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Stat Med, in press
20. Nehab D, Rusinkiewicz S, Davis J et al (2005) Efficiently combining positions and normals for precise 3D geometry. Proc SIGGRAPH 24(3):536–543
21. Puerto-Souza G, Van Horebeek J (2009) Using subspace multiple linear regression for 3-D face shape prediction from a single image. Int Symp Visual Comput, pp:662–673
22. Ramamoorthi R, Hanrahan P (2001) An efficient representation for irradiance environment maps. Proceedings of the 28th annual conference on computer graphics and interactive techniques, 497–500
23. Seitz S, Curless B, Diebel J et al (2006) A comparison and evaluation of multi-view stereo reconstruction algorithms. Proc of Computer Vision and Pattern Recognition, pp:519–526
24. Shashua A (1992) Geometry and photometry in 3D visual recognition. Ph.D. thesis, MIT
25. Song M, Tao D, Huang X (2012) Three-dimensional face reconstruction from a single image by a coupled RBF network. IEEE Trans Image Process 21(5):2887–2897
26. Wenger A, Gardner A, Tchou C, Unger J, Hawkins T, Debevec P (2005) Performance relighting and reflectance transformation with time multiplexed illumination. Proc Siggraph:756–764
27. Woodham RJ (1980) Photometric method for determining surface orientation from multiple images.
Opt Eng 19(1):139–144 28. Woodham RJ (1994) Gradient and curvature from the photometric stereo method, including local confidence estimation. J Opt Soc Am 11(11):3050–3068 29. Xiaofeng Z, Caiming Z, Wenjing T (2012) Medical image segmentation using improved FCM. Sci China Inf Sci 55(4):1052–1061 30. Yin L, Wei X, Sun Y (2006) A 3D facial expression database for facial behavior research. IEEE Int Conf Autom Face Gesture Recog 10(12):211–216 31. Zitnick CL, Kang SB, Uyttendaele M et al (2004) High-quality video view interpolation using a layered representation. ACM Trans Graph 23(3):600–608
Yujuan Sun is a lecturer in the College of Information and Electrical Engineering at Ludong University of China. She has published more than 20 journal and international conference papers (indexed by EI and ISTP). Her research interests include 3D shape reconstruction from images, pattern recognition, and machine learning.
Junyu Dong received his B.Sc. and M.Sc. in Applied Mathematics from the Ocean University of China (formerly called Ocean University of Qingdao) in 1993 and 1999, respectively. He won the Overseas Research Scholarship and James Watt Scholarship for his PhD study in 2000 and was awarded a Ph.D. degree in Image Processing in 2003 from the School of Mathematical and Computer Sciences, Heriot-Watt University, UK. Dr. Junyu Dong joined Ocean University of China in 2004. From 2004 to 2010, he was an associate professor at the Department of Computer Science and Technology. He became a Professor in 2010 and is currently the Head of the Department of Computer Science and Technology. Prof. Dong is actively involved in professional activities. He has been a member of the program committee of several international conferences, including the 4th International Workshop on Texture Analysis and Synthesis (associated with ICCV2005), the 2006 British Machine Vision Conference (BMVC2006) and the 3rd International Conference on Appearance (Predicting Perceptions 2012). Currently, Prof. Dong is the vice Chairman of the Qingdao Young Computer Science and Engineering Forum (YOCSEF Qingdao). He is a member of the China Computer Federation (CCF), ACM and IEEE. Prof. Dong's research interests include texture perception and analysis, 3D reconstruction, video analysis and underwater image processing.
Muwei Jian is currently pursuing the PhD degree in the Department of Electronic and Information Engineering, The Hong Kong Polytechnic University. Mr. Jian holds 3 patents and has published more than 20 international conference and journal papers. He has served as a reviewer for several international SCI-indexed journals, such as Pattern Recognition, The Imaging Science Journal, MVAP (Machine Vision and Applications), and MTAP (Multimedia Tools and Applications). His research interests include image processing, wavelet analysis, 3D multimedia analysis and human face hallucination/recognition.
Lin Qi received his BSc and MSc in Computer Science from the Ocean University of China in 2001 and 2005, respectively. He received his PhD in 2012 from Heriot-Watt University. He is now a lecturer in the Computer Science Department of the Ocean University of China. His research interests include visual perception, 3D computer graphics and computer vision.