 Research
 Open Access
Single view-based 3D face reconstruction robust to self-occlusion
EURASIP Journal on Advances in Signal Processing volume 2012, Article number: 176 (2012)
Abstract
The state-of-the-art 3D morphable model (3DMM) is widely used for 3D face reconstruction from a single image. However, this method has a high computational cost, and hence a simplified 3D morphable model (S3DMM) was proposed as an alternative. Unlike the original 3DMM, S3DMM uses only a sparse 3D facial shape and therefore incurs a lower computational cost. However, it is vulnerable to self-occlusion caused by head rotation. We therefore propose a solution to the self-occlusion problem in S3DMM-based 3D face reconstruction.
This research is novel compared with previous works in the following three respects. First, self-occlusion of the input face is detected automatically by estimating the head pose using a cylindrical head model. Second, a 3D model fitting scheme is designed based on selected visible facial feature points, which facilitates 3D face reconstruction without any effect from self-occlusion. Third, the reconstruction performance is enhanced by using the estimated pose as the initial pose parameter during the 3D model fitting process.
The experimental results showed that the self-occlusion detection had high accuracy and that our proposed method delivered a noticeable improvement in 3D face reconstruction performance compared with previous methods.
Introduction
3D face modeling originated with Parke’s pioneering studies [1, 2], which aimed to generate realistic faces for computer animation. Since Parke’s work, 3D face reconstruction has attracted considerable attention from many computer vision researchers because it has many useful applications, such as pose-invariant face recognition [3, 4], age-invariant face recognition [5, 6], 3D face generation for game and movie characters [7, 8], monitoring suspects using surveillance camera systems, video conferencing, and automatic conversion of a 2D face image into a 3D face for 3D TV.
3D face modeling technologies can be divided into two basic approaches. One approach uses specific sensors, such as stereographic cameras, structured light, or 3D laser scanners [9]. These methods produce accurate 3D face data, but they are expensive and require additional operations, such as calibration. To overcome these limitations, the monocular camera-based approach has been researched intensively. This approach can be further categorized into single view-based and multi-view-based approaches. The multi-view-based approach uses more 2D facial information than the single view-based approach, but it has some limitations, such as the requirement for multiple images and the detection of correspondences among the images. In this study, we focus on the single view-based approach.
Among the single view-based methods, shape-from-shading (SFS) is a traditional method for deriving a 3D facial shape from the brightness variations in a single image. However, SFS-based methods have impractical constraints because the Lambertian reflectance model and a known light source direction need to be assumed to produce accurate results [10–12]. Recently, several new techniques have been proposed to overcome this problem. These methods reconstruct a 3D face by modeling the relationships between the intensities and the depth information of the face using statistical learning techniques, such as principal component analysis (PCA), partial least squares, and canonical correlation analysis [13–16]. However, all of these methods assume that the input face is viewed from the front, and this strict constraint cannot always be satisfied in real-world applications, such as surveillance camera systems.
The state-of-the-art 3D morphable model (3DMM), proposed by Blanz and Vetter [3, 17], is a single-image-based 3D face reconstruction method that requires no frontal pose constraint. In this method, 3D face reconstruction is performed by fitting a morphable model to a 2D image. The reconstructed 3D face is represented by model parameters that minimize the texture residual errors between the rendered model image and the input image. To generate a high-quality 3D face, the model parameters contain rendering parameters, such as the camera geometry and illumination direction, as well as facial texture and shape parameters. Therefore, 3DMM can reconstruct a more realistic 3D face under less restrictive conditions. However, it has a high computational complexity because of the large number of model parameters that need to be estimated simultaneously [4]. In addition, 3DMM requires manual initialization and a dense point-to-point correspondence between all of the face images.
Simplified versions [5, 6, 18–20] of 3DMM have been proposed to reduce these computational costs. Unlike the original 3DMM, most simplified 3DMM (S3DMM)-based methods use a sparse shape model, which is constructed by statistically learning a data set of 3D facial feature points (FFPs). The FFPs indicate the salient features of a face, such as the corners of the eyes, the nose tip, and the corners of the mouth.
S3DMM reconstructs a 3D facial shape (consisting of 3D FFPs) by finding the optimal shape and pose parameters that minimize the difference between the projected 3D FFPs of the shape model and the input 2D FFPs. Therefore, the S3DMM-based method does not need to find a dense correspondence between the faces, and it has a lower computational complexity because of the drastic reduction in the number of parameters. However, some of these methods [18, 19] still have the limitation that the input face needs to be a frontal view, whereas others [5, 6, 20] are unaffected by pose variations. Wang et al. [20] proposed an automatic framework for 3D face reconstruction from an arbitrary view image and estimated the shape and pose parameters using an expectation-maximization (EM) algorithm. However, they did not report any quantitative results; only qualitative results for some test images were presented in their experiments. Park et al. [5, 6] used S3DMM to create a 3D aging model for age-invariant face recognition. They derived a 3D facial shape by alternating between the pose parameter and the shape parameter until the shape residual error converged. The pose and shape parameters were estimated separately using the least squares method. This alternation methodology is used in most S3DMM-based methods [5, 6, 18, 20] because it reduces the computational time and produces a linear cost function, compared with the one-step methodology that estimates all of the parameters at the same time using a nonlinear optimization process.
Existing methods [5, 6, 20] can operate on arbitrary-view images, but they are not robust to pose variations. This is because they are vulnerable to self-occlusion errors in the 2D input FFPs caused by pose changes. If an input face is rotated, some parts of the face are self-occluded by other parts. Therefore, the occluded facial region is not visible in a rotated face image, and the real FFPs lying on that region are also not observable. As a result, the FFPs detected in the rotated face image contain location errors caused by self-occlusion. These errors degrade the performance of S3DMM-based 3D face reconstruction. Unfortunately, existing S3DMM-based methods have not addressed this problem.
Therefore, we propose a solution to the self-occlusion problem of S3DMM-based 3D face reconstruction. Our proposed method consists of the following two steps. In the first step, visible FFPs that are not affected by self-occlusion are automatically discriminated from self-occluded FFPs by estimating the head pose using a cylindrical head model. In the next step, 3D face reconstruction is performed using a model fitting scheme based on the selected visible FFPs to reduce the self-occlusion effect. The performance of the 3D face reconstruction is enhanced by using the pose estimated with the cylindrical head model as the initial pose parameter in the proposed model fitting. In experiments, we evaluated the performance of our proposed method qualitatively and quantitatively by using ground-truth 3D face data acquired with a 3D laser scanner. The experimental results showed that the visible FFPs were selected with high accuracy and that the 3D face reconstruction performance of the proposed method was greatly improved compared with previous S3DMM-based methods. Comparisons of previous methods and our proposed method are summarized in Table 1.
The remainder of this paper is organized as follows. In the following section, the S3DMM is explained and the self-occlusion problem is analyzed. In Section “Proposed 3D face reconstruction method”, our proposed method is described, including the automatic selection of visible FFPs and the model fitting strategy. The experimental results are presented in Section “Experimental results”. Finally, our conclusions are summarized in Section “Conclusions”.
S3DMM and the self-occlusion problem
S3DMM
In S3DMM, the geometry of a face is defined as a shape vector $\mathbf{S}={\left({X}_{1},{Y}_{1},{Z}_{1},{X}_{2},\dots ,{Y}_{n},{Z}_{n}\right)}^{T}\in {\mathfrak{R}}^{3n}$, which contains the X, Y, and Z coordinates of n vertices. The original 3DMM generally uses a dense shape with thousands of vertices, whereas S3DMM uses a sparse shape with only dozens of vertices. To build a morphable shape model, S3DMM performs PCA on a training set of shape vectors S_{j}. The mean shape s_{0} and m shape variations s_{i} are then obtained, and a new shape S can be expressed as a linear combination of the mean shape s_{0} and the shape variations s_{i} as follows:

$$\mathbf{S}={\mathbf{s}}_{0}+\sum_{i=1}^{m}{\beta}_{i}{\mathbf{s}}_{i} \tag{1}$$

where $\mathbf{\beta}={\left({\beta}_{1},{\beta}_{2},\dots ,{\beta}_{m}\right)}^{T}$ is the shape parameter and m is its dimension, chosen to represent 99% of the shape variations of the training face set [21]. Finally, a new 3D facial shape can be generated by changing the shape parameter β.
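The PCA shape model above can be sketched in a few lines of NumPy. This is a minimal illustration; the function and variable names (`build_shape_model`, `synthesize`) are not from the paper:

```python
import numpy as np

def build_shape_model(shapes, var_ratio=0.99):
    """Build a sparse PCA shape model from aligned 3D FFP vectors.

    shapes: (N, 3n) array, one flattened (X1, Y1, Z1, ..., Zn) shape per row.
    Returns the mean shape s0 and the m shape variations (principal
    components) that capture `var_ratio` of the training variance.
    """
    s0 = shapes.mean(axis=0)
    X = shapes - s0
    # SVD of the centered data gives the principal components directly.
    _, sing, Vt = np.linalg.svd(X, full_matrices=False)
    var = sing ** 2
    cum = np.cumsum(var) / var.sum()
    m = int(np.searchsorted(cum, var_ratio)) + 1
    return s0, Vt[:m]                     # shapes (3n,) and (m, 3n)

def synthesize(s0, variations, beta):
    """Generate a new shape S = s0 + sum_i beta_i * s_i."""
    return s0 + beta @ variations
```

A zero shape parameter reproduces the mean shape, and varying each component of `beta` produces the learned shape variations.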
Given the 2D FFPs of an input face image, ${\mathbf{s}}_{2d}={\left({x}_{1},{y}_{1},{x}_{2},\dots ,{y}_{n}\right)}^{T}\in {\mathfrak{R}}^{2n}$, the shape parameter β must be determined such that it minimizes the shape residual between the projected 3D facial shape generated by the shape parameter and the input 2D facial shape. The optimal shape and pose parameters $\left(\mathbf{\beta},{\mathbf{R}}_{\theta},\mathbf{T}\right)$ are obtained from (2):

$$arg{min}_{\mathbf{\beta},{\mathbf{R}}_{\theta},\mathbf{T}}{\Vert \mathbf{P}\left({\mathbf{R}}_{\theta}\tilde{\mathbf{S}}+\tilde{\mathbf{T}}\right)-{\tilde{\mathbf{s}}}_{2d}\Vert}^{2} \tag{2}$$

where $\tilde{\mathbf{S}}$ is a 3 × n matrix reshaped from the 3n × 1 model shape vector S obtained using (1), ${\tilde{\mathbf{s}}}_{2d}$ is a 2 × n matrix reshaped from the 2n × 1 input shape vector ${\mathbf{s}}_{2d}$, P is a 2 × 3 orthographic projection matrix, $\tilde{\mathbf{T}}$ is a 3 × n translation matrix consisting of n translation vectors $\mathbf{T}={\left[{t}_{x}\;{t}_{y}\;{t}_{z}\right]}^{T}$, and R_{θ} is a 3 × 3 rotation matrix with yaw angle θ. Note that in this paper we consider mainly yaw rotation, because the self-occlusion caused by yaw rotation is relatively greater than that caused by pitch rotation, and t_{z} is set to 0 because an orthographic projection is assumed.
Several methodologies are used to estimate the model parameters for S3DMM. We used the alternation methodology of [5], which alternately finds the optimal shape parameter and pose parameter until the shape residual converges, because this approach reduces the computational cost by transforming a nonlinear cost function into a linear one. As shown in Algorithm 1, the procedure for 3D model fitting is as follows. First, the shape parameter β_{0} and translation parameter T_{0} are initialized to 0, and the input 2D FFPs s_{2d} are aligned with the 2D mean shape obtained by projecting the 3D mean shape (s_{0}) with a frontal pose onto the x–y plane. As the alignment method, we use Procrustes analysis, which includes translation, rotation, and scaling [22]. The optimal model parameters are determined by alternately updating the pose parameter (R_{θ}, T) at the fixed β and updating the shape parameter β at the fixed (R_{θ}, T) until the shape residual error converges. The cost function is solved as a least squares problem and the rotation matrix is calculated by QR decomposition, as in [5]. Finally, a new 3D facial shape S_{3d} is reconstructed by applying the optimal shape parameter β to (1).
Algorithm 1 3D Model Fitting [5]

Input: ${\mathbf{s}}_{2d}={\left({x}_{1},{y}_{1},{x}_{2},\dots ,{y}_{n}\right)}^{T}$
Output: ${\mathbf{S}}_{3d}={\left({X}_{1},{Y}_{1},{Z}_{1},{X}_{2},\dots ,{Y}_{n},{Z}_{n}\right)}^{T}$

1. Initialization: set β_{0} = 0, T_{0} = 0, and k = 1.
2. Alignment: s_{2d} is aligned with the 2D mean shape obtained by projecting the frontal 3D mean shape (s_{0}) onto the x–y plane.
3. Update R_{θ} and T with the fixed shape parameter:
$$arg{min}_{{\left({\mathbf{R}}_{\theta},\mathbf{T}\right)}_{k}}{\Vert \mathbf{P}\left({\mathbf{R}}_{\theta}\tilde{\mathbf{S}}+\tilde{\mathbf{T}}\right)-{\tilde{\mathbf{s}}}_{2d}\Vert}_{\mathbf{\beta}={\mathbf{\beta}}_{k-1}}^{2}$$
Update the shape parameter with the fixed pose parameter:
$$arg{min}_{{\mathbf{\beta}}_{k}}{\Vert \mathbf{P}\left({\mathbf{R}}_{\theta}\tilde{\mathbf{S}}+\tilde{\mathbf{T}}\right)-{\tilde{\mathbf{s}}}_{2d}\Vert}_{\left({\mathbf{R}}_{\theta},\mathbf{T}\right)={\left({\mathbf{R}}_{\theta},\mathbf{T}\right)}_{k}}^{2}$$
4. Reconstruct (S_{3d})_{k} using the shape parameter β_{k}.
5. Verify whether ${\Vert \mathbf{P}\left({\mathbf{R}}_{\theta}{\left({\tilde{\mathbf{S}}}_{3d}\right)}^{k}+\tilde{\mathbf{T}}\right)-{\tilde{\mathbf{s}}}_{2d}\Vert}_{\left({\mathbf{R}}_{\theta},\mathbf{T}\right)={\left({\mathbf{R}}_{\theta},\mathbf{T}\right)}_{k}}<\epsilon$ or k > τ. If not, set k = k + 1 and go to Step 3.
6. β = β_{k} and (R_{θ}, T) = (R_{θ}, T)_{k}.
7. Reconstruct S_{3d} using the final shape parameter.
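The alternation in Algorithm 1 can be sketched as follows. This is a simplified illustration, not the paper's implementation: the pose step here solves an orthographic Procrustes problem via SVD instead of the QR decomposition used in [5], the Procrustes pre-alignment (and hence scale handling) is omitted, and all names are illustrative:

```python
import numpy as np

def fit_s3dmm(s2d, s0, variations, iters=50, tol=1e-8):
    """Alternating 3D model fitting, a simplified sketch of Algorithm 1.

    s2d        : (2, n) observed 2D FFPs
    s0         : (3, n) mean shape
    variations : (m, 3, n) PCA shape basis
    Returns (beta, M, t): shape parameter, the 2x3 matrix P @ R_theta,
    and a 2D translation.
    """
    m = variations.shape[0]
    beta = np.zeros(m)
    prev = np.inf
    for _ in range(iters):
        S = s0 + np.tensordot(beta, variations, axes=1)       # (3, n)
        # Pose step: fit a 2x3 matrix with orthonormal rows (= P @ R_theta)
        # to the centered point sets, then re-orthonormalize via SVD.
        Sc = S - S.mean(axis=1, keepdims=True)
        sc = s2d - s2d.mean(axis=1, keepdims=True)
        M = sc @ np.linalg.pinv(Sc)                           # (2, 3)
        U, _, Vt = np.linalg.svd(M, full_matrices=False)
        M = U @ Vt                                            # orthonormal rows
        t = s2d.mean(axis=1) - M @ S.mean(axis=1)             # (2,)
        # Shape step: the residual is linear in beta -> least squares.
        D = np.stack([(M @ v).ravel() for v in variations], axis=1)
        r = (s2d - (M @ s0 + t[:, None])).ravel()
        beta, *_ = np.linalg.lstsq(D, r, rcond=None)
        err = np.linalg.norm(
            M @ (s0 + np.tensordot(beta, variations, axes=1))
            + t[:, None] - s2d)
        if abs(prev - err) < tol:                             # converged
            break
        prev = err
    return beta, M, t
```

Each step decreases (or re-solves) a linear least-squares subproblem, which is why the alternation avoids a full nonlinear optimization.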
Self-occlusion problem
S3DMM-based methods reconstruct a 3D facial shape from the 2D FFPs detected in an input 2D image. Therefore, these methods are vulnerable to large location errors in the observed 2D FFPs. Unfortunately, the observed 2D FFPs may have severe location errors caused by self-occlusion when the 2D FFPs are detected in a highly rotated face image, as shown in Figures 1 and 2.
Figure 1 illustrates the self-occlusion errors revealed by comparing the observed 2D FFPs with the ground-truth 2D FFPs of the facial contours. When detecting the facial contour FFPs in a half-profile view image, the FFPs lying on the visible facial region can be detected at the ground-truth facial contour, but the observed FFPs of the occluded facial region are located on the outline of the face because the occluded real facial contour cannot be observed, as shown in Figure 1a. Therefore, the 2D FFPs observed in a rotated face image have location errors, which are the differences between the observed FFPs and the occluded real FFPs, as shown in Figure 1c. These errors increase as the degree of head rotation increases, as shown in Figure 2. As a result, this self-occlusion problem degrades the S3DMM performance.
Proposed 3D face reconstruction method
Overall procedure of the proposed method
The proposed 3D face reconstruction process starts with the localization of the FFPs in a given 2D face image. To detect self-occlusion in an input face, the head pose is estimated using a cylindrical head model-based method [23]. The estimated pose can then be used to determine which FFPs are self-occluded. Next, a sparse 3D facial shape is reconstructed using the model fitting process based on the selected visible FFPs. Subsequently, a dense 3D facial shape is interpolated from the reconstructed sparse 3D facial shape using the thin plate spline (TPS) method [24, 25]. Finally, the facial texture extracted directly from the input image is mapped onto the dense 3D facial shape. The overall procedure of the proposed method is shown in Figure 3.
Head pose estimation
The head orientation of an input face is useful for determining whether the FFPs are self-occluded. There are many appearance-based head pose estimation methods. Manifold embedding methods such as PCA, KPCA, LDA, and kernel discriminant analysis have been used to extract texture features, and these features are used to estimate the discrete head pose [26–29]. Murphy-Chutorian et al. [30] used local gradient orientation and estimated the continuous yaw and pitch using support vector regression. However, the accuracy of these methods can be affected by the face region detection performance, and these methods require many training samples [31]. Therefore, we used a cylindrical head model [23] in this study, which is a geometry-based head pose estimation method. In general, S3DMM-based methods alone can estimate the head pose because they can rotate the 3D shape model as closely as possible to the pose of the input face to find the best-matched shape model for the input 2D facial shape. However, pose estimation by S3DMM is vulnerable to self-occlusion because the head pose is obtained from the relationship between the projected 3D FFPs of the shape model and the 2D FFPs detected in the image. As shown in Figure 1, self-occlusion errors may occur between the projected 3D FFPs and the observed 2D FFPs in a highly rotated face image. Consequently, these errors lead to lower accuracy during head pose estimation with S3DMM.
Therefore, instead of using the pose estimator included in S3DMM, we employed a cylindrical head model to estimate the pose, and the estimated pose is used for self-occlusion detection. The cylindrical head model, proposed by Ohue et al. [23], was originally used to detect the direction of a driver’s face in a real-time system. This method is based on the assumption that a human head is basically cylindrical in shape, as shown in Figure 4a. The head pose is calculated simply from three facial lines, namely, the right facial edge, the left facial edge, and the center line of the face, as shown in Figure 4b. The yaw angle θ of the face is calculated using the following equation:

$$\theta ={\sin}^{-1}\left(\frac{{x}_{c}-{x}_{m}}{r}\right) \tag{3}$$

where r is the radius of the cylinder, x_{m} is the x coordinate of the cylinder center line, and x_{c} is the x coordinate of the facial center line. Here, the radius r and the center line x_{m} are obtained from the x coordinates of the left and right facial edges, i.e., $r=\left({x}_{l}-{x}_{r}\right)/2$ and ${x}_{m}=\left({x}_{l}+{x}_{r}\right)/2$, respectively, as shown in Figure 4b. When detecting the facial edge lines x_{l} and x_{r} in a rotated view, the cylindrical head model does not require the occluded facial edge line; instead, it uses the newly observed facial edge line of the rotated view, as shown in Figure 5. Therefore, this method is less affected by self-occlusion.
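A minimal sketch of the cylindrical yaw computation of (3), assuming the sign convention that the facial center line satisfies x_c = x_m + r·sin(θ); the paper's exact convention may differ:

```python
import math

def estimate_yaw(x_left, x_right, x_center):
    """Yaw angle from the cylindrical head model.

    x_left, x_right : x coordinates of the left/right facial edge lines
    x_center        : x coordinate of the facial center line
    Returns the yaw angle in degrees; 0 for a frontal face.
    """
    r = (x_left - x_right) / 2.0           # cylinder radius
    x_m = (x_left + x_right) / 2.0         # cylinder center line
    # The facial center line lies on the cylinder surface, so its x
    # coordinate is x_m + r*sin(theta); clamp the ratio against noise.
    s = max(-1.0, min(1.0, (x_center - x_m) / r))
    return math.degrees(math.asin(s))
```

For a frontal face the center line coincides with the cylinder center line, giving 0°; as the head rotates, the center line shifts toward one edge and the angle grows accordingly.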
As shown in (3), the performance of this method depends on how accurately the three lines are detected in the face image. Ohue et al. [23] used image processing methods, such as a Sobel filter and histogram analysis, to detect the lines. In our study, however, the three lines are obtained from the 2D FFPs, which are either manually annotated or automatically detected using the Active Appearance Models (AAMs) algorithm [32]. The selected FFPs and the three facial lines are shown in Figure 4c, where the facial center line passes through the center point of the bottom of the nose, while the facial right and left boundaries are each obtained from the midpoint between the two points located on the corresponding side of the facial contour.
Determination of visible FFPs
In this section, we present a method for automatically discriminating the visible FFPs from the occluded FFPs by using the estimated head pose. The human head has a 3D structure, but its shape varies slightly from person to person. Thus, the set of visible FFPs is not consistent even for the same head pose; it changes depending on the individual. However, the individual differences between the sets of visible FFPs are not very large, and therefore we use a generic visible FFP set for each yaw angle. This set is defined by manually analyzing the training set of 3D FFPs annotated on the 3D face scans. When defining the generic set of visible FFPs for each pose, a point is excluded from the set if it is occluded in the face of any individual at that pose. In this manner, we can prepare an FFP index table that contains the indices of the visible FFPs for each head pose. Figure 6 shows the defined visible FFPs (crosses) and occluded FFPs (circles) for each yaw angle. In this study, the yaw angles are quantized into seven discrete angles, ranging from −45° to +45° at intervals of 15°, because the visible FFPs do not change greatly over a 15° interval.
Finally, given the estimated head pose, the pose angle is quantized into one of the seven discrete angles, and a masking matrix M_{θ} is obtained from the index table of the visible FFPs related to the quantized angle θ, as shown in Figure 7. Each column of the masking matrix M_{θ} represents a visible FFP as 1 and an occluded FFP as 0:

$$\mathbf{M}_{\theta}=\left[{\mathbf{m}}_{1},{\mathbf{m}}_{2},\dots ,{\mathbf{m}}_{n}\right],\quad {\mathbf{m}}_{i}=\begin{cases}{\left[1\;1\right]}^{T} & \text{if the }i\text{th FFP is visible}\\ {\left[0\;0\right]}^{T} & \text{otherwise}\end{cases}$$

where i = 1, 2, …, n, and n is the number of vertices. We used 80 vertices in this study. M_{θ} is thus a 2 × 80 binary matrix, the pose angle θ takes one of the seven discrete values, and m_{i} is a 2 × 1 column vector.
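A sketch of the masking-matrix lookup. The index table entries below are purely illustrative placeholders; the real table is built by manually analyzing the annotated 3D face scans:

```python
import numpy as np

# Hypothetical index table: visible-FFP indices per quantized yaw angle.
VISIBLE_INDEX_TABLE = {
    -45: [0, 1, 2, 5, 6],          # illustrative entries only
      0: [0, 1, 2, 3, 4, 5, 6, 7],
     45: [3, 4, 5, 6, 7],
}

def quantize_yaw(theta):
    """Quantize a yaw angle to the nearest 15-degree bin in [-45, 45]."""
    return int(min(max(round(theta / 15.0) * 15, -45), 45))

def masking_matrix(theta, n):
    """Build the 2 x n binary mask M_theta: 1-columns mark visible FFPs."""
    M = np.zeros((2, n))
    M[:, VISIBLE_INDEX_TABLE[quantize_yaw(theta)]] = 1.0
    return M
```

The mask is later applied to the fitting residual via the Hadamard product, so occluded FFPs contribute nothing to the cost.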
3D Face model fitting
A detailed description of the proposed model fitting method is given in Algorithm 2. This is a modified version of the earlier S3DMM-based algorithm described in Section “S3DMM”. The proposed model fitting scheme is based on the selected visible FFPs, which eliminates the self-occlusion effect. As a result, the cost function of (2) is modified as follows:

$$arg{min}_{\mathbf{\beta},{\mathbf{R}}_{\theta},\mathbf{T}}{\Vert {\mathbf{M}}_{\theta}\circ \left(\mathbf{P}\left({\mathbf{R}}_{\theta}\tilde{\mathbf{S}}+\tilde{\mathbf{T}}\right)-{\tilde{\mathbf{s}}}_{2d}\right)\Vert}^{2}$$

where the symbol “∘” represents the Hadamard product, also known as entry-wise multiplication [33], and M_{θ} is the masking matrix at rotation angle θ. M_{θ} is obtained from the index table of the visible FFPs for the estimated pose, as explained in Sections “Head pose estimation” and “Determination of visible FFPs”. With this masking matrix, we can compute the shape residual between the visible FFPs of the shape model and the input 2D facial shape. The shape parameter β and pose parameter (R_{θ}, T) can then be obtained without any self-occlusion effect by minimizing this residual. The proposed 3D model fitting algorithm has the following two advantages over the previous method:

1) The pose angle $\widehat{\theta}$ estimated by the cylindrical model is used to initialize the pose parameter. Therefore, the parameter estimation starts from a relatively exact initial pose, which enhances the 3D face reconstruction performance. During the alignment step, an accurate alignment result is obtained by aligning the input 2D FFPs with the FFPs of the 2D mean shape, which are obtained by rotating the 3D mean shape (s_{0}) from 0° to $\widehat{\theta}$ and projecting it onto the x–y plane.

2) 3D model fitting is performed on the basis of the visible FFPs by using the masking matrix. Therefore, the proposed method can reconstruct 3D faces that are less affected by self-occlusion.
Algorithm 2 Proposed 3D Model Fitting

Input: ${\mathbf{s}}_{2d}={\left({x}_{1},{y}_{1},{x}_{2},\dots ,{y}_{n}\right)}^{T}$
Output: ${\mathbf{S}}_{3d}={\left({X}_{1},{Y}_{1},{Z}_{1},{X}_{2},\dots ,{Y}_{n},{Z}_{n}\right)}^{T}$

1. Initialization: set ${\theta}_{0}=\widehat{\theta}$, ${\mathbf{T}}_{0}=\mathbf{0}$, $\mathbf{M}={\mathbf{M}}_{\overline{\theta}}$, and k = 1. ($\widehat{\theta}$ is the angle estimated by the cylindrical model and $\overline{\theta}$ is the quantized angle of ${\theta}_{0}$.)
2. Alignment: ${\mathbf{s}}_{2d}$ is aligned with the 2D mean shape produced by rotating the 3D mean shape (${\mathbf{s}}_{0}$) from 0° to $\widehat{\theta}$ and projecting it onto the x–y plane.
3. Update the shape parameter with the fixed pose parameter:
$$arg{min}_{{\mathbf{\beta}}_{k}}{\Vert \mathbf{M}\circ \left(\mathbf{P}\left({\mathbf{R}}_{\theta}\tilde{\mathbf{S}}+\tilde{\mathbf{T}}\right)-{\tilde{\mathbf{s}}}_{2d}\right)\Vert}_{\left({\mathbf{R}}_{\theta},\mathbf{T}\right)={\left({\mathbf{R}}_{\theta},\mathbf{T}\right)}_{k-1}}^{2}$$
Update ${\mathbf{R}}_{\theta}$ and $\mathbf{T}$ with the fixed shape parameter:
$$arg{min}_{{\left({\mathbf{R}}_{\theta},\mathbf{T}\right)}_{k}}{\Vert \mathbf{M}\circ \left(\mathbf{P}\left({\mathbf{R}}_{\theta}\tilde{\mathbf{S}}+\tilde{\mathbf{T}}\right)-{\tilde{\mathbf{s}}}_{2d}\right)\Vert}_{\mathbf{\beta}={\mathbf{\beta}}_{k}}^{2}$$
4. Reconstruct ${\left({\mathbf{S}}_{3d}\right)}^{k}$ using the shape parameter β_{k}.
5. Verify whether ${\Vert \mathbf{M}\circ \left(\mathbf{P}\left({\mathbf{R}}_{\theta}{\left({\tilde{\mathbf{S}}}_{3d}\right)}^{k}+\tilde{\mathbf{T}}\right)-{\tilde{\mathbf{s}}}_{2d}\right)\Vert}_{\left({\mathbf{R}}_{\theta},\mathbf{T}\right)={\left({\mathbf{R}}_{\theta},\mathbf{T}\right)}_{k}}<\epsilon$ or k > τ. If not, set k = k + 1 and go to Step 3.
6. β = β_{k} and (R_{θ}, T) = (R_{θ}, T)_{k}.
7. Reconstruct S_{3d} using the final shape parameter.
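Because the Hadamard mask zeroes the occluded columns of the residual, the masked shape update in Step 3 of Algorithm 2 reduces to an ordinary least-squares solve over the visible FFPs only. A minimal sketch; `masked_shape_update` and its argument names are illustrative:

```python
import numpy as np

def masked_shape_update(M_mask, Mproj, t, s2d, s0, variations):
    """One masked shape-parameter update (cf. Step 3 of Algorithm 2).

    M_mask     : (2, n) binary visibility mask
    Mproj      : (2, 3) projected rotation P @ R_theta (held fixed)
    t          : (2,) translation (held fixed)
    s2d        : (2, n) observed 2D FFPs
    s0         : (3, n) mean shape;  variations : (m, 3, n) shape basis
    """
    vis = M_mask.ravel().astype(bool)           # rows of visible residuals
    # Design matrix: one column per shape variation, restricted to
    # visible entries; the residual is linear in beta.
    D = np.stack([(Mproj @ v).ravel() for v in variations], axis=1)[vis]
    r = ((s2d - (Mproj @ s0 + t[:, None])).ravel())[vis]
    beta, *_ = np.linalg.lstsq(D, r, rcond=None)
    return beta
```

The pose update in the same step is handled analogously by restricting the Procrustes/QR solve to the visible columns.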
Dense 3D facial shape and texture mapping
A sparse 3D facial shape is produced as the reconstruction result after model fitting. Therefore, we must perform interpolation to produce a dense 3D facial shape. Interpolation is achieved by mapping a generic dense mean shape onto the reconstructed sparse 3D facial shape using the TPS algorithm [24, 25]. The TPS mapping function is designed by learning the relationship between the FFPs of the generic 3D mean shape and those of the reconstructed 3D facial shape, similar to [24]. Let u be the FFPs of the generic mean shape and F(u) be the FFPs of the reconstructed facial shape. The mapping function is then:

$$F\left(\mathbf{u}\right)=\mathbf{c}+\mathbf{A}\mathbf{u}+{\mathbf{W}}^{T}\mathbf{s}\left(\mathbf{u}\right)$$

where c represents a translation, A is a rotation, W is the nonlinear deformation, and s(u) is a spline function. The mapping function F(u) is then used to transform all of the other vertices of the mean shape, which produces the adapted dense 3D face.
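A minimal sketch of fitting such a spline warp F(u) = c + Au + Wᵀs(u), using the polyharmonic kernel φ(r) = r, a common choice for 3D points; the paper's exact spline basis may differ:

```python
import numpy as np

def fit_tps(src, dst, reg=1e-8):
    """Fit a 3D spline warp from source FFPs to target FFPs, a sketch.

    src, dst : (n, 3) corresponding FFPs
               (generic mean shape -> reconstructed shape)
    Returns a function F mapping arbitrary (k, 3) points.
    """
    n = src.shape[0]
    # Kernel matrix of pairwise distances, phi(r) = r.
    K = np.linalg.norm(src[:, None] - src[None, :], axis=-1)
    K += reg * np.eye(n)                         # small regularization
    Pm = np.hstack([np.ones((n, 1)), src])       # affine basis [1, x, y, z]
    L = np.zeros((n + 4, n + 4))
    L[:n, :n] = K
    L[:n, n:] = Pm
    L[n:, :n] = Pm.T
    rhs = np.zeros((n + 4, 3))
    rhs[:n] = dst
    sol = np.linalg.solve(L, rhs)
    W, affine = sol[:n], sol[n:]                 # kernel weights, affine part

    def F(u):
        k = np.linalg.norm(u[:, None] - src[None, :], axis=-1)
        return k @ W + np.hstack([np.ones((len(u), 1)), u]) @ affine
    return F
```

After fitting on the sparse FFP correspondences, `F` is evaluated at every vertex of the dense mean shape to warp it onto the reconstructed face.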
Finally, the available texture in the input 2D image is mapped onto the dense 3D facial shape to complete the 3D face reconstruction. However, some facial texture regions can be self-occluded in rotated view images. Thus, the bilateral symmetry of the face is used during texture mapping to recover the self-occluded texture. Specifically, we produce a mirrored image of the half-face contained in the visible region, and use the mirrored texture of the visible half-face for texture mapping in the occluded facial region.
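The mirroring idea can be sketched as below, assuming the texture has already been warped so that the facial symmetry axis is the central image column (the real pipeline operates in the model's texture frame):

```python
import numpy as np

def fill_occluded_half(texture, visible_left=True):
    """Recover an occluded half-face texture by bilateral mirroring.

    texture : (H, W, 3) face texture aligned so the symmetry axis is
              the central image column.
    """
    H, W = texture.shape[:2]
    out = texture.copy()
    half = W // 2
    if visible_left:
        # Mirror the visible left half onto the occluded right half.
        out[:, W - half:] = texture[:, :half][:, ::-1]
    else:
        # Mirror the visible right half onto the occluded left half.
        out[:, :half] = texture[:, W - half:][:, ::-1]
    return out
```

This is only a coarse approximation, since real faces are not perfectly symmetric and illumination usually differs between the two halves.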
Experimental results
Face database
For the experiments, we acquired 86 male and 64 female 3D face scans using a 3D laser scanner (Cyberware 3030 RGB model) [34]. To construct the 3D shape models, each 3D face scan was manually annotated with 80 FFPs, and the annotated 3D FFPs were aligned with the others by Procrustes analysis [22]. An iterative alignment method was applied to our data to produce accurately aligned data. A reference face scan was randomly selected, and the 3D FFPs of the remaining face scans were aligned with those of the reference scan. The mean FFPs of the aligned data became the new reference FFPs in the next iteration, and this process was repeated until the mean FFPs stopped changing. The final aligned 3D FFPs were used to build a 3D shape model in the training stage. The training and test datasets were divided using the leave-one-out methodology, which uses a single subject's data as the test data and the remaining data as the training data.
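The iterative alignment described above is essentially generalized Procrustes analysis, which can be sketched as follows (function names are illustrative):

```python
import numpy as np

def procrustes_align(shape, ref):
    """Align one (n, 3) shape to a reference via similarity Procrustes."""
    mu_s, mu_r = shape.mean(0), ref.mean(0)
    A, B = shape - mu_s, ref - mu_r
    U, sing, Vt = np.linalg.svd(A.T @ B)
    R = U @ Vt                                # optimal rotation
    if np.linalg.det(R) < 0:                  # avoid reflections
        U[:, -1] *= -1
        R = U @ Vt
    scale = sing.sum() / (A ** 2).sum()       # optimal isotropic scale
    return scale * A @ R + mu_r

def iterative_alignment(shapes, iters=10, tol=1e-10):
    """Align all shapes to an evolving mean until the mean stabilizes."""
    ref = shapes[0]                           # randomly chosen reference
    aligned = shapes
    for _ in range(iters):
        aligned = np.stack([procrustes_align(s, ref) for s in shapes])
        new_ref = aligned.mean(0)
        if np.linalg.norm(new_ref - ref) < tol:
            break
        ref = new_ref
    return aligned, ref
```

The returned mean of the aligned FFP sets plays the role of the reference FFPs whose convergence terminates the iteration.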
2D face images of 150 subjects were obtained as test images by projecting the textured 3D face scans. In other words, we obtained face images with seven different head poses by rotating the face scans at 15° intervals in the range −45° to +45° and projecting them onto the x–y plane. Therefore, the total number of test images was 1050 (150 subjects × 7 poses). The resolution of the obtained 2D images was 1200 × 900 pixels, and the size of the facial region was approximately 350 × 350 pixels. Figure 8 shows examples of the 2D face test images. Given a test image, 2D FFPs were detected for 3D model fitting in S3DMM. In the experiments, we used two methods to detect the 2D FFPs in the test images, covering an ideal case and a practical case. In the ideal case, Test data 1 was created by combining the ground-truth 2D FFPs of the visible facial region with the manually marked 2D FFPs (observed 2D FFPs) of the self-occluded facial regions, as shown in Figure 9a. The ground-truth 2D FFPs were acquired by rotating the ground-truth 3D FFPs (manually annotated points on a face scan) and projecting them onto the x–y plane. For a rotated facial image, it is difficult to detect the ground-truth 2D FFPs of the self-occluded region because they are not visible; therefore, the manually marked 2D FFPs (observed 2D FFPs) were used for the self-occluded facial regions. Test data 1 can be regarded as ideal data because all of the FFPs were obtained manually. In the practical case, Test data 2 was derived by automatically detecting the FFPs using the AAMs fitting algorithm, as shown in Figure 9b. The AAMs used in this work are based on the simultaneous inverse compositional algorithm [32]. These data may contain AAMs fitting errors as well as errors caused by self-occlusion.
We used the root mean squared error (RMSE) to measure the similarity between the reconstructed facial shape and the corresponding ground-truth facial shape. Let the reconstructed facial shape vector be S_{re} = [X′_{1}, Y′_{1}, Z′_{1}, X′_{2}, …, Y′_{n}, Z′_{n}]^{T} and the corresponding ground-truth facial shape vector be S_{gt} = [X_{1}, Y_{1}, Z_{1}, X_{2}, …, Y_{n}, Z_{n}]^{T}. Given these two shape vectors, the RMSE is calculated as follows:

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[{\left({X}_{i}^{\prime}-{X}_{i}\right)}^{2}+{\left({Y}_{i}^{\prime}-{Y}_{i}\right)}^{2}+{\left({Z}_{i}^{\prime}-{Z}_{i}\right)}^{2}\right]}$$

where n is the number of FFPs.
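A direct implementation of this per-point RMSE, averaging squared 3D point-to-point distances over the n FFPs:

```python
import numpy as np

def shape_rmse(S_re, S_gt):
    """RMSE between reconstructed and ground-truth shape vectors.

    S_re, S_gt : (3n,) flattened shape vectors (X1, Y1, Z1, ..., Zn).
    """
    d = (S_re - S_gt).reshape(-1, 3)          # per-FFP 3D differences
    return float(np.sqrt((d ** 2).sum(axis=1).mean()))
```

Because the shapes are measured in millimeters, the resulting RMSE is directly interpretable as an average 3D reconstruction error in mm.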
Effect of selfocclusion on 3D face reconstruction
To demonstrate the effect of self-occlusion on the performance of the previous S3DMM method, we performed 3D face reconstruction using the S3DMM algorithm from Section “S3DMM” with the ground-truth data and with Test data 1. The ground-truth data were obtained by projecting the ground-truth 3D FFPs onto the x–y plane, and hence contained no errors caused by self-occlusion, whereas Test data 1 contained such errors, as shown in Figure 9a. Figure 10 shows the RMSEs of the reconstructed facial shapes for the two types of test data. As shown in Figure 10, the RMSE with Test data 1 increased sharply as the degree of head rotation increased, because the location errors of the 2D FFPs in the occluded facial region also increased. In the highly rotated views (±45°), the RMSE differences between the two datasets were greater than approximately 1 mm. Both datasets had the same RMSE in the frontal view (0°) because the frontal view is not affected by self-occlusion. The results showed that the 2D FFP location errors caused by self-occlusion severely deteriorated the reconstruction performance of the earlier S3DMM algorithm.
Head pose estimation performance
This section details the performance of the cylindrical model-based pose estimation method compared with the QR decomposition-based head pose estimation used in [5], as well as the effect of using the estimated pose as the initial pose parameter in the proposed 3D reconstruction method. Figure 11 and Table 2 show the mean absolute errors (MAEs) of the two head pose estimation methods using Test data 1 and 2. The results clearly show the following:

1)
The cylindrical modelbased method showed better performance with the highly rotated face images (±30°, ±45°) because this method used facial features that were less sensitive to selfocclusion.

2)
The QR decompositionbased method showed slightly better performance with the almost frontal face images (±15°, 0°) because the FFP location error in these images caused by selfocclusion was small and this method used a relatively larger number of facial features. However, these facial features were more sensitive to selfocclusion because the performance of this method was severely degraded in the highly rotated images.

3)
The results were similar irrespective of the data set used, but the pose estimation error with Test data 2 was slightly larger than that with Test data 1 because the automatically obtained test data had more severe FFP location errors than the manually obtained test data. Even in this case, however, the proposed 3D reconstruction method performed better than the previous method; this result is discussed in Section “Quantitative results with yaw variations”. Removing noisy FFPs would require a different approach, such as texture analysis around each FFP.
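The MAE used to compare the two pose estimators is the mean of the absolute angular errors over the test poses. A short sketch with illustrative numbers (the estimates below are made up for demonstration, not the paper's measurements):

```python
import numpy as np

def yaw_mae(estimated_deg, true_deg):
    """Mean absolute error (degrees) between estimated and true yaw angles."""
    est = np.asarray(estimated_deg, dtype=float)
    true = np.asarray(true_deg, dtype=float)
    return float(np.mean(np.abs(est - true)))

# Hypothetical estimates for one subject at the seven test yaw angles.
true_yaws = [-45, -30, -15, 0, 15, 30, 45]
estimates = [-41, -28, -14, 0, 16, 27, 42]
print(yaw_mae(estimates, true_yaws))  # 2.0
```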
To evaluate the effects of the different head pose estimation methods on 3D face reconstruction, we applied both QR decomposition and the cylindrical head model to our proposed S3DMM algorithm from Section “3D face model fitting”. As shown in Figures 12a,b, the two methods performed similarly in the −15° to +15° pose range, but the cylindrical model performed better than the QR decomposition method at ±30° and ±45°, regardless of the test data set. Although the QR decomposition-based method estimated pose better with the almost frontal face images, it did not achieve better 3D face reconstruction performance, as shown in Figure 12, because its slightly lower pose estimation error was compensated for by our proposed model fitting algorithm from Section “3D face model fitting”. Note that the cylindrical model-based method estimated pose considerably better with highly rotated face images, which led to an improvement in 3D face reconstruction performance.
In this work, the yaw angle estimated by the cylindrical model was used as the initial pose parameter ($\theta_0 = \widehat{\theta}$), as shown in the initialization step of the proposed 3D model fitting algorithm in Section “3D face model fitting”. To demonstrate the effect of using the estimated pose as the initial pose parameter, we compared the following two cases with Test data 1 and 2: the first case used the estimated pose, whereas the second case did not. In the first case, 3D model fitting began with shape parameter estimation because an approximate initial pose was known, as shown in Algorithm 2 in Section “3D face model fitting”. In the second case, no initial pose parameter was provided, and hence the pose parameter had to be estimated before shape parameter estimation, as shown in Step 3 of Algorithm 1. As shown in Figures 13a,c, both cases produced similar RMSEs with the almost frontal views (−15° to +15°). However, the first case performed better with the highly rotated views (±30°, ±45°), irrespective of the test data set. As shown in Figures 13b,d, using the estimated pose was also more efficient in terms of processing time. Based on these observations, we found that an accurate initial pose parameter improved the performance of 3D face reconstruction with our proposed algorithm and reduced the number of iterations required for 3D model fitting.
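The alternation idea above can be illustrated on a deliberately simplified 2-D toy problem (a single rotation angle and one shape coefficient; all names and shapes here are our own illustrative assumptions, not the paper's 3-D model or algorithm). Seeding the loop with a coarse pose estimate lets fitting start directly at the shape step:

```python
import numpy as np

def rot(theta):
    """2D rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def fit_alternation(obs, mean, basis, theta0, iters=10):
    """Alternately estimate a rotation angle and a single shape coefficient.

    obs, mean, basis are (N, 2) arrays; theta0 is the initial pose guess
    (here standing in for the cylindrical-model yaw estimate).
    """
    theta, b = theta0, 0.0
    for _ in range(iters):
        # Shape step: undo the current rotation, project onto the basis.
        aligned = obs @ rot(theta)  # applies R(-theta) to each row
        b = np.sum((aligned - mean) * basis) / np.sum(basis * basis)
        shape = mean + b * basis
        # Pose step: closed-form 2D Procrustes angle aligning shape to obs.
        num = np.sum(shape[:, 0] * obs[:, 1] - shape[:, 1] * obs[:, 0])
        den = np.sum(shape[:, 0] * obs[:, 0] + shape[:, 1] * obs[:, 1])
        theta = np.arctan2(num, den)
    return theta, b

# Toy model: mean shape plus one deformation mode, observed at 30 degrees.
mean = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, -1.0], [-1.0, 0.0]])
basis = np.array([[0.0, 0.5], [0.2, 0.0], [0.0, -0.5], [-0.2, 0.0]])
true_theta, true_b = np.deg2rad(30.0), 0.8
obs = (mean + true_b * basis) @ rot(true_theta).T

# A coarse initial pose (25 degrees for a 30-degree face) is enough.
theta, b = fit_alternation(obs, mean, basis, theta0=np.deg2rad(25.0))
```

On this noise-free toy data the alternation converges to the true angle and coefficient within a few iterations; the point is only to show how a good initial pose removes the need for a pose-estimation step before the first shape update.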
In summary, the cylindrical model-based method is suitable for selecting a reliable set of visible FFPs, and using the estimated yaw angle as the initial pose parameter allows the proposed 3D model fitting algorithm to achieve better reconstruction performance with higher efficiency.
Quantitative performance of the proposed method
Quantitative results with yaw variations
To evaluate the performance of our proposed method, we obtained results for the two test data sets with yaw variations using the proposed algorithm and compared its performance with that of the previous S3DMM algorithm described in Section “S3DMM”.
Figure 14 shows the RMSEs of the proposed method and the previous method with Test data 1 and 2. As shown in Figure 14, the proposed algorithm produced a significantly lower RMSE than the previous method as the head rotation increased, because the FFP self-occlusion errors grew with the head rotation. As shown in Figure 14b, although Test data 2 contained both self-occlusion errors and detection errors because it was obtained automatically using the AAM algorithm, the proposed method still outperformed the previous method. However, if the FFP detection errors were very large, the reconstruction performance of the proposed method could be significantly degraded, because the FFPs would still contain large detection errors even though the self-occlusion errors were excluded by the proposed method. This FFP detection problem is a common limitation of S3DMM-based methods.
We also obtained results from Test data 1 and 2 using the one-step parameter estimation method. The one-step methodology has been used for model parameter estimation in S3DMM-based methods, where the shape and pose parameters are estimated simultaneously using a nonlinear optimization algorithm [3, 17, 19]. In this study, one-step parameter estimation was performed using code available in MATLAB [35]. We compared the one-step methodology using all FFPs (previous method) with one using only the visible FFPs (proposed method). Figure 15 shows the RMSEs for both methods. In the highly rotated views (±30°, ±45°), the proposed method using the selected visible FFPs performed considerably better than the previous method using all of the FFPs with both test data sets. With the almost frontal views (0°, ±15°), both methods delivered similar reconstruction performance. Based on the observations in Figures 14 and 15, the proposed strategy of using only the visible FFPs is an effective solution to the self-occlusion problem in S3DMM-based 3D face reconstruction.
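The key change in the one-step setting is that the joint pose-and-shape objective is evaluated only over FFPs flagged as visible. The sketch below illustrates this with a 2-D toy model and a brute-force pose search (our own simplification; the paper uses sequential quadratic programming over a 3-D model [35]). One point is corrupted to mimic a self-occluded FFP and then masked out:

```python
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def fit_one_step(obs, mean, basis, visible):
    """Jointly search pose and shape, scoring residuals only on visible FFPs."""
    best = (np.inf, 0.0, 0.0)
    v = visible  # boolean mask over feature points
    for deg in np.arange(-60.0, 60.5, 0.5):  # coarse joint search over pose
        theta = np.deg2rad(deg)
        aligned = obs @ rot(theta)  # applies R(-theta) to each row
        # Closed-form shape coefficient restricted to the visible points.
        b = np.sum((aligned[v] - mean[v]) * basis[v]) / np.sum(basis[v] ** 2)
        resid = np.sum((aligned[v] - mean[v] - b * basis[v]) ** 2)
        if resid < best[0]:
            best = (resid, theta, b)
    return best[1], best[2]

# Toy model; corrupt one point to mimic a self-occluded FFP.
mean = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, -1.0], [-1.0, 0.0], [0.5, 0.5]])
basis = np.array([[0.0, 0.5], [0.2, 0.0], [0.0, -0.5], [-0.2, 0.0], [0.1, 0.1]])
obs = (mean + 0.8 * basis) @ rot(np.deg2rad(30.0)).T
obs[0] += np.array([0.4, -0.3])  # location error on the occluded point

visible = np.array([False, True, True, True, True])
theta, b = fit_one_step(obs, mean, basis, visible)
```

With the corrupted point masked out, the search recovers the true pose and coefficient exactly; including it in the residual pulls both estimates away from the truth, which is the effect the selected-visible-FFPs strategy avoids.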
Quantitative results with yaw and pitch variations
In this experiment, we tested whether combined yaw and pitch variations affected the performance of the proposed method. For this experiment, face images were generated with yaw and pitch variations by rotating and projecting the textured face scans of 150 subjects, as explained in Section “Face database”. The head pose set consisted of 28 different poses, combining seven yaw angles (0°, ±15°, ±30°, ±45°) and four pitch angles (±15°, ±30°). The total number of test images was 4,200 (150 subjects × 28 poses). To evaluate the performance of the proposed method on these test images, Test data 1 and 2 were obtained from the test images, as explained in Section “Face database”, and the performance of the proposed algorithm from Section “3D face model fitting” was compared with that of the previous algorithm described in Section “S3DMM”. Figure 16 shows the effects of yaw and pitch variations on the performance of the proposed method. In this figure, each graph shows the RMSEs of the proposed and previous methods with yaw variations at a fixed pitch angle. The results show that the performance of the proposed method improved markedly over the previous method even with combined yaw and pitch variations, because the self-occlusion error caused by yaw variations was more dominant than that caused by pitch variations. Consequently, the occlusion error caused by a combination of yaw and pitch could be compensated for by the proposed algorithm considering only yaw variations, as shown in Figure 16. However, the performance of the proposed method degraded slightly when both yaw and pitch rotations occurred because the cylindrical model could only estimate head yaw. Improving the performance in this case would require a separate pitch estimator.
Qualitative performance of the proposed method
We obtained qualitative results for the proposed method and the previous method using Test data 1 and 2. Figure 17 shows the reconstruction results for four subjects from our database when using Test data 1. The first column shows the ground-truth images, while the a, c, and e columns show the faces reconstructed using the previous method when the yaw angle of the input face was 15°, 30°, and 45°, respectively. The b, d, and f columns show the faces reconstructed using the proposed method at the same yaw angles. Similarly, Figure 18 shows the faces reconstructed using Test data 2. As shown in Figures 17 and 18, the performance of the proposed method was almost the same as that of the previous method with a 15° head pose because the almost frontal view is negligibly affected by self-occlusion. However, we observed the following with the highly rotated views (30°, 45°):

1)
With the proposed method, the reconstructed frontal and profile shapes were similar to the ground-truth facial shapes.

2)
The previous method produced a wider facial contour in the frontal view than the ground-truth facial shape because self-occlusion errors occurred, as shown in Figure 19. Figure 18 shows that the proposed method markedly improved the reconstruction performance compared with the previous method, even with the practical data (Test data 2).
Finally, we present the reconstruction results for real-world images from the FacePix database [36] and the CAS-PEAL-R1 database [37]. Figure 20 shows individual results for five subjects from the FacePix database, where the rows indicate different subjects and the seventh column shows the test images. The first and fourth columns show different views of the original image. The a and c columns show the results with the previous method, whereas the b and d columns show the results with the proposed method. Figure 21 shows the results for five subjects from the CAS-PEAL-R1 database. From these figures, we observe the following:

1)
With highly rotated face images, the proposed method produced reconstructed facial shapes in the frontal and profile views that were closer to the ground-truth shape than those of the previous method. In particular, the previous method produced a wider facial contour in the frontal view than the ground-truth facial shape.

2)
With almost frontal face images, the performance of the proposed method was similar to that of the previous method because the self-occlusion error was very small, as shown in the fourth and fifth rows of Figure 20 and in the third, fourth, and fifth rows of Figure 21.
The proposed 3D face reconstruction required about 0.1 s per test image with our proposed algorithm based on the alternation methodology and about 3.6 s per test image with the algorithm based on the one-step methodology. The computation times were measured on a machine with an Intel Core i5 750 CPU (2.7 GHz) and 3 GB of RAM.
Conclusions
We analyzed the self-occlusion problem that occurs in S3DMM-based 3D face reconstruction and proposed a method for solving it. Our main contributions are summarized as follows:

The 3D model fitting scheme of S3DMM was modified to make it suitable for 3D face reconstruction based on visible FFPs. The reconstruction accuracy of the proposed method was greatly improved over the original S3DMM-based method by using only the visible FFPs, which are free of self-occlusion errors.

To exclude self-occluded FFPs from the 3D model fitting process, they were separated automatically from the visible FFPs using a pose estimation method based on a cylindrical head model together with an index table of visible FFPs.

The reconstruction performance was enhanced by using the estimated pose as the initial pose parameter during the 3D model fitting process.
Since the proposed method can automatically reconstruct a 3D face from an arbitrary-view image, it can be applied to a variety of useful applications, such as 3D game and animation character generation, and 2D frontal face generation from a side-view face image, which can be used for monitoring suspects in surveillance camera systems.
In future work, we will investigate a 3D face reconstruction method that is robust to FFP detection errors by combining FFP locations with facial texture information. In addition, we will develop a more accurate 3D face reconstruction method for a wide range of pose variations and study a new approach that solves the self-occlusion problem without eliminating the occluded FFPs.
Abbreviations
 SFS: shape-from-shading
 PCA: principal component analysis
 3DMM: 3D morphable model
 S3DMM: simplified 3D morphable model
 FFPs: facial feature points
 TPS: thin-plate spline
 AAMs: active appearance models
 RMSE: root mean squared error
 MAE: mean absolute error
References
 1.
Parke FI: Computer generated animation of faces. Proceedings of ACM National Conference; 1972.
 2.
Parke FI: A parametric model of human faces. PhD thesis, University of Utah, Salt Lake City; 1974.
 3.
Blanz V, Vetter T: Face recognition based on fitting a 3D morphable model. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25(9):1063-1074. 10.1109/TPAMI.2003.1227983
 4.
Zhang X, Gao Y: Face recognition across pose: a review. Pattern Recognit. 2009, 42(11):2876-2896. 10.1016/j.patcog.2009.04.017
 5.
Park U, Tong Y, Jain AK: Face recognition with temporal invariance: a 3D aging model. Proceedings of International Conference on Automatic Face and Gesture Recognition; 2008, 1-7.
 6.
Park U, Tong Y, Jain AK: Age invariant face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32(5):947-954.
 7.
Example of game character generation. http://fightnight.easports.com/featureFrame.action?id=FeatureTBA1&fType=video Accessed 2 January 2012
 8.
Maejima A, Wemler S, Machida T, Takebayashi M, Morishima S: Instant casting movie theater: the future cast system. IEICE Transactions on Information and Systems 2008, E91-D(4):1135-1148. 10.1093/ietisy/e91d.4.1135
 9.
Gökberk B, Salah AA, Alyüz N, Akarun L: 3D face recognition: technology and applications. In Handbook of Remote Biometrics for Surveillance and Security. Springer; 2009:217-246.
 10.
Zhang R, Tsai P, Cryer JE, Shah M: Shape from shading: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 1999, 21(8):690-706. 10.1109/34.784284
 11.
Dobgard R, Basri R: Statistical symmetric shape from shading for 3D structure recovery of faces. LNCS on European Conference on Computer Vision 2004, 3022:99-113.
 12.
Ahmed A, Farag A, Starr T: A new symmetric shape from shading algorithm with an application to 3D face reconstruction. Proceedings of International Conference on Image Processing; 2008, 201-204.
 13.
Castelan M, Smith WAP, Hancock ER: A coupled statistical model for face shape recovery from brightness images. IEEE Trans. Image Process. 2007, 16(4):1139-1151.
 14.
Castelan M, Horebeek JV: Relating intensities with three-dimensional facial shape using partial least squares. IET Computer Vision 2009, 3(2):60-73. 10.1049/ietcvi.2008.0060
 15.
Reiter M, Donner R, Langs G, Bischof H: 3D and infrared face reconstruction from RGB data using canonical correlation analysis. Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06); 2006, 425-428.
 16.
Lei Z, Bai Q, He R, Li SZ: Face shape recovery from a single image using CCA mapping between tensor spaces. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008); June 2008, 1-7.
 17.
Blanz V, Vetter T: A morphable model for the synthesis of 3D faces. Proceedings of SIGGRAPH; 1999, 187-194.
 18.
Jiang D, Hu Y, Yan S, Zhang L, Zhang H, Gao W: Efficient 3D reconstruction for face recognition. Pattern Recognit. 2005, 38(6):787-798. 10.1016/j.patcog.2004.11.004
 19.
Wang S, Lai S: Efficient 3D face reconstruction from a single 2D image by combining statistical and geometrical information. LNCS on Asian Conference on Computer Vision 2006, 3852:427-436.
 20.
Wang C, Yan S, Li H, Zhang H, Li M: Automatic, effective, and efficient 3D face reconstruction from arbitrary view image. LNCS on Advances in Multimedia Information Processing - PCM 2004, 3332:553-560. 10.1007/978-3-540-30542-2_68
 21.
Cootes TF, Edwards GJ, Taylor CJ: Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23(6):681-685. 10.1109/34.927467
 22.
Matthews I, Baker S: Active appearance models revisited. Int. J. Comput. Vision 2004, 60(2):135-164.
 23.
Ohue K, Yamada Y, Uozumi S, Tokoro S, Hattori A, Hayashi T: Development of a new pre-crash safety system. SAE World Congress; April 2006.
 24.
Park U, Jain AK: 3D face reconstruction from stereo video. Proceedings of the 3rd Canadian Conference on Computer and Robot Vision; June 2006, 41.
 25.
Bookstein FL: Principal warps: thin-plate splines and the decomposition of deformations. IEEE Trans. Pattern Anal. Mach. Intell. 1989, 11(6):567-585. 10.1109/34.24792
 26.
Watta P, Gandhi N, Lakshmanan S: An eigenface approach for estimating driver pose. Proceedings of IEEE Conference on Intelligent Transportation Systems; 2000, 376-381.
 27.
Lakshmanan S, Watta P, Hou YL, Gandhi N: Comparison between eigenfaces and fisherfaces for estimating driver pose. Proceedings of IEEE Conference on Intelligent Transportation Systems; 2001, 889-894.
 28.
Watta P, Lakshmanan S, Hou YL: Nonparametric approaches for estimating driver pose. IEEE Trans. Veh. Technol. 2007, 56(4):2028-2041.
 29.
Wu J, Trivedi MM: A two-stage head pose estimation framework and evaluation. Pattern Recognit. 2008, 41(3):1138-1158. 10.1016/j.patcog.2007.07.017
 30.
Murphy-Chutorian E, Doshi A, Trivedi MM: Head pose estimation for driver assistance systems: a robust algorithm and experimental evaluation. Proceedings of IEEE Conference on Intelligent Transportation Systems; 2007, 709-714.
 31.
Murphy-Chutorian E, Trivedi MM: Head pose estimation in computer vision: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31(4):607-626.
 32.
Baker S, Gross R, Matthews I: Lucas-Kanade 20 years on: a unifying framework: part 3 (Technical Report CMU-RI-TR-03-35). Carnegie Mellon University, Robotics Institute; 2003.
 33.
Horn RA, Johnson CR: Matrix Analysis. Cambridge University Press; 1985.
 34.
3D scanner specification. http://www.cyberware.com/products/pdf/headFace.pdf Accessed 2 January 2012
 35.
Boggs T, Tolle JW: Sequential quadratic programming. Acta Numerica 1996, 1:1-51.
 36.
Black J, Gargesha M, Kahol K, Kuchi P, Panchanathan S: A framework for performance evaluation of face recognition algorithms. ITCOM, Internet Multimedia Systems II, Boston; 2002.
 37.
Gao W, Cao B, Shan S, Chen X, Zhou D, Zhang X, Zhao D: The CAS-PEAL large-scale Chinese face database and baseline evaluations. IEEE Trans. SMC-Part A: Systems and Humans 2008, 38(1):149-161.
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2012–0005223).
Additional information
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Lee, Y.J., Lee, S.J., Park, K.R. et al. Single view-based 3D face reconstruction robust to self-occlusion. EURASIP J. Adv. Signal Process. 2012, 176 (2012). https://doi.org/10.1186/1687-6180-2012-176
Keywords
 3D face reconstruction
 Single image
 Selfocclusion
 Arbitrary view
 3D model fitting