Open Access

Subspace-Based Holistic Registration for Low-Resolution Facial Images

EURASIP Journal on Advances in Signal Processing 2010, 2010:591412

https://doi.org/10.1155/2010/591412

Received: 9 December 2009

Accepted: 14 July 2010

Published: 29 July 2010

Abstract

Subspace-based holistic registration is introduced as an alternative to landmark-based face registration, which performs poorly on low-resolution images such as those obtained in camera surveillance applications. The proposed registration method finds the alignment by maximizing the similarity score between a probe and a gallery image. We use a novel probabilistic framework for both user-independent and user-specific face registration. The similarity is calculated using the probability that the face image is correctly aligned in a face subspace; in addition, we take into account the probability that the face is misaligned, based on the residual error in the dimensions perpendicular to the face subspace. We perform extensive experiments on the FRGCv2 database to evaluate the impact that the face registration methods have on face recognition. Subspace-based holistic registration on low-resolution images can improve face recognition in comparison with landmark-based registration on high-resolution images. The performance of the tested face recognition methods after subspace-based holistic registration on a low-resolution version of the FRGC database is similar to that after manual registration.

1. Introduction

Face recognition in the context of camera surveillance is still a challenging problem. For reliable face recognition, it is crucial that an acquired facial image is registered to a reference coordinate system. Most conventional registration methods are based on landmarks. To locate these landmarks accurately, high-resolution images are needed. For those methods, it is problematic to register low-resolution facial images as obtained in video surveillance. In the Face Recognition Vendor Test [1], low-resolution face images are defined to contain an interocular distance of 75 pixels; we used even lower resolutions, with interocular distances of 50 pixels and lower. High-resolution face images have an interocular distance of more than 100 pixels. In these cases, face registration on low-resolution images is often omitted and the region found by the face detection is used directly for face recognition [2, 3]. In our opinion, accurate face registration can contribute to better recognition performance on low-resolution images. Therefore, we developed a Subspace-based Holistic Registration (SHR) method, which uses the entire face region to correct for translation, rotation, and scale transformation of the face. This enables us to accurately register low-resolution facial images. The face registration is performed after a frontal face detector, which detects a face only within certain scale and rotation variations and thereby limits the search for the final registration parameters.

As already pointed out above, registration methods can be divided into two categories: landmark-based registration, using landmarks to register the face image, and holistic registration, using the entire image for registration. Of the latter only a few methods have been reported.

In the first category, the object detection method of Viola and Jones [4], originally proposed for face detection, is a popular approach to locating landmarks [5–7]. The advantages of this method are that it is fast and robust in comparison with other landmark methods. Many papers report good results, especially in uncontrolled scenarios. However, occasionally landmarks are not found by this method. In [8], a probabilistic approach using Principal Component Analysis (PCA) is used to locate the landmarks. Subspace methods for facial feature detection are also used in [9–11]. Some landmarking techniques are not only based on texture, but also use geometric relations between landmarks, for instance [12–15]. These methods usually require more landmarks and high-resolution facial images. A well-known example of such a method is Elastic Bunch Graphs [12]. Elastic Bunch Graphs are used to determine the relation between different landmarks. The relation between the landmarks and the scores of Gabor Jets are combined to register and recognize the face. Active Shape Models [16] and Active Appearance Models [17] can also be used to perform a fine registration of a face, by using both texture and the relation between the landmarks. Both methods, however, need a good initialization to find an accurate registration, which can be provided by, for instance, the Viola and Jones landmark finding method.

In the second category of registration, there are correlation-based registration methods that are invariant to translation. The MACE filter, originally described in [18] and used in face recognition in [19, 20], is invariant to translations. In [21], a face registration method using super resolution is described that performs correlation to compare the original image with a reconstructed image obtained using super resolution, correcting for translation and scale variations. The method described in [22, 23] is a correlation-based method that finds a rigid transformation to align the facial images, which is done using robust correlation to a user template.

Another way of evaluating the registration quality is by using the similarity score determined by a face recognition algorithm. In [24], the manually labelled eye coordinates are used as a starting point from which the eye coordinates are varied to obtain different registrations. The registration that resulted in the best similarity score is selected. This experiment was performed using several different face recognition algorithms. In [25], we performed a similar experiment and in addition showed that small changes in the registration parameters can have a huge effect on the similarity scores of face recognition algorithms. In [26, 27], we proposed a matching score-based face registration approach, which searches for the optimal alignment by maximizing the similarity score of several holistic face recognition algorithms, for example, the PCA Mahalanobis distance. In [28], where the focus is face hallucination, the PCA Mahalanobis distance is used to find the registration parameters for low-resolution images using a different search strategy than in [27]. In [29], this face registration method is extended especially for the purpose of face hallucination. We performed no experiments using face hallucination, because our focus is on face registration and its effect on the recognition. In this paper, we extend the work in [26, 27] by developing the Subspace-based Holistic Registration (SHR) method. The novelty of this method is that we use a probabilistic framework designed to evaluate the registration of faces, instead of maximizing the score of a face recognition method, which might not be suited for comparing unregistered face images.

2. Face Registration Method

2.1. Subspace-Based Holistic Registration

Face registration is performed to correct for variations that occur when the face region is selected from an image. We assume that the face detection obtains frontal faces from a camera, and that we have to correct for in-plane rotations of these faces. The exact positions of the camera and the face are usually unknown, making a correction for scale and translation necessary as well. A Procrustes transformation corrects for these variations, allowing us to scale an image by a factor, rotate it over an angle, and translate it over a vector. The optimal face registration is assumed to be found if there is a maximum similarity between the transformed input image (probe image) and the gallery images. In SHR, we try to find the best registration parameters by maximizing a similarity function. This function takes the transformed probe image, a registered reference object (gallery image), and a model of the reference object (faces). The equation for finding the best registration parameters is
(1)
An important issue is how to measure the similarity between probe and gallery image. In our previous work, we used similarity scores from well-known face recognition algorithms for this purpose. However, these scores are usually optimal for face recognition, measuring the similarity between faces of different individuals in a face space. In this paper, we argue that the correct quantifier for face registration should also include the probability that the face might be misaligned, thus also measuring the error outside face space. We therefore use the probability that the aligned image belongs to the object class of the gallery image. A vectorization operator collects the features of the probe and gallery images at a set of predefined locations in the images. We adopt a Gaussian model, defined by a mean and a covariance matrix:
(2)
Our goal is to optimize this probability as a function of the registration parameters. For notational compactness, we introduce shorthand for the vectorized images, which gives
(3)
The training samples used to determine both the mean and the covariance matrix are correctly aligned images. Notice that the gallery image needs to be a registered image in order to find the registration parameters for the probe image. Exact estimation of the covariance matrix is not possible with a limited number of training samples. As a consequence, the estimate of the covariance matrix is often singular, so that its inverse cannot be computed, and even if the inverse can be calculated, the results will be inaccurate. Furthermore, the computational costs of evaluating (3) are large, due to the high dimensionality of the feature vectors and the covariance matrix. For these reasons, we use Principal Component Analysis (PCA) to reduce the dimensionality. We obtain a subspace by solving the eigenvalue problem:
(4)
which yields the eigenvalues and eigenvectors of the covariance matrix. Projecting a feature vector onto the leading eigenvectors gives a reduced feature vector. The principal subspace, which reduces the dimensionality of the feature vector, has an orthogonal complement, which contains the variations that are not modelled by PCA. Using only similarities in the principal subspace, as in our previous work [27], results in the Mahalanobis distance. However, if we optimize the alignment only for the principal subspace, we might move further away in the orthogonal complement, ignoring details that are not included in our model but might nevertheless be important for the registration. To overcome this problem, we use a distance measure proposed in [8]:
(5)
(6)

Here, the measure uses the eigenvalues in the principal subspace and the average eigenvalue in the orthogonal complement. This distance measure consists of two parts: the first is called "distance-in-feature-space" (DIFS) and the second is called "distance-from-feature-space" (DFFS). In our experiments, we compare the results of using only DIFS for face registration, as in [27, 28], with those of using both DIFS and DFFS (see Section 4.1). We show that using both distances results in a better performance than using DIFS alone.
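
The two parts of this measure can be sketched as follows. This is a minimal illustration of the measure of [8] in its commonly cited form, not the implementation used in the experiments; the function name and the arguments (`basis`, `eigvals`, `rho`) are hypothetical names chosen here, and the PCA model is assumed to be given as a list of orthonormal eigenvectors.

```python
def difs_dffs(x, mean, basis, eigvals, rho):
    """Distance of feature vector x to a PCA model (names are assumptions).

    basis   : list of M orthonormal eigenvectors, each of length N
    eigvals : the M eigenvalues that belong to `basis`
    rho     : average of the N - M eigenvalues outside the subspace
    """
    centered = [xi - mi for xi, mi in zip(x, mean)]
    # Coefficients of the centered vector in the principal subspace.
    coeffs = [sum(c * b for c, b in zip(centered, vec)) for vec in basis]
    # DIFS: Mahalanobis distance inside the principal subspace.
    difs = sum(y * y / lam for y, lam in zip(coeffs, eigvals))
    # DFFS: residual energy in the orthogonal complement, scaled by rho.
    residual = sum(c * c for c in centered) - sum(y * y for y in coeffs)
    dffs = max(residual, 0.0) / rho
    return difs, dffs
```

Registration with DIFS only would minimize the first returned value; using both parts means minimizing their sum, so that an alignment which fits the subspace well but leaves a large residual is penalized.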

In Figure 1, we give a schematic representation of the components needed for SHR and the interaction between them. We use an iterative search method to find the optimal similarity between probe image and gallery images. The initial registration parameters are given by a face detection algorithm, for instance the method of Viola and Jones [4]. The alignment registers the probe image based on the specified parameters. We will discuss the components in Figure 1 in the following sections: evaluation (Section 2.2), the alignment (Section 2.3), and the search methods (Section 2.4).
Figure 1

Schematic representation of SHR.

2.2. Evaluation

Two important issues in the evaluation function are the model and the features. The model can be either user independent as explained in the previous section or user specific. This we will discuss in the first paragraph below. As features, we propose edge images, instead of grey level images, which reduce the number of local minima in the evaluation. This will be explained in the second paragraph.

2.2.1. Evaluation to a User Specific Face Model

Instead of registration to a mean face model, which may differ substantially from individual faces, registration to a user-specific model, if available, may improve registration results. For user-specific face registration, we need a user template to register a probe image. For face identification, user-specific registration has the drawback that we have to register the probe to every user template in the database.

For user-specific registration, we define a similarity measure based on a model of the registered facial images of a given user. The user-specific model consists of a user template and a covariance matrix. For the covariance matrix, we use a within-class covariance matrix that models the variations among face images of the same person, estimated over all users, because we often do not have enough images to estimate a user-specific covariance matrix. The similarity function for the user-specific model is
(7)

2.2.2. Using Edge Images to Avoid Local Minima

Using grey level images for registration often leads to local minima in the search space. Better registration results can be obtained by using edge images, which is for instance shown in [30] for Active Appearance Models. In image registration, regions containing large variations (structure) contribute more to registration than homogeneous regions. By applying edge filters, the regions that contain structure will be highlighted, and the homogeneous regions will be suppressed. In our case, the use of edge filters results in a search space with fewer local minima. In Figure 2, a 2D search space is shown where we varied the scale and the translation in one direction for a grey level image and an edge image. The edge image (right) shows a single clear minimum, while the grey level image has a global minimum at the same place, but also a large local minimum in the right corner.
Figure 2

A 2D search space based on the grey level image (a) and edge image (b), for scale (a-b) and translation in direction (front-back), showing a local minimum in the left score landscape.

In order to calculate the edges in the image, we take the derivatives of the images in the horizontal and vertical directions. Because images usually contain noise, we use Gaussian kernels, one for each direction:
(8)
The derivatives of the images in both directions are calculated by convolution with these kernels. We refer to these as "edge images". If we use both edge images in the feature vector instead of the grey level image, this doubles the length of the feature vector, resulting in increased computation time. An alternative is to combine the two edge images as follows into a "magnitude image":
(9)

The default features used in this paper are the "edge images", and a comparison between the features is performed in Section 4.1.
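
The feature computation can be sketched as below. This is an illustration under stated assumptions: the kernels are taken to be the derivatives of a sampled 2D Gaussian, and the magnitude image is taken to be the Euclidean norm of the two edge images. Neither form is confirmed by the text above, and the function names, `radius`, and `sigma` are hypothetical.

```python
import math

def gaussian_derivative_kernels(radius, sigma):
    """Return (gx, gy): horizontal and vertical derivatives of a 2D Gaussian."""
    gx, gy = [], []
    for v in range(-radius, radius + 1):
        row_x, row_y = [], []
        for u in range(-radius, radius + 1):
            g = math.exp(-(u * u + v * v) / (2.0 * sigma ** 2))
            row_x.append(-u / sigma ** 2 * g)
            row_y.append(-v / sigma ** 2 * g)
        gx.append(row_x)
        gy.append(row_y)
    return gx, gy

def convolve(image, kernel):
    """2D convolution with zero padding; the output has the image's size."""
    h, w, r = len(image), len(image[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for v in range(-r, r + 1):
                for u in range(-r, r + 1):
                    yy, xx = y - v, x - u
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[v + r][u + r] * image[yy][xx]
            out[y][x] = acc
    return out

def magnitude_image(ix, iy):
    """Combine two edge images into one image of gradient magnitudes."""
    return [[math.hypot(a, b) for a, b in zip(ra, rb)]
            for ra, rb in zip(ix, iy)]
```

Stacking both edge images doubles the feature length, whereas the magnitude image keeps the original length at the cost of discarding the edge direction.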

2.3. Alignment

We use a Procrustes transformation to align the probe image to the gallery images, which is common practice in face recognition and preserves the distance ratios. Given a pixel location, we can define a transformation on the pixel location as follows:
(10)
where the rotation is described by a rotation matrix. The transformation of the image is defined as
(11)

This allows us to obtain an aligned image by backward mapping and interpolation. Most landmark-based methods also perform this transformation based on the found landmarks in order to obtain a registered face image [13].
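
Backward mapping with interpolation can be sketched as follows. This is a minimal illustration, assuming the transformation scales, rotates, and then translates a pixel location, and assuming bilinear interpolation; the order of operations, the interpolation type, and all function names are assumptions made here. Images are lists of rows with at least two rows and two columns.

```python
import math

def procrustes(point, scale, angle, shift):
    """Scale, rotate (angle in radians), then translate a 2D point."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (scale * (c * x - s * y) + shift[0],
            scale * (s * x + c * y) + shift[1])

def inverse_procrustes(point, scale, angle, shift):
    """Undo `procrustes`: subtract the shift, rotate back, undo the scale."""
    x, y = point[0] - shift[0], point[1] - shift[1]
    c, s = math.cos(angle), math.sin(angle)
    return ((c * x + s * y) / scale, (-s * x + c * y) / scale)

def bilinear(image, x, y):
    """Sample image[row][col] at a real-valued (x, y); 0.0 outside."""
    h, w = len(image), len(image[0])
    if not (0 <= x <= w - 1 and 0 <= y <= h - 1):
        return 0.0
    x0, y0 = min(int(x), w - 2), min(int(y), h - 2)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x0 + 1]
    bottom = (1 - fx) * image[y0 + 1][x0] + fx * image[y0 + 1][x0 + 1]
    return (1 - fy) * top + fy * bottom

def align(image, scale, angle, shift):
    """Backward mapping: each output pixel looks up its source location."""
    h, w = len(image), len(image[0])
    return [[bilinear(image, *inverse_procrustes((x, y), scale, angle, shift))
             for x in range(w)] for y in range(h)]
```

Mapping backward from the output grid avoids the holes that appear when input pixels are pushed forward to non-integer output locations.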

2.4. Search Methods

In (1), we have to maximize the similarity score to find the best alignment parameters. Ideally, an iterative search method should be able to find the optimal solution using a small number of evaluations, making it possible to register the probe image in near real time. The search method also has to be robust against local minima. Based on our observations, we assume reasonably smooth search landscapes. We applied two different search methods: the first is the downhill simplex method [31], which we also used in [26, 27], and the second is a gradient-based method.

2.4.1. Downhill Simplex Search Method

This method is able to maximize a similarity function using around 100 evaluations. A good initialization of the downhill simplex method is necessary to be robust against local minima. This was also observed in [27], where we used several initializations to reduce outliers. To initialize the downhill simplex method, we need to create a simplex (a geometric shape in n dimensions, consisting of n + 1 points). For the four registration parameters, this means that we have to select five starting points. The first starting point is given by the initial parameter vector. The other starting points are given by
(12)

where the offset is the maximum expected offset for a single registration parameter in the positive or negative direction; of the two, we use the offset that gives the best similarity score. The downhill simplex method is, however, able to find optimal registration parameters that lie outside the maximum expected offsets. This search method maximizes the similarity function by replacing the registration parameters in the simplex that give the worst similarity score with a better set, using some simple heuristics.
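
The construction of the starting simplex can be sketched as follows. This is an illustration of the description above, not the authors' code; `initial_simplex`, `theta0`, `max_offsets`, and `similarity` are hypothetical names, and `similarity` stands for any function that scores a parameter vector (higher is better).

```python
def initial_simplex(theta0, max_offsets, similarity):
    """Build n + 1 starting points for n registration parameters.

    For each parameter, the maximum expected offset is tried in the positive
    and in the negative direction, and the better-scoring point is kept.
    """
    simplex = [list(theta0)]
    for i, delta in enumerate(max_offsets):
        plus = list(theta0)
        plus[i] += delta
        minus = list(theta0)
        minus[i] -= delta
        simplex.append(plus if similarity(plus) >= similarity(minus) else minus)
    return simplex
```

With four parameters (scale, rotation, and two translations) this costs eight similarity evaluations on top of the search itself, and it orients the simplex toward the more promising side of each parameter axis.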

2.4.2. Gradient-Based Search Method

In (1), we find the best alignment parameters by maximizing the similarity score. We start with the initial registration parameters; improving these parameters means that we have to determine an offset to the optimal alignment [32, 33]. We achieve this by expanding the image using a first-order Taylor expansion:
(13)
In this case, the expansion contains the Jacobian matrix of the transformed image with respect to the registration parameters, given in [32] for a transformation with translation, rotation, and scale. By setting the derivative of (2) with respect to the offset to zero, we can determine the offset from the original parameters:
(14)

In the appendix, it is shown how this equation is solved and how updated parameters are obtained analytically. This procedure is repeated until convergence has been reached.
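
As a rough guide to the structure of one update step, the following sketch works it out for a plain Gaussian criterion, which is simpler than the criterion solved in the appendix. All symbols are notation assumed here for illustration: $x(\theta)$ is the vectorized transformed probe image, $\mu$ and $\Sigma$ are the model mean and covariance matrix, $J$ is the Jacobian, and $\Delta\theta$ is the parameter offset. The additive parameter update in the last line is also an assumption.

```latex
\begin{align}
  % first-order Taylor expansion around the current parameters
  x(\theta + \Delta\theta) &\approx x(\theta) + J\,\Delta\theta \\
  % Mahalanobis distance of the linearized image to the model mean
  E(\Delta\theta) &= \bigl(x(\theta) + J\,\Delta\theta - \mu\bigr)^{T}
                     \Sigma^{-1}
                     \bigl(x(\theta) + J\,\Delta\theta - \mu\bigr) \\
  % setting the derivative with respect to the offset to zero
  \frac{\partial E}{\partial \Delta\theta}
    &= 2\,J^{T}\Sigma^{-1}\bigl(x(\theta) + J\,\Delta\theta - \mu\bigr) = 0 \\
  \Delta\theta &= -\bigl(J^{T}\Sigma^{-1}J\bigr)^{-1} J^{T}\Sigma^{-1}
                  \bigl(x(\theta) - \mu\bigr) \\
  \theta &\leftarrow \theta + \Delta\theta
\end{align}
```

Because the Taylor expansion is only accurate for small offsets, a single step rarely reaches the optimum, which is why the step is iterated.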

3. Experiments

In this section, we describe experiments to evaluate the performance of SHR. The main purpose of SHR is to improve the face recognition performance, particularly at low resolutions. The goal of the experiments, therefore, is to demonstrate and quantify the improvement of face recognition performance if SHR is used for face registration. We will present results of the following comparisons:
(i) Comparison with earlier versions of SHR [27]. These experiments are included to illustrate the positive effect of the new evaluation criteria given in (6) and of the features discussed in Section 2.2.2;
(ii) Comparison with landmark-based registration based on automatically detected landmarks as well as on manual landmarks;
(iii) Comparison between user-independent and user-specific registration;
(iv) Comparison between two search methods (Section 2.4) in both performance and computation time;
(v) Comparison of SHR performed on lower resolutions.

3.1. Experimental Setup

3.1.1. Face Database

To perform the experiments, we use the Face Recognition Grand Challenge version 2 (FRGCv2) database [34], on which we perform the one-to-one controlled versus controlled experiments. We train both the face registration methods (landmark methods and SHR) and the face recognition methods on the training set defined in the FRGCv2. We calculated all the similarity scores, which resulted in the Receiver Operating Characteristic (ROC) of the entire set and the ROC of the three masks defined by the FRGCv2 database. Mask I compares images that are recorded within a semester, Mask II compares images recorded within a year, and Mask III compares images that are recorded between semesters. To compare the different settings of SHR, we use a random subset to reduce the computational costs of the face recognition. We still register every gallery and probe image, but instead of computing all the scores, we calculate for every probe image one genuine and one impostor score from a randomly chosen image in the gallery. The same random images are used for all the experiments. We show in Table 1 that the recognition results of the random subset are comparable to the results on the entire set.
Table 1

Verification rates at FAR 0.1% of several face recognition methods, which allow us to compare the registration methods. These verification rates are achieved using manually registered images.

| Method | Mask I | Mask II | Mask III | Entire Set | Random Subset |
|---|---|---|---|---|---|
| PCA Mah | 54.0% | 48.8% | 42.9% | 50.3% | 52.2% |
| PCA MahCos | 72.4% | 67.2% | 61.8% | 68.2% | 69.8% |
| Adaboost | 91.4% | 88.3% | 84.9% | 88.9% | 89.5% |
| PCA LDA | 92.1% | 90.4% | 88.6% | 90.8% | 91.0% |
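
The measure reported in Table 1, the verification rate at a fixed false accept rate (FAR), can be computed from genuine and impostor scores as sketched below. This is a generic illustration, assuming that a higher score means a better match; the function name and the strict-threshold convention are choices made here, not taken from the FRGCv2 protocol.

```python
def verification_rate_at_far(genuine, impostor, far):
    """Fraction of genuine scores accepted at the threshold where at most
    a fraction `far` of the impostor scores is accepted."""
    ranked = sorted(impostor, reverse=True)
    allowed = int(far * len(ranked))   # impostor scores that may be accepted
    if allowed >= len(ranked):
        return 1.0
    threshold = ranked[allowed]        # accept only scores strictly above it
    accepted = sum(1 for g in genuine if g > threshold)
    return accepted / len(genuine)
```

With one genuine and one impostor score per probe image, as in the random subset, the two score lists have equal length, and a FAR of 0.1% needs at least a thousand impostor scores before any impostor may be accepted.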

3.1.2. Face Detection

Face registration depends on the input of a face detection method. We used the OpenCV implementation [35, 36] of the Viola and Jones algorithm [4] to find the faces. We used the pretrained model called "haarcascade_frontalface_default.xml". In order to avoid misdetections, we included some simple heuristics based on the manually labelled landmarks to determine if the face regions were correctly found. All landmarks have to be inside the face region, and the width and height of this region have to be less than four times the distance between the eyes. Facial images in which the face is not correctly found are removed from all experiments.
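
The two heuristics can be written as a small check. This is a sketch of the rule as described above; the function name, the `(x, y, width, height)` box layout, and the argument names are assumptions.

```python
import math

def detection_is_valid(box, landmarks, left_eye, right_eye):
    """Accept a detected face region only if it is consistent with the
    manually labelled landmarks.

    box       : (x, y, width, height) of the detected face region
    landmarks : iterable of (x, y) manually labelled landmark locations
    """
    bx, by, bw, bh = box
    # Rule 1: every landmark lies inside the detected region.
    inside = all(bx <= x <= bx + bw and by <= y <= by + bh
                 for x, y in landmarks)
    # Rule 2: the region is smaller than four times the interocular distance.
    eye_dist = math.hypot(right_eye[0] - left_eye[0],
                          right_eye[1] - left_eye[1])
    return inside and bw < 4 * eye_dist and bh < 4 * eye_dist
```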

3.1.3. Low Resolution

SHR is developed for low-resolution images. Because there are no large low-resolution face databases, we used the FRGCv2 database and created low-resolution facial images by low-pass filtering and subsequent downsampling. Using low-resolution facial images makes it difficult to compare the performance of our face recognition methods with the state of the art, because the latter is primarily focussed on high-resolution facial images. Also, landmark-based registration methods work poorly at these resolutions. For this reason, we performed the landmark finding on high-resolution images, thus giving these methods an advantage over SHR.
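
Low-pass filtering followed by downsampling can be sketched in its simplest form as block averaging. The choice of a box filter is an assumption made for brevity; the text does not state which low-pass filter was used, and the function name is hypothetical.

```python
def block_average_downsample(image, factor):
    """Average each factor-by-factor block (a box low-pass filter) and keep
    one value per block. Rows and columns that do not fill a whole block
    are dropped."""
    h, w = len(image) // factor, len(image[0]) // factor
    area = factor * factor
    return [[sum(image[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / area
             for x in range(w)] for y in range(h)]
```

Filtering before subsampling matters because dropping pixels without it would alias fine facial texture into the low-resolution image.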

3.1.4. Face Recognition

We measured the performance of face registration by its effect on face recognition. In [37], a similar comparison is performed on the FRGC database, where the baseline PCA and PCA-LDA face recognition methods are used. We decided to use not only holistic but also feature-based methods, in order to demonstrate that different face recognition methods benefit from improved registration. We used our own implementation of the following face recognition methods:
(i) PCA Mahalanobis distance (baseline) [38],
(ii) PCA Mahalanobis Cosine distance [38],
(iii) Adaboost with Local Binary Patterns (LBP) [39],
(iv) PCA LDA likelihood ratio [40].

In Table 1, we show the face recognition results with an interocular distance (distance between the centers of the eyes) of 50 pixels using registration with manually labelled landmarks, showing the capacity of the face recognition methods if the registration is almost perfect. This is confirmed by [37], where the registration method is not able to perform better than manually registered images. From the results in Table 1, we observe that of the selected face classifiers, the PCA-LDA likelihood ratio performs best, closely followed by Adaboost with LBP. SHR is developed for low-resolution images using an interocular distance of 50 pixels instead of the available 350 pixels; this makes comparison with other results published on these databases difficult. In Figure 3, we attempt to show the relation between resolution and verification rate. Below an interocular distance of approximately 50 pixels, we expect that the verification rate decreases rapidly. At least part of this decrease is caused by failing registration at low resolutions, which we address in this paper. The area of interest for camera surveillance is the shaded area in Figure 3, and the stars mark the published results. In [1], an experiment is performed on a low-resolution database called the HCInt portion of the FRVT 2002 (not available to us), which uses an interocular distance of 75 pixels. The best verification rate reported on the HCInt portion is 95% at FAR 0.1% for gallery normalized experiments. Our best face recognition method gave a verification rate of 91% at FAR 0.1% for an interocular distance of 50 pixels with a one-to-one experiment, which is more difficult than a gallery normalized experiment. This matches our expectations of the results that can be obtained using face recognition on facial images with an interocular distance of 50 pixels. In [41], a verification rate of 67% at FAR 0.1% was reported for the PCA Mahalanobis distance classifier on the high-resolution experiments. 
For the same classifier, we obtained a verification rate of 50.3% at FAR 0.1% for an interocular distance of 50 pixels. This once again illustrates the drop in verification rates at low resolutions.
Figure 3

Best verification rates reported during the FRVT 2006. Our focus is at even lower resolutions (grayed area) expecting a slightly lower verification rate.

3.1.5. Landmark Methods for Comparison

We compared SHR to two landmark registration methods. The first method is the Viola and Jones detector [4] trained to find facial landmarks. The second method is called MLLL (Most Likely Landmark Locator) [10], which finds the landmarks by maximizing the likelihood ratio using PCA and LDA. This algorithm is run in combination with BILBO, which is a subspace-based method to correct for outliers. We have trained both methods on the FRGCv2 database and evaluated them using high-resolution images. Both the Viola-Jones and MLLL + BILBO find four landmarks (eyes, nose, and mouth). Based on the found landmarks, we calculate the Procrustes transformation to align the images.

3.2. Experimental Settings

In this section, we introduce the default experimental settings. Unless other settings are explicitly mentioned, these settings are used in the experiments. We use user-independent registration, with edge images as features and the downhill simplex search method to find the registration parameters. The number of subspace components is set to 300, which is a good compromise between speed and accuracy. For the edge images, we use kernels of pixels with , which, according to our observations, gives good results on several databases. The maximum expected offsets for scale, rotation, and translation needed to create the initial simplex are 0.2, 5 degrees, and 5 pixels, respectively. The downhill simplex method can also find the optimal registration parameters outside the maximum expected offsets. The gradient-based search method is not limited in the registration parameter search either. In the case of user-independent registration, both the gallery image and the probe image are registered to the same user-independent registration template (depicted in Figure 4). The registration template is the mean face obtained from the training set. For user-specific registration, we register to a single gallery image. Our subspace model is based on registered facial images; therefore, we need a correctly registered template. Furthermore, face recognition methods assume that both gallery and probe images are correctly registered, making proper registration of the gallery image important for user-specific registration. To obtain a registered gallery image, we perform the user-independent registration with the mean face as registration template (see Figure 5). Although in our experiments we use a single image as registration template, it is also possible to use multiple images to build a user-specific template. In this case, registration among gallery images can also be applied to improve the accuracy of the alignment of the gallery images.
Figure 4

Schematic representation of user-independent registration using the same template for the gallery and probe image.

Figure 5

Schematic representation of user-specific registration, where the template is an automatically registered gallery image.

4. Results

4.1. Comparison with Earlier Work

In Sections 2.1 and 2.2.2, we introduced a new evaluation criterion to replace the PCA Mahalanobis distance [27, 28] and new edge features for registration. In this section, we compare the effects of these changes separately. Figure 6 shows the effects that the new evaluation criterion (Bayesian Framework) and the new features have on the face recognition results, which are depicted using a ROC. After performing the registration with the different settings, we used the PCA-LDA likelihood ratio method for the recognition. In Figure 6, the ROC of the Bayesian Framework (grey values) shows that for % the verification rate decreases quickly, and for % the distance to the Bayesian Framework (edge images) remains constant. This behaviour is caused by incorrect registration due to local minima in the search space, an example of which was shown in Figure 2. Comparing the performance of the Bayesian Framework (edge images) to the Bayesian Framework (magnitude images), we observe that edge images are slightly better. For this reason, we use the edge images in the remaining part of the paper. In Figure 6, we also show that the verification rate of the PCA Mahalanobis distance (edge images) drops rapidly to 98% when the FAR decreases from 100%. This is caused by failures to find a correct registration. Figure 6 shows that the Bayesian Framework (edge images), which includes the distance-from-feature-space, has made SHR more robust against these failures, resulting in a higher overall recognition performance.
Figure 6

Comparison of the effects of our new evaluation criterion and new features. The Bayesian Framework with edge images achieves the best results.

4.2. Subspace-Based Holistic Registration versus Landmark-Based Face Registration

In this experiment, we registered every face image using two landmark-based face registration methods, SHR (user-independent face model), and the manually labelled landmarks given by the FRGCv2 database. For each registration method, we had to train the face recognition methods on face images that were registered by that specific registration method. This made the recognition method more robust against the specific variations. For SHR, we used the manual registration of the training set to train the face recognition methods. The results of our face recognition experiments using the PCA-LDA likelihood ratio face recognition method are shown in Figure 7. Note that these results are obtained for verification at 50 pixels interocular distance. Our focus is on the registration, which means that the results relative to manual registration are important. Other papers on face registration, like [10, 37], do not achieve better recognition results than manual registration on the FRGC. In Figure 7, we observe that the performance of SHR is better than manual registration at %. SHR also outperformed the automatic landmark-based registration algorithms, which used high-resolution images to obtain a registration. In Figure 7, the best landmark-based registration method is MLLL + BILBO, which performed better than the Viola-Jones landmark method. In the case of the Viola-Jones landmark method, we removed 997 of the 15982 images from the query set of [4, Experiment ], because three or fewer landmarks were found in these images, which often resulted in poor alignments. We also experimented with the Viola-Jones method at an interocular distance of 50 pixels. In this case, it failed to find the 4 landmarks for 10734 of the 15982 face images. In Table 2, we present the verification rates of all registration methods and the gain or loss in the recognition results from using automatic face registration methods instead of manual face registration. 
Again, all face recognition results were obtained at 50 pixels interocular distance. We observe that SHR improved the performance of all the face recognition methods in comparison with automatic landmark registration, which indicates that it is not dependent on the choice of the face recognition method. Some face recognition methods seem to be more robust against registration variations, for example Adaboost, but more accurate registration still improves the final recognition performance. In Table 2, the performance of the user-independent SHR is for most recognition methods similar to or better than that of manual registration. To understand why SHR sometimes performs better than manually registered images, we first determined the difference in the found registration parameters between manual and automatic registration, which is shown in Figure 8. We observe that the results of MLLL + BILBO, which finds landmarks very accurately, are closer to manual landmarks in scale and y-translation. Both SHR and MLLL + BILBO have similar results in rotation and x-translation, but SHR finds different scales and y-translations. In Figure 9, a few examples of facial images with large differences in scale and y-translation between registration with manual landmarks (third column) and user-independent SHR (fourth column) are shown, together with the input for the registration determined by the face detection of the probe image (first column) and gallery image (second column). The white marks on the face are the manually labelled landmark locations. We pictured half of the registered probe image (left) and half of the registered gallery image (right) to show the alignment between the images. In the first row of Figure 9, we show a probe image with the head tilted up and a gallery image without tilt; because of the tilt of the head, the relative positions of the landmarks change. 
We observe that the eyes, nose, and mouth in the probe and gallery images are on almost the same line using manual registration, but there is a big difference in scale. SHR, on the other hand, aligned both images on the same scale. This places the nose of the probe image higher but gives a better match with the mouth. In the second and third rows, a slightly different definition of the landmark locations is used (especially for the nose), resulting in misalignments for manual registration, where the two halves in the third column do not overlap in the nose and mouth regions because of scaling differences. Another difficulty in the third row is locating the landmarks of closed eyes. This is done correctly in this case, positioning the eyes somewhat above the closed eyebrows, but this is often not the case. In the last row of Figure 9, we observe that expressions can also change the ratio between landmarks, especially in the mouth area. The nose in the probe image is located higher than the nose in the gallery image using manual registration.
Table 2

Verification rates at the listed false accept rates (FAR) and, in parentheses, the gain or loss of each automatic registration method relative to manual registration on FRGC [4, Experiment ], comparing all registration methods using all face classifiers. The best automatic registration is achieved by user-independent SHR on low-resolution images, which often performs even better than manual registration.

| Face classifier | FAR | Viola-Jones (high resolution) | MLLL + BILBO (high resolution) | SHR (low resolution) | Manual |
|---|---|---|---|---|---|
| PCA Mah | 1% | 57.3% (-8.9%) | 67.4% (+1.3%) | 68.1% (+2.0%) | 66.2% |
| | 0.1% | 44.5% (-5.8%) | 52.9% (+2.6%) | 54.0% (+3.3%) | 50.3% |
| | 0.01% | 34.0% (-3.4%) | 40.9% (+3.4%) | 42.2% (+4.7%) | 37.5% |
| PCA MahCos | 1% | 73.2% (-13.8%) | 85.2% (-1.7%) | 87.9% (+2.0%) | 87.0% |
| | 0.1% | 57.4% (-10.8%) | 68.2% (0.0%) | 71.9% (+3.3%) | 68.2% |
| | 0.01% | 39.7% (-9.3%) | 47.4% (+4.1%) | 50.7% (+4.7%) | 43.3% |
| Likelihood ratio | 1% | 86.7% (-10.1%) | 94.0% (-2.8%) | 95.9% (-0.9%) | 96.8% |
| | 0.1% | 76.9% (-13.9%) | 86.9% (-4.7%) | 91.0% (+0.2%) | 90.8% |
| | 0.01% | 65.5% (-14.8%) | 77.2% (-3.1%) | 82.5% (+2.2%) | 80.3% |
| Adaboost | 1% | 86.5% (-8.4%) | 93.5% (-1.4%) | 94.1% (-0.8%) | 95.0% |
| | 0.1% | 78.3% (-10.5%) | 87.1% (-1.7%) | 87.9% (-1.0%) | 88.9% |
| | 0.01% | 69.8% (-11.1%) | 78.9% (-2.0%) | 80.1% (-0.8%) | 80.9% |


Figure 7

Comparison of face recognition (PCA-LDA likelihood ratio) with several registration methods on FRGC [4, Experiment ] using the entire set. Face recognition after SHR outperforms face recognition after the landmark-based methods.

Figure 8

Cumulative differences of registration parameters compared with manual registration, showing that MLLL + BILBO produces very accurate landmarks and that SHR and manual registration differ especially in scale and in translation in the $y$-direction.
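
As an illustration of how a cumulative curve of this kind can be computed, the following minimal sketch assumes that the curve reports, for each threshold, the fraction of images whose absolute parameter difference with manual registration is at most that threshold. The function name and the example differences are hypothetical and are not taken from the experiments.

```python
def cumulative_fraction(differences, thresholds):
    """For each threshold, return the fraction of images whose absolute
    parameter difference is at most that threshold."""
    abs_diffs = [abs(d) for d in differences]
    return [
        sum(1 for d in abs_diffs if d <= t) / len(abs_diffs)
        for t in thresholds
    ]


# Hypothetical differences (automatic minus manual) in one parameter,
# for example the vertical translation in pixels.
example_differences = [0.5, -1.5, 2.0, -0.2]
curve = cumulative_fraction(example_differences, [0.5, 1.0, 2.0])
```

A curve that rises quickly indicates that the automatic method stays close to the manual registration in that parameter.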

Figure 9

Examples of registration. The first and second columns contain the face detection regions of the probe and gallery images together with the manual landmarks. In the third column, we present one half of the probe image and the other half of the gallery image to compare the final alignment of manual registration. For the fourth column, we performed the same procedure as in the third column but with user-independent SHR.
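
The numbers in Table 2 are verification rates at a fixed false accept rate (FAR), with the difference to manual registration in parentheses. The following minimal sketch shows one common way to compute such a rate from similarity scores. It assumes that higher scores mean a better match. The function names and the toy scores are hypothetical, and the thresholding convention (accept only scores strictly above the threshold) is an assumption rather than the procedure used in the experiments.

```python
def verification_rate_at_far(genuine, impostor, far):
    """Fraction of genuine scores accepted when at most a fraction `far`
    of the impostor scores is accepted."""
    ranked = sorted(impostor, reverse=True)
    allowed = int(far * len(ranked))  # number of impostors that may pass
    # Accept only scores strictly above the (allowed + 1)-th highest
    # impostor score, so that no more than `allowed` impostors pass.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    accepted = sum(1 for score in genuine if score > threshold)
    return accepted / len(genuine)


def gain_over_manual(automatic_rate, manual_rate):
    """Difference in verification rate (percentage points) between an
    automatic registration method and manual registration."""
    return automatic_rate - manual_rate


# Toy scores, not taken from the FRGC experiments.
genuine_scores = [0.5, 0.8, 0.95, 0.25]
impostor_scores = [0.1, 0.2, 0.3, 0.9]
rate = verification_rate_at_far(genuine_scores, impostor_scores, far=0.25)

# Likelihood ratio with SHR versus manual registration at FAR = 0.1% (Table 2).
shr_gain = round(gain_over_manual(91.0, 90.8), 1)
```

With the toy scores, one impostor is allowed to pass, so the threshold is the second-highest impostor score and three of the four genuine scores are accepted.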

4.3. User Independent versus User Specific

In this section, we compare user-specific registration to the user-independent registration. In Figures 4 and 5, we show the two scenarios to obtain the user-independent and user-specific templates. In Figure 10, we show ROCs of the user-independent and user-specific face registration using the edge images. We observe that the performance consistently improves by using user-specific registration. Figure 10 also shows that user-specific registration performs slightly better than manual registration, which indicates that SHR gives more stable registration than the landmarks located by humans.
Figure 10

Comparison of user-independent and user-specific face registration. User-specific registration obtains better results than user-independent registration and manual registration.

4.4. Comparing Search Algorithms

The two search methods, described in Section 2.4, were compared using an experiment similar to the one in the previous section. In all other experiments, the downhill simplex search method is used. Our Matlab implementation takes around 2.7 seconds on an AMD Opteron 275 to register a single image, while the Matlab implementation of MLLL + BILBO [10] that we obtained takes around 7 seconds for a single image. The Viola and Jones landmark implementation in C++ performs registration in almost real time. Note that we did not spend much effort on optimizing our code, because our main focus is on improving the accuracy. However, computation time can be an issue in practical scenarios. For this reason, we show the tradeoff between computation time, measured in the number of iterations, and accuracy, measured in the verification rate, in Figure 11. Although the average search time of the gradient-based method is larger, Figure 11 shows that it is able to find a good solution within a smaller number of iterations. This makes the difference between the two search methods in computation time and accuracy very small.
Figure 11

Comparison of the search algorithms, showing the verification rates of the likelihood ratio at different numbers of iterations. The gradient-based method needs 3 times more computation time than the downhill simplex method for the same number of iterations.
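
To make the iterative nature of the downhill simplex search concrete, the following sketch implements a simplified Nelder-Mead variant (reflection, expansion, inside contraction, and shrink) in plain Python. It is an illustration under stated assumptions, not our Matlab implementation: the parameter vector (x-translation, y-translation, rotation, scale), the step size, and in particular `toy_similarity`, a smooth stand-in with a known maximum, are hypothetical. In SHR the similarity would be computed from the face subspace.

```python
def nelder_mead(cost, start, step=0.5, max_iter=1000, tol=1e-10):
    """Minimise `cost` with a simplified downhill simplex search."""
    n = len(start)
    # Initial simplex: the start point plus one vertex per parameter.
    simplex = [list(start)]
    for i in range(n):
        vertex = list(start)
        vertex[i] += step
        simplex.append(vertex)
    values = [cost(v) for v in simplex]

    for _ in range(max_iter):
        # Sort the vertices from best (lowest cost) to worst.
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        worst = simplex[-1]
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_reflected = cost(reflected)
        if f_reflected < values[0]:
            # Reflection gave a new best point: try to expand further.
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_expanded = cost(expanded)
            if f_expanded < f_reflected:
                simplex[-1], values[-1] = expanded, f_expanded
            else:
                simplex[-1], values[-1] = reflected, f_reflected
        elif f_reflected < values[-2]:
            simplex[-1], values[-1] = reflected, f_reflected
        else:
            # Contract the worst vertex towards the centroid.
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            f_contracted = cost(contracted)
            if f_contracted < values[-1]:
                simplex[-1], values[-1] = contracted, f_contracted
            else:
                # Shrink all vertices towards the best vertex.
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)]
                    for v in simplex[1:]
                ]
                values = [values[0]] + [cost(v) for v in simplex[1:]]

    best_index = min(range(n + 1), key=lambda i: values[i])
    return simplex[best_index], values[best_index]


def toy_similarity(params):
    """Hypothetical similarity with its maximum at (2.0, -1.0, 0.1, 1.2)."""
    tx, ty, angle, scale = params
    return -((tx - 2.0) ** 2 + (ty + 1.0) ** 2
             + (angle - 0.1) ** 2 + (scale - 1.2) ** 2)


# Maximising the similarity equals minimising its negative.
best_params, best_cost = nelder_mead(
    lambda p: -toy_similarity(p), [0.0, 0.0, 0.0, 1.0]
)
```

Limiting `max_iter` corresponds to the tradeoff shown in Figure 11: fewer iterations cost less computation time but may stop before the best registration parameters are reached.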

4.5. Lower Resolutions

In video surveillance, the resolution of the facial images is often below the interocular distance of 50 pixels used in the previous sections. To simulate this, we downsampled the images even further. In this section, we ran experiments at several lower resolutions to test the performance of SHR. After finding the alignment parameters at these resolutions, we use them to register the facial images at an interocular distance of 50 pixels. This allows us to show the effects of low resolution on the registration, while ignoring the effects of low resolution on the face recognition.
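
Reusing alignment parameters found at a lower resolution requires converting them to the resolution used for recognition. The sketch below shows one possible conversion. It assumes that the parameters form a similarity transform (translation in pixels, rotation angle, relative scale factor) and that both images are resampled versions of the same original with a common origin. The function name and the parameter layout are hypothetical.

```python
def rescale_registration(params, d_low, d_ref=50.0):
    """Map (tx, ty, angle, scale), found at interocular distance `d_low`,
    to an image with interocular distance `d_ref`."""
    tx, ty, angle, scale = params
    factor = d_ref / d_low
    # Translations are measured in pixels and grow with the resolution;
    # the rotation angle and the relative scale factor do not change.
    return (tx * factor, ty * factor, angle, scale)


# Parameters found at 25 pixels, applied at 50 pixels interocular distance.
params_at_50 = rescale_registration((1.5, -2.0, 0.1, 1.05), d_low=25.0)
```

Under these assumptions, a translation error of one pixel at 25 pixels interocular distance becomes an error of two pixels at 50 pixels, which is one reason to expect the registration to degrade at lower resolutions.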

In Figure 12, we show the results of user-independent registration for all the face recognition methods. As expected, the registration performance decreases for lower resolutions. The registration results start becoming worse at an interocular distance smaller than 25 pixels. Some methods, like Adaboost, are less sensitive to the registration errors caused by the lower resolutions than, for instance, the PCA-LDA likelihood ratio.
Figure 12

Registration performance for varying resolutions used in SHR. The registration parameters that are found are then used to align facial images with an interocular distance of 50 pixels, so that only the performance of SHR at low resolution is shown. This performance is still good at an interocular distance of 25 pixels.

5. Conclusion

We presented a novel subspace-based holistic registration (SHR) method, which is developed to perform registration on low-resolution face images, in contrast to most landmark-based registration methods, which can only perform accurate registration on high-resolution images. SHR is able to use a user-independent face model or a user-specific face model to register face images. For the user-specific registration, we defined two scenarios to register the gallery images. We show that by using edges as features for the registration, we obtain better results than by using the grey levels of the image. The search for the best registration parameters is iterative, and we proposed two search methods, namely, the downhill simplex method and a gradient-based method.

To evaluate the face registration, we measured the effects it has on the results of face recognition. We used the FRGCv2 database to perform our face registration experiments. We compared SHR with two landmark-based registration methods that work on high-resolution facial images. Nevertheless, the recognition results of SHR were better than those of the landmark-based methods. User-independent SHR gives face recognition results similar to those of registration with manually labelled landmarks. User-specific SHR performs better than both user-independent SHR and manual registration. One of the advantages over the landmark-based methods is that SHR is able to register low-resolution face images with an interocular distance as low as 25 pixels. The results at this resolution make SHR suitable for use in video surveillance.

Declarations

Authors’ Affiliations

(1)
Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente

References

  1. Phillips JP, Scruggs TW, O'Toole AJ, et al.: FRVT 2006 and ICE 2006 large-scale results. National Institute of Standards and Technology; March 2007.
  2. Acosta E, Torres L, Albiol A, Delp E: An automatic face detection and recognition system for video indexing applications. Proceedings of the IEEE International Conference on Acoustic, Speech, and Signal Processing (ICASSP '02), May 2002 3644-3647.
  3. Balcan M, Blum A, Choi PP, et al.: Person identification in webcam images: an application of semi-supervised learning. Proceedings of the International Conference on Machine Learning Workshop on Learning from Partially Classified Training Data, 2005 1-9.
  4. Viola PA, Jones MJ: Rapid object detection using a boosted cascade of simple features. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), December 2001 511-518.
  5. Cristinacce D, Cootes T, Scott I: A multi-stage approach to facial feature detection. Proceedings of the 15th British Machine Vision Conference, 2004, London, UK 277-286.
  6. Chen L, Zhang L, Zhu L, Li M, Zhang H: A novel facial feature localization method using probabilistic-like output. Proceedings of the Asian Conference on Computer Vision, 2004 1-10.
  7. Castrillón-Santana M, Déniz-Suárez O, Antón-Canals L, Lorenzo-Navarro J: Face and facial feature detection. Proceedings of the 3rd International Conference on Computer Vision Theory and Applications (VISAPP '08), 2008 2: 167-172.
  8. Moghaddam B, Pentland A: Probabilistic visual learning for object representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 1997, 19(7):696-710. 10.1109/34.598227
  9. Bazen A, Veldhuis R, Croonen G: Likelihood ratio-based detection of facial features. Proceedings of the 14th Annual Workshop on Circuits, Systems and Signal Processing (ProRisc '03), November 2003, Veldhoven, The Netherlands 2: 323-329.
  10. Beumer GM, Tao Q, Bazen AM, Veldhuis RNJ: A landmark paper in face recognition. Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition (FGR '06), April 2006 73-78.
  11. Everingham M, Zisserman A: Regression and classification approaches to eye localization in face images. Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition (FGR '06), April 2006 441-446.
  12. Wiskott L, Fellous J-M, Krüger N, von der Malsburg C: Face recognition by elastic bunch graph matching. In Intelligent Biometric Techniques in Fingerprint and Face Recognition. Edited by: Jain LC, Halici U, Hayashi I, Lee SB. CRC Press, Boca Raton, Fla, USA; 1999:355-396.
  13. Shi J, Samal A, Marx D: How effective are landmarks and their geometry for face recognition? Computer Vision and Image Understanding 2006, 102(2):117-133. 10.1016/j.cviu.2005.10.002
  14. Arca S, Campadelli P, Lanzarotti R: A face recognition system based on automatically determined facial fiducial points. Pattern Recognition 2006, 39(3):432-443. 10.1016/j.patcog.2005.06.015
  15. Salah AA, Çinar H, Akarun L, Sankur B: Robust facial landmarking for registration. Annals of Telecommunications 2007, 62(1-2):1608-1633.
  16. Cootes TF, Taylor CJ, Cooper DH, Graham J: Active shape models—their training and application. Computer Vision and Image Understanding 1995, 61(1):38-59. 10.1006/cviu.1995.1004
  17. Cootes TF, Edwards GJ, Taylor CJ: Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence 2001, 23(6):681-685. 10.1109/34.927467
  18. Mahalanobis A, Kumar BVKV, Casasent D: Minimum average correlation energy filters. Applied Optics 1987, 26(6):3633-3640.
  19. Savvides M, Vijaya Kumar B: Efficient design of advanced correlation filters for robust distortion-tolerant face recognition. Proceedings of the IEEE Conference on Advanced Video and Signal Based Surveillance, July 2003 45-52.
  20. Savvides M, Abiantun R, Heo J, Park S, Xie C, Vijayakumar BVK: Partial & holistic face recognition on FRGC-II data using support vector machine. Proceedings of the Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '06), June 2006 48-48.
  21. Jia K, Gong S, Leung A: Coupling face registration and super-resolution. Proceedings of the British Machine Vision Conference, September 2006 2: 449-458.
  22. Jonsson K, Matas J, Kittler J, Haberl S: Saliency-based robust correlation for real-time face registration and verification. Proceedings of the British Machine Vision Conference (BMVC '98), 1998 44-53.
  23. Matas J, Jonsson K, Kittler J: Fast face localization and verification. Image and Vision Computing 1999, 17(8):575-581. 10.1016/S0262-8856(98)00176-0
  24. Wang P, Tran LC, Ji Q: Improving face recognition by online image alignment. Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06), August 2006 1: 311-314.
  25. Spreeuwers L, Boom B, Veldhuis R: Better than best: matching score based face registration. Proceedings of the 28th Symposium on Information Theory in the Benelux, 2007 125-132.
  26. Boom B, Beumer G, Spreeuwers L, Veldhuis R: Matching score based face registration. In Proceedings of the 17th Annual Workshop on Circuits, Systems and Signal Processing (ProRISC '06), 2006, Veldhoven, The Netherlands. STW.
  27. Boom B, Spreeuwers L, Veldhuis R: Automatic face alignment by maximizing similarity score. Proceedings of the 7th International Workshop on Pattern Recognition in Information Systems (PRIS '07), June 2007 221-230.
  28. Liu C, Shum H-Y, Freeman WT: Face hallucination: theory and practice. International Journal of Computer Vision 2007, 75(1):115-134. 10.1007/s11263-006-0029-5
  29. Jia K, Gong S: Generalized face super-resolution. IEEE Transactions on Image Processing 2008, 17(6):873-886.
  30. Cootes TF, Taylor CJ: On representing edge structure for model matching. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), December 2001 1: 1114-1119.
  31. Nelder J, Mead R: A simplex method for function minimization. The Computer Journal 1965, 7(10):308-315.
  32. Hager GD, Belhumeur PN: Efficient region tracking with parametric models of geometry and illumination. IEEE Transactions on Pattern Analysis and Machine Intelligence 1998, 20(10):1025-1039. 10.1109/34.722606
  33. Baker S, Matthews I: Lucas-Kanade 20 years on: a unifying framework. International Journal of Computer Vision 2004, 56(3):221-255.
  34. Phillips PJ, Flynn PJ, Scruggs T, Bowyer KW, Chang J, Hoffman K, Marques J, Min J, Worek W: Overview of the face recognition grand challenge. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), June 2005 1: 947-954.
  35. Lienhart R, Kuranov A, Pisarevsky V: Empirical analysis of detection cascades of boosted classifiers for rapid object detection. In Pattern Recognition, Lecture Notes in Computer Science. Volume 2781. Springer, Berlin, Germany; 2003:297-304. 10.1007/978-3-540-45243-0_39
  36. Intel: Open computer vision library. http://sourceforge.net/projects/opencvlibrary/
  37. Wang P, Green M, Ji Q, Wayman J: Automatic eye detection and its validation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), June 2005 164-164.
  38. Perlibakas V: Distance measures for PCA-based face recognition. Pattern Recognition Letters 2004, 25(6):711-724. 10.1016/j.patrec.2004.01.011
  39. Zhang G, Huang X, Li SZ, Wang Y, Wu X: Boosting local binary pattern (LBP)-based face recognition. Proceedings of the Chinese Conference on Biometric Recognition (SINOBIOMETRICS '04), 2004, Guangzhou, China 179-186.
  40. Veldhuis R, Bazen A, Booij W, Hendrikse A: Hand-geometry recognition based on contour parameters. Biometric Technology for Human Identification II, March 2005, Orlando, Fla, USA, Proceedings of SPIE 344-353.
  41. Phillips PJ, Flynn PJ, Scruggs T, Bowyer KW, Worek W: Preliminary face recognition grand challenge results. Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition (FGR '06), April 2006 15-24.

Copyright

© B. J. Boom et al. 2010

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.