### 2.1. Subspace-Based Holistic Registration

Face registration is performed to correct for variations that occur when the face region is selected from an image. We assume that face detection obtains frontal faces from a camera, and that we have to correct for in-plane rotations of these faces. The exact positions of the camera and the face are usually unknown, making a correction for scale and translation necessary as well. A Procrustes transformation, denoted by $T(I;\theta)$, corrects for these variations, allowing us to scale an image by a factor $s$, rotate it over an angle $\phi$, and translate it over a vector $\mathbf{t} = (t_x, t_y)^T$. The optimal face registration is assumed to be found when the similarity between the transformed input image (probe image) and the gallery images is maximal. In SHR, we try to find the best registration parameters $\hat{\theta}$, with $\theta = (s, \phi, t_x, t_y)$, by maximizing a similarity function $S$. Here $I_p$ denotes the probe image, which is transformed by $T$, $I_g$ denotes a registered reference object (gallery image), and $\Omega$ denotes a model of the reference objects (faces). The equation for finding the best registration parameters is

$$\hat{\theta} = \arg\max_{\theta}\, S\big(T(I_p;\theta),\, I_g,\, \Omega\big). \tag{1}$$

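As an illustration (function and variable names are ours, not the paper's), the four-parameter Procrustes transform acting on pixel coordinates can be sketched as:

```python
import numpy as np

def procrustes_matrix(s, phi, tx, ty):
    """2x3 matrix of the similarity (Procrustes) transform
    u' = s * R(phi) @ u + t, which preserves distance ratios."""
    c, si = np.cos(phi), np.sin(phi)
    return np.array([[s * c, -s * si, tx],
                     [s * si,  s * c, ty]])

def transform_points(points, s, phi, tx, ty):
    """Apply the transform to an (N, 2) array of pixel locations."""
    A = procrustes_matrix(s, phi, tx, ty)
    return points @ A[:, :2].T + A[:, 2]
```

A search over the four parameters $(s, \phi, t_x, t_y)$ then amounts to re-evaluating the similarity for each candidate transform.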
An important issue is how to measure the similarity between the probe and gallery images. In our previous work, we used similarity scores from well-known face recognition algorithms for this purpose. However, these scores are usually optimized for face recognition, measuring the similarity between faces of different individuals in a face space. In this paper, we argue that the correct quantifier for face registration should also include the probability that the face might be misaligned, measuring also the error outside the face space. We thus use the probability that the aligned image belongs to the object class $\Omega$ of the gallery image. Let $W$ be an operator that vectorizes the features in $T(I_p;\theta)$ and $I_g$ using a set of predefined locations in the images, giving feature vectors $\mathbf{x}$ of dimension $N$. We adopt a Gaussian model of $P(\mathbf{x}\mid\Omega)$, of which $\boldsymbol{\mu}$ is the mean and $\Sigma$ the covariance matrix:

$$P(\mathbf{x}\mid\Omega) = \frac{\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)}{(2\pi)^{N/2}\,\lvert\Sigma\rvert^{1/2}}. \tag{2}$$

Our goal is to optimize $P(\mathbf{x}\mid\Omega)$ as a function of the registration parameters $\theta$. For notational compactness, we define $\mathbf{x}_p = W(T(I_p;\theta))$; maximizing the probability in (2) is then equivalent to minimizing the distance

$$d(\mathbf{x}_p) = (\mathbf{x}_p - \boldsymbol{\mu})^{T}\,\Sigma^{-1}\,(\mathbf{x}_p - \boldsymbol{\mu}). \tag{3}$$

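Evaluating this distance directly can be sketched as follows (a minimal illustration with our own helper name; a linear solve stands in for the explicit inverse $\Sigma^{-1}$):

```python
import numpy as np

def mahalanobis_sq(x, mu, cov):
    """(x - mu)^T Sigma^{-1} (x - mu), computed without forming
    Sigma^{-1} explicitly."""
    d = np.asarray(x, float) - np.asarray(mu, float)
    return float(d @ np.linalg.solve(cov, d))
```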
The training samples used to determine both the mean $\boldsymbol{\mu}$ and the covariance matrix $\Sigma$ are correctly aligned images. Notice that $I_g$ needs to be a registered image in order to find the registration parameters for $I_p$. The exact estimation of the covariance matrix is not possible with a limited number of training samples. As a consequence, the estimate of $\Sigma$ is often singular, so that $\Sigma^{-1}$ cannot be computed, and even if $\Sigma^{-1}$ can be calculated, the results will be inaccurate. Furthermore, the computational costs of evaluating (3) are large, due to the high dimensionality of $\mathbf{x}$ and $\Sigma$. For these reasons, we use Principal Component Analysis (PCA) to reduce the dimensionality. We obtain a subspace by solving the eigenvalue problem

$$\Sigma\,\Phi = \Phi\,\Lambda, \tag{4}$$

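A minimal PCA sketch of this step, using a sample-covariance estimate (all names are illustrative):

```python
import numpy as np

def pca_subspace(samples, m):
    """Eigendecomposition of the sample covariance; keeps the m leading
    eigenvalues/eigenvectors (the principal subspace). samples: (n, N)."""
    mu = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)    # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # re-sort descending
    return mu, eigvals[order][:m], eigvecs[:, order[:m]]

def reduce_vector(x, mu, eigvecs_m):
    """Reduced feature vector y = Phi_M^T (x - mu)."""
    return eigvecs_m.T @ (x - mu)
```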
where $\Lambda$ is the diagonal matrix containing the eigenvalues $\lambda_i$ and $\Phi$ contains the eigenvectors of the covariance matrix $\Sigma$. We can obtain a reduced feature vector $\mathbf{y} = \Phi_M^{T}(\mathbf{x}-\boldsymbol{\mu})$, where $\Phi_M$ holds the $M$ leading eigenvectors and $M \ll N$. The principal subspace $F$, which reduces the feature vector from $N$ to $M$ dimensions, has an orthogonal complement $\bar{F}$, which contains the variations that are not modelled by PCA. Using only similarities in the principal subspace, as in our previous work [27], results in the Mahalanobis distance. However, if we optimize the alignment only for the principal subspace $F$, we might walk further away in the orthogonal complement $\bar{F}$, ignoring details not included in our model but which might indeed be important for the registration. To overcome this problem, we use a distance measure proposed in [8]:

$$\hat{d}(\mathbf{x}) = \sum_{i=1}^{M}\frac{y_i^{2}}{\lambda_i} + \frac{\epsilon^{2}(\mathbf{x})}{\rho}, \qquad \epsilon^{2}(\mathbf{x}) = \lVert\mathbf{x}-\boldsymbol{\mu}\rVert^{2} - \sum_{i=1}^{M} y_i^{2}, \tag{5}$$

where $\lambda_i$ are the eigenvalues in $\Lambda$ and $\rho$ is the average eigenvalue in $\bar{F}$. This distance measure consists of two parts: the first is called the "distance-in-feature-space" (DIFS) and the second the "distance-from-feature-space" (DFFS). In our experiments, we compare the results of using only the DIFS for face registration, as is done in [27, 28], with using both the DIFS and the DFFS (see Section 4.1). We show that using both distances results in a better performance than using the DIFS alone.
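The combined DIFS + DFFS measure can be sketched as follows (our helper names; `eigvecs`/`eigvals` are the full eigendecomposition of the covariance matrix, sorted descending, and `m` the subspace dimension):

```python
import numpy as np

def difs_dffs(x, mu, eigvecs, eigvals, m):
    """DIFS over the m principal components plus DFFS: the residual
    energy outside the subspace, weighted by the average remaining
    eigenvalue rho (Moghaddam-Pentland style distance)."""
    d = x - mu
    y = eigvecs[:, :m].T @ d            # projection into F
    difs = np.sum(y**2 / eigvals[:m])
    eps2 = d @ d - y @ y                # energy left in F-bar
    rho = np.mean(eigvals[m:])          # average eigenvalue in F-bar
    return difs + eps2 / rho
```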

In Figure 1, we give a schematic representation of the components needed for SHR and of the interaction between them. We use an iterative search method to find the optimal similarity between the probe image and the gallery images. The initial registration parameters $\theta_0$ are given by a face detection algorithm, for instance the method of Viola and Jones [4]. The alignment registers the probe image based on the specified parameters. We discuss the components in Figure 1 in the following sections: the evaluation (Section 2.2), the alignment (Section 2.3), and the search methods (Section 2.4).

### 2.2. Evaluation

Two important issues in the evaluation function are the model and the features. The model can be either user independent, as explained in the previous section, or user specific; the latter is discussed in Section 2.2.1. As features, we propose edge images instead of grey level images, which reduces the number of local minima in the evaluation; this is explained in Section 2.2.2.

#### 2.2.1. Evaluation to a User Specific Face Model

Instead of registration to a mean face model, which may differ substantially from individual faces, registration to a user-specific model, if available, may improve the registration results. For user-specific face registration, we need a user template to register a probe image. For face identification, user-specific registration has the drawback that we have to register the probe to every user template in the database.

For user-specific registration, we define the similarity measure $S(T(I_p;\theta), M_u)$, where $M_u$ models registered facial images of user $u$. The user-specific model consists of a user template $\boldsymbol{\mu}_u$ and the covariance matrix $\Sigma_W$. For the covariance matrix, we use a within-class covariance matrix $\Sigma_W$ that models the variations among face images of the same person, pooled over all users, because we often do not have enough images to estimate a user-specific covariance matrix. The similarity function for the user-specific model is

$$\hat{d}_u(\mathbf{x}) = \sum_{i=1}^{M}\frac{y_{u,i}^{2}}{\lambda_{W,i}} + \frac{\epsilon_u^{2}(\mathbf{x})}{\rho_W}, \qquad \mathbf{y}_u = \Phi_{W,M}^{T}\,(\mathbf{x}-\boldsymbol{\mu}_u), \tag{6}$$

where $\Phi_{W,M}$, $\lambda_{W,i}$, and $\rho_W$ follow from the eigenvalue decomposition of $\Sigma_W$.

#### 2.2.2. Using Edge Images to Avoid Local Minima

Using grey level images for registration often leads to local minima in the search space. Better registration results can be obtained by using edge images, as is shown for instance in [30] for Active Appearance Models. In image registration, regions containing large variations (structure) contribute more to the registration than homogeneous regions. By applying edge filters, the regions that contain structure are highlighted, while the homogeneous regions are suppressed. In our case, the use of edge filters results in a search space with fewer local minima. In Figure 2, a 2D search space is shown in which we varied the scale and the translation in the $x$-direction of a grey level image and of an edge image. The edge image (right) shows a single clear minimum, while the grey level image has a global minimum at the same place, but also a large local minimum in the right corner.

In order to calculate the edges in the image, we take the derivatives in the $x$ and $y$ directions of the images. Because images usually contain noise, we use the Gaussian derivative kernels $G_x$ and $G_y$ with smoothing scale $\sigma$:

$$G_x(x,y;\sigma) = -\frac{x}{2\pi\sigma^{4}}\,e^{-\frac{x^{2}+y^{2}}{2\sigma^{2}}}, \qquad G_y(x,y;\sigma) = -\frac{y}{2\pi\sigma^{4}}\,e^{-\frac{x^{2}+y^{2}}{2\sigma^{2}}}. \tag{7}$$

The derivatives $I_x = I * G_x$ and $I_y = I * G_y$ of the images are calculated by convolution. We refer to these as "edge images". If we use both edge images in the feature vector instead of the grey level image, this doubles the length of the feature vector, resulting in increased computation time. An alternative is to combine the two edge images as follows into a "magnitude image":

$$I_m = \sqrt{I_x^{2} + I_y^{2}}. \tag{8}$$

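A self-contained sketch of these features, using separable 1D convolutions (the kernel truncation radius and all names are our choices):

```python
import numpy as np

def _gauss_1d(sigma):
    """1D Gaussian and its derivative, truncated at 3 sigma."""
    r = int(3 * sigma)
    t = np.arange(-r, r + 1, dtype=float)
    g = np.exp(-t**2 / (2 * sigma**2))
    g /= g.sum()
    return g, -t / sigma**2 * g

def edge_images(image, sigma=1.0):
    """I_x, I_y via separable Gaussian-derivative filtering, plus the
    magnitude image sqrt(I_x^2 + I_y^2)."""
    g, dg = _gauss_1d(sigma)
    conv = lambda a, k, ax: np.apply_along_axis(np.convolve, ax, a, k, 'same')
    Ix = conv(conv(image, dg, 1), g, 0)   # d/dx, then smooth in y
    Iy = conv(conv(image, g, 1), dg, 0)   # smooth in x, then d/dy
    return Ix, Iy, np.hypot(Ix, Iy)
```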
The default features used in this paper are the "edge images", and a comparison between the features is performed in Section 4.1.

### 2.3. Alignment

We use a Procrustes transformation to align the probe image to the gallery images, which is common practice in face recognition; this transformation preserves distance ratios. Given a pixel location $\mathbf{u} = (x, y)^{T}$, we can define a transformation on the pixel location as follows:

$$t(\mathbf{u};\theta) = s\,R(\phi)\,\mathbf{u} + \mathbf{t}, \qquad R(\phi) = \begin{pmatrix}\cos\phi & -\sin\phi\\ \sin\phi & \cos\phi\end{pmatrix}. \tag{9}$$

Here $R(\phi)$ is the rotation matrix. The transformation of the image is defined as

$$T(I;\theta)\big(t(\mathbf{u};\theta)\big) = I(\mathbf{u}). \tag{10}$$

This allows us to obtain an aligned image by backward mapping and interpolation. Most landmark-based methods also perform this transformation based on the found landmarks in order to obtain a registered face image [13].
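A minimal backward-mapping sketch (our names; border handling is simply clamped):

```python
import numpy as np

def align(image, s, phi, tx, ty):
    """For each output pixel u', invert u' = s R(phi) u + t to find the
    source location u and sample `image` with bilinear interpolation."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    dx, dy = xs - tx, ys - ty
    c, si = np.cos(phi), np.sin(phi)
    sx = ( c * dx + si * dy) / s          # u = (1/s) R(-phi) (u' - t)
    sy = (-si * dx + c * dy) / s
    x0 = np.clip(np.floor(sx).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, h - 2)
    fx = np.clip(sx - x0, 0.0, 1.0)
    fy = np.clip(sy - y0, 0.0, 1.0)
    top = (1 - fx) * image[y0, x0] + fx * image[y0, x0 + 1]
    bot = (1 - fx) * image[y0 + 1, x0] + fx * image[y0 + 1, x0 + 1]
    return (1 - fy) * top + fy * bot
```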

### 2.4. Search Methods

In (1), we have to maximize the similarity score to find the best alignment parameters $\hat{\theta}$. Ideally, an iterative search method should be able to find the optimal solution using a small number of evaluations, making it possible to register the probe image in near real time. The search method also has to be robust against local minima. Confirmed by our observations, we assume reasonably smooth search landscapes. We applied two different search methods: the first is the downhill simplex method [31] that we also used in [26, 27], and the second is a gradient-based method.

#### 2.4.1. Downhill Simplex Search Method

This method is able to maximize a similarity function using around 100 evaluations. A good initialization of the downhill simplex method is necessary to be robust against local minima. This was also observed in [27], where we used several initializations to reduce outliers. To initialize the downhill simplex method, we need to create a simplex (a geometric shape in $n$ dimensions, consisting of $n+1$ points). For the four registration parameters, this means that we have to select five starting points. The first starting point is given by the initial parameter vector $\theta_0$. The other starting points are given by

$$\theta_i = \theta_0 \pm \delta_i\,\mathbf{e}_i, \qquad i = 1,\ldots,4, \tag{11}$$

where $\delta_i$ is the maximum expected offset for a single registration parameter in the positive or negative direction, and $\mathbf{e}_i$ is the corresponding unit vector; of the two signs, we use the one that gives the best similarity score. The downhill simplex method is, however, able to find optimal registration parameters that lie outside the maximum expected offsets. This search method maximizes the similarity function by replacing the registration parameters in the simplex that give the worst similarity score with a better set, using some simple heuristics.
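This search can be sketched with SciPy's Nelder-Mead implementation; the similarity function below is a stand-in quadratic, not the paper's evaluation, and the offsets are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

# Stand-in "similarity": peaked at some true parameters (s, phi, tx, ty).
true_theta = np.array([1.1, 0.05, 3.0, -2.0])
def cost(theta):                 # minimizing cost == maximizing similarity
    return float(np.sum((theta - true_theta) ** 2))

theta0 = np.array([1.0, 0.0, 0.0, 0.0])    # from the face detector
delta  = np.array([0.2, 0.2, 5.0, 5.0])    # maximum expected offsets
simplex = np.vstack([theta0, theta0 + np.diag(delta)])  # 5 starting points
res = minimize(cost, theta0, method='Nelder-Mead',
               options={'initial_simplex': simplex,
                        'xatol': 1e-8, 'fatol': 1e-12})
```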

#### 2.4.2. Gradient-Based Search Method

In (1), we find the best alignment parameters by maximizing the similarity score. We start with the initial registration parameters $\theta_0$; improving these parameters means that we have to determine an offset $\Delta\theta$ towards the optimal alignment [32, 33]. We achieve this by expanding the transformed image using a first-order Taylor expansion:

$$T(I_p;\theta + \Delta\theta) \approx T(I_p;\theta) + J\,\Delta\theta. \tag{12}$$

In this case, $J$ is the Jacobian matrix of $T(I_p;\theta)$ with respect to the parameters $\theta$, given in [32] for a transformation with translation, rotation, and scale. By setting the derivative of (2) with respect to $\Delta\theta$ to zero, we can determine the offset from the original parameters:

$$\Delta\theta = \big(J^{T}\Sigma^{-1}J\big)^{-1}J^{T}\Sigma^{-1}\big(\boldsymbol{\mu} - \mathbf{x}_p\big), \tag{13}$$

where $\mathbf{x}_p = W(T(I_p;\theta))$.

In the appendix, it is shown how this equation is solved and how updated parameters are obtained analytically. This procedure is repeated until convergence has been reached.
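One update step of this procedure can be sketched as a weighted least-squares solve (our helper name; `J` is the Jacobian stacked over the feature dimensions):

```python
import numpy as np

def gauss_newton_offset(J, x_p, mu, cov):
    """Offset dtheta = (J^T S^-1 J)^-1 J^T S^-1 (mu - x_p);
    linear solves stand in for the explicit inverse of Sigma."""
    Si_J = np.linalg.solve(cov, J)            # Sigma^{-1} J
    Si_r = np.linalg.solve(cov, mu - x_p)     # Sigma^{-1} (mu - x_p)
    return np.linalg.solve(J.T @ Si_J, J.T @ Si_r)
```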