Face recognition using nonparametric-weighted Fisherfaces

This study presents an appearance-based face recognition scheme called the nonparametric-weighted Fisherfaces (NW-Fisherfaces). Pixels in a facial image are considered as coordinates in a high-dimensional space and are transformed into a face subspace for analysis by using nonparametric-weighted feature extraction (NWFE). According to previous studies of hyperspectral image classification, NWFE is a powerful tool for extracting hyperspectral image features. The Fisherfaces method maximizes the ratio of between-class scatter to within-class scatter. In this study, the proposed NW-Fisherfaces weights the between-class scatter to emphasize the boundary structure of the transformed face subspace and, therefore, enhances the separability between different persons' faces. The proposed NW-Fisherfaces was compared with the Orthogonal Laplacianfaces, Eigenfaces, Fisherfaces, direct linear discriminant analysis, and null space linear discriminant analysis methods on five facial databases. Experimental results showed that the proposed approach outperforms the other feature extraction methods on most databases.


Introduction
Face representation is important for recognizing faces in many applications, such as database matching, security systems, face indexing of web pictures, and human-computer interfaces. The appearance-based method is one of the most well-studied techniques for face representation [1,2]. The two purposes of the appearance-based method are reducing dimensionality and increasing the discriminability of extracted features. Hence, a good feature extraction method helps recognize faces in a highly discriminative subspace of low dimensionality.
Two of the most classical feature extraction techniques for this purpose are the Eigenfaces and Fisherfaces methods. Eigenfaces [3] applies principal component analysis (PCA) to transform facial data into the linear subspace spanned by the coordinates that maximize the total scatter across all classes. Unlike the Eigenfaces method, which is unsupervised, the Fisherfaces method is supervised. Fisherfaces applies linear discriminant analysis (LDA) to transform data into the directions with optimal discriminability. LDA searches for coordinates that separate data of different classes and draw data of the same class together. However, both Eigenfaces and Fisherfaces see only the global Euclidean structure, which may lose some discriminability contained in other, hidden structures.
To discover local structure, He et al. [4] and Cai et al. [5] proposed the Laplacianfaces method [4] and its orthogonal form, referred to as O-Laplacianfaces [5]. The Laplacianfaces algorithm is based on the locality preserving projection (LPP) algorithm, which aims at finding a linear approximation to the eigenfunctions of the Laplace-Beltrami operator on the face manifold. Han et al. [1] proposed an eigenvector-weighting function based on the graph embedding framework.
There are several drawbacks in LDA. First, it suffers from the singularity problem, which makes it difficult to apply directly. Second, LDA assumes Gaussian class distributions, which may make it fail in applications where the distributions are more complex than Gaussian. Third, LDA cannot determine the optimal dimensionality for discriminant analysis, an important issue that has often been neglected. Fourth, applying LDA may encounter the so-called small sample size problem (SSSP) [14].
Moreover, the classical LDA formulation requires the nonsingularity of the scatter matrices involved. For undersampled problems, where the data dimensionality is much larger than the sample size, all scatter matrices are singular and classical LDA fails. Many extensions, including null space LDA (N-LDA) [15] and orthogonal LDA (OLDA), have been proposed to overcome this problem. N-LDA maximizes the between-class distance in the null space of the within-class scatter matrix, while OLDA computes a set of orthogonal discriminant vectors via the simultaneous diagonalization of the scatter matrices.
Direct linear discriminant analysis (D-LDA) [16] is an extension of LDA that deals with the SSSP. D-LDA does not use the information inside the null space of the within-class scatter matrix, so some discriminative information may be lost. For high-dimensional data with a small sample size, D-LDA is equivalent to N-LDA and LDA.
In this study, we propose an appearance-based face recognition scheme called nonparametric-weighted Fisherfaces (NW-Fisherfaces). The NW-Fisherfaces approach is a derivative of nonparametric-weighted feature extraction (NWFE) [17], which performs well in studies of hyperspectral image classification [18,19]. The proposed NW-Fisherfaces method weights the between-class scatter to emphasize the boundary structure of the transformed face subspace and, therefore, enhances the discriminability for face recognition. The proposed approach is compared with the O-Laplacianfaces, Eigenfaces, Fisherfaces, N-LDA, and D-LDA methods on five face databases. Experimental results show that the proposed approach achieves the lowest error rates in low-dimensional subspaces for most databases.
The rest of this article is organized as follows. Section 2 gives a brief review of related studies. Section 3 introduces the NW-Fisherfaces algorithm. Section 4 presents the experimental results on face recognition. In Section 5, we draw some conclusions and provide some ideas for future research.

Related studies
Linear feature extraction methods can reduce the excessive dimensionality of image data with simple computation. In essence, linear methods project high-dimensional data onto a low-dimensional subspace.

PCA
PCA finds directions efficient for representation. Consider a set of N sample images, x_1, x_2, ..., x_N, in an n-dimensional image space. The original n-dimensional image space is linearly transformed into an m-dimensional feature space, where m < n. The new feature vectors y_k are defined by the linear transformation

y_k = W^T x_k, k = 1, 2, ..., N,

where W ∈ R^{n×m} is a matrix with orthonormal columns. The total scatter matrix S_T is defined as

S_T = Σ_{k=1}^{N} (x_k - μ)(x_k - μ)^T,

where N is the number of sample images and μ is the mean of all samples. The objective function is

W_PCA = arg max_W |W^T S_T W|,

where W_PCA is the set of n-dimensional eigenvectors of S_T corresponding to the m largest eigenvalues.
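The PCA projection above can be sketched in a few lines of NumPy. The function name and the toy data below are illustrative, not part of the original study:

```python
import numpy as np

def pca(X, m):
    """PCA on N samples (rows of X); returns the n x m projection matrix.

    S_T = sum_k (x_k - mu)(x_k - mu)^T; the projection consists of the
    eigenvectors of S_T with the m largest eigenvalues.
    """
    mu = X.mean(axis=0)
    Xc = X - mu                       # center the data
    S_T = Xc.T @ Xc                   # total scatter matrix (n x n)
    vals, vecs = np.linalg.eigh(S_T)  # eigh: symmetric input, ascending eigenvalues
    W = vecs[:, ::-1][:, :m]          # keep eigenvectors of the m largest eigenvalues
    return W, mu

# usage: project 100 random 64-dimensional samples onto a 5-dimensional subspace
X = np.random.randn(100, 64)
W, mu = pca(X, 5)
Y = (X - mu) @ W                      # y_k = W^T (x_k - mu)
print(Y.shape)                        # (100, 5)
```

Because `eigh` returns orthonormal eigenvectors, the columns of W are orthonormal, as required by the formulation above.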

LDA
LDA finds directions efficient for discrimination. Consider a set of N sample images, x_1, x_2, ..., x_N, belonging to l classes of faces in an n-dimensional image space. The objective function of LDA is

W_LDA = arg max_W |W^T S_b W| / |W^T S_w W|,

with

S_b = Σ_{i=1}^{l} N_i (μ_i - μ)(μ_i - μ)^T,
S_w = Σ_{i=1}^{l} Σ_{j=1}^{N_i} (x_j^i - μ_i)(x_j^i - μ_i)^T,

where μ is the mean of all samples, N_i is the number of samples in class i, μ_i is the mean of class i, and x_j^i is the jth sample in class i. S_w is the within-class scatter matrix and S_b is the between-class scatter matrix. W_LDA is the set of generalized eigenvectors of S_w^{-1} S_b corresponding to the m largest generalized eigenvalues.
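The LDA objective reduces to the generalized eigenproblem S_b w = λ S_w w. A minimal sketch, assuming S_w is nonsingular (the toy data are made up for illustration):

```python
import numpy as np
from scipy.linalg import eigh

def lda(X, y, m):
    """LDA: top-m generalized eigenvectors of S_b w = lambda * S_w w."""
    n = X.shape[1]
    mu = X.mean(axis=0)
    S_w = np.zeros((n, n))
    S_b = np.zeros((n, n))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        D = Xc - mu_c
        S_w += D.T @ D                      # within-class scatter
        d = (mu_c - mu)[:, None]
        S_b += len(Xc) * (d @ d.T)          # between-class scatter
    # symmetric generalized eigenproblem; assumes S_w is nonsingular
    vals, vecs = eigh(S_b, S_w)             # ascending eigenvalues
    return vecs[:, ::-1][:, :m]             # m largest generalized eigenvalues

# usage: 3 classes of 20 samples, shifted apart in a 10-dimensional space
y = np.repeat(np.arange(3), 20)
X = np.random.randn(60, 10) + y[:, None]
W = lda(X, y, 2)
print(W.shape)                              # (10, 2)
```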

D-LDA
The D-LDA method is applicable to the SSSP, which often arises in face recognition. Most LDA-based algorithms, including Fisherfaces [20] and D-LDA [21], utilize the conventional Fisher criterion defined in (4), while some authors use the alternative given in (6), proposed by Liu [22,23], where S_t is the population (total) scatter matrix. A variant of the Fisher criterion used by D-LDA is expressed as follows.

N-LDA
In this LDA variant, the authors proved that the most expressive vectors derived in the null space of the within-class scatter matrix using PCA are equal to the optimal discriminant vectors derived in the original space using LDA. This method is an efficient, accurate, and stable way to calculate the most discriminant projection vectors based on the modified Fisher criterion (7). The process starts by calculating the projection vectors in the null space of the within-class scatter matrix S_w. This null space is spanned by the eigenvectors corresponding to the zero eigenvalues of S_w. If this subspace does not exist, i.e., if S_w is nonsingular, then S_t is also nonsingular; under these circumstances, the eigenvectors corresponding to the largest eigenvalues of the matrix (S_b + S_w)^{-1} S_b are chosen as the most discriminant vector set. Otherwise, the SSSP occurs and the null-space projection is used.
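A rough sketch of the procedure described above. The eigenvalue threshold for detecting the null space and the toy data are illustrative choices, not part of the original method:

```python
import numpy as np

def null_space_lda(X, y, m, tol=1e-8):
    """N-LDA sketch: maximize between-class scatter inside null(S_w)."""
    n = X.shape[1]
    mu = X.mean(axis=0)
    S_w = np.zeros((n, n))
    S_b = np.zeros((n, n))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        D = Xc - mu_c
        S_w += D.T @ D
        d = (mu_c - mu)[:, None]
        S_b += len(Xc) * (d @ d.T)
    vals, vecs = np.linalg.eigh(S_w)
    Q = vecs[:, vals < tol]             # basis of the null space of S_w
    if Q.shape[1] == 0:                 # S_w nonsingular: fall back as in the text
        M = np.linalg.solve(S_b + S_w, S_b)
        vals2, vecs2 = np.linalg.eig(M)
        order = np.argsort(-vals2.real)
        return vecs2[:, order[:m]].real
    Sb_null = Q.T @ S_b @ Q             # project S_b into the null space
    vals2, vecs2 = np.linalg.eigh(Sb_null)
    return Q @ vecs2[:, ::-1][:, :m]    # most discriminant directions

# undersampled data: 6 samples in 20 dimensions, so null(S_w) is nonempty
y = np.repeat([0, 1, 2], 2)
X = np.random.randn(6, 20) + 3 * y[:, None]
W = null_space_lda(X, y, 2)
print(W.shape)                          # (20, 2)
```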

LPP
LPP finds directions efficient for preserving the intrinsic geometry and local structure of the data. The objective function of LPP is

min Σ_{ij} (y_i - y_j)^2 S_{ij}

with the constraint y^T D y = 1, where D_{ii} = Σ_j S_{ij} and L = D - S is the Laplacian matrix. S is a similarity matrix that attempts to ensure that if x_i and x_j are "close", then y_i and y_j are close as well. The basis functions of LPP are the eigenvectors of the matrix (X D X^T)^{-1} X L X^T associated with the smallest eigenvalues. Moreover, Cai et al. [5] proposed the orthogonal form of LPP (OLPP) and proved that OLPP outperforms LPP. In this study, OLPP is applied with a supervised similarity matrix for comparison. The weights of S are defined by

S_{ij} = cos(x_i, x_j) if samples i and j belong to the same class, and S_{ij} = 0 otherwise,

where cos(·,·) denotes the cosine measure. The applied S preserves locality according to the cosine measure and ensures preservation only for within-class faces by setting the between-class weights to 0.
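The supervised similarity construction and the resulting eigenproblem can be sketched as follows. The small ridge term added to X D X^T is an implementation convenience to keep the problem well posed, not part of the original formulation:

```python
import numpy as np
from scipy.linalg import eigh

def supervised_lpp(X, y, m):
    """LPP sketch with a supervised similarity matrix (within-class cosine weights)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = (Xn @ Xn.T) * (y[:, None] == y[None, :])   # cosine weights, 0 between classes
    np.fill_diagonal(S, 0.0)
    D = np.diag(S.sum(axis=1))
    L = D - S                                      # graph Laplacian
    # samples are rows here, so X.T @ L @ X plays the role of X L X^T
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])    # ridge keeps B positive definite
    vals, vecs = eigh(A, B)                        # ascending eigenvalues
    return vecs[:, :m]                             # smallest eigenvalues preserve locality

# usage: 3 classes of 10 samples in an 8-dimensional space
y = np.repeat(np.arange(3), 10)
X = np.abs(np.random.randn(30, 8)) + y[:, None]
W = supervised_lpp(X, y, 2)
print(W.shape)                                     # (8, 2)
```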

Methodology: NW-Fisherfaces
The proposed NW-Fisherfaces scheme is based on the NWFE method proposed by Kuo and Landgrebe [17]. NWFE is an LDA-based method that improves LDA by focusing on samples near the eventual decision boundary location. Both NWFE and OLPP use a distance function to evaluate the closeness between samples. While OLPP emphasizes the local structure by defining a closeness graph, NWFE emphasizes the boundary structure by weighting the calculation of means and covariances with the measured closeness. The main ideas of NWFE are to put different weights on every sample to compute "weighted means" and to define new nonparametric between-class and within-class scatter matrices. In NWFE, the nonparametric between-class scatter matrix is defined as

S_b^NW = Σ_{i=1}^{l} Σ_{j=1, j≠i}^{l} Σ_{k=1}^{N_i} (λ_k^(i,j) / N_i) (x_k^i - M_j(x_k^i)) (x_k^i - M_j(x_k^i))^T,

where N_i is the training sample size of class i, x_k^i is the kth sample of class i, M_j(x_k^i) denotes the weighted mean corresponding to x_k^i for class j, and dist(x, y) is the distance measured from x to y. The scatter weight λ_k^(i,j) is proportional to dist(x_k^i, M_j(x_k^i))^{-1}, normalized over the samples of class i; the closer x_k^i and M_j(x_k^i) are, the larger λ_k^(i,j) is. The weight w_kl^(i,j) for computing the weighted means is a function of x_k^i and x_l^j: the closer x_k^i and x_l^j are, the larger w_kl^(i,j) is, and the weights for each sample sum to one. In face recognition, the dimension of the face data often exceeds the number of samples. In this case, the within-class scatter matrix is not of full rank and cannot be inverted. A simple method to deal with the SSSP is regularized discriminant analysis, which artificially increases the number of available samples by adding white noise to existing samples. Some regularization techniques [18,24] can be applied to the within-class scatter matrix. In this article, the within-class scatter matrix was regularized by

S_w^reg = 0.5 S_w + 0.5 diag(S_w),

where diag(·) denotes the diagonal part of a matrix.
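A simplified sketch of the nonparametric scatter computation: reciprocal-distance weights normalized to sum to one for both the weighted means and the scatter weights, plus the regularized within-class scatter. The exact weighting exponents in [17] may differ from this sketch:

```python
import numpy as np

def nwfe_scatter(X, y, reg=0.5):
    """Sketch of NWFE's nonparametric scatter matrices and regularized S_w."""
    n = X.shape[1]
    classes = np.unique(y)
    Sb = np.zeros((n, n))
    Sw = np.zeros((n, n))
    for i in classes:
        Xi = X[y == i]
        for j in classes:
            Xj = X[y == j]
            # pairwise distances between class i and class j samples
            d = np.linalg.norm(Xi[:, None, :] - Xj[None, :, :], axis=2) + 1e-12
            if i == j:
                np.fill_diagonal(d, np.inf)    # exclude the sample itself
            # w_kl: closer x_l^j gets a larger share of the weighted mean
            w = 1.0 / d
            w /= w.sum(axis=1, keepdims=True)
            M = w @ Xj                         # weighted means, one per x_k^i
            diff = Xi - M
            # lam_k: boundary samples (close to the weighted mean) weigh more
            lam = 1.0 / (np.linalg.norm(diff, axis=1) + 1e-12)
            lam /= lam.sum()
            S = (diff * lam[:, None]).T @ diff
            if i == j:
                Sw += S
            else:
                Sb += S
    Sw_reg = reg * Sw + (1 - reg) * np.diag(np.diag(Sw))  # 0.5*S_w + 0.5*diag(S_w)
    return Sb, Sw_reg

# usage: two shifted classes in a 6-dimensional space
y = np.repeat([0, 1], 15)
X = np.random.randn(30, 6) + 2 * y[:, None]
Sb, Sw = nwfe_scatter(X, y)
print(Sb.shape, Sw.shape)                      # (6, 6) (6, 6)
```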
The NW-Fisherfaces computational scheme is as follows.
(1) PCA projection: Face images are projected into the PCA subspace by discarding the components corresponding to zero eigenvalues. W PCA denotes the transformation matrix of the PCA projection. The projected components are statistically uncorrelated and the rank of the projected data matrix equals the data dimensionality. This study applied the PCA projection method proposed in [5,25] to prevent the singularity of S w, because of its simple computation and for a fair comparison with Fisherfaces and O-Laplacianfaces. However, discarding the dimensions corresponding to zero eigenvalues may lose important discriminant information [26]. For further applying LDA-based methods to practical applications, the advanced regularization method proposed in [26] is suggested.
(2) Compute the distances between each pair of samples and form the distance matrix.
The resulting W is the overall transformation matrix, and the column vectors of W are the so-called NW-Fisherfaces.
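The overall two-stage scheme (PCA projection followed by a discriminant step in the reduced subspace) can be sketched as below. For brevity, the discriminant step uses the plain Fisher scatter matrices as a stand-in for the nonparametric-weighted ones; the function name and toy data are illustrative:

```python
import numpy as np
from scipy.linalg import eigh

def two_stage_pipeline(X, y, m):
    """Sketch of the scheme: PCA projection, then a discriminant step."""
    # (1) PCA: drop components with (near-)zero eigenvalues to avoid a singular S_w
    mu = X.mean(axis=0)
    Xc = X - mu
    vals, vecs = np.linalg.eigh(Xc.T @ Xc)
    W_pca = vecs[:, vals > 1e-8 * vals.max()]
    Z = Xc @ W_pca
    # (2) discriminant step in the PCA subspace (plain Fisher scatter here)
    n = Z.shape[1]
    Sb = np.zeros((n, n))
    Sw = np.zeros((n, n))
    for c in np.unique(y):
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        D = Zc - mc
        Sw += D.T @ D
        d = (mc - Z.mean(axis=0))[:, None]
        Sb += len(Zc) * (d @ d.T)
    Sw += 1e-6 * np.eye(n)                  # mild regularization keeps Sw invertible
    _, V = eigh(Sb, Sw)                     # ascending generalized eigenvalues
    W_dis = V[:, ::-1][:, :m]
    return W_pca @ W_dis                    # columns play the role of the final faces

# usage: 4 classes, 3 samples each, in a 50-dimensional space (undersampled)
y = np.repeat(np.arange(4), 3)
X = np.random.randn(12, 50) + 4 * y[:, None]
W = two_stage_pipeline(X, y, 3)
print(W.shape)                              # (50, 3)
```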

Experimental results
The performance of the proposed NW-Fisherfaces method was compared with that of popular linear methods in face recognition, including Eigenfaces [3], Fisherfaces [20], and O-Laplacianfaces [5]. Five face databases were tested: the Yale database, the Olivetti Research Laboratory (ORL) database, the CMU PIE (pose, illumination, and expression) database [24], a PIE subset (PIE_Small), and the AR database. This study applied the same preprocessing as in [5] to locate the faces. Gray-level images were manually aligned, cropped, and resized to 32 × 32 pixels, so each image was represented by a 1,024-dimensional vector. For simplicity, the k-nearest-neighbor (k-NN) classifier with k = 1 was applied in all experiments. The recognition process was as follows: the face subspace was calculated from the training samples; new testing face images were projected into the calculated subspace; and the new facial images were identified by the 1-nearest-neighbor classifier.
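The recognition process described above amounts to projecting both sets into the subspace and running a 1-nearest-neighbor search. A sketch with a stand-in random projection matrix in place of a trained one (the toy "faces" are synthetic):

```python
import numpy as np

def recognize_1nn(W, train_X, train_y, test_X):
    """Project both sets with W, then label each test face by its nearest
    training face (1-nearest-neighbor, Euclidean distance)."""
    Yt = train_X @ W
    Yq = test_X @ W
    # pairwise distances between projected test and training faces
    d = np.linalg.norm(Yq[:, None, :] - Yt[None, :, :], axis=2)
    return train_y[d.argmin(axis=1)]

# toy 'faces': two well-separated classes in 1024-dimensional (32 x 32) space
rng = np.random.default_rng(0)
train_y = np.repeat([0, 1], 5)
train_X = rng.normal(size=(10, 1024)) + 10 * train_y[:, None]
test_X = rng.normal(size=(4, 1024)) + 10 * np.array([0, 0, 1, 1])[:, None]
W = rng.normal(size=(1024, 20))        # stand-in for a trained projection matrix
pred = recognize_1nn(W, train_X, train_y, test_X)
print(pred)
```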

ORL database
The ORL database contains 10 different images for each of 40 distinct individuals. For some individuals, the images were captured at different times, varying the lighting, facial expressions (open/closed eyes, smiling/not smiling), and facial details (glasses/no glasses), as shown in Figure 1. The database is divided into training and testing sets for the experiment. The applied divisions are n images per individual for training and 10 - n images per individual for testing, where n = 2, 3, 4, and 5. Furthermore, the experimental results are averaged over 20 random sets for each division. Table 1 presents the lowest error rates and the corresponding dimensions obtained by Eigenfaces, Fisherfaces, O-Laplacianfaces, and NW-Fisherfaces. The proposed NW-Fisherfaces outperformed the other methods on the ORL database. Figure 2 shows the plots of error rate versus reduced dimensionality. Since the optimization of LDA produces at most L - 1 features [20], the maximal dimension of Fisherfaces is also L - 1, where L is the number of individuals. As observed, the error rates of O-Laplacianfaces fall below those of PCA and LDA after the dimension reaches a certain degree, which is 19 in Figure 2a. The error rates of NW-Fisherfaces are lower than those of the other methods over all dimensions below L - 1.
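The random-division protocol used throughout these experiments (n training images per individual, the rest for testing, averaged over repeated random draws) can be sketched as follows; the function name and seed are illustrative:

```python
import numpy as np

def random_splits(labels, n_train, n_sets=20, seed=0):
    """Yield (train_idx, test_idx) pairs: n_train images per individual for
    training and the rest for testing, over n_sets random draws."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    for _ in range(n_sets):
        train, test = [], []
        for c in np.unique(labels):
            idx = rng.permutation(np.flatnonzero(labels == c))
            train.extend(idx[:n_train])    # n images per individual
            test.extend(idx[n_train:])     # remaining images
        yield np.array(train), np.array(test)

# ORL-style protocol: 40 individuals, 10 images each, n = 2 training images
labels = np.repeat(np.arange(40), 10)
splits = list(random_splits(labels, n_train=2))
tr, te = splits[0]
print(len(splits), len(tr), len(te))       # 20 80 320
```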

Yale database
The Yale face database contains 165 grayscale images of 15 individuals. There are 11 images per individual, one per facial expression or configuration: center-light, with glasses, happy, left-light, without glasses, normal, right-light, sad, sleepy, surprised, and wink, as shown in Figure 3. The database is divided into training and testing sets for the experiment. The applied divisions are n images per individual for training and 11 - n images per individual for testing, where n = 2, 3, 4, and 5. Furthermore, the experimental results are averaged over 20 random sets for each division. Table 2 and Figure 4 show the experimental results. The proposed NW-Fisherfaces again outperformed the other methods at low dimensionality.

PIE database
The CMU PIE face database contains 41,368 face images of 68 individuals. Each individual was imaged under various poses, illuminations, and expressions. In this study, 5 near-frontal poses (C05, C07, C09, C27, and C29) and all the images under the various illumination and expression conditions were gathered, giving 170 near-frontal facial images for each individual, as shown in Figure 5. The database is divided into training and testing sets for the experiment. The applied divisions are n images per individual for training and 170 - n images per individual for testing, where n = 5, 10, 20, and 30. Furthermore, the experimental results were averaged over 20 random sets for each division. Table 3 presents the lowest error rates and the corresponding dimensions. Both O-Laplacianfaces and NW-Fisherfaces outperformed Fisherfaces and Eigenfaces. O-Laplacianfaces resulted in the lowest error rates on the PIE database. However, the dimensionality required by NW-Fisherfaces to reach its lowest error rate is much lower than that required by the other methods. As shown in Figure 6, NW-Fisherfaces outperformed the other methods over the dimensions below L - 1, where L is the number of individuals.
There are no N-LDA results for the PIE database beyond the 10-image training sets: because the number of training samples then exceeds the feature dimension, there is no null space for the within-class scatter matrix S w.

PIE_Small database
The PIE_Small database is a subset of the PIE database. To check the performance of the proposed method on small sample sizes, we reduced the number of pictures for each subject: instead of 170 images, we took 15 images per person, as shown in Figure 7, and found that the performance of the proposed method is better than that of the others, especially for small-sample-size data. The applied divisions are n images per individual for training and 15 - n images per individual for testing, where n = 5, 6, 7, and 8. Furthermore, the experimental results were averaged over ten random sets for each division. Table 4 presents the lowest error rates and the corresponding dimensions. Both O-Laplacianfaces and NW-Fisherfaces outperformed Fisherfaces and Eigenfaces. O-Laplacianfaces resulted in the lowest error rates on the PIE_Small database. However, the dimensionality required by NW-Fisherfaces to reach its lowest error rate is much lower than that required by the other methods. As shown in Figure 8, NW-Fisherfaces outperformed the other methods over the dimensions below L - 1, where L is the number of individuals.

AR database
To check the capability of invariance to lighting conditions and face orientation, which have been better addressed by 3D deformation approaches, we applied the proposed method to the AR face database and found that it gives better results than the previously proposed methods.
The database contains 126 subjects in total (70 men, 56 women), each with 26 different images, as shown in Figure 9. The images were taken under different facial expressions, illumination conditions, and occlusions. The applied divisions are n images per individual for training and 13 - n images per individual for testing, where n = 5, 6, 7, and 8. Furthermore, the experimental results are averaged over ten random sets for each division. Table 5 and Figure 10 show the experimental results. The proposed NW-Fisherfaces again outperformed the other methods at low dimensionality.
The pictures were taken at the CVC under strictly controlled conditions. No restrictions on wear (clothes, glasses, etc.), make-up, or hair style were imposed on the participants.
Conclusions
(2) This study applied a nonparametric feature extraction method to the scheme of appearance-based face recognition.
(3) The proposed NW-Fisherfaces method weights the between-class scatter to emphasize the boundary structure of the transformed face subspace and, therefore, enhances the discriminability of face recognition.
(4) For practical applications, the computational load depends on the dimensionality of the trained linear projection matrix. In this study, the experimental results show that the proposed method can reach its lowest error rate at low dimensionality. Hence, the NW-Fisherfaces method is practical for real-world face recognition due to its low dimensionality requirement.

Future works
The future research in this area could involve the following.
(1) The supervised OLPP weights the scatter matrix to preserve the locality of within-class faces. This weighting concept may enhance the within-class scatter of LDA and other LDA-based methods such as N-LDA and NWFE.
(2) Linear feature extraction methods measure and optimize the closeness between samples using the Euclidean distance. However, the Euclidean distance is sensitive to lighting. Variance caused by lighting should therefore be reduced before applying linear feature extraction methods. Several solutions for reducing the lighting variance of face images are possible: (a) mapping face images onto the same intensity distribution by simple preprocessing, such as histogram specification; (b) transforming images into the frequency domain by Fourier-based methods, such as Gabor wavelets.
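As a concrete instance of preprocessing (a), a plain histogram equalization (a simpler relative of histogram specification, mapping every image toward the same near-uniform intensity distribution) can be written with NumPy alone; the toy images are synthetic:

```python
import numpy as np

def equalize_histogram(img):
    """Map an 8-bit image onto a near-uniform intensity distribution."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalized CDF in [0, 1]
    lut = np.round(255 * cdf).astype(np.uint8)         # lookup table: old -> new level
    return lut[img]

# two toy 'faces' of the same size under very different lighting
rng = np.random.default_rng(1)
dark = rng.integers(0, 100, size=(32, 32)).astype(np.uint8)
bright = rng.integers(150, 256, size=(32, 32)).astype(np.uint8)
# after equalization, the mean intensities become comparable
print(equalize_histogram(dark).mean() - equalize_histogram(bright).mean())
```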
(3) The performance of NW-Fisherfaces in a nonlinear feature space, such as a reproducing kernel Hilbert space, can be further evaluated.