This study presents an appearance-based face recognition scheme called the nonparametric-weighted Fisherfaces (NW-Fisherfaces). Pixels in a facial image are considered as coordinates in a high-dimensional space and are transformed into a face subspace for analysis by using nonparametric-weighted feature extraction (NWFE). According to previous studies of hyperspectral image classification, NWFE is a powerful tool for extracting hyperspectral image features. The Fisherfaces method maximizes the ratio of between-class scatter to within-class scatter. The proposed NW-Fisherfaces weights the between-class scatter to emphasize the boundary structure of the transformed face subspace and therefore enhances the separability of the faces of different persons. The proposed NW-Fisherfaces was compared with the Orthogonal Laplacianfaces, Eigenfaces, Fisherfaces, direct linear discriminant analysis, and null space linear discriminant analysis methods in tests on five facial databases. Experimental results showed that the proposed approach outperforms the other feature extraction methods on most databases.
Face representation is important for recognizing faces in many applications such as database matching, security systems, face indexing on web pictures, and human-computer interfaces. The appearance-based method is one of the well-studied techniques for face representation [1, 2]. The appearance-based method has two purposes: reducing the dimensionality and increasing the discriminability of the extracted features. Hence, a good feature extraction method helps recognize faces in a highly discriminative subspace with low dimensionality.
Two of the most classical feature extraction techniques for this purpose are the Eigenfaces and Fisherfaces methods. Eigenfaces applies principal component analysis (PCA) to transform facial data to the linear subspace spanned by the coordinates that maximize the total scatter across all classes. Unlike the Eigenfaces method, which is unsupervised, the Fisherfaces method is supervised. Fisherfaces applies linear discriminant analysis (LDA) to transform data into directions with optimal discriminability. LDA searches for coordinates that separate data of different classes and draw data of the same class close together. However, both Eigenfaces and Fisherfaces see only the global Euclidean structure, which may lose some discriminability contained in other hidden structures.
To discover local structure, He et al. and Cai et al. proposed the Laplacianfaces method and its orthogonal form, which is referred to as O-Laplacianfaces. The Laplacianfaces algorithm is based on the locality preserving projection (LPP) algorithm, which aims at finding a linear approximation to the eigenfunctions of the Laplace-Beltrami operator on the face manifold. Han et al. proposed the eigenvector-weighting function based on the graph embedding framework.
Recently, many LDA-based methods have been proposed to embed manifold structure into the facial feature extraction process [6–13]. Park and Savvides  proposed a multifactor extension of LDA. Na et al.  proposed the linear boundary discriminant analysis, which increases class separability by reflecting different significances of nonboundary and boundary patterns.
LDA has several drawbacks. First, it suffers from the singularity problem, which makes it hard to apply. Second, LDA assumes Gaussian class distributions, so it may fail in applications where the distribution is more complex than Gaussian. Third, LDA cannot determine the optimal dimensionality for discriminant analysis, which is an important issue that has often been neglected. Fourth, applying LDA may encounter the so-called small sample size problem (SSSP).
However, the classical LDA formulation requires the nonsingularity of the scatter matrices involved. For undersampled problems, where the data dimensionality is much larger than the sample size, all scatter matrices are singular and classical LDA fails. Many extensions, including null space LDA (N-LDA)  and orthogonal LDA (OLDA), have been proposed in the past to overcome this problem. N-LDA aims to maximize the between-class distance in the null space of the within-class scatter matrix, while OLDA computes a set of orthogonal discriminant vectors via the simultaneous diagonalization of the scatter matrices.
Direct linear discriminant analysis (D-LDA) is an extension of LDA that deals with the SSSP. D-LDA does not use the information inside the null space, so some discriminative information may be lost. For high-dimensional data with a small sample size, D-LDA is equivalent to N-LDA and LDA.
In this study, we propose an appearance-based face recognition scheme called nonparametric-weighted Fisherfaces (NW-Fisherfaces). The NW-Fisherfaces approach is derived from nonparametric-weighted feature extraction (NWFE), which performs well in studies of hyperspectral image classification [18, 19]. The proposed NW-Fisherfaces method weights the between-class scatter to emphasize the boundary structure of the transformed face subspace and therefore enhances the discriminability of face recognition. The proposed approach is compared with the O-Laplacianfaces, Eigenfaces, Fisherfaces, N-LDA, and D-LDA methods in tests on five face databases. Experimental results show that the proposed approach achieves the lowest error rates in low-dimensional subspaces for most databases.
The rest of this article is organized as follows. Section 2 gives a brief review of related studies. Section 3 introduces the NW-Fisherfaces algorithm. Section 4 presents the experimental results on face recognition. In Section 5, we draw some conclusions and provide some ideas for future research.
2. Related study
Linear feature extraction methods can reduce excessive dimensionality of image data with simple computation. In essence, linear methods project high-dimensional data to low-dimensional subspace.
PCA finds directions efficient for representation. Consider a set of N sample images, $x_1, x_2, \ldots, x_N$, in an n-dimensional image space. The original n-dimensional image space is linearly transformed to an m-dimensional feature space, where m < n. The new feature vectors $y_k$ are defined by the following linear transformation:

$$y_k = W^T x_k, \quad k = 1, 2, \ldots, N \tag{1}$$

where $W \in \mathbb{R}^{n \times m}$ is a matrix with orthonormal columns. The total scatter matrix $S_T$ is defined as

$$S_T = \sum_{k=1}^{N} (x_k - \mu)(x_k - \mu)^T \tag{2}$$

where N is the number of sample images and $\mu$ is the mean of all samples. The objective function is as follows:

$$W_{PCA} = \arg\max_{W} \left| W^T S_T W \right| \tag{3}$$

where $W_{PCA}$ is the set of n-dimensional eigenvectors of $S_T$ corresponding to the m largest eigenvalues.
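As a minimal illustration of the PCA projection described above, the following NumPy sketch computes the leading eigenvectors of the total scatter matrix through a singular value decomposition of the centered data. The function name `pca_projection`, the toy data, and the choice to subtract the mean before projecting are assumptions made for this example and are not taken from the original study.

```python
import numpy as np

def pca_projection(X, m):
    """Return (W, mu): W holds the m leading eigenvectors of the total scatter.

    X has shape (N, n), one flattened image per row.
    """
    mu = X.mean(axis=0)
    Xc = X - mu  # centered samples
    # The right singular vectors of Xc are the eigenvectors of Xc^T Xc (= S_T),
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:m].T  # shape (n, m), orthonormal columns
    return W, mu

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))   # toy data: N = 20 samples, n = 50 "pixels"
W, mu = pca_projection(X, m=5)
Y = (X - mu) @ W                # feature vectors, one row per sample
```

Using the SVD avoids forming the n x n scatter matrix explicitly, which is convenient when n (the number of pixels) is much larger than N.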
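LDA finds directions efficient for discrimination. Consider a set of N sample images, $x_1, x_2, \ldots, x_N$, which belong to l classes of faces in an n-dimensional image space. The objective function of LDA is as follows:

$$W_{LDA} = \arg\max_{W} \frac{\left| W^T S_b W \right|}{\left| W^T S_w W \right|} \tag{4}$$

$$S_b = \sum_{i=1}^{l} N_i (\mu_i - \mu)(\mu_i - \mu)^T, \qquad S_w = \sum_{i=1}^{l} \sum_{j=1}^{N_i} \left(x_j^{(i)} - \mu_i\right)\left(x_j^{(i)} - \mu_i\right)^T \tag{5}$$

where $\mu$ is the mean of all samples, $N_i$ is the number of samples in class i, $\mu_i$ is the average of class i, and $x_j^{(i)}$ is the jth sample in class i. $S_w$ is the within-class scatter matrix and $S_b$ is the between-class scatter matrix. $W_{LDA}$ is the set of generalized eigenvectors of $(S_w)^{-1} S_b$ corresponding to the m largest generalized eigenvalues.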
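The scatter matrices and the LDA directions can be sketched in a few lines of NumPy. This is only an illustration on toy data where the within-class scatter is invertible; the helper names `lda_scatter` and `lda_directions` are hypothetical, and real face data would need one of the remedies for singular scatter matrices discussed in this article.

```python
import numpy as np

def lda_scatter(X, labels):
    """Between-class (Sb) and within-class (Sw) scatter of row-wise samples X."""
    mu = X.mean(axis=0)
    n = X.shape[1]
    Sb = np.zeros((n, n))
    Sw = np.zeros((n, n))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mu_c = Xc.mean(axis=0)
        d = (mu_c - mu)[:, None]
        Sb += len(Xc) * (d @ d.T)   # N_i (mu_i - mu)(mu_i - mu)^T
        Z = Xc - mu_c
        Sw += Z.T @ Z               # sum over samples of class i
    return Sb, Sw

def lda_directions(Sb, Sw, m):
    """Eigenvectors of Sw^{-1} Sb for the m largest eigenvalues (Sw invertible)."""
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)[:m]
    return evecs[:, order].real

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 10)              # 3 classes, 10 samples each
centers = rng.normal(scale=3.0, size=(3, 4))      # toy class centers
X = centers[labels] + rng.normal(size=(30, 4))
Sb, Sw = lda_scatter(X, labels)
W_lda = lda_directions(Sb, Sw, m=2)               # at most (classes - 1) directions
```

With three classes the between-class scatter has rank at most two, which is why only two discriminant directions are requested.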
The new D-LDA method is applicable to the SSSP, which often arises in face recognition. Most LDA-based algorithms, including Fisherfaces and D-LDA, utilize the conventional Fisher criterion defined in (4), while some authors use the alternative given in (6), proposed by Liu [22, 23]:

$$W = \arg\max_{W} \frac{\left| W^T S_b W \right|}{\left| W^T S_t W \right|} \tag{6}$$

where $S_t = S_b + S_w$ is the population scatter matrix.
A variant of Fisher criterion of D-LDA is expressed as follows
For this new LDA method, it was proved that the most expressive vectors derived in the null space of the within-class scatter matrix using PCA are equal to the optimal discriminant vectors derived in the original space using LDA. The method calculates the most discriminant projection vectors based on the modified Fisher's criterion (7) in a more efficient, accurate, and stable way. The process starts by calculating the projection vectors in the null space of the within-class scatter matrix $S_w$. This null space is spanned by the eigenvectors corresponding to the zero eigenvalues of $S_w$. If this subspace does not exist, i.e., $S_w$ is nonsingular, then $S_t$ is also nonsingular. Under these circumstances, we choose the eigenvectors corresponding to the largest eigenvalues of the matrix $(S_b + S_w)^{-1} S_b$ as the most discriminant vector set; otherwise, the SSSP will occur.
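The null-space idea can be sketched as follows: find a basis of the null space of the within-class scatter matrix, then keep the directions inside that null space along which the between-class scatter is largest. The function `null_space_lda`, the relative tolerance used to decide which eigenvalues count as zero, and the toy data are assumptions for illustration and do not reproduce the exact procedure of the cited works.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Standard between-class and within-class scatter matrices."""
    mu = X.mean(axis=0)
    n = X.shape[1]
    Sb, Sw = np.zeros((n, n)), np.zeros((n, n))
    for c in np.unique(labels):
        Xc = X[labels == c]
        d = (Xc.mean(axis=0) - mu)[:, None]
        Sb += len(Xc) * (d @ d.T)
        Z = Xc - Xc.mean(axis=0)
        Sw += Z.T @ Z
    return Sb, Sw

def null_space_lda(Sb, Sw, m, rel_tol=1e-10):
    """Directions in the null space of Sw that maximize between-class scatter."""
    evals, evecs = np.linalg.eigh(Sw)
    Q = evecs[:, evals < rel_tol * evals.max()]   # basis of the null space of Sw
    if Q.shape[1] == 0:
        raise ValueError("Sw is nonsingular: there is no null space to use")
    bvals, bvecs = np.linalg.eigh(Q.T @ Sb @ Q)   # Sb restricted to the null space
    order = np.argsort(-bvals)[:m]
    return Q @ bvecs[:, order]

rng = np.random.default_rng(1)
labels = np.repeat(np.arange(3), 2)               # 3 classes, 2 samples each
centers = rng.normal(scale=3.0, size=(3, 10))
X = centers[labels] + rng.normal(size=(6, 10))    # undersampled: 6 samples, 10 dims
Sb, Sw = scatter_matrices(X, labels)
W_null = null_space_lda(Sb, Sw, m=2)
```

Along the returned directions the within-class scatter vanishes, so samples of the same training class collapse to a single point while the class means remain separated.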
LPP finds directions efficient for preserving the intrinsic geometry and local structure of the data. The objective function of LPP is as follows:

$$w_{opt} = \arg\min_{w} \sum_{i,j} \left(w^T x_i - w^T x_j\right)^2 S_{ij} = \arg\min_{w} w^T X L X^T w$$

with the constraint

$$w^T X D X^T w = 1$$

where $X = [x_1, x_2, \ldots, x_N]$ is the data matrix, $D_{ii} = \sum_j S_{ij}$, and L = D - S is the Laplacian matrix. S is a similarity matrix that attempts to ensure that if $x_i$ and $x_j$ are "close", then $y_i$ and $y_j$ are close as well. The basis functions of LPP are the eigenvectors of the matrix $(X D X^T)^{-1} X L X^T$ associated with the smallest eigenvalues. Moreover, Cai et al. proposed the orthogonal form of LPP (OLPP) and showed that OLPP outperforms LPP. In this study, OLPP is applied with a supervised similarity matrix for comparison. The weights of S are defined as follows:

$$S_{ij} = \begin{cases} \cos(x_i, x_j), & \text{if } x_i \in \text{class } k,\ x_j \in \text{class } l, \text{ and } k = l \\ 0, & \text{otherwise} \end{cases}$$

where cos(·) denotes the cosine measure, i and j denote sample indices, and k and l denote classes. The applied S preserves locality according to the cosine measure and preserves it only for within-class faces, because the between-class weights are set to 0.
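A supervised similarity matrix of this kind, together with the degree matrix D and the Laplacian L = D - S, might be built as in the sketch below. The function name `supervised_cosine_similarity` and the toy data are hypothetical, and the code assumes that no image is an all-zero vector.

```python
import numpy as np

def supervised_cosine_similarity(X, labels):
    """Cosine similarity for same-class pairs, zero for different-class pairs."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # assumes no all-zero row
    cos = Xn @ Xn.T
    same_class = labels[:, None] == labels[None, :]
    return np.where(same_class, cos, 0.0)

rng = np.random.default_rng(0)
X = rng.random(size=(6, 8))             # nonnegative toy "pixel" values
labels = np.array([0, 0, 0, 1, 1, 1])
S = supervised_cosine_similarity(X, labels)
D = np.diag(S.sum(axis=1))              # D_ii = sum_j S_ij
L = D - S                               # graph Laplacian
```

Because every between-class entry of S is zero, the graph splits into one connected block per class, and each row of L sums to zero.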
3. Methodology: NW-Fisherfaces
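The proposed NW-Fisherfaces scheme is based on the NWFE method proposed by Kuo and Landgrebe. NWFE is an LDA-based method that improves LDA by focusing on samples near the eventual decision boundary location. Both NWFE and OLPP use a distance function to evaluate the closeness between samples. While OLPP emphasizes the local structure by defining a closeness graph map, NWFE emphasizes the boundary structure by weighting the calculation of the mean and covariance with the measured closeness. The main idea of NWFE is to put different weights on every sample to compute the "weighted means" and to define new nonparametric between-class and within-class scatter matrices. In NWFE, the nonparametric between-class scatter matrix is defined as follows:

$$S_b^{NW} = \sum_{i=1}^{L} P_i \sum_{\substack{j=1 \\ j \neq i}}^{L} \sum_{k=1}^{N_i} \frac{\lambda_k^{(i,j)}}{N_i} \left(x_k^{(i)} - M_j(x_k^{(i)})\right)\left(x_k^{(i)} - M_j(x_k^{(i)})\right)^T$$

$$\lambda_k^{(i,j)} = \frac{\mathrm{dist}\left(x_k^{(i)}, M_j(x_k^{(i)})\right)^{-1}}{\sum_{t=1}^{N_i} \mathrm{dist}\left(x_t^{(i)}, M_j(x_t^{(i)})\right)^{-1}}, \qquad M_j(x_k^{(i)}) = \sum_{l=1}^{N_j} w_{kl}^{(i,j)} x_l^{(j)}, \qquad w_{kl}^{(i,j)} = \frac{\mathrm{dist}\left(x_k^{(i)}, x_l^{(j)}\right)^{-1}}{\sum_{t=1}^{N_j} \mathrm{dist}\left(x_k^{(i)}, x_t^{(j)}\right)^{-1}}$$

where L is the number of classes, $P_i$ is the prior probability of class i, $N_i$ is the training sample size of class i, $x_k^{(i)}$ is the kth sample of class i, $M_j(x_k^{(i)})$ denotes the weighted mean corresponding to $x_k^{(i)}$ for class j, and dist(x, y) is the distance measured from x to y. The closer $x_k^{(i)}$ and $M_j(x_k^{(i)})$ are, the larger the weight $\lambda_k^{(i,j)}$ is. The sum of $\lambda_k^{(i,j)}$ over k for class i is one. The weight $w_{kl}^{(i,j)}$ for computing the weighted means is a function of $x_k^{(i)}$ and $x_l^{(j)}$. The closer $x_k^{(i)}$ and $x_l^{(j)}$ are, the larger $w_{kl}^{(i,j)}$ is. The sum of $w_{kl}^{(i,j)}$ over l for $M_j(x_k^{(i)})$ is one.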
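To make the weighting concrete, the sketch below computes inverse-distance weighted means and nonparametric scatter matrices on toy data. It is an illustration under stated assumptions rather than the authors' implementation: Euclidean distance is used, the class prior is estimated from class frequencies, a small constant `EPS` guards against division by zero, and for the within-class case each sample is excluded from its own weighted mean (so every class needs at least two samples). The names `weighted_means` and `nw_scatter` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero (assumption, not from the paper)

def weighted_means(Xi, Xj, exclude_self=False):
    """Weighted mean of the class-j samples for every sample (row) of Xi.

    Weights are inverse distances, normalized so that each row sums to one.
    """
    d = np.linalg.norm(Xi[:, None, :] - Xj[None, :, :], axis=2)
    inv = 1.0 / (d + EPS)
    if exclude_self:  # assumption: ignore a sample's zero distance to itself
        np.fill_diagonal(inv, 0.0)
    w = inv / inv.sum(axis=1, keepdims=True)
    return w @ Xj

def nw_scatter(X, labels):
    """Nonparametric weighted between-class (Sb) and within-class (Sw) scatter."""
    classes = np.unique(labels)
    N, n = X.shape
    Sb, Sw = np.zeros((n, n)), np.zeros((n, n))
    for ci in classes:
        Xi = X[labels == ci]
        prior = len(Xi) / N
        for cj in classes:
            Xj = X[labels == cj]
            R = Xi - weighted_means(Xi, Xj, exclude_self=(ci == cj))
            inv = 1.0 / (np.linalg.norm(R, axis=1) + EPS)
            lam = inv / inv.sum()  # scatter weights, sum to one within class i
            S = (R * (lam / len(Xi))[:, None]).T @ R
            if ci == cj:
                Sw += prior * S
            else:
                Sb += prior * S
    return Sb, Sw

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 4)               # 3 classes, 4 samples each
centers = rng.normal(scale=3.0, size=(3, 5))
X = centers[labels] + rng.normal(size=(12, 5))
Sb_nw, Sw_nw = nw_scatter(X, labels)
```

Samples that lie close to another class receive larger scatter weights, which is how the boundary structure is emphasized.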
In face recognition, the dimension of the face data often exceeds the number of samples. In this case, the covariance matrix is not of full rank and cannot be inverted. A simple method to deal with the SSSP is regularized discriminant analysis, which artificially increases the number of available samples by adding white noise to existing samples. Some regularization techniques [18, 24] can be applied to the within-class scatter matrix. In this article, the within-class scatter matrix was regularized by

$$S_w^{NW} = 0.5\, S_w^{NW} + 0.5\, \mathrm{diag}\left(S_w^{NW}\right)$$

where diag(·) denotes the diagonal part of a matrix.
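The effect of mixing a scatter matrix with its own diagonal can be checked numerically. In the sketch below, the function name `regularize_within_scatter` and the equal mixing weight `alpha = 0.5` are assumptions for illustration; the toy matrix is built from three samples in five dimensions so that it is rank deficient.

```python
import numpy as np

def regularize_within_scatter(Sw, alpha=0.5):
    """Blend Sw with its diagonal part; alpha = 0.5 is an assumed mixing weight."""
    return alpha * Sw + (1.0 - alpha) * np.diag(np.diag(Sw))

rng = np.random.default_rng(0)
Z = rng.normal(size=(3, 5))
Z -= Z.mean(axis=0)             # 3 centered samples in 5 dimensions
Sw = Z.T @ Z                    # rank at most 2, hence singular
Sw_reg = regularize_within_scatter(Sw)
```

The diagonal entries are unchanged, the off-diagonal entries are halved, and the result is invertible as long as every diagonal entry of the original matrix is positive.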
The NW-Fisherfaces computational scheme is as follows.
PCA projection: Face images are projected into the PCA subspace by discarding the components corresponding to zero eigenvalues. $W_{PCA}$ denotes the transformation matrix of the PCA projection. The projected components are statistically uncorrelated, and the rank of the projected data matrix is equal to the data dimensionality. This study applied the PCA projection method proposed in [5, 25] to prevent the singularity of $S_w$, because of its simple computation and to allow a fair comparison with Fisherfaces and O-Laplacianfaces. However, discarding the dimensions corresponding to zero eigenvalues may lose important discriminant information. For applying LDA-based methods to practical applications, an advanced regularization method proposed in  is suggested.
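Compute the distances between each pair of samples and form the distance matrix.
Compute the sample weights $w_{kl}^{(i,j)}$ with the distance matrix.
Use $w_{kl}^{(i,j)}$ to compute the weighted means $M_j(x_k^{(i)})$.
Compute the scatter matrix weights $\lambda_k^{(i,j)}$.
Compute the between-class scatter matrix $S_b^{NW}$ and the regularized within-class scatter matrix $S_w^{NW}$.
Compute $W_{NWFE} = [w_1, \ldots, w_m]$ as the eigenvectors of $(S_w^{NW})^{-1} S_b^{NW}$ corresponding to the m largest eigenvalues.
Compute the NWFE embedding as follows:

$$x \rightarrow y = W^T x, \qquad W = W_{PCA} W_{NWFE}$$

where W is the transformation matrix and the column vectors of W are the so-called NW-Fisherfaces.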
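The last two steps of the scheme, solving the eigenproblem and composing the overall transformation, can be sketched as follows. The scatter matrices and the PCA basis here are random stand-ins, not values computed from faces, and the helper `top_generalized_eigvecs` is a hypothetical name. The eigenproblem is solved through a Cholesky factor of the regularized within-class scatter matrix, which is one of several equivalent ways to obtain the eigenvectors.

```python
import numpy as np

def top_generalized_eigvecs(Sb, Sw_reg, m):
    """Solve Sb v = lambda Sw_reg v and return the m leading (v, lambda) pairs.

    Sw_reg must be symmetric positive definite.
    """
    C = np.linalg.cholesky(Sw_reg)            # Sw_reg = C C^T
    C_inv = np.linalg.inv(C)
    evals, U = np.linalg.eigh(C_inv @ Sb @ C_inv.T)
    order = np.argsort(-evals)[:m]
    return C_inv.T @ U[:, order], evals[order]

rng = np.random.default_rng(0)
n_pixels, n_pca, m = 64, 10, 3
W_pca, _ = np.linalg.qr(rng.normal(size=(n_pixels, n_pca)))  # stand-in PCA basis
A = rng.normal(size=(n_pca, n_pca))
B = rng.normal(size=(n_pca, 4))
Sw_reg = A @ A.T + np.eye(n_pca)   # toy symmetric positive definite matrix
Sb = B @ B.T                       # toy positive semidefinite matrix

W_nwfe, lambdas = top_generalized_eigvecs(Sb, Sw_reg, m)
W = W_pca @ W_nwfe                 # overall transformation, shape (n_pixels, m)
x = rng.random(n_pixels)           # one flattened face image
y = W.T @ x                        # m-dimensional embedding of the image
```

A test image is embedded with the same matrix W that was learned from the training images.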
4. Experimental results
The performance of the proposed NW-Fisherfaces method was compared with the three most popular linear methods in face recognition (Eigenfaces, Fisherfaces, and O-Laplacianfaces) as well as with D-LDA and N-LDA. Five face databases were tested: the Yale database, the Olivetti Research Laboratory (ORL) database, the PIE (pose, illumination, and expression) database from CMU, a reduced subset of PIE (PIE_Small), and the AR database. This study applied the same preprocessing in  to locate the faces. Gray-level images were manually aligned, cropped, and resized to 32 × 32 pixels. Each image was represented by a 1,024-dimensional vector. For simplicity, the k nearest-neighbor (k-nn) classifier with k = 1 was applied in all experiments. The recognition process was as follows: the face subspace was calculated from the training samples; new testing face images were projected into the calculated subspace; and the new facial images were identified by the 1-nearest-neighbor classifier.
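The final identification step can be sketched with a small 1-nearest-neighbor classifier operating on already projected feature vectors. The function names and the two-dimensional toy features are assumptions for illustration, and Euclidean distance in the subspace is assumed.

```python
import numpy as np

def nearest_neighbor_predict(train_feats, train_labels, test_feats):
    """Label each test vector with the label of its closest training vector."""
    diff = test_feats[:, None, :] - train_feats[None, :, :]
    d2 = (diff ** 2).sum(axis=2)              # squared Euclidean distances
    return train_labels[np.argmin(d2, axis=1)]

def error_rate(predicted, truth):
    """Fraction of test samples that were assigned the wrong identity."""
    return float(np.mean(predicted != truth))

train_feats = np.array([[0.0, 0.0], [10.0, 10.0]])
train_labels = np.array([0, 1])
test_feats = np.array([[1.0, 0.0], [9.0, 9.0], [0.0, 2.0]])
test_truth = np.array([0, 1, 1])

pred = nearest_neighbor_predict(train_feats, train_labels, test_feats)
err = error_rate(pred, test_truth)            # one of three test samples is wrong
```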
4.1. ORL database
The ORL database contains 10 different images for each of 40 distinct individuals. For some individuals, the images were captured at different times, varying the lighting, facial expressions (open/closed eyes, smiling/not smiling), and facial details (glasses/no glasses), as shown in Figure 1. The database is divided into training and testing sets for the experiments. The applied divisions are n images per individual for training and 10 - n images per individual for testing, where n = 2, 3, 4, and 5. The experimental results are averaged over 20 random sets for each division. Table 1 presents the lowest error rates and the corresponding dimensions obtained by Eigenfaces, Fisherfaces, O-Laplacianfaces, and NW-Fisherfaces. The proposed NW-Fisherfaces outperformed the other methods on the ORL database. Figure 2 shows the plots of error rate versus reduced dimensionality. Since the optimization of LDA produces at most L - 1 features, the maximal dimension of Fisherfaces is also L - 1, where L is the number of individuals. As observed, the error rates of O-Laplacianfaces fall below those of PCA and LDA once the dimension reaches a certain value, which is 19 in Figure 2a. The error rates of NW-Fisherfaces are lower than those of the other methods over all dimensions below L - 1.
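The evaluation protocol (n randomly chosen training images per individual, the remainder for testing, repeated over 20 random sets) can be sketched as below. The function name `split_per_individual`, the random seed, and the ORL-like label layout are assumptions for illustration.

```python
import numpy as np

def split_per_individual(labels, n_train, rng):
    """Pick n_train random images of each individual for training, rest for testing."""
    train_idx, test_idx = [], []
    for person in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == person))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)

labels = np.repeat(np.arange(40), 10)   # ORL-like layout: 40 individuals, 10 images each
rng = np.random.default_rng(0)
splits = [split_per_individual(labels, 3, rng) for _ in range(20)]  # 20 random sets
```

An error rate would be computed on each of the 20 splits and then averaged to obtain one entry of the results table.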
4.2. Yale database
The Yale face database contains 165 grayscale images of 15 individuals. There are 11 images per individual, one per different facial expression or configuration: center-light, w/glasses, happy, left-light, w/no glasses, normal, right-light, sad, sleepy, surprised, and wink as shown in Figure 3. The database is divided into training and testing sets for experiment. The applied divisions are n images per individual for training and 11 - n images per individual for testing, where n = 2, 3, 4, and 5. Furthermore, the experimental results are averaged over 20 random sets for each division. Table 2 and Figure 4 show the experimental results. The proposed NW-Fisherfaces still outperformed other methods with low dimensionality.
4.3. PIE database
The CMU PIE face database contains 41,368 face images of 68 individuals. Each individual was imaged under various poses, illuminations, and expressions. In this study, 5 near frontal poses (C05, C07, C09, C27, and C29) and all the images under various illuminations, lighting, and expressions were gathered as 170 near frontal facial images for each individual as shown in Figure 5. The database is divided into training and testing sets for experiment. The applied divisions are n images per individual for training and 170 - n images per individual for testing, where n = 5, 10, 20, and 30. Furthermore, the experimental results were averaged over 20 random sets for each division. Table 3 presents the least error rates and the corresponding dimensions. Both O-Laplacianfaces and NW-Fisherfaces outperformed the Fisherfaces and Eigenfaces. O-Laplacianfaces resulted in the least error rates on PIE database. However, the dimensionality required by the NW-Fisherfaces to reach its least error rate is much lower than the dimensionalities required by other methods. As shown in Figure 6, NW-Fisherfaces outperformed other methods over the dimensions below L - 1, where L is the number of individuals.
No N-LDA results are reported for the PIE database beyond 10 training images per individual. Because the number of samples in the training set is then larger than the feature dimension, the within-class scatter matrix Sw has no null space.
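The arithmetic behind this observation can be checked with a short calculation. The rank of the within-class scatter matrix is at most the number of training samples minus the number of classes, and never more than the feature dimension, so the bound below tells us how large a null space is guaranteed. The sketch assumes the 1,024-dimensional image vectors and the 68 individuals stated in this article; the function name is hypothetical.

```python
def sw_rank_upper_bound(num_individuals, n_train, dim):
    """Upper bound on rank(Sw): min(dim, total training samples - classes)."""
    total_samples = num_individuals * n_train
    return min(dim, total_samples - num_individuals)

DIM = 1024          # 32 x 32 pixels
INDIVIDUALS = 68    # PIE database

# Guaranteed (minimum) dimension of the null space of Sw for each division.
guaranteed_null_dim = {
    n: DIM - sw_rank_upper_bound(INDIVIDUALS, n, DIM) for n in (5, 10, 20, 30)
}
```

With 5 or 10 training images per individual, a null space of dimension at least 752 or 412 is guaranteed. With 20 or 30 images the bound drops to zero, so for generic data the within-class scatter matrix is expected to be nonsingular, which agrees with the missing N-LDA results.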
4.4. PIE_Small database
The PIE_Small database is a part of the PIE database. To check the performance of the proposed method, we reduced the number of pictures for each subject. Instead of 170 images, we took 15 images for each person, as shown in Figure 7, and found that the performance of the proposed method is better than that of the others, especially for small sample size data. The applied divisions are n images per individual for training and 15 - n images per individual for testing, where n = 5, 6, 7, and 8. The experimental results were averaged over ten random sets for each division. Table 4 presents the lowest error rates and the corresponding dimensions. Both O-Laplacianfaces and NW-Fisherfaces outperformed Fisherfaces and Eigenfaces. O-Laplacianfaces resulted in the lowest error rates on the PIE_Small database. However, the dimensionality required by NW-Fisherfaces to reach its lowest error rate is much lower than the dimensionalities required by the other methods. As shown in Figure 8, NW-Fisherfaces outperformed the other methods over the dimensions below L - 1, where L is the number of individuals.
4.5. AR database
To check the invariance of the proposed method to lighting conditions and face orientation, which have been better handled by 3D deformation approaches, we applied it to the AR face database and found that it gives better results than the previously proposed methods.
This database contains 126 subjects (70 men, 56 women), and each subject has 26 different images, as shown in Figure 9. The images were taken under different facial expressions, illumination conditions, and occlusions. The applied divisions are n images per individual for training and 13 - n images per individual for testing, where n = 5, 6, 7, and 8. The experimental results are averaged over ten random sets for each division. Table 5 and Figure 10 show the experimental results. The proposed NW-Fisherfaces still outperformed the other methods with low dimensionality.
The pictures were taken at the CVC under strictly controlled conditions. No restrictions on wear (clothes, glasses, etc.), make-up, hair style, etc. were imposed to participants. Each person participated in two sessions, separated by 2 weeks (14 days) time. The same pictures were taken in both sessions.
In this face database, there are 13 images of each person per session, taken under the following conditions: (1) neutral expression, (2) smile, (3) anger, (4) scream, (5) left light on, (6) right light on, (7) all side lights on, (8) wearing sunglasses, (9) wearing sunglasses and left light on, (10) wearing sunglasses and right light on, (11) wearing scarf, (12) wearing scarf and left light on, and (13) wearing scarf and right light on. Images 14 to 26 were taken in the second session under the same conditions as images 1 to 13.
5. Conclusions and future works
The proposed NW-Fisherfaces consistently outperforms the Eigenfaces, Fisherfaces, D-LDA, and N-LDA methods.
This study applied a nonparametric feature extraction method into the scheme of appearance-based face recognition.
The proposed NW-Fisherfaces method weights the between-class scatter to emphasize boundary structure of the transformed face subspace and, therefore, enhances the discriminability of face recognition.
For practical applications, the computational load will depend on the dimensionality of the trained linear projection matrix. In this study, experimental results show that the proposed method can reach its lowest error rate with low dimensionality. Hence, the NW-Fisherfaces method is practical for real-world face recognition due to the low dimensionality requirement.
5.2. Future works
The future research in this area could involve the following.
The supervised OLPP weights the scatter matrix to preserve the locality of within-class faces. This weighting concept may enhance the within-class scatter of LDA and other LDA-based methods such as NDA and NWFE.
Linear feature extraction methods measure and optimize the closeness between samples based on Euclidean distance. However, Euclidean distance is sensitive to lighting variation. The variance caused by lighting should therefore be reduced before linear feature extraction methods are applied. Several solutions for reducing the lighting variance of face images are proposed:
Mapping face images into the same intensity distribution by simple preprocessing such as histogram specification.
Transforming images into frequency domain by Fourier-based methods such as Gabor wavelets.
The performance of NW-Fisherfaces in nonlinear feature space, such as kernel Hilbert space, can be further evaluated.
Han PY, Jin ATB, Siong LH: Eigenvector weighting function in face recognition. Discrete Dyn Nat Soc 2011: 521935. doi:10.1155/2011/521935
Park SW, Savvides M: A multifactor extension of linear discriminant analysis for face recognition under varying pose and illumination. EURASIP J Adv Signal Process 2010: 158395. doi:10.1155/2010/158395
Wu X-J, Kittler J, Yang J-Y, Messer K, Wang S: A new direct LDA (D-LDA) algorithm for feature extraction in face recognition. Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004) 2004, 4: 545-548.
The authors would like to thank the editors and reviewers for their helpful comments. They would also like to thank Dr. Li-Wei Ko, Department of Electrical and Control Engineering, National Chiao-Tung University, Taiwan, for their support in writing this article, and Dr. Deng Cai, for his essential contribution and providing a complete framework on face recognition.
Authors and Affiliations
Department of Electrical Engineering, National Chiao-Tung University, 1001 University Road, Hsinchu, Taiwan, 300, ROC
Dong-Lin Li, Sheng-Chih Hsu, Chao-Ting Hong & Chin-Teng Lin
Institute of Computer Science and Engineering, National Chiao-Tung University, 1001 University Road, Hsinchu, Taiwan, 300, ROC
This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.