Facial expression recognition using local binary patterns and discriminant kernel locally linear embedding
© Zhao and Zhang; licensee Springer. 2012
Received: 4 October 2011
Accepted: 27 January 2012
Published: 27 January 2012
Given the nonlinear manifold structure of facial images, a new kernel-based supervised manifold learning algorithm based on locally linear embedding (LLE), called discriminant kernel locally linear embedding (DKLLE), is proposed for facial expression recognition. DKLLE aims to extract discriminant information nonlinearly by maximizing the interclass scatter while minimizing the intraclass scatter in a reproducing kernel Hilbert space. DKLLE is compared with LLE, supervised locally linear embedding (SLLE), principal component analysis (PCA), linear discriminant analysis (LDA), kernel principal component analysis (KPCA), and kernel linear discriminant analysis (KLDA). Experimental results on two benchmark facial expression databases, the JAFFE database and the Cohn-Kanade database, demonstrate the effectiveness and promising performance of DKLLE.
Affective computing, an active research area, aims to build machines that recognize, express, model, communicate, and respond to a user's emotional information. Within this field, recognizing human emotion from facial images, i.e., facial expression recognition, is attracting increasing attention, since facial expression provides the most natural and immediate indication of a person's emotions and intentions. Over the last decade, automatic facial expression recognition has become significantly more important owing to its applications in human-computer interaction (HCI), human emotion analysis, interactive video, and image indexing and retrieval.
An automatic facial expression recognition system generally comprises three steps: face acquisition, facial feature extraction, and facial expression classification. Face acquisition is a preprocessing stage that detects or locates face regions in the input images or sequences. One of the most widely used face detectors is the real-time face detection algorithm developed by Viola and Jones, in which a cascade of classifiers is employed with Haar-like features. Once a face is detected, the corresponding face region is usually normalized to have the same eye distance and the same gray level. Facial feature extraction attempts to find the most appropriate representation of facial images for recognition. There are two main approaches: geometric feature-based systems and appearance feature-based systems. In geometric feature-based systems, the shapes and locations of major facial components such as the mouth, nose, eyes, and brows are detected in the images. However, such systems require accurate and reliable facial feature detection, which is difficult to achieve in real-time applications. In appearance feature-based systems, the appearance changes (skin texture) of the face, including wrinkles, bulges, and furrows, are represented. Techniques such as principal component analysis (PCA), linear discriminant analysis (LDA), regularized discriminant analysis (RDA), and Gabor wavelet analysis [7, 8] can be applied to either the whole face or specific face regions to extract these appearance changes. It is worth pointing out that convolving facial images with a set of Gabor filters to extract multi-scale and multi-orientation coefficients is computationally expensive. Moreover, in practice the dimensionality of Gabor features is so high that the computation and memory requirements are very large.
In recent years, an effective face descriptor called local binary patterns (LBP), originally proposed for texture analysis, has attracted extensive interest for facial expression representation. Two of the most important properties of LBP are its tolerance against illumination changes and its computational simplicity. So far, LBP has been successfully applied as a local feature extraction method in facial expression recognition [11–13]. In the last step of an automatic facial expression recognition system, i.e., facial expression classification, a classifier is employed to identify different expressions based on the extracted facial features. Representative classifiers used for facial expression recognition include neural networks, the nearest neighbor (1-NN) or k-nearest neighbor (KNN) classifier, and support vector machines (SVM).
It has been shown that facial images of a person with varying expressions can be represented as a low-dimensional nonlinear manifold embedded in a high-dimensional image space [18–20]. Given this nonlinear manifold structure, two representative manifold learning (also called nonlinear dimensionality reduction) methods, locally linear embedding (LLE) and isometric feature mapping (Isomap), have been used to project high-dimensional facial expression images into a low-dimensional embedded subspace in which facial expressions can be easily distinguished from each other [18–20, 23, 24]. However, because LLE and Isomap are unsupervised, they do not extract discriminant information from the class labels, and consequently they do not perform well on facial expression recognition tasks.
To overcome the limitations of unsupervised manifold learning methods for supervised pattern recognition, several supervised manifold learning algorithms based on supervised distance measures have recently been proposed, such as supervised locally linear embedding (SLLE) using a linear supervised distance, probability-based LLE using a probability-based distance, and locally linear discriminant embedding using a vector translation and distance rescaling model. Among them, SLLE has become one of the most promising supervised manifold learning techniques owing to its simple implementation, and it has been successfully applied to facial expression recognition. However, SLLE still has two shortcomings. First, because SLLE uses a linear supervised distance, its interclass dissimilarity increases in parallel with its intraclass dissimilarity, whereas an ideal classification mechanism should maximize the interclass dissimilarity while minimizing the intraclass dissimilarity. This linear supervised distance is therefore ill-suited to classification, since it can substantially reduce the discriminating power of the low-dimensional embedded data representations produced by SLLE. Second, as a non-kernel method, SLLE cannot use a nonlinear kernel mapping, the defining characteristic of kernel-based learning, and therefore cannot exploit higher-order information in the input data. To tackle these problems, this article proposes a new kernel-based supervised manifold learning algorithm based on LLE, called discriminant kernel locally linear embedding (DKLLE), and applies it to facial expression recognition. On the one hand, with a nonlinear supervised distance measure, DKLLE considers both the intraclass and the interclass scatter information in a reproducing kernel Hilbert space (RKHS), and emphasizes the discriminant information.
On the other hand, with kernel techniques, DKLLE extracts nonlinear feature information by mapping the input data into a high-dimensional feature space. To evaluate the performance of DKLLE on facial expression recognition, we adopt LBP features as facial representations and then employ DKLLE to produce low-dimensional discriminant embedded data representations from the extracted LBP features, which yields a striking performance improvement on facial expression recognition tasks. The facial expression recognition experiments are performed on two benchmark facial expression databases, the JAFFE database and the Cohn-Kanade database.
The remainder of this article is organized as follows: in Section 2, LBP is introduced briefly. In Section 3, LLE and SLLE are reviewed briefly. The proposed DKLLE algorithm is presented in detail in Section 4. In Section 5, experiments and results are given. Finally, the conclusions are summarized in Section 6.
Local binary patterns
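The basic LBP operator labels each pixel of an image by thresholding its 3 × 3 neighborhood against the center pixel value and reading the resulting eight bits as a binary number; the histogram of these labels over a region then serves as a texture descriptor. The procedure for extracting LBP features for facial representation is as follows:
First, a face image is divided into several non-overlapping blocks. Second, an LBP histogram is computed for each block. Finally, the block LBP histograms are concatenated into a single vector. As a result, the face image is represented by the concatenated LBP histogram.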
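The three steps above can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in the experiments: it assumes the basic 3 × 3 operator (radius 1, eight neighbors, 256 possible codes), a grayscale image stored as a list of rows, and hypothetical helper names `lbp_codes` and `block_histograms`; border pixels without a full neighborhood are simply skipped.

```python
# Clockwise 8-neighborhood offsets (row, col), starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img):
    """Basic 3x3 LBP code for every interior pixel of a grayscale image."""
    h, w = len(img), len(img[0])
    codes = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            center = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                # Threshold each neighbor against the center pixel.
                if img[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def block_histograms(codes, grid_rows, grid_cols, n_bins=256):
    """Split the code map into non-overlapping blocks and concatenate their histograms."""
    h, w = len(codes), len(codes[0])
    feature = []
    for br in range(grid_rows):
        for bc in range(grid_cols):
            hist = [0] * n_bins
            for r in range(br * h // grid_rows, (br + 1) * h // grid_rows):
                for c in range(bc * w // grid_cols, (bc + 1) * w // grid_cols):
                    hist[codes[r][c]] += 1
            feature.extend(hist)
    return feature


# Usage: a toy 4x4 image gives a 2x2 code map; a 2x2 grid puts one code per block.
image = [[10, 20, 30, 40],
         [50, 60, 70, 80],
         [90, 100, 110, 120],
         [130, 140, 150, 160]]
feature = block_histograms(lbp_codes(image), 2, 2)
print(len(feature))  # 1024 (= 4 blocks x 256 bins)
```

The per-block histograms keep coarse spatial layout (which block a pattern occurred in) while each histogram discards the exact pixel positions inside its block.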
LLE and SLLE
Given input data points $x_i \in \mathbb{R}^D$ and output data points $y_i \in \mathbb{R}^d$ ($i = 1, 2, \ldots, N$), the standard LLE consists of three steps:
Step 1: Find the K nearest neighbors of each $x_i$ based on the Euclidean distance.
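Step 2: Compute the reconstruction weights W by minimizing the reconstruction error

$$\varepsilon(W) = \sum_{i=1}^{N} \Big\| x_i - \sum_{j=1}^{N} W_{ij}\, x_j \Big\|^2 \qquad (1)$$

subject to two constraints: $\sum_{j} W_{ij} = 1$, and $W_{ij} = 0$ if $x_i$ and $x_j$ are not neighbors.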
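Step 3: Compute the low-dimensional embedding Y by minimizing

$$\Phi(Y) = \sum_{i=1}^{N} \Big\| y_i - \sum_{j=1}^{N} W_{ij}\, y_j \Big\|^2 \qquad (2)$$

subject to two constraints: $\sum_{i} y_i = 0$ and $\frac{1}{N}\sum_{i} y_i y_i^{T} = I$, where I is the d × d identity matrix. To find Y under these constraints, a matrix M is constructed from W: $M = (I - W)^{T}(I - W)$, where I now denotes the N × N identity matrix. The d eigenvectors corresponding to the d smallest non-zero eigenvalues of M yield the final embedding Y.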
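To make Steps 1 to 3 concrete, the following pure-Python sketch builds the neighbor sets, the reconstruction weights of Step 2, and the matrix M. It is an illustration under assumptions, not a reference implementation: the function names are hypothetical, K is a user-chosen neighbor count, and the small ridge term `reg` added to the local Gram matrix is a common stabilizer for the case K > D (as in the example, where K = 3 neighbors lie in 2-D) rather than something stated in the text.

```python
import math


def knn_indices(X, k):
    """Step 1: indices of the k nearest neighbors of each point (Euclidean)."""
    result = []
    for i, xi in enumerate(X):
        ranked = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        result.append([j for _, j in ranked[:k]])
    return result


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def lle_weights(X, neighbors, reg=1e-3):
    """Step 2: weights minimizing ||x_i - sum_j W_ij x_j||^2 with sum_j W_ij = 1."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i, idx in enumerate(neighbors):
        diffs = [[a - b for a, b in zip(X[i], X[j])] for j in idx]
        k = len(idx)
        # Local Gram matrix C_ab = (x_i - x_a) . (x_i - x_b)
        C = [[sum(p * q for p, q in zip(diffs[a], diffs[b])) for b in range(k)]
             for a in range(k)]
        trace = sum(C[a][a] for a in range(k))
        for a in range(k):
            C[a][a] += reg * trace if trace > 0 else reg  # ridge term (assumption)
        w = solve(C, [1.0] * k)
        total = sum(w)
        for j, wj in zip(idx, w):
            W[i][j] = wj / total  # enforce the sum-to-one constraint
    return W


def cost_matrix(W):
    """Step 3: M = (I - W)^T (I - W); its bottom eigenvectors give Y."""
    n = len(W)
    A = [[(1.0 if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]
    return [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


X = [[0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [2, 1]]
nbrs = knn_indices(X, 3)
W = lle_weights(X, nbrs)
M = cost_matrix(W)
```

Because every row of W sums to one, M maps the constant vector to zero; that zero eigenvalue carries no information about the data, which is why only non-zero eigenvalues are used for the embedding.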
SLLE incorporates class labels into Step 1 of LLE by modifying the distance matrix used to select neighbors:

$$\Delta' = \Delta + \alpha \max(\Delta)\, \Lambda \qquad (3)$$

where Δ is the distance matrix without class label information, and Δ' is the distance matrix that integrates class label information. If $x_i$ and $x_j$ belong to different classes, then $\Lambda_{ij} = 1$, and $\Lambda_{ij} = 0$ otherwise. The constant factor α (0 ≤ α ≤ 1) controls the extent to which class information is incorporated. At one extreme, α = 0 gives unsupervised LLE; at the other, α = 1 gives fully supervised LLE (1-SLLE). For α between 0 and 1, a partially supervised LLE (α-SLLE) is obtained. From Eq. (3), when the intraclass dissimilarity (Δ' = Δ for $\Lambda_{ij} = 0$) increases linearly, the interclass dissimilarity (Δ' = Δ + α max(Δ) for $\Lambda_{ij} = 1$) increases in parallel, since α max(Δ) is a constant. Therefore, the supervised distance measure used in SLLE is linear.
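A minimal sketch of the SLLE distance modification described above, assuming Δ is a symmetric matrix of pairwise Euclidean distances stored as nested lists; `pairwise_distances` and `slle_distance` are hypothetical helper names. The returned matrix would replace Δ when selecting neighbors in Step 1.

```python
import math


def pairwise_distances(X):
    """Symmetric matrix of Euclidean distances between all pairs of points."""
    return [[math.dist(a, b) for b in X] for a in X]


def slle_distance(delta, labels, alpha):
    """Delta' = Delta + alpha * max(Delta) * Lambda, with Lambda_ij = 1 iff labels differ."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    dmax = max(max(row) for row in delta)
    n = len(delta)
    return [[delta[i][j] + (alpha * dmax if labels[i] != labels[j] else 0.0)
             for j in range(n)] for i in range(n)]


X = [[0.0], [1.0], [3.0]]
labels = ["joy", "joy", "anger"]
delta = pairwise_distances(X)
print(slle_distance(delta, labels, 0.5))
# [[0.0, 1.0, 4.5], [1.0, 0.0, 3.5], [4.5, 3.5, 0.0]]
```

Every interclass entry is shifted by the same constant α max(Δ) = 1.5, regardless of how far apart the two points are; this is the linearity criticized in the text.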
The proposed DKLLE
A discriminant kernel variant of LLE, DKLLE, is developed by designing a nonlinear supervised distance measure and minimizing the reconstruction error in an RKHS.
Given input data points $(x_i, L_i)$, where $x_i \in \mathbb{R}^D$ and $L_i$ is the class label of $x_i$, the output data points are $y_i \in \mathbb{R}^d$ ($i = 1, 2, \ldots, N$). The detailed steps of DKLLE are as follows:
Step 1: Perform the kernel mapping for each data point x i .
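A nonlinear mapping φ from the input space into a feature space F is defined implicitly through its inner products:

$$\kappa(x_i, x_j) = \langle \varphi(x_i), \varphi(x_j) \rangle,$$

where κ is called a kernel.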
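Since φ is only accessed through κ, distances in the feature space can also be computed from kernel values alone, using $\|\varphi(x_i)-\varphi(x_j)\|^2 = \kappa(x_i,x_i) - 2\kappa(x_i,x_j) + \kappa(x_j,x_j)$. The sketch below assumes the Gaussian kernel used later in the experiments, written here as $\exp(-\|x-y\|^2 / (2\sigma^2))$; that exact scaling of σ, and the helper names, are assumptions.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """kappa(x, y) = exp(-||x - y||^2 / (2 sigma^2)); the 2*sigma^2 form is assumed."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_matrix(X, kernel=gaussian_kernel):
    """K_ij = kappa(x_i, x_j) for all pairs of data points."""
    return [[kernel(xi, xj) for xj in X] for xi in X]


def kernel_distance_matrix(K):
    """Dist_ij = ||phi(x_i) - phi(x_j)||, evaluated via the kernel trick."""
    n = len(K)
    # max(..., 0.0) guards against tiny negative values caused by rounding.
    return [[math.sqrt(max(K[i][i] - 2.0 * K[i][j] + K[j][j], 0.0)) for j in range(n)]
            for i in range(n)]


X = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
K = kernel_matrix(X)
Dist = kernel_distance_matrix(K)
```

For the Gaussian kernel κ(x, x) = 1, so each entry reduces to sqrt(2 - 2K_ij) and never exceeds sqrt(2): distant points saturate, which is one reason the choice of σ matters.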
Step 2: Find the nearest neighbors of each $\varphi(x_i)$ by using a nonlinear supervised kernel distance.
Here KDist is the supervised kernel distance matrix, which incorporates class label information, while Dist is the kernel Euclidean distance matrix without class label information; by the kernel trick, $Dist_{ij} = \|\varphi(x_i) - \varphi(x_j)\| = \sqrt{\kappa(x_i, x_i) - 2\kappa(x_i, x_j) + \kappa(x_j, x_j)}$. The constant factor α (0 ≤ α ≤ 1) gives data points in different classes some chance of being more similar, so that the interclass dissimilarity may be smaller than the intraclass dissimilarity. β prevents KDist from increasing too fast when Dist is relatively large, since Dist appears in the exponent. Hence, the value of β should depend on the "density" of the data set, and it is usually feasible to set β to the average kernel Euclidean distance between all pairs of data points.
From Eq. (6), two observations can be made. First, both the interclass and the intraclass dissimilarities in KDist increase monotonically with the kernel Euclidean distance. This ensures that the main geometric structure of the original data set is well preserved during dimensionality reduction. Second, the interclass dissimilarity in KDist can always be made larger than the intraclass dissimilarity, which gives DKLLE's low-dimensional embedded data representations high discriminating power, a desirable property for classification.
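The exact form of Eq. (6) is not reproduced in this section. As a hedged illustration only, the sketch below assumes an exponential supervised distance of the kind used in supervised Isomap, which matches the roles described for α and β: $\sqrt{1-\exp(-Dist_{ij}^2/\beta)}$ for points with the same label and $\sqrt{\exp(Dist_{ij}^2/\beta)}-\alpha$ otherwise, with β set to the average kernel Euclidean distance over all pairs. Treat the formula, and the helper names, as assumptions rather than the article's definition.

```python
import math


def supervised_kernel_distance(dist, labels, alpha, beta=None):
    """ASSUMED form of KDist (supervised-Isomap style), not taken from the article:
    same class:      sqrt(1 - exp(-Dist^2 / beta))     -> always below 1
    different class: sqrt(exp(Dist^2 / beta)) - alpha  -> at least 1 - alpha
    """
    n = len(dist)
    if beta is None:
        # beta = average kernel Euclidean distance over all pairs i < j
        pairs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
        beta = sum(pairs) / len(pairs)
    kdist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = dist[i][j] ** 2
            if labels[i] == labels[j]:
                kdist[i][j] = math.sqrt(1.0 - math.exp(-d2 / beta))
            else:
                kdist[i][j] = math.sqrt(math.exp(d2 / beta)) - alpha
    return kdist


def nearest_neighbors(dmat, k):
    """Indices of the k smallest off-diagonal entries in each row of dmat."""
    return [sorted((j for j in range(len(row)) if j != i), key=lambda j: row[j])[:k]
            for i, row in enumerate(dmat)]


# Points 0 and 1 share a label; point 2 is closer to point 0 but in another class.
dist = [[0.0, 1.2, 0.5],
        [1.2, 0.0, 0.9],
        [0.5, 0.9, 0.0]]
labels = [0, 0, 1]
kdist = supervised_kernel_distance(dist, labels, alpha=0.0)
print(nearest_neighbors(dist, 1)[0])   # [2]
print(nearest_neighbors(kdist, 1)[0])  # [1]
```

With α = 0 every interclass entry is at least 1 and every intraclass entry is below 1, so the nearest neighbor of point 0 switches from point 2 to point 1; raising α lets interclass points be selected again.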
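Step 3: Measure the reconstruction error in an RKHS:

$$\varepsilon(W) = \sum_{i=1}^{N} \Big\| \varphi(x_i) - \sum_{j} W_{ij}\, \varphi(x_j) \Big\|^2,$$

subject to the same constraints on W as in standard LLE, with the neighbors now found in Step 2. Because this error involves φ only through inner products, it can be evaluated entirely with the kernel κ. Therefore, the reconstruction weights can be computed from the eigenvalues and eigenvectors of the kernel matrix.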
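For a single point, expanding the feature-space error with $\sum_j W_{ij}=1$ gives $w^{T} C w$ with $C_{jk} = \kappa(x_i,x_i) - \kappa(x_i,x_j) - \kappa(x_i,x_k) + \kappa(x_j,x_k)$, where j and k range over the neighbors of $x_i$. The sketch below minimizes this with a ridge-regularized linear solve, a common route in kernelized LLE; it is offered as an alternative illustration, not as the eigen-decomposition procedure the text mentions, and the names and the `reg` default are assumptions.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def kernel_lle_weights(K, neighbors, reg=1e-3):
    """Reconstruction weights in the RKHS, using only the kernel matrix K."""
    n = len(K)
    W = [[0.0] * n for _ in range(n)]
    for i, idx in enumerate(neighbors):
        # Local Gram matrix of phi(x_i) - phi(x_j) over the neighbors j of x_i.
        C = [[K[i][i] - K[i][j] - K[i][k] + K[j][k] for k in idx] for j in idx]
        trace = sum(C[a][a] for a in range(len(idx)))
        for a in range(len(idx)):
            C[a][a] += reg * trace if trace > 0 else reg  # ridge term (assumption)
        w = solve(C, [1.0] * len(idx))
        total = sum(w)
        for j, wj in zip(idx, w):
            W[i][j] = wj / total  # sum-to-one constraint
    return W


# With the linear kernel K = X X^T this reduces to ordinary LLE weights.
X = [[1.0, 1.0], [0.0, 0.0], [2.0, 2.0]]
K = [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]
W = kernel_lle_weights(K, [[1, 2], [0, 2], [0, 1]])
print([round(v, 6) for v in W[0]])  # [0.0, 0.5, 0.5]
```

Point 0 is the midpoint of points 1 and 2, so it is reconstructed with equal weights; swapping in a Gaussian kernel matrix changes only how K is built.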
Step 4: Compute the final embedding.
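The final embedding Y (with the $y_i^{T}$ as its rows) is obtained by minimizing $\Phi(Y) = \sum_{i} \| y_i - \sum_{j} W_{ij}\, y_j \|^2 = \operatorname{tr}(Y^{T} M Y)$, where $M = (I - W)^{T}(I - W)$, subject to two constraints: $\sum_i y_i = 0$ and $\frac{1}{N}\sum_i y_i y_i^{T} = I$. The final embedding Y comprises the d eigenvectors corresponding to the d smallest non-zero eigenvalues of M.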
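The eigen-decomposition in Step 4 can be illustrated without external libraries by a cyclic Jacobi solver for the symmetric matrix M. This is a small-scale sketch under stated assumptions: W is given as nested lists whose rows sum to one, the bottom eigenvector (the constant vector, eigenvalue 0) is discarded, and the kept eigenvectors are scaled by √N so that (1/N) Σ y_i y_iᵀ = I. The function names are hypothetical, and for realistic N a library eigensolver would be used instead.

```python
import math


def jacobi_eigh(A, tol=1e-24, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of V) of a symmetric matrix A."""
    n = len(A)
    a = [list(row) for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the rotated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], V


def lle_embedding(W, d):
    """Bottom non-constant eigenvectors of M = (I - W)^T (I - W), scaled by sqrt(N)."""
    n = len(W)
    B = [[(1.0 if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]
    M = [[sum(B[k][i] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    vals, V = jacobi_eigh(M)
    order = sorted(range(n), key=lambda col: vals[col])
    keep = order[1:d + 1]  # skip the eigenvalue-0 constant eigenvector
    Y = [[V[i][col] * math.sqrt(n) for col in keep] for i in range(n)]
    return Y, [vals[col] for col in keep]


# Five points on a cycle, each reconstructed from its two neighbors.
N = 5
W = [[0.5 if (j - i) % N in (1, N - 1) else 0.0 for j in range(N)] for i in range(N)]
Y, eigenvalues = lle_embedding(W, 2)
```

For this cycle the two kept eigenvalues coincide, and the resulting 2-D coordinates place the five points on a circle, recovering the cyclic structure encoded in W.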
Experiments and results
To verify the effectiveness of the proposed DKLLE, we use two benchmark facial expression databases, the JAFFE database and the Cohn-Kanade database, for facial expression recognition experiments. Each database contains seven expressions: anger, joy, sadness, neutral, surprise, disgust, and fear. The performance of DKLLE is compared with that of LLE, SLLE, PCA, LDA, kernel principal component analysis (KPCA), and kernel linear discriminant analysis (KLDA). The Gaussian kernel is used for KPCA, KLDA, and DKLLE, and its parameter σ is empirically set to 1, which gives satisfactory performance. The number of nearest neighbors for LLE, SLLE, and DKLLE is determined with an adaptive neighbor selection technique. To embed new samples, out-of-sample extensions of LLE and SLLE are obtained with an existing linear generalization technique, in which a linear relation is built between the high- and low-dimensional spaces, so that a new sample can be handled by updating the weight matrix W. As a kernel method, the proposed DKLLE can directly project new samples into the low-dimensional space by using the kernel trick, as in KPCA. For simplicity, the nearest neighbor (1-NN) classifier with the Euclidean metric is used for facial expression classification. A 10-fold cross-validation scheme is employed in the 7-class facial expression recognition experiments, and the average recognition results are reported.
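The evaluation protocol can be sketched as follows: a 1-NN classifier with the Euclidean metric, scored by 10-fold cross-validation. The sketch assumes the features are already reduced to the embedded dimension, uses a random (not stratified) fold assignment with a fixed seed, and reports the mean and the sample standard deviation of the per-fold accuracies; the fold scheme, the choice of standard deviation, and the helper names are assumptions. In a faithful pipeline the dimensionality reduction would also be fitted on the training folds only.

```python
import math
import random
import statistics


def predict_1nn(train_X, train_y, x):
    """Label of the training sample nearest to x (Euclidean distance)."""
    nearest = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[nearest]


def cross_validate_1nn(X, y, folds=10, seed=0):
    """Mean and sample standard deviation of per-fold 1-NN accuracy."""
    if len(X) < folds:
        raise ValueError("need at least one sample per fold")
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    accuracies = []
    for f in range(folds):
        test_idx = order[f::folds]
        test_set = set(test_idx)
        train_idx = [i for i in order if i not in test_set]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(predict_1nn(train_X, train_y, X[t]) == y[t] for t in test_idx)
        accuracies.append(correct / len(test_idx))
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Two well-separated toy classes, 10 samples each.
X = [[float(i), 0.0] for i in range(10)] + [[float(i), 50.0] for i in range(10)]
y = ["joy"] * 10 + ["anger"] * 10
mean_acc, std_acc = cross_validate_1nn(X, y)
```

On this separable toy data every fold is classified perfectly, so the mean is 1.0 and the standard deviation is 0.0; on real embeddings the spread across folds is what the "± std" columns summarize.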
Because of the computational cost, the reduced dimension is confined to the range [2, 100] with a step of 5. The exception is the low range [2, 10], where results are reported for every reduced dimension (step of 1), since the reduced dimension of LDA and KLDA is at most c − 1, where c is the number of facial expression classes. For each reduced dimension, the constant α (0 ≤ α ≤ 1) for SLLE and DKLLE is optimized by a simple exhaustive search over α = 0, 0.1, 0.2, ..., 1.
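One reading of this search protocol, written out as grids: every dimension from 2 to 10, then steps of 5 up to 100, and α from 0 to 1 in steps of 0.1. The `evaluate` callback, standing in for a full train-and-test run, is hypothetical, and so is the assumption that the coarse grid continues 15, 20, ..., 100.

```python
def reduced_dimensions():
    """Every dimension in [2, 10], then 15, 20, ..., 100 (assumed grid)."""
    return list(range(2, 11)) + list(range(15, 101, 5))


ALPHAS = [round(0.1 * i, 1) for i in range(11)]  # 0.0, 0.1, ..., 1.0


def best_alpha(evaluate, dim):
    """Exhaustive search: the alpha with the highest score for this dimension."""
    return max(ALPHAS, key=lambda alpha: evaluate(dim, alpha))


n_classes = 7
dims = reduced_dimensions()
lda_dims = [d for d in dims if d <= n_classes - 1]  # LDA/KLDA: at most c - 1 = 6
print(len(dims), lda_dims)  # 27 [2, 3, 4, 5, 6]
```

With seven expression classes, LDA and KLDA can only be evaluated at dimensions 2 to 6, which is why the fine-grained low range matters for a fair comparison.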
As done in [11, 12], on both the JAFFE and the Cohn-Kanade databases, the eye distance of the facial images was normalized to a fixed 55 pixels once the centers of the two eyes were located. It is generally observed that the width of a face is roughly twice this distance and its height roughly three times. Therefore, based on the normalized eye distance, a resized image of 110 × 150 pixels was cropped from each original image. To locate the eye centers, automatic face registration was performed with the robust real-time face detector developed by Viola and Jones. From the detection results (face location, width, and height), two square bounding boxes for the left and right eyes were constructed automatically, using the geometry of a typical upright face, which has been widely used to find a proper spatial arrangement of facial features. The eye centers were then taken as the centers of these two bounding boxes. No further alignment of facial features, such as the mouth, was performed. Additionally, no attempt was made to remove illumination changes, owing to the gray-scale invariance of LBP.
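The geometry of this normalization can be sketched as below. The sketch assumes bounding boxes given as (x, y, width, height) in pixels, takes the eye centers as box centers, computes the scale factor that brings the eye distance to 55 pixels, and returns a 110 × 150 crop window in the rescaled image, centered horizontally on the eye midpoint. The vertical position of the eye line within the crop (`eye_row`) is not specified in the text and is a hypothetical parameter, as are the function names; in-plane rotation is ignored.

```python
import math


def box_center(box):
    """Center of an (x, y, width, height) bounding box."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def normalized_crop(left_box, right_box, eye_dist=55.0, size=(110, 150), eye_row=45.0):
    """Scale factor and crop window (left, top, width, height) in the rescaled image.

    eye_row (distance of the eye line from the top of the crop) is hypothetical.
    """
    (lx, ly), (rx, ry) = box_center(left_box), box_center(right_box)
    scale = eye_dist / math.hypot(rx - lx, ry - ly)
    mid_x = (lx + rx) / 2.0 * scale
    mid_y = (ly + ry) / 2.0 * scale
    width, height = size
    return scale, (mid_x - width / 2.0, mid_y - eye_row, width, height)


scale, crop = normalized_crop((80, 80, 40, 40), (190, 80, 40, 40))
print(scale, crop)  # 0.5 (22.5, 5.0, 110, 150)
```

Here the detected eyes are 110 pixels apart, so the image is halved; the 110-pixel crop width is exactly twice the normalized eye distance.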
After the 110 × 150 facial images, which include the mouth, eyes, brows, and nose, were cropped from the original images, the LBP operator was applied to each cropped image to extract LBP features. As suggested in [10–12], we used the 59-bin LBP operator, divided each 110 × 150 facial image into 42 (6 × 7) blocks, and obtained LBP feature vectors of length 2478 (59 × 42).
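The 59 bins come from the "uniform" LBP mapping: an 8-bit pattern is uniform if it has at most two 0/1 transitions when read circularly. There are 58 such patterns, each with its own bin, and all remaining patterns share one extra bin, giving 59 bins per block and 59 × 42 = 2478 features. The following sketch (helper names are hypothetical) builds that mapping; it covers only the binning, not the neighborhood radius used in the experiments.

```python
def transitions(code, bits=8):
    """Number of 0/1 changes when the bit pattern is read circularly."""
    return sum(((code >> i) & 1) != ((code >> ((i + 1) % bits)) & 1) for i in range(bits))


def uniform_mapping(bits=8):
    """Map every LBP code to a bin: one bin per uniform pattern, one shared bin for the rest."""
    mapping, next_bin = {}, 0
    for code in range(2 ** bits):
        if transitions(code, bits) <= 2:
            mapping[code] = next_bin
            next_bin += 1
    for code in range(2 ** bits):
        mapping.setdefault(code, next_bin)  # all non-uniform codes share the last bin
    return mapping, next_bin + 1


mapping, n_bins = uniform_mapping()
print(n_bins)          # 59
print(n_bins * 6 * 7)  # 2478 features for 6 x 7 blocks
```

Uniform patterns account for most LBP codes observed in natural images, so pooling the rare non-uniform codes into one bin shrinks each block histogram from 256 to 59 bins with little loss of information.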
Experiments on the JAFFE database
Table 1 Best accuracy (%, mean ± std) of different methods on the JAFFE database
80.81 ± 3.6
78.09 ± 4.2
80.93 ± 3.9
78.47 ± 4.0
75.24 ± 3.8
78.57 ± 4.0
84.06 ± 3.8
From the results in Figure 3 and Table 1, DKLLE achieves the highest accuracy, 84.06% at a reduced dimension of 40, outperforming the other methods. In particular, DKLLE improves on LLE by about 9 percentage points and on SLLE by about 6 percentage points. This indicates that DKLLE extracts more discriminative low-dimensional embedded data representations for facial expression recognition than the other methods tested. Note that a direct comparison with all previously reported work on the JAFFE database is difficult because of differing experimental settings. Nevertheless, the accuracy of 84.06% obtained here with LBP features and a 1-NN classifier is very encouraging compared with previously published work using experimental settings similar to ours. In that work, after extracting the most discriminative LBP features (called boosted-LBP), SVM classifiers with linear, polynomial, and radial basis function (RBF) kernels achieved 7-class facial expression recognition accuracies of 79.8%, 79.8%, and 81.0%, respectively. For simplicity, the present work uses neither boosted-LBP features nor SVM. To compare DKLLE with that work more fully, we will explore combining boosted-LBP features and SVM with DKLLE in future work.
Confusion matrix of recognition results with DKLLE on the JAFFE database
Experiments on the Cohn-Kanade database
Best accuracy (%, mean ± std) of different methods on the Cohn-Kanade database
90.18 ± 3.0
92.43 ± 3.3
93.32 ± 3.0
92.59 ± 3.6
83.67 ± 3.4
92.64 ± 3.2
95.85 ± 3.2
Confusion matrix of recognition results with DKLLE on the Cohn-Kanade database
Conclusions
A new kernel-based supervised manifold learning algorithm, DKLLE, has been proposed for facial expression recognition. DKLLE has two prominent characteristics. First, as a kernel-based feature extraction method, DKLLE can extract the nonlinear feature information embedded in a data set, as KPCA and KLDA do. Second, DKLLE is designed to give its low-dimensional embedded data representations high discriminating power, in an effort to improve facial expression recognition performance. Experimental results on the JAFFE and Cohn-Kanade databases show that DKLLE not only clearly improves on LLE and SLLE, but also outperforms the other methods compared, including PCA, LDA, KPCA, and KLDA.
This work was supported by Zhejiang Provincial Natural Science Foundation of China under Grant No. Z1101048 and Grant No. Y1111058.
- Picard RW: Affective Computing. The MIT Press, Cambridge; 2000.
- Tian Y, Kanade T, Cohn J: Facial expression analysis, Handbook of face recognition. Springer, Heidelberg; 2005:247-275.
- Viola P, Jones M: Robust real-time face detection. Int J Comput Vision 2004, 57(2):137-154.
- Turk MA, Pentland AP: Face recognition using eigenfaces. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, HI USA; 1991:586-591.
- Belhumeur PN, Hespanha JP, Kriegman DJ: Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 1997, 19(7):711-720. 10.1109/34.598228
- Lee CC, Huang SS, Shih CY: Facial affect recognition using regularized discriminant analysis-based algorithms. EURASIP J Adv Signal Process 2010, 2010: 10.
- Daugman JG: Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression. IEEE Trans Acoust Speech Signal Process 1988, 36(7):1169-1179. 10.1109/29.1644
- Shen L, Bai L: Information theory for Gabor feature selection for face recognition. EURASIP J Adv Signal Process 2006, 2006: 11.
- Ahonen T, Hadid A, Pietikäinen M: Face description with local binary patterns: Application to face recognition. IEEE Trans Pattern Anal Mach Intell 2006, 28(12):2037-2041.
- Ojala T, Pietikäinen M, Mäenpää T: Multiresolution gray scale and rotation invariant texture analysis with local binary patterns. IEEE Trans Pattern Anal Mach Intell 2002, 24(7):971-987. 10.1109/TPAMI.2002.1017623
- Shan C, Gong S, McOwan P: Robust facial expression recognition using local binary patterns. In IEEE International Conference on Image Processing (ICIP). Volume 2. IEEE Computer Society, Italy; 2005:370-373.
- Shan C, Gong S, McOwan P: Facial expression recognition based on local binary patterns: a comprehensive study. Image Vis Comput 2009, 27(6):803-816. 10.1016/j.imavis.2008.08.005
- Moore S, Bowden R: Local binary patterns for multi-view facial expression recognition. Comput Vis Image Understand 2011, 115: 541-558. 10.1016/j.cviu.2010.12.001
- Tian Y, Kanade T, Cohn J: Recognizing action units for facial expression analysis. IEEE Trans Pattern Anal Mach Intell 2002, 23(2):97-115.
- Lyons MJ, Budynek J, Akamatsu S: Automatic classification of single facial images. IEEE Trans Pattern Anal Mach Intell 1999, 21(12):1357-1362. 10.1109/34.817413
- Sebe N, Lew MS, Sun Y, Cohen I, Gevers T, Huang TS: Authentic facial expression analysis. Image Vis Comput 2007, 25(12):1856-1863. 10.1016/j.imavis.2005.12.021
- Kotsia I, Pitas I: Facial expression recognition in image sequences using geometric deformation features and support vector machines. IEEE Trans on Image Process 2007, 16(1):172-187.
- Chang Y, Hu C, Turk M: Manifold of facial expression. In IEEE International Workshop on Analysis and Modeling of Faces and Gestures. IEEE Computer Society, France; 2003:28-35.
- Shan C, Gong S, McOwan PW: Appearance manifold of facial expression. In Computer Vision in Human-Computer Interaction, Lecture Notes in Computer Science. Volume 3766. Springer, China; 2005:221-230. 10.1007/11573425_22
- Chang Y, Hu C, Feris R, et al.: Manifold based analysis of facial expression. Image Vis Comput 2006, 24(6):605-614. 10.1016/j.imavis.2005.08.006
- Roweis ST, Saul LK: Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290(5500):2323-2326. 10.1126/science.290.5500.2323
- Tenenbaum JB, de Silva V, Langford JC: A global geometric framework for nonlinear dimensionality reduction. Science 2000, 290(5500):2319-2323. 10.1126/science.290.5500.2319
- Cheon Y, Kim D: Natural facial expression recognition using differential-AAM and manifold learning. Pattern Recogn 2009, 42(7):1340-1350. 10.1016/j.patcog.2008.10.010
- Xiao R, Zhao Q, Zhang D, Shi P: Facial expression recognition on multiple manifolds. Pattern Recogn 2011, 44(1):107-116. 10.1016/j.patcog.2010.07.017
- de Ridder D, Kouropteva O, Okun O, Pietikäinen M, Duin RPW: Supervised locally linear embedding. In Artificial Neural Networks and Neural Information Processing-ICANN/ICONIP-2003, Lecture Notes in Computer Science. Volume 2714. Springer, Heidelberg; 2003:333-341. 10.1007/3-540-44989-2_40
- Zhao L, Zhang Z: Supervised locally linear embedding with probability-based distance for classification. Comput Math Appl 2009, 57(6):919-926. 10.1016/j.camwa.2008.10.055
- Li B, Zheng C-H, Huang D-S: Locally linear discriminant embedding: An efficient method for face recognition. Pattern Recogn 2008, 42(12):3813-3821.
- Liang D, Yang J, Zheng Z, Chang Y: A facial expression recognition system based on supervised locally linear embedding. Pattern Recogn Lett 2005, 26(15):2374-2389. 10.1016/j.patrec.2005.04.011
- Kanade T, Tian Y, Cohn J: Comprehensive database for facial expression analysis. In International Conference on Face and Gesture Recognition. Volume 4. IEEE Computer Society, France; 2000:46-53.
- Scholkopf B: The kernel trick for distances. In Advances in Neural Information Processing Systems. MIT Press, Cambridge, Canada; 2001:301-307.
- Scholkopf B, Smola A, Muller K: Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 1998, 10(5):1299-1319. 10.1162/089976698300017467
- Baudat G, Anouar F: Generalized discriminant analysis using a kernel approach. Neural Comput 2000, 12(10):2385-2404. 10.1162/089976600300014980
- Wang J, Zhang Z, Zha H: Adaptive manifold learning. In Advances in Neural Information Processing Systems. Volume 17. MIT Press, Cambridge, Canada; 2005:1473-1480.
- Saul LK, Roweis ST: Think globally, fit locally: unsupervised learning of low dimensional manifolds. J Mach Learn Res 2003, 4: 119-155.
- Campadelli P, Lanzarotti R, Lipori G, Salvi E: Face and facial feature localization. In International Conference on Image Analysis and Processing. Springer, Heidelberg, Italy; 2005:1002-1009.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.