3D facial expression recognition using maximum relevance minimum redundancy geometrical features
- Habibu Rabiu^{1},
- M Iqbal Saripan^{1},
- Syamsiah Mashohor^{1} and
- Mohd Hamiruce Marhaban^{1}
https://doi.org/10.1186/1687-6180-2012-213
© Rabiu et al.; licensee Springer. 2012
- Received: 1 March 2012
- Accepted: 13 September 2012
- Published: 3 October 2012
Abstract
In recent years, facial expression recognition (FER) has become an attractive research area which, besides the fundamental challenges it poses, finds application in areas such as human-computer interaction, clinical psychology, lie detection, pain assessment, and neurology. Approaches to FER generally consist of three main steps: face detection, feature extraction, and expression recognition. The recognition accuracy of FER hinges on how well the selected features represent the target expressions. In this article, we present a person- and gender-independent 3D facial expression recognition method using maximum relevance minimum redundancy geometrical features. The aim is to find a compact set of features that captures the most discriminative differences between the target classes. A multi-class one-against-one SVM classifier was employed to recognize seven facial expressions: neutral, happy, sad, angry, fear, disgust, and surprise. An average recognition accuracy of 92.2% was recorded. Furthermore, inter-database homogeneity was investigated between two independent databases, BU-3DFE and UPM-3DFE; the results showed a strong homogeneity between the two databases.
Keywords
- 3D facial expression recognition
- Maximum relevance minimum redundancy feature selection
- Inter database homogeneity
- Multi-class SVM classifier
Introduction
Facial expression recognition (FER) refers to the study of facial changes elicited as a result of relative changes in the shape and positions of the main facial components, such as eyebrows, eyelids, nose, lips, cheeks, and chin. Subtler changes, such as wrinkles or bulges caused by the contraction of facial muscles, are also considered. The subject has been researched for decades since the pioneering study of Darwin et al. [1]. The later study by Ekman on the facial action coding system [2], in which relative facial muscle movements are described by action units, inspired many researchers to work on facial expression analysis, understanding, and recognition [3–10].
Earlier studies on the subject were dominated by 2-dimensional (2D) techniques and recorded impressive results, as presented in [6, 11, 12]. However, illumination changes and pose variation are two challenges that remained unsolved by 2D modalities. With the current advancement in 3D technology, which has led to faster and cheaper 3D acquisition equipment, many 3D facial expression databases have emerged. The BU-3DFE database of Binghamton University has recently been the one most widely used by researchers [8–10, 13]. The first study on the database, by Wang et al. [14], used primitive features from seven regions of the face. Using a linear discriminant analysis (LDA) classifier, they reported a mean recognition rate of 83.6%. Yun and Guan [15], using the same database, extracted 3D Gabor features and reported an average recognition accuracy of 85.39%. Li et al. [8] used a set of 28 geometrical features to recognize seven basic expressions and recorded a recognition rate of 90.2% using the PNN classifier. Tang and Huang [9] used 96 lines and their slopes to recognize six basic expressions and recorded a mean recognition rate of 87.1% using the SVM classifier. Soyel and Demirel [10] classified the seven fundamental expressions using six distance measures and reported a mean recognition accuracy of 91.3%. Berretti et al. [13] used SIFT features detected from the depth image to recognize the seven facial expressions and reported an average recognition rate of 78.43% using the SVM classifier. Tekguc et al. [16] used the NSGA-II feature selection technique to classify the seven facial expressions and reported an average recognition rate of 88.1% using the PNN classifier.
There are basically two distinct approaches to the analysis of facial expressions. The message-based method bases its expression recognition on finding what underlies the displayed expression. Its main target is to interpret the displayed expression as one of the six universally accepted expressions, i.e., happy, sad, angry, fear, disgust, and surprise. Conversely, the sign-based method tries to detect and describe the relative positions and shapes of the main facial components, such as eyes, brows, and mouth, while leaving the interpretation of the shown expression to a higher level of decision making [17].
In this study, we propose a message-based facial expression recognition system using geometrical features. The selection of these features was guided by MPEG-4 and FACS [2, 18]. To strengthen these features, we adopted the mRMR algorithm to select the most relevant and non-redundant ones. The selected features are finally passed to a multi-class SVM classifier for categorization. In the second part of the experiment, we tested the inter-database homogeneity. A hybrid database was formed by pooling an equal number of samples from the two databases, BU-3DFE and UPM-3DFE.
Database description
The UPM-3DFE database was recently developed in our laboratory. It contains facial images of 50 persons. The subjects were drawn from different ancestral and ethnic backgrounds, such as Africans, Arabs, Malays, Chinese, Indians, and Persians. Of the 50 subjects recorded, 30 are male and the remaining 20 are female. Each subject was asked to portray the six basic expressions: happy, sad, angry, fear, disgust, and surprise. Included in the database are 32 manually annotated landmarks placed on fiducial points of the cropped face mesh model, as shown in Figure 2b.
The 3D Flexscan (V2.6) system was used to acquire our 3D facial images. The system consists of two high-vision cameras placed 30 inches apart, with a projector mounted between them. The projector projects different binary patterns onto the subject’s face, while the two cameras capture the patterns as deformed by the subject’s facial components. Using the stereo photometry technique, the system automatically determines correspondences between the images captured by the two cameras and merges them into a single 3D face model, with a resolution of 25 to 35 K polygons per model. The whole exercise is controlled and coordinated by a computer system.
We manually identified and annotated 32 expression-sensitive points on each face mesh in the database. To obtain a more reliable landmark localization, two different persons annotated the database independently. The final landmark points were determined by averaging the two sets of manual labels for each face mesh. These feature points are intended to be used as a ground-truth reference.
Feature extraction
In many pattern recognition applications, feature selection focuses on identifying the features most relevant to the interpretation of the target classes. A properly selected feature set not only leads to higher classification accuracy, but also results in faster computation and lower storage requirements.
Distance vectors
- (I) A common subset of 29 feature points was extracted from each database, with each vertex having x, y, z coordinates. Let each feature point be represented by ß; the feature points of a face mesh are then given by
$\text{ß}_{i}=(x_{i},y_{i},z_{i})$ (1)
for i = 1,2,…,29.
- (II) Because the nose is relatively rigid under expression changes, a point in the nose neighborhood was chosen as the origin of the communal coordinate system; this point is the 16th point of the subset matrix, that is, ß(16,:).
- (III) To convert each subset into the communal coordinate system, the ß(16,:) coordinate was subtracted from each row of the matrix ß. The transformed points are denoted by β:
$\beta_{i}=\text{ß}_{i}-\text{ß}_{16}$ (2)
for i = 1,2,…,29.
- (IV) To bring all the feature points to a standard scale, each β_{i} was normalized by dividing it by the distance between the two inner corners of the eyes, ‖β_{10} − β_{9}‖ (Figure 3b). The normalized points are denoted by β′:
$\beta_{i}^{\prime}=\frac{\beta_{i}}{\lVert \beta_{10}-\beta_{9}\rVert}$ (3)
for i = 1,2,…,29.
The feature points $\beta_{i}^{\prime}$ are now in a communal coordinate system and on a standard scale. This procedure helps cancel out inter-personal variation, so the feature points $\beta_{i}^{\prime}$ are ready for objective measurement and comparison.
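Steps (I) to (IV) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `normalize_landmarks` and its parameters are hypothetical, the landmarks are assumed to be stored as a 0-based list of (x, y, z) tuples (so the 16th point is index 15 and points 9 and 10 are indices 8 and 9), and the scale is assumed to be the Euclidean distance between the inner eye corners.

```python
import math


def normalize_landmarks(points, origin_idx=15, eye_a=8, eye_b=9):
    """Translate landmarks to a nose-centred origin and scale by inner-eye distance.

    points: list of (x, y, z) tuples in 0-based order (assumed layout).
    """
    ox, oy, oz = points[origin_idx]
    # Equation (2): subtract the origin landmark from every point.
    shifted = [(x - ox, y - oy, z - oz) for x, y, z in points]
    # Equation (3): divide by the distance between the two inner eye corners.
    scale = math.dist(shifted[eye_a], shifted[eye_b])
    if scale == 0:
        raise ValueError("inner eye corners coincide; cannot normalize")
    return [(x / scale, y / scale, z / scale) for x, y, z in shifted]


# Synthetic landmarks, for illustration only (not real face data).
demo_points = [(float(i), 2.0 * i, 0.0) for i in range(29)]
demo_normalized = normalize_landmarks(demo_points)
```

After normalization the origin landmark sits at (0, 0, 0) and the inner-eye distance equals 1, so uniformly rescaling or translating the input mesh leaves the output unchanged.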
Distance calculation
for i = 1,2,…,29.
where k and j are the end points of the line segment under consideration.
The determined distances d_{i} are further combined to give a more meaningful description of each face component’s contribution to the expression.
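As a rough illustration of this step, the sketch below computes the length of each line segment from its two end points. It assumes the distance is the ordinary Euclidean distance between normalized landmarks; the distance equation itself is not reproduced in this text, and the helper name `segment_lengths`, the sample points, and the index pairs are all hypothetical.

```python
import math


def segment_lengths(points, segments):
    """Return the Euclidean length of each (k, j) landmark pair."""
    return [math.dist(points[k], points[j]) for k, j in segments]


# Hypothetical normalized landmarks and end-point index pairs.
demo_points = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
demo_segments = [(0, 1), (1, 2), (0, 2)]
demo_lengths = segment_lengths(demo_points, demo_segments)
```

Keeping the end-point pairs in a separate list makes it easy to change which facial segments are measured without touching the distance code.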
Angles calculation
Calculated slopes
${s}_{1}=({\beta}_{2}^{\prime}-{\beta}_{1}^{\prime})$ | ${s}_{2}=({\beta}_{2}^{\prime}-{\beta}_{3}^{\prime})$ | ${s}_{3}=({\beta}_{5}^{\prime}-{\beta}_{4}^{\prime})$ | ${s}_{4}=({\beta}_{5}^{\prime}-{\beta}_{6}^{\prime})$ |
${s}_{7}=({\beta}_{9}^{\prime}-{\beta}_{8}^{\prime})$ | ${s}_{8}=({\beta}_{9}^{\prime}-{\beta}_{13}^{\prime})$ | ${s}_{9}=({\beta}_{7}^{\prime}-{\beta}_{1}^{\prime})$ | ${s}_{10}=({\beta}_{2}^{\prime}-{\beta}_{8}^{\prime})$ |
${s}_{11}=({\beta}_{7}^{\prime}-{\beta}_{13}^{\prime})$ | ${s}_{12}=({\beta}_{9}^{\prime}-{\beta}_{3}^{\prime})$ | ${s}_{13}=({\beta}_{9}^{\prime}-{\beta}_{8}^{\prime})$ | ${s}_{14}=({\beta}_{9}^{\prime}-{\beta}_{13}^{\prime})$ |
${s}_{15}=({\beta}_{10}^{\prime}-{\beta}_{4}^{\prime})$ | ${s}_{16}=({\beta}_{10}^{\prime}-{\beta}_{11}^{\prime})$ | ${s}_{17}=({\beta}_{10}^{\prime}-{\beta}_{14}^{\prime})$ | ${s}_{18}=({\beta}_{12}^{\prime}-{\beta}_{6}^{\prime})$ |
${s}_{19}=({\beta}_{12}^{\prime}-{\beta}_{11}^{\prime})$ | ${s}_{20}=({\beta}_{12}^{\prime}-{\beta}_{14}^{\prime})$ | ${s}_{21}=({\beta}_{21}^{\prime}-{\beta}_{7}^{\prime})$ | ${s}_{22}=({\beta}_{21}^{\prime}-{\beta}_{16}^{\prime})$ |
${s}_{23}=({\beta}_{21}^{\prime}-{\beta}_{22}^{\prime})$ | ${s}_{24}=({\beta}_{21}^{\prime}-{\beta}_{28}^{\prime})$ | ${s}_{25}=({\beta}_{21}^{\prime}-{\beta}_{29}^{\prime})$ | ${s}_{26}=({\beta}_{22}^{\prime}-{\beta}_{17}^{\prime})$ |
${s}_{27}=({\beta}_{23}^{\prime}-{\beta}_{24}^{\prime})$ | ${s}_{28}=({\beta}_{23}^{\prime}-{\beta}_{22}^{\prime})$ | ${s}_{29}=({\beta}_{24}^{\prime}-{\beta}_{19}^{\prime})$ | ${s}_{30}=({\beta}_{25}^{\prime}-{\beta}_{12}^{\prime})$ |
${s}_{31}=({\beta}_{25}^{\prime}-{\beta}_{20}^{\prime})$ | ${s}_{32}=({\beta}_{25}^{\prime}-{\beta}_{24}^{\prime})$ | ${s}_{33}=({\beta}_{25}^{\prime}-{\beta}_{26}^{\prime})$ | ${s}_{34}=({\beta}_{25}^{\prime}-{\beta}_{31}^{\prime})$ |
${s}_{35}=({\beta}_{27}^{\prime}-{\beta}_{26}^{\prime})$ | ${s}_{36}=({\beta}_{27}^{\prime}-{\beta}_{8}^{\prime})$ | ${s}_{37}=({\beta}_{30}^{\prime}-{\beta}_{31}^{\prime})$ | ${s}_{38}=({\beta}_{30}^{\prime}-{\beta}_{29}^{\prime})$ |
Having calculated all the relevant slopes, we then use Equation (22) to determine all the angles from θ_{1} to θ_{27} and form angle vectors represented by ϕ, as follows:
for i = 1,2,…,N, where N is the number of face models.
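One way to obtain an angle from two of the slope vectors tabulated above is the normalized dot product. The sketch below is an assumption: Equation (22) is not reproduced in this text, so the formula used here may differ from the authors', and the helper names `direction` and `angle_between` are hypothetical.

```python
import math


def direction(p, q):
    """Difference vector p - q, mirroring the slope definitions s = (β'_a - β'_b)."""
    return tuple(a - b for a, b in zip(p, q))


def angle_between(u, v):
    """Angle in degrees between two 3D vectors via the normalized dot product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0:
        raise ValueError("zero-length vector has no direction")
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Two hypothetical segments that share the landmark at the origin.
s_a = direction((1.0, 0.0, 0.0), (0.0, 0.0, 0.0))
s_b = direction((0.0, 2.0, 0.0), (0.0, 0.0, 0.0))
demo_angle = angle_between(s_a, s_b)
```

Because the vectors are normalized inside `angle_between`, the angle does not depend on segment length, only on orientation.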
Feature dimensionality reduction
Dimensionality reduction (DR) techniques are data pre-processing steps that aim to find a low-dimensional representation that accurately and sufficiently represents the original data. DR has been a successful paradigm for automatically identifying and selecting the latent features while removing the redundant ones [22]. Mathematically, the data reduction problem can be stated as follows: given an n-dimensional feature vector P = (p_{1}, p_{2}, …, p_{n}), the objective of DR is to find a lower-dimensional representation Q = (q_{1}, q_{2}, …, q_{m}), where m < n, which sufficiently represents the original data P.
There are basically two modalities of DR. (i) Supervised techniques use both the training data and the target label information to learn the lower-dimensional representation and then select a subset of the existing feature set. Examples of supervised methods are maximum margin criteria (MMC), maximum relevance minimum redundancy (mRMR), and linear discriminant analysis (LDA). (ii) Unsupervised techniques, in contrast, transform the existing features by rotating and projecting them onto a minimal number of axes without using the target labels. An example of an unsupervised method is principal component analysis (PCA). In this study, we used the mRMR algorithm, which reduces the feature dimension by selecting the most relevant features while removing the redundant ones.
where x is a feature from the set of selected features X, and y is a class label from the set of target prototypes Y.
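The selection idea can be illustrated with a small greedy sketch. It assumes the difference form of the mRMR criterion described by Peng et al. [23] (relevance to the class label minus mean redundancy with the features already chosen) and assumes the features have already been discretized; the paper does not state which variant or discretization was used, and the function names and toy data below are hypothetical.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log2(c * n / (pa[x] * pb[y]))
        for (x, y), c in pab.items()
    )


def mrmr_select(columns, labels, k):
    """Greedily pick k column indices by relevance minus mean redundancy."""
    relevance = [mutual_information(col, labels) for col in columns]
    # Start from the single most relevant feature.
    selected = [max(range(len(columns)), key=relevance.__getitem__)]
    while len(selected) < k:
        def score(j):
            redundancy = sum(
                mutual_information(columns[j], columns[i]) for i in selected
            ) / len(selected)
            return relevance[j] - redundancy
        candidates = [j for j in range(len(columns)) if j not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Toy data: feature 1 duplicates feature 0, feature 2 adds new information.
demo_labels = [0, 0, 1, 1, 2, 2, 3, 3]
demo_columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
]
demo_selected = mrmr_select(demo_columns, demo_labels, 2)
```

In the toy data all three features are equally relevant to the label, but the duplicate is penalized for redundancy, so the second pick is the feature that carries new information.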
Expression classification
The reduced feature vectors are now ready to be classified. To achieve this, we designed a multi-class SVM classifier that assigns each vector to its expression category.
Support vector machine
The SVM is basically designed to solve two-class recognition problems. The goal is to find a hyperplane with maximum margin that linearly separates the given instances into two distinct clusters. For classification tasks with more than two classes, the basic SVM needs to be extended using one of two methods: one versus one, which classifies between each pair of labels, or one versus all, where the classification is between each class and all the remaining classes. In this study, we adopted the one-versus-one strategy to classify the seven expressions considered.
where x_{i} is the reduced feature vector of dimension i, μ_{i} is the mean along feature i, and σ_{i} is the standard deviation along feature i.
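The symbols defined above suggest a per-feature standardization (subtract the mean, divide by the standard deviation) applied before training. The sketch below shows that reading under stated assumptions: the normalization equation itself is not reproduced in this text, the population standard deviation is an arbitrary choice, and the function name `standardize` and the sample rows are hypothetical.

```python
import statistics


def standardize(rows):
    """Z-score each feature column: (x - mean) / standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    # Population standard deviation (assumption; the paper does not specify).
    stds = [statistics.pstdev(c) for c in columns]
    return [
        # Constant columns have zero spread; map them to 0.0 instead of dividing.
        tuple((x - m) / s if s else 0.0 for x, m, s in zip(row, means, stds))
        for row in rows
    ]


# Three hypothetical feature vectors with two features each.
demo_rows = [(1.0, 10.0), (2.0, 10.0), (3.0, 10.0)]
demo_standardized = standardize(demo_rows)
```

In a cross-validation setting the means and standard deviations would normally be estimated on the training folds only and then applied to the test fold.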
We then constructed and trained N(N−1)/2 unique pairwise SVM classifiers, where N is here the number of expression classes. To produce the classification results, each test feature vector was tested against all the trained SVM models. A majority voting scheme with a winner-takes-all strategy was then used to predict the class from the ensemble outputs [25, 26]. In each round of the experiment, 21 unique SVM classifiers were trained to classify the seven basic expressions. To ensure person independence, the intersection between the training and testing sets was always kept empty, meaning that no subject in the testing set appeared in the training set. Using ten-fold cross validation, the experiment was repeated 10 times so that each of the 10 subsets was used for testing once. Since the average recognition accuracy varies from experiment to experiment, the experiment was run 100 times and the results were averaged to obtain a more stable recognition result. At the end of each test cycle, all classifiers were reset and then retrained in the next cycle.
In the second segment of the experiment, we created a hybrid database of sixty subjects: 30 subjects were drawn from BU-3DFE and the remaining 30 from UPM-3DFE. The purpose of this experiment is to investigate the inter-database homogeneity. If the databases are homogeneous, experiments requiring more samples than a single database can provide become possible simply by pooling from similar databases. The same procedures used in the first segment of the experiment were then repeated.
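The bookkeeping around the one-versus-one scheme can be sketched without any SVM library. The snippet below enumerates the class pairs, tallies pairwise winners by majority vote, and splits subjects into disjoint folds so that no person appears in both training and testing. It is an illustration only: the helper names are hypothetical, the pairwise decisions are simulated rather than produced by trained SVMs, and the round-robin fold assignment and tie-breaking rule are assumptions.

```python
from collections import Counter
from itertools import combinations

EXPRESSIONS = ["neutral", "happy", "sad", "angry", "fear", "disgust", "surprise"]


def class_pairs(classes):
    """All unordered class pairs; one binary classifier is trained per pair."""
    return list(combinations(classes, 2))


def majority_vote(pairwise_winners):
    """Class with the most pairwise wins (ties go to the first class seen)."""
    return Counter(pairwise_winners).most_common(1)[0][0]


def subject_folds(subject_ids, n_folds=10):
    """Split unique subject ids into disjoint folds (round-robin assignment)."""
    unique = sorted(set(subject_ids))
    return [set(unique[i::n_folds]) for i in range(n_folds)]


demo_pairs = class_pairs(EXPRESSIONS)
# Simulated decisions: "happy" wins every pair it takes part in.
demo_winners = ["happy" if "happy" in pair else pair[0] for pair in demo_pairs]
demo_prediction = majority_vote(demo_winners)
demo_folds = subject_folds(range(60))
```

Splitting by subject id rather than by sample is what keeps the evaluation person independent, since each subject contributes several expression samples.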
Results and discussion
Average confusion matrix (BU-3DFE)
Expression | Neutral (%) | Happy (%) | Sad (%) | Angry (%) | Fear (%) | Disgust (%) | Surprise (%)
---|---|---|---|---|---|---|---
Neutral | 91.1 | 0.0 | 5.6 | 3.3 | 0.0 | 0.0 | 0.0
Happy | 0.0 | 97.6 | 0.0 | 0.0 | 1.8 | 0.6 | 0.0
Sad | 8.4 | 0.0 | 85.5 | 6.1 | 0.0 | 0.0 | 0.0
Angry | 6.7 | 0.0 | 3.8 | 86.8 | 0.8 | 1.9 | 0.0
Fear | 4.2 | 1.2 | 3.3 | 1.6 | 89.7 | 0.0 | 0.0
Disgust | 1.4 | 0.0 | 2.7 | 0.0 | 0.0 | 95.9 | 0.0
Surprise | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 1.1 | 98.7
Average confusion matrix (BU-3DFE + UPM-3DFE)
Expression | Neutral (%) | Happy (%) | Sad (%) | Angry (%) | Fear (%) | Disgust (%) | Surprise (%)
---|---|---|---|---|---|---|---
Neutral | 88.8 | 0.0 | 3.8 | 4.6 | 1.6 | 1.2 | 0.0
Happy | 0.0 | 93.8 | 0.0 | 0.0 | 2.6 | 1.9 | 1.7
Sad | 10.2 | 0.0 | 83.6 | 3.9 | 1.8 | 0.5 | 0.0
Angry | 8.8 | 0.0 | 5.2 | 84.7 | 0.0 | 1.3 | 0.0
Fear | 6.1 | 0.8 | 3.2 | 2.3 | 86.9 | 0.7 | 0.0
Disgust | 4.7 | 0.0 | 3.0 | 2.1 | 0.0 | 89.4 | 0.8
Surprise | 0.0 | 2.6 | 0.0 | 0.0 | 0.0 | 2.1 | 95.3
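The average recognition rates quoted for the two experiments can be checked directly from the diagonals of the two confusion matrices. The short sketch below copies the diagonal entries from the tables and averages them; the variable and function names are illustrative.

```python
# Diagonal entries (per-class recognition rates, %) copied from the two tables,
# in the order neutral, happy, sad, angry, fear, disgust, surprise.
bu_diagonal = [91.1, 97.6, 85.5, 86.8, 89.7, 95.9, 98.7]
hybrid_diagonal = [88.8, 93.8, 83.6, 84.7, 86.9, 89.4, 95.3]


def mean_rate(diagonal):
    """Unweighted mean of the per-class recognition rates, to one decimal."""
    return round(sum(diagonal) / len(diagonal), 1)


bu_mean = mean_rate(bu_diagonal)          # 92.2
hybrid_mean = mean_rate(hybrid_diagonal)  # 88.9
```

The unweighted mean is appropriate here only under the assumption that every expression class contributes the same number of test samples.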
Conclusion
In this article, we proposed a 3D facial expression recognition method using maximum relevance minimum redundancy face geometry features. The experiment was divided into two segments. In the first segment, we used the BU-3DFE database and a multi-class SVM classifier and achieved a mean classification rate of 92.2%, a consistent improvement across all the expression classes compared with some related studies. In the second segment, we investigated the inter-database homogeneity by forming a hybrid face database drawn from BU-3DFE and UPM-3DFE. We performed facial expression recognition using the same setup as in the first segment. The performance recorded here is slightly lower than that of the first segment, with an average recognition rate of 88.9%; this shortfall can be attributed to the different landmarking methods used in labeling the two databases.
References
- Darwin C, Ekman P, Prodger P: The Expression of the Emotions in Man and Animals, 1st edn. (Oxford University Press, USA, 2002), pp. 19–25
- Ekman P, Friesen WV: Facial action coding system. Manual Facial Act. Cod. Syst 1977: (Consulting Psychologists Press, Stanford University, Palo Alto, 1977)
- Zhao X, Zhang S: Facial expression recognition using local binary patterns and discriminant kernel locally linear embedding. EURASIP J. Adv. Si. Pr 2012., 2012(1): 10.1186/1687-6180-2012-20
- Ioannou SV, Raouzaiou AT, Tzouvaras VA, Mailis TP, Karpouzis KC, Kollias SD: Emotion recognition through facial expression analysis based on a neurofuzzy network. Neu. Netws 2005, 18(4):423-435. 10.1016/j.neunet.2005.03.004
- Wang J, Yin L: Static topographic modelling for facial expression recognition and analysis. Comput. Vis. Image. Und 2007, 108(1-2):19-34. 10.1016/j.cviu.2006.10.011
- Kotsia I, Zafeiriou S, Pitas I: Texture and shape information fusion for facial expression and facial action unit recognition. Pattern. Recogn 2008, 41(3):833-851. 10.1016/j.patcog.2007.06.026
- Porta M: Human-Computer input and output techniques: an analysis of current research and promising applications. Artif. Intell. Rev 2007, 28(3):197-226. 10.1007/s10462-009-9098-5
- Li X, Ruan Q, Ming Y: 3D Facial expression recognition based on basic geometric features. IEEE International Conference on Signal Processing (ICSP) (China, 2010), pp. 1366–1369. 10.1109/ICOSP.2010.5656891
- Tang H, Huang TS: 3D facial expression recognition based on properties of line segments connecting facial feature points. IEEE International Conference on Automatic Face Gesture Recognition (AFGR) (Netherlands, 2008), pp. 1–6. 10.1109/AFGR.2008.4813304
- Soyel H, Demirel H: 3D facial expression recognition with geometrically localized facial features, In International Symposium on Computer and Information Sciences (ISCIS). (Turkey, 2008); pp. 1–4. 10.1109/ISCIS.2008.4717898
- Fasel B, Luettin J: Automatic facial expression analysis: a survey. Pattern Recogn 2003, 36(1):259-275. 10.1016/S0031-3203(02)00052-3
- Samal A, Iyengar PA: Automatic recognition and analysis of human faces and facial expressions: a survey. Pattern recogn 1992, 25(1):65-77. 10.1016/0031-3203(92)90007-6
- Berretti S, Bimbo AD, Pala P, Amor BB, Daoudi M: A set of selected SIFT features for 3D facial expression recognition. International Conference on Pattern Recognition (ICPR), vol.(2010) International Association for Pattern Recognition (IAPR) (Turkey, 2010), pp. 4125–4128
- Wang J, Yin L, Wei X, Sun Y: 3D facial expression recognition based on primitive surface feature distribution. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol.(2) (New York, 2006), pp. 1399–1406
- Yun T, Guan L: Human emotion recognition using real 3D visual features from Gabor library. IEEE International Workshop on Multimedia Signal Processing (MMSP) (France, 2010), pp. 05–510. 10.1109/MMSP.2010.5662073
- Tekguc U, Soyel H, Demirel H: Feature selection for person-independent 3D facial expression recognition using NSGA-II. International Symposium on Computer and Information Sciences (ISCIS) (Cyprus, 2009), pp. 35–38. doi:10.1109/ISCIS.2009.5291925
- Pantic M, Bartlett MS: Machine Analysis of Facial Expressions. (I-Tech Education and Publishing, Austria, 2007), pp. 377–416
- Ostermann J: Face animation in MPEG-4, MPEG-4 facial animation. 2002, 2002: 17-55. 10.1002/0470854626.ch2
- Yin L, Wei X, Sun Y, Wang J, Rosato MJ: A 3D facial expression database for facial behaviour research. proceedings of IEEE Automatic face and gesture recognition’06 (2006), pp. 211–216. doi:10.1109/FGR.2006.6
- Salah AA, Alyuz N, Akarun L: Registration of three-dimensional face scans with average face models. J. Elec. Imag 2008, 17(1):011006. 10.1117/1.2896291
- Stroud KA, Booth DJ: Engineering Mathematics, 5th edn. (Industrial Pr. Inc., New York, 2001), pp. 571–596
- Kumar AC: Analysis of unsupervised dimensionality reduction techniques. Comput. Sci. Inf. Syst 2009, 6(2):217-227. 10.2298/CSIS0902217K
- Peng H, Long F, Ding C: Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell 2005, 27(8):1226-1238.
- Bonev B, Escolano F, Cazorla M: Feature selection, mutual information, and the classification of high-dimensional patterns. Pattern Anal. Appl 2008, 11(3):309-319. 10.1007/s10044-008-0107-0
- Joutsijoki H, Juhola M: Kernel selection in multi-class support vector machines and its consequence to the number of ties in majority voting method. Artif. Intell. Rev 2011, 2011: 1-18. 10.1007/s10462-011-9281-3
- Liu Q, Chen C, Zhang Y, Hu Z: Feature selection for support vector machines with RBF kernel. Artif. Intell. Rev 2011, 36(2):99-115. 10.1007/s10462-011-9205-2
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.