- Open Access
3D facial expression recognition using maximum relevance minimum redundancy geometrical features
EURASIP Journal on Advances in Signal Processing volume 2012, Article number: 213 (2012)
In recent years, facial expression recognition (FER) has become an attractive research area which, besides the fundamental challenges it poses, finds applications in areas such as human-computer interaction, clinical psychology, lie detection, pain assessment, and neurology. Generally, FER approaches consist of three main steps: face detection, feature extraction, and expression recognition. The recognition accuracy of FER depends heavily on how relevant the selected features are in representing the target expressions. In this article, we present a person- and gender-independent 3D facial expression recognition method using maximum relevance minimum redundancy geometrical features. The aim is to select a compact feature set that captures the most discriminative information between the target classes. A multi-class one-against-one SVM classifier was employed to recognize seven facial expressions: neutral, happy, sad, angry, fear, disgust, and surprise. An average recognition accuracy of 92.2% was recorded. Furthermore, inter-database homogeneity was investigated between two independent databases, BU-3DFE and UPM-3DFE; the results showed strong homogeneity between the two databases.
Facial expression recognition (FER) refers to the study of facial changes elicited by relative changes in the shape and position of the main facial components, such as the eyebrows, eyelids, nose, lips, cheeks, and chin. Subtler changes, such as wrinkles or bulges caused by the contraction of facial muscles, are also considered. The subject has been researched for decades since the pioneering study of Darwin et al. [1]. The later study by Ekman on the Facial Action Coding System (FACS) [2], in which relative facial muscle movements are described by action units, inspired many researchers to work on facial expression analysis, understanding, and recognition [3–10].
Earlier studies on the subject were dominated by two-dimensional (2D) techniques and recorded impressive results, as presented in [6, 11, 12]. However, illumination changes and pose variation are two challenges that remain unsolved by 2D modalities. Current advances in 3D technology have led to faster and cheaper 3D acquisition equipment, and many 3D facial expression databases have emerged. The BU-3DFE database of Binghamton University has recently been the most widely used by researchers [8–10, 13]. The first study on this database, by Wang et al. [14], used primitive surface features from seven regions of the face and, with a linear discriminant analysis (LDA) classifier, reported a mean recognition rate of 83.6%. Yun and Guan [15], using the same database, extracted 3D Gabor features and reported an average recognition accuracy of 85.39%. Li et al. [8] used a set of 28 geometrical features to recognize seven basic expressions and recorded a recognition rate of 90.2% using a probabilistic neural network (PNN) classifier. Tang and Huang [9] used 96 lines and their slopes to recognize six basic expressions and recorded a mean recognition rate of 87.1% using an SVM classifier. Soyel and Demirel [10] classified the seven fundamental expressions using six distance measures and reported a mean recognition accuracy of 91.3%. Berretti et al. [13] used SIFT features detected from the depth image to recognize the seven facial expressions and reported an average recognition rate of 78.43% using an SVM classifier. Tekguc et al. [16] used the NSGA-II feature selection technique to classify the seven facial expressions and reported an average recognition rate of 88.1% using a PNN classifier.
There are two main approaches to the analysis of facial expressions. The message-based approach bases expression recognition on inferring what underlies the displayed expression; its main goal is to interpret the displayed expression as one of the six universally accepted expressions, i.e., happy, sad, angry, fear, disgust, and surprise. Conversely, the sign-based approach tries to detect and describe the relative positions and shapes of the main facial components, such as the eyes, brows, and mouth, while leaving the interpretation of the displayed expression to a higher level of decision making.
In this study, we propose a message-based facial expression recognition system using geometrical features. The selection of these features was guided by MPEG-4 and FACS [2, 18]. To strengthen these features, we adopted the mRMR algorithm to select the most relevant and non-redundant features. The selected features are finally passed to a multi-class SVM classifier for classification. In the second part of the experiment, we tested inter-database homogeneity: a hybrid database was formed by pooling an equal number of samples from the two databases, BU-3DFE and UPM-3DFE.
The general framework of the proposed approach is shown in Figure 1; switch S1 is closed only in the second segment of the experiment. The rest of the article is organized as follows. Section ‘Database description’ presents the databases used. Feature detection is presented in Section ‘Feature extraction’. Selection of the most relevant features using mRMR is described in Section ‘Feature dimensionality reduction’, and Section ‘Expression classification’ presents the expression classification. Section ‘Results and discussion’ discusses the results. Finally, Section ‘Conclusion’ concludes the article.
The BU-3DFE database, developed by Yin et al. [19] at Binghamton University, was specifically designed to foster research on human behavior. It consists of 100 subjects from different ancestral and ethnic backgrounds, including White, Black, East Asian, Indian, and Hispanic Latino. Of the 100 subjects, 44 are male and 56 are female. Each subject was asked to portray, among other expressions, the six basic expressions: happy, sad, angry, fear, disgust, and surprise. The expressions were shown at four different intensity levels. The database includes a set of 83 manually annotated feature points placed on the cropped face model, as shown in Figure 2a.
The UPM-3DFE database was recently developed in our laboratory. It contains facial images of 50 persons from different ancestral and ethnic backgrounds, such as African, Arab, Malay, Chinese, Indian, and Persian. Of the 50 subjects recorded, 30 are male and 20 are female. Each subject was asked to portray the six basic expressions: happy, sad, angry, fear, disgust, and surprise. The database includes 32 manually annotated landmarks placed on fiducial points of the cropped face mesh model, as shown in Figure 2b.
The 3D Flexscan (V2.6) system was used to acquire our 3D facial images. The system consists of two high-vision cameras placed 30 inches apart with a projector mounted between them. The projector casts different binary patterns onto the subject's face, and the two cameras capture the patterns as deformed by the subject's facial surface. Using a stereo photometry technique, the system automatically determines correspondences between the images captured by the two cameras and merges them into a single 3D face model with a resolution of 25K to 35K polygons. The whole acquisition process is controlled and coordinated by a computer.
We manually identified and annotated 32 expression-sensitive points on each face mesh in the database. To obtain more reliable landmark localization, two different people annotated the database independently, and the final landmark positions were obtained by averaging the two annotations. These feature points are intended to be used as ground-truth references.
In many pattern recognition applications, feature selection focuses on identifying the features that are most significant for interpreting the target classes. A properly selected feature set not only leads to higher classification accuracy but also results in faster computation and reduced storage requirements.
Given n feature points, the task of the feature selection unit is to systematically determine relationships between these points that lead to an accurate interpretation of each target expression. In this work, from the 83 given points, we manually identified 46 distance vectors and 27 angles whose changes drive the recognition of the seven facial expressions considered. Figure 3a,b depict the identified distance vectors and angles, respectively.
Since the face meshes were obtained independently, and the sizes and orientations of the facial components differ from one individual to another, the features from these meshes must be aligned to a common coordinate system and normalized to a common scale before any meaningful, objective comparison between them is possible. To achieve this, we applied the following steps.
A common subset of 29 feature points, each with x, y, z coordinates, was extracted from each database. Representing the feature points of one face mesh by β, the full set of points per mesh is the 29 × 3 matrix

$$\beta = \begin{bmatrix} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ \vdots & \vdots & \vdots \\ x_{29} & y_{29} & z_{29} \end{bmatrix} \qquad (1)$$

Because the nose is relatively rigid under expression changes, a point in the nose region was chosen as the origin of the common coordinate system: the 16th point of the subset matrix, β(16,:).

To convert each subset into the common coordinate system, β(16,:) was subtracted from every row β(i,:) of the matrix. The transformed points are

$$\beta_i = \beta(i,:) - \beta(16,:), \qquad i = 1, 2, \ldots, 29 \qquad (2)$$

To bring all feature points to a standard scale, each β_i was divided by the distance between the two inner corners of the eyes, points 9 and 10 (Figure 3b). The normalized points are denoted β′:

$$\beta'_i = \frac{\beta_i}{\lVert \beta_{10} - \beta_9 \rVert}, \qquad i = 1, 2, \ldots, 29 \qquad (3)$$

The feature points are now in a common coordinate system and on a standard scale. This procedure cancels inter-personal differences in position and overall face size, so the feature points are ready for objective measurement and comparison.
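The translation and scaling steps in Equations (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the 29 points of one mesh arrive as a list of (x, y, z) tuples in subset order, so the paper's 1-based points 16, 9, and 10 become the hypothetical 0-based indices 15, 8, and 9.

```python
import math

# Assumed 0-based positions in the 29-point subset (paper's points 16, 9, 10).
NOSE_IDX = 15
INNER_EYE_IDX = (8, 9)


def normalize_landmarks(points):
    """Translate landmarks to a nose-centred frame and scale by inner-eye distance.

    `points` is a list of 29 (x, y, z) tuples for one face mesh.
    """
    ox, oy, oz = points[NOSE_IDX]
    # Eq. (2): express every point relative to the nose origin.
    shifted = [(x - ox, y - oy, z - oz) for x, y, z in points]
    # Scale factor: Euclidean distance between the two inner eye corners.
    left, right = (shifted[i] for i in INNER_EYE_IDX)
    scale = math.dist(left, right)
    if scale == 0:
        raise ValueError("inner eye corners coincide; cannot normalize")
    # Eq. (3): divide every translated point by the scale factor.
    return [(x / scale, y / scale, z / scale) for x, y, z in shifted]
```

Because both steps are relative (a difference, then a ratio of lengths), the same face shifted and uniformly scaled yields the same normalized points; rotation is not removed by these two steps.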
The distance d_i between the two end points of a line segment is the Euclidean norm (length) of the vector joining them. In 3D it is given by

$$d_i = \lVert \beta'_k - \beta'_j \rVert = \sqrt{(x_k - x_j)^2 + (y_k - y_j)^2 + (z_k - z_j)^2}$$

for i = 1, 2, …, 29, where k and j are the end points of the line segment under consideration.
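The distance computation can be sketched as follows. The index pairs used in the check and the `averaged` helper are hypothetical placeholders: the actual segments are defined in Figure 3a, and the exact way the paper combines distances into each component feature is not reproduced here.

```python
import math


def segment_lengths(points, pairs):
    """Return d_i = ||p_k - p_j|| for each (k, j) pair of 0-based point indices."""
    return [math.dist(points[k], points[j]) for k, j in pairs]


def averaged(distances, indices):
    """Hypothetical combined feature: mean of selected distances
    (for example, an 'average eye width' from left and right eye widths)."""
    return sum(distances[i] for i in indices) / len(indices)
```

For example, `segment_lengths([(0, 0, 0), (3, 4, 0), (0, 0, 12)], [(0, 1), (1, 2)])` gives `[5.0, 13.0]`.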
The distances d_i are then combined to give a more meaningful description of each facial component's contribution to the expression:
- Stretching of the right eyebrow
- Stretching of the left eyebrow
- Pulling in of the left and right brows
- Vertical movement of the inner brows
- Vertical movement of the outer brows
- Openness of the right eye
- Openness of the left eye
- Average eye width
- Openness of the mouth
- Stretching of the mouth
- Vertical movement of the mouth corners
- Vertical movement of the upper lip
- Stretching of the upper lip
- Stretching of the lower lip
- Vertical movement of the lower jaw
Having calculated all the relevant distances δ_i, we form the distance vector D:

$$D = [\delta_1, \delta_2, \ldots, \delta_{15}]$$
In Euclidean geometry, the angle θ between two lines is the inverse cosine of the dot product of their direction vectors divided by the product of the vectors' magnitudes (lengths):

$$\theta = \cos^{-1}\left(\frac{S_j \cdot S_k}{\lVert S_j \rVert \, \lVert S_k \rVert}\right) \qquad (22)$$

where S_j and S_k are the slopes (direction vectors) of the two line segments that form the arms of the angle, and ‖S_j‖ and ‖S_k‖ are their respective lengths. Using Figure 2b, the slopes of the segments that form each angle are determined as listed in Table 1.
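A minimal sketch of Equation (22), assuming each "slope" is the 3D direction vector from the angle's vertex to the far end of each arm. The helper names are illustrative. The cosine is clamped to [-1, 1] before `acos`, because rounding can push it slightly outside that domain for nearly parallel vectors.

```python
import math


def direction(p_from, p_to):
    """Direction vector of the segment from p_from to p_to."""
    return tuple(b - a for a, b in zip(p_from, p_to))


def angle_between(s_j, s_k):
    """Angle in radians between direction vectors s_j and s_k (Eq. 22)."""
    dot = sum(a * b for a, b in zip(s_j, s_k))
    norms = math.hypot(*s_j) * math.hypot(*s_k)
    if norms == 0:
        raise ValueError("zero-length direction vector")
    # Clamp: floating-point error can give |cos| marginally above 1.
    return math.acos(max(-1.0, min(1.0, dot / norms)))
```

An angle at vertex `v` between arms ending at `a` and `b` is then `angle_between(direction(v, a), direction(v, b))`.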
Having calculated all the relevant slopes, we use Equation (22) to determine the angles θ_1 to θ_27 and form the angle vector ϕ_i for each face model:

$$\phi_i = [\theta_1, \theta_2, \ldots, \theta_{27}], \qquad i = 1, 2, \ldots, N$$

where N is the number of face models. Note that, from Figure 2b, a_1 = θ_1, a_2 = θ_2, …, a_27 = θ_27.
To combine the discriminative power of these two schemes, we form an extended feature vector F_t that concatenates the distance vector D_i and the angle vector ϕ_i:

$$F_t = [D_i, \phi_i]$$
Feature dimensionality reduction
Dimensionality reduction (DR) techniques are data pre-processing steps that aim to find a low-dimensional representation that accurately and sufficiently represents the original data. DR has been a successful paradigm for automatically identifying and selecting latent features while removing redundant ones. Formally, given an n-dimensional feature vector P = (p_1, p_2, …, p_n), the objective of DR is to find a lower-dimensional representation Q = (q_1, q_2, …, q_m), with m < n, that sufficiently represents P.
There are two broad families of DR techniques. (i) Supervised techniques use both the training data and the class labels to learn the lower-dimensional representation; examples are maximum margin criteria (MMC), maximum relevance minimum redundancy (mRMR), and linear discriminant analysis (LDA). Of these, mRMR selects a subset of the existing features, whereas MMC and LDA project them onto new axes. (ii) Unsupervised techniques transform the existing features by rotating and projecting them onto a small number of axes without using the class labels; principal component analysis (PCA) is an example. In this study, we used the mRMR algorithm, which reduces the feature dimension by selecting the most relevant features while removing redundant ones.
Because the human face is roughly symmetrical, some of the extracted features are near-duplicates and therefore redundant. Moreover, recognition accuracy does not always increase with the number of features: it typically rises until a certain number of features is reached and then starts to decline, especially when the number of samples is limited. To address this, we used the maximum relevance minimum redundancy (mRMR) criterion to select features that are most relevant for class discrimination and, at the same time, least redundant with one another. Given n features, the goal of mRMR is to find a subset of m features that accurately identifies the target labels, using mutual information [23, 24]. The mutual information between two discrete random variables x and y is defined in terms of their marginal probabilities p(x), p(y) and their joint probability p(x, y), as shown in Equation (25):

$$I(x; y) = \sum_{y \in Y} \sum_{x \in X} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \qquad (25)$$

where x is a feature from the set of selected features X, and y is a class label from the set of target prototypes Y.
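Equation (25) can be estimated from data by replacing the probabilities with relative frequencies. The sketch below is such a plug-in estimator for already-discrete values, using base-2 logarithms so the result is in bits. The geometric features in this paper are continuous, so they would first have to be discretized (binned); the binning scheme is an assumption not specified here.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(x; y) in bits, Eq. (25), for two discrete sequences.

    Probabilities are replaced by relative frequencies:
    p(x, y) = c_xy / n, p(x) = c_x / n, p(y) = c_y / n.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("sequences must be non-empty and of equal length")
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) p(y)) simplifies to c * n / (c_x * c_y).
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi
```

Identical binary sequences share 1 bit; independent ones share 0 bits.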
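If two features are highly dependent on each other, removing one of them will not change the class-discriminative power of the feature set much. In the mRMR formulation [23], the dependency between each feature x_i and the class label y is maximized using Equation (26) (maximum relevance), while the dependency between pairs of features x_i and x_j is minimized using Equation (27) (minimum redundancy):

$$\max \mathrm{Rel}(X, y), \qquad \mathrm{Rel} = \frac{1}{|X|} \sum_{x_i \in X} I(x_i; y) \qquad (26)$$

$$\min \mathrm{Red}(X), \qquad \mathrm{Red} = \frac{1}{|X|^2} \sum_{x_i, x_j \in X} I(x_i; x_j) \qquad (27)$$

The redundancy constraint favors features that are mutually dissimilar.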
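Finally, a subset X_m (m < n) that maximizes Equation (28) is selected from the full feature set X_n, where Rel and Red are the relevance and redundancy terms of Equations (26) and (27):

$$\max \Phi, \qquad \Phi = \mathrm{Rel} - \mathrm{Red} \qquad (28)$$

In practice, the space of all m-feature subsets is too large for exhaustive search, so an incremental (greedy) search is used to find a near-optimal subset, as given by Equation (29). Given the set X_{m-1} of m − 1 features already selected, the m-th feature is the one that maximizes

$$\max_{x_j \in X_n - X_{m-1}} \left[ I(x_j; y) - \frac{1}{m-1} \sum_{x_i \in X_{m-1}} I(x_j; x_i) \right] \qquad (29)$$

This incremental method is adopted in this study.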
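The incremental search in Equation (29) can be sketched as a greedy loop: pick the most relevant feature first, then repeatedly add the candidate whose relevance minus mean redundancy with the already-selected set is largest. This is a minimal pure-Python illustration of the difference (relevance minus redundancy) form, assuming discretized feature columns; it is not the authors' implementation, and the plug-in mutual information estimator is repeated so the block runs on its own.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(x; y) in bits for two discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def mrmr_select(features, labels, m):
    """Greedy mRMR (relevance minus mean redundancy), cf. Eq. (29).

    `features` is a list of discretized feature columns (equal-length sequences).
    Returns the indices of the m selected columns in selection order.
    """
    if not 0 < m <= len(features):
        raise ValueError("m must be between 1 and the number of features")
    relevance = [mutual_information(f, labels) for f in features]
    # Start with the single most relevant feature.
    selected = [max(range(len(features)), key=relevance.__getitem__)]
    # Running sum of I(x_j; x_i) over already-selected x_i, per candidate j.
    redundancy = [0.0] * len(features)
    while len(selected) < m:
        newest = selected[-1]
        best_j, best_score = None, -math.inf
        for j in range(len(features)):
            if j in selected:
                continue
            redundancy[j] += mutual_information(features[j], features[newest])
            score = relevance[j] - redundancy[j] / len(selected)
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

With a duplicated feature column and a complementary one that are equally relevant, the loop picks the complementary column second, which is exactly the behaviour the redundancy term is meant to produce.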
The reduced feature vectors are now ready to be classified into their expression categories. To achieve this, we designed a multi-class SVM classifier.
Support vector machine
SVM is fundamentally designed to solve two-class recognition problems. Its goal is to find the hyperplane with maximum margin that separates the given instances into two distinct classes. For classification tasks with more than two classes, the basic SVM must be extended using one of two strategies: one-versus-one, which trains a classifier for each pair of classes, or one-versus-all, which trains a classifier between each class and all the remaining classes. In this study, we adopted the one-versus-one strategy to classify the seven expressions considered.
Training the SVM: before the SVM can solve even the simplest classification task, it must be trained by presenting it with labeled examples. To this end, we divided the experiment into two segments. In the first segment, we randomly selected sixty meshes from the BU-3DFE database; the random selection is intended to avoid gender and person dependencies. The 60 samples were then divided into 10 parts, with 9 parts used as the training set and the remaining part as the testing set. Before training, the extracted features of the training set are normalized using Equation (30):

$$\hat{x}_i = \frac{x_i - \mu_i}{\sigma_i} \qquad (30)$$
where x_i is the i-th component of the reduced feature vector, μ_i is the mean along feature i, and σ_i is the standard deviation along feature i.
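Equation (30) is a per-feature z-score. A minimal sketch, assuming the mean and standard deviation are estimated on the training folds only and then reused for the test fold (the text states only that training features are normalized), and using the population standard deviation; both choices are assumptions. Features with zero spread are mapped to 0 to avoid division by zero.

```python
import statistics


def fit_zscore(train_rows):
    """Per-feature mean and (population) standard deviation of the training rows."""
    columns = list(zip(*train_rows))
    mu = [statistics.fmean(col) for col in columns]
    sigma = [statistics.pstdev(col) for col in columns]
    return mu, sigma


def apply_zscore(rows, mu, sigma):
    """Eq. (30): (x_i - mu_i) / sigma_i per feature; zero-variance features map to 0."""
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, mu, sigma)]
        for row in rows
    ]
```

Fitting on the training rows and applying the same `mu` and `sigma` to test rows keeps test data from influencing the normalization.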
We then constructed and trained N(N − 1)/2 pairwise SVM classifiers, where N here denotes the number of expression classes. To produce the classification results, each test feature vector was evaluated by all the trained SVM models, and majority voting over the ensemble outputs ("winner takes all") determined the predicted class [25, 26]. In each round of the experiment, 21 pairwise SVM classifiers (7 × 6/2) are trained to classify the seven expressions. To ensure person independence, the training and testing sets were always kept disjoint: no subject in the testing set appears in the training set. Using ten-fold cross-validation, the experiment is repeated 10 times so that each of the 10 subsets is used for testing exactly once. Because the average recognition accuracy varies from run to run, the entire procedure was repeated 100 times and the results were averaged to obtain a more stable estimate. At the end of each round, all classifiers are reset and retrained in the next round.
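The one-versus-one voting step can be sketched independently of any SVM library. Each trained pairwise classifier is represented by a callable that returns one of its two classes; `stub_classifier` is a hypothetical stand-in that simply compares toy scores, included only to make the example runnable. Ties in the vote are broken by class order, which is an assumption; the text does not say how ties were handled.

```python
from collections import Counter
from itertools import combinations

EXPRESSIONS = ["neutral", "happy", "sad", "angry", "fear", "disgust", "surprise"]


def ovo_pairs(classes):
    """All unordered class pairs: N(N - 1)/2 binary problems."""
    return list(combinations(classes, 2))


def ovo_predict(x, classifiers, classes):
    """Majority ('winner takes all') vote over all pairwise classifiers.

    `classifiers` maps each (a, b) pair to a callable returning a or b for input x.
    """
    votes = Counter(classifiers[pair](x) for pair in combinations(classes, 2))
    top = max(votes.values())
    # Tie-break (assumed): first class in `classes` order with the top vote count.
    return next(c for c in classes if votes[c] == top)


def stub_classifier(pair):
    """Hypothetical stand-in for a trained binary SVM: picks the pair member
    with the larger value in a toy score dictionary."""
    return lambda scores: max(pair, key=scores.get)
```

For the seven expressions, `len(ovo_pairs(EXPRESSIONS))` is 21, matching the number of classifiers trained per round.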
In the second segment of the experiment, we created a hybrid database of sixty subjects: 30 subjects were drawn from BU-3DFE and the remaining 30 from UPM-3DFE. The purpose of this experiment is to investigate inter-database homogeneity. If the databases are homogeneous, experiments that need more samples than a single database can provide can be carried out simply by pooling samples from similar databases. The same procedures as in the first segment of the experiment were then repeated.
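Building the pooled database and the person-independent folds can be sketched as below. This assumes folds are formed over subjects, so all meshes of one subject stay in a single fold and training and test subjects remain disjoint, and it uses a simple round-robin split after shuffling; the function names, the seeding, and the subject identifiers are hypothetical.

```python
import random


def build_hybrid(bu_subjects, upm_subjects, per_db=30, seed=None):
    """Draw `per_db` subjects at random from each database and pool them."""
    rng = random.Random(seed)
    return ([("BU-3DFE", s) for s in rng.sample(bu_subjects, per_db)]
            + [("UPM-3DFE", s) for s in rng.sample(upm_subjects, per_db)])


def subject_folds(subjects, k=10, seed=None):
    """Shuffle subjects and split them into k disjoint folds of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]
```

Each fold in turn serves as the test set while the other nine form the training set, so every subject is tested exactly once per cross-validation run.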
Results and discussion
At the end of the first experiment, we achieved an average recognition accuracy of 92.2% for the seven target expressions (neutral, happy, sad, angry, fear, disgust, and surprise). The highest recognition rates, 98.7% and 97.6%, were obtained for surprise and happy, respectively, while the lowest, 85.5% and 86.8%, were obtained for sad and angry, respectively. Details of the classification results are shown in Table 2. The high recognition performance for surprise and happy can be attributed to the strong and distinctive surface deformations of these expressions, such as the wide opening of the eyes and mouth in surprise, and the pronounced bulging of the cheeks and stretching of the lips in happy. In contrast, the lower performance for sad and angry can be linked to their high similarity to the neutral expression, which makes these expressions difficult to distinguish. A comparison of this result with some state-of-the-art methods [10, 15] is shown in Figure 4. The proposed method outperforms the other methods in almost all the expression classes considered; we attribute this improvement mainly to the use of mRMR in selecting the most significant features. In the second segment of the experiment, we recorded an average recognition accuracy of 88.9%, which is slightly lower than in the first segment. Table 3 shows the confusion matrix for this result.
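One common way to obtain an average recognition rate from a confusion matrix such as those in Tables 2 and 3 is to average the per-class rates (diagonal entry divided by row total). The sketch below assumes rows are true classes and columns are predictions; whether the reported averages were computed this way or as overall accuracy is not stated, and the 2 × 2 matrix in the example is purely illustrative.

```python
def per_class_rates(confusion):
    """Recognition rate per class: diagonal count / row total (rows = true classes)."""
    rates = []
    for i, row in enumerate(confusion):
        total = sum(row)
        rates.append(row[i] / total if total else 0.0)
    return rates


def mean_recognition_rate(confusion):
    """Unweighted mean of the per-class rates."""
    rates = per_class_rates(confusion)
    return sum(rates) / len(rates)
```

For the toy matrix `[[9, 1], [2, 8]]`, the per-class rates are 0.9 and 0.8 and their mean is 0.85. With equal numbers of test samples per class, this unweighted mean coincides with overall accuracy.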
In this article, we proposed a 3D facial expression recognition method using maximum relevance minimum redundancy face geometry features. We divided the experiments into two segments. In the first segment, we used the BU-3DFE database and a multi-class SVM classifier and achieved a mean classification rate of 92.2%, showing consistently improved performance in almost all expression classes compared with some related studies. In the second segment, we investigated inter-database homogeneity by forming a hybrid face database drawn from BU-3DFE and UPM-3DFE, and performed facial expression recognition using the same setup as in the first segment. The performance recorded here is slightly lower than in the first segment, with an average recognition rate of 88.9%; this shortfall can likely be attributed to the different landmarking schemes used to label the two databases.
Darwin C, Ekman P, Prodger P: The Expression of the Emotions in Man and Animals, 1st edn. (Oxford University Press, USA, 2002), pp. 19–25
Ekman P, Friesen WV: Facial action coding system. Manual Facial Act. Cod. Syst 1977: (Consulting Psychologists Press, Stanford University, Palo Alto, 1977)
Zhao X, Zhang S: Facial expression recognition using local binary patterns and discriminant kernel locally linear embedding. EURASIP J. Adv. Si. Pr 2012., 2012(1): 10.1186/1687-6180-2012-20
Ioannou SV, Raouzaiou AT, Tzouvaras VA, Mailis TP, Karpouzis KC, Kollias SD: Emotion recognition through facial expression analysis based on a neurofuzzy network. Neu. Netws 2005, 18(4):423-435. 10.1016/j.neunet.2005.03.004
Wang J, Yin L: Static topographic modelling for facial expression recognition and analysis. Comput. Vis. Image. Und 2007, 108(1-2):19-34. 10.1016/j.cviu.2006.10.011
Kotsia I, Zafeiriou S, Pitas I: Texture and shape information fusion for facial expression and facial action unit recognition. Pattern. Recogn 2008, 41(3):833-851. 10.1016/j.patcog.2007.06.026
Porta M: Human-Computer input and output techniques: an analysis of current research and promising applications. Artif. Intell. Rev 2007, 28(3):197-226. 10.1007/s10462-009-9098-5
Li X, Ruan Q, Ming Y: 3D Facial expression recognition based on basic geometric features. IEEE International Conference on Signal Processing (ICSP) (China, 2010), pp. 1366–1369. 10.1109/ICOSP.2010.5656891
Tang H, Huang TS: 3D facial expression recognition based on properties of line segments connecting facial feature points. IEEE International Conference on Automatic Face Gesture Recognition (AFGR) (Netherlands, 2008), pp. 1–6. 10.1109/AFGR.2008.4813304
Soyel H, Demirel H: 3D facial expression recognition with geometrically localized facial features, In International Symposium on Computer and Information Sciences (ISCIS). (Turkey, 2008); pp. 1–4. 10.1109/ISCIS.2008.4717898
Fasel B, Luettin J: Automatic facial expression analysis: a survey. Pattern Recogn 2003, 36(1):259-275. 10.1016/S0031-3203(02)00052-3
Samal A, Iyengar PA: Automatic recognition and analysis of human faces and facial expressions: a survey. Pattern recogn 1992, 25(1):65-77. 10.1016/0031-3203(92)90007-6
Berretti S, Bimbo AD, Pala P, Amor BB, Daoudi M: A set of selected SIFT features for 3D facial expression recognition. International Conference on Pattern Recognition (ICPR), vol.(2010) International Association for Pattern Recognition (IAPR) (Turkey, 2010), pp. 4125–4128
Wang J, Yin L, Wei X, Sun Y: 3D facial expression recognition based on primitive surface feature distribution. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol.(2) (New York, 2006), pp. 1399–1406
Yun T, Guan L: Human emotion recognition using real 3D visual features from Gabor library. IEEE International Workshop on Multimedia Signal Processing (MMSP) (France, 2010), pp. 05–510. 10.1109/MMSP.2010.5662073
Tekguc U, Soyel H, Demirel H: Feature selection for person-independent 3D facial expression recognition using NSGA-II. International Symposium on Computer and Information Sciences (ISCIS) (Cyprus, 2009), pp. 35–38. doi:10.1109/ISCIS.2009.5291925
Pantic M, Bartlett MS: Machine Analysis of Facial Expressions. (I-Tech Education and Publishing, Austria, 2007), pp. 377–416
Ostermann J: Face animation in MPEG-4, MPEG-4 facial animation. 2002, 2002: 17-55. 10.1002/0470854626.ch2
Yin L, Wei X, Sun Y, Wang J, Rosato MJ: A 3D facial expression database for facial behaviour research. proceedings of IEEE Automatic face and gesture recognition’06 (2006), pp. 211–216. doi:10.1109/FGR.2006.6
Salah AA, Alyuz N, Akarun L: Registration of three-dimensional face scans with average face models. J. Elec. Imag 2008, 17(1):011006. 10.1117/1.2896291
Stroud KA, Booth DJ: Engineering Mathematics, 5th edn. (Industrial Pr. Inc., New York, 2001), pp. 571–596
Kumar AC: Analysis of unsupervised dimensionality reduction techniques. Comput. Sci. Inf. Syst 2009, 6(2):217-227. 10.2298/CSIS0902217K
Peng H, Long F, Ding C: Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell 2005, 27(8):1226-1238.
Bonev B, Escolano F, Cazorla M: Feature selection, mutual information, and the classification of high-dimensional patterns. Pattern Anal. Appl 2008, 11(3):309-319. 10.1007/s10044-008-0107-0
Joutsijoki H, Juhola M: Kernel selection in multi-class support vector machines and its consequence to the number of ties in majority voting method. Artif. Intell. Rev 2011, 2011: 1-18. 10.1007/s10462-011-9281-3
Liu Q, Chen C, Zhang Y, Hu Z: Feature selection for support vector machines with RBF kernel. Artif. Intell. Rev 2011, 36(2):99-115. 10.1007/s10462-011-9205-2
The authors declare that they have no competing interests.
About this article
Cite this article
Rabiu, H., Saripan, M.I., Mashohor, S. et al. 3D facial expression recognition using maximum relevance minimum redundancy geometrical features. EURASIP J. Adv. Signal Process. 2012, 213 (2012). https://doi.org/10.1186/1687-6180-2012-213
- 3D facial expression recognition
- Maximum relevance minimum redundancy feature selection
- Inter database homogeneity
- Multi-class SVM classifier