Projective complex matrix factorization for facial expression recognition

Abstract

In this paper, a dimensionality reduction method applied to facial expression recognition is investigated. An unsupervised learning framework, projective complex matrix factorization (proCMF), is introduced to project high-dimensional input facial images into a lower-dimensional subspace. The proCMF model is related to both the conventional projective nonnegative matrix factorization (proNMF) and the cosine dissimilarity metric in a simple manner, by transforming real data into the complex domain. A projection matrix is then found by solving an unconstrained complex optimization problem. The gradient descent method is used to optimize the complex cost function. Extensive experiments carried out on the extended Cohn-Kanade and the JAFFE databases show that the proposed proCMF model provides better performance than state-of-the-art methods for facial expression recognition.

1 Introduction

Facial expression recognition (FER) plays an important role in many applications related to computer vision and pattern recognition, such as human-computer interfaces, surveillance, and multimedia [1,2,3,4,5]. Numerous techniques have been proposed for the FER problem [6,7,8,9,10,11,12,13,14]. Much attention has been paid to the facial action coding system (FACS) approach, which decomposes facial expressions into action units so that an expression can be recognized from the mixture of action units [6, 7]. Using deep neural networks [8,9,10,11] to extract powerful temporal features hidden in facial images is another interesting approach for FER, in particular for dynamic facial expression recognition. Dimensionality reduction, which reduces the size of the feature space, has been widely used because of its effectiveness for feature representation [12,13,14]. It is known that changes in local appearance (e.g., eyes, nose, mouth) are usually related to facial expression variations, whereas global features that describe the whole facial image may fail in expression analysis. Local facial components contain more discriminative information and outperform global features for face recognition [15]. A nonholistic representation and a low-rank approximation of the data improve the performance of an FER system [16].

Nonnegative matrix factorization (NMF) [12, 13] is among the most popular dimensionality reduction methods by its natural part-based representation ability. NMF decomposes the high-dimensional nonnegative data matrix into two low-rank nonnegative matrices with additional non-negativity constraints. Most NMF techniques estimate the linear subspace of the given data by the square of the Euclidean distance or the generalized Kullback-Leibler divergence. They work well when image noise is independent and identically distributed. However, for the data corrupted by outliers, the estimated subspace can be arbitrarily biased [17]. To overcome this drawback, researchers improved NMF in different distance metrics. Specifically, D. Kong et al. proposed a robust NMF by using L2 and L1 norms, where the noise was assumed to follow the Laplacian distribution [18]. Similarly, Earth Mover’s distance (EMD) and the Manhattan distance were also suggested in the work of Sandler et al. [19] and N. Guan et al. [20], respectively.

Besides, integrating constraints or using different metrics to modify the structure is another attractive strategy to extend NMF. Starting from the ideas of SVD and NMF, for instance, Yuan et al. proposed a novel method called projective nonnegative matrix factorization (proNMF) [21]. Instead of operating on two parameter matrices as NMF does, proNMF computes only one matrix. The coefficient matrix is replaced by the inner product of the basis and the input matrix. As a result, there are far fewer free parameters to learn. Furthermore, despite having no extra regularization term, proNMF is still able to learn more spatially localized and part-based representations of visual patterns.

Recently, many FER algorithms have adopted NMF. Buciu and Pitas employed NMF for FER by enforcing local constraints to create a local NMF (LMF) model [22]. Nikitidis et al. incorporated discriminant constraints to build a supervised NMF learning method [23, 24]. Based on clustering-based discriminant analysis (CDA), the algorithm in [25] efficiently decomposes the provided data into discriminant parts and extends the well-known NMF to subclass discriminant NMF (SDNMF). R. Zhi et al. [26] provided another extended NMF that imposes both sparse constraints on the basis matrix and graph-preserving criteria to improve the classification performance. Decomposing expressive images into two subspaces, the identity subspace and the expression subspace, is a model investigated in dual subspace nonnegative matrix factorization (DSNMF) [27] and dual subspace nonnegative graph embedding [28]. It can be seen that, in order to extract representative expression features, most existing algorithms integrate label information via different constraints to derive a supervised NMF. Therefore, the overall performance depends heavily on the expression labels, which are insufficient to characterize data that vary in pose, illumination, etc.

This paper aims to develop an unsupervised learning algorithm that performs FER efficiently. Encouraged by the localized and part-based representations of proNMF, and by the challenge of choosing a suitable metric to quantify the approximation error in the NMF model, we propose a new dimensionality reduction approach named projective complex matrix factorization (proCMF). The proCMF model attempts to learn a localized basis that highlights the salient features of facial expressions while keeping the reconstruction error caused by outliers as small as possible. Moreover, the underlying intrinsic cosine dissimilarity, obtained from the equivalence between two different metrics, is also exploited. We adopt a cosine-based distance measure to define an explicit mapping function from the space of pixel intensities onto a high-dimensional sphere where the complex projection is performed. The Euler formula is used to construct an isomorphic mapping from the real space to the complex space. In this context, the cosine dissimilarity for approximating real data is replaced by the Frobenius norm to evaluate the reconstruction error of the complex approximation. After the real data are transformed into the complex field, the obtained complex matrix is approximately factorized into a product of two low-rank matrices through an unconstrained complex optimization problem. The proposed method addresses the general problem of finding projections that enhance class separability in the reduced-dimensional space without attached label information. The objective function has no extra regularization term, which reduces the difficulty of solving the optimization problem. Furthermore, new samples in the FER framework are represented using only the trained projection matrix, which is an important factor in decreasing the computational complexity.

In summary, the contributions of this paper are shown as follows.

First, we construct a new dimensionality reduction model for learning a low-rank projection in the complex domain. The adoption of the cosine dissimilarity and the Frobenius norm significantly enhances the performance of the FER system.

Second, proCMF can be performed without limiting the sign of the data. Our proposed method can be applied to both negative and positive data, which extends its use in real-world applications. Therefore, operations that extract complex features, such as the short-time Fourier transform, can be utilized directly in sound processing instead of their absolute values (magnitude/power spectrogram).

Third, to satisfy the nonnegativity requirement, NMF usually uses various strategies for minimizing a function, which adds computational complexity. In contrast, the significant advantage of the proposed proCMF over NMF approaches is that it poses an unconstrained optimization problem, which simplifies the framework for extracting the basis and intrinsic features.

The rest of this paper is organized as follows. In Section 2, we recall some basic facts about proNMF. The motivation, formulation, and algorithms of the proposed model as well as its optimal solution are presented in Section 3. Some optimization techniques and computation methods are also described in this section. The convincing experimental results of the proposed approach on FER are depicted in Section 4. Finally, conclusion and future work are drawn in Section 5.

2 Projective nonnegative matrix factorization

The problem of ensuring that the basis vectors learned by standard NMF are part-based was addressed by projective nonnegative matrix factorization (proNMF) [29, 30]. ProNMF approximately factorizes a projection matrix \( \mathbf{P} \) into a nonnegative low-rank matrix \( \mathbf{W} \) and its transpose \( {\mathbf{W}}^T \), i.e., \( \mathbf{P}={\mathbf{WW}}^T \), such that \( \mathbf{X}\approx \mathbf{PX} \). The proNMF optimization problem has the following form:

$$ \underset{\mathbf{W}\ge 0}{\min }{O}_{proNMF}\left(\mathbf{W}\right)=\frac{1}{2}{\left\Vert \mathbf{X}-{\mathbf{W}\mathbf{W}}^T\mathbf{X}\right\Vert}_F^2. $$
(1)

It can be seen that the projection matrix is constructed only from the basis matrix, so the optimization problem in (1) involves the single variable W. Using the popular multiplicative update algorithm [13, 29] to solve (1), one can update W iteratively by the following element-wise rule:

$$ \mathbf{W}\leftarrow \mathbf{W}\frac{{\mathbf{XX}}^T\mathbf{W}}{{\mathbf{WW}}^T{\mathbf{XX}}^T\mathbf{W}+{\mathbf{XX}}^T{\mathbf{WW}}^T\mathbf{W}} $$
(2)
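As a minimal NumPy sketch, the update in (2) can be applied to toy data as follows. It follows the rule as printed, with element-wise multiplication and division; the small constant `eps`, the matrix sizes, and the number of iterations are assumptions added for illustration and are not taken from the paper.

```python
import numpy as np

def pronmf_update(X, W, eps=1e-12):
    """One multiplicative update of W following Eq. (2) as printed."""
    A = X @ X.T                                   # N x N matrix X X^T
    AW = A @ W                                    # numerator: X X^T W
    denom = W @ (W.T @ AW) + A @ (W @ (W.T @ W))  # W W^T X X^T W + X X^T W W^T W
    return W * AW / (denom + eps)                 # eps (assumed) avoids division by zero

def pronmf_objective(X, W):
    """O_proNMF(W) = 0.5 * ||X - W W^T X||_F^2 from Eq. (1)."""
    return 0.5 * np.linalg.norm(X - W @ (W.T @ X), "fro") ** 2

rng = np.random.default_rng(0)
X = rng.random((20, 30))                          # toy nonnegative data, N = 20, M = 30
W = rng.random((20, 4))                           # K = 4 nonnegative basis vectors
for _ in range(50):
    W = pronmf_update(X, W)
final_value = pronmf_objective(X, W)
```

Because every factor in the update is nonnegative, W stays nonnegative without any explicit projection step.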

3 Projective complex matrix factorization

3.1 Observations on cosine dissimilarity

Assume that we are given two images \( I_1 \) and \( I_2 \), each written as an N-dimensional vector \( {\mathbf{x}}_i \) (i = 1, 2) in lexicographic order. First, \( {\mathbf{x}}_i\in {\mathbb{R}}^N \) is normalized so that \( {\mathbf{x}}_i(c)\in \left[0,1\right] \), where c is the element index of the vector, i.e., the spatial location. Then, \( {\mathbf{x}}_i \) is mapped onto the unit sphere in \( {\mathbb{R}}^{2N} \) by

$$ \mathrm{Z}\left({\mathbf{x}}_i\right)=\frac{1}{\sqrt{N}}{\left[\cos {\left({\mathbf{x}}_i\right)}^T\ \sin {\left({\mathbf{x}}_i\right)}^T\right]}^T $$
(3)

where

$$ \cos \left({\mathbf{x}}_i\right)={\left[\cos \left({\mathbf{x}}_i(1)\right),\cos \left({\mathbf{x}}_i(2)\right),\dots, \cos \left({\mathbf{x}}_i(N)\right)\right]}^T, $$
$$ \sin \left({\mathbf{x}}_i\right)={\left[\sin \left({\mathbf{x}}_i(1)\right),\sin \left({\mathbf{x}}_i(2)\right),\dots, \sin \left({\mathbf{x}}_i(N)\right)\right]}^T, $$

and \( \left\Vert \mathrm{Z}\left({\mathbf{x}}_i\right)\right\Vert =1 \).

We have

$$ \mathrm{Z}{\left({\mathbf{x}}_1\right)}^T\mathrm{Z}\left({\mathbf{x}}_2\right)=\frac{1}{N}\sum \limits_{c=1}^N\cos \left({\mathbf{x}}_1(c)-{\mathbf{x}}_2(c)\right). $$
(4)

Recall that the cosine distance measure between two vectors x and y is given by

$$ d\left(\mathbf{x},\mathbf{y}\right)=1-\frac{{\mathbf{x}}^T\mathbf{y}}{\left\Vert \mathbf{x}\right\Vert \left\Vert \mathbf{y}\right\Vert }. $$
(5)

If the distance between Z(x1) and Z(x2) has the form

$$ \mathrm{d}\left(\mathrm{Z}\left({\mathbf{x}}_1\right),\mathrm{Z}\left({\mathbf{x}}_2\right)\right)=\frac{1}{2}{\left\Vert \mathrm{Z}\left({\mathbf{x}}_1\right)-\mathrm{Z}\left({\mathbf{x}}_2\right)\right\Vert}_F^2, $$
(6)

then it is equal to a cosine-based distance measure and

$$ \mathrm{d}\left(\mathrm{Z}\left({\mathbf{x}}_1\right),\mathrm{Z}\left({\mathbf{x}}_2\right)\right)=1-\frac{1}{N}\sum \limits_{c=1}^N\cos \left({\mathbf{x}}_1(c)-{\mathbf{x}}_2(c)\right). $$
(7)

It can be seen that if \( I_1\approx I_2 \), i.e., \( {\mathbf{x}}_1(c)-{\mathbf{x}}_2(c)\approx 0 \) for all c, then d(Z(x1), Z(x2)) → 0. Conversely, if the two images are unrelated, their local elements are unmatched and the distance does not vanish.
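The identities (4), (6), and (7) can be checked numerically. The following NumPy sketch uses two random vectors as stand-ins for normalized images; the vector length and the random data are assumptions made only for the check.

```python
import numpy as np

def sphere_map(x):
    """Z(x) from Eq. (3): stack cos(x) and sin(x), scaled by 1/sqrt(N)."""
    return np.concatenate([np.cos(x), np.sin(x)]) / np.sqrt(x.size)

rng = np.random.default_rng(0)
x1, x2 = rng.random(64), rng.random(64)    # toy "images" with entries in [0, 1]
z1, z2 = sphere_map(x1), sphere_map(x2)

inner = z1 @ z2                            # left-hand side of Eq. (4)
mean_cos = np.mean(np.cos(x1 - x2))        # right-hand side of Eq. (4)
d_euclid = 0.5 * np.sum((z1 - z2) ** 2)    # Eq. (6)
d_cosine = 1.0 - mean_cos                  # Eq. (7)
```

Both mapped vectors have unit norm, so half the squared Euclidean distance reduces to one minus their inner product, which is exactly the cosine-based measure.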

Moreover, the mapping function (3) from \( {\mathbb{R}}^N \) to \( {\mathbb{R}}^{2N} \) is equivalent to a mapping function \( f:{\mathbb{R}}^N\to {\mathbb{C}}^N \) defined by:

$$ f\left({\mathbf{x}}_t\right)={\mathbf{z}}_t=\frac{1}{\sqrt{2}}{e}^{i\alpha \pi {\mathbf{x}}_t}=\frac{1}{\sqrt{2}}\left[\begin{array}{c}{e}^{i\alpha \pi {\mathbf{x}}_t(1)}\\ {}\vdots \\ {}{e}^{i\alpha \pi {\mathbf{x}}_t(N)}\end{array}\right] $$
(8)

where, by Euler's formula [31],

$$ {\mathrm{e}}^{i\;\alpha \pi {\mathbf{x}}_t}=\cos \left(\alpha \pi {\mathbf{x}}_t\right)+i\sin \left(\alpha \pi {\mathbf{x}}_t\right) $$
(9)

Therefore, the cosine dissimilarity of a data pair in the input real space equals the Frobenius distance of the corresponding data pair in the complex domain. The cosine dissimilarity in the real domain is known to be robust in suppressing outliers [32]. Following the idea of using different robust similarity metrics to extend NMF, we introduce a new dimensionality reduction method (proCMF) that relates to conventional proNMF and uses the cosine dissimilarity metric to measure the reconstruction error. The difficulty of optimizing a real function with the cosine distance is avoided by converting it to a complex optimization problem with the Frobenius norm, which is described in the next sections.
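The equivalence stated above can be illustrated with a short NumPy sketch of the mapping (8). The value `alpha = 0.9`, the vector length, and the random data are assumptions chosen for illustration; this excerpt does not state the value of α used in the experiments.

```python
import numpy as np

def euler_map(x, alpha=0.9):
    """Eq. (8): z = exp(i * alpha * pi * x) / sqrt(2), applied element-wise."""
    return np.exp(1j * alpha * np.pi * x) / np.sqrt(2.0)

alpha = 0.9                                  # illustrative value (assumption)
rng = np.random.default_rng(1)
x1, x2 = rng.random(100), rng.random(100)    # toy normalized pixel vectors
z1, z2 = euler_map(x1, alpha), euler_map(x2, alpha)

# Squared Frobenius (Euclidean) distance in the complex domain.
complex_dist = np.sum(np.abs(z1 - z2) ** 2)
# Summed cosine dissimilarity of the scaled real data.
cosine_dissim = np.sum(1.0 - np.cos(alpha * np.pi * (x1 - x2)))
```

The two quantities agree because each complex entry has modulus 1/√2, so every squared difference equals one minus the cosine of the corresponding phase difference.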

3.2 Problem formulation

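In this section, we formulate the problem of multivariate data factorization within the framework of complex data decomposition. Given the sample dataset \( \mathbf{X}=\left[{\mathbf{x}}_1,{\mathbf{x}}_2,\dots, {\mathbf{x}}_M\right] \), \( {\mathbf{x}}_i\in {\mathbb{R}}^N \), we convert the real data matrix \( \mathbf{X}\in {\mathbb{R}}^{N\times M} \) to a complex matrix \( \mathbf{Z}\in {\mathbb{C}}^{N\times M} \) by the mapping (8) and perform matrix factorization in this complex feature space.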

The basic idea of proCMF is that the coefficient vector of each data point \( {\mathbf{z}}_i \) (i = 1, 2, …, M) lies within the subspace spanned by the column vectors of one projection matrix. The coefficient matrix \( \mathbf{H}\in {\mathbb{C}}^{K\times M} \) is obtained by a linear transformation of the samples. More specifically, given a matrix \( \mathbf{Z}\in {\mathbb{C}}^{N\times M} \), we need to find two matrices \( \mathbf{W}\in {\mathbb{C}}^{N\times K} \) and \( \mathbf{H}\in {\mathbb{C}}^{K\times M} \) that minimize \( {\left\Vert \mathbf{Z}-\mathbf{WH}\right\Vert}_F^2\ \mathrm{s}.\mathrm{t}.\ \mathbf{H}=\mathbf{VZ} \), where \( \mathbf{V}\in {\mathbb{C}}^{K\times N} \) is the projection matrix. The proCMF objective function is as follows:

$$ \underset{\mathbf{W},\mathbf{V}}{\min }{\mathrm{O}}_{proCMF}\left(\mathbf{W},\mathbf{V}\right)=\underset{\mathbf{W},\mathbf{V}}{\min}\frac{1}{2}{\left\Vert \mathbf{Z}-\mathbf{WVZ}\right\Vert}_F^2 $$
(10)
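To make the shapes in (10) concrete, the following NumPy sketch maps toy real data to the complex domain and evaluates the objective. The sizes N, M, K, the value of `alpha`, and the random initial matrices are assumptions for illustration only.

```python
import numpy as np

def euler_map(X, alpha=0.9):
    """Eq. (8) applied element-wise; alpha = 0.9 is an illustrative assumption."""
    return np.exp(1j * alpha * np.pi * X) / np.sqrt(2.0)

def procmf_objective(Z, W, V):
    """O_proCMF(W, V) = 0.5 * ||Z - W V Z||_F^2 from Eq. (10)."""
    return 0.5 * np.linalg.norm(Z - W @ (V @ Z), "fro") ** 2

rng = np.random.default_rng(0)
N, M, K = 16, 10, 4
X = rng.random((N, M))                       # real data, one sample per column
Z = euler_map(X)                             # complex data matrix, N x M
W = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
V = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))

H = V @ Z                                    # coefficient matrix, K x M
value = procmf_objective(Z, W, V)
# Sanity check: with K = N and W = V = I the reconstruction is exact.
trivial = procmf_objective(Z, np.eye(N), np.eye(N))
```

Neither W nor V carries a sign or nonnegativity constraint here, which is what allows the unconstrained solvers of the next section.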

3.3 Complex-valued gradient descent method

It can be seen that (10) is a nonconvex minimization problem with respect to both variables W and V, so finding the globally optimal solution is impractical. This NP-hard problem can be tackled by applying block coordinate descent (BCD) with two matrix blocks [33] to obtain a local solution by the following scheme:

Given an initial \( {\mathbf{W}}^{(0)} \), at iteration t we find the optimal solution \( {\mathbf{V}}^{\left(t+1\right)} \) such that:

$$ {\mathbf{V}}^{\left(t+1\right)}=\arg \underset{\mathbf{V}}{\min }{\mathrm{O}}_{\mathrm{proCMF}}\left({\mathbf{W}}^{(t)},\mathbf{V}\right)=\arg \underset{\mathbf{V}}{\min}\frac{1}{2}{\left\Vert \mathbf{Z}-{\mathbf{W}}^{(t)}\mathbf{VZ}\right\Vert}_F^2. $$
(11)

Because there is no nonnegativity constraint, the basis can then be updated simply by the Moore–Penrose pseudoinverse [34]

$$ {\mathbf{W}}^{\left(t+1\right)}=\mathbf{Z}{\left({\mathbf{V}}^{\left(t+1\right)}\mathbf{Z}\right)}^{\dagger }. $$
(12)
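A minimal NumPy sketch of the basis update (12) is given below. For a fixed V, the pseudoinverse yields a least-squares solution for W, so the objective cannot be larger than with any other W. The toy sizes and random matrices are assumptions for illustration.

```python
import numpy as np

def update_basis(Z, V):
    """Eq. (12): W = Z (V Z)^dagger, a least-squares basis for fixed V."""
    return Z @ np.linalg.pinv(V @ Z)

def objective(Z, W, V):
    """0.5 * ||Z - W V Z||_F^2 from Eq. (10)."""
    return 0.5 * np.linalg.norm(Z - W @ (V @ Z), "fro") ** 2

rng = np.random.default_rng(0)
N, M, K = 12, 20, 3
Z = np.exp(1j * 0.9 * np.pi * rng.random((N, M))) / np.sqrt(2.0)   # toy complex data
V = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))
W_old = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))

W_new = update_basis(Z, V)
before = objective(Z, W_old, V)
after = objective(Z, W_new, V)     # never larger than `before`
```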

To find an optimal solution of (11), we use the complex-valued gradient descent algorithm (CGD). Here, (11) is considered as a real-valued function to be minimized with respect to the complex variable V. In general, minimizing a real-valued scalar function of a complex variable, as in (11), amounts to solving the following unconstrained optimization problem:

$$ \underset{\mathbf{V}}{\min }f\left(\mathbf{V}\right) $$
(13)

where

$$ {\displaystyle \begin{array}{l}f\left(\mathbf{V}\right)=\frac{1}{2}{\left\Vert \mathbf{Z}-\mathbf{WVZ}\right\Vert}_F^2=\frac{1}{2}\mathrm{Trace}{\left(\mathbf{Z}-\mathbf{WVZ}\right)}^H\left(\mathbf{Z}-\mathbf{WVZ}\right)\\ {}=\frac{1}{2}\mathrm{Trace}\left({\mathbf{Z}}^H\mathbf{Z}-{\mathbf{Z}}^H{\mathbf{V}}^H{\mathbf{W}}^H\mathbf{Z}-{\mathbf{Z}}^H\mathbf{WVZ}+{\mathbf{Z}}^H{\mathbf{V}}^H{\mathbf{W}}^H\mathbf{WVZ}\right),\end{array}} $$
(14)

and \( {\left(\cdot \right)}^H \) denotes the Hermitian (conjugate) transpose of a matrix.

Let V = Re(V) + iIm(V), where i is the imaginary unit and \( {i}^2=-1 \). Then, f(V) can be viewed as a real bivariate function of its real and imaginary components.

In most complex-variable optimization problems, the objective functions are real-valued functions of complex arguments. They are not analytic on the complex plane and do not satisfy the Cauchy-Riemann conditions. Brandwood’s analytic theory [35] can be applied to overcome this common problem. Recall that if f(V) satisfies Brandwood’s analytic condition, i.e., f(V) is analytic with respect to the complex-valued variable V and its complex conjugate \( \overline{\mathbf{V}} \) where V and \( \overline{\mathbf{V}} \) are treated as independent variables, then the first-order Taylor expansion of \( f\left(\mathbf{V},\overline{\mathbf{V}}\right) \) is as follows:

$$ \Delta f=\left\langle {\nabla}_{\overline{\mathbf{V}}}f,\Delta \mathbf{V}\right\rangle +\left\langle {\nabla}_{\mathbf{V}}f,\Delta \overline{\mathbf{V}}\right\rangle =2\operatorname{Re}\left\{\left\langle {\nabla}_{\overline{\mathbf{V}}}f,\Delta \mathbf{V}\right\rangle \right\} $$
(15)

and the complex gradient of f(V) is defined as

$$ {\nabla}_{\overline{\mathbf{V}}}f\left(\mathbf{V},\overline{\mathbf{V}}\right)=\frac{\partial f\left(\mathbf{V}\right)}{\partial \operatorname{Re}\left(\mathbf{V}\right)}+i\frac{\partial f\left(\mathbf{V}\right)}{\partial \operatorname{Im}\left(\mathbf{V}\right)}. $$
(16)

Therefore, the function f(V) is treated as \( f\left(\mathbf{V},\overline{\mathbf{V}}\right) \), where

$$ f\left(\mathbf{V},\overline{\mathbf{V}}\right)=\frac{1}{2} Trace\left[{\mathbf{Z}}^H\mathbf{Z}-{\mathbf{Z}}^H{\left(\overline{\mathbf{V}}\right)}^T{\mathbf{W}}^H\mathbf{Z}-{\mathbf{Z}}^H\mathbf{WVZ}+{\mathbf{Z}}^H{\left(\overline{\mathbf{V}}\right)}^T{\mathbf{W}}^H\mathbf{WVZ}\right] $$
(17)

The gradient of \( f\left(\mathbf{V},\overline{\mathbf{V}}\right) \) with respect to V is given by:

$$ {\nabla}_{\overline{\mathbf{V}}}f\left(\mathbf{V},\overline{\mathbf{V}}\right)=-{\mathbf{W}}^H{\mathbf{ZZ}}^H+{\mathbf{W}}^H{\mathbf{W}\mathbf{VZZ}}^H. $$
(18)
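The closed form (18) can be checked against the definition (16) by finite differences on the real and imaginary parts of V. The NumPy sketch below does this on a small random problem; the sizes, the random data, and the step `h` are assumptions for illustration.

```python
import numpy as np

def objective(Z, W, V):
    """f(V) = 0.5 * ||Z - W V Z||_F^2 from Eq. (14)."""
    return 0.5 * np.linalg.norm(Z - W @ (V @ Z), "fro") ** 2

def gradient(Z, W, V):
    """Eq. (18): -W^H Z Z^H + W^H W V Z Z^H."""
    ZZh = Z @ Z.conj().T
    Wh = W.conj().T
    return -Wh @ ZZh + Wh @ W @ V @ ZZh

def numeric_gradient(Z, W, V, h=1e-6):
    """Central differences for df/dRe(V) + i * df/dIm(V), as in Eq. (16)."""
    G = np.zeros_like(V)
    for idx in np.ndindex(V.shape):
        E = np.zeros_like(V)
        E[idx] = h                           # perturb a single entry of V
        d_re = (objective(Z, W, V + E) - objective(Z, W, V - E)) / (2 * h)
        d_im = (objective(Z, W, V + 1j * E) - objective(Z, W, V - 1j * E)) / (2 * h)
        G[idx] = d_re + 1j * d_im
    return G

rng = np.random.default_rng(0)
N, M, K = 6, 8, 3
Z = np.exp(1j * 0.9 * np.pi * rng.random((N, M))) / np.sqrt(2.0)   # toy complex data
W = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
V = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))

G_closed = gradient(Z, W, V)
G_numeric = numeric_gradient(Z, W, V)
```

Since f is quadratic in V, the central differences are exact up to rounding, which makes this a sharp test of the closed form.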

The gradient descent method for the unconstrained optimization problem in (13) builds a sequence \( {\left\{{\mathbf{V}}^{(t)}\right\}}_t \) according to the following iterative form:

$$ {\mathbf{V}}^{\left(t+1\right)}={\mathbf{V}}^{(t)}-{\beta}^{(t)}{\nabla}_{\overline{\mathbf{V}}}f\left({\mathbf{V}}^{(t)},{\overline{\mathbf{V}}}^{(t)}\right) $$
(19)

where \( {\beta}^{(t)} \) is the step size, a small positive constant chosen to minimize \( f\left({\mathbf{V}}^{(t)}-{\beta}^{(t)}{\nabla}_{\overline{\mathbf{V}}}f\left({\mathbf{V}}^{(t)},{\overline{\mathbf{V}}}^{(t)}\right)\right) \) over positive step sizes. In this paper, backtracking line search, which is also known as the Armijo rule [36], is used to estimate the step size. In this rule, \( {\beta}^{(t)}={\mu}^{k_t} \), where \( 0<\mu <1 \) and \( {k}_t \) is the first non-negative integer k such that:

$$ f\left({\mathbf{V}}^{\left(t+1\right)},{\overline{\mathbf{V}}}^{\left(t+1\right)}\right)-f\left({\mathbf{V}}^{(t)},{\overline{\mathbf{V}}}^{(t)}\right)\le 2\sigma \operatorname{Re}\left\{\left\langle {\nabla}_{\overline{\mathbf{V}}}f\left({\mathbf{V}}^{(t)},{\overline{\mathbf{V}}}^{(t)}\right),{\mathbf{V}}^{\left(t+1\right)}-{\mathbf{V}}^{(t)}\right\rangle \right\}. $$
(20)

There always exists a step length \( {\beta}^{(t)} \) among \( 1,{\mu}^1,{\mu}^2,\dots \) that satisfies (20). A stationary point of (13) also exists among the limit points of \( {\left\{{\mathbf{V}}^{(t)}\right\}}_t \) [37]. The iteration is stopped when the solution is close to a stationary point. In practice, a common condition to check whether a point \( {\mathbf{V}}^{(t)} \) is close to a stationary point is:

$$ {\left\Vert {\nabla}_{\overline{\mathbf{V}}}f\left({\mathbf{V}}^{(t)},{\overline{\mathbf{V}}}^{(t)}\right)\right\Vert}_F\le \varepsilon $$
(21)

where ε is a pre-defined threshold.
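The iteration (19) with the backtracking rule (20) and the stopping test (21) can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the inner product is taken to be the Frobenius inner product, so that Re⟨∇f, V(t+1) − V(t)⟩ = −β‖∇f‖²; the values `mu = 0.5` and `sigma = 0.01`, the cap on backtracking steps, and the toy problem are all assumed.

```python
import numpy as np

def objective(Z, W, V):
    return 0.5 * np.linalg.norm(Z - W @ (V @ Z), "fro") ** 2

def gradient(Z, W, V):
    """Eq. (18), written as W^H (W V - I) Z Z^H."""
    return W.conj().T @ (W @ V - np.eye(W.shape[0])) @ (Z @ Z.conj().T)

def cgd(Z, W, V, mu=0.5, sigma=0.01, eps=1e-4, max_iter=10000, max_backtracks=60):
    """Gradient descent on V with Armijo backtracking; returns V and the objective history."""
    history = [objective(Z, W, V)]
    for _ in range(max_iter):
        G = gradient(Z, W, V)
        g2 = np.linalg.norm(G, "fro") ** 2
        if np.sqrt(g2) <= eps:                       # stopping rule, Eq. (21)
            break
        beta = 1.0
        for _ in range(max_backtracks):              # try beta = 1, mu, mu^2, ...
            V_new = V - beta * G                     # candidate step, Eq. (19)
            f_new = objective(Z, W, V_new)
            if f_new - history[-1] <= -2.0 * sigma * beta * g2:   # Eq. (20)
                break
            beta *= mu
        else:
            break                                    # no acceptable step found; stop
        V = V_new
        history.append(f_new)
    return V, history

rng = np.random.default_rng(0)
N, M, K = 8, 12, 3
Z = np.exp(1j * 0.9 * np.pi * rng.random((N, M))) / np.sqrt(2.0)   # toy complex data
W = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
V0 = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))
V_opt, history = cgd(Z, W, V0, max_iter=200)
```

Each accepted step satisfies the sufficient-decrease test, so the recorded objective values are non-increasing.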

The following Algorithm 2 summarizes the optimization process of the proposed proCMF model.

[Algorithm 2: optimization process of the proposed proCMF model]

4 Experiments

We evaluated the proposed proCMF on FER. The classification capability of the derived encoding coefficient vectors was compared with various NMF-based methods.

Here, the objective function \( \frac{1}{2}{\left\Vert {\mathbf{Z}}_{tr}-{\mathbf{W}}_{tr}{\mathbf{V}}_{tr}{\mathbf{Z}}_{tr}\right\Vert}_F^2 \) of proCMF is minimized in the training phase, which extracts the new features \( {\mathbf{H}}_{tr}={\mathbf{V}}_{tr}{\mathbf{Z}}_{tr} \) from the training image set \( {\mathbf{Z}}_{tr} \). After the projection matrix \( {\mathbf{V}}_{tr} \) is learned, for each new test sample \( {\mathbf{z}}_{te} \) in the set of query face images \( {\mathbf{Z}}_{te} \), we can easily obtain its projection \( {\mathbf{h}}_{te}={\mathbf{V}}_{tr}{\mathbf{z}}_{te} \). To decide the facial expression class, the projected representation \( {\mathbf{h}}_{te} \) is fed to a nearest neighbor classifier: the Euclidean distance between \( {\mathbf{h}}_{te} \) and each training datum of \( {\mathbf{H}}_{tr} \) is computed, and the test image is assigned to the class of the closest training datum.
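The test-phase projection and nearest neighbor decision can be sketched as below. The function name `classify_nearest`, the toy data, and the random matrix standing in for a trained projection are hypothetical and serve only to illustrate the procedure.

```python
import numpy as np

def classify_nearest(V_tr, Z_tr, labels_tr, Z_te):
    """Project with V_tr and assign each test column the label of its nearest training column."""
    H_tr = V_tr @ Z_tr                                   # K x M_train training features
    H_te = V_tr @ Z_te                                   # K x M_test test features
    diff = H_te[:, :, None] - H_tr[:, None, :]           # K x M_test x M_train
    dist = np.sqrt(np.sum(np.abs(diff) ** 2, axis=0))    # Euclidean distances
    return labels_tr[np.argmin(dist, axis=1)]

rng = np.random.default_rng(0)
N, K = 16, 4
Z_tr = np.exp(1j * 0.9 * np.pi * rng.random((N, 6))) / np.sqrt(2.0)   # toy training set
labels_tr = np.array([0, 1, 2, 3, 4, 5])                 # one toy label per training sample
V_tr = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N)) # stand-in for a trained V

Z_te = Z_tr[:, [2, 0]]                                   # reuse two training samples as queries
predicted = classify_nearest(V_tr, Z_tr, labels_tr, Z_te)
```

Only the projection matrix is needed at test time; the basis W plays no role in classification.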

4.1 Data description

The proposed algorithm is tested on two well-known datasets, the extended Cohn-Kanade (CK+) [38] and the Japanese Female Facial Expression (JAFFE) [39] datasets of six “basic” facial expressions (happiness, sadness, surprise, anger, disgust, and fear).

The CK+ dataset consists of 593 video sequences from 123 subjects. Each video sequence shows a distinct facial expression. For each expression of a subject, the last five frames of the video are selected and treated as different static facial expression images. Some samples from the CK+ dataset are shown in Fig. 1.

Fig. 1 Cropped face images of six facial expressions from the CK+ dataset [38]

Altogether, the JAFFE dataset has 213 grayscale images of ten subjects, each of whom posed two to four examples of each expression. Figure 2 shows some samples from the JAFFE dataset.

Fig. 2 Cropped face images of six facial expressions from the JAFFE dataset [39]

Each cropped facial image in the datasets was isotropically scaled to the fixed size of 32 × 32 pixels.

4.2 Baselines and experiment settings

The proposed algorithm is compared to the following popular PCA and NMF algorithms: (1) basic NMF [4]; (2) projective NMF (proNMF) [9]; (3) convex NMF (conNMF) [40]; (4) weighted NMF (weiNMF), which assigns binary weights to the data matrix [41]; (5) NeNMF, which applies efficient Nesterov’s optimal gradient method in the optimization process [42]; (6) principal component analysis (PCA) [43]; (7) graph preserving sparse nonnegative matrix factorization (GSNMF) [26]; (8)(9) unsupervised and supervised robust nonnegative graph embedding (uRNGE) and (sRNGE), which are robust nonnegative graph embedding methods that replace the L2-norm with the L21-norm [44, 45]; and (10)(11) unsupervised and supervised robust semi-nonnegative graph embedding (usRNGE) and (ssRNGE), which are robust semi-nonnegative graph embedding methods that impose no constraint on the base matrix [44].

In the implementation of the complex-valued gradient descent method with the Armijo rule (CGD), we set the decreasing rate of the step size μ to satisfy (19) with the sufficient decrease condition at 0.01, and the stopping criterion is either 10,000 iterations or an error ε smaller than \( {10}^{-4} \).

4.3 Visualization of learned basis components and reconstructed images

In order to visualize the basis and reconstructed images, we converted the complex matrix z back to the real domain by applying the angle operator \( \angle \), which returns the angle of a complex number. The inverse mapping \( {f}^{-1} \) of (8) gives the pre-image x in the following form:

$$ \mathbf{x}=\frac{\angle \mathbf{z}}{\alpha \pi}. $$
(22)
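A round trip through (8) and (22) can be sketched in NumPy as follows. The value `alpha = 0.9` is an assumption: `np.angle` returns values in (−π, π], so the recovery is exact only when απx stays inside that interval, which holds for x in [0, 1] whenever α ≤ 1.

```python
import numpy as np

def euler_map(x, alpha):
    """Eq. (8): z = exp(i * alpha * pi * x) / sqrt(2)."""
    return np.exp(1j * alpha * np.pi * x) / np.sqrt(2.0)

def inverse_euler_map(z, alpha):
    """Eq. (22): x = angle(z) / (alpha * pi)."""
    return np.angle(z) / (alpha * np.pi)

alpha = 0.9                                   # illustrative value (assumption)
rng = np.random.default_rng(0)
x = rng.random((32, 32))                      # toy 32 x 32 image with entries in [0, 1]
x_back = inverse_euler_map(euler_map(x, alpha), alpha)
```

The scale factor 1/√2 does not affect the angle, so the same inverse applies to any positively rescaled complex image.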

Some basis images learned by NMF, proNMF, and proCMF from the JAFFE dataset are shown in Fig. 3. One can see that the NMF bases are nearly holistic and less sparse than the other bases, while proNMF extracts more localized, non-overlapping bases that correspond to several facial parts. Unlike the small single regions of the proNMF bases, the basis components of the proposed proCMF combine several localized regions to highlight specific local facial features in salient areas of the face, such as the mouth, the nose, the eyes, and the eyebrows. This suggests that the proposed proCMF extracts discriminant facial features more robustly and is potentially superior for FER. Figure 4 gives the images reconstructed by the proposed proCMF, NMF, and proNMF. The faces reconstructed by proCMF are much clearer than those reconstructed by NMF and proNMF. Because of outliers and noise, NMF and proNMF have difficulty identifying test samples well.

Fig. 3 Learned basis images by a NMF, b proNMF, and c the proposed proCMF from the training data of the JAFFE dataset

Fig. 4 Reconstructed images by a NMF, b proNMF, and c the proposed proCMF from the JAFFE dataset

4.4 Facial expression recognition on CK+ dataset

Generally, the training set and the testing set were created by dividing the dataset into two subsets. On the CK+ dataset, we designed two experiments using different numbers of selected images for training and testing. In case 1, one image among five frames of each expression per subject is used for training and the remaining four are used for testing. In case 2, for each expression per subject, two collected images formed the training set and the remaining three formed the testing set. The average recognition rates of two cases versus different subspace dimensionalities are described in Tables 1 and 2, respectively. The best results are indicated in italics.

Table 1 Facial expression recognition rate (%) using the CK+ dataset with different subspace dimensionalities (case 1: no. training = 1)
Table 2 Facial expression recognition rate (%) using the CK+ dataset with different subspace dimensionalities (case 2: no. training = 2)

Most of the algorithms tend to achieve higher recognition rates as the subspace dimension increases. NMF performs better than proNMF and conNMF but worse than weiNMF and NeNMF. As mentioned in Section 4.3, the facial bases extracted by proNMF are spatially localized and lack expression-related information. These small basis regions of proNMF are clearly not appropriate for facial expression discrimination and result in unsatisfactory performance. The eigenface approach PCA, a well-known framework for dimensionality reduction, reaches an average rate of only 86.80%. The supervised scheme of the NGE model is better than the unsupervised ones in all cases, and removing the nonnegative constraint on the base matrix in semi-NMF has a significant effect on FER. GSNMF and the proposed proCMF are superior to the baseline methods. Their average performances are over 97%, with the highest rate of 97.51% achieved by the proposed proCMF algorithm.

4.5 Facial expression recognition on JAFFE dataset

For the JAFFE dataset, one image of each expression per person is taken randomly to construct the training data and the rest of the images are used in the test phase (case 1). Similarly, case 2 was conducted by collecting two images for the training set. The JAFFE dataset is more challenging, and the recognition rates are much lower than those achieved on the CK+ dataset. The recognition results are detailed in Tables 3 and 4. The proposed proCMF attains the best performance and reaches average recognition rates of 82.10 and 70.42%, respectively, which are 1.15 to 36.13% higher than those of the other methods. In case 1, when the number of learned bases K exceeds the size of the training data, overfitting occurs for PCA and the semi-nonnegative models.

Table 3 Facial expression recognition rate (%) using the JAFFE dataset with different subspace dimensionalities (case 1: no. training = 1)
Table 4 Facial expression recognition rate (%) using the JAFFE dataset with different subspace dimensionalities (case 2: no. training = 2)

4.6 Facial expression recognition on occluded CK+ images

As mentioned in previous sections, the proposed proCMF exhibits the best performance on facial expression recognition on both the CK+ and JAFFE datasets. In this section, we aim to confirm the robustness of the proposed method when the dataset is corrupted by partial occlusion. The occlusion experiments were conducted on the facial images in the CK+ dataset. We constructed three experimental assessments by varying the occluded position on the tested face images. The mouth and eyes occlusions are simulated by placing a mask at the mouth or eyes position, respectively. Sheltering patches of size 70 × 70 were put randomly on the original image of size 640 × 490 to create the random occlusion case. Some samples of occluded CK+ images are shown in Fig. 5. As in the unoccluded setting, the last five frames of each video sequence were treated as static images. Two of them were occluded and collected to form the testing set, and the remaining three images were used for the training set. The detailed results of random, mouth, and eyes occlusions are reported in Tables 5, 6, and 7, respectively. It can be seen that the highest recognition rate is achieved repeatedly by our proposed method, proCMF. Moreover, proCMF is much more robust and stable than the other methods. In this occlusion situation, the proposed proCMF maintains its discriminating ability, while the performances of the other methods decrease significantly. To support this statement, we calculated the gap between the non-occlusion and the occlusion cases on the CK+ dataset. The smallest decline in the average recognition rate is achieved by proCMF and uRNGE (15.98%), followed by ssRNGE with 17.86%. In contrast, proNMF, NeNMF, and conNMF dropped by the largest proportions of 34.43, 34.29, and 40.40%, respectively. The declines of the remaining methods range from 20.84 to 27.54%.
Overall, the proposed method works well not only in recognizing unoccluded faces but also in the occlusion case.
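The random occlusion described above can be simulated along the following lines. This is a sketch under assumptions: the function name `occlude_random` is hypothetical, the patch is filled with zeros (black), and its position is drawn uniformly so that the patch lies fully inside the image; the paper does not specify these details.

```python
import numpy as np

def occlude_random(image, patch=70, fill=0, rng=None):
    """Return a copy of `image` with one patch x patch square set to `fill` at a random position."""
    rng = np.random.default_rng() if rng is None else rng
    height, width = image.shape
    top = rng.integers(0, height - patch + 1)     # upper bound is exclusive
    left = rng.integers(0, width - patch + 1)
    occluded = image.copy()                       # leave the input untouched
    occluded[top:top + patch, left:left + patch] = fill
    return occluded

image = np.ones((490, 640))                       # toy stand-in for a 640 x 490 frame
occluded = occlude_random(image, rng=np.random.default_rng(0))
```

Mouth and eyes occlusions would use the same masking step with a fixed position in place of the random one.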

Fig. 5 Cropped face images of six facial expressions with occlusions from the CK+ dataset

Table 5 Facial expression recognition rate (%) using the occluded CK+ with different subspace dimensionalities (case of occluded randomly images)
Table 6 Facial expression recognition rate (%) using the occluded CK+ with different subspace dimensionalities (case of occluded mouth)
Table 7 Facial expression recognition rate (%) using the occluded CK+ with different subspace dimensionalities (case of occluded eyes)

The confusion matrices of the proposed proCMF are computed for the unoccluded, eyes-occluded, and mouth-occluded cases. The merged results are given in Table 8. In the case of no occlusion, happiness and neutral are classified with the highest accuracy (100%), while the other five expressions are recognized with lower accuracy (96–99%). For occluded mouth images, sadness is recognized with the lowest accuracy (71.86%) since sadness is highly confused with disgust and surprise. In contrast, eyes occlusion most affects the recognition of the disgust expression: its accuracy decreases significantly to 63.16% because of confusion with sadness and anger.

Table 8 Confusion matrix of 7-class facial expression recognition using proCMF on CK+ dataset with unocclusion and eyes and mouth occlusions (%)

To evaluate the influence of different parts of the facial image on the expression recognition rate, we considered both occluded and unoccluded local regions. From the aligned images, we cropped fixed regions of the faces to create the testing set. Figure 6 shows the 12 local facial regions with the corresponding facial expression recognition accuracies. The eye regions carry more important information than the nose and mouth regions. In the local region experiments, the eyes-nose regions obtain an accuracy of 60.77%, whereas the mouth-nose regions reach just 33.35%. The eye regions achieve 30.71% accuracy, which is better than the 9.10% of the mouth regions. In the occlusion experiments, 62.82% is the average accuracy of all algorithms on the occluded mouth images, and this rate degrades to 59.9% for the occluded eyes images.

Fig. 6 Face regions with their corresponding recognition accuracies (%)
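The two test settings described above (isolated local regions and occluded faces) can be sketched as follows. This is an illustrative sketch only: the region boxes in `REGIONS`, the 64 × 64 image size, and the helper names are hypothetical, since the exact crop coordinates are not given in the text.

```python
import numpy as np

# Hypothetical region boxes (top, bottom, left, right) as fractions of the
# aligned image size; the actual regions used in the paper are not given here.
REGIONS = {
    "eyes": (0.20, 0.45, 0.10, 0.90),
    "mouth": (0.65, 0.90, 0.25, 0.75),
}

def _box(shape, region):
    """Convert a fractional region box into integer pixel indices."""
    h, w = shape[:2]
    top, bottom, left, right = REGIONS[region]
    return int(top * h), int(bottom * h), int(left * w), int(right * w)

def crop_region(image, region):
    """Local-region setting: keep only the pixels inside the named region."""
    r0, r1, c0, c1 = _box(image.shape, region)
    return image[r0:r1, c0:c1].copy()

def occlude_region(image, region, fill=0):
    """Occlusion setting: return a copy with the named region blanked out."""
    r0, r1, c0, c1 = _box(image.shape, region)
    out = image.copy()
    out[r0:r1, c0:c1] = fill
    return out

# Random stand-in for an aligned grayscale face image (not real data).
face = np.random.default_rng(0).random((64, 64))
eyes_only = crop_region(face, "eyes")
no_mouth = occlude_region(face, "mouth")
```

Because the crops are taken at fixed positions, this kind of protocol relies on the faces having been aligned beforehand, as stated in the text.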

5 Conclusions

We explored a new dimensionality reduction model that implicitly employs a cosine dissimilarity metric by transforming the real data to the complex domain and learns a complex projection matrix by solving an unconstrained optimization problem. Within this simple framework, and without any added label information, the proposed method, proCMF, successfully extracted local facial features that correspond to the salient areas in facial expression formation. The experiments show that the proposed proCMF algorithm outperforms the popular NMF algorithms for facial expression recognition. The proposed model also shows its potential for handling diverse scenarios that involve both whole and occluded faces. In the future, we will extend the current model by incorporating more constraints and improving the optimization method to enhance FER performance. To confirm the effectiveness of the proposed method, we will extend the evaluation to spontaneous facial expressions and apply the model directly to complex spectrogram features in sound processing, especially sound source separation.
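The link between the complex-domain transform and the cosine dissimilarity can be checked numerically. The sketch below assumes an Euler-style mapping z = e^{iαπx}/√2, as used in Euler principal component analysis [32]; the function names and the choice α = 1 are assumptions for illustration and are not taken from this section.

```python
import numpy as np

def to_complex(x, alpha=1.0):
    """Assumed Euler-style mapping of real data in [0, 1] to the complex domain."""
    return np.exp(1j * alpha * np.pi * np.asarray(x, dtype=float)) / np.sqrt(2)

def cosine_dissimilarity(x, y, alpha=1.0):
    """Sum over elements of 1 - cos(alpha * pi * (x - y))."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.sum(1.0 - np.cos(alpha * np.pi * d))

# Two random real vectors standing in for vectorized images (not real data).
rng = np.random.default_rng(0)
x, y = rng.random(100), rng.random(100)

# Squared Euclidean distance in the complex domain ...
lhs = np.linalg.norm(to_complex(x) - to_complex(y)) ** 2
# ... equals the cosine dissimilarity of the original real data.
rhs = cosine_dissimilarity(x, y)
```

Under this assumed mapping the identity follows from |e^{ia} − e^{ib}|² = 2 − 2cos(a − b), so minimizing a squared-error cost on the transformed data implicitly minimizes a cosine-based dissimilarity on the real data.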

References

  1. Z Zeng, M Pantic, GI Roisman, TS Huang, A survey of affect recognition methods: audio, visual, and spontaneous expressions. IEEE Trans. Pattern Anal. Mach. Intell. 31(1), 39–58 (2009)

  2. FS Abousaleh, TK Lim, WH Cheng, NH Yu, MA Hossain, MF Alhamid, A novel comparative deep learning framework for facial age estimation. EURASIP J. Image Video Process. 2016(1), 47 (2016)

  3. TH Tsai, WH Cheng, CW You, MC Hu, AW Tsui, HY Chi, Learning and recognition of on-premise signs (OPSs) from weakly labeled street view images. IEEE Trans. Image Process. 23(3), 1047–1059 (2014)

  4. SC Hidayati, CW You, WH Cheng, KL Hua, Learning and recognition of clothing genres from full-body images. IEEE Trans. Cybern. (2017)

  5. WH Cheng, CW Wang, JL Wu, Video adaptation for small display based on content recomposition. IEEE Trans. Circuits Syst. Video Technol. 17(1), 43–58 (2007)

  6. P Ekman, WV Friesen, JC Hager, Facial Action Coding System: The Manual on CD ROM (A Human Face, 2002)

  7. S Kaltwang, S Todorovic, M Pantic, in IEEE CVPR. Latent trees for estimating intensity of facial action units (2015), pp. 296–304

  8. H Jung, S Lee, J Yim, S Park, J Kim, in IEEE ICCV. Joint fine-tuning in deep neural networks for facial expression recognition (2015), pp. 2983–2991

  9. JS Riera, K Srinivasan, KL Hua, WH Cheng, MA Hossain, MF Alhamid, Robust RGB-D hand tracking using deep learning priors. IEEE Trans. Circuits Syst. Video Technol. (2017)

  10. K He, X Zhang, S Ren, J Sun, in IEEE CVPR. Deep residual learning for image recognition (2016), pp. 770–778

  11. K Zhang, Y Huang, Y Du, L Wang, Facial expression recognition based on deep evolutional spatial-temporal networks. IEEE Trans. Image Process. 26(9), 4193–4203 (2017)

  12. D Lee, H Seung, Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999)

  13. D Lee, H Seung, in NIPS. Algorithms for non-negative matrix factorization (2000), pp. 556–562

  14. B Wu, T Mei, WH Cheng, YD Zhang, Unfolding Temporal Dynamics: Predicting Social Media Popularity Using Multi-Scale Temporal Decomposition. The 30th AAAI Conference on Artificial Intelligence (AAAI - Association for the Advancement of Artificial Intelligence, Phoenix, 2016), pp. 12–17

  15. B Heisele, P Ho, J Wu, T Poggio, Face recognition: component-based versus global approaches. Comput. Vis. Image Underst. 91, 6–12 (2003)

  16. J Ellison, D Massaro, Featural evaluation, integration, and judgment of facial affect. J. Exp. Psychol. Hum. Percept. Perform. 23(1), 213–226 (1997)

  17. A Cichocki, R Zdunek, S Amari, in Int. Conf. Independent Component Analysis and Signal Separation. Csiszár’s divergences for non-negative matrix factorization: family of new algorithms (2006), pp. 32–39

  18. D Kong, C Ding, H Huang, in ACM Int. Conf. Information and Knowledge Management. Robust nonnegative matrix factorization using L2,1 norm (2011), pp. 673–682

  19. R Sandler, M Lindenbaum, Nonnegative matrix factorization with earth mover’s distance metric for image analysis. IEEE Trans. Pattern Anal. Mach. Intell. 33(8), 1590–1602 (2011)

  20. N Guan, D Tao, Z Luo, J Shawe-Taylor, MahNMF: Manhattan non-negative matrix factorization (2012). https://arxiv.org/abs/1207.3438v1. Accessed 14 July 2012

  21. Z Yuan, E Oja, in Scandinavian Conf. Image Analysis. Projective nonnegative matrix factorization for image compression and feature extraction (2005), pp. 333–342

  22. I Buciu, I Pitas, in ICPR. Application of non-negative and local non negative matrix factorization to facial expression recognition (2004), pp. 288–291

  23. S Nikitidis, A Tefas, N Nikolaidis, I Pitas, in ICIP. Facial expression recognition using clustering discriminant non-negative matrix factorization (2011), pp. 3001–3004

  24. S Nikitidis, A Tefas, N Nikolaidis, I Pitas, Subclass discriminant nonnegative matrix factorization for facial image analysis. Pattern Recogn. 45(12), 4080–4091 (2012)

  25. X Chen, T Huang, Facial expression recognition: a clustering-based approach. Pattern Recogn. Lett. 24(9), 1295–1302 (2003)

  26. R Zhi, M Flierl, Q Ruan, W Kleijn, Graph-preserving sparse nonnegative matrix factorization with application to facial expression recognition. IEEE Trans. Syst. Man Cybern. 41(1), 38–52 (2011)

  27. Y Tu, C Hsu, in IEEE ICPR. Dual subspace nonnegative matrix factorization for person-invariant facial expression recognition (2012), pp. 2391–2394

  28. H Kung, Y Tu, C Hsu, Dual subspace nonnegative graph embedding for identity-independent expression recognition. IEEE Trans. Inf. Forensics Secur. 10(3), 626–638 (2015)

  29. Z Yang, Z Yuan, J Laaksonen, Projective non-negative matrix factorization with applications to facial image processing. Int. J. Pattern Recognit Artif Intell. 21(8), 1353–1362 (2007)

  30. Z Yang, E Oja, Linear and nonlinear projective non-negative matrix factorization. IEEE Trans. Neural Netw. 21, 734–749 (2010)

  31. M Moskowitz, A Course in Complex Analysis in One Variable (World Scientific Publishing Co., Singapore, 2002), p. 7

  32. S Liwicki, G Tzimiropoulos, S Zafeiriou, M Pantic, Euler principal component analysis. Int. J. Comput. Vis. 101(3), 498–518 (2013)

  33. J Kim, Y He, H Park, Algorithms for nonnegative matrix and tensor factorizations: a unified view based on block coordinate descent framework. J. Glob. Optim. 58, 285–319 (2013)

  34. J Barata, M Hussein, The Moore–Penrose pseudoinverse: a tutorial review of the theory. Braz. J. Phys. 42, 146–165 (2012)

  35. DH Brandwood, A complex gradient operator and its application in adaptive array theory. IEEE Proc. F 130(1), 11–16 (1983)

  36. C Lin, Projected gradient methods for non-negative matrix factorization. Neural Comput. 19, 2756–2779 (2007)

  37. D Bertsekas, On the Goldstein-Levitin-Polyak gradient projection method. IEEE Trans. Automat. Contr. 21(2), 174–184 (1976)

  38. P Lucey, JF Cohn, T Kanade, J Saragih, Z Ambadar, I Matthews, in Proceedings of IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Workshops. The extended Cohn–Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression (2010), pp. 94–101

  39. M Lyons, S Akamatsu, M Kamachi, J Gyoba, in Proceedings of IEEE Conf. Autom. Face Gesture Recog. Coding facial expressions with Gabor wavelets (1998), pp. 200–205

  40. C Ding, T Li, MI Jordan, Convex and semi-nonnegative matrix factorizations. IEEE Trans. Pattern Anal. Mach. Intell. 32(1), 45–55 (2010)

  41. D Wang, T Li, C Ding, in Proceedings of IEEE ICDM. Weighted feature subset nonnegative matrix factorization and its applications to document understanding (2010), pp. 541–550

  42. N Guan, D Tao, Z Luo, B Yuan, NeNMF: an optimal gradient method for non-negative matrix factorization. IEEE Trans. Signal Process. 60(6), 2882–2898 (2012)

  43. K Fukunaga, Statistical Pattern Recognition (Academic Press Professional, Inc., San Diego, 1990)

  44. J Yang, S Yang, Y Fu, X Li, T Huang, in Proceedings of IEEE CVPR. Nonnegative graph embedding (2008), pp. 1–8

  45. H Zhang, ZJ Zha, Y Yang, S Yan, TS Chua, Robust (semi) nonnegative graph embedding. IEEE Trans. Image Process. 23(1), 2996–3012 (2014)

Acknowledgements

This research was financially supported in part by the National Central University, Taiwan, through the NCU International Student Scholarship. The authors would like to thank the anonymous reviewers for their valuable comments that significantly improved the quality of this paper.

Author information

Contributions

VHD made the main contributions to the conception and design of the tracking algorithms and drafted the article. BTP contributed to the design of the experiments. YSL, JJD, and PTB provided technical advice. MQB checked the manuscript and contributed to the rearrangement of the materials. JJW offered critical suggestions on the design of the algorithms, made significant revisions for important intellectual content, and gave final approval of the version to be submitted. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jia-Ching Wang.

Ethics declarations

Consent for publication

Informed consent was obtained from all authors included in the study.

Competing interests

The authors declare that they have no competing interests.

Ethical approval

All data and procedures in this paper were in accordance with the ethical standards of the research community. This paper does not contain any studies with human participants or animals performed by any of the authors.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Duong, VH., Lee, YS., Ding, JJ. et al. Projective complex matrix factorization for facial expression recognition. EURASIP J. Adv. Signal Process. 2018, 10 (2018). https://doi.org/10.1186/s13634-017-0521-9

  • DOI: https://doi.org/10.1186/s13634-017-0521-9

Keywords