EURASIP Journal on Applied Signal Processing 2002:10, 1021–1038
© 2002 Hindawi Publishing Corporation

Parameterized Facial Expression Synthesis Based on MPEG-4

In the framework of MPEG-4, one can include applications where virtual agents, utilizing both textual and multisensory data, including facial expressions and nonverbal speech, help systems become accustomed to the actual feelings of the user. Applications of this technology are expected in educational environments, virtual collaborative workplaces, communities, and interactive entertainment. Facial animation has gained much interest within the MPEG-4 framework, with implementation details remaining an open research area (Tekalp, 1999). In this paper, we describe a method for enriching human computer interaction, focusing on analysis and synthesis of primary and intermediate facial expressions (Ekman and Friesen, 1978). To achieve this goal, we utilize facial animation parameters (FAPs) to model primary expressions and describe a rule-based technique for handling intermediate ones. A relation between FAPs and the activation parameter proposed in classical psychological studies is established, leading to parameterized facial expression analysis and synthesis notions that are compatible with the MPEG-4 standard.


INTRODUCTION
Research in facial expression analysis and synthesis has mainly concentrated on primary or archetypal emotions. In particular, sadness, anger, joy, fear, disgust, and surprise are the categories of emotions that have attracted most of the interest in human computer interaction environments. Very few studies exploring nonarchetypal emotions [1] have appeared in the computer science literature. This trend may be due to the great influence of the works of Ekman and Friesen [2,3] and Izard et al. [4], who proposed that the archetypal emotions correspond to distinct facial expressions which are supposed to be universally recognizable across cultures. In contrast, psychological researchers have extensively investigated [5,6] a broader variety of emotions. An extensive survey on emotion analysis can be found in [7].
MPEG-4 indicates an alternative way of modeling facial expressions and the underlying emotions, which is strongly influenced by neurophysiological and psychological studies. The facial animation parameters (FAPs) that are utilized in the framework of MPEG-4 for facial animation purposes are strongly related to the action units (AUs), which constitute the core of the facial action coding system (FACS) [8].
One of the studies carried out by psychologists that can be useful to researchers in computer graphics and machine vision is that of Whissel [5], who suggested that emotions are points in a space with a relatively small number of dimensions, which to a first approximation are only two: activation and evaluation. From a practical point of view, evaluation seems to express internal feelings of the subject, and its estimation through face formations is intractable. On the other hand, activation is related to the movement of facial muscles and can be more easily estimated from facial characteristics.
In this work, we present a methodology for analyzing and synthesizing both primary and intermediate expressions, taking into account the results of Whissel's study and in particular the activation parameter. The proposed methodology consists of three steps.

Description of the archetypal expressions through particular FAPs
In order to do this, we translate facial muscle movements (which describe expressions through muscle actions) into FAPs and create a vocabulary of FAPs for each archetypal expression. FAPs required for the description of the archetypal expressions are also experimentally verified through analysis of prototype datasets. In order to make comparisons with real expression sequences, we model FAPs employed in the facial expression formation through the movement of particular feature points (FPs); the selected FPs can be automatically detected from real images or video sequences. The derived models can also serve as a bridge between expression analysis and expression synthesis disciplines [9].

Estimation of the range of variation of FAPs that are involved in each of the archetypal expressions
This is achieved by analyzing real images and video sequences as well as by animating synthesized examples.

Modelling of intermediate expressions
This is achieved through combination, in the framework of a rule-based system, of the activation parameter, known from Whissel's work, with the description of the archetypal expressions by FAPs. Figure 1 illustrates the way the proposed scheme functions. The facial expression synthesis system operates either by utilizing FAP values estimated by an image analysis subsystem, or by rendering actual expressions recognized by a fuzzy rules system. In the former case, the motion of protuberant facial points is analyzed and translated to FAP value variation, which in turn is rendered using the synthetic face model, so as to reproduce the expression in question. Should the results of the analysis coincide with the system's knowledge of the definition of a facial expression, then the expression can be rendered using predefined FAP alteration tables. These tables are computed using the known definition of archetypal emotions, fortified by video data of actual human expressions. In this case, any intermediate expressions can be rendered using interpolation rules derived from the emotion wheel.
The paper is organized as follows. In Sections 2, 3, and 4 the three legs of the proposed methodology are presented. In Section 5, a way of utilizing the proposed emotion synthesis scheme for emotion analysis purposes is described. In Section 6, experimental results, which illustrate the performance of the presented approach, are given. Finally, conclusions are given in Section 7.

[Figure 1: The proposed scheme. Block labels: requests for synthesizing expressions; activation for several emotion-related words (a priori knowledge); FAPs involved in archetypal expressions; expressions' profiles.]

DESCRIPTION OF THE ARCHETYPAL EXPRESSIONS USING FAPS
In general, facial expressions and emotions are described by a set of measurements and transformations that can be considered atomic with respect to the MPEG-4 standard. In this way, we can describe both the anatomy of a human face, basically through FDPs, and animation parameters, with groups of distinct tokens, eliminating the need for specifying the topology of the underlying geometry. These tokens can then be mapped to automatically detected measurements and indications of motion on a video sequence and, thus, help to approximate a real expression conveyed by the subject by means of a synthetic one. Modelling facial expressions and underlying emotions through FAPs serves several purposes: (i) it provides compatibility of synthetic sequences, created using the proposed methodology, with the MPEG-4 standard; (ii) archetypal expressions occur rather infrequently; in most cases, emotions are expressed through variation of a few discrete facial features which are directly related to particular FAPs. Moreover, distinct FAPs can be utilized for communication between humans and computers in a paralinguistic form, expressed by facial signs; (iii) FAPs do not correspond to specific models or topologies; synthetic expressions can be animated by models or characters different from the one that corresponds to the real subject.
Two basic issues should be addressed when modelling archetypal expressions: (i) estimation of the FAPs that are involved in their formation, (ii) definition of the FAP intensities. The former is examined in the current section, while the latter is explained in Section 5.
The facial action coding system (FACS) has strongly influenced research on expression analysis. FACS is a system which tries to distinguish the visually distinguishable facial movements using the knowledge of facial anatomy. FACS uses action units (AUs) as measurement units. An AU could combine the movement of two muscles or work in the reverse way, that is, split into several muscle movements.
MPEG-4 FAPs are also strongly related to the AUs; this is shown in Table 1. Description of archetypal expressions by means of muscle movements and AUs has been the starting point for setting the archetypal expression description through FAPs.
Hints for this mapping were obtained from psychological studies [2,10,11] which refer to face formation during expression generation, as well as from experimental data provided by classic databases such as Ekman's (static) and MediaLab's (dynamic); see also Section 3. Table 2 illustrates the description of archetypal expressions and some variations of them, using the MPEG-4 FAP terminology. It should be noted that the sets shown in Table 2 constitute the vocabulary of FAPs to be used for each archetypal expression, and not a particular profile for synthesizing expressions; this means that, if animated, they would not necessarily produce the corresponding expression. In the following, we define an expression profile to be a subset of the FAPs vocabulary corresponding to a particular expression, accompanied by FAP intensities, that is, the actual ranges of variation, which if animated creates the requested expression. Several expression profiles based on the FAPs vocabulary proposed in Table 2 are shown in the experimental results section.

THE RANGE OF VARIATION OF FAPS IN REAL VIDEO SEQUENCES
An important issue, useful to both emotion analysis and synthesis systems, is the range of variation of the FAPs that are involved in facial expression formation. From the synthesis point of view, a study has been carried out [5] which refers to the definition of FAP ranges. However, the suggested ranges of variation are rather loose and cannot be used for analysis purposes. In order to have clear cues about the FAPs' range of variation in real video sequences, we analyzed two well-known datasets showing archetypal expressions, Ekman's (static) [2] and MediaLab's (dynamic) [12], and computed statistics about the involved FAPs. Both sets show extreme cases of expressions, rather than everyday ones. However, they can be used for setting limits to the variance of the respective FAPs [13]. To achieve this, a way of modeling FAPs through the movement of facial points is required. Analysis of the FAPs' range of variation in real images and video sequences is used next for two purposes: (i) to verify and complete the proposed vocabulary for each archetypal expression, (ii) to define profiles of archetypal expressions.

Table 1: Relation between AUs and FAPs (entries as recovered from the flattened layout; the AU label of the first row is lost, and AUs listed without FAPs have no recoverable entry).
AU | FAPs
(lost) | raise_l_o_eyebrow + raise_r_o_eyebrow + raise_l_m_eyebrow + raise_r_m_eyebrow + raise_l_i_eyebrow + raise_r_i_eyebrow + squeeze_l_eyebrow + squeeze_r_eyebrow
AU5 | close_t_l_eyelid + close_t_r_eyelid
AU6 | lift_l_cheek + lift_r_cheek
AU7 | close_b_l_eyelid + close_b_r_eyelid
AU8 |
AU9 | lower_t_midlip + raise_nose + stretch_l_nose + stretch_r_nose
AU10 | raise_nose (+ stretch_l_nose + stretch_r_nose) + lower_t_midlip
AU11 |
AU12 | push_t_lip + push_b_lip (+ lower_lowerlip + lower_t_midlip + raise_b_midlip)
AU13 |
AU14 |
AU15 | lower_l_cornerlip + lower_r_cornerlip
AU16 |
AU17 | depress_chin
AU18 |
AU19 |
AU20 | raise_b_midlip + lower_l_cornerlip + lower_r_cornerlip + stretch_l_cornerlip + stretch_r_cornerlip + lower_t_lip_lm + raise_b_lip_lm + lower_t_lip_lm_o + raise_b_lip_lm_o + raise_l_cornerlip_o + lower_t_lip_rm + raise_b_lip_rm + lower_t_lip_rm_o + raise_b_lip_rm_o + raise_r_cornerlip_o

Modeling FAPs through FP's movement
Although FAPs are practical and very useful for animation purposes, they are inadequate for analyzing facial expressions from video scenes or still images. The main reason for that is the absence of a clear quantitative definition of FAPs (at least of most of them) as well as their nonadditive nature. Note here that the same problem holds for the FACS action units. This is to be expected, given the strong relationship between particular AUs and FAPs (see Table 1). In order to be able to measure FAPs in real images and video sequences, we should define a way of describing them through the movement of some points that lie in the facial area and can be automatically detected. Such a description could take advantage of the extensive research on automatic facial point detection [14,15]. Quantitative description of FAPs based on particular FPs, which correspond to the movement of protuberant facial points, provides the means of bridging the gap between expression analysis and animation/synthesis. In the expression analysis case, the nonadditive property of the FAPs can be addressed by a fuzzy rule system, similar to the one described later for creating profiles for intermediate expressions.

[Table 2, row Joy (truncated in the source): open_jaw ($F_3$), lower_t_midlip ($F_4$), raise_b_midlip ($F_5$), stretch_l_cornerlip ($F_6$), stretch_r_cornerlip ($F_7$), raise_l_cornerlip ($F_{12}$), raise_r_cornerlip ($F_{13}$), close_t_l_eyelid ($F_{19}$), ...]
Quantitative modeling of FAPs is implemented using the features labeled as $f_i$ ($i = 1, \ldots, 15$) in Table 3 [16]. The feature set employs FPs that lie in the facial area and, under some constraints, can be automatically detected and tracked. It consists of distances between these protuberant points, noted as $s(x, y)$, where $x$ and $y$ correspond to feature points shown in Figure 2b; some of these points are constant during expressions and are used as reference points. Distances between reference points are used for normalization (see Figure 2a). The units for $f_i$ are identical to those of the corresponding FAPs, even in cases where no one-to-one relation exists.
It should be noted that not all FAPs included in the vocabularies shown in Table 2 can be modeled by distances between facial protuberant points (e.g., raise_b_lip_lm_o, lower_t_lip_lm_o). In such cases the corresponding FAPs are retained in the vocabulary and their ranges of variation are experimentally defined based on facial animations. Moreover, some features serve for the estimation of the range of variation of more than one FAP (e.g., features $f_{12}$, $f_{13}$, $f_{14}$, and $f_{15}$).
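As an illustration of the distance-based features described above, the following is a minimal sketch only. It assumes that a feature is the change of a distance $s(x, y)$ with respect to the neutral face, divided by a reference distance measured on the neutral face. The function names, the feature point labels, and the sign convention are hypothetical; the actual feature definitions are those of Table 3.

```python
import math


def s(p, q):
    """Euclidean distance between two feature points given as (x, y) tuples."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def feature_value(points, neutral_points, pair, ref_pair):
    """Normalized change of one distance relative to the neutral face.

    points / neutral_points: dicts mapping a feature point label to (x, y).
    pair: the two labels whose distance is tracked (hypothetical labels).
    ref_pair: two reference points that stay fixed during expressions;
              their neutral distance is used for normalization.
    The sign convention (current minus neutral) is an assumption.
    """
    d = s(points[pair[0]], points[pair[1]])
    d_neutral = s(neutral_points[pair[0]], neutral_points[pair[1]])
    ref = s(neutral_points[ref_pair[0]], neutral_points[ref_pair[1]])
    if ref == 0:
        raise ValueError("reference distance must be non-zero")
    return (d - d_neutral) / ref


# Hypothetical coordinates: the tracked distance grows from 10 to 12 pixels,
# and the reference distance on the neutral face is 20 pixels.
neutral = {"a": (0, 0), "b": (0, 10), "r1": (0, 0), "r2": (20, 0)}
current = {"a": (0, 0), "b": (0, 12), "r1": (0, 0), "r2": (20, 0)}
change = feature_value(current, neutral, ("a", "b"), ("r1", "r2"))
```

Normalizing by a distance between fixed reference points makes the value independent of image scale, which is what allows it to be expressed in FAP-like units.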

Vocabulary verification
To obtain clear cues about the FAPs' range of variation in real video sequences, as well as to verify the vocabulary of FAPs involved in each archetypal emotion, we analyzed two well-known datasets showing archetypal expressions: Ekman's (static) [2] and MediaLab's (dynamic) [12]. The analysis was based on the quantitative modelling of FAPs described in the previous section. Computed statistics are summarized in Table 4. Mean values provide typical values that can be used for particular expression profiles, while the standard deviation can define the range of variation (see also Section 3.3). The units of the values shown are those of the corresponding FAPs [17]. The symbol (*) expresses the absence of the corresponding FAP from the vocabulary of the particular expression, while the symbol (-) shows that, although the corresponding FAP is included in the vocabulary, it has not been verified by the statistical analysis. The latter case shows that not all FAPs included in the vocabulary are experimentally verified.
The detection of the facial points subset used to describe the FAPs involved in the archetypal expressions was based on the work presented in [18]. To obtain accurate detection, in many cases, human assistance was necessary. The authors are working towards a fully automatic implementation of the FP detection procedure.

Table 3: Quantitative FAPs modeling: (1) $s(x, y)$ is the Euclidean distance between the FPs $x$ and $y$ shown in Figure 2b, (2) $D_{i\text{-NEUTRAL}}$ refers to the distance $D_i$ when the face is in its neutral position.
FAP name | Feature for the description | Utilized feature | Unit
squeeze_l_eyebrow ($F_{37}$) | ... | ... | ...

Figure 3 illustrates particular statistics, computed over the previously described datasets, for the expression joy. In all diagrams, the horizontal axis shows the indices of the features of Table 3, while the vertical axis shows the value of the corresponding feature: Figure 3a shows the minimum values of the features, Figure 3b the maximum values, and Figure 3c the mean values. From this figure, it is confirmed, for example, that lower_t_midlip (feature with index 3), which refers to lowering the middle of the upper lip, is employed, since even the maximum value for this FAP is below zero. In the same way, the FAPs raise_l_m_eyebrow, raise_r_m_eyebrow, close_t_l_eyelid, close_t_r_eyelid, close_b_l_eyelid, close_b_r_eyelid, stretch_l_cornerlip, stretch_r_cornerlip (indices 9, 10, 12, 13, 14) are verified. Some of the above FAPs are described using a single variable. For example, stretch_l_cornerlip and stretch_r_cornerlip are both modelled via $f_{14}$. The values shown in Table 4 are obtained by dividing the values of feature $f_{14}$. Similarly to Figure 3, Figure 4 illustrates particular statistics for the expression surprise.

Creating archetypal expression profiles
An archetypal expression profile is a set of FAPs accompanied by the corresponding range of variation, which, if animated, produces a visual representation of the corresponding emotion. Typically, a profile of an archetypal expression consists of a subset of the corresponding FAPs' vocabulary coupled with the appropriate ranges of variation. The statistical expression analysis performed on the above-mentioned datasets is useful for FAPs' vocabulary completion and verification, as well as for a rough estimation of the range of variation of FAPs, but not for profile creation. In order to define exact profiles for the archetypal expressions, we combined the following three steps: (a) we defined subsets of FAPs that are candidates to form an archetypal expression, by translating the face formations proposed by psychological studies [2,10,11] to FAPs, (b) we used the corresponding ranges of variation obtained from Table 4, (c) we animated the corresponding profiles to verify the appropriateness of the derived representations.
The initial range of variation for the FAPs has been computed as follows: let $m_{i,j}$ and $\sigma_{i,j}$ be the mean value and standard deviation of FAP $F_j$ for the archetypal expression $i$ (where $i$ = {1 ⇒ Anger, 2 ⇒ Sadness, 3 ⇒ Joy, 4 ⇒ Disgust, 5 ⇒ Fear, 6 ⇒ Surprise}), as estimated in Table 4. The initial range of variation $X_{i,j}$ of FAP $F_j$ for the archetypal expression $i$ is defined as
$$X_{i,j} = [\,m_{i,j} - \sigma_{i,j},\; m_{i,j} + \sigma_{i,j}\,]$$
for bidirectional, and
$$X_{i,j} = [\,\max(0,\, m_{i,j} - \sigma_{i,j}),\; m_{i,j} + \sigma_{i,j}\,]$$
or
$$X_{i,j} = [\,m_{i,j} - \sigma_{i,j},\; \min(0,\, m_{i,j} + \sigma_{i,j})\,]$$
for unidirectional FAPs [17].
Generally speaking, for animation purposes, every MPEG-4 decoder has to provide and use an MPEG-4 compliant face model whose geometry can be defined using FDPs, or should define the animation rules based on face animation tables (FATs). Using FATs, we can explicitly specify the model vertices that will be spatially deformed for each FAP, as well as the magnitude of the deformation. This is in essence a mechanism that maps each FAP, which represents a high-level semantic animation directive, to a lower-level, model-specific deformation. An MPEG-4 decoder can use its own animation rules or receive a face model accompanied by the corresponding FATs [19,20]. For our experiments on setting the archetypal expression profiles, we used the face model developed in the context of the European project ACTS MoMuSys [21], freely available at http://www.iso.ch/ittf. Table 5 shows some examples of archetypal expression profiles, which were created based on our method. Figure 5 shows some examples of animated profiles. Figure 5a shows a particular profile for the archetypal expression anger, while Figures 5b and 5c show alternative profiles of the same expression. The difference between them is due to FAP intensities. Difference in FAP intensities is also shown in Figures 5d and 5e, both illustrating the same profile of the expression surprise. Finally, Figure 5f shows an example of a profile of the expression joy.
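The initial range of variation defined at the start of this subsection can be sketched as follows. This is an illustrative sketch that assumes the range is the mean plus or minus one standard deviation, clipped at zero for unidirectional FAPs. The function name and the `direction` argument are hypothetical, and the numeric values in the usage lines are made up rather than taken from Table 4.

```python
def initial_range(mean, std, direction="both"):
    """Initial range of variation of one FAP for one archetypal expression.

    direction (hypothetical argument):
      "both"     - bidirectional FAP, range is [mean - std, mean + std]
      "positive" - unidirectional FAP taking only non-negative values
      "negative" - unidirectional FAP taking only non-positive values
    Clipping at zero for unidirectional FAPs is an assumption of this sketch.
    """
    if std < 0:
        raise ValueError("standard deviation must be non-negative")
    lo, hi = mean - std, mean + std
    if direction == "both":
        return (lo, hi)
    if direction == "positive":
        return (max(0.0, lo), hi)
    if direction == "negative":
        return (lo, min(0.0, hi))
    raise ValueError("direction must be 'both', 'positive' or 'negative'")


# Made-up statistics, for illustration only.
bidirectional = initial_range(100, 30)
positive_only = initial_range(20, 50, "positive")
negative_only = initial_range(-20, 50, "negative")
```

The clipping keeps a unidirectional FAP from receiving values on the side of zero it cannot take, even when the standard deviation is larger than the magnitude of the mean.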

CREATING PROFILES FOR INTERMEDIATE EXPRESSIONS
In this section we propose a way of creating profiles for intermediate expressions, used to describe the visual portion of corresponding emotions. The limited number of studies carried out by computer scientists and engineers [7] that deal with emotions other than the archetypal ones led us to search the bibliographies of other disciplines. Psychologists have examined a broader set of emotions [13], but very few of the corresponding studies provide results that can be exploited in the computer graphics and machine vision fields. One of these studies has been carried out by Whissel [5] and suggests that emotions are points in a space spanning a relatively small number of dimensions, which to a first approximation seem to occupy two axes: activation and evaluation. Activation is the degree of arousal associated with the term. As shown in the "activation" column of Table 6, terms like patient (at 3.3) represent a midpoint, surprised (over 6) represents high activation, and bashful (around 2) represents low activation. Evaluation is the degree of pleasantness associated with the term. As shown in the "evaluation" column of Table 6, guilty (at 1.1) represents the negative extreme and delighted (at 6.4) the positive extreme [5]. From a practical point of view, evaluation seems to express internal feelings of the subject, and its estimation through face formations is intractable. On the other hand, activation is related to the movement of facial muscles and can be easily estimated from facial characteristics. The third column in Table 6 represents Plutchik's [6] observation that emotion terms are unevenly distributed through the space defined by dimensions like Whissel's. Instead, they tend to form an approximately circular pattern called the emotion wheel. The values shown refer to an angular measure, which runs from Acceptance (0) to Disgust (180).
For the creation of profiles for intermediate emotions we consider two cases: (a) emotions that are similar, in nature, to an archetypal one; for example they may differ only in the intensity of muscle actions; (b) emotions that cannot be considered as related to any of the archetypal ones.
In both cases we proceed with the following steps: (i) utilize either the activation parameter or Plutchik's angular measure as a priori knowledge about the intensity of facial actions for several emotions. This knowledge is combined with the profiles of archetypal expressions, through a rule-based system, to create profiles for intermediate emotions; (ii) animate the produced profiles to test and correct their appropriateness in terms of visual similarity with the requested emotion.

Same universal emotion category
As a general rule, we can define six general categories, each one characterized by an archetypal emotion; within each of these categories, intermediate expressions are described by different emotional and optical intensities, as well as minor variations in expression details. From the synthesis point of view, emotions that belong to the same category can be rendered by animating the same FAPs using different intensities. For example, the emotion group fear also contains worry and terror [11]; these two emotions can be synthesized by reducing or increasing the intensities of the employed FAPs, respectively. In the case of expression profiles, this affects the range of variation of the corresponding FAPs, which is appropriately translated; the fuzziness that is introduced by the varying scale of the change of FAP intensity also helps to differentiate the output mildly in similar situations. This ensures that the synthesis will not render "robotlike" animation, but considerably more realistic results.
Let $P_i^{(k)}$ be the kth profile of emotion $i$ and $X_{i,j}^{(k)}$ be the range of variation of FAP $F_j$ involved in $P_i^{(k)}$. If $A$, $I$ are emotions belonging to the same universal emotion category, $A$ being the archetypal and $I$ the intermediate one, then the following rules are applied.

Rule 1. $P_A^{(k)}$ and $P_I^{(k)}$ employ the same FAPs.
Rule 2. The range of variation $X_{I,j}^{(k)}$ is computed by $X_{I,j}^{(k)} = (a_I / a_A)\, X_{A,j}^{(k)}$.
Rule 3. $a_A$ and $a_I$ are the values of the activation parameter for emotion words $A$ and $I$ obtained from Whissel's study [5].
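The rules above for emotions of the same universal category can be sketched as follows. This is a minimal illustration that assumes every range of the archetypal profile is scaled by the ratio of activation values, as in the afraid/terrified/worried example of the experimental section (activation values 4.9, 6.3, and 3.9). The function names, the dictionary layout, and the FAP ranges in the usage lines are hypothetical.

```python
def scale_range(rng, a_intermediate, a_archetypal):
    """Scale one range of variation by the activation ratio a_I / a_A."""
    if a_archetypal <= 0 or a_intermediate <= 0:
        raise ValueError("activation values are assumed to be positive")
    factor = a_intermediate / a_archetypal
    lo, hi = rng
    return (factor * lo, factor * hi)


def intermediate_profile(archetypal_profile, a_intermediate, a_archetypal):
    """Build the profile of an intermediate emotion of the same category.

    The same FAPs are kept (Rule 1) and each range is scaled (Rule 2).
    archetypal_profile maps a FAP label to a (low, high) range.
    """
    return {
        fap: scale_range(rng, a_intermediate, a_archetypal)
        for fap, rng in archetypal_profile.items()
    }


# Hypothetical ranges for a profile of "afraid"; activation values from the text.
afraid = {"F3": (49.0, 98.0), "F19": (-98.0, -49.0)}
terrified = intermediate_profile(afraid, 6.3, 4.9)
worried = intermediate_profile(afraid, 3.9, 4.9)
```

Because the scaling factor is positive, the order of the interval endpoints is preserved, so a more activated term such as terrified obtains a wider and more extreme range than a less activated one such as worried.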

Emotions lying between archetypal ones
Creating profiles for emotions that do not clearly belong to a universal category is not straightforward. Apart from estimating the range of variations for FAPs, we should first define the vocabulary of FAPs for the particular emotion.
In order to proceed, we utilize both the emotion wheel of Plutchik [6], and especially the angular measure (shown also in Table 6), and the activation parameter. Let $I$ be an intermediate emotion lying between archetypal emotions $A_1$ and $A_2$ (which are supposed to be the nearest emotions on either side of $I$) according to their angular measure. Let also $V_{A_1}$ and $V_{A_2}$ be the vocabularies (sets of FAPs) corresponding to $A_1$ and $A_2$, respectively. The vocabulary $V_I$ of emotion $I$ emerges as the union of vocabularies $V_{A_1}$ and $V_{A_2}$, that is, $V_I = V_{A_1} \cup V_{A_2}$.
As already stated in Section 2, defining a vocabulary is not enough for modeling expressions; profiles should be created for this purpose. This poses a number of interesting issues in the case of different FAPs employed in the animation of individual profiles: in our approach, FAPs that are common to both emotions are retained during synthesis, while FAPs used in only one emotion are averaged with the respective neutral position. The same applies in the case of mutually exclusive FAPs: averaging of the intensities usually favors the most exaggerated of the emotions that are combined, whereas FAPs with contradicting intensities are cancelled out. In practice, this approach works successfully, as shown in the actual results that follow. The combination of different, perhaps contradictory or exclusive, FAPs can be used to establish a distinct emotion categorization, similar to the semantic one, with respect to the common or neighboring FAPs that are used to synthesize and animate emotions. Below, we describe the way to merge profiles of archetypal emotions and create profiles of intermediate ones.
Let $P_{A_1}^{(k)}$ be the kth profile of emotion $A_1$ and $P_{A_2}^{(l)}$ the lth profile of emotion $A_2$; then the following rules are applied so as to create a profile $P_I^{(m)}$ for emotion $I$. [...]
Rule 2. If $F_j$ is a FAP involved in both $P_{A_1}^{(k)}$ and $P_{A_2}^{(l)}$ with the same sign (direction of movement), then the range of variation $X_{I,j}^{(m)}$ is computed as a weighted translation of $X_{A_1,j}^{(k)}$ and $X_{A_2,j}^{(l)}$ (where $X_{A_1,j}^{(k)}$ and $X_{A_2,j}^{(l)}$ are the ranges of variation of FAP $F_j$ involved in $P_{A_1}^{(k)}$ and $P_{A_2}^{(l)}$, respectively) in the following way: (i) we compute the translated ranges of variation $t(X_{A_1,j}^{(k)})$ and $t(X_{A_2,j}^{(l)})$ of $X_{A_1,j}^{(k)}$ and $X_{A_2,j}^{(l)}$, (ii) we compute the center and length $c^{(k)}$ [...] and its midpoint is [...].
Rule 3. If $F_j$ is involved in both $P_{A_1}^{(k)}$ and $P_{A_2}^{(l)}$ but with contradictory sign (opposite direction of movement), then the range of variation $X_{I,j}^{(m)}$ is computed by [...]. In case $X_{I,j}^{(m)}$ is eliminated (which is the most likely situation), $F_j$ is excluded from the profile.
Rule 4. If $F_j$ is involved in only one of $P_{A_1}^{(k)}$ and $P_{A_2}^{(l)}$, then the range of variation $X_{I,j}^{(m)}$ is averaged with the corresponding range of the neutral face position, that is, $X_{I,j}^{(m)} = (a_I / (2 a_{A_1}))\, X_{A_1,j}^{(k)}$ or $X_{I,j}^{(m)} = (a_I / (2 a_{A_2}))\, X_{A_2,j}^{(l)}$.
Rule 5. $a_{A_1}$, $a_{A_2}$, and $a_I$ are the values of the activation parameter for emotion words $A_1$, $A_2$, and $I$, obtained from Whissel's study [5].
It should be noted that the profiles, created using the above rules, have to be animated for testing and correction purposes; the final profiles are those that present an acceptable visual similarity with the requested real emotion.
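Two parts of the merging procedure can be sketched directly from the text: the vocabulary of the intermediate emotion as the union of the two archetypal vocabularies, and Rule 4 for FAPs that appear in only one of the two profiles. The sketch below covers only these two parts; FAPs shared by both profiles (Rules 2 and 3) are deliberately left out. All function names, FAP ranges, and activation values in the usage lines are hypothetical.

```python
def merge_vocabularies(v_a1, v_a2):
    """Vocabulary of the intermediate emotion: union of the two vocabularies."""
    return set(v_a1) | set(v_a2)


def rule4_range(rng, a_i, a_a):
    """Rule 4: average the activation-scaled range with the neutral position.

    Averaging with the neutral face (taken as zero) halves the scaled range,
    giving the factor a_I / (2 * a_A).
    """
    if a_a <= 0:
        raise ValueError("activation of the archetypal emotion must be positive")
    factor = a_i / (2.0 * a_a)
    return (factor * rng[0], factor * rng[1])


def faps_in_one_profile_only(p_a1, p_a2, a_i, a_a1, a_a2):
    """Apply Rule 4 to every FAP that appears in exactly one of the profiles.

    FAPs present in both profiles are skipped here on purpose; they fall
    under Rules 2 and 3, which this sketch does not implement.
    """
    merged = {}
    for fap, rng in p_a1.items():
        if fap not in p_a2:
            merged[fap] = rule4_range(rng, a_i, a_a1)
    for fap, rng in p_a2.items():
        if fap not in p_a1:
            merged[fap] = rule4_range(rng, a_i, a_a2)
    return merged


# Hypothetical profiles and activation values, for illustration only.
p_first = {"F3": (400.0, 560.0), "F19": (-100.0, -50.0)}
p_second = {"F19": (-200.0, -100.0), "F31": (30.0, 60.0)}
vocabulary = merge_vocabularies(p_first, p_second)
partial = faps_in_one_profile_only(p_first, p_second, 4.0, 5.0, 4.0)
```

As the text notes, a profile produced this way still has to be animated and corrected by hand before it is accepted.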

THE EMOTION ANALYSIS SUBSYSTEM
In this section, we present a way of utilizing emotion modeling through profiles, for emotion understanding purposes. By doing this, we show that modeling emotions serves both synthesis as well as analysis purposes.
Consider as input to the emotion analysis subsystem a 15-element feature vector $\bar{f}$ that corresponds to the 15 features $f_i$ shown in Table 3. The particular values of $\bar{f}$ can be rendered to FAP values as shown in the same table (see also Section 3.1), resulting in an input vector $\bar{G}$. The elements of $\bar{G}$ express the observed values of the corresponding involved FAPs; for example, $G_1$ refers to the value of $F_{37}$.
Let $X_{i,j}^{(k)}$ be the range of variation of FAP $F_j$ involved in the kth profile $P_i^{(k)}$ of emotion $i$. If $c_{i,j}^{(k)}$ and $s_{i,j}^{(k)}$ are the middle point and length of interval $X_{i,j}^{(k)}$, respectively, then we describe a fuzzy class $A_{i,j}^{(k)}$ for $F_j$, using the membership function $\mu_{i,j}^{(k)}$ shown in Figure 6. Also let $\Delta_i^{(k)}$ be the set of classes $A_{i,j}^{(k)}$ that correspond to profile $P_i^{(k)}$; the beliefs $p_i^{(k)}$ and $b_i$ that a facial state, observed through the vector $\bar{G}$, corresponds to profile $P_i^{(k)}$ and emotion $i$, respectively, are computed through the following equations:
$$p_i^{(k)} = \prod_{A_{i,j}^{(k)} \in \Delta_i^{(k)}} r_{i,j}^{(k)}, \qquad (8)$$
$$b_i = \max_k \, p_i^{(k)}, \qquad (9)$$
where $r_{i,j}^{(k)}$ expresses the relevance of the ith element of the input feature vector with respect to class $A_{i,j}^{(k)}$. Actually $\bar{g} = A(\bar{G}) = \{g_1, g_2, \ldots\}$ is the fuzzified input vector resulting from a singleton fuzzification procedure [22].
If a final decision about the observed emotion has to be made, then the following equation is used: [...]. It is observed through (8) that the various emotion profiles correspond to the fuzzy intersection of several sets and are implemented through a t-norm of the form $t(a, b) = a \cdot b$. Similarly, the belief that an observed feature vector corresponds to a particular emotion results from a fuzzy union of several sets (see (9)) through an s-norm which is implemented as $u(a, b) = \max(a, b)$.
It should be noted that in the previously described emotion analysis system, no hypothesis has been made about the number of recognizable emotions; this number is limited only by the available modeled profiles. Thus, the system can be used for analyzing either as few as the archetypal emotions or many more, using the methodology described in Section 4 to create profiles for intermediate emotions.
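The belief computation of this section can be sketched as follows. The sketch uses the product t-norm for the belief in a profile and the max s-norm for the belief in an emotion, as stated in the text. Two points are assumptions rather than the paper's definitions: the membership function is taken to be triangular (equal to 1 at the middle point of the range and falling to 0 at a distance equal to the range length), since the actual shape is given only in Figure 6, and the final decision is taken to be the emotion with the largest belief. All names and numeric values are hypothetical.

```python
def membership(value, center, length):
    """Assumed triangular membership: 1 at the center, 0 beyond +/- length."""
    if length <= 0:
        return 1.0 if value == center else 0.0
    return max(0.0, 1.0 - abs(value - center) / length)


def profile_belief(observed, profile):
    """Belief that the observed FAP values match one profile (product t-norm).

    observed maps a FAP label to its measured value; profile maps a FAP
    label to its (low, high) range of variation. Every FAP of the profile
    is assumed to be present in the observation.
    """
    belief = 1.0
    for fap, (lo, hi) in profile.items():
        center, length = (lo + hi) / 2.0, hi - lo
        belief *= membership(observed[fap], center, length)
    return belief


def emotion_beliefs(observed, profiles_by_emotion):
    """Belief per emotion: maximum over its profiles (max s-norm)."""
    return {
        emotion: max(profile_belief(observed, p) for p in profiles)
        for emotion, profiles in profiles_by_emotion.items()
    }


def decide(observed, profiles_by_emotion):
    """Assumed decision rule: pick the emotion with the largest belief."""
    beliefs = emotion_beliefs(observed, profiles_by_emotion)
    return max(beliefs, key=beliefs.get)


# Hypothetical profiles and observation, for illustration only.
profiles = {
    "joy": [{"F6": (100.0, 300.0)}],
    "surprise": [{"F6": (-50.0, 50.0)}, {"F3": (500.0, 700.0), "F6": (0.0, 100.0)}],
}
observation = {"F3": 600.0, "F6": 200.0}
beliefs = emotion_beliefs(observation, profiles)
decision = decide(observation, profiles)
```

Adding a recognizable emotion only requires adding its profiles to the dictionary, which mirrors the remark that the number of emotions is limited only by the available profiles.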

EXPERIMENTAL RESULTS
In this section, we show the efficiency of the proposed scheme in synthesizing archetypal and intermediate emotions according to the methodology described in the previous sections. Animated profiles were created using the face model developed in the context of the European project ACTS MoMuSys [21], as well as the 3D model of the software package Poser, edition 4, of Curious Labs Company. This model has separate parts for each moving face part. The Poser model interacts with the controls in Poser and has joints that move realistically, as in a real person. Poser mirrors real face movements by adding joint parameters to each face part. This allows us to manipulate the figure based on those parameters. We can control the eyes, the eyebrows, and the mouth of the model by setting the appropriate parameters; to do this, a mapping from FAPs to Poser parameters is necessary. We did this mapping mainly experimentally; the relationship between FAPs and Poser parameters is more or less straightforward.
The first set of experiments shows synthesized archetypal expressions (see [...] expressions more effectively, since the FAT mechanism can approximate the effect of muscle deformation, which accounts for the shape of the face during expressions. In the case of Figures 9 and 11 the decoder only utilises the FAPs supplied and thus, the final result depends on the predefined mapping between the animation parameters and the low polygon model.

Creating profiles for emotions belonging to the same universal category
In this section, we illustrate the proposed methodology for creating profiles for emotions that belong to the same universal category as an archetypal one. Emotion terms afraid, terrified, and worried are considered to belong to the emotion category fear [11], whose modeling base is the term afraid. Table 7 shows the profiles produced for the terms terrified and worried, which emerged from one of the profiles of afraid (in particular $P_F^{(8)}$). The range of variation $X_{T,j}^{(8)}$ of FAP $F_j$ belonging to the eighth profile of emotion term terrified is computed by the equation $X_{T,j}^{(8)} = (6.3/4.9)\, X_{F,j}^{(8)}$, where $X_{F,j}^{(8)}$ is the range of variation of FAP $F_j$ belonging to the eighth profile of emotion term afraid. Similarly, $X_{W,j}^{(8)} = (3.9/4.9)\, X_{F,j}^{(8)}$ is the range of variation of FAP $F_j$ belonging to the eighth profile of emotion term worried. Figures 8 and 9 show the animated profiles for emotion terms afraid, terrified, and worried, respectively. The FAP values that we used are the median ones of the corresponding ranges of variation.

Creating profiles for emotions lying between the archetypal ones
In this section, we describe the method of creating a profile for the emotion guilt. According to Plutchik's angular measure (see Table 6), the emotion term guilty (angular measure 102.3 degrees) lies between the archetypal emotion terms afraid (angular measure 70.3 degrees) and sad (angular measure 108.5 degrees), being closer to the latter. According to Section 4.2, the vocabulary $V_G$ of the emotion guilt emerges as the union of the vocabularies $V_F$ and $V_S$, that is, $V_G = V_F \cup V_S$, where $V_F$ and $V_S$ are the vocabularies corresponding to the emotions fear and sadness, respectively. Table 8 shows the profile for the term guilty, derived from one of the profiles of afraid (in particular $P^{(8)}_F$) and one of sad ($P^{(0)}_S$). FAPs $F_3$, $F_5$, $F_{33}$, $F_{34}$, $F_{35}$, and $F_{36}$ are included only in $P^{(8)}_F$, and therefore the corresponding ranges of variation in the emerging guilty profile $P^{(m)}_G$ (the $m$th guilty profile) are computed by averaging the ranges of variation of $P^{(8)}_F$ with the neutral face, according to Rule and $X^{(m)}_{G,19}$ corresponds to the range [−110, −310].
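The steps of this paragraph can be illustrated with a short Python sketch. It is only a sketch under explicit assumptions: the helper names `nearer_archetype` and `average_with_neutral` are hypothetical, the vocabulary contents and FAP ranges are placeholders rather than the values of Table 8, and the neutral face is assumed to correspond to a FAP value of zero, so that averaging a range with it halves both bounds. The rule for FAPs that appear in both profiles is not reproduced here, since it is not stated in this passage.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]  # (lower bound, upper bound) of a FAP's range of variation


def nearer_archetype(angle: float, archetypes: Dict[str, float]) -> str:
    """Return the archetypal term whose angular measure is closest to `angle`."""
    return min(archetypes, key=lambda name: abs(archetypes[name] - angle))


def average_with_neutral(r: Range) -> Range:
    """Average a range with the neutral face, ASSUMING neutral means FAP value 0."""
    return (r[0] / 2.0, r[1] / 2.0)


# Angular measures quoted in the text.
angles = {"afraid": 70.3, "sad": 108.5}
closest = nearer_archetype(102.3, angles)

# Placeholder vocabularies (illustrative only, not the paper's actual vocabularies).
v_f = {"F3", "F5", "F19", "F33", "F34", "F35", "F36"}
v_s = {"F19"}
v_g = v_f | v_s  # V_G = V_F ∪ V_S

# Placeholder ranges for the FAPs found only in the afraid profile.
p8_f = {"F3": (400.0, 560.0), "F5": (-260.0, -180.0)}

# FAPs present only in the afraid profile are averaged with the neutral face.
only_in_f = v_f - v_s
guilty_partial = {
    fap: average_with_neutral(p8_f[fap]) for fap in sorted(only_in_f) if fap in p8_f
}
```

With the angular measures from the text, `closest` evaluates to `"sad"`, since 102.3 is 6.2 degrees from 108.5 but 32.0 degrees from 70.3.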

CONCLUSION, DISCUSSION, AND FURTHER WORK
In this work, we have proposed a complete framework for creating visual profiles, based on FAPs, for intermediate (not primary) emotions. Emotion profiles can serve either the vision part of an emotion recognition system, or a client side application that creates synthetic expressions. The main advantage of the proposed system is its flexibility.
(i) No assumption needs to be made about the nature of the expression analysis system (see Figure 1); it is enough to provide either the name of the conveyed emotion or just the movement of a predefined set of FPs. In the former case, the proposed fuzzy system serves as an agent for synthesizing expressions, while in the latter case it functions as an autonomous emotion analysis system.
(ii) It is extensible with respect to completing (or modifying) the proposed vocabulary of FAPs for the archetypal expressions.
(iii) The range of variation of the FAPs that are involved in the archetypal expression profiles can be modified. Note, however, that this modification affects the profiles that are created for intermediate expressions.
(iv) It is extensible with respect to the number of intermediate expressions that can be modeled.
Computer scientists can exploit the results of psychological studies on emotion recognition, although doing so is not straightforward. We have shown that concepts such as the emotion wheel and activation are suitable for extending the set of emotions that can be visually modeled.
The main focus of the paper is on synthesizing MPEG-4 compliant facial expressions; realistic generic animation is another interesting issue which would indeed require specific FATs. This constitutes a topic for further developments.
The results presented indicate that the use of FATs, while not essential, enhances the obtained results. However, in cases of low bitrate applications where speed and responsiveness are more important than visual fidelity, the FAT functionality may be omitted, since it imposes considerable overhead on the data stream. Samples of the emotional animation, including values and models, used in this paper can be found at http://www.image.ntua.gr/mpeg4.

Amaryllis Raouzaiou was born in Athens, Greece in 1977. She graduated from the Department of Electrical and Computer Engineering, the National Technical University of Athens in 2000 and she is currently pursuing the Ph.D. degree at the same university. Her current research interests lie in the areas of synthetic-natural hybrid video coding, human-computer interaction, machine vision, and neural networks. She is a member of the Technical Chamber of Greece. She is with the team of IST project ERMIS (Emotionally Rich Man-Machine Interaction Systems, IST-2000-29319).

Nicolas Tsapatsoulis was born in Limassol, Cyprus in 1969. He graduated from the Department of Electrical and Computer Engineering, the National Technical University of Athens in 1994 and received his Ph.D. degree in 2000 from the same university. His current research interests lie in the areas of human-computer interaction, machine vision, image and video processing, neural networks, and biomedical engineering. He is a member of the Technical Chambers of Greece and Cyprus and a member of IEEE Signal Processing and Computer societies. Dr. Tsapatsoulis has published nine papers in international journals and more than 20 in proceedings of international conferences. He served as Technical Program Co-Chair for the VLBV '01 workshop. He is a reviewer of the IEEE Transactions on Neural Networks and IEEE Transactions on Circuits and Systems for Video Technology journals. Since 1995 he has participated in ten research projects at Greek and European level.