Using Noninvasive Wearable Computers to Recognize Human Emotions from Physiological Signals

We discuss the strong relationship between affect and cognition and the importance of emotions in multimodal human-computer interaction (HCI) and user modeling. We introduce the overall paradigm for our multimodal system, which aims at recognizing its users' emotions and at responding to them accordingly depending upon the current context or application. We then describe the design of the emotion elicitation experiment we conducted by collecting, via wearable computers, physiological signals from the autonomic nervous system (galvanic skin response, heart rate, temperature) and mapping them to certain emotions (sadness, anger, fear, surprise, frustration, and amusement). We show the results of three different supervised learning algorithms that categorize these collected signals in terms of emotions, and generalize their learning to recognize emotions from new collections of signals. We finally discuss the possible broader impact and potential applications of emotion recognition for multimodal intelligent systems.


INTRODUCTION
The field of human-computer interaction (HCI) has recently witnessed an explosion of adaptive and customizable human-computer interfaces which use cognitive user modeling, for example, to extract and represent a student's knowledge, skills, and goals, to help users find information in hypermedia applications, or to tailor information presentation to the user. New generations of intelligent computer user interfaces can also adapt to a specific user, choose suitable teaching exercises or interventions, give the user feedback about the user's knowledge, and predict the user's future behavior such as answers, goals, preferences, and actions. Recent findings on emotions have shown that the mechanisms associated with emotions are not only tightly intertwined neurologically with the mechanisms responsible for cognition, but that they also play a central role in decision making, problem solving, communicating, negotiating, and adapting to unpredictable environments. Emotions are therefore now considered organizing and energizing processes, serving important adaptive functions.
To take advantage of these new findings, researchers in signal processing and HCI are learning more about the unexpectedly strong interface between affect and cognition in order to build appropriate digital technology. Affective states play an important role in many aspects of the activities we find ourselves involved in, including tasks performed in front of a computer or while interacting with computer-based technology. For example, being aware of how the user receives a piece of provided information is very valuable. Is the user satisfied, more confused, frustrated, amused, or simply sleepy? Being able to know when the user needs more feedback, not only by keeping track of the user's actions but also by observing cues about the user's emotional experience, also presents advantages.
In the remainder of this article, we document the various ways in which emotions are relevant in multimodal HCI, and propose a multimodal paradigm for acknowledging the various aspects of the emotion phenomenon. We then focus on one modality, namely, the autonomic nervous system (ANS) and its physiological signals, and give an extended survey of the literature to date on the analysis of these signals in terms of signaled emotions. We furthermore show how, using sensing media such as noninvasive wearable computers capable of capturing these signals during HCI, we can begin to explore the automatic recognition of specific elicited emotions during HCI. Finally, we discuss research implications from our results.

Interaction of affect and cognition and its relevance to user modeling and HCI
As a result of recent findings, emotions are now considered to be associated with adaptive, organizing, and energizing processes. We mention a few already identified phenomena concerning the interaction between affect and cognition, which we expect will be further studied and manipulated by building intelligent interfaces that acknowledge such an interaction. We also identify the relevance of these findings on emotions for the field of multimodal HCI.

Organization of memory and learning
We recall an event better when we are in the same mood as when the learning occurred [1]. Hence eliciting the same affective state in a learning environment can reduce the cognitive overload considerably. User models concerned with reducing the cognitive overload [2]-by presenting information structured in the most efficient way in order to eliminate avoidable load on working memory-would strongly benefit from information about the affective states of the learners while involved in their tasks.

Focus and attention
Emotions restrict the range of cue utilization such that fewer cues are attended to [3]; safety applications for drivers and pilots can make use of this fact to better assist their users.

Perception
When we are happy, our perception is biased toward selecting happy events, and likewise for negative emotions [1]. Similarly, while making decisions, users are often influenced by their affective states. Reading a text while experiencing a negatively valenced emotional state often leads to a very different interpretation than reading the same text while in a positive state.
User models aimed at providing text tailored to the user need to take the user's affective state into account to maximize the user's understanding of the intended meaning of the text.

Categorization and preference
Familiar objects become preferred objects [4]. User models, which aim at discovering the user's preferences [5], also need to acknowledge and make use of the knowledge that people prefer objects that they have been exposed to (incidentally, even when they are shown these objects subliminally).

Goal generation and evaluation
Patients with damage to their frontal lobes (where cortical communication with the limbic system is altered) become unable to feel, which results in their complete dysfunctionality in real-life settings, where they are unable to decide what action they need to perform next [6]. Normal emotional arousal, in contrast, is intertwined with goal generation, decision making, and priority setting.

Decision making and strategic planning
When time constraints are such that quick action is needed, neurological shortcut pathways for deciding upon the next appropriate action are preferred over more optimal but slower ones [7]. Furthermore, people with different personalities can have very distinct preference models (Myers-Briggs Type Indicator). User models of personality [8] can be further enhanced and refined with the user's affective profile.

Motivation and performance
An increase in emotional intensity causes an increase in performance, up to an optimal point (the inverted U-curve of the Yerkes-Dodson law). User models which provide qualitative and quantitative feedback to help students think about and reflect on the feedback they have received [9] could include affective feedback about cognitive-emotion paths discovered and built in the student model during the tasks.

Intention
Not only are there positive consequences to positive emotions, but there are also positive consequences to negative emotions: they signal the need for an action to take place in order to maintain or change a given situation or interaction with the environment [10]. Pointing out the positive signals associated with negative emotions experienced during interaction with specific software could become one of the roles of user-modeling agents.

Communication
Important information in a conversational exchange comes from body language [11], voice prosody, facial expressions revealing emotional content [12], and facial displays connected with various aspects of discourse [13]. Communication becomes ambiguous when these cues are not accounted for during HCI and computer-mediated communication.

Learning
People are more or less receptive to the information to be learned depending on their liking (of the instructor, of the visual presentation, of how the feedback is given, or of who is giving it). Moreover, emotional intelligence is learnable [14], which opens interesting areas of research for the field of user modeling as a whole.
Given the strong interface between affect and cognition on the one hand [15], and the increasing versatility of computer agents on the other, the attempt to enable our tools to acknowledge affective phenomena rather than remain blind to them appears desirable.

An application-independent paradigm for modeling user's emotions and personality
Figure 1 shows the overall paradigm for multimodal HCI, which was adumbrated earlier by Lisetti [17]. As shown in the first portion of the picture, pointed to by the arrow labeled user-centered mode, when emotions are experienced in humans, they are associated with physical and mental manifestations.
The physical aspect of emotions includes ANS arousal and multimodal expression (including vocal intonation, facial expression, and other motor manifestations). The mental aspect of the emotion is referred to here as subjective experience in that it represents what we tell ourselves we feel or experience about a specific situation.
The second part of Figure 1, pointed to by the arrow labeled medium, represents the fact that using multimedia devices to sense the various signals associated with human emotional states, and combining these with various machine learning algorithms, makes it possible to interpret these signals in order to categorize and recognize the user's most probable emotions as he or she experiences different emotional states during HCI.
A user model, including the user's current states, the user's specific goals in the current application, the user's personality traits, and the user's specific knowledge about the domain application can then be built and maintained over time during HCIs.
Socially intelligent agents, built with some (or all) of the same constructs used to model the user, can then be used to drive the HCIs, adapting to the user's specific current emotional state if needed, knowing in advance the user's personality and preferences, and having their own knowledge about the application domain and goals (e.g., help the student learn in all situations, assist in ensuring the driver's safety).
Depending upon the application, it might be beneficial to endow our agent with its own personality to best adapt to the user (e.g., if the user is a child, animating the interaction with a playful personality) and its own multimodal modes of expression-the agent-centered mode-to provide the best adaptive personalized feedback.
Context-aware multimodal adaptation can indeed take different forms of embodiment, and the chosen user feedback needs to depend upon the specific application (e.g., using an animated facial avatar in a car might distract the driver, whereas it might raise a student's level of interest during an e-learning session). Finally, the back-arrow shows that the multimodal adaptive feedback in turn has an effect on the user's emotional states-hopefully for the better and for enhanced HCI.


Previous studies on mapping physiological signals to emotions
As indicated in Table 1, there is indeed growing evidence that emotional states have corresponding specific physiological signals that can be mapped to them. In Vrana's study [27], personal imagery was used to elicit disgust, anger, pleasure, and joy from participants while their heart rate, skin conductance, and facial electromyogram (EMG) signals were measured. The results showed that acceleration of heart rate was greater during disgust, joy, and anger imageries than during pleasant imagery, and that disgust could be discriminated from anger using facial EMG. In Sinha and Parsons' study [30], heart rate, skin conductance level, finger temperature, blood pressure, electrooculogram, and facial EMG were recorded while the subjects visualized imagery scripts given to them to elicit neutrality, fear, joy, action, sadness, and anger. The results indicated that emotion-specific response patterns for fear and anger are accurately differentiable from each other and from the response pattern of the neutral imagery condition.
Another study, which is closely related to one of the applications we will discuss in Section 5 (and which we therefore describe at length here), was conducted by Jennifer Healey from the Massachusetts Institute of Technology (MIT) Media Lab [39]. The study addressed the questions of how affective models of users should be developed for computer systems and how computers should respond appropriately to the emotional states of users. The results showed that people do not just create preference lists; they use affective expression to communicate and to show their satisfaction or dissatisfaction. Healey's research particularly focused on recognizing the stress levels of drivers by measuring and analyzing their physiological signals in a driving environment.
Before the driving experiment was conducted, a preliminary emotion elicitation experiment was designed in which eight states (anger, hate, grief, love, romantic love, joy, reverence, and no emotion: neutrality) were elicited from participants. These eight emotions were Clynes' [40] set of basic emotions. This set of emotions was chosen to be elicited in the experiment because each emotion in this set was found to produce a unique set of finger pressure patterns [40]. While the participants were experiencing these emotions, the changes in their physiological responses were measured.
A guided imagery technique (i.e., the participant imagines that she is experiencing the emotion by picturing herself in a given scenario) was used to generate the emotions listed above. The participant attempted to feel and express each of the eight emotions for a period varying randomly from three to five minutes. The experiment was conducted over 32 days in a single-subject, multiple-session setup. However, only twenty sets (days) of complete data were obtained at the end of the experiment.
While the participant experienced the given emotions, her galvanic skin response (GSR), blood volume pressure (BVP), EMG, and respiration values were measured. Eleven features were extracted from the raw EMG, GSR, BVP, and respiration measurements by calculating the mean, the normalized mean, the normalized first difference mean, and the first forward distance mean of the physiological signals. The eleven-dimensional feature space of 160 emotion instances (20 days × 8 emotions) was projected into a two-dimensional space using Fisher projection. Leave-one-out cross validation was used for emotion classification. The results showed that it was hard to discriminate all eight emotions. However, when the emotions were grouped as being (1) anger or peaceful, (2) high arousal or low arousal, and (3) positive valence or negative valence, they could be classified as follows: (1) anger: 100%, peaceful: 98%; (2) high arousal: 80%, low arousal: 88%; (3) positive: 82%, negative: 50%.
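The pipeline described above (eleven-dimensional features reduced via Fisher projection, then classified with leave-one-out cross validation) can be sketched roughly as follows. This is our illustrative reconstruction on random placeholder data, not Healey's data or code; we let scikit-learn's LinearDiscriminantAnalysis stand in for the Fisher projection and a simple nearest-neighbor classifier for the final step.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Placeholder data standing in for the 160 eleven-dimensional feature
# vectors (20 days x 8 emotions) described above.
X = rng.normal(size=(160, 11))
y = np.repeat(np.arange(8), 20)  # 8 emotion labels, 20 instances each

# Fisher-style projection to 2-D followed by a simple classifier,
# evaluated with leave-one-out cross validation.
pipe = make_pipeline(LinearDiscriminantAnalysis(n_components=2),
                     KNeighborsClassifier(n_neighbors=3))
correct = 0
for train, test in LeaveOneOut().split(X):
    pipe.fit(X[train], y[train])
    correct += int(pipe.predict(X[test])[0] == y[test][0])
print(f"leave-one-out accuracy: {correct / len(X):.2f}")
```

On random placeholder data the accuracy hovers near chance (1/8); the point is only the structure of the evaluation loop.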
Because of the results of the experiment described above, the scope of the driving experiment was limited to recognition of levels of only one emotional state: emotional stress.
At the beginning of the driving experiment, participants drove in and exited a parking garage; then they drove in a city and on a highway, and returned to the same parking garage at the end. The experiment was performed on three subjects who repeated the experiment multiple times and six subjects who drove only once. Videos of the participants were recorded during the experiments and self-reports were obtained at the end of each session. Task design and questionnaire responses were used separately to recognize the driver's stress. The results obtained from these two methods were as follows: (i) task design analysis could recognize the driver's stress level as being rest (e.g., resting in the parking garage), city (e.g., driving in Boston streets), or highway (e.g., a two-lane merge on the highway) with 96% accuracy; (ii) questionnaire analysis could categorize four stress classes as being lowest, low, higher, or highest with 88.6% accuracy.
Finally, video recordings were annotated on a second-by-second basis by two independent researchers for validation purposes. This annotation was used to find a correlation between the stress metric created from the video and variables from the sensors. The results showed that the physiological signals closely followed the stress metric provided by the video coders.
The results of these two methods (videos and pattern recognition) coincided in classifying the driver's stress and showed that stress levels could be recognized by measuring physiological signals and analyzing them by pattern recognition algorithms.
Clearly, more research has been performed in this domain, and yet still more remains to be done. We have included only the sources that we were aware of, with the hope of assisting other researchers on the topic.

Our study to elicit emotions and capture physiological signals data
After reviewing the related literature, we conducted our own experiment to find a mapping between physiological signals and the emotions experienced. In our experiment we used movie clips and difficult mathematics questions to elicit targeted emotions-sadness, anger, surprise, fear, frustration, and amusement-and we used the BodyMedia SenseWear Armband (BodyMedia Inc., www.bodymedia.com) to measure the physiological signals of our participants: galvanic skin response, heart rate, and temperature. The following subsections discuss the design of this experiment and the results gained after interpreting the collected data. The data we collected in the experiment described below was also used in another study [42]; however, in this article we describe a different feature extraction technique which led to different results and implications, as will be discussed later.

Pilot panel study for stimuli selection: choosing movie clips to elicit specific emotions
Before conducting the emotion elicitation experiment, which will be described shortly, we designed a pilot panel study to determine the movie clips that might result in high subject agreement in terms of the elicited emotions (sadness, anger, surprise, fear, and amusement). Gross and Levenson's work [41] guided our panel study, and from their study we used the movie scenes that resulted in high subject agreement in terms of eliciting the target emotions. Because some of their movies were not obtainable, and because the anger and fear movie scenes evidenced low subject agreement during our study, alternative clips were also investigated. The following sections describe the panel study and its results.

Subject sample
The sample included 14 undergraduate and graduate students from the psychology and computer science departments of the University of Central Florida. The demographics are shown in Table 2.

Choice of movie clips to elicit emotions
Twenty-one movies were presented to the participants. Seven movies were included in the analysis based on the findings of Gross and Levenson [41] (as summarized in Table 3); the seven movie clips extracted from these movies were the same as the clips used in Gross and Levenson's study. An additional 14 movie clips were chosen by the authors, leading to a set of movies that included three movies to elicit sadness (Powder, Bambi, and The Champ), four movies to elicit anger (Eye for an Eye, Schindler's List, American History X, and My Bodyguard), four to elicit surprise (Jurassic Park, The Hitchhiker, Capricorn One, and a homemade clip called Grandma), one to elicit disgust (Fear Factor), five to elicit fear (Jeepers Creepers, Speed, The Shining, Hannibal, and Silence of the Lambs), and four to elicit amusement (Beverly Hillbillies, When Harry Met Sally, Drop Dead Fred, and The Great Dictator).

Procedure
The 14 subjects participated in the study simultaneously. After completing the consent forms, they filled out the questionnaires where they answered the demographic items. Then, the subjects were informed that they would be watching various movie clips geared to elicit emotions and that between each clip, they would be prompted to answer questions about the emotions they experienced while watching the scene. They were also asked to respond according to the emotions they experienced and not the emotions experienced by the actors in the movie. A slide show played the various movie scenes and, after each one of the 21 clips, a slide was presented asking the participants to answer the survey items for the prior scene.

Measures
The questionnaire included three demographic questions: age range (18-25, 26-35, 36-45, 46-55, or 56+), gender, and ethnicity. For each scene, four questions were asked. The first question asked, "Which emotion did you experience from this scene?" If the participant checked "other," they were asked to specify which emotion they experienced (in an open choice format). The second question asked the participants to rate the intensity of the emotion they experienced on a six-point scale. The third question asked whether they experienced any other emotion at the same intensity or higher, and if so, to specify what that emotion was. The final question asked whether they had seen the movie before.

Results
The pilot panel study was conducted to find the movie clips that resulted in (a) at least 90% agreement on eliciting the target emotion and (b) an average intensity of at least 3.5. Table 4 lists the agreement rates and average intensities for the clips with more than 90% agreement.
There was no movie with a high level of agreement for anger. Gross and Levenson's [41] clips were the most successful at eliciting the emotions in our investigation in terms of high intensity, except for anger. In their study, the movie with the highest agreement rate for anger was My Bodyguard (42%). In our pilot study, however, the agreement rate for My Bodyguard was 29%, with a higher agreement rate for frustration (36%), and we therefore chose not to include it in our final movie selection. However, because anger is an emotion of interest in a driving environment, which we are particularly interested in studying, we did include the movie with the highest agreement rate for anger, Schindler's List (agreement rate 36%, average intensity 5.00).
In addition, for amusement, the movie Drop Dead Fred was chosen over When Harry Met Sally in our final selection due to the embarrassment experienced by some of the subjects when watching the scene from When Harry Met Sally.
The final set of movie scenes chosen for our emotion elicitation study is presented in Table 5. As mentioned in Section 3.2.1, for the movies that were chosen from Gross and Levenson's [41] study, the movie clips extracted from these movies were also the same.

Subject sample
The sample included 29 undergraduate students enrolled in a computer science course.The demographics are shown in Table 6.

Procedure
One to three subjects participated simultaneously in the study during each session. After signing consent forms, they were asked to complete a prestudy questionnaire, and the noninvasive BodyMedia SenseWear Armband (shown in Figure 2) was placed on each subject's right arm. The BodyMedia SenseWear Armband is a noninvasive wearable computer that we used to collect the physiological signals from the participants. It is a versatile and reliable wearable body monitor created by BodyMedia, Inc. It is worn on the upper arm and includes a galvanic skin response sensor, a skin temperature sensor, a two-axis accelerometer, a heat-flux sensor, and a near-body ambient temperature sensor. The system also includes a Polar chest strap which works together with the armband for heart rate monitoring. The SenseWear Armband is capable of collecting, storing, processing, and presenting physiological signals such as GSR, heart rate, temperature, movement, and heat flow. After collecting signals, the SenseWear Armband is connected to the Innerwear Research Software (developed by BodyMedia, Inc.) either with a docking station or wirelessly to transfer the collected data. The data can either be stored in XML files for further interpretation with pattern recognition algorithms, or the software itself can process the data and present it using graphs.
Once the BodyMedia SenseWear Armbands were worn, the subjects were instructed on how to place the chest strap. After the chest straps were connected with the armbands, the in-study questionnaires were given to the subjects and they were told (1) to find a comfortable sitting position and try not to move around until answering a questionnaire item, (2) that the slide show would instruct them to answer specific items on the questionnaire, (3) not to look ahead at the questions, and (4) that someone would sit behind them at the beginning of the study to time-stamp the armband.
A 45-minute slide show was then started. In order to establish a baseline, the study began with a slide asking the participants to relax, breathe through their nose, and listen to soothing music. Slides of natural scenes were presented, including pictures of the oceans, mountains, trees, sunsets, and butterflies. After these slides, the first movie clip played (sadness). Once the clip was over, the next slide asked the participants to answer the questions relevant to the scene they watched. Starting again with the slide asking the subjects to relax while listening to soothing music, this process continued for the anger, fear, surprise, frustration, and amusement clips. The frustration segment of the slide show asked the participants to answer difficult mathematical problems without using paper and pencil. The movie scenes and frustration exercise lasted from 70 to 231 seconds each.
The in-study questionnaire included three questions for each emotion. The first question asked, "Did you experience SADNESS (or the relevant emotion) during this section of the experiment?," and required a yes or no response. The second question asked the participants to rate the intensity of the emotion they experienced on a six-point scale. The third question asked participants whether they had experienced any other emotion at the same intensity or higher, and if so, to specify what that emotion was.
Finally, the physiological data gathered included heart rate, skin temperature, and GSR.

Subject agreement and average intensities
Table 7 shows subject agreement and average intensities for each movie clip and the mathematical problems. A two-sample binomial test of equal proportions was conducted to determine whether the agreement rates for the panel study differed from the results obtained with this sample. Participants in the panel study agreed significantly more to the target emotion for the sadness and fear films. On the other hand, the subjects in this sample agreed more for the anger film.

Normalization and feature extraction
After determining the time slots corresponding to the point in the film where the intended emotion was most likely to be experienced, the procedures described above resulted in the following set of physiological records: 24 records for anger, 23 records for fear, 27 records for sadness, 23 records for amusement, 22 records for frustration, and 21 records for surprise (a total of 140 physiological records). The differences among the number of data sets for each emotion class are due to data loss for some participants during segments of the experiment. In order to calculate how much the physiological responses changed as the participants went from a relaxed state to the state of experiencing a particular emotion, we normalized the data for each emotion. Normalization is also important for minimizing the individual differences among participants in terms of their physiological responses while they experience a specific emotion.
Collected data was normalized using the average value of the corresponding data type collected during the relaxation period for the same participant. For example, we normalized the GSR values as follows:

normalized GSR = (raw GSR − raw relaxation GSR) / raw relaxation GSR.    (1)

The extracted features of the normalized signals were stored in an array. Each slot of the array consists of one specific feature of a specific data signal type, belonging to one specific participant while s/he was experiencing one specific emotion (e.g., one slot contains the mean of the normalized skin temperature value of, say, participant number 1 while s/he was experiencing anger, while another slot contains the variance of the normalized GSR value of participant number 5 while s/he was experiencing sadness).
As mentioned, four features were extracted for each data type, and then three supervised learning algorithms were implemented that took these 12 features as input and interpreted them. The following subsections describe the algorithms implemented to find a pattern among these features.
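The normalization of equation (1) and the per-signal feature extraction can be sketched as below. This is a minimal illustration; the function names, the choice of the four features (minimum, maximum, mean, variance), and the 4 × 3 layout follow the text, but the array layout and helper names are ours.

```python
import numpy as np

def normalize(signal, relaxation_signal):
    """Express a signal as relative change from the relaxation baseline,
    per equation (1): (raw - baseline mean) / baseline mean."""
    baseline = np.mean(relaxation_signal)
    return (np.asarray(signal, dtype=float) - baseline) / baseline

def extract_features(signal):
    """The four per-signal features used in the study:
    minimum, maximum, mean, and variance."""
    s = np.asarray(signal, dtype=float)
    return [s.min(), s.max(), s.mean(), s.var()]

def feature_vector(gsr, temp, hr, gsr_rest, temp_rest, hr_rest):
    """Build the 12-dimensional feature vector (4 features x 3 signals)
    from one participant's segment and relaxation recordings."""
    feats = []
    for sig, rest in ((gsr, gsr_rest), (temp, temp_rest), (hr, hr_rest)):
        feats.extend(extract_features(normalize(sig, rest)))
    return np.array(feats)
```

One such 12-dimensional vector per participant per emotion yields the 140 × 12 data set used by the three classifiers below.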

k-nearest neighbor algorithm
The k-nearest neighbor (KNN) algorithm [43] uses two data sets: (1) the training data set and (2) the test data set. The training data set contains instances of the minimum, maximum, mean, and variance of the GSR, skin temperature, and heart rate values, along with the corresponding emotion class. The test data set has the same structure.
In order to classify a test instance into an emotion, KNN calculates the distance between the test instance and each instance of the training data set. Let an arbitrary instance x be described by the feature vector (a_1(x), a_2(x), ..., a_n(x)), where a_r(x) is the rth feature of instance x. The distance between instances x_i and x_j is then defined as the Euclidean distance

d(x_i, x_j) = sqrt( Σ_{r=1..n} (a_r(x_i) − a_r(x_j))^2 ).

The algorithm then finds the k closest training instances to the test instance. The emotion with the highest frequency among the k emotions associated with these k training instances is the emotion mapped to the test data. In our study, KNN was tested with leave-one-out cross validation. Figure 3 shows the emotion recognition accuracy rates of the KNN algorithm for each of the six emotions. KNN could classify sadness with 70.4%, anger with 70.8%, surprise with 73.9%, fear with 80.9%, frustration with 78.3%, and amusement with 69.6% accuracy.
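The classification rule above (Euclidean distance, majority vote among the k nearest training instances, evaluated with leave-one-out cross validation) can be sketched as follows. This is an illustrative implementation, not the study's code; the value of k is not reported in the text.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """Classify x_test by majority vote among the k nearest
    training instances under Euclidean distance."""
    d = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

def leave_one_out_accuracy(X, y, k=3):
    """Hold out each instance in turn, train on the rest, and
    report the fraction classified correctly."""
    hits = sum(knn_predict(np.delete(X, i, axis=0),
                           np.delete(y, i),
                           X[i], k) == y[i]
               for i in range(len(X)))
    return hits / len(X)
```

With the study's data, X would be the 140 × 12 feature matrix and y the six emotion labels.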

Discriminant function analysis
The second algorithm was developed using discriminant function analysis (DFA) [44], a statistical method to classify data signals by using linear discriminant functions. DFA is used to find a set of linear combinations of the variables whose values are as close as possible within groups and as far apart as possible between groups. These linear combinations are called discriminant functions; thus, a discriminant function is a linear combination of the discriminating variables. In our application of discriminant analysis, the groups are the emotion classes (sadness, anger, surprise, fear, frustration, and amusement) and the discriminating variables are the extracted features of the data signals (minimum, maximum, mean, and variance of GSR, skin temperature, and heart rate).
Let x_i be the ith extracted feature of a specific data signal. The functions used to solve for the coefficients are of the form

f = u_0 + u_1 x_1 + u_2 x_2 + · · · + u_13 x_13.

The objective of DFA is to calculate the values of the coefficients u_0 − u_13 in order to obtain the linear combination.
In order to solve for these coefficients, we applied the generalized eigenvalue decomposition to the between-group and within-group covariance matrices. The eigenvectors obtained from this decomposition were used to derive the coefficients of the discriminant functions. The coefficients of each function were derived so as to maximize the difference between the group means of the function outputs while minimizing the variation within each group.
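The generalized eigenvalue step can be sketched as follows. This is our illustrative implementation of the standard construction, not the study's code; the small regularization term added to the within-group matrix is our addition, to keep the generalized problem well posed.

```python
import numpy as np
from scipy.linalg import eigh

def discriminant_directions(X, y):
    """Solve the generalized eigenproblem S_b w = lambda S_w w for the
    between-group (S_b) and within-group (S_w) scatter matrices; the
    leading eigenvectors give the discriminant-function coefficients."""
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    S_w = np.zeros((n_features, n_features))
    S_b = np.zeros((n_features, n_features))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)          # within-group scatter
        diff = (mc - overall_mean)[:, None]
        S_b += len(Xc) * diff @ diff.T          # between-group scatter
    # Regularize S_w slightly so the generalized problem is well posed.
    eigvals, eigvecs = eigh(S_b, S_w + 1e-6 * np.eye(n_features))
    order = np.argsort(eigvals)[::-1]           # largest separation first
    return eigvecs[:, order]
```

Projecting the data onto the leading eigenvectors maximizes between-group separation relative to within-group spread, which is exactly the criterion stated above.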
As can be seen in Figure 4, the DFA algorithm's recognition accuracy was 77.8% for sadness, 70.8% for anger, 69.6% for surprise, 80.9% for fear, 72.7% for frustration, and 78.3% for amusement.

Marquardt backpropagation algorithm
The third algorithm used was a derivation of the backpropagation algorithm with the Marquardt-Levenberg modification, called the Marquardt backpropagation (MBP) algorithm [45]. In this technique, the Jacobian matrix J(x), which contains the first derivatives of the network errors with respect to the weights and biases, is computed first. Then the gradient vector is computed as the product of the transpose of the Jacobian matrix and the vector of errors, J^T(x) e(x), and the Hessian is approximated by the product of the transpose of the Jacobian matrix and the Jacobian matrix, J^T(x) J(x) [45].
The Marquardt-Levenberg modification to the Gauss-Newton method is then given by the update

Δx = −[J^T(x) J(x) + µI]^{−1} J^T(x) e(x).

When µ is 0 or a small value, this is the Gauss-Newton method using the Hessian approximation. When µ is large, the equation amounts to gradient descent with a small step size of 1/µ. The aim is to make µ converge to 0 as fast as possible, which is achieved by decreasing µ whenever the error function decreases and increasing it whenever it does not.
The algorithm is considered converged when the gradient norm falls below a previously determined threshold.
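The µ-adaptation loop described above can be sketched on a toy least-squares problem (the helpers `lm_step` and `fit_lm` are hypothetical; this is a simplified illustration of the Marquardt-Levenberg update, not the authors' network-training code):

```python
import numpy as np

# Simplified illustration of the Marquardt-Levenberg update rule;
# not the authors' network-training code.

def lm_step(J, e, mu):
    """One update: dx = -(J^T J + mu*I)^{-1} J^T e."""
    n = J.shape[1]
    return -np.linalg.solve(J.T @ J + mu * np.eye(n), J.T @ e)

def fit_lm(f, jac, x0, max_iter=50, mu=1e-2, tol=1e-8):
    """Minimize sum(f(x)**2), shrinking mu on success, growing it on failure."""
    x = x0.copy()
    for _ in range(max_iter):
        e, J = f(x), jac(x)
        if np.linalg.norm(J.T @ e) < tol:    # converged: gradient below threshold
            break
        step = lm_step(J, e, mu)
        if np.sum(f(x + step) ** 2) < np.sum(e ** 2):
            x, mu = x + step, mu * 0.1       # error decreased: toward Gauss-Newton
        else:
            mu *= 10.0                       # no decrease: toward gradient descent
    return x

# Toy linear least-squares problem with exact solution x = (1, 2)
A = np.array([[2.0, 0.0], [1.0, 3.0], [0.0, 1.0]])
b = np.array([2.0, 7.0, 2.0])
sol = fit_lm(lambda x: A @ x - b, lambda x: A, np.zeros(2))
print(sol)
```

In MBP the residual vector e(x) would be the network errors and J(x) their derivatives with respect to the weights and biases.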
As stated in Section 4.1, a total of 140 usable (i.e., without data loss) physiological records of GSR, temperature, and heart rate values were collected from the participants for the six emotional states, and 12 features (four for each signal type) were extracted from each physiological record. As a result, a set of 140 data instances was obtained to train and test the network. The neural network was trained with the MBP algorithm 140 times.
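The 12-feature extraction step (minimum, maximum, mean, and variance per signal) can be sketched as follows; the record layout and the helper name `extract_features` are illustrative assumptions:

```python
import numpy as np

# Illustrative sketch of the 12-feature extraction; the record layout
# is an assumption, not the authors' data format.

def extract_features(record):
    """Reduce each signal to its minimum, maximum, mean, and variance."""
    feats = []
    for name in ("gsr", "temperature", "heart_rate"):
        sig = np.asarray(record[name], dtype=float)
        feats.extend([sig.min(), sig.max(), sig.mean(), sig.var()])
    return np.array(feats)  # 4 statistics x 3 signals = 12 features per record

# A tiny made-up record (a real record would span the whole elicitation segment)
record = {"gsr": [0.4, 0.5, 0.7],
          "temperature": [33.1, 33.0, 32.9],
          "heart_rate": [72.0, 75.0, 74.0]}
print(extract_features(record).shape)  # (12,)
```

Applying this to each of the 140 records yields the 140 x 12 matrix used to train and test all three classifiers.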
The recognition accuracy obtained with the MBP algorithm is shown in Figure 5: 88.9% for sadness, 91.7% for anger, 73.9% for surprise, 85.6% for fear, 77.3% for frustration, and 87.0% for amusement.
Overall, the DFA algorithm was better than the KNN algorithm for sadness and amusement, whereas KNN performed better than DFA for surprise and frustration. The MBP algorithm performed better than both DFA and KNN for all emotion classes except surprise, where it matched KNN, and frustration, where KNN scored highest.

Discussion
Several studies have investigated the relationship between physiological signals and emotions, as discussed in Section 3.1, and some of their results were very promising. Our research adds to these studies by showing that emotions can be recognized from physiological signals via noninvasive wireless wearable computers, which means that experiments can be carried out in real environments instead of laboratories. Real-life emotion recognition thus comes a step closer.
Our multimodal experiment results showed that emotions can be distinguished from one another and categorized by collecting and interpreting the participants' physiological signals. Different physiological signals proved important for recognizing different emotions. Our results show a relationship between galvanic skin response and frustration: when a participant was frustrated, her GSR increased, and for frustrated participants the change in GSR was larger than the changes in both heart rate and temperature. Similarly, heart rate was more closely related to anger and fear: a frightened participant's heart rate increased, whereas it decreased when the participant was angry.
Overall, the three algorithms, KNN, DFA, and MBP, could categorize emotions with 72.3%, 75.0%, and 84.1% accuracy, respectively. In a previous study [42] where we interpreted the same data set without applying feature extraction, the overall recognition accuracy was 71% with KNN, 74% with DFA, and 83% with MBP. The results of our latest study showed that implementing a feature extraction technique slightly improved the performance of all three algorithms.
Recognition accuracy for some emotions was higher with the pattern recognition algorithms than the subjects' agreement on the same emotions. For example, fear could be recognized with 80.9% accuracy by KNN and DFA and with 85.6% accuracy by MBP, although the subject agreement on fear was 65%. This might be explained by Feldman Barrett et al.'s study [46], whose results indicate that individuals vary in their ability to identify the specific emotions they experience. For example, some individuals can indicate whether they are experiencing a negative or a positive emotion but cannot identify the specific emotion.

Applications and future work
Our results are promising in terms of creating a multimodal affective user interface that can recognize its user's affective state, adapt to the situation, and interact with her accordingly within a given context and application, as discussed in Section 2.1 and depicted in Figure 1.
We are specifically looking into driving safety, where intelligent interfaces can be developed to minimize the negative effects of emotions and states that impact one's driving, such as anger, panic, sleepiness, and even road rage [47]. For example, when the system recognizes that the driver is in a state of frustration, anger, or rage, it could suggest that the driver switch to soothing music [47] or try a relaxation technique [48], depending on the driver's preferred style. Similarly, when the system recognizes that the driver is sleepy, it could suggest (or even insist) that she or he roll down the window for awakening fresh air.
Our future work plans include designing and conducting experiments where driving-related emotions and states (frustration/anger, panic/fear, and sleepiness) are elicited from the participating drivers while they drive in a driving simulator. During the experiment, physiological signals (GSR, temperature, and heart rate) of the participants will be measured with both the BodyMedia SenseWear (see Figure 2) and the ProComp+ (see Figure 6). At the same time, an ongoing video of each driver will be recorded for annotation and facial expression recognition purposes. These measurements and recordings will be analyzed in order to find unique patterns mapping them to each elicited emotion.
Another application of interest is training/learning, where emotions such as frustration and anxiety affect the learning capability of users [49, 50, 51]. In an electronic learning environment, an intelligent affective interface could adjust the pace of training when it recognizes the student's frustration or boredom, or provide encouragement when it recognizes the student's anxiety.
One other application is telemedicine, where patients are remotely monitored at home by health-care providers [52]. For example, when the system accurately recognizes repetitive sadness (possibly indicating the recurrence of depression) in telemedicine patients, the interface could forward this affective information to the health-care providers so that they are better equipped and ready to respond to the patient.
These three applications, driving safety, learning, and telemedicine, are the main ones that we are investigating, aiming at enhancing HCI via emotion recognition through multimodal sensing in these contexts. However, using the generic overall paradigm of recognizing and responding to emotions in a user-dependent and context-dependent manner, discussed in Section 2.2 and shown in Figure 1, we hope that other research efforts will be able to concentrate on different application areas of affective intelligent interfaces. Some of our future work will focus on the difficulty of recognizing emotions by interpreting a single modality (user mode). We are therefore planning to conduct multimodal studies on facial expression recognition and physiological signal recognition to guide the integration of the two modalities [16, 53, 54]. Other modalities, as shown in Figure 1, could include vocal intonation and natural language processing to obtain increased accuracy.

CONCLUSION
In this paper we documented the newly discovered role of affect in cognition and identified a variety of human-computer interaction contexts in which multimodal affective information could prove useful, if not necessary. We also presented an application-independent framework for multimodal affective user interfaces, hoping that it will prove useful for guiding other research efforts aiming at enhancing human-computer interaction by restoring the role of affect, emotion, and personality in human natural communication.
Our current research focused on creating a multimodal affective user interface that will be used to recognize users' emotions in real-time and respond accordingly, in particular, recognizing emotion through the analysis of physiological signals from the autonomic nervous system (ANS).We presented an extensive survey of the literature in the form of a survey table (ordered chronologically) identifying various emotion-eliciting and signal-analysis methods for various emotions.
In order to continue to contribute to the research effort of finding a mapping between emotions and physiological signals, we conducted an experiment in which we elicited emotions (sadness, anger, fear, surprise, frustration, and amusement) using movie clips and mathematical problems while measuring certain physiological signals documented as associated with emotions (GSR, heart rate, and temperature) of our participants. After extracting the minimum, maximum, mean, and variance of the collected data signals, three supervised learning algorithms were implemented to interpret these features. Overall, the three algorithms, KNN, DFA, and MBP, could categorize emotions with 72.3%, 75.0%, and 84.1% accuracy, respectively.
Finally, we would like to emphasize that we are well aware that full-blown computer systems with multimodal affective intelligent user interfaces will only be applicable to real use in telemedicine, driving safety, and learning once the research is fully mature and results are completely reliable within restricted domains and appropriate subsets of emotions.

Christine Laetitia Lisetti is a Professor at the Institut Eurecom in the Multimedia Communications Department, Sophia-Antipolis, France. Previously, she lived in the United States, where she was an Assistant Professor in the Department of Computer Science at the University of Central Florida. From 1996 to 1998, she was a Postdoctoral Fellow at Stanford University in the Department of Psychology and the Department of Computer Science. She received a Ph.D. in computer science in 1995 from Florida International University. She has won multiple awards, including a National Institute of Health Individual Research Service Award, the AAAI Nils Nilsson Award for Integrating AI Technologies, and the University of Central Florida COECS Distinguished Research Lecturer Award. Her research involves the use of artificial intelligence techniques in knowledge representation and machine learning to model affective knowledge computationally. She has been granted support from federally funded agencies such as the National Institute of Health, the Office of Naval Research, and US Army STRICOM, as well as from industries such as Interval Research Corporation and Intel Corporation. She is a Member of IEEE, ACM, and AAAI, is regularly invited to serve on program committees of international conferences, and has cochaired several international workshops on affective computing.

Fatma Nasoz has been a Ph.D. candidate in the Computer Science Department of the University of Central Florida, Orlando, since August 2001. She earned her M.S. degree in computer science from the University of Central Florida and her B.S. degree in computer engineering from Bogazici University in Turkey, in 2003 and 2000, respectively. She was awarded the Center for Advanced Transportation System Simulation (CATSS) Scholarship in 2002 to model the emotions of drivers for increased safety. Her research area is affective computing; she specifically focuses on creating adaptive intelligent user interfaces with emotion recognition abilities that adapt and respond to the user's current emotional state while also modeling their preferences and personality. Her research involves eliciting emotions in a variety of contexts, using noninvasive wearable computers to collect the participants' physiological signals, mapping these signals to affective states, and building interfaces that adapt appropriately to the current sensed data and context. She is a Member of the American Association for Artificial Intelligence and of the Association for Computing Machinery, and she has published multiple scientific articles.

Table 1 :
Previous studies on emotion elicitation and recognition.

Table 2 :
Demographics of subject sample aged 18 to 35 in pilot panel study.

Table 4 :
Agreement rates and average intensities for movies to elicit different emotions with more than 90% agreement across subjects.

Table 5 :
Movie scenes selected for our experiment to elicit five emotions.

Table 6 :
Demographics of subject sample in emotion elicitation study.

Table 7 :
Agreement rates and average intensities for the elicited emotions.