Predicting user mental states in spoken dialogue systems

EURASIP Journal on Advances in Signal Processing

Table 1 Features employed for emotion detection from the acoustic signal

Groups	Features	Physiological changes related to emotion
Pitch	Minimum value, maximum value, mean, median, standard deviation, value in the first voiced segment, value in the last voiced segment, correlation coefficient, slope, and error of the linear regression	Tension of the vocal folds and the subglottal air pressure
First two formant frequencies and their bandwidths	Minimum value, maximum value, range, mean, median, standard deviation, and values in the first and last voiced segments	Vocal tract resonances
Energy	Minimum value, maximum value, mean, median, standard deviation, value in the first voiced segment, value in the last voiced segment, correlation, slope, and error of the energy linear regression	Vocal effort, arousal of emotions
Rhythm	Speech rate, duration of voiced segments, duration of unvoiced segments, duration of the longest voiced segment, and number of unvoiced segments	Duration and stress conditions
References	Hansen [59], Ververidis and Kotropoulos [60], Morrison et al. [61] and Batliner et al. [62]
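The statistical functionals listed in Table 1 for the pitch, formant, and energy contours can be illustrated with a short Python sketch. It is not the authors' implementation, and several details are assumptions: the contour is a frame-level track in which unvoiced frames are marked as None, the frame index serves as the time axis of the linear regression, the "value in the first/last voiced segment" is taken as the mean of that segment, and the regression error is reported as the root-mean-square residual. The function names voiced_segments and contour_functionals are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Optional, Sequence


def voiced_segments(contour: Sequence[Optional[float]]) -> List[List[int]]:
    """Return runs of consecutive voiced frame indices (frames whose value is not None)."""
    segments: List[List[int]] = []
    current: List[int] = []
    for i, value in enumerate(contour):
        if value is not None:
            current.append(i)
        elif current:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def contour_functionals(contour: Sequence[Optional[float]]) -> Dict[str, float]:
    """Compute Table 1 style statistics over the voiced frames of a contour.

    Assumptions (not from the paper): unvoiced frames are None, time is the
    frame index, first/last values are segment means, and the regression
    error is the root-mean-square residual of a least-squares line.
    """
    segments = voiced_segments(contour)
    if not segments:
        raise ValueError("contour contains no voiced frames")
    times = [i for seg in segments for i in seg]
    values = [float(contour[i]) for i in times]

    mean_t = statistics.fmean(times)
    mean_v = statistics.fmean(values)
    # Least-squares fit of contour value against frame index.
    sxx = sum((t - mean_t) ** 2 for t in times)
    syy = sum((v - mean_v) ** 2 for v in values)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_v - slope * mean_t
    residuals = [v - (intercept + slope * t) for t, v in zip(times, values)]
    # Pearson correlation between time and value.
    correlation = sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

    return {
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean": mean_v,
        "median": statistics.median(values),
        "std": statistics.pstdev(values),
        "first_voiced": statistics.fmean([float(contour[i]) for i in segments[0]]),
        "last_voiced": statistics.fmean([float(contour[i]) for i in segments[-1]]),
        "correlation": correlation,
        "slope": slope,
        "regression_error": math.sqrt(statistics.fmean([r * r for r in residuals])),
    }


if __name__ == "__main__":
    # Hypothetical F0 track in Hz (None marks an unvoiced frame).
    print(contour_functionals([None, 100.0, 110.0, None, 130.0]))
```

On the example track, which rises linearly by 10 Hz per frame, the sketch yields a slope of 10, a correlation of 1, and a regression error of (numerically) zero. The same function can be applied unchanged to energy or formant tracks, where the "range" entry corresponds to the extra formant functional in the table.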
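The rhythm group in Table 1 can likewise be derived from a frame-level voiced/unvoiced decision. The following Python sketch is illustrative only and rests on stated assumptions: frame_shift is a hypothetical hop size in seconds, voiced and unvoiced durations are summed over the utterance, leading and trailing silences count as unvoiced segments, and speech rate is approximated as voiced segments per second, a common proxy for syllable rate rather than the measure used by the cited works.

```python
from itertools import groupby
from typing import Dict, Sequence


def rhythm_features(voiced: Sequence[bool], frame_shift: float = 0.01) -> Dict[str, float]:
    """Rhythm features from a per-frame voiced/unvoiced decision.

    Assumptions (not from the paper): frame_shift is the hop size in seconds,
    voiced/unvoiced durations are totals over the utterance, leading and
    trailing silences count as unvoiced segments, and speech rate is
    approximated as voiced segments per second (a syllable-rate proxy).
    """
    # Collapse the frame sequence into (is_voiced, run_length) pairs.
    runs = [(flag, len(list(group))) for flag, group in groupby(voiced)]
    voiced_runs = [n for flag, n in runs if flag]
    unvoiced_runs = [n for flag, n in runs if not flag]
    total_duration = len(voiced) * frame_shift
    return {
        "speech_rate": len(voiced_runs) / total_duration if total_duration > 0 else 0.0,
        "voiced_duration": sum(voiced_runs) * frame_shift,
        "unvoiced_duration": sum(unvoiced_runs) * frame_shift,
        "longest_voiced_segment": max(voiced_runs, default=0) * frame_shift,
        "num_unvoiced_segments": float(len(unvoiced_runs)),
    }


if __name__ == "__main__":
    flags = [False, True, True, False, False, True, True, True, False]
    print(rhythm_features(flags, frame_shift=0.25))
```

With a 0.25 s hop, the example sequence yields 1.25 s of voiced speech, 1.0 s of unvoiced frames, a longest voiced segment of 0.75 s, and three unvoiced segments. Run-length encoding with itertools.groupby keeps the computation to a single pass over the voicing decisions.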