About this book

This volume contains the proceedings of the 1st International Conference on Affective Computing and Intelligent Interaction (ACII 2005), held in Beijing, China, on 22–24 October 2005. Traditionally, the machine end of human–machine interaction has been very passive, and certainly has had no means of recognizing or expressing affective information. But without the ability to process such information, computers cannot be expected to communicate with humans in a natural way. The ability to recognize and express affect is one of the most important features of human beings. We therefore expect that computers will eventually need the ability to process affect and to interact with human users in ways similar to those in which humans interact with each other. Affective computing and intelligent interaction is a key emerging technology that focuses on myriad aspects of the recognition, understanding, and expression of affective and emotional states by computers. The topic is currently a highly active research area and is receiving increasing attention. This strong interest is driven by a wide spectrum of promising applications such as virtual reality, network games, smart surveillance, and perceptual interfaces. Affective computing and intelligent interaction is a multidisciplinary topic, involving psychology, cognitive science, physiology, and computer science. ACII 2005 provided a forum for scientists and engineers to exchange their technical results and experiences in this fast-moving and exciting field. The 45 oral papers and 82 poster papers included in this volume were selected from 205 contributions submitted by researchers worldwide.



Affective Face and Gesture Processing

Gesture-Based Affective Computing on Motion Capture Data

This paper presents research that uses full-body skeletal movements, captured with video-based sensor technology developed by Vicon Motion Systems, to train a machine to identify different human emotions. The Vicon system uses six cameras to capture lightweight markers placed at various points on the body in 3D space, and digitizes movement into x, y, and z displacement data. Gestural data depicting four emotions (sadness, joy, anger, and fear) were collected from five subjects. Experimental results with different machine learning techniques show that automatic classification accuracy on this data ranges from 84% to 92%, depending on how it is calculated. To put these automatic classification results into perspective, a user study on the human perception of the same data was conducted, yielding an average classification accuracy of 93%.

Asha Kapur, Ajay Kapur, Naznin Virji-Babul, George Tzanetakis, Peter F. Driessen

Expression Recognition Using Elastic Graph Matching

In this paper, we propose a facial expression recognition method based on the elastic graph matching (EGM) approach. The EGM approach is widely considered very effective because of its robustness to variations in face position and lighting. Among the feature extraction methods that have been used with EGM, we choose the Gabor wavelet transform for its good performance. To represent facial expression information effectively, we choose fiducial points in the local areas where the distortion caused by expression is most obvious. Experiments on the JAFFE facial expression database confirm that the proposed method performs better than some previous works, achieving an average expression recognition rate of 93.4%. Moreover, face recognition results are obtained simultaneously in our experiments.

Yujia Cao, Wenming Zheng, Li Zhao, Cairong Zhou
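
EGM-style methods describe each fiducial point by the magnitudes of a bank of Gabor filter responses, often called a "jet", and compare jets with a normalized dot product. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation: the 9x9 kernel size, the two wavelengths, the four orientations and `sigma = wavelength / 2` are illustrative choices, images are plain lists of lists of gray values, and all function names are hypothetical.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma):
    """Real and imaginary parts of a 2-D Gabor kernel (size x size, size odd)."""
    half = size // 2
    real, imag = [], []
    for y in range(-half, half + 1):
        row_r, row_i = [], []
        for x in range(-half, half + 1):
            # Coordinate along the carrier direction given by theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            phase = 2 * math.pi * xr / wavelength
            row_r.append(envelope * math.cos(phase))
            row_i.append(envelope * math.sin(phase))
        real.append(row_r)
        imag.append(row_i)
    return real, imag

def gabor_jet(image, cx, cy, wavelengths=(4, 8), n_orient=4, size=9):
    """Response magnitudes at pixel (cx, cy), one value per (wavelength, orientation)."""
    half = size // 2
    jet = []
    for lam in wavelengths:
        for k in range(n_orient):
            kr, ki = gabor_kernel(size, lam, k * math.pi / n_orient, sigma=lam / 2)
            re = im = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    p = image[cy + dy][cx + dx]  # caller keeps (cx, cy) away from borders
                    re += p * kr[dy + half][dx + half]
                    im += p * ki[dy + half][dx + half]
            jet.append(math.hypot(re, im))
    return jet

def jet_similarity(j1, j2):
    """Normalized dot product of two magnitude jets (1.0 means identical up to scale)."""
    num = sum(a * b for a, b in zip(j1, j2))
    den = math.sqrt(sum(a * a for a in j1) * sum(b * b for b in j2))
    return num / den if den else 0.0
```

Matching a face graph then amounts to comparing jets at corresponding fiducial points; because only magnitudes are used, the similarity tolerates small shifts of a point.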

The Bunch-Active Shape Model

The Active Shape Model (ASM) is one of the most powerful statistical tools for face image alignment. In this paper, we propose a novel method based on the standard ASM, called the Bunch-Active Shape Model (Bunch-ASM), to automatically locate facial feature points for face recognition. In Bunch-ASM, the eyes are localized by a face detector, and the matching strategy used in Elastic Bunch Graph Matching (EBGM) is introduced. Experimental results show that Bunch-ASM performs much better than both the standard ASM and the ASM with iris refinement.

Jingcai Fan, Hongxun Yao, Wen Gao, Yazhou Liu, Xin Liu

Facial Signs of Affect During Tutoring Sessions

An emotionally intelligent tutoring system should be able to take into account relevant aspects of the student's mental state when providing feedback. The student's facial expressions, put in context, could provide cues about this state. We discuss the analysis of facial expressions displayed by students interacting with an Intelligent Tutoring System, and our attempts to relate expression, situation, and mental state, building on Scherer's component process model of emotion appraisal.

Dirk Heylen, Mattijs Ghijsen, Anton Nijholt, Rieks op den Akker

Towards Unsupervised Detection of Affective Body Posture Nuances

Recently, researchers have been modeling three to nine discrete emotions when creating affective recognition systems. However, in everyday life, humans use a rich and powerful language to define a large variety of affective states. Thus, one of the challenging issues in affective computing is to give computers the ability to recognize a variety of affective states using unsupervised methods. To explore this possibility, we describe affective postures representing four emotion categories using low-level descriptors. We applied multivariate analysis to recognize and categorize these postures into nuances of these categories. The results show that low-level posture features may be used for this purpose, leaving the naming issue to interactive processes.

P. Ravindra De Silva, Andrea Kleinsmith, Nadia Bianchi-Berthouze

Face Alignment Under Various Poses and Expressions

In this paper, we present a face alignment system that handles various poses and expressions. In addition to a global shape model, we use component shape models, such as mouth and contour shape models, to achieve a more powerful representation of face components under complex pose and expression variations. Unlike the 1-D profile texture feature in classical ASM, we use a 2-D local texture feature for greater accuracy; to achieve high robustness and speed, it is represented by Haar-wavelet features as in [5]. Extensive experiments demonstrate its effectiveness.

Shengjun Xin, Haizhou Ai
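
Haar-wavelet features of the kind mentioned above are commonly evaluated with an integral image, so that the sum over any rectangle costs four table lookups regardless of its size. The minimal sketch below assumes a single horizontal two-rectangle feature for illustration; the actual feature set used by the authors is not reproduced, and the function names are hypothetical.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows 0..y-1 and columns 0..x-1 (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y), in four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect(ii, x, y, w, h):
    """Horizontal two-rectangle Haar feature: left half sum minus right half sum."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Because the integral image is built once per image, evaluating many such features around each landmark stays cheap, which is the speed argument the abstract makes.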

A Voting Method and Its Application in Precise Object Location

It has been demonstrated that combining the decisions of several classifiers can lead to better recognition results. The combination can be implemented using a variety of schemes; among them, the voting method is the simplest, yet it has been found to be just as effective as more complicated strategies in improving recognition results. In this paper, we propose a voting method for object location, which can be viewed as a generalization of the majority vote rule. Using this method, we locate eye centers in the face region. The experimental results demonstrate that the locating performance is comparable with that of other recently proposed eye-locating methods. The voting method can be considered a general fusion scheme for precise object location.

Yong Gao, Xinshan Zhu, Xiangsheng Huang, Yangsheng Wang
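
The abstract does not spell out how majority voting is generalized, so the following is only one plausible reading, offered as a hypothetical sketch: each classifier proposes a candidate location, every candidate casts a Gaussian-weighted vote over a pixel grid, and the cell with the most accumulated vote wins. With `spread = 0` only exact hits count, which reduces to plain plurality voting. The function name and the Gaussian weighting are assumptions, not the authors' scheme.

```python
import math

def vote_location(candidates, width, height, spread=1.0):
    """Fuse candidate (x, y) locations from several detectors by soft voting.

    Each candidate adds exp(-d^2 / (2 * spread^2)) to every grid cell at squared
    distance d^2; the cell with the largest total is returned.  With spread == 0
    only exact matches count, i.e. ordinary plurality voting.
    """
    best, best_score = None, -1.0
    for y in range(height):
        for x in range(width):
            score = 0.0
            for cx, cy in candidates:
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                score += math.exp(-d2 / (2 * spread ** 2)) if spread > 0 else float(d2 == 0)
            if score > best_score:  # ties keep the first cell in raster order
                best, best_score = (x, y), score
    return best
```

For example, candidates clustered around (5, 5) outvote a single outlier at (15, 15), so the fused eye centre lands in the cluster.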

Face Tracking Using Mean-Shift Algorithm: A Fuzzy Approach for Boundary Detection

Face and hand tracking are important areas of research related to adaptive human-computer interfaces and affective computing. In this article, we introduce two new methods for detecting the boundary of the human face in video sequences: (1) edge density thresholding and (2) fuzzy edge density. We analyze these algorithms with respect to two main factors: convergence speed and stability against white noise. The results show that the fuzzy edge density method has an acceptable convergence speed and significant robustness against noise. Based on these results, we believe that this boundary detection method, together with mean-shift and its variants such as the CAMShift algorithm, can achieve fast and robust face tracking in noisy environments, making it a good candidate for use with cheap cameras and in real-world applications.

Farhad Dadgostar, Abdolhossein Sarrafzadeh, Scott P. Overmyer
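
The tracker described above builds on mean-shift, which repeatedly moves a search window to the weighted centroid of the pixels under it until the window stops moving. The sketch below shows only that mean-shift step, assuming a per-pixel weight map (for example a skin-colour likelihood) and a flat square kernel; the edge-density and fuzzy edge-density boundary detection that the paper contributes is not shown, and the names are hypothetical.

```python
def mean_shift(weights, start, radius=3, max_iter=50, eps=1e-3):
    """Shift a window centre to the weighted centroid until it converges.

    weights: 2-D list of non-negative per-pixel weights.  The window is a flat
    (uniform) square of half-size `radius` around the current centre.
    """
    h, w = len(weights), len(weights[0])
    cx, cy = start
    for _ in range(max_iter):
        sx = sy = total = 0.0
        for y in range(max(0, int(cy) - radius), min(h, int(cy) + radius + 1)):
            for x in range(max(0, int(cx) - radius), min(w, int(cx) + radius + 1)):
                wt = weights[y][x]
                sx += wt * x
                sy += wt * y
                total += wt
        if total == 0:
            break  # no support under the window: stay where we are
        nx, ny = sx / total, sy / total
        moved = abs(nx - cx) + abs(ny - cy)
        cx, cy = nx, ny
        if moved < eps:
            break
    return cx, cy
```

CAMShift extends this loop by also adapting the window size from the weight distribution, which is where a reliable face boundary estimate becomes useful.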

Modelling Nonrigid Object from Video Sequence Under Perspective Projection

This paper focuses on the problem of estimating the 3D structure and motion of a nonrigid object from a monocular video sequence. Many previous methods extend the factorization technique, based on a rank constraint on the tracking matrix, by expressing the 3D shape of the nonrigid object as a weighted combination of a set of shape bases. All these solutions assume an affine camera model. This assumption becomes invalid, and causes large reconstruction errors, when the object is close to the camera. The main contribution of this paper is to extend these methods to the general perspective camera model. The proposed algorithm iteratively updates the shape and motion from weak perspective projection to full perspective projection by refining the scalars corresponding to the projective depths. Extensive experiments on real sequences validate the effectiveness and improvements of the proposed method.

Guanghui Wang, Yantao Tian, Guoqiang Sun

Sketch Based Facial Expression Recognition Using Graphics Hardware

In this paper, a novel system is proposed to recognize facial expressions from a face sketch produced by programmable graphics hardware (a GPU, Graphics Processing Unit). First, an expression subspace is set up from a corpus of images covering seven basic expressions. Second, real-time facial expression sketch extraction is performed by a GPU-based edge detection algorithm. Next, noise is eliminated by a tone mapping operation on the GPU. Then, an ASM instance is trained to track the facial feature points in the sketched face image, which is more efficient and precise than tracking directly on a grey-level image. Finally, an eigen-expression vector is derived from the normalized key feature points and used as the input of an MSVM (multi-SVM) based expression recognition model, which performs the expression classification. Test expression images are categorized by the MSVM into one of the seven basic expression subspaces. Experiments on a data set of 500 pictures show the efficacy of the algorithm.

Jiajun Bu, Mingli Song, Qi Wu, Chun Chen, Cheng Jin

Case-Based Facial Action Units Recognition Using Interactive Genetic Algorithm

This paper proposes a case-based approach to automatic facial AU recognition using an interactive genetic algorithm (IGA), which embeds the human ability to compare into the target system. To obtain the AU codes of a new facial image, IGA is applied to retrieve a matching instance from the case base, based on users' evaluations. The solution suggested by the matching case is then used as the AU codes of the new facial image. The effectiveness of our approach is evaluated on 16 standard facial images collected under controlled imaging conditions and 10 non-standard images collected under spontaneous conditions, using the Cohn-Kanade Facial Expression Database as the case base. For standard images, a recognition rate of 77.5% is achieved on single AUs, and a similarity rate of 82.8% on AU combinations. For non-standard images, a recognition rate of 82.8% is achieved on single AUs, and a similarity rate of 93.1% on AU combinations.

Shangfei Wang, Jia Xue

Facial Expression Recognition Using HLAC Features and WPCA

This paper proposes a new facial expression recognition method that combines Higher Order Local Autocorrelation (HLAC) features with Weighted PCA. HLAC features are computed at each pixel of the face image and then integrated with a weight map to obtain a feature vector. The weights are selected by combining statistical methods with psychological theory. Experiments on the "CMU-PITTSBURGH AU-Coded Face Expression Image Database" show that, compared with PCA, our Weighted PCA method improves the recognition rate significantly without increasing computation.

Fang Liu, Zhi-liang Wang, Li Wang, Xiu-yan Meng
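
The weighted integration of per-pixel HLAC terms described above can be sketched as follows. This minimal version assumes only the 0th-order mask and the four 1st-order masks of the 3x3 HLAC family (the commonly used full set also contains 20 second-order masks), and a caller-supplied weight map with the same size as the image; the weights chosen by the authors are not reproduced, and the names are hypothetical.

```python
# Displacements of the four 1st-order HLAC masks in a 3x3 window (right,
# down-right, down, down-left); opposite displacements give the same sums.
FIRST_ORDER = [(1, 0), (1, 1), (0, 1), (-1, 1)]

def weighted_hlac(image, weight):
    """Order-0 and order-1 HLAC features with each pixel's term scaled by weight[y][x].

    feats[0] = sum_r w(r) f(r);  feats[k] = sum_r w(r) f(r) f(r + a_k).
    """
    h, w = len(image), len(image[0])
    feats = [0.0] * (1 + len(FIRST_ORDER))
    for y in range(1, h - 1):          # skip the border so r + a_k stays inside
        for x in range(1, w - 1):
            f, wt = image[y][x], weight[y][x]
            feats[0] += wt * f
            for k, (dx, dy) in enumerate(FIRST_ORDER, start=1):
                feats[k] += wt * f * image[y + dy][x + dx]
    return feats
```

Computing such weighted vectors for every training face and feeding them to ordinary PCA is one simple way to realise a "weighted PCA"; whether this matches the authors' exact formulation is an assumption.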

Motion Normalization

This paper presents a very simple but efficient algorithm for normalizing all motion data in a database to the same skeleton lengths. The input motion stream is processed sequentially, and the computation for each frame requires only the results of the previous step over a neighborhood of nearby backward frames. In contrast to previous motion retargeting approaches, we simplify the constraint condition of the retargeting problem, which leads to simpler solutions. Moreover, we improve Shin et al.'s algorithm [10], which is adopted by Kovar's widely used footskate cleanup algorithm [6], by adding a case that it misses.

Yan Gao, Lizhuang Ma, Zhihua Chen, Xiaomao Wu
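
A simplified, single-frame sketch of skeleton-length normalization: each bone keeps its direction but is rescaled to a target length, working outward from the root. This is an illustrative assumption rather than the authors' algorithm, which additionally uses nearby backward frames and handles footskate; the parent-array skeleton layout and the function name are hypothetical.

```python
import math

def normalize_pose(positions, parents, target_lengths):
    """Rescale every bone of one frame to a target length, keeping bone directions.

    positions: list of (x, y, z) joint positions.  parents[i] is the index of the
    parent of joint i (-1 for the root); parents must appear before children.
    target_lengths[i] is the desired length of the bone from parents[i] to i.
    """
    out = [None] * len(positions)
    for i, p in enumerate(parents):
        if p < 0:
            out[i] = tuple(positions[i])   # the root keeps its original position
            continue
        d = [a - b for a, b in zip(positions[i], positions[p])]
        norm = math.sqrt(sum(c * c for c in d)) or 1.0
        scale = target_lengths[i] / norm
        # Attach the rescaled bone to the already-normalized parent joint.
        out[i] = tuple(o + c * scale for o, c in zip(out[p], d))
    return out
```

Applying this frame by frame changes joint positions independently per frame, which is why practical systems, like the one described above, add temporal constraints such as foot-contact cleanup.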

Fusing Face and Body Display for Bi-modal Emotion Recognition: Single Frame Analysis and Multi-frame Post Integration

This paper presents an approach to automatic visual emotion recognition from two modalities: expressive face and body gesture. Face and body movements are captured simultaneously using two separate cameras. For each face and body image sequence, single "expressive" frames are selected manually for analysis and emotion recognition. First, individual classifiers are trained on each modality for mono-modal emotion recognition. Second, we fuse facial expression and affective body gesture information at the feature level and at the decision level. In our experiments, emotion classification using both modalities achieved better recognition accuracy than classification using the facial modality alone. We further extend the affect analysis to a whole image sequence by a multi-frame post integration approach applied to the single-frame recognition results. In our experiments, post integration based on the fusion of face and body was shown to be more accurate than post integration based on the facial modality only.

Hatice Gunes, Massimo Piccardi
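
The two fusion levels and the post-integration step described above can be contrasted in a few lines. This is a minimal sketch under assumptions: the mono-modal classifiers are taken to output per-class scores as dictionaries, decision-level fusion uses the standard sum or product rule, and post integration is a simple majority vote over frame labels. None of these choices is claimed to be the authors' exact scheme, and the names are hypothetical.

```python
from collections import Counter

def feature_level_fusion(face_vec, body_vec):
    """Feature-level fusion: concatenate both modalities for a single classifier."""
    return list(face_vec) + list(body_vec)

def decision_level_fusion(face_scores, body_scores, rule="sum"):
    """Decision-level fusion of per-class scores from two mono-modal classifiers."""
    classes = face_scores.keys() & body_scores.keys()
    if rule == "sum":
        combined = {c: face_scores[c] + body_scores[c] for c in classes}
    elif rule == "product":
        combined = {c: face_scores[c] * body_scores[c] for c in classes}
    else:
        raise ValueError(f"unknown rule: {rule}")
    return max(combined, key=combined.get)

def post_integrate(frame_labels):
    """Sequence label by majority vote over single-frame labels (ties: first seen)."""
    return Counter(frame_labels).most_common(1)[0][0]
```

For instance, a face classifier leaning towards anger can be overruled when the body classifier is confidently fearful, which is the kind of complementarity the bimodal results above rely on.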

A Composite Method to Extract Eye Contour

An eye contour extraction method that combines a simplified version of the Active Shape Model (ASM) with a gradient method is proposed. Because ASM requires a large amount of computation, it is used only to extract the eyelids. Since the iris has a more regular shape, it is detected by a simple but fast gradient method, which is improved by introducing gradient values into the weight matrix. Our detection method has been implemented in the C programming language, and experimental results show good accuracy and efficiency.

Ke Sun, Hong Wang

Simulated Annealing Based Hand Tracking in a Discrete Space

Hand tracking is a challenging problem because of the complexity of searching a space with more than 20 degrees of freedom (DOF) for an optimal estimate of the hand configuration. This paper represents the feasible hand configurations as a discrete space, which avoids the parameter learning required by general configuration-space representations. We then propose an extended simulated annealing method with a particle filter to search for the optimal hand configuration in this discrete space. A simplex search running on multiple processors predicts the hand motion instead of initializing the simulated annealing randomly, and a particle filter represents the state of the tracker at each layer when searching the high-dimensional configuration space. The experimental results show that the proposed method makes hand tracking more efficient and robust.

Wei Liang, Yunde Jia, Yang Liu, Cheng Ge
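
The core search described above is simulated annealing over a discrete configuration space. The sketch below shows only plain simulated annealing with a geometric cooling schedule; the simplex-based motion prediction and the layered particle filter that the paper adds are not included, and the parameter values and names are illustrative assumptions.

```python
import math
import random

def simulated_annealing(cost, neighbours, start, t0=1.0, cooling=0.95, steps=500, seed=0):
    """Minimize cost over a discrete space by simulated annealing.

    neighbours(s) returns a non-empty list of configurations adjacent to s.
    Worse moves are accepted with probability exp(-delta / T), and the
    temperature T decays geometrically each step.
    """
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    t = t0
    for _ in range(steps):
        cand = rng.choice(neighbours(current))
        cand_cost = cost(cand)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        t = max(t * cooling, 1e-12)  # keep T positive to avoid dividing by zero
    return best, best_cost
```

In a hand tracker, a configuration would be an index into the discrete set of feasible hand poses and the cost an image-matching error; both are left to the caller here.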

Modulation of Attention by Faces Expressing Emotion: Evidence from Visual Marking

Recent findings have demonstrated that negative emotional faces (sad, angry, or fearful) tend to attract attention more than positive faces do. This study used the visual marking paradigm to test this view and to explore whether the preview benefit still exists when schematic faces are used as materials. The results showed a significant preview benefit in the search of affective materials. In the gap condition, negative faces were found faster than positive faces. However, this advantage did not appear in the half-element condition when negative faces served as distractors, indicating that negative faces do not always capture attention more efficiently.

Fang Hao, Hang Zhang, Xiaolan Fu

An Intelligent Algorithm for Enhancing Contrast for Image Based on Discrete Stationary Wavelet Transform and In-complete Beta Transform

After the discrete stationary wavelet transform (DSWT) is applied to an image, noise is reduced directly, using generalized cross validation (GCV), in the high-frequency sub-bands at the better resolution levels, and local contrast is enhanced by combining this de-noising method with the incomplete Beta transform (IBT) in the high-frequency sub-bands at the worse resolution levels. To enhance the global contrast of the image, the low-frequency sub-band image is also enhanced using IBT and a simulated annealing algorithm (SA). IBT is used to obtain a non-linear gray transform curve, and its transform parameters are determined by SA so as to obtain optimal non-linear gray transform parameters. To avoid the long running time of traditional contrast enhancement algorithms, a new criterion based on the gray-level histogram is proposed and used to determine the contrast type of the original image. A gray transform parameter space is then given for each contrast type, which shrinks the parameter space greatly. Finally, the quality of the enhanced image is evaluated by a total cost criterion. Experimental results show that the new algorithm greatly improves both the global and local contrast of an image while efficiently reducing Gaussian white noise (GWN). The new algorithm outperforms histogram equalization (HE), the unsharp masking algorithm (USM), the WYQ algorithm, and the GWP algorithm.

Changjiang Zhang, Xiaodong Wang, Haoran Zhang
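
The incomplete Beta transform maps a normalized gray level u in [0, 1] through the regularized incomplete Beta function I_u(alpha, beta), giving a family of non-linear gray transform curves controlled by two parameters. The minimal sketch below builds an 8-bit lookup table by numerical (midpoint-rule) integration and assumes alpha, beta >= 1; in the paper the parameters are chosen by simulated annealing, which is not shown, and the function name is hypothetical.

```python
def incomplete_beta_lut(alpha, beta, levels=256, steps_per_level=64):
    """Lookup table g -> round((levels - 1) * I_u(alpha, beta)), u = g / (levels - 1).

    I_u is approximated by midpoint-rule integration of t^(alpha-1) (1-t)^(beta-1),
    normalised by the same integral over [0, 1].  Intended for alpha, beta >= 1.
    """
    n = (levels - 1) * steps_per_level
    h = 1.0 / n
    cumulative = [0.0]
    acc = 0.0
    for k in range(n):
        t = (k + 0.5) * h          # midpoint of the k-th integration cell
        acc += t ** (alpha - 1) * (1 - t) ** (beta - 1) * h
        cumulative.append(acc)
    total = cumulative[-1]
    # cumulative[g * steps_per_level] is the integral from 0 up to u = g / (levels - 1).
    return [round((levels - 1) * cumulative[g * steps_per_level] / total)
            for g in range(levels)]
```

An image given as rows of 8-bit values is then enhanced with `[[lut[p] for p in row] for row in img]`; alpha = beta = 1 gives the identity, while alpha = beta = 2 gives an S-shaped curve that stretches mid-tone contrast.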

Automatic Facial Expression Recognition Using Linear and Nonlinear Holistic Spatial Analysis

This paper addresses holistic spatial analysis of facial expression images. We present a systematic comparison of machine learning methods applied to automatic facial expression recognition, including supervised and unsupervised subspace analysis, SVM classifiers, and their nonlinear versions. Image-based holistic spatial analysis adapts well to the recognition task because it automatically learns the inner structure of the training samples and extracts the most pertinent features for classification. Nonlinear analysis methods, which can extract higher-order dependencies among input patterns, are expected to improve classification performance. Surprisingly, the linear classifiers outperformed their nonlinear versions in our experiments. We propose a new feature selection method named Weighted Saliency Maps (WSM). Compared with other feature selection schemes such as AdaBoost and PCA, WSM has the advantage of being simple, fast, and flexible.

Rui Ma, Jiaxin Wang

A Novel Regularized Fisher Discriminant Method for Face Recognition Based on Subspace and Rank Lifting Scheme

The null space $N(S_t)$ of the total scatter matrix $S_t$ contains no useful information for pattern classification. Discarding the null space $N(S_t)$ therefore results in dimensionality reduction without loss of discriminant power. Combining this subspace technique with the proposed rank lifting scheme, a new regularized Fisher discriminant (SL-RFD) method is developed to deal with the small sample size (S3) problem in face recognition. Two publicly available databases, FERET and CMU PIE, are used to evaluate the proposed algorithm. Compared with existing LDA-based methods for solving the S3 problem, the proposed SL-RFD method gives the best performance.

Wen-Sheng Chen, Pong Chi Yuen, Jian Huang, Jianhuang Lai, Jianliang Tang
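
Discarding $N(S_t)$ is cheap because $S_t = \sum_i (x_i - m)(x_i - m)^T$ has its range spanned by the centred samples $x_i - m$, so an orthonormal basis of that range can be built directly from the samples without forming the large $d \times d$ matrix. The sketch below shows only this subspace step, using modified Gram-Schmidt; the rank lifting scheme and the regularized Fisher discriminant of SL-RFD are not reproduced, and the names are hypothetical.

```python
import math

def range_basis_of_total_scatter(samples, tol=1e-10):
    """Orthonormal basis of range(S_t), i.e. the complement of the null space N(S_t).

    The centred samples x_i - m span range(S_t); Gram-Schmidt on them yields the
    basis without ever building the d x d scatter matrix.
    """
    n, d = len(samples), len(samples[0])
    mean = [sum(x[j] for x in samples) / n for j in range(d)]
    basis = []
    for x in samples:
        v = [a - b for a, b in zip(x, mean)]
        for q in basis:                      # remove components already covered
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > tol:                       # skip vectors that add no new direction
            basis.append([a / norm for a in v])
    return basis

def project(x, basis):
    """Coordinates of x in the reduced space; its N(S_t) component is dropped."""
    return [sum(a * b for a, b in zip(x, q)) for q in basis]
```

Since the rank of $S_t$ is at most $n - 1$ for $n$ training images, the reduced dimension is bounded by the sample count, which is exactly what makes the small sample size setting tractable.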

Hand Motion Recognition for the Vision-based Taiwanese Sign Language Interpretation

In this paper, we present a system that recognizes the hand motions of Taiwanese Sign Language (TSL) using Hidden Markov Models (HMMs) through a vision-based interface. Our hand motion recognition system consists of four phases: color model construction, hand tracking, trajectory representation, and recognition. Our hand tracker tracks hand positions accurately. Because the hand motions to be recognized vary under rotation, translation, symmetry, and scaling in the Cartesian coordinate system, we choose invariant features obtained by converting from Cartesian to polar coordinates. Nine hand motion patterns are defined for TSL. Experimental results show that the chosen invariant features enable recognition with an accuracy of about 90%.

Chia-Shiuan Cheng, Pi-Fuei Hsieh, Chung-Hsien Wu
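
One way to obtain trajectory features invariant to translation, scale and rotation via polar coordinates is sketched below: points are expressed relative to the trajectory centroid, radii are divided by the largest radius, and angles are measured relative to the first point. This is an illustrative assumption, not necessarily the feature definition used by the authors; invariance to reflection (symmetry) is not handled, and the name is hypothetical.

```python
import math

def polar_trajectory_features(points):
    """Polar description of a 2-D trajectory, normalized for position, size and rotation.

    Each point becomes (r / r_max, angle relative to the first point), both
    measured from the trajectory centroid.
    """
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    radii = [math.hypot(x - cx, y - cy) for x, y in points]
    angles = [math.atan2(y - cy, x - cx) for x, y in points]
    r_max = max(radii) or 1.0
    a0 = angles[0]
    # Wrap relative angles into [-pi, pi) so a constant rotation cancels out.
    return [(r / r_max, (a - a0 + math.pi) % (2 * math.pi) - math.pi)
            for r, a in zip(radii, angles)]
```

The resulting sequence of (radius, angle) pairs could then be quantized into observation symbols for the per-motion HMMs.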

Static Gesture Quantization and DCT Based Sign Language Generation

Collecting data for sign language recognition is not a trivial task. The lack of training data has become a bottleneck in research on signer independence and large-vocabulary recognition. A novel sign language generation algorithm is introduced in this paper. The differences between signers are analyzed briefly, and a criterion is introduced to distinguish the same gesture words produced by different signers. Based on that criterion, we propose a sign word generation method combining static gesture quantization and the Discrete Cosine Transform (DCT), which can generate new signers' sign words from existing signers' sign words. Experimental results show that the generated data are not only distinct from the training data but also effective.

Chenxi Zhang, Feng Jiang, Hongxun Yao, Guilin Yao, Wen Gao

A Canonical Face Based Virtual Face Modeling

The research presented here creates a 3D virtual face based on a canonical face model derived by clustering facial feature points. The algorithm efficiently transforms the feature points of the canonical face model into those of a new face model for the input images, without creating the new face manually. Comparative experiments present facial models generated both manually and automatically. The two models are visually nearly identical, whereas their generation efficiency differs greatly.

Seongah Chin

Facial Phoneme Extraction for Taiwanese Sign Language Recognition

We have developed a system that recognizes the facial expressions in Taiwanese Sign Language (TSL) using a phoneme-based strategy. A facial expression is decomposed into three facial phonemes of eyebrow, eye, and mouth. A fast method is proposed for locating facial phonemes. The shapes of the phonemes were then matched by the deformable template method, giving feature points representing the corresponding phonemes. The trajectories of the feature points were tracked along the video image sequence and combined to recognize the type of facial expression. The tracking techniques and the feature points have been tailored for facial features in TSL. For example, the template matching methods have been simplified for tracking eyebrows and eyes. The mouth was tracked using the optical flow method, taking lips as homogeneous patches. The experiment has been conducted on 70 image sequences covering seven facial expressions. The average recognition rate is 83.3%.

Shi-Hou Lin, Pi-Fuei Hsieh, Chung-Hsien Wu

What Expression Could Be Found More Quickly? It Depends on Facial Identities

A visual search task was used to explore the role of facial identity in the processing of facial expression. Participants were asked to search for a happy or sad face in a crowd of emotional face pictures. Expression search was faster and more accurate when all the faces in a display belonged to one identity than when they belonged to two identities, suggesting that identity variation interferes with expression recognition. The search speed for a given expression also depended on the number of facial identities. When the faces in a display belonged to one identity, a sad face among happy faces was found more quickly than a happy face among sad faces; when the faces belonged to two identities, however, a happy face was found more quickly than a sad face.

Hang Zhang, Yuming Xuan, Xiaolan Fu

Using an Avatar to Develop a System for the Predication of Human Body Pose from Moments

Tracking people using movie sequences is not straightforward because of the human body’s articulation and the complexity of a person’s movements. In this paper we show how a person’s 3D pose can be reconstructed by using corresponding silhouettes of video sequences from a monocular view. Currently, a virtual avatar is used to train the model for inferring the pose and a different avatar is used to produce novel examples not in the training set in order to evaluate the approach. The approach was subsequently tested using the silhouettes of walking people.

Song Hu, Bernard F. Buxton

Real-Time Facial Expression Recognition System Based on HMM and Feature Point Localization

Recognizing facial expressions in real time is difficult for computers, and so far no good method has been proposed to solve this problem. In this paper, we present an effective automated system that we developed to recognize facial gestures in real time. Following psychologists' classification of facial expressions, we define four basic facial expressions, localize key facial feature points precisely, and then extract the contours of facial components. We analyze and record the movements of facial components using information from sequential frames to recognize facial gestures. Since different facial expressions can share some movements, a good facial expression model is needed to describe the relation between expression states and the observed movements of facial components in order to achieve good recognition results. HMMs meet this requirement. We present a facial expression model based on HMMs and obtain good real-time recognition results.

Yumei Fan, Ning Cheng, Zhiliang Wang, Jiwei Liu, Changsheng Zhu
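
HMM-based expression recognition of the kind described above typically trains one model per expression and labels a sequence of observed component movements with the model that assigns it the highest likelihood. The sketch below shows that decision rule with the scaled forward algorithm for discrete HMMs; the encoding of facial movements as integer symbols and all parameter values are hypothetical, and model training is not shown.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm; returns log P(obs | model) for a discrete HMM.

    start[i], trans[i][j] and emit[i][o] are the initial, transition and emission
    probabilities; obs is a non-empty sequence of integer observation symbols.
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0:
            return float("-inf")           # sequence impossible under this model
        log_p += math.log(scale)           # product of scales equals P(obs)
        alpha = [a / scale for a in alpha]
    return log_p

def classify(obs, models):
    """Return the name of the model (start, trans, emit) that best explains obs."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))
```

Scaling each step keeps the recursion numerically stable for long real-time sequences, where the raw forward probabilities would underflow.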

Discriminative Features Extraction in Minor Component Subspace

In this paper, we propose a new method for extracting discriminative features for classification from a given training dataset. The proposed method combines the advantages of the null space method and the maximum margin criterion (MMC) method while overcoming their drawbacks. Face recognition experiments confirm the better performance of the proposed method.

Wenming Zheng, Cairong Zou, Li Zhao

Vision-Based Recognition of Hand Shapes in Taiwanese Sign Language

Pixel-based shape representations are sensitive to rotation. In this paper, we propose a pixel-based descriptor that is invariant to rotation and scale for hand shape recognition in Taiwanese Sign Language (TSL). Since each hand shape has a characteristic pointing direction, angle normalization is used to meet the rotation-invariance requirement. With angle normalization, the traces of the class covariance matrices are reduced for almost all hand shape classes, implying less overlap between classes. This is confirmed by experiments showing an increase in recognition accuracy.

Jung-Ning Huang, Pi-Fuei Hsieh, Chung-Hsien Wu

An Information Acquiring Channel: Lip Movement

This paper aims to show that lip movement is a usable channel for acquiring information. We support this by describing two valid applications built solely on lip movement information: lip-reading and lip-movement utterance recognition. The speaker-dependent lip-reading system achieves 68% accuracy, and the utterance recognition system has so far achieved over 99.5% for test-independent (TI) and nearly 100% for test-dependent (TD) experiments. We conclude that lip reading is an effective channel that can be applied independently.

Xiaopeng Hong, Hongxun Yao, Qinghui Liu, Rong Chen

Content-Based Affective Image Classification and Retrieval Using Support Vector Machines

In this paper, a new method to classify and retrieve affective images is proposed. First, users express the affective semantics of the images with adjectives; the data collected by the Semantic Differential method are processed to obtain the main affective factors and establish an affective space; low-level visual features of the images are extracted to construct a visual feature space; and the correlation between the affective space and the visual feature space is computed with SVMs. A prototype system embodying the trained SVMs has been implemented. The system can classify images automatically and supports affective image retrieval. Experimental results show the effectiveness of this method.

Qingfeng Wu, Changle Zhou, Chaonan Wang

A Novel Real Time System for Facial Expression Recognition

In this paper, a fully automatic, real-time system is proposed to recognize seven basic facial expressions (anger, disgust, fear, happiness, neutral, sadness, and surprise); the system is insensitive to illumination changes. First, the face is located and normalized based on an illumination-insensitive skin model and face segmentation; then, the basic Local Binary Patterns (LBP) technique, which is invariant to monotonic grey-level changes, is used for facial feature extraction; finally, a coarse-to-fine scheme is used for expression classification. Theoretical analysis and experimental results show that the proposed system performs well under variable illumination and a moderate degree of head rotation.

Xiaoyi Feng, Matti Pietikäinen, Abdenour Hadid, Hongmei Xie
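
The basic LBP operator mentioned above thresholds the eight neighbours of each pixel against the centre value and packs the results into an 8-bit code; because only comparisons are used, any monotonic grey-level change leaves the codes unchanged. The minimal sketch below computes codes and a 256-bin histogram over a whole image; the neighbour ordering is one common convention, the names are hypothetical, and the face segmentation and coarse-to-fine classifier of the paper are not shown.

```python
# Neighbours (dx, dy) in clockwise order starting at the top-left pixel.
OFFSETS = [(-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]

def lbp_code(img, x, y):
    """Basic 3x3 LBP: bit k is set when neighbour k is >= the centre pixel."""
    c = img[y][x]
    code = 0
    for k, (dx, dy) in enumerate(OFFSETS):
        if img[y + dy][x + dx] >= c:
            code |= 1 << k
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, x, y)] += 1
    return hist
```

In face analysis the histogram is often computed per region and the regional histograms concatenated, so that the descriptor also keeps some spatial layout.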

Fist Tracking Using Bayesian Network

This paper presents a Bayesian-network-based multi-cue fusion method for robust, real-time fist tracking. First, a new strategy that employs recent work in face recognition is used to create an accurate color model of the fist automatically. Second, the color and motion cues are used to generate candidate positions of the fist. Then, the posterior probability of each candidate position is evaluated by a Bayesian network that fuses the color and appearance cues. Finally, the fist position is taken to be the hypothesis that maximizes the posterior probability. Experimental results show that our algorithm is real-time and robust.

Peng Lu, Yufeng Chen, Mandun Zhang, Yangsheng Wang
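
The final step described above, picking the candidate that maximizes the posterior, can be sketched under the simplest Bayesian-network structure, in which the colour and appearance cues are conditionally independent given the fist position. That independence assumption, the callable likelihood functions and the names are illustrative and may differ from the authors' network.

```python
def map_fist_position(hypotheses, prior, color_lik, appearance_lik):
    """Return the MAP hypothesis and its normalized posterior probability.

    posterior(h) is proportional to prior(h) * P(colour | h) * P(appearance | h),
    assuming the two cues are conditionally independent given the position h.
    """
    scores = {h: prior(h) * color_lik(h) * appearance_lik(h) for h in hypotheses}
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, (scores[best] / total if total else 0.0)
```

Here the candidate positions would come from the colour and motion cues, and the returned posterior gives a confidence that a tracker can use to decide when to reinitialize.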

Grounding Affective Dimensions into Posture Features

In many areas of today's society, systems capable of interacting with users on an affective level through a variety of modalities are becoming increasingly important. Our focus has been on affective posture recognition. However, a deeper understanding of how emotions relate to one another in terms of postural expressions is required. The goal of this study was to identify the affective dimensions that human observers use when discriminating between postures, and to investigate the possibility of grounding this affective space in a set of posture features. Using multidimensional scaling, arousal, valence, and action tendency were identified as the main factors in the evaluation process. Our results showed that low-level posture features could indeed discriminate effectively between the affective dimensions.

Andrea Kleinsmith, P. Ravindra De Silva, Nadia Bianchi-Berthouze
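
Multidimensional scaling turns observers' pairwise dissimilarity judgements into coordinates whose axes can then be interpreted, for example as arousal or valence. The study does not say which MDS variant was used, so the sketch below assumes classical (Torgerson) MDS: double-centre the squared distances and take the top eigenvectors, here found by power iteration with deflation to stay within the standard library. The names are hypothetical.

```python
import math
import random

def classical_mds(dist, dims=2, iters=500, seed=0):
    """Classical MDS: coordinates whose Euclidean distances approximate dist."""
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d2]
    grand_mean = sum(row_mean) / n
    # Double centring: B = -1/2 * J D^2 J, where J is the centring matrix.
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # random start vector
        for _ in range(iters):                          # power iteration
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-15:
                break  # the remaining spectrum is numerically zero
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        for i in range(n):
            coords[i].append(math.sqrt(max(lam, 0.0)) * v[i])
        # Deflate so that the next pass finds the next-largest eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

The recovered configuration is only determined up to rotation, reflection and translation, so naming the axes (as arousal, valence and action tendency were named above) is an interpretive step outside the algorithm.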

Face and Facial Expression Recognition with an Embedded System for Human-Robot Interaction

In this paper, we present an embedded system that implements face recognition and facial expression recognition for human-robot interaction. To detect faces quickly and reliably, the AdaBoost algorithm is used. Principal Component Analysis is then applied to recognize the face. Gabor wavelets are combined with the Enhanced Fisher Model for facial expression recognition, whose accuracy reaches 93%. The embedded system runs at 150 MHz, and its processing speed is 0.6 frames per second. Experimental results demonstrate that face detection, face recognition, and facial expression recognition can be implemented on an embedded system for human-robot interaction.

Yang-Bok Lee, Seung-Bin Moon, Yong-Guk Kim

Affective Speech Processing

Combining Acoustic Features for Improved Emotion Recognition in Mandarin Speech

Combining different feature streams to obtain more accurate results is a well-known technique. The basic argument is that if systems using the individual streams make recognition errors at different points, there is at least a chance that a combined system can correct some of these errors by reference to the other streams. There are many ways to apply this general principle in an emotional speech recognition system. In this paper, we propose using feature selection and feature combination to improve speaker-dependent emotion recognition in Mandarin speech. Five basic emotions are investigated: anger, boredom, happiness, neutral and sadness. Combining multiple feature streams is clearly highly beneficial in our system: the best accuracy in recognizing the five emotions, 99.44%, is achieved by using the MFCC, LPCC, RastaPLP and LFPC feature streams with the nearest class mean classifier.

Tsang-Long Pao, Yu-Te Chen, Jun-Heng Yeh, Wen-Yuan Liao
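Feature-stream combination followed by a nearest class mean classifier, as named in the Mandarin abstract above, can be sketched as follows. The streams are assumed to be already-extracted fixed-length vectors per utterance (stand-ins for MFCC, LPCC, RastaPLP or LFPC statistics), Euclidean distance is assumed, and the helper names are hypothetical; the paper's feature selection step is not shown.

```python
import math
from typing import Dict, List, Sequence


def concat_streams(*streams: Sequence[float]) -> List[float]:
    """Combine several per-utterance feature streams into one vector."""
    combined: List[float] = []
    for stream in streams:
        combined.extend(stream)
    return combined


def class_means(train: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Mean feature vector of each emotion class."""
    means: Dict[str, List[float]] = {}
    for label, vectors in train.items():
        dim = len(vectors[0])
        means[label] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return means


def nearest_class_mean(x: Sequence[float], means: Dict[str, List[float]]) -> str:
    """Assign x to the class whose mean is closest in Euclidean distance."""
    return min(means, key=lambda label: math.dist(x, means[label]))
```

In practice the streams would usually be normalized before concatenation so that no single stream dominates the distance.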

Intonation Modelling and Adaptation for Emotional Prosody Generation

This paper proposes an HMM-based approach to generating emotional intonation patterns. A set of models was built to represent syllable-length intonation units. In a classification framework, the models were able to detect a sequence of intonation units from raw fundamental frequency values. Using the models in a generative framework, we were able to synthesize smooth and natural-sounding pitch contours. As a case study for emotional intonation generation, Maximum Likelihood Linear Regression (MLLR) adaptation was used to transform the neutral model parameters with a small amount of happy and sad speech data. Perceptual tests showed that listeners could identify the speech with sad intonation 80% of the time. Listeners' ability to detect the system-generated happy intonation, on the other hand, followed a bimodal distribution, and on average they detected happy intonation only 46% of the time.

Zeynep Inanoglu, Steve Young

Application of Psychological Characteristics to D-Script Model for Emotional Speech Processing

The d-script model was originally developed to describe affective (emotional) mass media texts and, with an extension, also applies to emotional speech synthesis. In this model we distinguish units for “rational” inference (r-scripts) and units for “emotional” processing of meaning (d-scripts). Based on a psycholinguistic study, we demonstrate relations between classes of emotional utterances in the d-script model and the psychological characteristics of informants. The study proposes a theoretical framework for an affective agent that simulates given psychological characteristics in its emotional speech behaviour.

Artemy Kotov

A Hybrid GMM and Codebook Mapping Method for Spectral Conversion

This paper proposes a new mapping method that combines GMM and codebook mapping to transform the spectral envelope in a voice conversion system. After analyzing the over-smoothing problem of the GMM mapping method in detail, we propose to convert the basic spectral envelope with the GMM method and to convert the envelope-subtracted spectral details with GMM and phone-tied codebook mapping. Objective evaluations based on performance indices show that the proposed method improves performance by 27.2017% on average over the GMM mapping method, and listening tests show that it effectively reduces the over-smoothing problem of the GMM method while avoiding the discontinuity problem of the codebook mapping method.

Yongguo Kang, Zhiwei Shuang, Jianhua Tao, Wei Zhang, Bo Xu

Speech Emotional Recognition Using Global and Time Sequence Structure Features with MMD

In this paper, a combination of global and time-sequence features was used as the characteristic parameters for speech emotion recognition. A new method based on the MMD (Modified Mahalanobis Distance) formula was proposed to decrease estimation errors and simplify the calculation. Four emotions are considered: happiness, anger, surprise and sadness. A total of 1000 sentences collected from 10 speakers were used to demonstrate the effectiveness of the new method, and the average emotion recognition rate reached 95%. In comparison with the MQDF (Modified Quadratic Discriminant Function) method [1], data analysis also showed that MMD performs better than MQDF.

Li Zhao, Yujia Cao, Zhiping Wang, Cairong Zou

Emotional Metaphors for Emotion Recognition in Chinese Text

A person's affect can be expressed through non-verbal means such as facial expressions, gestures, postures and the eyes, while implicit language such as emotional metaphor is also an important way to express it. Different kinds of Chinese emotional metaphors, covering happiness, sadness, anger, fear and surprise, are proposed together with their characteristics. Experimental results show that the proposed characteristics of Chinese emotional metaphors are reasonable. The importance of emotional metaphors in emotion recognition is also discussed.

Xiaoxi Huang, Yun Yang, Changle Zhou

Voice Conversion Based on Weighted Least Squares Estimation Criterion and Residual Prediction from Pitch Contour

This paper describes an enhanced system for more efficient voice conversion. A weighted LMSE (Least Mean Squared Error) criterion is adopted instead of the conventional LMSE for training the spectral conversion function. In addition, a short-term pitch contour mapping algorithm is presented together with a new residual codebook formed from the pitch contour. Informal listening tests show that convincing voice conversion is achieved while maintaining high speech quality. Objective tests also show that the proposed system reduces speaker individual discrimination compared with the baseline system in an LPC-based analysis/synthesis framework.

Jian Zhang, Jun Sun, Beiqian Dai
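The difference between the conventional and weighted LMSE criteria in the voice conversion abstract above can be shown with a one-dimensional linear mapping y ≈ a·x + b fitted by weighted least squares. This is a minimal sketch under assumptions: the paper's actual conversion function, spectral features and weighting scheme are not given here, so the scalar mapping, the function name and the weights are hypothetical.

```python
from typing import Sequence, Tuple


def weighted_linear_fit(x: Sequence[float], y: Sequence[float],
                        w: Sequence[float]) -> Tuple[float, float]:
    """Minimize sum_i w_i * (y_i - (a * x_i + b))**2 and return (a, b).

    With all weights equal this reduces to ordinary least squares,
    i.e. the conventional (unweighted) LMSE criterion.
    """
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    # Weighted means of source and target values.
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted covariance and variance.
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("x has no weighted variance")
    a = sxy / sxx
    b = my - a * mx
    return a, b
```

For example, `weighted_linear_fit([0, 1, 2, 3], [1, 3, 5, 100], [1, 1, 1, 0])` gives the zero-weighted outlier no influence and returns (2.0, 1.0), whereas an unweighted fit would be pulled toward the outlier.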

Modifying Spectral Envelope to Synthetically Adjust Voice Quality and Articulation Parameters for Emotional Speech Synthesis

Both prosody and spectral features are important for emotional speech synthesis. Besides prosody, voice quality and articulation parameters are factors that should be modified in emotional speech synthesis systems. Generally, rules and filters are designed to process these parameters individually. This paper shows that by modifying the spectral envelope, voice quality and articulation can be adjusted as a whole, so each parameter need not be modified separately by rules. Accordingly, designing an automatic spectral envelope model based on machine learning methods would make the synthesis system more flexible. The perception test in this paper also shows that the best emotional synthetic speech is obtained when both prosody and spectral features are modified.

Yanqiu Shao, Zhuoran Wang, Jiqing Han, Ting Liu

Study on Emotional Speech Features in Korean with Its Application to Voice Conversion

Recent research in speech synthesis has focused mainly on naturalness, and emotional speech synthesis has become one of the highlighted research topics. Although quite a few studies on emotional speech in English or Japanese have been reported, studies in Korean are rarely found. This paper presents an analysis of emotional speech in Korean. Emotional speech features related to prosody, such as F0, duration and amplitude, together with their variations, are exploited, and we attempt to quantify and model their contribution to three different types of typical human speech. Using the analysis results, emotional voice conversion from neutral speech to emotional speech is also performed and tested.

Sang-Jin Kim, Kwang-Ki Kim, Hyun Bae Han, Minsoo Hahn

Annotation of Emotions and Feelings in Texts

In this paper, a semantic lexicon in the field of feelings and emotions is presented. This lexicon is described with an ontology. Then, we describe a system to annotate emotions in a text and, finally, we show how these annotations allow a textual navigation.

Yvette Yannick Mathieu

IG-Based Feature Extraction and Compensation for Emotion Recognition from Speech

This paper presents an approach to emotion recognition from speech signals. In this approach, the intonation groups (IGs) of the input speech signal are first extracted, and then the speech features in each selected intonation group are extracted. Assuming a linear mapping between the feature spaces of different emotional states, a feature compensation approach is proposed to characterize the feature space with better discriminability among emotional states. The compensation vector for each emotional state is estimated using the Minimum Classification Error (MCE) algorithm. The IG-based feature vectors, compensated by these vectors, are used to train a Gaussian Mixture Model (GMM) for each emotional state. The emotional state whose GMM yields the maximum likelihood ratio is selected as the final output. Experimental results show that IG-based feature extraction and compensation achieve encouraging emotion recognition performance.

Ze-Jing Chuang, Chung-Hsien Wu
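For the IG-based abstract above, the final decision step (scoring a compensated feature vector against one GMM per emotional state and picking the best) can be sketched as below. Assumptions: the GMMs use diagonal covariances, the compensation vectors have already been estimated (the MCE training is not shown) and are simply added to the features, and a plain log-likelihood stands in for the paper's likelihood ratio. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A diagonal-covariance GMM as a list of (weight, means, variances) components.
GMM = List[Tuple[float, List[float], List[float]]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x: Sequence[float], gmm: GMM) -> float:
    """Log-likelihood of x under a GMM, combined with the log-sum-exp trick."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify_emotion(x: Sequence[float], models: Dict[str, GMM],
                     compensation: Dict[str, List[float]]) -> str:
    """Shift x by each emotion's compensation vector; return the best-scoring emotion."""
    def score(emotion: str) -> float:
        shifted = [xi + ci for xi, ci in zip(x, compensation[emotion])]
        return gmm_log_likelihood(shifted, models[emotion])

    return max(models, key=score)
```

Because each emotion gets its own compensation vector, the same input vector is moved differently before being scored by each model, which is how compensation can change the decision.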

Toward a Rule-Based Synthesis of Emotional Speech on Linguistic Descriptions of Perception

This paper reports rules for morphing a voice so that it is perceived as containing various primitive features, for example, so that it sounds more “bright” or “dark”. In previous work we proposed a three-layered model of the perception of emotional speech, comprising emotional speech, primitive features and acoustic features. Through experiments and acoustic analysis, we built the relationships between the three layers and reported that these relationships are significant. A bottom-up method was then adopted to verify the relationships: we morphed (resynthesized) a voice by composing acoustic features in the bottommost layer, producing a voice in which listeners could perceive one or more primitive features, which could in turn be perceived as different categories of emotion. The intermediate results show that the relationships of the model built in previous work are valid.

Chun-Fang Huang, Masato Akagi

Emotional Speech Synthesis Based on Improved Codebook Mapping Voice Conversion

This paper presents a spectral transformation method for emotional speech synthesis based on a voice conversion framework. Three emotions are studied: anger, happiness and sadness. To achieve high naturalness, superior speech quality and emotional expressiveness, our original STASC system is modified by introducing a new feature selection strategy and a hierarchical codebook mapping procedure. Our results show that the LSF coefficients at low frequencies carry more emotion-related information, so only these coefficients are converted. Listening tests show that the proposed method achieves a satisfactory balance between emotional expression and speech quality in the converted speech.

Yu-Ping Wang, Zhen-Hua Ling, Ren-Hua Wang

Improving Speaker Recognition by Training on Emotion-Added Models

In speaker recognition applications, changes in emotional state are a main cause of errors. The ongoing work described in this contribution attempts to enhance the performance of automatic speaker recognition (ASR) systems on emotional speech. Two procedures that need only a small amount of affective training data are applied to the ASR task, which makes them practical in real-world situations. The method classifies emotional states by acoustic features and generates emotion-added models based on the emotion grouping. Experiments on the Emotional Prosody Speech (EPS) corpus show significant improvements in EERs and IRs compared with the baseline and comparative experiments.

Tian Wu, Yingchun Yang, Zhaohui Wu

An Approach to Affective-Tone Modeling for Mandarin

Mandarin is a typical tone language in which a syllable may carry one of several tone types. While these tone types have rather clear manifestations in the fundamental frequency contour (F0 contour) of isolated syllables, they vary considerably in affective speech under the influence of the speaker’s mood. In this paper, the Fujisaki model based on the measured F0 contour is adapted to affective Mandarin, and a novel approach is proposed to extract the model parameters automatically, without any manual label information such as boundary labels, tone types or syllable timing. Preliminary statistical results show that the model is feasible for the study of affective speech.

Zhuangluan Su, Zengfu Wang
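For readers unfamiliar with the Fujisaki model mentioned in the abstract above, the standard (unmodified) formulation generates log F0 as a baseline plus phrase-command and accent-command responses: ln F0(t) = ln Fb + Σ Ap·Gp(t − T0) + Σ Aa·[Ga(t − T1) − Ga(t − T2)]. The sketch below evaluates that standard model only; the paper's adaptation for affective Mandarin and its automatic parameter extraction are not shown, and the default α = 3/s, β = 20/s, γ = 0.9 are commonly used values assumed here.

```python
import math
from typing import Sequence, Tuple


def phrase_response(t: float, alpha: float) -> float:
    """Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t: float, beta: float, gamma: float) -> float:
    """Ga(t) = min(1 - (1 + beta * t) * exp(-beta * t), gamma) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t: float, fb: float,
                phrases: Sequence[Tuple[float, float]],
                accents: Sequence[Tuple[float, float, float]],
                alpha: float = 3.0, beta: float = 20.0, gamma: float = 0.9) -> float:
    """F0 in Hz at time t (seconds).

    phrases: (T0, Ap) phrase commands; accents: (T1, T2, Aa) accent/tone commands.
    """
    log_f0 = math.log(fb)
    for t0, ap in phrases:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accents:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return math.exp(log_f0)
```

A contour can be sampled with, for example, `[fujisaki_f0(k * 0.01, 120.0, [(0.0, 0.5)], [(0.1, 0.3, 0.4)]) for k in range(100)]`; parameter extraction, as in the paper, is the inverse problem of finding the commands that best reproduce a measured contour.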

An Emotion Space Model for Recognition of Emotions in Spoken Chinese

This paper presents a concept of emotion space modeling that draws on psychological research. Based on this concept, the paper studies the distribution of seven emotions in spoken Chinese (joy, anger, surprise, fear, disgust, sadness and neutral) in the two-dimensional space of valence and arousal, and analyses the relationship between the dimensional ratings and prosodic characteristics in terms of F0 maximum, minimum, range and mean. The findings show that this concept of emotion modeling helps to describe and distinguish emotions.

Xuecheng Jin, Zengfu Wang
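The four F0 measures named in the emotion-space abstract above (maximum, minimum, range and mean) are simple to compute once an F0 track is available. A minimal sketch, assuming a frame-level F0 track in Hz in which unvoiced frames are marked with 0 (a common but not universal convention); F0 extraction itself is not shown and the function name is hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence


def f0_statistics(f0_track: Sequence[float]) -> Dict[str, float]:
    """Max, min, range and mean of F0 (Hz), ignoring unvoiced frames (F0 <= 0)."""
    voiced = [f for f in f0_track if f > 0]
    if not voiced:
        raise ValueError("the track contains no voiced frames")
    hi, lo = max(voiced), min(voiced)
    return {"max": hi, "min": lo, "range": hi - lo, "mean": mean(voiced)}
```

Excluding unvoiced frames matters: counting the zeros would drag the minimum to 0 Hz and bias the mean downward.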

Emotion-State Conversion for Speaker Recognition

The performance of speaker recognition systems is easily degraded by changes in a speaker’s internal state. This ongoing work proposes a speech emotion-state conversion approach to improve the performance of speaker identification systems on various kinds of affective speech. The features of neutral speech are modified according to statistical prosodic parameters of emotional utterances, and speaker models are generated from the converted speech. Experiments on an emotion corpus with 14 emotion states show promising results, with performance improved by 7.2%.

Dongdong Li, Yingchun Yang, Zhaohui Wu, Tian Wu
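The emotion-state conversion abstract above says neutral features are modified according to statistical prosodic parameters of emotional utterances, without giving the transform. A common choice, assumed here and not confirmed as the authors' method, is a mean and standard-deviation mapping of F0: each voiced value is standardized with the neutral statistics and rescaled with the target emotion's statistics. The function name and the 0-for-unvoiced convention are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def convert_f0(neutral_f0: Sequence[float], target_mean: float,
               target_std: float) -> List[float]:
    """Map voiced neutral F0 values so that their mean and standard deviation
    match the target emotion's statistics; unvoiced frames (F0 <= 0) pass through."""
    voiced = [f for f in neutral_f0 if f > 0]
    if not voiced:
        return list(neutral_f0)
    src_mean, src_std = mean(voiced), pstdev(voiced)
    # Guard against a flat contour, which has zero standard deviation.
    scale = target_std / src_std if src_std > 0 else 1.0
    return [(f - src_mean) * scale + target_mean if f > 0 else f
            for f in neutral_f0]
```

Such mappings are often applied to log F0 rather than linear F0; the linear version is shown only because it is easier to follow.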

Affect Editing in Speech

In this paper we present an affect editor for speech. The affect editor is a tool that encompasses various editing techniques for expressions in speech and can be used for both natural and synthesized speech. We present a technique that applies a natural expression in one utterance by a particular speaker to other utterances by the same speaker or by other speakers. New natural expressions are created without affecting voice quality.

Tal Sobol Shikler, Peter Robinson

Pronunciation Learning and Foreign Accent Reduction by an Audiovisual Feedback System

Global integration and migration force people to learn additional languages. For major languages, acquisition already begins in primary school, but owing to a lack of daily practice many speakers retain a strong accent in the long term, which may cause integration problems in new social or working environments. The possibility of later pronunciation improvement is limited because it requires an experienced teacher and individual tuition. Computer-assisted teaching methods have been established during the last decade.

Common methods either do not include distinct user feedback (e.g. a vocabulary trainer playing a reference pattern) or rely largely on fully automatic methods (speech recognition for the target language), which cause evaluation mistakes, in particular across language-group boundaries.

The authors compiled an audiovisual database and set up an automatic system for accent reduction (called


) by using recordings of 11 native Russian speakers learning German and 10 native German reference speakers. The system feedback is given within a multimodal scenario.

Oliver Jokisch, Uwe Koloska, Diane Hirschfeld, Rüdiger Hoffmann

Prosodic Reading Style Simulation for Text-to-Speech Synthesis

The simulation of different reading styles (mainly by adapting prosodic parameters) can improve the naturalness of synthetic speech and support more intelligent human-machine interaction. The article investigates the reading styles News and Tale as examples. For comparison, all examined texts contained the same genre-neutral paragraphs, which were read without a specific style instruction (Normal) and also faster, slower, more monotonously or more emotionally, yielding corresponding artificial styles.

The measured original intonation and duration patterns of each style control a diphone synthesizer (mapped contours). Additionally, the patterns are used to train a neural network (NN) model.

In two separate listening tests, stimuli presented as original signals/styles or with mapped or NN-generated prosodic contours were evaluated. The results show that both original utterances and artificial styles are largely perceived in their intended reading styles. Some reciprocal confusions indicate similarities between styles such as News and Fast, Tale and Slow, and Tale and Expressive. Confusions are more likely for synthetic speech. To produce a complex style such as Tale, features of the prosodic variations Slow and Expressive are combined. The training method for the synthetic styles requires further improvement.

Oliver Jokisch, Hans Kruschke, Rüdiger Hoffmann

F0 Contour of Prosodic Word in Happy Speech of Mandarin

This paper analyzes the F0 contour of happy speech. We designed a set of declarative sentences and recorded them in happy and neutral expressive states; all speakers were asked to express these sentences in the same imaginary scene. Emotion is known to be expressed by modifying acoustic features of speech, such as pitch, intensity and voice quality, in various ways. In this study, we compared the F0 contours of happy and neutral speech and found that: (1) the F0 contour plays an important role in expressing happiness; (2) the F0 contour of happy speech displays a declination pattern, but its degree of declination is smaller than that of neutral speech; (3) compared with neutral speech, happy speech has a higher pitch register, and the F0 contour of the final syllable of each prosodic word has a steeper slope, especially for the syllable at the end of the sentence.

Haibo Wang, Aijun Li, Qiang Fang
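Finding (2) of the happy-speech abstract above concerns the degree of F0 declination. One simple way to quantify declination, assumed here rather than taken from the paper, is the least-squares slope of F0 against time over the voiced samples of an utterance. The function name is hypothetical.

```python
from typing import Sequence


def f0_slope(times: Sequence[float], f0: Sequence[float]) -> float:
    """Least-squares slope (Hz per second) of an F0 contour; negative means declination."""
    n = len(times)
    if n < 2 or n != len(f0):
        raise ValueError("need at least two paired (time, F0) samples")
    mt = sum(times) / n
    mf = sum(f0) / n
    sxx = sum((t - mt) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all samples share the same time stamp")
    sxy = sum((t - mt) * (f - mf) for t, f in zip(times, f0))
    return sxy / sxx
```

Under this measure, a less negative slope for the happy rendition of a sentence than for its neutral rendition would correspond to the weaker declination reported above.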

A Novel Source Analysis Method by Matching Spectral Characters of LF Model with STRAIGHT Spectrum

This paper presents a voice source analysis method based on studying the spectral characteristics of the LF model and their representation in the output speech signal. The source features are estimated as the set of LF parameters whose spectrum is most similar in the frequency domain, in terms of glottal formant and spectral tilt, to the corresponding characteristics of the STRAIGHT spectrum of the speech signal under analysis. In addition, the concept of an analyzable frame is introduced to ensure the feasibility and improve the reliability of the proposed method. Evaluation with synthetic speech shows that this method can estimate the LF parameters with satisfactory precision. Furthermore, the experiment with emotional speech shows the effectiveness of the proposed method in describing voice quality variation among speech with different emotions.

Zhen-Hua Ling, Yu Hu, Ren-Hua Wang

Features Importance Analysis for Emotional Speech Classification

The paper analyzes prosodic features, including intonation, speaking rate and intensity, based on classified emotional speech. As an important aspect of voice quality, voice source parameters are also derived for analysis. Based on these results, the paper builds both a CART model and a weight-decay neural network model to determine the importance of each acoustic feature for emotional speech classification and to reveal whether there is an underlying consistency between acoustic features and speech emotion. The results show that the proposed method can obtain the importance of each acoustic feature through its weight in emotional speech classification and can further improve emotional speech classification.

Jianhua Tao, Yongguo Kang

Evaluation of Affective Expressivity

Toward Making Humans Empathize with Artificial Agents by Means of Subtle Expressions

Can we assign attitudes to a computer based on the subtle expressions it displays, such as beep sounds and simple animations? If so, which kinds of beep sounds or simple animations are perceived as specific attitudes, such as “disagreement”, “hesitation” or “agreement”? To examine this issue, I carried out two experiments to observe and clarify how participants perceive or assign an attitude to a computer according to beep sounds of different durations and F0 contour slopes (Experiment 1) or simple animations of different durations and object velocities (Experiment 2). The results of these two experiments revealed that 1) subtle expressions with increasing intonation (Experiment 1) or velocity (Experiment 2) were perceived by participants as “disagreement”, 2) flat intonation and velocity with longer duration were interpreted as “hesitation”, and 3) decreasing intonation and velocity with shorter duration were taken as “agreement.”

Takanori Komatsu

Evaluating Affective Feedback of the 3D Agent Max in a Competitive Cards Game

Within the field of Embodied Conversational Agents (ECAs), the simulation of emotions has been suggested as a means to enhance the believability of ECAs and also to contribute effectively to the goal of more intuitive human–computer interfaces. Although various emotion models have been proposed, results demonstrating the appropriateness of displaying particular emotions within ECA applications are scarce or even inconsistent. Worse, questionnaire methods often seem insufficient to evaluate the impact of emotions expressed by ECAs on users. Therefore we propose to analyze non-conscious physiological feedback (bio-signals) of users within a clearly arranged dynamic interaction scenario where various emotional reactions are likely to be evoked. In addition to its diagnostic purpose, physiological user information is also analyzed online to trigger empathic reactions of the ECA during game play, thus increasing the level of social engagement. To evaluate the appropriateness of different types of affective and empathic feedback, we implemented a card game called


, where the user plays against an expressive 3D humanoid agent called Max, which was designed at the University of Bielefeld [6] and is based on the emotion simulation system of [2]. Work performed at the University of Tokyo and NII provided a real-time system for empathic (agent) feedback that derives user emotions from skin conductance and electromyography [13]. The findings of our study indicate that, within a competitive gaming scenario, the absence of negative agent emotions is perceived as stress-inducing and irritating, and that the integration of empathic feedback supports the acceptance of Max as a co-equal humanoid opponent.

Christian Becker, Helmut Prendinger, Mitsuru Ishizuka, Ipke Wachsmuth

Lexical Resources and Semantic Similarity for Affective Evaluative Expressions Generation

This paper presents resources and functionalities for the selection of affective evaluative terms. An affective hierarchy as an extension of the


lexical database was developed first. In the second phase, a semantic similarity function was acquired automatically, in an unsupervised way, from a large text corpus; this function allows us to relate concepts to emotional categories. The integration of the two components is a key element for several applications.

Alessandro Valitutti, Carlo Strapparava, Oliviero Stock

Because Attitudes Are Social Affects, They Can Be False Friends...

The attitudes of the speaker during a verbal interaction are affects built by language and culture. Since they are a sophisticated means of expressing complex affects, through a channel of control that is clearly distinct from emotions, they make up the larger part of the affects expressed during an interaction, as Campbell [3] has shown on large databases. Twelve representative Japanese attitudes were presented to native Japanese listeners and to native French listeners naive in Japanese. They include “


y”, “


”, “

exclamation of surprise

”, “


”, “


”, “


”, “


”, “


”, and four socially referenced degrees of politeness: “

simple politeness

”, “


”, “


” and “


” (Sadanobu [11]). Two perception experiments using a closed forced choice were carried out, each attitude


introduced by a definition and some examples of real situations. The 15 native Japanese subjects discriminate all the attitudes above chance level, with slight confusion within the politeness class. French subjects do not process the concept of degree of politeness: they do not identify the typical Japanese politeness degrees. The prosody of “


”, the highest degree of politeness in Japanese, is misinterpreted by French listeners as conveying the opposite meaning: “impoliteness”, “authority” and “irritation”.

Takaaki Shochi, Véronique Aubergé, Albert Rilliard

Emotion Semantics Image Retrieval: A Brief Overview

Emotion is the most abstract semantic structure of images. This paper overviews recent research on emotion semantics image retrieval. First, it introduces the general framework of emotion semantics image retrieval and points out four main research issues: extracting sensitive features from images, defining users’ emotion information, building an emotion user model and individualizing the user model. Several algorithms addressing these four issues are then analyzed in detail. Finally, some future research topics, including the construction of an emotion database, evaluation of the user model and computation of the user model, are discussed, and some preliminary solution strategies are presented.

Shangfei Wang, Xufa Wang

Affective Modeling in Behavioral Simulations: Experience and Implementations

Recent studies have convincingly demonstrated the critical role of affect in human cognitive development and expression, supporting the case for incorporating affective representation into behavioral simulations for artificial intelligence. Music provides a powerful and concise mechanism for evoking and indeed representing emotions, and thus studying the ways in which music represents affect can provide insights into computer representations. That music can be understood as a multidimensional structure leads to the consideration of systemic grammars for this representation. A systemic grammar of emotions is presented which has proven effective as the basis for a concrete – and marketable – implementation of behavioral simulations for virtual characters, by allowing the system to parse interactions between characters into representations of emotional states, and using the attributes of those determined states as determinants of subsequent behavior.

Robert A. Duisberg

An Ontology for Description of Emotional Cues

There is a great variety of theoretical models of emotions and implementation technologies that can be used in the design of affective computers. Consequently, designers and researchers usually make practical choices of models and develop ad hoc solutions that sometimes lack flexibility. In this paper we introduce a generic approach to modeling emotional cues. The main component of our approach is an ontology of emotional cues. The concepts in the ontology are grouped into three global modules representing three layers of emotion detection or production: the emotion module, the emotional cue module, and the media module. The emotion module defines emotions as represented with emotional cues. The emotional cue module describes external emotional representations in terms of media properties. The media module describes basic media properties important for emotional cues. The proposed ontology enables flexible description of emotional cues at different levels of abstraction. This approach could serve as a guide for the flexible design of affective devices independently of the starting model and the final way of implementation.

Zeljko Obrenovic, Nestor Garay, Juan Miguel López, Inmaculada Fajardo, Idoia Cearreta

The Reliability and Validity of the Chinese Version of Abbreviated PAD Emotion Scales

The study aimed to test the reliability and validity of the Chinese version of the Abbreviated PAD Emotion Scales using a Chinese sample. A total of 297 Chinese undergraduate students were tested with the Chinese version of the Abbreviated PAD Emotion Scales; 98 of them were retested with the same scales seven days later to assess test-retest reliability, and 102 of them were also tested with the SCL-90 at the same time, which served as the criterion for assessing criterion validity. The results showed that the Chinese version of the Abbreviated PAD Emotion Scales displayed satisfactory reliability and validity on P (pleasure-displeasure), only moderate reliability and validity on D (dominance-submissiveness), and quite low reliability and validity on A (arousal-nonarousal).

Xiaoming Li, Haotian Zhou, Shengzun Song, Tian Ran, Xiaolan Fu
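Test-retest reliability, as assessed in the PAD scale study above, is commonly reported as the correlation between scores from the two sessions. A minimal sketch computing Pearson's r for one scale dimension, assuming paired scores for the same participants in the same order; the function name is hypothetical and the study's actual statistical procedure is not given here.

```python
import math
from typing import Sequence


def pearson_r(first: Sequence[float], second: Sequence[float]) -> float:
    """Pearson correlation between paired scores (e.g. test vs. retest)."""
    n = len(first)
    if n < 2 or n != len(second):
        raise ValueError("need at least two paired scores")
    m1, m2 = sum(first) / n, sum(second) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    sxx = sum((a - m1) ** 2 for a in first)
    syy = sum((b - m2) ** 2 for b in second)
    if sxx == 0 or syy == 0:
        raise ValueError("scores have zero variance")
    return sxy / math.sqrt(sxx * syy)
```

The same function could also correlate a PAD dimension with a criterion measure such as SCL-90 scores, which is one common way to express criterion validity.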

Representing Real-Life Emotions in Audiovisual Data with Non Basic Emotional Patterns and Context Features

The modeling of realistic emotional behavior is needed for various applications in multimodal human-machine interaction, such as emotion detection in a surveillance system or the design of natural Embodied Conversational Agents. Yet building such models requires an appropriate definition of the various levels of representation: the emotional context, the emotion itself and the observed multimodal behaviors. This paper presents the multi-level emotion and context coding scheme that was defined following the annotation of fifty-one videos of TV interviews. Results of the annotation analysis show the complexity and richness of the real-life data: around 50% of the clips feature mixed emotions with conflicting multimodal cues. A typology of mixed emotional patterns is proposed, showing that cause-effect conflicts and masked acted emotions are perceptually difficult to annotate with respect to the valence dimension.

Laurence Devillers, Sarkis Abrilian, Jean-Claude Martin

The Relative Weights of the Different Prosodic Dimensions in Expressive Speech: A Resynthesis Study

Emotional prosody is multidimensional. A debated question is whether some parameters are more specialized in conveying certain emotion dimensions. Selected stimuli expressing anxiety, disappointment, disgust, disquiet, joy, resignation, satisfaction and sadness were extracted from the acted part of a French corpus assumed to include only variations of direct emotional expressions. These stimuli were used as a basis for synthesizing artefactual stimuli that integrate the emotional contour of each prosodic parameter separately, which were evaluated in a perceptual experiment. Results indicate that (1) no parameter alone can carry the whole emotion information, (2) F0 contours (not only the global F0 value) carry more information on positive expressions, (3) voice quality and duration convey more information on negative expressions, and (4) intensity contours do not bring any significant information when used alone.

Nicolas Audibert, Véronique Aubergé, Albert Rilliard

Affective Database, Annotation and Tools

An XML-Based Implementation of Multimodal Affective Annotation

In simple cases, affective computing is a computational device that recognizes and acts upon the emotions of its user; in complex cases, the device has (or simulates having) emotions of its own. Multimodal technology is currently one of the hottest topics in affective computing research. However, the lack of a large-scale multimodal database confines research to separate, scattered areas, such as affect recognition from video alone or from audio alone. This paper describes the development and implementation of an XML-based multimodal affective annotation system called MAAS (Multimodal Affective Annotation System). MAAS contains a hierarchical affective annotation model based on the three-dimensional affect space derived from Mehrabian’s PAD temperament scale. The final annotation file is in XML format so that resources can be conveniently exchanged with other research groups.

Fan Xia, Hong Wang, Xiaolan Fu, Jiaying Zhao
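As an illustration only, the following Python sketch shows what an XML annotation record built on the PAD (pleasure, arousal, dominance) space might look like. The element and attribute names (`annotation`, `segment`, `pad`), the per-modality segments and the [-1, 1] value range are our own assumptions, not the MAAS schema.

```python
import xml.etree.ElementTree as ET

PAD_AXES = ("pleasure", "arousal", "dominance")

def pad_annotation(clip_id, segments):
    """Serialize per-modality PAD labels for one clip as an XML string.

    Each segment is a dict with 'modality', 'start', 'end' (seconds) and one
    value per PAD axis, assumed (hypothetically) to lie in [-1, 1].
    """
    root = ET.Element("annotation", clip=clip_id)
    for seg in segments:
        for axis in PAD_AXES:
            if not -1.0 <= seg[axis] <= 1.0:
                raise ValueError(f"{axis} out of range: {seg[axis]}")
        node = ET.SubElement(root, "segment", modality=seg["modality"],
                             start=f"{seg['start']:.2f}", end=f"{seg['end']:.2f}")
        # One <pad> child per segment holds the three affect coordinates.
        ET.SubElement(node, "pad", {axis: f"{seg[axis]:.2f}" for axis in PAD_AXES})
    return ET.tostring(root, encoding="unicode")

xml_text = pad_annotation("clip01", [
    {"modality": "video", "start": 0.0, "end": 1.5,
     "pleasure": 0.6, "arousal": 0.3, "dominance": 0.1},
    {"modality": "audio", "start": 0.0, "end": 1.5,
     "pleasure": 0.4, "arousal": 0.5, "dominance": 0.0},
])
```

Keeping each modality as a separate `segment` element lets audio and video labels for the same clip coexist in one file, which is the kind of resource exchange the abstract motivates.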

CHAD: A Chinese Affective Database

Affective databases play an important role in affective computing, which has become an attractive field of AI research. Based on an analysis of existing databases, a Chinese affective database (CHAD) is designed and built for seven emotional states: neutral, happy, sad, fear, angry, surprise and disgust. Instead of the personal suggestion method, audiovisual materials are collected in four ways: three types of laboratory recording, plus movies. Broadcast programmes are also included as a source of vocal material. Comparing the five sources yields two findings. First, although broadcast programmes perform best in the listening experiment, they still raise problems: copyright, missing visual information, and poor representation of the characteristics of everyday speech. Second, laboratory recording using sentences with appropriate emotional content is an excellent source of material whose performance is comparable to that of broadcasts.

Mingyu You, Chun Chen, Jiajun Bu

Annotating Multimodal Behaviors Occurring During Non Basic Emotions

The design of affective interfaces such as credible expressive characters in story-telling applications requires understanding and modeling the relations between realistic emotions and behaviors in different modalities such as facial expressions, speech, hand gestures and body movements. Yet research on emotional multimodal behaviors has focused on individual modalities during acted basic emotions. In this paper we describe the coding scheme that we designed for annotating multimodal behaviors observed during mixed and non-acted emotions. We explain how we used it to annotate videos from a corpus of emotionally rich TV interviews, and we illustrate how the annotations can be used to compute expressive profiles of videos and relations between non-basic emotions and multimodal behaviors.

Jean-Claude Martin, Sarris Abrilian, Laurence Devillers

The Properties of DaFEx, a Database of Kinetic Facial Expressions

In this paper we present an evaluation study of DaFEx (Database of Facial Expressions), a database created to provide a benchmark for evaluating the facial expressivity of Embodied Conversational Agents (ECAs). DaFEx consists of 1008 short videos containing facial expressions of Ekman’s six basic emotions plus the neutral expression. The facial expressions were recorded by 8 professional actors (male and female) in two acting conditions (“utterance” and “non utterance”) and at 3 intensity levels (high, medium, low). The properties of DaFEx were studied by having 80 subjects classify the emotion expressed in the videos. We tested the effects of intensity level, of articulatory movements due to speech, and of the actors’ and subjects’ gender on classification accuracy. We also studied how errors are distributed across confusion classes. The results are summarized in this work.

Alberto Battocchi, Fabio Pianesi, Dina Goren-Bar

A Multimodal Database as a Background for Emotional Synthesis, Recognition and Training in E-Learning Systems

This paper presents a multimodal database developed within the EU-funded project MYSELF. The project aims at developing an e-learning platform endowed with affective computing capabilities for training relational skills through interactive simulations. The database includes data from 34 participants covering physiological parameters, vocal nonverbal features, facial expression and posture. Ten different emotions were considered (anger, joy, sadness, fear, contempt, shame, guilt, pride, frustration and boredom), ranging from primary to self-conscious emotions of particular relevance to learning processes and interpersonal relationships. Preliminary results and analyses are presented, together with directions for future work.

Luigi Anolli, Fabrizia Mantovani, Marcello Mortillaro, Antonietta Vescovo, Alessia Agliati, Linda Confalonieri, Olivia Realdon, Valentino Zurloni, Alessandro Sacchi

Psychology and Cognition of Affect

Construction of Virtual Assistant Based on Basic Emotions Theory

The purpose of this paper is to construct a virtual assistant. According to basic emotions theory, a compound emotion consists of eight prototypical basic emotions plus “drives”, which reflect people’s will. Based on this theory, we construct a psychological model whose parameters can be adjusted to simulate different human psychological profiles. Building on this model and combining real-time facial expression and voice recognition with synthesis technology, we construct a virtual assistant. Experiments show that the system obeys human emotion rules.

Zhiliang Wang, Ning Cheng, Yumei Fan, Jiwei Liu, Changsheng Zhu

Generalization of a Vision-Based Computational Model of Mind-Reading

This paper describes a vision-based computational model of mind-reading that infers complex mental states from head and facial expressions in real-time. The generalization ability of the system is evaluated on videos that were posed by lay people in a relatively uncontrolled recording environment for six mental states—agreeing, concentrating, disagreeing, interested, thinking and unsure. The results show that the system’s accuracy is comparable to that of humans on the same corpus.

Rana el Kaliouby, Peter Robinson

Physiological Sensing and Feature Extraction for Emotion Recognition by Exploiting Acupuncture Spots

Previous emotion recognition systems have mainly focused on pattern classification rather than on sensing technologies or feature extraction methods. This paper introduces a method of physiological sensing and feature extraction for emotion recognition that is based on an oriental medicine approach. The specific points for affective sensing were determined experimentally, and skin conductance measurements in the forearm region were found to correlate well with acupuncture spots. Features are then extracted in the same way that pulse signals are interpreted in diagnosis. We found that the proposed sensing and feature extraction method improves emotion recognition with a neural network classifier.

Ahyoung Choi, Woontack Woo

Human Machine Interaction: The Special Role for Human Unconscious Emotional Information Processing

The nature of (un)conscious human emotional information processing remains a great mystery. On the one hand, classical models view conscious human emotional information processing as computation among the brain’s neurons but fail to address its enigmatic features. On the other hand, quantum processes (superposition of states, nonlocality, entanglement) also remain mysterious, yet they are being harnessed in revolutionary information technologies such as quantum computation, quantum cryptography and quantum teleportation. In this paper, we discuss several experiments that suggest a special role for unconscious emotional information processing in human-computer interaction. What are its consequences, and could it be the missing link between quantum information theory and conscious human emotional information processing?

Maurits van den Noort, Kenneth Hugdahl, Peggy Bosch

Affective Computing Model Based on Rough Sets

This paper first builds a novel affective model based on rough sets that provides a static description of the affective space. It then combines rough sets with a Markov chain to forecast dynamic changes in human affect. Several concepts and states, such as affective description precision, are defined within the model, which is intended as groundwork for further research. Simulations in Matlab show that the model simulates human emotion well. The results of this paper are novel, and the mutual integration of rough sets and affective computing is a new research direction for affective computing.

Chen Yong, He Tong
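The abstract combines a static rough-set description with a Markov-chain forecast. A minimal Python sketch of those two generic ingredients follows, assuming a discretized information table of observed cues and a hand-specified transition matrix. The attribute names, states and probabilities are hypothetical, and the accuracy ratio is the classical rough-set measure, which may or may not correspond to the paper’s “affective description precision”.

```python
from collections import defaultdict

def equivalence_classes(table, attrs):
    """Group objects that are indiscernible on the given condition attributes."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())

def approximations(table, attrs, target):
    """Return the (lower, upper) rough-set approximations of the object set `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attrs):
        if block <= target:   # block lies entirely inside the target class
            lower |= block
        if block & target:    # block overlaps the target class
            upper |= block
    return lower, upper

def accuracy(lower, upper):
    """Classical rough-set accuracy of approximation |lower| / |upper|."""
    return len(lower) / len(upper) if upper else 1.0

def forecast(dist, transition, steps=1):
    """Propagate a distribution over affective states through a Markov transition matrix."""
    states = list(dist)
    for _ in range(steps):
        dist = {s2: sum(dist[s1] * transition[s1][s2] for s1 in states) for s2 in states}
    return dist

# Hypothetical discretized observations (object id -> cue values).
table = {
    1: {"smile": "yes", "pitch": "high"},
    2: {"smile": "yes", "pitch": "high"},
    3: {"smile": "no", "pitch": "high"},
    4: {"smile": "no", "pitch": "low"},
}
happy = {1, 3}  # objects annotated as "happy"
lower, upper = approximations(table, ["smile", "pitch"], happy)

# Hypothetical transition probabilities between two affective states.
transition = {"calm": {"calm": 0.8, "happy": 0.2},
              "happy": {"calm": 0.4, "happy": 0.6}}
next_dist = forecast({"calm": 1.0, "happy": 0.0}, transition, steps=2)
```

Here objects 1 and 2 share the same cues but only object 1 is “happy”, so their block falls in the upper approximation only; that indiscernibility is what the rough-set description captures.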

The Research of a Teaching Assistant System Based on Artificial Psychology

A humanistic computer teaching system is presented in this paper. The core of the system is the affective interaction between teacher and student. Based on psychological theories and the theory of Artificial Psychology, an emotion-learning model is developed. Four basic emotions and four types of learning psychology state are defined according to Basic Emotion Theory, and a two-dimensional emotion space is designed based on Dimensional Emotion Theory. The system can provide the teacher with the student’s psychological state and psychological value. Finally, the system was implemented using a recognition method based on digital image processing technology.

Xiuyan Meng, Zhiliang Wang, Guojiang Wang, Lin Shi, Xiaotian Wu

Emotion Estimation and Reasoning Based on Affective Textual Interaction

This paper presents a novel approach to Emotion Estimation that assesses the affective content of textual messages. Our main goals are to detect emotion in chat or other dialogue messages and to employ animated agents capable of emotional reasoning based on the textual interaction. The emotion estimation module is applied to a chat system, where avatars associated with chat partners act out the assessed emotions of messages through multiple modalities, including synthetic speech and associated affective gestures.

Chunling Ma, Helmut Prendinger, Mitsuru Ishizuka

An Emotion Model of 3D Virtual Characters in Intelligent Virtual Environment

Human emotion is related to stimuli and cognitive appraisal. Emotion is very important for entertainment applications of virtual reality, and emotion modeling for 3D virtual characters is a challenging branch of Intelligent Virtual Environments (IVE). A believable 3D character should be endowed with emotion and perception. In general, a virtual character is regarded as an autonomous agent with sensing, perception, behavior and action. This paper presents an emotion model for 3D virtual characters grounded in psychological theory. Our aim is to construct 3D virtual characters that have internal sensors and perception of external stimuli and that express emotion autonomously in real time. First, the architecture of a virtual character is built on a cognitive model. Second, emotion classes are defined using the OCC model and Plutchik’s emotion theory. Third, several new emotion concepts are presented, together with a general mathematical model relating emotion, stimulus, motivation variables and personality variables. Fourth, a perception model for 3D characters based on Gibson’s theory is introduced. As a result, an emotional animation demo system for 3D virtual characters is implemented on a PC.

Zhen Liu, Zhi Geng Pan

An Adaptive Personality Model for ECAs

Curtin University’s Talking Heads (TH) combine an MPEG-4 compliant Facial Animation Engine (FAE), a Text To Emotional Speech Synthesiser (TTES), and a multi-modal Dialogue Manager (DM) that accesses a Knowledge Base (KB) and outputs Virtual Human Markup Language (VHML) text, which drives the TTES and FAE. A user enters a question and an animated TH responds with a believable, affective voice and actions. However, this response is normally marked up in VHML by the KB developer to produce the required facial gestures and emotional display. A real person does not react according to fixed rules but according to personality, beliefs, good and bad previous experiences, and training. This paper reviews personality theories and models relevant to THs, and then discusses the research carried out at Curtin over the last five years on implementing and evaluating personality models. Finally, the paper proposes an active, adaptive personality model to unify that work.

He Xiao, Donald Reid, Andrew Marriott, E. K. Gulland

The Effect of Mood on Self-paced Study Time

The present study investigated the effect of mood on self-paced study time. Twenty-eight university students volunteered for the experiment. Half of them listened to positive music and the other half to negative music for nine minutes. After a self-assessment of mood, they studied word pairs at their own pace. The results showed that neither positive nor negative mood had a significant effect on self-paced study time.

Yong Niu, Xiaolan Fu

The Effect of Embodied Conversational Agents’ Speech Quality on Users’ Attention and Emotion

This study investigates the influence of the speech quality of Embodied Conversational Agents (ECAs) on users’ perception, behavior and emotions. Twenty-four subjects interacted in a Wizard of Oz (WOZ) setup with two ECAs in two scenarios of a virtual theater partner application. In both scenarios, each ECA had three different speech qualities: natural, high-quality synthetic and low-quality synthetic. Eye gaze data show that subjects’ visual attention was influenced not by the ECAs’ speech quality but by their appearance. On the other hand, subjects’ self-reported emotions and verbal descriptions of their perceptions were influenced by the ECAs’ speech quality. Finally, Galvanic Skin Response data were influenced neither by the ECAs’ appearance nor by their speech quality. These results stress the importance of correctly matching the auditory and visual modalities of ECAs and give methodological insights for assessing users’ perception, behavior and emotions when they interact with virtual characters.

Noël Chateau, Valérie Maffiolo, Nathalie Pican, Marc Mersiol

Knowledge Reconfiguration Considering the Distance of Personal Preference

To process data efficiently in environments with huge amounts of data, intelligent systems designed on the basis of human brain functions are necessary. This paper describes how to reconstruct an efficient subjective memory from objective facts while taking personal preference into account. A conceptual model of new knowledge reconfiguration, based on connecting common nodes from different memories, is proposed. A well-formed knowledge frame structure with a special synonym list was designed for efficient knowledge reconfiguration, and a knowledge retrieval mechanism built on this structure extracts the associated data. We applied this mechanism to a hypothetical virtual knowledge frame and tested it.

JeongYon Shim

Emotional Sequencing and Development in Fairy Tales

Affect is a transient phenomenon, with emotions tending to blend and interact over time [4]. This paper discusses emotional distributions in child-directed texts. It provides statistical evidence for the relevance of emotional sequencing and evaluates trends in emotional story development, based on annotation statistics for 22 Grimms’ fairy tales that form part of a larger ongoing text-annotation project, which is also introduced. The study is motivated by the need to explore features for text-based emotion prediction at the sentence level, for use in expressive text-to-speech synthesis of children’s stories.

Cecilia Ovesdotter Alm, Richard Sproat

Affective Interaction and Systems and Applications

Informal User Interface for Graphical Computing

This paper explores a concept of sketch-based informal user interface for graphic computing, which can be characterized by two properties: stroke-based input and perceptual processing of strokes. A sketch-based graphics input prototype system designed for creative brainstorming in conceptual design is introduced. Two core technologies for implementing such a system, adaptive sketch recognition and dynamic user modeling, are also outlined.

Zhengxing Sun, Jing Liu

Building a Believable Character for Real-Time Virtual Environments

Endowing synthetic characters with autonomous behaviors interests researchers in several fields, especially computer graphics. In this paper, we propose a believable brain architecture that allows a synthetic character to achieve a high level of autonomy. The architecture offers three new capabilities. First, it is not a simple stimulus-action model: it adds a layer between stimulus and action, called agency, that performs extraction and synthesis. Second, hierarchical Finite State Machines (FSMs) and fuzzy logic techniques are embedded in the synthetic character’s brain. Finally, integrating the emotion expression model into the brain architecture allows synthetic characters to express feelings such as happiness, anger and sorrow, which can arouse users’ passion and immerse them in the virtual world.

Zhigeng Pan, Hongwei Yang, Bing Xu, Mingmin Zhang

A Wearable Multi-sensor System for Mobile Acquisition of Emotion-Related Physiological Data

Interest in emotion detection is increasing significantly. For research and development in the field of Affective Computing and emotion-aware interaction techniques, reliable and robust technology is needed for detecting emotional signs in users under everyday conditions. In this paper, a novel wearable system for measuring emotion-related physiological parameters is presented. Currently heart rate, skin conductivity and skin temperature are measured; further sensors can easily be added. The system is easy to use, robust, and suitable for mobile, long-term data logging. It has an open architecture and can easily be integrated into other systems or applications. The system is designed for use in emotion research as well as in everyday affective applications.

Christian Peter, Eric Ebert, Helmut Beikirch

The HandWave Bluetooth Skin Conductance Sensor

HandWave is a small, wireless, networked skin conductance sensor for affective computing applications. It is used to detect information related to emotional, cognitive, and physical arousal of mobile users. Many existing affective computing systems make use of sensors that are inflexible and often physically attached to supporting computers. In contrast, HandWave allows an additional degree of flexibility by providing ad-hoc wireless networking capabilities to a wide variety of Bluetooth devices as well as adaptive biosignal amplification. As a consequence, HandWave is used in a variety of affective computing applications such as games, tutoring systems, experimental data collection, and augmented journaling. This paper describes the novel design attributes of this handheld sensor, its development, and various form factors. Future work includes an extension of this approach to other biometric signals of interest to affective computing researchers.

Marc Strauss, Carson Reynolds, Stephen Hughes, Kyoung Park, Gary McDarby, Rosalind W. Picard

Intelligent Expressions of Emotions

We propose an architecture for an embodied conversational agent that takes into account two aspects of emotions: the emotions triggered by an event (the felt emotions) and the expressed emotions (the displayed ones), which may differ in real life. In this paper, we present a formalization of emotion-eliciting events based on a model of the agent’s mental state composed of beliefs, choices and uncertainties. This model makes it possible to identify the emotional state of an agent at any time. We also introduce a computational model based on fuzzy logic that computes facial expressions of blended emotions. Finally, examples of facial expressions resulting from the implementation of our model are shown.

Magalie Ochs, Radosław Niewiadomski, Catherine Pelachaud, David Sadek

Environment Expression: Expressing Emotions Through Cameras, Lights and Music

Environment expression is about going beyond the usual human emotion expression channels in virtual worlds. This work proposes an integrated storytelling model, the environment expression model, capable of expressing emotions through three channels: cinematography, illumination and music. Stories are organized into prioritized points of interest, which can be characters or dialogues. Characters synthesize cognitive emotions based on the OCC emotion theory. Dialogues have collective emotional states that reflect the participants’ emotional states. During storytelling, at each instant, the highest-priority point of interest is focused through the expression channels. The cinematography and illumination channels reflect the point of interest’s strongest emotion type and intensity, and the music channel reflects the valence of the point of interest’s mood. Finally, a study was conducted to evaluate the model. Results confirm the influence of environment expression on emotion perception and reveal moderate success of this work’s approach.

Celso de Melo, Ana Paiva

Dynamic User Modeling in Health Promotion Dialogs

We describe our experience with the design, implementation and revision of a dynamic user model for adapting health promotion dialogs with ECAs to the ‘stage of change’ of the users and to their ‘social’ attitude toward the agent. The user model was built by learning a Bayesian network from a corpus of data collected in a Wizard of Oz study. We discuss how uncertainty in recognizing the user’s mental state can be reduced by integrating a simple linguistic parser with knowledge about the interaction context represented in the model.

Valeria Carofiglio, Fiorella de Rosis, Nicole Novielli

Achieving Empathic Engagement Through Affective Interaction with Synthetic Characters

This paper considers affective interactions to achieve empathic engagement with synthetic characters in virtual learning environments, in order to support and induce the expression of empathy in children. The paper presents FearNot!, a school based virtual learning environment, populated by synthetic characters used for personal, social and health education, specifically bullying issues in schools. An empirical study of 345 children aged 8-11 years who interacted with FearNot! is outlined. The results identify that affective interactions resulting in the expression of empathy were increased when children had high levels of belief and interest in character conversations and if they believed that their interactions had an impact on the characters’ behaviour.

Lynne Hall, Sarah Woods, Ruth Aylett, Lynne Newall, Ana Paiva

Real-Life Emotion Representation and Detection in Call Centers Data

Since the early studies of human behavior, emotions have attracted the interest of researchers in neuroscience and psychology, and more recently they have become a growing field of research in computer science. We are exploring how to represent and automatically detect a subject’s emotional state. In contrast with most previous studies, which were conducted on artificial data, this paper addresses some of the challenges of studying real-life non-basic emotions. Real-life spoken dialogs from call-center services have revealed the presence of many blended emotions. A soft emotion vector is used to represent emotion mixtures. This representation makes it possible to obtain much more reliable annotation and to select, for training models, the part of the corpus without conflicting blended emotions. A correct detection rate of about 80% is obtained between Negative and Neutral emotions and between Fear and Neutral emotions, using paralinguistic cues on a corpus of 20 hours of recordings.

Laurence Vidrascu, Laurence Devillers
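One plausible way to build a soft emotion vector from several annotators is sketched below. The major/minor weighting (2:1), the valence grouping and the 0.2 conflict threshold are assumptions made for illustration; the paper’s own weighting and its definition of a conflictual blend may differ.

```python
from collections import Counter

POSITIVE = {"relief", "satisfaction"}   # hypothetical valence grouping
NEGATIVE = {"anger", "fear", "sadness"}

def soft_vector(annotations, major_weight=2.0, minor_weight=1.0):
    """Combine per-annotator (major, minor) labels into a normalized emotion vector.

    `annotations` is a list of (major, minor) tuples; minor may be None.
    """
    scores = Counter()
    for major, minor in annotations:
        scores[major] += major_weight
        if minor is not None:
            scores[minor] += minor_weight
    total = sum(scores.values())
    return {label: weight / total for label, weight in scores.items()}

def is_conflictual(vector, threshold=0.2):
    """True when both a positive and a negative emotion exceed the threshold."""
    has_pos = any(vector.get(e, 0.0) > threshold for e in POSITIVE)
    has_neg = any(vector.get(e, 0.0) > threshold for e in NEGATIVE)
    return has_pos and has_neg

# Two annotators per segment (hypothetical labels).
segments = {
    "seg1": [("fear", "anger"), ("fear", None)],
    "seg2": [("anger", "relief"), ("relief", None)],
}
vectors = {name: soft_vector(labels) for name, labels in segments.items()}
training_set = [name for name, vec in vectors.items() if not is_conflictual(vec)]
```

With these toy labels, `seg1` becomes roughly 0.8 fear and 0.2 anger and is kept, while `seg2` mixes anger with relief and is excluded from training.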

Affective Touch for Robotic Companions

As robotic platforms are designed for human robot interaction applications, a full body sense of touch, or “sensitive skin,” becomes important. The Huggable is a new type of therapeutic robotic companion based upon relational touch interactions. The initial use of neural networks to classify the affective content of touch is described.

Walter Dan Stiehl, Cynthia Breazeal

Dynamic Mapping Method Based Speech Driven Face Animation System

In this paper, we design and develop a speech-driven face animation system based on the dynamic mapping method. Face animation is synthesized by concatenating units and is synchronized with the real speech. Units are selected according to cost functions based on the voice-spectrum distance between training and target units. The visual distance between two adjacent training units is also used to obtain better mapping results. Finally, the Viterbi algorithm is used to find the best face animation sequence. Experimental results show that the synthesized lip movements are of good, natural quality.

Panrong Yin, Jianhua Tao
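The unit-selection step can be illustrated with a generic Viterbi search. In this hedged Python sketch, `target_cost` stands in for the voice-spectrum distance between a training unit and the target unit, and `concat_cost` for the visual distance between adjacent training units; both are placeholder callables with toy values, not the authors’ cost functions.

```python
def viterbi_select(candidates, target_cost, concat_cost):
    """Pick one training unit per target position, minimizing total cost.

    candidates[t] lists candidate unit ids for target position t.
    target_cost(t, u): mismatch between candidate u and target t.
    concat_cost(p, u): cost of placing unit u right after unit p.
    Returns (best_path, total_cost).
    """
    # best[t][u] = (cheapest cost of a path ending in u at t, previous unit)
    best = [{u: (target_cost(0, u), None) for u in candidates[0]}]
    for t in range(1, len(candidates)):
        layer = {}
        for u in candidates[t]:
            prev, cost = min(
                ((p, c + concat_cost(p, u)) for p, (c, _) in best[-1].items()),
                key=lambda pair: pair[1],
            )
            layer[u] = (cost + target_cost(t, u), prev)
        best.append(layer)
    # Backtrack from the cheapest unit at the last position.
    unit = min(best[-1], key=lambda u: best[-1][u][0])
    total = best[-1][unit][0]
    path = [unit]
    for t in range(len(best) - 1, 0, -1):
        unit = best[t][unit][1]
        path.append(unit)
    return path[::-1], total

# Toy costs: "spectral" target mismatch and "visual" join mismatch (hypothetical).
spectral = {(0, "a1"): 1.0, (0, "a2"): 0.0, (1, "b1"): 0.0, (1, "b2"): 1.0}
visual = {("a1", "b1"): 0.0, ("a1", "b2"): 0.0, ("a2", "b1"): 5.0, ("a2", "b2"): 2.0}
path, total = viterbi_select([["a1", "a2"], ["b1", "b2"]],
                             lambda t, u: spectral[(t, u)],
                             lambda p, u: visual[(p, u)])
```

In the toy data, `a2` matches the first target best on its own, but the visual join cost makes `a1` followed by `b1` the cheapest whole sequence; weighing local fit against smooth transitions is why a sequence-level search is used rather than greedy per-unit selection.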

Affective Intelligence: A Novel User Interface Paradigm

This paper describes an advanced human-computer interface that combines real-time, reactive and high-fidelity virtual humans with artificial vision and communicative intelligence to create a closed-loop interaction model and achieve an affective interface. The system, called the Virtual Human Interface (VHI), uses a photo-real facial and body model as a virtual agent to convey information beyond speech and actions. Specifically, the VHI uses a dictionary of nonverbal signals, including body language, hand gestures and subtle emotional display, to support verbal content in a reactive manner. Furthermore, its built-in facial tracking and artificial vision system allows the virtual human to maintain eye contact, follow the motion of the user, and even recognize when somebody joins him or her in front of the terminal and act accordingly. Additional sensors allow the virtual agent to react to touch, voice and other modalities of interaction. The system has been tested in a real-world scenario in which a virtual child reacted to visitors in an exhibition space.

Barnabas Takacs

Affective Guide with Attitude

The Affective Guide System is a mobile, context-aware and spatially aware system that offers the user an affective multimodal interaction interface. The system takes advantage of current mobile and wireless technologies. It includes an ‘affective guide with attitude’ that links its memories and the visitor’s interests to the spatial location, so that stories are relevant to what can be seen immediately. This paper presents a review of related work, describes the system in detail, and discusses challenges and the future work to be carried out.

Mei Yii Lim, Ruth Aylett, Christian Martyn Jones

Detecting Emotions in Conversations Between Driver and In-Car Information Systems

Speech interaction with in-car controls is becoming more commonplace because it is considered less distracting for the driver. Today’s cars are equipped with speech recognition systems to dial phone numbers and to control the cockpit environment. Furthermore, satellite navigation systems give the driver verbal directions to the destination. This paper extends speech interaction between driver and car to include automatic recognition of the driver’s emotional state and appropriate responses by the car to improve the driver’s mood. The driver’s emotion has been found to influence driving performance, and by actively responding to the driver’s emotional state the car could improve their driving.

Christian Martyn Jones, Ing-Marie Jonsson

Human Vibration Environment of Wearable Computer

Wearable computers have very broad application prospects. To put them into practice, one of the key problems to solve is improving their anti-vibration capability. Because of the new human-computer interaction mode that wearable computers offer, the human body is their working environment, so our research begins by studying the vibration environment in which a wearable computer has to work. The vibration a wearable computer receives falls into two kinds: external vibration transmitted through the human body, and vibration caused by the wearer’s own movement. In this paper, two environments in which wearable computers often work are studied, and the results indicate that vibration caused by human movement is more intense than external vibration transmitted through the body.

Zhiqi Huang, Dongyi Chen, Shiji Xiahou

Approximation of Class Unions Based on Dominance-Matrix Within Dominance-Based Rough Set Approach

The Dominance-based Rough Set Approach (DRSA) is an extension of classical Rough Set Theory (RST). Approximation of class unions is a very important approach in DRSA. To overcome a disadvantage of the classical method, we present a new methodology for approximating class unions based on a dominance matrix. It only requires computing the dominance matrix, without considering the preference relations one by one, which greatly simplifies the process. The method is also intuitive and efficient. An example illustrates its feasibility and efficiency.

Ming Li, Baowei Zhang, Tong Wang, Li Zhao
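For readers unfamiliar with DRSA, the sketch below evaluates the standard lower and upper approximations of an upward class union (the objects whose class is at least t) through a boolean dominance matrix. It assumes gain-type criteria (higher is better) and toy data; it illustrates reading dominance cones off a matrix, not the paper’s exact procedure.

```python
def dominance_matrix(values):
    """m[i][j] is True when object i weakly dominates object j, i.e. is at least
    as good on every gain-type criterion (higher is better)."""
    n = len(values)
    return [[all(a >= b for a, b in zip(values[i], values[j])) for j in range(n)]
            for i in range(n)]

def upward_union_approximations(values, classes, t):
    """Lower and upper approximations of the upward union Cl_t^>= (class >= t)."""
    m = dominance_matrix(values)
    n = len(values)
    union = {i for i in range(n) if classes[i] >= t}
    # Lower: every object dominating x (the True entries of column x) is in the union.
    lower = {x for x in range(n) if all(y in union for y in range(n) if m[y][x])}
    # Upper: x dominates at least one member of the union (row x of the matrix).
    upper = {x for x in range(n) if any(m[x][y] for y in union)}
    return lower, upper

# Toy data: two criteria per object, ordinal decision classes 1 < 2.
values = [(3, 3), (2, 2), (2, 2), (1, 1)]
classes = [2, 2, 1, 1]
lower, upper = upward_union_approximations(values, classes, t=2)
boundary = upper - lower  # objects 1 and 2 have equal values but different classes
```

Once the matrix is built, both approximations reduce to scanning a column or a row, which matches the abstract’s point that preference relations need not be examined one by one.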

An Online Multi-stroke Sketch Recognition Method Integrated with Stroke Segmentation

In this paper, a novel multi-stroke sketch recognition method is presented. The method integrates stroke segmentation and sketch recognition into a single approach, in which both are unified as a problem of “fitting to a template” with minimal fitting error, and a nested dynamic programming algorithm is designed to accelerate the optimization.

Jianfeng Yin, Zhengxing Sun
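The nested dynamic program itself is not described in the abstract. As a simpler, related illustration, the following sketch uses the classic dynamic program that splits one stroke into `k` straight chords with minimal squared fitting error; treating a template as “k line segments” is our simplifying assumption.

```python
import math

def chord_error(points, i, j):
    """Sum of squared distances from points strictly between i and j to the chord i-j."""
    (x1, y1), (x2, y2) = points[i], points[j]
    length = math.hypot(x2 - x1, y2 - y1)
    err = 0.0
    for x, y in points[i + 1:j]:
        if length == 0:
            d = math.hypot(x - x1, y - y1)
        else:
            d = abs((x2 - x1) * (y1 - y) - (x1 - x) * (y2 - y1)) / length
        err += d * d
    return err

def segment_stroke(points, k):
    """Split a stroke into k chords with minimal total error (assumes 1 <= k < len(points)).

    Returns (error, breakpoints) where breakpoints are point indices."""
    n = len(points)
    inf = float("inf")
    # cost[s][j]: least error covering points[0..j] with s chords, the last ending at j.
    cost = [[inf] * n for _ in range(k + 1)]
    back = [[-1] * n for _ in range(k + 1)]
    cost[0][0] = 0.0
    for s in range(1, k + 1):
        for j in range(1, n):
            for i in range(j):
                if cost[s - 1][i] < inf:
                    c = cost[s - 1][i] + chord_error(points, i, j)
                    if c < cost[s][j]:
                        cost[s][j], back[s][j] = c, i
    # Walk the back-pointers from the last point to recover the breakpoints.
    breaks, j = [n - 1], n - 1
    for s in range(k, 0, -1):
        j = back[s][j]
        breaks.append(j)
    return cost[k][n - 1], breaks[::-1]

# An L-shaped stroke: fitting it with two chords should break at the corner.
stroke = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
error, breaks = segment_stroke(stroke, k=2)
```

This version recomputes chord errors inside the loops and is therefore cubic in the number of points per segment count; acceleration of this kind of search is what the abstract attributes to its nested formulation.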

A Model That Simulates the Interplay Between Emotion and Internal Need

This paper proposes an emotion model that simulates the interplay between emotion and the internal needs of human beings, both of which influence human intelligent activities. The model includes emotion, internal need and decision modules. Its aim is to explore the role of emotion-like processes in intelligent machines. Simulations are conducted to test the performance of the emotion model.

Xiaoxiao Wang, Jiwei Liu, Zhiliang Wang

Affective Dialogue Communication System with Emotional Memories for Humanoid Robots

Memories are vital in human interactions. To interact sociably with a human, a robot should not only recognize and express emotions like a human, but also share emotional experience with humans. We present an affective human-robot communication system for a humanoid robot, AMI, which we designed to enable high-level communication with a human through dialogue. AMI communicates with humans by preserving emotional memories of users and topics, and it naturally engages in dialogue with humans. Humans therefore perceive AMI to be more human-like and friendly. Thus, interaction between AMI and humans is enhanced.

M. S. Ryoo, Yong-ho Seo, Hye-Won Jung, H. S. Yang

Scenario-Based Interactive Intention Understanding in Pen-Based User Interfaces

Interactive intention understanding is important for Pen-based User Interfaces (PUI). Much of the reported work on this topic focuses on handwriting or sketch recognition algorithms at the lexical layer. However, these algorithms cannot fully solve the problem of intention understanding and cannot give pen-based software high usability. Hence, this paper presents a scenario-based interactive intention understanding framework that simulates human cognitive mechanisms and cognitive habits. By providing an understanding environment that supports the framework, we can apply it to practical PUI systems. An evaluation of the Scientific Training Management System for the Chinese National Diving Team shows that the framework is effective in improving the system’s usability and enhancing its intention understanding capacity.

Xiaochun Wang, Yanyan Qin, Feng Tian, Guozhong Dai

An Augmented Reality-Based Application for Equipment Maintenance

Augmented Reality (AR) is a technology that merges real-world images with computer-generated virtual objects, superimposing the virtual objects on the real world and displaying them in the user’s view. Mechanical maintenance and repair tasks in industrial environments are promising domains for AR applications. In this paper, a prototype AR system for training and assisting in the maintenance of equipment (a PC) is presented. The key hardware components of the system are a binocular video see-through Head Mounted Display (HMD), which presents the augmented reality to the user; a tracking system, which uses ARToolKit to give the position and orientation of the equipment; and a laptop PC that serves as the platform for the whole application. The design and architecture of the system are described, and experimental results are presented.

Changzhi Ke, Bo Kang, Dongyi Chen, Xinyu Li

Highly Interactive Interface for Virtual Amusement Land Deriving Active Participation of Users

This paper presents an interactive 3D virtual amusement land that supports user interaction with various virtual contents, in order to increase the realism and immersion of a virtual park. Most existing 2D amusement contents offer less interactivity and realism than the proposed system. The method enables users to navigate the virtual space actively and to experience virtual contents through real-time rendering.

Seongah Chin

The Effects of Interactive Video Printer Technology on Encoding Strategies in Young Children’s Episodic Memory

This study explored the effects of video printer technology on young children’s episodic memory. Five-year-old children improved their memory of episodic events when they used the video printer to segment an event into small units. Seven-year-old children, however, gained no memory benefit from using it. These results are discussed in terms of developmental differences in encoding strategies and the effects of the video printer on memory.

Misuk Kim

Towards an Expressive Typology in Storytelling: A Perceptive Approach

This paper investigates the perception of expressiveness in storytelling. The aim is to establish a typology, the first step being to identify the different perceived expressive forms. Two perceptive tests, a listening test and a reading test, were set up using a free semantic verbalization method. In particular, we focused on the influence of verbal and non-verbal information on the perception of expressive types. From a detailed analysis of the listening test answers, we distinguished three situational categories of expressiveness specific to storytelling (emotions, emotional attitudes, and means of expression). A comparison of the data collected in both tests yielded different cues that seem to have been interpreted as expressing surprise, fear, and sadness.

Véronique Bralé, Valérie Maffiolo, Ioannis Kanellos, Thierry Moudenc

Affective-Cognitive Learning and Decision Making: A Motivational Reward Framework for Affective Agents

In this paper we present a new computational framework of affective-cognitive learning and decision making for affective agents, inspired by human learning and recent neuroscience and psychology. In the proposed framework ‘internal reward from cognition and emotion’ and ‘external reward from the external world’ serve as motivation in learning and decision making. We construct this model, integrating affect and cognition, with the aim of enabling machines to make smarter and more human-like decisions for better human-machine interactions.

Hyungil Ahn, Rosalind W. Picard
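The idea in the abstract above, that an internal reward from cognition and emotion and an external reward from the world jointly motivate learning, can be loosely illustrated with standard tabular Q-learning. This is a swapped-in technique, not the authors' framework: the convex mixing rule in the hypothetical `combined_reward`, the weight `w_internal`, and the learning-rate and discount values are all assumptions made for the sketch.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, List, Tuple

# Q-values keyed by (state, action); unseen pairs default to 0.0.
QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def combined_reward(external: float, internal: float,
                    w_internal: float = 0.5) -> float:
    """Hypothetical mixing rule: convex combination of the two rewards."""
    return (1.0 - w_internal) * external + w_internal * internal


def q_update(q: QTable, state: Hashable, action: Hashable,
             next_state: Hashable, actions: List[Hashable], reward: float,
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """One tabular Q-learning step driven by the combined reward."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q: QTable = defaultdict(float)
r = combined_reward(external=1.0, internal=0.0)  # 0.5 with equal weighting
print(q_update(q, "s0", "greet", "s1", ["greet", "wait"], r))
```

Raising `w_internal` lets the agent's internal (cognitive or emotional) signal dominate learning; how the original framework balances the two sources is not specified in the abstract.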

A Flexible Intelligent Knowledge Capsule for Efficient Data Mining/Reactive Level Extraction

The construction of memory and its mechanisms is very important for developing efficient intelligent systems. We designed a flexible intelligent knowledge capsule for efficient data mining, with a specially designed episode memory and association mechanism. The proposed knowledge capsule has a hierarchical structure and provides selection, learning, storing, and data extraction functions. Based on this knowledge structure, it has a flexible memory and a reactive level extraction mechanism. We constructed an event-oriented virtual memory and used it to test the learning and associative knowledge retrieval functions.

JeongYon Shim

On/Off Switching Mechanism in Knowledge Structure of Reticular Activating System

The Reticular Activating System, which takes the form of small neural networks in the brain, is closely related to the autonomic nervous system. It distinguishes the information to be memorized from the rest, accepting only the selected information and discarding what is unnecessary. In this paper, we propose a Reticular Activating System with knowledge acquisition, selection, storing, reconfiguration, and retrieval parts, and we design and test an On/Off switching mechanism for flexible memory.

JeongYon Shim

Uncharted Passions: User Displays of Positive Affect with an Adaptive Affective System

Affective technologies have the potential to enhance human-computer interaction (HCI). The problem is that much development is technology-driven rather than user-driven, which raises many unanswered questions about user preferences and opens new areas for research. People naturally incorporate emotional messages when communicating with other people, but their use of holistic communication, including emotional displays, during HCI has not been widely reported. Using Wizard-of-Oz (WOZ) methods, experimental design, and methods of sequential analysis from the social sciences, we recorded, analyzed, and compared participants' emotional displays during interaction with an apparently affective system and a standard, non-affective version. During interaction, participants portray extremely varied, sometimes intense, ever-changing displays of emotion; these are rated as significantly more positive in the affective computer condition and as significantly more intense in the told affective condition. We also discuss behavioural responses to the different conditions. These results are relevant to the design of future affective systems.

Lesley Axelrod, Kate Hone

Affective Composites: Autonomy and Proxy in Pedagogical Agent Networks

This paper proposes an alternative paradigm for building affective competencies in embodied conversational agents (ECAs). The key feature of this approach – and the reason for referring to it as an alternative paradigm – is the use of hybrid ECAs that are expressive both autonomously and as pass-through proxies for human communications. This hybrid ECA paradigm is currently being investigated in the context of animated pedagogical agents. Other domains in which ECAs are used or envisioned, however, such as medical interviews, may also be appropriate for its application. One critical research question involves testing the conjecture that human affect shared through an agent reverberates to, or scaffolds, the empathic credibility, trustworthiness, or effectiveness of the agent when it is functioning autonomously.

Eric R. Hamilton

An Affective User Interface Based on Facial Expression Recognition and Eye-Gaze Tracking

This paper describes a pipeline in which the user's facial expression and eye gaze are tracked, and 3D facial animation is then synthesized at a remote site based on the timing of the facial and eye movements. The system first detects the facial area within a given image and then classifies its facial expression into weightings over 7 emotions. This weighting information, transmitted to a PDA via a mobile network, is used for non-photorealistic facial expression animation. Facial expression animation using emotional curves turns out to be more effective at expressing the timing of an expression than the linear interpolation method. The emotional avatar embedded on a mobile platform has some potential for conveying emotion between people via the Internet.

Soo-Mi Choi, Yong-Guk Kim

Watch and Feel: An Affective Interface in a Virtual Storytelling Environment

In this paper we describe a study carried out with SenToy: a tangible interface shaped like a doll that is used to capture emotions from its user while the user performs specific gestures. SenToy was used with an application named FearNot!, a virtual storytelling environment in which characters act autonomously and in character, so that stories emerge from those autonomous actions. The integration of SenToy in FearNot! was evaluated in two ways: (1) whether users were able to manipulate the tangible interface appropriately, even while engaged in a storytelling situation, and (2) whether the emotions users expressed with SenToy were the same as those they reported having felt after the session. The results of the study show that SenToy can be used to understand how viewers reacted to the stories portrayed, and that the emotions expressed with SenToy (apart from some exceptions, reported in the paper) are indeed similar to those the users reported having felt.

Rui Figueiredo, Ana Paiva

Affective Music Processing: Challenges

Our states of mind keep changing all the time. How can we determine a mapping between our states of mind and musical stimuli? In this report, we discuss, from a bird’s-eye view, the meanings and effects that music impresses upon us. We review related literature on musical meaning and computational models, and finally we discuss the impressions evoked while listening to the fourth movement of the Pastoral Symphony. We point out challenges and suggest plausible computational approaches to affective music computing.

Somnuk Phon-Amnuaisuk

A User-Centered Approach to Affective Interaction

We have built eMoto, a mobile service for sending and receiving affective messages, with the explicit aim of addressing the inner experience of emotions. eMoto is a designed artifact that carries emotional experiences only achieved through interaction. Following theories of embodiment, we argue that emotional experiences cannot be designed in, only designed for. eMoto is the result of a user-centered design approach, realized through a set of initial brainstorming methods, a …, a … of body language, and a two-tiered evaluation method. eMoto is not a system that could have been designed from theory alone; it required iterative engagement with end-users, albeit in combination with theoretical work. More specifically, we show how we have managed to design an … and open system that allows for users’ emotional engagement.

Petra Sundström, Anna Ståhl, Kristina Höök

Designing and Redesigning an Affective Interface for an Adaptive Museum Guide

The ideal museum guide should observe the user's affective reactions to the presentations and adapt its behavior accordingly. In this paper we describe the user-centred design of an adaptive multimedia mobile guide with an affective interface. The novel approach required a series of redesign cycles. We comment in particular on the latest experiments we conducted with the prototype, on users’ observations during interviews, and on more objective considerations based on logs. We show that the latest design is better understood by users, though there is still room for improvement.

Dina Goren-Bar, Ilenia Graziola, Cesare Rocchi, Fabio Pianesi, Oliviero Stock, Massimo Zancanaro

Intelligent Interaction for Linguistic Learning

The choice of a friendly interface for learning a foreign language is the fundamental theme of the research illustrated in this paper. Dialogue between human beings uses several means, which essentially require knowledge of the linguistic terminology, of the context in which the dialogue takes place, and of its final goal. Technological tools that emulate human behavior can be useful in communication processes that require interaction with everyday situations. HyperEnglish, the multimedia courseware we are developing, is a prototype for experimenting with and evaluating these hypotheses; its fundamental goal is to build an intelligent environment for learning English that simplifies communication and lets learners experience the emotions of real life.

Vito Leonardo Plantamura, Enrica Gentile, Anna Angelini

A Three-Layered Architecture for Socially Intelligent Agents: Modeling the Multilevel Process of Emotions

In this article, we propose the design of a three-layered agent architecture inspired by the Multilevel Process Theory of Emotion (Leventhal and Scherer, 1987). Our project aims at modeling emotions on an autonomous embodied robotic agent, expanding upon our previous work (Lisetti, et al., 2004). Our agent is designed to interact socially with humans, navigating in an office suite environment and engaging people in social interactions. We describe: (1) the psychological theory of emotion which inspired our design, (2) our proposed agent architecture, (3) the hardware additions we implemented on a robot, and (4) the robot’s multi-modal interface, designed especially to engage humans in natural (and hopefully pleasant) social interactions.

Christine L. Lisetti, Andreas Marpaung

Multi-stream Confidence Analysis for Audio-Visual Affect Recognition

Changes in a speaker’s emotion are a fundamental component of human communication. Some emotions motivate human actions while others add deeper meaning and richness to human interactions. In this paper, we explore the development of a computing algorithm that uses audio and visual sensors to recognize a speaker’s affective state. Within the framework of the Multi-stream Hidden Markov Model (MHMM), we analyze audio and visual observations to detect 11 cognitive/emotive states. We investigate the use of individual modality confidence measures as a means of estimating weights when combining likelihoods in the audio-visual decision fusion. Person-independent experimental results from 20 subjects in 660 sequences suggest that using stream exponents estimated on training data improves the classification accuracy of audio-visual affect recognition.

Zhihong Zeng, Jilin Tu, Ming Liu, Thomas S. Huang
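The decision fusion with stream exponents described in the abstract above can be illustrated with a short sketch. It assumes that per-class log-likelihoods for each stream have already been computed (for example, by per-class HMMs) and scores each class $c$ as $\lambda_a \log P(O_a \mid c) + \lambda_v \log P(O_v \mid c)$ with $\lambda_a + \lambda_v = 1$. The rule in the hypothetical `confidence_weight`, which makes the audio exponent proportional to its confidence, is an illustrative assumption; the paper's own confidence measures and estimation procedure are not reproduced here.

```python
from typing import Dict


def confidence_weight(conf_audio: float, conf_visual: float) -> float:
    """Hypothetical rule: audio exponent proportional to its confidence."""
    total = conf_audio + conf_visual
    if total <= 0.0:
        return 0.5  # no information: weight both streams equally
    return conf_audio / total


def fuse_streams(audio_ll: Dict[str, float],
                 visual_ll: Dict[str, float],
                 lambda_audio: float) -> str:
    """Pick the class maximizing the exponent-weighted log-likelihood.

    audio_ll and visual_ll map each class label to log P(O_stream | class);
    the stream exponents are lambda_audio and 1 - lambda_audio.
    """
    if not 0.0 <= lambda_audio <= 1.0:
        raise ValueError("lambda_audio must lie in [0, 1]")
    lambda_visual = 1.0 - lambda_audio
    scores = {
        label: lambda_audio * audio_ll[label] + lambda_visual * visual_ll[label]
        for label in audio_ll
    }
    return max(scores, key=scores.get)


audio = {"happy": -10.0, "sad": -12.0}
visual = {"happy": -20.0, "sad": -14.0}
print(fuse_streams(audio, visual, 0.5))                          # sad
print(fuse_streams(audio, visual, confidence_weight(0.9, 0.1)))  # happy
```

With equal exponents the visual stream's strong penalty on "happy" decides the outcome; trusting the audio stream more ($\lambda_a = 0.9$) flips the decision, which is the kind of effect that estimating stream weights from confidence is meant to exploit.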

Investigation of Emotive Expressions of Spoken Sentences

When we meet an emotion keyword in a sentence that expresses a kind of emotion, or a word that does not directly express emotion but carries an attitude clue, the speaker may simply be stating a fact without any affect; the sentence may express attitudes or emotive states of the agent in different ways; or other cases may apply. It therefore seems questionable to determine the exact communicative emotion function of a sentence based only on “keywords”. This paper investigates the collective influence of several factors on the communicative emotion function of a sentence. These factors include emotion keywords and sentence features such as mood and negation. We believe that the results will be useful for emotion detection in, or generation of, short sentences.

Wenjie Cao, Chengqing Zong, Bo Xu

Affective Computing: A Review

Affective computing is currently one of the most active research topics and is receiving increasingly intensive attention. This strong interest is driven by a wide spectrum of promising applications in many areas, such as virtual reality, smart surveillance, and perceptual interfaces. Affective computing draws on multidisciplinary knowledge from psychology, cognitive science, physiology, and computer science. The paper focuses on several issues implicitly involved in the whole interactive feedback loop. Various methods for each issue are discussed in order to examine the state of the art. Finally, some research challenges and future directions are discussed.

Jianhua Tao, Tieniu Tan

Personalized Facial Animation Based on 3D Model Fitting from Two Orthogonal Face Images

This paper presents a personalized, MPEG-4 compliant facial animation system on an embedded platform. We report a semi-automatic and rapid approach for personalized modeling from two orthogonal face images; the approach is easy to use and efficient. With multi-dimensional texture mapping, the personalized face model offers much more lifelike behavior. The system can be used in games, interactive services, etc.

Yonglin Li, Jianhua Tao

