
2011 | Book

Affective Computing and Intelligent Interaction

Fourth International Conference, ACII 2011, Memphis, TN, USA, October 9–12, 2011, Proceedings, Part II

Edited by: Sidney D’Mello, Arthur Graesser, Björn Schuller, Jean-Claude Martin

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science

About this book

The two-volume set LNCS 6974 and LNCS 6975 constitutes the refereed proceedings of the Fourth International Conference on Affective Computing and Intelligent Interaction, ACII 2011, held in Memphis, TN, USA, in October 2011. The 135 papers in this two-volume set, presented together with 3 invited talks, were carefully reviewed and selected from 196 submissions. The papers are organized in topical sections on recognition and synthesis of human affect, affect-sensitive applications, methodological issues in affective computing, affective and social robotics, affective and behavioral interfaces, relevant insights from psychology, affective databases, and evaluation and annotation tools.

Table of Contents

Frontmatter

Poster Papers

Emotion Twenty Questions: Toward a Crowd-Sourced Theory of Emotions

This paper introduces a method for developing a socially-constructed theory of emotions that aims to reflect the aggregated judgments of ordinary people about emotion terms. Emotion Twenty Questions (EMO20Q) is a dialog-based game that is similar to the familiar Twenty Questions game except that the object of guessing is the name for an emotion, rather than an arbitrary object. The game is implemented as a dyadic computer chat application using the Extensible Messaging and Presence Protocol (XMPP). We describe the idea of a theory that is socially-constructed by design, or crowd-sourced, as opposed to the de facto social construction of theories by the scientific community. This paper argues that such a subtle change in paradigm is useful when studying natural usage of emotion words, which can mean different things to different people but still contain a shared, socially-defined meaning that can be arrived at through conversational dialogs. The game of EMO20Q provides a framework for demonstrating this shared meaning and, moreover, provides a standardized way for collecting the judgments of ordinary people. The paper offers preliminary results of EMO20Q pilot experiments, showing that such a game is feasible and that it generates a range of questions that can be used to describe emotions.

Abe Kazemzadeh, Sungbok Lee, Panayiotis G. Georgiou, Shrikanth S. Narayanan
A Pattern-Based Model for Generating Text to Express Emotion

In this paper we introduce a novel pattern-based model for generating emotion sentences. Our model starts with initial patterns, then constructs extended patterns. From the extended patterns, we choose good patterns that are suitable for generating emotion sentences. We also introduce a sentence planning module, which provides rules and constraints for our model. We present some examples and results for our model. We show that the model can generate various types of emotion sentences, either from a semantic representation of the input, or by choosing the pattern and the desired emotion class.

Fazel Keshtkar, Diana Inkpen
Interpretations of Artificial Subtle Expressions (ASEs) in Terms of Different Types of Artifact: A Comparison of an On-Screen Artifact with a Robot

We have already confirmed that the artificial subtle expressions (ASEs) from a robot can accurately and intuitively convey its internal states to participants [10]. In this paper, we then experimentally investigated whether the ASEs from an on-screen artifact could also convey the artifact’s internal states to participants in order to confirm whether the ASEs can be consistently interpreted regardless of the types of artifacts. The results clearly showed that the ASEs expressed from an on-screen artifact succeeded in accurately and intuitively conveying the artifact’s internal states to the participants. Therefore, we confirmed that the ASEs’ interpretations were consistent regardless of the types of artifacts.

Takanori Komatsu, Seiji Yamada, Kazuki Kobayashi, Kotaro Funakoshi, Mikio Nakano
Affective State Recognition in Married Couples’ Interactions Using PCA-Based Vocal Entrainment Measures with Multiple Instance Learning

Recently there has been an increase in efforts in Behavioral Signal Processing (BSP), which aims to bring quantitative analysis using signal processing techniques to the domain of observational coding. Currently, observational coding in fields such as psychology is based on subjective expert coding of abstract human interaction dynamics. In this work, we use a Multiple Instance Learning (MIL) framework, a saliency-based prediction model, with a signal-driven vocal entrainment measure as the feature to predict the affective state of a spouse in problem-solving interactions. We generate 18 MIL classifiers to capture the variable-length saliency of vocal entrainment, and use a cross-validation scheme with maximum accuracy and mutual information as the metric to select the best-performing classifier for each testing couple. This method obtains a recognition accuracy of 53.93%, a 2.14% (4.13% relative) improvement over a baseline model using a Support Vector Machine. Furthermore, this MIL-based framework has potential for identifying meaningful regions of interest for further detailed analysis of married couples' interactions.

Chi-Chun Lee, Athanasios Katsamanis, Matthew P. Black, Brian R. Baucom, Panayiotis G. Georgiou, Shrikanth S. Narayanan
A Comparison of Unsupervised Methods to Associate Colors with Words

Colors have a very important role in our perception of the world. We often associate colors with various concepts at different levels of consciousness, and these associations can be relevant to many fields such as education and advertisement. However, to the best of our knowledge, there are no systematic approaches to aid the automatic development of resources encoding this kind of knowledge. In this paper, we propose three computational methods based on image analysis, language models, and latent semantic analysis to automatically associate colors with words. We compare these methods against a gold standard obtained via crowd-sourcing. The results show that each method is effective in capturing different aspects of word-color associations.

Gözde Özbal, Carlo Strapparava, Rada Mihalcea, Daniele Pighin
Computer Based Video and Virtual Environments in the Study of the Role of Emotions in Moral Behavior

The role of emotions in moral issues is an important topic in philosophy and psychology. Recently, some psychologists have approached this issue by conducting online questionnaire-based studies. In this paper, we discuss the utility and plausibility of using computer-based video and virtual environments to assist the study of moral judgments and behavior. In particular, we describe two studies: the first demonstrates the use of computer-generated visual effects in the design and implementation of an experimental study aimed at observing participants' moral judgment of an actor's confession of a behavior of doubtful morality, during which the actor either blushed or did not. In the second study, we examine people's responses when confronted with a moral dilemma in a Virtual Environment.

Xueni Pan, Domna Banakou, Mel Slater
EmoWisconsin: An Emotional Children Speech Database in Mexican Spanish

The acquisition of naturalistic speech data and the richness of its annotation are very important to face the challenges of automatic emotion recognition from speech. This paper describes the creation of a database of emotional speech in the Spanish spoken in Mexico. It was recorded from children between 7 and 13 years old while playing a sorting card game with an adult examiner. The game is based on a neuropsychological test, modified to encourage dialogue and induce emotions in the player. The audio was segmented at speaker turn level and annotated with six emotional categories and three continuous emotion primitives by 11 human evaluators. Inter-evaluator agreement is presented for categorical and continuous annotation. Initial classification and regression experiments were performed using a set of 6,552 acoustic features.

Humberto Pérez-Espinosa, Carlos Alberto Reyes-García, Luis Villaseñor-Pineda
“Should I Teach or Should I Learn?” - Group Learning Based on the Influence of Mood

One’s mood influences one’s inclination to either rely on one’s current beliefs or search for different ones. Since mood may reflect one’s failures and achievements from interacting with the environment, perhaps this influence is working to our advantage.

We propose a simple agent architecture, where the behaviors of learning from or teaching other agents are dependent on the agent’s current mood. Using a particular multi-agent scenario, we demonstrate how this approach can lead an entire group of agents to learn a structured concept that was unknown to any of the agents.

César F. Pimentel
How Low Level Observations Can Help to Reveal the User’s State in HCI

For next generation human-computer interaction (HCI), it is crucial to assess the affective state of a user. However, this user state is, even for human annotators, only indirectly inferable from background information, the observation of the interaction's progression, and the social signals produced by the interlocutors. In this paper, coincidences of directly observable patterns and different user states are examined in order to relate the former to the latter. This evaluation motivates a hierarchical label system, where labels of latent user states are supported by low-level observations. The dynamic patterns of occurrence of various social signals may then, in an integration step, be used to infer the latent user state. Thus, we expect to advance the understanding of the recognition of affective user states as compositions of lower-level observations for automatic classifiers in HCI.

Stefan Scherer, Martin Schels, Günther Palm
Investigating Acoustic Cues in Automatic Detection of Learners’ Emotion from Auto Tutor

This study investigates the emotion-discriminant ability of acoustic cues from speech collected in the automatic computer tutoring system AutoTutor. The purpose of this study is to examine the acoustic cues for emotion detection in the speech channel of the learning system, and to compare the emotion-discriminant performance of acoustic cues (in this study) with that of conversational cues (available in previous work). Comparison between the classification performance obtained using acoustic cues and conversational cues shows that the emotions flow and boredom are better captured by acoustic cues than by conversational cues, while conversational cues play a more important role in multiple-emotion classification.

Rui Sun, Elliot Moore II
The Affective Triad: Stimuli, Questionnaires, and Measurements

Affective Computing has always aimed to answer the question: which measurement is most suitable to predict the subject's affective state? Many experiments have been devised to evaluate the relationships among three types of variables (the affective triad): stimuli, self-reports, and measurements. Since the real affective state is hidden, researchers have addressed this question by looking for the measure most related either to the stimulus or to self-reports. The first approach assumes that people receiving the same stimulus are feeling the same emotion, a condition difficult to match in practice. The second approach assumes that emotion is what people say they feel, and seems more plausible.

We propose a novel method, which extends the previous ones by looking for the physiological measurement most correlated with the part of the self-report due to emotion, not to the stimulus. This guarantees finding a measure best related to the subject's affective state.

Simone Tognetti, Maurizio Garbarino, Matteo Matteucci, Andrea Bonarini
Relevance Vector Machine Based Speech Emotion Recognition

This work aims at investigating the use of relevance vector machine (RVM) for speech emotion recognition. The RVM technique is a Bayesian extension of the support vector machine (SVM) that is based on a Bayesian formulation of a linear model with an appropriate prior for each weight. Together with the introduction of RVM, aspects related to the use of SVM are also presented. From the comparison between the two classifiers, we find that RVM achieves comparable results to SVM, while using a sparser representation, such that it can be advantageously used for speech emotion recognition.

Fengna Wang, Werner Verhelst, Hichem Sahli
A Regression Approach to Affective Rating of Chinese Words from ANEW

Affective norms for words are an important resource for textual emotion recognition applications. One problem with existing research is that such norms were rated by large numbers of participants, making them difficult to reproduce for different languages. Moreover, cultural differences across ethnic groups mean that language/culture-specific affective norms are not directly translatable to applications in other languages. To overcome these problems, in this paper, a new approach to semi-automatic labeling of Chinese affective norms for the 1,034 words included in the affective norms for English words (ANEW) is proposed. It uses ratings of a small number of Chinese words from ontology concept clusters together with a regression-based approach to transform the 1,034 English words' ratings into the corresponding Chinese words' ratings. The experimental results demonstrate that the proposed approach can be practically implemented and provides adequate results.

Wen-Li Wei, Chung-Hsien Wu, Jen-Chun Lin
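
The core regression step can be sketched as follows; this is an illustrative example under invented placeholder seed ratings, not the authors' implementation or data. A small set of words rated in both languages fits a linear mapping from English ANEW valence/arousal/dominance ratings to Chinese ratings, which is then applied to the remaining ANEW words.

```python
# Minimal sketch of a regression-based rating transfer. The seed ratings
# below are placeholder values, not data from the paper.
import numpy as np
from sklearn.linear_model import LinearRegression

# Seed words rated in both languages; columns = (valence, arousal, dominance).
english_seed = np.array([[8.2, 5.5, 6.9],
                         [2.1, 6.3, 3.0],
                         [5.0, 3.2, 5.1],
                         [7.5, 4.8, 6.2],
                         [3.0, 5.9, 3.5]])
chinese_seed = np.array([[7.9, 5.2, 6.5],
                         [2.4, 6.6, 3.2],
                         [5.1, 3.0, 5.0],
                         [7.2, 4.5, 6.0],
                         [3.3, 6.1, 3.8]])

# Fit one multi-output linear model from English ratings to Chinese ratings.
model = LinearRegression().fit(english_seed, chinese_seed)

# Transfer the remaining ANEW ratings (placeholder row for one English word).
anew_ratings = np.array([[6.4, 4.1, 5.7]])
print(model.predict(anew_ratings))   # estimated Chinese valence/arousal/dominance
```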
Active Class Selection for Arousal Classification

Active class selection (ACS) studies how to optimally select the classes to obtain training examples so that a good classifier can be constructed from a small number of training examples. It is very useful in situations where the class labels need to be determined before the training examples and features can be obtained. For example, in many emotion classification problems, the emotion (class label) needs to be specified before the corresponding responses can be generated and recorded. However, there has been very limited research on ACS, and to the best knowledge of the authors, ACS has not been introduced to the affective computing community. In this paper, we compare two ACS approaches in an arousal classification application. Experimental results using a kNN classifier show that one of them almost always results in higher classification accuracy than a uniform sampling approach. We expect that ACS, together with transfer learning, will greatly reduce the data acquisition effort to customize an affective computing system.

Dongrui Wu, Thomas D. Parsons
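
A loose sketch of the active class selection idea follows; the class-selection heuristic, synthetic data generator, and all parameters are assumptions for illustration and are not the specific ACS algorithms compared in the paper. The loop requests the next training example from whichever arousal class a kNN classifier currently handles worst.

```python
# Illustrative active class selection (ACS) loop with a kNN classifier.
# The selection heuristic and synthetic data are assumptions for
# demonstration, not the specific ACS methods compared in the paper.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
CENTERS = {0: [-1.0, 0.0], 1: [1.0, 0.5]}      # low / high arousal prototypes

def elicit_example(label):
    """Pretend to induce the requested arousal class and record a feature vector."""
    return rng.normal(loc=CENTERS[label], scale=1.0)

# Small balanced seed set.
X = [elicit_example(c) for c in (0, 1) for _ in range(5)]
y = [c for c in (0, 1) for _ in range(5)]

clf = KNeighborsClassifier(n_neighbors=3)
for _ in range(30):                            # acquisition loop
    clf.fit(np.array(X), np.array(y))
    # Resubstitution accuracy per class, a crude proxy for class difficulty.
    per_class = []
    for c in (0, 1):
        mask = np.array(y) == c
        per_class.append(clf.score(np.array(X)[mask], np.array(y)[mask]))
    hardest = int(np.argmin(per_class))        # request the harder class next
    X.append(elicit_example(hardest))
    y.append(hardest)

print("class counts after ACS:", np.bincount(np.array(y)))
```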
Inductive Transfer Learning for Handling Individual Differences in Affective Computing

Although psychophysiological and affective computing approaches may increase facility for development of the next generation of human-computer systems, the data resulting from research studies in affective computing include large individual differences. As a result, it is important that the data gleaned from an affective computing system be tailored for each individual user by re-tuning it using user-specific training examples. Given the often time-consuming and/or expensive nature of efforts to obtain such training examples, there is a need to either 1) minimize the number of user-specific training examples required, or 2) maximize the learning performance through the incorporation of auxiliary training examples from other subjects. In [11] we demonstrated an active class selection approach for the first purpose. Herein we use transfer learning to improve the learning performance by combining user-specific training examples with auxiliary training examples from other subjects, which are similar but not exactly the same as the user-specific training examples. We report results from an arousal classification application to demonstrate the effectiveness of transfer learning in a Virtual Reality Stroop Task designed to elicit varying levels of arousal.

Dongrui Wu, Thomas D. Parsons
The Machine Knows What You Are Hiding: An Automatic Micro-expression Recognition System

Micro-expressions are one of the most important behavioral clues for lie and dangerous-demeanor detection. However, it is difficult for humans to detect micro-expressions. In this paper, a new approach for automatic micro-expression recognition is presented. The system is fully automatic and operates in a frame-by-frame manner. It automatically locates the face and extracts the features by using Gabor filters. GentleSVM is then employed to identify micro-expressions. For spotting, the system obtained 95.83% accuracy. For recognition, the system showed 85.42% accuracy, which was higher than the performance of trained human subjects. To further improve the performance, a more representative training set, a more sophisticated testing bed, and an accurate image alignment method should be the focus of future research.

Qi Wu, Xunbing Shen, Xiaolan Fu
EMOGIB: Emotional Gibberish Speech Database for Affective Human-Robot Interaction

Gibberish speech consists of vocalizations of meaningless strings of speech sounds. It is sometimes used by performing artists or in cartoon animations (e.g., Teletubbies) to express intended emotions, without pronouncing any actually understandable word. The fact that no understandable text has to be pronounced and that only affect is conveyed is the advantage of gibberish in affective computing. In our study, we intend to experiment with communication between a robot and hospitalized children using affective gibberish. A new emotional database consisting of 4 distinct corpora has been recorded for the purpose of affective child-robot interaction. The database comprises speech recordings of one actress simulating a neutral state and the big six emotions: anger, disgust, fear, happiness, sadness and surprise. The database has been evaluated through a perceptual test, with adults evaluating all subsets of the database and children evaluating one subset, achieving recognition scores up to 81%.

Selma Yilmazyildiz, David Henderickx, Bram Vanderborght, Werner Verhelst, Eric Soetens, Dirk Lefeber
Context-Sensitive Affect Sensing and Metaphor Identification in Virtual Drama

Affect interpretation from story/dialogue context and metaphorical expressions is challenging but essential for the development of emotion-inspired intelligent user interfaces. In order to achieve this research goal, we previously developed an AI actor integrating an affect detection component that detects 25 emotions from literal text-based improvisational input. In this paper, we report updated development on metaphorical affect interpretation, especially for sensory and cooking metaphors. Contextual affect detection with the integration of emotion modeling is also explored. Evaluation results for the new developments are provided. Our work benefits systems that intend to employ emotions embedded in the scenarios/characters and open-ended input for visual representation without distracting users from learning situations.

Li Zhang, John Barnden

Doctoral Consortium

An Android Head for Social-Emotional Intervention for Children with Autism Spectrum Conditions

Many children with autism spectrum conditions (ASC) have difficulties recognizing emotions from facial expressions. Behavioural interventions have attempted to address this issue but their drawbacks have prompted the exploration of new intervention strategies. Robots have proven to be an engaging and effective possibility. Our work will investigate the use of a facially-expressive android head as a social partner for children with ASC. The main goal of this research is to improve the emotion recognition capabilities of the children through observation, imitation and control of facial expressions on the android.

Andra Adams, Peter Robinson
Automatic Emotion Recognition from Speech: A PhD Research Proposal

This paper contains a PhD research proposal in the domain of automatic emotion recognition from the speech signal. We start by identifying our research problem, namely the acute confusion between emotion classes, and cite different sources of this ambiguity. In the methodology section, we present a method based on the concept of similarity between class and instance patterns. We dub this method Weighted Ordered Classes - Nearest Neighbors. The first result obtained exceeds in performance the best result of the state of the art. Finally, as future work, we propose improvements to the performance of the proposed system.

Yazid Attabi, Pierre Dumouchel
Multimodal Affect Recognition in Intelligent Tutoring Systems

This paper concerns the multimodal inference of complex mental states in the intelligent tutoring domain. The research aim is to provide intervention strategies in response to a detected mental state, with the goal being to keep the student in a positive affect realm to maximize learning potential. The research follows an ethnographic approach in the determination of affective states that naturally occur between students and computers. The multimodal inference component will be evaluated from video and audio recordings taken during classroom sessions. Further experiments will be conducted to evaluate the affect component and educational impact of the intelligent tutor.

Ntombikayise Banda, Peter Robinson
Candidacy of Physiological Measurements for Implicit Control of Emotional Speech Synthesis

There is a need for speech synthesis to be more emotionally expressive. Implicit control of a subset of affective vocal effects could be advantageous for some applications. Physiological measures associated with autonomic nervous system (ANS) activity are potential candidates for such input. This paper describes a pilot study investigating physiological sensor readings as potential input signals for modulating the speech synthesis of affective utterances composed by human users. A small corpus of audio, heart rate, and skin conductance data has been collected from eight doctoral student oral defenses. Planned analysis and research phases are outlined.

Shannon Hennig
Toward a Computational Approach for Natural Language Description of Emotions

This is a précis of the author's dissertation proposal about natural language description of emotions. The proposal seeks to explain how humans describe emotions using natural language. The focus of the proposal is on words and phrases that refer to emotions, rather than the more general phenomena of emotional language. The main problem is that if descriptions of emotions refer to abstract concepts that are local to a particular human (or agent), then how do these concepts vary from person to person, and how can shared meaning be established between people? The thesis of the proposal is that natural language emotion descriptions refer to theoretical objects, which provide a logical framework for dealing with this phenomenon in scientific experiments and engineering solutions. An experiment, Emotion Twenty Questions (EMO20Q), was devised to study the social natural language behavior of humans, who must use descriptions of emotions to play the familiar game of twenty questions when the unknown word is an emotion. The idea of a theory based on natural language propositions is developed and used to formalize the knowledge of a sign-using organism. Based on this pilot data, it was seen that approximately 25% of the emotion descriptions referred to emotions as objects with dimensional attributes. This motivated the author to use interval type-2 fuzzy sets as a computational model for the meaning of this dimensional subset of emotion descriptions. This model introduces a definition of a variable that ranges over emotions and allows for both inter- and intra-subject variability. A second set of experiments used interval surveys and translation tasks to assess this model. Finally, the use of spectral graph theory is proposed to represent emotional knowledge that has been acquired from the EMO20Q game.

Abe Kazemzadeh
Expressive Gesture Model for Humanoid Robot

This paper presents an expressive gesture model that generates communicative gestures accompanying speech for the humanoid robot Nao. The research work focuses mainly on the expressivity of robot gestures being coordinated with speech. To reach this objective, we have extended and developed our existing virtual agent platform GRETA to be adapted to the robot. Gestural prototypes are described symbolically and stored in a gestural database, called a lexicon. Given a set of intentions and emotional states to communicate, the system selects corresponding gestures from the robot lexicon. The selected gestures are then planned to synchronize with speech and instantiated as robot joint values, taking into account parameters of gestural expressivity such as temporal extension, spatial extension, fluidity, power and repetitivity. In this paper, we provide a detailed overview of our proposed model.

Le Quoc Anh, Catherine Pelachaud
Emotion Generation Integration into Cognitive Architecture

Emotions play an important role in human intelligence and human behavior. It has become important to model emotions, especially in the context of cognitive architecture. Current models of emotion are greatly underdetermined by experimental data from psychology, cognitive science, and neuroscience literature. I raise the hypothesis that deeper integration between emotion and cognition will produce models with much greater explanatory power. The thesis is that the use of a semantic associative network as a memory model will serve to both deepen and broaden integration between emotion and cognition. To test this, an affective cognitive architecture will be built with a semantic associative network at its heart, and will be compared to existing models as well as tested against existing experimental data.

Jerry Lin
Emotion Recognition Using Hidden Markov Models from Facial Temperature Sequence

In this paper, a method for emotion recognition from facial temperature sequences is proposed. Firstly, temperature difference histogram features and five statistical features are extracted from the facial temperature difference matrix of each difference frame in the data sequences. Then discrete Hidden Markov Models are used as the classifier for each feature, and a feature selection strategy based on the recognition results on the training set is introduced. Finally, the results of the experiments on samples from the USTC-NVIE database demonstrate the effectiveness of our method. In addition, the experimental results also demonstrate that the temperature information of the forehead is more useful than that of the other regions in emotion recognition and understanding, which is consistent with some related research results.

Zhilei Liu, Shangfei Wang
Interpreting Hand-Over-Face Gestures

People often hold their hands near their faces as a gesture in natural conversation, which can interfere with affective inference from facial expressions. However, these gestures are valuable as an additional channel for multi-modal inference. We analyse hand-over-face gestures in a corpus of naturalistic labelled expressions and propose the use of those gestures as a novel affect cue for automatic inference of cognitive mental states. We define three hand cues for encoding hand-over-face gestures, namely hand shape, hand action and facial region occluded, serving as a first step in automating the interpretation process.

Marwa Mahmoud, Peter Robinson
Toward a Computational Model of Affective Responses to Stories for Augmenting Narrative Generation

Current approaches to story generation do not utilize models of human affect to create stories with dramatic arc, suspense, and surprise. This paper describes current and future work towards computational models of affective responses to stories for the purpose of augmenting computational story generators. I propose two cognitively plausible models of suspense and surprise responses to stories. I also propose methods for evaluating these models by comparing them to actual human responses to stories. Finally, I propose the implementation of these models as a heuristic in a search-based story generation system. By using these models as a heuristic, the story generation system will favor stories that are more likely to produce affective responses from human readers.

Brian O’Neill
Recognizing Bodily Expression of Affect in User Tests

We describe our planned research in using affective feedback from body movement and posture to recognize affective states of users. Bodily expression of affect has received far less attention in research than facial expression. The aim of our research is to further investigate how affective states are communicated through bodily expression and to develop an evaluation method for assessing affective states of video gamers based on this knowledge. Current motion capture systems are often intrusive to the user and restricted to lab environments, which biases the user experience. We propose a non-intrusive recognition system for bodily expression of affect in a video game context, which can be deployed in the wild.

Marco Pasch, Monica Landoni
An Integrative Computational Model of Emotions

In this paper we propose a computational model of emotions designed to provide autonomous agents with mechanisms for affective processing. We present an integrative framework as the underlying architecture of this computational model, which enables the unification of theories explaining the different facets of the human emotion process and promotes the interaction between cognitive and affective functions. This proposal is inspired by recent advances in the study of human emotions in disciplines such as psychology and neuroscience.

Luis-Felipe Rodríguez, Félix Ramos, Gregorio García
Affective Support in Narrative-Centered Learning Environments

The link between affect and student learning has been the subject of increasing attention in recent years. Affective states such as flow and curiosity tend to have positive correlations with learning while negative states such as boredom and frustration have the opposite effect. Consequently, it is a goal of many intelligent tutoring systems to guide students toward emotional states that are conducive to learning through affective interventions. While much work has gone into understanding the relation between student learning and affective experiences, it is not clear how these relationships manifest themselves in narrative-centered learning environments. These environments embed learning within the context of an engaging narrative that can benefit from “affective scaffolding.” However, in order to provide an optimal level of support for students, the following research questions must be answered: 1) What is the nature of affective experiences in interactive learning environments? 2) How is affect impacted by personal traits, beliefs and learning strategies, and what role does affect have in shaping traits, beliefs, and learning strategies? 3) What strategies can be used to successfully create an optimal affective learning experience?

Jennifer Sabourin
Automatic Understanding of Affective and Social Signals by Multimodal Mimicry Recognition

Human mimicry is one of the important behavioral cues displayed during social interaction that inform us about the interlocutors' interpersonal states and attitudes. For example, the absence of mimicry is usually associated with negative attitudes. A system capable of analyzing and understanding mimicry behavior could enhance social interaction, both in human-human and human-machine interaction, by informing the interlocutors about each other's interpersonal attitudes and feelings of affiliation. Hence, our research focus is the investigation of mimicry in social human-human and human-machine interactions with the aim of helping to improve the quality of these interactions. In particular, we aim to develop automatic multimodal mimicry analyzers, to enhance affect recognition and social signal understanding systems through mimicry analysis, and to implement mimicry behavior in Embodied Conversational Agents. This paper surveys and discusses the recent work we have carried out regarding these aims. It is meant to outline our ultimate goal and to guide recommendations for the development of automatic mimicry analyzers to facilitate affective computing and social signal processing.

Xiaofan Sun, Anton Nijholt, Khiet P. Truong, Maja Pantic
Using Facial Emotional Signals for Communication between Emotionally Expressive Avatars in Virtual Worlds

In this paper we explore the applications of facial expression analysis and eye tracking in driving emotionally expressive avatars. We propose a system that transfers facial emotional signals including facial expressions and eye movements from the real world into a virtual world. The proposed system enables us to address the questions: How significant are eye movements in emotion expression? Can facial emotional signals be transferred effectively, from the real world into virtual worlds? We design an experiment to address the questions. There are two major contributions of our work: 1) We propose a system that incorporates eye movements for transferring facial emotions; 2) We design an experiment to evaluate the effectiveness of the facial emotional signals.

Yuqiong Wang, Joe Geigel

Interactive Event (Demo Papers)

Building Rapport with a 3D Conversational Agent

While embodied conversational agents improve a user’s experience with a system, systems meant for repeated use may need agents that build a relationship with the user. Anita is a low-cost 3D agent capable of talking, displaying emotions, gesturing, and postural mimicry, all of which may increase the rapport between agent and user. Motion capture and pressure sensors were used to create an agent capable of realistic, responsive motions.

Whitney L. Cade, Andrew Olney, Patrick Hays, Julia Lovel
Siento: An Experimental Platform for Behavior and Psychophysiology in HCI

We describe Siento, a system for performing different types of affective computing studies. The platform allows for dimensional or categorical models of emotions and for self-reported vs. third-party reporting, and can record and process multiple types of modalities including video, physiology and text. It has already been used in a number of studies. This type of system can improve the repeatability of experiments. The system is also used for data acquisition, feature extraction and data analysis applying machine learning techniques.

Rafael A. Calvo, Md. Sazzad Hussain, Payam Aghaei Pour, Omar Alzoubi
A Gesture-Based Interface and Active Cinema

Visual design affects the viewer with various meanings depending upon how it is presented. Our research optimizes visual design in a short animated movie format whose plot is a fixed linear narrative. Our system, designed and functioning in a simulation environment, manipulates the visual design of this short movie according to feedback detected from the viewer. Our goal is to explore the film-maker's ability to refine and optimize a movie experience in a real-time environment, working toward a system that dynamically optimizes the visual and auditory impact of a narrative. In this paper, we describe a prototype system that explores and demonstrates these ideas.

Mark J. Chavez, Aung Sithu Kyaw
Operation ARIES!: Aliens, Spies and Research Methods

Operation ARIES! is an Intelligent Tutoring System that teaches research methodology in a game-like atmosphere. There is a dramatic storyline that engages and motivates students as they acquire both declarative knowledge and critical reasoning skills. ARIES has three modules in which students maintain mixed-initiative dialogue with multiple artificial agents. The internal architecture is similar to that of AutoTutor, which uses natural language interaction in the tutorial dialogues. Learning as well as the engaging and motivating factors of ARIES are currently being investigated.

Carol M. Forsyth, Arthur C. Graesser, Keith Millis, Zhiqiang Cai, Diane Halpern
EMO20Q Questioner Agent

In this demonstration, we present an implementation of an emotion twenty questions (EMO20Q) questioner agent. The ubiquitous twenty questions game is a suitable format to study how people describe emotions and designing a computer agent to learn and reason about abstract emotion concepts can provide further theoretical insights. While natural language poses many challenges for the computer in human-computer interaction, the accessibility of natural language has made it possible to acquire data of many players reasoning about emotions in human-human games. These data are used to automate a computer questioner agent that asks the user questions and, based on that user’s answers, attempts to guess the emotion that the user has in mind.

Abe Kazemzadeh, James Gibson, Panayiotis G. Georgiou, Sungbok Lee, Shrikanth S. Narayanan
A Game Prototype with Emotional Contagion

Emotional contagion (EC) in games may provide players with a unique experience. We have developed a turn-based role-playing prototype game that incorporates a model based on the EC process. While playing, users have the opportunity to observe the effects of emotional events on individual characters and on the group through simulated emotional contagion dynamics.

Gonçalo Pereira, Joana Dimas, Rui Prada, Pedro A. Santos, Ana Paiva
A Smartphone Interface for a Wireless EEG Headset with Real-Time 3D Reconstruction

We demonstrate a fully functional handheld brain scanner consisting of a low-cost 14-channel EEG headset with a wireless connection to a smartphone, enabling minimally invasive EEG monitoring in naturalistic settings. The smartphone provides a touch-based interface with real-time brain state decoding and 3D reconstruction.

Arkadiusz Stopczynski, Jakob Eg Larsen, Carsten Stahlhut, Michael Kai Petersen, Lars Kai Hansen
Prediction of Affective States through Non-invasive Thermal Camera and EMG Recordings

We propose the use of thermal camera recordings along with simultaneous EMG recordings to monitor the dynamically changing affective state of a human. The affective state is depicted along the valence and arousal dimensions, such that readings from the EMG are mapped onto the valence axis and readings from the thermal camera are mapped onto the arousal axis. The combined system increases the accuracy of affective state prediction due to a larger palette of extracted features.

Didem Gokcay, Serdar Baltaci, Cagri Karahan, Kemal Dogus Turkay

The First Audio/Visual Emotion Challenge and Workshop

The First Audio/Visual Emotion Challenge and Workshop – An Introduction

The Audio/Visual Emotion Challenge and Workshop (http://sspnet.eu/avec2011) is the first competition event aimed at comparison of automatic audio, visual, and audiovisual emotion analysis. The goal of the challenge is to provide a common benchmark test set for individual multimodal information processing and to bring together the audio and video emotion recognition communities, to compare the relative merits of the two approaches to emotion recognition under well-defined and strictly comparable conditions, and to establish to what extent fusion of the approaches is possible and beneficial. A second motivation is the need to advance emotion recognition systems to be able to deal with naturalistic behavior in large volumes of un-segmented, non-prototypical and non-preselected data, as this is exactly the type of data that real systems have to face in the real world. Three emotion detection sub-challenges were addressed: emotion detection from audio, from video, or from audiovisual information. As benchmarking database, the SEMAINE database of naturalistic dialogues was used. Emotion needed to be recognized in terms of positive/negative valence, and high and low activation (arousal), expectancy, and power.

In total, 41 research teams registered for the challenge. The data turned out to be challenging indeed: The dataset consists of over 4 hours of audio and video recordings, 3,838 words uttered by the subject of interest, and over 1.3 million video frames in total, making it not only a challenge to detect more complex affective states, but also to deal with the sheer amount of data.

Besides participation in the Challenge, papers were invited addressing in particular the differences between audio and video processing of emotive data, and the issues concerning combined audio-visual emotion recognition.

We would like to particularly thank our sponsors, the Social Signal Processing Network (SSPNet) and the HUMAINE Association; all 22 members of the Technical Program Committee for their timely and insightful reviews of the submissions: Anton Batliner, Felix Burkhardt, Rama Chellappa, Mohamed Chetouani, Fernando De la Torre, Laurence Devillers, Julien Epps, Raul Fernandez, Hatice Gunes, Julia Hirschberg, Aleix Martinez, Marc Mehu, Marcello Mortillaro, Matti Pietikäinen, Ioannis Pitas, Peter Robinson, Stefan Steidl, Jianhua Tao, Mohan Trivedi, Matthew Turk, Alessandro Vinciarelli, and Stefanos Zafeiriou; and of course all participants.

Björn Schuller, Michel Valstar, Roddy Cowie, Maja Pantic
Dimensionality Reduction and Classification Analysis on the Audio Section of the SEMAINE Database

This paper presents an analysis of the audio section of the SEMAINE database for affect detection. Chi-square and principal component analysis techniques are used to reduce the dimensionality of the audio datasets. After dimensionality reduction, different classification techniques are used to perform emotion classification at the word level. Additionally, for unbalanced training sets, class re-sampling is performed to improve the model’s classification results. Overall, the final results indicate that Support Vector Machines (SVM) performed best for all data sets. Results show promise for the SEMAINE database as an interesting corpus to study affect detection.

Ricardo A. Calix, Mehdi A. Khazaeli, Leili Javadpour, Gerald M. Knapp
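
A minimal sketch of that processing chain (chi-square feature selection, PCA, then SVM classification) follows; the synthetic data, feature counts, and hyperparameters are placeholders rather than the settings used in the paper.

```python
# Sketch of chi-square feature selection, PCA, and SVM classification.
# Synthetic data, feature counts, and hyperparameters are placeholders,
# not the settings used in the paper.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((200, 300))             # 200 word-level segments, 300 audio features
y = rng.integers(0, 2, size=200)       # binary affect label per word

# chi2 requires non-negative features, which the uniform random data satisfies.
pipeline = make_pipeline(
    SelectKBest(chi2, k=100),          # keep the 100 most class-associated features
    PCA(n_components=20),              # project onto 20 principal components
    SVC(kernel="rbf"),
)
pipeline.fit(X[:150], y[:150])
print("held-out accuracy:", pipeline.score(X[150:], y[150:]))
```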
Speech Emotion Recognition System Based on L1 Regularized Linear Regression and Decision Fusion

This paper describes a speech emotion recognition system built for the Audio Sub-Challenge of the Audio/Visual Emotion Challenge (AVEC 2011). In this system, feature selection is conducted via L1-regularized linear regression, in which the L1 norm of the regression weights is minimized to find a sparse weight vector. The features with approximately zero weights are removed to create a well-selected small feature set. A fusion scheme combining the strengths of linear regression and Extreme Learning Machine (ELM) based feedforward neural networks (NN) is proposed for classification. The experimental results obtained on the SEMAINE database of naturalistic dialogues distributed through AVEC 2011 are presented.

Ling Cen, Zhu Liang Yu, Ming Hui Dong
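
The feature-selection step can be sketched with scikit-learn's Lasso, which penalizes the L1 norm of the regression weights and yields a sparse weight vector; the data, penalty strength, and threshold below are illustrative assumptions, not the system's actual configuration.

```python
# Sketch of L1-regularized linear regression used for feature selection.
# Data, regularization strength, and threshold are illustrative placeholders.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 200))                  # 500 segments, 200 acoustic features
true_w = np.zeros(200)
true_w[:5] = [2.0, -1.5, 1.0, 0.5, -0.8]         # only a few features matter
y = X @ true_w + 0.1 * rng.normal(size=500)      # affect rating per segment

lasso = Lasso(alpha=0.05).fit(X, y)              # the L1 norm of the weights is penalized
selected = np.flatnonzero(np.abs(lasso.coef_) > 1e-6)
print("kept", selected.size, "of", X.shape[1], "features:", selected[:10])

X_reduced = X[:, selected]                       # the well-selected small feature set
```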
A Psychologically-Inspired Match-Score Fusion Model for Video-Based Facial Expression Recognition

Communication between humans is complex and is not limited to verbal signals; emotions are conveyed with gesture, pose and facial expression. Facial Emotion Recognition and Analysis (FERA), the techniques by which non-verbal communication is quantified, is an exemplar case where humans consistently outperform computer methods. While the field of FERA has seen many advances, no system has been proposed which scales well to very large data sets. The challenge for computer vision is how to automatically and non-heuristically downsample the data while maintaining the maximum representational power that does not sacrifice accuracy. In this paper, we propose a method inspired by human vision and attention theory [2]. Video is segmented into temporal partitions with a dynamic sampling rate based on the frequency of visual information. Regions are homogenized by a match-score fusion technique. The approach is shown to provide classification rates higher than the baseline on the AVEC 2011 video sub-challenge dataset [15].

Albert Cruz, Bir Bhanu, Songfan Yang
Continuous Emotion Recognition Using Gabor Energy Filters

Automatic facial expression analysis systems try to build a mapping between the continuous emotion space and a set of discrete expression categories (e.g. happiness, sadness). In this paper, we present a method to recognize emotions in terms of latent dimensions (e.g. arousal, valence, power). The method we applied uses Gabor energy texture descriptors to model the facial appearance deformations, and a multiclass SVM as base learner of emotions. To deal with more naturalistic behavior, the SEMAINE database of naturalistic dialogues was used.

Mohamed Dahmane, Jean Meunier
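
A rough sketch of the appearance-descriptor stage follows: a small Gabor filter bank is built with OpenCV, the mean squared response (energy) of each filter over a face crop forms the descriptor, and a multiclass SVM is trained on top. The filter-bank parameters and the random stand-in images are assumptions for illustration, not the paper's configuration.

```python
# Sketch of Gabor-energy texture descriptors feeding a multiclass SVM.
# Filter-bank parameters and the random stand-in "face" images are
# illustrative assumptions, not the paper's configuration.
import cv2
import numpy as np
from sklearn.svm import SVC

def gabor_energy_descriptor(gray_face, ksize=21):
    """Mean squared Gabor response for a small bank of orientations and wavelengths."""
    feats = []
    for theta in np.arange(0, np.pi, np.pi / 4):      # 4 orientations
        for lambd in (4.0, 8.0):                      # 2 wavelengths
            kernel = cv2.getGaborKernel((ksize, ksize), sigma=4.0, theta=theta,
                                        lambd=lambd, gamma=0.5, psi=0)
            response = cv2.filter2D(gray_face.astype(np.float32), cv2.CV_32F, kernel)
            feats.append(float(np.mean(response ** 2)))   # energy of this filter
    return np.array(feats)

rng = np.random.default_rng(0)
faces = rng.integers(0, 256, size=(40, 64, 64)).astype(np.uint8)  # stand-in face crops
labels = rng.integers(0, 4, size=40)        # e.g. quantized affective-dimension classes

X = np.array([gabor_energy_descriptor(f) for f in faces])
clf = SVC(kernel="rbf").fit(X[:30], labels[:30])
print("held-out accuracy:", clf.score(X[30:], labels[30:]))
```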
Multiple Classifier Systems for the Classification of Audio-Visual Emotional States

Research activities in the field of human-computer interaction have increasingly addressed the integration of some type of emotional intelligence. Human emotions are expressed through different modalities such as speech, facial expressions, hand or body gestures, and therefore the classification of human emotions should be considered as a multimodal pattern recognition problem. The aim of our paper is to investigate multiple classifier systems utilizing audio and visual features to classify human emotional states. To that end, a variety of features have been derived. From the audio signal, the fundamental frequency, LPC and MFCC coefficients, and RASTA-PLP features have been used. In addition, two types of visual features have been computed, namely form and motion features of intermediate complexity. The numerical evaluation has been performed on the four emotional labels Arousal, Expectancy, Power, and Valence as defined in the AVEC data set. As classifier architectures, multiple classifier systems are applied; these have been proven to be accurate and robust against missing and noisy data.

Michael Glodek, Stephan Tschechne, Georg Layher, Martin Schels, Tobias Brosch, Stefan Scherer, Markus Kächele, Miriam Schmidt, Heiko Neumann, Günther Palm, Friedhelm Schwenker
Investigating the Use of Formant Based Features for Detection of Affective Dimensions in Speech

The ability of a machine to discern various categories of emotion is of great interest in many applications. This paper explores the use of baseline features consisting of prosodic and spectral features along with formant-based features for the purpose of classification of emotion along the dimensions of arousal, valence, expectancy, and power. Using three feature selection criteria, namely maximum average recall, maximal relevance, and minimal-redundancy-maximal-relevance, the paper intends to find the criterion that gives the highest unweighted accuracy. Using a Gaussian Mixture Model classifier, the results indicate that the formant-based features show a statistically significant improvement in the accuracy of the classification system.

Jonathan C. Kim, Hrishikesh Rao, Mark A. Clements
Naturalistic Affective Expression Classification by a Multi-stage Approach Based on Hidden Markov Models

In naturalistic behaviour, the affective states of a person change at a rate much slower than the typical rate at which video or audio is recorded (e.g. 25 fps for video). Hence, there is a high probability that consecutive recorded instants of expressions represent the same affective content. In this paper, a multi-stage automatic affective expression recognition system is proposed which uses Hidden Markov Models (HMMs) to take this temporal relationship into account and finalize the classification process. The hidden states of the HMMs are associated with the levels of affective dimensions to convert the classification problem into a best-path-finding problem in the HMM. The system was tested on the audio data of the Audio/Visual Emotion Challenge (AVEC) datasets, showing performance significantly above that of a one-stage classification system that does not take into account the temporal relationship, as well as above the baseline set provided by this Challenge. Due to the generality of the approach, this system could be applied to other types of affective modalities.

Hongying Meng, Nadia Bianchi-Berthouze
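
The final HMM stage can be sketched as a Viterbi decode over per-frame class scores, where hidden states are quantized levels of an affective dimension and a self-transition-heavy transition matrix encodes the slow rate of affective change; the probabilities and synthetic frame scores below are placeholders, not the paper's trained parameters.

```python
# Sketch of the HMM smoothing stage as Viterbi decoding over per-frame class
# probabilities. Transition, prior, and frame scores are placeholders, not the
# paper's trained parameters.
import numpy as np

def viterbi(log_emissions, log_trans, log_prior):
    """Most likely state path given per-frame log emission scores."""
    n_frames, n_states = log_emissions.shape
    delta = log_prior + log_emissions[0]
    backptr = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        scores = delta[:, None] + log_trans          # rows: from-state, cols: to-state
        backptr[t] = np.argmax(scores, axis=0)
        delta = np.max(scores, axis=0) + log_emissions[t]
    path = [int(np.argmax(delta))]
    for t in range(n_frames - 1, 0, -1):             # backtrack
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Two hidden states: low / high level of one affective dimension.
prior = np.log([0.5, 0.5])
trans = np.log([[0.95, 0.05],       # affect changes slowly, so self-transitions
                [0.05, 0.95]])      # dominate the transition matrix
rng = np.random.default_rng(0)
frame_probs = rng.dirichlet([1.0, 1.0], size=100)    # noisy per-frame classifier output
smoothed = viterbi(np.log(frame_probs), trans, prior)
print(smoothed[:20])
```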
The CASIA Audio Emotion Recognition Method for Audio/Visual Emotion Challenge 2011

This paper introduces the CASIA audio emotion recognition method for the audio sub-challenge of the Audio/Visual Emotion Challenge 2011 (AVEC 2011). Two popular pattern recognition techniques, SVM and AdaBoost, are adopted to solve the emotion recognition problem. The feature set is also briefly investigated by comparing the performance of classifiers built on the baseline feature set and on a dimension-reduced feature set. Experimental results show that the baseline feature set is better for the classification of the arousal and power dimensions, while the reduced feature set is better for the other affective dimensions, and that AdaBoost slightly outperforms SVM on average in our experiment.

Shifeng Pan, Jianhua Tao, Ya Li
Modeling Latent Discriminative Dynamic of Multi-dimensional Affective Signals

During face-to-face communication, people continuously exchange para-linguistic information such as their emotional state through facial expressions, posture shifts, gaze patterns and prosody. These affective signals are subtle and complex. In this paper, we propose to explicitly model the interaction between high-level perceptual features using Latent-Dynamic Conditional Random Fields (LDCRFs). This approach has the advantage of explicitly learning the sub-structure of the affective signals as well as the extrinsic dynamics between emotional labels. We evaluate our approach on the Audio/Visual Emotion Challenge (AVEC 2011) dataset. By using visual features easily computable with off-the-shelf sensing software (vertical and horizontal eye gaze, head tilt and smile intensity), we show that our approach based on the LDCRF model outperforms previously published baselines for all four affective dimensions. By integrating audio features, our approach also outperforms the audio-visual baseline.

Geovany A. Ramirez, Tadas Baltrušaitis, Louis-Philippe Morency
Audio-Based Emotion Recognition from Natural Conversations Based on Co-Occurrence Matrix and Frequency Domain Energy Distribution Features

Emotion recognition from natural speech is a very challenging problem. The audio sub-challenge represents an initial step towards building an efficient audio-visual emotion recognition system that can detect emotions in real-life applications (e.g., human-machine interaction and/or communication). The SEMAINE database, which consists of emotionally colored conversations, is used as the benchmark database. This paper presents our emotion recognition system from speech information in terms of positive/negative valence, and high and low arousal, expectancy and power. We introduce a new set of features including co-occurrence matrix based features as well as frequency-domain energy distribution based features. Comparisons between well-known prosodic and spectral features and the new features are presented. Classification using the proposed features has shown promising results compared to the classical features on both the development and test data sets.

Aya Sayedelahl, Pouria Fewzee, Mohamed S. Kamel, Fakhri Karray
AVEC 2011–The First International Audio/Visual Emotion Challenge

The Audio/Visual Emotion Challenge and Workshop (AVEC 2011) is the first competition event aimed at comparison of multimedia processing and machine learning methods for automatic audio, visual and audiovisual emotion analysis, with all participants competing under strictly the same conditions. This paper first describes the challenge participation conditions. Next follows the data used – the SEMAINE corpus – and its partitioning into train, development, and test partitions for the challenge with labelling in four dimensions, namely activity, expectation, power, and valence. Further, audio and video baseline features are introduced as well as baseline results that use these features for the three sub-challenges of audio, video, and audiovisual emotion recognition.

Björn Schuller, Michel Valstar, Florian Eyben, Gary McKeown, Roddy Cowie, Maja Pantic
Investigating Glottal Parameters and Teager Energy Operators in Emotion Recognition

The purpose of this paper is to study the performance of glottal waveform parameters and the Teager energy operator (TEO) in distinguishing binary classes of four emotion dimensions (activation, expectation, power, and valence) using authentic emotional speech. The two feature sets were compared with a 1941-dimensional acoustic feature set including prosodic, spectral, and other voicing-related features extracted using the openSMILE toolkit. The comparison highlights the discrimination ability of TEO for the emotion dimensions activation and power, and of glottal parameters for expectation and valence, on authentic speech data. Using the same classification methodology, TEO and glottal parameters outperformed or performed similarly to the prosodic, spectral and other voicing-related features (i.e., the feature set obtained using openSMILE).

Rui Sun, Elliot Moore II
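
The discrete Teager energy operator referenced above has a simple closed form, psi[x(n)] = x(n)^2 - x(n-1)*x(n+1); a minimal NumPy implementation follows, with an arbitrary synthetic test signal (not data from the paper).

```python
# Discrete Teager energy operator: psi[x(n)] = x(n)**2 - x(n-1) * x(n+1).
# The synthetic test signal is an arbitrary example, not data from the paper.
import numpy as np

def teager_energy(x):
    """TEO of a 1-D signal; the output is two samples shorter than the input."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

fs = 16000                                     # assumed sampling rate (Hz)
t = np.arange(0, 0.01, 1.0 / fs)
signal = 0.5 * np.sin(2 * np.pi * 440 * t)     # synthetic 440 Hz tone
print(teager_energy(signal)[:5])
```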

Affective Brain-Computer Interfaces Workshop (aBCI 2011)

Affective Brain-Computer Interfaces (aBCI 2011)

Recently, many groups (see Zander and Kothe, Towards passive brain-computer interfaces: applying brain-computer interface technology to human-machine systems in general, J. Neural Eng., 8, 2011) have worked toward expanding brain-computer interface (BCI) systems to include not only active control, but also passive mental state monitoring to enhance human-computer interaction (HCI). Many studies have shown that brain imaging technologies can reveal information about the affective and cognitive state of a subject, and that the interaction between humans and machines can be aided by the recognition of those user states. New developments including practical sensors, new machine learning software, and improved interaction with the HCI community are leading us to systems that seamlessly integrate passively recorded information to improve interactions with the outside world.

To achieve robust passive BCIs, efforts from applied and basic sciences have to be combined. On the one hand, applied fields such as affective computing aim to develop applications that adapt to changes in the user states and thereby enrich interaction, leading to a more natural and effective usability. On the other hand, basic research in neuroscience advances our understanding of the neural processes associated with emotions. Similar advancements are made for more cognitive mental states such as attention, workload, or fatigue.

This is the second workshop on affective brain-computer interfaces. The first one was held at ACII 2009 in Amsterdam. Like the first workshop, this one explores the advantages and limitations of using neurophysiological signals as a modality for the automatic recognition of affective and cognitive states, and the possibilities of using this information about the user state in innovative and adaptive applications. Whereas in 2009 the focus was on affective and cognitive state estimation alike, in this 2011 workshop we focus more on the induction, measurement, and use of affective states, i.e. emotions and moods. Hence, the main topics of this workshop are (1) emotion elicitation and data collection for affective BCI, (2) detection of mental state via electroencephalography and other modalities, and (3) adaptive interfaces and affective BCI.

This workshop also seeks to foster interaction among researchers with relevant interests, such as BCI, affective computing, neuro-ergonomics, affective and cognitive neuroscience. These experts present state-of-the-art progress and their visions on the various overlaps between those disciplines.

Christian Mühl, Anton Nijholt, Brendan Allison, Stephen Dunne, Dirk Heylen
Online Recognition of Facial Actions for Natural EEG-Based BCI Applications

We present a system for classification of nine voluntary facial actions, i.e. Neutral, Smile, Sad, Surprise, Angry, Speak, Blink, Left, and Right. The data is acquired with an Emotiv EPOC wireless EEG headset. We derive spectral features and step function features that represent the main signal characteristics of the recorded data in a straightforward manner. With a two-stage classification setup using support vector machines, we achieve an overall recognition accuracy of 81.8%. Furthermore, we show a qualitative evaluation of an online system for facial action recognition using the EPOC device.

Dominic Heger, Felix Putze, Tanja Schultz
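
The abstract above does not specify how the two classification stages are arranged; the following is a minimal sketch, under the assumption of a common cascade design (hypothetical feature matrices and stage split), in which the first SVM separates Neutral from any facial action and the second discriminates among the eight remaining actions.

    # Minimal two-stage SVM cascade sketch (assumed design, not the authors' exact setup).
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    ACTIONS = ["Neutral", "Smile", "Sad", "Surprise", "Angry",
               "Speak", "Blink", "Left", "Right"]

    def train_two_stage(X, y):
        """X: (n_trials, n_features) spectral/step-function features; y: action labels."""
        y = np.asarray(y)
        stage1 = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        stage1.fit(X, (y != "Neutral").astype(int))        # stage 1: neutral vs. any action
        mask = y != "Neutral"
        stage2 = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        stage2.fit(X[mask], y[mask])                        # stage 2: which action
        return stage1, stage2

    def predict_two_stage(stage1, stage2, X):
        is_action = stage1.predict(X).astype(bool)
        out = np.full(len(X), "Neutral", dtype=object)
        if is_action.any():
            out[is_action] = stage2.predict(X[is_action])
        return out
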
What You Expect Is What You Get? Potential Use of Contingent Negative Variation for Passive BCI Systems in Gaze-Based HCI

When using eye movements for cursor control in human-computer interaction (HCI), it may be difficult to find an appropriate substitute for the click operation. Most approaches make use of dwell times. However, in this context the so-called Midas touch problem occurs, in which the system wrongly interprets fixations due to long processing times or spontaneous dwellings of the user as a command. Lately it has been shown that brain-computer interface (BCI) input bears good prospects to overcome this problem by using imagined hand movements to elicit a selection. The current approach develops this idea further by exploring potential signals for use in a passive BCI, which would have the advantage that the brain signals used as input are generated automatically, without conscious effort of the user. To explore event-related potentials (ERPs) giving information about the user's intention to select an object, 32-channel electroencephalography (EEG) was recorded from ten participants interacting with a dwell-time-based system. Comparing ERP signals during the dwell time with those occurring during fixations on a neutral cross hair revealed a sustained negative slow cortical potential at central electrode sites. This negativity might be a contingent negative variation (CNV) reflecting the participants' anticipation of the upcoming selection. Offline classification suggests that the CNV is detectable on single trials (mean accuracy 74.9%). In the future, research on the CNV should be pursued to ensure its stable occurrence in human-computer interaction and to enable its use as a potential substitute for the click operation.

Klas Ihme, Thorsten Oliver Zander
EEG Correlates of Different Emotional States Elicited during Watching Music Videos

Studying emotions has become increasingly popular in various research fields. Researchers across the globe have explored various tools to implicitly assess the emotions and affective states of people. Human-computer interface systems in particular can benefit from such an implicit emotion evaluation module, which can help them determine their users' affective states and act accordingly. Brain electrical activity can be considered an appropriate candidate for extracting emotion-related cues, although this line of research is still in its infancy. In this paper, we report the results of analyzing electroencephalogram (EEG) signals to assess the emotions elicited while watching pre-selected emotional music video clips. More precisely, in-depth results of both subject-dependent and subject-independent correlation analyses between time-domain and frequency-domain features of the EEG signal and subjects' self-assessed emotions are presented and discussed.

Eleni Kroupi, Ashkan Yazdani, Touradj Ebrahimi
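
As a rough illustration of the kind of analysis the abstract describes (not the authors' code; feature and rating arrays are hypothetical), subject-dependent and subject-independent correlations between an EEG feature and self-assessed emotion can be computed as follows.

    # Spearman correlation between an EEG feature and self-assessed emotion ratings.
    import numpy as np
    from scipy.stats import spearmanr

    def feature_emotion_correlation(features_by_subject, ratings_by_subject):
        """Both arguments: dict subject_id -> 1-D array (one value per video clip)."""
        # Subject-dependent: one correlation per subject.
        per_subject = {s: spearmanr(features_by_subject[s], ratings_by_subject[s])
                       for s in features_by_subject}
        # Subject-independent: pool all subjects' trials into a single correlation.
        pooled = spearmanr(np.concatenate(list(features_by_subject.values())),
                           np.concatenate(list(ratings_by_subject.values())))
        return per_subject, pooled

    # e.g. alpha power at a frontal electrode vs. self-assessed valence per clip:
    # per_subject, pooled = feature_emotion_correlation(alpha_power, valence_ratings)
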
Classifying High-Noise EEG in Complex Environments for Brain-Computer Interaction Technologies

Future technologies such as Brain-Computer Interaction Technologies (BCIT) or affective Brain-Computer Interfaces (aBCI) will need to function in environments with higher noise and complexity than seen in traditional laboratory settings, and while individuals perform concurrent tasks. In this paper, we describe preliminary results from an experiment in a complex virtual environment. For analysis, we discriminate between trials in which a subject hears and reacts to an audio stimulus addressed to them and trials in which the same subject hears an irrelevant audio stimulus. We performed two offline classifications, one using BCILab [1], the other using LibSVM [2]. Distinct classifiers were trained for each individual in order to improve individual classifier performance [3]. The highest classification performance was obtained using individual frequency bands as features and classifying with an SVM classifier with an RBF kernel, resulting in a mean classification performance of 0.67, with individual classifier results ranging from 0.60 to 0.79.

Brent Lance, Stephen Gordon, Jean Vettel, Tony Johnson, Victor Paul, Chris Manteuffel, Matthew Jaswa, Kelvin Oie
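
The best-performing configuration reported above uses individual frequency bands as features with an RBF-kernel SVM; the sketch below illustrates such a pipeline with assumed band edges and epoch layout, not the authors' exact settings.

    # Per-channel band-power features followed by an RBF-kernel SVM (illustrative sketch).
    import numpy as np
    from scipy.signal import welch
    from sklearn.svm import SVC
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

    def band_power_features(epochs, fs):
        """epochs: (n_trials, n_channels, n_samples) -> (n_trials, n_channels * n_bands)."""
        feats = []
        for trial in epochs:
            f, psd = welch(trial, fs=fs, nperseg=int(fs))            # PSD per channel
            row = [psd[:, (f >= lo) & (f < hi)].mean(axis=1)          # mean power per band
                   for lo, hi in BANDS.values()]
            feats.append(np.concatenate(row))
        return np.asarray(feats)

    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    # clf.fit(band_power_features(train_epochs, fs=256), train_labels)
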
Neural Correlates of Mindfulness Practice for Naive Meditators

Mindfulness-Based Stress Reduction (MBSR), a widely used form of mindfulness-based meditation, has shown positive effects in reducing psychological stress, supporting the immune system, and alleviating a variety of disorders. So far little is known about how neurophysiological activity is affected by MBSR and how it changes over time as a meditator becomes more experienced. In this study we investigated naive meditators' EEG activity during an eight-week MBSR program. We developed easy-to-use, portable dry-sensor EEG devices, and the participants recorded data by themselves. We investigated the effect of concentration level on the EEG power spectrum and tracked how the EEG changed over time during the program. Significant relationships were found between EEG and concentration, and between EEG and amount of experience. We discuss our findings in the context of EEG rhythmic activity in relation to meditation. Our findings provide insight into developing a BCI system to guide meditation practice.

An Luo, Dyana Szibbo, Julie Forbes, Thomas J. Sullivan
First Demonstration of a Musical Emotion BCI

Development of EEG-based brain-computer interface (BCI) methods has largely focused on creating a communication channel for subjects with intact cognition but profound loss of motor control from stroke or neurodegenerative disease, allowing such subjects to communicate by spelling out words on a personal computer. However, other important human communication channels may also be limited or unavailable for handicapped subjects, namely direct non-linguistic emotional communication by gesture, vocal prosody, facial expression, etc. We report and examine a first demonstration of a musical 'emotion BCI' in which, as one element of a live musical performance, an able-bodied subject successfully engaged the electronic delivery of an ordered sequence of five musical two-tone bass-frequency drone sounds by imaginatively re-experiencing the feeling he had spontaneously associated with each drone sound during training sessions. The EEG data included activities of both brain and non-brain sources (scalp muscles, eye movements). Common Spatial Pattern classification gave 84% correct pseudo-online performance and 5-of-5 correct classification in live performance. Re-analysis of the training session data including only the brain EEG sources found by multiple-mixture Amica ICA decomposition achieved five-class classification accuracy of 59-70%, confirming that different voluntary emotion imagination experiences may be associated with distinguishable brain source EEG dynamics.

Scott Makeig, Grace Leslie, Tim Mullen, Devpratim Sarma, Nima Bigdely-Shamlo, Christian Kothe
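
For readers unfamiliar with the Common Spatial Pattern (CSP) classification mentioned above, the sketch below shows a standard two-class CSP filter computation with log-variance features (a multi-class setting such as the five drones would typically use a one-vs-rest arrangement); the details are generic, not the authors' pipeline.

    # Two-class CSP filters and log-variance features (generic sketch).
    import numpy as np
    from scipy.linalg import eigh

    def csp_filters(epochs_a, epochs_b, n_pairs=3):
        """epochs_*: (n_trials, n_channels, n_samples). Returns (2 * n_pairs, n_channels)."""
        cov = lambda E: np.mean([t @ t.T / np.trace(t @ t.T) for t in E], axis=0)
        Ca, Cb = cov(epochs_a), cov(epochs_b)
        evals, evecs = eigh(Ca, Ca + Cb)            # generalized eigenvalue problem
        order = np.argsort(evals)
        picks = np.r_[order[:n_pairs], order[-n_pairs:]]
        return evecs[:, picks].T

    def csp_features(epochs, W):
        Z = np.einsum("fc,tcs->tfs", W, epochs)     # apply spatial filters
        var = Z.var(axis=2)
        return np.log(var / var.sum(axis=1, keepdims=True))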

Emotion in Games Workshop

Emotion in Games

Computer games are unique elicitors of emotion. Recognition of player emotion, dynamic construction of affective player models, and modelling of emotions in non-playing characters represent challenging areas of research and practice at the crossroads of cognitive and affective science, psychology, artificial intelligence, and human-computer interaction. Techniques from AI and HCI can be used to recognize player affective states and to model emotion in non-playing characters. Multiple input modalities provide novel means for measuring player satisfaction and engagement. These data can then be used to adapt the gameplay to the player's state, to maximize player engagement and to close the affective game loop.

The Emotion in Games workshop (EmoGames 2011, http://sirenproject.eu/content/acii-2011-workshop-emotion-games) will bring together researchers and practitioners in affective computing, user experience research, social psychology and cognition, machine learning, and AI and HCI, to explore topics in player experience research, affect induction, sensing and modelling and affect-driven game adaptation, and modelling of emotion in non-playing characters. It will also provide new insights on how gaming can be used as a research platform, to induce and capture affective interactions with single and multiple users, and to model affect- and behaviour-related concepts, helping to operationalize concepts such as flow and engagement.

The workshop will include a keynote, paper and poster presentations, and panel discussions. Selected papers will appear in a special issue of the IEEE Transactions on Affective Computing, “Emotion in Games”, in mid-2013.

The EmoGames2011 workshop is organized in coordination with the newly formed ‘Emotion in Games’ Special Interest Group (SIG) of the Humaine Association and the IEEE Computational Intelligence Society (CIS) Task Force on Player Satisfaction Modelling. We would like to thank all participants, as well as the members of the Program Committee, for their reviews of the workshop submissions: Elisabeth André, Ruth Aylett, Nadia Bianchi-Berthouze, Antonio Camurri, Marc Cavazza, Jonathan Gratch, Hatice Gunes, Dirk Heylen, Katherine Isbister, Stefanos Kollias, Maurizio Mancini, Anton Nijholt, Julian Togelius, Asimina Vasalou, Gualtiero Volpe, Catherine Pelachaud, and Tom Ziemke.

Georgios N. Yannakakis, Kostas Karpouzis, Ana Paiva, Eva Hudlicka
Improvisation, Emotion, Video Game

Actors are increasingly used by the video game industry to give life to non-player characters. Models are animated based on face and body tracking. Voices are dubbed. This paper argues that it is now time to tap into the improvising expertise of actors. To create games with rich, emotional content, improvisation with a group of actors is a necessary part of game-play development.

Josephine Anstey
Outline of an Empirical Study on the Effects of Emotions on Strategic Behavior in Virtual Emergencies

The applicability of appropriate coping strategies is important in emergencies or traumatic experiences such as car accidents or human violence. In this context, emotion regulation and decision making are relevant. However, research on human reactions to traumatic experiences is very challenging and most existing research uses retrospective assessments of these variables of interest. Thus, we are currently developing and evaluating novel methods to investigate human behavior in cases of emergency. Virtual reality scenarios of emergencies are employed to enable an immersive interactive engagement (e.g., dealing with fire inside a building) based on the modification of Valve's popular Source™ 2007 game engine.

This paper presents our ongoing research project, which aims at the empirical investigation of human strategic behavior under the influence of emotions while having to cope with virtual emergencies.

Christian Becker-Asano, Dali Sun, Birgit Kleim, Corinna N. Scheel, Brunna Tuschen-Caffier, Bernhard Nebel
Assessing Performance Competence in Training Games

In-process assessment of trainee learners in game-based simulators is a challenging activity. It typically involves human instructor time and cost, and does not scale to the one-tutor-per-learner vision of computer-based learning. Moreover, evaluation by a human instructor is often subjective, and comparisons between learners are not accurate. Therefore, in this paper, we propose an automated, formula-driven quantitative evaluation method for assessing performance competence in serious training games. Our proposed method has been empirically validated in a game-based driving simulator using 7 subjects and 13 sessions, and accuracy of up to 90.25% has been achieved when compared to an existing qualitative method. We believe that by incorporating quantitative evaluation methods like these, future training games could be enriched with more meaningful feedback and adaptive game-play so as to better monitor and support player motivation, engagement, and learning performance.

Hiran Ekanayake, Per Backlund, Tom Ziemke, Robert Ramberg, Kamalanath Hewagamage
Affective Preference from Physiology in Videogames: A Lesson Learned from the TORCS Experiment

In this paper we discuss several issues that arose during our most recent experiment on estimating player preference from physiological signals during a car racing game, in order to share our experience with the community and provide some insights into the experimental process. We present a selection of critical aspects ranging from the choice of the task, to the definition of the questionnaire, to data acquisition.

Thanks to the experience gained during this case study, we can give an extensive picture of the aspects that should be considered in the design of similar experiments. The goal of this contribution is to provide guidelines for analogous experiments.

Maurizio Garbarino, Matteo Matteucci, Andrea Bonarini
Analysing the Relevance of Experience Partitions to the Prediction of Players’ Self-reports of Affect

A common practice in modeling affect from physiological signals consists of reducing the signals to a set of statistical features that feed predictors of self-reported emotions. This paper analyses the impact of the various time windows used for the extraction of physiological features on the accuracy of affective models of players in a simple 3D game. Results show that the signals recorded in the central part of a short gaming experience contain more information relevant to the prediction of positive affective states than the starting and ending parts, while the information relevant to predicting anxiety and frustration appears not to be localized in a specific time interval but rather dependent on particular game stimuli.

Héctor Perez Martínez, Georgios N. Yannakakis
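
To make the idea of experience partitions concrete, the following minimal sketch (hypothetical window boundaries and signal) extracts the kind of per-window statistical features whose predictive relevance the paper compares across the start, middle, and end of a play session.

    # Statistical features of a physiological signal restricted to one time window.
    import numpy as np

    def window_stats(signal, fs, start_s, end_s):
        """Return simple statistics of one signal restricted to [start_s, end_s)."""
        seg = signal[int(start_s * fs):int(end_s * fs)]
        return {
            "mean": float(np.mean(seg)),
            "std": float(np.std(seg)),
            "min": float(np.min(seg)),
            "max": float(np.max(seg)),
            "slope": float(np.polyfit(np.arange(len(seg)) / fs, seg, 1)[0]),
        }

    # e.g. compare the middle third of a 90 s session with the whole session:
    # mid = window_stats(hr_signal, fs=32, start_s=30, end_s=60)
    # full = window_stats(hr_signal, fs=32, start_s=0, end_s=90)
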
A Game-Based Corpus for Analysing the Interplay between Game Context and Player Experience

Recognizing players' affective state while playing video games has been the focus of many recent research studies. In this paper we describe the process that has been followed to build a corpus based on game events and recorded video sessions from human players while playing Super Mario Bros. We present different types of information that have been extracted from game context, player preferences and perception of the game, as well as user features automatically extracted from video recordings. We run a number of initial experiments to analyse players' behavior while playing video games as a case study of the possible use of the corpus.

Noor Shaker, Stylianos Asteriadis, Georgios N. Yannakakis, Kostas Karpouzis
Effect of Emotion and Articulation of Speech on the Uncanny Valley in Virtual Characters

This paper presents a study of how exaggerated facial expression in the lower face region affects perception of emotion and the Uncanny Valley phenomenon in realistic, human-like virtual characters. Characters communicated the six basic emotions (anger, disgust, fear, happiness, sadness, and surprise) with normal and exaggerated mouth movements. Measures were taken for perceived familiarity and human-likeness. The results showed that an increased intensity of articulation significantly reduced the uncanny for characters expressing anger, yet increased perception of the uncanny for characters expressing happiness with exaggerated mouth movement. The practical implications of these findings for controlling the uncanny in virtual characters are considered.

Angela Tinwell, Mark Grimshaw, Debbie Abdel-Nabi

Machine Learning for Affective Computing Workshop

Machine Learning for Affective Computing

Affective computing (AC) is a unique discipline that involves modeling affect using one or multiple modalities by drawing on techniques from many different fields. AC often deals with problems that are known to be very complex and multi-dimensional, involving different kinds of data (numeric, symbolic, visual, etc.). However, with the advancement of machine learning techniques, many of those problems are now becoming more tractable.

The purpose of this workshop was to engage the machine learning and affective computing communities towards solving problems related to understanding and modeling social affective behaviors. We welcomed participation of researchers from diverse fields, including signal processing and pattern recognition, statistical machine learning, human-computer interaction, human-robot interaction, robotics, conversational agents, experimental psychology, and decision making.

There is a need for a set of high standards for recognizing and understanding affect. At the same time, these standards need to take into account that expectations and validation in this area may differ from those in traditional machine learning research. This should be reflected in the design of the machine learning techniques used to tackle these problems. For example, affective data sets are known to be noisy, high-dimensional, and incomplete. Classes may overlap. Affective behaviors are often person-specific and require temporal modeling with real-time performance. This first edition of the ACII Workshop on Machine Learning for Affective Computing provides a venue for such discussions and for engaging the community in the design and validation of learning techniques for affective computing.

Mohammed Hoque, Daniel J. McDuff, Louis-Philippe Morency, Rosalind W. Picard
Large Scale Personality Classification of Bloggers

Personality is a fundamental component of an individual's affective behavior. Previous work on personality classification has emerged from disparate sources: varieties of algorithms and feature selection across spoken and written data have made comparison difficult. Here, we use a large corpus of blogs to compare feature selection for classification; we also use these results to identify characteristic language information relating to personality. Using Support Vector Machines, the best accuracies range from 84.36% (openness to experience) to 70.51% (neuroticism). To achieve these results, the best-performing features were a combination of: (1) stemmed bigrams; (2) no exclusion of stopwords (i.e., common words); and (3) the boolean presence or absence of features, rather than their rate of use. We take these findings to suggest that both the structure of the text and the presence of common words are important. We also note that a common dictionary of words used for content analysis (LIWC) performs less well in this classification task, which we propose is due to the conceptual breadth of its categories. To get a better sense of how personality is expressed in blogs, we explore the best-performing features and discuss how these can provide a deeper understanding of personality-related language behavior online.

Francisco Iacobelli, Alastair J. Gill, Scott Nowson, Jon Oberlander
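
The winning feature configuration reported above (stemmed bigrams, stopwords kept, boolean presence) can be sketched as follows; the stemmer choice and SVM settings are assumptions, not the authors' exact setup.

    # Boolean stemmed-bigram features with a linear SVM (illustrative sketch).
    from nltk.stem import PorterStemmer
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline

    stemmer = PorterStemmer()

    def stem_tokens(doc):
        return [stemmer.stem(tok) for tok in doc.split()]

    vectorizer = CountVectorizer(
        tokenizer=stem_tokens,   # stemmed tokens; stopwords deliberately kept
        ngram_range=(2, 2),      # bigrams only
        binary=True,             # presence/absence instead of frequency
    )
    clf = make_pipeline(vectorizer, LinearSVC())
    # clf.fit(blog_texts, trait_labels)   # e.g. high vs. low openness to experience
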
Smartphones Get Emotional: Mind Reading Images and Reconstructing the Neural Sources

Combining a wireless EEG headset with a smartphone offers new opportunities to capture brain imaging data reflecting our everyday social behavior in a mobile context. However, processing the data on a portable device will require novel approaches to analyze and interpret significant patterns in order to make them available for runtime interaction. Applying a Bayesian approach to reconstruct the neural sources, we demonstrate the ability to distinguish among emotional responses reflected in different scalp potentials when viewing pleasant and unpleasant pictures compared to neutral content. Rendering the activations in a 3D brain model on a smartphone may not only facilitate differentiation of emotional responses but also provide an intuitive interface for touch-based interaction, allowing both for modeling the mental state of users and for providing a basis for novel bio-feedback applications.

Michael Kai Petersen, Carsten Stahlhut, Arkadiusz Stopczynski, Jakob Eg Larsen, Lars Kai Hansen
Generalizing Models of Student Affect in Game-Based Learning Environments

Evidence of the strong relationship between learning and emotion has fueled recent work in modeling affective states in intelligent tutoring systems. Many of these models are designed in ways that limit their ability to be deployed to a large audience of students by using expensive sensors or subject-dependent machine learning techniques. This paper presents work that investigates empirically derived Bayesian networks for prediction of student affect. Predictive models are empirically learned from data acquired from 260 students interacting with the game-based learning environment Crystal Island. These models are then tested on data from a second identical study involving 140 students to examine issues of generalizability of learned predictive models of student affect. The findings suggest that predictive models of affect that are learned from empirical data may have significant dependencies on the populations on which they are trained, even when the populations themselves are very similar.

Jennifer Sabourin, Bradford Mott, James C. Lester
A Spatio-Temporal Probabilistic Framework for Dividing and Predicting Facial Action Units

This paper proposes a probabilistic approach to dividing the facial Action Units (AUs) based on the physiological relations, and their strengths, among the facial muscle groups. The physiological relations and their strengths were captured using a Static Bayesian Network (SBN) learned from given databases. A data-driven spatio-temporal probabilistic scoring function was introduced to divide the AUs into: (i) frequently occurring and strongly connected AUs (FSAUs) and (ii) infrequently occurring and weakly connected AUs (IWAUs). In addition, a Dynamic Bayesian Network (DBN)-based predictive mechanism was implemented to predict the IWAUs from the FSAUs. The combined spatio-temporal modeling enabled a framework to predict a full set of AUs in real time. Empirical analyses were performed to illustrate the efficacy and utility of the proposed approach. Four different datasets of varying degrees of complexity and diversity were used for performance validation and perturbation analysis. Empirical results suggest that the IWAUs can be robustly predicted from the FSAUs in real time and that the prediction is robust against noise.

A. K. M. Mahbubur Rahman, Md. Iftekhar Tanveer, Mohammed Yeasin
Erratum: Naturalistic Affective Expression Classification by a Multi-stage Approach Based on Hidden Markov Models

The acknowledgement text of the initially published paper was incomplete. It should have been as follows: This work was supported by EPSRC grant EP/G043507/1: Pain rehabilitation: E/Motion-based automated coaching. This work was funded by the EPSRC Emotion & Pain Project EP/H017178/1.

Hongying Meng, Nadia Bianchi-Berthouze
Backmatter
Metadata
Title
Affective Computing and Intelligent Interaction
Edited by
Sidney D’Mello
Arthur Graesser
Björn Schuller
Jean-Claude Martin
Copyright year
2011
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-24571-8
Print ISBN
978-3-642-24570-1
DOI
https://doi.org/10.1007/978-3-642-24571-8
