
About This Book

This book constitutes the refereed proceedings of the COST 2102 International Training School on Cognitive Behavioural Systems, held in Dresden, Germany, in February 2011. The 39 revised full papers presented were carefully reviewed and selected from various submissions. The volume presents new and original research results in the field of human-machine interaction, inspired by cognitive behavioural features of human-human interaction. The themes covered include cognitive and computational social information processing; emotional and socially believable Human-Computer Interaction (HCI) systems; behavioural and contextual analysis of interaction; embodiment, perception, linguistics, semantics, and sentiment analysis in dialogues and interactions; and algorithmic and computational issues in the automatic recognition and synthesis of emotional states.



Computational Issues in Cognitive Systems

An Approach to Intelligent Signal Processing

This paper describes an approach to intelligent signal processing. First we propose a general signal model which applies to speech, music, biological, and technical signals. We formulate this model mathematically using a unification of hidden Markov models and finite state machines. Then we name tasks for intelligent signal processing systems and derive a hierarchical architecture which is capable of solving them. We show the close relationship of our approach to cognitive dynamic systems. Finally we give a number of application examples.

Matthias Wolff, Rüdiger Hoffmann

The Analysis of Eye Movements in the Context of Cognitive Technical Systems: Three Critical Issues

Understanding the mechanisms of attention is important in the context of both research and application. Eye tracking is a promising method to approach this question, especially for the development of future cognitive technical systems. Based on three examples, we discuss aspects of eye gaze behaviour which are relevant for research and application. First, we demonstrate the omnipresent influence of sudden auditory and visual events on the duration of fixations. Second, we show that the correspondence between gaze direction and attention allocation is determined by characteristics of the task. Third, we explore how eye movements can be used for information transmission in remote collaboration by comparing them with verbal interaction and the mouse cursor. Analysing eye tracking in the context of future applications reveals a great potential but requires solid knowledge of the various facets of gaze behaviour.

Sebastian Pannasch, Jens R. Helmert, Romy Müller, Boris M. Velichkovsky

Ten Recent Trends in Computational Paralinguistics

The field of computational paralinguistics is currently emerging from loosely connected research on speaker states, traits, and vocal behaviour. Starting from a broad perspective on the state-of-the-art in this field, we combine these facts with a bit of ‘tea leaf reading’ to identify ten currently dominant trends that might also characterise the next decade of research: taking into account more tasks and task interdependencies, modelling paralinguistic information in the continuous domain, agglomerating and evaluating on large amounts of heterogeneous data, exploiting more and more types of features, fusing linguistic and non-linguistic phenomena, devoting more effort to optimisation of the machine learning aspects, standardising the whole processing chain, addressing robustness and security of systems, proceeding to evaluation in real-life conditions, and finally overcoming cross-language and cross-cultural barriers. We expect that following these trends we will see an increase in the ‘social competence’ of tomorrow’s speech and language processing systems.

Björn Schuller, Felix Weninger

Conversational Speech Recognition in Non-stationary Reverberated Environments

This paper presents a conversational speech recognition system able to operate in non-stationary reverberated environments. The system is composed of a dereverberation front-end exploiting multiple distant microphones, and a speech recognition engine. The dereverberation front-end identifies a room impulse response by means of a blind channel identification stage based on the Unconstrained Normalized Multi-Channel Frequency Domain Least Mean Square algorithm. The dereverberation stage is based on the adaptive inverse filter theory and uses the identified responses to obtain a set of inverse filters which are then exploited to estimate the clean speech. The speech recognizer is based on tied-state cross-word triphone models and decodes features computed from the dereverberated speech signal. Experiments conducted on the Buckeye corpus of conversational speech report a relative word accuracy improvement of 17.48% in the stationary case and of 11.16% in the non-stationary one.

Rudy Rotili, Emanuele Principi, Martin Wöllmer, Stefano Squartini, Björn Schuller

From Nonverbal Cues to Perception: Personality and Social Attractiveness

Nonverbal behaviour influences to a significant extent our perception of others, especially during the earliest stages of an interaction. This article considers the phenomenon in two zero acquaintance scenarios: the first is the attribution of personality traits to speakers we listen to for the first time, the second is the social attractiveness of unacquainted people with whom we talk on the phone. In both cases, several nonverbal cues, both measurable and machine detectable, appear to be significantly correlated with quantitative assessments of personality traits and social attractiveness. This provides a promising basis for the development of computing approaches capable of predicting how people are perceived by others in social terms.

Alessandro Vinciarelli, Hugues Salamin, Anna Polychroniou, Gelareh Mohammadi, Antonio Origlia

Measuring Synchrony in Dialog Transcripts

A finite register method of processing dialog transcripts is used to measure interlocutor synchrony. Successive contributions by participants are measured for word n-gram repetitions and temporal overlaps. The Zipfian distribution of words in language use leads to a natural expectation that random re-orderings of dialog contributions will unavoidably exhibit repetition; one might therefore reasonably expect the frequency of repetition in actual dialog to be best explained as a random effect. Accordingly, significance is assessed with respect to randomized contrast values. The contrasts are obtained from averages over randomized reorderings of dialog contributions, with the temporal spans of the revised dialogs guided by the original durations. Benchmark distributions for allo-repetition and self-repetition are established from existing dialog transcripts covering a pair of pragmatically different circumstances: ATR English-language “lingua franca” discussions and air-traffic communications (Flight 1549 over the Hudson River). Repetition in actual dialog exceeds the frequency one would expect from a random process. Perhaps surprisingly from the perspective of using repetition as an index of synchrony, self-repetition significantly exceeds allo-repetition.

Carl Vogel, Lydia Behan

A Companion Technology for Cognitive Technical Systems

The Transregional Collaborative Research Centre SFB/TRR 62, “A Companion Technology for Cognitive Technical Systems”, funded by the German Research Foundation (DFG) at the Ulm and Magdeburg sites, deals with the systematic and interdisciplinary study of cognitive abilities and their implementation in technical systems. The properties of multimodality, individuality, adaptability, availability, cooperativeness and trustworthiness are the focus of the investigation. These characteristics describe a new type of interactive device which is not only practical and efficient to operate but also agreeable, hence the term “companion”. The realisation of such a technology is supported by technical advancement as well as by neurobiological findings. Companion technology has to consider the entire situation of the user, machine, environment and, if applicable, other people or third interacting parties, in current and historical states. This will reflect the mental state of the user, his embeddedness in the task, and how he is situated in the current process.

Andreas Wendemuth, Susanne Biundo

Semantic Dialogue Modeling

This paper describes an abstract model for the semantic level of a dialogue system. We introduce mathematical structures which make it possible to design a semantics-driven dialogue system. We describe essential parts of such a system, which comprise the construction of feature-value relations representing meaning from a given world model; the modelling of the flow of information between the dialogue strategy controller and the speech recogniser by a “horizon of comprehension” and a “horizon of recognition results”; the connection of these horizons to wordings via “utterance-meaning pairs”; and the incorporation of new horizons into a state of information. Finally, the connection to dialogue strategy controlling is sketched.

Günther Wirsching, Markus Huber, Christian Kölbl, Robert Lorenz, Ronald Römer

Furhat: A Back-Projected Human-Like Robot Head for Multiparty Human-Machine Interaction

In this chapter, we first present a summary of findings from two previous studies on the limitations of using flat displays with embodied conversational agents (ECAs) in the context of face-to-face human-agent interaction. We then motivate the need for a three-dimensional display of faces to guarantee accurate delivery of gaze and directional movements, and present Furhat, a novel, simple, highly effective, and human-like back-projected robot head that utilizes computer animation to deliver facial movements and is equipped with a pan-tilt neck. After presenting a detailed summary of why and how Furhat was built, we discuss the advantages of using optically projected animated agents for interaction, in terms of situatedness, environment, context awareness, and social, human-like face-to-face interaction with robots, where subtle nonverbal and social facial signals can be communicated. At the end of the chapter, we present a recent application of Furhat as a multimodal multiparty interaction system that was presented at the London Science Museum as part of a robot festival. We conclude the chapter by discussing future developments, applications and opportunities of this technology.

Samer Al Moubayed, Jonas Beskow, Gabriel Skantze, Björn Granström

VISION as a Support to Cognitive Behavioural Systems

Cognitive behavioural systems would definitely benefit from a supporting technology able to automatically recognize the context where humans operate, their gestures and even facial expressions. Such capability poses challenges for researchers in various fields because the ultimate goal is to transfer to machines the human capability of representing and reasoning about the environment and its elements. The automation can be achieved through a supporting infrastructure able to capture a huge amount of information from the environment, much more than humans do, and to send it to a processing unit that builds a representation of the context capturing all the elements necessary to interpret the specific environment.

The goal of this paper is to present the VISION infrastructure and how it can support cognitive systems. VISION is a software/hardware infrastructure that overcomes the limitations of current technology for Wireless Sensor Networks (WSNs), providing broadband wireless links for 3D video streaming with very high reliability, achieved through an innovative reconfigurable context- and resource-aware middleware for WSNs. We show VISION at work in a scenario involving children with communication impairments.

Luca Berardinelli, Dajana Cassioli, Antinisca Di Marco, Anna Esposito, Maria Teresa Riviello, Catia Trubiani

The Hourglass of Emotions

Human emotions and their modelling are increasingly understood to be a crucial aspect in the development of intelligent systems. Over the past years, in fact, the adoption of psychological models of emotions has become a common trend among researchers and engineers working in the sphere of affective computing. Because of the elusive nature of emotions and the ambiguity of natural language, however, psychologists have developed many different affect models, which are often not suitable for the design of applications in fields such as affective HCI, social data mining, and sentiment analysis. To address this, we propose a novel biologically inspired and psychologically motivated emotion categorisation model that goes beyond mere categorical and dimensional approaches. Such a model represents affective states both through labels and through four independent but concomitant affective dimensions, which can potentially describe the full range of emotional experiences rooted in any of us.

Erik Cambria, Andrew Livingstone, Amir Hussain

A Naturalistic Database of Thermal Emotional Facial Expressions and Effects of Induced Emotions on Memory

This work defines a procedure for collecting naturally induced emotional facial expressions through the viewing of movie excerpts with high emotional content, and reports experimental data ascertaining the effects of emotions on memory word recognition tasks. The induced emotional states include the four basic emotions of sadness, disgust, happiness, and surprise, as well as the neutral emotional state. The resulting database contains both thermal and visible emotional facial expressions, portrayed by forty Italian subjects and simultaneously acquired by appropriately synchronizing a thermal and a standard visible camera. Each subject’s recording session lasted 45 minutes, allowing a minimum of 2000 facial expressions to be collected per mode (thermal or visible), from which a minimum of 400 were selected as highly expressive of each emotion category. The database is available to the scientific community and can be obtained by contacting one of the authors. For this pilot study, it was found that emotions and/or emotion categories do not affect individual performance on memory word recognition tasks, and that temperature changes in the face or in some of its regions do not discriminate among emotional states.

Anna Esposito, Vincenzo Capuano, Jiri Mekyska, Marcos Faundez-Zanuy

Prosody Modelling for TTS Systems Using Statistical Methods

The main drawback of older methods of prosody modelling is the monotony of the output, which is perceived as uncomfortable by the users, especially when listening to longer passages. The present paper proposes a prosodic generator designed to increase the variability of synthesized speech in reading devices for the blind. The method used is based on text segmentation into several prosodic patterns by means of vector quantisation and the subsequent training of corresponding HMMs (Hidden Markov Models) on F0 parameters. The path through the model’s states is then used to generate sentence prosody. We also tried to utilize morphological information in order to increase prosody naturalness. The evaluation of the quality of the proposed prosodic generators was carried out by means of listening tests.

Zdeněk Chaloupka, Petr Horák

Modeling the Effect of Motion at Encoding and Retrieval for Same and Other Race Face Recognition

We assess the role of motion when encoding and recognizing unfamiliar faces, using a recognition memory paradigm. This reveals a facilitative role for non-rigid motion when learning unfamiliar same- and other-race faces, and indicates that it is more important that the face is learned, rather than recognized, in motion. A computational study of the faces using Appearance Models of facial variation shows that this lack of a motion effect at recognition was reproduced by a norm-based encoding of faces, with the selection of features based on distance from the norm.

Hui Fang, Nicholas Costen, Natalie Butcher, Karen Lander

An Audiovisual Feedback System for Pronunciation Tutoring – Mandarin Chinese Learners of German

Computer-assisted pronunciation tutoring (CAPT) methods have been established during the last decade. Recent systems usually include distinct user feedback and an automatic pronunciation assessment system. This study is based on an audiovisual CAPT system in which an extensive feedback mechanism and several speech databases for Slavonic learners of German were developed. We intend to adapt the existing system for Chinese learners of German and report on the first usage experiences. We have therefore analyzed the deviations of German utterances produced by Chinese learners in comparison to those of German natives, especially in terms of prosodic and phonetic issues. We also designed a supplementary database and organized perceptual evaluation tests with German native listeners, with respect to individual phones as well as to general rhythm and intonation. In this way the language transfer from tonal Chinese can be demonstrated, which is vital to the adaptation of the system for Chinese learners.

Hongwei Ding, Oliver Jokisch, Rüdiger Hoffmann

Si.Co.D.: A Computer Manual for Coding Questions

This contribution presents a computer manual for coding questions called Si.Co.D. (standing for “Sistema di Codifica delle Domande”, that is, Coding System of Questions). The software presents a set of related coding systems that identify questions on the basis of their openness/closedness, threatening or confusing formulation, and intonation. The software is interactive and multimedial, in order to facilitate and support observers’ training for coding questions. It is composed of two sections: one dealing with a flow chart and definitions of question categories, the other presenting some examples of questions. The peculiarities and advantages of the tool are described, also in comparison with other annotation and sound management software. Possible applications are discussed, and its usefulness is shown by some research applications.

Augusto Gnisci, Enza Graziano, Angiola Di Conza

Rule-Based Morphological Tagger for an Inflectional Language

This paper presents an alternative view on the task of morphological tagging: a rule-based system with a new and simple learning method that uses just basic arithmetic operations to create an efficient knowledge base. The matching process of this rule-based approach follows a specific-to-general technique, where rules for more specific contexts are applied whenever they are available in the rule base. As a consequence, major accuracy and performance improvements can be achieved by pruning the rule base.

Daniel Hládek, Ján Staš, Jozef Juhár

Czech Emotional Prosody in the Mirror of Speech Synthesis

Contemporary speech synthesisers still provide a fairly monotonous and tedious output when used for longer Czech texts. One of the ways how to make these texts more lively is the synthesis of emotionally coloured speech. In the present paper we focus on the modelling of real-speech-based emotions in synthetic speech and the subsequent assessment of emotionally coloured utterances in listening tests with the aim of determining the role that individual prosodic parameters play in the identification of each emotion.

Jana Vlčková-Mejvaldová, Petr Horák

Pre-attention Cues for Person Detection

Current state-of-the-art person detectors have proven reliable and achieve very good detection rates. However, their performance is often far from real time, which limits their use to low-resolution images only. In this paper, we deal with the candidate window generation problem for person detection, i.e. we want to reduce the computational complexity of a person detector by reducing the number of regions that have to be evaluated. We base our work on Alexe’s paper [1], which introduced several pre-attention cues for generic object detection. We evaluate these cues in the context of person detection and show that their performance degrades rapidly for scenes containing multiple objects of interest, such as pictures of urban environments. We extend this set with new cues which better suit our class-specific task. The cues are designed to be simple and efficient, so that they can be used in the pre-attention phase of a more complex sliding-window-based person detector.

Karel Paleček, David Gerónimo, Frédéric Lerasle

Comparison of Complementary Spectral Features of Emotional Speech for German, Czech, and Slovak

Our paper is aimed at the statistical analysis and comparison of spectral features which complement vocal tract characteristics (spectral centroid, spectral flatness measure, Shannon entropy, Rényi entropy, etc.) in the emotional and neutral speech of male and female voices. The experiment was realized using the German speech database EmoDB and Czech and Slovak speech material extracted from stories performed by professional actors. Analysis of the complementary spectral features (basic and extended statistical parameters and histograms of spectral feature distributions) for all three languages confirms that this approach can be used for the classification of emotional speech types.

Jiří Přibil, Anna Přibilová

Form-Oriented Annotation for Building a Functionally Independent Dictionary of Synthetic Movement

Non-verbal behavior performed by embodied conversational agents still appears “wooden” and sometimes even “unnatural”. Annotated corpora and high-resolution annotations capturing the expressive details of movement may improve the gradualness of synthetic behavior. This paper presents a non-functional, form-oriented annotation scheme based on informal corpora involving multi-speaker dialogues. The scheme allows annotators to capture the expressive details of movement in high resolution. The expressive domains it captures are: the spatial domain (movement-pose configuration at the level of articulators), fluidity (transitions between movement phases and phrases), the temporal domain (movement variation in the form of movement phases), repetitivity (repetitive features of movement), and power (level of exposure). The presented annotation scheme can transform the encoded data into movement templates that can be directly reproduced by an embodied conversational agent.

Izidor Mlakar, Zdravko Kačič, Matej Rojc

A Cortical Approach Based on Cascaded Bidirectional Hidden Markov Models

Research in the field of neural processing proposes a bidirectional computation scheme among the hierarchically organized levels of the brain. This scheme is called the cortical algorithm and can be realized using Cascaded Bidirectional Hidden Markov Models (CBHMMs). In this paper CBHMMs are investigated in the light of analysis-synthesis systems. Such systems are important elements of Cognitive Dynamic Systems and Cognitive User Interfaces. Some of the most salient properties of cognitive systems are their abilities to support inference and reasoning, planning under uncertainty, and adaptation to changing environmental conditions. That is, besides the bidirectional computation scheme among the hierarchically organized levels, CBHMMs need to support logical operations like inference and reasoning. To integrate this new aspect into the analysis-synthesis framework, we take up a suggestion made by D. M. MacKay in the late 1960s: to supplement Shannon’s measure of selective information content with a descriptive information content. Descriptive information in turn is composed of structural and metric information and considers the logical aspect of information.

Ronald Römer

Modeling Users’ Mood State to Improve Human-Machine-Interaction

The detection of user emotions plays an important role in human-machine interaction. By considering emotions, applications such as monitoring agents or digital companions are able to adapt their reactions to users’ needs and claims. Besides emotions, personality and mood are important as well. Standard emotion recognizers do not consider them adequately and therefore neglect a crucial part of user modeling.

The challenge is to obtain reliable predictions about the actual mood of the user and, beyond that, to represent changes in the user’s mood during interaction. In this paper we present a model that incorporates both the tracking of mood changes based on recognized emotions and different personality traits. Furthermore, we present a first evaluation on realistic data.

Ingo Siegert, Ronald Böck, Andreas Wendemuth

Pitch Synchronous Transform Warping in Voice Conversion

In this paper a new voice conversion algorithm is presented which transforms the utterance of a source speaker into the utterance of a target speaker. The approach is based on pitch-synchronous speech analysis, the Discrete Cosine Transform (DCT), nonlinear spectral warping with spectrum interpolation, and pitch-synchronous speech synthesis with overlapping using the speech production model. The DCT speech model also contains information about the phase properties of the modeled speech frame but, in contrast to a model based e.g. on the discrete Fourier transform, it is a real model and can be used efficiently for speech coding and voice conversion. The resulting finite impulse response of the converted DCT speech model is obtained by the inverse DCT and is of the mixed-phase type. The proposed voice conversion procedure results in speech with high naturalness.

Robert Vích, Martin Vondra

ATMap: Annotated Tactile Maps for the Visually Impaired

For the visually impaired, and especially for those blind from birth, there are various challenges in building cognitive spatial maps. Although a few existing audio-haptic maps provide access to geographic data, most of them struggle to offer convenient services through interactive methods. We developed an interactive tactile map system, called ATMap, which allows users to create and share geo-referenced annotations on a 2D tactile map, in order to learn more about relevant places beyond the static information in the GIS database. Five blind users were recruited to evaluate the system in a pilot study.

Limin Zeng, Gerhard Weber

Behavioural Issues in Cognitive Systems

From Embodied and Extended Mind to No Mind

The paper discusses the extended mind thesis with a view to the notions of “agent” and “mind”, while helping to clarify the relation between “embodiment” and the “extended mind”. I will suggest that the extended mind thesis constitutes a reductio ad absurdum of the notion of “mind”; the consequence of the extended mind debate should be to drop the notion of the mind altogether, rather than entering the discussion of how extended it is.

Vincent C. Müller

Effects of Experience, Training and Expertise on Multisensory Perception: Investigating the Link between Brain and Behavior

The ability to successfully integrate information from different senses is of paramount importance for perceiving the world and has been shown to change with experience. We first review how experience, in particular musical experience, brings about changes in our ability to fuse together sensory information about the world. We next discuss evidence from drumming studies that demonstrate how the perception of audiovisual synchrony depends on experience. These studies show that drummers are more robust than novices to perturbations of the audiovisual signals and appear to use different neural mechanisms in fusing sight and sound. Finally, we examine how experience influences audiovisual speech perception. We present an experiment investigating how perceiving an unfamiliar language influences judgments of temporal synchrony of the audiovisual speech signal. These results highlight the influence of both the listener’s experience with hearing an unfamiliar language as well as the speaker’s experience with producing non-native words.

Scott A. Love, Frank E. Pollick, Karin Petrini

Nonverbal Communication – Signals, Conventions and Incommensurable Explanations

Considering nonverbal communication and its complexity, four problems are addressed which focus on the dynamics of nonverbal communication. (1) How much of the complexity of nonverbal communication is due to expressions following cultural rules (e.g. conventions), and how much to the expression of the agents’ states through their signaling systems? (2) Nonverbal behaviour can be regarded as time-varying, multi-scaled, multimodal configurations of magnitudes; it is natural to ask for scaling laws. (3) Furthermore, the dynamics of the configurations just mentioned is of interest. (4) Why are verbal expressions more conventional than nonverbal ones? The discussion of these four problems suggests that signal-based explanations of nonverbal behaviour and communication are incommensurable with convention-based explanations.

Lutz-Michael Alisch

A Conversation Analytical Study on Multimodal Turn-Giving Cues

End-of-Turn Prediction

The present paper focuses on the systematic study of the sequential organization of verbal as well as nonverbal behavior in spontaneous interaction. The study concerns one of the most universal structural features of conversation, the phenomenon of speaker change, as occurring in forty-four dialogues of the multimodal HuComTech corpus of Hungarian spontaneous speech. The purpose of the paper is twofold: (1) to capture salient communication patterns and organized structures across the conversations, and (2) to make explicit the simultaneously occurring markers and cues of the turn-giving intention of the current speaker based on information coming from different modalities, involving: (a) verbal-acoustic (duration of continuous speech), (b) nonverbal-acoustic (duration of pauses), and (c) nonverbal-visual (gaze direction, hand gestures, posture) information. Performing several SQL queries on the HuComTech database of manually annotated spontaneous dialogues will help us determine the multimodal features of turn-ends in Hungarian. The final goal is to contribute to the development of dialogue management systems with a decision tree distinguishing two basic discourse segments, turn-keep and turn-give.

Ágnes Abuczki

Conversational Involvement and Synchronous Nonverbal Behaviour

Measuring the quality of an interaction by means of low-level cues has been the topic of many studies over the last couple of years. In this study we propose a novel method for conversation quality assessment. We first test whether manual ratings of conversational involvement and automatic estimates of the synchronisation of facial activity are correlated, hypothesising that the higher the synchrony, the higher the involvement. We compare two different synchronisation measures. The first is defined as the similarity of facial activity at a given point in time; the second is based on dependence analyses between the facial activity time series of two interlocutors. We found that the dependence measure correlates more strongly with conversational involvement than the similarity measure.

Uwe Altmann, Catharine Oertel, Nick Campbell

First Impression in Mark Evaluation: Predictive Ability of the SC-IAT

Drawing on dual cognition theories, this paper explores the role of emotional and cognitive processes in mark evaluation by analyzing the effect of impulsive and reflective evaluations on approach behaviour toward the mark. The study tests the predictive contribution of the Single Category Implicit Association Test (SC-IAT) in the field of consumer psychology, as a tool to detect perceivers’ first impressions. Its ability to discriminate between consumers’ evaluations of an unknown mark is tested along four dimensions (Harmony, Dynamism, Pleasantness, Simplicity), whose correlations with the visual and graphical features of the mark are also tested. The ability of the SC-IAT to predict subsequent approach behaviour is tested together with the contribution of deliberative evaluations. The results indicate that implicit evaluations affect subsequent behaviour, together with explicit ones, whose effects are mediated by intentions. The findings are discussed within the frame of dual cognition models.

Angiola Di Conza, Augusto Gnisci

Motivated Learning in Computational Models of Consciousness

Much work has gone into designing and implementing agents capable of “cognitive” thought. In this paper, we give an overview of a motivated learning model and describe various ways in which we are in the process of implementing the model for simulation purposes. We present three different software platforms (Blender, iCub, and NeoAxis) through which an intelligent conscious agent can be implemented, introduce the concept of a computational model of consciousness as a feature of a cognitive agent, and discuss how it might be implemented and simulated.

James Graham, Daniel Jachyra

Are Pointing Gestures Induced by Communicative Intention?

The aim of the paper is to present some ideas and observations on the communicative intentionality of pointing gestures. The material under study consists of twenty “origami” dialogue tasks, half of them recorded in a mutual visibility (MV) condition and half in a lack of visibility (LV) condition. Two participants took part in each dialogue session: an Instruction Giver (IG) and an Instruction Follower (IF). The analysis focuses on selected features of pointing gestures as well as on the semantics of the verbal expressions realised concurrently with them: the semantic content of those verbal expressions, the preceding context of the utterances, the place in gesture space where pointing gestures are realised, and the spatial perspective of their realisation are taken into consideration as potential cues of communicative intentions.

Ewa Jarmolowicz-Nowikow

TV Interview Participant Profiles from a Multimodal Perspective

The study presented in this paper attempts to provide evidence about prominent features in the interactional behavior of TV interview participants that could lead to testable predictions about their communicative profiles. Based on a multimodally annotated interview corpus, we explore the behavior of two groups representing the two basic discursive roles of this domain, namely the interviewers and the interviewees. We describe the profile of the speakers as outlined by their non-verbal activity, in terms of preferred modalities employed as well as the specific goals and functions that they aim to accomplish through the use of non-verbal expressions (gestures, facial expressions, torso movements). From this perspective, we aim to discover possible patterns of non-verbal behavior that are in line with the communicative actions that each role is supposed to perform, as well as significant differences or distinctive features attested in the behavior of each group.

Maria Koutsombogera, Harris Papageorgiou

The Neurophonetic Model of Speech Processing ACT: Structure, Knowledge Acquisition, and Function Modes

Speech production and speech perception are important human capabilities comprising cognitive as well as sensorimotor functions. This paper summarizes our work developing a neurophonetic model for speech processing, called ACT, which was carried out over the last seven years. The function modes of the model are production, perception, and acquisition. The name of our model reflects the fact that vocal tract ACTions, which constitute motor plans of speech items, are the central units in this model. Specifically, (i) the structure of the model, (ii) the acquired knowledge, and (iii) the correspondence between the model’s structure and specific brain regions are discussed.

Bernd J. Kröger, Jim Kannampuzha, Cornelia Eckers, Stefan Heim, Emily Kaufmann, Christiane Neuschaefer-Rube

Coding Hand Gestures: A Reliable Taxonomy and a Multi-media Support

A taxonomy of hand gestures and a digital tool (CodGest) are proposed in order to describe the different types of gesture used by speakers during speech in different social contexts. It is an exhaustive system of mutually exclusive categories to be shared within the scientific community for studying multimodal signals and their contribution to interaction. Classical taxonomies from the gesture literature were integrated into a comprehensive taxonomy, which was tested in five different social contexts; its reliability was measured across them through inter-observer agreement indexes. A multi-media tool was realized as digital support for coding gestures in observational research.

Fridanna Maricchiolo, Augusto Gnisci, Marino Bonaiuto

Individuality in Communicative Bodily Behaviours

This paper investigates to what extent participants in spontaneously occurring interactions can be recognised automatically from the shape description of their bodily behaviours. For this purpose, we apply classification algorithms to an annotated corpus of Danish dyadic and triadic conversations. The bodily behaviours we consider are head movements, facial expressions, and hand gestures. Although the data used are of limited size, the classification results are promising, especially for hand gestures, indicating large variance in people’s bodily behaviours even though the participants involved form a homogeneous group in terms of gender, age, and social background. The obtained results are not only interesting from a theoretical point of view; they can also be relevant for video indexing and searching, computer games, and other applications involving multimodal interaction.

Costanza Navarretta

A Cross-Cultural Study on the Perception of Emotions: How Hungarian Subjects Evaluate American and Italian Emotional Expressions

In the present work, a cross-modal evaluation of the visual and auditory channels in conveying emotional information is conducted through perceptual experiments aimed at investigating whether some of the basic emotions are perceptually privileged and whether the perceptual mode, the cultural environment, and the language play a role in this preference. To this aim, Hungarian subjects were asked to assess emotional stimuli extracted from Italian and American movies in single (either mute video or audio alone) and combined audio-video modes. Results showed that, among the proposed emotions, anger plays a special role, and that fear, happiness, and sadness are better perceived than surprise and irony in both cultural environments. The perception of emotions is affected by the communication mode, and the language influences the perceptual assessment of emotional information.

Maria Teresa Riviello, Anna Esposito, Klara Vicsi

Affective Computing: A Reverence for a Century of Research

To bring affective computing a leap forward, it is best to start with a step back. A century of research has been conducted on topics that are crucial for affective computing, and understanding this vast body of work will accelerate progress in the field. Therefore, this article provides an overview of the history of affective computing. The complexity of affect is described by discussing i) the relation between body and mind, ii) cognitive processes (i.e., attention, memory, and decision making), and iii) affective computing’s I/O. Subsequently, definitions are provided of affect and related constructs (i.e., emotion, mood, interpersonal stances, attitudes, and personality traits) and of affective computing itself. Perhaps when these elements are embraced by the affective computing community, it will bring us a step closer to bridging its semantic gap.

Egon L. van den Broek
