
About this Book

This book constitutes the refereed proceedings of the 7th International Workshop on Human Behavior Understanding, HBU 2016, held in Amsterdam, The Netherlands, in October 2016.

The 10 full papers were carefully reviewed and selected from 17 initial submissions. They are organized in topical sections named: behavior analysis during play; daily behaviors; gesture and movement analysis; and vision based applications.



Behavior Analysis During Play


EmoGame: Towards a Self-Rewarding Methodology for Capturing Children Faces in an Engaging Context

Facial expression datasets are currently limited, as most of them capture only the emotional expressions of adults. Researchers have begun to assert the importance of having child exemplars of the various emotional expressions in order to study the interpretation of these expressions developmentally. Capturing children's expressions is more complicated, as the protocols used for eliciting and recording expressions from adults are not necessarily adequate for children. This paper describes the creation of a flexible Emotional Game for capturing children's faces in an engaging context. The game is inspired by the well-known Guitar Hero™ gameplay, but instead of playing notes, the player must produce a series of expressions. In the current work, we measure the capacity of the game to engage children, and we discuss the requirements, in terms of expression recognition, needed to ensure a viable gameplay. Preliminary experiments conducted with a group of 12 children aged 7 to 11 in various settings and social contexts show high levels of engagement and positive feedback.
Benjamin Allaert, José Mennesson, Ioan Marius Bilasco

Assessing Affective Dimensions of Play in Psychodynamic Child Psychotherapy via Text Analysis

Assessment of the emotional expressions of young children during clinical work is an important, yet arduous task. Especially in natural play scenarios, there are few constraints on the behavior of the children, and the expression palette is rich. Many approaches have been developed for the automatic analysis of affect, particularly from facial expressions and paralinguistic features of the voice, as well as from the myriad non-verbal signals emitted during interactions. In this work, we describe a tool that analyzes verbal interactions of children during play therapy. Our approach uses natural language processing techniques and tailors a generic affect analysis framework to the psychotherapy domain, automatically annotating spoken sentences on the valence and arousal dimensions. We work with Turkish texts, for which there are far fewer natural language processing resources than for English, and our approach illustrates how to rapidly develop such a system for non-English languages. We evaluate our approach with longitudinal psychotherapy data, collected and annotated over a one-year period, and show that our system produces good results in line with professional clinicians’ assessments.
Sibel Halfon, Eda Aydın Oktay, Albert Ali Salah
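At its simplest, the valence–arousal annotation described above can be approximated by a lexicon lookup. The following is a minimal sketch, assuming a hypothetical miniature lexicon (`LEXICON`) mapping words to (valence, arousal) pairs; the paper's actual Turkish resources and NLP pipeline are considerably richer.

```python
# Hypothetical miniature lexicon: word -> (valence, arousal), both in [-1, 1].
LEXICON = {
    "happy": (0.8, 0.5),
    "angry": (-0.6, 0.8),
    "calm": (0.3, -0.6),
}

def score_sentence(tokens):
    """Annotate a tokenized sentence with the average valence and arousal
    of the lexicon words it contains; neutral (0, 0) if none match."""
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    if not hits:
        return (0.0, 0.0)
    valence = sum(v for v, _ in hits) / len(hits)
    arousal = sum(a for _, a in hits) / len(hits)
    return (valence, arousal)

sentence_score = score_sentence(["i", "feel", "happy", "and", "calm"])
```

A real system would add tokenization, morphological analysis (important for Turkish), and negation handling before the lookup.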

Multimodal Detection of Engagement in Groups of Children Using Rank Learning

In collaborative play, children exhibit different levels of engagement. Some children are engaged with other children while some play alone. In this study, we investigated multimodal detection of individual levels of engagement using a ranking method and non-verbal features: turn-taking and body movement. Firstly, we automatically extracted turn-taking and body movement features in naturalistic and challenging settings. Secondly, we used an ordinal annotation scheme and employed a ranking method considering the great heterogeneity and temporal dynamics of engagement that exist in interactions. We showed that levels of engagement can be characterised by relative levels between children. In particular, a ranking method, Ranking SVM, outperformed a conventional method, SVM classification. While either turn-taking or body movement features alone did not achieve promising results, combining the two features yielded significant error reduction, showing their complementary power.
Jaebok Kim, Khiet P. Truong, Vicky Charisi, Cristina Zaga, Vanessa Evers, Mohamed Chetouani
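A Ranking SVM over ordinal engagement labels can be sketched via the standard pairwise transform: train a linear SVM on feature differences of sample pairs with different ranks. This is a minimal illustration assuming hypothetical two-dimensional fused features (turn-taking rate and body-movement energy) and synthetic ordinal labels; it stands in for, and is much simpler than, the paper's actual ranker and features.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

def pairwise_transform(X, y):
    """Turn ordinal labels into a binary problem: for each pair (i, j)
    with y[i] != y[j], emit X[i]-X[j] labeled by sign(y[i]-y[j])."""
    Xp, yp = [], []
    for i, j in combinations(range(len(y)), 2):
        if y[i] == y[j]:
            continue
        Xp.append(X[i] - X[j])
        yp.append(np.sign(y[i] - y[j]))
    return np.array(Xp), np.array(yp)

rng = np.random.default_rng(0)
# Hypothetical fused features: [turn-taking rate, body-movement energy].
X = rng.normal(size=(40, 2))
# Synthetic ordinal engagement levels (0..2) for the sketch.
y = (X @ np.array([1.0, 0.5]) > 0).astype(int) + (X[:, 0] > 1).astype(int)

Xp, yp = pairwise_transform(X, y)
ranker = LinearSVC(C=1.0).fit(Xp, yp)
# Higher score = higher relative engagement rank among the children.
scores = X @ ranker.coef_.ravel()
```

The learned weight vector induces a total order over children, which is exactly what an ordinal annotation scheme calls for, rather than hard class boundaries.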

Daily Behaviors


Anomaly Detection in Elderly Daily Behavior in Ambient Sensing Environments

Current ubiquitous computing applications for smart homes aim to enhance people’s daily living across the age span. Among the target groups, the elderly are a population eager for “choices for living arrangements”, which would allow them to continue living in their homes while receiving the health care they need. Given the growing elderly population, there is a need for statistical models able to capture the recurring patterns of daily life activity and to reason on this information. We present an analysis of real-life sensor data collected from 40 different households of elderly people, using motion, door and pressure sensors. Our objective is to automatically observe and model the daily behavior of the elderly and to detect anomalies that occur in the sensor data. For this purpose, we first introduce an abstraction layer to create a common ground for home sensor configurations. Next, we build a probabilistic spatio-temporal model to summarize daily behavior. Anomalies are then defined as significant deviations from the learned behavioral model and are detected using a cross-entropy measure. We have compared the detected anomalies with manually collected annotations, and the results show that the presented approach is able to detect significant behavioral changes of the elderly.
Oya Aran, Dairazalia Sanchez-Cortes, Minh-Tri Do, Daniel Gatica-Perez
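The anomaly criterion described above, cross-entropy of a day's behavior against a learned spatio-temporal model, can be sketched as follows. This is a simplified illustration assuming a hypothetical discretization into rooms and time slots and synthetic sensor firings; the paper's abstraction layer and probabilistic model are more elaborate.

```python
import numpy as np

# Hypothetical discretization: 4 rooms x 6 four-hour time slots.
N_ROOMS, N_SLOTS = 4, 6

def daily_histogram(events):
    """events: list of (room, slot) sensor firings for one day."""
    h = np.zeros((N_ROOMS, N_SLOTS))
    for room, slot in events:
        h[room, slot] += 1
    return h

def behavior_model(days, alpha=1.0):
    """Spatio-temporal behavior model: smoothed average of daily
    histograms (Laplace smoothing with pseudo-count alpha)."""
    total = sum(daily_histogram(d) for d in days) + alpha
    return total / total.sum()

def cross_entropy(day, model):
    """Cross-entropy of a day's empirical distribution against the model;
    large values flag days unlike the learned routine."""
    h = daily_histogram(day)
    p = h / max(h.sum(), 1)
    return float(-(p * np.log(model)).sum())

rng = np.random.default_rng(1)
# Synthetic "normal" days: activity confined to rooms 0-1, slots 0-2.
normal_days = [[(rng.integers(2), rng.integers(3)) for _ in range(50)]
               for _ in range(30)]
model = behavior_model(normal_days)

typical = cross_entropy(normal_days[0], model)
# An anomalous day: all activity in a room/slot rarely used before.
odd_score = cross_entropy([(3, 5)] * 50, model)
```

Thresholding the cross-entropy (e.g. a few standard deviations above the mean over training days) then yields the anomaly decision.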

Human Behavior Analysis from Smartphone Data Streams

In the past decade multimedia systems have started including diverse modes of data to understand complex situations and build more sophisticated models. Some increasingly common modes in multimedia are intertwined data streams from sensor modalities such as wearable/mobile, environmental, and biosensors. These data streams offer new information sources for modeling and predicting complex world situations as well as for understanding and modeling humans. This paper makes two contributions to the modeling and analysis of multimodal data in the context of user behavior analysis. First, it introduces the use of a concept-lattice-based data fusion technique for recognizing events. Concept lattices are very effective when there are not enough labeled data samples for supervised machine learning algorithms, but human knowledge is available to develop classification approaches for recognition. Life events encode activities of daily living, and environmental events encode states and state transitions in environmental variables. Second, it introduces a framework that detects frequent co-occurrence patterns, as sequential and parallel relations among events from multiple event streams. We show the applicability of our approach in finding interesting human behavior patterns using longitudinal mobile data collected from 23 users over 1–2 months.
Laleh Jalali, Hyungik Oh, Ramin Moazeni, Ramesh Jain
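The second contribution, mining co-occurrence patterns across event streams, can be illustrated for the "parallel" relation by counting pairs of events from different streams whose time intervals overlap. A minimal sketch, assuming hypothetical life and environmental event streams given as (label, start, end) intervals; sequential relations and frequency thresholds are omitted.

```python
from collections import Counter
from itertools import combinations

# Hypothetical event streams: (label, start, end) intervals per modality.
streams = {
    "life": [("walking", 0, 10), ("at_cafe", 30, 60), ("walking", 90, 100)],
    "env":  [("high_noise", 5, 12), ("high_noise", 31, 58),
             ("low_light", 95, 99)],
}

def overlaps(a, b):
    """True if two (label, start, end) intervals intersect in time."""
    return a[1] < b[2] and b[1] < a[2]

def parallel_cooccurrences(streams):
    """Count label pairs from different streams whose intervals overlap
    in time (the 'parallel' relation between event streams)."""
    counts = Counter()
    for (_, ev1), (_, ev2) in combinations(streams.items(), 2):
        for a in ev1:
            for b in ev2:
                if overlaps(a, b):
                    counts[(a[0], b[0])] += 1
    return counts

cooc = parallel_cooccurrences(streams)
```

Pairs whose count exceeds a support threshold would then be reported as frequent parallel patterns (e.g. "being at a cafe co-occurs with high noise").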

Gesture and Movement Analysis


Sign Language Recognition for Assisting the Deaf in Hospitals

In this study, a real-time, computer-vision-based sign language recognition system aimed at aiding hearing-impaired users in a hospital setting has been developed. By directing them through a tree of questions, the system allows users to state their purpose of visit by answering between four and six questions. The deaf user can use sign language to communicate with the system, which provides a written transcript of the exchange. A database collected from six users was used for the experiments. User-independent tests without the tree-based interaction scheme yield 96.67% accuracy on 1257 sign samples belonging to 33 sign classes. The experiments evaluated the effectiveness of the system in terms of feature selection and spatio-temporal modelling. The combination of hand position and movement features, modelled by Temporal Templates and classified by Random Decision Forests, yielded the best results. The tree-based interaction scheme further increased the recognition performance to more than 97.88%.
Necati Cihan Camgöz, Ahmet Alp Kındıroğlu, Lale Akarun
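Temporal Templates are commonly realized as Motion History Images (Bobick and Davis), in which each pixel's intensity encodes how recently motion occurred there; the resulting template can then be classified by a Random Decision Forest. A minimal sketch with synthetic grey-level frames and dummy labels; the frame sizes, features and labels here are illustrative, not the paper's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def motion_history_image(frames, tau=10, thresh=25):
    """Motion History Image: pixels where motion just occurred are set
    to tau; elsewhere the value decays by 1 per frame (floor at 0),
    summarising a clip's movement in a single image."""
    mhi = np.zeros_like(frames[0], dtype=float)
    for prev, cur in zip(frames, frames[1:]):
        moving = np.abs(cur.astype(int) - prev.astype(int)) > thresh
        mhi = np.where(moving, tau, np.maximum(mhi - 1, 0))
    return mhi

rng = np.random.default_rng(3)
# Synthetic stand-ins for sign clips: 20 clips of 8 frames of 16x16 grey.
clips = [rng.integers(0, 256, size=(8, 16, 16)) for _ in range(20)]
X = np.array([motion_history_image(list(c)).ravel() for c in clips])
y = np.arange(20) % 2  # dummy two-class labels for the sketch

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
```

In practice the templates would be computed on tracked hand regions and combined with hand-position features, as the abstract describes.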

Using the Audio Respiration Signal for Multimodal Discrimination of Expressive Movement Qualities

In this paper we propose a multimodal approach to distinguishing between movements displaying three different expressive qualities: fluid, fragmented, and impulsive movements. Our approach is based on the Event Synchronization algorithm, which is applied to compute the amount of synchronization between two low-level features extracted from multimodal data. In more detail, we use the energy of the audio respiration signal captured by a standard microphone placed near the mouth, and the whole-body kinetic energy estimated from motion capture data. The method was evaluated on 90 movement segments performed by 5 dancers. Results show that fragmented movements display higher average synchronization than fluid and impulsive movements.
Vincenzo Lussu, Radoslaw Niewiadomski, Gualtiero Volpe, Antonio Camurri
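The Event Synchronization algorithm (after Quian Quiroga et al.) measures how often events in two point processes occur quasi-simultaneously. A simplified sketch, assuming hypothetical peak times extracted from the respiration-energy and kinetic-energy signals; the original algorithm uses a more refined, possibly adaptive, time window.

```python
import numpy as np

def event_synchronization(t1, t2, tau):
    """Simplified Event Synchronization: fraction of events in two
    point processes that co-occur within a time window tau.
    t1, t2: arrays of event times, e.g. peaks of the audio respiration
    energy and of the whole-body kinetic energy."""
    c12 = sum(1 for a in t1 if np.any(np.abs(t2 - a) <= tau))
    c21 = sum(1 for b in t2 if np.any(np.abs(t1 - b) <= tau))
    n = np.sqrt(len(t1) * len(t2))
    return (c12 + c21) / (2 * n) if n else 0.0

# Hypothetical peak times (seconds) for one movement segment.
breaths = np.array([1.0, 2.1, 3.0, 4.2])   # respiration-energy peaks
moves = np.array([1.05, 2.0, 3.1, 4.15])   # kinetic-energy peaks
q = event_synchronization(breaths, moves, tau=0.15)
```

A value near 1 means almost every breath peak is matched by a movement-energy peak; comparing this statistic across segments separates the three expressive qualities.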

Spatio-Temporal Detection of Fine-Grained Dyadic Human Interactions

We introduce a novel spatio-temporal deformable part model for offline detection of fine-grained interactions in video. One novelty of the model is that part detectors model the interacting individuals in a single graph that can contain different combinations of feature descriptors. This allows us to use both body pose and movement to model the coordination between two people in space and time. We evaluate the performance of our approach on novel and existing interaction datasets. When testing only on the target class, we achieve mean average precision scores of 0.82. When presented with distractor classes, the additional modelling of the motion of specific body parts significantly reduces the number of confusions. Cross-dataset tests demonstrate that our trained models generalize well to other settings.
Coert van Gemeren, Ronald Poppe, Remco C. Veltkamp

Vision Based Applications


Convoy Detection in Crowded Surveillance Videos

This paper proposes the detection of convoys in crowded surveillance videos. A convoy is defined as a group of pedestrians who are moving or standing together for a certain period of time. To detect such convoys, we first address pedestrian detection in a crowded scene, where the small regions of pedestrians and their strong occlusions render usual object detection methods ineffective. Thus, we develop a method that detects pedestrian regions by clustering feature points based on their spatial characteristics. Then, positional transitions of pedestrian regions are analysed by our convoy detection method, which consists of clustering and intersection processes. The former finds groups of pedestrians in one frame by flexibly handling their relative spatial positions, and the latter refines groups into convoys by considering their temporal consistency over multiple frames. The experimental results on a challenging dataset show the effectiveness of our convoy detection method.
Zeyd Boukhers, Yicong Wang, Kimiaki Shirahama, Kuniaki Uehara, Marcin Grzegorzek
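The intersection process for refining per-frame groups into convoys can be sketched by intersecting candidate groups across consecutive frames. A simplified illustration; the per-frame cluster sets and thresholds are hypothetical, and the paper's feature-point clustering stage is not shown.

```python
def detect_convoys(frame_clusters, min_size=2, min_duration=3):
    """Refine per-frame pedestrian clusters into convoys: sets of at
    least min_size pedestrians that stay grouped together for at least
    min_duration consecutive frames.
    frame_clusters: per frame, a list of sets of pedestrian ids."""
    candidates = {}  # frozenset of ids -> consecutive-frame count
    convoys = set()
    for clusters in frame_clusters:
        nxt = {}
        for cluster in clusters:
            cluster = frozenset(cluster)
            if len(cluster) >= min_size:
                # The cluster itself starts or extends a candidate.
                nxt[cluster] = max(nxt.get(cluster, 0),
                                   candidates.get(cluster, 0) + 1)
            # Intersections with previous candidates survive as refined
            # (possibly smaller) candidates with extended duration.
            for cand, dur in candidates.items():
                inter = cand & cluster
                if len(inter) >= min_size:
                    nxt[inter] = max(nxt.get(inter, 0), dur + 1)
        candidates = nxt
        for cand, dur in candidates.items():
            if dur >= min_duration:
                convoys.add(cand)
    return convoys

frames = [[{1, 2, 3}], [{1, 2, 3, 4}], [{1, 2}]]
found = detect_convoys(frames)
```

Here pedestrians 1 and 2 stay grouped across all three frames, so the intersection process keeps them as a convoy even though the surrounding cluster changes.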

First Impressions - Predicting User Personality from Twitter Profile Images

This paper proposes a computer-vision-based pipeline for inferring the perceived personality of users from their Twitter profile images. We humans form impressions of others on a daily basis during communication. The perception of a person’s personality gives information about that person’s behaviour and is an important attribute in developing rapport. The personality assessment in this paper is referred to as first impressions, similar to how humans create a mental image of another person just by looking at their profile picture. In the proposed automated pipeline, hand-crafted (engineered) and learnt feature descriptors are computed on user profile images. The effect of the image background on the perception of personality from a profile picture is assessed. A multivariate regression approach is used to predict the big five personality traits: agreeableness, conscientiousness, extraversion, openness and neuroticism. We study the correlation between the big five personality traits generated from Tweet analysis and those from the proposed profile-image-based framework. The experiments show high correlation for scene-based first impressions perception. It is interesting to note that the results generated by analysing a profile image uploaded by a user at a particular point in time are in sync with the first-impression traits generated by investigating Tweets posted over a longer duration of time.
Abhinav Dhall, Jesse Hoey
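Multivariate regression over image descriptors, as in the pipeline above, can be sketched with a single ridge model predicting all five traits at once. The feature vectors and trait scores below are synthetic placeholders for the paper's actual descriptors and annotations.

```python
import numpy as np
from sklearn.linear_model import Ridge

TRAITS = ["agreeableness", "conscientiousness", "extraversion",
          "openness", "neuroticism"]

rng = np.random.default_rng(42)
# Synthetic stand-ins: 128-d image descriptors (e.g. hand-crafted or
# learnt features) and big-five trait scores on a 1-5 scale.
X = rng.normal(size=(200, 128))
W = rng.normal(size=(128, 5))
y = np.clip(3 + 0.05 * (X @ W) + rng.normal(scale=0.1, size=(200, 5)), 1, 5)

# Ridge handles multivariate targets natively: one model, five outputs.
model = Ridge(alpha=1.0).fit(X[:150], y[:150])
pred = model.predict(X[150:])  # one score per trait per held-out image
```

Fitting one multivariate model rather than five independent regressors keeps the pipeline simple; per-trait models would be a straightforward variation.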

