
2006 | Book

Computer Vision in Human-Computer Interaction

ECCV 2006 Workshop on HCI, Graz, Austria, May 13, 2006. Proceedings

Edited by: Thomas S. Huang, Nicu Sebe, Michael S. Lew, Vladimir Pavlović, Mathias Kölsch, Aphrodite Galata, Branislav Kisačanin

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this book

The interests and goals of HCI (human–computer interaction) include understanding, designing, building, and evaluating complex interactive systems involving many people and technologies. Developments in software and hardware technologies are continuously driving applications in supporting our collaborative and communicative needs as social beings, both at work and at play. At the same time, similar developments are pushing the human–computer interface beyond the desktop and into our pockets, streets, and buildings. Developments in mobile, wearable, and pervasive communications and computing technologies provide exciting challenges and opportunities for HCI.

The present volume represents the proceedings of the HCI 2006 Workshop that was held in conjunction with ECCV 2006 (European Conference on Computer Vision) in Graz, Austria. The goal of this workshop was to bring together researchers from the field of computer vision whose work is related to human–computer interaction. We solicited original contributions that address a wide range of theoretical and application issues in human–computer interaction.

We were very pleased by the response and had a difficult task of selecting only 11 papers (out of 27 submitted) to be presented at the workshop. The accepted papers were presented in four sessions, as follows:

Face Analysis – In their paper “Robust Face Alignment Based on Hierarchical Classifier Network”, authors Li Zhang, Haizhou Ai, and Shihong Lao build a hierarchical classifier network that connects face detection and face alignment into a smooth coarse-to-fine procedure. Thus a robust face alignment algorithm for face images with expression and pose changes is introduced. Experiments are reported to show its accuracy and robustness.

Table of Contents

Frontmatter

Computer Vision in Human-Computer Interaction

Robust Face Alignment Based on Hierarchical Classifier Network
Abstract
Robust face alignment is crucial for many face processing applications. As face detection only gives a rough estimation of the face region, one important problem is how to align facial shapes starting from this rough estimation, especially on face images with expression and pose changes. We propose a novel method of face alignment by building a hierarchical classifier network, connecting face detection and face alignment into a smooth coarse-to-fine procedure. Classifiers are trained to recognize feature textures at different scales, from the entire face down to local patterns. A multi-layer structure is employed to organize the classifiers, which begins with one classifier at the first layer and gradually refines the localization of feature points with more classifiers in the following layers. A Bayesian framework is configured for the inference of the feature points between the layers. The boosted classifiers detect facial features discriminatively from their local neighborhoods, while the inference between the layers constrains the search space. Extensive experiments are reported to show its accuracy and robustness.
Li Zhang, Haizhou Ai, Shihong Lao
EigenExpress Approach in Recognition of Facial Expression Using GPU
Abstract
The automatic recognition of facial expression presents a significant challenge to the pattern analysis and man-machine interaction research community. In this paper, a novel system is proposed to recognize human facial expressions based on the expression sketch. First, the facial expression sketch is extracted from the original gray image by a GPU-based real-time edge detection and sharpening algorithm. Then, a statistical method called Eigenexpress is introduced to obtain the expression feature vectors for the sketches. Finally, the Modified Hausdorff Distance (MHD) is used to perform the expression classification. In contrast to extracting feature vectors from the gray image directly, sketch-based expression recognition first reduces the feature vector’s dimension, which leads to a concise representation of the facial expression. Experiments show that our method is effective and convincing.
Qi Wu, Mingli Song, Jiajun Bu, Chun Chen
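The Modified Hausdorff Distance named in the abstract above (Dubuisson and Jain's variant) can be sketched directly; the point sets stand for sketch-edge coordinates, and the function names here are illustrative, not from the paper.

```python
# Sketch of the Modified Hausdorff Distance (MHD) between two point sets.
import math

def directed_mhd(a, b):
    # Average, over points in a, of the distance to the nearest point in b.
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)

def mhd(a, b):
    # Symmetric MHD: the larger of the two directed averages.
    return max(directed_mhd(a, b), directed_mhd(b, a))
```

Unlike the classical Hausdorff distance, which takes the worst-case nearest-neighbor distance, averaging makes the measure less sensitive to a few outlier edge pixels.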
Face Representation Method Using Pixel-to-Vertex Map (PVM) for 3D Model Based Face Recognition
Abstract
A 3D model based approach to face recognition has been spotlighted as a robust solution under varying pose and illumination conditions. Since a generative 3D face model consists of a large number of vertices, a 3D model based face recognition system is generally inefficient in computation time. In this paper, we propose a novel 3D face representation algorithm to reduce the number of vertices and optimize the computation time. Finally, we evaluate the performance of the proposed algorithm on a Korean face database collected using a stereo-camera based 3D face capturing device.
Taehwa Hong, Hagbae Kim, Hyeonjoon Moon, Yongguk Kim, Jongweon Lee, Seungbin Moon
Robust Head Tracking with Particles Based on Multiple Cues Fusion
Abstract
This paper presents a fully automatic and highly robust head tracking algorithm based on the latest advances in real-time multi-view face detection techniques and multiple cues fusion under a particle filter framework. Visual cues designed for the general object tracking problem hardly suffice for robust head tracking under diverse or even severe circumstances, making it necessary to utilize higher-level, object-specific information. To this end we introduce a vector-boosted multi-view face detector [2] as the “face cue”, in addition to two other general visual cues targeting the entire head: color spatiogram [3] and contour gradient. Data fusion is done by an extended particle filter that supports multiple distinct yet interrelated state vectors (referring to face and head in our tracking context). Furthermore, pose information provided by the face cue is exploited to achieve improved accuracy and efficiency in the fusion. Experiments show that our algorithm is highly robust against target position, size and pose change, as well as unfavorable conditions such as occlusion, poor illumination, and cluttered background.
Yuan Li, Haizhou Ai, Chang Huang, Shihong Lao
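At the core of any particle filter like the fusion tracker above is a resampling step that concentrates particles where the combined cue likelihoods assign high weight. A minimal systematic-resampling sketch follows; it is a generic textbook step, not the paper's extended multi-state filter.

```python
# Systematic resampling: draw n particles with probability proportional to
# weight, using a single random offset so the draws are evenly spaced.
import random

def systematic_resample(particles, weights):
    n = len(particles)
    total = sum(weights)
    step = total / n
    u = random.uniform(0.0, step)   # one random offset for all n picks
    out, cum, i = [], weights[0], 0
    for _ in range(n):
        while cum < u:              # advance until the cumulative weight
            i += 1                  # of particle i covers the pointer u
            cum += weights[i]
        out.append(particles[i])
        u += step
    return out
```

Systematic resampling has lower variance than drawing n independent samples, which matters when only a few hundred particles track a fast-moving head.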
Vision-Based Interpretation of Hand Gestures for Remote Control of a Computer Mouse
Abstract
This paper presents a vision-based interface for controlling a computer mouse via 2D and 3D hand gestures. The proposed interface builds upon our previous work that permits the detection and tracking of multiple hands that can move freely in the field of view of a potentially moving camera system. Dependable hand tracking, combined with fingertip detection, facilitates the definition of simple and, therefore, robustly interpretable vocabularies of hand gestures that are subsequently used to enable a human operator to convey control information to a computer system. Two such vocabularies are defined, implemented and validated. The first depends only on 2D hand tracking results, while the second also makes use of 3D information. As confirmed by several experiments, the proposed interface achieves accurate mouse positioning, smooth cursor movement and reliable recognition of gestures activating button events. Owing to these properties, our interface can be used as a virtual mouse for controlling any Windows application.
Antonis A. Argyros, Manolis I. A. Lourakis
Computing Emotion Awareness Through Facial Electromyography
Abstract
To improve human-computer interaction (HCI), computers need to recognize and respond properly to their user’s emotional state. This is a fundamental application of affective computing, which relates to, arises from, or deliberately influences emotion. As a first step toward a system that recognizes the emotions of individual users, this research focuses on how emotional experiences are expressed in six parameters (i.e., mean, absolute deviation, standard deviation, variance, skewness, and kurtosis) of physiological measurements of three electromyography signals: frontalis (EMG1), corrugator supercilii (EMG2), and zygomaticus major (EMG3). The 24 participants were asked to watch film scenes of 120 seconds, which they rated afterward. These ratings enabled us to distinguish four categories of emotions: negative, positive, mixed, and neutral. The skewness of EMG2 and four parameters of EMG3 discriminate between the four emotion categories, despite the coarse time windows that were used. Moreover, rapid processing of the signals proved to be possible. This enables tailored HCI facilitated by the emotional awareness of systems.
Egon L. van den Broek, Marleen H. Schut, Joyce H. D. M. Westerink, Jan van Herk, Kees Tuinenbreijer
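The six per-window statistics listed in the abstract above are standard moments and can be computed in a few lines. This sketch uses population (biased) estimators; the study's exact estimator conventions are not stated, so treat the normalization as an assumption.

```python
# Compute (mean, mean absolute deviation, standard deviation, variance,
# skewness, kurtosis) for one window of an EMG signal.
import math

def window_features(x):
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    std = math.sqrt(var)                               # assumes non-constant x
    mad = sum(abs(v - mean) for v in x) / n
    skew = sum((v - mean) ** 3 for v in x) / n / std ** 3
    kurt = sum((v - mean) ** 4 for v in x) / n / std ** 4
    return mean, mad, std, var, skew, kurt
```

Skewness and kurtosis are the standardized third and fourth moments, which is why they can capture asymmetries in muscle activity that the mean and variance miss.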
Silhouette-Based Method for Object Classification and Human Action Recognition in Video
Abstract
In this paper we present an instance-based machine learning algorithm and system for real-time object classification and human action recognition that can help to build intelligent surveillance systems. The proposed method makes use of object silhouettes to classify objects and the actions of humans present in a scene monitored by a stationary camera. An adaptive background subtraction model is used for object segmentation. A template-matching-based supervised learning method is adopted to classify objects into classes such as human, human group and vehicle, and human actions into predefined classes such as walking, boxing and kicking, by making use of object silhouettes.
Yiğithan Dedeoğlu, B. Uğur Töreyin, Uğur Güdükbay, A. Enis Çetin
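The adaptive background subtraction mentioned above, in its simplest generic form, keeps a running-average background and thresholds the per-pixel difference. This is a minimal sketch with illustrative parameter values, not the paper's specific model.

```python
# Running-average background model: pixels that differ from the background
# by more than `thresh` are marked foreground; the background then adapts
# toward the current frame at rate `alpha`.
def update_and_segment(frame, background, alpha=0.05, thresh=30):
    # frame/background: 2-D lists of grayscale values; returns (mask, new_bg)
    mask, new_bg = [], []
    for frow, brow in zip(frame, background):
        mrow, nrow = [], []
        for f, b in zip(frow, brow):
            mrow.append(1 if abs(f - b) > thresh else 0)  # foreground test
            nrow.append((1 - alpha) * b + alpha * f)      # adapt background
        mask.append(mrow)
        new_bg.append(nrow)
    return mask, new_bg
```

The foreground mask, after connected-component grouping, yields the object silhouettes that drive the template-matching classifier.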
Voice Activity Detection Using Wavelet-Based Multiresolution Spectrum and Support Vector Machines and Audio Mixing Algorithm
Abstract
This paper presents a Voice Activity Detection (VAD) algorithm and an efficient speech mixing algorithm for multimedia conferencing. The proposed VAD uses the MFCC of a wavelet-based multiresolution spectrum together with two classical audio parameters as audio features, prejudges silence by detecting the multi-gate zero-crossing ratio, and classifies noise and voice with Support Vector Machines (SVM). The new speech mixing algorithm, used in the Multipoint Control Unit (MCU) of a conference, uses the short-time power of each audio stream as the mixing weight vector and is designed for parallel processing. Various experiments show that the proposed VAD algorithm achieves better overall performance at all SNRs than the VAD of G.729b and other VADs, that the output audio of the new speech mixing algorithm has excellent perceptual quality, that its computational delay is small enough for real-time transmission, and that the MCU computation is lower than that based on the G.729b VAD.
Wei Xue, Sidan Du, Chengzhi Fang, Yingxian Ye
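A gated zero-crossing ratio of the kind the abstract's silence prejudgment relies on can be sketched as follows: a sign change counts only when the samples on both sides exceed a gate level, which suppresses crossings caused by low-level noise. The gate value and function name are illustrative, not from the paper.

```python
# Gated zero-crossing ratio of one audio frame: the fraction of adjacent
# sample pairs that change sign while both samples exceed the gate level.
def gated_zcr(frame, gate):
    crossings = 0
    for a, b in zip(frame, frame[1:]):
        if a * b < 0 and abs(a) > gate and abs(b) > gate:
            crossings += 1
    return crossings / (len(frame) - 1)
```

A "multi-gate" scheme would evaluate this ratio at several gate levels; frames whose ratios stay low at every gate can be prejudged as silence before the SVM stage.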
Action Recognition in Broadcast Tennis Video Using Optical Flow and Support Vector Machine
Abstract
Motion analysis in broadcast sports video is a challenging problem, especially for player action recognition, due to the low resolution of players in the frames. In this paper, we present a novel approach to recognize the basic player actions in broadcast tennis video where the player is about 30 pixels tall. Two research challenges, motion representation and action recognition, are addressed. A new motion descriptor, a group of histograms based on optical flow, is proposed for motion representation. The optical flow here is treated as a spatial pattern of noisy measurements instead of precise pixel displacement. To recognize the action performed by the player, a support vector machine is employed to train the classifier, where the concatenation of histograms forms the input features. Experimental results demonstrate that our method is promising when integrated with the framework of multimodal analysis in sports video.
Guangyu Zhu, Changsheng Xu, Wen Gao, Qingming Huang
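The descriptor idea above, binning per-pixel optical-flow vectors by orientation so the flow is used as a noisy spatial pattern rather than exact displacement, can be sketched as a magnitude-weighted orientation histogram. The bin count and function name are illustrative assumptions.

```python
# Magnitude-weighted orientation histogram of an optical-flow field.
import math

def flow_orientation_histogram(flow, n_bins=8):
    # flow: iterable of (dx, dy) vectors for the player region
    hist = [0.0] * n_bins
    for dx, dy in flow:
        mag = math.hypot(dx, dy)
        if mag == 0:
            continue                                  # ignore static pixels
        angle = math.atan2(dy, dx) % (2 * math.pi)    # orientation in [0, 2pi)
        hist[int(angle / (2 * math.pi) * n_bins) % n_bins] += mag
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

Magnitude weighting lets strong, consistent motion dominate each bin, so individual noisy flow vectors have little effect, which is the point of treating flow as a pattern of noisy measurements.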
FaceMouse: A Human-Computer Interface for Tetraplegic People
Abstract
This paper proposes a new human-machine interface particularly conceived for people with severe disabilities (specifically tetraplegic people) that allows them to interact with the computer in their everyday life by means of the mouse pointer. In this system, called FaceMouse, instead of the classical "pointer paradigm", which requires the user to look at the point to move to, we propose a paradigm called the "derivative paradigm", where the user does not indicate the precise position but the direction along which the mouse pointer must be moved. The proposed system is composed of a common, low-cost webcam and a set of computer vision techniques developed to identify the parts of the user’s face (the only body part that a tetraplegic person can move) and exploit them for moving the pointer. Specifically, the implemented algorithm is based on template matching to track the nose of the user and on cross-correlation to calculate the best match. Finally, several real applications of the system are described and experimental results obtained with disabled people are reported.
Emanuele Perini, Simone Soria, Andrea Prati, Rita Cucchiara
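The cross-correlation score behind the nose tracker described above, in its standard normalized form, compares a template against an equally sized image patch; the helper name and flat-list representation are illustrative.

```python
# Normalized cross-correlation (NCC) between a template and an image patch:
# +1 for a perfect linear match, -1 for an inverted one, ~0 for no relation.
import math

def ncc(patch, template):
    # patch/template: flat lists of grayscale values, same length
    n = len(patch)
    mp = sum(patch) / n
    mt = sum(template) / n
    num = sum((p - mp) * (t - mt) for p, t in zip(patch, template))
    den = math.sqrt(sum((p - mp) ** 2 for p in patch) *
                    sum((t - mt) ** 2 for t in template))
    return num / den if den else 0.0
```

A tracker evaluates this score at every candidate offset in a search window around the previous nose position and moves to the offset with the highest score; subtracting the means makes the score robust to brightness changes.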
Object Retrieval by Query with Sensibility Based on the KANSEI-Vocabulary Scale
Abstract
Recently, demand for image retrieval and recognition corresponding to KANSEI (sensibility) has been increasing, and studies aimed at establishing KANSEI-based systems have been progressing more than ever. In addition, attempts to understand, measure, evaluate, and apply KANSEI to situational design or products will be required more and more in the future. In particular, KANSEI-based image retrieval tools have been in the spotlight, and many investigators have tried using KANSEI for image retrieval. However, research in this area is still at an early stage because it is difficult to process higher-level content such as human emotion or KANSEI. To address this problem, we propose the KANSEI-Vocabulary Scale, which associates human sensibilities with shapes among visual information, and we construct an object retrieval system to evaluate the KANSEI-Vocabulary Scale by shape. In our evaluation, we are able to retrieve the object images whose shapes are most appropriate in terms of the query’s KANSEI, and the method achieves an average user-satisfaction rate of 71%.
Sunkyoung Baek, Myunggwon Hwang, Miyoung Cho, Chang Choi, Pankoo Kim
Backmatter
Metadata
Title
Computer Vision in Human-Computer Interaction
Edited by
Thomas S. Huang
Nicu Sebe
Michael S. Lew
Vladimir Pavlović
Mathias Kölsch
Aphrodite Galata
Branislav Kisačanin
Copyright year
2006
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-34203-8
Print ISBN
978-3-540-34202-1
DOI
https://doi.org/10.1007/11754336