
2010 | Book

Intelligent Video Event Analysis and Understanding

Editors: Jianguo Zhang, Ling Shao, Lei Zhang, Graeme A. Jones

Publisher: Springer Berlin Heidelberg

Book Series: Studies in Computational Intelligence


About this book

With the vast development of Internet capacity and speed, as well as the wide adoption of media technologies in people's daily life, a large amount of video has been surging and needs to be efficiently processed or organized based on interest. The human visual perception system can, without difficulty, interpret and recognize thousands of events in videos, despite high levels of video object clutter, different types of scene context, variability of motion scales, appearance changes, occlusions and object interactions. For a computer vision system, automatic video event understanding has remained very challenging for decades. Broadly speaking, those challenges include robust detection of events under motion clutter, event interpretation under complex scenes, multi-level semantic event inference, putting events in context and across multiple cameras, event inference from object interactions, etc. In recent years, steady progress has been made towards better models for video event categorisation and recognition, e.g., from modelling events with bags of spatio-temporal features to discovering event context, from detecting events using a single camera to inferring events through a distributed camera network, and from low-level event feature extraction and description to high-level semantic event classification and recognition. Nowadays, text-based video retrieval is widely used by commercial search engines. However, it is still very difficult to retrieve or categorise a specific video segment based on its content in a real multimedia system or in surveillance applications.

Table of Contents

Frontmatter
The Understanding of Meaningful Events in Gesture-Based Interaction
Abstract
Gesture-based interaction is becoming more available each day with the continuous advances in acquisition technology and recognition algorithms, as well as with the increasing availability of personal (mobile) devices, ambient media displays and interactive surfaces. Vision-based technology is the preferred choice when non-intrusive, unobtrusive and comfortable interactions are sought. However, it also comes with the additional costs of difficult (unknown) scenarios to process and less-than-perfect recognition rates. The main challenge is spotting and segmenting gestures in video media. Previous research has considered various events that specify when a gesture begins and when it ends, in conjunction with location, time, motion, posture or various other segmentation cues. Video events thus identify, specify and segment gestures. Moreover, when gestures are correctly detected and recognized by the system, with appropriate feedback delivered to the human, gestures themselves become events in the human-computer dialogue: the commands were understood and the system reacted.
This chapter addresses this double view of meaningful events: events that specify gestures, together with intelligent algorithms that detect them in video sequences; and gestures that, once recognized and accordingly interpreted by the system, become important events in the human-computer dialogue, marking the common understanding that has been established. The chapter follows this duality of events from both the system and the human perspective, contributing to the present understanding of gestures in human-computer interaction.
Radu-Daniel Vatavu
Apply GPCA to Motion Segmentation
Abstract
In this chapter, we present a motion segmentation approach based on a subspace segmentation technique, generalized PCA (GPCA). By incorporating cues from the neighborhood of intensity edges in images, motion segmentation is solved in an algebraic framework. Our main contribution is a post-processing procedure that detects the boundaries of motion layers and further determines the layer ordering. Test results on real imagery confirm the validity of our method.
Hongchuan Yu, Jian J. Zhang
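GPCA's core idea can be illustrated in a minimal setting: points drawn from a union of two lines through the origin all satisfy a single quadratic form, which is recoverable from the null space of a Veronese-embedded data matrix. The sketch below is a toy illustration only, not the chapter's full method (which handles motion subspaces, edge cues and layer ordering); function name and the non-vertical-line assumption are ours:

```python
import numpy as np

def gpca_two_lines(points):
    """Toy GPCA: points lie on a union of two (non-vertical) lines through
    the origin, so they satisfy one quadratic q(x,y) = c0 x^2 + c1 xy + c2 y^2 = 0.
    Recover q from the null space of the Veronese-embedded data matrix,
    then factor it to obtain the two line slopes."""
    x, y = points[:, 0], points[:, 1]
    V = np.column_stack([x * x, x * y, y * y])   # degree-2 Veronese embedding
    _, _, vt = np.linalg.svd(V)
    c0, c1, c2 = vt[-1]                          # null-space coefficients of q
    # On a line y = m x, q vanishes iff c2 m^2 + c1 m + c0 = 0.
    slopes = np.roots([c2, c1, c0])
    return np.sort(slopes.real)
```

Real GPCA generalizes this to n subspaces of arbitrary dimension via higher-degree Veronese maps and derivative-based factorization.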
Gait Analysis and Human Motion Tracking
Abstract
We present a strategy based on human gait to achieve efficient tracking, recovery of ego-motion and 3-D reconstruction from an image sequence acquired by a single camera attached to a pedestrian. In the first phase, the parameters of the human gait are established by a classical frame-by-frame analysis, using a generalised least squares (GLS) technique. The gait model is non-linear and represented by a truncated Fourier series. In the second phase, this gait model is employed within a "predict-correct" framework using a maximum a posteriori, expectation maximization (MAP-EM) strategy to obtain robust estimates of the ego-motion and scene structure, while continuously refining the gait model. Experiments on synthetic and real image sequences show that use of the gait model results in more efficient tracking, demonstrated by improved matching and retention of features, and a reduction in execution time when processing video sequences.
Huiyu Zhou
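The truncated-Fourier-series gait model reduces to a linear regression on sinusoidal basis functions. A simplified sketch follows, with ordinary least squares standing in for the chapter's GLS and the gait frequency `omega` assumed known; function names are illustrative:

```python
import numpy as np

def fourier_design(t, omega, n_harmonics):
    """Design matrix with columns 1, cos(k w t), sin(k w t) for k = 1..n."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols += [np.cos(k * omega * t), np.sin(k * omega * t)]
    return np.column_stack(cols)

def fit_gait_model(t, y, omega, n_harmonics=2):
    """Least-squares fit of a truncated Fourier series to gait samples y(t)."""
    A = fourier_design(t, omega, n_harmonics)
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

def predict_gait(t, coeffs, omega, n_harmonics=2):
    """Evaluate the fitted gait model at times t."""
    return fourier_design(t, omega, n_harmonics) @ coeffs
```

GLS would additionally weight the residuals by an estimated noise covariance; the linear-in-coefficients structure is the same.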
Spatio-temporal Dynamic Texture Descriptors for Human Motion Recognition
Abstract
In this chapter we apply the Local Binary Pattern on Three Orthogonal Planes (LBP-TOP) descriptor to the field of human action recognition. We modify this spatio-temporal descriptor using LBP and CS-LBP techniques combined with gradient and Gabor images. Moreover, we enhance its performance by performing the analysis on more slices located at different time intervals or at different views. A video sequence is described as a collection of spatio-temporal words after the detection of space-time interest points and the description of the area around them. Our contribution is in the description part: we show LBP-TOP to be a promising descriptor for human action classification, and we develop several modifications and extensions to the descriptor that enhance its performance in human motion recognition while remaining computationally efficient.
Riccardo Mattivi, Ling Shao
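The LBP operator underlying LBP-TOP can be sketched on a single plane: each pixel is encoded by an 8-bit pattern of threshold comparisons against its neighbours, and a histogram of the codes describes the region. In LBP-TOP the same operator is applied on the XY, XT and YT planes of a video volume and the three histograms are concatenated. A minimal single-plane version (function names are illustrative):

```python
import numpy as np

def lbp_codes(image):
    """Basic 8-neighbour LBP code for each interior pixel of a 2-D array:
    bit k is set when the k-th neighbour is >= the centre pixel."""
    c = image[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]   # clockwise from top-left
    codes = np.zeros(c.shape, dtype=int)
    H, W = image.shape
    for bit, (dy, dx) in enumerate(offsets):
        n = image[1 + dy : H - 1 + dy, 1 + dx : W - 1 + dx]
        codes |= (n >= c).astype(int) << bit
    return codes

def lbp_histogram(image):
    """Normalized 256-bin histogram of LBP codes: the region descriptor."""
    h, _ = np.histogram(lbp_codes(image), bins=256, range=(0, 256))
    return h / h.sum()
```

CS-LBP, used in the chapter, compares opposing neighbour pairs instead of neighbour-versus-centre, halving the code length.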
Efficient Object Localization with Variation-Normalized Gaussianized Vectors
Abstract
Effective object localization relies on an efficient and effective search method and on robust image representation and learning methods. Recently, the Gaussianized vector representation has been shown to be effective in several computer vision applications, such as facial age estimation, image scene categorization and video event recognition. However, all these tasks are classification or regression problems based on whole images. It has not yet been explored how this representation can be efficiently applied to object localization, which reveals the locations and sizes of objects. In this work, we present an efficient object localization approach for the Gaussianized vector representation, following the branch-and-bound search scheme introduced by Lampert et al. [5]. In particular, we design a quality bound for rectangle sets characterized by the Gaussianized vector representation, enabling fast hierarchical search. This bound can be obtained for any rectangle set in the image with little extra computational cost beyond calculating the Gaussianized vector representation for the whole image. Further, we propose a normalization approach that suppresses the variation within the object class and the background class. Experiments on a multi-scale car dataset show that the proposed object localization approach based on the Gaussianized vector representation outperforms previous work using the histogram-of-keywords representation. The within-class variation normalization approach further boosts performance. This chapter is an extended version of our paper at the 1st International Workshop on Interactive Multimedia for Consumer Electronics at ACM Multimedia 2009 [16].
Xiaodan Zhuang, Xi Zhou, Mark A. Hasegawa-Johnson, Thomas S. Huang
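The branch-and-bound scheme of Lampert et al. can be sketched with the simplest quality bound: when each pixel carries an additive score, an upper bound for a set of rectangles is the sum of positive scores over the largest rectangle in the set plus the sum of negative scores over the smallest. The toy below uses additive per-pixel scores in place of the chapter's Gaussianized-vector bound, and performs the best-first search over rectangle sets:

```python
import heapq
import numpy as np

def _integral(m):
    """Integral image with a zero top row / left column for box sums."""
    ii = np.zeros((m.shape[0] + 1, m.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(m, axis=0), axis=1)
    return ii

def _box(ii, t, b, l, r):
    """Sum over rows t..b, cols l..r inclusive; 0 for an empty box."""
    if b < t or r < l:
        return 0.0
    return ii[b + 1, r + 1] - ii[t, r + 1] - ii[b + 1, l] + ii[t, l]

def ess_search(score_map):
    """Best-first branch-and-bound (efficient subwindow search) for the
    rectangle (top, bottom, left, right) maximizing the sum of pixel scores."""
    H, W = score_map.shape
    pos = _integral(np.maximum(score_map, 0.0))
    neg = _integral(np.minimum(score_map, 0.0))

    def bound(state):
        (tl, th), (bl, bh), (ll, lh), (rl, rh) = state
        # Positives over the largest rectangle in the set, negatives over
        # the smallest: an admissible upper bound on every member's score.
        return _box(pos, tl, bh, ll, rh) + _box(neg, th, bl, lh, rl)

    init = ((0, H - 1), (0, H - 1), (0, W - 1), (0, W - 1))
    heap = [(-bound(init), init)]
    while heap:
        _, state = heapq.heappop(heap)
        widths = [hi - lo for lo, hi in state]
        if max(widths) == 0:      # all intervals are points: optimum found
            return tuple(lo for lo, _ in state)
        i = widths.index(max(widths))
        lo, hi = state[i]
        mid = (lo + hi) // 2
        for half in ((lo, mid), (mid + 1, hi)):
            s = state[:i] + (half,) + state[i + 1:]
            # Prune sets containing no valid rectangle (bottom < top etc.).
            if s[1][1] < s[0][0] or s[3][1] < s[2][0]:
                continue
            heapq.heappush(heap, (-bound(s), s))
```

Because the bound is admissible, the first fully-refined state popped from the queue is the global optimum, typically after exploring far fewer states than exhaustive sliding-window search.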
Fusion of Motion and Appearance for Robust People Detection in Cluttered Scenes
Abstract
Robust detection of people in video is critical in visual surveillance. In this work we present a framework for robust people detection in highly cluttered scenes with low-resolution image sequences. Our model utilises both human appearance and long-term motion information, fused in a Bayesian framework. In particular, we introduce a spatial pyramid Gaussian mixture approach to model variations in long-term human motion information, which is computed via improved background modeling using spatial motion constraints. Simultaneously, people's appearance is modeled by histograms of oriented gradients. Experiments demonstrate that our method significantly reduces the false positive rate compared to a state-of-the-art human detector under very challenging lighting conditions, occlusion and background clutter.
Jianguo Zhang, Shaogang Gong
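The idea of fusing appearance and motion cues in a Bayesian framework can be illustrated with a naive-Bayes combination of two per-window detector scores. This is a hedged toy, not the chapter's spatial pyramid Gaussian mixture formulation; it assumes the two scores are conditionally independent probabilities:

```python
def fuse_scores(appearance_score, motion_score, prior=0.5):
    """Naive-Bayes fusion of two conditionally independent detector scores
    (each a probability that the window contains a person) into a posterior."""
    p, q = appearance_score, motion_score
    num = prior * p * q
    den = num + (1.0 - prior) * (1.0 - p) * (1.0 - q)
    return num / den
```

Two moderately confident cues reinforce each other: fusing 0.9 with 0.9 under a flat prior yields a posterior of about 0.988, which is how fusion suppresses false positives that fire on only one cue.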
Understanding Sports Video Using Players Trajectories
Abstract
One of the main goals for novel machine learning and computer vision systems is automatic video event understanding. In this chapter, we present a content-based approach for understanding sports videos using players' trajectories. To this aim, an object-based approach for temporal analysis of videos is described, and an original hierarchical parallel semi-Markov model (HPaSMM) is proposed. In this model, a lower level captures the motions and interactions of players' trajectories using parallel hidden Markov models, while an upper level relying on semi-Markov chains describes activity phases. Such probabilistic graphical models take into account both low-level temporal causalities of trajectory features and upper-level temporal transitions between activity phases. Hence, the model provides an efficient and extensible machine learning tool for semantic-based sports video understanding applications such as segmentation, summarization and indexing. To illustrate its efficiency, the application of the proposed modeling to two sports, and the corresponding results, are reported.
Alexandre Hervieu, Patrick Bouthemy
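The hidden-Markov machinery at the lower level of such models rests on the forward algorithm, which scores an observation sequence against a model. A minimal scaled implementation for a discrete-observation HMM is sketched below; the chapter's HPaSMM adds parallel chains and semi-Markov phase durations on top of this basic recursion:

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Scaled HMM forward algorithm: log P(obs | model) for a discrete-output
    HMM with initial probs pi (S,), transitions A (S,S), emissions B (S,V)."""
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()
    loglik = np.log(scale)
    alpha = alpha / scale
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        scale = alpha.sum()        # rescale to avoid numerical underflow
        loglik += np.log(scale)
        alpha = alpha / scale
    return loglik
```

Classification then amounts to evaluating the same sequence of trajectory features under each activity model and picking the highest likelihood.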
Real-Time Face Recognition from Surveillance Video
Abstract
This chapter describes an experimental system for the recognition of human faces from surveillance video. In surveillance applications, the system must be robust to changes in illumination, scale, pose and expression. The system must also be able to perform detection and recognition rapidly in real time.
Our system detects faces using the Viola-Jones face detector, then extracts local features to build a shape-based feature vector. The feature vector is constructed from ratios of lengths and differences in tangents of angles, so as to be robust to changes in scale and to in-plane and out-of-plane rotations. Consideration was given to improving both the performance and the accuracy of the detection and recognition steps.
Michael Davis, Stefan Popov, Cristina Surlea
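The shape-based feature idea (ratios of lengths between facial landmarks) can be sketched directly: dividing all pairwise distances by a reference distance makes the vector invariant to scale and in-plane rotation. A minimal version follows; the landmark ordering and the choice of the first two points as the reference pair (e.g. the inter-ocular gap) are illustrative assumptions:

```python
import numpy as np

def ratio_features(landmarks):
    """Scale- and rotation-invariant shape vector from 2-D landmarks:
    all pairwise distances divided by one reference distance (here, the
    distance between the first two landmarks)."""
    pts = np.asarray(landmarks, dtype=float)
    n = len(pts)
    # Full pairwise distance matrix via broadcasting.
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    ref = d[0, 1]
    return np.array([d[i, j] / ref for i in range(n) for j in range(i + 1, n)])
```

Out-of-plane robustness, as in the chapter, needs the additional angle-tangent differences, since distances alone are distorted by pose.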
Event Understanding of Human-Object Interaction: Object Movement Detection via Stable Changes
Abstract
This chapter proposes an object movement detection method for household environments. The proposed method detects "object placement" and "object removal" from images captured by environment-embedded cameras. Object movement detection in household environments poses several difficulties: the method needs to detect object movements robustly even if the objects are small, and it must discriminate between objects and non-objects such as humans. In this work, we propose an object movement detection method based on detecting "stable changes": regions that have changed from the recorded state but whose change has settled. To distinguish objects from non-objects via stable changes, even though non-objects can cause long-term changes (e.g. a person sitting down), we employ the motion history of changed regions. In addition, to classify object placement versus object removal, we use a multiple-layered background model, called the layered background model, together with an edge subtraction technique. Experiments show the system can detect objects robustly and at sufficient frame rates.
Shigeyuki Odashima, Taketoshi Mori, Masamichi Shimosaka, Hiroshi Noguchi, Tomomasa Sato
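The motion-history idea used to separate settled changes from ongoing ones can be sketched as a per-pixel timestamp map: changed pixels are stamped with the current time, and a region counts as a stable change once it has been quiet for long enough. This is a toy sketch; the chapter's thresholds and layered background model are omitted, and all names are illustrative:

```python
import numpy as np

def update_motion_history(mhi, changed_mask, timestamp, duration):
    """Stamp changed pixels with the current time; forget pixels whose last
    change is older than `duration`."""
    mhi = mhi.copy()
    mhi[changed_mask] = timestamp
    mhi[(~changed_mask) & (mhi < timestamp - duration)] = 0
    return mhi

def is_stable_change(mhi, region_mask, timestamp, settle_time):
    """True if the region changed at some point but has shown no new motion
    for at least `settle_time`, i.e. the change has settled."""
    last = mhi[region_mask].max()
    return last > 0 and (timestamp - last) >= settle_time
```

A person sitting down keeps re-stamping their region and so never settles, while a placed object stamps once and then goes quiet, which is exactly the discrimination the chapter exploits.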
Survey of Dirac: A Wavelet Based Video Codec for Multiparty Video Conferencing and Broadcasting
Abstract
The aim of this book chapter is to provide a survey of the BBC Dirac video codec. The survey gives an in-depth description of the different versions of the Dirac video codec and explains its algorithms at the implementation level. The chapter is intended to help new researchers working to understand the BBC Dirac video codec, and to provide them with future directions and ideas for enhancing its features.
Compression is important because bandwidth is limited or expensive for the widespread use of multimedia content over the internet. Compression takes advantage of the limits of human perception, which cannot process all the information in perfectly reproduced pictures; we can therefore compress pictures without loss of perceived picture quality. Compression is used to exploit limited storage and transmission capacity as efficiently as possible.
The need for efficient codecs has attracted significant attention among researchers. Applications of codecs range from compressing high-resolution files to broadcasting, live video streaming, podcasting and desktop production. The requirements on a codec change depending on the type of application.
Ahtsham Ali, Nadeem A. Khan, Shahid Masud, Syed Farooq Ali
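Dirac's wavelet transform can be illustrated with its simplest relative, a one-level 2-D Haar decomposition into approximation and detail subbands. Dirac itself uses longer lifting-based filters; this Haar variant is only a sketch, with a normalization chosen so the round trip is exact:

```python
import numpy as np

def haar2d_level(img):
    """One level of the 2-D Haar wavelet transform: low-pass approximation
    (ll) plus horizontal (lh), vertical (hl) and diagonal (hh) details.
    Input height and width must be even."""
    a = img[0::2, :] + img[1::2, :]   # row-pair sums
    d = img[0::2, :] - img[1::2, :]   # row-pair differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 4.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 4.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 4.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 4.0
    return ll, lh, hl, hh

def ihaar2d_level(ll, lh, hl, hh):
    """Exact inverse of haar2d_level."""
    h, w = ll.shape
    out = np.zeros((2 * h, 2 * w))
    a0, a1 = ll + lh, ll - lh
    d0, d1 = hl + hh, hl - hh
    out[0::2, 0::2] = a0 + d0
    out[0::2, 1::2] = a1 + d1
    out[1::2, 0::2] = a0 - d0
    out[1::2, 1::2] = a1 - d1
    return out
```

Compression then comes from quantizing the detail subbands, which carry little perceptually important energy in smooth regions, and entropy-coding the result, as the chapter describes for Dirac.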
Erratum: Event Understanding of Human-Object Interaction: Object Movement Detection via Stable Changes
Abstract
The original version of this chapter unfortunately contained a mistake. The spelling of the author’s name was incorrect. The corrected name is Masamichi Shimosaka.
Shigeyuki Odashima, Taketoshi Mori, Masamichi Shimosaka, Hiroshi Noguchi, Tomomasa Sato
Backmatter
Metadata
Title
Intelligent Video Event Analysis and Understanding
Editors
Jianguo Zhang
Ling Shao
Lei Zhang
Graeme A. Jones
Copyright Year
2010
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-17554-1
Print ISBN
978-3-642-17553-4
DOI
https://doi.org/10.1007/978-3-642-17554-1
