
2012 | Book

Video Analytics for Business Intelligence

edited by: Caifeng Shan, Fatih Porikli, Tao Xiang, Shaogang Gong

Publisher: Springer Berlin Heidelberg

Book series: Studies in Computational Intelligence


About this book

Closed-circuit television (CCTV) cameras are increasingly deployed in public spaces, including retail centres and shopping malls. Intelligent video analytics aims to automatically analyze the content of massive amounts of public space video data and has been one of the most active areas of computer vision research in the last two decades. The focus of video analytics research has so far been largely on detecting alarm events and abnormal behaviours for public safety and security applications. However, CCTV installations are increasingly also being exploited for gathering and analyzing business intelligence information, in order to enhance marketing and operational efficiency. For example, in retail environments, surveillance cameras can be utilised to collect statistical information about shopping behaviour and preferences for marketing (e.g., how many people entered a shop; how many females/males or which age groups showed interest in a particular product; how long they stayed in the shop; and which paths are frequent), and to measure operational efficiency for improving customer experience. Video analytics has enormous potential for non-security-oriented commercial applications. This book presents the latest developments in video analytics for business intelligence applications. It provides both academic and commercial practitioners with an understanding of the state of the art and a resource for potential applications and successful practice.

Table of contents

Frontmatter

Computational Vision

Frontmatter
Object Detection and Tracking
Abstract
Detecting and tracking objects are among the most prevalent and challenging tasks that a surveillance system has to accomplish in order to determine meaningful events and suspicious activities, and to automatically annotate and retrieve video content. In the business intelligence setting, an object can be a face, a head, a human, a queue of people, a crowd, or a product on an assembly line. In this chapter we introduce the reader to the main trends and provide a taxonomy of popular methods, giving insight into their underlying ideas and showing their limitations, in the hope of facilitating the integration of object detection and tracking into more effective business-oriented video analytics.
Fatih Porikli, Alper Yilmaz
Auto-calibration of Non-overlapping Multi-camera CCTV Systems
Abstract
Deployment of existing vision approaches in camera networks for applications such as human tracking shows a large gap between user expectation and current results. Calibrated cameras could push these approaches closer to applicability, as physical constraints greatly complement the ill-posed acquisition process. Calibrated cameras also promise new applications, as spatial relationships among cameras and the environment capture additional information. However, convenient calibration is still a challenge in its own right. This paper presents a novel calibration framework for large networks including non-overlapping cameras. The framework relies purely on visual information coming from walking people. Since non-overlapping scenarios make point correspondences impossible, the time constancy of a person’s motion supplies the missing complementary information. The framework obtains calibrated cameras starting from single-camera calibration, thereby bringing the problem to a reduced form suitable for multi-view calibration. It extends standard bundle adjustment with a smoothness constraint to avoid the ill-posed problem arising from missing point correspondences. The stratified optimization reduces the risk of getting stuck in local minima. Experiments with synthetic and real data validate the approach.
Cristina Picus, Roman Pflugfelder, Branislav Micusik
Fast Approximate Nearest Neighbor Methods for Example-Based Video Search
Abstract
The cost of computer storage is steadily decreasing. Many terabytes of video data can be easily collected using video cameras in public places for modern surveillance applications, or stored on video sharing websites. However, the growth in CPU speeds has recently slowed to a crawl. This situation implies that while the data is being collected, it cannot be cheaply processed in time. Searching such vast collections of video data for useful information requires radically different approaches, calling for algorithms with sub-linear time complexity.
One application of search in a large data set is query-by-example. A video clip is used as a query for an algorithm to find a set of similar clips in the collection. A naive solution to such a problem would specify some sort of similarity metric and exhaustively compute this similarity between the query and all other video clips in the collection. The clips with the highest similarity values can then be returned as the answer set. However, as the number of videos in the collection grows, such computation becomes prohibitively expensive. In order to show sub-linear growth, any large-scale algorithm needs to exploit some properties of the data that do away with the need to compute explicit distances between a query point and every other point in the set. To this end, Approximate Nearest Neighbor (ANN) methods have recently become popular. These algorithms provide a trade-off between the accuracy of finding nearest neighbors and the corresponding computational complexity. As a result, searches in very large datasets can be performed very quickly, albeit at the cost of a few incorrect matches.
Most of the recent work in developing ANN methods has been done for data points that lie in a Euclidean space. However, several applications in computer vision, such as object and human activity recognition, use non-Euclidean data. State-of-the-art Euclidean ANN methods do not perform well when applied to these datasets. In this chapter, we present algorithms for performing ANN on manifolds 1) by explicitly considering the Riemannian geometry of the non-Euclidean manifold and 2) by taking advantage of the kernel trick in non-Euclidean spaces where performing Riemannian computations is expensive. For a data set with N samples, the proposed methods are able to retrieve similar objects with a time complexity as low as O(K), where K ≪ N. We test and evaluate our methods on both synthetic and real datasets and obtain better than state-of-the-art results.
Rizwan Chaudhry, Yuri Ivanov
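The naive query-by-example baseline described in the abstract above can be sketched as follows. This is only an illustration of the exhaustive O(N) search that ANN methods aim to avoid, not the chapter's method. The function names, the toy feature vectors, and the use of Euclidean distance are assumptions made for the example; the chapter itself targets non-Euclidean data.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exhaustive_search(query, collection, k=2):
    """Return the ids of the k clips closest to the query.

    Every clip in the collection is compared with the query, so the cost
    grows linearly with the number of clips.
    """
    ranked = sorted(collection.items(), key=lambda item: euclidean(query, item[1]))
    return [clip_id for clip_id, _ in ranked[:k]]


# Hypothetical clip descriptors: one small feature vector per clip.
clips = {
    "clip_a": [0.0, 0.0],
    "clip_b": [1.0, 1.0],
    "clip_c": [5.0, 5.0],
}

print(exhaustive_search([0.9, 1.2], clips, k=2))
```

Because each query touches every stored clip, doubling the collection doubles the work per query, which is the scaling problem the abstract refers to.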

Demographics

Frontmatter
Human Age Estimation and Sex Classification
Abstract
Collecting demographic information about customers, such as age and sex, is very important for marketing and customer group analysis. For instance, a marketing study may want to know how many people visited a shopping mall and how the customers are distributed, such as how many are male or female and how many are young, adult, or senior. Instead of hiring human workers to observe the customers, a computational system might be developed to analyze people who appear in images and videos captured by cameras installed in a shopping mall, and then gather the demographic information. To develop a real system for age estimation and sex classification, many essential issues have to be addressed. In this chapter, a detailed introduction to computational approaches to human age estimation and sex classification is given. Various methods for feature extraction and learning are described. Major challenges and future research directions are also discussed. The goal is to inspire new research and encourage deeper investigation towards developing a working system for business intelligence.
Guodong Guo
People Counter: Counting of Mostly Static People in Indoor Conditions
Abstract
Counting people from video is a challenging problem. The scientific challenge arises from the fact that although the task is relatively well-defined, the imaging scenario is not well constrained. The background scene can be uncontrolled, and the illumination complex and varying. Additionally, the spatial and temporal image resolution is usually poor. Most work on people counting addresses counting pedestrians from single frames in outdoor settings, or moving subjects in indoor settings from standard frame rate video. Little work has been done on counting persons in varying poses who are mostly static (sitting, lying down), in very low frame rate video (4 frames per minute), and under harsh illumination variations. In this chapter, we explore a design that handles illumination issues at the pixel level using photometry-based normalization, and pose and low-movement issues at the feature level by exploiting the spatio-temporal coherence that is present among small body part movements. The motion of each body part, such as the hands or the head, will be present even in mostly static poses. These short-duration motions will occur spatially close together over the image location occupied by the subject. We accumulate these using a spatio-temporal autoregressive (AR) model to arrive at blob representations that are further grouped into people counts. We show quantitative performance on real datasets.
Amit Khemlani, Kester Duncan, Sudeep Sarkar
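To illustrate the idea of accumulating short, spatially clustered motions over time, here is a minimal sketch. The abstract does not specify the chapter's spatio-temporal AR model, so this is an assumption: a temporal-only, first-order recursion applied per pixel, where `decay`, `accumulate_motion` and the tiny 2x2 frames are all hypothetical.

```python
def accumulate_motion(motion_frames, decay=0.5):
    """Accumulate per-pixel motion with a first-order recursion.

    acc_t = decay * acc_{t-1} + motion_t  (assumed form, temporal part only).
    Pixels that move repeatedly build up a high value, while isolated
    motion fades away over the following frames.
    """
    acc = None
    for frame in motion_frames:
        if acc is None:
            acc = [[0.0] * len(row) for row in frame]
        acc = [
            [decay * a + m for a, m in zip(acc_row, row)]
            for acc_row, row in zip(acc, frame)
        ]
    return acc


# Hypothetical binary motion masks for three consecutive low-rate frames.
frames = [
    [[1, 0], [0, 0]],
    [[1, 0], [0, 1]],
    [[0, 0], [0, 0]],
]

accumulated = accumulate_motion(frames, decay=0.5)
# Thresholding the accumulated map gives a crude blob mask (threshold is an assumption).
blob_mask = [[value >= 0.6 for value in row] for row in accumulated]
print(accumulated, blob_mask)
```

In this toy input the top-left pixel moves in two consecutive frames and survives the threshold, whereas the pixel that moved only once does not.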
Scene Invariant Crowd Counting and Crowd Occupancy Analysis
Abstract
In public places, crowd size may be an indicator of congestion, delay, instability, or of abnormal events, such as a fight, riot or emergency. Crowd related information can also provide important business intelligence such as the distribution of people throughout spaces, throughput rates, and local densities. A major drawback of many crowd counting approaches is their reliance on large numbers of holistic features, their need for hundreds or thousands of training frames per camera, and the requirement that each camera be trained separately. This makes deployment in large multi-camera environments such as shopping centres very costly and difficult. In this chapter, we present a novel scene-invariant crowd counting algorithm that uses local features to monitor crowd size. The use of local features allows the proposed algorithm to calculate local occupancy statistics, scale to conditions which are unseen in the training data, and be trained on significantly less data. Scene invariance is achieved through the use of camera calibration, allowing the system to be trained on one or more viewpoints and then deployed on any number of new cameras for testing without further training. A pre-trained system could then be used as a turn-key solution for crowd counting across a wide range of environments, eliminating many of the costly barriers to deployment which currently exist.
David Ryan, Simon Denman, Sridha Sridharan, Clinton Fookes
Identifying Customer Behaviour and Dwell Time Using Soft Biometrics
Abstract
In a commercial environment, it is advantageous to know how long it takes customers to move between different regions, how long they spend in each region, and where they are likely to go as they move from one location to another. Presently, these measures can only be determined manually, or through the use of hardware tags (e.g. RFID). Soft biometrics are characteristics that can be used to describe, but not uniquely identify, an individual. They include traits such as height, weight, gender, hair, skin and clothing colour. Unlike traditional biometrics, soft biometrics can be acquired by surveillance cameras at range without any user cooperation. While these traits cannot provide robust authentication, they can be used to provide identification at long range, and aid in object tracking and detection in disjoint camera networks. In this chapter we propose using colour, height and luggage soft biometrics to determine operational statistics relating to how people move through a space. A novel average soft biometric is used to locate people who look distinct, and these people are then detected at various locations within a disjoint camera network to gradually obtain operational statistics.
Simon Denman, Alina Bialkowski, Clinton Fookes, Sridha Sridharan

Behaviour Analysis

Frontmatter
Automatic Activity Profile Generation from Detected Functional Regions for Video Scene Analysis
Abstract
The potential applications of video surveillance to the Business Intelligence domain continue to grow. For example, automatic computer vision algorithms can provide a fast, efficient process to screen hundreds of hours of video for activity patterns that potentially impact the business. Two such algorithms and their variants are discussed in this chapter. These algorithms analyze surveillance video in order to automatically recognize various functional elements, such as: walkways, roadways, parking-spots, and doorways, through their interactions with pedestrian and vehicle detections. The recognized functional element regions provide a means of capturing statistics related to particular businesses. For example, the owner may be interested in the number of people that enter or exit their business versus the number of people that walk past. Results are shown on functional element recognition and business related activity profiles that demonstrate the effectiveness of these algorithms. Experiments are performed using webcam video of a downtown main street in Ocean City NJ, and surveillance video from the CAVIAR shopping center dataset.
Eran Swears, Matthew Turek, Roderic Collins, A. G. Amitha Perera, Anthony Hoogs
Analyzing Groups: A Social Signaling Perspective
Abstract
This chapter introduces some basic methods for dealing with groups of people in surveillance settings. Recently, modeling groups has become a very active trend among video surveillance researchers. Our approach belongs to the recently established field of social signaling, since it embeds notions of social psychology into computer vision techniques, offering a novel research perspective for the video surveillance community. In particular, we present methods to discover and track groups of people, and to infer the focus of attention of each person, that is, we estimate the portion of a scene that is frequently observed by people. Each method we present is evaluated in an experimental section on real scenarios, which gives a clear idea of its performance and potential.
Loris Bazzani, Marco Cristani, Giulia Paggetti, Diego Tosato, Gloria Menegaz, Vittorio Murino

Systems

Frontmatter
Video Analytics for Business Intelligence
Abstract
This chapter focuses on various algorithms and techniques in video analytics that can be applied to the business intelligence domain. The goal is to provide the reader with an overview of the state-of-the-art approaches in the field of video analytics, and also to describe the various applications where these technologies can be applied. We describe existing algorithms for extraction and processing of target and scene information, multi-sensor cross-camera analysis, inferencing of simple, complex and abnormal video events, data mining, image search and retrieval, intuitive UIs for efficient customer experience, and text summarization of visual data. We also present evaluation results for each of these technology components using in-house and other publicly available datasets.
Asaad Hakeem, Himaanshu Gupta, Atul Kanaujia, Tae Eun Choe, Kiran Gunda, Andrew Scanlon, Li Yu, Zhong Zhang, Peter Venetianer, Zeeshan Rasheed, Niels Haering
Design and Validation of a System for People Queue Statistics Estimation
Abstract
Estimating statistics of people queues is an important problem for many businesses. Monitoring statistics like average wait time, average service time and queue length helps businesses enhance service efficiency, improve customer satisfaction and increase revenue. There is thus a need to design systems that can automatically monitor these statistics. Systems that use video content analytics on imagery acquired by surveillance cameras are ideally suited for such a monitoring task. This chapter presents the systematic design of a general solution for automated visual queue statistics estimation and its validation from surveillance video. Such a design involves the careful consideration of multiple variables such as queue geometry, service-counter type, illumination dynamics, camera viewpoints, people appearances etc. We address these variabilities via a suite of algorithms designed to work across a range of queuing scenarios. We discuss factors involved in the systematic validation of such a system so that realistic performance assessment over a wide range of operating conditions can be ensured. We address validation, evaluation parameters and deployment considerations for this system and demonstrate the performance of the proposed solution.
Vasu Parameswaran, Vinay Shet, Visvanathan Ramesh
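The three statistics named in the abstract above (average wait time, average service time and queue length) can be computed from per-person timestamps roughly as follows. This sketch assumes that a vision front end has already produced, for each person, the time they joined the queue and the times their service started and ended; the `Visit` record, the function names and the sample values are hypothetical and not taken from the chapter.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    joined: float         # time the person joined the queue, in seconds (assumed input)
    service_start: float  # time the person reached the counter
    service_end: float    # time the person left the counter


def queue_statistics(visits):
    """Return (average wait time, average service time) in seconds."""
    if not visits:
        raise ValueError("at least one visit is required")
    n = len(visits)
    avg_wait = sum(v.service_start - v.joined for v in visits) / n
    avg_service = sum(v.service_end - v.service_start for v in visits) / n
    return avg_wait, avg_service


def queue_length_at(visits, t):
    """Number of people waiting (joined but not yet being served) at time t."""
    return sum(1 for v in visits if v.joined <= t < v.service_start)


# Hypothetical observations for three customers at a single counter.
visits = [
    Visit(joined=0.0, service_start=10.0, service_end=40.0),
    Visit(joined=5.0, service_start=40.0, service_end=70.0),
    Visit(joined=25.0, service_start=70.0, service_end=100.0),
]

print(queue_statistics(visits), queue_length_at(visits, 30.0))
```

Keeping the statistics separate from the detection step means the same calculation applies regardless of queue geometry or camera viewpoint, as long as the timestamps can be extracted.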
Backmatter
Metadata
Title
Video Analytics for Business Intelligence
edited by
Caifeng Shan
Fatih Porikli
Tao Xiang
Shaogang Gong
Copyright year
2012
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-28598-1
Print ISBN
978-3-642-28597-4
DOI
https://doi.org/10.1007/978-3-642-28598-1