
2011 | Book

Computer Vision Systems

8th International Conference, ICVS 2011, Sophia Antipolis, France, September 20-22, 2011. Proceedings

Edited by: James L. Crowley, Bruce A. Draper, Monique Thonnat

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed proceedings of the 8th International Conference on Computer Vision Systems, ICVS 2011, held in Sophia Antipolis, France, in September 2011. The 22 revised papers presented were carefully reviewed and selected from 58 submissions. The papers are organized in topical sections on vision systems, control of perception, performance evaluation, activity recognition, and knowledge directed vision.

Table of Contents

Frontmatter

Vision Systems

Knowing What Happened - Automatic Documentation of Image Analysis Processes
Abstract
Proper archiving and the later reconstruction and verification of results in data analysis require thorough logging of all manipulative actions on the data and the corresponding parameter settings. Unfortunately, such documentation tasks often force extensive and error-prone manual activities on the user. To overcome these problems we present Alida, an approach for the fully automatic documentation of data analysis procedures. Based on a unified operator interface, all operations on the data, including their sequence and configurations, are registered during analysis. Subsequently these data are made explicit in XML graph representations, yielding a suitable basis for visual and analytic inspection. As an example of the application of Alida in practice we present MiToBo, a toolbox for image analysis implemented on the basis of Alida, demonstrating the advantages of automatic documentation for image analysis procedures.
Birgit Möller, Oliver Greß, Stefan Posch
Efficient Use of Geometric Constraints for Sliding-Window Object Detection in Video
Abstract
We systematically investigate how geometric constraints can be used for efficient sliding-window object detection. Starting with a general characterization of the space of sliding-window locations that correspond to geometrically valid object detections, we derive a general algorithm for incorporating ground plane constraints directly into the detector computation. Our approach is indifferent to the choice of detection algorithm and can be applied in a wide range of scenarios. In particular, it makes it possible to effortlessly combine multiple different detectors and to automatically compute regions-of-interest for each of them. We demonstrate its potential in a fast CUDA implementation of the HOG detector and show that our algorithm enables a factor 2-4 speed improvement on top of all other optimizations.
Patrick Sudowe, Bastian Leibe
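
The ground-plane constraint the abstract exploits can be made concrete with a small sketch. Assuming a pinhole camera with zero roll, a known camera height, and a known horizon row, the admissible pixel height of an upright object standing on the ground plane is a linear function of its foot row, so a detector can skip every window that violates this range. This is only an illustration of the idea under those assumptions (the pedestrian height range and all names are invented here), not the paper's actual derivation or implementation.

```python
def valid_window_heights(y_foot, cam_height, horizon_y, obj_height=(1.5, 2.1)):
    """Range of pixel heights consistent with an upright object standing on
    the ground plane, for a window whose bottom edge lies at image row y_foot.

    Pinhole camera with zero roll: the distance to the object is
    Z = f * cam_height / (y_foot - horizon_y), and an object of real height H
    projects to f * H / Z pixels, so the focal length f cancels out:
        h = H / cam_height * (y_foot - horizon_y)
    obj_height is an assumed min/max real object height in metres.
    """
    dy = y_foot - horizon_y
    if dy <= 0:  # window bottom at or above the horizon: no valid detection
        return None
    return (obj_height[0] / cam_height * dy,
            obj_height[1] / cam_height * dy)

# A sliding-window detector can then skip every (position, scale) pair whose
# window height falls outside the returned range for its bottom row.
```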
A Method for Asteroids 3D Surface Reconstruction from Close Approach Distances
Abstract
We present a procedure for the 3D surface reconstruction of asteroids from images acquired at close approach distances. Unlike other 3D reconstruction scenarios based on spacecraft images, a close flyby gives the chance to move around the asteroid and thus acquire images from different viewpoints with a larger baseline. The chance to gather more information about the asteroid's surface is, however, paid for by the loss of correspondences between images due to the larger baseline. In this paper we present a procedure used to reconstruct the 3D surface of the asteroid 21 Lutetia, encountered by the Rosetta spacecraft on July 10th, 2010 at a closest approach distance of 3170 km. It was possible to reconstruct a wide surface even when dealing with a high ratio of missing data in the measurements. Results show the reconstructed 3D surface of the asteroid as a sparse 3D mesh.
Luca Baglivo, Alessio Del Bue, Massimo Lunardelli, Francesco Setti, Vittorio Murino, Mariolino De Cecco
RT-SLAM: A Generic and Real-Time Visual SLAM Implementation
Abstract
This article presents a new open-source C++ implementation to solve the SLAM problem, focused on genericity, versatility and high execution speed. It is based on an original object-oriented architecture that allows the combination of numerous sensor and landmark types and the integration of various approaches proposed in the literature. The system's capabilities are illustrated by the presentation of an inertial/vision SLAM approach, for which several improvements over existing methods have been introduced, and which copes with very dynamic motions. Results with a hand-held camera are presented.
Cyril Roussillon, Aurélien Gonzalez, Joan Solà, Jean-Marie Codol, Nicolas Mansard, Simon Lacroix, Michel Devy
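
RT-SLAM itself is a full C++ architecture; as background for readers unfamiliar with filtering-based SLAM, here is a minimal Python sketch of the generic EKF predict/update cycle on which such systems are typically built. The motion model f, observation model h, their Jacobians, and the noise covariances Q and R are all supplied by the caller; nothing here is specific to RT-SLAM.

```python
import numpy as np

def ekf_predict(x, P, f, F_jac, Q):
    """EKF prediction: propagate state x and covariance P through motion model f."""
    x_pred = f(x)
    F = F_jac(x)                      # Jacobian of f at x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def ekf_update(x, P, z, h, H_jac, R):
    """EKF correction with measurement z (e.g. a landmark observation)."""
    H = H_jac(x)                      # Jacobian of h at x
    y = z - h(x)                      # innovation
    S = H @ P @ H.T + R               # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```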

Performance Evaluation (I)

A Quantitative Comparison of Speed and Reliability for Log-Polar Mapping Techniques
Abstract
A space-variant representation of images is of great importance for active vision systems capable of interacting with the environment. Precise processing of the visual signal is achieved in the fovea while, at the same time, coarse computation in the periphery provides enough information to detect new salient regions on which to bring the focus of attention. In this work, different techniques for implementing the blind-spot model of the log-polar mapping are quantitatively analyzed to assess the visual quality of the transformed images and to evaluate the associated computational load. The technique with the best trade-off between these two aspects is expected to show the most efficient behaviour in robotic vision systems, where execution time and the reliability of the visual information are crucial.
Manuela Chessa, Silvio P. Sabatini, Fabio Solari, Fabio Tatti
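
For orientation, a nearest-neighbour implementation of the blind-spot log-polar model, presumably the cheapest of the technique families such a comparison covers, can be sketched in a few lines. All parameter values below are illustrative, not the paper's settings.

```python
import numpy as np

def logpolar_map(img, r0=5.0, n_rings=64, n_sectors=96):
    """Nearest-neighbour log-polar transform (blind-spot model sketch).

    Pixels closer to the centre than r0 (the 'blind spot') are discarded;
    the ring radius grows geometrically with ring index, so the fovea is
    sampled densely and the periphery coarsely.
    """
    h, w = img.shape[:2]
    cy, cx = h / 2.0, w / 2.0
    r_max = min(cy, cx)
    a = (r_max / r0) ** (1.0 / n_rings)          # radial growth factor
    rho = np.arange(n_rings) + 0.5               # ring indices (centres)
    theta = (np.arange(n_sectors) + 0.5) * 2 * np.pi / n_sectors
    r = r0 * a ** rho                            # ring radii
    xs = (cx + np.outer(r, np.cos(theta))).round().astype(int)
    ys = (cy + np.outer(r, np.sin(theta))).round().astype(int)
    xs = np.clip(xs, 0, w - 1)
    ys = np.clip(ys, 0, h - 1)
    return img[ys, xs]                           # (n_rings, n_sectors) image
```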
Toward Accurate Feature Detectors Performance Evaluation
Abstract
The quality of interest point detectors is crucial for many computer vision applications. One of the frequently used integral methods for comparing detectors' performance is the repeatability score. In this work we analyze the existing approach to repeatability score calculation and highlight some important weaknesses and drawbacks of this method. We then propose a set of criteria for a more accurate integral measure of detector performance and introduce a modified repeatability score calculation. We also provide illustrative examples to highlight the benefits of the proposed method.
Pavel Smirnov, Piotr Semenov, Alexander Redkin, Anthony Chun
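
As context for the criticism the abstract develops, a baseline repeatability score in the usual style can be sketched as follows. Note this is the conventional formulation whose weaknesses the authors analyze (for example, nothing stops several projected points from matching the same detection), not their modified score; the tolerance eps and the homography-based setup are illustrative assumptions.

```python
import numpy as np

def repeatability(pts_a, pts_b, H, eps=1.5):
    """Baseline repeatability score sketch (point-to-point variant).

    pts_a, pts_b : (N, 2) and (M, 2) keypoint locations in images A and B.
    H            : 3x3 homography mapping A's coordinates into B.
    Returns the fraction of A's points that reappear within eps pixels
    of some detection in B.
    """
    ones = np.ones((len(pts_a), 1))
    proj = (H @ np.hstack([pts_a, ones]).T).T
    proj = proj[:, :2] / proj[:, 2:3]            # dehomogenise
    # distance from every projected point to its nearest detection in B
    d = np.linalg.norm(proj[:, None, :] - pts_b[None, :, :], axis=2)
    repeated = (d.min(axis=1) <= eps).sum()
    return repeated / max(min(len(pts_a), len(pts_b)), 1)
```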
Evaluation of Local Descriptors for Action Recognition in Videos
Abstract
Recently, local descriptors have drawn a lot of attention as a representation method for action recognition. They are able to capture both appearance and motion, they are robust to viewpoint and scale changes, and they are easy to implement and quick to calculate. Moreover, they have been shown to obtain good performance for action classification in videos. Over the last years, many different local spatio-temporal descriptors have been proposed, but they are usually tested on different datasets and with different experimental methods; moreover, experiments are often done under assumptions that do not allow descriptors to be fully evaluated. In this paper, we present a full evaluation of local spatio-temporal descriptors for action recognition in videos. Four descriptors widely used in state-of-the-art approaches and four video datasets were chosen: HOG, HOF, HOG-HOF and HOG3D were tested under a framework based on the bag-of-words model and Support Vector Machines.
Piotr Bilinski, Francois Bremond
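
The evaluation framework named at the end of the abstract, bag-of-words plus SVM, follows a standard recipe that can be sketched roughly as below. Descriptor extraction (HOG, HOF, HOG-HOF, HOG3D) is assumed to happen elsewhere; the vocabulary size and SVM kernel here are placeholders, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def build_vocabulary(train_descriptor_sets, k=4000):
    """Cluster pooled local spatio-temporal descriptors into k visual words."""
    return KMeans(n_clusters=k, n_init=1).fit(np.vstack(train_descriptor_sets))

def bow_histogram(vocab, descriptors):
    """Quantise one video's descriptors and return a normalised word histogram."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Usage sketch, with train_videos a list of (descriptor_matrix, label) pairs:
#   vocab = build_vocabulary([d for d, _ in train_videos])
#   X = [bow_histogram(vocab, d) for d, _ in train_videos]
#   clf = SVC(kernel="rbf").fit(X, [y for _, y in train_videos])
```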

Performance Evaluation (II)

On the Spatial Extents of SIFT Descriptors for Visual Concept Detection
Abstract
State-of-the-art systems for visual concept detection typically rely on the Bag-of-Visual-Words representation. While several aspects of this representation have been investigated, such as the keypoint sampling strategy, vocabulary size, projection method, weighting scheme or the integration of color, the impact of the spatial extents of local SIFT descriptors has not been studied in previous work. In this paper, the effect of different spatial extents in a state-of-the-art system for visual concept detection is investigated. Based on the observation that SIFT descriptors with different spatial extents yield large performance differences, we propose a concept detection system that combines feature representations for different spatial extents using multiple kernel learning. It is shown experimentally on a large set of 101 concepts from the Mediamill Challenge and on the PASCAL Visual Object Classes Challenge that these feature representations are complementary: superior performance can be achieved on both test sets using the proposed system.
Markus Mühling, Ralph Ewerth, Bernd Freisleben
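
The combination step can be illustrated schematically: with one precomputed Gram matrix per spatial extent of the SIFT descriptor, a classifier is trained on a weighted sum of the matrices. Multiple kernel learning, as used in the paper, learns the weights; the uniform weighting below is only a stand-in to show the mechanics.

```python
import numpy as np
from sklearn.svm import SVC

def combine_kernels(kernel_list, weights=None):
    """Convex combination of precomputed Gram matrices, one per spatial
    extent. Uniform weights stand in for those an MKL solver would learn."""
    if weights is None:
        weights = np.full(len(kernel_list), 1.0 / len(kernel_list))
    return sum(w * K for w, K in zip(weights, kernel_list))

# Usage sketch, K_sigma* being train-vs-train Gram matrices per extent:
#   K_train = combine_kernels([K_sigma1, K_sigma2, K_sigma3])
#   clf = SVC(kernel="precomputed").fit(K_train, y_train)
# (Prediction needs the test-vs-train Gram matrices combined the same way.)
```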
An Experimental Framework for Evaluating PTZ Tracking Algorithms
Abstract
PTZ (Pan-Tilt-Zoom) cameras are powerful devices in video surveillance applications because they offer both wide area coverage and highly detailed images in a single device. Tracking with a PTZ camera is a closed-loop procedure that involves computer vision algorithms and control strategies, both crucial in developing an effective working system. In this work, we propose a novel experimental framework in which image tracking algorithms can be evaluated in controlled and repeatable scenarios, combining the PTZ camera with a calibrated projector screen on which different tracking situations can be played back. We applied this setup to compare two different tracking algorithms, a kernel-based (mean-shift) tracker and a particle filter, each suitably tuned to fit a PTZ camera. As shown in the experiments, our system allows the pros and cons of each algorithm to be investigated in detail.
Pietro Salvagnini, Marco Cristani, Alessio Del Bue, Vittorio Murino
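
Of the two trackers compared, the particle filter is easy to sketch generically. The appearance likelihood (for example a colour-histogram similarity against the target model) is abstracted into a callable; the random-walk motion model and its noise level are illustrative choices, not the tuning the paper applies for PTZ control.

```python
import numpy as np

def particle_filter_step(particles, weights, likelihood, motion_std=5.0):
    """One predict-weigh-resample cycle of a bootstrap particle filter
    tracking a 2D target position.

    particles  : (N, 2) candidate target positions
    likelihood : callable mapping an (N, 2) array of positions to
                 per-particle observation likelihoods
    """
    n = len(particles)
    # predict: random-walk motion model
    particles = particles + np.random.normal(0.0, motion_std, particles.shape)
    # weigh by how well each candidate matches the appearance model
    weights = weights * likelihood(particles) + 1e-12   # guard against all-zero
    weights /= weights.sum()
    # resample to concentrate particles on high-likelihood regions
    idx = np.random.choice(n, size=n, p=weights)
    return particles[idx], np.full(n, 1.0 / n)
```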

Activity Recognition

Unsupervised Activity Extraction on Long-Term Video Recordings Employing Soft Computing Relations
Abstract
In this work we present a novel approach for activity extraction and knowledge discovery from video employing fuzzy relations. Spatial and temporal properties of detected mobile objects are modeled with fuzzy relations, which can then be aggregated employing typical soft-computing algebra. A clustering algorithm based on the transitive closure calculation of the fuzzy relations makes it possible to find spatio-temporal patterns of activity. We present results obtained on videos of different apron monitoring sequences at the Toulouse airport in France.
Luis Patino, Murray Evans, James Ferryman, François Bremond, Monique Thonnat
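
The clustering step rests on the max-min transitive closure of a fuzzy similarity relation, which has a compact NumPy formulation. The sketch below shows only that core operation under the standard max-min algebra; how the relations are built from object trajectories is the paper's contribution and is not reproduced here.

```python
import numpy as np

def maxmin_composition(R, S):
    """(R o S)[i, j] = max_k min(R[i, k], S[k, j])."""
    return np.max(np.minimum(R[:, :, None], S[None, :, :]), axis=1)

def transitive_closure(R, max_iter=100):
    """Max-min transitive closure of a fuzzy relation R (n x n, values in [0,1]).

    Iterates R <- max(R, R o R) until a fixpoint. Thresholding the closure
    at any level alpha then yields crisp clusters of mutually related items.
    """
    for _ in range(max_iter):
        R_next = np.maximum(R, maxmin_composition(R, R))
        if np.allclose(R_next, R):
            return R_next
        R = R_next
    return R
```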
Unsupervised Discovery, Modeling, and Analysis of Long Term Activities
Abstract
This work proposes a complete framework for human activity discovery, modeling, and recognition using videos. The framework uses trajectory information as input and goes all the way up to video interpretation. The work reduces the gap between low-level visual information and semantic interpretation by building an intermediate layer composed of Primitive Events. The proposed representation for primitive events aims at capturing meaningful motions (actions) over the scene, with the advantage of being learned in an unsupervised manner. We propose the use of Primitive Events as descriptors to discover, model, and recognize activities automatically. Activity discovery is performed using only real tracking data, semantics are added to the discovered activities (e.g., "Preparing Meal", "Eating"), and the recognition of activities is performed on new datasets.
Guido Pusiol, Francois Bremond, Monique Thonnat
Ontology-Based Realtime Activity Monitoring Using Beam Search
Abstract
In this contribution we present a realtime activity monitoring system, called SCENIOR (SCEne Interpretation with Ontology-based Rules), with several innovative features. Activity concepts are defined in an ontology using OWL, extended by SWRL rules for the temporal structure, and are automatically transformed into a high-level scene interpretation system based on JESS rules. Interpretation goals are transformed into hierarchical hypothesis structures associated with constraints and embedded in a probabilistic scene model. The incremental interpretation process is organised as a Beam Search with multiple parallel interpretation threads. At each step, a context-dependent probabilistic rating is computed for each partial interpretation, reflecting the probability of that interpretation reaching completion. Low-rated threads are discarded depending on the beam width. Fully instantiated hypotheses may be used as input for higher-level hypotheses, thus realising a doubly hierarchical recognition process. Missing evidence may be "hallucinated" depending on the context. The system has been evaluated with real-life data of aircraft service activities.
Wilfried Bohlken, Bernd Neumann, Lothar Hotz, Patrick Koopmann
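
Stripped of the ontology, rule engine and parallel threads, the search discipline itself is the classic beam search, sketched below. The expand and score callables stand in for hypothesis extension by new evidence and for the context-dependent probabilistic rating described in the abstract; the beam width and step count are arbitrary here.

```python
import heapq

def beam_search(initial, expand, score, beam_width=10, steps=50):
    """Generic beam search over partial scene interpretations.

    expand(h) : iterable of successor hypotheses of h (e.g. h extended
                by one newly matched piece of evidence)
    score(h)  : rating of a partial interpretation, interpreted here as
                the probability that h can still reach completion
    """
    beam = [initial]
    for _ in range(steps):
        candidates = [h2 for h in beam for h2 in expand(h)]
        if not candidates:
            break
        # keep only the beam_width best-rated partial interpretations
        beam = heapq.nlargest(beam_width, candidates, key=score)
    return max(beam, key=score)
```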
Probabilistic Recognition of Complex Event
Abstract
This paper describes a complex event recognition approach that uses probabilistic reasoning to handle uncertainty. The first advantage of the proposed approach is the flexibility of the modeling of composite events with complex temporal constraints. The second advantage is the use of probability theory, which provides a consistent framework for dealing with uncertain knowledge in the recognition of complex events. The experimental results show that our system can successfully improve the event recognition rate. We conclude by comparing our algorithm with the state of the art and by showing how the definition of event models and the probabilistic reasoning influence the results of real-time event recognition.
Rim Romdhane, Bernard Boulay, Francois Bremond, Monique Thonnat

Control of Perception (I)

Learning What Matters: Combining Probabilistic Models of 2D and 3D Saliency Cues
Abstract
In this paper we address the problem of obtaining meaningful saliency measures that tie in coherently with other methods and modalities within larger robotic systems. We learn probabilistic models of various saliency cues from labeled training data and fuse these into probability maps which, while appearing qualitatively similar to traditional saliency maps, represent actual probabilities of detecting salient features. We show that these maps are better suited to picking up task-relevant structures in robotic applications. Moreover, having true probabilities rather than arbitrarily scaled saliency measures allows for deeper, semantically meaningful integration with other parts of the overall system.
Ekaterina Potapova, Michael Zillich, Markus Vincze
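
One simple way to fuse per-cue probability maps into a single probability map, assuming conditionally independent cues, is naive-Bayes fusion in the log-odds domain, sketched below. This is a generic construction consistent with the abstract's description, not necessarily the fusion rule the authors use.

```python
import numpy as np

def fuse_probability_maps(maps, prior=0.5, eps=1e-6):
    """Naive-Bayes fusion of per-cue saliency probability maps.

    maps : list of HxW arrays, each holding P(salient | cue_i) per pixel,
           e.g. from per-cue models learned on labelled training data.
    With independent cues, the posterior log-odds is the sum of per-cue
    log-odds minus (n-1) copies of the prior log-odds.
    """
    prior_logit = np.log(prior / (1.0 - prior))
    logit = -prior_logit * (len(maps) - 1)   # avoid double-counting the prior
    for p in maps:
        p = np.clip(p, eps, 1.0 - eps)
        logit = logit + np.log(p / (1.0 - p))
    return 1.0 / (1.0 + np.exp(-logit))      # back to probabilities
```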
3D Saliency for Abnormal Motion Selection: The Role of the Depth Map
Abstract
This paper deals with the selection of relevant motion within a scene. The proposed method is based on the extraction of 3D features and the quantification of their rarity to compute bottom-up saliency maps. We show that using 3D motion features, namely motion direction and velocity, achieves much better results than the same algorithm using only 2D information. This is especially true in close-range scenes with small groups of people or moving objects and a frontal view. The proposed algorithm uses motion features, but it can easily be generalized to other dynamic or static features. It is implemented on a platform for real-time signal analysis called Max/MSP/Jitter. Social signal processing, video games, gesture processing and, in general, higher-level scene understanding can benefit from this method.
Nicolas Riche, Matei Mancas, Bernard Gosselin, Thierry Dutoit
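
The rarity quantification underlying this kind of bottom-up saliency can be illustrated for a single feature channel: estimate the feature's distribution over the frame and score each pixel by the self-information of its value. The paper operates on 3D motion direction and velocity; the single-channel histogram version below is just the simplest instance of the idea.

```python
import numpy as np

def rarity_saliency(feature_map, n_bins=16):
    """Bottom-up saliency as self-information of a feature's occurrence.

    feature_map : HxW array of one motion feature (e.g. direction or speed).
    A value observed in few pixels is rare, hence salient:
        saliency(x) = -log p(feature(x))
    """
    hist, edges = np.histogram(feature_map, bins=n_bins)
    p = hist / hist.sum()
    # map every pixel to its histogram bin, then to that bin's probability
    idx = np.clip(np.digitize(feature_map, edges[1:-1]), 0, n_bins - 1)
    return -np.log(p[idx] + 1e-9)
```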
Scene Understanding through Autonomous Interactive Perception
Abstract
We propose a framework for detecting, extracting and modeling objects in natural scenes from multi-modal data. Our framework is iterative, exploiting different hypotheses in a complementary manner; each generated hypothesis feeds into the subsequent one, continuously refining the predictions about the scene. We employ the framework in realistic scenarios based on visual appearance and depth information. Using a robotic manipulator that interacts with the scene, object hypotheses generated from appearance information are confirmed through pushing. We show results that demonstrate the synergistic effect of applying multiple hypotheses to real-world scene understanding. The method is efficient and performs in real time.
Niklas Bergström, Carl Henrik Ek, Mårten Björkman, Danica Kragic

Knowledge Directed Vision

A Cognitive Vision System for Nuclear Fusion Device Monitoring
Abstract
We propose a cognitive vision-based system for the intelligent monitoring of tokamaks during plasma operation, based on multi-sensor data analysis and symbolic reasoning. The practical purpose is to detect and characterize in real time abnormal events, such as hot spots measured through infrared images of the in-vessel components, in order to take adequate decisions. Our system is made intelligent by the use of a priori knowledge of both contextual and perceptual information for ontology-driven event modeling and task-oriented event recognition. The system is made original by combining both physics-based and perceptual information during the recognition process. Real-time reasoning is achieved thanks to task-level software optimizations. The framework is generic and can easily be adapted to different fusion device environments. This paper presents the developed system and its achievements on real data from the Tore Supra tokamak imaging system.
Vincent Martin, Victor Moncada, Jean-Marcel Travere, Thierry Loarer, François Brémond, Guillaume Charpiat, Monique Thonnat
Knowledge Representation and Inference for Grasp Affordances
Abstract
Knowledge bases for semantic scene understanding and processing form indispensable components of holistic intelligent computer vision and robotic systems. Specifically, task-based grasping requires perception modules that are tied to knowledge representation systems in order to provide optimal solutions. However, most state-of-the-art systems for robotic grasping, such as K-CoPMan, which uses semantic information in mapping and planning for grasping, depend on explicit 3D model representations, restricting scalability. Moreover, these systems lack the conceptual knowledge that could aid the perception module in identifying the best objects in the field of view for task-based manipulation through implicit cognitive processing. This restricts the scalability, extensibility, usability and versatility of the system. In this paper, we utilize the concept of functional and geometric part affordances to build a holistic knowledge representation and inference framework to aid task-based grasping. The performance of the system is evaluated on complex scenes and indirect queries.
Karthik Mahesh Varadarajan, Markus Vincze

Control of Perception (II)

Towards a General Abstraction through Sequences of Conceptual Operations
Abstract
Computer vision is a complex field which can be challenging for those outside the research community to apply in the real world. To address this we present a novel formulation for the abstraction of computer vision problems above the level of algorithms, as part of our OpenVL framework. We have created a set of fundamental operations which form a basis from which descriptions of computer vision methods can be built up. We use these operations to define a problem conceptually, which we can then map into algorithm space to choose an appropriate method to solve it. We provide details on three of our operations, Match, Detect and Solve, and subsequently demonstrate the descriptive flexibility these three offer. We describe various vision problems, such as image registration and tracking, through the sequencing of our operations, and discuss how these may be extended to cover a larger range of tasks, which in turn may be used analogously to a graphics shader language.
Gregor Miller, Steve Oldridge, Sidney Fels
Girgit: A Dynamically Adaptive Vision System for Scene Understanding
Abstract
Modern vision systems must run in continually changing contexts. For example, a system to detect vandalism in train stations must function during the day and at night, and the vision components for acquisition and detection used during daytime may not be the same as those used at night. The system must adapt to a context by replacing running components, such as switching image acquisition from color to infra-red. This adaptation must be dynamic, with detection of the context, a decision on the change in system configuration, and the seamless execution of the new configuration, all while minimizing the impact of dynamic change on the validity of detection and the loss in performance. We present Girgit, a context-aware vision system for scene understanding that dynamically orchestrates a set of components. A component encapsulates a vision-related algorithm, such as one from the OpenCV library. Girgit inherently provides loading/caching of multiple component instances, system reconfiguration, and management of incoming events to suggest actions such as component re-configuration and the replacement of components in pipelines. Given the additional architectural layer for dynamic adaptation, one may ask: does Girgit degrade scene understanding performance? We performed several empirical evaluations of Girgit, using metrics such as frame rate and adaptation time, to answer this question. For instance, the average adaptation time between configuration changes is less than 2 μs with caching and 8 ms without; this in turn has a negligible effect on scene understanding performance with respect to static C++ implementations for most practical purposes.
Leonardo M. Rocha, Sagar Sen, Sabine Moisan, Jean-Paul Rigault
Run Time Adaptation of Video-Surveillance Systems: A Software Modeling Approach
Abstract
Video-surveillance processing chains are complex software systems exhibiting high degrees of variability along several dimensions. At the specification level, the number of possible applications and types of scenarios is large. On the software architecture side, the number of components, their variations due to possible choices among different algorithms, and the number of tunable parameters make the configuration of a processing chain rather challenging. In this paper we describe a framework for the design, deployment, and run-time adaptation of video-surveillance systems, with a focus on the run-time aspect. Starting from a high-level specification of the application type, execution context, and quality-of-service requirements, the framework derives valid possible system configurations through (semi-)automatic model transformations. At run time, the framework is also responsible for adapting the running configuration to context changes. The proposed framework relies on Model-Driven Engineering (MDE) methods, a recent line of research in Software Engineering that promotes the use of software models and model transformations to establish a seamless path from software specifications to system implementations. It uses Feature Diagrams, which offer a convenient way of representing the variability of a software system. The paper illustrates the approach on a simple but realistic use case scenario of run-time adaptation.
Sabine Moisan, Jean-Paul Rigault, Mathieu Acher, Philippe Collet, Philippe Lahire
Automatically Searching for Optimal Parameter Settings Using a Genetic Algorithm
Abstract
Modern vision systems are often a heterogeneous collection of image processing, machine learning, and pattern recognition techniques. One problem with these systems is finding their optimal parameter settings, since they often have many interacting parameters. This paper proposes the use of a Genetic Algorithm (GA) to automatically search the parameter space. The technique is tested on a publicly available face recognition algorithm and dataset. In the work presented, the GA takes the role of a person configuring the algorithm by repeatedly observing performance on a tuning subset of the final evaluation test data. In this context, the GA is shown to do a better job of configuring the algorithm than was achieved by the authors who originally constructed and released the LRPCA baseline. In addition, the data generated during the search is used to construct statistical models of the fitness landscape, which provide insight into the significance of, and relations among, the algorithm's parameters.
David S. Bolme, J. Ross Beveridge, Bruce A. Draper, P. Jonathon Phillips, Yui Man Lui
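
For readers who have not used one, a minimal real-coded GA of the kind the paper applies looks roughly as follows. The fitness callable plays the role of the tuning-subset evaluation; the specific operators (tournament selection, uniform crossover, Gaussian mutation) and all numeric settings are generic choices, not necessarily those used to tune the LRPCA baseline.

```python
import numpy as np

def genetic_search(fitness, bounds, pop_size=40, generations=50,
                   mutation_std=0.1, seed=0):
    """Minimal real-coded GA for tuning a vision system's parameters.

    fitness : maps a parameter vector to a score measured on a tuning
              subset (e.g. a recognition rate); higher is better.
    bounds  : (D, 2) array of per-parameter [low, high] ranges.
    """
    rng = np.random.default_rng(seed)
    low, high = bounds[:, 0], bounds[:, 1]
    pop = rng.uniform(low, high, size=(pop_size, len(bounds)))
    best, best_score = None, -np.inf
    for _ in range(generations):
        scores = np.array([fitness(p) for p in pop])
        if scores.max() > best_score:
            best, best_score = pop[scores.argmax()].copy(), scores.max()
        # tournament selection: winner of each random pair becomes a parent
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = pop[np.where(scores[a] > scores[b], a, b)]
        # uniform crossover between shuffled parent pairs, then mutation
        mates = parents[rng.permutation(pop_size)]
        mask = rng.random(pop.shape) < 0.5
        children = np.where(mask, parents, mates)
        children += rng.normal(0.0, mutation_std * (high - low), pop.shape)
        pop = np.clip(children, low, high)
    return best, best_score
```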
Backmatter
Metadata
Title
Computer Vision Systems
Edited by
James L. Crowley
Bruce A. Draper
Monique Thonnat
Copyright Year
2011
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-23968-7
Print ISBN
978-3-642-23967-0
DOI
https://doi.org/10.1007/978-3-642-23968-7
