
2009 | Book

Computer Vision Systems

7th International Conference on Computer Vision Systems, ICVS 2009, Liège, Belgium, October 13-15, 2009. Proceedings

Edited by: Mario Fritz, Bernt Schiele, Justus H. Piater

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed proceedings of the 7th International Conference on Computer Vision Systems, ICVS 2009, held in Liège, Belgium, October 13-15, 2009. The 21 papers accepted for oral presentation, together with 24 poster presentations and 2 invited papers, were carefully reviewed and selected from 96 submissions. The papers are organized in topical sections on human-machine interaction; sensors, features and representations; stereo, 3D and optical flow; calibration and registration; mobile and autonomous systems; evaluation, studies and applications; and learning, recognition and adaptation.

Table of Contents

Frontmatter

Human-Machine Interaction

Recognizing Gestures for Virtual and Real World Interaction

In this paper, we present a vision-based system that estimates the pose of users as well as the gestures they perform in real time. This system allows users to interact naturally with an application (virtual reality, gaming) or a robot.

The main components of our system are a 3D upper-body tracker, which estimates human body pose in real time from a stereo sensor, and a gesture recognizer, which classifies the output of the temporal tracker into gesture classes. The main novelty of our system is the bag-of-features representation for temporal sequences. This representation, though simple, proves to be surprisingly powerful and able to implicitly learn sequence dynamics. Based on this representation, a multi-class classifier, treating the bag of features as the feature vector, is applied to estimate the corresponding gesture class.

We show with experiments performed on an HCI gesture dataset that our method performs better than state-of-the-art algorithms and has some nice generalization properties. Finally, we describe virtual and real world applications in which our system was integrated for multimodal interaction.
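
The bag-of-features idea described above lends itself to a compact sketch. The following Python fragment is an illustration under assumed choices (a k-means codebook over per-frame pose descriptors and an SVM classifier, none of which are specified by the abstract) of how temporal order is discarded while sequence statistics are kept:

```python
# Minimal bag-of-features gesture classification sketch (illustrative only;
# codebook size, descriptors and classifier are assumptions, not the
# authors' implementation).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def build_codebook(all_frame_descriptors, k=64):
    """Quantize per-frame pose descriptors (e.g. joint angles) into k codewords."""
    return KMeans(n_clusters=k, n_init=10).fit(np.vstack(all_frame_descriptors))

def bag_of_features(sequence, codebook):
    """Histogram of codeword occurrences over a whole sequence; temporal
    order is discarded, only occurrence statistics remain."""
    words = codebook.predict(sequence)                   # one codeword per frame
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)                   # normalize for length

# Training: one histogram per labeled gesture sequence, then a multi-class SVM.
# codebook = build_codebook(train_sequences)
# X = np.array([bag_of_features(s, codebook) for s in train_sequences])
# clf = SVC(kernel="rbf").fit(X, train_labels)
```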

David Demirdjian, Chenna Varri
Multimodal Speaker Recognition in a Conversation Scenario

As a step toward the design of a robot that can take part in a conversation, we propose a robotic system that, taking advantage of multiple perceptual capabilities, actively follows a conversation among several human subjects. The essential idea of our proposal is that the robot system can dynamically change the focus of its attention according to visual or audio stimuli, tracking the actual speaker throughout the conversation and inferring her identity.

Maria Letizia Marchegiani, Fiora Pirri, Matia Pizzoli
FaceL: Facile Face Labeling

FaceL is a simple and fun face recognition system that labels faces in live video from an iSight camera or webcam. FaceL presents a window with a few controls and annotations displayed over the live video feed. The annotations indicate detected faces, positions of eyes, and, after training, the names of enrolled people. Enrollment is video based, capturing many images per person. FaceL does a good job of distinguishing between a small set of people in fairly uncontrolled settings and incorporates a novel incremental training capability. The system is very responsive, running at over 10 frames per second on modern hardware. FaceL is open source and can be downloaded from http://pyvision.sourceforge.net/facel.

David S. Bolme, J. Ross Beveridge, Bruce A. Draper
Automatic Assessment of Eye Blinking Patterns through Statistical Shape Models

Several studies have related the alertness of an individual to their eye-blinking patterns. Accurate and automatic quantification of eye-blinks can be of much use in monitoring people at jobs that require a high degree of alertness, such as that of the driver of a vehicle. This paper presents a non-intrusive system, based on facial biometrics techniques, to accurately detect and quantify eye-blinks. Given a video sequence from a standard camera, the proposed procedure can output blink frequencies and durations, as well as the PERCLOS metric, which is the percentage of the time the eyes are at least 80% closed. The proposed algorithm was tested on 360 videos of the AV@CAR database, which amount to approximately 95,000 frames of 20 different people. Validation of the results against manual annotations yielded very high accuracy in the estimation of blink frequency, with encouraging results in the estimation of PERCLOS (average error of 0.39%) and blink duration (average error within 2 frames).
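
For reference, PERCLOS as described above admits a standard formulation (notation ours, not the paper's): with $c_t$ the eye-closure fraction in frame $t$ of an $N$-frame window,

$$\mathrm{PERCLOS} = \frac{100}{N}\sum_{t=1}^{N}\mathbf{1}\!\left[c_t \ge 0.8\right]\;[\%]$$

so a blink contributes only for those frames in which closure exceeds the 80% threshold.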

Federico M. Sukno, Sri-Kaushik Pavani, Constantine Butakoff, Alejandro F. Frangi
Open-Set Face Recognition-Based Visitor Interface System

This work presents a real-world, real-time, video-based open-set face recognition system. The system has been developed as a visitor interface, where a visitor looks at the monitor to read the displayed message before knocking on the door. While the visitor is reading the welcome message, the face recognition system identifies the person, without requiring explicit cooperation, using the images captured by a webcam located on the screen. According to the identity of the person, customized information about the host is conveyed. To evaluate the system’s performance in this application scenario, a face database has been collected in front of an office. The experimental results on the collected database show that the developed system can operate reliably under real-world conditions.

Hazım Kemal Ekenel, Lorant Szasz-Toth, Rainer Stiefelhagen
Cascade Classifier Using Divided CoHOG Features for Rapid Pedestrian Detection

Co-occurrence histograms of oriented gradients (CoHOG) are a powerful feature descriptor for pedestrian detection, but their calculation cost is large because the feature vector is very high-dimensional. In this paper, in order to achieve rapid detection, we propose a novel method that divides the CoHOG feature into small features and constructs a cascade-structured classifier by combining many weak classifiers. The proposed cascade classifier rejects non-pedestrian images at an early stage of the classification, while positive and suspicious images are examined carefully by all weak classifiers. This accelerates the classification process without spoiling detection accuracy. The experimental results show that our method achieves about 2.6 times faster detection with the same detection accuracy compared to previous work.
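
A minimal sketch of the early-rejection control flow described above (the stage structure, weak classifier form and thresholds are assumptions for illustration, not the paper's trained cascade):

```python
def cascade_classify(feature_vector, stages):
    """stages: list of (weak_classifiers, threshold) pairs, ordered cheap to
    expensive. Each weak classifier scores a small slice of the divided CoHOG
    feature vector; most non-pedestrian windows exit at the first stages."""
    score = 0.0
    for weak_classifiers, threshold in stages:
        score += sum(clf(feature_vector) for clf in weak_classifiers)
        if score < threshold:
            return False   # rejected early: most of the feature is never touched
    return True            # examined by all weak classifiers: pedestrian
```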

Masayuki Hiromoto, Ryusuke Miyamoto

Sensors, Features and Representations

Boosting with a Joint Feature Pool from Different Sensors

This paper introduces a new way to apply boosting to a joint feature pool from different sensors, namely 3D range data and color vision. The combination of sensors strengthens the system’s universality, since an object category could be partially consistent in shape, texture or both. Merging of different sensor data is performed by computing a spatial correlation on 2D layers. An AdaBoost classifier is learned by boosting features competitively in parallel from every sensor layer. Additionally, the system uses new corner-like features instead of rotated Haar-like features, in order to improve real-time classification capabilities. Object type dependent color information is integrated by applying a distance metric to hue values. The system was implemented on a mobile robot and trained to recognize four different object categories: people, cars, bicycles and power sockets. Experiments were conducted to compare system performance between different merged and single sensor based classifiers. We found that for all object categories the classification performance is considerably improved by the joint feature pool.

Dominik Alexander Klein, Dirk Schulz, Simone Frintrop
A Multi-modal Attention System for Smart Environments

Focusing attention on the most relevant information is a fundamental biological concept that allows humans to (re-)act rapidly and safely in complex and unfamiliar environments. This principle has successfully been adopted for technical systems where sensory stimuli need to be processed in an efficient and robust way. In this paper, a multi-modal attention system for smart environments is described that explicitly respects efficiency and robustness aspects already in its architecture. The system facilitates unconstrained human-machine interaction by integrating multiple sensory information of different modalities.

B. Schauerte, T. Plötz, G. A. Fink
Individual Identification Using Gait Sequences under Different Covariate Factors

Recently, gait recognition for individual identification has received increased attention from biometrics researchers, as gait can be captured at a distance using low-resolution capture devices. Human gait properties can be affected by different clothing and carried objects (i.e. covariate factors). Most of the literature shows that these covariate factors make individual identification based on gait difficult. In this paper, we propose a novel method that generates dynamic and static feature templates from sequences of silhouette images (Dynamic Static Silhouette Templates (DSSTs)) to overcome this issue. Here the DSST is calculated from Motion History Images (MHIs). The experimental results show that our method overcomes issues arising from differing clothing and the carrying of objects.

Yogarajah Pratheepan, Joan V. Condell, Girijesh Prasad
Using Local Symmetry for Landmark Selection

Most visual Simultaneous Localization And Mapping (SLAM) methods use interest points as landmarks in their maps of the environment. Often the interest points are detected using contrast features, for instance those of the Scale Invariant Feature Transform (SIFT). The SIFT interest points, however, have problems with stability and noise robustness. Taking our inspiration from human vision, we therefore propose the use of local symmetry to select interest points. Our method, the MUlti-scale Symmetry Transform (MUST), was tested on a robot-generated database including ground-truth information to quantify SLAM performance. We show that interest points selected using symmetry are more robust to noise and contrast manipulations, have slightly better repeatability, and, above all, result in better overall SLAM performance.

Gert Kootstra, Sjoerd de Jong, Lambert R. B. Schomaker
Combining Color, Depth, and Motion for Video Segmentation

This paper presents an innovative method to interpret the content of a video scene using a depth camera. Cameras that provide distance instead of color information are part of a promising young technology, but they come with many difficulties: noisy signals, low resolution, and ambiguities, to cite a few.

By taking advantage of the robustness to noise of a recent background subtraction algorithm, our method is able to extract useful information from the depth signals. We further enhance the robustness of the algorithm by combining this information with that of an RGB camera. In our experiments, we demonstrate this increased robustness and conclude by showing a practical example of an immersive application taking advantage of our algorithm.

Jérôme Leens, Sébastien Piérard, Olivier Barnich, Marc Van Droogenbroeck, Jean-Marc Wagner
Stable Structural Deformations

Recently, we introduced a hierarchical finite element model in the context of structural image segmentation. Such a model deforms from its equilibrium shape into similar shapes under the influence of both image-based forces and structural forces, which serve the propagation of deformations across the hierarchy levels. Such forces are very likely to result in large (rotational) deformations, which, under the linear elasticity model, yield artefacts and thus poor segmentation results. In this paper, we provide results indicating that different implementations of the stiffness warping method can be successfully combined to simulate dependent rotational deformations correctly, and in an efficient manner.

Karin Engel, Klaus Toennies
Demand-Driven Visual Information Acquisition

Fast, reliable and demand-driven acquisition of visual information is the key to representing visual scenes efficiently. To achieve this efficiency, a cognitive vision system must plan the utilization of its processing resources to acquire only information relevant to the task. Here, the incorporation of long-term knowledge plays a major role in deciding which information to gather. In this paper, we present a first approach to making use of knowledge about the world and its structure to plan visual actions. We propose a method to schedule those visual actions to allow for a fast discrimination between objects that are relevant or irrelevant to the task. By doing so, we are able to reduce the system’s computational demand. A first evaluation of our ideas is given using a proof-of-concept implementation.

Sven Rebhan, Andreas Richter, Julian Eggert

Stereo, 3D and Optical Flow

A Real-Time Low-Power Stereo Vision Engine Using Semi-Global Matching

Many real-time stereo vision systems are available on low-power platforms. They all either use a local correlation-like stereo engine or perform dynamic programming variants on a scan-line. However, low-power real-time implementations of the high-performance global stereo methods listed in the upper third of the Middlebury database are still missing. We propose a real-time implementation of the semi-global matching algorithm, with algorithmic extensions for automotive applications, on a reconfigurable hardware platform, resulting in a low power consumption of under 3 W. The algorithm runs at 25 Hz, processing image pairs of size 750x480 pixels and computing stereo on a 680x400 image part with up to 128 disparities.

Stefan K. Gehrig, Felix Eberli, Thomas Meyer
Feature-Based Stereo Vision Using Smart Cameras for Traffic Surveillance

This paper presents a stereo-based system for measuring traffic on motorways. To achieve real-time performance, the system exploits a decentralized architecture composed of a pair of smart cameras fixed over the road and connected via network to an embedded industrial PC on the side of the road. Different features (Harris corners and edges) are detected in the two images and matched using a local matching algorithm. The resulting 3D point cloud is processed by a maximum spanning tree clustering algorithm to group the points into vehicle objects. Bounding boxes are defined for each detected object, giving an approximation of the vehicles’ 3D sizes. The system presented here has been validated manually and achieves over 90% detection accuracy at 20-25 frames/s.

Quentin Houben, Jacek Czyz, Juan Carlos Tocino Diaz, Olivier Debeir, Nadine Warzee
Development and Long-Term Verification of Stereo Vision Sensor System for Controlling Safety at Railroad Crossing

Many people are involved in accidents at railroad crossings every year, but there is no suitable sensor for detecting pedestrians. We are therefore developing a stereo vision based system for ensuring safety at railroad crossings. In this system, stereo cameras are installed at the corners and are pointed toward the center of the railroad crossing to monitor the passage of people. The system determines automatically and in real time whether anyone or anything is inside the railroad crossing, and whether anyone remains in the crossing. The system can be configured to automatically switch over to a surveillance monitor or automatically connect to an emergency brake system in the event of trouble. We have developed an original stereo vision device and installed a remote-controlled experimental system running the human detection algorithm at a commercial railroad crossing. We then stored and analyzed image and tracking data over two years to standardize the system requirement specification.

Daisuke Hosotani, Ikushi Yoda, Katsuhiko Sakaue
Generation of 3D City Models Using Domain-Specific Information Fusion

In this contribution we present a building reconstruction strategy using spatial models of building parts and information fusion of aerial images, digital surface models and ground plans. The fusion of sensor data aims to reliably derive local building features and is therefore controlled in a domain-specific way: ground plans indicate the approximate location of outer roof corners, and the intersection of planes from the digital surface model yields the inner roof corners. Parameterized building parts are selected using these corners and afterwards combined to form complete three-dimensional building models. We focus here on the domain-specific information fusion and present results on a suburban dataset.

Jens Behley, Volker Steinhage
Bio-inspired Stereo Vision System with Silicon Retina Imagers

This paper presents a silicon retina-based stereo vision system, which is used for a pre-crash warning application for side impacts. We use silicon retina imagers for this task because the advantages of the camera, derived from the human vision system, are a high temporal resolution of up to 1 ms and the handling of various lighting conditions with a dynamic range of ~120 dB. A silicon retina delivers asynchronous data called address events (AE). Different stereo matching algorithms are available, but these algorithms normally work with full-frame images. In this paper we evaluate how the AE data from the silicon retina sensors must be adapted to work with full-frame area-based and feature-based stereo matching algorithms.

Jürgen Kogler, Christoph Sulzbachner, Wilfried Kubinger
A Fast Joint Bioinspired Algorithm for Optic Flow and Two-Dimensional Disparity Estimation

The faithful detection of the motion and distance of objects in the visual scene is a desirable feature of any artificial vision system designed to operate in unknown environments characterized by conditions that vary in time in an often unpredictable way. Here, we propose a distributed neuromorphic architecture that, by sharing the computational resources to solve the stereo and motion problems, produces fast and reliable estimates of optic flow and 2D disparity. The specific joint design approach allows us to obtain high performance at an affordable computational cost. The approach is validated against state-of-the-art algorithms and in real-world situations.

Manuela Chessa, Silvio P. Sabatini, Fabio Solari

Calibration and Registration

GPU-Accelerated Nearest Neighbor Search for 3D Registration

Nearest Neighbor Search (NNS) is employed by many computer vision algorithms. Its computational complexity is large and constitutes a challenge for real-time capability. The basic problem lies in rapidly processing a huge amount of data, which is often addressed by means of highly sophisticated search methods and parallelism. We show that NNS-based vision algorithms like the Iterative Closest Points (ICP) algorithm can achieve real-time capability while preserving the compact size and moderate energy consumption needed in robotics and many other domains. The approach exploits the concept of general purpose computation on graphics processing units (GPGPU) and is compared to parallel processing on the CPU. We apply this approach to the 3D scan registration problem, for which a speed-up factor of 88 compared to a sequential CPU implementation is reported.
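
To make the bottleneck concrete, the nearest-neighbour step that dominates each ICP iteration looks roughly as follows (a vectorized NumPy sketch of the computation being parallelized; the paper's actual GPU kernel is not reproduced here):

```python
import numpy as np

def nearest_neighbors(src, dst):
    """For each 3D point in src (N, 3), return the index of its nearest
    neighbour in dst (M, 3). Brute force: O(N*M) distance evaluations per
    ICP iteration, which is exactly the workload worth offloading to a GPU."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)  # (N, M)
    return d2.argmin(axis=1)
```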

Deyuan Qiu, Stefan May, Andreas Nüchter
Visual Registration Method for a Low Cost Robot

An autonomous mobile robot must face the correspondence, or data association, problem in order to carry out tasks like place recognition or unknown environment mapping. In order to put two maps into correspondence, most methods estimate the transformation relating the maps from matches established between low-level features extracted from sensor data. However, finding explicit matches between features is a challenging and computationally expensive task. In this paper, we propose a new method to align obstacle maps without searching for explicit matches between features. The maps are obtained from a stereo pair. We then use a vocabulary tree approach to identify putative corresponding maps, followed by Newton minimization to find the transformation that relates both maps. The proposed method is evaluated in a typical office environment, showing good performance.

David Aldavert, Arnau Ramisa, Ricardo Toledo, Ramon López de Mántaras
Automatic Classification of Image Registration Problems

This paper introduces a system that automatically classifies registration problems based on the type of registration required. Rather than relying on a single “best” algorithm, the proposed system is made up of a suite of image registration techniques. Image pairs are analyzed according to the types of variation that occur between them, and appropriate algorithms are selected to solve for the alignment. In the case where multiple forms of variation are detected, all potentially appropriate algorithms are run, and a normalized cross correlation (NCC) of the results in their respective error spaces is performed to select which alignment is best. In 87% of the test cases the system selected the transform of the expected corresponding algorithm, either through elimination or through NCC, while in the remaining 13% a better transform (as calculated by NCC) was proposed by one of the other methods. By classifying the type of registration problem and choosing an appropriate method, the system significantly improves the flexibility and accuracy of automatic registration techniques.
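
The NCC score used here for selecting among candidate alignments is the standard one; a small NumPy sketch (how the system maps scores into each algorithm's error space is not shown, and warp() is a hypothetical resampler):

```python
import numpy as np

def ncc(a, b):
    """Normalized cross correlation of two equally sized arrays;
    1.0 indicates perfect correlation, 0 no correlation."""
    a = (a - a.mean()) / (a.std() + 1e-12)   # small epsilon guards flat images
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float((a * b).mean())

# Selection among candidate transforms:
# best = max(transforms, key=lambda T: ncc(reference, warp(moving, T)))
```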

Steve Oldridge, Gregor Miller, Sidney Fels
Practical Pan-Tilt-Zoom-Focus Camera Calibration for Augmented Reality

While high-definition cameras with automated zoom lenses are widely used in broadcasting and film productions, there have been no practical calibration methods working without special hardware devices. We propose a practical method to calibrate pan-tilt-zoom-focus cameras, which takes advantage of both pattern-based and rotation-based calibration approaches. It uses patterns whose positions are only roughly known a priori, with several image samples taken at different rotations. The proposed method can find the camera view’s translation along the optical axis caused by zoom and focus operations, which has been neglected in most rotation-based algorithms. We also propose a practical focus calibration technique that is applicable even when the image is too defocused for the patterns to be detected. The proposed method is composed of two separate procedures: zoom calibration and focus calibration. Once the calibration is done for all zoom settings with a fixed focus setting, the remaining focus calibration is fully automatic. We show the accuracy of the proposed method by comparing it to the algorithm most widely used in computer vision. The proposed algorithm also works well for real cameras with translation offsets.

Juhyun Oh, Seungjin Nam, Kwanghoon Sohn

Mobile and Autonomous Systems

Learning Objects and Grasp Affordances through Autonomous Exploration

We describe a system for autonomous learning of visual object representations and their grasp affordances on a robot-vision system. It segments objects by grasping and moving 3D scene features, and creates probabilistic visual representations for object detection, recognition and pose estimation, which are then augmented by continuous characterizations of grasp affordances generated through biased, random exploration. Thus, based on a careful balance of generic prior knowledge encoded in (1) the embodiment of the system, (2) a vision system extracting structurally rich information from stereo image sequences as well as (3) a number of built-in behavioral modules on the one hand, and autonomous exploration on the other hand, the system is able to generate object and grasping knowledge through interaction with its environment.

Dirk Kraft, Renaud Detry, Nicolas Pugeault, Emre Başeski, Justus Piater, Norbert Krüger
Integration of Visual Cues for Robotic Grasping

In this paper, we propose a method that generates grasping actions for novel objects based on visual input from a stereo camera. We integrate two methods that are advantageous in predicting either how to grasp an object or where to apply a grasp. The first one reconstructs a wire-frame object model through curve matching; elementary grasping actions can be associated with parts of this model. The second method predicts grasping points in a 2D contour image of an object. By integrating the information from the two approaches, we can generate a sparse set of full grasp configurations of good quality. We demonstrate our approach integrated in a vision system on complex shaped objects as well as in cluttered scenes.

Niklas Bergström, Jeannette Bohg, Danica Kragic
A Hierarchical System Integration Approach with Application to Visual Scene Exploration for Driver Assistance

Scene exploration that is quick and complete with respect to the current task is the foundation for most higher-level scene processing. Many specialized approaches exist in the driver assistance domain (e.g. car recognition or lane marking detection), but we aim at an integrated system combining several such techniques to achieve sufficient performance. In this work we present a novel approach to this integration problem. Algorithms are contained in hierarchically arranged layers, with the main principle that the ordering is induced by the requirement that each layer depend only on the layers below. Thus, higher layers can be added to a running system (incremental composition), and shutdown or failure of higher layers leaves the system in an operational state, albeit with reduced functionality (graceful degradation). Assumptions, challenges and benefits of applying this approach to practical systems are discussed. We demonstrate our approach on an integrated system performing visual scene exploration on real-world data from a prototype vehicle. System performance is evaluated on two scene exploration completeness measures and shown to gracefully degrade as several layers are removed, and to fully recover as these layers are restarted while the system is running.

Benjamin Dittes, Martin Heracles, Thomas Michalke, Robert Kastner, Alexander Gepperth, Jannik Fritsch, Christian Goerick
Real-Time Traversable Surface Detection by Colour Space Fusion and Temporal Analysis

We present a real-time approach for traversable surface detection using a low-cost monocular camera mounted on an autonomous vehicle. The proposed methodology extracts colour and texture information from various channels of the HSL, YCbCr and LAB colourspaces by temporal analysis in order to create a “traversability map”. On this map, lighting and water artifacts, including shadows, reflections and water prints, are eliminated. Additionally, camera vibration is compensated by temporal filtering, leading to robust path edge detection in blurry images. The performance of this approach is extensively evaluated over varying terrain and environmental conditions, and the effect of colourspace fusion on the system’s precision is analysed. The results show a mean accuracy of 97% over this comprehensive test set.
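
For orientation, the colourspace conversions feeding such a fusion can be obtained with standard OpenCV calls (a sketch only; the paper's temporal analysis and fusion weights are not reproduced here):

```python
import cv2

def colour_channels(bgr_frame):
    """Split a BGR frame into the HSL, YCbCr and LAB planes from which
    colour and texture cues can be extracted before fusion."""
    hls = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HLS)    # OpenCV stores HSL as HLS
    ycc = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2YCrCb)
    lab = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2LAB)
    return hls, ycc, lab
```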

Ioannis Katramados, Steve Crumpler, Toby P. Breckon
Saliency-Based Obstacle Detection and Ground-Plane Estimation for Off-Road Vehicles

Due to stringent time constraints, saliency models are becoming popular tools for building situated robotic systems requiring, for instance, object recognition and vision-based localisation capabilities. This paper contributes to this endeavour by applying saliency to two new tasks: modulation of stereo-based obstacle detection and ground-plane estimation, both operating on-board off-road vehicles. To achieve this, a new biologically inspired saliency model, along with a set of adaptations to the task-specific algorithms, is proposed. Experimental results show a reduction in computational cost and an increase in both robustness and accuracy when saliency modulation is used.

Pedro Santana, Magno Guedes, Luís Correia, José Barata
Performance Evaluation of Stereo Algorithms for Automotive Applications

The accuracy of stereo algorithms is commonly assessed by comparing the results against the Middlebury database. However, no equivalent data for automotive or robotics applications exist, and such data are difficult to obtain. We introduce a performance evaluation scheme and metrics for stereo algorithms at three different levels. This evaluation can be reproduced with comparatively low effort and has very few prerequisites. First, the disparity images are evaluated on a pixel level. The second level evaluates the disparity data roughly column by column, and the third level performs an evaluation on an object level. We compare three real-time capable stereo algorithms with these methods, and the results show that a global stereo method, semi-global matching, yields the best performance using our metrics, which incorporate both accuracy and robustness.

Pascal Steingrube, Stefan K. Gehrig, Uwe Franke

Evaluation, Studies and Applications

White-Box Evaluation of Computer Vision Algorithms through Explicit Decision-Making

Traditionally, computer vision and pattern recognition algorithms are evaluated by measuring differences between final interpretations and ground truth. These black-box evaluations ignore intermediate results, making it difficult to use them in diagnosing errors and optimization. We propose “opening the box,” representing vision algorithms as sequences of decision points where recognition results are selected from a set of alternatives. For this purpose, we present a domain-specific language for pattern recognition tasks, the Recognition Strategy Language (RSL). At run-time, an RSL interpreter records a complete history of decisions made during recognition, as it applies them to a set of interpretations maintained for the algorithm. Decision histories provide a rich new source of information: recognition errors may be traced back to the specific decisions that caused them, and intermediate interpretations may be recovered and displayed. This additional information also permits new evaluation metrics that include false negatives (correct hypotheses that the algorithm generates and later rejects), such as the percentage of ground truth hypotheses generated (historical recall) and the percentage of generated hypotheses that are correct (historical precision). We illustrate the approach through an analysis of cell detection in two published table recognition algorithms.

Richard Zanibbi, Dorothea Blostein, James R. Cordy
Evaluating the Suitability of Feature Detectors for Automatic Image Orientation Systems

We investigate the suitability of different local feature detectors for the task of automatic image orientation under different scene texturings. Building on an existing system for image orientation, we vary the applied operators while keeping the strategy fixed, and evaluate the results. An emphasis is put on the effect of combining detectors for calibrating difficult datasets. Besides some of the most popular scale and affine invariant detectors available, we include two recently proposed operators in the setup: a scale invariant junction detector and a scale invariant detector based on the local entropy of image patches. After describing the system, we present a detailed performance analysis of the different operators on a number of image datasets. We analyze both ground-truth deviations and the results of a final bundle adjustment, including observations, 3D object points and camera poses. The paper concludes with hints on the suitability of the different combinations of detectors, and an assessment of the potential of such automatic orientation procedures.

Timo Dickscheid, Wolfgang Förstner
Interest Point Stability Prediction

Selective attention algorithms produce more interest points than are usable by many computer vision applications. This work suggests a method for ordering and selecting a subset of those interest points, simultaneously increasing the repeatability of that subset. The individual repeatability of a combination of 10^6 SIFT, Harris-Laplace, and Hessian-Laplace interest points is predicted using generalized linear models (GLMs). The models are produced by studying 17 attributes of each interest point. Our goal is not to improve any particular algorithm, but to find attributes that affect affine and similarity invariance regardless of algorithm. The techniques explored in this research enable interest point detectors to improve the mean repeatability of their algorithm by 4% using a rank ordering produced by a GLM, or by thresholding interest points using a set of five new thresholds. Selecting the top 1% of GLM-ranked Harris-Laplace interest points results in a repeatability improvement of 6%, to 92.4%.
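
As a rough illustration of rank-ordering interest points by predicted repeatability with a GLM (here a binomial-family GLM, i.e. logistic regression; the attribute matrix and training labels are placeholders, not the study's data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def rank_by_predicted_repeatability(X, y):
    """X: (n_points, n_attributes) matrix of interest point attributes;
    y: 1 if the point was re-detected under transformation, else 0.
    Returns point indices sorted from most to least reliable."""
    model = LogisticRegression(max_iter=1000).fit(X, y)
    return np.argsort(-model.predict_proba(X)[:, 1])

# keep = rank_by_predicted_repeatability(X, y)[: len(X) // 100]  # e.g. top 1%
```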

H. Thomson Comer, Bruce A. Draper
Relevance of Interest Points for Eye Position Prediction on Videos

This paper tests the relevance of interest points for predicting the eye movements of subjects viewing video sequences freely. Moreover, the paper compares the eye positions of subjects with interest maps obtained using two classical interest point detectors: one spatial and one space-time. We found that, depending on the video sequence, and especially on the motion within the sequence, the spatial or the space-time interest point detector is more or less relevant for predicting eye movements.

Alain Simac-Lejeune, Sophie Marat, Denis Pellerin, Patrick Lambert, Michèle Rombaut, Nathalie Guyader
A Computer Vision System for Visual Grape Grading in Wine Cellars

This communication describes a computer vision system for automatic visual inspection and classification of grapes in cooperative wine cellars. The system is intended to work outdoors, so robust algorithms for preprocessing and segmentation are implemented. Specific methods for illumination compensation have been developed. Gabor filtering has been used for segmentation. Several preliminary classification schemes, using artificial neural networks and Random Forests, have also been tested. The obtained results show the benefits of the system as a useful tool for classification and for objective price setting.

Esteban Vazquez-Fernandez, Angel Dacal-Nieto, Fernando Martin, Arno Formella, Soledad Torres-Guijarro, Higinio Gonzalez-Jorge
Inspection of Stamped Sheet Metal Car Parts Using a Multiresolution Image Fusion Technique

This paper presents an image processing algorithm for on-line inspection of large sheet metal car parts. The automatic inspection of stamped sheet metal is not an easy task due to the highly reflective nature of the material and the nearly imperceptible characteristics of the defects to be detected. In order to deal with the ubiquitous glints, four images of every zone are acquired, illuminated from different directions. The image series is fused using a Haar transform into a single image where the spurious features caused by the glints are eliminated without discarding the salient information. Our results clearly suggest that the proposed fusion scheme offers a powerful way to obtain a clean image in which these subtle defects can be detected reliably.

Eusebio de la Fuente López, Félix Miguel Trespaderne
Who’s Counting? Real-Time Blackjack Monitoring for Card Counting Detection

This paper describes a computer vision system to detect card counters and dealer errors in a game of Blackjack from an overhead stereo camera. Card counting is becoming increasingly popular among casual Blackjack players, and casinos are eager to find new systems of dealing with the issue. There are several existing systems on the market; however, these solutions tend to be overly expensive, require specialised hardware (e.g. RFID) and are only cost-effective for the largest casinos. With a user-centered design approach, we built a simple and effective system that detects cards and player bets in real time, and calculates the correlation between player bets and the card count to determine if a player is card counting. The system uses a combination of contour analysis, template matching and the SIFT algorithm to detect and recognise cards. Stereo imaging is used to calculate the height of chip stacks on the table, allowing the system to track the size of player bets. Our system achieves card recognition accuracy of over 99%, and effectively detected card counters and dealer errors when tested with a range of different users, including professional dealers and novice Blackjack players.

Krists Zutis, Jesse Hoey

Learning, Recognition and Adaptation

Increasing the Robustness of 2D Active Appearance Models for Real-World Applications

This paper presents an approach to increase the robustness of Active Appearance Models (AAMs) within the scope of human-robot interaction. Due to unknown environments with changing illumination conditions and different users, who may perform unpredictable head movements, standard AAMs suffer from a lack of robustness. Therefore, this paper introduces several methods to increase the robustness of AAMs. In detail, we optimize the shape model for certain applications by using genetic algorithms. Furthermore, a modified retinex filter to reduce the influence of illumination is presented. These approaches are finally combined with an adaptive parameter fitting approach, which can handle bad initializations. We obtain very promising results in experiments on the IMM face database [1].

Ronny Stricker, Christian Martin, Horst-Michael Gross
Learning Query-Dependent Distance Metrics for Interactive Image Retrieval

An approach to target-based image retrieval is described based on on-line rank-based learning. User feedback obtained via interaction with 2D image layouts provides qualitative constraints that are used to adapt distance metrics for retrieval. The user can change the query during a search session in order to speed up the retrieval process. An empirical comparison of online learning methods including ranking-SVM is reported using both simulated and real users.

Junwei Han, Stephen J. McKenna, Ruixuan Wang
Consistent Interpretation of Image Sequences to Improve Object Models on the Fly

We present a system that is able to track multiple objects under partial and total occlusion. The reasoning system builds up a graph-based spatio-temporal representation of object hypotheses and is thus able to explain the scene even if objects are totally occluded. Furthermore, it adapts the object models and learns new appearances at assumed object locations. We represent objects in a star-shaped geometrical model of interest points using a codebook. The novelty of our system is to combine a spatio-temporal reasoning system with an interest point based object detector for on-line improvement of object models, adding new interest points and deleting unreliable ones. We propose this system for a consistent representation of objects in an image sequence and for learning changes of appearance on the fly.

Johann Prankl, Martin Antenreiter, Peter Auer, Markus Vincze
Nonideal Iris Recognition Using Level Set Approach and Coalitional Game Theory

This paper presents an efficient algorithm for iris recognition using a level set approach and coalitional game theory. To segment the inner boundary from a nonideal iris image, we apply a level set based curve evolution approach using an edge stopping function; to detect the outer boundary, we employ curve evolution using the regularized Mumford-Shah segmentation model with an energy minimization approach. An iterative algorithm, called the Contribution-Selection Algorithm (CSA), in the context of coalitional game theory, is used to select the optimal feature subset without compromising accuracy. The verification performance of the proposed scheme is validated using the UBIRIS Version 2, ICE 2005, and WVU datasets.

Kaushik Roy, Prabir Bhattacharya
Incremental Video Event Learning

We propose a new approach for video event learning. The only hypothesis is the availability of tracked object attributes. The approach incrementally aggregates the attributes and reliability information of tracked objects to learn a hierarchy of state and event concepts. Simultaneously, the approach recognises the states and events of the tracked objects. This approach proposes an automatic bridge between low-level image data and higher-level conceptual information. The approach has been evaluated on more than two hours of video from an elderly care application. The results show the capability of the approach to learn and recognise meaningful events occurring in the scene. The results also show the potential of the approach for describing the activities of a person (e.g. approaching a table, crouching) and for detecting abnormal events based on their frequency of occurrence.

Marcos Zúñiga, François Brémond, Monique Thonnat
A System for Probabilistic Joint 3D Head Tracking and Pose Estimation in Low-Resolution, Multi-view Environments

We present a new system for 3D head tracking and pose estimation in low-resolution, multi-view environments. Our approach consists of a joint particle filter scheme that combines head shape evaluation using histograms of oriented gradients with pose estimation by means of artificial neural networks. The joint evaluation resolves previous problems of automatic alignment and multi-sensor fusion and yields an automatic system that is flexible with respect to the number of available cameras. We evaluate on the CLEAR07 dataset for multi-view head pose estimation and achieve mean pose errors of 7.2° and 9.3° for pan and tilt respectively, which improves accuracy compared to our previous work by 14.9% and 25.8%.

Michael Voit, Rainer Stiefelhagen
Robust Tracking by Means of Template Adaptation with Drift Correction

Algorithms for correlation-based visual tracking rely to a great extent on a robust measurement of an object’s location, gained by comparing a template with the visual input. Robustness against object appearance transformations requires template adaptation, a technique that is subject to drift problems due to error integration. Most solutions to this “drift problem” fall back on a dominant template that remains unmodified, preventing true adaptation to arbitrarily large transformations. In this paper, we present a novel template adaptation approach that, instead of resorting to a master template, makes use of object segmentation as a complementary object support to circumvent the drift problem. In addition, we introduce a selective update strategy that prevents erroneous adaptation in case of occlusion or segmentation failure. We show that using our template adaptation approach, we are able to successfully track a target in sequences containing large appearance transformations, where standard template adaptation techniques fail.

Chen Zhang, Julian Eggert, Nils Einecke
A Multiple Hypothesis Approach for a Ball Tracking System

This paper presents a computer vision system for tracking and predicting flying balls in 3D from a stereo camera. It pursues a “textbook-style” approach with a robust circle detector and probabilistic models for ball motion and circle detection, handled by state-of-the-art estimation algorithms. In particular, we use a Multiple-Hypotheses Tracker (MHT) with an Unscented Kalman Filter (UKF) for each track, handling multiple flying balls, missing and false detections, and track initiation and termination.

The system also performs auto-calibration, estimating physical parameters (ball radius, gravity relative to the camera, air drag) simply from observing some flying balls. This reduces the setup time in a new environment.

Oliver Birbach, Udo Frese
Fast Vision-Based Object Recognition Using Combined Integral Map

The integral image, or integral map (IMap), is one of the major techniques used to improve the speed of computer vision systems. It has been used to compute Haar features and histogram of oriented gradients features. Some modifications to the original IMap algorithm have been proposed, but most systems use the IMap as it was first introduced. The IMap may be further improved by reducing its computational cost in multi-dimensional feature domains. In this paper, a combined integral map (CIMap) technique is proposed to efficiently build and use multiple IMaps using a single concatenated map. Implementations show that using CIMap can significantly improve system speed while maintaining accuracy.
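
A minimal integral map, the building block that CIMap concatenates across feature channels (a sketch; the concatenated channel layout of the actual CIMap is not reproduced here):

```python
import numpy as np

def integral_map(img):
    """Integral image with a zero border: imap[r, c] holds the sum of
    img[:r, :c], so any axis-aligned box sum costs four lookups."""
    imap = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    imap[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return imap

def box_sum(imap, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in O(1), independent of box size."""
    return imap[r1, c1] - imap[r0, c1] - imap[r1, c0] + imap[r0, c0]
```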

Tam Phuong Cao, Guang Deng, Darrell Elton
Backmatter
Metadata
Title
Computer Vision Systems
Edited by
Mario Fritz
Bernt Schiele
Justus H. Piater
Copyright year
2009
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-04667-4
Print ISBN
978-3-642-04666-7
DOI
https://doi.org/10.1007/978-3-642-04667-4