
2010 | Book

Recognizing Patterns in Signals, Speech, Images and Videos

ICPR 2010 Contests, Istanbul, Turkey, August 23-26, 2010, Contest Reports

Editors: Devrim Ünay, Zehra Çataltepe, Selim Aksoy

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed contest reports of the 20th International Conference on Pattern Recognition, ICPR 2010, held in Istanbul, Turkey, in August 2010. The 31 revised full papers presented were carefully reviewed and selected. The papers are organized in topical sections on BiHTR - Bi-modal Handwritten Text Recognition, CAMCOM 2010 - Verification of Video Source Camera Competition, CDC - Classifier Domains of Competence, GEPR - Graph Embedding for Pattern Recognition, ImageCLEF@ICPR - Information Fusion Task, ImageCLEF@ICPR - Visual Concept Detection Task, ImageCLEF@ICPR - Robot Vision Task, MOBIO - Mobile Biometry Face and Speaker Verification Evaluation, PR in HIMA - Pattern Recognition in Histopathological Images, SDHA 2010 - Semantic Description of Human Activities.

Table of Contents

Frontmatter

BiHTR – Bi-modal Handwritten Text Recognition

Bi-modal Handwritten Text Recognition (BiHTR) ICPR 2010 Contest Report

Handwritten text is generally captured through two main modalities: off-line and on-line. Each modality has advantages and disadvantages, but it seems clear that smart approaches to handwritten text recognition (HTR) should make use of both modalities in order to take advantage of the positive aspects of each one. A particularly interesting case where the need for this bi-modal processing arises is when an off-line text, written by one writer, is considered along with the on-line modality of the same text written by another writer. This happens, for example, in computer-assisted transcription of old documents, where on-line text can be used to interactively correct errors made by a main off-line HTR system.

In order to develop adequate techniques to deal with this challenging bi-modal HTR recognition task, a suitable corpus is needed. We have collected such a corpus using data (word segments) from the publicly available off-line and on-line IAM data sets.

In order to provide the community with a useful corpus that facilitates testing, and to establish baseline performance figures, we have organized this bi-modal handwriting recognition contest.

Here we report the results of the contest, which had two participants: one achieved a 0% classification error rate, while the other achieved a respectable 1.5%.

Moisés Pastor, Roberto Paredes
Hybrid HMM/ANN Models for Bimodal Online and Offline Cursive Word Recognition

The recognition performance of current automatic offline handwriting transcription systems is far from perfect. This is the reason why there is a growing interest in assisted transcription systems, which are more efficient than correcting an automatic transcription by hand. A recent approach to interactive transcription involves multi-modal recognition, where the user can supply an online transcription of some of the words. In this paper, we describe the bimodal engine that entered the “Bi-modal Handwritten Text Recognition” contest organized during ICPR 2010. The proposed recognition system uses Hidden Markov Models hybridized with neural networks (HMM/ANN) for both offline and online input. The N-best word hypothesis scores for the offline and online samples are combined using a log-linear combination, achieving very satisfying results.

S. España-Boquera, J. Gorbe-Moya, F. Zamora-Martínez, M. J. Castro-Bleda
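
A log-linear combination of N-best scores, as described above, can be sketched in a few lines. This is a minimal illustration rather than the authors' engine; the weight alpha and the probability-valued score dictionaries are assumptions:

    import math

    def combine_scores(offline_scores, online_scores, alpha=0.5):
        """Log-linearly combine N-best word scores from the offline and
        online recognizers (scores assumed to be probabilities in (0, 1])."""
        shared = set(offline_scores) & set(online_scores)
        combined = {w: alpha * math.log(offline_scores[w])
                       + (1.0 - alpha) * math.log(online_scores[w])
                    for w in shared}
        return max(combined, key=combined.get)  # best joint hypothesis

    # toy usage with two 3-best lists
    print(combine_scores({'cat': 0.6, 'cap': 0.3, 'car': 0.1},
                         {'cat': 0.5, 'cap': 0.4, 'cot': 0.1}))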

CAMCOM 2010 – Verification of Video Source Camera Competition

Verification of Video Source Camera Competition (CAMCOM 2010)

Digital cameras are being integrated into a large number of mobile devices. These devices may be used to record illegal activities, or the recordings themselves may be illegal. Due to the tight integration of these mobile devices with the internet, such recordings may quickly find their way to video-sharing sites such as YouTube. In criminal casework it is advantageous to reliably establish the source of a video. Although this has been shown to be feasible for relatively high-quality video, it is unknown how these systems perform on low-quality transcoded videos. The CAMCOM 2010 contest was organized to create a benchmark for source video identification, where the videos originate from YouTube. Although the initial number of participants was satisfactory, only two participants submitted results, mostly due to a lack of time. Judging by the performance of the contestants, this is certainly not a trivial problem.

Wiger van Houten, Zeno Geradts, Katrin Franke, Cor Veenman

CDC – Classifier Domains of Competence

The Landscape Contest at ICPR 2010

The landscape contest provides a new and configurable framework to evaluate the robustness of supervised classification techniques and detect their limitations. By means of an evolutionary multiobjective optimization approach, artificial data sets are generated to cover reachable regions in different dimensions of the data complexity space. Systematic comparison of a diverse set of classifiers highlights their merits as a function of data complexity. Detailed analysis of their comparative behavior in different regions of the space gives guidance for potential improvements of their performance. In this paper we describe the process of data generation and discuss the performance of several well-known classifiers, as well as the contestants’ classifiers, over the obtained data sets.

Núria Macià, Tin Kam Ho, Albert Orriols-Puig, Ester Bernadó-Mansilla
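
For readers unfamiliar with data complexity measures, the sketch below computes one classical example, Fisher's discriminant ratio (the F1 measure of Ho and Basu), for a two-class dataset. It is illustrative only; the contest relied on a broader family of such measures:

    import numpy as np

    def fisher_discriminant_ratio(X, y):
        """F1 complexity measure: best single-feature separability,
        (mu0 - mu1)^2 / (var0 + var1), maximized over all features."""
        X0, X1 = X[y == 0], X[y == 1]
        num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
        den = X0.var(axis=0) + X1.var(axis=0) + 1e-12
        return float(np.max(num / den))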
Feature-Based Dissimilarity Space Classification

General dissimilarity-based learning approaches have been proposed for dissimilarity data sets [1,2]. These often arise in problems in which objects are compared directly by computing pairwise distances between images, spectra, graphs or strings.

Dissimilarity-based classifiers can also be defined in vector spaces [3], but a large comparative study has not been undertaken so far. This paper compares dissimilarity-based classifiers with traditional feature-based classifiers, including linear and nonlinear SVMs, in the context of the ICPR 2010 Classifier Domains of Competence contest. It is concluded that feature-based dissimilarity space classification performs similarly to or better than the linear and nonlinear SVMs, averaged over all 301 datasets of the contest as well as on a large subset of its datasets. This indicates that these classifiers have their own domain of competence.

Robert P. W. Duin, Marco Loog, Elżbieta Pȩkalska, David M. J. Tax
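
The core idea of feature-based dissimilarity space classification is compact enough to sketch: represent every object by its distances to a set of prototypes, then train an ordinary vector-space classifier on that representation. The Euclidean distance, the prototype choice and the Fisher-style classifier below are placeholders for the paper's actual settings:

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.metrics import pairwise_distances

    def to_dissimilarity_space(X, prototypes):
        # each object becomes a vector of distances to the prototypes
        return pairwise_distances(X, prototypes, metric='euclidean')

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(100, 5)), rng.integers(0, 2, size=100)
    R = X[:10]                                # representation (prototype) set
    clf = LinearDiscriminantAnalysis().fit(to_dissimilarity_space(X, R), y)
    print(clf.score(to_dissimilarity_space(X, R), y))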
IFS-CoCo in the Landscape Contest: Description and Results

In this work, we describe the main features of IFS-CoCo, a coevolutionary method performing instance and feature selection for nearest neighbor classifiers. The coevolutionary model and several related background topics are reviewed in order to present the method in the context of the ICPR’10 contest “Classifier Domains of Competence: The Landscape Contest”. The results obtained show that our proposal is a very competitive approach in the domains considered, outperforming both the benchmark results of the contest and the nearest neighbor rule.

Joaquín Derrac, Salvador García, Francisco Herrera
Real-Valued Negative Selection (RNS) for Classification Task

This work presents a classification technique based on artificial immune systems (AIS). The method consists of a modification of the real-valued negative selection (RNS) algorithm for pattern recognition. Our approach modifies two of the algorithm's parameters: the detector radius and the number of detectors for each class. We present an illustrative example. The preliminary results obtained show that our approach is promising. Our implementation is developed in Java using the Weka environment.

Luiz Otávio Vilas Boas Oliveira, Isabela Neves Drummond
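
The authors' implementation is in Java/Weka; purely as an illustration of the idea, a real-valued negative selection sketch follows, assuming data normalized to the unit hypercube. The radius and detector count, the two parameters the authors tune, are given arbitrary values here:

    import numpy as np

    def generate_detectors(self_samples, n_detectors, radius, rng):
        """Randomly place detectors that do not cover any 'self' sample."""
        detectors = []
        while len(detectors) < n_detectors:
            d = rng.uniform(0.0, 1.0, self_samples.shape[1])
            if np.linalg.norm(self_samples - d, axis=1).min() > radius:
                detectors.append(d)
        return np.array(detectors)

    def is_non_self(x, detectors, radius):
        # a sample is flagged when at least one detector covers it
        return bool((np.linalg.norm(detectors - x, axis=1) <= radius).any())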

GEPR – Graph Embedding for Pattern Recognition

Graph Embedding for Pattern Recognition

This is the report of the first contest on Graph Embedding for Pattern Recognition, hosted at the ICPR2010 conference in Istanbul. The aim is to define an effective algorithm to represent graph-based structures in terms of vector spaces, to enable the use of the methodologies and tools developed in the statistical Pattern Recognition field. For this contest, a large dataset of graphs derived from three available image databases has been constructed, and a quantitative performance measure has been defined. Using this measure, the algorithms submitted by the contest participants have been experimentally evaluated.

Pasquale Foggia, Mario Vento
Graph Embedding Using Constant Shift Embedding

Although structural representations (e.g. graphs) are more powerful than feature vectors in terms of representational ability, many robust and efficient methods for classification (unsupervised and supervised) have been developed for feature vector representations. In this paper, we propose a graph embedding technique based on constant shift embedding, which transforms a graph into a real vector. This technique makes it possible to perform graph classification tasks with procedures based on feature vectors. Through a set of experiments we show that the proposed technique outperforms classification in the original graph domain and other graph embedding techniques.

Salim Jouili, Salvatore Tabbone
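
Constant shift embedding itself can be sketched concisely: center the pairwise dissimilarity matrix, shift its spectrum until it is positive semidefinite, and read vectors off the eigendecomposition. A rough sketch after Roth et al.; the graph dissimilarities (e.g. edit distances) are assumed precomputed:

    import numpy as np

    def constant_shift_embedding(D, p):
        """Embed n objects with pairwise dissimilarities D (n x n) into R^p."""
        n = D.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
        S = -0.5 * J @ D @ J
        w, V = np.linalg.eigh(S)
        w = w - w.min()                          # constant shift: spectrum >= 0
        top = np.argsort(w)[::-1][:p]
        return V[:, top] * np.sqrt(w[top])       # rows are the embedded vectors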
A Fuzzy-Interval Based Approach for Explicit Graph Embedding

We present a new method for explicit graph embedding. Our algorithm extracts a feature vector for an undirected attributed graph. The proposed feature vector encodes details about the number of nodes, the number of edges, node degrees, the attributes of nodes and the attributes of edges in the graph. The first two features are the number of nodes and the number of edges. These are followed by w features for node degrees, m features for k node attributes and n features for l edge attributes. These features represent the distribution of node degrees, node attribute values and edge attribute values, and are obtained by defining (in an unsupervised fashion) fuzzy intervals over the list of node degrees, node attributes and edge attributes. Experimental results are provided for sample data of the ICPR 2010 GEPR contest.

Muhammad Muzzamil Luqman, Josep Lladós, Jean-Yves Ramel, Thierry Brouard
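
The flavor of such an explicit embedding is easy to convey with crisp histogram bins standing in for the paper's unsupervised fuzzy intervals (a sketch using networkx; the attribute features are omitted):

    import networkx as nx
    import numpy as np

    def graph_feature_vector(G, n_bins=4):
        """Node count, edge count, and a crisp degree histogram."""
        degrees = np.array([d for _, d in G.degree()])
        hist, _ = np.histogram(degrees, bins=n_bins)
        return np.concatenate(([G.number_of_nodes(), G.number_of_edges()], hist))

    print(graph_feature_vector(nx.erdos_renyi_graph(20, 0.2, seed=1)))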

ImageCLEF@ICPR – Information Fusion Task

The ImageCLEF Medical Retrieval Task at ICPR 2010 — Information Fusion to Combine Visual and Textual Information

An increasing number of clinicians, researchers, educators and patients routinely search for medical information on the Internet as well as in image archives. However, image retrieval is far less understood and developed than text–based search. The ImageCLEF medical image retrieval task is an international benchmark that enables researchers to assess and compare techniques for medical image retrieval using standard test collections. Although text retrieval is mature and well researched, it is limited by the quality and availability of the annotations associated with the images. Advances in computer vision have led to methods for using the image itself as the search entity. However, the success of purely content–based techniques has been limited, and these systems have not had much clinical success. On the other hand, text– and content–based retrieval can achieve improved performance if combined effectively, and experience in ImageCLEF has shown that combining visual and textual runs is not trivial. The goal of the fusion challenge at ICPR is to encourage participants to combine visual and textual results to improve search performance. Participants were provided with textual and visual runs, as well as the results of the manual judgments from ImageCLEFmed 2008, as training data. The goal was to combine textual and visual runs from 2009. In this paper, we present the results of this ICPR contest.

Henning Müller, Jayashree Kalpathy-Cramer
ISDM at ImageCLEF 2010 Fusion Task

Nowadays, one of the main problems in information retrieval is filtering the great amount of information currently available. Late fusion techniques merge the outcomes of different information retrieval systems to generate a single result that, hopefully, increases the overall performance by taking advantage of the strengths of all the individual systems. These techniques offer great flexibility and allow the efficient development of multimedia retrieval systems. The growing interest in these technologies has led to the creation of a subtrack in ImageCLEF entirely devoted to them: the information fusion task. In this work, the Intelligent Systems and Data Mining group’s approach to that task is presented. We propose the use of an evolutionary algorithm to estimate the parameters of three of the fusion approaches present in the literature.

A. Revuelta-Martínez, I. García-Varea, J. M. Puerta, L. Rodríguez
Rank-Mixer and Rank-Booster: Improving the Effectiveness of Retrieval Methods

In this work, we present two algorithms to improve the effectiveness of multimedia retrieval. One, like earlier approaches, combines several retrieval methods to improve the result; the other uses a single method to achieve higher effectiveness. One of the advantages of the proposed algorithms is that they can be computed efficiently on top of existing indexes. Our experimental evaluation over 3D object datasets shows that the proposed techniques outperform the multimetric approach and previously existing rank fusion methods.

Sebastian Kreft, Benjamin Bustos
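
To give a concrete feel for rank fusion (though not for the Rank-Mixer or Rank-Booster algorithms themselves), here is a Borda-style rank-sum sketch over several ranked lists:

    def rank_sum_fusion(rankings, top_k=10):
        """Fuse ranked lists by summing ranks; items missing from a list
        receive that list's worst possible rank."""
        items = set().union(*rankings)
        def rank(item, ranking):
            return ranking.index(item) if item in ranking else len(ranking)
        score = {it: sum(rank(it, r) for r in rankings) for it in items}
        return sorted(score, key=score.get)[:top_k]  # lower rank sum is better

    print(rank_sum_fusion([['a', 'b', 'c'], ['b', 'a', 'd']]))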
Information Fusion for Combining Visual and Textual Image Retrieval in ImageCLEF@ICPR

In the ImageCLEF image retrieval competition, multimodal image retrieval has been evaluated over the past seven years. For ICPR 2010 a contest was organized on the fusion of visual and textual retrieval, as this was one task where most participants had problems. In this paper, classical approaches such as the maximum combination (combMAX), the sum combination (combSUM) and the multiplication of the sum and the number of non–zero scores (combMNZ) were employed, and the trade–off between two fusion effects (the chorus and dark horse effects) was studied based on the sum of n maxima. Various normalization strategies were tried out. The fusion algorithms are evaluated using the best four visual and textual runs of the ImageCLEF medical image retrieval task 2008 and 2009. The results show that fused runs outperform the best original runs and that multi–modality fusion statistically outperforms single–modality fusion. Logarithmic rank penalization proved to be the most stable normalization. The dark horse effect is in competition with the chorus effect, and each of them can produce the best fusion performance depending on the nature of the input data.

Xin Zhou, Adrien Depeursinge, Henning Müller
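
The three classical combination rules named above are simple to state in code. A sketch assuming each run is a mapping from document id to an already-normalized score:

    def comb_fusion(runs, method='combSUM'):
        scores = {}
        for run in runs:
            for doc, s in run.items():
                scores.setdefault(doc, []).append(s)
        if method == 'combMAX':    # best single score
            return {d: max(v) for d, v in scores.items()}
        if method == 'combSUM':    # chorus effect: reward agreement
            return {d: sum(v) for d, v in scores.items()}
        if method == 'combMNZ':    # combSUM times number of runs retrieving d
            return {d: sum(v) * len(v) for d, v in scores.items()}
        raise ValueError(method)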

ImageCLEF@ICPR – Visual Concept Detection Task

Overview of the Photo Annotation Task in ImageCLEF@ICPR

The Photo Annotation Task poses the challenge of automatically annotating 53 visual concepts in Flickr photos and was organized as part of the ImageCLEF@ICPR contest. In total, 12 research teams participated in the multilabel classification challenge, while initially 17 research groups expressed interest and got access to the data. The participants were provided with a training set of 5,000 annotated Flickr images and a validation set of 3,000 annotated Flickr images, and the test was performed on 10,000 Flickr images. The evaluation was carried out in two ways: per concept, using the Equal Error Rate (EER) and the Area Under Curve (AUC), and per example, using the Ontology Score (OS). Summarizing the results, an average AUC of 86.5% could be achieved, with individual concepts reaching an AUC of 96%. The classification performance for each image ranged between 59% and 100%, with an average score of 85%. In comparison to the results achieved in ImageCLEF 2009, the detection performance increased for the concept-based evaluation by 2.2% EER and 2.5% AUC, and showed a slight decrease for the example-based evaluation.

Stefanie Nowak
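
For reference, the Equal Error Rate used in the per-concept evaluation is the operating point where false acceptance and false rejection rates coincide; a threshold-sweep approximation is sketched below (illustrative, not the contest's evaluation code):

    import numpy as np

    def equal_error_rate(scores, labels):
        """labels: 1 = concept present, 0 = absent; higher score = present."""
        scores, labels = np.asarray(scores), np.asarray(labels)
        ts = np.unique(scores)
        far = np.array([(scores[labels == 0] >= t).mean() for t in ts])
        frr = np.array([(scores[labels == 1] < t).mean() for t in ts])
        i = int(np.abs(far - frr).argmin())   # closest crossing point
        return (far[i] + frr[i]) / 2.0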
Detection of Visual Concepts and Annotation of Images Using Ensembles of Trees for Hierarchical Multi-Label Classification

In this paper, we present a hierarchical multi-label classification system for visual concept detection and image annotation. Hierarchical multi-label classification (HMLC) is a variant of classification in which an instance may belong to multiple classes at the same time, and these classes/labels are organized in a hierarchy. The system is composed of two parts: feature extraction and classification/annotation. The feature extraction part provides global and local descriptions of the images. These descriptions are then used to learn a classifier and to annotate an image with the corresponding concepts. To this end, we use predictive clustering trees (PCTs), which are able to classify target concepts that are organized in a hierarchy. Our approach to HMLC exploits the annotation hierarchy by building a single predictive clustering tree that can simultaneously predict all of the labels used to annotate an image. Moreover, we construct ensembles (random forests) of PCTs to improve the predictive performance. We tested our system on the image database from the ImageCLEF@ICPR 2010 photo annotation task. The extensive experiments conducted on the benchmark database show that our system has very high predictive performance and can easily be scaled to a large number of visual concepts and large amounts of data.

Ivica Dimitrovski, Dragi Kocev, Suzana Loskovska, Sašo Džeroski
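
Predictive clustering trees are implemented in the Clus system rather than in mainstream Python libraries, but the overall shape of the approach (one ensemble predicting all 53 labels jointly) can be imitated with scikit-learn's multilabel random forest. A rough stand-in, not the authors' PCT ensembles:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 64))            # stand-in image descriptors
    Y = rng.integers(0, 2, size=(200, 53))    # 53 binary concept labels
    forest = RandomForestClassifier(n_estimators=100).fit(X, Y)
    pred = forest.predict(X[:3])              # all concepts predicted jointly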
The University of Surrey Visual Concept Detection System at ImageCLEF@ICPR: Working Notes

Visual concept detection is one of the most important tasks in image and video indexing. This paper describes our system in the ImageCLEF@ICPR Visual Concept Detection Task, which ranked first for large-scale visual concept detection in terms of Equal Error Rate (EER) and Area under Curve (AUC) and ranked third in terms of the hierarchical measure. The presented approach involves state-of-the-art local descriptor computation, vector quantisation via clustering, structured scene or object representation via localised histograms of vector codes, similarity measures for kernel construction and classifier learning. The main novelty is the classifier-level and kernel-level fusion using Kernel Discriminant Analysis with RBF/Power Chi-Squared kernels obtained from various image descriptors. For 32 out of 53 individual concepts, we obtain the best performance of all 12 submissions to this task.

M. A. Tahir, F. Yan, M. Barnard, M. Awais, K. Mikolajczyk, J. Kittler
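
Kernel-level fusion of chi-squared kernels from several descriptors reduces to a weighted sum of Gram matrices. A sketch using scikit-learn's chi2_kernel with equal weights (the weighting, descriptor sizes and kernel parameters are assumptions):

    import numpy as np
    from sklearn.metrics.pairwise import chi2_kernel

    rng = np.random.default_rng(0)
    # two bag-of-words histograms per image, e.g. from different descriptors
    H1 = rng.random((10, 300)); H1 /= H1.sum(axis=1, keepdims=True)
    H2 = rng.random((10, 500)); H2 /= H2.sum(axis=1, keepdims=True)
    K = 0.5 * chi2_kernel(H1) + 0.5 * chi2_kernel(H2)  # fused Gram matrix
    # K can now feed any kernel classifier, e.g. SVC(kernel='precomputed')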

ImageCLEF@ICPR – Robot Vision Task

Overview of the ImageCLEF@ICPR 2010 Robot Vision Track

This paper describes the robot vision track that was proposed to the ImageCLEF@ICPR 2010 participants. The track addressed the problem of visual place classification. Participants were asked to classify rooms and areas of an office environment on the basis of image sequences captured by a stereo camera mounted on a mobile robot, under varying illumination conditions. The algorithms proposed by the participants had to answer the question “where are you?” (I am in the kitchen, in the corridor, etc.) when presented with a test sequence imaging rooms seen during training (from different viewpoints and under different conditions), or additional rooms that were not imaged in the training sequence. The participants were asked to solve the problem separately for each test image (obligatory task). Additionally, results could also be reported for algorithms exploiting the temporal continuity of the image sequences (optional task). A total of eight groups participated in the challenge, with 25 runs submitted to the obligatory task and 5 to the optional task. The best result in the obligatory task was obtained by the Computer Vision and Geometry Laboratory, ETHZ, Switzerland, with an overall score of 3824.0. The best result in the optional task was obtained by the Intelligent Systems and Data Mining Group, University of Castilla-La Mancha, Albacete, Spain, with an overall score of 3881.0.

Andrzej Pronobis, Henrik I. Christensen, Barbara Caputo
Methods for Combined Monocular and Stereo Mobile Robot Localization

This paper describes a visual-word based place recognition approach to mobile robot localization. Our approach exploits the benefits of a stereo camera system: visual words computed from SIFT features are combined with VIP (viewpoint invariant patches) features that use depth information from the stereo setup. The approach was evaluated in the ImageCLEF@ICPR 2010 competition, and the results achieved on the competition datasets are reported in this paper.

Friedrich Fraundorfer, Changchang Wu, Marc Pollefeys
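
The visual-word representation underlying this and the following entries can be sketched as vector quantization of local descriptors against a learned codebook (a generic bag-of-words sketch, not the authors' VIP-augmented pipeline):

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    train_descriptors = rng.normal(size=(5000, 128))  # stand-in SIFT vectors
    codebook = KMeans(n_clusters=200, n_init=10).fit(train_descriptors)

    def bow_histogram(descriptors):
        """Normalized histogram of visual-word assignments for one image."""
        words = codebook.predict(descriptors)
        hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
        return hist / hist.sum()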
PicSOM Experiments in ImageCLEF RobotVision

The PicSOM multimedia analysis and retrieval system has previously been successfully applied to supervised concept detection in image and video databases. Such concepts include locations, events, and objects of a particular type. In this paper we apply the general-purpose visual category recognition algorithm in PicSOM to the recognition of indoor locations in the ImageCLEF/ICPR RobotVision 2010 contest. The algorithm uses bag-of-visual-words and other visual features with fusion of SVM classifiers. The results show that, given a large enough training set, a purely appearance-based method can perform very well, ranking first for one of the contest’s training sets.

Mats Sjöberg, Markus Koskela, Ville Viitaniemi, Jorma Laaksonen
Combining Image Invariant Features and Clustering Techniques for Visual Place Classification

This paper presents the techniques developed by the SIMD group and the results obtained for the 2010 RobotVision task in the ImageCLEF competition. The presented approach tries to solve the problem of robot localization using only visual information. The proposed system performs classification using training sequences acquired under different lighting conditions. The well-known SIFT and RANSAC techniques are used to extract invariant points from the images used as training information. The results obtained in the RobotVision@ImageCLEF competition demonstrate the merit of the proposal.

Jesús Martínez-Gómez, Alejandro Jiménez-Picazo, José A. Gámez, Ismael García-Varea
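
The SIFT-plus-RANSAC matching step mentioned above can be sketched with OpenCV (SIFT requires OpenCV 4.4 or newer; the ratio-test and reprojection thresholds below are conventional defaults, not the authors' values):

    import cv2
    import numpy as np

    def verified_matches(img1, img2):
        """Count RANSAC-verified SIFT correspondences between two frames."""
        sift = cv2.SIFT_create()
        k1, d1 = sift.detectAndCompute(img1, None)
        k2, d2 = sift.detectAndCompute(img2, None)
        pairs = cv2.BFMatcher(cv2.NORM_L2).knnMatch(d1, d2, k=2)
        good = [m for m, n in pairs if m.distance < 0.75 * n.distance]
        if len(good) < 4:                    # homography needs 4 points
            return 0
        src = np.float32([k1[m.queryIdx].pt for m in good])
        dst = np.float32([k2[m.trainIdx].pt for m in good])
        _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        return int(mask.sum()) if mask is not None else 0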

MOBIO – Mobile Biometry Face and Speaker Verification Evaluation

On the Results of the First Mobile Biometry (MOBIO) Face and Speaker Verification Evaluation

This paper evaluates the performance of face and speaker verification techniques in the context of a mobile environment. The mobile environment was chosen because it provides a realistic and challenging test-bed for biometric person verification techniques: for instance, the audio environment is quite noisy, and there is limited control over the illumination conditions and the pose of the subject in the video. To conduct this evaluation, part of a database captured during the “Mobile Biometry” (MOBIO) European Project was used. In total, nine participants submitted face verification systems to the evaluation and five submitted speaker verification systems. The results show that the best performing face and speaker verification systems obtained comparable levels of performance: 10.9% and 10.6% HTER, respectively.

Sébastien Marcel, Chris McCool, Pavel Matějka, Timo Ahonen, Jan Černocký, Shayok Chakraborty, Vineeth Balasubramanian, Sethuraman Panchanathan, Chi Ho Chan, Josef Kittler, Norman Poh, Benoît Fauve, Ondřej Glembek, Oldřich Plchot, Zdeněk Jančík, Anthony Larcher, Christophe Lévy, Driss Matrouf, Jean-François Bonastre, Ping-Han Lee, Jui-Yu Hung, Si-Wei Wu, Yi-Ping Hung, Lukáš Machlica, John Mason, Sandra Mau, Conrad Sanderson, David Monzo, Antonio Albiol, Hieu V. Nguyen, Li Bai, Yan Wang, Matti Niskanen, Markus Turtinen, Juan Arturo Nolazco-Flores, Leibny Paola Garcia-Perera, Roberto Aceves-Lopez, Mauricio Villegas, Roberto Paredes
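
The Half Total Error Rate (HTER) reported above is the mean of the false acceptance and false rejection rates at a decision threshold fixed beforehand, typically on a development set. A minimal sketch:

    import numpy as np

    def hter(scores, labels, threshold):
        """labels: 1 = genuine client, 0 = impostor; higher score = accept."""
        scores, labels = np.asarray(scores), np.asarray(labels)
        far = (scores[labels == 0] >= threshold).mean()  # false acceptances
        frr = (scores[labels == 1] < threshold).mean()   # false rejections
        return (far + frr) / 2.0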

PR in HIMA – Pattern Recognition in Histopathological Images

Pattern Recognition in Histopathological Images: An ICPR 2010 Contest

The advent of digital whole-slide scanners in recent years has spurred a revolution in imaging technology for histopathology. In order to encourage further interest in histopathological image analysis, we have organized a contest called “Pattern Recognition in Histopathological Image Analysis.” This contest aims to bring some of the pressing issues facing the advance of the rapidly emerging field of digital histology image analysis to the attention of the wider pattern recognition and medical image analysis communities. Two sample histopathological problems are explored: counting lymphocytes and centroblasts. The background to these problems and the evaluation methodology are discussed.

Metin N. Gurcan, Anant Madabhushi, Nasir Rajpoot
A Classification Scheme for Lymphocyte Segmentation in H&E Stained Histology Images

A technique for automating the detection of lymphocytes in histopathological images is presented. The proposed system takes Hematoxylin and Eosin (H&E) stained digital color images as input and identifies lymphocytes. The process involves segmentation of cells from the extracellular matrix, feature extraction, classification and overlap resolution. Extracellular matrix segmentation is a two-step process carried out on the HSV-equivalent of the image, using mean shift based clustering for color approximation followed by thresholding in the HSV space. Texture features extracted from the cells are used to train an SVM classifier that separates lymphocytes from non-lymphocytes. A contour based overlap resolution technique is used to resolve overlapping lymphocytes.

Manohar Kuse, Tanuj Sharma, Sudhir Gupta
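
The first stage of such a pipeline (mean-shift color approximation followed by HSV thresholding) might look as follows in OpenCV; the spatial/color bandwidths and the HSV bounds are illustrative guesses, not the paper's values:

    import cv2

    def candidate_cell_mask(bgr_image):
        """Mean-shift smoothing, then a threshold in HSV space."""
        smoothed = cv2.pyrMeanShiftFiltering(bgr_image, sp=10, sr=20)
        hsv = cv2.cvtColor(smoothed, cv2.COLOR_BGR2HSV)
        # hematoxylin-stained nuclei appear blue/purple; bounds are a guess
        return cv2.inRange(hsv, (110, 50, 50), (170, 255, 255))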
Identifying Cells in Histopathological Images

We present an image analysis pipeline for identifying cells in histopathology images of cancer. The analysis starts with segmentation using multi-phase level sets, which is insensitive to initialization and enables automatic detection of arbitrary objects. Morphological operations are used to remove small spots in the segmented images. The target cells are then identified based on their features. The detected cells were compared with manual detections performed by pathologists. The quantitative evaluation shows the promise and utility of our technique.

Jierong Cheng, Merlin Veronika, Jagath C. Rajapakse
Lymphocyte Segmentation Using the Transferable Belief Model

In the context of several pathologies, the presence of lymphocytes has been correlated with disease outcome. The ability to automatically detect lymphocyte nuclei in histopathology imagery could potentially lead to the development of an image-based prognostic tool. In this paper we present a method based on the estimation of a mixture of Gaussians for determining the probability distribution of the principal image component. A post-processing stage then eliminates regions whose shape is not similar to that of the nuclei sought. Finally, a Transferable Belief Model is used to detect the lymphocyte nuclei, and a shape-based algorithm splits them where needed, under equal-area and eccentricity constraints.

Costas Panagiotakis, Emmanuel Ramasso, Georgios Tziritas
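
The Gaussian-mixture stage could be sketched as below; fitting a mixture to the principal-component intensities and taking the darkest component as the nuclei candidate is an assumption made here for illustration:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    intensities = rng.random((10000, 1))      # stand-in principal component
    gmm = GaussianMixture(n_components=3, random_state=0).fit(intensities)
    labels = gmm.predict(intensities)
    nuclei_comp = int(np.argmin(gmm.means_.ravel()))  # darkest component
    candidate_mask = labels == nuclei_comp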
Counting Lymphocytes in Histopathology Images Using Connected Components

In this paper, a method for automatically counting lymphocytes in histopathology images using connected components is presented. Our multi-step approach can be divided into two main parts: processing of the histopathology images, and recognition of interesting regions. In the processing part, we use thresholding and morphology methods as well as connected components to improve the quality of the images for recognition. The recognition part is based on a modified template matching method. The experimental results achieved by our algorithm demonstrate its high robustness for this kind of application.

Felix Graf, Marcin Grzegorzek, Dietrich Paulus
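
Counting by connected components reduces to labeling the cleaned binary mask and filtering the components by area. A sketch with OpenCV; the area bounds are illustrative, not the paper's:

    import cv2

    def count_cells(binary_mask, min_area=30, max_area=500):
        n, _, stats, _ = cv2.connectedComponentsWithStats(binary_mask)
        areas = stats[1:, cv2.CC_STAT_AREA]   # row 0 is the background
        return int(((areas >= min_area) & (areas <= max_area)).sum())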

SDHA 2010 – Semantic Description of Human Activities

An Overview of Contest on Semantic Description of Human Activities (SDHA) 2010

This paper summarizes the results of the 1st Contest on Semantic Description of Human Activities (SDHA), held in conjunction with ICPR 2010. SDHA 2010 consists of three challenges: the High-level Human Interaction Recognition Challenge, the Aerial View Activity Classification Challenge, and the Wide-Area Activity Search and Recognition Challenge. The challenges are designed to encourage participants to test existing methodologies and develop new approaches for complex human activity recognition scenarios in realistic environments. We introduce three new public datasets through these challenges, and discuss the results of the state-of-the-art activity recognition systems designed and implemented by the contestants. A methodology using spatio-temporal voting [19] successfully classified segmented videos in the UT-Interaction datasets, but had difficulty correctly localizing activities in continuous videos. Both the method using local features [10] and the HMM-based method [18] successfully recognized actions from low-resolution videos (i.e., the UT-Tower dataset). We compare their results in this paper.

M. S. Ryoo, Chia-Chih Chen, J. K. Aggarwal, Amit Roy-Chowdhury
HMM Based Action Recognition with Projection Histogram Features

Hidden Markov Models (HMMs) have been widely used for action recognition, since they make it easy to model the temporal evolution of a single numeric feature or a set of numeric features extracted from the data. The selection of the feature set and of the related emission probability function are the key issues to be addressed. In particular, if the training set is not sufficiently large, manual or automatic feature selection and reduction is mandatory. In this paper we propose to model the emission probability function as a Mixture of Gaussians, with the feature set obtained from the projection histograms of the foreground mask. The projection histograms contain the number of moving pixels in each row and each column of the frame, and they provide sufficient information to infer the instantaneous posture of the person. The HMM framework then recovers the temporal evolution of the postures, thereby recognizing the global action. The proposed method has been successfully tested on the UT-Tower and Weizmann datasets.

Roberto Vezzani, Davide Baltieri, Rita Cucchiara
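
The projection-histogram feature is simple enough to state exactly: count the moving pixels per row and per column of the foreground mask. A sketch (the concatenation into a single vector is an assumption about the feature layout):

    import numpy as np

    def projection_histograms(foreground_mask):
        """Per-frame feature: moving-pixel counts per row and per column."""
        mask = np.asarray(foreground_mask, dtype=bool)
        return np.concatenate([mask.sum(axis=1), mask.sum(axis=0)])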
Action Recognition in Video by Sparse Representation on Covariance Manifolds of Silhouette Tunnels

A novel framework for action recognition in video using empirical covariance matrices of bags of low-dimensional feature vectors is developed. The feature vectors are extracted from segments of silhouette tunnels of moving objects and coarsely capture their shapes. The matrix logarithm is used to map the segment covariance matrices, which live in a nonlinear Riemannian manifold, to the vector space of symmetric matrices. A recently developed sparse linear representation framework for dictionary-based classification is then applied to the log-covariance matrices. The log-covariance matrix of a query segment is approximated by a sparse linear combination of the log-covariance matrices of training segments and the sparse coefficients are used to determine the action label of the query segment. This approach is tested on the Weizmann and the UT-Tower human action datasets. The new approach attains a segment-level classification rate of 96.74% for the Weizmann dataset and 96.15% for the UT-Tower dataset. Additionally, the proposed method is computationally and memory efficient and easy to implement.

Kai Guo, Prakash Ishwar, Janusz Konrad
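
The log-Euclidean mapping at the heart of this approach can be sketched directly: compute the covariance of a segment's feature vectors, take the matrix logarithm, and vectorize the symmetric result. The small regularization term is an assumption added here to keep the matrix positive definite:

    import numpy as np
    from scipy.linalg import logm

    def log_covariance_vector(features):
        """features: (num_samples, dim) low-dimensional shape features."""
        C = np.cov(features, rowvar=False)
        C += 1e-6 * np.eye(C.shape[0])        # ensure SPD before logm
        L = logm(C).real
        return L[np.triu_indices_from(L)]     # upper triangle as a vector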
Variations of a Hough-Voting Action Recognition System

This paper presents two variations of a Hough-voting framework used for action recognition and shows classification results for low-resolution video and for videos depicting human interactions. For low-resolution videos, where the people performing actions are only around 30 pixels tall, we adopt low-level features such as gradients and optical flow. For group actions with human-human interactions, we take the probabilistic action labels from the Hough-voting framework for single individuals and combine them into group actions using decision profiles and classifier combination.

Daniel Waltisberg, Angela Yao, Juergen Gall, Luc Van Gool
Backmatter
Metadata
Title
Recognizing Patterns in Signals, Speech, Images and Videos
Editors
Devrim Ünay
Zehra Çataltepe
Selim Aksoy
Copyright Year
2010
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-17711-8
Print ISBN
978-3-642-17710-1
DOI
https://doi.org/10.1007/978-3-642-17711-8
