
2015 | Book

Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications

20th Iberoamerican Congress, CIARP 2015, Montevideo, Uruguay, November 9-12, 2015, Proceedings


About this book

This book constitutes the refereed proceedings of the 20th Iberoamerican Congress on Pattern Recognition, CIARP 2015, held in Montevideo, Uruguay, in November 2015.

The 95 papers presented were carefully reviewed and selected from 185 submissions. The papers are organized in topical sections on applications of pattern recognition; biometrics; computer vision; gesture recognition; image classification and retrieval; image coding, processing and analysis; segmentation, analysis of shape and texture; signals analysis and processing; theory of pattern recognition; video analysis, segmentation and tracking.

Table of Contents

Frontmatter

Applications of Pattern Recognition

Frontmatter
Evaluating Imputation Techniques for Missing Data in ADNI: A Patient Classification Study

In real-world applications it is common to find data sets whose records contain missing values. As many data analysis algorithms are not designed to work with missing data, the records containing such values are generally removed from the analysis. A better alternative is to employ data imputation techniques to estimate the missing values using statistical relationships among the variables. In this work, we test the most common imputation methods used in the literature for filling missing records in the ADNI (Alzheimer’s Disease Neuroimaging Initiative) data set, where missing data affects about 80% of the patients, making the removal of most of the data unwise. We measure the imputation error of the different techniques and then evaluate their impact on classification performance. We train support vector machine and random forest classifiers using all the imputed data, as opposed to a reduced set of samples having complete records, for the task of discriminating among different stages of Alzheimer’s disease. Our results show the importance of using imputation procedures to achieve higher accuracy and robustness in the classification.

Sergio Campos, Luis Pizarro, Carlos Valle, Katherine R. Gray, Daniel Rueckert, Héctor Allende
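For readers who want to reproduce this kind of study, the sketch below shows a generic imputation-then-classification pipeline in scikit-learn. The imputers, classifiers, and synthetic data are illustrative stand-ins, not the exact techniques or ADNI features evaluated in the paper.

```python
# Hypothetical sketch: impute missing values, then compare classifiers.
# SimpleImputer/KNNImputer are generic choices, not necessarily the paper's.
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                 # stand-in for patient features
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)
X[rng.random(X.shape) < 0.3] = np.nan          # ~30% missing entries

for imputer in (SimpleImputer(strategy="mean"), KNNImputer(n_neighbors=5)):
    Xi = imputer.fit_transform(X)              # fill missing values
    for clf in (SVC(), RandomForestClassifier(n_estimators=100)):
        acc = cross_val_score(clf, Xi, y, cv=5).mean()
        print(type(imputer).__name__, type(clf).__name__, round(acc, 3))
```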
Genetic Prediction in Bovine Meat Production: Is It Worth Integrating Bayesian and Machine Learning Approaches? A Comprehensive Analysis

Genomic prediction is a still-growing field, as good predictions can have an important economic impact in both agronomics and health. In this article, we give a brief review and a comprehensive analysis of classical predictors used in the area. We propose a strategy to choose an ensemble of methods and to combine their results, taking advantage of the complementarity that some predictors have.

Maria Ines Fariello, Eileen Amstrong, Alicia Fernandez
Classification of Low-Level Atmospheric Structures Based on a Pyramid Representation and a Machine Learning Method

The atmosphere is a highly complex fluid system where multiple intrinsic and extrinsic phenomena superpose over the same spatial and temporal domains at different scales, making its characterization a challenging task. Despite the novel methods for pattern recognition and detection available in the literature, most climate data analysis and weather forecasting rely on the ability of specialized personnel to visually detect atmospheric patterns present in climate data plots. This paper presents a method for classifying low-level wind flow configurations, namely: confluences, difluences, vortices and saddle points. The method combines specialized image features, which capture the particular structure of low-level wind flow configurations through a pyramid layout representation, with a state-of-the-art machine learning classification method. The method was validated on a set of volumes extracted from climate simulations and manually annotated by experts. The best result on the independent test dataset was an average accuracy of 0.81 over the four atmospheric structures.

Sebastián Sierra, Juan F. Molina, Angel Cruz-Roa, José Daniel Pabón, Raúl Ramos-Pollán, Fabio A. González, Hugo Franco
Inferring Leaf Blade Development from Examples

Morphogenesis is the process by which plant tissues are organized and differentiated to determine the morphological structure of their organs. Understanding leaf blade morphogenesis is a major unsolved challenge in plant sciences. Despite recent advances, there is still no clear understanding of the physiological mechanisms underlying these morphological changes. In this work, we present a novel automatic approach to infer the geometrical structure of a leaf blade developmental model from sample sequences of leaf development. The main idea is to infer the set of parameters of a non-linear ordinary differential equation model based on relative elementary rates of growth that best fits an empirical leaf blade developmental sequence extracted from real images. Leaf shape simulations were computed from the resulting models and compared against 12 real sequences of leaf blade growth. The results show that the proposed method is able to properly infer the parameters of leaf development for a variety of leaf shapes, both in simulated and real sequences.

María Escobar, Mary Berdugo, Orlando Rangel-Ch, Francisco Gómez
Human Skin Segmentation Improved by Texture Energy Under Superpixels

Several applications demand the segmentation of images into skin and non-skin regions, such as face recognition, hand gesture detection and nudity recognition, among others. Human skin detection is still a challenging task and, although color is a very important clue, it usually generates a high rate of false positives. This work proposes and analyzes a skin segmentation method improved by texture energy. Experimental results on a challenging public data set demonstrate significant improvement of the proposed skin segmentation method over color-based state-of-the-art approaches.

Anderson Santos, Helio Pedrini
Integrative Functional Analysis Improves Information Retrieval in Breast Cancer

Gene expression analysis does not end in a list of differentially expressed (DE) genes, but requires a comprehensive functional analysis (FA) of the underlying molecular mechanisms. Gene Set and Singular Enrichment Analysis (GSEA and SEA) over Gene Ontology (GO) are the most used FA approaches. Several statistical methods have been developed and compared in terms of computational efficiency and/or appropriateness. However, none of them were evaluated from a biological point of view or in terms of consistency in information retrieval. In this context, questions such as “are methods comparable?”, “is one of them preferable to the others?” and “how sensitive are they to different parameterizations?” are crucial to face before choosing an FA tool, and they have not been, up to now, fully addressed. In this work we evaluate and compare the effect of different methods and parameters from an information retrieval point of view in both GSEA and SEA under GO. Several experiments comparing breast cancer subtypes with known different outcomes (i.e., Basal-Like vs. Luminal A) were analyzed. We show that GSEA can lead to very different results according to the statistic, model and parameters used. We also show that GSEA and SEA results fairly overlap and indeed complement each other. An integrative framework is also proposed to provide complementary and stable enrichment information according to the analyzed datasets.

Juan Cruz Rodriguez, Germán González, Cristobal Fresno, Elmer A. Fernández
Color Fractal Descriptors for Adaxial Epidermis Texture Classification

Leaves are an important plant organ and a source of information for traditional plant taxonomy. This study proposes a plant classification approach using the adaxial epidermis tissue, a specific cell layer that covers the leaf. To accomplish this task, we apply a highly discriminative color texture analysis method based on the Bouligand-Minkowski fractal dimension. In an experimental comparison, the success rate obtained by our proposed approach ($$96.66\%$$) was the highest among all the methods used, demonstrating that the Bouligand-Minkowski method is very suitable to extract discriminant features from the adaxial epidermis. Thus, this research can contribute significantly to other studies on plant classification using computer vision.

André R. Backes, Jarbas Joaci de Mesquita Sá Junior, Rosana Marta Kolb
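As a rough intuition for the descriptor family named above, the sketch below computes a grayscale Bouligand-Minkowski signature by mapping pixels to 3D points and measuring the volume of their dilation as the radius grows. The mapping, radii, and log-volume signature are illustrative assumptions; the paper uses a color variant whose details may differ.

```python
# Minimal grayscale Bouligand-Minkowski sketch (assumes a non-constant image).
import numpy as np
from scipy.ndimage import distance_transform_edt

def bouligand_minkowski(img, max_radius=8):
    # rescale intensities so the 3D point cloud (x, y, z) fits a cubic grid
    img = (img.astype(float) / img.max() * (max(img.shape) - 1)).astype(int)
    h, w = img.shape
    vol = np.ones((h, w, img.max() + 1), dtype=bool)
    vol[np.arange(h)[:, None], np.arange(w)[None, :], img] = False  # surface points
    dist = distance_transform_edt(vol)            # distance to nearest surface point
    radii = np.arange(1, max_radius + 1)
    volumes = [(dist <= r).sum() for r in radii]  # dilation volume V(r)
    return np.log(volumes)                        # log V(r) curve as descriptor

texture = np.random.default_rng(1).integers(0, 256, (64, 64))
print(bouligand_minkowski(texture))
```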
A Multiclass Approach for Land-Cover Mapping by Using Multiple Data Sensors

A common way to acquire information about monitored objects or areas on the Earth's surface is by using remote sensing images. These images can be obtained by different types of sensors (e.g., active and passive) and, depending on the sensor, distinct properties can be observed in the data. Typically, these sensors are specialized to encode one or a few properties of the object (e.g., spectral and spatial properties), which makes it necessary to use diverse sensors to obtain complementary information. Given the amount of information collected, it is essential to use a suitable technique to combine the different features. In this work, we propose a new late fusion technique, a majority voting scheme, which is able to exploit the diversity of different types of features extracted from different sensors. The new approach is evaluated in an urban classification scenario, achieving statistically better results in comparison with the proposed baselines.

Edemir Ferreira de Andrade Jr., Arnaldo de Albuquerque Araújo, Jefersson A. dos Santos
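A minimal sketch of a majority-voting late-fusion scheme of this kind is shown below, with synthetic data standing in for the per-sensor feature sets; the classifiers and vote rule are illustrative choices, not the paper's exact configuration.

```python
# Hedged sketch: one classifier per "sensor" view, fused by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Pretend each "sensor" yields its own feature matrix for the same samples.
X, y = make_classification(n_samples=300, n_features=12, random_state=0)
sensors = [X[:, :4], X[:, 4:8], X[:, 8:]]   # stand-ins for spectral/spatial views

idx_train, idx_test = train_test_split(np.arange(len(y)), random_state=0)
votes = []
for Xs in sensors:
    clf = SVC().fit(Xs[idx_train], y[idx_train])
    votes.append(clf.predict(Xs[idx_test]))

# Majority vote across sensor-specific predictions (binary labels).
fused = (np.stack(votes).mean(axis=0) > 0.5).astype(int)
print("fused accuracy:", (fused == y[idx_test]).mean())
```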
Coffee Crop Recognition Using Multi-scale Convolutional Neural Networks

Identifying crops from remote sensing images is fundamental to knowing and monitoring land use. However, manual identification is expensive and may be impracticable given the amount of data. Automatic methods, although interesting, are highly dependent on the quality of extracted features, since encoding the spatial features in an efficient and robust fashion is the key to generating discriminatory models. Even though many visual descriptors have been proposed or successfully used to encode spatial features, in some cases more specific descriptions are needed. Deep learning has achieved very good results in some tasks, mainly boosted by feature learning, which allows the method to extract specific and adaptable visual features depending on the data. In this paper, we propose two multi-scale methods, based on deep learning, to identify coffee crops. Specifically, we propose the Cascade Convolutional Neural Networks, or simply CCNN, which identifies crops considering a hierarchy of networks, and the Iterative Convolutional Neural Network, called ICNN, which feeds the same network with data several times. We conducted a systematic evaluation of the proposed algorithms using a remote sensing dataset. The experiments show that the proposed methods outperform the baseline, consisting of state-of-the-art components, by a margin ranging from 3 to 6% in terms of average accuracy.

Keiller Nogueira, William Robson Schwartz, Jefersson A. dos Santos
A Grading Strategy for Nuclear Pleomorphism in Histopathological Breast Cancer Images Using a Bag of Features (BOF)

Nuclear pleomorphism is an early breast cancer (BCa) indicator that assesses variations in nuclear size, shape or chromatin appearance. Research involving rankings by several experts shows that the kappa coefficient ranges from 0.3 (low) to 0.5 (moderate) [12]. In this work, an automatic grading approach for nuclear pleomorphism is proposed. First, a large nuclei sample is characterized by a multi-scale descriptor that is then assigned to the most similar atom of a previously learned dictionary. An occurrence histogram then represents any Field of View (FoV) in terms of the occurrence of the descriptors with respect to the learned atoms of the dictionary. Finally, an SVM classifier assigns a full pleomorphism grade, between 1 and 3, using the previous histogram. The strategy was evaluated by extracting 134 FoVs ($$\times 20$$), graded by a pathologist, from 14 BCa slides of ’The Cancer Genome Atlas’ (TCGA) database. The obtained precision and recall measures were 0.67 and 0.67.

Ricardo Moncayo, David Romo-Bucheli, Eduardo Romero
Optimal and Linear F-Measure Classifiers Applied to Non-technical Losses Detection

Non-technical loss detection represents a very high cost to power supply companies. Finding classifiers that can deal with this problem is not easy, as they have to face a highly imbalanced scenario with noisy data. In this paper we propose to use the Optimal F-measure Classifier (OFC) and the Linear F-measure Classifier (LFC), two novel algorithms that are designed to work on problems with unbalanced classes. We compare the performance of both algorithms with that of other methods previously used to solve the automatic fraud detection problem.

Fernanda Rodriguez, Matías Di Martino, Juan Pablo Kosut, Fernando Santomauro, Federico Lecumberry, Alicia Fernández
A Multimodal Approach for Percussion Music Transcription from Audio and Video

A multimodal approach for percussion music transcription from audio and video recordings is proposed in this work. It is part of an ongoing research effort for the development of tools for computer-aided analysis of Candombe drumming, a popular afro-rooted rhythm from Uruguay. Several signal processing techniques are applied to automatically extract meaningful information from each source. This involves detecting certain relevant objects in the scene from the video stream. The location of events is obtained from the audio signal and this information is used to drive the processing of both modalities. Then, the detected events are classified by combining the information from each source in a feature-level fusion scheme. The experiments conducted yield promising results that show the advantages of the proposed method.

Bernardo Marenco, Magdalena Fuentes, Florencia Lanzaro, Martín Rocamora, Alvaro Gómez
Modeling Onset Spectral Features for Discrimination of Drum Sounds

Motivated by practical problems related to ongoing research on Candombe drumming (a popular afro-rooted rhythm from Uruguay), this paper proposes an approach for recognizing drum sounds in audio signals that reuses, for sound classification, the same audio spectral features employed in onset detection. Among the reported experiments involving recordings of real performances, one aims at finding the predominant Candombe drum heard in an audio file, while the other attempts to identify those temporal segments within a performance when a given sound pattern is played. The attained results are promising and suggest many ideas for future research.

Martín Rocamora, Luiz W. P. Biscainho
Classification of Basic Human Emotions from Electroencephalography Data

This paper explores the combination of known signal processing techniques to analyze electroencephalography (EEG) data for the classification of a set of basic human emotions. An Emotiv EPOC headset with 16 electrodes was used to measure EEG data from a population of 24 subjects who were presented audiovisual stimuli designed to evoke 4 emotions (rage, fear, fun and neutral). Raw data was preprocessed to eliminate noise, interference and physiological artifacts. The Discrete Wavelet Transform (DWT) was used to extract the main characteristics of the signals and define relevant features. Classification was performed using different algorithms and the results were compared. The best results were obtained when using meta-learning techniques, with classification errors of 5%. Final conclusions and future work are discussed.

Ximena Fernández, Rosana García, Enrique Ferreira, Juan Menéndez
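For illustration, a minimal DWT feature extractor of the kind described above could look as follows, assuming the PyWavelets package; the wavelet family, decomposition level, and per-band statistics are assumptions, not the paper's exact choices.

```python
# Sketch of DWT-based feature extraction for one EEG channel.
import numpy as np
import pywt

def dwt_features(signal, wavelet="db4", level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    feats = []
    for band in coeffs:  # approximation + detail sub-bands
        feats += [band.mean(), band.std(), np.sum(band ** 2)]  # mean, std, energy
    return np.array(feats)

eeg_channel = np.random.default_rng(0).normal(size=1024)  # stand-in for EPOC data
print(dwt_features(eeg_channel).shape)
```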
A Non-parametric Approach to Detect Changes in Aerial Images

Detecting changes in aerial images acquired from a scene at different times, possibly with different cameras and from different viewpoints, is a crucial step for many image processing and computer vision applications, such as remote sensing, visual surveillance and civil infrastructure. In this paper, we propose a novel approach to automatically detect changes based on local descriptors and non-parametric image block modeling. Unlike most approaches, which are pixel-based, our approach combines contextual information and kernel density estimation to model the image regions and identify changes. The experimental results show the effectiveness of the proposed approach compared to other methods in the literature, demonstrating the robustness of our algorithm. The results also demonstrate that the approach can be employed to generate a summary containing mostly frames presenting significant changes.

Marco Túlio Alves Rodrigues, Daniel Balbino, Erickson Rangel Nascimento, William Robson Schwartz
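The sketch below conveys the non-parametric block-modeling idea in its simplest form: fit a kernel density estimate to each reference block and flag blocks whose new pixels are unlikely under it. Block size, threshold, and the per-block KDE are illustrative assumptions rather than the authors' formulation.

```python
# Toy KDE-based change detection over image blocks.
import numpy as np
from scipy.stats import gaussian_kde

def changed_blocks(ref, cur, block=16, thresh=-5.0):
    mask = np.zeros((ref.shape[0] // block, ref.shape[1] // block), dtype=bool)
    jitter = np.random.default_rng(1).normal(0, 1e-3, block * block)
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            r = ref[i*block:(i+1)*block, j*block:(j+1)*block].ravel()
            c = cur[i*block:(i+1)*block, j*block:(j+1)*block].ravel()
            kde = gaussian_kde(r + jitter)  # jitter avoids singular KDE
            mask[i, j] = np.log(kde(c) + 1e-12).mean() < thresh  # low likelihood => change
    return mask

rng = np.random.default_rng(0)
ref = rng.normal(size=(64, 64))
cur = ref.copy(); cur[:16, :16] += 3.0  # simulate a change in one corner
print(changed_blocks(ref, cur))
```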

Biometrics

Frontmatter
A New Fingerprint Indexing Algorithm for Latent and Non-latent Impressions Identification

In this work, a new fingerprint identification algorithm for latent and non-latent impressions based on indexing techniques is presented. This proposal uses a state-of-the-art minutia triplet representation, which has proven to be very tolerant to distortions. Also, a novel strategy to partition the indexes is implemented in the retrieval stage. This strategy allows the algorithm to be used in both criminal and non-criminal contexts. The experimental results show that in latent identification this approach achieves a hit rate of 91.08% at a penetration rate of 20% on the NIST27 database, using a large background of 267000 rolled impressions. Meanwhile, in non-latent identification at the same penetration rate, the algorithm reaches a hit rate of 97.8% on the NIST4 database and a 100% hit rate on the FVC2004 DB1A database. These accuracy values were reached with high efficiency.

José Hernández-Palancar, Alfredo Muñoz-Briseño
Homogeneity Measure for Forensic Voice Comparison: A Step Forward Reliability

In forensic voice comparison, it is strongly recommended to follow the Bayesian paradigm to present forensic evidence to the court. In this paradigm, the strength of the forensic evidence is summarized by a likelihood ratio (LR). But in the real world, relying only on the LR without considering its degree of reliability does not allow experts to make a sound judgement. This work is mainly motivated by the need to quantify this reliability. In this context, we think that the presence of speaker-specific information and its homogeneity between the two signals to compare should be evaluated. This paper is dedicated to the latter, the homogeneity. We propose an information-theory-based homogeneity measure which determines whether a voice comparison is feasible or not.

Moez Ajili, Jean-François Bonastre, Solange Rossato, Juliette Kahn, Itshak Lapidot
On Multiview Analysis for Fingerprint Liveness Detection

Fingerprint recognition systems, like any other biometric system, can be subject to attacks, which are usually carried out using artificial fingerprints. Several approaches to discriminate between live and fake fingerprint images have been presented to address this issue. These methods usually rely on the analysis of individual features extracted from the fingerprint images. Such features represent different and complementary views of the object under analysis, and their fusion is likely to improve the classification accuracy. However, very little work in this direction has been reported in the literature. In this work, we present the results of a preliminary investigation on multiview analysis for fingerprint liveness detection. Experimental results show the effectiveness of such an approach, which improves on previous results in the literature.

Amirhosein Toosi, Sandro Cumani, Andrea Bottino
Online Signature Verification: Is the Whole Greater Than the Sum of the Parts?

Choosing the best features to model signatures is one of the most challenging problems in online signature verification. In this paper, the idea is to evaluate whether it is possible to combine different feature sets selected by different criteria in such a way that their main characteristics are properly exploited and the verification performance is improved with respect to using each set individually. In particular, the combination of an automatically selected feature set, a feature set inspired by those used by Forensic Handwriting Experts (FHEs), and a set of global features is proposed. Two different fusion strategies are used to perform the combination, namely, a decision-level fusion scheme and a pre-classification scheme. Experimental results show that the proposed feature combination approaches improve not only the verification error rates but also the simplicity, flexibility and interpretability of the verification system.

Marianela Parodi, Juan Carlos Gómez
Fingerprint Matching Using a Geometric Subgraph Mining Approach

In the present work, a new representation of fingerprints in the form of a geometric graph is proposed. This representation is obtained by fusing two previously defined approaches found in the literature and proves to be very tolerant to occlusions and distortions in the minutiae. Also, a novel fingerprint matching algorithm that uses geometric graphs is introduced. This algorithm applies frequent geometric subgraph mining in order to match fingerprint representations and compute a final similarity score. The proposal reports very promising accuracy values and applies a new approach that allows many future improvements.

Alfredo Muñoz-Briseño, Andrés Gago-Alonso, José Hernández-Palancar
Improving Writer Identification Through Writer Selection

In this work we present a method for selecting instances for a writer identification system underpinned by the dissimilarity representation and a holistic representation based on texture. The proposed method is based on a genetic algorithm that surpasses the limitations imposed by large training sets by selecting writers instead of instances. To show the efficiency of the proposed method, we have performed experiments on three different databases (BFL, IAM, and Firemaker), where we observe not only a reduction of about 50% in the number of writers necessary to build the dissimilarity model but also a gain in terms of identification rate. Comparing writer selection with traditional instance selection, we observed that both strategies produce similar results but the former converges about three times faster.

Diego Bertolini, Luiz S. Oliveira, Robert Sabourin
One-Shot 3D-Gradient Method Applied to Face Recognition

In this work we describe a novel one-shot face recognition setup. Instead of using a 3D scanner to reconstruct the face, we acquire a single photo of a person's face while a rectangular pattern is being projected onto it. Using this single image, it is possible to extract 3D low-level geometrical features without an explicit 3D reconstruction. To handle expression variations and occlusions that may occur (e.g. wearing a scarf or a bonnet), we extract information only from the eyes-forehead and nose regions, which tend to be less influenced by facial expressions. Once features are extracted, SVM hyperplanes are obtained for each subject in the database (one-vs-all approach); new instances can then be classified according to their distance to each of those hyperplanes. The advantage of our method with respect to others published in the literature is that we do not need an explicit 3D reconstruction. Experiments with the Texas 3D Database and with newly acquired data are presented, which show the potential of the presented framework to handle different illumination conditions, poses and facial expressions.

J. Matías Di Martino, Alicia Fernández, José Ferrari
Iris Texture Description Using Ordinal Co-occurrence Matrix Features

Feature extraction is one of the fundamental steps of any biometric recognition system, and biometric iris recognition is no exception. In the last 30 years many algorithms have been proposed seeking a better description of the texture image of the human iris. The problem remains to find features that are robust to the different conditions under which iris images are captured. This paper proposes a new iris texture description based on ordinal co-occurrence matrix features for an iris recognition scheme that increases recognition accuracy. The novelty of this work is the new strategy of applying a robust feature extraction method for texture description in iris recognition. Experiments with the Casia-Interval, Casia-Thousands and Ubiris-v1 databases show that our scheme increases recognition accuracy and is robust to different image capture conditions.

Yasser Chacón-Cabrera, Man Zhang, Eduardo Garea-Llano, Zhenan Sun
Confounding Factors in Keystroke Dynamics

Authentication is the verification of the identity of a person to access a resource or perform an activity. Authentication based on keystroke dynamics biometrics validates a legitimate user by comparing his typing on the keyboard with his stored template. An important group of factors influences the capture of the raw data generated by the user's typing. These Confounding Factors have been addressed in the literature from different approaches, and most of these studies agree that their influence affects the reliability of Keystroke Dynamics. In this research, a taxonomy of Confounding Factors is proposed, and several mitigation actions to face them are discussed.

Osvaldo Andrés Pérez-García
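To make the raw data concrete: keystroke dynamics systems typically derive dwell times (how long a key is held) and flight times (gaps between keys) from timestamped key events and compare them against a stored template. The sketch below illustrates this with made-up timings and a simple distance; it is a generic illustration, not the taxonomy or mitigation actions proposed in the paper.

```python
# Toy keystroke-dynamics timing features; all timestamps are made up.
import numpy as np

def timing_features(press, release):
    dwell = np.array(release) - np.array(press)             # key hold durations
    flight = np.array(press[1:]) - np.array(release[:-1])   # inter-key gaps
    return np.concatenate([dwell, flight])

template = timing_features([0.00, 0.31, 0.58], [0.09, 0.40, 0.70])
attempt  = timing_features([0.00, 0.35, 0.64], [0.11, 0.46, 0.75])
score = np.abs(template - attempt).mean()                   # lower => more similar
print("accept" if score < 0.05 else "reject", round(score, 3))
```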
Denoising Autoencoder for Iris Recognition in Noncooperative Environments

The iris is considered the most distinctive phenotypic feature visible in a person's face and has been explored for the last three decades. Outstanding approaches are known for the iris recognition task when the image is acquired in a well controlled environment. However, the problem is still challenging in a noncooperative environment. With this context in mind, and from a representation learning perspective, in this paper we propose the use of denoising autoencoder networks to create descriptors for iris recognition. We extract features from six regions of the iris and also use a specific scheme from the literature that employs a set of thresholds for iris acceptance/rejection. We perform experiments on two well-known databases, comparing our descriptor with our own implementations of 2D Gabor and wavelet representations. On both data sets, the proposed descriptor outperforms these features and presents results comparable with the best-performing method in the NICE contest.

Eduardo Luz, David Menotti
A New Ridge-Features-Based Method for Fingerprint Image Quality Assessment

The fingerprint is the most widely used biometric trait. Many factors may cause quality degradation of fingerprint impressions: users, sensors and environmental factors. Most fingerprint-based biometric systems need an accurate prediction of fingerprint quality. A fingerprint quality measure can be used in the enrollment or recognition stages to improve AFIS performance. In this work, a new fingerprint image quality estimation method guided by how experts classify fingerprint image quality is presented. Using six features, a continuous quality value is calculated. Experiments were performed on a well-known database. The performance of the proposed approach was evaluated by measuring its impact on the recognition stage and comparing it with the NFIQ quality algorithm. Verifinger 4.2 was used as the matching algorithm. The results show that the proposed approach has very good performance.

Katy Castillo-Rosado, José Hernández-Palancar

Computer Vision

Frontmatter
A Computer Vision Approach for Automatic Measurement of the Inter-plant Spacing

Global food demand is increasing every year and must be met. In addition, some crops such as corn, the most produced grain in the world, are used as food, feed, bio-energy and for other industrial purposes. Thus, new technologies are needed to make it possible to produce more from less land. In particular, the corn crop is sensitive to its spatial arrangement, and any variation in the plant distribution pattern can lead to a reduction in corn production. Nowadays, the uniformity of plant spacing is checked manually by agronomists in order to predict possible production losses. In this context, this work proposes an automatic approach for measuring the spacing between corn plants in the early stages of growth. The proposed approach is based on computer vision techniques in order to perform automatic inter-plant spacing measurement from images in a simple and efficient way, allowing its use on devices with low computational power such as smart phones and tablets. An image dataset was built as an additional contribution of this work, containing 2186 corn plants in two conditions: tillage after the application of herbicide (TH) with 1387 corn plants and conventional tillage (CT) with 799 corn plants. The dataset is available at http://github.com/Brilhador/cornspacing. The experimental results achieve 90% precision and 92% sensitivity in corn plant identification. Regarding the automatic measurement of inter-plant spacing, the results showed no significant differences from the same measurements taken manually, indicating the effectiveness of the proposed approach in two distinct types of planting.

Anderson Brilhador, Daniel A. Serrarens, Fabrício M. Lopes
An EA-Based Method for Estimating the Fundamental Matrix

The camera calibration problem consists in estimating intrinsic and extrinsic parameters. It can be solved by computing a 3x3 matrix encoding such parameters, the fundamental matrix, which can be obtained from a set of corresponding points. Nevertheless, in practice, corresponding points may be falsely matched or badly located due to occlusion and ambiguity. Moreover, if the set of corresponding points does not include information on existing scene depth, the estimated fundamental matrix may not be able to correctly recover the epipolar geometry. In this paper, an EA-based method for accurately selecting estimated corresponding points is introduced. It considers geometric issues that were ignored in previous EA-based approaches. Two selection operators were evaluated and obtained similar results. Additionally, a mutation operator is designed to tackle badly located points by shifting disparity vectors. An inter-technique comparison is performed against a standard camera calibration method. The qualitative evaluation is conducted by analysing the obtained epipolar lines with respect to their expected appearance, based on a-priori knowledge of the camera systems during the capturing process. The quantitative evaluation of the proposed method is based on residuals. Experimental results show that the proposed method is able to correctly reconstruct the epipolar geometry.

Daniel Barragan, Maria Trujillo, Ivan Cabezas
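For background, EA-based selection schemes like this one sit on top of a classical estimator; the snippet below sketches the standard normalized eight-point estimate of the fundamental matrix from already-selected correspondences. It is the textbook building block, not the paper's evolutionary algorithm.

```python
# Normalized eight-point estimate of the fundamental matrix (classical method).
import numpy as np

def normalize(pts):
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.linalg.norm(pts - c, axis=1).mean()
    T = np.array([[s, 0, -s*c[0]], [0, s, -s*c[1]], [0, 0, 1]])
    return np.c_[pts, np.ones(len(pts))] @ T.T, T

def fundamental_matrix(p1, p2):
    x1, T1 = normalize(p1)
    x2, T2 = normalize(p2)
    A = np.column_stack([x2[:, 0]*x1[:, 0], x2[:, 0]*x1[:, 1], x2[:, 0],
                         x2[:, 1]*x1[:, 0], x2[:, 1]*x1[:, 1], x2[:, 1],
                         x1[:, 0], x1[:, 1], np.ones(len(p1))])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)   # null vector of A
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0]) @ Vt        # enforce rank 2
    return T2.T @ F @ T1                          # denormalize

# Toy check: two views of a synthetic rig (second camera purely translated).
rng = np.random.default_rng(0)
X3d = rng.uniform(-1, 1, (20, 3)) + [0, 0, 5]
p1 = X3d[:, :2] / X3d[:, 2:]
p2 = (X3d[:, :2] + [0.5, 0.0]) / X3d[:, 2:]
F = fundamental_matrix(p1, p2)
x1 = np.c_[p1, np.ones(20)]; x2 = np.c_[p2, np.ones(20)]
print(np.abs(np.einsum("ni,ij,nj->n", x2, F, x1)).max())  # epipolar residual ~0
```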
Two Applications of RGB-D Descriptors in Computer Vision

In this paper an evaluation of RGB-D descriptors in the context of Object Recognition and Object Tracking is presented. Spin-images, CSHOT and ECV context descriptors were used for detecting objects in point clouds. Empirical evaluation over a dataset with ground truth shows that shape is the most important cue for RGB-D descriptors. However, texture helps discrimination when objects are large or have little structure.

Mariano Bianchi, Nadia Heredia, Francisco Gómez-Fernández, Alvaro Pardo, Marta Mejail

Gesture Recognition

Frontmatter
Fast and Accurate Gesture Recognition Based on Motion Shapes

As in many other computer vision applications, the large amount of data is an inherent problem in video gesture recognition. A challenging task is to maintain a suitable trade-off between time and accuracy, aiming at a solution that meets certain requirements and constraints. In this paper, we propose a simple and fast gesture recognition approach that extracts meaningful and discriminative descriptors from hand gesture videos. Experiments conducted on the Sheffield Kinect Gestures (SKIG) data set show that our method achieves competitive accuracy while processing frames at frequencies higher than those required for real-time applications.

Thierry Moreira, Marlon Alcantara, Helio Pedrini, David Menotti
Recognition of Facial Expressions Based on Deep Conspicuous Net

Facial expression plays an important role in human interaction and non-verbal communication. Hence, applications that automatically detect facial expressions are becoming pervasive in various fields, such as education, entertainment, psychology, human-computer interaction and behavior monitoring, to cite just a few. In this paper, we present a new approach for facial expression recognition using a so-called deep conspicuous neural network. The proposed method builds a conspicuous map of face regions and trains a deep network on it. Experimental results achieved an average accuracy of 90% over the extended Cohn-Kanade data set for seven basic expressions, demonstrating the best performance against four state-of-the-art methods.

João Paulo Canário, Luciano Oliveira
Facial Expression Recognition with Occlusions Based on Geometric Representation

In recent years, emotion recognition based on facial expressions has received increasing attention from the scientific community in several knowledge domains, such as emotional analysis, pattern recognition, behavior prediction, interpersonal relations and human-computer interaction, among others. This work describes an emotion recognition system based on facial expressions that is robust to occlusions. Initially, the occluded facial expression to be recognized is reconstructed through Robust Principal Component Analysis (RPCA). Then, fiducial point detection is performed to extract facial expression features, represented by Gabor wavelets and geometric features. The feature vector space is reduced using Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). Finally, K-nearest neighbor (K-NN) and Support Vector Machine (SVM) classifiers are used to recognize the expressions. Three public data sets are used to evaluate our results. The geometric representation achieved high accuracy rates for occluded and non-occluded faces compared to approaches available in the literature.

Jadisha Y. Ramírez Cornejo, Helio Pedrini, Francisco Flórez-Revuelta
Automatic Eyes and Nose Detection Using Curvature Analysis

In the present work we propose a method for detecting the nose and eye positions when observing a scene that contains a face. The main goal of the proposed technique is to bypass the explicit 3D mapping of the face and instead take advantage of the information available in the depth gradient map of the face. To this end we introduce a simple false-positive rejection approach that restricts the distance between the eyes, and between the eyes and the nose. The main idea is to use nose candidates to estimate those regions where the eyes are expected to be found, and vice versa. Experiments with the Texas database are presented, and the proposed approach is tested with data presenting different noise levels and faces in different positions with respect to the camera.

J. Matías Di Martino, Alicia Fernández, José Ferrari

Image Classification and Retrieval

Frontmatter
A Novel Framework for Content-Based Image Retrieval Through Relevance Feedback Optimization

Content-based image retrieval remains an important research topic in many domains. It can be applied to assist specialists in improving the efficiency and accuracy of interpreting images. However, it presents some intrinsic problems. This occurs because the semantic interpretation of an image is still far from being reached, since it depends on the user's perception of the image. Besides, each user has different personal behaviors and experiences, which generates a highly subjective analysis of a given image. To mitigate these problems, this paper presents a novel framework for content-based image retrieval joining relevance feedback techniques with optimization methods. It is capable of not only capturing the user's intention, but also of tuning the process through the optimization method according to each user. The experiments demonstrate the great applicability and efficacy of the proposed framework, which presented considerable precision gains in similarity queries.

Reginaldo Rocha, Priscila T. M. Saito, Pedro H. Bugatti
Graph Fusion Using Global Descriptors for Image Retrieval

This paper addresses the problem of content-based image retrieval in a large-scale setting. Recently, several graph-based image retrieval systems that fuse different representations have been proposed with excellent results; however, most of them use at least one representation based on local descriptors that does not scale very well with the number of images, hurting time and memory requirements as the database grows. This motivated us to investigate the possibility of retaining the performance of local descriptor methods while using only global descriptions of the image. Thus, we propose a graph-based query fusion approach, in which we combine several representations based on aggregated local descriptors such as Fisher Vectors, using distance and neighborhood information to evaluate the individual importance of each element in every query. Performance is analyzed in different time- and memory-constrained scenarios. Experiments are performed on 3 public datasets, UKBench, Holidays and MIRFLICKR-1M, obtaining state-of-the-art performance.

Tomás Mardones, Héctor Allende, Claudio Moraga
Fisher Vectors for Leaf Image Classification: An Experimental Evaluation

In this work we present an experimental evaluation of the exponential family Fisher vector (eFV) encoding applied to the problem of visual plant identification. We evaluate the performance of this model together with a variety of local image descriptors on four different datasets and compare the results with other methods proposed in the literature. Experiments show that the eFV achieves a performance that compares favorably with other state-of-the-art approaches on this problem.

Javier A. Redolfi, Jorge A. Sánchez, Julián A. Pucheta
Hierarchical Combination of Semantic Visual Words for Image Classification and Clustering

Image classification and image clustering are two important tasks in image analysis. In this work, a two-level hierarchical model for both tasks using a hierarchical combination of image descriptors is presented. The construction of a latent semantic representation for images is also presented, and its impact on the results of both tasks for the two-level hierarchical model is evaluated. Experiments have shown the superior performance attained by the hierarchical combination of descriptors when compared to their simple concatenation or to the use of single descriptors. The hierarchical combination with a latent semantic representation has presented results similar to the other hierarchical combinations, using only a small fraction of the time and space needed by the others, which is especially interesting for those with restricted computing power and/or storage space.

Vinicius von Glehn De Filippo, Zenilton Kleber G. do Patrocínio Jr., Silvio Jamil F. Guimarães
Kernel Combination Through Genetic Programming for Image Classification

Support vector machine is a supervised learning technique which uses kernels to perform nonlinear separations of data. In this work, we propose a combination of kernels through genetic programming in which the individual fitness is obtained by a K-NN classifier using a kernel-based distance measure. Experiments have shown that our method, KGP-K, is much faster than other methods during training, but is still able to generate individuals (i.e., kernels) with performance (in terms of accuracy) competitive with those produced by other methods. KGP-K produces reasonable kernels to use in the SVM with no knowledge about the distribution of the data, even if they can be more complex than the ones generated by other methods and, therefore, need more time during tests.

Yuri H. Ribeiro, Zenilton K. G. do Patrocínio Jr., Silvio Jamil F. Guimarães
A Complex Network-Based Approach to the Analysis and Classification of Images

Complex networks are a topic connected to a plurality of knowledge areas and have been applied with success in all of them. However, their application to image pattern recognition is recent. There are few works in the literature that use complex networks for image characterization followed by analysis and classification. An image can be interpreted as a complex network wherein each pixel represents a vertex and weighted edges are generated according to the location and intensity of pairs of pixels. Thus, the present paper aims to investigate this type of application and explore different measurements that can be extracted from complex networks to better characterize an image. One special type of measure we applied is based on motifs, which are employed in several areas. However, to the best of our knowledge, motifs had never been explored in complex networks representing images. The results demonstrate that our proposed methodology presents great potential, reaching up to 89.81% accuracy in the classification of public domain image texture datasets.

Geovana V. L. de Lima, Thullyo R. Castilho, Pedro H. Bugatti, Priscila T. M. Saito, Fabrício M. Lopes
Assessing the Distinctiveness and Representativeness of Visual Vocabularies

Bag of Visual Words is one of the most widely used approaches for representing images for object categorization; however, it has several drawbacks. In this paper, we propose three properties and their corresponding quantitative evaluation measures to assess the ability of a visual word to represent and discriminate an object class. Additionally, we also introduce two methods for ranking and filtering visual vocabularies and a soft weighting method for BoW image representation. Experiments conducted on the Caltech-101 dataset showed the improvement introduced by our proposals, which obtained the best classification results for the highest compression rates when compared with a state-of-the-art mutual information based method for feature selection.

Leonardo Chang, Airel Pérez-Suárez, Máximo Rodríguez-Collada, José Hernández-Palancar, Miguel Arias-Estrada, Luis Enrique Sucar
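As a point of reference for the soft weighting idea mentioned in the abstract, the sketch below distributes each local descriptor's unit weight over nearby visual words with a Gaussian kernel. The kernel, its width, and the random vocabulary are illustrative assumptions, not the authors' exact weighting method.

```python
# Soft-assignment BoW histogram: each descriptor spreads weight 1 over words.
import numpy as np

def soft_bow(descriptors, vocabulary, sigma=1.0):
    # squared distances between every descriptor and every visual word
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    # subtract the row minimum before exponentiating for numerical stability
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)   # each descriptor distributes weight 1
    return w.sum(axis=0)                # soft occurrence histogram

rng = np.random.default_rng(0)
vocab = rng.normal(size=(50, 64))       # 50 visual words, 64-D descriptors
desc = rng.normal(size=(200, 64))
print(soft_bow(desc, vocab).round(2))
```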

Image Coding, Processing and Analysis

Frontmatter
A Scale Invariant Keypoint Detector Based on Visual and Geometrical Cues

One of the first steps in a myriad of visual recognition and computer vision algorithms is the detection of keypoints. Despite the large number of works proposing image keypoint detectors, only a few methodologies are able to efficiently use both visual and geometrical information. In this paper we introduce KVD (Keypoints from Visual and Depth Data), a novel keypoint detector which is scale invariant and combines intensity and geometrical data. We present results from several experiments which show high repeatability scores of our methodology for rotations, translations and scale changes, and which also show robustness in the absence of either visual or geometric information.

Levi O. Vasconcelos, Erickson R. Nascimento, Mario F. M. Campos
Efficient Polynomial Implementation of Several Multithresholding Methods for Gray-Level Image Segmentation

Multithresholding consists of segmenting an image histogram into classes by using thresholds. Many researchers avoid the exponential search space of possible threshold combinations for a given criterion function. In this work, we present a polynomial, easy-to-implement dynamic programming algorithm to find the exact optimum thresholds for three well-known multithresholding criterion functions: the maximum between-class variance of the histogram (Otsu's method), the maximum histogram entropy (Kapur et al.'s method), and the minimum histogram error (Kittler and Illingworth's method). The algorithm, which has also been used for optimum quantization, has $$O((K-1)L^2)$$ time complexity, where K and L stand for the number of desired classes and the number of gray levels in the image, respectively. Experiments showed that the exact optimum thresholds for gray-level image segmentation can be found in less than 160 milliseconds on a Pentium 4 at 2 GHz, regardless of the number of classes.

David Menotti, Laurent Najman, Arnaldo de A. Araújo
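A minimal sketch of such a dynamic program for the Otsu criterion is shown below (minimizing the total within-class weighted variance, which is equivalent to maximizing the between-class variance). The recurrence matches the stated $$O((K-1)L^2)$$ complexity; indexing conventions are my own, and the entropy and minimum-error criteria would only change the per-class cost function.

```python
# Exact multilevel thresholding by dynamic programming (Otsu criterion).
import numpy as np

def optimal_thresholds(hist, K):
    L = len(hist)
    g = np.arange(L, dtype=float)
    P = np.concatenate([[0], np.cumsum(hist)])          # prefix weights
    S = np.concatenate([[0], np.cumsum(hist * g)])      # prefix weighted sums
    Q = np.concatenate([[0], np.cumsum(hist * g * g)])  # prefix weighted squares

    def cost(a, b):  # within-class weighted variance of gray levels a..b
        w = P[b+1] - P[a]
        return 0.0 if w == 0 else (Q[b+1] - Q[a]) - (S[b+1] - S[a])**2 / w

    D = np.full((K + 1, L), np.inf)                     # D[k][b]: best cost, k classes up to b
    arg = np.zeros((K + 1, L), dtype=int)               # start of the last class
    D[1] = [cost(0, b) for b in range(L)]
    for k in range(2, K + 1):
        for b in range(k - 1, L):
            D[k][b], arg[k][b] = min(
                ((D[k-1][a-1] + cost(a, b), a) for a in range(k - 1, b + 1)))
    # backtrack thresholds (last gray level of each class except the final one)
    t, b = [], L - 1
    for k in range(K, 1, -1):
        b = arg[k][b] - 1
        t.append(b)
    return sorted(t)

hist = np.bincount(np.random.default_rng(0).integers(0, 256, 10000), minlength=256)
print(optimal_thresholds(hist.astype(float), K=3))
```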
Optimizing the Data Adaptive Dual Domain Denoising Algorithm

This paper presents two new strategies that greatly improve the execution time of the DA3D algorithm, a new denoising algorithm with state-of-the-art results. First, the weight map used in DA3D is implemented as a quad-tree. This reduces the time needed to search for the minimum weight, greatly reducing the overall computation time. Second, a simple but effective tiling strategy is shown to allow parallel execution of the algorithm, enabling the implementation of DA3D on a parallel architecture. Neither improvement affects the quality of the output.

Nicola Pierazzo, Jean-Michel Morel, Gabriele Facciolo
Sub-Riemannian Fast Marching in SE(2)

We propose a Fast Marching based implementation for computing sub-Riemannian (SR) geodesics in the roto-translation group SE(2), with a metric depending on a cost induced by the image data. The key ingredient is a Riemannian approximation of the SR metric. Then, a state-of-the-art Fast Marching solver that is able to deal with extreme anisotropies is used to compute an SR distance map as the solution of a corresponding eikonal equation. Subsequent backtracking on the distance map gives the geodesics. To validate the method, we consider the uniform cost case, in which exact formulas for SR geodesics are known, and we show remarkable accuracy of the numerically computed SR spheres. We also show a dramatic decrease in computational time with respect to a previous PDE-based iterative approach. Regarding image analysis applications, we show the potential of considering these data-adaptive geodesics for fully automated retinal vessel tree segmentation.

Gonzalo Sanguinetti, Erik Bekkers, Remco Duits, Michiel H. J. Janssen, Alexey Mashtakov, Jean-Marie Mirebeau
Re-ranking of the Merging Order for Hierarchical Image Segmentation

Hierarchical image segmentation provides a set of image segmentations at different detail levels, in which coarser levels can be produced from merges of regions belonging to finer levels. However, similarity measures adopted by hierarchical image segmentation methods do not always consider the homogeneity of the combined components. In this work, we propose a hierarchical graph-based image segmentation using a new similarity measure based on the variability of the merged components, which is responsible for re-ranking the merging order originally established by the minimum spanning tree. Furthermore, we study how the inclusion of this characteristic influences the quality measures. Experiments have shown the superior performance of the proposed method on three well-known image databases, and its robustness to noise was also demonstrated.

Zenilton Kleber G. do Patrocínio Jr., Silvio Jamil F. Guimarães
A Novel Quality Image Fusion Assessment Based on Maximum Codispersion

In this paper, we present a novel objective measure for image fusion based on the codispersion quality index, following the structure of Piella's metric. The measure quantifies the maximum local similarity between two images over many directions using the maximum codispersion quality index. This feature is not commonly assessed by other measures of similarity between images. To visualize the performance of the maximum codispersion quality index we suggest two graphical tools. The proposed fusion measure is compared to state-of-the-art metrics based on image structural similarity. Different experiments performed on several databases show that our metric is consistent with human visual evaluation and can be applied to evaluate different image fusion schemes.

Silvina Pistonesi, Jorge Martinez, Silvia María Ojeda, Ronny Vallejos
Magnetic Resonance Image Selection for Multi-Atlas Segmentation Using Mixture Models

In this paper, magnetic resonance image similarity metrics based on generative model induced spaces are introduced. Particularly, three generative-based similarities are proposed. Metrics are tested in an atlas selection task for multi-atlas-based image segmentation of basal ganglia structure, and compared with the mean square metric, as it is assessed on the high dimensional image domain. Attained results show that our proposal provides a suitable atlas selection and improves the segmentation of the structures of interest.

Mauricio Orbes-Arteaga, David Cárdenas-Peña, Mauricio A. Álvarez, Alvaro A. Orozco, Germán Castellanos-Dominguez
Interleaved Quantization for Near-Lossless Image Coding

Signal level quantization, a fundamental component in digital sampling of continuous signals such as DPCM, or in near-lossless predictive-coding-based compression schemes of digital data such as JPEG-LS, often produces visible banding artifacts in regions where the input signals are very smooth. Traditional techniques for dealing with this issue include dithering, where the encoder contaminates the input signal with a noise function (which may be known to the decoder as well) prior to quantization. We propose an alternative way of avoiding banding artifacts, where quantization is applied in an interleaved fashion, leaving a portion of the samples untouched, following a known pseudo-random Bernoulli sequence. Our method, which is sufficiently general to be applied to other types of media, is demonstrated on a modified version of JPEG-LS, resulting in a significant reduction in visible artifacts in all cases, while producing a graceful degradation in compression ratio.

Ignacio Ramírez
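The core idea is easy to state in a few lines: draw a Bernoulli mask from a seed shared between encoder and decoder and quantize only the selected samples. The sketch below illustrates this on a smooth ramp; the step size, mask probability, and seeding are illustrative assumptions, separate from the paper's JPEG-LS integration.

```python
# Interleaved quantization: only Bernoulli-selected samples are quantized.
import numpy as np

def interleaved_quantize(x, step=8, p=0.75, seed=42):
    mask = np.random.default_rng(seed).random(x.shape) < p  # seed shared with decoder
    q = x.copy()
    q[mask] = np.round(x[mask] / step) * step               # quantize masked samples only
    return q

ramp = np.linspace(0, 64, 1000)          # very smooth signal, prone to banding
print(np.abs(interleaved_quantize(ramp) - ramp).max())
```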
Image Edge Detection Based on a Spatial Autoregressive Bootstrap Approach

In this paper, a new algorithm to perform edge detection based on a bootstrap approach is presented. This approach uses the estimated spatial conditional distribution of the pixels conditioned on their neighbors. The proposed algorithm approximates the original image by fitting local 2D autoregressive models to different blocks of the image. The residuals are used to generate resampled images using bootstrap techniques. Applied to synthetic and real images, the proposed algorithm generates a binary image in which the detected edges can be observed.

Gustavo Ulloa, Héctor Allende-Cid, Héctor Allende
Iterative Gradient-Based Shift Estimation: To Multiscale or Not to Multiscale?

Fast global shift estimation is a critical preprocessing step in many high-level tasks such as remote sensing or medical imaging. In this work we deal with a simple question: should we use an iterative technique to perform shift estimation, or should we use a multiscale approach? Based on the obtained results, both methodologies lose accuracy as the noise increases; however, this accuracy loss grows with the shift magnitude. The conclusion is that a multiscale strategy should be used when the shift magnitude is higher than approximately a fifth of a pixel.

Martin Rais, Jean-Michel Morel, Gabriele Facciolo
A Study on Low-Cost Representations for Image Feature Extraction on Mobile Devices

Due to limited battery life and wireless network bandwidth, compact and fast (but also accurate) representations of image features are important for multimedia applications running on mobile devices. The main purpose of this work is to analyze the behavior of techniques for image feature extraction on mobile devices by considering the triple trade-off problem of effectiveness, efficiency, and compactness. We perform an extensive comparative study of state-of-the-art binary descriptors with bag of visual words. We employ a dense sampling strategy to select points for low-level feature extraction and implement four bag of visual words representations which use hard or soft assignment and the two most commonly used pooling strategies: average and maximum. These mid-level representations are analyzed with and without lossless and lossy compression techniques. The experimental evaluation points to ORB and BRIEF descriptors with soft assignment and maximum pooling as the best representations in terms of effectiveness, efficiency, and compactness.

Ramon F. Pessoa, William R. Schwartz, Jefersson A. dos Santos

Segmentation, Analysis of Shape and Texture

Frontmatter
Multiscale Exemplar Based Texture Synthesis by Locally Gaussian Models

In exemplar-based texture synthesis methods, one of the major difficulties is to correctly synthesize the wide diversity of texture images. So far, the proposed methods tend to have satisfying results for specific texture classes and fail for others. Statistics-based algorithms give good results when synthesizing textures that have few geometric structures and are able to preserve a complex statistical model of the sample texture. On the other hand, non-parametric patch-based methods have the ability to faithfully reproduce highly structured textures but lack a mechanism to preserve their global statistics. Furthermore, they are strongly dependent on a patch size that is decided manually. In this paper we propose a multiscale approach able to combine the advantages of both strategies and avoid some of their drawbacks. The texture is modeled at each scale as a spatially variable Gaussian vector in the patch space, which allows fixing a patch size fairly independently of the texture.

Lara Raad, Agnès Desolneux, Jean-Michel Morel
Top-Down Online Handwritten Mathematical Expression Parsing with Graph Grammar

In recognition of online handwritten mathematical expressions, symbol segmentation, classification and recognition of relations among symbols are managed through a parsing technique. Most parsing techniques follow a bottom-up approach and adapt grammars typically used to parse strings. However, in contrast to top-down approaches, pure bottom-up approaches do not exploit grammar information to avoid parsing invalid subexpressions. Moreover, modeling math expressions by string grammars makes it difficult to extend them to include new structures. We propose a new parsing technique that models mathematical expressions as languages generated by graph grammars and parses expressions following a top-down approach. The method is general in the sense that it can easily be extended to parse other multidimensional languages, such as chemical expressions or diagrams. We evaluate the method using the (publicly available) CROHME-2013 dataset.

Frank Julca-Aguilar, Harold Mouchère, Christian Viard-Gaudin, Nina S. T. Hirata
Shape Analysis Using Multiscale Hough Transform Statistics

With the widespread proliferation of computers, many human activities entail the use of automatic image analysis. The basic features used for image analysis include color, texture, and shape. In this paper, we propose MHTS (Multiscale Hough Transform Statistics), a multiscale version of the shape description method called HTS (Hough Transform Statistics). Like HTS, MHTS uses statistics from the Hough Transform to characterize the shape of objects or regions in digital images. Experiments carried out on the MPEG-7 CE-1 (Part B) shape database show that MHTS is better than the original HTS, and presents superior precision–recall results compared with some well-known shape description methods, such as Tensor Scale, Multiscale Fractal Dimension, Fourier, and Contour Salience. Besides, under the multiscale separability criterion, MHTS is also superior to the Zernike Moments and Beam Angle Statistics (BAS) methods. The linear complexity of the HTS algorithm was preserved in this new multiscale version, making MHTS even more appropriate than the BAS method for shape analysis in high-resolution image retrieval tasks over very large databases.

Lucas Alexandre Ramos, Gustavo Botelho de Souza, Aparecido Nilceu Marana
Inference Strategies for Texture Parameters

The Autobinomial model has been commonplace in Bayesian image analysis since its introduction as a convenient image model. This model depends on a set of parameters whose values characterize texture, allowing classification of the whole image into regions with uniform model properties. This work proposes a new estimator of the parameter vector of the Autobinomial model based on Conditional Least Squares (CLS) minimization via real-coded genetic modeling, and analyzes its performance compared to the classical linear approximation, which exchanges the CLS equation for a reduced Taylor equation prior to minimization. Our simulation study shows that the genetic modeling approach gives more accurate estimates when true data is provided. We also discuss its influence in a set of classification experiments with multispectral optical imagery, estimating the parameter vector with our estimator and the classical linear one. Our experiments show promising results, since our approach is able to distinguish image features that the classical approach does not.

Jorge Martinez, Silvina Pistonesi, Ana Georgina Flesia
Texture Characterization via Automatic Threshold Selection on Image-Generated Complex Network

This work presents an automated approach to texture characterization through complex networks. By applying automatic threshold selection for network degree map generation, we achieve a significant reduction in the number of descriptors used. The method is adaptive to any image database, because it is based on analyzing the energy of the degree histogram of the complex networks generated from each particular database. Experiments on texture classification using databases from the literature show that the proposed method can not only reduce the feature vector size, but in some cases also improve correct classification rates when compared to other state-of-the-art methods.

Thiago P. Ribeiro, Leandro N. Couto, André R. Backes, Celia A. Zorzo Barcelos
An RFM Pattern Recognition System Invariant to Rotation, Scale and Translation

In this paper a rotation, scale and translation (RST) invariant pattern recognition digital system based on 1D signatures is proposed. Rotation invariance is obtained using the Radon transform, scale invariance is achieved by the analytical Fourier-Mellin transform, and translation invariance is realized through the Fourier amplitude spectrum of the image. Once the RST-invariant Radon-Fourier-Mellin (RFM) image is generated (a 2D RST invariant), the marginal frequencies of that image are used to build an RST-invariant 1D signature. The Latin alphabet letters in Arial font style were used to test the system. According to the bootstrap statistical method, the pattern recognition system yields a confidence level of at least 95%.
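
A hedged sketch of this invariance chain with numpy and scikit-image, where a log-polar mapping stands in for the analytical Fourier-Mellin transform (a common discrete surrogate); the names and ordering are illustrative rather than the authors' exact pipeline.

```python
import numpy as np
from skimage.transform import radon, warp_polar

def rfm_signature(image, n_angles=180):
    """Sketch of an RST-invariant 1D signature: translation invariance via
    the Fourier amplitude spectrum, rotation via the Radon transform, and
    scale via a log-polar mapping (turns scaling into a shift) plus FFT."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(image)))    # translation inv.
    sinogram = radon(spectrum, theta=np.linspace(0., 180., n_angles),
                     circle=False)                            # rotation handling
    logpolar = warp_polar(sinogram, scaling='log')            # scale -> shift
    rfm = np.abs(np.fft.fft2(logpolar))                       # shift invariance
    # marginal frequencies of the 2D invariant -> 1D signature
    return np.concatenate([rfm.sum(axis=0), rfm.sum(axis=1)])
```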

Selene Solorza-Calderón, Jonathan Verdugo-Olachea
Texture Analysis by Bag-Of-Visual-Words of Complex Networks

Texture is an important property of images, and it has been widely used for image characterization and classification. In this paper, we propose a novel method for texture analysis based on Complex Network theory. Basically, we show how to build networks from images and then construct a vocabulary of visual words with the Bag-Of-Visual-Words method. To build the vocabulary, the degree and strength of each vertex are extracted from the networks. The feature vector is composed of visual-word occurrences, unlike most traditional Complex Network works, which extract global statistical measures of the vertices. We show through experiments on four databases the effectiveness of our approach, which outperforms traditional texture analysis methods.
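
A compact sketch of this vocabulary step with scikit-learn, assuming per-vertex (degree, strength) features have already been extracted from the image-derived networks:

```python
import numpy as np
from sklearn.cluster import KMeans

def bovw_histogram(vertex_features_train, vertex_features_image, n_words=64):
    """Toy Bag-Of-Visual-Words over per-vertex (degree, strength) features:
    cluster training features into a vocabulary, then describe one image's
    network by its normalized visual-word occurrence histogram."""
    vocab = KMeans(n_clusters=n_words, n_init=10).fit(vertex_features_train)
    words = vocab.predict(vertex_features_image)       # one word per vertex
    hist = np.bincount(words, minlength=n_words).astype(float)
    return hist / hist.sum()
```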

Leonardo F. S. Scabini, Wesley N. Gonçalves, Amaury A. Castro Jr.
Bregman Divergence Applied to Hierarchical Segmentation Problems

Image segmentation is one of the first steps in any process concerning digital image analysis, and its accuracy goes on to determine the quality of that analysis. A classic model used in image segmentation is the Mumford-Shah functional, which includes both the information pertaining to the regions and the length of their borders. In this work, using the concept of loss in Bregman Information, a functional is defined that generalizes the Mumford-Shah functional, which is recovered from the proposed functional when the squared Euclidean distance is used as the similarity measure. The algorithm is constructed using a fusion criterion that minimizes the loss in Bregman Information. It is shown that the proposed hierarchical segmentation method generalizes the algorithm that minimizes the piecewise constant Mumford-Shah functional. The results obtained with the Generalized I-Divergence, Itakura-Saito, and squared Euclidean distances show that the algorithm attains a good performance.
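
For reference, the Bregman divergence generated by a strictly convex, differentiable function $$\phi$$ is

$$D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y),\, x - y \rangle,$$

and the three instances used in the paper follow from standard choices of $$\phi$$: $$\phi(x)=\Vert x\Vert^2$$ yields the squared Euclidean distance (hence the Mumford-Shah case), $$\phi(x)=\sum_i x_i\log x_i$$ yields the Generalized I-Divergence, and $$\phi(x)=-\sum_i \log x_i$$ yields the Itakura-Saito distance.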

Daniela Portes L. Ferreira, André R. Backes, Celia A. Zorzo Barcelos
SALSA – A Simple Automatic Lung Segmentation Algorithm

This work proposes SALSA (A Simple Automatic Lung Segmentation Algorithm), a simple and fast algorithm for the segmentation of Computerized Tomography lung volumes. The algorithm is composed of several simple image processing operations. It was tested on the database provided by LOLA11, a lung segmentation challenge held during MICCAI 2011. The obtained results put SALSA's accuracy rate very close to the accuracy rates of the methods at the top of the LOLA11 ranking. We are currently developing the method further to segment the lung lobes, a more challenging task.

Addson Costa, Bruno M. Carvalho
Segmentation of Urban Impervious Surface Using Cellular Neural Networks

In this paper an automatic segmentation technique for endmember detection of urban impervious surface is proposed, based on the Biophysical Composition Index (BCI) and segmentation with a Cellular Neural Network (CNN). In particular, we focus on deriving the BCI from Landsat-8 Operational Land Imager (OLI) images and then performing the CNN segmentation, with automatic threshold selection for impervious surface estimation as a linear decision given by a linear activation function. After simulations based on the proposed technique, the obtained results, a traditional single-threshold segmentation, and the Otsu algorithm are assessed in terms of accuracy through a stratified sample taken from a Very High Resolution (VHR) WorldView-2 (WV-2) image with the same date as the Landsat-8 OLI data. The accuracy assessment from a stratified random sample showed that the CNN segmentation was the most accurate method, followed by the traditional single-threshold segmentation.

Juan Manuel Núñez

Signals Analysis and Processing

Frontmatter
Adaptive Training for Robust Spoken Language Understanding

Spoken Language Understanding, like other areas of Language Technologies, suffers from a mismatch between the conditions under which the models are trained and the real use of the systems. If the semantic models are estimated from the correct transcriptions of the training corpus, then when the system interacts with real users, some recognition errors cannot be recovered by the understanding system. To achieve an improvement in real environments, we propose using the output sentences from the recognition process over the training corpus in order to adapt the models. To estimate these models, a labeled and segmented corpus is needed. We propose an algorithm for the automatic segmentation and labeling of the recognized sentences, taking the correctly segmented and labeled data as reference. Experiments with a spoken dialog corpus show that this approach outperforms the approach based on correct transcriptions.

Fernando García, Emilio Sanchis, Lluís-F. Hurtado, Encarna Segarra
The Effect of Innovation Assumptions on Asymmetric GARCH Models for Volatility Forecasting

The modelling and forecasting of volatility in time series has received great attention from researchers over the past years, and GARCH models are among the most popular tools for the task. In this work, we investigate the effects of choosing different distribution families for the innovation process of asymmetric GARCH models. In particular, we compare A-PARCH models for the IBM stock data with Normal, Student's t, Generalized Error, skew Student's t and Pearson type-IV distributions. The main findings indicate that skewed distributions perform better than non-skewed ones, and that the Pearson type-IV distribution arises as a strong candidate for the innovation process of asymmetric models.
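
A minimal sketch of this kind of comparison with the `arch` Python package, on synthetic returns standing in for the IBM series; `o=1` adds the asymmetry term and `power=1.5` fixes the power (the full A-PARCH estimates it), while `dist` switches the innovation family. Pearson type-IV is not built into `arch` and would require a custom distribution.

```python
import numpy as np
from arch import arch_model

rng = np.random.default_rng(0)
returns = rng.standard_t(df=5, size=1000)  # stand-in for daily IBM returns

# Asymmetric power GARCH fitted under different innovation distributions.
for dist in ('normal', 't', 'ged', 'skewt'):
    model = arch_model(returns, vol='GARCH', p=1, o=1, q=1,
                       power=1.5, dist=dist)
    result = model.fit(disp='off')
    print(dist, result.loglikelihood, result.aic)
```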

Diego Acuña, Héctor Allende-Cid, Héctor Allende
Blind Spectrum Sensing Based on Cyclostationary Feature Detection

Cognitive Radio has emerged as a promising technology to improve spectrum utilization efficiency, with spectrum sensing as the key functionality enabling its deployment. This study proposes a cyclostationary feature detection method for signals with unknown parameters. We develop an automatic decision rule based on the resulting hypothesis test, without statistical knowledge of the communication channel. Performance analysis and simulation results indicate that the obtained algorithm outperforms reported solutions in the low-SNR regime.
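
The building block of such detectors is the cyclic autocorrelation: a cyclostationary signal shows non-zero values at its characteristic cyclic frequencies, while stationary noise does not. A minimal numpy sketch with a toy BPSK-like signal (illustrative, not the paper's detector or threshold rule):

```python
import numpy as np

def cyclic_autocorrelation(x, alpha, tau):
    """Estimate R_x(alpha, tau) = E[x[n+tau] x*[n] e^{-j 2 pi alpha n}]
    at cyclic frequency `alpha` (cycles/sample) and integer lag `tau`."""
    n = np.arange(len(x) - tau)
    lagged = x[tau:] * np.conj(x[:len(x) - tau])
    return np.mean(lagged * np.exp(-2j * np.pi * alpha * n))

rng = np.random.default_rng(0)
symbols = rng.choice([-1.0, 1.0], size=256).repeat(8)   # 8 samples per symbol
noisy = symbols + rng.standard_normal(symbols.size)     # low-SNR observation

print(abs(cyclic_autocorrelation(noisy, alpha=1 / 8, tau=4)))   # feature peak
print(abs(cyclic_autocorrelation(noisy, alpha=0.137, tau=4)))   # off-cycle
```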

Luis Miguel Gato, Liset Martínez, Jorge Torres
Language Identification Using Spectrogram Texture

This paper proposes a novel front-end for automatic spoken language recognition, based on the spectrogram representation of the speech signal and on the properties of the Fourier spectrum for detecting global periodicity in an image. The Local Phase Quantization (LPQ) texture descriptor is used to capture the spectrogram content. Results obtained for 30-second test signals show that this method is very promising for low-cost language identification. The best performance is achieved when our proposed method is fused with the i-vector representation.

Ana Montalvo, Yandre M. G. Costa, José Ramón Calvo
Combining Several ASR Outputs in a Graph-Based SLU System

In this paper, we present an approach to Spoken Language Understanding (SLU) in which we combine multiple hypotheses from several Automatic Speech Recognizers (ASRs) in order to reduce the impact of recognition errors on the SLU module. This combination is performed using a Grammatical Inference algorithm that provides a generalization of the input sentences by means of a weighted graph of words. We have also developed a specific SLU algorithm that is able to process these graphs of words according to a stochastic semantic modelling. The results show that combining several hypotheses from the ASR module outperforms taking just the 1-best transcription.

Marcos Calvo, Lluís-F. Hurtado, Fernando García, Emilio Sanchis
EEG Signal Pre-Processing for the P300 Speller

One of the workhorses of Brain Computer Interfaces (BCI) is the P300 speller, which allows a person to spell text by looking at the corresponding letters that are laid out on a flashing grid. The device functions by detecting the Event Related Potentials (ERP), measurable in an electroencephalogram (EEG), that occur when the letter the subject is looking at flashes (unexpectedly). In this work, after a careful analysis of the EEG signals involved, we propose a preprocessing method that improves on the state-of-the-art results for this kind of application. Our results are comparable to, and sometimes better than, the best results published, and do not require a feature (channel) selection step, which is extremely costly and must be adapted to each user of the P300 speller separately.

Martín Patrone, Federico Lecumberry, Álvaro Martín, Ignacio Ramirez, Gadiel Seroussi
Audio-Visual Speech Recognition Scheme Based on Wavelets and Random Forests Classification

This paper describes an audio-visual speech recognition system based on wavelets and Random Forests. Wavelet multiresolution analysis is used to represent the sequence of both acoustic and visual input parameters in a compact form. Recognition is then performed by Random Forests classification using the wavelet-based features as inputs. The efficiency of the proposed speech recognition scheme is evaluated over two audio-visual databases, considering acoustically noisy conditions. Experimental results show that a good performance is achieved with the proposed system, outperforming traditional Hidden Markov Model-based approaches. The proposed system has only one tuning parameter; however, experimental results also show that this parameter can be selected within a small range without significantly changing the recognition results.

Lucas Daniel Terissi, Gonzalo D. Sad, Juan Carlos Gómez, Marianela Parodi
Fall Detection Algorithm Based on Thresholds and Residual Events

Falling is a risk factor of vital importance for elderly adults; hence, the ability to detect falls automatically is necessary to minimize the risk of injury. In this work, we develop a fall detection algorithm based on inertial sensors, chosen for their range of activity, portability, and low cost. The algorithm detects falls through thresholds and the residual events that follow them: it filters the acceleration data with three filtering methodologies and distinguishes falls from Activities of Daily Living (ADLs) by the amount of acceleration difference. The algorithm is tested on a human activity and fall dataset, showing improved performance compared with algorithms detailed in the literature.
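
A toy sketch of the threshold-plus-residual idea, with hypothetical threshold values and one simple filter standing in for the paper's three filtering methodologies:

```python
import numpy as np
from scipy.signal import medfilt

def detect_falls(acc_xyz, fs, impact_g=2.5, rest_g=0.3, rest_window=1.0):
    """Toy threshold-based fall detector on triaxial accelerometer data in g
    (illustrative thresholds, not the paper's tuned values): flag an impact
    when the filtered acceleration magnitude exceeds `impact_g`, then confirm
    it with a post-impact 'residual' period of near-rest readings."""
    mag = np.linalg.norm(acc_xyz, axis=1)          # acceleration magnitude
    mag = medfilt(mag, kernel_size=5)              # simple noise filtering
    win = int(rest_window * fs)
    falls = []
    for i in np.flatnonzero(mag > impact_g):
        after = mag[i + win: i + 2 * win] - 1.0    # remove gravity baseline
        if after.size and np.all(np.abs(after) < rest_g):
            falls.append(i / fs)                   # fall time in seconds
    return falls
```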

Fily M. Grisales-Franco, Francisco Vargas, Álvaro Ángel Orozco, Mauricio A. Alvarez, German Castellanos-Dominguez
Digital Filter Design with Time Constraints Based on Calculus of Variations

Digital filter design with a short transient state is a problem encountered in many fields of signal processing. In this paper a novel low-pass filter design technique with time-varying parameters is introduced in order to minimize the rise time. Using the calculus of variations, a methodology is developed to write down the optimal closed-form expression for varying the parameters. Two cases are addressed: the ideal case, in which infinite bandwidth is required, and a finite-bandwidth solution, the latter obtained by means of a proper restriction in the frequency domain. The proposed filter achieves the shortest rise time and allows better preservation of edge shape in comparison with other reported filtering methods. The performance of the proposed system is illustrated with the aid of simulations.

Karel Toledo, Jorge Torres, Liset Martínez

Theory of Pattern Recognition

Frontmatter
Discriminative Training for Convolved Multiple-Output Gaussian Processes

Multi-output Gaussian processes (MOGP) are probability distributions over vector-valued functions, and have previously been used for multi-output regression and for multi-class classification. A less explored facet of the multi-output Gaussian process is that it can be used as a generative model for vector-valued random fields in the context of pattern recognition. As a generative model, the multi-output GP is able to handle vector-valued functions with continuous inputs, as opposed, for example, to hidden Markov models. It also offers the ability to model multivariate random functions with high-dimensional inputs. In this paper, we use a discriminative training criterion known as Minimum Classification Error to fit the parameters of a multi-output Gaussian process. We compare the performance of generative and discriminative training of MOGP in subject recognition, activity recognition, and face recognition. We also compare the proposed methodology against hidden Markov models trained both generatively and discriminatively.

Sebastián Gómez-González, Mauricio A. Álvarez, Hernan F. García, Jorge I. Ríos, Alvaro A. Orozco
Improving the Accuracy of CAR-based Classifiers by Combining Netconf Measure and Dynamic$$-K$$ Mechanism

In this paper, we propose combining the Netconf quality measure and the Dynamic$$-K$$ satisfaction mechanism in classifiers based on Class Association Rules (CARs). In our study, we evaluate the use of several quality measures to compute the CARs, as well as the main satisfaction mechanisms ("Best Rule", "Best K Rules" and "All Rules") commonly used in the literature. Our experiments over several datasets show that our proposal achieves better accuracy than those reported in state-of-the-art works.
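
For reference, Netconf is commonly defined in the association rule literature as

$$Netconf(X \Rightarrow Y) = \frac{sup(X \cup Y) - sup(X)\,sup(Y)}{sup(X)\,(1 - sup(X))},$$

which takes values in $$[-1, 1]$$, with positive values indicating a positive dependence of $$Y$$ on $$X$$.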

Raudel Hernández-León
Local Entropies for Kernel Selection and Outlier Detection in Functional Data

An important question in data analysis is how to choose the kernel function (or its parameters) to solve classification or regression problems. The choice of a suitable kernel is usually carried out by cross validation. In this paper we introduce a novel method that consists in choosing the kernel according to an optimal entropy criterion. After selecting the best kernel function, we proceed by using a measure of local entropy to identify the functional outliers in the sample.

Gabriel Martos, Alberto Muñoz
Improving Optimum-Path Forest Classification Using Confidence Measures

Machine learning techniques have been actively pursued in recent years, mainly due to the great number of applications that make use of some sort of intelligent mechanism for decision-making processes. In this work, we present an improved version of the Optimum-Path Forest classifier, which learns a score-based confidence level for each training sample in order to make the classification process "smarter", i.e., more reliable. Experimental results over 20 benchmarking datasets have shown the effectiveness and efficiency of the proposed approach for classification problems, which can obtain more accurate results even on smaller training sets.

Silas E. N. Fernandes, Walter Scheirer, David D. Cox, João Paulo Papa
Multiple Kernel Learning for Spectral Dimensionality Reduction

This work introduces a multiple kernel learning (MKL) approach for selecting and combining different spectral methods of dimensionality reduction (DR). From a predefined set of kernels representing conventional spectral DR methods, a generalized kernel is calculated as a linear combination of kernel matrices. The coefficients are estimated via a variable ranking aimed at quantifying how much each variable contributes to optimizing a variance preservation criterion. All considered kernels are tested within a kernel PCA framework. The experiments are carried out over well-known real and artificial data sets. The performance of the compared DR approaches is quantified by a scaled version of the average agreement rate between K-ary neighborhoods. The proposed MKL approach exploits the representation ability of each single method to reach a better embedding, both for more intelligible visualization and for preserving the structure of the data.
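
A minimal sketch of the combination step with scikit-learn, assuming the kernel set and the mixing coefficients are given; here generic kernels and hand-set weights stand in for the paper's DR-derived kernels and ranking-based coefficients:

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel, linear_kernel
from sklearn.decomposition import KernelPCA

X, _ = make_swiss_roll(n_samples=500, random_state=0)

# Kernels standing in for spectral DR methods; coefficients are assumed
# here, whereas the paper estimates them by variable ranking.
kernels = [rbf_kernel(X), polynomial_kernel(X, degree=2), linear_kernel(X)]
alphas = np.array([0.6, 0.3, 0.1])
K = sum(a * Km / np.linalg.norm(Km) for a, Km in zip(alphas, kernels))

# Generalized kernel fed to kernel PCA for the final embedding.
embedding = KernelPCA(n_components=2, kernel='precomputed').fit_transform(K)
print(embedding.shape)   # (500, 2)
```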

Diego Hernán Peluffo-Ordóñez, Andrés Eduardo Castro-Ospina, Juan Carlos Alvarado-Pérez, Edgardo Javier Revelo-Fuelagán
Indian Buffet Process for Model Selection in Latent Force Models

Latent force models (LFM) are a hybrid approach combining multi-output Gaussian processes and differential equations, where the covariance functions encode the physical models given by the differential equations. LFM require the specification of the number of latent functions used to build the covariance function for the outputs. Furthermore, they assume that the output data is explained by the entire set of latent functions, which is not the case in many real applications. We propose in this paper the use of an Indian Buffet Process (IBP) as a way to perform model selection over the number of latent Gaussian processes in LFM applications. Furthermore, the IBP allows us to infer the interconnection between latent functions and outputs. We use variational inference to approximate the posterior distributions, and show examples of the proposed model's performance over artificial data and a motion capture dataset.

Cristian Guarnizo, Mauricio A. Álvarez, Alvaro A. Orozco
SPaR-FTR: An Efficient Algorithm for Mining Sequential Patterns-Based Rules

In this paper, we propose a novel algorithm for mining Sequential Patterns-based Rules, called SPaR-FTR. This algorithm introduces a new efficient strategy to generate the set of sequential rules based on the interesting rules of size three. The experimental results show that the SPaR-FTR algorithm performs better than the main algorithms reported for discovering frequent sequences, all of them adapted to mine this kind of sequential rules.

José Kadir Febrer-Hernández, Raudel Hernández-León, José Hernández-Palancar, Claudia Feregrino-Uribe
Online Kernel Matrix Factorization

Matrix factorization (MF) has proven to be a competitive machine learning strategy for many problems, such as dimensionality reduction, latent topic modeling, clustering, dictionary learning and manifold learning, among others. In general, MF is a linear modeling method, so different strategies, most of them based on kernel methods, have been proposed to extend it to non-linear modeling. However, as with many other kernel methods, memory requirements and computing time limit the application of kernel-based MF methods to large-scale problems. In this paper, we present a new kernel MF (KMF) method. It uses a budget, a set of representative points of size $$p\ll n$$, where n is the size of the training data set, to tackle the memory problem, and uses stochastic gradient descent to tackle the computing time and memory problems. The experimental results show a performance comparable, in particular tasks, to other kernel matrix factorization and clustering methods, and a competitive computing time in large-scale problems.
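
A simplified numpy sketch of the idea, under assumptions spelled out in the comments (the exact objective, updates and learning-rate schedule of the paper may differ): each point is encoded in feature space as $$\phi(x)\approx\Phi_B W h$$ over a fixed budget B, with W and h obtained by stochastic gradient steps on the kernelized reconstruction error.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def okmf(X, budget_size=50, r=10, lr=0.01, epochs=5, lam=0.1, seed=0):
    """Budgeted kernel MF sketch: phi(x_t) ~ Phi_B @ W @ h_t. Per sample,
    h_t has a closed-form ridge solution and W takes one SGD step on the
    feature-space reconstruction error (all expressed through kernels)."""
    rng = np.random.default_rng(seed)
    B = X[rng.choice(len(X), budget_size, replace=False)]   # budget points
    Kbb = rbf_kernel(B, B)
    W = 0.1 * rng.standard_normal((budget_size, r))
    H = np.zeros((len(X), r))
    for _ in range(epochs):
        for t in rng.permutation(len(X)):
            kx = rbf_kernel(B, X[t:t + 1]).ravel()           # k(B, x_t)
            h = np.linalg.solve(W.T @ Kbb @ W + lam * np.eye(r), W.T @ kx)
            W -= lr * (Kbb @ W @ np.outer(h, h) - np.outer(kx, h))
            H[t] = h
    return B, W, H
```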

Andrés Esteban Páez-Torres, Fabio A. González
From Local to Global Communities in Large Networks Through Consensus

Given a universe of local communities of a large network, we aim at identifying the meaningful and consistent communities in it. We address this from a new perspective as the process of obtaining consensual community detections and formalize it as a bi-clustering problem. We obtain the global community structure of the given network without running expensive global community detection algorithms. The proposed mathematical characterization of the consensus problem and a new bi-clustering algorithm to solve it render the problem tractable for large networks. The approach is successfully validated in experiments with synthetic and large real-world networks, outperforming other state-of-the-art alternatives in terms of speed and results quality.

Mariano Tepper, Guillermo Sapiro
A Feature Selection Approach for Evaluating the Inference of GRNs Through Biological Data Integration - A Case Study on A. Thaliana

The inference of gene regulatory networks (GRNs) from expression profiles is a great challenge in bioinformatics due to the curse of dimensionality. For this reason, several methods that perform data integration have been developed to reduce the estimation error of the inference. However, it is not completely established how to use each type of available biological information. This work addresses this issue by proposing a feature selection approach to integrate biological data and by evaluating three types of biological information regarding their effect on the similarity of inferred GRNs. The proposed feature selection method is based on the sequential forward floating selection (SFFS) search algorithm with the mean conditional entropy (MCE) as criterion function. As an additional contribution of this work, an expression dataset was built containing 22746 genes and 1206 experiments regarding A. thaliana. The experimental results achieve a 39% average improvement in the inferred GRNs when compared to not using biological data integration. Moreover, the results showed that the improvement is associated with a specific type of biological information, cellular localization, which is valuable information for the development of new experiments and an important insight for investigation.

Fábio F. R. Vicente, Euler Menezes, Gabriel Rubino, Juliana de Oliveira, Fabrício Martins Lopes
Semi-supervised Dimensionality Reduction via Multimodal Matrix Factorization

This paper presents a matrix factorization method for dimensionality reduction: semi-supervised two-way multimodal online matrix factorization (STWOMF). The method performs a semantic embedding by finding a linear mapping to a low-dimensional semantic space modeled by the original high-dimensional feature representation and the label space. An important characteristic of the proposed algorithm is that the new representation can be learned in a semi-supervised fashion: annotated instances are used to maximize the discrimination between classes, while non-annotated instances can be exploited to estimate the intrinsic manifold structure of the data. Another important advantage of this algorithm is its online formulation, which allows dealing with large-scale collections while keeping computational requirements low. According to the experimental evaluation, the proposed STWOMF presents a competitive classification performance at a lower computational cost, compared with several linear supervised, unsupervised and semi-supervised dimensionality reduction methods.

Viviana Beltrán, Jorge A. Vanegas, Fabio A. González
Fine-Tuning Convolutional Neural Networks Using Harmony Search

Deep learning-based approaches have been paramount in recent years, mainly due to their outstanding results in several application domains, ranging from face and object recognition to handwritten digit identification. Convolutional Neural Networks (CNN) have attracted considerable attention, since they model the intrinsic and complex brain working mechanism. However, the huge number of parameters to be set up may make such approaches more prone to configuration errors when the parameters are tuned manually. Since only a few works have addressed this shortcoming by means of meta-heuristic-based optimization, in this paper we introduce the Harmony Search algorithm and some of its variants for CNN optimization, the proposed approach being validated in the context of fingerprint and handwritten digit recognition, as well as image classification.
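
A generic sketch of the Harmony Search loop applied to hyperparameter tuning; the meta-heuristic itself is standard, while the wrapping of CNN training into an `objective` callable, the parameter choices and the constants are assumptions, not the paper's setup:

```python
import random

def harmony_search(objective, bounds, hms=10, hmcr=0.9, par=0.3, iters=200):
    """Generic Harmony Search over continuous parameters: `objective` maps
    a parameter vector, e.g. (log10 learning rate, dropout), to a
    validation error to be minimized."""
    dim = len(bounds)
    memory = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [objective(h) for h in memory]
    for _ in range(iters):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if random.random() < hmcr:                 # memory consideration
                v = random.choice(memory)[d]
                if random.random() < par:              # pitch adjustment
                    v += random.uniform(-1, 1) * 0.01 * (hi - lo)
            else:                                      # random re-sampling
                v = random.uniform(lo, hi)
            new.append(min(max(v, lo), hi))
        s = objective(new)
        worst = max(range(hms), key=scores.__getitem__)
        if s < scores[worst]:                          # replace worst harmony
            memory[worst], scores[worst] = new, s
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]

# Hypothetical usage: tune (log10_lr, dropout) of a CNN whose training and
# validation are wrapped, elsewhere, in a `validation_error` function:
# params, err = harmony_search(validation_error, [(-4, -1), (0.0, 0.7)])
```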

Gustavo Rosa, João Papa, Aparecido Marana, Walter Scheirer, David Cox
Genetic Sampling k-means for Clustering Large Data Sets

In this paper we present a sampling approach to run the k-means algorithm on large data sets. We propose a new genetic algorithm to guide sample selection, yielding better results than selecting the individuals at random while maintaining a reasonable computing time. We apply our proposal to a public mapping-points data set from the 9th DIMACS Implementation Challenge.

Diego Luchi, Willian Santos, Alexandre Rodrigues, Flávio Miguel Varejão
Analysing the Safe, Average and Border Samples on Two-Class Imbalance Problems in the Back-Propagation Domain

In this work, we analyze the training samples to discover which kinds of samples are more appropriate for training the back-propagation algorithm. To do this, we propose a Gaussian function to identify three types of samples: border, safe and average samples. Experiments on sixteen two-class imbalanced data sets were carried out, and a non-parametric statistical test was applied. In addition, we employ SMOTE as a classification performance reference, i.e., to know whether the studied methods are competitive with respect to SMOTE's performance. Experimental results show that the best samples for training the back-propagation algorithm are the average samples, and the worst are the safe samples.
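
An illustrative sketch of one way such a Gaussian-based typing could look; the neighbourhood statistic, the Gaussian width and the cut-offs are assumptions for illustration, not the paper's exact function:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def sample_types(X, y, k=5, sigma=0.25):
    """Label training samples as safe / average / border: weight each sample
    by a Gaussian of the fraction of opposite-class points among its k
    nearest neighbours (0 -> deep inside its class, 1 -> on the border)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    idx = nn.kneighbors(X, return_distance=False)[:, 1:]   # drop self-match
    enemy_ratio = (y[idx] != y[:, None]).mean(axis=1)
    g = np.exp(-(enemy_ratio ** 2) / (2 * sigma ** 2))     # Gaussian weight
    types = np.where(g > 0.9, 'safe', np.where(g > 0.4, 'average', 'border'))
    return types, g
```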

Roberto Alejo, Juan Monroy-de-Jesús, J. Horacio Pacheco-Sánchez, Rosa María Valdovinos, Juan A. Antonio-Velázquez, J. Raymundo Marcial-Romero
Improving the Accuracy of the Sequential Patterns-Based Classifiers

In this paper, we propose some improvements to Sequential Patterns-based Classifiers. First, we introduce a new pruning strategy, using Netconf as the measure of interest, that allows pruning the rule search space to build specific rules with high Netconf. Additionally, a new way of ordering the set of rules, based on their sizes and Netconf values, is proposed. The ordering strategy, together with the "Best K Rules" satisfaction mechanism, allows obtaining better accuracy than the SVM, J48, NaiveBayes and PART classifiers over three document collections.

José K. Febrer-Hernández, Raudel Hernández-León, José Hernández-Palancar, Claudia Feregrino-Uribe
A Mixed Learning Strategy for Finding Typical Testors in Large Datasets

This paper presents a mixed, global and local, learning strategy for finding typical testors in large datasets. The goal of the proposed strategy is to allow any search algorithm to achieve the most significant reduction possible in the search space of a typical testor-finding problem. The strategy is based on a trivial classifier which partitions the search space into four distinct classes and allows the assessment of each feature subset within it. Each class is handled by slightly different learning actions and induces a different reduction in the search space of a problem. Any typical testor-finding algorithm, whether deterministic or metaheuristic, can be adapted to incorporate the proposed strategy and can take advantage of the learned information in diverse ways.

Víctor Iván González-Guevara, Salvador Godoy-Calderon, Eduardo Alba-Cabrera, Julio Ibarra-Fiallo
A Bag Oversampling Approach for Class Imbalance in Multiple Instance Learning

Multiple Instance Learning (MIL) is a relatively new learning paradigm which allows a classifier to be trained with weakly labelled data. Although the community has been developing different methods to learn from this kind of data, there is little discussion on how to proceed when there is an imbalanced representation of the classes. The class imbalance problem in MIL is more complex than its counterpart in single-instance learning because it may occur at the instance level, the bag level, or both. Here, we propose an oversampling approach at the bag level in order to improve the representation of the minority class. Experiments on nine benchmark data sets are conducted to evaluate the proposed approach.

Carlos Mera, Jose Arrieta, Mauricio Orozco-Alzate, John Branch

Video Analysis, Segmentation and Tracking

Frontmatter
Supervised Video Genre Classification Using Optimum-Path Forest

Multimedia-content classification has been paramount in the last years, mainly because of the massive amount of data accessed daily. Video-based retrieval and recommendation systems have attracted considerable attention, since they are a profitable feature for several online and offline markets. In this work, we deal with the problem of automatic video classification into different genres based on visual information, by means of the Optimum-Path Forest (OPF), a recently developed graph-based pattern recognition technique. The classifier is compared against some state-of-the-art supervised machine learning techniques, such as Support Vector Machines and the Bayesian classifier, and its efficiency and effectiveness are evaluated on a number of datasets and problems.

Guilherme B. Martins, Jurandy Almeida, Joao Paulo Papa
Annotating and Retrieving Videos of Human Actions Using Matrix Factorization

This paper presents a method for annotating and retrieving videos of human actions based on two-way matrix factorization. The method models the problem as that of finding a common latent space representation for multimodal objects. In this particular case, the modalities correspond to the visual and textual (annotation) information associated with videos, which the method projects to the latent space. Assuming this space exists, it is possible to map between input spaces, i.e. visual to textual, by projecting across the latent space. The mapping between the spaces is explicitly optimized in the cost function and learned from training data that includes both modalities. The algorithm may be used for annotation, by projecting only visual information and obtaining a textual representation, or for retrieval, by indexing on the latent or textual spaces. Experimental evaluation shows competitive results when compared to state-of-the-art annotation and retrieval methods.

Fabián Páez, Fabio A. González
Recognition of Non-pedestrian Human Forms Through Locally Weighted Descriptors

Recognizing human forms in non-pedestrian poses is a highly complex problem, mainly due to the large number of degrees of freedom of the human body and its limbs. In this paper we propose a methodology to build and classify descriptors of non-pedestrian human body forms in images, combining local and global information. Local information is obtained by computing Local Binary Patterns (LBP) of key body parts (head-shoulders, hands, feet, crotch-hips) detected in the image in a first stage of the method; this data is then coupled in the descriptor with global information consisting of the Euclidean distances between the recognized key body parts. The descriptor is then classified using a Support Vector Machine. The results obtained with the proposed recognition methodology show that it is robust to partial occlusion of bodies; furthermore, the sensitivity, accuracy and specificity of the classifier are high compared with those obtained using other state-of-the-art descriptors.
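
A hedged sketch of such a descriptor with scikit-image and scikit-learn, assuming the key-body-part patches and their centers come from the detection stage; the histogram settings and descriptor layout are illustrative, not the paper's exact design:

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def part_descriptor(gray_patch, P=8, R=1):
    """Uniform-LBP histogram of one detected key-body-part patch."""
    lbp = local_binary_pattern(gray_patch, P, R, method='uniform')
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def body_descriptor(part_patches, part_centers):
    """Concatenate per-part LBP histograms (local information) with the
    pairwise Euclidean distances between part centers (global information)."""
    local = np.concatenate([part_descriptor(p) for p in part_patches])
    c = np.asarray(part_centers, dtype=float)
    dists = [np.linalg.norm(c[i] - c[j])
             for i in range(len(c)) for j in range(i + 1, len(c))]
    return np.concatenate([local, dists])

# Descriptors from a labeled training set would then feed an SVM:
# clf = SVC(kernel='rbf').fit(descriptors, labels)
```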

Nancy Arana-Daniel, Isabel Cibrian-Decena
Automatic Video Summarization Using the Optimum-Path Forest Unsupervised Classifier

In this paper a novel method for video summarization is presented, which uses a color-based feature extraction technique and a graph-based clustering technique. One major advantage of this method is that it is parameter-free: we need to define neither the number of shots nor a dissimilarity threshold between consecutive frames. The results show that the method is both effective and efficient in processing videos containing several thousands of frames, obtaining very meaningful summaries quickly.

César Castelo-Fernández, Guillermo Calderón-Ruiz
A Shuffled Complex Evolution Algorithm for the Multidimensional Knapsack Problem

This work addresses the application of a population-based evolutionary algorithm called shuffled complex evolution (SCE) to the multidimensional knapsack problem. SCE mimics a natural evolution happening simultaneously in independent communities. The performance of the SCE algorithm is verified through computational experiments using well-known problems from the literature as well as randomly generated problems. SCE proved to be very effective in finding good solutions while demanding very little processing time.

Marcos Daniel Valadão Baroni, Flávio Miguel Varejão
Pedestrian Detection Using Multi-Objective Optimization

Pedestrian detection in urban video sequences challenges classification systems because of the presence of cluttered backgrounds, which degrade their performance. This article proposes a Multi-Objective Optimization (MOO) technique to reduce this limitation. It trains a pool of cascades of boosted classifiers using different positive datasets. A Pareto front is obtained from the locally non-dominated operating points of the Receiver Operating Characteristic (ROC) curves of those classifiers. Using information about the dynamics of the scene, different pairs of operating points from the Pareto front are employed to improve the performance of the system. Results on real sequences outperform traditional detection systems.
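
A small sketch of the Pareto-front extraction step over classifier operating points, each given as a hypothetical (false positive rate, true positive rate) pair:

```python
def pareto_front(points):
    """Non-dominated operating points from a list of (fpr, tpr) ROC points:
    keep those for which no other point has both lower-or-equal FPR and
    higher-or-equal TPR."""
    front = []
    for fpr, tpr in points:
        if not any(f <= fpr and t >= tpr and (f, t) != (fpr, tpr)
                   for f, t in points):
            front.append((fpr, tpr))
    return sorted(front)

ops = [(0.1, 0.6), (0.2, 0.9), (0.15, 0.7), (0.3, 0.85), (0.25, 0.95)]
print(pareto_front(ops))
# [(0.1, 0.6), (0.15, 0.7), (0.2, 0.9), (0.25, 0.95)]
```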

Pablo Negri
Backmatter
Metadata
Title
Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications
edited by
Alvaro Pardo
Josef Kittler
Copyright Year
2015
Electronic ISBN
978-3-319-25751-8
Print ISBN
978-3-319-25750-1
DOI
https://doi.org/10.1007/978-3-319-25751-8
