2021 | Book

Structural, Syntactic, and Statistical Pattern Recognition

Joint IAPR International Workshops, S+SSPR 2020, Padua, Italy, January 21–22, 2021, Proceedings

Editors: Dr. Andrea Torsello, Luca Rossi, Prof. Marcello Pelillo, Dr. Battista Biggio, Antonio Robles-Kelly

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science

About this book

This book constitutes the proceedings of the Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition, S+SSPR 2020, held in Padua, Italy, in January 2021.

The 35 papers presented in this volume were carefully reviewed and selected from 81 submissions.

The accepted papers cover the major topics of current interest in pattern recognition, including classification and clustering, deep learning, structural matching and graph-theoretic methods, and multimedia analysis and understanding.

Table of Contents

Frontmatter

Classification and Data Processing

Frontmatter
Target Robust Discriminant Analysis

In practice, the data distribution at test time often differs, to a smaller or larger extent, from that of the original training data. Consequently, the so-called source classifier, trained on the available labelled data, deteriorates on the test, or target, data. Domain adaptive classifiers aim to combat this problem, but typically assume some particular form of domain shift. Most are not robust to violations of domain shift assumptions and may even perform worse than their non-adaptive counterparts. We construct robust parameter estimators for discriminant analysis that guarantee performance improvements of the adaptive classifier over the non-adaptive source classifier.

Wouter M. Kouw, Marco Loog
Complex-Valued Embeddings of Generic Proximity Data

Proximities are at the heart of almost all machine learning methods. In a more generic view, objects are compared by a (symmetric) similarity or dissimilarity measure, which may not obey particular mathematical properties. This renders many machine learning methods invalid, leading to convergence problems and the loss of generalization behavior. In many cases, the preferred dissimilarity measure is not metric. If the input data are non-vectorial, like text sequences, proximity-based learning is used or embedding techniques can be applied. Standard embeddings lead to the desired fixed-length vector encoding, but are costly and are limited in preserving the full information. As an information preserving alternative, we propose a complex-valued vector embedding of proximity data, to be used in respective learning approaches. In particular, we address supervised learning and use extensions of prototype-based learning. The proposed approach is evaluated on a variety of standard benchmarks showing good performance compared to traditional techniques in processing non-metric or non-psd proximity data.

Maximilian Münch, Michiel Straat, Michael Biehl, Frank-Michael Schleif
Metric Learning for Multi-label Classification

This paper proposes an approach for multi-label classification based on metric learning. The approach has been designed to deal with general classification problems, without any assumption on the specific kind of data used (images, text, etc.) or semantic meaning assigned to labels (tags, categories, etc.). It is based on a clustering and metric learning algorithm aimed at constructing a space capable of facilitating and improving the task of classifiers. The experimental results obtained on public benchmarks of different nature confirm the effectiveness of the proposal.

Marco Brighi, Annalisa Franco, Dario Maio
An Alternative Exploitation of Isolation Forests for Outlier Detection

Isolation Forests are one of the most successful outlier detection techniques: they isolate outliers by performing random splits in each node. It has been recently shown that a trained Random Forest-based model can also be used to define and extract informative distance measures between objects. Although their success has been shown mainly in the clustering field, we propose to extract these pairwise distances between the objects from an Isolation Forest and use them as input to a distance or density-based outlier detector. We show that the extracted distances from Isolation Forests are able to describe outliers meaningfully. We evaluate our technique on ten benchmark datasets for outlier detection: we employ three different distance measures and evaluate the obtained representation using a density-based classifier, the Local Outlier Factor. We also compare the methodology to the standard Isolation Forests scheme.

Antonella Mensi, Alessio Franzoni, David M. J. Tax, Manuele Bicego
Exponential Weighted Moving Average of Time Series in Arbitrary Spaces with Application to Strings

The exponentially weighted moving average (EWMA) is an important tool in time series analysis. So far the research on EWMA is typically limited to the real (vector) space $$\mathbb{R}^n$$. In this work we present an extension of this concept to arbitrary spaces. It is based on an interpretation of EWMA as a special case of weighted mean computation. We develop three computation methods. In addition to the direct computation in the original space, we particularly study an approach to embedding the data items of a time series into vector space. The feasibility of our EWMA computation framework is exemplarily demonstrated on strings.

Alexander Welsing, Andreas Nienkötter, Xiaoyi Jiang
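The EWMA abstract above rests on reading the moving average as a weighted mean. The following is a minimal sketch of that equivalence for plain real numbers only; it is not the authors' method, and the function names `ewma_recursive`, `ewma_weights` and `ewma_weighted_mean` are hypothetical. It assumes the common convention that the average is initialised with the first observation.

```python
def ewma_recursive(xs, alpha):
    """Classic recursion: s_0 = x_0, s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    s = xs[0]
    out = [s]
    for x in xs[1:]:
        s = alpha * x + (1 - alpha) * s
        out.append(s)
    return out


def ewma_weights(t, alpha):
    """Weights placed on x_0..x_t when the recursion is unrolled; they sum to 1."""
    weights = [(1 - alpha) ** t]  # weight of the initial value x_0
    weights += [alpha * (1 - alpha) ** (t - i) for i in range(1, t + 1)]
    return weights


def ewma_weighted_mean(xs, alpha):
    """Same value as the last entry of ewma_recursive, written as a weighted mean."""
    weights = ewma_weights(len(xs) - 1, alpha)
    return sum(w * x for w, x in zip(weights, xs))


series = [1.0, 2.0, 4.0, 3.0]
print(ewma_recursive(series, 0.5)[-1], ewma_weighted_mean(series, 0.5))
```

Because the weights are non-negative and sum to one, the weighted sum can in principle be replaced by a weighted mean defined through a distance function, which is how the concept could carry over to non-vectorial data such as strings.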
Experimental Analysis of Bidirectional Pairwise Ordinal Classifier Cascades

Ordinal classifier cascades (OCCs) are basic machine learning tools in the field of ordinal classification (OC) that consist of a sequence of classification models (CMs). Each of the CMs is trained in combination with a specific subtask of the initial OC task. OCC architectures make use of a data set’s ordinal class structure by simply arranging the CMs with respect to the corresponding class order (e.g., small - medium - large). Recently, we proposed bidirectional OCC (bOCC) architectures that combine two basic one-directional OCCs, based on a person-independent pain intensity recognition scenario, in combination with support vector machines. In the current study, we further analyse the effectiveness of bOCC architectures. To this end, we evaluate our proposed approach based on different OC benchmark data sets. Additionally, we analyse the proposed bOCCs in combination with two different classification models. Our outcomes indicate that it seems to be beneficial to replace basic pairwise one-directional OCCs by the pairwise bOCC architecture, in general.

Peter Bellmann, Ludwig Lausser, Hans A. Kestler, Friedhelm Schwenker

Deep Learning

Frontmatter
On Calibration of Mixup Training for Deep Neural Networks

Deep Neural Networks (DNN) represent the state of the art in many tasks. However, due to their overparameterization, their generalization capabilities are in doubt and still a field under study. Consequently, DNN can overfit and assign overconfident predictions – effects that have been shown to affect the calibration of the confidences assigned to unseen data. Data Augmentation (DA) strategies have been proposed to regularize these models, with Mixup being one of the most popular due to its ability to improve the accuracy, the uncertainty quantification and the calibration of DNN. In this work, however, we argue and provide empirical evidence that, due to its fundamentals, Mixup does not necessarily improve calibration. Based on our observations we propose a new loss function that improves the calibration, and also sometimes the accuracy, of DNN trained with this DA technique. Our loss is inspired by Bayes decision theory and introduces a new training framework for designing losses for probabilistic modelling. We provide state-of-the-art accuracy with consistent improvements in calibration performance. (Appendix and code are provided here: GitHub link)

Juan Maroñas, Daniel Ramos, Roberto Paredes
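For readers unfamiliar with the Mixup augmentation discussed in the abstract above, here is a minimal sketch of the standard formulation: a convex combination of two inputs and of their one-hot labels, with the mixing coefficient drawn from a Beta(alpha, alpha) distribution. It does not reproduce the loss proposed in the paper. The function name `mixup_pair` and the default `alpha=0.4` are illustrative assumptions.

```python
import random


def mixup_pair(x_i, y_i, x_j, y_j, alpha=0.4, rng=random):
    """Mix two (input, one-hot label) pairs with lam ~ Beta(alpha, alpha).

    alpha=0.4 is an arbitrary illustrative value, not one taken from the paper.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y, lam


rng = random.Random(0)  # seeded generator so the sketch is repeatable
x, y, lam = mixup_pair([0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], rng=rng)
print(lam, x, y)
```

The mixed target `y` is a soft label rather than a one-hot vector, so the network is trained towards intermediate confidences on interpolated inputs.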
Augmenting Graph Convolutional Neural Networks with Highpass Filters

In this paper, we propose a graph neural network that employs high-pass filters in the convolutional layers. To do this, we depart from a linear model for the convolutional layer and consider the case of directed graphs. This allows for graph spectral theory and the connections between eigenfunctions over the graph and Fourier analysis to employ graph signal processing to obtain an architecture that “concatenates” low and high-pass filters to process data on a connected graph. This yields a method that is quite general in nature, applicable to directed and undirected graphs, and with clear links to graph spectral methods, Fourier analysis and graph signal processing. Here, we illustrate the utility of our graph convolutional approach for classification using citation datasets and knowledge graphs. The results show that our method provides a margin of improvement over the alternative.

Fatemeh Ansarizadeh, David B. Tay, Dhananjay Thiruvady, Antonio Robles-Kelly
Selecting Features from Time Series Using Attention-Based Recurrent Neural Networks

Capturing, storing, and analyzing high-dimensional time series data are important challenges that need to be effectively tackled nowadays, as extremely large amounts of such data are being generated every second. In this paper, we introduce recurrent neural networks equipped with attention modules that quantify the importance of features and hence can be employed to select only an informative subset of all available features. Additionally, our models are trained in an end-to-end fashion and hence are directly applicable to inference over unseen data. Our experiments included datasets from various domains and showed that the proposed technique is data-driven, easily applicable to new use cases, and competitive with other dimensionality reduction algorithms.

Michal Myller, Michal Kawulok, Jakub Nalepa
Feature Extraction Functions for Neural Logic Rule Learning

Combining symbolic human knowledge with neural networks provides a rule-based ante-hoc explanation of the output. In this paper, we propose feature extracting functions for integrating human knowledge abstracted as logic rules into the predictive behaviour of a neural network. These functions are embodied as programming functions, which represent the applicable domain knowledge as a set of logical instructions and provide a modified distribution of independent features on input data. Unlike other existing neural logic approaches, the programmatic nature of these functions implies that they do not require any kind of special mathematical encoding, which makes our method very general and flexible in nature. We illustrate the performance of our approach for sentiment classification and compare our results to those obtained using two baselines.

Shashank Gupta, Antonio Robles-Kelly, Mohamed Reda Bouadjenek
Learning High-Resolution Domain-Specific Representations with a GAN Generator

In recent years generative models of visual data have made great progress, and now they are able to produce images of high quality and diversity. In this work we study representations learnt by a GAN generator. First, we show that these representations can be easily projected onto a semantic segmentation map using a lightweight decoder. We find that such a semantic projection can be learnt from just a few annotated images. Based on this finding, we propose the LayerMatch scheme for approximating the representation of a GAN generator that can be used for unsupervised domain-specific pretraining. We consider the semi-supervised learning scenario when a small amount of labeled data is available along with a large unlabeled dataset from the same domain. We find that the use of a LayerMatch-pretrained backbone leads to superior accuracy compared to standard supervised pretraining on ImageNet. Moreover, this simple approach also outperforms recent semi-supervised semantic segmentation methods that use both labeled and unlabeled data during training.

Danil Galeev, Konstantin Sofiiuk, Danila Rukhovich, Mikhail Romanov, Olga Barinova, Anton Konushin
Predicting Polypharmacy Side Effects Through a Relation-Wise Graph Attention Network

Polypharmacy is the combined use of multiple drugs, widely adopted in medicine to treat patients that suffer from complex diseases. Therefore, it is important to have reliable tools able to predict if the activity of a drug could unfavorably change when combined with others. State-of-the-art methods face this problem as a link prediction task on a multilayer graph describing drug-drug interactions (DDI) and protein-protein interactions (PPI), since it has been demonstrated to be the most effective representation. Graph Convolutional Networks (GCN) are the method most commonly chosen in recent research for this problem. We propose to improve the performance of GCN on this link prediction task through the addition of a novel relation-wise Graph Attention Network (GAT), used to assign different weights to the different relationships in the multilayer graph. We experimentally demonstrate that the proposed GCN, compared with other recent methods, is able to achieve state-of-the-art performance on a publicly available polypharmacy side effect network.

Vincenzo Carletti, Pasquale Foggia, Antonio Greco, Antonio Roberto, Mario Vento
LGL-GNN: Learning Global and Local Information for Graph Neural Networks

In this article, we have developed a graph convolutional network model LGL that can learn global and local information at the same time for effective graph classification tasks. Our idea is to concatenate the convolution results of the deep graph convolutional network and the motif-based subgraph convolutional network layer by layer, and give attention weights to global features and local features. We hope that this method can alleviate the over-smoothing problem when the depth of the neural networks increases, and the introduction of motif for local convolution can better learn local neighborhood features with strong connectivity. Finally, our experiments on standard graph classification benchmarks prove the effectiveness of the model.

Huan Li, Boyuan Wang, Lixin Cui, Lu Bai, Edwin R. Hancock
Graph Transformer: Learning Better Representations for Graph Neural Networks

Graph classifications are significant tasks for many real-world applications. Recently, Graph Neural Networks (GNNs) have achieved excellent performance on many graph classification tasks. However, most state-of-the-art GNNs face the challenge of the over-smoothing problem and cannot learn latent relations between distant vertices well. To overcome this problem, we develop a novel Graph Transformer (GT) unit to learn latent relations timely. In addition, we propose a mixed network to combine different methods of graph learning. We elucidate that the proposed GT unit can both learn distant latent connections well and form better representations for graphs. Moreover, the proposed Graph Transformer with Mixed Network (GTMN) can learn both local and global information simultaneously. Experiments on standard graph classification benchmarks demonstrate that our proposed approach performs better when compared with other competing methods.

Boyuan Wang, Lixin Cui, Lu Bai, Edwin R. Hancock

Graph-Theoretic Methods

Frontmatter
Weighted Network Analysis Using the Debye Model

Statistical mechanics provides effective means for complex network analysis, and in particular the classical Boltzmann partition function has been extensively used to explore network structure. One of the shortcomings of this model is that it is couched in terms of unweighted edges. To overcome this problem and to extend the utility of this type of analysis, in this paper, we explore how the Debye solid model can be used to describe the probability density function for particles in such a system. According to our analogy the distribution of node degree and edge-weight in the network can be derived from the distribution of molecular energy in the Debye model. This allows us to derive a probability density function for nodes, and thus is identical to the degree distribution for the case of uniformly weighted edges. We also consider the case where the edge weights follow a distribution (non-uniformly weighted edges). The corresponding network energy is the cumulative distribution function for the node degree. This distribution reveals a phase transition for the temperature dependence. The Debye model thus provides a new way to describe the node degree distribution in both unweighted and weighted networks.

Haoran Zhu, Hui Wu, Jianjia Wang, Edwin R. Hancock
Estimating the Manifold Dimension of a Complex Network Using Weyl’s Law

The dimension of the space underlying real-world networks has been shown to strongly influence the networks' structural properties, from the degree distribution to the way the networks respond to diffusion and percolation processes. In this paper we propose a way to estimate the dimension of the manifold underlying a network that is based on Weyl’s law, a mathematical result that describes the asymptotic behaviour of the eigenvalues of the graph Laplacian. For the case of manifold graphs, the dimension we estimate is equivalent to the fractal dimension of the network, a measure of structural self-similarity. Through an extensive set of experiments on both synthetic and real-world networks we show that our approach is able to correctly estimate the manifold dimension. We compare this with alternative methods to compute the fractal dimension and we show that our approach yields a better estimate on both synthetic and real-world examples.

Luca Rossi, Andrea Torsello
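Weyl's law, mentioned in the abstract above, states that the number N(λ) of Laplacian eigenvalues not exceeding λ grows like λ^(d/2) on a d-dimensional manifold, so d can be read off as twice the slope of log N(λ) against log λ. The sketch below illustrates only this idea on two graphs whose Laplacian spectra are known in closed form (a cycle and a two-dimensional torus grid). It is not the estimator proposed in the paper; the function names and the cut-off `lam_max` are assumptions made for illustration.

```python
import math


def cycle_spectrum(n):
    """Laplacian eigenvalues of the n-node cycle graph (closed form)."""
    return [2 - 2 * math.cos(2 * math.pi * k / n) for k in range(n)]


def torus_spectrum(n):
    """Laplacian eigenvalues of the n x n torus grid: sums of cycle eigenvalues."""
    one_d = cycle_spectrum(n)
    return [a + b for a in one_d for b in one_d]


def weyl_dimension(eigenvalues, lam_max):
    """Estimate d as twice the least-squares slope of log N(lam) vs log lam.

    N(lam) is approximated by the rank of each eigenvalue in the sorted
    spectrum; only positive eigenvalues up to lam_max enter the fit.
    """
    ordered = sorted(eigenvalues)
    pts = [(math.log(lam), math.log(rank))
           for rank, lam in enumerate(ordered, start=1)
           if 1e-12 < lam <= lam_max]
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    var = sum((x - mean_x) ** 2 for x, _ in pts)
    return 2 * cov / var


print(weyl_dimension(cycle_spectrum(2000), lam_max=0.1))  # roughly 1
print(weyl_dimension(torus_spectrum(60), lam_max=0.5))    # roughly 2
```

Restricting the fit to the low end of the spectrum matters because the power-law behaviour only holds for small eigenvalues of a lattice-like graph; the cut-off values used here were chosen by hand.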
Efficient Partitioning of Partial Correlation Networks

Partial correlation is a popular and principled metric for determining edges between nodes in a graph. However, when the goal is to both estimate network connectivity from sample data and subsequently partition the result, methods such as spectral clustering can be applied much more efficiently and at larger scale. We derive a method that can similarly partition partial correlation networks directly from sample data. The method is closely related to spectral clustering, and can be implemented with comparable efficiency. Our results also provide new insight into the success of spectral clustering in many fields, as an approximation to clustering of partial correlation networks.

Keith Dillon
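As background to the abstract above, partial correlation measures the association between two variables after the linear influence of the remaining variables has been removed. The sketch below uses the textbook three-variable formula only; it is not the partitioning method of the paper, and the function name `partial_corr` is hypothetical. For more than three variables the same quantity is usually obtained from the inverse covariance (precision) matrix P as -P_ij / sqrt(P_ii * P_jj).

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """Partial correlation of x and y given z, from pairwise correlations."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denom


# x and y are correlated only through the common driver z:
# the marginal correlation is 0.64, the partial correlation is ~0.
print(partial_corr(0.64, 0.8, 0.8))
```

In a network built from partial correlations, such a pair would receive no edge even though a plain correlation network would connect it.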
Alzheimer’s Brain Network Analysis Using Sparse Learning Feature Selection

Accurate identification of Mild Cognitive Impairment (MCI) based on resting-state functional Magnetic Resonance Imaging (RS-fMRI) is crucial for reducing the risk of developing Alzheimer’s disease (AD). In the literature, functional connectivity (FC) is often used to extract brain network features. However, estimating FC remains challenging because RS-fMRI data are often high-dimensional and small in sample size. Although various Lasso-type sparse learning feature selection methods have been adopted to identify the most discriminative features for brain disease diagnosis, they suffer from two common drawbacks. First, Lasso is unstable and not very satisfactory for the high-dimensional and small sample size problem. Second, existing Lasso-type feature selection methods do not simultaneously encapsulate the joint correlations between pairwise features and the target, the correlations between pairwise features, and the joint feature interaction into the feature selection process, and thus may lead to suboptimal solutions. To overcome these issues, we propose a novel sparse learning feature selection method for MCI classification in this work. It unifies the above measures into a minimization problem associated with a least square error and an Elastic Net regularizer. Experimental results demonstrate that the diagnosis accuracy for MCI subjects can be significantly improved using our proposed feature selection method.

Lixin Cui, Lichi Zhang, Lu Bai, Yue Wang, Edwin R. Hancock
The Entropy of Graph Embeddings: A Proxy of Potential Mobility in Covid19 Outbreaks

In this paper, we propose a proxy of the $$R_0$$ (reproductive number) of COVID-19 by computing the entropy of the mobility graph during the first peak of the pandemic. The study was performed by the COVID-19 Data Science Task Force at the Comunidad Valenciana (Spain) over 70 days. Since mobility graphs are naturally attributed, directed and become more and more disconnected as more and more non-pharmaceutical measures are implemented, we discarded spectral complexity measures and classical ones such as network efficiency. Alternatively, we turned our attention to embeddings resulting from random walks and their links with stochastic matrices. In our experiments, we show that this leads to a powerful tool for predicting the spread of the virus and for assessing the effectiveness of the political interventions.

Francisco Escolano, Miguel Angel Lozano, Edwin R. Hancock
A Novel Data Set for Information Retrieval on the Basis of Subgraph Matching

We are facing the challenge of rapidly increasing amounts of data. Moreover, we observe that in many applications the underlying data contains strongly related entities making graphs the most appropriate structure for data modeling. When data is represented by means of a graph, querying corresponds to a graph matching problem. The present paper introduces a novel graph that models information from the medical domain with about 110,000 nodes and 220,000 edges. Additionally we present several basic benchmark queries, i.e. specific subgraphs, from different categories that can be found multiple times in the medical graph. Both the graph and the benchmark can be used to implement, test, and compare novel graph matching algorithms in a real world scenario.

Kaspar Riesen, Hans-Friedrich Witschel, Loris Grether
A Graph Pre-image Method Based on Graph Edit Distances

The pre-image problem for graphs is increasingly attracting attention owing to many promising applications. However, it is a challenging problem due to the complexity of graph structure. In this paper, we propose a novel method to construct graph pre-images as median graphs, by aligning graph edit distances (GEDs) in the graph space with distances in the graph kernel space. The two metrics are aligned by optimizing the edit costs of GEDs according to the distances between the graphs within the space associated with a particular graph kernel. Then, the graph pre-image can be estimated using a median graph method founded on the GED. In particular, a recently introduced method to compute generalized median graphs with iterative alternate minimizations is revisited for this purpose. Conducted experiments show very promising results while opening the computation of graph pre-image to any graph kernel and to graphs with non-symbolic attributes.

Linlin Jia, Benoit Gaüzère, Paul Honeine
Multivalent Graph Matching Problem Solved by Max-Min Ant System

This paper presents a multivalent graph matching problem and proposes a max-min ant system for its resolution. Multivalent graph matching is a very combinatorial problem where a node (edge) in one graph can match with more than one node (edge) in the other graph. We formalize this problem as an extended graph edit distance problem by adding possibilities of splitting and merging operations. Then, we employ an ant colony based optimization algorithm, the max-min ant system, to solve this very combinatorial problem. A local search is also integrated to enhance the solution quality. The efficiency of the proposed approach is verified on a symbol data set in several aspects. The results show that the proposed approach can be very useful in case of noise when the bijective graph matching-based approaches are not usually robust.

Kieu Diem Ho, Jean Yves Ramel, Nicolas Monmarché
A Metric Learning Approach to Graph Edit Costs for Regression

Graph edit distance (GED) is a widely used dissimilarity measure between graphs. It is a natural metric for comparing graphs and respects the nature of the underlying space, and provides interpretability for operations on graphs. As a key ingredient of the GED, the choice of edit cost functions has a dramatic effect on the GED and therefore the classification or regression performances. In this paper, in the spirit of metric learning, we propose a strategy to optimize edit costs according to a particular prediction task, which avoids the use of predefined costs. An alternate iterative procedure is proposed to preserve the distances in both the underlying spaces, where the update on edit costs obtained by solving a constrained linear problem and a re-computation of the optimal edit paths according to the newly computed costs are performed alternately. Experiments show that regression using the optimized costs yields better performances compared to random or expert costs.

Linlin Jia, Benoit Gaüzère, Florian Yger, Paul Honeine
Parallel Subgraph Isomorphism on Multi-core Architectures: A Comparison of Four Strategies Based on Tree Search

Subgraph isomorphism is one of the most challenging problems on graph-based representations. Although many efficient sequential algorithms have been proposed over the last decades, solving this problem on large graphs is still a time-demanding task. For this reason, there is a growing interest in realizing effective parallel algorithms able to make the best use of the modern multi-core architectures commonly available on servers and workstations. We propose a comparison of four parallel algorithms derived from the state-of-the-art sequential algorithm VF3-Light; two of them were presented in previous works, while the other two are introduced in this paper. In order to evaluate strong points and weaknesses of each algorithm, we performed a benchmark over six datasets of random large and dense graphs, both labelled and unlabelled, measuring memory usage, speed-up and efficiency. We also add a comparison with a different parallel algorithm, named Glasgow, that is not derived from VF3-Light.

Vincenzo Carletti, Pasquale Foggia, Antonio Greco, Mario Vento

Multimedia Analysis and Understanding

Frontmatter
Multiple-Image Super-Resolution Using Deep Learning and Statistical Features

Capturing, transferring, and storing high-resolution images has become a serious issue in a wide range of fields, in which these processes are costly, time-consuming, or even infeasible. As obtaining low-resolution images may be easier in practice, enhancing their spatial resolution is currently an active research area and encompasses both single- and multiple-image super-resolution techniques. In this paper, we propose a deep learning approach for multiple-image super-resolution that is independent from the number of available low-resolution images of the scene. It is in contrast to other deep networks which are crafted to deal with input stacks of a constant size, hence are not applicable once the number of low-resolution images varies. The experiments showed that our technique not only outperforms other single- and multiple-image super-resolution algorithms, but also it is lightweight and delivers instant operation, thus can be deployed in hardware-constrained environments.

Jakub Nalepa, Krzysztof Hrynczenko, Michal Kawulok
Unsupervised Semantic Discovery Through Visual Patterns Detection

We propose a new, fast, fully unsupervised method to discover semantic patterns. Our algorithm is able to hierarchically find visual categories and produce a segmentation mask where previous methods fail. Through the modeling of what is a visual pattern in an image, we introduce the notion of “semantic levels” and devise a conceptual framework along with measures and a dedicated benchmark dataset for future comparisons. Our algorithm is composed of two phases: a filtering phase, which selects semantic hotspots by means of an accumulator space, and a clustering phase, which propagates the semantic properties of the hotspots on a superpixel basis. We provide both qualitative and quantitative experimental validation, achieving optimal results in terms of robustness to noise and semantic consistency. We also made code and dataset publicly available.

Francesco Pelosin, Andrea Gasparetto, Andrea Albarelli, Andrea Torsello
Deep Residual Neural Network for Child’s Spontaneous Facial Expressions Recognition

Early identification of deficits in emotion recognition and expression skills may prevent low social functioning in adulthood. Deficits in young children’s ability to recognize facial expressions can lead to impairments in social functioning. Kids may need extra help learning to read facial expressions. Most earlier efforts consider the problem of emotion recognition in adults but ignore children’s emotions, especially in an unconstrained environment. In this paper, we present progressive light residual learning to recognize spontaneous emotions in children. Unlike earlier residual neural networks, we reduce the skip connections in the earlier part of the network and increase them gradually as the network goes deeper. The progressive light residual network can explore more feature space due to limiting the skip connections locally, which makes the network more vulnerable to perturbations and helps to deal with the overfitting problem for smaller data. Experimental results on a benchmark children’s emotion dataset show that the proposed approach achieves a considerable gain in performance compared to state-of-the-art methods.

Abdul Qayyum, Imran Razzak
Multi-layer PCA Network for Image Classification

PCANet is a simple deep learning baseline for image classification, which learns the filter banks by PCA instead of stochastic gradient descent (SGD) in each layer. It shows a good performance for image classification tasks with only a few parameters and no backpropagation procedure. However, PCANet suffers from two main problems. The first problem is the feature explosion, which limits its depth to two layers. The second issue is the binarization process, which leads to a loss of discriminative information. To handle these problems, we adopted CNN-like convolution layers to learn the PCA filter bank and reduce the number of dimensions. We also used second-order pooling with z-score normalization to replace the histogram descriptor. A late fusion method is used to combine the class posteriors generated by each layer. The proposed network has been tested on image classification tasks including the MNIST, Cifar10, Cifar100 and Tiny ImageNet databases. The experimental results show that our model achieves better performance than standard PCANet and is competitive with some CNN methods.

Mubarakah Alotaibi, Richard C. Wilson
A Multimodal Fusion Model Based on Hybrid Attention Mechanism for Gesture Recognition

Gesture recognition based on multimodal information plays a significant role in the field of human-computer interaction. In recent years, although many researchers have devoted themselves to related work in this field, the correlation and complementarity of multimodal information have not been explored and utilized fully. Consequently, this paper proposes a multimodal fusion network based on the hybrid attention mechanism for gesture recognition, where: 1. the cross-attention mechanism is introduced to fuse and enhance multi-dimensional features mutually, such as video and audio features; 2. the single-attention mechanism is employed to balance the correlation and redundancy between one-dimensional representation and multi-dimensional representation, such as skeleton and video features. The proposed network aims to excavate the relationship between modalities from different perspectives, fuse various information in different fusion stages, and achieve high accuracy of recognition. The method is evaluated on the publicly available ChaLearn Montalbano dataset and obtains 95.97% accuracy when fusing video, skeleton, and audio modalities, which outperforms state-of-the-art approaches.

Yajie Li, Yiqiang Chen, Yang Gu, Jianquan Ouyang
StegColNet: Steganalysis Based on an Ensemble Colorspace Approach

Image steganography refers to the process of hiding information inside images. Steganalysis is the process of detecting a steganographic image. We introduce a steganalysis approach that uses an ensemble color space model to obtain a weighted concatenated feature activation map. The concatenated map helps to obtain certain features specific to each color space. We use a Lévy-flight grey wolf optimization strategy to reduce the number of features selected in the map. We then use these features to classify the image into one of two classes, according to whether or not it contains hidden information. Extensive experiments have been carried out on a large-scale dataset extracted from the Bossbase dataset. We also show that the model can be transferred to different datasets, and we perform extensive experiments on a mixture of datasets. Our results show that the proposed approach outperforms recent state-of-the-art deep learning steganalysis approaches by 2.32% on average for 0.2 bits per channel (bpc) and 1.87% on average for 0.4 bpc.

Shreyank N. Gowda, Chun Yuan
Residual Multiscale Full Convolutional Network (RM-FCN) for High Resolution Semantic Segmentation of Retinal Vasculature

In a fundus image, local vessel characteristics such as direction, illumination and noise vary considerably, making vessel segmentation a challenging task. Methods based upon deep convolutional networks have consistently yielded state-of-the-art performance. Although effective, one of the drawbacks of these methods is their computational complexity: training and testing these networks requires substantial computational resources and can be time consuming. Here we present a multi-scale kernel based on fully convolutional layers that is quite lightweight and can effectively segment large, medium, and thin vessels over wide variations in contrast, position and size of the optic disk. Moreover, the architecture presented here makes use of these multi-scale kernels, reduced application of pooling operations, and skip connections to achieve faster training. We illustrate the utility of our method for retinal vessel segmentation on the DRIVE, CHASE_DB and STARE data sets. We also compare the results delivered by our method with a number of alternatives elsewhere in the literature. In our experiments, our method always provides a margin of improvement in specificity, accuracy, AUC and sensitivity with respect to the alternatives.

Tariq M. Khan, Antonio Robles-Kelly, Syed S. Naqvi, Muhammad Arsalan
A Practical Hybrid Active Learning Approach for Human Pose Estimation

Active learning (AL) has not received much attention in deep learning (DL) for human pose estimation. In this paper, a practical hybrid active learning strategy is proposed for training a human pose estimation model, and it is tested in an industrial online environment. The experiments show that the active learning strategy, which selects diverse samples for annotation, outperforms the baseline method of random sampling. As a result, the strategy enables a significant improvement in the performance of pose estimation.

Sinan Kaplan, Joni Juvonen, Lasse Lensu
IterDet: Iterative Scheme for Object Detection in Crowded Environments

Deep learning-based detectors tend to produce duplicate detections of the same objects. The detections are then filtered via a non-maximum suppression (NMS) algorithm so that only one bounding box remains per object. This simple greedy scheme is sufficient for isolated objects. However, it often fails in crowded environments, since boxes for different objects should be preserved and duplicate detections should be suppressed at the same time. In this work, we propose to obtain predictions following an iterative scheme called IterDet. At each iteration, a new subset of objects is detected. Detected boxes from all the previous iterations are considered at the current iteration to ensure that the same object is not detected twice. This iterative scheme can be applied to both one-stage and two-stage deep learning-based detectors with minor modifications. Through extensive evaluation on 4 diverse datasets with two different baseline detectors, we show that our iterative scheme achieves significant improvement over the baseline. On the CrowdHuman and WiderPerson datasets, we obtain state-of-the-art results. The source code and the trained models are available at https://github.com/saic-vul/iterdet.

Danila Rukhovich, Konstantin Sofiiuk, Danil Galeev, Olga Barinova, Anton Konushin
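
The greedy non-maximum suppression baseline described in the IterDet abstract above can be sketched as follows. This is a generic illustration of greedy NMS, not the authors' code or the IterDet scheme itself; the function names, the box format `(x1, y1, x2, y2)` and the default threshold of 0.5 are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is kept only if its IoU with every already kept box
    does not exceed iou_threshold (an assumed default of 0.5).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one isolated box (hypothetical values).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]
```

The second box is dropped whether it is a duplicate of the first or a different, heavily overlapping object, which is the failure mode in crowded scenes that the abstract refers to.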
Multi-modal 3D Human Pose Estimation for Human-Robot Collaborative Applications

We propose a multi-modal 3D human pose estimation approach that combines a 2D human pose estimation network using RGB data with a 3D human pose estimation network using the 2D pose estimation results and depth information, in order to predict 3D human poses. We improve upon the state of the art by proposing the use of a more accurate 2D human pose estimation network, as well as by introducing squeeze-excite blocks into the architecture of the 3D pose estimation network. More importantly, we focus on the challenging application of 3D human pose estimation during collaborative tasks. To that end, we selected appropriate subsets that address collaborative tasks from a large-scale multi-view RGB-D dataset and generated a novel one-view RGB-D dataset, for training and testing respectively. We achieve better than state-of-the-art performance among RGB-D approaches when tested on a novel benchmark RGB-D dataset on collaborative assembly that we have created and made publicly available.

Konstantinos Peppas, Konstantinos Tsiolis, Ioannis Mariolis, Angeliki Topalidou-Kyniazopoulou, Dimitrios Tzovaras
Image = Structure + Few Colors

Topology plays an important role in computer vision by capturing the structure of the objects. Nevertheless, its potential applications have not yet been sufficiently developed. In this paper, we combine the topological properties of an image with hierarchical approaches to build a topology preserving irregular image pyramid (TIIP). The TIIP algorithm uses combinatorial maps as its data structure, which implicitly capture the structure of the image in terms of the critical points. Thus, we can achieve a compact representation of an image, preserving the structure and topology of its critical points (maxima, minima and saddles). The parallel algorithmic complexity of building the pyramid is $O(\log d)$, where $d$ is the diameter of the largest object. We achieve promising results for image reconstruction using only a few color values and the structure of the image, while preserving fine details including the texture of the image.

Darshan Batavia, Rocio Gonzalez-Diaz, Walter G. Kropatsch
Backmatter
Metadata
Title
Structural, Syntactic, and Statistical Pattern Recognition
Editors
Dr. Andrea Torsello
Luca Rossi
Prof. Marcello Pelillo
Dr. Battista Biggio
Antonio Robles-Kelly
Copyright Year
2021
Electronic ISBN
978-3-030-73973-7
Print ISBN
978-3-030-73972-0
DOI
https://doi.org/10.1007/978-3-030-73973-7
