## About this book

The two-volume set LNCS 12396 and 12397 constitutes the proceedings of the 29th International Conference on Artificial Neural Networks, ICANN 2020, held in Bratislava, Slovakia, in September 2020.*

The total of 139 full papers presented in these proceedings was carefully reviewed and selected from 249 submissions. They were organized in 2 volumes focusing on topics such as adversarial machine learning, bioinformatics and biosignal analysis, cognitive models, neural network theory and information theoretic learning, and robotics and neural models of perception and action.

*The conference was postponed to 2021 due to the COVID-19 pandemic.

## Table of contents

### On the Security Relevance of Initial Weights in Deep Neural Networks

Recently, a weight-based attack on stochastic gradient descent inducing overfitting has been proposed. We show that the threat is broader: a task-independent permutation of the initial weights suffices to limit the achieved accuracy to, for example, 50% on the Fashion MNIST dataset, down from initially more than 90%. These findings are supported on MNIST and CIFAR. We formally confirm that the attack succeeds with high likelihood and does not depend on the data. Empirically, weight statistics and loss appear unsuspicious, making it hard to detect the attack if the user is not aware. Our paper is thus a call for action to acknowledge the importance of the initial weights in deep learning.

Kathrin Grosse, Thomas A. Trost, Marius Mosbach, Michael Backes, Dietrich Klakow

### Fractal Residual Network for Face Image Super-Resolution

Recently, many Convolutional Neural Network (CNN) algorithms have been proposed for image super-resolution, but most of them target architectural or natural scene images. In this paper, we propose a new fractal residual network model for face image super-resolution, which is very useful in the domain of surveillance and security. The architecture of the proposed model is composed of multiple branches. Each branch is incrementally cascaded with multiple self-similar residual blocks, which makes the branch appear as a fractal structure. Such a structure makes it possible to learn both global and local residuals sufficiently. We propose a multi-scale progressive training strategy to enlarge the image size and make the training feasible. We propose to combine the losses of face attributes and face structure to refine the super-resolution results. Meanwhile, adversarial training is introduced to generate details. The results of our proposed model outperform other benchmark methods in qualitative and quantitative analysis.

Yuchun Fang, Qicai Ran, Yifan Li

### From Imbalanced Classification to Supervised Outlier Detection Problems: Adversarially Trained Auto Encoders

Imbalanced datasets pose severe challenges in training well-performing classifiers. This problem is also prevalent in the domain of outlier detection since outliers occur infrequently and are generally treated as minorities. One simple yet powerful approach is to use autoencoders which are trained on majority samples and then to classify samples based on the reconstruction loss. However, this approach fails to classify samples whenever the reconstruction errors of minorities overlap with those of majorities. To overcome this limitation, we propose an adversarial loss function that maximizes the loss of minorities while minimizing the loss for majorities. This way, we obtain a well-separated reconstruction error distribution that facilitates classification. We show that this approach is robust in a wide variety of settings, such as imbalanced data classification or outlier and novelty detection.

Max Lübbering, Rajkumar Ramamurthy, Michael Gebauer, Thiago Bell, Rafet Sifa, Christian Bauckhage

### Generating Adversarial Texts for Recurrent Neural Networks

Adversarial examples have received increasing attention recently due to their significant value in evaluating and improving the robustness of deep neural networks. Existing adversarial attack algorithms have achieved good results for most images. However, those algorithms cannot be directly applied to texts, as text data is discrete in nature. In this paper, we extend two state-of-the-art attack algorithms, PGD and C&W, to craft adversarial text examples for RNN-based models. The Extend-PGD attack identifies the words that are important for classification by computing the Jacobian matrix of the classifier, in order to generate adversarial text examples effectively. The Extend-C&W attack uses $\mathcal{L}_{1}$ regularization to minimize the alteration of the original input text. We conduct comparison experiments on two recurrent neural networks trained for classifying texts in two real-world datasets. Experimental results show that our Extend-PGD and Extend-C&W attack algorithms have advantages in attack success rate and semantics-preserving ability, respectively.

Chang Liu, Wang Lin, Zhengfeng Yang

### Enforcing Linearity in DNN Succours Robustness and Adversarial Image Generation

Recent studies on the adversarial vulnerability of neural networks have shown that models trained with the objective of minimizing an upper bound on the worst-case loss over all possible adversarial perturbations improve robustness against adversarial attacks. Besides exploiting the adversarial training framework, we show that enforcing a Deep Neural Network (DNN) to be linear in the transformed input and feature space improves robustness significantly. We also demonstrate that augmenting the objective function with a local Lipschitz regularizer boosts the robustness of the model further. Our method outperforms most sophisticated adversarial training methods and achieves state-of-the-art adversarial accuracy on the MNIST, CIFAR10 and SVHN datasets. We also propose a novel adversarial image generation method that leverages inverse representation learning and the linearity of an adversarially trained deep neural network classifier.

Anindya Sarkar, Raghu Iyengar

### Computational Analysis of Robustness in Neural Network Classifiers

Neural networks, especially deep architectures, have proven to be excellent tools for solving various tasks, including classification. However, they are susceptible to adversarial inputs, which are similar to the original ones but yield incorrect classifications, often with high confidence. This reveals the lack of robustness in these models. In this paper, we try to shed light on this problem by analyzing the behavior of two types of trained neural networks, fully connected and convolutional, using the MNIST, Fashion MNIST, SVHN and CIFAR10 datasets. All networks use a logistic activation function whose steepness we manipulate to study its effect on network robustness. We also generated adversarial examples with the FGSM method and by perturbing those pixels that fool the network most effectively. Our experiments reveal a trade-off between accuracy and robustness of the networks, where models with a logistic function approaching a threshold function (very steep slope) appear to be more robust against adversarial inputs.

Iveta Bečková, Štefan Pócoš, Igor Farkaš
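
To make the setting of the abstract above concrete, the following is a minimal sketch of one FGSM step against a single logistic unit whose steepness can be varied. It is an illustration only, not the authors' experimental code: the single-unit model, the cross-entropy loss, the clipping range `[0, 1]` and all function and parameter names are assumptions.

```python
import math


def sigmoid(z, steepness=1.0):
    """Logistic function; a larger steepness moves it toward a threshold function."""
    return 1.0 / (1.0 + math.exp(-steepness * z))


def fgsm_step(x, y, w, b, eps, steepness=1.0):
    """One FGSM step for a single logistic unit (assumed toy model).

    x: input values in [0, 1], y: target label (0 or 1),
    w, b: weights and bias, eps: perturbation size.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z, steepness)
    # Gradient of the cross-entropy loss with respect to each input:
    # dL/dx_i = steepness * (p - y) * w_i
    grad = [steepness * (p - y) * wi for wi in w]
    # Step by eps in the sign of the gradient (this increases the loss), then clip.
    return [
        min(1.0, max(0.0, xi + eps * ((g > 0) - (g < 0))))
        for xi, g in zip(x, grad)
    ]


x_adv = fgsm_step([0.5, 0.5], y=1, w=[1.0, -2.0], b=0.0, eps=0.1)
```

In this toy case the first input is pushed down and the second up, which lowers the unit's output for the true label `y=1`.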

### Convolutional Neural Networks with Reusable Full-Dimension-Long Layers for Feature Selection and Classification of Motor Imagery in EEG Signals

In the present article, the author addresses the classification of motor imagery in EEG signals by proposing an innovative neural network architecture. Despite all the successes of deep learning, neural networks of significant depth could not ensure better performance compared to shallow architectures. The approach presented in the article builds on this observation, using an even shallower yet effective architecture. The proposed architecture rests on three points: full-dimension-long ‘valid’ convolutions, dense connections (combining a layer’s input and output), and layer reuse. Another aspect addressed in the paper is related to interpretable machine learning. Interpretability is extremely important in medicine, where decisions must be taken on the basis of solid arguments and clear reasons. Being shallow, the architecture can be used for feature selection by interpreting the layers’ weights, which makes it possible to understand the knowledge about the data accumulated in the network’s layers. The approach, based on a fuzzy measure, uses the Choquet integral to aggregate the knowledge generated in the layer weights and to identify which features (EEG electrodes) provide the most essential information. The approach reduces the number of features from 64 to 14 with an insignificant drop in accuracy (less than a percentage point).

Mikhail Tokovarov

### Compressing Genomic Sequences by Using Deep Learning

A huge number of genomic sequences has been generated with the development of high-throughput sequencing technologies, which brings challenges to data storage, processing, and transmission. Standard compression tools designed for English text are not able to compress genomic sequences well, so an effective dedicated method is urgently needed. In this paper, we propose a genomic sequence compression algorithm based on a deep learning model and an arithmetic encoder. The deep learning model is structured as a convolutional layer followed by an attention-based bi-directional long short-term memory network, which predicts the probabilities of the next base in a sequence. The arithmetic encoder employs the probabilities to compress the sequence. We compare the proposed algorithm with various compression approaches, including a state-of-the-art genomic sequence compression algorithm, DeepDNA, on several real-world data sets. The results show that the proposed algorithm converges stably and achieves the best compression performance, up to 3.7 times better than DeepDNA. Furthermore, we conduct ablation experiments to verify the effectiveness and necessity of each part of the model and visualize the attention weight matrix to show the differing importance of the hidden states for the final prediction. The source code for the model is available on GitHub ( https://github.com/viviancui59/Compressing-Genomic-Sequences ).

Wenwen Cui, Zhaoyang Yu, Zhuangzhuang Liu, Gang Wang, Xiaoguang Liu
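
The link between predicted probabilities and compressed size in the abstract above can be illustrated with a short sketch. An arithmetic coder driven by a predictive model approaches a total cost of the sum of `-log2 p(next base)` bits, so sharper correct predictions mean a smaller output. The code below only computes that ideal cost; it is not the authors' model or encoder, and `uniform_model` and `skewed_model` are hypothetical stand-ins for a learned predictor.

```python
import math


def ideal_code_length_bits(sequence, predict):
    """Sum of -log2 p(base | context): the size an ideal arithmetic coder approaches."""
    total = 0.0
    for i, base in enumerate(sequence):
        probs = predict(sequence[:i])  # maps each base to its predicted probability
        total += -math.log2(probs[base])
    return total


def uniform_model(context):
    """Hypothetical baseline: no knowledge, 2 bits per base."""
    return {base: 0.25 for base in "ACGT"}


def skewed_model(context):
    """Hypothetical stand-in for a learned predictor that favours 'A'."""
    return {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}


baseline_bits = ideal_code_length_bits("AAAA", uniform_model)
model_bits = ideal_code_length_bits("AAAA", skewed_model)
```

A model that assigns higher probability to the bases that actually occur yields fewer bits than the 2-bits-per-base baseline, and pays a penalty on bases it considers unlikely.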

### Learning Tn5 Sequence Bias from ATAC-seq on Naked Chromatin

Technological advances in the last decade resulted in an explosion of biological data. Sequencing methods in particular provide large-scale data sets as a resource for incorporating machine learning in the biological field. By measuring DNA accessibility, for instance, enzymatic hypersensitivity assays facilitate identification of regions of open chromatin in the genome, marking potential locations of regulatory elements. ATAC-seq is the primary method of choice to determine these footprints. It allows measurements on the cellular level, complementing the recent progress in single cell transcriptomics. However, as the method-specific enzymes tend to bind preferentially to certain sequences, the accessibility profile is confounded by binding specificity. The inference of open chromatin should be adjusted for this bias [1]. To enable such corrections, we built a deep learning model that learns the sequence specificity of ATAC-seq’s enzyme Tn5 on naked DNA. We found binding preferences and demonstrate that cleavage patterns specific to Tn5 can successfully be discovered by means of convolutional neural networks. Such models can be combined with accessibility analysis in the future in order to predict bias on new sequences and furthermore provide a better picture of the regulatory landscape of the genome.

Meshal Ansari, David S. Fischer, Fabian J. Theis

### Tucker Tensor Decomposition of Multi-session EEG Data

The Tucker model is a tensor decomposition method for multi-way data analysis. However, its application in the area of multi-channel electroencephalogram (EEG) is rare and often without detailed electrophysiological interpretation of the obtained results. In this work, we apply the Tucker model to a set of multi-channel EEG data recorded over several separate sessions of motor imagery training. We consider a three-way and four-way version of the model and investigate its effect when applied to multi-session data. We discuss the advantages and disadvantages of both Tucker model approaches.

Zuzana Rošťáková, Roman Rosipal, Saman Seifpour

### Reactive Hand Movements from Arm Kinematics and EMG Signals Based on Hierarchical Gaussian Process Dynamical Models

The prediction of finger kinematics from EMG signals is a difficult problem due to the high level of noise in recorded biological signals. In order to improve the quality of such predictions, we propose a Bayesian inference architecture that enables the combination of multiple sources of sensory information with an accurate and flexible model for the online prediction of high-dimensional kinematics. Our method integrates hierarchical Gaussian process latent variable models (GP-LVMs) for nonlinear dimension reduction with Gaussian process dynamical models (GPDMs) to represent movement dynamics in latent space. Using several additional approximations, we make the resulting sophisticated inference architecture real-time capable. Our results demonstrate that the prediction of hand kinematics can be substantially improved by inclusion of information from the online-measured arm kinematics, and by exploiting learned online generative models of finger kinematics. The proposed architecture provides a highly flexible framework for the integration of accurate generative models with high-dimensional motion in real-time inference and control problems.

Nick Taubert, Jesse St. Amand, Prerana Kumar, Leonardo Gizzi, Martin A. Giese

### Investigating Efficient Learning and Compositionality in Generative LSTM Networks

When comparing human with artificial intelligence, one major difference is apparent: Humans can generalize very broadly from sparse data sets because they are able to recombine and reintegrate data components in compositional manners. To investigate differences in efficient learning, Joshua B. Tenenbaum and colleagues developed the character challenge: First an algorithm is trained in generating handwritten characters. In a next step, one version of a new type of character is presented. An efficient learning algorithm is expected to be able to re-generate this new character, to identify similar versions of this character, to generate new variants of it, and to create completely new character types. In the past, the character challenge was only met by complex algorithms that were provided with stochastic primitives. Here, we tackle the challenge without providing primitives. We apply a minimal recurrent neural network (RNN) model with one feedforward layer and one LSTM layer and train it to generate sequential handwritten character trajectories from one-hot encoded inputs. To manage the re-generation of untrained characters when presented with only one example of them, we introduce a one-shot inference mechanism: the gradient signal is backpropagated to the feedforward layer weights only, leaving the LSTM layer untouched. We show that our model is able to meet the character challenge by recombining previously learned dynamic substructures, which are visible in the hidden LSTM states. Making use of the compositional abilities of RNNs in this way might be an important step towards bridging the gap between human and artificial intelligence.

Sarah Fabi, Sebastian Otte, Jonas Gregor Wiese, Martin V. Butz

### Fostering Event Compression Using Gated Surprise

Our brain receives a dynamically changing stream of sensorimotor data. Yet, we perceive a rather organized world, which we segment into and perceive as events. Computational theories of cognitive science on event-predictive cognition suggest that our brain forms generative, event-predictive models by segmenting sensorimotor data into suitable chunks of contextual experiences. Here, we introduce a hierarchical, surprise-gated recurrent neural network architecture, which models this process and develops compact compressions of distinct event-like contexts. The architecture contains a contextual LSTM layer, which develops generative compressions of ongoing and subsequent contexts. These compressions are passed to a GRU-like layer, which uses surprise signals to update its recurrent latent state. The latent state is passed on to another LSTM layer, which processes actual dynamic sensory flow in the light of the provided latent, contextual compression signals. Our model develops distinct event compressions and achieves the best performance on multiple event processing tasks. The architecture may be very useful for the further development of resource-efficient learning, hierarchical model-based reinforcement learning, as well as the development of artificial event-predictive cognition and intelligence.

Dania Humaidan, Sebastian Otte, Martin V. Butz

### Physiologically-Inspired Neural Circuits for the Recognition of Dynamic Faces

Dynamic faces are essential for the communication of humans and non-human primates. However, the exact neural circuits of their processing remain unclear. Based on previous models for cortical neural processes involved in social recognition (of static faces and dynamic bodies), we propose two alternative neural models for the recognition of dynamic faces: (i) an example-based mechanism that encodes dynamic facial expressions as sequences of learned keyframes using a recurrent neural network (RNN), and (ii) a norm-based mechanism, relying on neurons that represent differences between the actual facial shape and the neutral facial pose. We tested both models using highly controlled monkey facial expressions, generated using a photo-realistic monkey avatar that was controlled by motion capture data from monkeys. We found that both models account for the recognition of normal and temporally reversed facial expressions from videos. However, when tested with expression morphs and with expressions of reduced strength, the models made quite different predictions: the norm-based model showed an almost linear variation of the neuron activities with the expression strength and the morphing level for cross-expression morphs, while the example-based model did not generalize well to such stimuli. These predictions can be tested easily in electrophysiological experiments, exploiting the developed stimulus set.

Michael Stettler, Nick Taubert, Tahereh Azizpour, Ramona Siebert, Silvia Spadacenta, Peter Dicke, Peter Thier, Martin A. Giese

### Hierarchical Modeling with Neurodynamical Agglomerative Analysis

We propose a new analysis technique for neural networks, Neurodynamical Agglomerative Analysis (NAA), an analysis pipeline designed to compare class representations within a given neural network model. The proposed pipeline results in a hierarchy of class relationships implied by the network representation, i.e. a semantic hierarchy analogous to a human-made ontological view of the relevant classes. We use networks pretrained on the ImageNet benchmark dataset to infer semantic hierarchies and show the similarity to human-made semantic hierarchies by comparing them with the WordNet ontology. Further, we show using MNIST training experiments that class relationships extracted using NAA appear to be invariant to random weight initializations, tending toward equivalent class relationships across network initializations in sufficiently parameterized networks.

Michael Marino, Georg Schröter, Gunther Heidemann, Joachim Hertzberg

### Deep and Wide Neural Networks Covariance Estimation

It has been recently shown that a deep neural network with i.i.d. random parameters is equivalent to a Gaussian process in the limit of infinite network width. The Gaussian process associated with the neural network is fully described by a recursive covariance kernel that is determined by the architecture of the network and expressed in terms of an expectation. We give a numerically workable analytic expression of the neural network recursive covariance based on Hermite polynomials. We give explicit forms of this recursive covariance for neural networks with Heaviside, ReLU and sigmoid activation functions.

Argimiro Arratia, Alejandra Cabaña, José Rafael León
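
For orientation, the recursive covariance mentioned in the abstract above is commonly written as follows in the literature on infinitely wide networks. This is the standard formulation, not necessarily the paper's own notation or its Hermite-polynomial expression; the symbols $\sigma_w^2$, $\sigma_b^2$ (weight and bias variances), $\phi$ (activation) and $K^{(\ell)}$ (layer-$\ell$ kernel) are assumptions introduced here.

$$K^{(\ell+1)}(x, x') = \sigma_b^2 + \sigma_w^2 \, \mathbb{E}_{(u, v) \sim \mathcal{N}(0, \Sigma^{(\ell)})}\big[\phi(u)\,\phi(v)\big], \qquad \Sigma^{(\ell)} = \begin{pmatrix} K^{(\ell)}(x, x) & K^{(\ell)}(x, x') \\ K^{(\ell)}(x, x') & K^{(\ell)}(x', x') \end{pmatrix}$$

For the ReLU activation $\phi(u) = \max(0, u)$, the expectation has the well-known closed form (the arc-cosine kernel of degree one):

$$\mathbb{E}\big[\phi(u)\,\phi(v)\big] = \frac{\sqrt{K^{(\ell)}(x, x)\, K^{(\ell)}(x', x')}}{2\pi} \big(\sin\theta + (\pi - \theta)\cos\theta\big), \qquad \theta = \arccos\frac{K^{(\ell)}(x, x')}{\sqrt{K^{(\ell)}(x, x)\, K^{(\ell)}(x', x')}}$$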

### Monotone Deep Spectrum Kernels

A recent result in the literature states that polynomial and conjunctive features can be hierarchically organized and described by different kernels of increasing expressiveness (or complexity). Additionally, the optimal combination of those kernels through a Multiple Kernel Learning approach produces effective and robust deep kernels. In this paper, we extend this approach to structured data, showing an adaptation of classical spectrum kernels, here named monotone spectrum kernels, reflecting a hierarchical feature space of sub-structures of increasing complexity. Finally, we show that (i) our kernel adaptation does not differ significantly from classical spectrum kernels, and (ii) the optimal combination achieves better results than the single spectrum kernel.

Ivano Lauriola, Fabio Aiolli

### Permutation Learning in Convolutional Neural Networks for Time-Series Analysis

This study proposes a novel module in the convolutional neural network (CNN) framework named the permutation layer. With the new layer, we particularly target time-series tasks where a 2-dimensional CNN kernel loses its ability to capture spatially correlated features. In multivariate time-series analysis, the input channels are stacked without considering their order, resulting in an unsorted “2D-image”. 2D convolution kernels are not efficient at capturing features from these distorted inputs, as the time series lacks spatial information between the sensor channels. To overcome this weakness, we propose learnable permutation layers as an extension of vanilla convolution layers, which allow different sensor channels to be interchanged such that sensor channels with similar information content are brought together to enable a more effective 2D convolution operation. We test the approach on a benchmark time-series classification task and report the superior performance and applicability of the proposed method.

Gavneet Singh Chadha, Jinwoo Kim, Andreas Schwung, Steven X. Ding

### GTFNet: Ground Truth Fitting Network for Crowd Counting

Crowd counting aims to estimate the number of pedestrians in a single image. Current crowd counting methods usually obtain counting results by integrating density maps. However, the label density map generated by the Gaussian kernel cannot accurately map the ground truth in the corresponding crowd image, thereby affecting the final counting result. In this paper, a ground truth fitting network called GTFNet is proposed, which aims to generate estimated density maps that fit the ground truth better. Firstly, the VGG network combined with dilated convolutional layers was used as the backbone network of GTFNet to extract hierarchical features. The multi-level features were concatenated to compensate for the information loss caused by pooling operations, which may help the network obtain texture and spatial information. Secondly, the regional consistency loss function was designed to obtain the mapping results of the estimated density map and the label density map at different region levels. During the training process, region-level dynamic weights were designed to assign a suitable region fitting range for the network, which can effectively reduce the impact of label errors on the estimated density maps. Finally, our proposed GTFNet was evaluated on three crowd counting datasets (ShanghaiTech, UCF_CC_50 and UCF-QNRF). The experimental results demonstrated that the proposed GTFNet achieved excellent overall performance on all these datasets.

Jinghan Tan, Jun Sang, Zhili Xiang, Ying Shi, Xiaofeng Xia
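
The density-map convention referred to in the abstract above can be sketched briefly: each annotated head is spread into a Gaussian of unit mass, and the count is recovered by summing (integrating) the map. This is a generic illustration of that labelling scheme, not GTFNet itself; the fixed `sigma`, the per-image normalization and the function names are assumptions.

```python
import math


def gaussian_density_map(points, height, width, sigma=1.5):
    """Build a label density map with one unit-mass Gaussian per annotated head.

    points: list of (row, col) head annotations (assumed format).
    """
    density = [[0.0] * width for _ in range(height)]
    for py, px in points:
        kernel = [
            [math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        # Normalize over the image so each person contributes exactly 1 to the sum.
        mass = sum(sum(row) for row in kernel)
        for y in range(height):
            for x in range(width):
                density[y][x] += kernel[y][x] / mass
    return density


def count_from_density(density):
    """Integrate the density map to obtain the crowd count."""
    return sum(sum(row) for row in density)


heads = [(2, 3), (10, 12), (10, 13)]
label_map = gaussian_density_map(heads, height=16, width=16)
estimated_count = count_from_density(label_map)
```

The sum equals the number of annotations, but the mass of nearby heads overlaps and no longer sits exactly on the annotated pixels, which is the kind of label mismatch the abstract describes.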

### Evaluation of Deep Learning Methods for Bone Suppression from Dual Energy Chest Radiography

Bone suppression in chest x-rays is an important processing step that can often improve visual detection of lung pathologies hidden under ribs or clavicle shadows. The current diagnostic imaging protocol does not include hardware-based bone suppression, hence the need for a software-based solution. This paper evaluates various deep learning models adapted for the bone suppression task. We implemented several state-of-the-art deep learning architectures (convolutional autoencoder, U-net, FPN, cGAN) and augmented them with domain-specific denoising techniques, such as wavelet decomposition, with the aim of identifying the optimal solution for chest x-ray analysis. Our results show that wavelet decomposition does not improve rib suppression, that the “skip connections” modification outperforms the baseline autoencoder approach with and without wavelet decomposition, and that the residual models train faster than plain models and achieve higher validation scores.

Ilyas Sirazitdinov, Konstantin Kubrak, Semen Kiselev, Alexey Tolkachev, Maksym Kholiavchenko, Bulat Ibragimov

### Multi-person Absolute 3D Human Pose Estimation with Weak Depth Supervision

In 3D human pose estimation one of the biggest problems is the lack of large, diverse datasets. This is especially true for multi-person 3D pose estimation, where, to our knowledge, there are only machine generated annotations available for training. To mitigate this issue, we introduce a network that can be trained with additional RGB-D images in a weakly supervised fashion. Due to the existence of cheap sensors, videos with depth maps are widely available, and our method can exploit a large, unannotated dataset. Our algorithm is a monocular, multi-person, absolute pose estimator. We evaluate the algorithm on several benchmarks, showing a consistent improvement in error rates. Also, our model achieves state-of-the-art results on the MuPoTS-3D dataset by a considerable margin. Our code will be publicly available ( https://github.com/vegesm/wdspose ).

Márton Véges, András Lőrincz

### Solar Power Forecasting Based on Pattern Sequence Similarity and Meta-learning

We consider the task of simultaneously predicting the solar power output for the next day at half-hourly intervals using data from three related time series: solar, weather and weather forecast. We propose PSF3, a novel pattern sequence forecasting approach, an extension of the standard PSF algorithm, which uses all three time series for clustering, pattern sequence extraction and matching. We evaluate its performance on two Australian datasets from different climate zones; the results show that PSF3 is more accurate than the other PSF methods. We also investigate if a dynamic meta-learning ensemble combining the two best methods, PSF3 and a neural network, can further improve the results. We propose a new weighting strategy for combining the predictions of the ensemble members and compare it with other strategies. The overall most accurate prediction model is the meta-learning ensemble with the proposed weighting strategy.

Yang Lin, Irena Koprinska, Mashud Rana, Alicia Troncoso

### Analysis and Prediction of Deforming 3D Shapes Using Oriented Bounding Boxes and LSTM Autoencoders

For sequences of complex 3D shapes in time we present a general approach to detect patterns for their analysis and to predict the deformation by making use of structural components of the complex shape. We incorporate long short-term memory (LSTM) layers into an autoencoder to create low dimensional representations that allow the detection of patterns in the data and additionally detect the temporal dynamics in the deformation behavior. This is achieved with two decoders, one for reconstruction and one for prediction of future time steps of the sequence. In a preprocessing step the components of the studied object are converted to oriented bounding boxes which capture the impact of plastic deformation and allow reducing the dimensionality of the data describing the structure. The architecture is tested on the results of 196 car crash simulations of a model with 133 different components, where material properties are varied. In the latent representation we can detect patterns in the plastic deformation for the different components. The predicted bounding boxes give an estimate of the final simulation result and their quality is improved in comparison to different baselines.

Sara Hahner, Rodrigo Iza-Teran, Jochen Garcke

### Novel Sketch-Based 3D Model Retrieval via Cross-domain Feature Clustering and Matching

With the rapid advancement of scanning hardware and CAD software, we face the technical challenge of how to search for and find a desired model in a huge shape repository quickly and accurately in the big-data era. Sketch-based 3D model retrieval is a flexible and user-friendly approach to tackling this challenge. In this paper, we articulate a novel way of retrieving models by means of sketching and build a 3D model retrieval framework based on deep learning. The central idea is to dynamically adjust the distance between the learned features of sketch and model in the encoded latent space by using several deep neural networks. In the pre-processing phase, we convert all models in the shape database from meshes to point clouds because of their light weight and simplicity. We first utilize two deep neural networks for classification to generate embeddings of both the input sketch and the point cloud. Then, these embeddings are fed into our clustering deep neural network to dynamically adjust the distance between encodings of the sketch domain and the model domain. Applying the sketch embedding to the retrieval similarity measurement can further improve the performance of our framework by re-mapping the distance between encodings from both domains. In order to evaluate the performance of our novel approach, we test our framework on standard datasets and compare it with other state-of-the-art methods. Experimental results have validated the effectiveness, robustness, and accuracy of our novel method.

Kai Gao, Jian Zhang, Chen Li, Changbo Wang, Gaoqi He, Hong Qin

### Multi-objective Cuckoo Algorithm for Mobile Devices Network Architecture Search

The network architecture search technique is becoming the next-generation paradigm of architectural engineering, which could free experts from trial and error while achieving state-of-the-art performance in many applications such as image classification and language modeling. It is crucial for deploying deep networks on a wide range of mobile devices with limited computing resources to provide more flexible service. In this paper, a novel multi-objective algorithm called MOCS-Net for mobile device network architecture search is proposed. In particular, the search space is compact and flexible; it leverages the strengths of efficient mobile CNNs and is constructed block-wise from different stacked blocks. Moreover, an enhanced multi-objective cuckoo algorithm is incorporated, in which mutation is achieved by Lévy flights performed at the block level. Experimental results suggest that MOCS-Net can find competitive neural architectures on ImageNet with a better trade-off among various competing objectives compared with other state-of-the-art methods. These results also show the effectiveness of the proposed MOCS-Net and its promise for further use in various deep-learning paradigms.

Nan Zhang, Jianzong Wang, Jian Yang, Xiaoyang Qu, Jing Xiao

### DeepED: A Deep Learning Framework for Estimating Evolutionary Distances

Evolutionary distances refer to the number of substitutions per site in two aligned nucleotide or amino acid sequences; they reflect divergence time and are highly significant for phylogenetic inference. In the past several decades, many molecular evolution models have been proposed for evolutionary distance estimation. Most of these models are designed under a number of assumptions, some of which agree well with some real-world data but not all. To relax these assumptions and improve accuracy in evolutionary distance estimation, this paper proposes a framework containing Deep Neural Networks (DNNs), called DeepED (Deep learning method to estimate Evolutionary Distances), to estimate evolutionary distances for aligned DNA sequence pairs. The purposely designed structure of this framework enables it to handle long and variable-length sequences as well as to find important segments in a sequence. The models of the network are trained with reliable real-world data that includes highly credible phylogenetic inferences. Experimental results demonstrate that DeepED models achieve an accuracy of up to 0.98 (R-squared), outperforming traditional methods.

Zhuangzhuang Liu, Mingming Ren, Zhiheng Niu, Gang Wang, Xiaoguang Liu
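For orientation, the kind of classical model-based estimate that the DeepED abstract contrasts itself with can be sketched in a few lines. The example below uses the Jukes-Cantor (JC69) correction, chosen here only as a well-known illustration; the abstract does not say which traditional methods were used as baselines, and the function name and gap handling are assumptions.

```python
import math

def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Estimate substitutions per site between two aligned DNA sequences (JC69).

    Illustrative only: sites with a gap ('-') in either sequence are skipped,
    which is one common convention, not something stated in the abstract.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = mismatches / compared  # observed proportion of differing sites
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # correction is undefined at saturation
    # JC69 assumes equal base frequencies and equal substitution rates.
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

distance = jukes_cantor_distance("ACGTACGT", "ACGTACGA")
```

The corrected distance is always at least as large as the raw mismatch proportion, because the model accounts for multiple substitutions at the same site. Fixed assumptions of this kind are what the abstract proposes to relax.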

### Interpretable Machine Learning Structure for an Early Prediction of Lane Changes

This paper proposes an interpretable machine learning structure for the task of lane change intention prediction, based on multivariate time series data. A Mixture-of-Experts architecture is adapted to simultaneously predict lane change directions and the time-to-lane-change. To facilitate reproducibility, the approach is demonstrated on a publicly available dataset of German highway scenarios. Recurrent networks for time series classification using Gated Recurrent Units and Long Short-Term Memory cells, as well as convolutional neural networks, serve as comparison references. The interpretability of the results is shown by extracting the rule sets of the underlying classification and regression trees, which are grown using data-adaptive interpretable features. The proposed method outperforms the reference methods in false alarm behavior while displaying a state-of-the-art early detection performance.

Oliver Gallitz, Oliver De Candido, Michael Botsch, Ron Melz, Wolfgang Utschick

### Convex Density Constraints for Computing Plausible Counterfactual Explanations

The increasing deployment of machine learning, as well as legal regulations such as the EU’s GDPR, creates a need for user-friendly explanations of decisions proposed by machine learning models. Counterfactual explanations are considered one of the most popular techniques to explain a specific decision of a model. While the computation of “arbitrary” counterfactual explanations is well studied, it is still an open research problem how to efficiently compute plausible and feasible counterfactual explanations. We build upon recent work and propose and study a formal definition of plausible counterfactual explanations. In particular, we investigate how to use density estimators for enforcing plausibility and feasibility of counterfactual explanations. For the purpose of efficient computations, we propose convex density constraints that ensure that the resulting counterfactual is located in a region of the data space of high density.

André Artelt, Barbara Hammer

### Identifying Critical States by the Action-Based Variance of Expected Return

The balance of exploration and exploitation plays a crucial role in accelerating reinforcement learning (RL). To deploy an RL agent in human society, its explainability is also essential. However, basic RL approaches have difficulties in deciding when to choose exploitation as well as in extracting useful points for a brief explanation of its operation. One reason for the difficulties is that these approaches treat all states the same way. Here, we show that identifying critical states and treating them specially is commonly beneficial to both problems. These critical states are the states at which the action selection changes the potential of success and failure dramatically. We propose to identify the critical states using the variance in the Q-function for the actions and to perform exploitation with high probability on the identified states. These simple methods accelerate RL in a grid world with cliffs and two baseline tasks of deep RL. Our results also demonstrate that the identified critical states are intuitively interpretable regarding the crucial nature of the action selection. Furthermore, our analysis of the relationship between the timing of the identification of especially critical states and the rapid progress of learning suggests there are a few especially critical states that have important information for accelerating RL rapidly.

Izumi Karino, Yoshiyuki Ohmura, Yasuo Kuniyoshi
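The idea in the abstract above, flagging states where the Q-values differ strongly across actions and exploiting there with high probability, can be sketched for a tabular Q-function. This is a minimal illustration, not the authors' implementation: the threshold, the `exploit_prob` parameter, and the toy Q-table are all hypothetical.

```python
import numpy as np

def critical_states(q_table, threshold):
    """Return indices of states whose Q-values vary strongly across actions.

    q_table has shape (num_states, num_actions). The threshold is a
    hypothetical hyperparameter; the abstract does not say how it is chosen.
    """
    variances = np.var(q_table, axis=1)
    return np.flatnonzero(variances > threshold), variances

def select_action(q_values, is_critical, epsilon, exploit_prob, rng):
    """Epsilon-greedy selection that exploits with high probability in critical states."""
    explore_prob = (1.0 - exploit_prob) if is_critical else epsilon
    if rng.random() < explore_prob:
        return int(rng.integers(len(q_values)))  # explore: uniform random action
    return int(np.argmax(q_values))              # exploit: greedy action

# Toy Q-table: in state 1 one action is far worse than the others,
# as when stepping off a cliff in a grid world.
q_table = np.array([
    [1.0, 1.1, 0.9],
    [5.0, -10.0, 4.0],
    [0.0, 0.0, 0.0],
])
critical_idx, variances = critical_states(q_table, threshold=1.0)

rng = np.random.default_rng(0)
action = select_action(q_table[1], is_critical=True, epsilon=0.3,
                       exploit_prob=1.0, rng=rng)
```

In this toy table only state 1 is flagged, because a wrong action there changes the expected return dramatically, while the choice barely matters in states 0 and 2.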

### Explaining Concept Drift by Mean of Direction

The notion of concept drift refers to the phenomenon that the distribution underlying the observed data changes over time; as a consequence, machine learning models may become inaccurate and need adjustment. In this paper we present a novel method to describe concept drift as a whole by means of flows, i.e. the change of direction and magnitude of particles drawn according to the distribution over time. This problem is of importance in the context of monitoring technical devices and systems, since it allows us to adapt models according to the expected drift, and it enables an inspection of the most prominent features where drift manifests itself. The purpose of this paper is to establish a formal definition of this problem and to present a first, yet simple linear method as a proof of concept. Interestingly, we show that a natural choice in terms of normalized expected linear change constitutes the canonical solution for a linear modeling under mild assumptions, which generalizes expected differences on the one hand and expected direction on the other. This first, global linear approach can be extended to a more fine-grained method using common localization techniques. We demonstrate the usefulness of our approach by applying it to theoretical and real-world data.

Fabian Hinder, Johannes Kummert, Barbara Hammer

### Context Adaptive Metric Model for Meta-learning

Metric-based meta-learning is effective for solving few-shot problems. Generally, a metric model learns a task-agnostic embedding function, maps instances to a low-dimensional embedding space, then classifies unlabeled examples by similarity comparison. However, different classification tasks have individual discriminative characteristics, and previous approaches are constrained to use a single set of features for all possible tasks. In this work, we introduce a Context Adaptive Metric Model (CAMM), which can adaptively extract key features and can be used with most metric models. Our extension consists of two parts: a context parameter module and a self-evaluation module. The context is interpreted as a task representation that modulates the behavior of the feature extractor. CAMM fine-tunes context parameters via the self-evaluation module to generate task-specific embedding functions. We demonstrate that our approach is competitive with recent state-of-the-art systems and improves performance considerably (4%–6% relative) over baselines on the mini-ImageNet benchmark. Our code is publicly available at https://github.com/Jorewang/CAMM.

Zhe Wang, Fanzhang Li

### Ensemble-Based Deep Metric Learning for Few-Shot Learning

Overfitting is an inherent problem in few-shot learning. Ensemble learning integrates multiple machine learning models to improve the overall prediction ability on limited data and hence alleviates the problem of overfitting effectively. Therefore, we apply the idea of ensemble learning to few-shot learning to improve the accuracy of few-shot classification. Metric learning is an important means to solve the problem of few-shot classification. In this paper, we propose ensemble-based deep metric learning (EBDM) for few-shot learning, which is trained end-to-end from scratch. We split the feature extraction network into two parts: the shared part and exclusive part. The shared part is the lower layers of the feature extraction network and is shared across ensemble members to reduce the number of parameters. The exclusive part is the higher layers of the feature extraction network and is exclusive to each individual learner. The coupling of the two parts naturally forces any diversity between the ensemble members to be concentrated on the deeper, unshared layers. We can obtain different features from the exclusive parts and then use these different features to compute diverse metrics. Combining these multiple metrics together will generate a more accurate ensemble metric. This ensemble metric can be used to assign labels to images of new classes with a higher accuracy. Our work leads to a simple, effective, and efficient framework for few-shot classification. The experimental results show that our approach attains superior performance, with the largest improvement of 4.85% in classification accuracy over related competitive baselines.

Meng Zhou, Yaoyi Li, Hongtao Lu

### More Attentional Local Descriptors for Few-Shot Learning

Learning from a few examples remains a key challenge for many computer vision tasks. Few-shot learning is proposed to tackle this problem. In image classification, it aims to learn a classifier when each class contains only a few labeled samples. Existing methods, which use a fully connected layer or global average pooling as the final classification step, have achieved considerable progress. However, due to the lack of samples, global features may no longer be useful. In contrast, local features are more conducive to few-shot learning, but they inevitably contain some noise. Meanwhile, the attention mechanism, inspired by the human visual system, can extract more valuable information and is widely used in various areas. Therefore, in this paper, we propose a method called More Attentional Deep Nearest Neighbor Neural Network (MADN4 for short) that combines local descriptors with an attention mechanism and is trained end-to-end from scratch. The experimental results on four benchmark datasets demonstrate the superior capability of our method.

Hui Li, Liu Yang, Fei Gao

### Implementation of Siamese-Based Few-Shot Learning Algorithms for the Distinction of COPD and Asthma Subjects

This paper investigates the practicality of applying brain-inspired Few-Shot Learning (FSL) algorithms for addressing shortcomings of Machine Learning (ML) methods in medicine with limited data availability. As a proof of concept, the application of ML for the detection of Chronic Obstructive Pulmonary Disease (COPD) patients was investigated. The complexities associated with the distinction of COPD and asthma patients and the lack of sufficient training data for asthma subjects impair the performance of conventional ML models for the recognition of COPD. Therefore, the objective of this study was to implement FSL methods for the distinction of COPD and asthma subjects with a few available data points. The proposed FSL models in this work were capable of recognizing asthma and COPD patients with 100% accuracy, demonstrating the feasibility of the approach for applications such as medicine with insufficient data availability.

Pouya Soltani Zarrin, Christian Wenger

### Few-Shot Learning for Medical Image Classification

Rapid and accurate classification of medical images plays an important role in medical diagnosis. Current methods for medical image classification are based on machine learning, deep learning, and transfer learning. However, these methods may be time-consuming and unsuitable for small datasets. To address these limitations, we propose a novel method that combines few-shot learning with an attention mechanism. Our method uses end-to-end learning to avoid the manual feature extraction of classical machine learning, and few-shot learning is especially suited to small-dataset tasks, which means it performs better than traditional deep learning in that setting. In addition, our method makes full use of spatial and channel information, which enhances the representation power of the model. Furthermore, we adopt 1×1 convolution to enhance cross-channel information interaction. We then apply the model to the Brain Tumor medical dataset and compare it with the transfer learning method and the Dual Path Network. Our method achieves an accuracy of 92.44%, which is better than the above methods.

Aihua Cai, Wenxin Hu, Jun Zheng

### Adversarial Defense via Attention-Based Randomized Smoothing

Recent work has shown the effectiveness of randomized smoothing in adversarial defense. This paper presents a new understanding of randomized smoothing: features that are vulnerable to noise are not conducive to the model's prediction under adversarial perturbations. An enhanced defense called Attention-based Randomized Smoothing (ARS) is proposed. Building on a smoothed classifier, ARS designs a mixed attention module, which helps the model merge smoothed features with original features and pay more attention to robust features. The advantages of ARS are manifested in four ways: 1) Superior performance on both clean and adversarial samples. 2) No pre-processing at inference. 3) An explicable attention map. 4) Compatibility with other defense methods. Experimental results demonstrate that ARS achieves state-of-the-art defense against adversarial attacks on the MNIST and CIFAR-10 datasets, outperforming Salman’s defense when the attacks are limited to a maximum norm.

Xiao Xu, Shiyu Feng, Zheng Wang, Lizhe Xie, Yining Hu
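As background for the ARS abstract above, the plain randomized-smoothing prediction rule it builds on can be sketched as a majority vote over Gaussian-perturbed copies of the input. This sketch covers only the basic smoothed classifier, not the attention module of ARS; the toy base classifier and all parameter values are assumptions for illustration.

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma, n_samples, num_classes, rng):
    """Predict the class the base classifier returns most often under Gaussian noise."""
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(n_samples):
        noisy = x + rng.normal(0.0, sigma, size=x.shape)
        counts[base_classifier(noisy)] += 1
    return int(np.argmax(counts)), counts

def toy_classifier(x):
    """Hypothetical stand-in for a trained network: class 1 if the inputs sum to a positive value."""
    return int(np.sum(x) > 0.0)

rng = np.random.default_rng(0)
x = np.array([2.0, 2.0])
label, counts = smoothed_predict(toy_classifier, x, sigma=0.1,
                                 n_samples=200, num_classes=2, rng=rng)
```

Because the vote averages over noise, an input far from the decision boundary keeps its label under small perturbations, which is the intuition behind using smoothing as a defense.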

### Learning to Learn from Mistakes: Robust Optimization for Adversarial Noise

Sensitivity to adversarial noise hinders deployment of machine learning algorithms in security-critical applications. Although many adversarial defenses have been proposed, robustness to adversarial noise remains an open problem. The most compelling defense, adversarial training, requires a substantial increase in processing time and has been shown to overfit on the training data. In this paper, we aim to overcome these limitations by training robust models in low-data regimes and transferring adversarial knowledge between different models. We train a meta-optimizer which learns to robustly optimize a model using adversarial examples and is able to transfer the knowledge learned to new models, without the need to generate new adversarial examples. Experimental results show the meta-optimizer is consistent across different architectures and data sets, suggesting it is possible to automatically patch adversarial vulnerabilities.

Alex Serban, Erik Poll, Joost Visser

### Unsupervised Anomaly Detection with a GAN Augmented Autoencoder

Anomaly detection is the task of identifying samples that differ from the training data distribution. Several studies employ generative adversarial networks (GANs) as the main tool to detect anomalies using the rich contextual information that GANs provide. We propose an unsupervised GAN-based model combined with an autoencoder to detect anomalies. We then use the latent information obtained from the autoencoder, the internal representation of the discriminator, and the visual information of the generator to assign an anomaly score to samples. This anomaly score is used to discriminate anomalous samples from normal samples. The model was evaluated on benchmark datasets such as MNIST and CIFAR10, as well as on the specific domain of medical images using a public Leukemia dataset. The model achieved state-of-the-art performance in comparison with its counterparts in almost all of the experiments.

Laya Rafiee, Thomas Fevens

### An Efficient Blurring-Reconstruction Model to Defend Against Adversarial Attacks

Although deep neural networks have been widely applied in many fields, they can be easily fooled by adversarial examples, which are generated by adding imperceptible perturbations to natural images. Intuitively, traditional denoising methods can be used to remove the added noise, but useful original information is inevitably eliminated during denoising. Inspired by image super-resolution, we propose a novel blurring-reconstruction method to defend against adversarial attacks, which consists of two stages: blurring and reconstruction. In the blurring stage, an improved bilateral filter, which we call the Other Channels Assisted Bilateral Filter (OCABF), is first used to remove the perturbations, followed by bilinear-interpolation-based downsampling that resizes the image to a quarter of its size. Then, in the reconstruction stage, we design a deep super-resolution neural network called SrDefense-Net to recover the natural details. It enlarges the downsampled, blurred image to the same size as the original one and restores the lost information. Extensive experiments show that the proposed method outperforms state-of-the-art defense methods while requiring fewer training images.

Wen Zhou, Liming Wang, Yaohao Zheng

### EdgeAugment: Data Augmentation by Fusing and Filling Edge Maps

Data augmentation is an effective technique for improving the accuracy of neural networks. However, current data augmentation methods cannot generate sufficiently diverse training data. In this article, we overcome this problem by proposing a novel form of data augmentation that fuses and fills different edge maps. The edge fusion augmentation pipeline consists of four parts. We first use the Sobel operator to extract the edge maps from the training images. Then a simple integration strategy is used to integrate the edge maps extracted from different images. After that, we use an edge fusion GAN (Generative Adversarial Network) to fuse the integrated edge maps and synthesize new edge maps. Finally, an edge filling GAN is used to fill the edge maps to generate new training images. This augmentation pipeline can augment data effectively by making full use of the features of the training set. We verified our edge fusion augmentation pipeline on different datasets in combination with different edge integration strategies. Experimental results illustrate the superior performance of our pipeline compared to existing work. Moreover, as far as we know, we are the first to use GANs to augment data by fusing and filling features from multiple edge maps.

Bangfeng Xia, Yueling Zhang, Weiting Chen, Xiangfeng Wang, Jiangtao Wang
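The first two steps of the EdgeAugment pipeline described above (Sobel edge extraction, then integrating edge maps from different images) can be sketched with NumPy alone. The Sobel kernels are standard; the pixel-wise maximum used for integration is only a hypothetical stand-in, since the abstract does not specify the integration strategy, and the GAN stages are omitted.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter3x3(image, kernel):
    """Apply a 3x3 kernel by cross-correlation, keeping only fully covered pixels."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * image[i:i + h - 2, j:j + w - 2]
    return out

def sobel_edge_map(image):
    """Gradient magnitude of a 2-D grayscale image using the Sobel operator."""
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    return np.hypot(gx, gy)

def integrate_edge_maps(map_a, map_b):
    """Hypothetical integration strategy: keep the stronger edge response per pixel."""
    return np.maximum(map_a, map_b)

# Two toy images: one with a vertical step edge, one with a horizontal step edge.
vertical = np.zeros((5, 6))
vertical[:, 3:] = 1.0
horizontal = np.zeros((5, 6))
horizontal[2:, :] = 1.0

edges_v = sobel_edge_map(vertical)
edges_h = sobel_edge_map(horizontal)
combined = integrate_edge_maps(edges_v, edges_h)
```

In the paper's pipeline the integrated map would then be passed to the edge fusion GAN and the edge filling GAN; here `combined` simply contains the edges of both toy images.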

### Face Anti-spoofing with a Noise-Attention Network Using Color-Channel Difference Images

The wide deployment of face recognition systems has raised serious concerns about their security. One of the most prominent threats is the presentation attack, which attempts to fool the system in the absence of the real person. In this article, we propose a way of identifying spoof faces by using spoofing noise. Spoofing noise is introduced into the spoof image when an adversary tries to fool the face recognition system; it consists of imaging device noise and is affected by the spoofing environment. We regard it as a cue for detecting fake faces. We first address how to use color channel difference images to enhance the spoofing noise, and then introduce a self-adapting attention framework named Noise-Attention Network to learn spoofing features end-to-end. Experiments on benchmarks including CASIA-FASD, MSU-MFSD and Idiap Replay-Attack demonstrate the effectiveness of the proposed method. It yields results comparable with other current methods but has better robustness.

Yuanyuan Ren, Yongjian Hu, Beibei Liu, Yixiang Xie, Yufei Wang

### Variational Autoencoder with Global- and Medium Timescale Auxiliaries for Emotion Recognition from Speech

Unsupervised learning is based on the idea of self-organization to find hidden patterns and features in the data without the need for labels. Variational autoencoders (VAEs) are generative unsupervised learning models that create low-dimensional representations of the input data and learn by regenerating the same input from that representation. Recently, VAEs were used to extract representations from audio data, which possess not only content-dependent information but also speaker-dependent information such as gender, health status, and speaker ID. VAEs with two timescale variables were then introduced to disentangle these two kinds of information from each other. Our approach introduces a third, i.e. medium timescale into a VAE. So instead of having only a global and a local timescale variable, this model holds a global, a medium, and a local variable. We tested the model on three downstream applications: speaker identification, gender classification, and emotion recognition, where each hidden representation performed better on some specific tasks than the other hidden representations. Speaker ID and gender were best reported by the global variable, while emotion was best extracted when using the medium. Our model achieves excellent results exceeding state-of-the-art models on speaker identification and emotion regression from audio.

Hussam Almotlak, Cornelius Weber, Leyuan Qu, Stefan Wermter

### Improved Classification Based on Deep Belief Networks

For better classification, generative models are used to initialize the model and extract features before training a classifier. Typically, separate unsupervised and supervised learning problems are solved. Generative restricted Boltzmann machines and deep belief networks are widely used for unsupervised learning. We developed several supervised models based on deep belief networks in order to improve this two-phase strategy. Modifying the loss function to account for expectation with respect to the underlying generative model, introducing weight bounds, and multi-level programming are all applied in model development. The proposed models capture both unsupervised and supervised objectives effectively. The computational study verifies that our models perform better than the two-phase training approach. In addition, we conduct an ablation study to examine how a different part of our model and a different mix of training samples affect the performance of our models.

Jaehoon Koo, Diego Klabjan

### Temporal Anomaly Detection by Deep Generative Models with Applications to Biological Data

One approach to anomaly detection is to use a partly disentangled representation of the latent space of a generative model. In this study, generative adversarial networks (GANs) are used as the normal data generator, and an additional encoder is trained to map data to the latent space. A data anomaly can then be detected by a reconstruction error and a position in the latent space. If the latent space is disentangled (in the sense that some latent variables are interpretable and can characterize the data), the anomaly is also characterized by the mapped position in the latent space. The present study proposes a method to characterize temporal anomalies in time series using Causal InfoGAN, proposed by Kurutach et al., to disentangle a state space of the dynamics of time-series data. Temporal anomalies are quantified by the transitions in the acquired state space. The proposed method is applied to a four-dimensional biological dataset: morphological data of a genetically manipulated embryo. Computer experiments are conducted on three-dimensional data of the cell (nuclear) division dynamics in early embryonic development of C. elegans, leading to the detection of morphological and temporal anomalies caused by the knockdown of lethal genes.

Takaya Ueda, Yukako Tohsato, Ikuko Nishikawa

### Inferring, Predicting, and Denoising Causal Wave Dynamics

The novel DISTributed Artificial neural Network Architecture (DISTANA) is a generative, recurrent graph convolution neural network. It implements a grid or mesh of locally parameterizable laterally connected network modules. DISTANA is specifically designed to identify the causality behind spatially distributed, non-linear dynamical processes. We show that DISTANA is very well-suited to denoise data streams, given that re-occurring patterns are observed, significantly outperforming alternative approaches, such as temporal convolution networks and ConvLSTMs, on a complex spatial wave propagation benchmark. It produces stable and accurate closed-loop predictions even over hundreds of time steps. Moreover, it is able to effectively filter noise—an ability that can be improved further by applying denoising autoencoder principles or by actively tuning latent neural state activities retrospectively. Results confirm that DISTANA is ready to model real-world spatio-temporal dynamics such as brain imaging, supply networks, water flow, or soil and weather data patterns.

Matthias Karlbauer, Sebastian Otte, Hendrik P. A. Lensch, Thomas Scholten, Volker Wulfmeyer, Martin V. Butz

### PART-GAN: Privacy-Preserving Time-Series Sharing

In this paper, we provide a practical privacy-preserving generative model for time series data augmentation and sharing, called PART-GAN. Our model enables the local data curator to provide a freely accessible public generative model derived from original data for cloud storage. Compared with existing approaches, PART-GAN has three key advantages: (1) It enables the generation of an unlimited amount of synthetic time series data under the guidance of a given classification label and addresses the issues of incompleteness and temporal irregularity. (2) It provides a robust privacy guarantee that satisfies differential privacy for time series data augmentation and sharing. (3) It addresses the trade-offs between utility and privacy by applying optimization strategies. We evaluate and report the utility and efficacy of PART-GAN through extensive empirical evaluations on real-world health/medical datasets. Even at a higher level of privacy protection, our method outperforms GAN with ordinary perturbation. It achieves performance similar to GAN without perturbation in terms of inception score, machine learning score similarity, and distance-based evaluations.

Shuo Wang, Carsten Rudolph, Surya Nepal, Marthie Grobler, Shangyu Chen

### EvoNet: A Neural Network for Predicting the Evolution of Dynamic Graphs

Neural networks for structured data like graphs have been studied extensively in recent years. To date, the bulk of research activity has focused mainly on static graphs. However, most real-world networks are dynamic since their topology tends to change over time. Predicting the evolution of dynamic graphs is a task of high significance in the area of graph mining. Despite its practical importance, the task has not been explored in depth so far, mainly due to its challenging nature. In this paper, we propose a model that predicts the evolution of dynamic graphs. Specifically, we use a graph neural network along with a recurrent architecture to capture the temporal evolution patterns of dynamic graphs. Then, we employ a generative model which predicts the topology of the graph at the next time step and constructs a graph instance that corresponds to that topology. We evaluate the proposed model on several artificial datasets following common network evolving dynamics, as well as on real-world datasets. Results demonstrate the effectiveness of the proposed model.

Changmin Wu, Giannis Nikolentzos, Michalis Vazirgiannis

### Facial Expression Recognition Method Based on a Part-Based Temporal Convolutional Network with a Graph-Structured Representation

Facial expressions are controlled by facial muscles and can be regarded as appearance and shape variations in key parts. A key challenge in facial expression recognition is capturing effective information from a facial image. In this paper, we propose a basic graph contour that is based on key parts for facial expression recognition. Each node on the graph contour represents a landmark, and each edge represents the connection between the two selected nodes. To further investigate the graph representation and to make the graphs more distinctive, we use a Gabor filter to extract appearance variations around the graph nodes while applying an affine transformation to capture the shape variations from graphs without expression in graphs with expression. Then, to serve as an efficient network for processing in which the graph extracts the appearance and shape representations, we introduce the temporal convolutional network (TCN). Finally, we propose a part-based temporal convolutional network (PTCN) that emphasizes the key facial parts. The experimental results demonstrate that this method realizes significant improvements over state-of-the-art methods utilizing three widely used facial databases: Oulu-CASIA, CK+, and MMI.

Lei Zhong, Changmin Bai, Jianfeng Li, Tong Chen, Shigang Li

### Generating Facial Expressions Associated with Text

How will you react to the next post that you are going to read? In this paper we propose a learning system that is able to artificially alter the picture of a face in order to generate the emotion that is associated with a given input text. The face generation procedure is a function of further information about the considered person, either given (topics of interest) or automatically estimated from the provided picture (age, sex). In particular, two Convolutional Networks are trained to predict age and sex, while two other Recurrent Neural Network-based models predict the topic and the dominant emotion in the input text. First Order Logic (FOL)-based functions are introduced to mix the outcome of the four neural models and to decide which emotion to generate, following the theory of T-Norms. Finally, the same theory is exploited to build a neural generative model of facial expressions, which is used to create the final face. Experiments are performed to assess the quality of the information extraction process and to show the outcome of the generative network.

Lisa Graziani, Stefano Melacci, Marco Gori

### Bilinear Fusion of Commonsense Knowledge with Attention-Based NLI Models

We consider the task of incorporating real-world commonsense knowledge into deep Natural Language Inference (NLI) models. Existing external knowledge incorporation methods are limited to lexical-level knowledge and lack generalization across NLI models, datasets, and commonsense knowledge sources. To address these issues, we propose a novel NLI model-independent neural framework, BiCAM. BiCAM incorporates real-world commonsense knowledge into NLI models. Combined with convolutional feature detectors and bilinear feature fusion, BiCAM provides a conceptually simple mechanism that generalizes well. Quantitative evaluations with two state-of-the-art NLI baselines on SNLI and SciTail datasets in conjunction with ConceptNet and Aristo Tuple KGs show that BiCAM considerably improves the accuracy of the incorporated NLI baselines. For example, our BiECAM model, an instance of BiCAM, on the challenging SciTail dataset, improves the accuracy of incorporated baselines by 7.0% with ConceptNet, and 8.0% with Aristo Tuple KG.

Amit Gajbhiye, Thomas Winterbottom, Noura Al Moubayed, Steven Bradley

### Neural-Symbolic Relational Reasoning on Graph Models: Effective Link Inference and Computation from Knowledge Bases

The recent developments and growing interest in neural-symbolic models have shown that hybrid approaches can offer richer models for Artificial Intelligence. The integration of effective relational learning and reasoning methods is one of the key challenges in this direction, as neural learning and symbolic reasoning offer complementary characteristics that can benefit the development of AI systems. Relational labelling or link prediction on knowledge graphs has become one of the main problems in deep learning-based natural language processing research. Moreover, other fields which make use of neural-symbolic techniques may also benefit from such research endeavours. There have been several efforts towards the identification of missing facts from existing ones in knowledge graphs. Two lines of research try to predict knowledge relations between two entities by considering all known facts connecting them or several paths of facts connecting them. We propose a neural-symbolic graph neural network which applies learning over all the paths by feeding the model with the embedding of the minimal subset of the knowledge graph containing such paths. By learning to produce representations for entities and facts corresponding to word embeddings, we show how the model can be trained end-to-end to decode these representations and infer relations between entities in a multitask approach. Our contribution is two-fold: a neural-symbolic methodology that leverages the resolution of relational inference in large graphs, and a demonstration that such a neural-symbolic model is more effective than path-based approaches.

Henrique Lemos, Pedro Avelar, Marcelo Prates, Artur Garcez, Luís Lamb

### Tell Me Why You Feel That Way: Processing Compositional Dependency for Tree-LSTM Aspect Sentiment Triplet Extraction (TASTE)

Sentiment analysis has transitioned from classifying the sentiment of an entire sentence to providing the contextual information of what targets exist in a sentence, what sentiment the individual targets have, and what the causal words responsible for that sentiment are. However, this has led to elaborate requirements being placed on the datasets needed to train neural networks on the joint triplet task of determining an entity, its sentiment, and the causal words for that sentiment. Requiring this kind of data for training systems is problematic, as they suffer from stacking subjective annotations and domain over-fitting leading to poor model generalisation when applied in new contexts. These problems are also likely to be compounded as we attempt to jointly determine additional contextual elements in the future. To mitigate these problems, we present a hybrid neural-symbolic method utilising a Dependency Tree-LSTM’s compositional sentiment parse structure and complementary symbolic rules to correctly extract target-sentiment-cause triplets from sentences without the need for triplet training data. We show that this method has the potential to perform in line with state-of-the-art approaches while also simplifying the data required and providing a degree of interpretability through the Tree-LSTM.

Alexander Sutherland, Suna Bensch, Thomas Hellström, Sven Magg, Stefan Wermter

### SOM-Based System for Sequence Chunking and Planning

In this paper we present a connectionist architecture called C-block combining several powerful and cognitively relevant features. It can learn sequential dependencies in incoming data and predict probability distributions over possible next inputs, notice repeatedly occurring sequences, automatically detect sequence boundaries (based on surprise in prediction) and represent sequences declaratively as chunks/plans for future execution or replay. It can associate plans with reward, and also with their effects on the system state. It also supports plan inference from an observed sequence of behaviours: it can recognize possible plans, along with their likely intended effects and expected reward, and can revise these inferences as the sequence unfolds. Finally, it implements goal-driven behaviour, by finding and executing a plan that most effectively reduces the difference between the current system state and the agent’s desired state (goal). C-block is based on modified self-organizing maps that allow fast learning, approximate queries and Bayesian inference.

Martin Takac, Alistair Knott, Mark Sagar

### Bilinear Models for Machine Learning

In this work we define and analyze bilinear models which replace the conventional linear operation used in many building blocks of machine learning (ML). The main idea is to devise ML algorithms which are adapted to the objects they treat. In the case of monochromatic images, we show that the bilinear operation exploits the structure of the image better than the conventional linear operation, which ignores the spatial relationship between the pixels. This translates into a significantly smaller number of parameters required to yield the same performance. We show numerical examples of classification on the MNIST data set.

Tayssir Doghri, Leszek Szczecinski, Jacob Benesty, Amar Mitiche
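
To make the parameter saving concrete, the following is a minimal sketch that assumes the simplest bilinear form, a score `u^T X v` computed directly on the image matrix `X`. The abstract does not spell out the exact formulation used in the paper, so the function names and this particular form are illustrative assumptions only.

```python
import numpy as np

def linear_score(X, w):
    """Conventional linear operation: flatten the image, then take a dot product."""
    return float(w @ X.reshape(-1))

def bilinear_score(X, u, v):
    """Assumed bilinear operation: u weights the rows of X, v weights the columns."""
    return float(u @ X @ v)

rng = np.random.default_rng(0)
H, W = 28, 28                      # MNIST-sized monochromatic image
X = rng.standard_normal((H, W))    # random stand-in for an image
u = rng.standard_normal(H)
v = rng.standard_normal(W)

# The bilinear form is a linear form whose weight matrix is the rank-one product u v^T.
w_equiv = np.outer(u, v).reshape(-1)

n_linear = H * W      # parameters of an unconstrained linear operation
n_bilinear = H + W    # parameters of the bilinear operation
print(n_linear, n_bilinear)
```

Under this assumption, one output unit needs 56 parameters instead of 784 for a 28×28 image, at the price of restricting the weights to a rank-one structure.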

### Enriched Feature Representation and Combination for Deep Saliency Detection

One of the most challenging issues in visual saliency detection is to discover and integrate meaningful features through deep neural networks. A saliency detection model should be carefully designed to extract sufficient features from different levels and reorganize them into the final prediction. In this paper, we propose an efficient saliency detection framework by introducing multi-scale representation and multi-level combination to deep convolutional neural networks. The main idea of our proposed model is to optimize intra-level feature extraction and inter-level feature combination, so that both saliency semantics and object details can be correctly preserved in the final saliency maps. The model utilizes parallel dilated convolutions and pyramid pooling structures to enhance local details and acquire multi-scale feature representation. Feature maps of different resolutions are integrated by performing hierarchical combination in the encoder and decoder parts respectively. As a result, the model can better retain detail information during feature extraction and locate salient regions for saliency map recovery. Experimental results show that our model achieves state-of-the-art performance on several representative datasets.

Lecheng Zhou, Xiaodong Gu

### Spectral Graph Reasoning Network for Hyperspectral Image Classification

Convolutional neural networks (CNNs) have achieved remarkable performance in hyperspectral image (HSI) classification over the last few years. Despite the progress that has been made, the rich and informative spectral information of HSI has been largely underutilized by existing methods, which employ convolutional kernels with a limited receptive field in the spectral domain. To address this issue, we propose a spectral graph reasoning network (SGR) learning framework comprising two crucial modules: 1) a spectral decoupling module which unpacks and casts multiple spectral embeddings into a unified graph whose nodes correspond to individual spectral feature channels in the embedding space; the graph performs interpretable reasoning to aggregate and align spectral information to guide learning spectral-specific graph embeddings at multiple contextual levels; and 2) a spectral ensembling module which explores the interactions and interdependencies across the graph embedding hierarchy via a novel recurrent graph propagation mechanism. Experiments on two HSI datasets demonstrate that the proposed architecture improves classification accuracy over existing methods by a sizable margin.

Huiling Wang

### Salient Object Detection with Edge Recalibration

Salient Object Detection (SOD) based on Convolutional Neural Networks (CNNs) has been widely studied recently. How to maintain a complete and clear object boundary structure is still a key issue. Existing works that utilize edge information have already alleviated this issue to some extent. However, these methods extract boundary features indiscriminately, which may weaken useful edge information and mislead edge construction. To address this problem, we present an Edge Recalibration Network (ERN) model for image-based SOD to exploit edge-guided features effectively. Specifically, a progressive Fully Convolutional Network (FCN) for SOD is adopted to incorporate multi-scale and multi-level features. Besides, to locate the edge position and preserve the boundary features accurately, we propose an edge enhancement module with pixel-wise semantic-edge integration and channel-wise feature recalibration. Based on pixel-wise semantic-edge integration, the semantic features and boundary features are integrated into the holistic feature maps. Based on channel-wise feature recalibration, the boundary features selectively recalibrate salient semantic features along the channel dimension according to the similarity of boundary features and salient semantic features, aiming to enhance useful features and suppress useless ones. Experimental results on five popular benchmark datasets show that the proposed ERN model outperforms other state-of-the-art methods under different evaluation metrics.

Zhenshan Tan, Yikai Hua, Xiaodong Gu

### Multi-Scale Cross-Modal Spatial Attention Fusion for Multi-label Image Recognition

Multi-label image recognition aims to jointly predict multiple tags for an image. Despite the great progress achieved, existing methods still have two limitations: 1) they cannot accurately locate the object regions due to the lack of adequate supervision information or semantic guidance; 2) they cannot effectively identify the target categories of small objects because they employ only the high-level features of a deep CNN. In this paper, we propose a Multi-Scale Cross-Modal Spatial Attention Fusion (MCSAF) network that accurately locates more informative regions by introducing a spatial attention module and effectively recognizes target classes of different scales with multi-scale cross-modal feature fusion. Furthermore, we develop an adaptive graph convolutional network (Adaptive-GCN) to capture the complex correlations among labels in depth. Empirical studies on benchmark datasets validate the superiority of our proposed model over state-of-the-art methods.

Junbing Li, Changqing Zhang, Xueman Wang, Ling Du

### A New Efficient Finger-Vein Verification Based on Lightweight Neural Network Using Multiple Schemes

Existing deep learning-based finger-vein algorithms tend to use large-scale neural networks. From the perspective of computational complexity, this is not conducive to practical applications. Besides, in our opinion, since finger-vein images often have relatively simple textures and are small in size, it is not economical to use large-scale neural networks. Inspired by the increasing accuracy of lightweight neural networks on ImageNet, we introduce the lightweight neural network ShuffleNet V2 as a backbone to construct a basic pipeline for finger-vein verification. To customize the network for this application, we propose schemes to improve it in terms of data input, network structure, and loss function design. Experimental results on three public databases demonstrate the effectiveness of the proposed model.

Haocong Zheng, Yongjian Hu, Beibei Liu, Guang Chen, Alex C. Kot

### SU-Net: An Efficient Encoder-Decoder Model of Federated Learning for Brain Tumor Segmentation

Using deep learning for semantic segmentation of medical images is a popular topic in smart healthcare. Training an efficient deep learning model presupposes a large number of annotated medical images. Most medical images are scattered across hospitals or research institutions, and professionals such as doctors rarely have enough time to label the images. Besides, due to the constraints of privacy protection regulations like GDPR, sharing data directly between multiple institutions is prohibited. To overcome these obstacles, we propose an efficient federated learning model, SU-Net, for brain tumor segmentation. We introduce an inception module and a dense block into the standard U-Net to form our SU-Net with multi-scale receptive fields and information reuse. We conduct experiments on the LGG (Low-Grade Glioma) Segmentation dataset “Brain MRI Segmentation” on Kaggle. The results show that, in the non-federated scenario, SU-Net achieves an AUC (Area Under Curve, which measures classification accuracy) of 99.7% and a DSC (Dice Similarity Coefficient, which measures segmentation accuracy) of 78.5%, which are remarkably higher than those of the state-of-the-art semantic segmentation model DeepLabv3+ and the classical model U-Net dedicated to semantic segmentation of medical images. In the federated scenario, SU-Net still outperforms the baselines.

Liping Yi, Jinsong Zhang, Rui Zhang, Jiaqi Shi, Gang Wang, Xiaoguang Liu
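
The DSC mentioned in this abstract can be illustrated with a short sketch. The definition below is the standard Dice coefficient on binary masks; the toy masks and the convention of returning 1.0 for two empty masks are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice Similarity Coefficient 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / denom)

# Hypothetical 2x3 masks: 3 predicted pixels, 3 true pixels, 2 in common.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
dsc = dice_coefficient(pred_mask, true_mask)
print(round(dsc, 4))
```

Because DSC counts only foreground overlap, it is typically much lower than pixel-level measures when the tumor region is small, which is consistent with the gap between the two reported figures.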

### Synthesis of Registered Multimodal Medical Images with Lesions

The collection and annotation of medical image data have always been a challenge in many data-driven medical image processing tasks, especially for registered multimodal medical image data. This can be effectively alleviated by utilizing image synthesis technology. However, directly synthesized medical images generated by current methods usually have unreasonable structures or contours and uncontrollable lesions. In this paper, we propose a new method to synthesize registered multimodal medical images from a random normal distribution matrix based on Generative Adversarial Networks. Besides, the corresponding lesions can be generated efficiently based on the selected lesion labels. We performed validation experiments on multiple public datasets to verify the effectiveness of the synthetic lesions and the usability of the synthetic data. The results show that our synthetic data can be used as pre-training data or augmentation data in medical image intelligent processing tasks to greatly improve the generalization ability of the model.

Yili Qu, Wanqi Su, Xuan Lv, Chufu Deng, Ying Wang, Yutong Lu, Zhiguang Chen, Nong Xiao

### ACE-Net: Adaptive Context Extraction Network for Medical Image Segmentation

It remains a challenging task to segment medical images due to the diversity of their structures. Although some state-of-the-art approaches have been proposed, the following two problems have not been fully explored: the redundant use of low-level features, and the lack of effective contextual modules to model long-range dependencies. In this paper, we propose ACE-Net, a model that combines two newly proposed modules: the Joint Attention Upsample (JAU) module and the Context Similarity Module (CSM). We extend skip connections by introducing an attention mechanism within the JAU module, which generates guidance information to weight low-level features using high-level features. We then introduce an affinity matrix, based on a self-attention mechanism, into the CSM to optimize the long-range dependencies adaptively. Furthermore, our ACE-Net adaptively constructs multi-scale contextual representations with multiple well-designed Context Similarity Modules (CSMs), which are used in parallel in the subsequent stage. Based on the evaluation on two public medical image datasets (EndoScene and RIM-ONE-R1), our network demonstrates significant improvements in segmentation performance compared to other similar methods, as context information is extracted more effectively and more richly.

Tuo Leng, Yu Wang, Ying Li, Zhijie Wen

### Wavelet U-Net for Medical Image Segmentation

Biomedical image segmentation plays an increasingly important role in medical diagnosis. However, it remains a challenging task to segment medical images due to the diversity of their structures. Convolutional networks (CNNs) commonly use pooling to enlarge the receptive field, which usually results in irreversible information loss. In order to solve this problem, we reconsider alternatives to the pooling operation. In this paper, we embed the wavelet transform into the U-Net architecture to perform down-sampling and up-sampling, and we call the resulting model wavelet U-Net (WU-Net). Specifically, in the encoder module, we use the discrete wavelet transform (DWT) to replace the pooling operation and reduce the resolution of the image, and we use the inverse wavelet transform (IWT) to gradually restore the resolution in the decoder module. Besides, we use a Densely Cross-level Connection strategy to encourage feature re-use and to enhance the complementarity between cross-level information. Furthermore, in the Attention Feature Fusion (AFF) module, we introduce the channel attention mechanism to select useful feature maps, which can effectively improve the segmentation performance of the network. We evaluated this model on the digital retinal images for vessel extraction (DRIVE) dataset and the child heart and health study (CHASEDB1) dataset. The results show that the proposed method outperforms the classic U-Net method and other state-of-the-art methods on both datasets.

Ying Li, Yu Wang, Tuo Leng, Wen Zhijie
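
The contrast between pooling and wavelet down-sampling can be sketched as follows. The abstract does not say which wavelet the authors use, so this example assumes a single-level orthonormal Haar transform written in plain NumPy, with illustrative sub-band names. It halves the resolution like 2×2 pooling, yet the input can be restored exactly from the four sub-bands.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar transform of an array with even height and width."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]   # bottom-left, bottom-right
    ll = (a + b + c + d) / 2.0            # low-pass approximation
    d_cols = (a - b + c - d) / 2.0        # detail across columns
    d_rows = (a + b - c - d) / 2.0        # detail across rows
    d_diag = (a - b - c + d) / 2.0        # diagonal detail
    return ll, d_cols, d_rows, d_diag

def haar_idwt2(ll, d_cols, d_rows, d_diag):
    """Inverse of haar_dwt2: rebuild the full-resolution array."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    x[0::2, 0::2] = (ll + d_cols + d_rows + d_diag) / 2.0
    x[0::2, 1::2] = (ll - d_cols + d_rows - d_diag) / 2.0
    x[1::2, 0::2] = (ll + d_cols - d_rows - d_diag) / 2.0
    x[1::2, 1::2] = (ll - d_cols - d_rows + d_diag) / 2.0
    return x

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))       # random stand-in for a feature map
bands = haar_dwt2(image)
restored = haar_idwt2(*bands)
print(bands[0].shape, np.allclose(restored, image))
```

Max or average pooling keeps only one value per 2×2 block, whereas the four Haar sub-bands together retain all of the information at half the spatial resolution.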

### Character-Based LSTM-CRF with Semantic Features for Chinese Event Element Recognition

Event element recognition is a significant task in event-based information extraction. In this paper, we propose an event element recognition model based on character-level embedding with semantic features. By extracting character-level features, the proposed model can capture more information about words. Our results show that jointly using character Convolutional Neural Networks (CNN) and character Bi-directional Long Short-Term Memory Networks (Bi-LSTM) is superior to a single character-level model. In addition, adding semantic features such as POS (part-of-speech) and DP (dependency parsing) tends to improve recognition. We evaluated different methods on CEC (Chinese Emergency Corpus), and the experimental results show that our model achieves good performance, with an F value of 77.17% for element recognition.

Wei Liu, Yusen Wu, Lei Jiang, Jianfeng Fu, Weimin Li

### Sequence Prediction Using Spectral RNNs

Fourier methods have a long and proven track record as an excellent tool in data processing. As memory and computational constraints gain importance in embedded and mobile applications, we propose to combine Fourier methods and recurrent neural network architectures. The short-time Fourier transform allows us to efficiently process multiple samples at a time. Additionally, weight reductions through low-pass filtering are possible. We predict time series data drawn from the chaotic Mackey-Glass differential equation as well as real-world power load and motion capture data. (Source code available at https://github.com/v0lta/Spectral-RNN ).

Moritz Wolter, Jürgen Gall, Angela Yao
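
As a rough illustration of the two ideas in this abstract, the sketch below computes a short-time Fourier transform with NumPy and then truncates the high-frequency bins. It is not the authors' implementation (their code is at the repository linked above); the window size, hop length, test signal, and number of kept bins are arbitrary assumptions.

```python
import numpy as np

def stft_frames(signal, window_size, hop):
    """Short-time Fourier transform: one row of rfft coefficients per window."""
    window = np.hanning(window_size)
    n_frames = 1 + (len(signal) - window_size) // hop
    frames = np.stack([
        signal[i * hop: i * hop + window_size] * window
        for i in range(n_frames)
    ])
    return np.fft.rfft(frames, axis=1)

t = np.arange(256) / 256.0
series = np.sin(2 * np.pi * 5 * t)            # toy time series

spectrum = stft_frames(series, window_size=32, hop=16)
# Each frame summarises 32 samples, so a recurrent cell fed with frames
# takes 15 steps here instead of 256 sample-by-sample steps.
low_pass = spectrum[:, :8]                    # keep only the 8 lowest frequency bins
print(spectrum.shape, low_pass.shape)
```

Dropping the upper bins shrinks the input dimension of whatever layer consumes the frames, which is one way low-pass filtering can reduce the number of weights.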

### Attention Based Mechanism for Load Time Series Forecasting: AN-LSTM

Smart grids collect high volumes of data that contain valuable information about energy consumption patterns. By forecasting the energy demand, the data can be utilized for planning future strategies, including generation capacity and economic planning. In recent years, deep learning has gained significant importance for energy load time-series forecasting applications. In this context, the current research work proposes an attention-based deep learning model to forecast energy demand. The proposed approach first applies an attention mechanism to extract relevant deriving segments of the input load series at each timestamp and assign weights to them. Subsequently, the extracted segments are fed to the long short-term memory network prediction model. In this way, the proposed model provides support for handling big-data temporal sequences by extracting complex hidden features of the data. The experimental evaluation of the proposed approach is conducted on three seasonally segmented datasets of UT Chandigarh, India. Two popular performance measures (RMSE and MAPE) are used to compare the prediction results of the proposed approach with state-of-the-art prediction models (SVR and LSTM). The comparison results show that the proposed approach outperforms other benchmark prediction models and has the lowest MAPE (7.11%).

Jatin Bedi
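
For readers unfamiliar with the two measures named in this abstract, here is a small sketch using their standard definitions. The load values are invented for illustration and have no connection to the Chandigarh data.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the load values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; undefined if a true value is 0."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

# Hypothetical actual and forecast loads.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]

error_rmse = rmse(actual, forecast)
error_mape = mape(actual, forecast)
print(round(error_rmse, 3), round(error_mape, 3))
```

MAPE is scale-free, which makes it convenient for comparing seasons with different load levels, while RMSE penalises large absolute deviations more heavily.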

### DartsReNet: Exploring New RNN Cells in ReNet Architectures

We present new Recurrent Neural Network (RNN) cells for image classification using a Neural Architecture Search (NAS) approach called DARTS. We are interested in the ReNet architecture, which is an RNN-based approach presented as an alternative to convolutional and pooling steps. ReNet can be defined using any standard RNN cells, such as LSTM and GRU. One limitation is that standard RNN cells were designed for one-dimensional sequential data and not for two dimensions, as is the case for image classification. We overcome this limitation by using DARTS to find new cell designs. We compare our results with ReNet that uses GRU and LSTM cells. The cells we found outperform the standard RNN cells on CIFAR-10 and SVHN. The improvements on SVHN indicate generalizability, as we derived the RNN cell designs from CIFAR-10 without performing a new cell search for SVHN. (The source code of our approach and experiments is available at https://github.com/LuckyOwl95/DartsReNet/ .)

Brian B. Moser, Federico Raue, Jörn Hees, Andreas Dengel

We present a study of multi-modal freehand gesture recognition relying on three sensory modalities. The modalities are RGB images, depth data, and acceleration data from an IMD attached to the hand. Based on a new self-recorded dataset, we initially establish the ability of a deep Long Short-Term Memory (LSTM) network to correctly classify individual data streams from each modality. Notably, classifying the IMD stream alone generates very good results already. In addition, we investigate two different strategies of multi-modal fusion, since there is no agreement in the literature as to which strategy is preferable. Combining the modalities leads to better recognition performance. Most importantly, fusion considerably improves ahead-of-time classification, i.e., gesture class estimates before sequences are completed, for classes that are difficult to classify on their own.

Monika Schak, Alexander Gepperth

### Recurrent Neural Network Learning of Performance and Intrinsic Population Dynamics from Sparse Neural Data

Recurrent Neural Networks (RNNs) are popular models of brain function. The typical training strategy is to adjust their input-output behavior so that it matches that of the biological circuit of interest. Even though this strategy ensures that the biological and artificial networks perform the same computational task, it does not guarantee that their internal activity dynamics match. This suggests that the trained RNNs might end up performing the task employing a different internal computational mechanism. In this work, we introduce a novel training strategy that allows learning not only the input-output behavior of an RNN but also its internal network dynamics. We test the proposed method by training an RNN to simultaneously reproduce internal dynamics and output signals of a physiologically-inspired neural model of motor cortical and muscle activity dynamics. Remarkably, we show that the reproduction of the internal dynamics is successful even when the training algorithm relies on the activities of a small subset of neurons sampled from the biological network. Furthermore, we show that training the RNNs with this method significantly improves their generalization performance. Overall, our results suggest that the proposed method is suitable for building powerful functional RNN models, which automatically capture important computational properties of the biological circuit of interest from sparse neural recordings.

Alessandro Salatiello, Martin A. Giese

### Backmatter
