
The proceedings set LNCS 12396 and 12397 constitute the proceedings of the 29th International Conference on Artificial Neural Networks, ICANN 2020, held in Bratislava, Slovakia, in September 2020.*

The total of 139 full papers presented in these proceedings was carefully reviewed and selected from 249 submissions. They were organized in 2 volumes focusing on topics such as adversarial machine learning, bioinformatics and biosignal analysis, cognitive models, neural network theory and information theoretic learning, and robotics and neural models of perception and action.

*The conference was postponed to 2021 due to the COVID-19 pandemic.

### Fine-Grained Channel Pruning for Deep Residual Neural Networks

Pruning residual neural networks is a challenging task due to the constraints induced by cross-layer connections. Many existing approaches assign channels connected by skip connections to the same group and prune them simultaneously, limiting the pruning ratio on those troublesome filters. Instead, we propose a Fine-grained Channel Pruning (FCP) method that allows any channels to be pruned independently. To avoid the misalignment problem between convolution and skip connection, we always keep the residual addition operations alive. Thus we can obtain a novel efficient residual architecture by removing any unimportant channels without the alignment constraint. Besides classification, we further apply FCP on residual models for image super-resolution, which is a low-level vision task. Extensive experimental results show that FCP can achieve better performance than other state-of-the-art methods in terms of parameter and computation cost. Notably, on CIFAR-10, FCP reduces more than 78% FLOPs on ResNet-56 with no accuracy drop. Moreover, it achieves more than 48% FLOPs reduction on MSRResNet with negligible performance degradation.

Siang Chen, Kai Huang, Dongliang Xiong, Bowen Li, Luc Claesen

### A Lightweight Fully Convolutional Neural Network of High Accuracy Surface Defect Detection

Surface defect detection is an indispensable step in the production process. Recent deep learning research has focused primarily on improving accuracy. However, such models are difficult to apply in real settings because of their huge number of parameters and strict hardware requirements. In this paper, a lightweight fully convolutional neural network, named LFCSDD, is proposed. Our model has at least 11x fewer parameters than the baselines and achieves accuracies of 99.72% and 98.74% on the benchmark defect datasets DAGM 2007 and KolektorSDD, respectively, outperforming all the baselines. In addition, our model can process images of different sizes, which is verified on RSDDs with an accuracy of 97.00%.

Yajie Li, Yiqiang Chen, Yang Gu, Jianquan Ouyang, Jiwei Wang, Ni Zeng

### Detecting Uncertain BNN Outputs on FPGA Using Monte Carlo Dropout Sampling

Monte Carlo dropout sampling (MC Dropout), which approximates a Bayesian Neural Network, is useful for measuring the uncertainty in the output of a Deep Neural Network (DNN). However, because it takes a long time to sample a DNN’s output to calculate its distribution, it is difficult to apply to edge computing, where resources are limited. Thus, this research proposes a method of reducing the sampling time required for MC Dropout in edge computing by parallelizing the calculation circuit using an FPGA. To apply MC Dropout on an FPGA, this paper shows an efficient implementation that binarizes the neural network, simplifies dropout computation by pre-dropout, and localizes parallel circuits. The proposed method was evaluated using the MNIST dataset and a dataset of captured satellite images of ships at sea. As a result, it was possible to reject approximately 60% of the data that the model had not learned as “uncertain” on an image classification problem on an FPGA. Furthermore, for 20 units in parallel, the increase in circuit scale was only 2–3 times that of non-parallelized circuits. In terms of inference speed, parallelization of the dropout circuits achieved a speedup of up to 3.62 times.

Tomoyuki Myojin, Shintaro Hashimoto, Naoki Ishihama
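
The sampling-and-rejection idea in the abstract above can be illustrated in plain software. The following is a minimal sketch, not the authors' binarized FPGA design: it assumes a toy linear classifier with dropout on the inputs, and the names `mc_dropout_predict`, `predictive_entropy`, `classify_or_reject` and the entropy threshold are hypothetical choices made for illustration.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward_with_dropout(x, weights, drop_prob, rng):
    """One stochastic forward pass of a toy linear classifier (assumed model)."""
    keep = 1.0 - drop_prob
    # Inverted dropout: zero each input with probability drop_prob, rescale the rest.
    dropped = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    logits = [sum(w * xi for w, xi in zip(row, dropped)) for row in weights]
    return softmax(logits)

def mc_dropout_predict(x, weights, drop_prob=0.5, n_samples=20, seed=0):
    """Average the class probabilities over n_samples stochastic passes."""
    rng = random.Random(seed)
    samples = [forward_with_dropout(x, weights, drop_prob, rng)
               for _ in range(n_samples)]
    return [sum(s[c] for s in samples) / n_samples for c in range(len(weights))]

def predictive_entropy(probs):
    """Entropy of the averaged prediction, used here as the uncertainty score."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def classify_or_reject(x, weights, threshold, **kwargs):
    """Return the predicted class index, or "uncertain" above the threshold."""
    mean = mc_dropout_predict(x, weights, **kwargs)
    if predictive_entropy(mean) > threshold:
        return "uncertain"
    return max(range(len(mean)), key=mean.__getitem__)
```

Each of the `n_samples` passes is independent of the others, which is what makes the sampling loop a natural candidate for the parallel circuits the abstract describes.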

### Neural Network Compression via Learnable Wavelet Transforms

Wavelets are well known for data compression, yet have rarely been applied to the compression of neural networks. This paper shows how the fast wavelet transform can be used to compress linear layers in neural networks. Linear layers still occupy a significant portion of the parameters in recurrent neural networks (RNNs). Through our method, we can learn both the wavelet bases and corresponding coefficients to efficiently represent the linear layers of RNNs. Our wavelet compressed RNNs have significantly fewer parameters yet still perform competitively with the state-of-the-art on synthetic and real-world RNN benchmarks (Source code is available at https://github.com/v0lta/Wavelet-network-compression ). Wavelet optimization adds basis flexibility, without large numbers of extra weights.

Moritz Wolter, Shaohui Lin, Angela Yao

### Fast and Robust Compression of Deep Convolutional Neural Networks

Deep convolutional neural networks (CNNs) currently demonstrate state-of-the-art performance in several domains. However, commonly used CNN models require a large amount of memory and computing resources, posing challenges in training as well as deployment, especially on devices with limited computational resources. Inspired by recent advances in random tensor decomposition, we introduce a Hierarchical Framework for Fast and Robust Compression (HFFRC), which significantly reduces the number of parameters needed to represent a convolution layer via a fast low-rank Tucker decomposition algorithm, while preserving its expressive power. Owing to the randomized algorithm, the proposed compression framework is robust to noise in the parameters. In addition, it is a general framework into which any tensor decomposition method can easily be adopted. The efficiency and effectiveness of the proposed approach have been demonstrated via comprehensive experiments conducted on the benchmark CIFAR-10 and CIFAR-100 image classification datasets.

Jia Wen, Liu Yang, Chenyang Shen

### Pruning Artificial Neural Networks: A Way to Find Well-Generalizing, High-Entropy Sharp Minima

Recently, a race towards the simplification of deep networks has begun, showing that it is effectively possible to reduce the size of these models with minimal or no performance loss. However, there is a general lack of understanding of why these pruning strategies are effective. In this work, we compare and analyze pruned solutions obtained with two different pruning approaches, one-shot and gradual, showing the higher effectiveness of the latter. In particular, we find that gradual pruning allows access to narrow, well-generalizing minima, which are typically ignored when using one-shot approaches. In this work we also propose PSP-entropy, a measure to understand how a given neuron correlates to some specific learned classes. Interestingly, we observe that the features extracted by iteratively-pruned models are less correlated to specific classes, potentially making these models a better fit in transfer learning approaches.

Enzo Tartaglione, Andrea Bragagnolo, Marco Grangetto

### Log-Nets: Logarithmic Feature-Product Layers Yield More Compact Networks

We introduce Logarithm-Networks (Log-Nets), a novel bio-inspired type of network architecture based on logarithms of feature maps followed by convolutions. Log-Nets are capable of surpassing the performance of traditional convolutional neural networks (CNNs) while using fewer parameters. Performance is evaluated on the CIFAR-10 and ImageNet benchmarks.

Philipp Grüning, Thomas Martinetz, Erhardt Barth

### Tuning Deep Neural Network’s Hyperparameters Constrained to Deployability on Tiny Systems

Deep Neural Networks are increasingly deployed on tiny systems such as microcontrollers or embedded systems. Notwithstanding the recent success of Deep Learning, also enabled by the availability of Automated Machine Learning and Neural Architecture Search solutions, the computational requirements of the optimization of the structure and the hyperparameters of Deep Neural Networks usually far exceed what is available on tiny systems. Therefore, the deployability becomes critical when the learned model must be deployed on a tiny system. To overcome this critical issue, we propose a framework, based on Bayesian Optimization, to optimize the hyperparameters of a Deep Neural Network by dealing with black-box deployability constraints. Encouraging results obtained on a classification benchmark problem on a real microcontroller by STMicroelectronics are presented.

Riccardo Perego, Antonio Candelieri, Francesco Archetti, Danilo Pau

### Obstacles to Depth Compression of Neural Networks

Massive neural network models are often preferred over smaller models for their more favorable optimization landscapes during training. However, since the cost of evaluating a model grows with the size, it is desirable to obtain an equivalent compressed neural network model before deploying it for prediction. The best-studied tools for compressing neural networks obtain models with broadly similar architectures, including the depth of the model. No guarantees have been available for obtaining compressed models with substantially reduced depth. In this paper, we present fundamental obstacles to any algorithm achieving depth compression of neural networks. In particular, we show that depth compression is as hard as learning the input distribution, ruling out guarantees for most existing approaches. Furthermore, even when the input distribution is of a known, simple form, we show that there are no local algorithms for depth compression.

Will Burstein, John Wilmes


The explosion of the label space degrades the performance of the classic multi-class learning models. Label space dimension reduction (LSDR) is developed to reduce the dimension of the label space by learning a latent representation of both the feature space and label space. Almost all existing models adopt a two-step strategy, i.e., first learn the latent space, and then connect the feature space with the label space through the latent space. Additionally, the latent space in LSDR lacks interpretability. In this paper, motivated by cross-modal learning, we propose a novel one-step model, named Quadruplet Dictionary Learning (QDL), for multi-label classification with many labels. QDL models the latent space by the representation coefficients, which offer strong recoverability, predictability and interpretability. By simultaneously learning two dictionary pairs, the feature space and label space are bridged bidirectionally and recovered by four dictionaries. Experiments on benchmark datasets show that QDL outperforms the state-of-the-art label space dimension reduction algorithms.

Jiayu Zheng, Wencheng Zhu, Pengfei Zhu

Neuroevolution has been used to train Deep Neural Networks on reinforcement learning problems. A few attempts have been made to extend it to address either multi-task or multi-objective optimization problems. This research work presents the Multi-Task Multi-Objective Deep Neuroevolution method, a highly parallelizable algorithm that can be adopted for tackling both multi-task and multi-objective problems. In this method, prior knowledge of the tasks is used to explicitly define multiple utility functions, which are optimized simultaneously. Experimental results on some Atari 2600 games, a challenging testbed for deep reinforcement learning algorithms, show that a single neural network with a single set of parameters can outperform previous state-of-the-art techniques. In addition to the standard analysis, all results are also evaluated using the Hypervolume indicator and the Kullback-Leibler divergence to gain better insight into the underlying training dynamics. The experimental results show that a neural network trained with the proposed evolution strategy can outperform networks trained individually on each of the tasks.

Salvatore D. Riccio, Deyan Dyankov, Giorgio Jansen, Giuseppe Di Fatta, Giuseppe Nicosia

### Convex Graph Laplacian Multi-Task Learning SVM

The goal of Multi-Task Learning (MTL) is to achieve better generalization by using data from different sources. MTL Support Vector Machines (SVMs) embrace this idea in two main ways: by using a combination of common and task-specific parts, or by fitting individual models adding a graph Laplacian regularization that defines different degrees of task relationships. The first approach is too rigid since it imposes the same relationship among all tasks. The second one does not have a clear way of sharing information among the different tasks. In this paper, we propose a model that combines both approaches. It uses a convex combination of a common model and of task-specific models, where the relationships between these specific models are determined through a graph Laplacian regularization. We write the primal problem of this formulation and derive its dual problem, which is shown to be equivalent to a standard SVM dual using a particular kernel choice. Empirical results over different regression and classification problems support the usefulness of our proposal.

Carlos Ruiz, Carlos M. Alaíz, José R. Dorronsoro

### Prediction Stability as a Criterion in Active Learning

Recent breakthroughs made by deep learning rely heavily on a large number of annotated samples. Active learning is a possible solution to this shortcoming. In contrast to previous active learning algorithms, which only use information available after training, we propose a new class of methods, named sequential-based methods, that rely on information gathered during training. A specific active learning criterion called prediction stability is proposed to demonstrate the feasibility of sequential-based methods. We design a toy model to explain the principle of our proposed method and point out a possible defect of earlier uncertainty-based methods. Experiments are conducted on CIFAR-10 and CIFAR-100, and the results indicate that prediction stability is effective and works well on datasets with fewer labels. Prediction stability reaches the accuracy of traditional acquisition functions like entropy on CIFAR-10, and notably outperforms them on CIFAR-100.

Junyu Liu, Xiang Li, Jiqiang Zhou, Jianxiong Shen

### Neural Spectrum Alignment: Empirical Study

Expressiveness and generalization of deep models were recently addressed via the connection between neural networks (NNs) and kernel learning, where the first-order dynamics of an NN during gradient-descent (GD) optimization were related to the gradient similarity kernel, also known as the Neural Tangent Kernel (NTK) [9]. In the majority of works this kernel is considered to be time-invariant [9, 13]. In contrast, we empirically explore these properties along the optimization and show that in practice the top eigenfunctions of the NTK align toward the target function learned by the NN, which improves the overall optimization performance. Moreover, these top eigenfunctions serve as basis functions for the NN output: a function represented by the NN is spanned almost completely by them for the entire optimization process. Further, we study how learning rate decay affects the neural spectrum. We argue that the presented phenomena may lead to a more complete theoretical understanding behind NN learning.

### Nonlinear, Nonequilibrium Landscape Approach to Neural Network Dynamics

Distributions maximizing $$S_q$$ entropies are not rare in Nature. They have been observed in complex systems in diverse fields, including neuroscience. Nonlinear Fokker-Planck dynamics constitutes one of the main mechanisms that can generate $$S_q$$-maximum entropy distributions. In the present work, we investigate a nonlinear Fokker-Planck equation associated with general, continuous, neural network dynamical models for associative memory. These models admit multiple applications in artificial intelligence, and in the study of mind, because memory is central to many, if not all, the processes investigated by psychology and neuroscience. We explore connections between the nonlinear Fokker-Planck treatment of network dynamics, and the nonequilibrium landscape approach to this dynamics discussed in [34]. We show that the nonequilibrium landscape approach leads to fundamental relations between the Liapunov function of the network model, the deterministic equations of motion (phase-space flow) of the network, and the form of the diffusion coefficients appearing in the nonlinear Fokker-Planck equations. This, in turn, leads to an H-theorem involving a free energy-like functional related to the $$S_q$$ entropy. To illustrate these results, we apply them to the Cohen-Grossberg family of neural network models.

Roseli S. Wedemann, Angel R. Plastino

### Hopfield Networks for Vector Quantization

We consider the problem of finding representative prototypes within a set of data and solve it using Hopfield networks. Our key idea is to minimize the mean discrepancy between kernel density estimates of the distributions of data points and prototypes. We show that this objective can be cast as a quadratic unconstrained binary optimization problem which is equivalent to a Hopfield energy minimization problem. This result is of current interest as it suggests that vector quantization can be accomplished via adiabatic quantum computing.

C. Bauckhage, R. Ramamurthy, R. Sifa
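
To make the Hopfield energy minimization mentioned above concrete, here is a minimal sketch of asynchronous Hopfield updates on bipolar states. It assumes the standard energy $E(s) = -\tfrac{1}{2} s^\top W s + \theta^\top s$ with a symmetric, zero-diagonal weight matrix. The small `W` used in the usage note is an arbitrary placeholder, not the matrix the paper derives from kernel density estimates.

```python
import random

def energy(W, theta, s):
    """Hopfield energy E(s) = -1/2 * s^T W s + theta^T s for s in {-1, +1}^n."""
    n = len(s)
    quad = sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
    return -0.5 * quad + sum(theta[i] * s[i] for i in range(n))

def run_hopfield(W, theta, s, sweeps=10, seed=0):
    """Asynchronous updates. W must be symmetric with a zero diagonal,
    in which case no single update can increase the energy."""
    rng = random.Random(seed)
    s = list(s)
    n = len(s)
    for _ in range(sweeps):
        changed = False
        # Visit the units in a fresh random order on every sweep.
        for i in rng.sample(range(n), n):
            field = sum(W[i][j] * s[j] for j in range(n)) - theta[i]
            new_state = 1 if field >= 0 else -1
            if new_state != s[i]:
                s[i] = new_state
                changed = True
        if not changed:
            break  # fixed point reached: a local minimum of the energy
    return s

# Placeholder problem: units 0 and 1 attract each other, unit 2 repels both.
W = [[0, 1, -1], [1, 0, -1], [-1, -1, 0]]
theta = [0, 0, 0]
start = [1, -1, 1]
final = run_hopfield(W, theta, start)
```

Because a quadratic unconstrained binary optimization problem over $\{0, 1\}^n$ can be rewritten over $\{-1, +1\}^n$ by the substitution $z = (s + 1)/2$, a routine of this kind acts as a local search for such objectives.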

### Prototype-Based Online Learning on Homogeneously Labeled Streaming Data

Algorithms in machine learning commonly require training data to be independent and identically distributed. This assumption is not always valid, e.g., in online learning, when data becomes available in homogeneously labeled blocks, which can severely impede especially instance-based learning algorithms. In this work, we analyze and visualize this issue, and we propose and evaluate strategies for Learning Vector Quantization to compensate for homogeneously labeled blocks. We achieve considerably improved results in this difficult setting.

Christian Limberg, Jan Philip Göpfert, Heiko Wersing, Helge Ritter

### Neural Network Training with Safe Regularization in the Null Space of Batch Activations

We propose to formulate the training of neural networks with side optimization goals, such as obtaining structured weight matrices, as a lexicographic optimization problem. The lexicographic order can be maintained during training by optimizing the side-optimization goal exclusively in the null space of batch activations. We call the resulting training method Safe Regularization, because the side optimization goal can be safely integrated into the training with limited influence on the main optimization goal. Moreover, this results in a higher robustness regarding the choice of regularization hyperparameters. We validate our training method with multiple real-world regression data sets with the side-optimization goal of obtaining sparse weight matrices.

Matthias Kissel, Martin Gottwald, Klaus Diepold
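
One way to read "optimizing in the null space of batch activations" is this: for a linear layer $y = Wx$, a weight update $\Delta W$ with $\Delta W x_b = 0$ for every input $x_b$ in the batch leaves the layer's outputs on that batch unchanged. The sketch below illustrates that interpretation only and is not the authors' implementation. It projects each row of a proposed update onto the orthogonal complement of the batch inputs using Gram-Schmidt, and all function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-10):
    """Modified Gram-Schmidt: orthonormal basis of the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors that are (numerically) linearly dependent
            basis.append([wi / norm for wi in w])
    return basis

def project_to_null_space(update, batch):
    """Remove from each row of `update` its component in span(batch),
    so that the projected update maps every batch input to zero."""
    basis = orthonormal_basis(batch)
    projected = []
    for row in update:
        r = list(row)
        for q in basis:
            c = dot(r, q)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        projected.append(r)
    return projected

# Two batch inputs of dimension 4 and a 2x4 update from a side objective.
batch = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
side_update = [[1.0, 2.0, 3.0, 4.0], [-1.0, 0.5, 2.0, 0.0]]
safe_update = project_to_null_space(side_update, batch)
```

In this example the batch spans the first two coordinates, so only the last two columns of the update survive the projection. Once the batch inputs span the whole input space, the null space is trivial and the projected update vanishes.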

### The Effect of Batch Normalization in the Symmetric Phase

Learning neural networks has long been known to be difficult. One of the causes of such difficulties is thought to be the equilibrium points caused by the symmetry between the weights of the neural network. Such an equilibrium point is known to delay neural network training. However, neural networks have been widely used in recent years largely because of the development of methods that make learning easier. One such technique is batch normalization, which is empirically known to speed up learning. Therefore, if the equilibrium point due to symmetry truly affects the neural network learning, and batch normalization speeds up the learning, batch normalization should help escape from such equilibrium points. Therefore, we analyze whether batch normalization helps escape from such equilibrium points by a method called statistical mechanical analysis. By examining the eigenvalue of the Hessian matrix of the generalization error at the equilibrium point, we find that batch normalization delays escape from poor equilibrium points. This contradicts the empirically known finding of speeding up learning, and we discuss why we obtained this result.

Shiro Takagi, Yuki Yoshida, Masato Okada

### Regularized Pooling

In convolutional neural networks (CNNs), pooling operations play important roles such as dimensionality reduction and deformation compensation. In general, max pooling, which is the most widely used operation for local pooling, is performed independently for each kernel. However, the deformation may be spatially smooth over the neighboring kernels. This means that max pooling is too flexible to compensate for actual deformations. In other words, its excessive flexibility risks canceling the essential spatial differences between classes. In this paper, we propose regularized pooling, which enables the value selection direction in the pooling operation to be spatially smooth across adjacent kernels so as to compensate only for actual deformations. The results of experiments on handwritten character images and texture images showed that regularized pooling not only improves recognition accuracy but also accelerates the convergence of learning compared with conventional pooling operations.

Takato Otsuzuki, Hideaki Hayashi, Yuchen Zheng, Seiichi Uchida
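
The contrast between independent max pooling and a spatially smoothed selection can be sketched in one dimension. This is an illustrative simplification under stated assumptions, not the layer defined in the paper: the offset of the maximum is found in each non-overlapping window, the offsets are averaged over neighboring windows, and each window then outputs the value at its smoothed offset. The function names and the averaging radius are hypothetical.

```python
def max_pool_offsets(x, k):
    """Offset of the maximum inside each non-overlapping window of size k."""
    return [max(range(k), key=lambda o: x[start + o])
            for start in range(0, len(x) - k + 1, k)]

def smooth_offsets(offsets, radius=1):
    """Average each offset with its neighbors and round half up."""
    smoothed = []
    for i in range(len(offsets)):
        lo, hi = max(0, i - radius), min(len(offsets), i + radius + 1)
        window = offsets[lo:hi]
        smoothed.append(int(sum(window) / len(window) + 0.5))
    return smoothed

def max_pool(x, k):
    """Conventional max pooling: every window selects independently."""
    return [x[i * k + o] for i, o in enumerate(max_pool_offsets(x, k))]

def regularized_pool(x, k, radius=1):
    """Pooling whose selection offsets are smoothed across adjacent windows."""
    offsets = smooth_offsets(max_pool_offsets(x, k), radius)
    return [x[i * k + o] for i, o in enumerate(offsets)]

signal = [0, 9, 0, 0, 8, 0, 7, 0, 1, 0, 6, 0]
plain = max_pool(signal, 3)             # [9, 8, 7, 6]
smoothed = regularized_pool(signal, 3)  # [9, 8, 0, 6]
```

In the example, the third window's maximum sits at a different offset than its neighbors' maxima. Plain max pooling follows it, whereas the smoothed variant keeps the offset shared by the adjacent windows and therefore ignores that isolated peak.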

### Deep Recurrent Deterministic Policy Gradient for Physical Control

Observable states play a significant role in Reinforcement Learning (RL), and the performance of RL is strongly associated with the quality of the inferred hidden states. Accurately extracting hidden states is challenging because they are often related to the histories of both the environment and the agent, and because doing so requires extensive domain knowledge. In this work, we aim to leverage history information to improve the performance of the agent. Firstly, we argue that both neglecting history information and processing it in the usual way harm the agent’s performance. Secondly, we propose a novel model that combines the advantages of both supervised learning and RL. Specifically, we extend the framework of classical policy gradient and propose to extract history information using recurrent neural networks. Thirdly, we evaluate our model in simulated physical control environments, where it outperforms the state-of-the-art models and performs clearly better on more challenging tasks. Finally, we analyze the reasons and suggest possible approaches to extend and scale up the model.

Lei Zhang, Shuai Han, Zhiruo Zhang, Lefan Li, Shuai Lü

### Exploration via Progress-Driven Intrinsic Rewards

Traditional exploration methods in reinforcement learning rely on well-designed extrinsic rewards. However, many real-world scenarios involve sparse or delayed rewards. One solution inspired by curious behaviors in animals is to let the agent develop its own intrinsic rewards. In this paper we propose a novel end-to-end curiosity mechanism which uses learning progress as a novelty bonus. We compare a policy-based and a visual-based progress bonus to move the agent towards hard-to-learn regions of the state space. We further leverage the agent’s learning to identify the most critical regions, which results in more sample-efficient and global exploration strategies. We evaluate our method on a variety of benchmark environments, including Minigrid, Super Mario Bros., and Atari games. Experimental results show that our method outperforms prior approaches in most tasks in terms of exploration efficiency and average scores, especially for those featuring high-level exploration patterns or with deceptive rewards.

Nicolas Bougie, Ryutaro Ichise
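
The idea of using learning progress as a bonus can be sketched generically: reward the agent in proportion to how much a prediction error has dropped since the same region of the state space was last visited. The class below is a hypothetical illustration of that general principle, not the end-to-end mechanism of the paper. The region keys, the error signal and the scale `beta` are all assumptions.

```python
class ProgressBonus:
    """Intrinsic reward proportional to the recent decrease in prediction error."""

    def __init__(self, beta=0.1):
        self.beta = beta        # assumed scale of the intrinsic reward
        self.last_error = {}    # most recent error observed per region

    def bonus(self, region, error):
        previous = self.last_error.get(region)
        self.last_error[region] = error
        if previous is None:
            return 0.0          # no progress can be measured on a first visit
        # Only improvements count; a growing error yields no bonus.
        return self.beta * max(0.0, previous - error)

    def shaped_reward(self, extrinsic, region, error):
        """Extrinsic reward plus the learning-progress bonus."""
        return extrinsic + self.bonus(region, error)
```

Under this scheme a region whose error has stopped changing, either because it is fully learned or because it is unlearnable noise, stops producing a bonus, which steers the agent towards regions where it is still improving.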

### An Improved Reinforcement Learning Based Heuristic Dynamic Programming Algorithm for Model-Free Optimal Control

In complex process industries, model-free adaptive control in a data-driven scheme is a classic problem. This paper proposes an improved reinforcement learning (RL) based heuristic dynamic programming algorithm for optimal tracking control in industrial systems. The proposed method designs a double neural network framework and employs a gradient-based optimization scheme to obtain the optimal control law. Inspired by the experience replay buffer in deep RL, short-term historical system trajectories are also considered in the training phase, which stabilizes network learning. An experimental study based on a simulated industrial device shows that the proposed method is superior to other algorithms in terms of time consumption and control accuracy.

Jia Li, Zhaolin Yuan, Xiaojuan Ban

### PBCS: Efficient Exploration and Exploitation Using a Synergy Between Reinforcement Learning and Motion Planning

The exploration-exploitation trade-off is at the heart of reinforcement learning (RL). However, most continuous control benchmarks used in recent RL research only require local exploration. This led to the development of algorithms that have basic exploration capabilities, and behave poorly in benchmarks that require more versatile exploration. For instance, as demonstrated in our empirical study, state-of-the-art RL algorithms such as DDPG and TD3 are unable to steer a point mass in even small 2D mazes. In this paper, we propose a new algorithm called “Plan, Backplay, Chain Skills” (PBCS) that combines motion planning and reinforcement learning to solve hard exploration environments. In a first phase, a motion planning algorithm is used to find a single good trajectory, then an RL algorithm is trained using a curriculum derived from the trajectory, by combining a variant of the Backplay algorithm and skill chaining. We show that this method outperforms state-of-the-art RL algorithms in 2D maze environments of various sizes, and is able to improve on the trajectory obtained by the motion planning phase.

Guillaume Matheron, Nicolas Perrin, Olivier Sigaud

### Understanding Failures of Deterministic Actor-Critic with Continuous Action Spaces and Sparse Rewards

In environments with continuous state and action spaces, state-of-the-art actor-critic reinforcement learning algorithms can solve very complex problems, yet can also fail in environments that seem trivial, and the reason for such failures is still poorly understood. In this paper, we contribute a formal explanation of these failures in the particular case of sparse reward and deterministic environments. First, using a very elementary control problem, we illustrate that the learning process can get stuck in a fixed point corresponding to a poor solution, especially when the reward is not found very early. Then, generalizing from the studied example, we provide a detailed analysis of the underlying mechanisms, which results in a new understanding of one of the convergence regimes of these algorithms.

Guillaume Matheron, Nicolas Perrin, Olivier Sigaud

### GAN-Based Planning Model in Deep Reinforcement Learning

Deep reinforcement learning methods have achieved unprecedented success in many high-dimensional and large-scale sequential decision-making tasks. Among these methods, model-based methods rely on planning as their primary component, while model-free methods primarily rely on learning. However, the accuracy of the environmental model has a significant impact on the learned policy. When the model is incorrect, the planning process is likely to compute a suboptimal policy. In order to get a more accurate environmental model, this paper introduces the GAN-based Planning Model (GBPM), which exploits the strong expressive ability of Generative Adversarial Nets (GANs) and can learn to simulate the environment from experience and construct implicit planning. The GBPM can be trained using real transfer samples experienced by the agent. Then, the agent can utilize the GBPM to produce simulated experience or trajectories so as to improve the learned policy. The GBPM can play the role of experience replay, so it can be applied to both model-based and model-free methods, such as Dyna, DQN, and ACER. Experimental results indicate that the GBPM can improve data efficiency and algorithm performance on the Maze and Atari 2600 game domains.

Song Chen, Junpeng Jiang, Xiaofang Zhang, Jinjin Wu, Gongzheng Lu

### Guided Reinforcement Learning via Sequence Learning

Applications of Reinforcement Learning (RL) suffer from high sample complexity due to sparse reward signals and inadequate exploration. Novelty Search (NS) serves as an auxiliary task in this regard, encouraging exploration towards unseen behaviors. However, NS suffers from critical drawbacks concerning scalability and generalizability since it is based on instance learning. Addressing these challenges, we previously proposed a generic approach using unsupervised learning to learn representations of agent behaviors and use reconstruction losses as novelty scores. However, it considered only fixed-length sequences and did not utilize sequential information of behaviors. Therefore, we here extend this approach by using sequential auto-encoders to incorporate sequential dependencies. Experimental results on benchmark tasks show that this sequence learning aids exploration, outperforming previous novelty search methods.

Rajkumar Ramamurthy, Rafet Sifa, Max Lübbering, Christian Bauckhage

### Neural Machine Translation Based on Improved Actor-Critic Method

Reinforcement learning based neural machine translation (NMT) is limited by the sparse reward problem, which further affects the quality of the model, and the actor-critic method is mainly used to enrich the reward of the output fragments. For low-resource agglutinative languages, however, it does not show significant results. To this end, we propose a novel actor-critic approach that provides additional affix-level rewards and combines them with the traditional token-level rewards to guide the parameter updates of the NMT model. In addition, to improve decoding speed, we utilize an improved non-autoregressive model as the actor model so that it pays more attention to translation quality while producing output in parallel. We achieve remarkable progress on two translation tasks, the low-resource Mongolian-Chinese task and the public NIST English-Chinese task, while significantly shortening training time and achieving faster convergence.

Ziyue Guo, Hongxu Hou, Nier Wu, Shuo Sun

### Neural Machine Translation Based on Prioritized Experience Replay

The reward mechanism of reinforcement learning alleviates the inconsistency between training and evaluation in neural machine translation. However, the model is still unable to learn ideal parameters when rewards are sparse or a weak sampling strategy is adopted. Therefore, we propose a reinforcement learning method based on prioritized experience replay to deal with these problems. The model's experiences are obtained through reinforcement learning. They are then stored in an experience buffer and assigned priorities according to the value of each experience. Experiences with higher priority in the buffer are extracted by the model to optimize its parameters during the training phase. To verify the robustness of our method, we conduct experiments not only on English-German and Chinese-English, but also on the agglutinative language pair Mongolian-Chinese. Experimental results show that our work consistently outperforms the baselines.

Shuo Sun, Hongxu Hou, Nier Wu, Ziyue Guo
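
The buffer described above can be sketched with the usual proportional prioritization rule, where an experience with priority $p_i$ is drawn with probability $p_i^\alpha / \sum_j p_j^\alpha$. This is a generic sketch rather than the authors' implementation: the abstract does not say how the "value of experience" is computed, so priorities are simply passed in, and the class name, the `alpha` exponent and the oldest-first eviction policy are assumptions.

```python
import random

class PrioritizedReplayBuffer:
    """Replay buffer that samples experiences in proportion to their priority."""

    def __init__(self, capacity, alpha=1.0, seed=0):
        self.capacity = capacity
        self.alpha = alpha          # 0 gives uniform sampling, 1 fully proportional
        self.items = []
        self.priorities = []
        self.rng = random.Random(seed)

    def add(self, experience, priority):
        if len(self.items) >= self.capacity:
            # Assumed policy: drop the oldest experience when the buffer is full.
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(experience)
        self.priorities.append(priority)

    def probabilities(self):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, k):
        """Draw k experiences with replacement, weighted by priority."""
        return self.rng.choices(self.items, weights=self.probabilities(), k=k)

buffer = PrioritizedReplayBuffer(capacity=3)
buffer.add("translation A", priority=1.0)
buffer.add("translation B", priority=1.0)
buffer.add("translation C", priority=2.0)
batch = buffer.sample(4)
```

With these priorities, "translation C" is drawn with probability 0.5 and the other two with probability 0.25 each, so high-value experiences are replayed more often without the rest being excluded entirely.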

### Improving Multi-agent Reinforcement Learning with Imperfect Human Knowledge

Multi-agent reinforcement learning has achieved great success in many decision-making tasks. However, challenges such as low exploration efficiency and significant time consumption remain, which pose great obstacles to its application in the real world. Incorporating human knowledge into the learning process has been regarded as a promising way to ameliorate these problems. This paper proposes a novel approach that utilizes imperfect human knowledge to improve the performance of multi-agent reinforcement learning. We leverage logic rules, which can be seen as a popular form of human knowledge, as part of the action space in reinforcement learning. During trial and error, the values of both the rules and the original actions are estimated. Logic rules, therefore, can be selected flexibly and efficiently to assist the learning. Moreover, we design a new exploration scheme in which rules are preferentially explored at the early training stage. Finally, we conduct experimental evaluations and analyses of our approach in challenging StarCraft II micromanagement scenarios. The empirical results show that our approach outperforms the state-of-the-art multi-agent reinforcement learning method, not only in performance but also in learning speed.

Xiaoxu Han, Hongyao Tang, Yuan Li, Guang Kou, Leilei Liu

### Adaptive Skill Acquisition in Hierarchical Reinforcement Learning

Reinforcement learning has become an established class of powerful machine learning methods operating online on sequential tasks by direct interaction with an environment instead of processing precollected training datasets. At the same time, the nature of many tasks with an inner hierarchical structure has evoked interest in hierarchical RL approaches that introduced the two-level decomposition directly into computational models. These methods are usually composed of lower-level controllers – skills – providing simple behaviors, and a high-level controller which uses the skills to solve the overall task. Skill discovery and acquisition remain principal challenges in hierarchical RL, and most of the relevant works have focused on resolving this issue by using pre-trained skills, fixed during the main learning process, which may lead to suboptimal solutions. We propose a universal pluggable framework of Adaptive Skill Acquisition (ASA), aimed to augment existing solutions by trying to achieve optimality. ASA can observe the high-level controller during its training and identify skills that it lacks to successfully learn the task. These missing skills are subsequently trained and integrated into the hierarchy, enabling better performance of the overall architecture. As we show in the pilot maze-type experiments, the identification of missing skills performs reasonably well, and embedding such skills into the hierarchy may significantly improve the performance of an overall model.

Juraj Holas, Igor Farkaš

### Social Navigation with Human Empowerment Driven Deep Reinforcement Learning

Mobile robot navigation has seen extensive research in the last decades. The aspect of collaboration with robots and humans sharing workspaces will become increasingly important in the future. Therefore, the next generation of mobile robots needs to be socially-compliant to be accepted by their human collaborators. However, a formal definition of compliance is not straightforward. On the other hand, empowerment has been used by artificial agents to learn complicated and generalized actions and also has been shown to be a good model for biological behaviors. In this paper, we go beyond the approach of classical Reinforcement Learning (RL) and provide our agent with intrinsic motivation using empowerment. In contrast to self-empowerment, a robot employing our approach strives for the empowerment of people in its environment, so they are not disturbed by the robot’s presence and motion. In our experiments, we show that our approach has a positive influence on humans, as it minimizes its distance to humans and thus decreases human travel time while moving efficiently towards its own goal. An interactive user-study shows that our method is considered more social than other state-of-the-art approaches by the participants.

Tessa van der Heiden, Florian Mirus, Herke van Hoof

### Curious Hierarchical Actor-Critic Reinforcement Learning

Hierarchical abstraction and curiosity-driven exploration are two common paradigms in current reinforcement learning approaches to break down difficult problems into a sequence of simpler ones and to overcome reward sparsity. However, there is a lack of approaches that combine these paradigms, and it is currently unknown whether curiosity also helps to perform the hierarchical abstraction. As a novelty and scientific contribution, we tackle this issue and develop a method that combines hierarchical reinforcement learning with curiosity. Herein, we extend a contemporary hierarchical actor-critic approach with a forward model to develop a hierarchical notion of curiosity. We demonstrate in several continuous-space environments that curiosity can more than double the learning performance and success rates for most of the investigated benchmarking problems. We also provide our source code ( https://github.com/knowledgetechnologyuhh/goal_conditioned_RL_baselines ) and a supplementary video ( https://www2.informatik.uni-hamburg.de/wtm/videos/chac_icann_roeder_2020.mp4 ).

Frank Röder, Manfred Eppe, Phuong D. H. Nguyen, Stefan Wermter

### Policy Entropy for Out-of-Distribution Classification

One critical prerequisite for the deployment of reinforcement learning systems in the real world is the ability to reliably detect situations on which the agent was not trained. Such situations could lead to potential safety risks when wrong predictions lead to the execution of harmful actions. In this work, we propose PEOC, a new policy entropy based out-of-distribution classifier that reliably detects unencountered states in deep reinforcement learning. It is based on using the entropy of an agent’s policy as the classification score of a one-class classifier. We evaluate our approach using a procedural environment generator. Results show that PEOC is highly competitive against state-of-the-art one-class classification algorithms on the evaluated environments. Furthermore, we present a structured process for benchmarking out-of-distribution classification in reinforcement learning.

Andreas Sedlmeier, Robert Müller, Steffen Illium, Claudia Linnhoff-Popien
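
The idea of scoring a state by the entropy of the policy can be sketched in a few lines. This assumes a discrete action distribution and that high entropy indicates an unfamiliar state; the function names and the threshold are hypothetical, and the abstract does not specify how PEOC sets its decision boundary.

```python
import math


def policy_entropy(action_probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in action_probs if p > 0.0)


def is_out_of_distribution(action_probs, threshold):
    # Assumption: an uncertain (high-entropy) policy signals an unfamiliar state.
    return policy_entropy(action_probs) > threshold
```

A uniform policy over four actions has entropy ln 4, the maximum for that action count, while a near-deterministic policy scores close to zero.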

### Analysis of Reservoir Structure Contributing to Robustness Against Structural Failure of Liquid State Machine

Attempts have been made to realize reservoir computing by using physical materials, but they assume a stable reservoir structure. In reality, however, a physical reservoir suffers from malfunctions, noise, and interference, which cause failures of neurons and disconnection of synaptic connections. Consequently, the dynamics of the system state change and computation performance deteriorates. In this paper, we investigate structural properties contributing to the functional robustness of a reservoir. More specifically, we analyze the relationship between structural properties of a reservoir of a Liquid State Machine (LSM) and the decrease in discrimination capability in a delayed readout task when experiencing failures of connections and neurons. We apply seven types of networks with different structural properties to a reservoir. As a result, we reveal that high modularity, structural irregularity, and a high clustering coefficient are most important for an LSM to be robust against random connection and neuron failures.

Yuta Okumura, Naoki Wakamiya

### Quantifying Robustness and Capacity of Reservoir Computers with Consistency Profiles

We study the consistency property in reservoir computers with noise. Consistency quantifies the functional dependence of a driven dynamical system on its input via replica tests. We characterise the high-dimensional profile of consistency in typical reservoirs subject to intrinsic and measurement noise. An integral of the consistency is introduced to measure capacity and act as an effective size of the reservoir. We observe a scaling law in the dependency of the consistency capacity on the noise amplitude and reservoir size, and demonstrate how this measure of capacity explains performance.

Thomas Lymburn, Thomas Jüngling, Michael Small

### Two-Step FORCE Learning Algorithm for Fast Convergence in Reservoir Computing

Reservoir computing devices are promising as energy-efficient machine learning hardware for real-time information processing. However, some online algorithms for reservoir computing are not simple enough for hardware implementation. In this study, we focus on the first order reduced and controlled error (FORCE) algorithm for online learning with reservoir computing models. We propose a two-step FORCE algorithm by simplifying the operations in the FORCE algorithm, which can reduce necessary memories. We analytically and numerically show that the proposed algorithm can converge faster than the original FORCE algorithm.

Hiroto Tamura, Gouhei Tanaka

### Morphological Computation of Skin Focusing on Fingerprint Structure

When humans perceive tactile sensation, we touch an object with the skin and the stimuli are transmitted to the brain. The effect of the skin on tactile perception, however, has not yet been clarified, and sensors that take skin functions into account have not been introduced. In this research, we investigate the information processing performed by the skin in response to physical stimuli when touching an object, from the viewpoint of morphological computation. We create a dynamical model that expresses the skin structure based on the spring and mass model, and show that the model contributes to the learning of temporal response to physical stimuli. In addition, we conduct an experiment to compare the learning performance of a finger model having fingerprints with a model without fingerprints. Frequency response to physical stimuli with different frequencies is examined, and the result shows that the performance of a model with fingerprints is better in the higher frequency range. The model with fingerprints also reflects the hardness of the human skin remarkably. These results are expected to help clarify the information processing ability of the human skin focusing on the fingerprint structure in response to external physical stimuli.

Akane Musha, Manabu Daihara, Hiroki Shigemune, Hideyuki Sawada

### Time Series Clustering with Deep Reservoir Computing

This paper proposes a method for clustering of time series, based upon the ability of deep Reservoir Computing networks to grasp the dynamical structure of the series that is presented as input. A standard clustering algorithm, such as k-means, is applied to the network states, rather than the input series themselves. Clustering is thus embedded into the network dynamical evolution, since a clustering result is obtained at every time step, which in turn serves as initialisation at the next step. We empirically assess the performance of deep reservoir systems in time series clustering on benchmark datasets, considering the influence of crucial hyper-parameters. Experimentation with the proposed model shows enhanced clustering quality, measured by the silhouette coefficient, when compared to both static clustering of data, and dynamic clustering with a shallow network.

Miguel Atencia, Claudio Gallicchio, Gonzalo Joya, Alessio Micheli

### ReservoirPy: An Efficient and User-Friendly Library to Design Echo State Networks

We present a simple user-friendly library called ReservoirPy based on Python scientific modules. It provides a flexible interface to implement efficient Reservoir Computing (RC) architectures with a particular focus on Echo State Networks (ESN). Advanced features of ReservoirPy improve computation time efficiency by up to 87.9% on a simple laptop compared to a basic Python implementation. Overall, we provide tutorials for hyperparameter tuning, offline and online training, fast spectral initialization, and parallel and sparse matrix computation on various tasks (Mackey-Glass and audio recognition tasks). In particular, we provide graphical tools to easily explore hyperparameters using random search with the help of the hyperopt library.

Nathan Trouvain, Luca Pedrelli, Thanh Trung Dinh, Xavier Hinaut

### Adaptive, Neural Robot Control – Path Planning on 3D Spiking Neural Networks

Safe, yet efficient, human-robot interaction requires real-time-capable and flexible algorithms for robot control that treat the human as a dynamic obstacle. Even today, methods for collision-free motion planning are often computationally expensive, preventing real-time control. This leads to unnecessary standstills due to safety requirements. As nature solves navigation and motion control in a sophisticated way, biologically motivated techniques based on the Wavefront algorithm have previously been applied successfully to path planning problems in 2D. In this work, we present an extension thereof using Spiking Neural Networks. The proposed network corresponds to a topologically organized map of the workspace, allowing execution in 3D space. We tested our work on simulated environments of increasing complexity in 2D with different connection types. Subsequently, the application is extended to 3D spaces, and the effectiveness and efficiency of the approach are confirmed by simulations and comparison studies. Thereby, a foundation is set to control a robot arm flexibly in a workspace with a human co-worker. In combination with neuromorphic hardware, this method will likely achieve real-time capability.

Lea Steffen, Artur Liebert, Stefan Ulbrich, Arne Roennau, Rüdiger Dillmann
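
For orientation, the classical Wavefront algorithm that this abstract builds on can be sketched on a 2D occupancy grid. This is the textbook breadth-first version with 4-connectivity, not the spiking-network extension the paper proposes; the function names `wavefront` and `extract_path` are hypothetical.

```python
from collections import deque

# 4-connected neighbourhood; other connection types are possible.
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def wavefront(grid, goal):
    """Breadth-first distance map from goal; grid cells equal to 1 are obstacles."""
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    dist[goal[0]][goal[1]] = 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def extract_path(dist, start):
    """Follow strictly decreasing distances from start down to the goal."""
    if dist[start[0]][start[1]] is None:
        return None  # start is an obstacle or cannot reach the goal
    r, c = start
    path = [start]
    while dist[r][c] > 0:
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < len(dist) and 0 <= nc < len(dist[0])
            if inside and dist[nr][nc] == dist[r][c] - 1:
                r, c = nr, nc
                path.append((r, c))
                break
    return path
```

The wave expands outward from the goal, so every reachable cell ends up holding its step distance to the goal, and a path is read off by always stepping to a neighbour that is one step closer.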

### CABIN: A Novel Cooperative Attention Based Location Prediction Network Using Internal-External Trajectory Dependencies

Nowadays, large quantities of advanced locating sensors are widely used, which makes it possible to deploy location-based services (LBS) enhanced by intelligent technologies. Location prediction, as one of the most fundamental technologies, aims to predict the possible location at the next timestamp based on the moving pattern of current trajectories. High location prediction accuracy could enrich and improve the user experience of various LBSs and bring many benefits to service providers. Much state-of-the-art research tries to model spatial-temporal trajectories based on recurrent neural networks (RNNs), yet fails to achieve practical usability. We observe that there exist two ways to improve through the attention mechanism, which performs well in the computer vision and natural language processing domains. Firstly, recent location prediction methods are usually equipped with a single-head attention mechanism to promote accuracy, which is only able to capture limited information in a specific subspace at a specific position. Secondly, existing methods focus on external relations between spatial-temporal trajectories, but miss internal relations within each spatial-temporal trajectory. To tackle the problem of modeling spatial-temporal patterns of mobility, we propose a novel Cooperative Attention Based location prediction network using Internal-External trajectory dependencies in this paper. We also design and perform experiments on two real-world check-in datasets, Foursquare data from New York and Tokyo. Evaluation results demonstrate that our method outperforms state-of-the-art models.

Tangwen Qian, Fei Wang, Yongjun Xu, Yu Jiang, Tao Sun, Yong Yu

### Neuro-Genetic Visuomotor Architecture for Robotic Grasping

We present a novel, hybrid neuro-genetic visuomotor architecture for object grasping on a humanoid robot. The approach combines the state-of-the-art object detector RetinaNet, a neural network-based coordinate transformation and a genetic-algorithm-based inverse kinematics solver. We claim that a hybrid neural architecture can utilise the advantages of neural and genetic approaches: while the neural components accurately locate objects in the robot’s three-dimensional reference frame, the genetic algorithm allows reliable motor control for the humanoid, despite its complex kinematics. The modular design enables independent training and evaluation of the components. We show that the additive error of the coordinate transformation and inverse kinematics solver is appropriate for a robotic grasping task. We additionally contribute a novel spatial-oversampling approach for training the neural coordinate transformation that overcomes the known issue of neural networks with extrapolation beyond training data and the extension of the genetic inverse kinematics solver with numerical fine-tuning. The grasping approach was realised and evaluated on the humanoid robot platform NICO in a simulation environment.

Matthias Kerzel, Josua Spisak, Erik Strahl, Stefan Wermter

### From Geometries to Contact Graphs

When a robot perceives its environment, it is not only important to know what kind of objects are present in it, but also how they relate to each other. For example in a cleanup task in a cluttered environment, a sensible strategy is to pick the objects with the least contacts to other objects first, to minimize the chance of unwanted movements not related to the current picking action. Estimating object contacts in cluttered scenes only based on passive observation is a complex problem. To tackle this problem, we present a deep neural network that learns physically stable object relations directly from geometric features. The learned relations are encoded as contact graphs between the objects. To facilitate training of the network, we generated a rich, publicly available dataset consisting of more than 25000 unique contact scenes, by utilizing a physics simulation. Different deep architectures have been evaluated and the final architecture, which shows good results in reconstructing contact graphs, is evaluated quantitatively and qualitatively.

Martin Meier, Robert Haschke, Helge J. Ritter

### Structural Position Network for Aspect-Based Sentiment Classification

Aspect-based sentiment classification aims to discriminate the polarity of each aspect term for a given sentence. Previous works mainly focus on sequential modeling and aspect representations. However, the syntactical information and relative structural position of the aspect in the sentence are neglected, so that some irrelevant contextual words are used as clues when identifying the aspect sentiment. This paper proposes a structural position network (SPNet) based on bidirectional long short-term memory (LSTM) for further integrating syntactical information. Specifically, we first utilize the dependency tree to represent the grammatical structure of the aspect in the sentence. Then, a structural weighted layer is applied after the LSTM. In this way, the syntactically relevant context is formulated. Besides, the sequential position is combined to reduce the impact of noise caused by imperfect grammatical analysis tools. SPNet not only significantly improves the ability to encode syntactical information and word dependencies, but also provides a tailor-made representation for different aspects in a sentence. On three public ABSC datasets, SPNet produces competitive performance compared with some existing state-of-the-art methods.

Pu Song, Wei Jiang, Fuqing Zhu, Yan Zhou, Jizhong Han, Songlin Hu

Cross-domain sentiment classification aims at transferring the knowledge of a source domain with rich annotation resources to a target domain that is scarcely labeled or even has no labels. Existing models fail to automatically and simultaneously capture three related topics: the sentiment-only topic, containing domain-independent sentiment words (called pivots in the literature); the domain-only topic, containing domain-specific words; and the function word topic, containing words such as stop words. We propose a two-stage framework for tackling this problem. The first stage consists of a topic attention network specialized in discovering the topics mentioned above. The second stage utilizes the knowledge learned in the first stage to learn a sentiment classification model that takes context into consideration. A new sentiment-domain dual-task adversarial training strategy is designed and utilized in both stages. Experiments on a real-world product review dataset show that our proposed model outperforms the state-of-the-art model.

Kwun-Ping Lai, Jackie Chun-Sing Ho, Wai Lam

### Data Augmentation for Sentiment Analysis in English – The Online Approach

This paper investigates a change of approach to textual data augmentation for sentiment classification, by switching from offline to online data modification. In other words, we move from changing the data before the training is started to using transformed samples during the training process. This allows utilizing the information about the current loss of the classifier. We try training with examples that maximize the loss, minimize the loss, or are randomly sampled. We observe that the maximizing variant performs best in most cases. We use 2 neural network architectures, 3 data augmentation methods, and test them on 4 different datasets. Our experiments indicate that the switch to the online data augmentation improves the results for recurrent neural networks in all cases and for convolutional networks in some cases. The improvement reaches 2.3% above the baseline in terms of accuracy, averaged over all datasets, and 2.25% on one of the datasets, but averaged over dataset sizes.

Michał Jungiewicz, Aleksander Smywiński-Pohl
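
The three selection strategies mentioned in this abstract (maximize the loss, minimize it, or sample randomly) can be sketched as a single selection step. The function name `select_augmented`, its parameters, and the idea of passing the current loss as a callable are assumptions for illustration, not the authors' code.

```python
import random


def select_augmented(candidates, loss_fn, mode="max", rng=None):
    """Choose one augmented variant of a sample using the current classifier loss."""
    if mode == "max":
        return max(candidates, key=loss_fn)   # hardest variant for the current model
    if mode == "min":
        return min(candidates, key=loss_fn)   # easiest variant for the current model
    if mode == "random":
        return (rng or random).choice(candidates)
    raise ValueError(f"unknown mode: {mode}")
```

Because `loss_fn` is evaluated at selection time, the chosen variant depends on the classifier's current state, which is what distinguishes the online setting from preparing augmented data before training.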

### Dendritic Computation in a Point Neuron Model

Biological neurons possess elaborate dendrites that perform elaborate computations. They are however ignored in the widely used point neuron models. Here, we present a simple addition to the commonly used leaky integrate-and-fire model that introduces the concept of a dendrite. All synapses on the dendrite have a mutual relationship. The result is a form of short term plasticity in which synapse strengths are influenced by recent activity in other synapses. This improves the ability of the neuron to recognize temporal sequences.

Alexander Vandesompele, Francis Wyffels, Joni Dambre
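
As background for this abstract, the baseline leaky integrate-and-fire model can be sketched with simple Euler integration. This shows only the standard point neuron, not the dendritic extension the paper introduces; the function name and all parameter values are illustrative assumptions.

```python
def simulate_lif(input_current, dt=1.0, tau=20.0, v_rest=0.0,
                 v_thresh=1.0, v_reset=0.0):
    """Euler-integrate a leaky integrate-and-fire neuron; return spike step indices."""
    v = v_rest
    spikes = []
    for step, current in enumerate(input_current):
        # Membrane potential leaks toward rest and is driven by the input.
        v += dt / tau * (-(v - v_rest) + current)
        if v >= v_thresh:
            spikes.append(step)
            v = v_reset  # reset after emitting a spike
    return spikes
```

With these defaults a constant input below the threshold never produces a spike, while a stronger constant input yields regular spiking.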

### Benchmarking Deep Spiking Neural Networks on Neuromorphic Hardware

With more and more event-based neuromorphic hardware systems being developed at universities and in industry, there is a growing need for assessing their performance with domain specific measures. In this work, we use the methodology of converting pre-trained non-spiking to spiking neural networks to evaluate the performance loss and measure the energy-per-inference for three neuromorphic hardware systems (BrainScaleS, Spikey, SpiNNaker) and common simulation frameworks for CPU (NEST) and CPU/GPU (GeNN). For analog hardware we further apply a re-training technique known as hardware-in-the-loop training to cope with device mismatch. This analysis is performed for five different networks, including three networks that have been found by an automated optimization with a neural architecture search framework. We demonstrate that the conversion loss is usually below one percent for digital implementations, and moderately higher for analog systems with the benefit of much lower energy-per-inference costs.

Christoph Ostrau, Jonas Homburg, Christian Klarhorst, Michael Thies, Ulrich Rückert

### Unsupervised Learning of Spatio-Temporal Receptive Fields from an Event-Based Vision Sensor

Neuromorphic vision sensors exhibit several advantages compared to conventional frame-based cameras including low latencies, high dynamic range, and low data rates. However, how efficient visual representations can be learned from the output of such sensors in an unsupervised fashion is still an open problem. Here we present a spiking neural network that learns spatio-temporal receptive fields in an unsupervised way from the output of a neuromorphic event-based vision sensor. Learning relies on the combination of spike timing-dependent plasticity with different synaptic delays, the homeostatic regulations of synaptic weights and firing thresholds, and fast inhibition among neurons to decorrelate their responses. Our network develops biologically plausible spatio-temporal receptive fields when trained on real world input and is suited for implementation on neuromorphic hardware.

Thomas Barbier, Céline Teulière, Jochen Triesch

### Spike-Train Level Unsupervised Learning Algorithm for Deep Spiking Belief Networks

A deep spiking belief network (DSBN) is built by stacking spike neural machine (SNM) modules and uses an unsupervised layer-wise pre-training method to train the network weights. However, the synaptic weights of SNMs are difficult to pre-train with a simple and effective approach for spike-train driven networks. This paper proposes a new algorithm that uses an unsupervised multi-spike learning rule to train SNMs, which can implement complex spatio-temporal pattern learning of spike trains. The spike signals first propagate in the forward direction and are then reconstructed in the reverse direction, and the synaptic weights are adjusted according to the reconstruction error. The algorithm is successfully applied to spike train patterns, and module parameters such as the neuron number and learning rate in the SNMs are analyzed. In addition, the experimental results show that DSBNs achieve low reconstruction errors.

Xianghong Lin, Pangao Du

### Modelling Neuromodulated Information Flow and Energetic Consumption at Thalamic Relay Synapses

Recent experimental and theoretical work has shown that synapses in the visual pathway balance information flow with their energetic needs, maximising not the information flow from the retina to the primary visual cortex (bits per second), but instead maximising information flow per concomitant energy consumption (bits of information transferred per number of adenosine triphosphate molecules necessary to power the corresponding synaptic and neuronal activities) [5, 10, 11]. We have previously developed a biophysical Hodgkin-Huxley-type model for thalamic relay cells, calibrated on experimental data, and that recapitulates those experimental findings [10]. Here, we introduce an improved version of that model to include neuromodulation of thalamic relay synapses’ transmission properties by serotonin. We show how significantly neuromodulation affects the output of thalamic relay cells, and discuss the implications of that mechanism in the context of energetically optimal information transfer at those synapses.

### Learning Precise Spike Timings with Eligibility Traces

Recent research in the field of spiking neural networks (SNNs) has shown that recurrent variants of SNNs, namely long short-term SNNs (LSNNs), can be trained via error gradients just as effectively as LSTMs. The underlying learning method (e-prop) is based on a formalization of eligibility traces applied to leaky integrate and fire (LIF) neurons. Here, we show that the proposed approach cannot fully unfold spike timing dependent plasticity (STDP). As a consequence, this limits in principle the inherent advantage of SNNs, that is, the potential to develop codes that rely on precise relative spike timings. We show that STDP-aware synaptic gradients naturally emerge within the eligibility equations of e-prop when derived for a slightly more complex spiking neuron model, here using the Izhikevich model as an example. We also present a simple extension of the LIF model that provides similar gradients. In a simple experiment we demonstrate that the STDP-aware LIF neurons can learn precise spike timings from an e-prop-based gradient signal.

Manuel Traub, Martin V. Butz, R. Harald Baayen, Sebastian Otte

### Meta-STDP Rule Stabilizes Synaptic Weights Under in Vivo-like Ongoing Spontaneous Activity in a Computational Model of CA1 Pyramidal Cell

It is widely accepted that brain processes related to learning and memory involve changes at the level of synapses. Synapses have the ability to change their strength depending on the stimuli, which is called activity-dependent synaptic plasticity. To date, many mathematical models describing activity-dependent synaptic plasticity have been introduced. However, the remaining question is whether these rules apply in general to the whole brain or only to individual areas or even just to individual types of cells. Here, we decided to test whether the well-known rule of Spike-Timing Dependent Plasticity (STDP) extended by metaplasticity (meta-STDP) supports long-term stability of major synaptic inputs to hippocampal CA1 pyramidal neurons. To this end, we have coupled synaptic models equipped with a previously established meta-STDP rule to a biophysically realistic computational model of the hippocampal CA1 pyramidal cell with a simplified dendritic tree. Our simulations show that the meta-STDP rule is able to keep synaptic weights stable during ongoing spontaneous input activity as it happens in the hippocampus in vivo. This is functionally advantageous as neurons should not change their weights during the ongoing activity of neural circuits in vivo. However, they should maintain their ability to display plastic changes in the case of significantly different or “meaningful” inputs. Thus, our study is the first step before we attempt to simulate different stimulation protocols which induce changes in synaptic weights in vivo.

Matúš Tomko, Peter Jedlička, Ľubica Beňušková

### Adaptive Chemotaxis for Improved Contour Tracking Using Spiking Neural Networks

In this paper we present a Spiking Neural Network (SNN) for autonomous navigation, inspired by the chemotaxis network of the worm Caenorhabditis elegans. In particular, we focus on the problem of contour tracking, wherein the bot must reach and subsequently follow a desired concentration setpoint. Past schemes that used only klinokinesis can follow the contour efficiently but take excessive time to reach the setpoint. We address this shortcoming by proposing a novel adaptive klinotaxis mechanism that builds upon a previously proposed gradient climbing circuit. We demonstrate how our klinotaxis circuit can autonomously be configured to perform gradient ascent or gradient descent, and subsequently be disabled to seamlessly integrate with the aforementioned klinokinesis circuit. We also incorporate speed regulation (orthokinesis) to further improve contour tracking performance. Thus, for the first time, we present a model that successfully integrates klinokinesis, klinotaxis and orthokinesis. We demonstrate via contour tracking simulations that our proposed scheme achieves a 2.4x reduction in the time to reach the setpoint, along with a simultaneous 8.7x reduction in average deviation from the setpoint.

Shashwat Shukla, Rohan Pathak, Vivek Saraswat, Udayan Ganguly

### Mental Imagery-Driven Neural Network to Enhance Representation for Implicit Discourse Relation Recognition

Implicit discourse relation recognition is an important sub-task in discourse parsing, which needs to infer the relation based on proper discourse comprehension. Recent studies on cognitive learning strategies have suggested that using a mental imagery strategy fosters text comprehension, which could effectively improve learners’ reading ability. Therefore, we propose a novel Mental Imagery-driven Neural Network (MINN) to enhance representation for implicit discourse relation recognition. It employs the multi-granularity imagery vectors generated by the arguments to capture the deeper semantic information of discourse at different scales. Specifically, we 1) encode the different granularities of arguments (i.e., phrases and sentences) and generate the corresponding imagery vectors as mentally imagined images of the text content; 2) fuse the argument representations and imagery vectors as sequence representations; 3) further adopt self-attention to mine the important interactions between the sequence representations to infer the discourse relations. Extensive experimental results on the Penn Discourse TreeBank (PDTB) show that our model achieves competitive results against several state-of-the-art systems.

Jian Wang, Ruifang He, Fengyu Guo, Yugui Han

### Adaptive Convolution Kernel for Text Classification via Multi-channel Representations

Although existing text classification algorithms with LSTM-CNN-like structures have achieved great success, these models still have deficiencies in text feature representation and extraction. Most text representation methods based on LSTM-like models adopt a single-channel form, and the size of the convolution kernel is usually fixed in further feature extraction by CNN. Hence, in this study, we propose an Adaptive Convolutional Kernel via Multi-Channel Representation (ACK-MCR) model to solve the above two problems. The multi-channel text representation is formed by two different Bi-LSTM networks, extracting time-series features from forward and backward directions to retain more semantic information. Furthermore, after the CNNs, a multi-scale feature attention is used to adaptively select multi-scale features for classification. Extensive experiments show that our model obtains competitive performance against state-of-the-art baselines on six benchmark datasets.

Cheng Wang, Xiaoyan Fan

### Text Generation in Discrete Space

Variational AutoEncoders (VAEs) are applied to many generation tasks but suffer from the posterior collapse issue. Vector Quantization (VQ) has recently been employed in VAE models for image generation, where it avoids the posterior collapse problem and shows potential for more generation tasks. In this paper, the VQ method is applied to VAEs for text generation. We carefully design the model architecture to mitigate the index collapse issue introduced by the VQ process. Experiments show that our text generation model can achieve better reconstruction and generation performance than other VAE-based approaches.

Ting Hu, Christoph Meinel
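
The quantization step and the index collapse issue mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' architecture: the function names `quantize` and `codebook_usage`, the toy codebook, and the use of squared Euclidean distance are all assumptions made for illustration. Each latent vector is replaced by the index of its nearest codebook entry, and index collapse shows up as a low fraction of codebook entries ever being selected.

```python
def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry
    (squared Euclidean distance; a hypothetical choice for this sketch)."""
    indices = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def codebook_usage(indices, codebook_size):
    """Fraction of codebook entries selected at least once.
    A value far below 1.0 is a symptom of index collapse."""
    return len(set(indices)) / codebook_size


# Toy data: three codebook entries, three latent vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
latents = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.1]]

indices = quantize(latents, codebook)
usage = codebook_usage(indices, len(codebook))
print(indices, usage)
```

In this toy case the third codebook entry is never selected, so the usage fraction is below 1.0.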

### Short Text Processing for Analyzing User Portraits: A Dynamic Combination

The rich digital footprint left by users on the Internet has led to extensive research on all aspects of Internet users. Among them, topic modeling is used to analyze text information posted by users on websites to generate user portraits. To deal with the severe sparsity problems that arise when extracting topics from short texts with traditional text modeling methods such as Latent Dirichlet Allocation (LDA), researchers usually aggregate all the texts published by each user into a pseudo-document. However, such pseudo-documents contain many irrelevant topics, which is not consistent with the documents published by people in reality. To that end, this paper introduces the LDA-RCC model for dynamic text modeling based on the actual text, which is used to analyze the interests of forum users and build user portraits. Specifically, this combined model can effectively process short texts through the iterative combination of the text modeling method LDA and the robust continuous clustering method (RCC). Meanwhile, this model can automatically determine the number of topics based on the user’s data. In this way, by processing the clustering results, we can obtain the preferences of each user for deep user analysis. Extensive experimental results show that the LDA-RCC model obtains good results and is superior to both traditional text modeling methods and short text clustering benchmark methods.

Zhengping Ding, Chen Yan, Chunli Liu, Jianrui Ji, Yezheng Liu

### A Hierarchical Fine-Tuning Approach Based on Joint Embedding of Words and Parent Categories for Hierarchical Multi-label Text Classification

Many important classification problems in the real world consist of a large number of categories. Hierarchical multi-label text classification (HMTC) with higher accuracy over large sets of closely related categories organized in a hierarchical structure or taxonomy has become a challenging problem. In this paper, we present a hierarchical fine-tuning deep learning approach for HMTC, where a joint embedding of words and their parent categories is generated by leveraging the hierarchical relations in the hierarchical structure of categories and the textual data. A fine-tuning technique is applied to the Ordered Neural LSTM (ONLSTM) neural network such that the text classification results in the upper levels are able to help the classification in the lower ones. Extensive experiments were conducted on two benchmark datasets, and the results show that the method proposed in this paper outperforms the state-of-the-art hierarchical and flat multi-label text classification approaches, in particular in terms of reducing computational costs while achieving superior performance.

Yinglong Ma, Jingpeng Zhao, Beihong Jin

### Boosting Tricks for Word Mover’s Distance

Word embeddings have opened a new path in creating novel approaches for addressing traditional problems in the natural language processing (NLP) domain. However, using word embeddings to compare text documents remains a relatively unexplored topic, with Word Mover’s Distance (WMD) being the prominent tool used so far. In this paper, we present a variety of tools that can further improve the computation of distances between documents based on WMD. We demonstrate that alternative stopwords, cross document-topic comparison, deep contextualized word vectors and convex metric learning constitute powerful tools that can boost WMD.

Konstantinos Skianis, Fragkiskos D. Malliaros, Nikolaos Tziortziotis, Michalis Vazirgiannis
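
To make the role of stopwords in a WMD-style distance concrete, the following is a minimal sketch of the one-sided relaxed lower bound on WMD, in which every word in one document sends all of its mass to its nearest word in the other document. This is a simplification swapped in for illustration, not the full optimal-transport WMD and not the boosted variants of the paper. The function name `relaxed_wmd`, the `stopwords` parameter, and the toy two-dimensional embeddings are assumptions.

```python
import math


def relaxed_wmd(doc_a, doc_b, embeddings, stopwords=frozenset()):
    """One-sided relaxed WMD lower bound: each remaining word in doc_a moves
    all of its (uniform) mass to its nearest word in doc_b."""
    words_a = [w for w in doc_a if w not in stopwords]
    words_b = [w for w in doc_b if w not in stopwords]
    if not words_a or not words_b:
        raise ValueError("both documents need at least one non-stopword")
    weight = 1.0 / len(words_a)  # uniform per-token weight
    total = 0.0
    for wa in words_a:
        nearest = min(math.dist(embeddings[wa], embeddings[wb]) for wb in words_b)
        total += weight * nearest
    return total


# Toy embeddings (hypothetical values, for illustration only).
embeddings = {
    "king": [1.0, 0.0],
    "queen": [1.0, 1.0],
    "monarch": [1.0, 0.5],
    "the": [9.0, 9.0],
}

with_stop = relaxed_wmd(["the", "king", "queen"], ["monarch"], embeddings)
without_stop = relaxed_wmd(["the", "king", "queen"], ["monarch"], embeddings,
                           stopwords={"the"})
print(with_stop, without_stop)
```

Removing the stopword lowers the distance in this toy case, because the uninformative token no longer contributes a large transport cost.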

### Embedding Compression with Right Triangle Similarity Transformations

Word embedding technology has promoted the development of many NLP tasks. However, these embeddings often require a lot of storage, memory, and computation, resulting in low efficiency in NLP tasks. To address the problem, this paper proposes a new method for compressing word embeddings. We sample a set of orthogonal vector pairs from the word embedding matrix in advance. Then these vector pairs are fed into a weight-sharing neural network (i.e., a Siamese network), and low-dimensional forms of the vector pairs are obtained. We obtain two vector triplets by adding to each vector pair the result of subtracting its two vectors; these triplets can be regarded as two triangles. The neural network is trained by minimizing the mean square error of the three internal angles between the two triangles. Finally, we extract its shared body as a compressor. The essence of this method is the right triangle similarity transformation (RTST), which is a combination of manifold learning and neural networks. It is distinguishable from other methods. The orthogonality in right triangles is beneficial to the compressed space construction. RTST also maintains the relative order of each edge (vector norm) in triangles. Experimental results on semantic similarity tasks reveal that when the vector size is reduced to 64% of the original, the performance is improved by 1.8%. When the compression rate reaches 2.7%, the performance drop is only 1.2%. Detailed analysis and ablation study further validate the rationality and the robustness of the RTST method.

Haohao Song, Dongsheng Zou, Lei Hu, Jieying Yuan
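
The angle-matching objective described in the abstract above can be sketched as follows. This is only one reading of the abstract, not the authors' implementation: the triangle is assumed to have vertices at the origin, at `a`, and at `b` (so its edges are `a`, `b`, and `a - b`), and the helper names `angle`, `triangle_angles`, and `angle_mse` are hypothetical. No network is trained here; the sketch only computes the quantity that would be minimized.

```python
import math


def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    # Clamp to [-1, 1] to guard against floating-point drift.
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))


def triangle_angles(a, b):
    """Internal angles of the triangle with vertices origin, a, and b."""
    a_minus_b = [x - y for x, y in zip(a, b)]
    b_minus_a = [-d for d in a_minus_b]
    at_origin = angle(a, b)
    at_a = angle([-x for x in a], b_minus_a)  # edges leaving vertex a
    at_b = angle([-y for y in b], a_minus_b)  # edges leaving vertex b
    return [at_origin, at_a, at_b]


def angle_mse(a, b, a_low, b_low):
    """Mean squared error between the angles of the original triangle
    and those of its low-dimensional counterpart."""
    original = triangle_angles(a, b)
    compressed = triangle_angles(a_low, b_low)
    return sum((p - q) ** 2 for p, q in zip(original, compressed)) / 3


# An orthogonal pair in 3-D and a candidate 2-D compression of it.
a, b = [3.0, 0.0, 0.0], [0.0, 4.0, 0.0]
angles = triangle_angles(a, b)
loss = angle_mse(a, b, [6.0, 0.0], [0.0, 8.0])
print(angles, loss)
```

Because the candidate compression is a scaled copy of the original right triangle, the two triangles are similar and the loss is zero up to rounding; angles, unlike edge lengths, are unaffected by scale.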

### Neural Networks for Detecting Irrelevant Questions During Visual Question Answering

Mengdi Li, Cornelius Weber, Stefan Wermter

### F-Measure Optimisation and Label Regularisation for Energy-Based Neural Dialogue State Tracking Models

In recent years, many multi-label classification methods have exploited label dependencies to improve the performance of classification tasks in various domains, hence casting the tasks as structured prediction problems. We argue that multi-label predictions do not always satisfy domain constraint restrictions. For example, when the dialogue state tracking task in task-oriented dialogue domains is solved with multi-label classification approaches, slot-value constraint rules should be enforced following real conversation scenarios. To address these issues, we propose an energy-based neural model to solve the dialogue state tracking task as a structured prediction problem. Furthermore, we propose two improvements over previous methods with respect to dialogue slot-value constraint rules: (i) redefining the estimation conditions for the energy network; (ii) regularising label predictions following the dialogue slot-value constraint rules. In our results, we find that our extended energy-based neural dialogue state tracker yields better overall performance in terms of prediction accuracy, and also behaves more naturally with respect to the conversational rules.

Anh Duong Trinh, Robert J. Ross, John D. Kelleher

### Unsupervised Change Detection Using Joint Autoencoders for Age-Related Macular Degeneration Progression

Age-Related Macular Degeneration (ARMD) is an eye disease that has been an important research field for two decades now. Researchers have been mostly interested in studying the evolution of lesions that slowly cause patients to go blind. Many techniques, ranging from manual annotation to mathematical models of the disease evolution, bring interesting leads to explore. However, artificial intelligence for ARMD image analysis has become one of the main research focuses for studying the progression of the disease, as accurate manual annotation of its evolution has proved difficult using traditional methods even for experienced doctors. In this context, we propose a neural network architecture for change detection in eye fundus images to highlight the evolution of the disease. The proposed method is fully unsupervised, and is based on fully convolutional joint autoencoders. Our algorithm has been applied to several pairs of images from eye fundus image time series of ARMD patients, and has been shown to be more effective than most state-of-the-art change detection methods, including non-neural network based algorithms that are usually used to follow the evolution of the disease.

Guillaume Dupont, Ekaterina Kalinicheva, Jérémie Sublime, Florence Rossant, Michel Pâques

### A Fast Algorithm to Find Best Matching Units in Self-Organizing Maps

Self-Organizing Maps (SOM) are well-known unsupervised neural networks able to perform vector quantization while mapping an underlying regular neighbourhood structure onto the codebook. They are used in a wide range of applications. As with most properly trained neural network models, increasing the number of neurons in a SOM leads to better results or new emerging properties. Therefore, highly efficient algorithms for learning and evaluation are key to improving the performance of such models. In this paper, we propose a faster alternative to compute the Winner Takes All component of SOM that scales better with a large number of neurons. We present our algorithm to find the so-called best matching unit (BMU) in a SOM, and we theoretically analyze its computational complexity. Statistical results on various synthetic and real-world datasets confirm this analysis and show an even more significant improvement in computing time with a minimal degradation of performance. With our method, we explore a new approach for optimizing SOM that can be combined with other optimization methods commonly used in these models for an even faster computation in both learning and recall phases.

Yann Bernard, Nicolas Hueber, Bernard Girau
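
For reference, the exhaustive Winner Takes All search that a faster BMU algorithm would be compared against can be sketched as below. This is the standard brute-force baseline, not the algorithm proposed in the paper; the function name `best_matching_unit` and the flat-list codebook layout are assumptions. Its cost grows linearly with the number of neurons, which is the scaling the abstract aims to improve.

```python
def best_matching_unit(codebook, x):
    """Exhaustive BMU search: return the index of the codebook vector closest
    to input x (squared Euclidean distance). Cost is linear in the map size."""
    best_index, best_dist = -1, float("inf")
    for i, w in enumerate(codebook):
        d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index


# A 2x2 map stored row by row as a flat list of weight vectors.
codebook = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
bmu = best_matching_unit(codebook, [0.9, 0.2])
print(bmu)
```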

### Tumor Characterization Using Unsupervised Learning of Mathematical Relations Within Breast Cancer Data

Despite the variety of imaging, genetic and histopathological data used to assess tumors, there is still an unmet need for patient-specific tumor growth profile extraction and tumor volume prediction, for use in surgery planning. Models of tumor growth predict tumor size and require tumor biology-dependent parametrization, which hardly generalizes to cope with tumor variability among patients. In addition, the datasets are limited in size, owing to the restricted or single-time measurements. In this work, we address the shortcomings that incomplete biological specifications, the inter-patient variability of tumors, and the limited size of the data bring to mechanistic tumor growth models. We introduce a machine learning model that alleviates these shortcomings and is capable of characterizing a tumor’s growth pattern, phenotypical transitions, and volume. The model learns, without supervision and from different types of breast cancer data, the underlying mathematical relations describing tumor growth curves more accurately than three state-of-the-art models. Experiments performed on publicly available clinical breast cancer datasets demonstrate the versatility of the approach among breast cancer types. Moreover, the model can also, without modification, learn the mathematical relations among, for instance, histopathological and morphological parameters of the tumor and, combined with the growth curve, capture the (phenotypical) growth transitions of the tumor from a small amount of data. Finally, given the tumor growth curve and its transitions, our model can learn the relation among tumor proliferation-to-apoptosis ratio, tumor radius, and tumor nutrient diffusion length, used to estimate tumor volume. Such a quantity can be readily incorporated within current clinical practice, for surgery planning.

Cristian Axenie, Daria Kurz

### Balanced SAM-kNN: Online Learning with Heterogeneous Drift and Imbalanced Data

Machine learning techniques are now often applied in real-world scenarios where learning signals are provided as a stream of data points, and models need to be adapted online according to the current information. A severe problem in such settings is that the underlying data distribution might change over time, so that concept drift or changes in the feature characteristics have to be dealt with. In addition, data are often imbalanced because training signals for rare classes are particularly sparse. In recent years, a number of learning technologies have been proposed that can reliably learn in the presence of drift, whereby non-parametric approaches such as the recent model SAM-kNN [10] can deal particularly well with heterogeneous or previously unknown types of drift. Yet these methods share the deficiencies of the underlying vanilla kNN classifier when dealing with imbalanced classes. In this contribution, we propose intuitive extensions of SAM-kNN, which incorporate successful balancing techniques for kNN, namely SMOTE-sampling [1] and kENN [9], respectively, into the online learning scenario. In addition, we propose a new method, Informed Downsampling, for solving class imbalance in non-stationary settings with underlying drift, and demonstrate its superiority in a number of benchmarks.

Valerie Vaquet, Barbara Hammer
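
The SMOTE-sampling step referred to in the abstract above can be sketched in its basic offline form: a synthetic minority point is drawn on the line segment between a minority sample and one of its nearest minority neighbours. This is not the online SAM-kNN extension from the paper; the function name `smote_sample`, the parameter `k`, and the toy data are assumptions.

```python
import math
import random


def smote_sample(minority, k, rng):
    """Create one synthetic point by interpolating between a randomly chosen
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    x = rng.choice(minority)
    others = [p for p in minority if p is not x]
    neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
    n = rng.choice(neighbours)
    lam = rng.random()  # interpolation factor in [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, n)]


# Toy minority class with three samples (hypothetical values).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
synthetic = [smote_sample(minority, k=2, rng=rng) for _ in range(5)]
print(synthetic)
```

Every synthetic point lies on a segment between two existing minority samples, so it stays inside their bounding box.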

### A Rigorous Link Between Self-Organizing Maps and Gaussian Mixture Models

This work presents a mathematical treatment of the relation between Self-Organizing Maps (SOMs) and Gaussian Mixture Models (GMMs). We show that energy-based SOM models can be interpreted as performing gradient descent, minimizing an approximation to the GMM log-likelihood that is particularly valid for high data dimensionalities. The SOM-like decrease of the neighborhood radius can be understood as an annealing procedure ensuring that gradient descent does not get stuck in undesirable local minima. This link allows SOMs to be treated as generative probabilistic models, giving a formal justification for using SOMs, e.g., to detect outliers, or for sampling.

Alexander Gepperth, Benedikt Pfülb

### Collaborative Clustering Through Optimal Transport

Significant results have been achieved recently by exchanging information between multiple learners for clustering tasks. However, these approaches still suffer from a few issues regarding the choice of the information to trade, the stopping criteria and the trade-off between the information extracted from the data and the information exchanged by the models. In this paper, we aim to address these issues through a novel approach based on optimal transport theory. More specifically, the objective function is based on the Wasserstein metric, with a bidirectional transport of the information. This formulation leads to high stability and an increase in quality. It also allows a stopping criterion to be learned. Extensive experiments were conducted on multiple data sets to evaluate the proposed method, and they confirm the advantages of this approach.

Fatima Ezzahraa Ben Bouazza, Younès Bennani, Guénaël Cabanes, Abdelfettah Touzani
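
As a small illustration of the Wasserstein metric on which the objective above is based, the sketch below computes the 1-Wasserstein distance between two one-dimensional samples of equal size, where the optimal transport plan simply matches sorted values. This special case is chosen for brevity and is not the collaborative clustering objective of the paper; the function name `wasserstein_1d` and the toy samples are assumptions.

```python
def wasserstein_1d(sample_a, sample_b):
    """1-Wasserstein distance between two equal-size 1-D empirical
    distributions: mean absolute difference of the sorted samples."""
    if len(sample_a) != len(sample_b):
        raise ValueError("this sketch assumes samples of equal size")
    pairs = zip(sorted(sample_a), sorted(sample_b))
    return sum(abs(a - b) for a, b in pairs) / len(sample_a)


# Two toy samples; the second is the first shifted by 2.
distance = wasserstein_1d([0.0, 1.0, 3.0], [5.0, 2.0, 3.0])
print(distance)
```

Shifting every point by the same amount moves the whole distribution by that amount, so the distance equals the shift.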