
2018 | Book | 1st Edition

Artificial Neural Networks and Machine Learning – ICANN 2018

27th International Conference on Artificial Neural Networks, Rhodes, Greece, October 4-7, 2018, Proceedings, Part III

Edited by: Věra Kůrková, Prof. Yannis Manolopoulos, Barbara Hammer, Lazaros Iliadis, Ilias Maglogiannis

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science

About this book

This three-volume set LNCS 11139-11141 constitutes the refereed proceedings of the 27th International Conference on Artificial Neural Networks, ICANN 2018, held in Rhodes, Greece, in October 2018.

The papers presented in these volumes were carefully reviewed and selected from a total of 360 submissions. They are related to the following thematic topics: AI and Bioinformatics, Bayesian and Echo State Networks, Brain Inspired Computing, Chaotic Complex Models, Clustering, Mining, Exploratory Analysis, Coding Architectures, Complex Firing Patterns, Convolutional Neural Networks, Deep Learning (DL), DL in Real Time Systems, DL and Big Data Analytics, DL and Big Data, DL and Forensics, DL and Cybersecurity, DL and Social Networks, Evolving Systems – Optimization, Extreme Learning Machines, From Neurons to Neuromorphism, From Sensation to Perception, From Single Neurons to Networks, Fuzzy Modeling, Hierarchical ANN, Inference and Recognition, Information and Optimization, Interacting with The Brain, Machine Learning (ML), ML for Bio Medical systems, ML and Video-Image Processing, ML and Forensics, ML and Cybersecurity, ML and Social Media, ML in Engineering, Movement and Motion Detection, Multilayer Perceptrons and Kernel Networks, Natural Language, Object and Face Recognition, Recurrent Neural Networks and Reservoir Computing, Reinforcement Learning, Reservoir Computing, Self-Organizing Maps, Spiking Dynamics/Spiking ANN, Support Vector Machines, Swarm Intelligence and Decision-Making, Text Mining, Theoretical Neural Computation, Time Series and Forecasting, Training and Learning.

Table of Contents

Frontmatter

Recurrent ANN

Frontmatter
Policy Learning Using SPSA

We analyze the use of simultaneous perturbation stochastic approximation (SPSA), a stochastic optimization technique, for solving reinforcement learning problems. In particular, we consider settings of partial observability and leverage the short-term memory capabilities of echo state networks (ESNs) to learn parameterized control policies. Using SPSA, we propose three different variants to adapt the weight matrices of an ESN to the task at hand. Experimental results on classic control problems with both discrete and continuous action spaces reveal that ESNs trained using SPSA approaches outperform conventional ESNs trained using temporal difference and policy gradient methods.

R. Ramamurthy, C. Bauckhage, R. Sifa, S. Wrobel
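
As a concrete reminder of how SPSA estimates gradients from forward evaluations only, here is a minimal NumPy sketch on a toy quadratic loss; the step sizes, the loss, and the update rule are illustrative assumptions and do not reproduce the paper's three ESN-specific variants.

```python
import numpy as np

def spsa_step(theta, loss, a=0.05, c=0.1, rng=np.random.default_rng(0)):
    """One SPSA update: estimate the gradient from two forward evaluations only."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)     # Rademacher perturbation
    g_hat = (loss(theta + c * delta) - loss(theta - c * delta)) / (2 * c * delta)
    return theta - a * g_hat                              # gradient-descent style step

# Toy usage: a quadratic loss stands in for the (negative) return of an ESN policy.
theta = np.ones(4)
for _ in range(300):
    theta = spsa_step(theta, lambda w: float(np.sum((w - 0.3) ** 2)))
print(theta)  # each coordinate should approach 0.3
```
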
Simple Recurrent Neural Networks for Support Vector Machine Training

We show how to implement a simple procedure for support vector machine training as a recurrent neural network. Invoking the fact that support vector machines can be trained using Frank-Wolfe optimization which in turn can be seen as a form of reservoir computing, we obtain a model that is of simpler structure and can be implemented more easily than those proposed in previous contributions.

Rafet Sifa, Daniel Paurat, Daniel Trabold, Christian Bauckhage
RNN-SURV: A Deep Recurrent Model for Survival Analysis

Current medical practice is driven by clinical guidelines which are designed for the “average” patient. Deep learning is enabling medicine to become personalized to the patient at hand. In this paper we present a new recurrent neural network model for personalized survival analysis called rnn-surv. Our model is able to exploit censored data to compute both the risk score and the survival function of each patient. At each time step, the network takes as input the features characterizing the patient and the identifier of the time step, creates an embedding, and outputs the value of the survival function in that time step. Finally, the values of the survival function are linearly combined to compute the unique risk score. Thanks to the model structure and the training designed to exploit two loss functions, our model achieves a better concordance index (C-index) than state-of-the-art approaches.

Eleonora Giunchiglia, Anton Nemchenko, Mihaela van der Schaar
Do Capsule Networks Solve the Problem of Rotation Invariance for Traffic Sign Classification?

Detecting and classifying traffic signs is a very important step towards future autonomous driving. In contrast to earlier approaches with handcrafted features, modern neural networks learn the representation of the classes themselves. Current convolutional neural networks achieve very high accuracy when classifying images, but their robustness to shift and rotation remains a big problem. In this work an evaluation of a new technique with Capsule Networks is performed and the results are compared to a standard Convolutional Neural Network and a Spatial Transformer Network. Moreover, various methods for augmenting the training data are evaluated. This comparison shows the big advantages of the Capsule Networks but also their restrictions. They give a big boost in solving the problems mentioned above, but their computational complexity is much higher than that of convolutional neural networks.

Jan Kronenberger, Anselm Haselhoff
Balanced and Deterministic Weight-Sharing Helps Network Performance

Weight-sharing plays a significant role in the success of many deep neural networks, by increasing memory efficiency and incorporating useful inductive priors about the problem into the network. But understanding how weight-sharing can be used effectively in general is a topic that has not been studied extensively. Chen et al. [1] proposed HashedNets, which augments a multi-layer perceptron with a hash table, as a method for neural network compression. We generalize this method into a framework (ArbNets) that allows for efficient arbitrary weight-sharing, and use it to study the role of weight-sharing in neural networks. We show that common neural networks can be expressed as ArbNets with different hash functions. We also present two novel hash functions, the Dirichlet hash and the Neighborhood hash, and use them to demonstrate experimentally that balanced and deterministic weight-sharing helps with the performance of a neural network.

Oscar Chang, Hod Lipson
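
A minimal sketch of the hash-based weight sharing that HashedNets and the ArbNets framework build on: every position of a virtual weight matrix is mapped to a slot in a small shared parameter bank. The "hash" below is just a fixed random index table standing in for a real hash function, and all sizes are illustrative.

```python
import numpy as np

class HashedLinear:
    """Linear layer whose virtual (n_out x n_in) weight matrix is backed by a small
    shared bank of parameters; each matrix position is mapped to a bank slot."""
    def __init__(self, n_in, n_out, n_shared, seed=0):
        rng = np.random.default_rng(seed)
        self.bank = rng.normal(scale=0.1, size=n_shared)            # real parameters
        # Stand-in for a hash function: a fixed random position-to-slot mapping.
        self.index = rng.integers(0, n_shared, size=(n_out, n_in))

    def forward(self, x):
        W = self.bank[self.index]     # expand the bank into the virtual weight matrix
        return W @ x

layer = HashedLinear(n_in=8, n_out=4, n_shared=6)
print(layer.forward(np.ones(8)))      # 4 outputs computed from only 6 real weights
```
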
Neural Networks with Block Diagonal Inner Product Layers

We consider a modified version of the fully connected layer we call a block diagonal inner product layer. These modified layers have weight matrices that are block diagonal, turning a single fully connected layer into a set of densely connected neuron groups. This idea is a natural extension of group, or depthwise separable, convolutional layers applied to the fully connected layers. Block diagonal inner product layers can be achieved by either initializing a purely block diagonal weight matrix or by iteratively pruning off diagonal block entries. This method condenses network storage and speeds up the run time without significant adverse effect on the testing accuracy.

Amy Nesky, Quentin F. Stout
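
The block diagonal inner product layer itself is easy to state in code: the dense weight matrix is replaced by a block-diagonal one, so each group of inputs connects densely only to its own group of outputs. The block shapes below are illustrative, and this sketch shows only the initialization-based variant, not iterative pruning.

```python
import numpy as np
from scipy.linalg import block_diag

def block_diagonal_layer(x, blocks):
    """Inner product layer with a block diagonal weight matrix: the input is split
    into groups, and each group connects densely only to its own output group."""
    W = block_diag(*blocks)             # zeros everywhere except the diagonal blocks
    return W @ x

rng = np.random.default_rng(0)
blocks = [rng.normal(size=(3, 4)) for _ in range(2)]   # two dense 3x4 blocks
x = rng.normal(size=8)                                  # input dimension 2 * 4
print(block_diagonal_layer(x, blocks).shape)            # (6,)
```
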
Training Neural Networks Using Predictor-Corrector Gradient Descent

We improve the training time of deep feedforward neural networks using a modified version of gradient descent we call Predictor-Corrector Gradient Descent (PCGD). PCGD uses predictor-corrector inspired techniques to enhance gradient descent. This method uses a sparse history of network parameter values to make periodic predictions of future parameter values in an effort to skip unnecessary training iterations. This method can cut the number of training epochs needed for a network to reach a particular testing accuracy by nearly one half when compared to stochastic gradient descent (SGD). PCGD can also outperform, with some trade-offs, Nesterov’s Accelerated Gradient (NAG).

Amy Nesky, Quentin F. Stout
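
A loose sketch of the predictor-corrector idea on a toy problem: plain gradient steps (corrector) interleaved with periodic linear extrapolation from a sparse history of parameter snapshots (predictor), intended to skip some iterations. The extrapolation rule, schedule, and constants here are assumptions, not the paper's exact PCGD prescription.

```python
import numpy as np

def pcgd(theta0, grad, lr=0.1, steps=300, snap_every=20, jump=5.0):
    """Plain gradient steps plus periodic extrapolation from sparse snapshots."""
    theta, history = theta0.astype(float).copy(), []
    for t in range(steps):
        theta = theta - lr * grad(theta)              # corrector: ordinary step
        if t % snap_every == 0:
            history.append(theta.copy())
            if len(history) >= 2:
                # Predictor: extrapolate along the recent trajectory.
                theta = history[-1] + jump * (history[-1] - history[-2]) / snap_every
    return theta

# Toy quadratic with minimum at 1.0; constants are illustrative only.
print(pcgd(np.array([5.0, -3.0]), grad=lambda w: 2.0 * (w - 1.0)))
```
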
Investigating the Role of Astrocyte Units in a Feedforward Neural Network

Current research in neuroscience has begun to shift perspective from neurons as sole information processors to including astrocytes as equal and cooperating units in this function. Recent evidence sheds new light on astrocytes and presents them as important regulators of neuronal activity and synaptic plasticity. In this paper, we present a multi-layer perceptron (MLP) with artificial astrocyte units which listen to and regulate hidden neurons based on their activity. We test the behavior and performance of this bio-inspired model on two classification tasks, the N-parity problem and the two-spirals problem, and show that the proposed models outperform the standard MLP. Interestingly, we have also discovered multiple regimes of astrocyte activity depending on the complexity of the problem.

Peter Gergeľ, Igor Farkaš
Interactive Area Topics Extraction with Policy Gradient

Extracting representative topics and improving the extraction performance is rather challenging. In this work, we formulate a novel problem, called Interactive Area Topics Extraction, and propose a learning interactive topics extraction (LITE) model which regards this problem as a sequential decision-making process and constructs an end-to-end framework that exploits interaction with users. In particular, we use a recurrent neural network (RNN) decoder to address the problem and a policy gradient method to tune the model parameters considering user feedback. Experimental results show the effectiveness of the proposed framework.

Jingfei Han, Wenge Rong, Fang Zhang, Yutao Zhang, Jie Tang, Zhang Xiong
Implementing Neural Turing Machines

Neural Turing Machines (NTMs) are an instance of Memory Augmented Neural Networks, a new class of recurrent neural networks which decouple computation from memory by introducing an external memory unit. NTMs have demonstrated superior performance over Long Short-Term Memory Cells in several sequence learning tasks. A number of open source implementations of NTMs exist but are unstable during training and/or fail to replicate the reported performance of NTMs. This paper presents the details of our successful implementation of a NTM. Our implementation learns to solve three sequential learning tasks from the original NTM paper. We find that the choice of memory contents initialization scheme is crucial in successfully implementing a NTM. Networks with memory contents initialized to small constant values converge on average 2 times faster than the next best memory contents initialization scheme.

Mark Collier, Joeran Beel
A RNN-Based Multi-factors Model for Repeat Consumption Prediction

Consumption is a common activity in people’s daily life, and some reports show that repeat consumption even accounts for a greater portion of people’s observed activities than novelty-seeking consumption. Therefore, modeling repeat consumption is very important for understanding human behavior. In this paper, we propose a multi-factors RNN (MF-RNN) model to predict users’ repeat consumption behavior. We analyse factors which can influence customers’ daily repeat consumption and introduce these factors into the MF-RNN model to predict users’ repeat consumption behavior. An empirical study on real-world data sets shows encouraging results for our approach. On the real-world dataset, the MF-RNN achieves good prediction performance, better than the Most Frequent, HMM, Recency, DYRC and LSTM methods. We compare the effect of different factors on customers’ repeat consumption behavior, and find that the MF-RNN performs better than a non-factor RNN. Besides, we analyze the differences in consumption behaviors between different cities and different regions in China.

Zengwei Zheng, Yanzhen Zhou, Lin Sun, Jianping Cai
Practical Fractional-Order Neuron Dynamics for Reservoir Computing

This paper proposes a practical reservoir computing approach with fractional-order leaky integrator neurons, which yield longer memory capacity than normal leaky integrators. In general, the fractional-order derivative needs all memories leading from the initial state to the current state. Although this feature is useful from the viewpoint of memory capacity, keeping all memories is intractable, in particular for reservoir computing with many neurons. A reasonable approximation to the fractional-order neuron dynamics is therefore introduced, thereby deriving a model that exponentially decays past memories before a threshold. This derivation is regarded as a natural extension of reservoir computing with leaky integrators, which have been used most commonly. The proposed method is compared with reservoir computing methods with normal neurons and leaky integrator neurons by solving four kinds of regression and classification problems with time-series data. As a result, the proposed method shows superior results in all of the problems.

Taisuke Kobayashi
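
As a rough illustration of the baseline this paper extends, here is a minimal leaky-integrator echo state reservoir in NumPy. The reservoir size, leak rate, and spectral-radius scaling are illustrative; the fractional-order memory weighting of the paper is only noted in the comments, not implemented.

```python
import numpy as np

def leaky_esn_states(inputs, n_res=50, leak=0.3, seed=0):
    """Baseline leaky-integrator reservoir: x(t+1) = (1-a) x(t) + a tanh(Win u + W x).
    The fractional-order neurons in the paper generalize this single-step leak by
    weighting a (truncated, exponentially decayed) history of past states."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, size=(n_res, inputs.shape[1]))
    W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # keep spectral radius below 1
    x, states = np.zeros(n_res), []
    for u in inputs:
        x = (1 - leak) * x + leak * np.tanh(W_in @ u + W @ x)
        states.append(x.copy())
    return np.array(states)   # readout weights would be fit on these states

states = leaky_esn_states(np.sin(np.linspace(0, 8, 200))[:, None])
print(states.shape)   # (200, 50)
```
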
An Unsupervised Character-Aware Neural Approach to Word and Context Representation Learning

In the last few years, neural networks have been intensively used to develop meaningful distributed representations of words and contexts around them. When these representations, also known as “embeddings”, are learned from unsupervised large corpora, they can be transferred to different tasks with positive effects in terms of performances, especially when only a few supervisions are available. In this work, we further extend this concept, and we present an unsupervised neural architecture that jointly learns word and context embeddings, processing words as sequences of characters. This allows our model to spot the regularities that are due to the word morphology, and to avoid the need of a fixed-sized input vocabulary of words. We show that we can learn compact encoders that, despite the relatively small number of parameters, reach high-level performances in downstream tasks, comparing them with related state-of-the-art approaches or with fully supervised methods.

Giuseppe Marra, Andrea Zugarini, Stefano Melacci, Marco Maggini
Towards End-to-End Raw Audio Music Synthesis

In this paper, we address the problem of automated music synthesis using deep neural networks and ask whether neural networks are capable of realizing timing, pitch accuracy and pattern generalization for automated music generation when processing raw audio data. To this end, we present a proof of concept and build a recurrent neural network architecture capable of generalizing appropriate musical raw audio tracks.

Manfred Eppe, Tayfun Alpay, Stefan Wermter
Real-Time Hand Prosthesis Biomimetic Movement Based on Electromyography Sensory Signals Treatment and Sensors Fusion

The human hand is a very sophisticated and useful instrument, essential for all types of tasks, from delicate, high-precision manipulations to tasks that require a lot of force. For a long time researchers have been studying the biomechanics of the human hand in order to reproduce it in robotic hands, to be used as prostheses replacing lost limbs in humans or as hands for robots. In this study, we present the implementation (electronics project, acquisition, treatment, processing and control) of different sensors in the control of prostheses. The sensors studied and implemented are: inertial, electromyography (EMG), force and slip. The tests showed reasonable results, with sliding and dropping of some objects. These sensors will be used in a more complex system that will approach the fusion of sensors through Artificial Neural Networks (ANNs), and new tests should be performed for different scenarios.

João Olegário de Oliveira de Souza, José Vicente Canto dos Santos, Rodrigo Marques de Figueiredo, Gustavo Pessin
An Exploration of Dropout with RNNs for Natural Language Inference

Dropout is a crucial regularization technique for Recurrent Neural Network (RNN) models of Natural Language Inference (NLI). However, the effectiveness of dropout at different layers and at different dropout rates has not been evaluated in NLI models. In this paper, we propose a novel RNN model for NLI and empirically evaluate the effect of applying dropout at different layers in the model. We also investigate the impact of varying dropout rates at these layers. Our empirical evaluation on a large (Stanford Natural Language Inference (SNLI)) and a small (SciTail) dataset suggests that dropout at each feed-forward connection severely affects the model accuracy at increasing dropout rates. We also show that regularizing the embedding layer is efficient for SNLI, whereas regularizing the recurrent layer improves the accuracy for SciTail. Our model achieved an accuracy of 86.14% on the SNLI dataset and 77.05% on SciTail.

Amit Gajbhiye, Sardar Jaf, Noura Al Moubayed, A. Stephen McGough, Steven Bradley
Neural Model for the Visual Recognition of Animacy and Social Interaction

Humans reliably attribute social interpretations and agency to highly impoverished stimuli, such as interacting geometrical shapes. While it has been proposed that this capability is based on high-level cognitive processes, such as probabilistic reasoning, we demonstrate that it might be accounted for also by rather simple physiologically plausible neural mechanisms. Our model is a hierarchical neural network architecture with two pathways that analyze form and motion features. The highest hierarchy level contains neurons that have learned combinations of relative position-, motion-, and body-axis features. The model reproduces psychophysical results on the dependence of perceived animacy on motion smoothness and the orientation of the body axis. In addition, the model correctly classifies six categories of social interactions that have been frequently tested in the psychophysical literature. For the generation of training data we propose a novel algorithm that is derived from dynamic human navigation models and allows generating arbitrary numbers of abstract social interaction stimuli by self-organization.

Mohammad Hovaidi-Ardestani, Nitin Saini, Aleix M. Martinez, Martin A. Giese
Attention-Based RNN Model for Joint Extraction of Intent and Word Slot Based on a Tagging Strategy

In this paper, we propose an attention-based recurrent neural network model based on a tagging strategy for intent detection and word slot extraction. Unlike other joint models, which divide the joint task into two sub-models that share parameters, we explore a tagging strategy to incorporate the intent detection task and the word slot extraction task into a single sequence labeling model. We implemented experiments on a public dataset and the results show that the tagging strategy methods outperform most of the existing pipelined and joint methods. Our tagging strategy model obtained a 97.65% accuracy on the intent detection task and a 95.15% F1 score on the word slot extraction task.

Dongjie Zhang, Zheng Fang, Yanan Cao, Yanbing Liu, Xiaojun Chen, Jianlong Tan
Using Regular Languages to Explore the Representational Capacity of Recurrent Neural Architectures

The presence of Long Distance Dependencies (LDDs) in sequential data poses significant challenges for computational models. Various recurrent neural architectures have been designed to mitigate this issue. In order to test these state-of-the-art architectures, there is a growing need for rich benchmarking datasets. However, one of the drawbacks of existing datasets is the lack of experimental control with regard to the presence and/or degree of LDDs. This lack of control limits the analysis of model performance in relation to the specific challenge posed by LDDs. One way to address this is to use synthetic data having the properties of subregular languages. The degree of LDDs within the generated data can be controlled through the k parameter, the length of the generated strings, and the choice of appropriate forbidden strings. In this paper, we explore the capacity of different RNN extensions to model LDDs by evaluating these models on a sequence of SPk synthesized datasets, where each subsequent dataset exhibits a greater degree of LDD. Even though SPk are simple languages, the presence of LDDs does have a significant impact on the performance of recurrent neural architectures, thus making these languages prime candidates for benchmarking tasks.

Abhijit Mahalunkar, John D. Kelleher
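
To make the SPk idea concrete, below is a small illustrative sampler (not the authors' generation procedure): it draws random strings over an assumed alphabet and rejects any string that contains one of the forbidden subsequences, which is exactly what defines a Strictly Piecewise language. The alphabet and forbidden patterns are placeholders.

```python
import random

def contains_subsequence(s, pattern):
    """True if `pattern` occurs in `s` as a (not necessarily contiguous) subsequence."""
    it = iter(s)
    return all(ch in it for ch in pattern)

def sample_sp_string(alphabet, forbidden, length, rng=random.Random(0)):
    """Rejection-sample a string of a Strictly Piecewise language: strings containing
    any forbidden subsequence are outside the language. Longer strings and longer
    forbidden patterns (larger k) create longer-distance dependencies."""
    while True:
        s = "".join(rng.choice(alphabet) for _ in range(length))
        if not any(contains_subsequence(s, f) for f in forbidden):
            return s

print(sample_sp_string(alphabet="abcd", forbidden=["ab", "cd"], length=12))
```
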
Learning Trends on the Fly in Time Series Data Using Plastic CGP Evolved Recurrent Neural Networks

An approach of Direct Online Learning (DOL) to incorporate developmental plasticity in Recurrent Neural Networks, termed Plastic Cartesian Genetic Programming evolved Recurrent Neural Network (PCGPRNN), is proposed to exploit the trends in foreign currency data to forecast future currency rates, while reshaping its connectivity, biasing factors and selection of parameters from the input vector ‘on the fly’ according to the traversed trends. The developed model learns in real time and exhibits the optimum topology for the best possible output using neuro-evolution. The network performance is observed in a range of scenarios with varying network parameters, currencies and trading indexes, obtaining competitive results. Networks trained to predict single instances are further explored in independent scenarios to predict various time intervals in advance, achieving remarkable results.

Gul Muhammad Khan, Durr-e-Nayab
Noise Masking Recurrent Neural Network for Respiratory Sound Classification

In this paper, we propose a novel architecture called noise masking recurrent neural network (NMRNN) for lung sound classification. The model jointly learns to extract only important respiratory-like frames without redundant noise and then, by exploiting this information, is trained to classify lung sounds into four categories: normal, containing wheezes, containing crackles, and containing both wheezes and crackles. We compare the performance of our model with machine learning based models. As a result, the NMRNN model reaches state-of-the-art performance on a recently introduced publicly available respiratory sound database.

Kirill Kochetov, Evgeny Putin, Maksim Balashov, Andrey Filchenkov, Anatoly Shalyto
Lightweight Neural Programming: The GRPU

Deep Learning techniques have achieved impressive results over the last few years. However, they still have difficulty in producing understandable results that clearly show the embedded logic behind the inductive process. One step in this direction is the recent development of Neural Differentiable Programmers. In this paper, we design a neural programmer that can be easily integrated into existing deep learning architectures, with a number of parameters similar to that of a single commonly used Recurrent Neural Network. Tests conducted with the proposal suggest that it has the potential to induce algorithms even without any kind of special optimization, achieving competitive results in problems handled by more complex RNN architectures.

Felipe Carregosa, Aline Paes, Gerson Zaverucha
Towards More Biologically Plausible Error-Driven Learning for Artificial Neural Networks

Since the standard error backpropagation algorithm for supervised learning was shown to be biologically implausible, alternative models of training that use only local activation variables have been proposed. In this paper we present a novel algorithm called UBAL, inspired by the GeneRec model. We briefly describe the model and show the performance of the algorithm on the XOR and 4-2-4 problems.

Kristína Malinovská, Ľudovít Malinovský, Igor Farkaš
Online Carry Mode Detection for Mobile Devices with Compact RNNs

Nowadays mobile devices are an essential part of our daily life. Especially fitness tracking applications, which record our daily actions or exercise sessions, require robust carry mode detection of the device. For a detailed and accurate analysis of the acquired data, it is essential to know the relative position and thus the expected movement of the phone relative to the performed actions. On the other hand, it is important that such detection is as energy-efficient as possible, which rules out common deep convolutional approaches in advance. The contribution of this paper is twofold. First, we provide a mobile device carry mode data set, which currently consists of 6 h and 28 min of labeled accelerometer recordings. Second, we developed a robust online method to estimate the carry mode of such a device, which allows robust classification of long sequences of data based on compact Recurrent Neural Networks (RNNs), particularly Long Short-Term Memories (LSTMs). Our approach is generally applicable because it only requires data from an accelerometer, and it is lightweight enough to run on small embedded devices. Specifically, we demonstrate that LSTMs can almost perfectly distinguish between the carry modes hand, bag and pocket.

Philipp Kuhlmann, Paul Sanzenbacher, Sebastian Otte

Deep Learning

Frontmatter
Deep CNN-ELM Hybrid Models for Fire Detection in Images

In this paper, we propose a hybrid model consisting of a Deep Convolutional feature extractor followed by a fast and accurate classifier, the Extreme Learning Machine, for the purpose of fire detection in images. The reason behind using such a model is that Deep CNNs used for image classification take a very long time to train. Even with pre-trained models, the fully connected layers need to be trained with backpropagation, which can be very slow. In contrast, we propose to employ the Extreme Learning Machine (ELM) as the final classifier, trained on features from a pre-trained Deep CNN feature extractor. We apply this hybrid model to the problem of fire detection in images. We use state-of-the-art Deep CNNs, VGG16 and ResNet50, and replace the softmax classifier with the ELM classifier. For both VGG16 and ResNet50, the number of fully connected layers is also reduced. Especially in VGG16, which has 3 fully connected layers of 4096 neurons each followed by a softmax classifier, we replace two of these with an ELM classifier. The difference in convergence rate between fine-tuning the fully connected layers of pre-trained models and training an ELM classifier is enormous, around 20x to 51x speed-up. Also, we show that using an ELM classifier increases the accuracy of the system by 2.8% to 7.1% depending on the CNN feature extractor. We also compare our hybrid architecture with another hybrid architecture, i.e. the CNN-SVM model. Using SVM as the classifier does improve accuracy compared to state-of-the-art deep CNNs, but our Deep CNN-ELM model is able to outperform the Deep CNN-SVM models. (A preliminary version of some of the results of this paper appears in “Deep Convolutional Neural Networks for Fire Detection in Images”, Springer Proceedings Engineering Applications of Neural Networks 2017 (EANN’17), Athens, Greece, 25–27 August.)

Jivitesh Sharma, Ole-Christopher Granmo, Morten Goodwin
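
The ELM head described above can be summarized in a few lines: a fixed random hidden layer plus a closed-form ridge-regression solve for the output weights, with no backpropagation. The sketch below uses random toy features in place of the VGG16/ResNet50 activations, and the hidden size and regularization constant are illustrative assumptions rather than the authors' settings.

```python
import numpy as np

def train_elm(features, labels, n_hidden=256, reg=1e-3, seed=0):
    """ELM head: a fixed random hidden layer plus output weights solved in closed
    form by ridge regression (no backpropagation through the classifier)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(features.shape[1], n_hidden))
    H = np.tanh(features @ W)                                # random hidden activations
    Y = np.eye(int(labels.max()) + 1)[labels]                # one-hot targets
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ Y)
    return W, beta

def predict_elm(features, W, beta):
    return np.argmax(np.tanh(features @ W) @ beta, axis=1)

# Toy stand-in for pooled CNN features; in the paper these come from VGG16/ResNet50.
X = np.random.default_rng(1).normal(size=(200, 64))
y = (X[:, 0] > 0).astype(int)
W, beta = train_elm(X, y)
print((predict_elm(X, W, beta) == y).mean())                 # training accuracy
```
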
Siamese Survival Analysis with Competing Risks

Survival analysis in the presence of multiple possible adverse events, i.e., competing risks, is a pervasive problem in many industries (healthcare, finance, etc.). Since only one event is typically observed, the incidence of an event of interest is often obscured by other related competing events. This nonidentifiability, or inability to estimate true cause-specific survival curves from empirical data, further complicates competing risk survival analysis. We introduce Siamese Survival Prognosis Network (SSPN), a novel deep learning architecture for estimating personalized risk scores in the presence of competing risks. SSPN circumvents the nonidentifiability problem by avoiding the estimation of cause-specific survival curves and instead determines pairwise concordant time-dependent risks, where longer event times are assigned lower risks. Furthermore, SSPN is able to directly optimize an approximation to the C-discrimination index, rather than relying on well-known metrics which are unable to capture the unique requirements of survival analysis with competing risks.

Anton Nemchenko, Trent Kyono, Mihaela Van Der Schaar
A Survey on Deep Transfer Learning

As a new classification platform, deep learning has recently received increasing attention from researchers and has been successfully applied to many domains. In some domains, like bioinformatics and robotics, it is very difficult to construct a large-scale, well-annotated dataset due to the expense of data acquisition and costly annotation, which limits its development. Transfer learning relaxes the hypothesis that the training data must be independent and identically distributed (i.i.d.) with the test data, which motivates us to use transfer learning to solve the problem of insufficient training data. This survey focuses on reviewing current research on transfer learning using deep neural networks and its applications. We define deep transfer learning, categorize it, and review recent research works based on the techniques used in deep transfer learning.

Chuanqi Tan, Fuchun Sun, Tao Kong, Wenchang Zhang, Chao Yang, Chunfang Liu
Cloud Detection in High-Resolution Multispectral Satellite Imagery Using Deep Learning

Cloud detection in high-resolution satellite images is a critical step for many remote sensing applications, but also a challenge, as such images have limited spectral bands. The contribution of this paper is twofold: We present a dataset called CloudPeru as well as a methodology for cloud detection in multispectral satellite images (approximately 2.8 meters per pixel) using deep learning. We prove that an agile Convolutional Neural Network (CNN) is able to distinguish between non-clouds and different types of clouds, including thin and very small ones, and achieve a classification accuracy of 99.94%. Each image is subdivided into superpixels by the SLICO algorithm, which are then processed by the trained CNN. Finally, we obtain the cloud mask by applying a threshold of 0.5 on the probability map. The results are compared with manually annotated images, showing a Kappa coefficient of 0.944, which is higher than that of compared methods.

Giorgio Morales, Samuel G. Huamán, Joel Telles
Metric Embedding Autoencoders for Unsupervised Cross-Dataset Transfer Learning

Cross-dataset transfer learning is an important problem in person re-identification (Re-ID). Unfortunately, not too many deep transfer Re-ID models exist for realistic settings of practical Re-ID systems. We propose a purely deep transfer Re-ID model consisting of a deep convolutional neural network and an autoencoder. The latent code is divided into metric embedding and nuisance variables. We then utilize an unsupervised training method that does not rely on co-training with non-deep models. Our experiments show improvements over both the baseline and competitors’ transfer learning models.

Alexey Potapov, Sergey Rodionov, Hugo Latapie, Enzo Fenoglio
Classification of MRI Migraine Medical Data Using 3D Convolutional Neural Network

While statistical approaches are being implemented in medical data analyses because of their high accuracy and efficiency, the use of deep learning computations can potentially provide out-of-the-box insights, especially when statistical approaches did not yield a good result. In this paper we classify migraine and non-migraine magnetic resonance imaging (MRI) data, using a deep learning method named convolutional neural network (CNN). 198 MRI scans, which were obtained equally from both data groups, resulted in the maximum classification test accuracy of 85% (validation accuracy: $\bar{x} = 0.69$, $\sigma = 0.06$), compared to the baseline statistical accuracy of 50%. We then used the class activation mapping (CAM) method to visualize brain regions that the CNN model took to distinguish one data group from the other, and the visualization pointed at the parietal lobe, corpus callosum, brain stem and anterior cingulate cortex, of which the brain stem was mentioned in the medical findings for white matter abnormalities. Our findings suggest that CNN and CAM combined can be a useful image-based data analysis tool to add inspiration or discussion in the medical problem-solving process.

Hwei Geok Ng, Matthias Kerzel, Jan Mehnert, Arne May, Stefan Wermter
Deep 3D Pose Dictionary: 3D Human Pose Estimation from Single RGB Image Using Deep Convolutional Neural Network

In this work, we propose a new approach for 3D human pose estimation from a single monocular RGB image based on a deep convolutional neural network (CNN). The proposed method relies on reducing the huge search space of continuous-valued 3D human poses by discretizing and approximating these continuous poses into many discrete key-poses. These key-poses constitute a more restricted search space and can then be considered as multiple-class candidates of 3D human poses. Thus, a suitable classification technique is trained using a set of 3D key-poses and their corresponding RGB images to build a model that predicts the 3D pose class of an input monocular RGB image. We use a deep CNN as the classifier because it has proven to be the most accurate technique for RGB image classification. Our approach achieves good accuracy, comparable to state-of-the-art methods.

Reda Elbasiony, Walid Gomaa, Tetsuya Ogata
FiLayer: A Novel Fine-Grained Layer-Wise Parallelism Strategy for Deep Neural Networks

Data parallelism and model parallelism are regarded as two major parallelism strategies for deep neural networks (DNNs). However, the two methodologies achieve acceleration mainly by applying coarse-grained network-model-based parallelization. Neither methodology can fully tap into the potentials of the parallelism of network models and many-core systems (such as GPUs). In this work, we propose a novel fine-grained parallelism strategy based on layer-wise parallelization (named FiLayer), which includes inter-layer parallelism and intra-layer parallelism. The former allows several adjacent layers in a network model to be processed in a pipelined manner. The latter divides the operations in one layer into several parts and processes them in parallel. CUDA streams are applied to realize the above fine-grained parallelisms. FiLayer is implemented by extending Caffe. Several typical datasets are used for the performance evaluation. The experimental results indicate that FiLayer can help Caffe achieve speedups of 1.58×–2.19×.

Wenbin Jiang, Yangsong Zhang, Pai Liu, Geyan Ye, Hai Jin
DeepVol: Deep Fruit Volume Estimation

Due to the variety of fruit, fruit volume estimation is quite challenging. In this paper, we present a deep neural network based approach, DeepVol, for joint detection and volume estimation in one framework. The proposed architecture consists of two independent parts: an SSD-based fruit detector and a ResNet-based volume regressor. To train the network models, a fruit dataset involving fruit volumes and images is collected as a benchmark to verify the volume estimation framework. This method is simple and convenient in practical applications, owing to requiring no conventional camera calibration and only a single image as input. Experimental results demonstrate that our approach is robust to different surroundings, and promising in calorie measurement and unmanned stores.

Hongyu Li, Tianqi Han
Graph Matching and Pseudo-Label Guided Deep Unsupervised Domain Adaptation

The goal of domain adaptation is to train a high-performance predictive model on the target domain data by using knowledge from the source domain data, which has a different but related data distribution. In this paper, we consider unsupervised domain adaptation, where we have labelled source domain data but unlabelled target domain data. Our solution to unsupervised domain adaptation is to learn a domain-invariant representation that is also category discriminative. Domain-invariant representations are realized by minimizing the domain discrepancy. To minimize the domain discrepancy, we propose a novel graph-matching metric between the source and target domain representations. Minimizing this metric allows the source and target representations to be in support of each other. We further exploit confident unlabelled target domain samples and their pseudo-labels to refine our proposed model. We expect the refining step to improve the performance further. This is validated by performing experiments on standard image classification adaptation datasets. Results showed that our proposed approach outperforms previous domain-invariant representation learning approaches.

Debasmit Das, C. S. George Lee
fNIRS-Based Brain–Computer Interface Using Deep Neural Networks for Classifying the Mental State of Drivers

Accidents on the road mostly occur because of human error. Understanding and predicting the manner in which the brain functions when driving can help reduce fatalities. Particularly, with the recent development of self-driving cars, it is important to ensure that the driver is ready to retake control of the vehicle at all times in the event of a system failure. This study attempts to create a brain–computer interface (BCI) using signals obtained through functional near-infrared spectroscopy (fNIRS) to evaluate the impact of different external conditions on the driver’s mental state: weather conditions, type of road, and manual driving versus auto-pilot. A deep neural network (DNN) and a recurrent neural network (RNN) are employed for their pattern recognition ability in the processing of fNIRS signals and are compared to other common classification methods. The results of the study demonstrate that both DNN and RNN offer the same performance. Furthermore, brain activity under different weather conditions cannot be classified by any of the proposed methods. Nevertheless, DNN and RNN have proven their effectiveness in road type classification with 63% accuracy.

Gauvain Huve, Kazuhiko Takahashi, Masafumi Hashimoto
Research on Fight the Landlords’ Single Card Guessing Based on Deep Learning

In the real world, most information is inaccurate and incomplete. A model that guesses which cards other players hold is a predictive model based on incomplete information: players must make accurate predictions from a relatively small amount of observed card information. Based on a deep learning method, this paper studies single card guessing in the Fight the Landlord game. From the perspective of the landlord, the model extracts dominant features from a certain amount of historical card information and makes a reasonable prediction of the peasant players’ hands. The algorithm uses a CNN model whose input matrix simultaneously encodes the turn-based structure of the game, each single player’s history and the card-playing process of the three players. It extracts the characteristics of the landlord’s played cards, and predicts the hands of the two peasant players before and after the landlord. The experimental results show that the single card guesses basically accord with the habits of human players.

Saisai Li, Shuqin Li, Meng Ding, Kun Meng
Short-Term Precipitation Prediction with Skip-Connected PredNet

Short-term forecasting of rainfall in a local area is called precipitation nowcasting, and it has been traditionally addressed using rule-based or numerical approaches. Recently, deep neural network models have started to be used for precipitation nowcasting; however, their utility has not been extensively explored yet. Especially, the existing efforts focus only on the choice of their building blocks and pay little attention to the design of the whole network structure. In this paper, we propose a new precipitation nowcasting model based on the PredNet network architecture, which was originally proposed for short-term video prediction tasks. The proposed model outperforms the state-of-the-art models in the MovingMNIST++ dataset in terms of MSE, and it also shows a good predictive performance on a real dataset of precipitation in Kyoto City.

Ryoma Sato, Hisashi Kashima, Takehiro Yamamoto
An End-to-End Deep Learning Architecture for Classification of Malware’s Binary Content

In traditional machine learning techniques for malware detection and classification, significant efforts are expended on manually designing features based on expertise and domain-specific knowledge. These solutions perform feature engineering in order to extract features that provide an abstract view of the software program. Thus, the usefulness of the classifier is roughly dependent on the ability of the domain experts to extract a set of descriptive features. Instead, we introduce a file-agnostic, end-to-end deep learning approach for malware classification from raw byte sequences without extracting hand-crafted features. It consists of two key components: (1) a denoising autoencoder that learns a hidden representation of the malware’s binary content; and (2) a dilated residual network as classifier. The experiments show an impressive performance, achieving almost 99% accuracy in classifying malware into families.

Daniel Gibert, Carles Mateu, Jordi Planes
Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio

We show that the dynamics and convergence properties of SGD are set by the ratio of learning rate to batch size. We observe that this ratio is a key determinant of the generalization error, which we suggest is mediated by controlling the width of the final minima found by SGD. We verify our analysis experimentally on a range of deep neural networks and datasets.

Stanislaw Jastrzębski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey
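
A toy simulation of the claim, assuming a 1-D least-squares problem: two SGD runs with the same learning-rate-to-batch-size ratio show a similar spread of the iterate around the minimum, while a larger ratio gives a visibly wider spread. All constants, the data distribution, and the spread proxy are illustrative, not taken from the paper.

```python
import numpy as np

def sgd_iterate_spread(lr, batch, data, steps=2000, seed=0):
    """Run SGD on a 1-D least-squares problem and return the standard deviation of
    the iterate over the second half of training: a crude proxy for how widely SGD
    wanders around the minimum. The update noise scales roughly with lr / batch."""
    rng = np.random.default_rng(seed)
    w, trace = 0.0, []
    for _ in range(steps):
        sample = rng.choice(data, size=batch)
        w -= lr * np.mean(2 * (w - sample))     # gradient of the batch squared error
        trace.append(w)
    return np.std(trace[steps // 2:])

data = np.random.default_rng(42).normal(loc=1.0, scale=2.0, size=10_000)
# Equal lr/batch ratio -> comparable spread; a larger ratio -> wider spread.
print(sgd_iterate_spread(0.01, 8, data), sgd_iterate_spread(0.04, 32, data))
print(sgd_iterate_spread(0.08, 8, data))
```
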
Data Correction by a Generative Model with an Encoder and its Application to Structure Design

An alternative training model is proposed for adversarial networks to correct slightly defective data. A generator is first acquired by classical Generative Adversarial Networks, where the discriminator is trained only on feasible data. Then, both an encoder, as the inverse mapping of the generator, and a classifier, which judges the feasibility of a generated sample, are trained to lead the generator to correct infeasible data with the minimum modification. The proposed method is applied to a housing member placement problem to satisfy every constraint for earthquake resistance, and is evaluated by a rigorous structural calculation.

Takaya Ueda, Masataka Seo, Ikuko Nishikawa
PMGAN: Paralleled Mix-Generator Generative Adversarial Networks with Balance Control

A Generative Adversarial Network (GAN) is an unsupervised generative framework to generate a sample distribution that is identical to the data distribution. Recently, mix strategy multi-generator/discriminator GANs have been shown to outperform single pair GANs. However, the mixed model suffers from the problem of linearly growing training time. Also, imbalanced training among generators makes it difficult to parallelize. In this paper, we propose a balanced mix-generator GAN that works in parallel by mixing multiple disjoint generators to approximate the real distribution. The weights of the discriminator and the classifier are controlled by a balance strategy. We also present an efficient loss function, to force each generator to embrace few modes with a high probability. Our model is naturally adaptive to large parallel computation frameworks. Each generator can be trained on multiple GPUs asynchronously. We have performed extensive experiments on synthetic datasets, MNIST1000, CIFAR-10, and ImageNet. The results establish that our model can achieve the state-of-the-art performance (in terms of the modes coverage and the inception score), with significantly reduced training time. We also show that the missing mode problem can be relieved with a growing number of generators.

Xia Xiao, Sanguthevar Rajasekaran
Modular Domain-to-Domain Translation Network

We present a method for constructing and training a deep domain-to-domain translation network: two datasets describing the same classes (i.e. the source and target domains) are used to train a deep network that can translate a pattern coming from the source domain to its counterpart form in the target domain. We introduce the development of a hierarchical architecture that encapsulates information of the target domain by embedding individually trained networks. This deep hierarchical architecture is then trained as one unified deep network. Using this approach, we prove that samples from the original domain are translated to the target domain format for both the cases where there is a one-to-one correspondence in the samples of the two domains and also when this correspondence information is absent. In our experiments we get a good translation operation as long as the target domain dataset provides good classification results when trained alone. We use either some distorted version of the MNIST dataset or the SVHN dataset as the original domain for the translation task and the MNIST as the target domain. The translation from one information domain to the other is visualized and evaluated. We also discuss the proposed model’s relation to the conditional Generative Adversarial Networks and we further argue that deep learning can benefit from such forms of strict hierarchical architectures.

Savvas Karatsiolis, Christos N. Schizas, Nicolai Petkov
OrieNet: A Regression System for Latent Fingerprint Orientation Field Extraction

The orientation field is an important characteristic of fingerprints, and many biometric processing steps rely on its accurate estimation. Previous works on this task failed because of blurry fingerprint patterns and severe background noise. In this paper, a new algorithm system specifically for fingerprint orientation estimation is proposed, combining the domain knowledge of handcrafted methods and the generalization ability of DNNs. The system’s preprocessing part roughly extracts effective information from the input image with a specially designed combination of traditional methods, and then a Deep Regression Neural Network (DRNN) is adopted to predict the orientation fields, showing much faster convergence during training than classification networks with the same backbone structure. A novel DNN structure is proposed to solve the problem of discontinuity around 0° and increase prediction accuracy. Experimental results on the test database show that the proposed algorithm system outperforms state-of-the-art fingerprint orientation estimation algorithms.

Zhenshen Qu, Junyu Liu, Yang Liu, Qiuyu Guan, Chunyu Yang, Yuxin Zhang
Avoiding Degradation in Deep Feed-Forward Networks by Phasing Out Skip-Connections

A widely observed phenomenon in deep learning is the degradation problem: increasing the depth of a network leads to a decrease in performance on both test and training data. Novel architectures such as ResNets and Highway networks have addressed this issue by introducing various flavors of skip-connections or gating mechanisms. However, the degradation problem persists in the context of plain feed-forward networks. In this work we propose a simple method to address this issue. The proposed method poses the learning of weights in deep networks as a constrained optimization problem where the presence of skip-connections is penalized by Lagrange multipliers. This allows for skip-connections to be introduced during the early stages of training and subsequently phased out in a principled manner. We demonstrate the benefits of such an approach with experiments on MNIST, fashion-MNIST, CIFAR-10 and CIFAR-100 where the proposed method is shown to greatly decrease the degradation effect and is often competitive with ResNets.

Ricardo Pio Monti, Sina Tootoonian, Robin Cao
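
A deliberately simplified illustration of the idea: a block whose skip term carries a coefficient that training would penalize (Lagrange-multiplier style) and drive toward zero, so the network starts ResNet-like and ends up as a plain feed-forward network. This is a toy sketch under those assumptions, not the paper's exact constrained-optimization formulation.

```python
import numpy as np

def block_with_fading_skip(x, W, alpha):
    """One feed-forward block plus a skip term scaled by alpha. During training the
    skip coefficient would be penalized and pushed to 0, so the skip aids early
    optimization and is phased out later."""
    return np.tanh(W @ x) + alpha * x

def skip_penalty(alphas, lam):
    """Penalty added to the task loss; lam plays the role of the multiplier."""
    return lam * np.sum(np.abs(alphas))

rng = np.random.default_rng(0)
x, W = rng.normal(size=16), rng.normal(scale=0.1, size=(16, 16))
for alpha in (1.0, 0.5, 0.0):          # schedule from residual-like to plain block
    print(alpha, np.linalg.norm(block_with_fading_skip(x, W, alpha)))
print(skip_penalty(np.array([1.0, 0.5, 0.0]), lam=0.1))
```
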
A Deep Predictive Coding Network for Inferring Hierarchical Causes Underlying Sensory Inputs

Predictive coding has been argued as a mechanism underlying sensory processing in the brain. In computational models of predictive coding, the brain is described as a machine that constructs and continuously adapts a generative model based on the stimuli received from external environment. It uses this model to infer causes that generated the received stimuli. However, it is not clear how predictive coding can be used to construct deep neural network models of the brain while complying with the architectural constraints imposed by the brain. Here, we describe an algorithm to construct a deep generative model that can be used to infer causes behind the stimuli received from external environment. Specifically, we train a deep neural network on real-world images in an unsupervised learning paradigm. To understand the capacity of the network with regards to modeling the external environment, we studied the causes inferred using the trained model on images of objects that are not used in training. Despite the novel features of these objects the model is able to infer the causes for them. Furthermore, the reconstructions of the original images obtained from the generative model using these inferred causes preserve important details of these objects.

Shirin Dora, Cyriel Pennartz, Sander Bohte
Type-2 Diabetes Mellitus Diagnosis from Time Series Clinical Data Using Deep Learning Models

Clinical data is usually observed and recorded at irregular intervals and includes evaluations, treatments, vital signs and lab test results. These provide an invaluable source of information to help diagnose and understand medical conditions. In this work, we introduce the largest patient records dataset in diabetes research: King Abdullah International Research Centre Diabetes (KAIMRCD), which includes data from over 14k patients. KAIMRCD contains detailed information about the patients’ visits and has been labelled against T2DM by clinicians. The data is processed as time series and then investigated using temporal predictive Deep Learning models with the goal of diagnosing Type 2 Diabetes Mellitus (T2DM). Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models are trained on KAIMRCD and are demonstrated here to outperform classical machine learning approaches in the literature with over 97% accuracy.

Zakhriya Alhassan, A. Stephen McGough, Riyad Alshammari, Tahani Daghstani, David Budgen, Noura Al Moubayed
A Deep Learning Approach for Sentence Classification of Scientific Abstracts

The classification of abstract sentences is a valuable tool to support scientific database querying, to summarize relevant literature works and to assist in the writing of new abstracts. This study proposes a novel deep learning approach based on a convolutional layer and a bi-directional gated recurrent unit to classify sentences of abstracts. The proposed neural network was tested on a sample of 20 thousand abstracts from the biomedical domain. Competitive results were achieved, with weight-averaged precision, recall and F1-score values around 91%, which are higher when compared to a state-of-the-art neural network.

Sérgio Gonçalves, Paulo Cortez, Sérgio Moro
Weighted Multi-view Deep Neural Networks for Weather Forecasting

In multi-view regression the information from multiple representations of the input data is combined to improve the prediction. Inspired by the success of deep learning, this paper proposes a novel model called Weighted Multi-view Deep Neural Networks (MV-DNN) regression. The objective function used is a weighted version of the primal formulation of the existing Multi-View Least Squares Support Vector Machines method, where both the objectives from all different views, as well as the coupling term, are weighted. This work is motivated by the challenging application of weather forecasting. To predict the temperature, the weather variables from several previous days are taken into account. Each feature vector belonging to a previous day (delay) is regarded as a different view. Experimental results on minimum and maximum temperature prediction in Brussels reveal the merit of the weighting and show promising results when compared to existing state-of-the-art methods in weather prediction.

Zahra Karevan, Lynn Houthuys, Johan A. K. Suykens
Combining Articulatory Features with End-to-End Learning in Speech Recognition

End-to-end neural networks have shown promising results on large vocabulary continuous speech recognition (LVCSR) systems. However, it is challenging to integrate domain knowledge into such systems. Specifically, articulatory features (AFs), which are inspired by the human speech production mechanism, can help in speech recognition. This paper presents two approaches to incorporate domain knowledge into end-to-end training: (a) fine-tuning networks which reuse hidden layer representations of AF extractors as input for ASR tasks; (b) progressive networks which combine articulatory knowledge by lateral connections from AF extractors. We evaluate the proposed approaches on the Wall Street Journal speech corpus and test on the eval92 standard evaluation dataset. Results show that both fine-tuning and progressive networks can integrate articulatory information into end-to-end learning and outperform previous systems.

Leyuan Qu, Cornelius Weber, Egor Lakomkin, Johannes Twiefel, Stefan Wermter
Estimation of Air Quality Index from Seasonal Trends Using Deep Neural Network

The growing economy of a country can end up harming its atmosphere. Due to the increase in the number of vehicles and industrial development in or around a city, air pollution has escalated and has started affecting the health of citizens. Therefore, the level of air pollution of a city needs to be monitored regularly in real time to maintain air quality. The state of the air of a city is described by a dimensionless value known as the air quality index (AQI). In order to find patterns in the time-series data, several techniques have been reported in the literature, such as linear regression, support vector machines and neural networks. In this paper, we propose a method based on a deep neural network architecture, namely a recurrent neural network (RNN) with long short-term memory (LSTM) cells, for estimation of the AQI of a city on future dates using the seasonal trends of the recorded time-series data. Simulation results confirm that the proposed method outperforms a state-of-the-art technique of AQI estimation in terms of both root mean square error and Min/Max aggregation of AQI values.

Arjun Sharma, Anirban Mitra, Sumit Sharma, Sudip Roy
A Deep Learning Approach to Bacterial Colony Segmentation

In this paper, we introduce a new method for the segmentation of bacterial colonies in solid agar plate images. The proposed approach comprises two contributions. First, a simple but nonetheless effective engine is devised to generate synthetic plate images. This engine overlays bacterial colony patches onto existing background images, taking into account both the local appearance of the background and the intrinsic opacity of the bacterial colonies. It therefore provides a scalable alternative to human ground-truth supervision, which is often difficult to obtain in medical imaging due to privacy issues and scarcity of data. Then, the synthetically generated data, together with a few annotated images, are used to train a Fully-Convolutional Network. Such a network is effective in separating bacterial colonies from the background. Finally, we discuss the role of the generation of synthetic images, conducting experiments that show how their inclusion improves the performance of the segmentation network, producing very encouraging results.

Paolo Andreini, Simone Bonechi, Monica Bianchini, Alessandro Mecocci, Franco Scarselli
Sparsity and Complexity of Networks Computing Highly-Varying Functions

Approximative measures of network sparsity in terms of norms tailored to dictionaries of computational units are investigated. Lower bounds on these norms of real-valued functions on finite domains are derived. The bounds are proven by combining the concentration of measure property of high-dimensional spaces with a characterization of dictionaries of computational units in terms of their capacities and coherence, measured by their covering numbers. The results are applied to dictionaries used in neurocomputing which have power-type covering numbers. The probabilistic results are illustrated by a concrete construction of a class of functions whose computation by perceptron networks requires a large number of units or is unstable due to large output weights.

Věra Kůrková
Deep Learning Based Vehicle Make-Model Classification

This paper studies the problem of vehicle make & model classification. Some of the main challenges are reaching high classification accuracy and reducing the annotation time of the images. To address these problems, we have created a fine-grained database using online vehicle marketplaces of Turkey. A pipeline is proposed to combine an SSD (Single Shot Multibox Detector) model with a CNN (Convolutional Neural Network) model to train on the database. In the pipeline, we first detect the vehicles with an algorithm which reduces the annotation time, and then feed them into the CNN model. This reaches approximately 4% better classification accuracy than using a conventional CNN model. Next, we propose to use the detected vehicles as the ground truth bounding boxes (GTBB) of the images and feed them into an SSD model in another pipeline. At this stage, reasonable classification accuracy is reached without using perfectly shaped GTBBs. Lastly, using our proposed pipelines, an application is implemented for a use case that detects unauthorized vehicles by comparing their license plate numbers and makes & models. It is assumed that license plates are readable.

Burak Satar, Ahmet Emir Dirik
Detection and Recognition of Badgers Using Deep Learning

This paper describes the use of deep-learning object detection algorithms to recognize individual badgers. We use recordings of four different badgers under varying background illuminations. In total, four object detection models based on deep neural networks are compared: the single shot multi-box detector (SSD) with Inception-V2 or MobileNet as a backbone, and the faster region-based convolutional neural network (Faster R-CNN) combined with Inception-V2 or residual networks. Furthermore, two different activation functions for computing the probability that a badger is in the detected region are compared: the softmax and sigmoid functions. The results over all eight models show that SSD obtains higher recognition accuracies (97.8%–98.6%) than Faster R-CNN (84.8%–91.7%). However, the training time of Faster R-CNN is much shorter than that of SSD. The choice of output activation function appears to matter little.

Emmanuel Okafor, Gerard Berendsen, Lambert Schomaker, Marco Wiering
SPSA for Layer-Wise Training of Deep Networks

Concerned with neural learning without backpropagation, we investigate variants of the simultaneous perturbation stochastic approximation (SPSA) algorithm. Experimental results suggest that these allow for the successful training of deep feed-forward neural networks using forward passes only. In particular, we find that SPSA-based algorithms which update network parameters in a layer-wise manner are superior to variants which update all weights simultaneously.
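A compact sketch of the basic SPSA gradient estimate that such layer-wise variants build on is given below (two loss evaluations per update; the step sizes, Bernoulli perturbation and plain gradient step are the standard choices, not necessarily the paper's exact settings; the layer-wise variants apply this per layer rather than to all weights at once).

```python
import numpy as np

def spsa_step(w, loss, a=0.01, c=0.01):
    """One SPSA update of parameter vector w using only forward (loss) evaluations."""
    delta = np.random.choice([-1.0, 1.0], size=w.shape)    # Rademacher perturbation
    y_plus = loss(w + c * delta)
    y_minus = loss(w - c * delta)
    g_hat = (y_plus - y_minus) / (2.0 * c * delta)          # simultaneous gradient estimate
    return w - a * g_hat
```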

Benjamin Wulff, Jannis Schuecker, Christian Bauckhage
Dipolar Data Aggregation in the Context of Deep Learning

Separable data aggregation processes can be analyzed and realized with models of multilayer neural networks. Deep learning techniques can be employed to form hierarchical neural structures with such powerful properties. Processing data through a hierarchical, multilayer structure may result in the replacement of many feature vectors of the same category by a single output vector in an upper layer. Separable data aggregation in dipolar layers of binary classifiers makes it possible to reach this goal.

Leon Bobrowski, Magdalena Topczewska
Video Surveillance of Highway Traffic Events by Deep Learning Architectures

In this paper we describe a video surveillance system able to detect traffic events in videos acquired by fixed cameras on highways. The events of interest consist of specific sequences of situations occurring in the video, for instance a vehicle stopping on the emergency lane. Hence, detecting these events requires analyzing a temporal sequence in the video stream. We compare different approaches that exploit architectures based on Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). A first approach extracts vectors of features, mostly related to motion, from each video frame and exploits an RNN fed with the resulting sequence of vectors. The other approaches work directly on the sequence of frames, possibly enriched with pixel-wise motion information. The resulting stream is processed by an architecture that stacks a CNN and an RNN, and we also investigate a transfer-learning-based model. The results are very promising, and the best architecture will be tested online in real operative conditions.

Matteo Tiezzi, Stefano Melacci, Marco Maggini, Angelo Frosini
Augmenting Image Classifiers Using Data Augmentation Generative Adversarial Networks

Effective training of neural networks requires much data. In the low-data regime, parameters are underdetermined, and learnt networks generalise poorly. Data Augmentation alleviates this by using existing data more effectively, but standard data augmentation produces only limited plausible alternative data. Given the potential to generate a much broader set of augmentations, we design and train a generative model to do data augmentation. The model, based on image conditional Generative Adversarial Networks, uses data from a source domain and learns to take a data item and augment it by generating other within-class data items. As this generative process does not depend on the classes themselves, it can be applied to novel unseen classes. We demonstrate that a Data Augmentation Generative Adversarial Network (DAGAN) augments classifiers well on Omniglot, EMNIST and VGG-Face.

Antreas Antoniou, Amos Storkey, Harrison Edwards
DeepEthnic: Multi-label Ethnic Classification from Face Images

Ethnic group classification is a well-researched problem, which has been pursued mainly during the past two decades via traditional approaches of image processing and machine learning. In this paper, we propose a method for classifying a face image into an ethnic group by applying transfer learning from a classification network previously trained for large-scale image recognition. Our proposed method yields state-of-the-art success rates of 99.02%, 99.76%, 99.2%, and 96.7%, respectively, for the four ethnic groups: African, Asian, Caucasian, and Indian.

Katia Huri, Eli (Omid) David, Nathan S. Netanyahu
Handwriting-Based Gender Classification Using End-to-End Deep Neural Networks

Handwriting-based gender classification is a well-researched problem that has been approached mainly by traditional machine learning techniques. In this paper, we propose a novel deep learning-based approach for this task. Specifically, we present a convolutional neural network (CNN), which performs automatic feature extraction from a given handwritten image, followed by classification of the writer’s gender. Also, we introduce a new dataset of labeled handwritten samples, in Hebrew and English, from 405 participants. Comparing the gender classification accuracy on this dataset against human examiners, our results show that the proposed deep learning-based approach is substantially more accurate than human examiners.

Evyatar Illouz, Eli (Omid) David, Nathan S. Netanyahu
A Deep Learning Approach for Sentiment Analysis in Spanish Tweets

Sentiment analysis at the document level is a well-known problem in Natural Language Processing (NLP), often used as a reference task over which new architectures and models are tested and compared. The problem has been addressed reasonably well for English, but metrics remain considerably lower for other languages, and architectures that succeed in one language do not necessarily work in another. In the case of Spanish, data quantity and quality become a problem during data preparation and architecture design, owing to the scarcity of labeled data and the presence of non-textual elements (such as emoticons and colloquial expressions). This work presents an approach to sentiment analysis of Spanish tweets and compares it with the state of the art. A preprocessing algorithm is first applied, based on the interpretation of colloquial expressions and emoticons and the elimination of trivial words. Processed sentences are turned into matrices using three widely used word-embedding methods (GloVe, FastText and Word2Vec); the three matrices are then merged into a three-channel matrix which feeds our CNN-based model. The proposed architecture uses parallel convolution layers acting as k-grams, so that each word and its context are weighted, to predict the sentiment polarity among four possible classes. After several tests, the tuple that yielded the best accuracy was <1, 2>. Finally, our model achieves 61.58% and 71.14% accuracy on the InterTASS and General corpora, respectively.
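A rough sketch of how a tweet can be turned into the three-channel embedding matrix described above is given below; the lookup tables (assumed here to be dictionaries mapping tokens to vectors), maximum length and embedding dimension are placeholders, and the convolutional part of the architecture is omitted.

```python
import numpy as np

def sentence_matrix(tokens, glove, fasttext, word2vec, max_len=50, dim=300):
    """Stack GloVe, FastText and Word2Vec embeddings into a (max_len, dim, 3) tensor."""
    mat = np.zeros((max_len, dim, 3), dtype=np.float32)
    for i, tok in enumerate(tokens[:max_len]):
        for ch, table in enumerate((glove, fasttext, word2vec)):
            mat[i, :, ch] = table.get(tok, np.zeros(dim))   # zero vector for unknown words
    return mat   # fed to a CNN with parallel k-gram convolution branches
```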

Gerson Vizcarra, Antoni Mauricio, Leonidas Mauricio
Location Dependency in Video Prediction

Deep convolutional neural networks are used to address many computer vision problems, including video prediction. The task of video prediction requires analyzing the video frames, temporally and spatially, and constructing a model of how the environment evolves. Convolutional neural networks are spatially invariant, though, which prevents them from modeling location-dependent patterns. In this work, we propose location-biased convolutional layers to overcome this limitation. The effectiveness of location bias is evaluated on two architectures: the Video Ladder Network (VLN) and the Convolutional Predictive Gating Pyramid (Conv-PGP). The results indicate that encoding location-dependent features is crucial for the task of video prediction. Our proposed methods significantly outperform spatially invariant models.
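One simple way to give a convolutional layer access to absolute position is to concatenate normalized coordinate channels to its input, as in the CoordConv-style PyTorch sketch below; the paper's location-biased layers may be realized differently, so this is only an illustrative assumption.

```python
import torch
import torch.nn as nn

class CoordChannels(nn.Module):
    """Append normalized x/y coordinate maps as extra input channels."""
    def forward(self, x):                       # x: (B, C, H, W)
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return torch.cat([x, ys, xs], dim=1)    # (B, C + 2, H, W), fed to a normal conv layer
```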

Niloofar Azizi, Hafez Farazi, Sven Behnke

Brain Neurocomputing Modeling

Frontmatter
State-Space Analysis of an Ising Model Reveals Contributions of Pairwise Interactions to Sparseness, Fluctuation, and Stimulus Coding of Monkey V1 Neurons

In this study, we analyzed the activity of monkey V1 neurons responding to grating stimuli of different orientations using inference methods for a time-dependent Ising model. The method provides optimal estimation of time-dependent neural interactions with credible intervals according to the sequential Bayes estimation algorithm. Furthermore, it allows us to trace dynamics of macroscopic network properties such as entropy, sparseness, and fluctuation. Here we report that, in all examined stimulus conditions, pairwise interactions contribute to increasing sparseness and fluctuation. We then demonstrate that the orientation of the grating stimulus is in part encoded in the pairwise interactions of the neural populations. These results demonstrate the utility of the state-space Ising model in assessing contributions of neural interactions during stimulus processing.
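As background, the pairwise maximum-entropy (Ising) model underlying this analysis assigns to a binary population pattern $\mathbf{x}_t \in \{0,1\}^N$ in time bin $t$ the probability (generic form; the paper's state-space formulation tracks the parameters $\boldsymbol{\theta}_t$ over time with a sequential Bayes filter):

$$
p(\mathbf{x}_t \mid \boldsymbol{\theta}_t)
= \exp\!\Bigl( \sum_{i} \theta_{i,t}\, x_{i,t} + \sum_{i<j} \theta_{ij,t}\, x_{i,t} x_{j,t} - \psi(\boldsymbol{\theta}_t) \Bigr),
$$

where $\theta_{ij,t}$ are the time-dependent pairwise interactions and $\psi(\boldsymbol{\theta}_t)$ is the log normalizer from which macroscopic quantities such as entropy and sparseness can be derived.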

Jimmy Gaudreault, Hideaki Shimazaki
Sparse Coding Predicts Optic Flow Specifities of Zebrafish Pretectal Neurons

Zebrafish pretectal neurons exhibit specificities for large-field optic flow patterns associated with rotatory or translatory body motion. We investigate the hypothesis that these specificities reflect the input statistics of natural optic flow. Realistic motion sequences were generated using computer graphics simulating self-motion in an underwater scene. Local retinal motion was estimated with a motion detector and encoded in four populations of directionally tuned retinal ganglion cells, represented as two signed input variables. This activity was then used as input into one of two learning networks: a sparse coding network (competitive learning) and backpropagation network (supervised learning). Both simulations develop specificities for optic flow which are comparable to those found in a neurophysiological study [8], and relative frequencies of the various neuronal responses are best modeled by the sparse coding approach. We conclude that the optic flow neurons in the zebrafish pretectum do reflect the optic flow statistics. The predicted vectorial receptive fields show typical optic flow fields but also “Gabor” and dipole-shaped patterns that likely reflect difference fields needed for reconstruction by linear superposition.

Gerrit A. Ecke, Fabian A. Mikulasch, Sebastian A. Bruijns, Thede Witschel, Aristides B. Arrenberg, Hanspeter A. Mallot
Brain-Machine Interface for Mechanical Ventilation Using Respiratory-Related Evoked Potential

Correct ventilation for patients in intensive care units plays a critical role in the prognosis and recovery during a hospital stay. Desynchronization between the ventilator and the patient is an important source of stress, emphasized by the lack of communication due to intubation or loss of consciousness. This contribution proposes a novel approach based on electroencephalographic (EEG) activity to detect breathing effort. Relying both on recent neuroscience findings on the respiratory-related evoked potential and on the latest developments in information geometry, the proposed approach elaborates on Riemannian distances between EEG covariance matrices to differentiate among different respiratory loads. The results demonstrate that this approach outperforms existing state-of-the-art methods quantitatively, in terms of mean accuracy, and qualitatively, being able to predict the level of breathing discomfort.
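For reference, the affine-invariant Riemannian distance between two EEG spatial covariance matrices $C_1$ and $C_2$, on which such information-geometric pipelines typically rely (stated here as the standard definition; the paper's exact processing may add further steps), is

$$
\delta_R(C_1, C_2) = \bigl\| \log\bigl(C_1^{-1/2} C_2\, C_1^{-1/2}\bigr) \bigr\|_F
= \Bigl( \sum_{k=1}^{n} \log^2 \lambda_k \Bigr)^{1/2},
$$

where the $\lambda_k$ are the eigenvalues of $C_1^{-1/2} C_2\, C_1^{-1/2}$ (equivalently of $C_1^{-1} C_2$).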

Sylvain Chevallier, Guillaume Bao, Mayssa Hammami, Fabienne Marlats, Louis Mayaud, Djillali Annane, Frédéric Lofaso, Eric Azabou
Effectively Interpreting Electroencephalogram Classification Using the Shapley Sampling Value to Prune a Feature Tree

Identifying the features that contribute to classification using machine learning remains a challenging problem in terms of the interpretability and computational complexity of the endeavor. Especially in electroencephalogram (EEG) medical applications, it is important for medical doctors and patients to understand the reason for the classification. In this paper, we thus propose a method to quantify contributions of interpretable EEG features on classification using the Shapley sampling value (SSV). In addition, a pruning method is proposed to reduce the SSV computation cost. The pruning is conducted on an EEG feature tree, specifically at the sensor (electrode) level, frequency-band level, and amplitude-phase level. If the contribution of a feature at a high level (e.g., sensor level) is very small, the contributions of features at a lower level (e.g., frequency-band level) should also be small. The proposed method is verified using two EEG datasets: classification of sleep states, and screening of alcoholics. The results show that the method reduces the SSV computational complexity while maintaining high SSV accuracy. Our method will thus increase the importance of data-driven approaches in EEG analysis.
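Below is a bare-bones Monte Carlo estimator of the Shapley sampling value for one feature, the quantity whose computation the proposed tree pruning accelerates; the value function, reference (baseline) values and sample count are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def shapley_sampling(value, x, baseline, j, n_samples=200):
    """Monte Carlo estimate of feature j's contribution to value(x)."""
    rng = np.random.default_rng()
    d = len(x)
    total = 0.0
    for _ in range(n_samples):
        order = rng.permutation(d)                      # random feature ordering
        pos = int(np.where(order == j)[0][0])
        with_j = baseline.copy()
        with_j[order[:pos + 1]] = x[order[:pos + 1]]    # coalition including feature j
        without_j = baseline.copy()
        without_j[order[:pos]] = x[order[:pos]]         # same coalition without j
        total += value(with_j) - value(without_j)
    return total / n_samples
```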

Kazuki Tachikawa, Yuji Kawai, Jihoon Park, Minoru Asada
EEG-Based Person Identification Using Rhythmic Brain Activity During Sleep

In this paper we present a novel approach to the person identification problem using rhythmic brain activity of spindles from whole night EEG recordings. The proposed system consists of a feature extraction module and a K-NN based classifier. Different types of features from time, frequency and wavelet domain are used to highlight the topographic, temporal, morphological, spectral and statistical discriminative information of sleep spindles. The feature set’s efficacy is exhaustively tested in order to find the most significant descriptors that maximize intra-subject separability. Extensive experiments resulted in the optimal number of sensors and features that must be used to form the subject-specific unique descriptors. The proposed system showed significant identification accuracy of 99% ~ 90% for 2–20 subjects, and not lower than 86% when identifying 28 persons, indicating that this new type of modality should be further investigated to be used in EEG based identification applications.

Athanasios Koutras, George K. Kostopoulos
An STDP Rule for the Improvement and Stabilization of the Attractor Dynamics of the Basal Ganglia-Thalamocortical Network

The basal ganglia-thalamocortical (BGT) network has been investigated for many years, in particular in relation to disorders of the motor system and of the sleep-waking cycle. Its attractor dynamics is related to significant aspects of the processing and coding of information, the most important of which are associative memories. The consideration of a simplified Boolean model of the BGT network allows for an exhaustive analysis of its attractor dynamics. In this context, it has been shown that both global and local changes in the synaptic weights can strongly influence the attractor-based complexity of the network. We propose a novel adaptive spike-timing dependent plasticity (STDP) rule which allows the network to improve and stabilize its attractor complexity during its computational process. The rule is based on an adaptive learning rate which varies according to the attractor dynamics that the network continuously visits.
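For context, the classical pair-based STDP rule weights each synaptic change by the pre/post spike-time difference $\Delta t = t_{\text{post}} - t_{\text{pre}}$ (this is the generic textbook form; the paper's adaptive rule additionally varies the learning rate with the attractor dynamics the network visits):

$$
\Delta w =
\begin{cases}
+A_{+}\, e^{-\Delta t / \tau_{+}}, & \Delta t > 0,\\[2pt]
-A_{-}\, e^{\,\Delta t / \tau_{-}}, & \Delta t < 0,
\end{cases}
$$

with potentiation/depression amplitudes $A_{\pm}$ and time constants $\tau_{\pm}$.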

Jérémie Cabessa, Alessandro E. P. Villa
Neuronal Asymmetries and Fokker-Planck Dynamics

Much of our recent work regards the development of schematic, neurocomputational models based on memory associativity to describe some processes associated with basic structures of mental functioning, such as neurosis, creativity, consciousness/unconsciousness, and psychoses. We have emphasized associative memory mechanisms, since they are central in the description of these processes by psychodynamical theories. In memory neural networks, such as the Hopfield or Boltzmann Machine models, the symmetry of synaptic connections is a condition for the existence of stationary states, although this assumption is biologically unrealistic. Many efforts to model stationary states of networks with asymmetric weights are mathematically complex and can usually be applied only to specific cases. We thus further explore a possible new approach to the asymmetry problem, based on studies of some characteristics of the behavior of these networks, which may be modeled by the Fokker-Planck formalism. Besides considering asymmetric interactions, we also relaxed other symmetries of our previous models, enriching the concomitant dynamics. Among other things, we identified the presence of limit cycles.

Vitor Tocci F. de Luca, Roseli S. Wedemann, Angel R. Plastino

Robotics/Motion Detection

Frontmatter
Learning-While Controlling RBF-NN for Robot Dynamics Approximation in Neuro-Inspired Control of Switched Nonlinear Systems

Radial Basis Function Neural Networks are well-established function approximators. This paper presents an adaptive Gaussian RBF-NN with an extended learning-while-controlling behaviour. The weights, function centres and widths are updated online based on a sliding mode control element. In this way, the need to fix parameters a priori is overcome and the network is able to adapt to dynamically changing systems. The aim of this work is to present an extended adaptive neuro-controller for trajectory tracking of serial robots with unknown dynamics. The adaptive RBF-NN is used to approximate the unknown robot manipulator dynamics function. It is combined with a conventional controller and a bio-inspired extension for the control of a robot in the presence of switching constraints and discontinuous inputs. The controller extension increases the robustness and adaptability of the system. Its learned goal-directed output results from the complementary action of an actuator, A, and a preventer, P. The trigger is an incentive, I, based on the weighted perception of the environment. The concept is validated through simulations and implementation on a KUKA LWR4 robot.

Sophie Klecker, Bassem Hichri, Peter Plapper
A Feedback Neural Network for Small Target Motion Detection in Cluttered Backgrounds

Small target motion detection is critical for insects searching for and tracking mates or prey, which appear as small dim speckles in the visual field. A class of specific neurons, called small target motion detectors (STMDs), has been characterized by exquisite sensitivity to small target motion. Understanding and analyzing the visual pathway of STMD neurons is beneficial for designing artificial visual systems for small target motion detection. Feedback loops have been widely identified in visual neural circuits and play an important role in target detection. However, it is unclear whether a feedback loop exists in the STMD visual pathway and whether such a loop could significantly improve the detection performance of STMD neurons. In this paper, we propose a feedback neural network for small target motion detection against naturally cluttered backgrounds. In order to form a feedback loop, the model output is temporally delayed and relayed to a previous neural layer as a feedback signal. Extensive experiments show that the proposed feedback neural network significantly improves on existing STMD-based models for small target motion detection.

Hongxin Wang, Jigen Peng, Shigang Yue
De-noise-GAN: De-noising Images to Improve RoboCup Soccer Ball Detection

A moving robot or moving camera causes motion blur in the robot’s vision and distorts recorded images. We show that motion blur, differing lighting, and other distortions heavily affect the object localization performance of deep learning architectures on RoboCup Humanoid Soccer scenes. The paper proposes deep conditional generative models to apply visual noise filtering. Instead of generating new samples for a specific domain, our model is constrained to reconstructing RoboCup soccer images. The conditional DCGAN (deep convolutional generative adversarial network) works in a semi-supervised manner, so there is no need for labeled training data. We show that object localization architectures drop significantly in accuracy when supplied with noisy input data and that our proposed model can significantly increase the accuracy again.

Daniel Speck, Pablo Barros, Stefan Wermter
Integrative Collision Avoidance Within RNN-Driven Many-Joint Robot Arms

Robot arm control and motion planning in dynamically changing environments is a challenging task. It requires an adaptive planning algorithm that generates solutions on-the-fly, incorporating the current environmental conditions. This paper explores an alternative approach. Adaptive planning is realized in a generative Recurrent Neural Network (RNN) architecture, which produces goal-directed motor commands by means of active-inference-based, model-predictive control. As the main contribution, in this paper we show how to integrate local collision avoidance gradients into the active inference process. The result is a control mechanism that avoids arm collisions while concurrently pursuing arm goal poses. The RNN processes embodied, sensorimotor dynamics into which proximity signals from locally embedded distance sensors are injected at the respective joint locations. We demonstrate that a 3D trunk-like many-joint robot arm with up to 80 articulated degrees of freedom (DoF) can maneuver collision-free even through very challenging, dynamic obstacle constellations, evading potential collision sources while pursuing goal-directed arm pose and end-effector control.

Sebastian Otte, Lea Hofmaier, Martin V. Butz
An Improved Block-Matching Algorithm Based on Chaotic Sine-Cosine Algorithm for Motion Estimation

Motion estimation (ME) plays an important role in a video coding solution to achieve a low bit rate. The selection of the optimal motion vector (MV) has a significant impact on the quality of the compressed video. The block-matching (BM) algorithm is one of the widely accepted ME techniques to estimate the motion between successive frames. In any BM technique, the motion vectors (MVs) are obtained for the current frame over a pre-defined search region in the previous frame by minimizing a certain matching criterion. However, the computation of these matching criteria is highly expensive in terms of computational time. Hence, block-based ME (BME) can be realized as an optimization problem which aims at finding the best-matched block within a specified search region. In this context, an improved block-matching technique is proposed that incorporates a chaotic sine-cosine optimization algorithm along with a fitness approximation (FA) strategy. The proposed approach has been compared with several other BM techniques in terms of different parameters, namely, the peak signal-to-noise ratio (PSNR), the PSNR degradation ratio ($D_{PSNR}$), and the number of search points. The analysis of the results obtained demonstrates that the proposed method yields potential improvements over other competent schemes.
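As background, the position update of the standard sine-cosine algorithm, which chaotic variants typically modify by drawing one or more of the control parameters from a chaotic map rather than a uniform distribution, is

$$
X_i^{t+1} =
\begin{cases}
X_i^{t} + r_1 \sin(r_2)\, \bigl| r_3 P_i^{t} - X_i^{t} \bigr|, & r_4 < 0.5,\\[2pt]
X_i^{t} + r_1 \cos(r_2)\, \bigl| r_3 P_i^{t} - X_i^{t} \bigr|, & r_4 \ge 0.5,
\end{cases}
$$

where $P^{t}$ is the best candidate found so far (here, a candidate block displacement) and $r_1,\dots,r_4$ are control parameters.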

Bodhisattva Dash, Suvendu Rup
Terrain Classification with Crawling Robot Using Long Short-Term Memory Network

Terrain classification is a crucial capability for mobile robots operating across multiple terrains. One way to learn a terrain classifier is to use a stream of labeled proprioceptive data recorded during terrain traversal. In this paper, we propose a new terrain classifier that combines feature extraction from a data stream with a long short-term memory (LSTM) network. Features are extracted from the information-sparse data stream by applying a sliding window computing three central moments. The feature sequence is continuously classified by the LSTM network into multiple terrain classes. Furthermore, a modified bagging method is used to deal with a limited and unbalanced training set. In comparison to previous work on terrain classifiers for a hexapod crawling robot using only servo-drive feedback, the proposed classifier provides continuous classification with an F1 score of up to 0.88, and thus provides better results than an SVM classifier trained on the same input data.
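A small sketch of the feature-extraction step described above is given below: a sliding window over a proprioceptive stream, summarized by three central moments per window. The window length, step size, and the choice of the 2nd–4th central moments are assumptions for illustration.

```python
import numpy as np

def window_features(stream, window=64, step=16):
    """Compress a 1-D servo-feedback stream into per-window central moments."""
    feats = []
    for start in range(0, len(stream) - window + 1, step):
        seg = stream[start:start + window]
        mu = seg.mean()
        feats.append([((seg - mu) ** k).mean() for k in (2, 3, 4)])
    return np.asarray(feats)    # feature sequence fed to the LSTM terrain classifier
```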

Rudolf J. Szadkowski, Jan Drchal, Jan Faigl
Mass-Spring Damper Array as a Mechanical Medium for Computation

Recently, it has been reported that the dynamics of mechanical structures can be used as a computational resource, an approach also referred to as morphological computation. In particular, soft materials have been shown to have the potential to be used for time series forecasting. Although most soft materials can be modeled by mass-spring systems, little research has been performed on the computational capabilities of such systems. In this paper, we propose an array of masses linked in a grid-like structure by spring-damper connections to systematically investigate the influence of structural (size) and dynamic (stiffness, damping) parameters on the computational capability for time series forecasting. In addition, such a structure gives us a good approximation of two-dimensional elastic media, e.g., a rubber sheet, and therefore a direct pathway to potentially implementing the results in a real system. In particular, we compared the mass-spring array to echo state networks, which are standard machine learning techniques for this kind of problem and are also closely related to the underlying theoretical models applied when exploiting mechanical structures for computation. Our results suggest a clear connection between morphological features and computational capabilities.

Yuki Yamanaka, Takaharu Yaguchi, Kohei Nakajima, Helmut Hauser
Kinematic Estimation with Neural Networks for Robotic Manipulators

In this paper, we focus on estimating the forward kinematic equation of robots with multilayer feed-forward neural networks. The effectiveness of this approach is tested on a simulated kinematic model of the 7-DOF Sawyer robotic arm. In the initial sections of the paper, we discuss related work on the creation of model-agnostic control schemes at the kinematic level. We then formalize the kinematic problem as a supervised learning problem and propose an MLP architecture to solve it. Lastly, we present experimental results and discuss the potential and importance of creating model-agnostic control schemes with machine learning.
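A minimal sketch of this supervised formulation is shown below: learn the mapping from joint angles to end-effector position with an MLP. The toy planar forward-kinematics function, network sizes and use of scikit-learn are assumptions standing in for the Sawyer model used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def toy_fk(q):
    """Toy planar forward kinematics for a 3-link arm (stand-in for the Sawyer model)."""
    lengths = np.array([0.4, 0.3, 0.2])
    angles = np.cumsum(q[:3])
    return np.array([np.sum(lengths * np.cos(angles)),
                     np.sum(lengths * np.sin(angles))])

q = np.random.uniform(-np.pi, np.pi, size=(5000, 7))   # random joint configurations
p = np.array([toy_fk(qi) for qi in q])                  # corresponding end-effector positions

model = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=500)
model.fit(q, p)
print(model.predict(q[:3]))                              # estimated positions for a few configurations
```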

Michail Theofanidis, Saif Iftekar Sayed, Joe Cloud, James Brady, Fillia Makedon

Social Media

Frontmatter
Hierarchical Attention Networks for User Profile Inference in Social Media Systems

User profile inference, which aims to portray a user in detail, is one of the fundamental tasks in social network analysis. Existing works still suffer from the difficulty of modeling users' explicit attributes and social links, mainly caused by text diversity and complex community structures. In this paper, we propose a hierarchical attention neural network to infer users' missing attributes, which handles the user representation by integrating both explicit personal information and social links. The core module is a hierarchical recurrent neural network which encodes both attribute-level and user-level information, and the attention mechanism can adaptively render different attributes and users with different weights. Extensive empirical studies are conducted on two real-world datasets. The experimental results show that our model prominently outperforms other comparable deep models in predicting multi-value attributes (especially occupation), verify the effect of using user social links, and reveal the different effects of different attention mechanisms.

Zhezhou Kang, Xiaoxue Li, Yanan Cao, Yanmin Shang, Yanbing Liu, Li Guo
A Topological k-Anonymity Model Based on Collaborative Multi-view Clustering

Data anonymization is the process of de-identifying sensitive data while preserving its format and data type. The masked data can be a realistic or a random sequence of data, depending on the technique used for anonymization. Individual privacy can be at risk if a published data set is not properly de-identified. The best-known anonymization approach is k-anonymity, which can be viewed as clustering with a constraint of at least k objects in every cluster. In this paper, we propose a new anonymization approach based on multi-view topological collaborative clustering. The proposed method has the advantage of detecting the k level automatically. The aim of collaborative clustering is to reveal the common structure of data using different views on the variables; it allows other knowledge to be taken into account, without recourse to the data, in an unsupervised learning framework. The proposed approach has been validated on several data sets, and the experimental results show very promising performance.

Sarah Zouinina, Nistor Grozavu, Younès Bennani, Abdelouahid Lyhyaoui, Nicoleta Rogovschi
A Credibility-Based Analysis of Information Diffusion in Social Networks

Social networks have many advantages and are very popular. The number of people having at least one account on a social network has grown considerably. Social networks allow people to connect and interact more easily with one another, providing a much easier way to obtain information. However, one major disadvantage of social networks is that some information may be untrue. In this paper we propose a protocol through which the network becomes more immune to the diffusion of false information. Our approach is based on evidence theory, with Dempster-Shafer's and Yager's combination rules playing an important role in an individual's decision whether or not to forward the received information. We also take into consideration the confidence degree of the neighbours regarding the information spread by a specific source node. Furthermore, we propose a simulation algorithm that allows us to observe the diffusion of two contradictory pieces of information spread by two different source nodes. The experimental results show that the true information spreads more easily if the ground truth is sometimes revealed, even rarely.
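For reference, the two combination rules mentioned above differ in how they treat the conflict mass $K$ between two basic belief assignments $m_1$ and $m_2$ on a frame $\Theta$ (standard forms, independent of the paper's protocol):

$$
K = \!\!\sum_{B \cap C = \emptyset}\!\! m_1(B)\, m_2(C),
\qquad
m_{\mathrm{DS}}(A) = \frac{1}{1 - K} \!\!\sum_{B \cap C = A}\!\! m_1(B)\, m_2(C) \quad (A \neq \emptyset),
$$

while Yager's rule leaves the numerator unnormalized and assigns the conflict mass $K$ to the whole frame $\Theta$ instead of renormalizing.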

Sabina-Adriana Floria, Florin Leon, Doina Logofătu
Backmatter
Metadata
Title
Artificial Neural Networks and Machine Learning – ICANN 2018
edited by
Věra Kůrková
Prof. Yannis Manolopoulos
Barbara Hammer
Lazaros Iliadis
Ilias Maglogiannis
Copyright year
2018
Electronic ISBN
978-3-030-01424-7
Print ISBN
978-3-030-01423-0
DOI
https://doi.org/10.1007/978-3-030-01424-7
