About this Book

This three-volume set LNCS 11139-11141 constitutes the refereed proceedings of the 27th International Conference on Artificial Neural Networks, ICANN 2018, held in Rhodes, Greece, in October 2018.

The papers presented in these volumes were carefully reviewed and selected from a total of 360 submissions. They are related to the following thematic topics: AI and Bioinformatics, Bayesian and Echo State Networks, Brain Inspired Computing, Chaotic Complex Models, Clustering, Mining, Exploratory Analysis, Coding Architectures, Complex Firing Patterns, Convolutional Neural Networks, Deep Learning (DL), DL in Real Time Systems, DL and Big Data Analytics, DL and Big Data, DL and Forensics, DL and Cybersecurity, DL and Social Networks, Evolving Systems – Optimization, Extreme Learning Machines, From Neurons to Neuromorphism, From Sensation to Perception, From Single Neurons to Networks, Fuzzy Modeling, Hierarchical ANN, Inference and Recognition, Information and Optimization, Interacting with The Brain, Machine Learning (ML), ML for Bio Medical systems, ML and Video-Image Processing, ML and Forensics, ML and Cybersecurity, ML and Social Media, ML in Engineering, Movement and Motion Detection, Multilayer Perceptrons and Kernel Networks, Natural Language, Object and Face Recognition, Recurrent Neural Networks and Reservoir Computing, Reinforcement Learning, Reservoir Computing, Self-Organizing Maps, Spiking Dynamics/Spiking ANN, Support Vector Machines, Swarm Intelligence and Decision-Making, Text Mining, Theoretical Neural Computation, Time Series and Forecasting, Training and Learning.



CNN/Natural Language


Fast CNN Pruning via Redundancy-Aware Training

The heavy storage and computational overheads have become a hindrance to the deployment of modern Convolutional Neural Networks (CNNs). To overcome this drawback, many works have been proposed to exploit redundancy within CNNs. However, most of them work as post-training processes: they start from pre-trained dense models and apply compression and extra fine-tuning, so the overall process is time-consuming. In this paper, we introduce redundancy-aware training, an approach to learn sparse CNNs from scratch with no need for any post-training compression procedure. In addition to minimizing training loss, redundancy-aware training prunes unimportant weights for sparse structures in the training phase. To ensure stability, a stage-wise pruning procedure is adopted, based on carefully designed model partition strategies. Experimental results show that redundancy-aware training can compress LeNet-5, ResNet-56 and AlexNet by factors of 43.8×, 7.9× and 6.4×, respectively. Compared to state-of-the-art approaches, our method achieves similar or higher sparsity while consuming significantly less time, being 2.3×–18× more time-efficient.

Xiao Dong, Lei Liu, Guangli Li, Peng Zhao, Xiaobing Feng
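
The pruning idea behind redundancy-aware training can be illustrated with a minimal magnitude-based pruning step. This is only a sketch of the general technique: the threshold selection, the `prune_by_magnitude` helper, and the 75% sparsity target are illustrative, not the authors' exact stage-wise partition strategies.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (illustrative helper)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    mask = np.abs(weights) > threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
pruned, mask = prune_by_magnitude(w, 0.75)
# During training, a pruning mask like this would be re-applied after each
# gradient step so that pruned weights stay at zero.
```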

Two-Stream Convolutional Neural Network for Multimodal Matching

Multimodal matching aims to establish relationships across different modalities such as image and text. Existing works mainly focus on maximizing the correlation between feature vectors extracted from off-the-shelf models, making feature extraction and matching a two-stage learning process. This paper presents a novel two-stream convolutional neural network that integrates feature extraction and matching in an end-to-end manner. Visual and textual streams are designed for feature extraction and are then concatenated with multiple shared layers for multimodal matching. The network is trained using an extreme multiclass classification loss that views each multimodal data pair as a class, followed by a fine-tuning step with a ranking constraint. Experimental results on the Flickr30k dataset demonstrate the effectiveness of the proposed network for multimodal matching.

Youcai Zhang, Yiwei Gu, Xiaodong Gu

Kernel Graph Convolutional Neural Networks

Graph kernels have been successfully applied to many graph classification problems. Typically, a kernel is first designed, and then an SVM classifier is trained based on the features defined implicitly by this kernel. This two-stage approach decouples data representation from learning, which is suboptimal. On the other hand, Convolutional Neural Networks (CNNs) have the capability to learn their own features directly from the raw data during training. Unfortunately, they cannot handle irregular data such as graphs. We address this challenge by using graph kernels to embed meaningful local neighborhoods of the graphs in a continuous vector space. A set of filters is then convolved with these patches, pooled, and the output is then passed to a feedforward network. With limited parameter tuning, our approach outperforms strong baselines on 7 out of 10 benchmark datasets. Code and data are publicly available.

Giannis Nikolentzos, Polykarpos Meladianos, Antoine Jean-Pierre Tixier, Konstantinos Skianis, Michalis Vazirgiannis

A Histogram of Oriented Gradients for Broken Bars Diagnosis in Squirrel Cage Induction Motors

Three-phase induction motors are widely used in many applications, both in industry and in other environments. Although this electrical machine is robust and reliable for industrial tasks, condition monitoring techniques have been investigated in recent years to identify electrical and mechanical faults in induction motors. In this sense, broken rotor bars are a typical fault related to induction machine damage, and current technical solutions have shown some drawbacks for this kind of failure diagnosis, particularly when the motor is running at very low slip. Therefore, this paper proposes a new use of the Histogram of Oriented Gradients, usually applied in computer vision and image processing, for broken bar detection, using data from only one phase of the stator current of the machine. The intensity gradients and edge directions of each time-window of the stator signal are used as inputs for a neural network classifier. This method has been validated using experimental data from a 7.5 kW squirrel cage induction machine running at distinct load levels (slip conditions).

Luiz C. Silva, Cleber G. Dias, Wonder A. L. Alves
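
The descriptor can be sketched for a single time-window as a magnitude-weighted histogram of local slope orientations. This is an assumption-laden simplification: `hog_1d`, the bin count, and the arctangent orientation are illustrative choices, not necessarily the paper's exact formulation.

```python
import numpy as np

def hog_1d(window, n_bins=8):
    """Histogram of gradient orientations for one time-window of a 1-D signal.
    Orientation is taken as the arctangent of the local slope, weighted by
    gradient magnitude; the resulting histogram is L2-normalized."""
    grad = np.gradient(window)
    orientation = np.arctan(grad)          # values in (-pi/2, pi/2)
    magnitude = np.abs(grad)
    hist, _ = np.histogram(orientation, bins=n_bins,
                           range=(-np.pi / 2, np.pi / 2), weights=magnitude)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

t = np.linspace(0, 1, 256)
current = np.sin(2 * np.pi * 50 * t)       # one stator-current window
features = hog_1d(current)                 # input vector for a classifier
```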

Learning Game by Profit Sharing Using Convolutional Neural Network

In this paper, Profit Sharing using a convolutional neural network is realized. In the proposed method, the action value in Profit Sharing is learned by a convolutional neural network; that is, the method learns the value function of Profit Sharing instead of the value function of Q-Learning used in the Deep Q-Network. By changing to an error function based on the value function of Profit Sharing, which can acquire a probabilistic policy in a shorter time, the proposed method is able to learn in a shorter time than the conventional Deep Q-Network. Computer experiments were carried out on Asterix for the Atari 2600, and the proposed method was compared with the conventional Deep Q-Network. As a result, we confirmed that the proposed method can learn from an earlier stage than the Deep Q-Network and ultimately obtains a higher score.

Nobuaki Hasuike, Yuko Osana
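
Tabular Profit Sharing credit assignment can be sketched as below; the geometric credit decay and the `profit_sharing_update` helper are illustrative (the paper replaces the table with a convolutional neural network that learns this value function):

```python
def profit_sharing_update(values, episode, reward, decay=0.3):
    """Distribute a terminal reward back along the episode's state-action
    pairs with geometrically decreasing credit (a minimal Profit Sharing
    sketch; the decay constant here is illustrative)."""
    credit = reward
    for state, action in reversed(episode):
        values[(state, action)] = values.get((state, action), 0.0) + credit
        credit *= decay
    return values

# A two-step episode ending in a reward of 1.0:
values = profit_sharing_update({}, [("s0", "right"), ("s1", "up")], reward=1.0)
```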

Detection of Fingerprint Alterations Using Deep Convolutional Neural Networks

Fingerprint alteration is a challenge that poses enormous security risks. As a result, many research efforts in the scientific community have attempted to address this issue. However, the absence of publicly available datasets containing obfuscated and distorted fingerprints makes it difficult to identify the type of alteration. In this work we present the publicly available Sokoto-Coventry Fingerprints Dataset (SOCOFing), which provides ten fingerprints for 600 different subjects, as well as gender, hand and finger name for each image, among other unique characteristics. We also provide a total of 55,249 images with three levels of alteration for Z-cut, obliteration and central rotation synthetic alterations, which are the most common types of obfuscation and distortion. In addition, this paper proposes a Convolutional Neural Network (CNN) to identify these alterations. The proposed CNN model achieves a classification accuracy rate of 98.55%. Results are also compared with a residual CNN model pre-trained on ImageNet, which produces an accuracy of 99.88%.

Yahaya Isah Shehu, Ariel Ruiz-Garcia, Vasile Palade, Anne James

A Convolutional Neural Network Approach for Modeling Semantic Trajectories and Predicting Future Locations

In recent years, Location Based Service (LBS) providers rely increasingly on predictive models in order to offer their users timely and tailored solutions. Current location prediction algorithms go beyond using plain location data and show that additional context information can lead to a higher performance. Moreover, it has been shown that using semantics and projecting GPS trajectories onto so-called semantic trajectories can further improve the model. At the same time, Artificial Neural Networks (ANNs) have been proven very reliable when it comes to modeling and predicting time series, with recurrent network architectures showing particularly good performance. However, very little research has been done on the use of Convolutional Neural Networks (CNNs) for modeling human movement patterns. In this work, we introduce a CNN-based approach for representing semantic trajectories and predicting future locations, and include an additional embedding layer to raise efficiency. In order to evaluate our approach, we use the MIT Reality Mining dataset and compare against a Feed-Forward (FFNN), a Recurrent (RNN) and an LSTM network on two different semantic trajectory levels. We show that CNNs are more than capable of handling semantic trajectories, while providing high prediction accuracies at the same time.

Antonios Karatzoglou, Nikolai Schnell, Michael Beigl

Neural Networks for Multi-lingual Multi-label Document Classification

This paper proposes a novel approach for multi-lingual multi-label document classification based on neural networks. We use popular convolutional neural networks for this task with three different configurations. The first one uses static word2vec embeddings that are left as is, the second one initializes the embeddings with word2vec and fine-tunes them while learning on the available data, and the last one initializes the embeddings randomly and then optimizes them for the classification task. The proposed method is evaluated on four languages, namely English, German, Spanish and Italian, from the Reuters corpus. Experimental results show that the proposed approach is efficient and the best obtained F-measure reaches 84%.

Jiří Martínek, Ladislav Lenc, Pavel Král

Multi-region Ensemble Convolutional Neural Network for Facial Expression Recognition

Facial expressions play an important role in conveying the emotional states of human beings. Recently, deep learning approaches have been applied to the image recognition field due to the discriminative power of the Convolutional Neural Network (CNN). In this paper, we first propose a novel Multi-Region Ensemble CNN (MRE-CNN) framework for facial expression recognition, which aims to enhance the learning power of CNN models by capturing both global and local features from multiple human face sub-regions. Second, the weighted prediction scores from each sub-network are aggregated to produce the final prediction with high accuracy. Third, we investigate the effects of different sub-regions of the whole face on facial expression recognition. Our proposed method is evaluated on two well-known publicly available facial expression databases, AFEW 7.0 and RAF-DB, and has been shown to achieve state-of-the-art recognition accuracy.

Yingruo Fan, Jacqueline C. K. Lam, Victor O. K. Li
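
The score-level aggregation step can be sketched as a weighted sum of per-sub-network softmax outputs; the sub-regions and weights below are hypothetical examples, not the paper's learned values:

```python
import numpy as np

def aggregate(scores, weights):
    """Weighted aggregation of per-sub-network class scores (sketch)."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()      # normalize the ensemble weights
    return weights @ scores                # (n_subnets,) x (n_subnets, n_classes)

# Three hypothetical sub-networks (whole face, eyes, mouth) over 3 emotions:
subnet_scores = [[0.6, 0.3, 0.1],
                 [0.5, 0.4, 0.1],
                 [0.2, 0.7, 0.1]]
final = aggregate(subnet_scores, weights=[0.5, 0.25, 0.25])
prediction = int(np.argmax(final))
```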

Further Advantages of Data Augmentation on Convolutional Neural Networks

Data augmentation is a popular technique largely used to enhance the training of convolutional neural networks. Although many of its benefits are well known by deep learning researchers and practitioners, its implicit regularization effects, as compared to popular explicit regularization techniques, such as weight decay and dropout, remain largely unstudied. As a matter of fact, convolutional neural networks for image object classification are typically trained with both data augmentation and explicit regularization, assuming the benefits of all techniques are complementary. In this paper, we systematically analyze these techniques through ablation studies of different network architectures trained with different amounts of training data. Our results unveil a largely ignored advantage of data augmentation: networks trained with just data augmentation more easily adapt to different architectures and amount of training data, as opposed to weight decay and dropout, which require specific fine-tuning of their hyperparameters.

Alex Hernández-García, Peter König
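
A typical augmentation pipeline of the kind analyzed here might look as follows; this is a generic sketch (random horizontal flip plus padded random crop), not necessarily the exact transformations used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Light data augmentation: random horizontal flip plus a random crop
    from a zero-padded copy, preserving the original image size."""
    if rng.random() < 0.5:
        image = image[:, ::-1]                       # horizontal flip
    pad = 2
    padded = np.pad(image, pad, mode="constant")     # zero padding
    y, x = rng.integers(0, 2 * pad + 1, size=2)      # random crop offset
    return padded[y:y + image.shape[0], x:x + image.shape[1]]

img = rng.random((32, 32))
out = augment(img)                                   # same shape as input
```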

DTI-RCNN: New Efficient Hybrid Neural Network Model to Predict Drug–Target Interactions

Drug-target interactions (DTIs) are a critical step in new drug discovery and drug repositioning. Various computational algorithms have been developed to discover new DTIs, but their prediction accuracy is not very satisfactory. Most existing computational methods are based on homogeneous networks or on integrating multiple data sources, without considering the feature associations between gene and drug data. In this paper, we propose a deep-learning-based hybrid model, DTI-RCNN, which integrates long short-term memory (LSTM) networks with a convolutional neural network (CNN) to further improve DTI prediction accuracy using drug data and gene data. First, we extract potential semantic information between gene data and drug data via an LSTM network. We then construct a CNN to extract the loci knowledge in the LSTM outputs. Finally, a fully connected network is used for prediction. A comparison of results shows that the proposed model exhibits better performance. More importantly, DTI-RCNN is stable and efficient in predicting novel DTIs. Therefore, it should help select candidate DTIs and further promote the development of drug repositioning.

Xiaoping Zheng, Song He, Xinyu Song, Zhongnan Zhang, Xiaochen Bo

Hierarchical Convolution Neural Network for Emotion Cause Detection on Microblogs

Emotion cause detection, which recognizes the cause of an emotion in microblogs, is a challenging research issue in the Natural Language Processing field. In this paper, we propose a hierarchical Convolution Neural Network (Hier-CNN) for emotion cause detection. Our Hier-CNN model deals with the feature sparsity problem through a clause-level encoder, and handles the scarcity of event-based information with a subtweet-level encoder. In the clause-level encoder, the representation of a word is augmented with its context. In the subtweet-level encoder, event-based features are extracted in terms of microblogs. Experimental results show that our model outperforms several strong baselines and achieves state-of-the-art performance.

Ying Chen, Wenjun Hou, Xiyao Cheng

Direct Training of Dynamic Observation Noise with UMarineNet

Accurate uncertainty predictions are crucial to assess the reliability of a model, especially for neural networks. Part of this uncertainty is the observation noise, which is dynamic in our marine virtual sensor task. Typically, dynamic noise is not trained directly, but approximated through terms in the loss function. Unfortunately, this noise loss function needs to be scaled by a trade-off-parameter to achieve accurate uncertainties. In this paper we propose an upgrade to the existing architecture, which increases interpretability and introduces a novel direct training procedure for dynamic noise modelling. To that end, we train the point prediction model and the noise model separately. We present a new loss function that requires Monte Carlo runs of the model to directly train for the uncertainty prediction accuracy. In an experimental evaluation, we show that in most tested cases the uncertainty prediction is more accurate than the manually tuned trade-off-parameter. Because of the architectural changes we are able to analyze the importance of individual parts of the time series of our prediction.

Stefan Oehmcke, Oliver Zielinski, Oliver Kramer

Convolutional Soft Decision Trees

Soft decision trees, aka hierarchical mixture of experts, are composed of soft multivariate decision nodes and output-predicting leaves. Previously, they have been shown to work successfully in supervised classification and regression tasks, as well as in training unsupervised autoencoders. This work has two contributions: First, we show that dropout and dropconnect on input units, previously proposed for deep multi-layer neural networks, can also be used with soft decision trees for regularization. Second, we propose a convolutional extension of the soft decision tree with local feature detectors in successive layers that are trained together with the other parameters of the soft decision tree. Our experiments on four image data sets, MNIST, Fashion-MNIST, CIFAR-10 and Imagenet32, indicate improvements due to both contributions.

Alper Ahmetoğlu, Ozan İrsoy, Ethem Alpaydın
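
The core soft decision node can be sketched in a few lines: a sigmoid gate mixes the two children's predictions instead of routing the input hard to one side. The depth-1 tree and its parameters below are illustrative:

```python
import numpy as np

def soft_tree_predict(x, w, b, leaf_left, leaf_right):
    """One soft decision node: a sigmoid gate g mixes the two leaf
    predictions, g * left + (1 - g) * right, rather than routing the
    input hard to one child (minimal depth-1 soft decision tree)."""
    g = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # soft routing probability
    return g * leaf_left + (1.0 - g) * leaf_right

x = np.array([1.0, -2.0])
y = soft_tree_predict(x, w=np.array([0.5, 0.5]), b=0.0,
                      leaf_left=1.0, leaf_right=-1.0)
# The gate output is differentiable, so w, b and the leaves can all be
# trained jointly by gradient descent, as in the convolutional extension.
```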

A Multi-level Attention Model for Text Matching

Text matching based on deep learning models often suffers from limited query term coverage. Inspired by the success of attention-based models in machine translation, where the model can automatically search for the parts of a sentence that are relevant to a target word, we propose a multi-level attention model with a maximum matching matrix rank to simulate what humans do when finding a good answer to a query question. Firstly, we apply a multi-attention mechanism to choose the most influential document words for every query word. Then an approach we call reciprocal relative standard deviation (RRSD) calculates the matching coverage score over all query words. Experiments on both a question-answer task and a learning-to-rank task achieve state-of-the-art results compared to traditional statistical methods and deep neural network methods.

Qiang Sun, Yue Wu

Attention Enhanced Chinese Word Embeddings

We introduce a new Chinese word embedding method called AWE, which utilizes an attention mechanism to enhance Mikolov's CBOW. Considering the shortcomings of existing word representation methods, we improve CBOW in two respects. First, the context vector in CBOW is obtained by simply averaging the representations of the surrounding words, while our AWE model aligns the surrounding words with the central word through a global attention mechanism and a self-attention mechanism. Second, CBOW is a bag-of-words model which ignores the order of the surrounding words; this paper uses position encoding to further enhance AWE, yielding P&AWE. We design both qualitative and quantitative experiments to analyze the effectiveness of the models. Results indicate that the AWE models far exceed the CBOW model and achieve state-of-the-art performance on the word similarity task. Last but not least, we further verify the AWE models through attention visualization and case analysis.

Xingzhang Ren, Leilei Zhang, Wei Ye, Hang Hua, Shikun Zhang
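
The attention-based replacement for CBOW's plain averaging can be sketched as below; the dot-product scoring used here is a common global-attention choice and only an assumption about the AWE model's exact form:

```python
import numpy as np

def attentive_context(context_vecs, center_vec):
    """Attention-weighted context vector: score each surrounding word
    against the central word, softmax the scores, and take the weighted
    sum, instead of CBOW's simple average (illustrative sketch)."""
    scores = context_vecs @ center_vec
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ context_vecs

rng = np.random.default_rng(0)
context = rng.normal(size=(4, 16))    # four surrounding word vectors
center = rng.normal(size=16)          # central word vector
h = attentive_context(context, center)  # replaces CBOW's simple average
```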

Balancing Convolutional Neural Networks Pipeline in FPGAs

Convolutional Neural Networks (CNNs) have achieved excellent performance in image classification and have been successfully applied in a wide range of domains. However, their processing power demand poses a challenge to their implementation in embedded real-time applications. To tackle this problem, we focus in this work on the FPGA acceleration of the convolutional layers, since they account for about 90% of the overall computational load. We implemented buffers to reduce the storage of feature maps, consequently facilitating the allocation of all the kernel weights in Block-RAMs (BRAMs). Moreover, we used 8-bit kernel weights, rounded from an already trained CNN, to further reduce the memory requirements, storing them in multiple BRAMs to aid kernel loading throughput. To balance the pipeline of convolutions through the convolutional layers, we manipulated the amount of parallel computation in the convolutional step of each convolutional layer. We adopted the AlexNet CNN architecture to run our experiments and compare the results. We were able to run the inference of the convolutional layers in 3.9 ms at a maximum operating frequency of 76.9 MHz.

Mark Cappello Ferreira de Sousa, Miguel Angelo de Abreu de Sousa, Emilio Del-Moral-Hernandez
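
The 8-bit rounding of pre-trained weights can be sketched with a simple linear quantizer; the per-kernel max-abs scale below is one common choice, not necessarily the paper's exact scheme:

```python
import numpy as np

def quantize_int8(weights):
    """Round trained float kernel weights to 8-bit integers with a single
    per-kernel scale factor (simple linear quantization sketch)."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
kernel = rng.normal(scale=0.1, size=(3, 3))   # a trained 3x3 kernel
q, scale = quantize_int8(kernel)
reconstructed = q.astype(np.float64) * scale  # dequantized approximation
max_err = float(np.abs(reconstructed - kernel).max())
# Rounding error is bounded by half the quantization step (scale / 2).
```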

Generating Diverse and Meaningful Captions

Unsupervised Specificity Optimization for Image Captioning

Image Captioning is a task that requires models to acquire a multimodal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram metrics, these models tend to output the same generic captions for similar images. In this work, we address this limitation and train a model that generates more diverse and specific captions through an unsupervised training approach that incorporates a learning signal from an Image Retrieval model. We summarize previous results and improve the state-of-the-art on caption diversity and novelty. We make our source code publicly available online.

Annika Lindh, Robert J. Ross, Abhijit Mahalunkar, Giancarlo Salton, John D. Kelleher

Assessing Image Analysis Filters as Augmented Input to Convolutional Neural Networks for Image Classification

Convolutional Neural Networks (CNNs) have proven very effective in image classification and object recognition tasks, often exceeding the performance of traditional image analysis techniques. However, training a CNN requires very extensive datasets and imposes a very high computational burden. In this work, we test the hypothesis that if the input includes the responses of established image analysis filters that detect salient image structures, the CNN should be able to perform better than an identical CNN fed with the plain RGB images only. Thus, we employ a number of families of image analysis filter banks and use their responses to compile a small number of filtered responses for each original RGB image. We perform a large number of CNN training/testing repetitions for a 40-class building recognition problem, on a publicly available image database, using the original images as well as the original images augmented by the compiled filter responses. Results show that the accuracy achieved by the CNN with the augmented input is consistently higher than that of the RGB image input, both across different repetitions of the execution and throughout the iterations of each repetition.

K. Delibasis, Ilias Maglogiannis, S. Georgakopoulos, K. Kottari, V. Plagianakos



Balanced Cortical Microcircuitry-Based Network for Working Memory

Working memory (WM) is an important part of cognitive activity. The WM system maintains information temporarily to be used in learning and decision-making. Recent studies of WM have focused on positive feedback, but positive-feedback models require fine tuning of the feedback strength and are sensitive to common perturbations; moreover, feedback strength differs between individuals, so a single finely tuned set of network parameters cannot fit everyone. In this research, we propose a new approach to understanding WM based on the theory that positive and negative feedback are closely balanced in neocortical circuits. Our experimental results demonstrate that the model does not need finely tuned parameters and can achieve memory storage, association, updating and forgetting. Our proposed negative-derivative feedback model was shown to be more robust to common perturbations than previous models based on positive feedback alone.

Hui Wei, Zihao Su, Dawei Dai

Learning Continuous Muscle Control for a Multi-joint Arm by Extending Proximal Policy Optimization with a Liquid State Machine

There have been many advances in the field of reinforcement learning for continuous control problems. Usually, these approaches use deep learning with artificial neural networks for the approximation of policies and value functions. In addition, there have been interesting advances in spiking neural networks, towards a more biologically plausible model of the neurons and the learning mechanisms. We present an approach to learn continuous muscle control of a multi-joint arm. We use reinforcement learning for a target reaching task, which can be modeled as a partially observable Markov decision process. We extend proximal policy optimization with a liquid state machine (LSM) for state representation to achieve better performance in the target reaching task. The results show that we are able to learn to control the arm after training the readout of the LSM with reinforcement learning. The input current encoding used for encoding the state is enough to obtain a good projection into the higher-dimensional space of the LSM. The results also show that we are able to learn a linear readout, equivalent to a one-layer neural network, to control the arm. We show that there are clear benefits to training the readouts of an LSM with reinforcement learning. These results point to the general benefits of using an LSM as a drop-in state transformation.

Juan Camilo Vasquez Tieck, Marin Vlastelica Pogančić, Jacques Kaiser, Arne Roennau, Marc-Oliver Gewaltig, Rüdiger Dillmann

A Supervised Multi-spike Learning Algorithm for Recurrent Spiking Neural Networks

Recurrent spiking neural networks involve complex structures and implicit nonlinear mechanisms, so formulating efficient supervised learning algorithms is difficult and remains an important open problem in the research area. This paper proposes a new supervised multi-spike learning algorithm for recurrent spiking neural networks, which can implement complex spatiotemporal pattern learning of spike trains. Using information encoded in precisely timed spike trains and their inner product operators, the error function is first constructed. The algorithm then defines the learning rules for synaptic weights based on the inner product of spike trains. The algorithm is successfully applied to learn spike train patterns, and the experimental results show high learning accuracy and efficiency. In addition, network structure parameters are analyzed, such as the neuron number and connectivity degree in the recurrent layer of the spiking neural networks.

Xianghong Lin, Guoyong Shi
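
An inner product of spike trains of the kind such error functions are built from can be sketched with a Laplacian kernel summed over all spike pairs; the kernel shape and time constant here are illustrative assumptions:

```python
import math

def spike_inner_product(train_a, train_b, tau=0.01):
    """Kernel inner product of two spike trains:
    <s_a, s_b> = sum over all spike pairs of exp(-|t_i - t_j| / tau).
    A standard construction for comparing precisely timed spike trains."""
    return sum(math.exp(-abs(ta - tb) / tau)
               for ta in train_a for tb in train_b)

actual  = [0.010, 0.042, 0.090]    # actual output spike times (seconds)
desired = [0.012, 0.040, 0.088]    # desired spike times
similarity = spike_inner_product(actual, desired)
self_sim = spike_inner_product(desired, desired)
# An error function can then be built from such inner products, e.g.
# <a,a> - 2<a,d> + <d,d>, which vanishes when the trains coincide.
```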

Artwork Retrieval Based on Similarity of Touch Using Convolutional Neural Network

In this paper, we propose an artwork retrieval method based on similarity of touch using a convolutional neural network. In the proposed system, a convolutional neural network is trained so that images can be classified into groups based on touch, with saturation and value and the histograms of saturation and value as input data, and the trained network is used to realize the retrieval. Using the trained convolutional neural network, feature vectors are generated for all images used for training: for each image, the output of the fully-connected layer before the soft-max layer is obtained, normalized to unit length, and used as the feature vector. Then, each image and its normalized feature vector are associated and stored in the database. A retrieval is realized by inputting an image as a retrieval key to the input layer, generating a feature vector, and comparing it with the feature vectors in the database. We carried out a series of computer experiments and confirmed that the proposed system can realize artwork retrieval based on similarity of touch with higher accuracy than the conventional system.

Takayuki Fujita, Yuko Osana
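
The retrieval stage described above (unit-normalized feature vectors compared against a database) can be sketched as follows, with hypothetical 2-D features standing in for the network's fully-connected outputs:

```python
import numpy as np

def build_index(feature_vecs):
    """Normalize each stored feature vector to unit length, as done
    before the vectors are placed in the database."""
    feats = np.asarray(feature_vecs, dtype=float)
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

def retrieve(index, query_vec, top_k=2):
    """Rank database entries by cosine similarity to the query feature."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = index @ q            # dot product of unit vectors = cosine
    return np.argsort(scores)[::-1][:top_k]

# Hypothetical 2-D feature vectors for three stored artworks:
index = build_index([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
ranking = retrieve(index, np.array([1.0, 0.05]))
```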

Microsaccades for Neuromorphic Stereo Vision

Depth perception through stereo vision is an important feature of biological and artificial vision systems. While biological systems can compute disparities effortlessly, it requires intensive processing for artificial vision systems. The computing complexity resides in solving the correspondence problem – finding matching pairs of points in the two eyes. Inspired by the retina, event-based vision sensors allow a new constraint to solve the correspondence problem: time. Relying on precise spike-time, spiking neural networks can take advantage of this constraint. However, disparities can only be computed from dynamic environments since event-based vision sensors only report local changes in light intensity. In this paper, we show how microsaccadic eye movements can be used to compute disparities from static environments. To this end, we built a robotic head supporting two Dynamic Vision Sensors (DVS) capable of independent panning and simultaneous tilting. We evaluate the method on both static and dynamic scenes perceived through microsaccades. This paper demonstrates the complementarity of event-based vision sensors and active perception leading to more biologically inspired robots.

Jacques Kaiser, Jakob Weinland, Philip Keller, Lea Steffen, J. Camilo Vasquez Tieck, Daniel Reichard, Arne Roennau, Jörg Conradt, Rüdiger Dillmann

A Neural Spiking Approach Compared to Deep Feedforward Networks on Stepwise Pixel Erasement

In real-world scenarios, objects are often partially occluded, which requires robustness of object recognition against such perturbations. Convolutional networks have shown good performance in classification tasks, and the learned convolutional filters appear similar to the receptive fields of simple cells found in the primary visual cortex. Alternatively, spiking neural networks are more biologically plausible. We developed a two-layer spiking network, trained on natural scenes with a biologically plausible learning rule. It is compared to two deep convolutional neural networks using a classification task of stepwise pixel erasement on MNIST. In comparison to these networks, the spiking approach achieves good accuracy and robustness.

René Larisch, Michael Teichmann, Fred H. Hamker

Sparsity Enables Data and Energy Efficient Spiking Convolutional Neural Networks

In recent years, deep learning has surpassed human performance in image recognition tasks. A major issue with deep learning systems is their reliance on large datasets for optimal performance. When presented with a new task, generalizing from small amounts of data becomes highly attractive. Research has shown that the human visual cortex might employ sparse coding to extract features from the images that we see, leading to efficient usage of the available data. To ensure good generalization and energy efficiency, we create a multi-layer spiking convolutional neural network which performs layer-wise sparse coding for unsupervised feature extraction. Applied to the MNIST dataset, it achieves 92.3% accuracy with just 500 data samples, which is 4× less than what vanilla CNNs need for similar values, while reaching 98.1% accuracy with the full dataset. Only around 7000 spikes are used per image (a 6× reduction in transferred bits per forward pass compared to CNNs), implying high sparsity. Thus, we show that our algorithm ensures better sparsity, leading to improved data and energy efficiency in learning, which is essential for some real-world applications.

Varun Bhatt, Udayan Ganguly

Design of Spiking Rate Coded Logic Gates for C. elegans Inspired Contour Tracking

Bio-inspired energy-efficient control is a frontier for autonomous navigation and robotics. Binary input-output neuronal logic gates have been demonstrated in the literature, while analog input-output logic gates are needed for continuous analog real-world control. In this paper, we design logic gates such as AND, OR and XOR using networks of Leaky Integrate-and-Fire neurons with analog rate (frequency) coded inputs and output, where the refractory period is shown to be a critical knob for neuronal design. To demonstrate our design method, we present contour tracking inspired by the chemotaxis network of the worm C. elegans and demonstrate for the first time an end-to-end Spiking Neural Network (SNN) solution. First, we demonstrate contour tracking with an average deviation equal to that reported in the literature with non-neuronal logic gates. Second, a 2× improvement in tracking accuracy is enabled by latency reduction, leading to state-of-the-art performance with an average deviation of 0.55% from the set-point. Third, a new feature of local-extrema escape is demonstrated with an analog XOR gate, which uses only 5 neurons – better than binary-logic neuronal circuits. The XOR gate demonstrates the universality of our logic scheme. Finally, we demonstrate the hardware feasibility of our network based on experimental results on 32 nm Silicon-on-Insulator (SOI) based artificial neurons with tunable refractory periods. Thus, we present a general framework of analog neuronal control logic along with the feasibility of its implementation in a mature SOI technology platform for autonomous SNN navigation controller hardware.

Shashwat Shukla, Sangya Dutta, Udayan Ganguly

Gating Sensory Noise in a Spiking Subtractive LSTM

Spiking neural networks are being investigated both as biologically plausible models of neural computation and as a potentially more efficient type of neural network. Recurrent neural networks in the form of networks of gating memory cells have been central to state-of-the-art solutions in problem domains that involve sequence recognition or generation. Here, we design an analog Long Short-Term Memory (LSTM) cell whose neurons can be substituted with efficient spiking neurons, using subtractive gating (following the subLSTM in [1]) instead of multiplicative gating. Subtractive gating allows for a less sensitive gating mechanism, which is critical when using spiking neurons. By using fast-adapting spiking neurons with a smoothed Rectified Linear Unit (ReLU)-like effective activation function, we show that an accurate conversion from an analog subLSTM to a continuous-time spiking subLSTM is possible. This architecture results in memory networks that compute very efficiently, with low average firing rates comparable to those in biological neurons, while operating in continuous time.
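The subtractive gating idea can be written down compactly. In a subLSTM-style cell (following [1]; this is a sketch in which `W` is an assumed single stacked weight matrix and `b` the stacked bias), gate activations are subtracted from, rather than multiplied with, the candidate and the output:

```python
import numpy as np

def sublstm_cell(x, h, c, W, b):
    """One step of a subtractive-gating (subLSTM-style) cell.
    W stacks the input-gate, forget-gate, output-gate and candidate
    weights; the input and output gates enter subtractively, giving
    a gating mechanism less sensitive to spiking-neuron noise."""
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    i, f, o, g = np.split(W @ np.concatenate([x, h]) + b, 4)
    c_new = sig(f) * c + sig(g) - sig(i)  # input gate subtracts
    h_new = sig(c_new) - sig(o)           # output gate subtracts
    return h_new, c_new
```

Because the gates shift rather than scale the signal, small gate errors perturb the cell state additively instead of multiplicatively, which is the property the abstract highlights for spiking implementations.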

Isabella Pozzi, Roeland Nusselder, Davide Zambrano, Sander Bohté

Spiking Signals in FOC Control Drive

This paper proposes to apply spiking signals to the control of an AC motor drive at variable speed in real-time experimentation. Innovative theoretical concepts of spiking signal processing (SSP, [1]) are introduced using the $$I_{Na,p}+I_{K}$$ neuron model [7]. Based on SSP concepts, we design a spiking speed controller inspired by human movement control. The spiking speed controller is then integrated into the field-oriented control (FOC, [13]) topology in order to control an induction drive at various mechanical speeds. Experimental results are presented and discussed. This paper demonstrates that spiking signals can be straightforwardly used for electrical engineering applications in real-time experimentation based on robust SSP theory.

L. M. Grzesiak, V. Meganck

Spiking Neural Network Controllers Evolved for Animat Foraging Based on Temporal Pattern Recognition in the Presence of Noise on Input

We evolved spiking neural network controllers for simple animats, allowing these networks to change topologies and weights during evolution. The animats’ task was to discern one correct pattern (emitted from target objects) among other, different wrong patterns (emitted from distractor objects), by navigating towards targets and avoiding distractors in a 2D world. Patterns were emitted with variable silences between signals of the same pattern in an attempt to create a state memory. We analyse a network that is able to accomplish the task perfectly for patterns consisting of two signals, with 4 interneurons, maintaining its state (although not indefinitely) thanks to its recurrent connections.

Chama Bensmail, Volker Steuber, Neil Davey, Borys Wróbel

Spiking Neural Networks Evolved to Perform Multiplicative Operations

Multiplicative or divisive changes in tuning curves of individual neurons to one stimulus (“input”) as another stimulus (“modulation”) is applied, called gain modulation, play an important role in perception and decision making. Since the presence of modulatory synaptic stimulation results in a multiplicative operation by proportionally changing the neuronal input-output relationship, such a change affects the sensitivity of the neuron but not its selectivity. Multiplicative gain modulation has commonly been studied at the level of single neurons. Much less is known about arithmetic operations at the network level. In this work we have evolved small networks of spiking neurons in which the output neurons respond to input with non-linear tuning curves that exhibit gain modulation—the best network showed an over 3-fold multiplicative response to modulation. Interestingly, we have also obtained a network with only 2 interneurons showing an over 2-fold response.

Muhammad Aamir Khan, Volker Steuber, Neil Davey, Borys Wróbel

Very Small Spiking Neural Networks Evolved for Temporal Pattern Recognition and Robust to Perturbed Neuronal Parameters

We evolve both topology and synaptic weights of recurrent very small spiking neural networks in the presence of noise on the membrane potential. The noise is at a level similar to the level observed in biological neurons. The task of the networks is to recognise three signals in a particular order (a pattern ABC) in a continuous input stream in which each signal occurs with the same probability. The networks consist of adaptive exponential integrate-and-fire neurons and are limited to either three or four interneurons and one output neuron, with recurrent and self-connections allowed only for interneurons. Our results show that spiking neural networks evolved in the presence of noise are robust to the change of neuronal parameters. We propose a procedure to approximate the range, specific for every neuronal parameter, from which the parameters can be sampled to preserve, at least for some networks, high true positive rate and low false discovery rate. After assigning the states of neurons to states of the network corresponding to states in a finite state transducer, we show that this simple but not trivial computational task of temporal pattern recognition can be accomplished in a variety of ways.

Muhammad Yaqoob, Borys Wróbel

Machine Learning/Autoencoders


Machine Learning to Predict Toxicity of Compounds

Toxicology studies raise several concerns, underscoring the importance of early detection of the toxicity potential of chemical compounds, which is currently evaluated through in vitro assays assessing their bioactivity, or using costly and ethically questionable in vivo tests on animals. We therefore investigate the prediction of the bioactivity of chemical compounds from their physico-chemical structure, and propose to automate it using machine learning (ML) techniques based on data from the in vitro assessment of several hundred chemical compounds. We provide the results of tests of this approach with several ML techniques, on both a restricted dataset and a larger one. Since the available empirical data is unbalanced, we also use data augmentation techniques to improve the classification accuracy, and present the resulting improvements.

Ingrid Grenet, Yonghua Yin, Jean-Paul Comet, Erol Gelenbe

Energy-Based Clustering for Pruning Heterogeneous Ensembles

In this work, an energy-based clustering method is used to prune heterogeneous ensembles. Specifically, the classifiers are grouped according to their predictions on a set of validation instances that are independent from the ones used to build the ensemble. In the empirical evaluation carried out, the cluster that minimizes the error on the validation set, besides reducing computational costs for storage and prediction times, is almost as accurate as the complete ensemble. Furthermore, it outperforms subensembles that summarize the complete ensemble by including representatives from each of the identified clusters.

Javier Cela, Alberto Suárez

Real-Time Hand Gesture Recognition Based on Electromyographic Signals and Artificial Neural Networks

In this paper, we propose a hand gesture recognition model based on superficial electromyographic signals. The model responds in approximately 29.38 ms (real time) with a recognition accuracy of 90.7%. We apply a sliding window approach using a main window and a sub-window. The sub-window is used to observe a segment of the signal seen through the main window. The model is composed of five blocks: data acquisition, preprocessing, feature extraction, classification and postprocessing. For data acquisition, we use the Myo Armband to measure the electromyographic signals. For preprocessing, we rectify and filter the signals and detect muscle activity. For feature extraction, we generate a feature vector using the preprocessed signal values and the results from a bag of functions. For classification, we use a feedforward neural network to label every sub-window observation. Finally, for postprocessing, we apply simple majority voting to label the main window observation.
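The postprocessing step, simple majority voting over the sub-window classifications, can be sketched in a few lines (the function name and gesture labels are ours, for illustration):

```python
from collections import Counter

def majority_vote(sub_window_labels):
    """Label the main-window observation with the most frequent
    label among its sub-window classifications."""
    return Counter(sub_window_labels).most_common(1)[0][0]
```

Voting over several sub-window predictions smooths out isolated misclassifications of the feedforward network before a gesture label is emitted for the whole main window.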

Cristhian Motoche, Marco E. Benalcázar

Fast Communication Structure for Asynchronous Distributed ADMM Under Unbalance Process Arrival Pattern

The alternating direction method of multipliers (ADMM) is an algorithm for solving large-scale data optimization problems in machine learning. In order to reduce the communication delay in a distributed environment, asynchronous distributed ADMM (AD-ADMM) was proposed. However, due to the unbalanced process arrival pattern in multiprocessor clusters, the communication of the star structure used in AD-ADMM is inefficient. Moreover, the load across the cluster is unbalanced, resulting in a decrease in data processing capacity. This paper proposes a hierarchical parameter server communication structure (HPS) and an asynchronous distributed ADMM based on it (HAD-ADMM). The algorithm mitigates the unbalanced-arrival problem through process grouping and scattered updating of the global variable, essentially achieving load balancing. Experiments show that HAD-ADMM is highly efficient in large-scale distributed environments and has no significant impact on convergence.
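As background, the global-variable step that a parameter server aggregates in distributed ADMM can be sketched in textbook consensus-ADMM form (a generic sketch assuming an L2-regularized objective; this is not the exact HAD-ADMM update):

```python
import numpy as np

def global_consensus_update(xs, us, rho, lam):
    """z-update of consensus ADMM with L2 regularization:
    z = argmin_z lam*||z||^2 + (rho/2) * sum_i ||x_i - z + u_i||^2.
    The server averages workers' local variables x_i and scaled
    duals u_i, then shrinks by the regularization factor."""
    n = len(xs)
    avg = np.mean([x + u for x, u in zip(xs, us)], axis=0)
    return (n * rho) / (n * rho + 2.0 * lam) * avg
```

In a hierarchical structure, group-level servers can aggregate their members' (x_i + u_i) sums first and forward partial results upward, which is what removes the star topology's single communication bottleneck.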

Shuqing Wang, Yongmei Lei

Improved Personalized Rankings Using Implicit Feedback

Most users give feedback through a mixture of implicit and explicit information when interacting with websites. Recommender systems should use both sources of information to improve personalized recommendations. In this paper, it is shown how to integrate implicit feedback information in form of pairwise item rankings into a neural network model to improve personalized item recommendations. The proposed two-sided approach allows the model to be trained even for users where no explicit feedback is available. This is especially useful to alleviate a form of the new user cold-start problem. The experiments indicate an improved predictive performance especially for the task of personalized ranking.

Josef Feigl, Martin Bogdan

Cosine Normalization: Using Cosine Similarity Instead of Dot Product in Neural Networks

Traditionally, multi-layer neural networks use the dot product between the output vector of the previous layer and the incoming weight vector as the input to the activation function. The result of the dot product is unbounded, which increases the risk of large variance. Large variance of a neuron makes the model sensitive to changes in the input distribution, resulting in poor generalization, and aggravates the internal covariate shift which slows down training. To bound the dot product and decrease the variance, we propose to use cosine similarity or centered cosine similarity (the Pearson correlation coefficient) instead of the dot product in neural networks, which we call cosine normalization. We compare cosine normalization with batch, weight and layer normalization in fully-connected and convolutional neural networks on the MNIST, 20NEWS GROUP, CIFAR-10/100 and SVHN datasets. Experiments show that cosine normalization achieves better performance than the other normalization techniques.
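The bounding argument is easy to make concrete (a minimal sketch of the normalization itself, not the authors' full training setup; `eps` is our numerical guard against zero-norm vectors):

```python
import numpy as np

def cosine_norm(w, x, eps=1e-8):
    """Pre-activation as cosine similarity: bounded in [-1, 1],
    unlike the unbounded dot product w . x."""
    return np.dot(w, x) / (np.linalg.norm(w) * np.linalg.norm(x) + eps)

w, x = np.array([3.0, 4.0]), np.array([3.0, 4.0])
dot = np.dot(w, x)       # grows with the scale of w and x
cos = cosine_norm(w, x)  # stays in [-1, 1], invariant to scale
```

Scaling the weights changes the dot product proportionally but leaves the cosine pre-activation unchanged, which is precisely the variance-bounding property the abstract argues for.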

Chunjie Luo, Jianfeng Zhan, Xiaohe Xue, Lei Wang, Rui Ren, Qiang Yang

Discovering Thermoelectric Materials Using Machine Learning: Insights and Challenges

This work combines the forces of data-driven machine learning models and high-fidelity density functional theory for the identification of new potential thermoelectric materials. The traditional method of thermoelectric material discovery from an almost limitless search space of chemical compounds involves expensive and time-consuming experiments. In the current work, density functional theory (DFT) simulations are used to compute the descriptors (features) and thermoelectric characteristics (labels) of a set of compounds. The DFT simulations are computationally very expensive and hence the database is not very exhaustive. With the anticipation that the important features can be learned by machine learning (ML) from the limited database and that this knowledge can be used to predict the behavior of any new compound, the current work adds knowledge related to (a) understanding the influence of the selection of training/test data, (b) the influence of the complexity of ML algorithms, and (c) the computational efficiency of the combined DFT-ML methodology.

Mandar V. Tabib, Ole Martin Løvvik, Kjetil Johannessen, Adil Rasheed, Espen Sagvolden, Anne Marthine Rustad

Auto-tuning Neural Network Quantization Framework for Collaborative Inference Between the Cloud and Edge

Recently, deep neural networks (DNNs) have been widely applied in mobile intelligent applications. Inference for DNNs is usually performed in the cloud. However, this leads to a large overhead from transmitting data over the wireless network. In this paper, we demonstrate the advantages of cloud-edge collaborative inference with quantization. By analyzing the characteristics of the layers in DNNs, an auto-tuning neural network quantization framework for collaborative inference is proposed. We study the effectiveness of mixed-precision collaborative inference of state-of-the-art DNNs using the ImageNet dataset. The experimental results show that our framework can generate reasonable network partitions and reduce the storage on mobile devices with trivial loss of accuracy.

Guangli Li, Lei Liu, Xueying Wang, Xiao Dong, Peng Zhao, Xiaobing Feng

GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders

Deep learning on graphs has become a popular research topic with many applications. However, past work has concentrated on learning graph embedding tasks, which is in contrast with advances in generative models for images and text. Is it possible to transfer this progress to the domain of graphs? We propose to sidestep hurdles associated with the linearization of such discrete structures by having a decoder output a probabilistic fully-connected graph of a predefined maximum size at once. Our method is formulated as a variational autoencoder. We evaluate it on the challenging task of molecule generation.

Martin Simonovsky, Nikos Komodakis

Generation of Reference Trajectories for Safe Trajectory Planning

Many variants of a sampling-based motion planning algorithm, namely the Rapidly-exploring Random Tree, use biased sampling for faster convergence. One such recently proposed variant, the Hybrid-Augmented CL-RRT+, uses a predefined template trajectory predicted by a machine learning algorithm as a reference for the biased sampling. Because of the finite number of template trajectories, the convergence time is short only in scenarios where the final trajectory is close to the predicted template trajectory. Therefore, a generative model using a variational autoencoder for generating many reference trajectories, together with a 3D-ConvNet regressor for predicting those reference trajectories for critical vehicle traffic scenarios, is proposed in this work. Using this framework, two different safe trajectory planning algorithms, namely GATE and GATE-ARRT+, are presented in this paper. Finally, simulation results demonstrate the effectiveness of these algorithms for the trajectory planning task in different types of critical vehicle traffic scenarios.

Amit Chaulwar, Michael Botsch, Wolfgang Utschick

Joint Application of Group Determination of Parameters and of Training with Noise Addition to Improve the Resilience of the Neural Network Solution of the Inverse Problem in Spectroscopy to Noise in Data

In most cases, inverse problems are ill-posed or ill-conditioned, which is the reason for the high sensitivity of their solution to noise in the input data. Despite the fact that neural networks have the ability to work with noisy data, in the case of inverse problems this is not enough, because the incorrectness of the problem “outweighs” the ability of the neural network. In previous studies, the authors have shown that separate use of methods of group determination of parameters and of noise addition during training of neural networks can improve the resilience of the solution to noise in the input data. This study is devoted to the investigation of the joint application of these methods. The study is performed on the example of an inverse problem in laser Raman spectroscopy: the determination of the concentrations of ions in a solution of inorganic salts from the Raman spectrum of the solution.

Igor Isaev, Sergey Burikov, Tatiana Dolenko, Kirill Laptinskiy, Alexey Vervald, Sergey Dolenko



Generating Natural Answers on Knowledge Bases and Text by Sequence-to-Sequence Learning

Generative question answering systems aim at generating more contentful responses and more natural answers. Existing generative question answering systems applied to knowledge-grounded conversation generate natural answers either with a knowledge base or with raw text. Nevertheless, the performance of these methods is often affected by the incompleteness of the KB or text facts. In this paper, we propose an end-to-end generative question answering model. We make use of unstructured text and structured KBs to establish a universal schema as a large external facts library. Each word of a natural answer is dynamically predicted from the common vocabulary or retrieved from the corresponding external facts. Our model can generate natural answers containing an arbitrary number of knowledge entities by selecting from multiple relevant external facts via the dynamic knowledge enquirer. Finally, an empirical study shows that our model is efficient and significantly outperforms baseline methods in terms of both automatic and human evaluation.

Zhihao Ye, Ruichu Cai, Zhaohui Liao, Zhifeng Hao, Jinfen Li

Mitigating Concept Drift via Rejection

Learning in non-stationary environments is challenging, because under such conditions the common assumption of independent and identically distributed data does not hold; when concept drift is present, it necessitates continuous system updates. In recent years, several powerful approaches have been proposed. However, these models typically classify any input, regardless of their confidence in the classification – a strategy that is not optimal, particularly in safety-critical environments where alternatives to a (possibly unclear) decision exist, such as additional tests or a short delay of the decision. Formally speaking, this alternative corresponds to classification with rejection, a strategy which seems particularly promising in the context of concept drift, i.e. the occurrence of situations where the current model is wrong due to a concept change. In this contribution, we propose to extend learning under concept drift with rejection. Specifically, we extend two recent learning architectures for drift, the self-adjusting memory architecture (SAM-kNN) and adaptive random forests (ARF), to incorporate a reject option, resulting in highly competitive state-of-the-art technologies. We evaluate their performance in learning scenarios with different types of drift.
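The reject option itself is simple to state (a generic confidence-threshold sketch with names of our choosing, not the specific SAM-kNN/ARF extensions evaluated in the paper):

```python
def classify_with_reject(proba, threshold=0.7):
    """Return the index of the most probable class, or None (reject)
    when the classifier's confidence falls below the threshold,
    e.g. while a concept change makes the current model unreliable."""
    best = max(range(len(proba)), key=proba.__getitem__)
    return best if proba[best] >= threshold else None
```

Rejected inputs can then be routed to the safety-critical alternatives the abstract mentions, such as additional tests or a delayed decision.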

Jan Philip Göpfert, Barbara Hammer, Heiko Wersing

Strategies to Enhance Pattern Recognition in Neural Networks Based on the Insect Olfactory System

Some strategies used by the insect olfactory system to enhance its discrimination capability are a heterogeneous neural threshold distribution, gain control and sparse activity. To test the influence of these mechanisms on performance in a classification task, we propose a neural network based on the insect olfactory system. In this model, we introduce a regulation term to control the activity of neurons and a structured connectivity between the antennal lobe and mushroom body based on recent findings in Drosophila, which differs from the classical stochastic approach. Results show that the model achieves better results for high sparseness and low connectivity between Kenyon cells and projection neurons. For this configuration, the use of gain control further improves performance. The proposed structured connectivity model is able to achieve the same discrimination capacity without using gain control or activity regulation techniques, which opens up interesting possibilities.

Jessica Lopez-Hazas, Aaron Montero, Francisco B. Rodriguez

HyperNets and Their Application to Learning Spatial Transformations

In this paper we propose a conceptual framework for higher-order artificial neural networks. The idea of higher-order networks arises naturally when a model is required to learn some group of transformations, every element of which is well-approximated by a traditional feedforward network. Thus the group as a whole can be represented as a hypernetwork. One typical example of such a group is spatial transformations. We show that the proposed framework, which we call HyperNets, is able to deal with at least two basic spatial transformations of images: rotation and affine transformation. We show that HyperNets are able not only to generalize rotations and affine transformations, but also to compensate for the rotation of images, bringing them into canonical form.

Alexey Potapov, Oleg Shcherbakov, Innokentii Zhdanov, Sergey Rodionov, Nikolai Skorobogatko

Catastrophic Forgetting: Still a Problem for DNNs

We investigate the performance of DNNs when trained on class-incremental visual problems consisting of initial training followed by retraining with added visual classes. Catastrophic forgetting (CF) behavior is measured using a new evaluation procedure that aims at an application-oriented view of incremental learning. In particular, it imposes that model selection must be performed on the initial dataset alone, and demands that retraining control be performed using only the retraining dataset, as the initial dataset is usually too large to be kept. Experiments are conducted on class-incremental problems derived from MNIST, using a variety of different DNN models, some of them recently proposed to avoid catastrophic forgetting. When comparing our new evaluation procedure to previous approaches for assessing CF, we find that their findings are completely negated, and that none of the tested methods can avoid CF in all experiments. This stresses the importance of a realistic empirical measurement procedure for catastrophic forgetting, and the need for further research in incremental learning for DNNs.

B. Pfülb, A. Gepperth, S. Abdullah, A. Kilian

Queue-Based Resampling for Online Class Imbalance Learning

Online class imbalance learning constitutes a new problem and an emerging research topic that focuses on the challenges of online learning under class imbalance and concept drift. Class imbalance deals with data streams that have very skewed distributions, while concept drift deals with changes in the class imbalance status. Little work exists that addresses these challenges, and in this paper we introduce queue-based resampling, a novel algorithm that successfully addresses the co-existence of class imbalance and concept drift. The central idea of the proposed resampling algorithm is to selectively include in the training set a subset of the examples that appeared in the past. Results on two popular benchmark datasets demonstrate the effectiveness of queue-based resampling over state-of-the-art methods in terms of learning speed and quality.
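The central idea, selectively retaining past examples, can be sketched with bounded per-class queues (our reading of the approach, with assumed names; not the authors' exact algorithm):

```python
from collections import deque

class QueueResampler:
    """Keep a fixed-size queue of recent examples per class and train
    on the union of the queues at each step, so minority-class
    examples are re-used while old examples age out as the stream
    (and possibly the class-imbalance status) drifts."""
    def __init__(self, queue_size=5):
        self.queues = {}
        self.queue_size = queue_size

    def update(self, x, y):
        # Enqueue the new example; the deque discards the oldest one.
        self.queues.setdefault(y, deque(maxlen=self.queue_size)).append(x)
        # Training batch for this step: retained examples of every class.
        return [(xq, yq) for yq, q in self.queues.items() for xq in q]
```

The fixed queue length both balances the classes seen by the learner and limits how long stale pre-drift examples survive.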

Kleanthis Malialis, Christos Panayiotou, Marios M. Polycarpou

Learning Simplified Decision Boundaries from Trapezoidal Data Streams

We present a novel adaptive feedforward neural network for online learning from doubly-streaming data, where both the data volume and the feature space grow simultaneously. Traditional online learning and feature selection algorithms cannot handle this problem because they assume that the feature space of the data stream remains unchanged. We propose a Single Hidden Layer Feedforward Neural Network with Shortcut Connections (SLFN-S) that learns whether a data stream needs to be mapped using a non-linear transformation or not, to speed up the learning convergence. We employ a growing strategy to adjust the model complexity to the continuously changing feature space. Finally, we use a weight-based pruning procedure to keep the run-time complexity of the proposed model linear in the size of the input feature space, for efficient learning from data streams. Experiments with trapezoidal data streams on 8 UCI datasets were conducted to examine the performance of the proposed model. We show that SLFN-S outperforms the state-of-the-art learning algorithm for trapezoidal data streams [16].

Ege Beyazit, Matin Hosseini, Anthony Maida, Xindong Wu

Improving Active Learning by Avoiding Ambiguous Samples

If label information in a classification task is expensive, it can be beneficial to use active learning to obtain the most informative samples for a human to label. However, there can be samples which are meaningless to the human or were recorded wrongly. If these samples are near the classifier’s decision boundary, they are queried repeatedly for labeling. This is inefficient for training because the human cannot label these samples correctly, and it may lower human acceptance. We introduce an approach that compensates for the problem of ambiguous samples by excluding clustered samples from labeling. We compare this approach to other state-of-the-art methods. We further show that we can improve the accuracy in active learning and reduce the number of ambiguous samples queried during training.

Christian Limberg, Heiko Wersing, Helge Ritter

Solar Power Forecasting Using Dynamic Meta-Learning Ensemble of Neural Networks

We consider the task of predicting the solar power output for the next day from previous solar power data. We propose EN-meta, a meta-learning ensemble of neural networks where the meta-learners are trained to predict the errors of the ensemble members for the new day, and these errors are used to dynamically weight the contribution of the ensemble members in the final prediction. We evaluate the performance of EN-meta on two years of Australian solar data and compare its accuracy with state-of-the-art single models, classical ensemble methods and EN-meta versions without the meta-learning component. The results show that EN-meta was the most accurate method, highlighting the potential benefit of using meta-learning for solar power forecasting.
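The dynamic combination step can be sketched as follows (our reading of the scheme: inverse-error weighting is an assumption for illustration, and all names are ours):

```python
import numpy as np

def meta_weighted_forecast(member_preds, predicted_errors):
    """Weight each ensemble member inversely to the error the
    meta-learner predicts for it on the new day, then form the
    weighted forecast; the small constant avoids division by zero."""
    w = 1.0 / (np.asarray(predicted_errors) + 1e-8)
    w /= w.sum()
    return float(np.dot(w, member_preds))
```

A member the meta-learner expects to do badly tomorrow thus contributes little, even if it was accurate on average in the past.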

Zheng Wang, Irena Koprinska

Using Bag-of-Little Bootstraps for Efficient Ensemble Learning

The bag-of-little-bootstraps technique provides statistical estimates equivalent to those of the bootstrap in a tiny fraction of the time required by the bootstrap. In this work, we propose to incorporate bag-of-little-bootstraps into an ensemble of classifiers composed of random trees. We show that using this bootstrapping procedure, instead of standard bootstrap samples as used in random forest, can dramatically reduce the training time of ensembles of classifiers. In addition, the experiments carried out illustrate that, for a wide range of training times, the proposed ensemble method achieves a generalization error smaller than that of random forest.
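The key trick of bag-of-little-bootstraps, giving each replicate n-sized multinomial weights over only b distinct points, can be sketched like this (an illustrative sketch of the resampling step only, not the paper's ensemble integration):

```python
import numpy as np

def blb_subsample(n, b, rng):
    """One bag-of-little-bootstraps replicate: pick b indices without
    replacement, then draw multinomial counts summing to the full
    sample size n, so an estimator sees an n-sized weighted sample
    while only ever touching b distinct training points."""
    subset = rng.choice(n, size=b, replace=False)
    counts = rng.multinomial(n, np.ones(b) / b)
    return subset, counts

rng = np.random.default_rng(0)
subset, counts = blb_subsample(n=1000, b=32, rng=rng)
```

Because each tree trains on at most b distinct points (with weights), fitting is far cheaper than on a standard n-sized bootstrap sample, which is where the reported training-time reduction comes from.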

Pablo de Viña, Gonzalo Martínez-Muñoz

Learning Preferences for Large Scale Multi-label Problems

Although the majority of machine learning approaches aim to solve binary classification problems, several real-world applications require specialized algorithms able to handle many different classes, as in the case of single-label multi-class and multi-label classification problems. The Label Ranking framework is a generalization of the above-mentioned settings, which aims to map instances from the input space to a total order over the set of possible labels. However, these algorithms are generally more complex than binary ones, and their application to large-scale datasets can be intractable. The main contribution of this work is the proposal of a novel general on-line preference-based label ranking framework. The proposed framework is able to solve binary, multi-class, multi-label and ranking problems. A comparison with other baselines has been performed, showing effectiveness and efficiency on a real-world large-scale multi-label task.

Ivano Lauriola, Mirko Polato, Alberto Lavelli, Fabio Rinaldi, Fabio Aiolli

Affinity Propagation Based Closed-Form Semi-supervised Metric Learning Framework

Recent state-of-the-art deep metric learning approaches require a large number of labeled examples for their success. They cannot directly exploit unlabeled data. When labeled data is scarce, it is essential to be able to make use of additionally available unlabeled data to learn a distance metric in a semi-supervised manner. Although a few traditional, non-deep semi-supervised metric learning approaches exist, they mostly rely on the min-max principle to encode the pairwise constraints, even though there are a number of other ways, as offered by traditional weakly-supervised metric learning approaches. Moreover, there is no flow of information from the available pairwise constraints to the unlabeled data, which could be beneficial. This paper proposes to learn a new metric by constraining it to be close to a prior metric while propagating the affinities among pairwise constraints to the unlabeled data via a closed-form solution. The choice of a different prior metric thus enables encoding of the pairwise constraints by following formulations other than the min-max principle.

Ujjal Kr Dutta, C. Chandra Sekhar

Online Approximation of Prediction Intervals Using Artificial Neural Networks

Prediction intervals offer a means of assessing the uncertainty of artificial neural networks’ point predictions. In this work, we propose a hybrid approach for constructing prediction intervals, combining the Bootstrap method with a direct approximation of lower and upper error bounds. The main objective is to construct high-quality prediction intervals – combining high coverage probability for future observations with small and thus informative interval widths – even when sparse data is available. The approach is extended to adaptive approximation, whereby an online learning scheme is proposed to iteratively update prediction intervals based on recent measurements, requiring a reduced computational cost compared to offline approximation. Our results suggest the potential of the hybrid approach to construct high-coverage prediction intervals, in batch and online approximation, even when data quantity and density are limited. Furthermore, they highlight the need for cautious use and evaluation of the training data to be used for estimating prediction intervals.
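The bootstrap component of such a hybrid can be sketched as a quantile interval over an ensemble of resample-trained models (a generic sketch; the function name and quantile rule are ours, and the direct lower/upper-bound approximation is not shown):

```python
import numpy as np

def bootstrap_interval(models, x, alpha=0.1):
    """Empirical (1 - alpha) prediction interval for input x from the
    spread of point predictions of models trained on bootstrap
    resamples of the data."""
    preds = np.array([m(x) for m in models])
    lo, hi = np.quantile(preds, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

In the online variant described above, the ensemble members would be refitted (or reweighted) as new measurements arrive, so the interval adapts without a full offline recomputation.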

Myrianthi Hadjicharalambous, Marios M. Polycarpou, Christos G. Panayiotou



Estimation of Microphysical Parameters of Atmospheric Pollution Using Machine Learning

The estimation of microphysical parameters of pollution (effective radius and complex refractive index) from optical aerosol parameters entails a complex problem. In previous work based on machine learning techniques, Artificial Neural Networks have been used to solve this problem. In this paper, the use of a classification and regression solution based on the k-Nearest Neighbor algorithm is proposed. Results show that this contribution achieves better results in terms of accuracy than the previous work.

C. Llerena, D. Müller, R. Adams, N. Davey, Y. Sun

Communication Style - An Analysis from the Perspective of Automated Learning

This paper is intended to bring added value to the interdisciplinary domains of computer science and psychology, more precisely automated learning and applied psychology. We present automated learning techniques for the classification of new instances, i.e. new observations of a patient, taking into account the particularities of the attributes describing each of these observations. Specifically, information collected by applying a questionnaire on communication style (non-assertive style, manipulative style, aggressive style and assertive style) was analyzed. Through these experiments, we have tried to determine which of the classification models are best suited to specific situations and, given the type of attributes that make up the instances of the dataset, what kind of preprocessing methods can be applied to obtain the best results using the selected classification models: a Decision Tree based model, Support Vector Machine, Random Forest, instance-based classification (k-NN), and Logistic Regression. Standard metrics were used to evaluate the performance of each of the analyzed classification models: accuracy, sensitivity, precision, and specificity.

Adriana Mihaela Coroiu, Alina Delia Călin, Maria Nuțu

Directional Data Analysis for Shape Classification

In this work we address the problem of learning from images to perform grouping and classification of shapes. The key idea is to encode the instances available for learning in the form of directional data. In two dimensions, the figure to be categorized is characterized by the distribution of the directions of the normal unit vectors along the contour of the object. This directional characterization is used to extract features based on metrics defined on the space of circular distributions. These features can then be used to categorize the encoded shapes. The usefulness of the proposed representation is illustrated in the problem of clustering and classification of otolith shapes.

Adrián Muñoz, Alberto Suárez

Semantic Space Transformations for Cross-Lingual Document Classification

Cross-lingual document representation can be obtained by training monolingual semantic spaces and then using bilingual dictionaries with a transformation method to project word vectors into a unified space. The main goal of this paper is to evaluate three promising transformation methods on a cross-lingual document classification task. We also propose, evaluate and compare two cross-lingual document classification approaches. We use a popular convolutional neural network (CNN) and compare its performance with a standard maximum entropy classifier. The proposed methods are evaluated on four languages from the Reuters corpus, namely English, German, Spanish and Italian. We demonstrate that the results of all transformation methods are close to each other; however, the orthogonal transformation generally gives slightly better results when a CNN with trained embeddings is used. The experimental results also show that the convolutional network achieves better results than the maximum entropy classifier. We further show that the proposed methods are competitive with the state of the art.

Jiří Martínek, Ladislav Lenc, Pavel Král
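The orthogonal transformation singled out in the abstract is commonly computed as the Procrustes solution via an SVD; a hedged sketch on synthetic "dictionary" vector pairs (the dimensions and data are ours, and the paper's exact training setup may differ):

```python
import numpy as np

def orthogonal_map(S, T):
    """Least-squares orthogonal transform W (W @ W.T = I) mapping source
    embeddings S onto target embeddings T for dictionary word pairs
    (the classic Procrustes solution via SVD)."""
    U, _, Vt = np.linalg.svd(T.T @ S)
    return U @ Vt

rng = np.random.default_rng(0)
S = rng.normal(size=(100, 4))                      # "source language" vectors
R_true, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # hidden rotation between spaces
T = S @ R_true.T                                   # "target language" vectors
W = orthogonal_map(S, T)
print(np.allclose(W @ S.T, T.T))                   # recovered map aligns the spaces
```

Because W is orthogonal, it preserves vector norms and cosine similarities, which is one reason this transform tends to behave well for cross-lingual projection.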

Automatic Treatment of Bird Audios by Means of String Compression Applied to Sound Clustering in Xeno-Canto Database

Compression distances can be a very useful tool in automatic object clustering because of their parameter-free nature. However, when they are used to compare very different-sized objects with a high percentage of noise, their behaviour might be unpredictable. In order to address this drawback, we have developed an automatic object segmentation methodology that is applied prior to the string-compression-based object clustering. Our experimental results using the xeno-canto database show that this methodology can be successfully applied to automatic bird species identification from their sounds. These results show that applying our segmentation methodology significantly improves the clustering performance of bird sounds compared to the performance obtained without it.

Guillermo Sarasa, Ana Granados, Francisco B. Rodriguez
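A typical parameter-free compression distance is the normalized compression distance (NCD); a minimal sketch with zlib (the compressor choice and the toy byte strings are ours; the paper's segmentation pipeline is not shown):

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: small when x and y share structure,
    because a compressor exploits their mutual redundancy."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

s1 = b"chirp chirp chirp " * 50   # one repetitive "bird" signal
s2 = b"croak croak croak " * 50   # a structurally different signal
print(ncd(s1, s1) < ncd(s1, s2))  # similar objects are closer
```

The abstract's point about different-sized, noisy objects is visible here: NCD depends on how well the concatenation compresses, which degrades when one input dwarfs or drowns out the other, motivating segmentation first.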

FROD: Fast and Robust Distance-Based Outlier Detection with Active-Inliers-Patterns in Data Streams

The detection of distance-based outliers from streaming data is critical for modern applications ranging from telecommunications to cybersecurity. However, existing works mainly concentrate on improving response speed, and none of these proposals performs well in streams with varying data distributions. In this paper, we propose a Fast and Robust Outlier Detection method (FROD for short) to solve this dilemma and improve both detection performance and processing throughput. Specifically, to adapt to the changing distribution in data streams, we employ the Active-Inliers-Pattern, which dynamically selects reserved objects for further outlier analysis. Moreover, an effective micro-cluster-based data storing structure is proposed to improve detection efficiency, supported by our theoretical analysis of the complexity bounds. We also present a potential background updating optimization approach to hide the updating time. Experiments performed on real-world and synthetic datasets verify our theoretical study and demonstrate that our algorithm is not only faster than state-of-the-art methods, but also achieves better detection performance when the outlier rate fluctuates.

Zongren Li, Yijie Wang, Guohong Zhao, Li Cheng, Xingkong Ma

Unified Framework for Joint Attribute Classification and Person Re-identification

Person re-identification (re-id) is an essential task in video surveillance. Existing approaches mainly concentrate on extracting useful appearance features from deep convolutional neural networks. However, they do not utilize, or only partially utilize, semantic information such as attributes or person orientation. In this paper, we propose a novel deep neural network framework that greatly improves the accuracy of person re-id and also that of attribute classification. The proposed framework includes two branches, the identity one and the attribute one. The identity branch employs the refined triplet loss and exploits local cues from different regions of the pedestrian body. The attribute branch has an effective attribute predictor containing hierarchical attribute loss functions. After training on the identification and attribute classification tasks, pedestrian representations are derived which contain hierarchical attribute information. The experimental results on the DukeMTMC-reID and Market-1501 datasets validate the effectiveness of the proposed framework in both person re-id and attribute classification. For person re-id, the Rank-1 accuracy is improved by 7.99% and 2.76%, and the mAP is improved by 14.72% and 5.45% on the DukeMTMC-reID and Market-1501 datasets respectively. Specifically, it yields 90.95% attribute classification accuracy on DukeMTMC-reID, which outperforms the state-of-the-art attribute classification methods by 3.42%.

Chenxin Sun, Na Jiang, Lei Zhang, Yuehua Wang, Wei Wu, Zhong Zhou

Associative Graph Data Structures Used for Acceleration of K Nearest Neighbor Classifiers

This paper introduces a new associative approach for the significant acceleration of k Nearest Neighbor classifiers (kNN). The kNN classifier is a lazy method, i.e. it does not create a computational model, so it is inefficient on big training data sets because classifying each sample requires going through all training patterns. In this paper, we propose to use Associative Graph Data Structures (AGDS) as an efficient model for storing training patterns and their relations, allowing fast access to nearest neighbors during classification made by kNNs. Hence, the AGDS significantly accelerates the classification made by kNNs, especially for large and huge training datasets. We introduce an Associative Acceleration Algorithm and demonstrate how it works on this associative structure, substantially reducing the number of checked patterns and quickly selecting the k nearest neighbors. The presented approach was successfully compared to classic kNN approaches.

Adrian Horzyk, Krzysztof Gołdon

A Game-Theoretic Framework for Interpretable Preference and Feature Learning

We are living in an era that we can call the machine learning revolution. Machine learning started as a purely academic and research-oriented domain, but has since seen widespread commercial adoption across diverse domains, such as retail, healthcare, finance, and many more. However, its usage poses its own set of challenges when it comes to explaining what is going on under the hood: model interpretability is very important for businesses that must explain each and every decision taken by a model. In order to take a step forward in this direction, we propose a principled classification algorithm inspired by both preference learning and game theory. In particular, the learning problem is posed as a two-player zero-sum game for which we give theoretical guarantees about its convergence. Interestingly, feature selection can be straightforwardly plugged into this algorithm. As a consequence, the hypothesis space consists of a set of preference prototypes along with (possibly non-linear) features, making the resulting models easy to interpret.

Mirko Polato, Fabio Aiolli

A Dynamic Ensemble Learning Framework for Data Stream Analysis and Real-Time Threat Detection

Security incident tracking systems receive a continuous, unlimited inflow of observations, where in the typical case the most recent ones are the most important. These data flows are characterized by high volatility: their characteristics can change drastically over time in an unpredictable way, deviating from their typical normal behavior. In most cases it is not possible to store all of the historical samples, since their volume is unlimited. This fact requires the extraction of real-time knowledge from a subset of the flow, which contains a small but recent percentage of all observations, and creates serious concerns about the accuracy and reliability of the employed classifiers. The research described herein uses a Dynamic Ensemble Learning (DYENL) approach for Data Stream Analysis (DELDaStrA), which is employed in Real-Time Threat Detection systems. More specifically, it proposes a DYENL model that uses the “Kappa” architecture to perform analysis of data flows. The DELDaStrA is based on the hybrid combination of k Nearest Neighbor (kNN) classifiers with Adaptive Random Forest (ARF) and the Primal Estimated Sub-Gradient Solver for Support Vector Machines (SPegasos). In fact, it performs a dynamic extraction of the weighted average of the three results to maximize classification accuracy.

Konstantinos Demertzis, Lazaros Iliadis, Vardis-Dimitris Anezakis

Fuzzy/Feature Selection


Gaussian Kernel-Based Fuzzy Clustering with Automatic Bandwidth Computation

The conventional Gaussian kernel-based fuzzy c-means clustering algorithm has widely demonstrated its superiority to the conventional fuzzy c-means when the data sets are arbitrarily shaped, and not linearly separable. However, its performance is very dependent on the estimation of the bandwidth parameter of the Gaussian kernel function. Usually this parameter is estimated once and for all. This paper presents a Gaussian fuzzy c-means with kernelization of the metric which depends on a vector of bandwidth parameters, one for each variable, that are computed automatically. Experiments with data sets of the UCI machine learning repository corroborate the usefulness of the proposed algorithm.

Francisco de A. T. de Carvalho, Lucas V. C. Santana, Marcelo R. P. Ferreira
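The kernelized fuzzy c-means scheme above can be sketched compactly; in this illustrative version the per-variable bandwidths are simply the per-variable variances, a stand-in for the paper's automatic bandwidth computation, and all data and settings are ours:

```python
import numpy as np

def kernel_fcm(X, c=2, m=2.0, iters=50):
    """Sketch of Gaussian-kernel fuzzy c-means with one bandwidth per
    variable (here fixed to the per-variable variance)."""
    n, p = X.shape
    s2 = X.var(axis=0) + 1e-12                      # per-variable bandwidths
    V = X[np.linspace(0, n - 1, c).astype(int)].copy()  # deterministic init
    for _ in range(iters):
        # Kernelized squared distance: d2 = 2 * (1 - K(x, v))
        K = np.exp(-(((X[:, None, :] - V[None]) ** 2) / (2 * s2)).sum(-1))
        d2 = 2 * (1 - K) + 1e-12
        U = (1.0 / d2) ** (1.0 / (m - 1))           # standard FCM memberships
        U /= U.sum(axis=1, keepdims=True)
        W = (U ** m) * K                            # kernel-weighted memberships
        V = (W.T @ X) / W.sum(axis=0)[:, None]      # prototype update
    return U.argmax(axis=1)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),      # two well-separated blobs
               rng.normal(3.0, 0.3, (20, 2))])
labels = kernel_fcm(X)
print(labels[:20].tolist(), labels[20:].tolist())
```

The paper's contribution is precisely that the bandwidth vector is computed automatically within the optimization rather than fixed up front as done here.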

Fuzzy Clustering Algorithm Based on Adaptive Euclidean Distance and Entropy Regularization for Interval-Valued Data

Symbolic Data Analysis provides suitable new types of variables that can take into account the variability present in the observed measurements. This paper proposes a partitioning fuzzy clustering algorithm for interval-valued data based on a suitable adaptive Euclidean distance and entropy regularization. The proposed method optimizes an objective function by alternating three steps aiming to compute the fuzzy cluster representatives, the fuzzy partition, as well as relevance weights for the interval-valued variables. Experiments on synthetic and real datasets corroborate the usefulness of the proposed algorithm.

Sara Inés Rizo Rodríguez, Francisco de Assis Tenorio de Carvalho
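Under entropy regularization, the fuzzy-partition step typically has the closed form u_ik ∝ exp(-d_ik / λ); a hedged sketch of just that step (λ and the toy distance matrix are ours, and the paper additionally learns adaptive variable weights, which is not shown):

```python
import numpy as np

def entropy_fuzzy_memberships(D, lam=1.0):
    """Closed-form fuzzy partition under entropy regularization:
    u_ik proportional to exp(-d_ik / lam), with each row summing to one."""
    U = np.exp(-D / lam)
    return U / U.sum(axis=1, keepdims=True)

# Distances from 3 objects to 2 cluster prototypes; for interval-valued
# data, d could sum squared gaps of the lower and upper bounds.
D = np.array([[0.1, 4.0],
              [3.9, 0.2],
              [2.0, 2.0]])
U = entropy_fuzzy_memberships(D)
print(U[2])   # an equidistant object gets membership 0.5 / 0.5
```

The regularization weight λ plays the role of the fuzzifier: larger λ yields softer partitions, smaller λ approaches a hard assignment.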

Input-Dependably Feature-Map Pruning

Deep neural networks are an accurate tool for solving, among other things, vision tasks. The computational cost of these networks is often high, preventing their adoption in many real-time applications. Thus, there is a constant need for computational saving in this research domain. In this paper we suggest trading accuracy for computation using a gated version of Convolutional Neural Networks (CNN). The gated network selectively activates only a portion of its feature-maps, depending on the given example to be classified. The network’s ‘gates’ indicate which feature-maps are necessary for the task, and which are not. Specifically, full feature maps are considered for omission, to enable computational savings in a manner compliant with GPU hardware constraints. The network is trained using a combination of back-propagation for standard weights, minimizing an error-related loss, and reinforcement learning for the gates, minimizing a loss related to the number of feature maps used. We trained and evaluated a gated version of DenseNet on the CIFAR-10 dataset [1]. Our results show that with a slight impact on the network accuracy, a potential acceleration of up to $$ \times 3 $$ might be obtained.

Atalya Waissman, Aharon Bar-Hillel
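The gating idea (dropping whole feature maps per input) can be sketched as follows; an illustrative NumPy mask, where the gating head producing the logits and all values are assumptions of ours:

```python
import numpy as np

def gated_feature_maps(maps, gate_logits, threshold=0.0):
    """Input-dependent feature-map gating: a binary gate per channel zeroes
    whole maps, so skipped channels need not be computed at all."""
    gates = (gate_logits > threshold).astype(maps.dtype)  # hard 0/1 decisions
    return maps * gates[:, None, None], gates

maps = np.ones((4, 8, 8))                     # 4 feature maps of size 8x8
logits = np.array([2.0, -1.0, 0.5, -3.0])     # from a small per-input gating head
out, gates = gated_feature_maps(maps, logits)
print(gates)                                  # which maps survive
print(out[1].any())                           # a gated-off map is all zeros → False
```

Gating entire channels, rather than individual weights, is what keeps the saving compatible with GPU-friendly dense convolution kernels, as the abstract notes.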

Thermal Comfort Index Estimation and Parameter Selection Using Fuzzy Convolutional Neural Network

Many indoor and outdoor applications require estimating, in real time, the comfort level of a city, which depends on several thermal metrics. Out of the many thermal comfort indices proposed so far, the predicted mean vote (PMV) is one of the most widely used measures for both indoor and outdoor ambiances. Due to the complexity of calculating PMV in real time, many techniques have been proposed to estimate it without using all the required parameters. So far fuzzy networks have shown the best results for PMV estimation because of their rule generation capability. The convolutional neural network (CNN) is a deep learning technique that classifies data, or estimates a particular parameter, by reducing the input to significant feature collections. In this work, we fuzzify the system before applying a CNN for regression to estimate the PMV values. Simulation results show that the proposed model outperforms the existing ANFIS model for PMV estimation with a lower root mean square error.

Anirban Mitra, Arjun Sharma, Sumit Sharma, Sudip Roy

Soft Computing Modeling of the Illegal Immigration Density in the Borders of Greece

It is a fact that, due to the war in Syria and to instability and poverty in wide regions of the world, immigration flows to Europe have increased to a very significant extent. Among the EU countries, Greece and Italy accept the heaviest load due to their geographical location. This research paper proposes a flexible and rational Soft Computing approach aiming to model and classify areas of the Greek (sea and land) borderline based on the density and range of illegal immigration (ILIM). The proposed model employs Intuitionistic Fuzzy Sets (IFUS) and Fuzzy Similarity indices (FUSI). The application of this methodology can provide significant aid towards the assessment of the situation in each of the involved areas, depending on the extent of the flow they face.

Serafeim Koutsomplias, Lazaros Iliadis

Fuzzy Implications Generating from Fuzzy Negations

A basic building block in the foundation of fuzzy neural networks is the theory of fuzzy implications, which play a crucial role in this topic. The aim of this paper is to find a new method of generating fuzzy implications based on a given fuzzy negation. Specifically, we propose using a given fuzzy negation and a function to generate rules of fuzzy implications, that is, rules which regulate decision making, thus adapting mathematics to human common sense. A great advantage of this construction is that the implications generated in this way fulfil many of the required axioms and important properties.

Georgios Souliotis, Basil Papadopoulos
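One classic way a fuzzy negation generates a fuzzy implication is the (S, N)-construction I(x, y) = S(N(x), y) with the t-conorm S = max; a minimal sketch (the paper's own construction may differ from this standard one):

```python
def implication_from_negation(N):
    """Build the (S, N)-implication I(x, y) = max(N(x), y)
    from a given fuzzy negation N."""
    return lambda x, y: max(N(x), y)

standard = lambda x: 1 - x               # the standard fuzzy negation
I = implication_from_negation(standard)  # yields the Kleene-Dienes implication
print(I(1.0, 0.0), I(0.0, 0.0), I(0.3, 0.8))
```

One can check the classical boundary axioms directly: I(0, 0) = I(0, 1) = I(1, 1) = 1 and I(1, 0) = 0, mirroring crisp implication on {0, 1}.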

Facial/Emotion Recognition


Improving Ensemble Learning Performance with Complementary Neural Networks for Facial Expression Recognition

Facial expression recognition has significant application value in fields such as human-computer interaction. Recently, Convolutional Neural Networks (CNNs) have been widely utilized for feature extraction and expression recognition. Network ensembling is an important step to improve recognition performance. To address the inefficiency of existing ensemble strategies, we propose a new ensemble method to efficiently find networks with complementary capabilities. The proposed method is verified on two groups of CNNs with different depths (eight 5-layer shallow CNNs and twelve 11-layer deep VGGNet variants) trained on FER-2013 and RAF-DB, respectively. Experimental results demonstrate that the proposed method achieves recognition accuracies of 74.14% and 85.46% on the FER-2013 and RAF-DB databases, respectively, which, to the best of our knowledge, outperform state-of-the-art CNN-based facial expression recognition methods. In addition, our method also obtains a competitive mean diagonal value of the confusion matrix on the RAF-DB test set.

Xinmin Zhang, Yingdong Ma

Automatic Beautification for Group-Photo Facial Expressions Using Novel Bayesian GANs

Directly benefiting from the powerful generative adversarial networks (GANs) of recent years, various new image processing tasks pertinent to image generation and synthesis have gained popularity. One such application is individual portrait photo beautification based on facial expression detection and editing. Yet, automatically beautifying group photos without tedious and fragile human interventions still remains challenging. The difficulties inevitably arise from diverse facial expression evaluation, harmonious expression generation, and context-sensitive synthesis from single/multiple photos. To ameliorate these difficulties, we devise a two-stage deep network for automatic group-photo evaluation and beautification by seamless integration of a multi-label CNN with Bayesian-network-enhanced GANs. First, our multi-label CNN is designed to evaluate the quality of facial expressions. Second, our novel Bayesian GANs framework is proposed to automatically generate photo-realistic beautiful expressions. Third, to further enhance the naturalness of beautified group photos, we embed Poisson fusion in the final layer of the GANs in order to synthesize all the beautified individual expressions. We conducted extensive experiments on various kinds of single-/multi-frame group photos to validate our novel network design. All the experiments confirm that our novel method can uniformly accommodate diverse expression evaluation and generation/synthesis of group photos, and outperforms the state-of-the-art methods in terms of effectiveness, versatility, and robustness.

Ji Liu, Shuai Li, Wenfeng Song, Liang Liu, Hong Qin, Aimin Hao

Fast and Accurate Affect Prediction Using a Hierarchy of Random Forests

Hierarchical systems are powerful tools for dealing with non-linear data with high variability. We show in this paper that regressing a bounded variable on such data is a challenging task. As an alternative, we propose a two-step process. First, an ensemble of ordinal classifiers assigns the observation to a given range of the variable to predict, yielding a discrete estimate of the variable. Then, a regressor trained locally on this range and its neighbors provides a finer continuous estimate. Experiments on affective audio data from the AVEC’2014 and AV+EC’2015 challenges show that this cascading process compares favorably to the state-of-the-art and challenger results.

Maxime Sazadaly, Pierre Pinchon, Arthur Fagot, Lionel Prevost, Myriam Maumy Bertrand
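The two-step cascade (coarse range first, then local fine regression) can be sketched as follows; the single bin split and the per-range linear models are toy stand-ins of ours for the ordinal-classifier ensemble and the locally trained regressors:

```python
import numpy as np

def cascade_predict(x, bin_edges, local_models):
    """Two-step cascade: a coarse classifier picks a range of the bounded
    target, then a regressor local to that range refines the estimate."""
    b = int(np.digitize(x, bin_edges))   # stage 1: coarse range (stand-in
    b = min(b, len(local_models) - 1)    # for the ordinal-classifier ensemble)
    w, c = local_models[b]               # stage 2: local fine linear regressor
    return w * x + c

# Two ranges of a target in [0, 1], each with its own local linear fit.
edges = [0.5]                                # split of the input space
models = [(0.4, 0.0), (0.4, 0.3)]            # (slope, intercept) per range
print(cascade_predict(0.25, edges, models))  # → 0.1
print(cascade_predict(0.75, edges, models))  # → 0.6
```

Training the second-stage regressor on the selected range plus its neighbors, as the abstract describes, softens the boundary errors made by the first stage.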

Gender-Aware CNN-BLSTM for Speech Emotion Recognition

Gender information has been widely used to improve the performance of speech emotion recognition (SER) due to the different expressive styles of men and women. However, conventional methods cannot adequately utilize gender information by simply representing gender characteristics with a fixed unique integer or one-hot encoding. In order to emphasize the gender factors for SER, we propose two types of features for our framework, namely the distributed-gender feature and the gender-driven feature. The distributed-gender feature is constructed so as to represent the gender distribution as well as individual differences, while the gender-driven feature is extracted from acoustic signals through a deep neural network (DNN). These two proposed features are then separately augmented onto the original spectrogram to serve as input for the following decision-making network, for which we construct a hybrid model combining a convolutional neural network (CNN) and a bi-directional long short-term memory (BLSTM) network. Compared with the spectrogram only, adding the distributed-gender feature and the gender-driven feature in gender-aware CNN-BLSTM improved unweighted accuracy by relative error reductions of 14.04% and 45.74%, respectively.

Linjuan Zhang, Longbiao Wang, Jianwu Dang, Lili Guo, Qiang Yu

Semi-supervised Model for Emotion Recognition in Speech

Recognizing emotional traits in speech is a challenging task which has become very popular in recent years, especially due to advances in deep neural networks. Although very successful, these models inherited a common problem of strongly supervised deep neural networks: a large number of strongly labeled samples is necessary so that the model learns a general emotion representation. This paper proposes a solution to this problem through the development of a semi-supervised neural network which can learn speech representations from unlabeled samples and use them in different emotion-recognition-in-speech scenarios. We provide experiments with different datasets, representing natural and controlled scenarios. Our results show that our model is competitive with state-of-the-art solutions in all these scenarios while sharing the same learned representations, which were learned without the necessity of strongly labeled data.

Ingryd Pereira, Diego Santos, Alexandre Maciel, Pablo Barros

Real-Time Embedded Intelligence System: Emotion Recognition on Raspberry Pi with Intel NCS

Convolutional Neural Networks (CNNs) have exhibited certain human-like performance on computer vision related tasks, and over the past few years they have outperformed conventional algorithms in a range of image processing problems. However, utilising a CNN model with millions of free parameters on a resource-limited embedded system is a challenging problem. The Intel Neural Compute Stick (NCS) provides a possible route for running large-scale neural networks on a low-cost, low-power, portable unit. In this paper, we propose a CNN-based Raspberry Pi system that can run a pre-trained inference model in real time with an average power consumption of 6.2 W. The Intel Movidius NCS avoids the need for expensive processing units, e.g. GPUs or FPGAs. The system is demonstrated using a facial image-based emotion recogniser: a fine-tuned CNN model is designed and trained to perform inference on each captured frame within the processing modules of the NCS.

Y. Xing, P. Kirkland, G. Di Caterina, J. Soraghan, G. Matich

