
## About this book

The four-volume set LNCS 9947, LNCS 9948, LNCS 9949, and LNCS 9950 constitutes the proceedings of the 23rd International Conference on Neural Information Processing, ICONIP 2016, held in Kyoto, Japan, in October 2016. The 296 full papers presented were carefully reviewed and selected from 431 submissions. The four volumes are organized in topical sections on deep and reinforcement learning; big data analysis; neural data analysis; robotics and control; bio-inspired/energy efficient information processing; whole brain architecture; neurodynamics; bioinformatics; biomedical engineering; data mining and cybersecurity workshop; machine learning; neuromorphic hardware; sensory perception; pattern recognition; social networks; brain-machine interface; computer vision; time series analysis; data-driven approach for extracting latent features; topological and graph based clustering methods; computational intelligence; data mining; deep neural networks; computational and cognitive neurosciences; theory and algorithms.

## Table of contents

### Chaotic Feature Selection and Reconstruction in Time Series Prediction

The challenge in feature selection for time series lies in achieving prediction performance similar to that obtained with the original dataset. The method has to ensure that important information is not lost when feature selection is used for data reduction. We present a chaotic feature selection and reconstruction method based on statistical analysis for time series prediction. The method can also be viewed as a way of reducing data by selecting the most relevant features, with the hope of reducing training time for learning algorithms. We employ cooperative neuro-evolution as a machine learning tool to evaluate the performance of the proposed method. The results show that our method gives a data reduction of up to 42% with performance similar to that reported in the literature.

Shamina Hussein, Rohitash Chandra

### L1/2 Norm Regularized Echo State Network for Chaotic Time Series Prediction

An echo state network (ESN) contains a randomly connected hidden layer and an adaptable output layer. It can overcome the problems associated with complex computation and local optima. However, an ill-posed problem may arise when a large reservoir state matrix is used to calculate the output weights by least squares estimation. In this study, we use L1/2 regularization to calculate the output weights and obtain a sparse solution, in order to solve the ill-posed problem and improve generalization performance. In addition, iterated prediction is conducted to test the effectiveness of the proposed L1/2ESN in capturing the dynamics of the chaotic time series. Experimental results illustrate that the predictor has been designed properly. It outperforms other modified ESN models in both sparsity and accuracy.

Meiling Xu, Min Han, Shunshoku Kanae

### SVD and Text Mining Integrated Approach to Measure Effects of Disasters on Japanese Economics

Effects of the Thai Flooding in 2011

In this paper, we analyze the effects of the 2011 Thai flooding on the Japanese economy. We propose, as a new method for analyzing economic time series data, an integrated approach that combines Singular Value Decomposition of stock data with text mining of news articles. We first find the correlations among companies’ stock data and then conduct text mining to find the latent logical reasons for the associations. The paper shows the advantages of the two-stage approach in refining the logical reasoning. Concerning the effects of the Thai flooding on Japan’s economy, we found, as unexpected movements, serious harm to Japanese food and drink companies and their quick recovery.

Yuriko Yano, Yukari Shirota

### Deep Belief Network Using Reinforcement Learning and Its Applications to Time Series Forecasting

Artificial neural networks (ANNs) typified by deep learning (DL) are among the artificial intelligence technologies that have recently attracted the most attention from researchers. However, the learning algorithm used in DL is usually the well-known error-backpropagation (BP) method. In this paper, we introduce a reinforcement learning (RL) algorithm, “Stochastic Gradient Ascent (SGA)” proposed by Kimura and Kobayashi, into a Deep Belief Net (DBN) with multiple restricted Boltzmann machines (RBMs) in place of the BP learning method. A long-term prediction experiment, which used a benchmark from a time series forecasting competition, was performed to verify the effectiveness of the proposed method.

Takaomi Hirata, Takashi Kuremoto, Masanao Obayashi, Shingo Mabu, Kunikazu Kobayashi

### Neuron-Network Level Problem Decomposition Method for Cooperative Coevolution of Recurrent Networks for Time Series Prediction

Breaking a problem down through problem decomposition has enabled complex problems to be solved efficiently. The two major problem decomposition methods used in cooperative coevolution are the synapse level and the neuron level. A combination of both as a hybrid problem decomposition has been applied to time series prediction. Different problem decomposition methods applied to particular areas of a network can share their strengths to solve the problem better, which forms the major motivation. In this paper, we propose a problem decomposition method that combines neuron and network level problem decompositions for Elman recurrent neural networks and apply it to time series prediction. The results reveal that the proposed method achieves better results on a few datasets when compared to two popular standalone methods. The results are also better in selected cases when compared to several other approaches from the literature.

Ravneil Nand, Emmenual Reddy, Mohammed Naseem

### Yet Another Schatten Norm for Tensor Recovery

In this paper, we introduce a new class of Schatten norms for tensor recovery. In the new norm, unfoldings of a tensor along not only every single order but also all combinations of orders are taken into account. Additionally, we prove that the proposed tensor norm has properties similar to those of the matrix Schatten norm, and we also provide several propositions that are useful in the recovery problem. Furthermore, for reliable recovery of a tensor with Gaussian measurements, we show the necessary number of measurements using the new norm. Compared to the conventional overlapped Schatten norm, the new norm requires fewer measurements for reliable recovery with high probability. Finally, experimental results demonstrate the efficiency of the new norm in video in-painting.

Chao Li, Lili Guo, Yu Tao, Jinyu Wang, Lin Qi, Zheng Dou

### Memory of Reading Literature in a Hippocampal Network Model Based on Theta Phase Coding

Using computer simulations, the authors have demonstrated that temporal compression based on theta phase coding in the hippocampus is essential for the encoding of episodic memory occurring on a behavioral timescale (> a few seconds). In this study, the memory of reading literature was evaluated using a network model based on theta phase coding. Input was derived from an eye movement sequence during reading and each fixated word was encoded by a vector computed from a statistical language model with a large text corpus. The results successfully demonstrated a memory generated by a word sequence during a 6-min reading session and this suggests a general role for theta phase coding in the formation of episodic memory.

Naoyuki Sato

### Combining Deep Learning and Preference Learning for Object Tracking

Object tracking is nowadays a hot topic in computer vision. Generally speaking, its aim is to find a target object in every frame of a video sequence. In order to build a tracking system, this paper proposes to combine two different learning frameworks: deep learning and preference learning. On the one hand, deep learning is used to automatically extract latent features for describing the multi-dimensional raw images. Previous research has shown that deep learning has been successfully applied in different computer vision applications. On the other hand, object tracking can be seen as a ranking problem, in the sense that the regions of an image can be ranked according to their level of overlap with the target object. Preference learning is used to build the ranking function. The experimental results of our method, called $$DPL^{2}$$ (Deep & Preference Learning), are competitive with respect to the state-of-the-art algorithms.

Shuchao Pang, Juan José del Coz, Zhezhou Yu, Oscar Luaces, Jorge Díez

### A Cost-Sensitive Learning Strategy for Feature Extraction from Imbalanced Data

In this paper, novel cost-sensitive principal component analysis (CSPCA) and cost-sensitive non-negative matrix factorization (CSNMF) methods are proposed for handling the problem of feature extraction from imbalanced data. The presence of highly imbalanced data misleads existing feature extraction techniques to produce biased features, which results in poor classification performance especially for the minor class problem. To solve this problem, we propose a cost-sensitive learning strategy for feature extraction techniques that uses the imbalance ratio of classes to discount the majority samples. This strategy is adapted to the popular feature extraction methods such as PCA and NMF. The main advantage of the proposed methods is that they are able to lessen the inherent bias of the extracted features to the majority class in existing PCA and NMF algorithms. Experiments on twelve public datasets with different levels of imbalance ratios show that the proposed methods outperformed the state-of-the-art methods on multiple classifiers.

Ali Braytee, Wei Liu, Paul Kennedy

### Nonnegative Tensor Train Decompositions for Multi-domain Feature Extraction and Clustering

Tensor train (TT) is one of the modern tensor decomposition models for low-rank approximation of high-order tensors. For nonnegative multiway array data analysis, we propose a nonnegative TT (NTT) decomposition algorithm for the NTT model and a hybrid model called the NTT-Tucker model. By employing the hierarchical alternating least squares approach, each fiber vector of core tensors is optimized efficiently at each iteration. We compared the performances of the proposed method with a standard nonnegative Tucker decomposition (NTD) algorithm by using benchmark data sets including event-related potential data and facial image data in multi-domain feature extraction and clustering tasks. It is illustrated that the proposed algorithm extracts physically meaningful features with relatively low storage and computational costs compared to the standard NTD model.

Namgil Lee, Anh-Huy Phan, Fengyu Cong, Andrzej Cichocki

### Hyper-parameter Optimization of Sticky HDP-HMM Through an Enhanced Particle Swarm Optimization

Faced with the problem of uncertainties in object trajectory and pattern recognition under the non-parametric Bayesian approach, we have derived two major methods of optimizing the hierarchical Dirichlet process hidden Markov model (HDP-HMM) for the task. HDP-HMM suffers not only from poor performance on moderate dimensional data, but also from sensitivity to its parameter settings. For the purpose of optimizing HDP-HMM on dimensional data, tests of the optimized results are carried out on the TUM Kitchen dataset [7], which was provided for research on motion and activity recognition. The optimization techniques capture the best hyper-parameters, which then produce an optimal solution to the given task within a certain search space.

Jiaxi Li, Junfu Yin, Yuk Ying Chung, Feng Sha

### Approximate Inference Method for Dynamic Interactions in Larger Neural Populations

The maximum entropy method has been successfully employed to explain stationary spiking activity of a neural population by using fewer features than the number of possible activity patterns. Modeling network activity in vivo, however, has been challenging because features such as spike-rates and interactions can change according to sensory stimulation, behavior, or brain state. To capture the time-dependent activity, Shimazaki et al. (PLOS Comp Biol, 2012) previously introduced a state-space framework for the latent dynamics of neural interactions. However, the exact method suffers from computational cost; therefore its application was limited to only $${\sim }15$$ neurons. Here we introduce the pseudolikelihood method combined with the TAP or Bethe approximation to the state-space model, and make it possible to estimate dynamic pairwise interactions of up to 30 neurons. These analytic approximations allow analyses of time-varying activity of larger networks in relation to stimuli or behavior.

Christian Donner, Hideaki Shimazaki

### Features Learning and Transformation Based on Deep Autoencoders

Tag recommendation has become one of the most important ways for an organization to index online resources such as articles, movies, and music in order to recommend them to potential users. Since recommendation information is usually very sparse, effective learning of the content representation for these resources is crucial to accurate recommendation. One of the issues in this problem is feature transformation or feature learning. On the one hand, projection methods make it possible to find new representations of the data, but they are not suited to non-linear data or very sparse datasets. On the other hand, unsupervised feature learning with deep networks has been widely studied in recent years. Despite the progress, most existing models are fragile to non-Gaussian noise, outliers, or high-dimensional sparse data. In this paper, we study the use of deep denoising autoencoders and other dimensionality reduction techniques to learn relevant representations of the data in order to increase the quality of the clustering model. We propose a hybrid framework that combines a deep learning model called the stacked denoising autoencoder (SDAE) with the SVD and Diffusion Maps to learn a more effective content representation. The proposed framework is tested on a real tag recommendation dataset and validated using internal clustering indexes and by experts.

Eric Janvier, Thierry Couronne, Nistor Grozavu

### t-Distributed Stochastic Neighbor Embedding with Inhomogeneous Degrees of Freedom

t-distributed stochastic neighbor embedding (t-SNE), one of the dimension reduction (DR) methods for data visualization, has drawn increasing attention. t-SNE gives better visualization than conventional DR methods by relieving the so-called crowding problem. The crowding problem is one of the curses of dimensionality and is caused by the discrepancy between high- and low-dimensional spaces. However, t-SNE assumes that the strength of the discrepancy is the same for all samples in all datasets, regardless of non-uniformity of distributions or differences in dimensions, and this assumption sometimes ruins visualization. Here we propose a new DR method, inhomogeneous t-SNE, in which the strength is estimated for each point and dataset. Experimental results show that such pointwise estimation is important for reasonable visualization and that the proposed method achieves better visualization than the original t-SNE.

Jun Kitazono, Nistor Grozavu, Nicoleta Rogovschi, Toshiaki Omori, Seiichi Ozawa
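
As background for the abstract above, the low-dimensional similarity used by standard t-SNE is a Student-t kernel with one degree of freedom, and the textbook generalization to $$\nu$$ degrees of freedom changes only the kernel. The block below writes out both forms. The title suggests that the per-point "strength" corresponds to the degrees of freedom, but how the paper assigns and estimates a separate value for each point is not stated in the abstract and is not reproduced here.

```latex
% Standard t-SNE: Student-t kernel with one degree of freedom
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

% Generalization to \nu degrees of freedom (same normalization over k \neq l)
q_{ij} \propto \left(1 + \frac{\lVert y_i - y_j \rVert^2}{\nu}\right)^{-\frac{\nu + 1}{2}}
```

Setting $$\nu = 1$$ in the second expression recovers the first. The tails become heavier as $$\nu$$ decreases, so distant pairs are penalized less.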

### Parcellating Whole Brain for Individuals by Simple Linear Iterative Clustering

This paper utilizes a supervoxel method called simple linear iterative clustering (SLIC) to parcellate the whole brain into functional subunits using resting-state fMRI data. The parcellation algorithm is applied directly to the resting-state fMRI time series without feature extraction, and the parcellation is conducted at the individual subject level. In order to obtain parcellations with multiple granularities, we vary the cluster number over a wide range. To demonstrate the validity of the proposed approach, we compare it with a state-of-the-art whole brain parcellation approach, i.e., the normalized cuts (Ncut) approach. The experimental results show that the proposed approach achieves satisfactory performance in terms of spatial contiguity, functional homogeneity and reproducibility. The proposed approach could be used to generate individualized brain atlases for applications such as personalized medicine.

Jing Wang, Zilan Hu, Haixian Wang

### Overlapping Community Structure Detection of Brain Functional Network Using Non-negative Matrix Factorization

Community structure, as a main feature of a complex network, has been investigated recently under the assumption that the identified communities are non-overlapping. However, few studies have revealed the overlapping community structure of the brain functional network, despite the fact that communities of most real networks overlap. In this paper, we propose a novel framework to identify the overlapping community structure of the brain functional network by using the symmetric non-negative matrix factorization (SNMF), in which we develop a non-negative adaptive sparse representation (NASR) to produce an association matrix. Experimental results on fMRI data sets show that, compared with modularity optimization, normalized cuts and affinity propagation, SNMF identifies the community structure more accurately and can shed new light on the understanding of brain functional systems.

Xuan Li, Zilan Hu, Haixian Wang

### Collaborative-Based Multi-scale Clustering in Very High Resolution Satellite Images

In this article, we present an application of collaborative clustering to real data from very high resolution images. Our proposed method makes it possible to have several algorithms working at different scales of detail while exchanging information on their clusters. The method, which aims at strengthening the hierarchical links between the clusters extracted at different levels of detail, has shown good results in terms of clustering quality based on common unsupervised learning indexes, and also when using external indexes: we compared our results with those of other algorithms and analyzed them against an expert ground truth.

Jérémie Sublime, Antoine Cornuéjols, Younès Bennani

### Towards Ontology Reasoning for Topological Cluster Labeling

In this paper, we present a new approach combining topological unsupervised learning with ontology-based reasoning to achieve both (i) automatic interpretation of clustering and (ii) scaling of ontology reasoning over large datasets. The value of such an approach lies in the use of expert knowledge to automate cluster labeling and give the clusters high-level semantics that meet the user's interest. The proposed approach is based on two steps. The first step performs topographic unsupervised learning based on the SOM (Self-Organizing Maps) algorithm. The second step integrates expert knowledge into the map using ontology reasoning over the prototypes and provides an automatic interpretation of the clusters. We apply our approach to the real problem of satellite image classification. The experiments highlight the capacity of our approach to obtain a semantically labeled topographic map, and the obtained results show very promising performance.

Hatim Chahdi, Nistor Grozavu, Isabelle Mougenot, Younès Bennani, Laure Berti-Equille

### Overlapping Community Detection Using Core Label Propagation and Belonging Function

Label propagation is one of the fastest methods for community detection, with near-linear time complexity. It acts locally: each node interacts with its neighbours to change its own label by a majority vote. However, this method has three major drawbacks: (i) it can lead to huge, meaningless communities, also called monster communities, (ii) it is unstable, and (iii) it is unable to detect overlapping communities. In this paper, we suggest new techniques that considerably improve the basic technique by using an existing core detection label propagation technique. It is then possible to detect overlapping communities through a belonging function which quantifies the degree to which nodes belong to several communities. Nodes are assigned and replicated by the function a number of times to communities which are found automatically. The user may also interact with the technique by imposing and freezing the number of communities a node may belong to. A comparative analysis is presented.

Jean-Philippe Attal, Maria Malek, Marc Zolghadri
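
The basic majority-vote step described in the abstract above can be illustrated with a minimal sketch. It covers only plain label propagation, not the core detection or belonging function proposed in the paper. The function name, the adjacency-dictionary input, and the deterministic tie-breaking rule (largest label wins) are assumptions made for this example; label propagation is usually run with random tie-breaking and random visiting order.

```python
from collections import Counter


def label_propagation(adjacency, max_sweeps=100):
    """Plain label propagation on an undirected graph.

    adjacency maps each node to a list of its neighbours. Every node starts
    with its own label and repeatedly adopts the most frequent label among
    its neighbours. Ties are broken by taking the largest label, which is a
    simplification chosen here to keep the example deterministic.
    """
    labels = {node: node for node in adjacency}
    for _ in range(max_sweeps):
        changed = False
        for node in sorted(adjacency):
            neighbours = adjacency[node]
            if not neighbours:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[n] for n in neighbours)
            top = max(counts.values())
            best = max(label for label, count in counts.items() if count == top)
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break  # no label moved during a full sweep
    return labels


# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
communities = label_propagation(graph)
# Nodes 0, 1, 2 end up sharing one label and nodes 3, 4, 5 another.
```

On this same graph, breaking ties by the smallest label instead merges all six nodes into a single community, which gives a small-scale view of the instability and monster-community drawbacks the abstract lists.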

### A New Clustering Algorithm for Dynamic Data

In this paper, we propose an algorithm for the discovery and monitoring of clusters in dynamic datasets. The proposed method is based on a Growing Neural Gas and simultaneously learns the prototypes and their segmentation, using an estimate of the local density of the data to detect the boundaries between clusters. The quality of our algorithm is evaluated on a set of artificial datasets presenting a set of static and dynamic cluster structures.

Parisa Rastin, Tong Zhang, Guénaël Cabanes

### Decentralized Stabilization for Nonlinear Systems with Unknown Mismatched Interconnections

This paper establishes a neural network and policy iteration based decentralized control scheme to stabilize large-scale nonlinear systems with unknown mismatched interconnections. For relaxing the common assumption of upper boundedness on interconnections when designing the decentralized optimal control, interconnections are approximated by neural networks with local signals of isolated subsystem and replaced reference signals of coupled subsystems. By using the adaptive estimation term, the performance index function is constructed to reflect the replacement error. Hereafter, it is proven that the developed decentralized optimal control policies can guarantee the closed-loop large-scale nonlinear system to be uniformly ultimately bounded. The effectiveness of the developed scheme is verified by a simulation example.

Bo Zhao, Ding Wang, Guang Shi, Derong Liu, Yuanchun Li

### Optimal Constrained Neuro-Dynamic Programming Based Self-learning Battery Management in Microgrids

In this paper, a novel optimal self-learning battery sequential control scheme is investigated for smart home energy systems. Using the iterative adaptive dynamic programming (ADP) technique, the optimal battery control can be obtained iteratively. Considering the power constraints of the battery, a new non-quadratic form performance index function is established, which guarantees the value of the iterative control law not to exceed the maximum charging/discharging power of the battery to extend the service life of the battery. Simulation results are given to illustrate the performance of the presented method.

Qinglai Wei, Derong Liu

### Risk Sensitive Reinforcement Learning Scheme Is Suitable for Learning on a Budget

Risk-sensitive reinforcement learning (risk-sensitive RL) has been studied by many researchers. The methods are based on a prospect method, which imitates the value function of a human. Although they are mainly aimed at imitating human behaviors, there has been little discussion of their engineering significance. In this paper, we show that risk-sensitive RL is useful for online-learning machines whose resources are limited. In such a learning method, a part of the learned memories should be removed to create space for recording a new important instance. The experimental results show that risk-sensitive RL is superior to normal RL. This might mean that, because the human brain is also constructed from a limited number of neurons, humans employ the risk-sensitive value function for learning.

Kazuyoshi Kato, Koichiro Yamauchi

### A Kernel-Based Sarsa($$\lambda$$) Algorithm with Clustering-Based Sample Sparsification

In the past several decades, kernel-based reinforcement learning (KBRL) methods have been a research hotspot as a significant class of solutions to large-scale or continuous-space control problems. However, the existing sample sparsification methods of KBRL suffer from low time efficiency and poor effectiveness. To address this problem, we propose a new sample sparsification method, the clustering-based novelty criterion (CNC), which combines a clustering algorithm with a distance-based novelty criterion. In addition, we propose a clustering-based selective kernel Sarsa($$\lambda$$) (CSKS($$\lambda$$)) on the basis of CNC, which applies Sarsa($$\lambda$$) to learning the parameters of the selective kernel-based value function based on local validity. Finally, we show in an Acrobot experiment that our CSKS($$\lambda$$) surpasses other state-of-the-art algorithms.

Haijun Zhu, Fei Zhu, Yuchen Fu, Quan Liu, Jianwei Zhai, Cijia Sun, Peng Zhang
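
For readers unfamiliar with the Sarsa($$\lambda$$) rule that the abstract above builds on, here is a minimal tabular sketch with accumulating eligibility traces. It is not the kernel-based CSKS($$\lambda$$) method of the paper: the table-based value function, the function name, and the default step size, discount, and trace decay values are assumptions chosen purely for illustration.

```python
from collections import defaultdict


def sarsa_lambda_update(q, traces, s, a, reward, s_next, a_next,
                        alpha=0.1, gamma=0.9, lam=0.8, terminal=False):
    """Apply one tabular Sarsa(lambda) step and return the TD error.

    q and traces are dictionaries keyed by (state, action) pairs.
    """
    # On-policy TD target: uses the action actually chosen in the next state.
    target = reward if terminal else reward + gamma * q[(s_next, a_next)]
    delta = target - q[(s, a)]
    traces[(s, a)] += 1.0  # accumulating trace for the visited pair
    for key in list(traces):
        q[key] += alpha * delta * traces[key]  # credit all recently visited pairs
        traces[key] *= gamma * lam             # decay every trace
    return delta


q = defaultdict(float)
traces = defaultdict(float)
# A two-step episode: no reward on the first step, reward 1 on the last.
sarsa_lambda_update(q, traces, s=0, a=0, reward=0.0, s_next=1, a_next=0)
sarsa_lambda_update(q, traces, s=1, a=0, reward=1.0, s_next=None, a_next=None,
                    terminal=True)
# q[(1, 0)] is now about 0.1 and q[(0, 0)] about 0.072: the trace carried
# part of the final reward back to the earlier state-action pair.
```

In an episodic task the traces would normally be cleared at the start of each episode.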

### Sparse Kernel-Based Least Squares Temporal Difference with Prioritized Sweeping

How to improve the efficiency of algorithms for solving large-scale or continuous-space reinforcement learning (RL) problems has been a hot research topic. The kernel-based least squares temporal difference (KLSTD) algorithm can solve continuous-space RL problems, but it has high computational complexity because of its kernel-based and complex matrix computations. To address this problem, this paper proposes an algorithm named sparse kernel-based least squares temporal difference with prioritized sweeping (PS-SKLSTD). PS-SKLSTD consists of two parts: learning and planning. In the learning process, we exploit the ALD-based sparse kernel function to represent the value function and update the parameter vectors based on the Sherman-Morrison equation. In the planning process, we use the prioritized sweeping method to select the state-action pair to be updated. The experimental results demonstrate that PS-SKLSTD has better convergence performance and computational efficiency than KLSTD.

Cijia Sun, Xinghong Ling, Yuchen Fu, Quan Liu, Haijun Zhu, Jianwei Zhai, Peng Zhang
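
The Sherman-Morrison equation mentioned in the abstract above updates a known matrix inverse after a rank-one change, using $$(A + uv^{T})^{-1} = A^{-1} - \frac{A^{-1}uv^{T}A^{-1}}{1 + v^{T}A^{-1}u}$$, which avoids a full re-inversion at every step. The sketch below implements that identity with plain Python lists. The function name and list-of-lists matrix layout are assumptions for this example, and the abstract does not say which vectors $$u$$ and $$v$$ PS-SKLSTD uses.

```python
def sherman_morrison(a_inv, u, v):
    """Return the inverse of (A + u v^T), given a_inv = A^{-1}.

    a_inv is a square matrix stored as a list of rows; u and v are lists.
    The cost is quadratic in the dimension, versus cubic for a fresh inverse.
    """
    n = len(a_inv)
    a_inv_u = [sum(a_inv[i][k] * u[k] for k in range(n)) for i in range(n)]
    v_a_inv = [sum(v[k] * a_inv[k][j] for k in range(n)) for j in range(n)]
    denom = 1.0 + sum(v[k] * a_inv_u[k] for k in range(n))
    if abs(denom) < 1e-12:
        raise ValueError("the rank-one update makes the matrix singular")
    return [
        [a_inv[i][j] - a_inv_u[i] * v_a_inv[j] / denom for j in range(n)]
        for i in range(n)
    ]


identity = [[1.0, 0.0], [0.0, 1.0]]
# A + u v^T = [[1, 1], [0, 1]], whose inverse is [[1, -1], [0, 1]].
updated = sherman_morrison(identity, u=[1.0, 0.0], v=[0.0, 1.0])
```

In least squares temporal difference methods, a rank-one update of this kind is typically applied once per observed transition.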

### Vietnamese POS Tagging for Social Media Text

This paper presents an empirical study on Vietnamese part-of-speech (POS) tagging for social media text, which poses several challenges compared with tagging for general text. Social media text does not always conform to formal grammars and correct spelling. It also uses abbreviations, foreign words, and icons frequently. A POS tagger developed for conventional, edited text would perform poorly on such noisy data. We address this problem by proposing a tagging model based on conditional random fields with various kinds of features for Vietnamese social media text. We introduce a corpus for POS tagging, which consists of more than four thousand sentences from Facebook, the most popular social network in Vietnam. Using this corpus, we performed a series of experiments to evaluate the proposed model. Our model achieved 88.26% tagging accuracy, which is an 11.27% improvement over a state-of-the-art Vietnamese POS tagger developed for general, conventional text.

Ngo Xuan Bach, Nguyen Dieu Linh, Tu Minh Phuong

### Scaled Conjugate Gradient Learning for Quaternion-Valued Neural Networks

This paper presents the derivation of the scaled conjugate gradient method for training quaternion-valued feedforward neural networks, using the framework of the HR calculus. The performance of the scaled conjugate gradient algorithm in the real- and complex-valued cases led to the idea of extending it to the quaternion domain as well. Experiments using the proposed training method on time series prediction applications showed a significant performance improvement over the quaternion gradient descent and quaternion conjugate gradient algorithms.

### Performance of Qubit Neural Network in Chaotic Time Series Forecasting

In recent years, quantum-inspired neural networks have been applied to various practical problems. Here we investigate whether our qubit neural network (QNN) has an advantage over the conventional (real-valued) neural network (NN) in the forecasting of chaotic time series. QNN is constructed from a set of qubit neurons, whose internal state is a coherent superposition of qubit states. In this paper, we evaluate the performance of QNN through prediction of the well-known Lorenz attractor, which produces chaotic time series from a system of three coupled differential equations. The experimental results show that QNN can forecast time series more precisely than the conventional NN. In addition, we found that QNN outperforms the conventional NN in reconstructing the trajectories of the Lorenz attractor.

Taisei Ueguchi, Nobuyuki Matsui, Teijiro Isokawa
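
To make the benchmark in the abstract above concrete, the following sketch generates a Lorenz time series by numerical integration. The parameter values (sigma = 10, rho = 28, beta = 8/3), the step size, the initial state, and the use of a fourth-order Runge-Kutta integrator are common textbook choices assumed here; the abstract does not state the settings used in the paper.

```python
def lorenz_series(steps, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0,
                  start=(1.0, 1.0, 1.0)):
    """Integrate the Lorenz system with fourth-order Runge-Kutta.

    Returns a list of (x, y, z) tuples of length steps + 1, including
    the starting state.
    """
    def deriv(state):
        x, y, z = state
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

    def shifted(state, slope, h):
        # Move the state a distance h along the given slope.
        return tuple(s + h * k for s, k in zip(state, slope))

    state = start
    series = [state]
    for _ in range(steps):
        k1 = deriv(state)
        k2 = deriv(shifted(state, k1, dt / 2))
        k3 = deriv(shifted(state, k2, dt / 2))
        k4 = deriv(shifted(state, k3, dt))
        state = tuple(
            s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
        series.append(state)
    return series


trajectory = lorenz_series(2000)
x_values = [point[0] for point in trajectory]  # a scalar series for forecasting
```

A forecasting experiment would typically discard an initial transient and split the remaining samples into training and test segments.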

### The Evolutionary Process of Image Transition in Conjunction with Box and Strip Mutation

Evolutionary algorithms have been used in many ways to generate digital art. We study how evolutionary processes are used for evolutionary art and present a new approach to the transition of images. Our main idea is to define evolutionary processes for digital image transition, combining different variants of mutation and evolutionary mechanisms. We introduce box and strip mutation operators which are specifically designed for image transition. Our experimental results show that the process of an evolutionary algorithm in combination with these mutation operators can be used as a valuable way to produce unique generative art.

Aneta Neumann, Bradley Alexander, Frank Neumann

### A Preliminary Model for Understanding How Life Experiences Generate Human Emotions and Behavioural Responses

Whilst human emotional and behavioural responses are generated via a complex mechanism, understanding this process is important for a broad range of applications spanning clinical disciplines, including psychiatry and psychology, and computer science. Even though there is a large body of literature and established findings in clinical disciplines, these are under-utilised in developing more realistic computational models. This paper presents a preliminary model based on the integration of a number of established theories in clinical psychology and psychiatry through an interdisciplinary research effort.

D. A. Irosh P. Fernando, Björn Rüffer

### Artificial Bee Colony Algorithm Based on Neighboring Information Learning

The artificial bee colony (ABC) algorithm is one of the most effective and efficient swarm intelligence algorithms for global numerical optimization. It is inspired by the intelligent foraging behavior of honey bees and has shown good performance in most cases. However, because its solution search equation is good at exploration but poor at exploitation, ABC often suffers from slow convergence. To address this issue, we propose a novel artificial bee colony algorithm based on neighboring information learning (called NILABC), in which the employed bees and onlooker bees search for candidate food sources by learning valuable information from the best food source among their neighbors. Furthermore, the size of the neighborhood is increased linearly over the evolutionary process, which ensures that the employed bees and onlooker bees obtain guidance from the best solution in the local area at the early stage and from the best solution in the global area at the late stage. A comparison of NILABC with the basic ABC and some other ABC variants on 22 benchmark functions demonstrates that NILABC is better than the compared algorithms in most cases in terms of solution quality, robustness and convergence speed.

Laizhong Cui, Genghui Li, Qiuzhen Lin, Jianyong Chen, Nan Lu, Guanjing Zhang

### Data-Driven Design of Type-2 Fuzzy Logic System by Merging Type-1 Fuzzy Logic Systems

Type-2 fuzzy logic systems (T2 FLSs) have shown their superiority in many real-world applications. With the exponential growth of data, directly designing a satisfactory T2 FLS through data-driven methods is a time-consuming task. This paper presents a data-driven method, based on an ensembling approach, that constructs a T2 FLS by merging type-1 fuzzy logic systems (T1 FLSs) generated using the popular ANFIS method. First, T1 FLSs are constructed using the ANFIS method on sub-datasets. Then, an ensembling approach is proposed to merge the constructed T1 FLSs in order to generate a T2 FLS. Finally, the constructed T2 FLS is applied to a wind speed prediction problem. Simulation and comparison results show that, compared with the well-known BPNN and ANFIS, the proposed method has similar performance but greatly reduced training time.

Chengdong Li, Li Wang, Zixiang Ding, Guiqing Zhang

### Memetic Cooperative Neuro-Evolution for Chaotic Time Series Prediction

Cooperative neuro-evolution has been shown to be promising for chaotic time series problems, as it provides global search features using evolutionary algorithms. Back-propagation features gradient descent as a local search method that can give competitive results. A synergy between the methods is needed in order to exploit their features and achieve better performance. Memetic algorithms incorporate local search methods to improve the balance between diversification and intensification. We present a memetic cooperative neuro-evolution method that features gradient descent for chaotic time series prediction. The results show that the proposed method incurs lower computational cost while achieving higher prediction accuracy than related methods. In comparison to related methods from the literature, the proposed method has favorable results for highly noisy and chaotic time series problems.

Gary Wong, Rohitash Chandra, Anuraganand Sharma

### SLA Management Framework to Avoid Violation in Cloud

Cloud computing is an emerging technology with broad scope to offer a wide range of services that revolutionize existing IT infrastructure. This internet-based technology offers services such as on-demand service, shared resources, multitenant architecture, scalability, portability and elasticity, and gives the consumer the illusion of infinite resources through virtualization. Because of the elastic nature of a cloud, it is critical for a service provider, especially a small or medium cloud provider, to form a viable SLA with a consumer in order to avoid service violations. An SLA is a key agreement that needs to be intelligently formed and monitored; if there is a chance of a service violation, the provider should be informed so that it can take the necessary remedial action. In this paper we propose a viable SLA management framework that comprises two time phases: a pre-interaction phase and a post-interaction phase. The framework helps a service provider decide on a consumer request, determine the amount of resources to offer the consumer, predict QoS parameters, monitor run-time QoS parameters and take appropriate action to mitigate risks when the predicted and agreed QoS parameters diverge.

### Pattern Retrieval by Quaternionic Associative Memory with Dual Connections

An associative memory based on a Hopfield-type neural network, called Quaternionic Hopfield Associative Memory with Dual Connection (QHAMDC), is presented and analyzed in this paper. The neuron states, inputs, outputs, and connection weights are encoded by quaternions, a class of hypercomplex number systems with non-commutative multiplication. In QHAMDC, the internal state of a neuron is calculated using two types of multiplication between the neuron’s output and the connection weight. This makes the proposed associative memory robust in pattern retrieval. The experimental results show that the pattern retrieval performance of QHAMDC is superior to that of the previous QHAM.

Toshifumi Minemoto, Teijiro Isokawa, Masaki Kobayashi, Haruhiko Nishimura, Nobuyuki Matsui

### A GPU Implementation of a Bat Algorithm Trained Neural Network

In recent times, there has been an exponential growth in the viability of Neural Networks (NNs) as a Machine Learning tool. Most standard training algorithms for NNs, such as gradient descent and its variants, fall prey to local optima. Metaheuristics have been found to be a viable alternative to traditional training methods. Among these metaheuristics, the Bat Algorithm (BA) has been shown to be superior. Although BA promises better results, as a population-based metaheuristic it requires maintaining many Neural Networks and evaluating them on nearly every iteration. This makes the already computationally expensive task of training an NN even more so. To overcome this problem, we exploit the inherent concurrency of both NNs and BA to design a framework that utilizes the massively parallel architecture of Graphics Processing Units (GPUs). Our framework offers speed-ups of up to 47$$\times$$, depending on the architecture of the NN.

Amit Roy Choudhury, Rishabh Jain, Kapil Sharma

### Investigating a Dictionary-Based Non-negative Matrix Factorization in Superimposed Digits Classification Tasks

The human visual system can recognize superimposed graphical components with ease, while sophisticated computer vision systems still struggle to recognize them. This may be attributed to the fact that the image recognition task is framed as a classification task in which a classification model is commonly constructed from appearance features. Hence, superimposed components are perceived as a single image unit. It seems logical to approach the recognition of superimposed digits with a method that supports construction and deconstruction of superimposed components. Here, we resort to a dictionary-based non-negative matrix factorization (NMF). The dictionary-based NMF factors a given superimposed digit matrix, V, into a combination of entries in the dictionary matrix W. The H matrix from $$V \approx WH$$ can be interpreted as the corresponding superimposed digits. This work investigates three different dictionary representations: pixel intensities, Fourier coefficients and activations from RBM hidden layers. The results show that (i) NMF can be employed as a classifier and (ii) dictionary-based NMF is capable of classifying superimposed digits with only a small set of dictionary entries derived from single digits.

Somnuk Phon-Amnuaisuk, Soo-Young Lee
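
As a rough illustration of the dictionary-based factorization $$V \approx WH$$ mentioned in this abstract, the sketch below keeps a dictionary W fixed and estimates a non-negative H. The multiplicative update (the standard Lee-Seung rule for the Frobenius objective), the six-pixel toy dictionary and the function name `solve_activations` are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def solve_activations(V, W, n_iter=500, eps=1e-9):
    """Estimate non-negative H with V ≈ W @ H while the dictionary W stays fixed.

    Uses the Lee-Seung multiplicative update for the Frobenius objective;
    this solver choice is an assumption, not taken from the paper.
    """
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + eps)
    return H


# Hypothetical toy dictionary: three "digit" templates over six pixels.
W = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
], dtype=float)
labels = np.array([0, 1, 2])  # class label of each dictionary column

# Superimpose templates 0 and 2 to form one input column.
V = (W[:, 0] + W[:, 2]).reshape(-1, 1)
H = solve_activations(V, W)

# The two largest activations point at the superimposed components.
top2 = sorted(labels[np.argsort(H[:, 0])[-2:]].tolist())
```

In this toy case the activations of the two templates that were added together dominate, so reading off the largest entries of H acts as a classifier for the superimposed input.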

### A Swarm Intelligence Algorithm Inspired by Twitter

For many years, evolutionary computation researchers have been trying to extract swarm intelligence from biological systems in nature. A series of algorithms that imitate animals’ behaviours have established themselves as effective means for solving optimization problems. However, these bio-inspired methods are not yet satisfactory because the behaviour models they reference, such as foraging birds and bees, are too simple to handle different problems. In this paper, by studying a more complicated behaviour model, namely human social behaviour on Twitter, an influential social media platform popular among billions of users, we propose a new algorithm named Twitter Optimization (TO). TO is able to solve most real-parameter optimization problems by imitating human social actions on Twitter: following, tweeting and retweeting. The experiments show that TO performs well on the benchmark functions.

Zhihui Lv, Furao Shen, Jinxi Zhao, Tao Zhu

### Collaborative Filtering, Matrix Factorization and Population Based Search: The Nexus Unveiled

Collaborative Filtering attempts to solve the problem of recommending m items to n users, where the data is represented as an $$n \times m$$ matrix. A popular method is to assume that the solution lies in a low-dimensional space, so that the task reduces to inferring the latent factors in that space. Matrix Factorization attempts to find those latent factors by treating the problem as a matrix completion task. The inference is done by minimizing an objective function by gradient descent. While this is a robust technique, a major drawback is that gradient descent tends to get stuck in local minima for non-convex functions. In this paper we propose four frameworks which are novel combinations of population-based heuristics with gradient descent. We show results from extensive experiments on the large-scale MovieLens dataset and demonstrate that our approach provides better and more consistent solutions than gradient descent alone.

Ayangleima Laishram, Satya Prakash Sahu, Vineet Padmanabhan, Siba Kumar Udgata

### Adaptive Hausdorff Distances and Tangent Distance Adaptation for Transformation Invariant Classification Learning

Tangent distances (TDs) are important concepts for data manifold distance description in machine learning. In this paper we show that the Hausdorff distance is equivalent to the TD for certain conditions. Hence, we prove the metric properties for TDs. Thereafter, we consider those TDs as dissimilarity measure in learning vector quantization (LVQ) for classification learning of class distributions with high variability. Particularly, we integrate the TD in the learning scheme of LVQ to obtain a TD adaption during LVQ learning. The TD approach extends the classical prototype concept to affine subspaces. This leads to a high topological richness compared to prototypes as points in the data space. By the manifold theory of TDs we can ensure that the affine subspaces are aligned in directions of invariant transformations with respect to class discrimination. We demonstrate the superiority of this new approach by two examples.

Sascha Saralajew, David Nebel, Thomas Villmann

### Semi-supervised Classification by Nuclear-Norm Based Transductive Label Propagation

In this paper, we propose a new transductive label propagation method, Nuclear-norm based Transductive Label Propagation (N-TLP). To encode the neighborhood reconstruction error more accurately and reliably, we use the nuclear norm, which has been proved to be more robust to noise and better suited to modeling the reconstruction error than either the L1-norm or the Frobenius norm for characterizing the manifold smoothing degree. During optimization, the nuclear-norm based reconstruction error term is transformed into a Frobenius-norm based one in order to obtain the solution. To enhance robustness when encoding the difference between the initial labels and the predicted ones, we propose a weighted L2,1-norm regularization on the label fitness error so that the resulting measurement is more accurate. Compared with several related methods, N-TLP delivers promising results on several benchmark datasets.

Lei Jia, Zhao Zhang, Yan Zhang

### Effective and Efficient Multi-label Feature Selection Approaches via Modifying Hilbert-Schmidt Independence Criterion

Hilbert-Schmidt independence criterion (HSIC) is a nonparametric dependence measure to depict all modes of dependencies between two sets of variables via matrix trace. When this criterion with linear feature and label kernels is directly applied to multi-label feature selection, an efficient feature ranking is achieved using diagonal elements, which considers only feature-label relevance. But non-diagonal elements essentially characterize feature-feature conditional redundancy. In this paper, two novel criteria are defined by all matrix elements. For a candidate feature, we both maximize its relevance and minimize its average or maximal redundancy. Then an efficient hybrid strategy combining simple feature ranking and sequential forward selection is implemented, where the former sorts all features in descending order using their relevance and the latter finds out the top discriminative features with relevance maximization and redundancy minimization. Experiments on four data sets illustrate that our proposed methods are effective and efficient, compared with several existing techniques, according to classification performance and computational efficiency.

Jianhua Xu

### Storm Surge Prediction for Louisiana Coast Using Artificial Neural Networks

Storm surge, an offshore rise of water level caused by hurricanes, often results in flooding that devastates human lives and property in coastal regions. It is imperative to make timely and accurate predictions of storm surge levels in order to mitigate the impacts of hurricanes. Traditional process-based numerical models for storm surge prediction suffer from high computational demands, which makes timely forecasting difficult. In this work, an Artificial Neural Network (ANN) based system is developed to predict storm surge in coastal areas of Louisiana. Simulated and historical storm data are collected for model training and testing, respectively. Experiments are performed using historical hurricane parameters and surge data at tidal stations during hurricane events from the National Oceanic and Atmospheric Administration (NOAA). Analysis of the results shows that our ANN-based storm surge predictor produces accurate predictions efficiently.

Qian Wang, Jianhua Chen, Kelin Hu

### Factorization of Multiple Tensors for Supervised Feature Extraction

Tensors are effective representations for complex and time-varying networks. The factorization of a tensor provides a high-quality, low-rank compact basis for each dimension of the tensor, which facilitates the interpretation of important structures in the represented data. Many existing tensor factorization (TF) methods assume there is one tensor that needs to be decomposed into low-rank factors. However, in practice, data are usually generated from different time periods or by different class labels, and are represented by a sequence of multiple tensors associated with different labels. When one needs to analyse and compare multiple tensors, existing TF methods are unsuitable for discovering all potentially useful patterns, as they usually fail to discover either common or unique factors among the tensors: (1) if each tensor is factorized separately, the factor matrices will fail to explicitly capture the common information shared by different tensors, and (2) if the tensors are concatenated to form a larger “overall” tensor and this concatenated tensor is then factorized, the intrinsic unique subspaces that are specific to each tensor will be lost. This issue arises mainly because existing tensor factorization methods handle data observations in an unsupervised way, considering only the features but not the labels of the data. To tackle this problem, we design a novel probabilistic tensor factorization model that takes both features and class labels of tensors into account, and produces informative common and unique factors of all tensors simultaneously. Experimental results on feature extraction in classification problems demonstrate the effectiveness of the factors discovered by our method.

Wei Liu

### A Non-linear Label Compression Coding Method Based on Five-Layer Auto-Encoder for Multi-label Classification

In multi-label classification, high-dimensional and sparse binary label vectors usually make existing multi-label classifiers perform unsatisfactorily, which has motivated a group of label compression coding (LCC) techniques. So far, several linear LCC methods have been introduced by considering linear relations among labels. In this paper, we extend the traditional three-layer auto-encoder to a five-layer one (i.e., a five-layer symmetrical neural network), and then apply the training principle of the extreme learning machine to determine all network weights. The result is a non-linear LCC approach that captures non-linear relations among labels, where the first three layers are regarded as an encoder and the last two layers act as a decoder. The experimental results on three benchmark data sets show that our proposed method performs better than four existing linear LCC methods according to five performance measures.

Jiapeng Luo, Lei Cao, Jianhua Xu

### Fast Agglomerative Information Bottleneck Based Trajectory Clustering

Clustering is an important data mining technique for trajectory analysis. The agglomerative Information Bottleneck (aIB) principle is efficient for obtaining an optimal number of clusters without the direct use of a trajectory distance measure. In this paper, we propose a novel approach to trajectory clustering, fast agglomerative Information Bottleneck (faIB), which speeds up aIB using two strategies. The first strategy is to do “clipping” based on the so-called feature space, calculating information losses on fewer cluster pairs. The second is to select and merge more candidate clusters, reducing the number of clustering iterations. Remarkably, faIB runs more than 10 times faster than aIB while achieving almost the same clustering performance. In addition, extensive experiments on both synthetic and real datasets demonstrate that faIB performs better than the clustering approaches widely used in practice.

Yuejun Guo, Qing Xu, Yang Fan, Sheng Liang, Mateu Sbert

### Anomaly Detection Using Correctness Matching Through a Neighborhood Rough Set

Abnormal information patterns are signals retrieved from a data source that could contain errors or reveal faulty behavior. Whichever kind of signal it is, this abnormal information could affect the distribution of the real data. An anomaly detection method, Neighborhood Rough Set with Correctness Matching (NRSCM), is presented in this paper to identify abnormal information (outliers). Two real-life data sets, one mixed and one categorical, are used to demonstrate the performance of NRSCM. The experiments show that NRSCM performs well in detecting anomalies.

Pey Yun Goh, Shing Chiang Tan, Wooi Ping Cheah

### Learning Class-Informed Semantic Similarity

The exponential kernel, which models semantic similarity by means of a diffusion process on a graph defined by lexicon and co-occurrence information, has been successfully applied to the task of text categorization. However, the diffusion is an unsupervised process, which fails to exploit the class information in a supervised classification scenario. To address this limitation, we present a class-informed exponential kernel that makes use of the class knowledge of training documents in addition to the co-occurrence knowledge. The basic idea is to construct an augmented term-document matrix by encoding class information as additional terms and appending them to the training documents. Diffusion is then performed on the augmented term-document matrix. In this way, words belonging to the same class are indirectly drawn closer to each other, and the class-specific word correlations are strengthened. The proposed approach was demonstrated with several variants of the popular 20Newsgroup data set.

Tinghua Wang, Wei Li

### Aggregated Temporal Tensor Factorization Model for Point-of-interest Recommendation

Temporal influence plays an important role in a point-of-interest (POI) recommendation system that suggests POIs for users in location-based social networks (LBSNs). Previous studies observe that the user mobility in LBSNs exhibits distinct temporal features, summarized as periodicity, consecutiveness, and non-uniformness. By capturing the observed temporal features, a variety of systems are proposed to enhance POI recommendation. However, previous work does not model the three features together. More importantly, we observe that the temporal influence exists at different time scales, yet this observation cannot be modeled in prior work. In this paper, we propose an Aggregated Temporal Tensor Factorization (ATTF) model for POI recommendation to capture the three temporal features together, as well as at different time scales. Specifically, we employ temporal tensor factorization to model the check-in activity, subsuming the three temporal features together. Furthermore, we exploit a linear combination operator to aggregate temporal latent features’ contributions at different time scales. Experiments on two real life datasets show that the ATTF model achieves better performance than models capturing temporal influence at single scale. In addition, our proposed ATTF model outperforms the state-of-the-art methods.

Shenglin Zhao, Michael R. Lyu, Irwin King

### Multilevel–Multigroup Analysis Using a Hierarchical Tensor SOM Network

This paper describes a method of multilevel–multigroup analysis based on a nonlinear multiway dimensionality reduction. To analyze a set of groups in terms of the probabilistic distribution of their constituent member data, the proposed method uses a hierarchical pair of tensor self-organizing maps (TSOMs), one for the member analysis and the other for the group analysis. This architecture enables more flexible analysis than ordinary parametric multilevel analysis, as it retains a high level of translatability supported by strong visualization. Furthermore, this architecture provides a consistent and seamless computation method for multilevel–multigroup analysis by integrating two different levels into a hierarchical tensor SOM network. The proposed method is applied to a dataset of football teams in a university league, and successfully visualizes the types of players that constitute each team as well as the differences or similarities between the teams.

Hideaki Ishibashi, Ryota Shinriki, Hirohisa Isogai, Tetsuo Furukawa

### A Wavelet Deep Belief Network-Based Classifier for Medical Images

Accurately and quickly classifying high-dimensional data using machine learning and data mining techniques is challenging. This paper proposes an efficient and effective technique that extracts high-level features from medical images using a deep network and classifies them using a support vector machine. A wavelet filter is applied in the first step of the proposed method to obtain the informative coefficient matrix of each image and to reduce the dimensionality of the feature space. A four-layer deep belief network is then used to extract high-level features. These features are fed to a support vector machine to perform accurate classification. Comparative empirical results demonstrate the strength, precision, and fast response of the proposed technique.

Amin Khatami, Abbas Khosravi, Chee Peng Lim, Saeid Nahavandi

### Bayesian Neural Networks Based Bootstrap Aggregating for Tropical Cyclone Tracks Prediction in South China Sea

Accurate forecasting of Tropical Cyclone Track (TCT) is very important for coping with the associated disasters. The main objective of the present study is to develop models that deliver more accurate forecasts of TCT over the South China Sea (SCS) and its coastal regions with a 24 h lead time. The model proposed in this study is a Bayesian Neural Network (BNN) based committee machine using bagging (bootstrap aggregating). Two-layered Bayesian neural networks are employed as committee members in the committee machine. Forecast error is measured by calculating the distance between the real position and the forecast position of the tropical cyclone. Our proposed model reduces the mean forecast error by 5.6 km compared to the stepwise regression model, which is widely used in TCT forecasting. The experimental results indicate that a BNN-based committee machine using bagging is an effective approach to TCT forecasting with improved accuracy.

Lei Zhu, Jian Jin, Alex J. Cannon, William W. Hsieh
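
The forecast error described in this abstract, the distance between the real and the forecast cyclone position, can be sketched as follows. The use of the haversine great-circle formula, the mean Earth radius of 6371 km, the plain averaging of committee member outputs and all function names are assumptions for illustration; the abstract does not state how the distance or the aggregate is computed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def committee_position(member_forecasts):
    """Average the (lat, lon) forecasts of the committee members.

    Plain averaging is the usual bagging aggregate for regression and is
    only reasonable here because the positions are close together.
    """
    n = len(member_forecasts)
    lat = sum(p[0] for p in member_forecasts) / n
    lon = sum(p[1] for p in member_forecasts) / n
    return (lat, lon)


def mean_forecast_error_km(observed, forecast):
    """Mean distance between observed and forecast positions over all cases."""
    errors = [great_circle_km(o[0], o[1], f[0], f[1]) for o, f in zip(observed, forecast)]
    return sum(errors) / len(errors)
```

For example, `mean_forecast_error_km([(20.0, 115.0)], [committee_position([(20.1, 115.2), (19.9, 114.8)])])` evaluates the error of an averaged two-member forecast against one observed position; the coordinates are made up.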

### Credit Card Fraud Detection Using Convolutional Neural Networks

Credit cards are becoming more and more popular in financial transactions, and at the same time fraud is increasing. Conventional methods use rule-based expert systems to detect fraudulent behavior, neglecting the diversity of situations and the extreme imbalance between positive and negative samples. In this paper, we propose a CNN-based fraud detection framework to capture the intrinsic patterns of fraudulent behavior learned from labeled data. Abundant transaction data is represented by a feature matrix, to which a convolutional neural network is applied to identify a set of latent patterns for each sample. Experiments on massive real-world transactions of a major commercial bank demonstrate its superior performance compared with some state-of-the-art methods.

Kang Fu, Dawei Cheng, Yi Tu, Liqing Zhang

### An Efficient Data Extraction Framework for Mining Wireless Sensor Networks

Behavioral patterns for sensors have received a great deal of attention recently due to their usefulness in capturing the temporal relations between sensors in wireless sensor networks. To discover these patterns, we need to collect the behavioral data that represents the sensors’ activities over time from the sensor database attached to a well-equipped central node, called the sink, for further analysis. However, given the limited resources of sensor nodes, an effective method is required for collecting the behavioral data efficiently. In this paper, we introduce a new framework for behavioral patterns called associated-correlated sensor patterns and also propose a new MapReduce-based paradigm for extracting data from the wireless sensor network in a distributed way. An extensive performance study shows that the proposed method can reduce the data size by almost 50 % compared to the centralized model.

Md. Mamunur Rashid, Iqbal Gondal, Joarder Kamruzzaman

### Incorporating Prior Knowledge into Context-Aware Recommendation

In many recommendation applications, such as music and movie recommendation, describing the features of items relies heavily on user-generated content, especially social tags. These tags suffer from serious problems, including redundancy and self-contradiction. Exploiting them directly in a recommender system leads to reduced performance. However, few systems have taken this problem into consideration. In this paper, we propose a novel framework named prior knowledge based context aware recommender (PKCAR). We incorporate Dirichlet Forest priors to encode prior knowledge about item features into our model in order to deal with the redundancy and self-contradiction problems. We also develop an algorithm that automatically mines prior knowledge using co-occurrence, lexical and semantic features. We evaluate our framework on two datasets from different domains. Experimental results show that our approach performs better than systems that do not leverage prior knowledge about item features.

Haitao Zheng, Xiaoxi Mao

### Unsupervised Video Hashing by Exploiting Spatio-Temporal Feature

Video hashing is a common solution for content-based video retrieval that encodes high-dimensional feature vectors into short binary codes. Videos have not only spatial structure inside each frame but also temporal correlation structure between frames, and the latter has been largely neglected by many existing methods. Therefore, in this paper we propose to perform video hashing by incorporating the temporal structure as well as the conventional spatial structure. Specifically, the spatial features of videos are obtained using a Convolutional Neural Network (CNN), and the temporal features are established via Long Short-Term Memory (LSTM). The proposed spatio-temporal feature learning framework can be applied to many existing unsupervised hashing methods such as Iterative Quantization (ITQ), Spectral Hashing (SH), and others. Experimental results on the UCF-101 dataset indicate that by simultaneously employing the temporal and spatial features, our hashing method significantly improves the performance of existing methods that use only the spatial feature.

Chao Ma, Yun Gu, Wei Liu, Jie Yang, Xiangjian He

### Selective Dropout for Deep Neural Networks

Dropout has been proven to be an effective method for reducing overfitting in deep artificial neural networks. We present 3 new alternative methods for performing dropout on a deep neural network which improve the effectiveness of the dropout method over the same training period. These methods select neurons to be dropped through statistical values calculated using a neuron’s change in weight, the average size of a neuron’s weights, and the output variance of a neuron. We found that increasing the probability of dropping neurons with smaller values of these statistics and decreasing the probability of those with larger statistics gave an improved result in training over 10,000 epochs. The most effective of these was found to be the Output Variance method, giving an average improvement of 1.17 % accuracy over traditional dropout methods.

Erik Barrow, Mark Eastwood, Chrisina Jayne
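
One way to picture the Output Variance idea from this abstract is to let each neuron's drop probability depend on the variance of its outputs over a batch, with low-variance neurons dropped more often. The rank-based linear mapping, the `base_rate` and `spread` parameters, the inverted-dropout rescaling and the function names below are all assumptions for illustration; the abstract does not give the authors' formula.

```python
import numpy as np


def variance_drop_probs(activations, base_rate=0.5, spread=0.3):
    """Map per-neuron output variance to a drop probability.

    Neurons with smaller variance get a higher drop probability. The
    rank-based linear mapping and both parameters are illustrative
    assumptions, not the paper's rule.
    """
    var = activations.var(axis=0)        # variance of each neuron over the batch
    ranks = var.argsort().argsort()      # 0 = smallest variance
    n = len(var)
    scaled = ranks / (n - 1) if n > 1 else np.zeros(n)  # in [0, 1]
    probs = base_rate + spread * (0.5 - scaled)
    return np.clip(probs, 0.0, 1.0)


def selective_dropout(activations, rng, **kwargs):
    """Apply dropout with per-neuron probabilities and inverted-dropout scaling."""
    probs = variance_drop_probs(activations, **kwargs)
    keep = rng.random(activations.shape) >= probs
    # Rescale kept units so the expected activation is unchanged.
    return activations * keep / np.maximum(1.0 - probs, 1e-8)


# Hypothetical batch of 3 samples and 3 hidden neurons.
acts = np.array([[0.0, 1.0, 5.0],
                 [0.0, 2.0, -5.0],
                 [0.0, 3.0, 5.0]])
probs = variance_drop_probs(acts)
dropped = selective_dropout(acts, np.random.default_rng(0))
```

With the default parameters the average drop probability stays at `base_rate`, so the overall amount of dropout matches a conventional setting while its distribution across neurons changes.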

### Real-Time Action Recognition in Surveillance Videos Using ConvNets

The explosive growth of surveillance cameras and their 24/7 recording produces massive amounts of surveillance video data. How to efficiently retrieve rare but important event information from these videos is therefore a pressing problem. Recently, deep convolutional networks have shown outstanding performance in event recognition on general videos. Hence, we study the characteristics of the surveillance video context and propose a very competitive ConvNets approach for real-time event recognition on surveillance videos. Our approach adopts two-stream ConvNets to recognize the spatial and temporal information of an action, respectively. In particular, we propose to use fast feature cascades and motion history images as the templates of the spatial and temporal streams. We conducted our experiments on the UCF-ARG and UT-interaction datasets. The experimental results show that our approach achieves superior recognition accuracy and runs in real time.

Sheng Luo, Haojin Yang, Cheng Wang, Xiaoyin Che, Christoph Meinel

### An Architecture Design Method of Deep Convolutional Neural Network

A Deep Convolutional Neural Network (DCNN) is a kind of multi-layer neural network model. In recent years, the DCNN has attracted attention because it shows state-of-the-art performance in image and speech recognition tasks. However, the design of the DCNN architecture has not been discussed much, since no effective guideline for constructing it has been found. In this research, we focus on the within-class variance of the SVM histogram proposed in our previous work [8]. We apply it as a clue for modifying the architecture of a DCNN, and confirm that the modified DCNN performs better than the original one.

Satoshi Suzuki, Hayaru Shouno

### Investigation of the Efficiency of Unsupervised Learning for Multi-task Classification in Convolutional Neural Network

In this paper, we analyze the efficiency of unsupervised learning features in multi-task classification, where unsupervised learning is used to initialize a Convolutional Neural Network (CNN) that is then trained by supervised learning for multi-task classification. The proposed method is based on the Convolution Auto Encoder (CAE), which maintains the original structure of the target model, including pooling layers, to allow a proper comparison with the supervised learning case. Experimental results show the efficiency of the proposed feature extraction method based on unsupervised learning in multi-task classification related to facial information. Unsupervised learning can produce more discriminative features than supervised learning for multi-task classification.

Jonghong Kim, Gil-Jin Jang, Minho Lee

### Sparse Auto-encoder with Smoothed Regularization

To obtain a satisfying deep network, it is important to improve the performance on data representation of an auto-encoder. One of the strategies to enhance the performance is to incorporate sparsity into an auto-encoder. Fortunately, sparsity for the auto-encoder has been achieved by adding a Kullback-Leibler (KL) divergence term to the risk functional. In compressive sensing and machine learning, it is well known that the $$l_1$$ regularization is a widely used technique which can induce sparsity. Thus, this paper introduces a smoothed $$l_1$$ regularization instead of the mostly used KL divergence to enforce sparsity for auto-encoders. Experimental results show that the smoothed $$l_1$$ regularization works better than the KL divergence.

Li Zhang, Yaping Lu, Zhao Zhang, Bangjun Wang, Fanzhang Li
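
To make the contrast in this abstract concrete, the sketch below places the familiar KL divergence sparsity penalty next to one common smooth surrogate of the $$l_1$$ norm, $$\sqrt{a^2 + \epsilon}$$. The abstract does not say which smoothing the authors use, so this surrogate, the value of `eps`, the target sparsity `rho` and the function names are assumptions.

```python
import math


def kl_sparsity_penalty(mean_activations, rho=0.05):
    """KL divergence between target sparsity rho and each unit's mean activation."""
    return sum(
        rho * math.log(rho / r) + (1 - rho) * math.log((1 - rho) / (1 - r))
        for r in mean_activations
    )


def smoothed_l1_penalty(activations, eps=1e-4):
    """Smooth surrogate of the l1 norm: sum of sqrt(a^2 + eps).

    One common smoothing, assumed here; it is differentiable at zero,
    unlike |a|.
    """
    return sum(math.sqrt(a * a + eps) for a in activations)


def smoothed_l1_grad(activations, eps=1e-4):
    """Gradient of the smoothed penalty with respect to each activation."""
    return [a / math.sqrt(a * a + eps) for a in activations]
```

The gradient of the smoothed term approaches the sign of the activation for large magnitudes and passes continuously through zero, which is what makes it usable with plain gradient-based training.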

### Encoding Multi-resolution Two-Stream CNNs for Action Recognition

This paper deals with automatic human action recognition in videos. Rather than considering traditional hand-crafted features such as HOG, HOF and MBH, we explore how to learn both static and motion features from CNNs trained on large-scale datasets such as ImageNet and UCF101. We propose a novel method named multi-resolution latent concept descriptor (mLCD) to encode two-stream CNNs. Extensive experiments are conducted to demonstrate the performance of the proposed model. By combining our mLCD features with the improved dense trajectory features, we achieve performance comparable to state-of-the-art algorithms on both the Hollywood2 and Olympic Sports datasets.

Weichen Xue, Haohua Zhao, Liqing Zhang

### Improving Neural Network Generalization by Combining Parallel Circuits with Dropout

In an attempt to shorten the lengthy training times of neural networks, we proposed Parallel Circuits (PCs), a biologically inspired architecture. Previous work has shown that this approach fails to maintain generalization performance despite achieving sharp speed gains. To address this issue, and motivated by the way Dropout prevents node co-adaptation, we suggest an improvement in this paper that extends Dropout to the PC architecture. The paper provides multiple insights into this combination, including a variety of fusion approaches. Experiments show promising results, with improved error rates in most cases while maintaining the speed advantage of the PC approach.

Kien Tuong Phan, Tomas Henrique Maul, Tuong Thuy Vu, Weng Kin Lai

### Predicting Multiple Pregrasping Poses by Combining Deep Convolutional Neural Networks with Mixture Density Networks

In this paper, we propose a deep neural network to predict the pregrasp poses of a three-dimensional (3D) object. Specifically, a single RGB-D image is used to determine multiple pregrasp positions of the three fingers of a robotic hand for various poses of known or unknown objects. Predicting multiple pregrasping poses typically involves complex multi-valued functions, for which standard regression models fail. To this end, we propose a deep neural network containing a variant of the traditional deep convolutional neural network as well as a mixture density network. Furthermore, to overcome the difficulty of learning with insufficient data in the first part of the proposed network, we develop a supervised learning technique to pretrain the variant of the convolutional neural network.

Sungphill Moon, Youngbin Park, Il Hong Suh
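
To illustrate why a mixture density network suits multi-valued targets such as the pregrasp poses in the abstract above, here is a minimal sketch of the usual mixture density loss: the negative log-likelihood of a target under a Gaussian mixture whose parameters a network would output. It assumes isotropic components. The function name `mdn_nll` and its argument layout are hypothetical and are not the authors' implementation.

```python
import numpy as np

def mdn_nll(logits, means, log_sigmas, target):
    """Negative log-likelihood of `target` under an isotropic Gaussian mixture.

    logits:     (K,)   unnormalised mixture weights
    means:      (K, D) component means
    log_sigmas: (K,)   log standard deviation of each component
    target:     (D,)   observed target vector (for example, a pose)
    """
    d = target.shape[0]
    # Log-softmax turns the logits into log mixture weights.
    log_pi = logits - np.logaddexp.reduce(logits)
    sq_dist = np.sum((target - means) ** 2, axis=1)
    # Log density of the target under each isotropic Gaussian component.
    log_comp = (-0.5 * sq_dist / np.exp(2.0 * log_sigmas)
                - d * log_sigmas
                - 0.5 * d * np.log(2.0 * np.pi))
    # Log-sum-exp over components gives the mixture log-likelihood.
    return float(-np.logaddexp.reduce(log_pi + log_comp))

# Two equally valid answers at -1 and +1: a two-component mixture can place
# mass on both, whereas a single Gaussian centred on their average cannot.
target = np.array([1.0])
bimodal = mdn_nll(np.zeros(2), np.array([[-1.0], [1.0]]),
                  np.log([0.1, 0.1]), target)
unimodal = mdn_nll(np.zeros(1), np.array([[0.0]]), np.log([0.1]), target)
```

Here `bimodal` is lower than `unimodal`, which is the failure mode of plain regression that the abstract refers to: averaging several valid poses yields a pose that may be valid for none of them.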

### Recurrent Neural Networks for Adaptive Feature Acquisition

We tackle the cost-sensitive learning problem, where each feature is associated with a particular acquisition cost. We propose a new model with the following key properties: (i) it acquires features in an adaptive way, (ii) features can be acquired per block (several at a time) so that the model can deal with high-dimensional data, and (iii) it relies on representation-learning ideas. The effectiveness of this approach is demonstrated in several experiments on a variety of datasets and with different cost settings.

Gabriella Contardo, Ludovic Denoyer, Thierry Artières

### Stacked Robust Autoencoder for Classification

In this work we propose an $$l_p$$-norm data fidelity constraint for training the autoencoder. Usually the Euclidean distance is used for this purpose; we generalize the $$l_2$$-norm to the $$l_p$$-norm; smaller values of $$p$$ make the problem robust to outliers. The ensuing optimization problem is solved using the Augmented Lagrangian approach. The proposed $$l_p$$-norm Autoencoder has been tested on benchmark deep learning datasets: MNIST, CIFAR-10 and SVHN. We have seen that the proposed robust autoencoder yields better results than the standard autoencoder ($$l_2$$-norm) and deep belief network for all of these problems.

Janki Mehta, Kavya Gupta, Anupriya Gogna, Angshul Majumdar, Saket Anand
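
The claim in the abstract above that smaller values of p make the fit robust to outliers can be checked numerically. The sketch below assumes the fidelity term is the sum of |x - x_hat|^p, which is an illustrative reading of the abstract. It does not reproduce the Augmented Lagrangian solver, and the helper names are hypothetical.

```python
import numpy as np

def lp_fidelity(x, x_hat, p=2.0):
    """Data fidelity term: sum of |x - x_hat|^p (assumed form)."""
    return float(np.sum(np.abs(x - x_hat) ** p))

def outlier_share(residual, outlier_index, p):
    """Fraction of the total l_p cost contributed by one residual entry."""
    costs = np.abs(residual) ** p
    return float(costs[outlier_index] / costs.sum())

# Nine small reconstruction errors and one gross outlier.
residual = np.array([0.1] * 9 + [10.0])
for p in (2.0, 1.0, 0.5):
    print(p, round(outlier_share(residual, 9, p), 3))
```

As p decreases, the single outlier accounts for a smaller share of the total cost, so it pulls the learned reconstruction less strongly than under the squared error.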

### Pedestrian Detection Using Deep Channel Features in Monocular Image Sequences

In this paper, we propose Deep Channel Features as an extension of Channel Features for pedestrian detection. Instead of using hand-crafted features, our method automatically learns deep channel features as mid-level features by using a convolutional neural network. The network is pretrained by unsupervised sparse filtering, and a group of filters is learned for each channel. The learned deep channel features are combined with other low-level channel features (i.e. LUV channels, gradient magnitude channel and histogram of gradient channels) to form the final feature, on which a boosting classifier with depth-2 decision trees as weak classifiers is learned. Our method achieves a significant detection performance on public datasets (i.e. INRIA, ETH, TUD, and Caltech).

Zhao Liu, Yang He, Yi Xie, Hongyan Gu, Chao Liu, Mingtao Pei

### Heterogeneous Multi-task Learning on Non-overlapping Datasets for Facial Landmark Detection

We propose a heterogeneous multi-task learning framework on non-overlapping datasets, where each sample has only part of the labels and the datasets differ in size. In particular, we propose two batch sampling strategies for stochastic gradient descent to learn a shared CNN representation. The first sets the same number of iterations on each dataset, while the second sets the same batch size ratio of one task to another. We evaluate the proposed framework by learning the facial expression recognition task and the facial landmark detection task. The learned network is memory efficient and able to carry out multiple tasks in one feed-forward pass with the shared CNN. In addition, we show that the learned network achieves more robust facial landmark detection under the large variation that appears in the heterogeneous dataset, even though that dataset does not include landmark labels. We also investigate the effect of the weights on each cost function and of the batch size ratio of one task to another.

Takayuki Semitsu, Xiongxin Zhao, Wataru Matsumoto

### Fuzzy String Matching Using Sentence Embedding Algorithms

Fuzzy string matching has many applications. Traditional approaches mainly use the appearance information of characters or words but do not use their semantic meanings. We postulate that the latter information may also be important for this task. To validate this hypothesis, we build a pipeline in which approximate string matching is used to pre-select some candidates and sentence embedding algorithms are used to select the final results from these candidates. The aim of sentence embedding is to represent the semantic meaning of the words. Two sentence embedding algorithms are tested: a convolutional neural network (CNN) and averaging word2vec. Experiments show that the proposed pipeline can significantly improve the accuracy and that averaging word2vec works slightly better than the CNN.

Yu Rong, Xiaolin Hu
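
The two-stage pipeline described in the abstract above can be sketched as follows: approximate string matching pre-selects candidates, then averaged word vectors pick the final result. This is a toy illustration under stated assumptions. `TOY_VECTORS` is a made-up embedding table standing in for trained word2vec vectors, `difflib` stands in for whatever approximate matcher the authors use, and ranking by cosine similarity is an assumption.

```python
import difflib
import numpy as np

# Hypothetical toy word vectors; a real system would load trained word2vec.
TOY_VECTORS = {
    "red": np.array([0.5, 0.5]),
    "car": np.array([1.0, 0.0]),
    "automobile": np.array([0.9, 0.1]),
    "cat": np.array([0.0, 1.0]),
}

def embed(sentence):
    """Average the vectors of the known words (averaging word2vec style)."""
    vecs = [TOY_VECTORS[w] for w in sentence.lower().split() if w in TOY_VECTORS]
    return np.mean(vecs, axis=0) if vecs else np.zeros(2)

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def match(query, corpus, n_candidates=3, cutoff=0.3):
    """Stage 1: character-level pre-selection. Stage 2: semantic re-ranking."""
    candidates = difflib.get_close_matches(
        query, corpus, n=n_candidates, cutoff=cutoff)
    q = embed(query)
    # Returns None when stage 1 finds no candidate at all.
    return max(candidates, key=lambda c: cosine(q, embed(c)), default=None)

corpus = ["red cat", "red automobile"]
best = match("red car", corpus, n_candidates=2, cutoff=0.0)
```

On character overlap alone, "red cat" is the closest string to "red car". The embedding stage instead prefers "red automobile", which is the kind of semantic correction the abstract hypothesizes.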

### Initializing Deep Learning Based on Latent Dirichlet Allocation for Document Classification

The gradient-descent learning of deep neural networks is subject to local minima, and good initialization may depend on the task. In contrast, for document classification tasks, latent Dirichlet allocation (LDA) was quite successful in extracting topic representations, but its performance was limited by its shallow architecture. In this study, LDA was adopted for efficient layer-by-layer pre-training of deep neural networks for a document classification task. Two-layer feedforward networks were added at the end of the process and trained using a supervised learning algorithm. With 10 different random initializations, the LDA-based initialization yielded a much lower mean and standard deviation of the false recognition rate than other state-of-the-art initialization methods. This suggests that the multi-layer expansion of the probabilistic generative LDA model is capable of extracting efficient hierarchical topic representations for document classification.

Hyung-Bae Jeon, Soo-Young Lee

### Backmatter
