
2019 | Book

Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series

28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, Part IV

Editors: Igor V. Tetko, Věra Kůrková, Pavel Karpov, Fabian Theis

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

The proceedings set LNCS 11727, 11728, 11729, 11730, and 11731 constitutes the proceedings of the 28th International Conference on Artificial Neural Networks, ICANN 2019, held in Munich, Germany, in September 2019. The 277 full papers and 43 short papers presented in these proceedings were carefully reviewed and selected from 494 submissions. They are organized in five volumes focusing on theoretical neural computation; deep learning; image processing; text and time series; and workshops and special sessions.

Table of Contents

Frontmatter

Text Understanding

Frontmatter
An Ensemble Model for Winning a Chinese Machine Reading Comprehension Competition

To facilitate the application of machine reading comprehension, the 28th Research Institute of China Electronics Technology Group Corporation organized a Chinese machine reading comprehension competition, the LES Cup Challenge, in October 2018. The competition introduced a large dataset of long articles with imperfectly labelled data, and therefore challenged the state-of-the-art methods in this area. We propose an ensemble model of four novel recurrent neural networks, which ranked in the top 2% among more than 250 teams (97 teams successfully submitted results), mainly from top universities and AI companies in China, and won the third prize (3000 USD) of the competition.

Jun He, Yongjing Cheng, Min Wang, Jingyu Xie, Wei Xie, Rui Su, Shandong Yuan, Yao Cui
Dependent Multilevel Interaction Network for Natural Language Inference

Neural networks have attracted great attention for natural language inference in recent years. Interactions between the premise and the hypothesis have been shown to be effective in improving the representations. Existing methods mainly focus on a single interaction, while multiple interactions have not been well studied. In this paper, we propose a Dependent Multilevel Interaction (DMI) network that models multiple interactions between the premise and the hypothesis to boost the performance of natural language inference. Specifically, a single-interaction unit (SIU) structure with a novel combining attention mechanism is presented to capture features in an interaction. Then, we cascade a series of SIUs in a multilevel interaction layer to obtain more comprehensive features. Experiments on two benchmark datasets, SciTail and SNLI, show the effectiveness of our proposed model. Our model outperforms state-of-the-art approaches on the SciTail dataset without using any external resources. On the SNLI dataset, our model also achieves competitive results.

Yun Li, Yan Yang, Yong Deng, Qinmin Vivian Hu, Chengcai Chen, Liang He, Zhou Yu
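
The SIU structure is not detailed in the abstract above; the following is a minimal PyTorch sketch of one plausible interaction level between premise and hypothesis (cross-attention plus a combining projection). All module names, dimensions, and the combination scheme are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleInteractionUnit(nn.Module):
    """Hypothetical single-interaction unit: cross-attention from the premise
    onto the hypothesis, followed by a combination of attended and original
    features (a common formulation, used here only for illustration)."""
    def __init__(self, dim):
        super().__init__()
        self.combine = nn.Linear(4 * dim, dim)

    def forward(self, premise, hypothesis):
        # premise: (B, Lp, D), hypothesis: (B, Lh, D)
        scores = torch.bmm(premise, hypothesis.transpose(1, 2))      # (B, Lp, Lh)
        attended = torch.bmm(F.softmax(scores, dim=-1), hypothesis)  # (B, Lp, D)
        merged = torch.cat([premise, attended,
                            premise - attended, premise * attended], dim=-1)
        return torch.relu(self.combine(merged))                      # (B, Lp, D)

# Cascading several units, as the multilevel interaction layer suggests:
units = nn.ModuleList([SingleInteractionUnit(300) for _ in range(3)])
p = torch.randn(2, 20, 300)   # toy premise batch
h = torch.randn(2, 15, 300)   # toy hypothesis batch
for unit in units:
    p = unit(p, h)            # each level refines the premise representation
```
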
Learning to Explain Chinese Slang Words

The explosive development of social media has generated a large number of slang words on Chinese social networks. The appearance of Chinese slang words affects the accuracy of reading comprehension and word segmentation tasks. In this paper, we propose, for the first time, to explain Chinese slang words automatically. Unlike dictionary lookup, we use a novel neural network called DCEAnn (a Dual Character-level Encoder using an Attention-based neural network) for this task. One encoder encodes the slang word and its phonetics to learn the word representation; the other encodes an example sentence containing the slang word to enrich its semantic information. In addition, we release the first public dataset for this task to address the absence of a parallel corpus for model training. Manual evaluation of the experimental results shows that our model can generate reasonable explanations. Furthermore, we find that our model performs even better on network digital language, slang that consists only of numbers. Specifically, we obtain the state-of-the-art result on Chinese slang word interpretation with a BLEU score of 23.64, 3.59 higher than our baseline, and the state-of-the-art result on network digital language interpretation with a BLEU score of 54.23, 3.18 higher than our baseline.

Chuanrun Yi, Dong Wang, Chunyu He, Ying Sha
Attention-Based Improved BLSTM-CNN for Relation Classification

Relation classification, a foundational task for many other natural language processing (NLP) tasks, has attracted much attention in recent years. In this paper, we propose a novel network architecture called Attention-Based Improved Bidirectional Long Short-Term Memory and Convolutional Neural Network (AI-BLSTM-CNN) for this task. Specifically, we employ an improved BLSTM that makes the most of sequential context information and word information to obtain temporal features and high-level contextual representations. In addition, an attention mechanism is applied to the improved BLSTM so that it automatically focuses on the segments of a sentence related to the relation. Finally, we take advantage of a CNN to capture locally important features for relation classification. Experimental results on the SemEval-2010 Task 8 and KBP37 benchmark datasets show that AI-BLSTM-CNN achieves better performance than the majority of existing methods.

Qifeng Xiao, Ming Gao, Shaochun Wu, Xiaoqi Sun
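
As a rough illustration of the BLSTM, attention, and CNN pipeline described above, here is a hedged PyTorch sketch; the layer sizes, the pooling choice, and the specific improvements to the BLSTM are assumptions and not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BLSTMAttnCNN(nn.Module):
    """Rough sketch of a BLSTM -> attention -> CNN relation classifier.
    The 'improved' BLSTM details from the paper are not reproduced here."""
    def __init__(self, vocab, emb=100, hid=128, n_rel=19):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.blstm = nn.LSTM(emb, hid, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hid, 1)
        self.conv = nn.Conv1d(2 * hid, 100, kernel_size=3, padding=1)
        self.out = nn.Linear(100, n_rel)

    def forward(self, tokens):
        h, _ = self.blstm(self.emb(tokens))        # (B, L, 2H) contextual features
        a = torch.softmax(self.attn(h), dim=1)     # (B, L, 1) attention over tokens
        h = h * a                                  # re-weight relation-relevant segments
        c = F.relu(self.conv(h.transpose(1, 2)))   # (B, 100, L) local n-gram features
        pooled = c.max(dim=2).values               # max-over-time pooling
        return self.out(pooled)                    # relation logits

logits = BLSTMAttnCNN(vocab=5000)(torch.randint(0, 5000, (4, 40)))
```
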
An Improved Method of Applying a Machine Translation Model to a Chinese Word Segmentation Task

In recent years, a new line of research has emerged that treats Chinese word segmentation (CWS) as a machine translation (MT) problem. However, directly applying an MT model to the CWS task introduces translation errors and results in poor word segmentation. In this paper, we propose a novel method named Translation Correcting to solve this problem. Based on the differences between CWS and MT, Translation Correcting eliminates translation errors by utilizing the information of the sentence to be segmented during the translation process. Consequently, the word segmentation performance is considerably improved. Additionally, we obtain a new model, called CWSTransformer, by improving the MT model Transformer with Translation Correcting. Our experiments compare the performance of CWSTransformer, Transformer, and the previous translation-based CWS model on the benchmark datasets PKU and MSR. The experimental results show that CWSTransformer outperforms both Transformer and the previous translation-based CWS model.

Yuekun Wei, Binbin Qu, Nan Hu, Liu Han
Interdependence Model for Multi-label Classification

The multi-label classification problem is a supervised learning problem that aims to predict multiple labels for each data instance. One of the key issues in designing multi-label learning approaches is how to incorporate dependencies among different labels. In this study, we propose a new approach called the interdependence model, which consists of a set of single-label predictors each of which predicts a particular label using the other labels. The proposed model can directly consider label interdependencies by reusing arbitrary conventional probabilistic models for single-label classification. We consider three prediction methods and one accelerated method for making predictions with the interdependence model. Experiments show the superior prediction performance of the proposed methods in several evaluation metrics, especially when there is a large number of candidate labels or when labels are partially given in the test phase.

Kosuke Yoshimura, Tomoaki Iwase, Yukino Baba, Hisashi Kashima
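
A toy version of the core idea above, each label predicted from the features together with the other labels, can be written with scikit-learn; the three prediction methods and the accelerated method from the paper are not reproduced here, and the iterative refinement below is only one plausible way to make predictions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class InterdependenceModel:
    """Toy interdependence model: one per-label classifier whose input is the
    feature vector concatenated with all *other* labels; prediction iterates
    from an initial guess until the label vector (hopefully) stabilizes."""
    def __init__(self, n_labels):
        self.n_labels = n_labels
        self.models = [LogisticRegression(max_iter=1000) for _ in range(n_labels)]

    def fit(self, X, Y):
        for j, m in enumerate(self.models):
            others = np.delete(Y, j, axis=1)          # ground-truth other labels
            m.fit(np.hstack([X, others]), Y[:, j])
        return self

    def predict(self, X, n_iter=10):
        Y = np.zeros((X.shape[0], self.n_labels))     # initial guess: all zeros
        for _ in range(n_iter):                       # iterative refinement
            for j, m in enumerate(self.models):
                others = np.delete(Y, j, axis=1)
                Y[:, j] = m.predict(np.hstack([X, others]))
        return Y

X = np.random.randn(200, 10)
Y = (np.random.rand(200, 4) > 0.5).astype(int)
preds = InterdependenceModel(4).fit(X, Y).predict(X)
```
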
Combining Deep Learning and (Structural) Feature-Based Classification Methods for Copyright-Protected PDF Documents

This document describes the implementation of a copyright classification process for user-contributed Portable Document Format (PDF) documents. The implementation employs two ways to classify documents as copyright-protected or non-copyright-protected: first, using structural features extracted from the document metadata, content and underlying document structure; and second, by turning the documents into images and using their pixels to generate features for semi-supervised deep convolutional networks.

Renato Garita Figueiredo, Kai-Uwe Kühnberger, Gordon Pipa, Tobias Thelen

Sentiment Classification

Frontmatter
Collaborative Attention Network with Word and N-Gram Sequences Modeling for Sentiment Classification

Current state-of-the-art models for sentiment classification are CNN-RNN-based models. These models combine CNN and RNN in two ways: parallel models use the CNN to capture n-grams and the RNN to model word sequences, while serial models feed n-grams into the RNN to model n-gram sequences. However, these models differ from the way humans read text: intuitively, humans read by capturing semantic elements made up of both words and n-grams. To tackle this problem, we propose a collaborative attention network with word and n-gram sequence modeling. Our model jointly processes sequences at both word and n-gram granularity to form a text embedding with collaborative attention. It utilizes an LSTM encoder to capture long-term dependencies among words and a CNN-LSTM component to capture long-term dependencies among n-grams in the same text. We then combine these two parts via an attention mechanism to highlight keywords in sentences. Experimental results show that our model outperforms other state-of-the-art CNN-RNN-based models on several public sentiment classification datasets.

Junwei Bao, Liang Zhang, Bo Han
Targeted Sentiment Classification with Attentional Encoder Network

Targeted sentiment classification aims at determining the sentiment polarity towards specific targets. Most previous approaches model context and target words with RNNs and attention. However, RNNs are difficult to parallelize, and truncated backpropagation through time makes it hard to remember long-term patterns. To address this issue, this paper proposes an Attentional Encoder Network (AEN), which eschews recurrence and employs attention-based encoders to model the interaction between context and target. We raise the label unreliability issue and introduce label smoothing regularization. We also apply pre-trained BERT to this task and obtain new state-of-the-art results. Experiments and analysis demonstrate the effectiveness and light weight of our model.

Youwei Song, Jiahai Wang, Tao Jiang, Zhiyue Liu, Yanghui Rao
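
The label smoothing regularization mentioned in the AEN abstract is a standard technique and can be sketched in a few lines of PyTorch; the smoothing factor below is a placeholder, not the paper's setting.

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, target, eps=0.1):
    """Cross-entropy against a smoothed target distribution:
    (1 - eps) on the gold class, eps spread uniformly over all classes."""
    n_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    smooth = torch.full_like(log_probs, eps / n_classes)
    smooth.scatter_(1, target.unsqueeze(1), 1.0 - eps + eps / n_classes)
    return -(smooth * log_probs).sum(dim=-1).mean()

loss = label_smoothing_loss(torch.randn(8, 3), torch.randint(0, 3, (8,)))
```
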
Capturing User and Product Information for Sentiment Classification via Hierarchical Separated Attention and Neural Collaborative Filtering

Sentiment classification, which aims to predict a user's sentiment about a product, is becoming increasingly useful and important. Some neural network methods have achieved improvements by capturing user and product information. However, these methods fail to incorporate user preferences and product characteristics reasonably and effectively. Moreover, they use only the explicit influences observed in texts and ignore the implicit interaction between user and product, which cannot be observed in texts. In this paper, we propose a novel neural network model, HUPSA-NCF (Hierarchical User Product Separated Attention and Neural Collaborative Filtering Network), to address these issues. First, our model uses hierarchical user and product separated attention over a BiLSTM to incorporate user preferences and product characteristics into specific text representations. Second, it uses neural collaborative filtering to capture the implicit interaction between user and product. Finally, it makes full use of both explicit and implicit information for the final classification. Experimental results show that our model outperforms state-of-the-art methods on the IMDB and Yelp datasets.

Minghui Yan, Changjian Wang, Ying Sha
Imbalanced Sentiment Classification Enhanced with Discourse Marker

Imbalanced data is common in the real world, especially in sentiment-related corpora, making it difficult to train a classifier to distinguish the latent sentiment in text data. We observe that humans often express a transition in emotion between two adjacent discourses with discourse markers like "but", "though", "while", etc., and that the head discourse and the tail discourse usually indicate opposite emotional tendencies. Based on this observation, we propose a novel plug-and-play method that first samples discourses according to transitional discourse markers and then validates their sentiment polarities with the help of a pre-trained attention-based model. Our method increases sample diversity, yielding an expanded dataset with a relatively low imbalance ratio, and can serve as an upstream preprocessing step for data augmentation. We conduct experiments on three public sentiment datasets with several frequently used algorithms. The results show that our method is consistently effective, even in highly imbalanced scenarios, and can easily be integrated with oversampling methods to boost performance on imbalanced sentiment classification.

Tao Zhang, Xing Wu, Meng Lin, Jizhong Han, Songlin Hu
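
The discourse-sampling step described above might look roughly like the following toy Python sketch; the marker list and splitting rule are assumptions, and the pre-trained attention-based model that validates the polarity of each sampled discourse is omitted.

```python
import re

TRANSITIONAL_MARKERS = ["but", "though", "while", "however"]  # assumed marker list

def split_on_markers(sentence):
    """Split a sentence into head/tail discourses around the first
    transitional marker, if one is present."""
    pattern = r"\b(" + "|".join(TRANSITIONAL_MARKERS) + r")\b"
    parts = re.split(pattern, sentence, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) == 3:
        head, _, tail = parts
        return head.strip(), tail.strip()
    return None

def sample_discourses(corpus):
    """Collect candidate discourses; their polarities would then be assigned
    and validated by a pre-trained attention-based sentiment model (omitted),
    since head and tail usually carry opposite emotions."""
    candidates = []
    for text in corpus:
        split = split_on_markers(text)
        if split:
            candidates.extend(split)
    return candidates

print(sample_discourses(["the food was cheap but the service was awful"]))
```
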
Revising Attention with Position for Aspect-Level Sentiment Classification

As a fine-grained classification task, aspect-level sentiment classification aims at determining the sentiment polarity towards a particular target in a sentence. The key point of this task is to distinguish target-related words from target-unrelated words. To this end, attention mechanisms have been introduced, assigning high attention weights to target-related words and ignoring target-unrelated words according to the semantic relationships between context words and the target. However, existing work does not explicitly take into account the position information of context words when calculating the attention weights. In fact, position information is very important for detecting the relevance of a word to the target: words closer to the target usually contribute more to determining the sentiment polarity. In this work, we propose a novel approach that combines position information with the attention mechanism. We derive a position distribution from the distances between context words and the target, then use this distribution to modify the attention weight distribution. In addition, considering that sentiment polarity is usually expressed by a phrase, we use a CNN, which can capture local n-gram features, for sentiment classification. We test our model on two public benchmark datasets from SemEval 2014, and the experimental results demonstrate the effectiveness of our approach.

Dong Wang, Tingwen Liu, Bin Wang
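
One simple way to realize the idea of modifying attention weights with a position distribution is sketched below in NumPy; the Gaussian decay and its width are illustrative assumptions, not the weighting actually used in the paper.

```python
import numpy as np

def position_weighted_attention(scores, word_pos, target_pos, sigma=3.0):
    """Toy illustration: semantic attention scores are modulated by a position
    distribution that decays with distance to the target, then renormalized."""
    dist = np.abs(np.asarray(word_pos) - target_pos)
    position_weight = np.exp(-dist ** 2 / (2 * sigma ** 2))  # closer words weigh more
    raw = np.exp(scores - scores.max()) * position_weight     # softmax * position prior
    return raw / raw.sum()

attn = position_weighted_attention(
    scores=np.array([0.2, 1.5, 0.3, 2.0, 0.1]),  # semantic relevance to the target
    word_pos=[0, 1, 2, 3, 4],
    target_pos=3,
)
print(attn)  # words near position 3 receive boosted attention
```
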
Surrounding-Based Attention Networks for Aspect-Level Sentiment Classification

Aspect-level sentiment classification aims to identify the polarity towards a target word in a sentence. Studies on sentiment classification have found that a target's surrounding words have a great impact on the polarity expressed towards the target. However, existing neural-network-based models either depend on expensive phrase-level annotation or do not fully exploit the association between the context words and the target. In this paper, we propose to model the influences of the target's surrounding words via two unidirectional long short-term memory neural networks, and introduce a target-based attention mechanism to discover the underlying relationship between the target and the context words. Empirical results on the SemEval 2014 datasets show that our approach outperforms many competitive sentiment classification baselines. Detailed analysis demonstrates the effectiveness of the proposed surrounding-based long short-term memory networks and the target-based attention mechanism.

Yuheng Sun, Xianchen Wang, Hongtao Liu, Wenjun Wang, Pengfei Jiao

Human Reaction Prediction

Frontmatter
Mid Roll Advertisement Placement Using Multi Modal Emotion Analysis

In recent years, owing to the ever-increasing consumer base of video content on the internet, promoting business via advertising within videos has become a powerful strategy. Mid-roll ads are video ads that are played within the content of a video being watched by the user. While a lot of research has been done on analyzing the context of a video to suggest relevant ads, little has been done on placing the ads effectively so that they do not degrade the user's experience. In this paper, we propose a new model that suggests at which particular spot in a video an advertisement should be placed so that most people will watch more of the ad. This is done using emotion, text, action, audio and video analysis of different scenes of the video under consideration.

Sumanu Rawat, Aman Chopra, Siddhartha Singh, Shobhit Sinha
DCAR: Deep Collaborative Autoencoder for Recommendation with Implicit Feedback

In recent years, deep neural networks have been widely applied to recommender systems. Although deep neural networks have been explored extensively for the collaborative filtering problem in item recommendation, most existing methods employ a similar loss function, i.e., the prediction loss of user-item interactions, and only change the form of the input, which may limit the model's performance. To address this problem, we present a novel framework named DCAR, short for Deep Collaborative Autoencoder for Recommendation. Specifically, with the implicit feedback matrix as input, we employ an autoencoder module to obtain the latent representations of users and items respectively. Then, to predict the matching score of corresponding user-item pairs, an interaction prediction module is designed based on a neural network architecture. The two parts are coupled together and learned with alternating training. We conduct extensive experiments on several real-world datasets, and the results empirically verify the superior performance of DCAR on item recommendation. The code related to this paper is available at: https://github.com/strange-jiong/DCAR .

Jiong Wang, Neng Gao, Jia Peng, Jingjie Mo
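
The released code is linked above; independently of it, here is a minimal PyTorch sketch of the two components the abstract describes, row/column autoencoders over the implicit feedback matrix and an MLP interaction predictor, with all layer sizes chosen arbitrarily.

```python
import torch
import torch.nn as nn

class RowAutoencoder(nn.Module):
    """Encodes one row (user) or column (item) of the implicit-feedback
    matrix into a low-dimensional latent representation."""
    def __init__(self, n_inputs, latent=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_inputs, 256), nn.ReLU(),
                                 nn.Linear(256, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                 nn.Linear(256, n_inputs))

    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)              # latent code and reconstruction

class InteractionPredictor(nn.Module):
    """MLP scoring a (user latent, item latent) pair."""
    def __init__(self, latent=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * latent, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, zu, zi):
        return torch.sigmoid(self.mlp(torch.cat([zu, zi], dim=-1)))

# Toy forward pass: 10 users x 8 items implicit feedback matrix
R = (torch.rand(10, 8) > 0.7).float()
user_ae, item_ae = RowAutoencoder(8), RowAutoencoder(10)
zu, _ = user_ae(R)        # user latents from rows
zi, _ = item_ae(R.t())    # item latents from columns
score = InteractionPredictor()(zu[0], zi[3])   # predicted match for user 0, item 3
```
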
Jointly Learning to Detect Emotions and Predict Facebook Reactions

The growing ubiquity of Social Media data offers an attractive perspective for improving the quality of machine learning-based models in several fields, ranging from Computer Vision to Natural Language Processing. In this paper we focus on Facebook posts paired with “reactions” of multiple users, and we investigate their relationships with classes of emotions that are typically considered in the task of emotion detection. We are inspired by the idea of introducing a connection between reactions and emotions by means of First-Order Logic formulas, and we propose an end-to-end neural model that is able to jointly learn to detect emotions and predict Facebook reactions in a multi-task environment, where the logic formulas are converted into polynomial constraints. Our model is trained using a large collection of unsupervised texts together with data labeled with emotion classes and Facebook posts that include reactions. An extended experimental analysis that leverages a large collection of Facebook posts shows that the tasks of emotion classification and reaction prediction can both benefit from their interaction.

Lisa Graziani, Stefano Melacci, Marco Gori
Discriminative Feature Learning for Speech Emotion Recognition

It is encouraging to see that deep neural network based speech emotion recognition (DNN-SER) models have achieved the state of the art on public datasets. However, the performance of DNN-SER models is limited by insufficient training data, emotion ambiguity and class imbalance. Studies show that, without large-scale training data, it is hard for a DNN-SER model with cross-entropy loss to learn discriminative features by mapping speech segments to their category labels. In this study, we propose a deep metric learning based DNN-SER model that facilitates discriminative feature learning by constraining the feature embeddings in the feature space. As a proof of concept, we take a four-hidden-layer DNN as our backbone for simplicity of implementation. Specifically, an emotion identity matrix is formed from one-hot label vectors as supervision information, while an emotion embedding matrix is formed from the embedding vectors generated by the DNN. An affinity loss is designed on these two matrices to simultaneously maximize the inter-class separability and intra-class compactness of the embeddings. Moreover, to mitigate the class imbalance problem, the focal loss is introduced to reduce the adverse effect of the many well-classified samples and to focus more on the few misclassified ones. Our proposed DNN-SER model is jointly trained with the affinity loss and the focal loss. Extensive experiments have been conducted on two well-known emotional speech datasets, EMO-DB and IEMOCAP. Compared to the DNN-SER baseline, the unweighted accuracy (UA) on EMO-DB and IEMOCAP increased relatively by 10.19% and 10%, respectively. Moreover, the confusion matrix of the test results on EMO-DB shows that the accuracy of the most frequently confused emotion category, 'Happiness', increased relatively by 33.17%, and the accuracy of the emotion category with the fewest samples, 'Disgust', increased relatively by 13.62%. These results validate the effectiveness of our proposed DNN-SER model and give evidence that the affinity loss and focal loss help to learn more discriminative features.

Yuying Zhang, Yuexian Zou, Junyi Peng, Danqing Luo, Dongyan Huang
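
Of the two losses used above, the focal loss is a standard, easily sketched component (the paper-specific affinity loss is not reproduced). A minimal PyTorch version, assuming a multi-class softmax setting:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, gamma=2.0):
    """Focal loss: down-weights well-classified samples by (1 - p_t)^gamma
    so training focuses on hard, often minority-class, examples."""
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, target.unsqueeze(1)).squeeze(1)  # log p of gold class
    pt = log_pt.exp()
    return (-(1.0 - pt) ** gamma * log_pt).mean()

loss = focal_loss(torch.randn(16, 4), torch.randint(0, 4, (16,)))
```
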

Judgment Prediction

Frontmatter
A Judicial Sentencing Method Based on Fused Deep Neural Networks

Nowadays, the judicial system struggles to satisfy the growing judicial needs of the people; therefore, the introduction of artificial intelligence into the judicial field is an inevitable trend. This paper incorporates deep learning into intelligent judicial sentencing and proposes a comprehensive network fusion model based on massive legal documents. The proposed method combines multiple networks, e.g., recurrent neural networks and convolutional neural networks, in the sentencing prediction procedure. Specifically, we use text classification and post-classification regression to predict the defendant's conviction, the articles of law related to the case, and the prison term. Moreover, we use a simulated gradient descent method to build the fusion model. Experimental results on legal document datasets justify the effectiveness of the proposed method in sentencing prediction. The fused network model outperforms each individual model in terms of accuracy and stability when predicting the conviction, law articles and prison term.

Yuhan Yin, Hongtian Yang, Zhihong Zhao, Songyu Chen
SECaps: A Sequence Enhanced Capsule Model for Charge Prediction

Automatic charge prediction aims to predict the appropriate final charges from the fact descriptions of a given criminal case. It plays a critical role in assisting judges and lawyers to improve the efficiency of legal decisions, and has thus received much attention. Nevertheless, most existing works on automatic charge prediction perform adequately on high-frequency charges but are not yet capable of predicting few-shot charges with limited cases. In this paper, we propose a Sequence Enhanced Capsule model, dubbed the SECaps model, to relieve this problem. Specifically, following the work on capsule networks, we propose the seq-caps layer, which considers sequence information and spatial information of legal texts simultaneously. Then we design an attention residual unit, which provides auxiliary information for charge prediction. In addition, the SECaps model introduces the focal loss, which relieves the problem of imbalanced charges. Compared with state-of-the-art methods, the SECaps model obtains absolute improvements of 4.5% and 6.4% in Macro F1 on Criminal-S and Criminal-L, respectively. The experimental results consistently demonstrate the superiority and competitiveness of the SECaps model.

Congqing He, Li Peng, Yuquan Le, Jiawei He, Xiangyu Zhu
Learning to Predict Charges for Judgment with Legal Graph

Automatic charge prediction aims to predict the result of a judgment from the fact descriptions of criminal cases, and is an important application of intelligent legal judgment systems. Generally, this task can be formalized as a multi-label prediction task (i.e., fact descriptions are treated as inputs and charges as labels). Most previous works on this task exploit informative features from fact descriptions for prediction while ignoring the charge space information (e.g., the co-occurrence relations of charges or the descriptions of charges). To better explore the charge space, in this paper we propose a Legal Graph Network (LGN for short) to solve this problem. Specifically, LGN fuses all the charge information (i.e., charge descriptions and correlations) into a unified legal graph. Based on the legal graph, four types of charge relations are designed to capture informative relations among charges. LGN then embeds these relations to learn robust charge representations. Finally, both the charge representations and the fact representations are fed into an attention-based neural network for prediction. Experimental results on three datasets show that the proposed model significantly outperforms state-of-the-art multi-label classification methods.

Si Chen, Pengfei Wang, Wei Fang, Xingchen Deng, Feng Zhang
A Recurrent Attention Network for Judgment Prediction

Judgment prediction is a critical technique in the legal field. Judges usually scan both the fact descriptions and the articles repeatedly to select valuable information for a correct match (i.e., to determine the correct articles for a given fact description). Previous works only match the semantics of fact descriptions against the corresponding articles, while the repeated semantic interactions between fact descriptions and articles are ignored, so performance may be limited. In this paper, we propose a novel Recurrent Attention Network (RAN for short) to address this issue. Specifically, RAN utilizes an LSTM to obtain both fact description and article representations, and then a recurrent process is designed to model the iterative interactions between fact descriptions and articles to make a correct match. Experimental results on real-world datasets demonstrate that our proposed model achieves significant improvements over state-of-the-art methods.

Ze Yang, Pengfei Wang, Lei Zhang, Linjun Shou, Wenwen Xu

Text Generation

Frontmatter
Symmetrical Adversarial Training Network: A Novel Model for Text Generation

Text generation has always been a core issue in the field of natural language processing. Over the past decades, Generative Adversarial Networks (GANs) have proven their great potential in generating realistic synthetic data, performing competitively in domains such as computer vision. However, the discrete nature of text limits the application of GANs in natural language processing. In this paper, we propose a novel Symmetrical Adversarial Training Network (SATN), which employs a symmetrical text comparison mechanism to generate more realistic and coherent text samples. In the SATN, a Deep Attention Similarity Model (DASM) is designed to extract a fine-grained original-synthetic sentence feature match loss for improving the performance of the generative network. With DASM, the SATN can identify the differences between sentences at the word level and pay attention to relevant meaningful words. Meanwhile, we utilize the DASM loss to compensate for the defect of the objective function in adversarial training. Our experiments demonstrate significant improvements in evaluation.

Yongzhen Gao, ChongJun Wang
A Novel Image Captioning Method Based on Generative Adversarial Networks

Although RNN-based image captioning methods have made great progress in recent years, they often lack variability and ignore minor details. In this paper, a novel image captioning method based on Generative Adversarial Networks is proposed, which improves the naturalness and diversity of image descriptions. In this method, a matcher is added to the generator to capture image features that do not appear in the reference description and to produce descriptions conditioned on the image, while a discriminator assesses how well a description fits the visual content. It is noteworthy that training such a sequence generator is nontrivial. Experiments on MSCOCO and Flickr30k show that the method performs competitively against real people in our user study and outperforms other methods on various tasks.

Yang Fan, Jungang Xu, Yingfei Sun, Yiyu Wang
Quality-Diversity Summarization with Unsupervised Autoencoders

This paper introduces a novel perspective on unlabeled-data-driven technology for extractive summarization. Because unsupervised autoencoders, combined with neural network language models, help capture deep semantic features for sentence quality, we propose to integrate autoencoders with a sampling method based on determinantal point processes (DPPs) [1] to extract diverse, high-quality sentences and generate brief summaries. This fusion of unsupervised autoencoders and DPP sampling has not been attempted before. We illustrate the advantages of this approach against statistics-based approaches through experiments in a multilingual setting on single-document and multi-document summarization tasks. Our algorithms, evaluated with the ROUGE F-measure [2], obtain better scores for several languages on the MMS-2015 and MSS-2015 datasets.

Lei Li, Zuying Huang, Natalia Vanetik, Marina Litvak
Conditional GANs for Image Captioning with Sentiments

The area of automatic image captioning has witnessed much progress recently. However, generating captions with sentiment, a common dimension in human-generated captions, still remains a challenge. This work presents a generative approach that combines sentiment (positive/negative) and variation for caption generation. The presented approach consists of a Generative Adversarial Network which takes as input an image and a binary vector indicating the sentiment of the caption to be generated. We evaluate our model quantitatively on a state-of-the-art image caption dataset and qualitatively using a crowdsourcing platform. Our results, along with the human evaluation, show that we succeed competitively in the task of creating variation and sentiment in image captions.

Tushar Karayil, Asif Irfan, Federico Raue, Jörn Hees, Andreas Dengel
Neural Poetry: Learning to Generate Poems Using Syllables

Motivated by recent progress on machine learning-based models that learn artistic styles, in this paper we focus on the problem of poem generation. This is a challenging task in which the machine has to capture the linguistic features that strongly characterize a certain poet, as well as the semantics of the poet's production, which are influenced by his personal experiences and literary background. Since poetry is constructed using syllables, which regulate the form and structure of poems, we propose a syllable-based neural language model, and we describe a poem generation mechanism that is designed around the poet's style and automatically selects the most representative generations. The poetic work of a target author is usually not enough to successfully train modern deep neural networks, so we propose a multi-stage procedure that exploits non-poetic works of the same author, as well as other publicly available large corpora, to learn the syntax and grammar of the target language. We focus on the Italian poet Dante Alighieri, widely famous for his Divine Comedy. A quantitative and qualitative experimental analysis of the generated tercets is reported, in which we included expert judges with a strong background in humanistic studies. The generated tercets are frequently considered to be real by a generic population of judges, with a relative difference of 56.25% with respect to the ones really authored by Dante, and expert judges perceived Dante's style and rhymes in the generated text.

Andrea Zugarini, Stefano Melacci, Marco Maggini
Exploring the Advantages of Corpus in Neural Machine Translation of Agglutinative Language

This study addresses the mismatch between rigid models and varied morphology in machine translation of agglutinative languages in two ways. (1) A free-granularity preprocessing strategy is proposed to construct a multi-granularity mixed input. (2) A value iteration network is further added to the reinforcement learning model, and the rewards of each granularity input are converted into decision values, so that model training has a clearer target and higher efficiency. The experimental results show that our approach achieves significant improvements on two representative agglutinative-language machine translation tasks, low-resource Mongolian→Chinese and common Japanese→English, and greatly shortens training time.

Yatu Ji, Hongxu Hou, Nier Wu, Junjie Chen
RL Extraction of Syntax-Based Chunks for Sentence Compression

Sentence compression involves selecting key information present in the input and rewriting this information into a short, coherent text. While dependency parses have often been used for this purpose, we propose to exploit such syntactic information within a modern reinforcement learning-based extraction model. Furthermore, compared to other approaches that include syntactic features into deep learning models, we design a model that has better explainability properties and is flexible enough to support various shallow syntactic parsing modules. More specifically, we linearize the syntactic tree into the form of overlapping text segments, which are then selected with reinforcement learning and regenerated into a compressed form. Hence, despite relying on extractive components, our model is also able to handle abstractive summarization. We explore different ways of selecting subtrees from the dependency structure of the input sentence and compare the results of various models on the Gigaword corpus.

Hoa T. Le, Christophe Cerisara, Claire Gardent

Sound Processing

Frontmatter
Robust Sound Event Classification with Local Time-Frequency Information and Convolutional Neural Networks

How to effectively and accurately identify sound events in a real-world noisy environment is still a challenging problem. Traditional methods for robust sound event classification generally perform well in clean conditions but degrade in noisy situations. Biological evidence shows that local temporal and spectral information can be utilized for processing noise-corrupted signals, motivating our novel approach for sound recognition, which combines this information with a convolutional neural network (CNN), one of the most popular methods in acoustic processing. We use key-points (KPs) to construct a robust and sparse representation of the sound, followed by a CNN trained as a classifier. The RWCP database is used to evaluate the performance of our system. Our results show that the proposed KP-CNN system is effective and efficient for robust sound event classification in both mismatched and multi-condition environments.

Yanli Yao, Qiang Yu, Longbiao Wang, Jianwu Dang
Neuro-Spectral Audio Synthesis: Exploiting Characteristics of the Discrete Fourier Transform in the Real-Time Simulation of Musical Instruments Using Parallel Neural Networks

Two main approaches are currently prevalent in the digital emulation of musical instruments: manipulation of pre-recorded samples and real-time synthesis techniques, generally based on physical models of varying degrees of accuracy. Concerning the first, while the processing power of present-day computers enables their use in real time, many restrictions arising from this sample-based design persist, the huge on-disk space requirements and the stiffness of musical articulations being the most prominent. On the other side of the spectrum, pure synthesis approaches, while offering greater flexibility, fail to capture and reproduce certain nuances central to the verisimilitude of the generated sound, offering a dry, synthetic output at a high computational cost. We propose a method in which ensembles of lightweight neural networks working in parallel learn, from crafted frequency-domain features of an instrument's sound spectra, to reproduce an arbitrary instrument's voice and articulations realistically and efficiently. We find that our method, while retaining perceptual sound quality on par with sampled approaches, exhibits 1/10 of the latency of industry-standard real-time synthesis algorithms and 1/100 of the disk space requirements of industry-standard sample-based digital musical instruments. This method can therefore serve as a basis for more efficient implementations in dedicated devices, such as keyboards and electronic drum kits, and in general-purpose platforms, such as desktops and tablets, or open-source hardware such as Arduino and Raspberry Pi. From a conceptual point of view, this work highlights the advantages of a closer integration of machine learning with other subjects, especially in the endeavor of new product development. Exploiting the synergy between neural networks, digital signal processing techniques and physical modelling, we illustrate the proposed method via the implementation of two virtual instruments: a conventional grand piano and a hybrid stringed instrument.

Carlos Tarjano, Valdecy Pereira
Ensemble of Convolutional Neural Networks for P300 Speller in Brain Computer Interface

A Brain Computer Interface (BCI) speller allows humans to spell characters directly using eye gazes, thereby building communication between the human brain and a computer. Convolutional Neural Networks (CNNs) have shown better ability than traditional machine learning methods to increase the character spelling accuracy of the BCI speller. Unfortunately, current CNNs cannot learn well the features related to the target signal of the BCI speller. This issue prevents these CNNs from further improving character spelling accuracy. To address this issue, we propose a network which combines our two proposed CNNs with an existing CNN. The three CNNs of our network extract different features related to the target BCI signal, and the network uses the ensemble of the features extracted by these CNNs for BCI character spelling. Experimental results on three benchmark datasets show that our network outperforms other methods in most cases, with a significant spelling accuracy improvement of up to 38.72%. In addition, the communication speed of the P300 speller based on our network is up to 2.56 times faster than that of P300 spellers based on other methods.

Hongchang Shan, Yu Liu, Todor Stefanov

Time Series and Forecasting

Frontmatter
Deep Recurrent Neural Networks with Nonlinear Masking Layers and Two-Level Estimation for Speech Separation

Over the past few decades, monaural speech separation has always been an interesting but challenging problem. The goal of speech separation is to separate a specific target speech from some background interferences and it has been treated as a signal processing problem traditionally. In recent years, with the rapid advances of deep learning techniques, deep learning has made a great breakthrough in speech separation. In this paper, recurrent neural networks (RNNs) which integrate multiple nonlinear masking layers (NMLs) to learn two-level estimation are proposed for speech separation. Experimental results show that our proposed model “RNN + SMMs + 3 NMLs” outperforms the baseline RNN without any mask in all the SDR, SIR and SAR indices, and it also obtains much better SDR and SIR than the RNN simply with original deterministic time-frequency masks.

Jiantao Zhang, Pingjian Zhang
Auto-Lag Networks for Real Valued Sequence to Sequence Prediction

Many machine learning problems involve predicting a sequence of future values of a target variable. State-of-the-art approaches for such use cases involve LSTM-based sequence-to-sequence models. To improve their performance, those models generally use lagged values of the target variable as additional input features, so an appropriate lag factor has to be chosen during feature engineering. This choice often requires business knowledge of the data. Furthermore, state-of-the-art sequence-to-sequence models are not designed to naturally handle hierarchical time series use cases. In this paper, we propose a novel architecture that naturally handles hierarchical time series. The contribution of this paper is thus twofold. First, we show the limitations of classical sequence-to-sequence models for problems involving a real-valued target variable, namely the error accumulation problem, and we propose a novel LSTM-based approach to overcome those limitations. Second, we highlight the limitations of manually selecting fixed lag values to improve the performance of a model. We then use an attention mechanism to introduce dynamic and automatic lag factor selection that overcomes these limitations and requires no business knowledge of the data. We call this architecture the Auto-Lag Network (AL-Net). We finally validate our Auto-Lag Net model against state-of-the-art results.

Gilles Madi Wamba, Nicolas Gaude
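
As a very rough illustration of dynamic lag selection via attention, the sketch below scores each position in a window of past target values against the current decoder state and returns their weighted combination; the scoring scheme and dimensions are assumptions, not the AL-Net architecture.

```python
import torch
import torch.nn as nn

class AutoLagAttention(nn.Module):
    """Toy automatic lag selection: instead of hand-picking one lagged value
    of the target, attend over a window of past values and feed the weighted
    combination to the decoder as a dynamic 'lag feature'."""
    def __init__(self, hidden, window):
        super().__init__()
        self.pos_emb = nn.Embedding(window, hidden)  # one key per lag position

    def forward(self, decoder_state, past_values):
        # decoder_state: (B, H); past_values: (B, W) window of past targets
        B, W = past_values.shape
        keys = self.pos_emb(torch.arange(W))                     # (W, H)
        scores = decoder_state @ keys.t()                        # (B, W)
        weights = torch.softmax(scores, dim=-1)
        return (weights * past_values).sum(dim=1, keepdim=True)  # (B, 1)

lag = AutoLagAttention(hidden=32, window=24)(torch.randn(4, 32), torch.randn(4, 24))
```
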
LSTM Prediction on Sudden Occurrence of Maintenance Operation of Air-Conditioners in Real-Time Pricing Adaptive Control

Predicting the occurrence of embedded maintenance operations in building multi-type air-conditioners is desirable under the Real-Time Pricing (RTP) scheme of the future smart grid. The maintenance operation is a kind of high-priority embedded control for the complicated refrigerant circuit network in an office building. Since it operates suddenly and consumes large amounts of electric power, it becomes a big disturbance from the viewpoint of the RTP control system in the cloud. In this research, we propose a model that forecasts the sudden occurrence of the maintenance operation. Since the occurrence of the operation depends on the refrigerant circuit operation history, the model is implemented as a Long Short-Term Memory (LSTM) neural network. Prediction accuracy was evaluated, and simulation experiments then showed a 27% improvement in the RTP adaptive control result.

Shun Matsukawa, Chuzo Ninagawa, Junji Morikawa, Takashi Inaba, Seiji Kondo
Dynamic Ensemble Using Previous and Predicted Future Performance for Multi-step-ahead Solar Power Forecasting

We consider the task of predicting the solar power generated by a photovoltaic system, for multiple steps ahead, from previous solar power data. We propose DEN-PF, a dynamic heterogeneous ensemble of prediction models, which weights the individual predictions by considering two components – the ensemble member’s error on recent data and its predicted error for the new time points. We compare the performance of DEN-PF with dynamic ensembles using only one of these components, a static ensemble, the single models comprising the ensemble and a baseline. The evaluation is conducted on data for two years, sampled every 5 min, for prediction horizons from 5 to 180 min ahead, under three prediction strategies: direct, iterative and direct-ds, which uses downsampling. The results show the effectiveness of DEN-PF and the benefit of considering both error components for the direct and direct-ds strategies. The most accurate prediction model was DEN-PF using the direct-ds strategy.

Irena Koprinska, Mashud Rana, Ashfaqur Rahman
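
A NumPy sketch of the weighting idea: each ensemble member's weight combines its error on recent data with its predicted error for the new time point. The model that predicts future errors is not shown, and the way the two components are combined here is an assumption.

```python
import numpy as np

def den_pf_weights(recent_errors, predicted_errors, alpha=0.5, eps=1e-8):
    """Combine two error components per ensemble member and turn them into
    normalized weights (lower combined error -> higher weight).
    alpha balances the components; its value here is an assumption."""
    combined = alpha * np.asarray(recent_errors) + (1 - alpha) * np.asarray(predicted_errors)
    inv = 1.0 / (combined + eps)
    return inv / inv.sum()

def ensemble_forecast(member_predictions, recent_errors, predicted_errors):
    w = den_pf_weights(recent_errors, predicted_errors)
    return float(np.dot(w, member_predictions))

# Three members predicting the next 5-min solar power value (toy numbers)
print(ensemble_forecast(member_predictions=[4.1, 3.8, 4.6],
                        recent_errors=[0.30, 0.25, 0.60],
                        predicted_errors=[0.20, 0.35, 0.50]))
```
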
Timage – A Robust Time Series Classification Pipeline

Time series are series of values ordered by time. This kind of data can be found in many real-world settings. Classifying time series is a difficult task and an active area of research. This paper investigates the use of transfer learning in deep neural networks and a 2D representation of time series known as Recurrence Plots. In order to utilize the research done in the area of image classification, where deep neural networks have achieved very good results, we use a Residual Neural Network architecture known as ResNet. As preprocessing of time series is a major part of every time series classification pipeline, the proposed method simplifies this step and requires only a few parameters. For the first time, we propose a method for multi time series classification: training a single network to classify all datasets in the archive. We are among the first to evaluate the method on the latest 2018 release of the UCR archive, a well-established time series classification benchmark.

Marc Wenninger, Sebastian P. Bayerl, Jochen Schmidt, Korbinian Riedhammer
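
The recurrence plot representation that such a pipeline feeds to an image classifier can be generated in a few lines of NumPy; the threshold choice below is an assumption.

```python
import numpy as np

def recurrence_plot(x, threshold=None):
    """Binary (or unthresholded) recurrence plot of a 1-D time series:
    R[i, j] compares the values at times i and j by their distance."""
    x = np.asarray(x, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])   # pairwise distance matrix
    if threshold is None:
        return dist                          # unthresholded "distance image"
    return (dist <= threshold).astype(np.uint8)

series = np.sin(np.linspace(0, 8 * np.pi, 128))
img = recurrence_plot(series, threshold=0.1)  # 128x128 image fed to a CNN such as ResNet
```
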
Prediction of the Next Sensor Event and Its Time of Occurrence in Smart Homes

We present work on sequential sensor events in smart homes, with results on predicting both the next sensor event and its time of occurrence in the same model using a Recurrent Neural Network with Long Short-Term Memory. We implement four configurations for converting binary sensor events and the elapsed time between events into different input sequences. Our dataset was collected from a real home with one resident over a period of 40 weeks and contains data from a set of fifteen sensors including motion, magnetic, and power sensors. When including the time information in the input data, the accuracy of predicting the next sensor event was 84%. In our best implementation, the model is able to predict both the next sensor event and the mean elapsed time to the next event with a peak average accuracy of 80%.

Flávia Dias Casagrande, Jim Tørresen, Evi Zouganeli
Multi-task Learning Method for Hierarchical Time Series Forecasting

A hierarchical time series is a set of time series organized by aggregation constraints, and it is widely used in many real-world applications. Usually, hierarchical time series forecasting is realized with a two-step method, in which all time series are forecast independently and the forecasts are then reconciled to satisfy aggregation consistency. However, these two-step methods have a high computational complexity and are unable to ensure optimal forecasts for all time series. In this paper, we propose a novel hierarchical forecasting approach to solve these problems. Based on multi-task learning, we construct an integrated model that combines features of the bottom-level series and the hierarchical structure. Forecasts of all time series are then output simultaneously and are aggregated consistently. The model has the advantage of utilizing the correlations between time series, and the forecasts are globally optimal with respect to a global loss function. In order to avoid the curse of dimensionality as the number of time series grows, we further learn a sparse model with group sparsity and element-wise sparsity constraints according to the data characteristics. Experimental results on simulated data and tourism data demonstrate that our method has better overall performance while simplifying the forecasting process.

Maoxin Yang, Qinghua Hu, Yun Wang
Demand-Prediction Architecture for Distribution Businesses Based on Multiple RNNs with Alternative Weight Update

Predicting future demand is important for reducing costs, such as under- and over-stocking costs, in the distribution business. To predict item demand, a prediction model such as an autoregressive model directly uses order histories. However, it is difficult to manage the models since, in such an approach, the number of models equals the number of items, and multi-step prediction is not easy to apply. In this research, we propose an asynchronous-updating heterogeneous stacking model (AHSM) based on recurrent neural networks (RNNs). The AHSM has three modules for prediction: a feature extractor, a predictor, and an inner-state generator. By using the inner-state generator, the model enables stable learning and accurate prediction. We applied the AHSM to demand prediction and compared it with other models, i.e., the autoregressive integrated moving average model, Prophet, and an RNN-based model. The results indicate that the AHSM enables accurate demand prediction even in multi-step prediction.

Yuya Okadome, Wenpeng Wei, Ryo Sakai, Toshiko Aizono
A Study of Deep Learning for Network Traffic Data Forecasting

We present a study of deep learning applied to network traffic data forecasting. This is a very important ingredient for network traffic engineering, e.g., intelligent routing, which can optimize network performance, especially in large networks. In a nutshell, we wish to predict, in advance, the bit rate of a transmission, based on low-dimensional connection metadata ("flows") that is available whenever a communication is initiated. Our study makes several genuinely new points. First, it is performed on a large dataset (≈50 million flows), which requires a new training scheme that operates on successive blocks of data since the whole dataset is too large for in-memory processing. Additionally, we are the first to propose and perform a more fine-grained prediction that distinguishes between low, medium and high bit rates instead of just "mice" and "elephant" flows. Lastly, we apply state-of-the-art visualization and clustering techniques to flow data and show that the visualizations are insightful despite the heterogeneous and non-metric nature of the data. We developed a processing pipeline to handle the highly non-trivial acquisition process and allow for proper data preprocessing so that DNNs can be applied to network traffic data. We conduct DNN hyper-parameter optimization as well as feature selection experiments, which show that fine-grained network traffic forecasting is feasible and that domain-dependent data enrichment and augmentation strategies can improve results. An outlook on the fundamental challenges presented by network traffic analysis (data throughput, unbalanced and dynamic classes, changing statistics, outlier detection) concludes the article.

Benedikt Pfülb, Christoph Hardegen, Alexander Gepperth, Sebastian Rieger
Composite Quantile Regression Long Short-Term Memory Network

Based on the quantile long short-term memory (Q-LSTM) model, we consider the comprehensive utilization of multiple quantiles, proposing a simultaneous estimation version of Q-LSTM, composite quantile regression LSTM (CQR-LSTM). The method simultaneously estimates multiple quantile functions instead of estimating them separately; simultaneous estimation allows the quantiles to share strength among one another and yields better predictions. Furthermore, we also propose a novel approach, noncrossing composite quantile regression LSTM (NCQR-LSTM), to solve the quantile crossing problem. This method works indirectly: instead of estimating multiple quantiles directly, we estimate the intervals between adjacent quantiles. Since the intervals are guaranteed to be positive by using exponential functions, this completely avoids the quantile crossing problem. Compared with the constraint methods commonly used to solve quantile crossing, this indirect method makes model optimization easier and is more suitable for deep learning. Experiments on a real wind speed dataset show that our methods improve probabilistic prediction performance and reduce training cost. In addition, our methods are simple to implement and highly scalable.

Zongxia Xie, Hao Wen
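
The two ingredients described above, a composite (summed) pinball loss over several quantile levels and noncrossing quantiles obtained by predicting positive intervals through an exponential, can be sketched in PyTorch as follows; quantile levels and layer sizes are placeholders, not the paper's settings.

```python
import torch
import torch.nn as nn

QUANTILES = torch.tensor([0.1, 0.5, 0.9])   # example quantile levels

def composite_pinball_loss(pred, y):
    """Mean of pinball (quantile) losses over all quantile levels.
    pred: (B, Q) predicted quantiles, y: (B,) observations."""
    diff = y.unsqueeze(1) - pred                                        # (B, Q)
    return torch.mean(torch.maximum(QUANTILES * diff, (QUANTILES - 1) * diff))

class NonCrossingHead(nn.Module):
    """Predict the lowest quantile plus strictly positive increments (via exp),
    so the cumulative sum of outputs can never cross."""
    def __init__(self, hidden, n_quantiles):
        super().__init__()
        self.base = nn.Linear(hidden, 1)
        self.gaps = nn.Linear(hidden, n_quantiles - 1)

    def forward(self, h):
        base = self.base(h)                        # lowest quantile
        gaps = torch.exp(self.gaps(h))             # positive intervals between quantiles
        return torch.cumsum(torch.cat([base, gaps], dim=-1), dim=-1)

h = torch.randn(8, 32)                             # e.g. the last LSTM hidden state
q_pred = NonCrossingHead(32, len(QUANTILES))(h)    # monotone quantile predictions
loss = composite_pinball_loss(q_pred, torch.randn(8))
```
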
Short-Term Temperature Forecasting on a Several Hours Horizon

Outside temperature is an important quantity in building control. It enables improvements in occupant energy consumption forecasting and heating requirement prediction. However, most previous work on outside temperature forecasting requires either a lot of computation or a lot of different sensors. In this paper we forecast outside temperature at a multiple-hour horizon knowing only the last 24 h of temperature and the computed clear-sky irradiance up to the prediction horizon. We propose using a different neural network to predict directly at each hour of the horizon instead of using the forecast of one hour to predict the next. We show that the most precise model uses one-dimensional convolutions and that the error is distributed across the year, the biggest error factor being unknown cloudiness at the beginning of the day. Our findings suggest that the improvement seen is not due to better trend accuracy but only to an improvement in precision.

Louis Desportes, Pierre Andry, Inbar Fijalkow, Jérôme David
Using Long Short-Term Memory for Wavefront Prediction in Adaptive Optics

The time lag between wavefront detection and correction in Adaptive Optics (AO) systems can sometimes severely degrade their targeted performance. We propose a nonlinear predictor based on long short-term memory (LSTM) to predict open-loop wavefronts at the next time step from a time series of past measurements. Compared with linear predictive control techniques, this approach is inherently model-free. Incorporating LSTMs offers the additional benefit of self-tuning, which is especially favourable for evolving turbulence. Numerical simulations based on a low-order single-conjugate AO (SCAO) system demonstrate over 50% reduction in bandwidth error across a relatively wide range of application scenarios. Agility and robustness against non-stationary turbulence are also demonstrated using a time-variant wind profile.

Xuewen Liu, Tim Morris, Chris Saunter
Incorporating Adaptive RNN-Based Action Inference and Sensory Perception

In this paper we investigate how directional distance signals can be incorporated in RNN-based adaptive goal-direction behavior inference mechanisms, which is closely related to formalizations of active inference. It was shown previously that RNNs can be used to effectively infer goal-directed action control policies online. This is achieved by projecting hypothetical environmental interactions dependent on anticipated motor neural activities into the future, back-projecting the discrepancies between predicted and desired future states onto the motor neural activities. Here, we integrate distance signals surrounding a simulated robot flying in a 2D space into this active motor inference process. As a result, local obstacle avoidance emerges in a natural manner. We demonstrate in several experiments with static as well as dynamic obstacle constellations that a simulated flying robot controlled by our RNN-based procedure automatically avoids collisions, while pursuing goal-directed behavior. Moreover, we show that the flight direction dependent regulation of the sensory sensitivity facilitates fast and smooth traversals through tight maze-like environments. In conclusion, it appears that local and global objectives can be integrated seamlessly into RNN-based, model-predictive active inference processes, as long as the objectives do not yield competing gradients.

Sebastian Otte, Jakob Stoll, Martin V. Butz
Quality of Prediction of Daily Relativistic Electrons Flux at Geostationary Orbit by Machine Learning Methods

This study presents results of predicting, 1–3 days ahead, the daily maximum of hourly average values of the relativistic electron flux (E > 2 MeV) in the outer radiation belt of the Earth. The input physical variables were geomagnetic indices, the interplanetary magnetic field, solar wind velocity and proton density, special ultra-low-frequency (ULF) indices, and hourly average values of the relativistic electron flux. The phase space of each physical component was reconstructed with time delay vectors using its own embedding dimension, and all of these vectors were concatenated. Various adaptive models were then trained on this multivariate dataset. The following models were used for prediction: a multi-dimensional autoregressive model, ensembles of decision trees within a bagging approach, and artificial neural networks of the multi-layer perceptron type. The obtained results are analyzed and compared to the results of similar predictions by other authors. The best prediction quality was demonstrated by ensembles of decision trees. It has also been demonstrated that using an embedding depth based on the autocorrelation function significantly improves prediction quality for a one-day prediction horizon.

Irina Myagkova, Alexander Efitorov, Vladimir Shiroky, Sergey Dolenko

Clustering

Frontmatter
Soft Subspace Growing Neural Gas for Data Stream Clustering

Subspace clustering aims at discovering the clusters embedded in multiple, overlapping subspaces of high-dimensional data. It has been successfully applied in many domains such as financial transactions, telephone records, sensor network monitoring, website analysis, and weather monitoring. Data streams are ordered and potentially unbounded sequences of data points created by a typically non-stationary data-generating process. Clustering this type of data requires algorithms that operate under both time and memory restrictions. In this paper, we propose S2G-Stream, based on growing neural gas and soft subspace clustering. We introduce two types of entropy weighting, for both features and subspaces. Experiments on public datasets show the ability of S2G-Stream to: (1) detect relevant features and subspaces; (2) detect clusters of arbitrary shape; (3) enhance the clustering results.

Mohammed Oualid Attaoui, Mustapha Lebbah, Nabil Keskes, Hanene Azzag, Mohammed Ghesmoune
Region Prediction from Hungarian Folk Music Using Convolutional Neural Networks

Early 20th century research on folk music and its connection to regional cultures has revealed potential clues for understanding the dynamics and organization of communities over history. Therefore, significant effort has been allocated to collecting and organizing folk music into databases, both in written and recorded form. Recent years have brought great advances in the fields of data analysis and machine learning, prompting musicologists to apply these advanced statistical methods to analyze the musical remnants. The present work studies how supervised machine learning methods can be applied to analyze folk music: we train different convolutional neural network classifiers—time-content, frequency-content, black-box, convolutional recurrent neural network, and time-frequency architectures—to predict folkloric regions. Results suggest that Transylvanian folkloric regions are distinguishable by the rhythmic content of their music, and while nearby villages have a higher probability of having their predicted labels swapped, the two most often confused regions are geographically remote areas with historically motivated similarity.
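As an illustration only (none of the architectures compared in the paper are reproduced), a small convolutional classifier over a time-frequency input might look like the following PyTorch sketch; the input size and number of regions are placeholder assumptions.

```python
# Sketch: a small CNN that maps a time-frequency representation of a melody
# to a folkloric-region label. Shapes and class count are placeholders.
import torch
import torch.nn as nn

n_regions = 6
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 32, n_regions),      # assumes 64x128 time-frequency input
)
x = torch.randn(8, 1, 64, 128)               # batch of spectrogram-like inputs
logits = model(x)                            # (8, n_regions)
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, n_regions, (8,)))
```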

Anna Kiss, Csaba Sulyok, Zalán Bodó
Merging DBSCAN and Density Peak for Robust Clustering

In data clustering, density-based algorithms are well known for their ability to detect clusters of arbitrary shapes. DBSCAN is a widely used density-based clustering approach, and the recently proposed density peak (DP) algorithm has shown significant potential in experiments. However, the DBSCAN algorithm may misclassify border data points of small density as noise and does not work well with large density variance across clusters, while the density peak algorithm depends heavily on the detected cluster centers. To circumvent these problems, we study these two algorithms and find that they have complementary properties. We then propose to combine them to overcome their respective problems. Specifically, we use the DP algorithm to detect cluster centers and then determine the parameters for DBSCAN adaptively. After DBSCAN clustering, we further use the DP algorithm to include border data points of small density into clusters. By combining the complementary properties of these two algorithms, we manage to alleviate the problems of DBSCAN while avoiding the drawbacks of the density peak algorithm. Our algorithm is tested on synthetic and real datasets and is demonstrated to perform better than the DBSCAN and density peak algorithms, as well as some other clustering algorithms.
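The exact combination rules are specific to the paper; the sketch below only illustrates the two ingredients in NumPy/scikit-learn form: a density-peak-style scoring of candidate centers (local density and distance to a denser point) and a subsequent DBSCAN run whose eps is derived from the data.

```python
# Sketch of the two ingredients: density-peak style center scores and DBSCAN.
# How the paper actually couples them (adaptive parameters, border re-assignment)
# is not reproduced here.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

X = np.random.rand(300, 2)
D = pairwise_distances(X)
dc = np.percentile(D, 2)                       # cutoff distance (heuristic)
rho = (D < dc).sum(axis=1)                     # local density of each point
delta = np.array([D[i][rho > rho[i]].min() if (rho > rho[i]).any() else D[i].max()
                  for i in range(len(X))])     # distance to the nearest denser point
centers = np.argsort(rho * delta)[-3:]         # the 3 highest-scoring points as centers

eps = np.median(np.sort(D, axis=1)[:, 4])      # a data-derived eps (5th-nearest-neighbour distance)
labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
print(centers, np.unique(labels))
```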

Jian Hou, Chengcong Lv, Aihua Zhang, Xu E
Market Basket Analysis Using Boltzmann Machines

In this paper we present a proposal to analyze market baskets using minimum spanning trees based on couplings between products. The couplings are the result of a learning process with Boltzmann machines on transactional databases, in which the interactions between the different offers of the market are modeled as a network of spin-like magnetic dipoles that can be in one of two states (+1 or −1). The results offer a systematic way to explore potential courses of action for determining promotions and offers for the retail manager.
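How the couplings are learned with a Boltzmann machine is the subject of the paper and is not shown here; the sketch below only illustrates the downstream step of turning an (assumed, already-estimated) coupling matrix into a minimum spanning tree with SciPy, using the inverse coupling strength as an edge length so that strongly coupled products end up adjacent in the tree.

```python
# Sketch: build a minimum spanning tree over products from an already-estimated
# coupling matrix J, so that strongly coupled products become neighbours.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(1)
J = rng.normal(size=(10, 10))
J = (J + J.T) / 2                        # symmetric couplings between 10 products
np.fill_diagonal(J, 0)

dist = 1.0 / (np.abs(J) + 1e-9)          # strong coupling -> short edge
np.fill_diagonal(dist, 0)                # zero diagonal = no self edges
mst = minimum_spanning_tree(dist).toarray()
edges = np.transpose(np.nonzero(mst))    # pairs of products connected in the tree
print(edges)
```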

Mauricio A. Valle, Gonzalo A. Ruz
Dimensionality Reduction for Clustering and Cluster Tracking of Cytometry Data

Mass cytometry is a new high-throughput technology that is becoming a cornerstone in immunology and cell biology research. With technological advancement, the number of cellular characteristics cytometry can simultaneously quantify grows, making analysis increasingly computationally onerous. In this paper, we investigate the potential of dimensionality reduction techniques to ease the computational burden of clustering cytometry data whilst minimally diminishing clustering performance. We explore three such techniques: Principal Component Analysis (PCA), Autoencoders (AE) and Uniform Manifold Approximation and Projection (UMAP). Thereafter we employ a recent clustering algorithm, ChronoClust, which clusters data at each time point into cell populations and explicitly tracks them over time. We evaluate this approach on a 14-dimensional cytometry dataset describing the immune response to West Nile Virus over 8 days in mice. To obtain a broad sample of clustering performance, each of the four datasets (unreduced, PCA-, AE- and UMAP-reduced) is independently clustered 400 times, using 400 unique ChronoClust parameter value sets. We find that PCA and AE can reduce the computational expense whilst incurring minimal degradation in clustering and cluster-tracking performance.
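A minimal sketch of the dimensionality-reduction step only; ChronoClust and the West Nile Virus data are not reproduced here, so a generic scikit-learn clusterer and random data stand in, and the target dimensionality is an assumption.

```python
# Sketch: reduce a 14-marker cytometry matrix with PCA before clustering.
# KMeans stands in for the downstream clusterer (ChronoClust in the paper).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

cells = np.random.rand(10000, 14)                    # placeholder expression matrix
reduced = PCA(n_components=5).fit_transform(cells)   # assumed target dimensionality
labels = KMeans(n_clusters=8, n_init=10).fit_predict(reduced)
print(reduced.shape, np.bincount(labels))
```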

Givanna H. Putri, Mark N. Read, Irena Koprinska, Thomas M. Ashhurst, Nicholas J. C. King
Improving Deep Image Clustering with Spatial Transformer Layers

Image clustering is an important but challenging task in machine learning. As in most image processing areas, the latest improvements have come from models based on deep learning. However, classical deep learning methods have difficulty dealing with spatial image transformations such as scaling and rotation. In this paper, we propose the use of visual attention techniques to reduce this problem in image clustering methods. We evaluate the combination of a deep image clustering model called Deep Adaptive Clustering (DAC) with Spatial Transformer Networks (STN). The proposed model is evaluated on the MNIST and FashionMNIST datasets and outperforms the baseline model.
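The DAC clustering objective is not reproduced here; the sketch below shows only the STN ingredient, i.e. a localisation network that predicts an affine transform which is then applied with `affine_grid`/`grid_sample` in PyTorch, with layer sizes assumed for 1x28x28 inputs.

```python
# Sketch of a spatial transformer layer (the STN ingredient only; the DAC
# clustering objective is not shown). Sizes assume 1x28x28 inputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class STN(nn.Module):
    def __init__(self):
        super().__init__()
        self.loc = nn.Sequential(
            nn.Conv2d(1, 8, 7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, 5), nn.MaxPool2d(2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(10 * 3 * 3, 32), nn.ReLU(),
            nn.Linear(32, 6),                       # parameters of a 2x3 affine matrix
        )
        # Start from the identity transform.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

x = torch.randn(4, 1, 28, 28)
print(STN()(x).shape)                               # torch.Size([4, 1, 28, 28])
```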

Thiago V. M. Souza, Cleber Zanchettin
Collaborative Non-negative Matrix Factorization

Non-negative matrix factorization is a machine learning technique used to decompose large data matrices while imposing non-negativity constraints on the factors. This technique has received a significant amount of attention as an important problem with many applications in different areas such as language modeling, text mining, clustering, music transcription, and neurobiology (gene separation). In this paper, we propose a new approach called Collaborative Non-negative Matrix Factorization ($$NMF_{Collab}$$), which is based on the collaboration between several NMF (Non-negative Matrix Factorization) models. Our approach $$NMF_{Collab}$$ was validated on various datasets, and the experimental results show the effectiveness of the proposed approach.
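The collaboration mechanism between the NMF models is the paper's contribution and is not reproduced here; the sketch only shows the basic non-negative factorisation each collaborator performs, using scikit-learn.

```python
# Sketch of the basic building block only: a single non-negative factorisation
# X ~ W @ H with scikit-learn. The collaborative exchange between several such
# models (NMF_Collab) is not reproduced.
import numpy as np
from sklearn.decomposition import NMF

X = np.random.rand(100, 50)               # non-negative data matrix (placeholder)
model = NMF(n_components=5, init="nndsvda", max_iter=500)
W = model.fit_transform(X)                # (100, 5) sample factors
H = model.components_                     # (5, 50) feature factors
print(np.linalg.norm(X - W @ H))          # reconstruction error
```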

Kaoutar Benlamine, Nistor Grozavu, Younès Bennani, Basarab Matei

Anomaly Detection of Sequential Data

Frontmatter
Cosine Similarity Drift Detector

Concept drift detection algorithms have several applications. For example, nowadays many systems are interconnected by computer networks and constantly generate large amounts of data over time (data streams). Thus, it is essential to detect when this data flow presents abnormal behavior, as this might indicate an attack on the security of the network. This paper proposes CSDD, a new method that uses cosine similarity and windowing techniques to compare recent and older data and detect concept drifts. To validate it, experiments were run with both synthetic and real-world datasets, using Naive Bayes and Hoeffding Tree as base learners. The accuracy results were evaluated using a variation of the Friedman test and the Bonferroni-Dunn post-hoc test, whereas the detections were evaluated using several metrics including the mean distance ($$\mu$$D), false positives (FP), false negatives (FN), precision, recall, and the Matthews Correlation Coefficient (MCC). The experimental results show the effectiveness of CSDD in scenarios with abrupt and gradual changes, as it delivered the best results on nearly all artificial datasets.
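The full CSDD procedure (windowing policy, thresholds, statistics) is defined in the paper; the sketch below only illustrates the core comparison, namely the cosine similarity between a summary of a reference window and of a recent window, with the window size and the 0.95 cutoff as assumptions.

```python
# Sketch of the core idea only: compare a reference window and a recent window
# of a data stream via cosine similarity of their feature means. The actual
# CSDD windowing and thresholds are not reproduced.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

stream = np.vstack([np.random.rand(500, 5),                          # stable regime
                    np.random.rand(500, 5) + np.array([2., 0, 0, 0, 0])])  # drifted regime
window = 200
reference = stream[:window].mean(axis=0)
for start in range(window, len(stream) - window, window):
    recent = stream[start:start + window].mean(axis=0)
    if cosine(reference, recent) < 0.95:
        print(f"possible drift around sample {start}")
        reference = recent                                            # re-anchor after a detection
```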

Juan Isidro González Hidalgo, Laura Maria Palomino Mariño, Roberto Souto Maior de Barros
Unsupervised Anomaly Detection Using Optimal Transport for Predictive Maintenance

Anomaly detection is of crucial importance in industrial environments, especially in the context of predictive maintenance. As it is very costly to add an extra monitoring layer on production machines, non-invasive solutions are favored to watch for precursory clues indicating the possible need for a maintenance operation. Those clues have to be detected in evolving and highly variable working environments, calling for online and unsupervised methods. This contribution proposes a framework grounded in optimal transport for the specific characterization of a system and the automatic detection of abnormal events. The method is evaluated on an acoustic dataset and demonstrates the superiority of metrics derived from optimal transport over Euclidean ones. The proposed method is shown to outperform the one-class SVM, the state-of-the-art method for anomaly detection, on real datasets.
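As a simplified stand-in for the framework described above, the sketch compares a reference distribution of (placeholder) acoustic features to incoming windows via the 1-D Wasserstein distance from SciPy and flags windows above an assumed threshold; the paper's full optimal-transport formulation is richer than this.

```python
# Simplified stand-in: score incoming windows by their 1-D Wasserstein distance
# to a reference distribution of a placeholder acoustic feature, and flag
# windows above an assumed threshold.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=5000)            # features from normal operation

def anomaly_score(window):
    return wasserstein_distance(reference, window)

normal_window = rng.normal(0.0, 1.0, size=500)
faulty_window = rng.normal(1.5, 1.0, size=500)          # shifted distribution
threshold = 0.5                                          # assumed cutoff
for name, w in [("normal", normal_window), ("faulty", faulty_window)]:
    print(name, anomaly_score(w), anomaly_score(w) > threshold)
```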

Amina Alaoui-Belghiti, Sylvain Chevallier, Eric Monacelli
Robust Gait Authentication Using Autoencoder and Decision Tree

Various biometric authentication technologies have been developed to protect smartphones against unauthorized access. Most authentication methods provide highly accurate authentication; however, an unlocked device can be used freely until it is re-locked. This study proposes a robust gait-based authentication method that identifies users across various walking styles using only a smartphone accelerometer. Walking motion, however, depends on the individual and their walking style. Based on features extracted from acceleration data, the proposed method first uses a decision tree to classify the walking style prior to verifying identity. Then, identification is performed using the reconstruction error of an autoencoder trained for the identified walking style. Results confirm the effectiveness of the proposed method, which takes the novel approach of combining two simple methods to achieve superior performance.

Mitsuhiro Ogihara, Hideyuki Mizuno
MAD-GAN: Multivariate Anomaly Detection for Time Series Data with Generative Adversarial Networks

Many real-world cyber-physical systems (CPSs) are engineered for mission-critical tasks and usually are prime targets for cyber-attacks. The rich sensor data in CPSs can be continuously monitored for intrusion events through anomaly detection. On one hand, conventional supervised anomaly detection methods are unable to exploit the large amounts of data due to the lack of labelled data. On the other hand, current unsupervised machine learning approaches have not fully exploited the spatial-temporal correlation and other dependencies amongst the multiple variables (sensors/actuators) in the system when detecting anomalies. In this work, we propose an unsupervised multivariate anomaly detection method based on Generative Adversarial Networks (GANs), using Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) as the base models (namely, the generator and discriminator) in the GAN framework to capture the temporal correlation of time series distributions. Instead of treating each data stream independently, our proposed Multivariate Anomaly Detection with GAN (MAD-GAN) framework considers the entire variable set concurrently to capture the latent interactions amongst the variables. We also fully exploit both the generator and discriminator produced by the GAN, using a novel anomaly score called DR-score to detect anomalies through discrimination and reconstruction. We have tested our proposed MAD-GAN using two recent datasets collected from real-world CPSs: the Secure Water Treatment (SWaT) and the Water Distribution (WADI) datasets. Our experimental results show that the proposed MAD-GAN is effective in reporting anomalies caused by various cyber-attacks inserted in these complex real-world systems.
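The precise DR-score is defined in the paper; the sketch below only illustrates the general shape of such a score, a weighted combination of a discriminator-based term and a reconstruction-error term, with the weighting and the stand-in models as placeholders.

```python
# Sketch of the general shape of a discrimination+reconstruction anomaly score.
# The exact DR-score of MAD-GAN, and the LSTM generator/discriminator training,
# are not reproduced; `lam` and the toy stand-ins below are placeholders.
import numpy as np

def dr_style_score(x, reconstruct, discriminate, lam=0.5):
    """Higher = more anomalous. `reconstruct` maps x to the generator's best
    reconstruction; `discriminate` returns the probability that x is real."""
    recon_err = np.mean((x - reconstruct(x)) ** 2)
    disc_term = 1.0 - discriminate(x)          # low "realness" -> anomalous
    return lam * recon_err + (1.0 - lam) * disc_term

x = np.random.rand(30, 4)                      # one window of multivariate sensor readings
score = dr_style_score(x, reconstruct=lambda z: z * 0.9, discriminate=lambda z: 0.8)
print(score)
```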

Dan Li, Dacheng Chen, Baihong Jin, Lei Shi, Jonathan Goh, See-Kiong Ng
Intrusion Detection via Wide and Deep Model

An intrusion detection system is designed to detect threats and attacks, which is especially important given today's constantly emerging information security incidents. Since deep learning has become a research hot spot, much work has been devoted to realizing the anomaly detection mode of intrusion detection via deep learning. However, little work has used different deep learning networks in a hybrid architecture so as to benefit from the advantages of each part. In this paper, we are inspired by Google's Wide & Deep model, which combines memorization with generalization via different networks. We propose a framework that uses the Wide & Deep model for intrusion detection. To obtain comprehensive categorical representations of continuous features, we use density-based clustering (DBSCAN) to convert the KDD'99/NSL_KDD format features into sparse categorical feature representations. The widely used and popular NSL_KDD dataset is used for evaluating the proposed model. A comprehensive empirical evaluation with hypothesis testing demonstrates that the revised Wide & Deep framework outperforms each separate part alone. Compared with other machine learning baseline methods and advanced deep learning methods, the proposed model outperforms the baseline results and achieves steady and promising performance in tests at different levels.
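The Wide & Deep network itself is not reproduced; the sketch below shows only the preprocessing step mentioned above, turning a continuous feature into a sparse categorical representation via DBSCAN cluster labels, with the feature, eps, and min_samples as placeholders.

```python
# Sketch of the preprocessing step only: turn a continuous NSL_KDD-style feature
# into a sparse categorical representation via DBSCAN cluster membership.
# eps/min_samples are placeholders; the Wide & Deep network is not shown.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import OneHotEncoder

duration = np.random.exponential(scale=10.0, size=(1000, 1))     # a continuous feature
labels = DBSCAN(eps=1.0, min_samples=10).fit_predict(duration)   # -1 marks outliers
sparse_cat = OneHotEncoder(handle_unknown="ignore").fit_transform(labels.reshape(-1, 1))
print(sparse_cat.shape)   # one column per density cluster (plus the outlier label)
```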

Zhipeng Li, Zheng Qin, Pengbo Shen
Towards Attention Based Vulnerability Discovery Using Source Code Representation

Vulnerability discovery in software is an important task in the field of computer security. As vulnerabilities can be abused by cyber criminals and other malicious actors to exploit systems, it is crucial to keep software as free from vulnerabilities as possible. Traditional approaches often comprise code-scanning tasks that find specific and already-known classes of cyber vulnerabilities; however, these approaches do not in general discover new classes of vulnerabilities. In this paper, we leverage a machine learning approach that models source code representations using the syntax, semantics and control flow of source code and infers vulnerable code patterns, in order to tackle large code bases and identify potential vulnerabilities that are missed by existing static software analysis tools. In addition, our attention-based bidirectional long short-term memory framework adaptively localises regions of code, indicating where a possibly vulnerable code fragment exists. The highlighted region may provide informative guidance to human developers or security experts. The experimental results demonstrate the feasibility of the proposed approach for software vulnerability discovery.

Junae Kim, David Hubczenko, Paul Montague
Convolutional Recurrent Neural Networks for Computer Network Analysis

The paper proposes a method of computer network user detection with recurrent neural networks. We use long short-term memory and gated recurrent unit neural networks. To present URLs from computer network sessions to the neural networks, we add convolutional input layers. Moreover, we transform requested URLs by one-hot character-level encoding. We present a detailed analysis and comparison of the experiments with the aforementioned neural networks. The system was tested on real network data collected in a local municipal network. It can classify network users; hence, it can also detect anomalies and security compromises.
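As an illustration of the input representation described above (the alphabet, length cap, and layer sizes are assumptions, not the paper's configuration), the PyTorch sketch one-hot encodes URL characters and passes them through a convolutional input layer feeding a GRU.

```python
# Sketch: character-level one-hot encoding of URLs fed to a Conv1d + GRU stack.
# The alphabet, length cap, and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

alphabet = "abcdefghijklmnopqrstuvwxyz0123456789:/.-_?&=%"
char_to_idx = {c: i for i, c in enumerate(alphabet)}
max_len = 64

def one_hot_url(url):
    x = torch.zeros(len(alphabet), max_len)             # (channels, length)
    for pos, ch in enumerate(url.lower()[:max_len]):
        if ch in char_to_idx:
            x[char_to_idx[ch], pos] = 1.0
    return x

class UrlEncoder(nn.Module):
    def __init__(self, n_users=10):
        super().__init__()
        self.conv = nn.Conv1d(len(alphabet), 32, kernel_size=5, padding=2)
        self.gru = nn.GRU(32, 64, batch_first=True)
        self.out = nn.Linear(64, n_users)

    def forward(self, x):                                # x: (batch, channels, length)
        h = torch.relu(self.conv(x)).transpose(1, 2)     # (batch, length, 32)
        _, last = self.gru(h)
        return self.out(last.squeeze(0))

batch = torch.stack([one_hot_url("http://example.com/index.html")])
print(UrlEncoder()(batch).shape)                         # torch.Size([1, 10])
```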

Jakub Nowak, Marcin Korytkowski, Rafał Scherer
Backmatter
Metadata
Title
Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series
Editors
Igor V. Tetko
Dr. Věra Kůrková
Pavel Karpov
Prof. Fabian Theis
Copyright Year
2019
Electronic ISBN
978-3-030-30490-4
Print ISBN
978-3-030-30489-8
DOI
https://doi.org/10.1007/978-3-030-30490-4
