
About this Book

This book constitutes the post-conference proceedings of the 5th International Conference on Machine Learning, Optimization, and Data Science, LOD 2019, held in Siena, Italy, in September 2019. The 54 full papers presented were carefully reviewed and selected from 158 submissions. The papers cover topics in the fields of machine learning, artificial intelligence, reinforcement learning, computational optimization, and data science, presenting a substantial array of ideas, technologies, algorithms, methods, and applications.

Table of Contents

Frontmatter

Deep Neural Network Ensembles

Current deep neural networks suffer from two problems: first, they are hard to interpret, and second, they suffer from overfitting. There have been many attempts to define interpretability in neural networks, but they typically lack causality or generality. A myriad of regularization techniques have been developed to prevent overfitting, and this has driven deep learning to become the hot topic it is today; however, while most regularization techniques are justified empirically and even intuitively, there is not much underlying theory. This paper argues that to extract the features neural networks use to make decisions, it is important to look at the paths between clusters existing in the hidden spaces of neural networks. These features are of particular interest because they reflect the true decision-making process of the neural network. This analysis is then extended to present an ensemble algorithm for arbitrary neural networks which has guarantees for test accuracy. Finally, a discussion detailing the aforementioned guarantees is introduced, and the implications for neural networks, including an intuitive explanation for all current regularization methods, are presented. The ensemble algorithm has generated state-of-the-art results for Wide-ResNets on CIFAR-10 (top 5 for all models) and has improved test accuracy for all models it has been applied to.

Sean Tao

Driver Distraction Detection Using Deep Neural Network

Driver distraction, drunk driving and speeding are three main causes of fatal car crashes. Interacting with the intricate infotainment systems of modern cars increases the driver's cognitive load notably and consequently increases the chance of a car accident. Analyzing driver behavior using machine learning methods is one of the suggested solutions for detecting driver distraction and cognitive load. A variety of machine learning methods and data types have been used to detect driver status or to observe the environment for dangerous situations. In many applications with huge datasets, deep learning methods outperform other machine learning approaches since they can extract very intricate patterns from enormous datasets and learn them accurately. We conducted an experiment, using a car simulator, in eight driving contexts, including four distracted and four non-distracted contexts. We used a deep neural network to detect the context of driving from driving data collected automatically by the simulator. We analyzed the effect of the depth and width of the network on the training and test accuracy and found the optimal depth and width of the network. We can detect driver status with 93% accuracy.

Shokoufeh Monjezi Kouchak, Ashraf Gaffar

Deep Learning Algorithms for Complex Pattern Recognition in Ultrasonic Sensors Arrays

Nowadays, applications of ultrasonic proximity sensors are limited to post-processing of the acquired signals with a pipeline of filters and threshold comparators. This article proposes two different and novel processing methodologies, based on machine learning algorithms, that outperform classical approaches. Indeed, noisy signals and the presence of thin or soundproofing objects are likely sources of false positive detections that can make traditional approaches useless and unreliable. In order to take advantage of correlations among the data, multiple parallel signals, coming from a cluster of ultrasonic sensors, have been exploited, producing a number of different features that made it possible to achieve more accurate and precise predictions for object detection. First, model-based learning as well as instance-based learning systems have been investigated for an independent time-correlation analysis of the different signals. Particular attention has been given to the training and testing of the deep fully connected network, which showed, from the beginning, the more promising results. In the second part, a recurrent neural network, based on long short-term memory cells, has been devised. As a result of its intrinsic nature, time correlations between successive samples are no longer overlooked, further improving the overall prediction capability of the system. Finally, cutting-edge training methodologies and strategies to find the different hyperparameters have been adopted in order to obtain the best results and performance from the available data.

Vittorio Mazzia, Angelo Tartaglia, Marcello Chiaberge, Dario Gandini

An Information Analysis Approach into Feature Understanding of Convolutional Deep Neural Networks

This paper is centered on feature analysis and visualization of pre-trained deep neural networks based on the responses of neurons to input images. To address this problem, first the information content of the learned encodings of neurons is investigated based on the calculation of the salient activation map of each neuron. The salient activation map is defined as the activation map that has the highest aggregate value over all of its cells. Second, neurons are visualized based on their maximum activation response at each location. The results reveal a reduction of uncertainty over the deeper layers, as well as a decreasing pattern in Variation of Information.

Zahra Sadeghi
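The two operations described in the abstract can be sketched directly in NumPy; a minimal illustration over one layer's activation maps (the aggregation here is a plain sum over cells, one plausible reading of "aggregative value"):

```python
import numpy as np

def salient_activation_map(activations):
    """Return the index and map of the neuron whose activation map has the
    highest aggregate value over all of its cells.
    activations: array of shape (num_maps, H, W) for one layer and one image."""
    totals = activations.sum(axis=(1, 2))
    best = int(np.argmax(totals))
    return best, activations[best]

def max_response_map(activations):
    """For each spatial location, the index of the neuron (map) with the
    maximum activation response there."""
    return activations.argmax(axis=0)
```

Both functions operate on a single forward pass; in the paper's setting they would be applied per neuron/layer across a set of input images.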

Stochastic Weight Matrix-Based Regularization Methods for Deep Neural Networks

The aim of this paper is to introduce two widely applicable regularization methods based on the direct modification of weight matrices. The first method, Weight Reinitialization, utilizes a simplified Bayesian assumption with partially resetting a sparse subset of the parameters. The second one, Weight Shuffling, introduces an entropy- and weight-distribution-invariant non-white noise to the parameters. The latter can also be interpreted as an ensemble approach. The proposed methods are evaluated on benchmark datasets, such as MNIST, CIFAR-10, and the JSB Chorales database, as well as on time series modeling tasks. We report gains both in performance and in entropy of the analyzed networks. Our code is available as a GitHub repository ( https://github.com/rpatrik96/lod-wmm-2019 ).

Patrik Reizinger, Bálint Gyires-Tóth
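A minimal NumPy sketch of the two weight-matrix operations, not the authors' exact procedure: the Bayesian weighting is omitted, and the reset entries are redrawn here from an assumed He-style normal initializer. Shuffling permutes the entries, so the empirical weight distribution (and hence its entropy) is unchanged:

```python
import numpy as np

def weight_reinitialization(W, frac=0.1, rng=None):
    """Reset a sparse random subset of the parameters to fresh initial values
    (assumption: He-style normal init for the redrawn entries)."""
    rng = np.random.default_rng(rng)
    W = W.copy()
    k = max(1, int(frac * W.size))
    idx = rng.choice(W.size, size=k, replace=False)
    flat = W.ravel()                     # view into the copy
    flat[idx] = rng.normal(0.0, np.sqrt(2.0 / W.shape[0]), size=k)
    return W

def weight_shuffling(W, rng=None):
    """Permute the entries of W: a distribution- and entropy-invariant
    perturbation, since only the positions of the weights change."""
    rng = np.random.default_rng(rng)
    flat = W.ravel().copy()
    rng.shuffle(flat)
    return flat.reshape(W.shape)
```

In practice these would be applied periodically to a network's layers during training, between optimizer steps.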

Quantitative and Ontology-Based Comparison of Explanations for Image Classification

Deep Learning models have recently achieved incredible performance in the Computer Vision field and are being deployed in an ever-growing range of real-life scenarios. Since they do not intrinsically provide insight into their inner decision processes, the field of eXplainable Artificial Intelligence (XAI) has emerged. Different XAI techniques have already been proposed, but the existing literature lacks methods to quantitatively compare different explanations, and in particular the semantic component is systematically overlooked. In this paper we introduce quantitative and ontology-based techniques and metrics in order to enrich and compare different explanations and XAI algorithms.

Valentina Ghidini, Alan Perotti, Rossano Schifanella

About Generative Aspects of Variational Autoencoders

An essential prerequisite for random generation of good quality samples in Variational Autoencoders (VAE) is that the distribution of variables in the latent space has a known distribution, typically a normal distribution N(0, 1). This should be induced by a regularization term in the loss function, minimizing, for each data point X, the Kullback-Leibler divergence between the posterior inference distribution of latent variables Q(z|X) and N(0, 1). In this article, we investigate the marginal inference distribution Q(z) as a Gaussian Mixture Model, proving, under a few reasonable assumptions, that although the first and second moments of Q(z) might indeed be coherent with those of a normal distribution, there is no reason to believe the same for the other moments; in particular, its kurtosis is likely to be different from 3. The actual distribution of Q(z) is possibly very far from a normal distribution, raising doubts on the effectiveness of generative sampling according to the vanilla VAE framework.

Andrea Asperti
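The central observation — a Gaussian mixture can match the first two moments of N(0, 1) while its kurtosis differs from 3 — is easy to verify numerically. A sketch with an illustrative two-component mixture (the specific variances 0.5 and 1.5 are an arbitrary example, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
# Q(z): equal mixture of two zero-mean Gaussians with variances 0.5 and 1.5.
# Its first two moments match N(0, 1), but its fourth moment does not:
# kurtosis = 3 * (0.5**2 + 1.5**2) / 2 = 3.75, not the Gaussian value 3.
comp = rng.integers(0, 2, size=n)
sigma = np.where(comp == 0, np.sqrt(0.5), np.sqrt(1.5))
z = rng.normal(0.0, 1.0, size=n) * sigma

kurt = np.mean(z ** 4) / np.mean(z ** 2) ** 2
```

Sample mean and variance come out near 0 and 1, while the sampled kurtosis sits near 3.75, illustrating why moment-matching up to order two does not certify normality of Q(z).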

Adapted Random Survival Forest for Histograms to Analyze NOx Sensor Failure in Heavy Trucks

In heavy-duty truck operation, important components need to be examined regularly so that unexpected breakdowns can be prevented. Data-driven failure prediction models can be built using operational data from a large fleet of trucks. Machine learning methods such as Random Survival Forest (RSF) can be used to generate a survival model that predicts the survival probabilities of a particular component over time. Operational data from the trucks usually have many feature variables represented as histograms. Although the bins of a histogram can be treated as independent numeric variables, dependencies among the bins might exist that would be useful yet neglected when bins are treated individually. Therefore, in this article, we propose an extension to the standard RSF algorithm that can handle histogram variables and use it to train survival models for a NOx sensor. The trained model is compared in terms of overall error rate with the standard RSF model in which bins of a histogram are treated individually as numeric features. The experimental results show that the adapted approach outperforms the standard approach, and the feature variables considered important are ranked.

Ram B. Gurung

Incoherent Submatrix Selection via Approximate Independence Sets in Scalar Product Graphs

This paper addresses the problem of extracting the largest possible number of columns from a given matrix $$X\in \mathbb R^{n\times p}$$ in such a way that the resulting submatrix has a coherence smaller than a given threshold $$\eta $$. This problem can clearly be expressed as that of finding a maximum cardinality stable set in the graph whose adjacency matrix is obtained by taking the componentwise absolute value of $$X^tX$$, setting entries less than $$\eta $$ to 0 and the other entries to 1. We propose a spectral-type relaxation which boils down to optimising a quadratic function on a sphere. We prove a theoretical approximation bound for the solution of the resulting relaxed problem.

Stéphane Chrétien, Zhen Wai Olivier Ho
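The graph construction described above is straightforward to implement; as a baseline, a greedy minimum-degree stable-set heuristic (not the paper's spectral relaxation) already yields a feasible incoherent column subset. A sketch assuming nonzero columns, which are normalized so that $$\eta $$ acts on coherences:

```python
import numpy as np

def incoherent_columns(X, eta):
    """Greedy stable-set baseline: pick a large set of columns whose pairwise
    coherence |<x_i, x_j>| (unit-norm columns) stays below eta."""
    Xn = X / np.linalg.norm(X, axis=0)      # unit-norm columns (assumes no zero column)
    G = np.abs(Xn.T @ Xn) >= eta            # adjacency: "too coherent" pairs
    np.fill_diagonal(G, False)
    alive = set(range(X.shape[1]))
    chosen = []
    while alive:
        # min-degree vertex among the remaining candidates
        v = min(alive, key=lambda i: sum(G[i, j] for j in alive))
        chosen.append(v)
        alive -= {v} | {j for j in alive if G[v, j]}
    return sorted(chosen)
```

By construction every pair of returned columns has coherence below eta; the paper's relaxation aims at larger sets with an approximation guarantee.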

LIA: A Label-Independent Algorithm for Feature Selection for Supervised Learning

The current study considers an unconventional framework of unsupervised feature selection for supervised learning. We provide a new unsupervised algorithm which we call LIA, for Label-Independent Algorithm, which combines information theory and network science techniques. In addition, we present the results of an empirical study comparing LIA with a standard supervised algorithm (MRMR). The two algorithms are similar in that both minimize redundancy, but MRMR uses the labels of the instances in the input data set to maximize the relevance of the selected features. This is an advantage when such labels are available, but a shortcoming when they are not. We used cross-validation to evaluate the effectiveness of selected features for generating different well-known classifiers for a variety of publicly available data sets. The results show that the performance of classifiers using the features selected by our proposed label-independent algorithm is very close to or in some cases better than their performance when using the features selected by a common label-dependent algorithm (MRMR). Thus, LIA has potential to be useful for a wide variety of applications where dimension reduction is needed and the instances in the data set are unlabeled.

Gail Gilboa-Freedman, Alon Patelsky, Tal Sheldon

Relationship Estimation Metrics for Binary SoC Data

System-on-Chip (SoC) designs are used in every aspect of computing and their optimization is a difficult but essential task in today’s competitive market. Data taken from SoCs to achieve this is often characterised by very long concurrent bit vectors which have unknown relationships to each other. This paper explains and empirically compares the accuracy of several methods used to detect the existence of these relationships in a wide range of systems. A probabilistic model is used to construct and test a large number of SoC-like systems with known relationships which are compared with the estimated relationships to give accuracy scores. The metrics $${{\,\mathrm{\dot{C}ov}\,}}$$ and $${{\,\mathrm{\dot{D}ep}\,}}$$ based on covariance and independence are demonstrated to be the most useful, whereas metrics based on the Hamming distance and geometric approaches are shown to be less useful for detecting the presence of relationships between SoC data.

Dave McEwan, Jose Nunez-Yanez

Network Alignment Using Graphlet Signature and High Order Proximity

The network alignment problem arises in the graph-based formulation of many computer science and biological problems. The alignment task is to identify the best one-to-one matching between the vertices of a pair of networks by considering the local topology or vertex attributes or both. Existing algorithms for network alignment use a diverse set of methodologies, such as Eigen-decomposition of a similarity matrix, solving a quadratic assignment problem through subgradient optimization, or heuristic-based iterative greedy matching. However, these methods are either too slow or they have poor matching performance. Some existing methods also require extensive external node attributes as prior information for the purpose of node matching. In this paper, we develop a novel topology-based network alignment approach which we call GraphletAlign. The proposed method uses graphlet signatures as node attributes and then uses a bipartite matching algorithm for obtaining an initial alignment, which is later refined by considering higher-order matching. Our results on large real-life networks show the superiority of GraphletAlign over the existing methods; specifically, GraphletAlign’s accuracy improvement is up to 20%–72% compared to existing network alignment methods over six large real-life networks.

Aljohara Almulhim, Vachik S. Dave, Mohammad Al Hasan
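A toy version of the signature-then-match idea: here the node signature is just (degree, triangle count) instead of a full graphlet signature, and the initial alignment pairs nodes by sorted signature order, a greedy stand-in for the paper's bipartite matching step. A sketch, not GraphletAlign itself:

```python
import numpy as np

def align_by_signature(A1, A2):
    """Toy alignment of two graphs given as 0/1 adjacency matrices:
    signature = (degree, #triangles through the node), matched by sorting."""
    def sig(A):
        deg = A.sum(axis=1)
        tri = np.diag(A @ A @ A) / 2.0      # triangles through each node
        return list(zip(deg.tolist(), tri.tolist()))
    s1, s2 = sig(A1), sig(A2)
    order1 = sorted(range(len(s1)), key=lambda i: s1[i])
    order2 = sorted(range(len(s2)), key=lambda i: s2[i])
    return dict(zip(order1, order2))        # node in A1 -> node in A2
```

For isomorphic graphs this pairs nodes with identical toy signatures; richer graphlet signatures and a proper bipartite matching are needed to disambiguate structurally similar nodes.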

Effect of Market Spread Over Reinforcement Learning Based Market Maker

Market Making (also known as liquidity provision) is a well-known trading problem studied in multiple disciplines including Finance, Economics and Artificial Intelligence. This paper examines the impact of market spread on the market maker’s (or liquidity provider’s) convergence ability by testing the hypothesis that “knowledge of market spread while learning leads to faster convergence to an optimal and less volatile market making policy”. Reinforcement Learning was used to mimic the behaviour of a liquidity provider with a Limit Order Book, using historical Trade and Quote data of five equities as the trading environment. An empirical study of the results obtained from experiments (comparing our reward function with a benchmark) shows a significant improvement in the magnitude of returns obtained by a market maker with knowledge of market spread compared to a market maker without such knowledge, which supports our stated hypothesis.

Abbas Haider, Hui Wang, Bryan Scotney, Glenn Hawe

A Beam Search for the Longest Common Subsequence Problem Guided by a Novel Approximate Expected Length Calculation

The longest common subsequence problem (LCS) aims at finding a longest string that appears as a subsequence in each of a given set of input strings. This is a well-known $$\mathcal {NP}$$-hard problem which has been tackled by many heuristic approaches. Among them, the best performing ones are based on beam search (BS) but differ significantly in various aspects. In this paper we compare the existing BS-based approaches by using a common BS framework, making the differences more explicit. Furthermore, we derive a novel heuristic function to guide BS, which approximates the expected length of an LCS of random strings. In a rigorous experimental evaluation we compare all BS-based methods from the literature and investigate the impact of our new heuristic guidance. Results show in particular that our novel heuristic guidance frequently leads to significantly better solutions. New best solutions are obtained for a wide range of the existing benchmark instances.

Marko Djukanovic, Günther R. Raidl, Christian Blum
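The BS framework for LCS can be sketched compactly: a state is the vector of next-unmatched positions in each string, children extend the partial subsequence by one common letter (always to its earliest feasible occurrence), and a heuristic ranks the beam. The guidance below is only the trivial shortest-remaining-suffix upper bound; the paper's contribution is a much stronger approximate-expected-length heuristic:

```python
def lcs_beam_search(strings, beam_width=10):
    """Beam search for a long common subsequence. State = positions of the
    next unmatched index in each string; guidance = solution length plus the
    shortest remaining suffix (a simple upper bound, not the paper's heuristic)."""
    alphabet = sorted(set.intersection(*(set(s) for s in strings)))
    beam = [("", tuple([0] * len(strings)))]
    best = ""
    while beam:
        children = {}
        for sol, pos in beam:
            for c in alphabet:
                nxt = []
                for s, p in zip(strings, pos):
                    i = s.find(c, p)        # earliest occurrence of c from p
                    if i < 0:
                        break
                    nxt.append(i + 1)
                else:                        # c occurs in every remaining suffix
                    key = tuple(nxt)
                    if len(children.get(key, "")) < len(sol) + 1:
                        children[key] = sol + c
        for pos2, sol2 in children.items():
            if len(sol2) > len(best):
                best = sol2
        ranked = sorted(
            children.items(),
            key=lambda kv: len(kv[1]) + min(len(s) - p
                                            for s, p in zip(strings, kv[0])),
            reverse=True)
        beam = [(sol, pos) for pos, sol in ranked[:beam_width]]
    return best
```

With beam_width = 1 this degenerates to a greedy heuristic; larger beams trade time for solution quality, and the choice of guidance function is exactly where the BS variants in the literature differ.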

An Adaptive Parameter Free Particle Swarm Optimization Algorithm for the Permutation Flowshop Scheduling Problem

Finding suitable values for all parameters of a Particle Swarm Optimization (PSO) algorithm is a crucial issue in its design. A trial-and-error procedure is the most common way to find the parameters, but a number of different procedures have also been applied in the past. In this paper, an adaptive strategy is used in which random values are assigned at the initialization of the algorithm and then, during the iterations, the parameters are optimized together and simultaneously with the optimization of the objective function of the problem. This approach is used for the solution of the Permutation Flowshop Scheduling Problem. The algorithm is tested on 120 benchmark instances and compared with a number of algorithms from the literature.

Yannis Marinakis, Magdalene Marinaki
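The co-evolution idea — random initial parameters that are then optimized alongside the objective — can be sketched by appending each particle's own (w, c1, c2) to its position vector so the swarm dynamics update them too. This is a continuous-objective sketch under assumed parameter bounds, not the paper's permutation-flowshop algorithm:

```python
import numpy as np

def adaptive_pso(f, dim, n_particles=20, iters=200, rng=None):
    """PSO sketch where each particle carries its own (w, c1, c2) appended
    to its position, so the parameters evolve with the search."""
    rng = np.random.default_rng(rng)
    ext = dim + 3                                    # position + (w, c1, c2)
    x = rng.uniform(-1.0, 1.0, (n_particles, ext))
    x[:, dim] = rng.uniform(0.3, 0.9, n_particles)           # random initial w
    x[:, dim + 1:] = rng.uniform(0.5, 2.0, (n_particles, 2))  # random c1, c2
    v = np.zeros((n_particles, ext))
    pbest = x.copy()
    pval = np.array([f(p[:dim]) for p in x])
    g = pbest[np.argmin(pval)].copy()
    for _ in range(iters):
        w, c1, c2 = x[:, dim], x[:, dim + 1], x[:, dim + 2]
        r1, r2 = rng.random((2, n_particles, ext))
        v = w[:, None] * v + c1[:, None] * r1 * (pbest - x) + c2[:, None] * r2 * (g - x)
        x = x + v
        x[:, dim] = np.clip(x[:, dim], 0.05, 0.95)    # keep parameters in sane ranges
        x[:, dim + 1:] = np.clip(x[:, dim + 1:], 0.1, 2.5)
        val = np.array([f(p[:dim]) for p in x])
        better = val < pval
        pbest[better], pval[better] = x[better], val[better]
        g = pbest[np.argmin(pval)].copy()
    return g[:dim], float(pval.min())
```

Particles with poorly chosen parameters lose the personal-best race, so the swarm drifts toward parameter settings that work well on the problem at hand.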

The Measure of Regular Relations Recognition Applied to the Supervised Classification Task

This paper introduces a probability measure of regularity recognition in an information stream. The measure allows machine-learning models to be created without a supervisor. The experiment described in the paper empirically shows that the measure enables the recognition of regularities and helps to find regular relations between the values of variables. The machine learning model finds regular relations in a data set, and these relations allow unknown values of the classification variable to be reconstructed. A classification algorithm based on the probability measure of regularity recognition is described in the paper. The measure’s connection with entropy is demonstrated, and mutual information is used to optimise the algorithm’s performance. The accuracy of the algorithm matches, and in some cases exceeds, that of well-known supervised machine learning algorithms.

Yuriy Mikheev

Simple and Accurate Classification Method Based on Class Association Rules Performs Well on Well-Known Datasets

Existing classification rule learning algorithms use mainly greedy heuristic search to find regularities in datasets for classification. In recent years, extensive research on association rule mining was performed in the machine learning community on learning rules by using exhaustive search. The main objective is to find all rules in data that satisfy the user-specified minimum support and minimum confidence constraints. Although the whole set of rules may not be used directly for accurate classification, effective and efficient classifiers have been built using these, so called, classification association rules. In this paper, we compare “classical” classification rule learning algorithms that use greedy heuristic search to produce the final classifier with a class association rule learner that uses constrained exhaustive search to find classification rules on well-known datasets. We propose a simple method to extract class association rules by simple pruning to form an accurate classifier. This is a preliminary study that aims to show that an adequate choice of the “right” class association rules, by considering the distribution of values of the dependent (class) attribute, can produce a compact, understandable and relatively accurate classifier. We have performed experiments on 12 datasets from the UCI Machine Learning Repository and compared the results with well-known rule-based and tree-based classification algorithms. Experimental results show that our method is consistent and competitive with other well-known classification algorithms. Although not achieving the best results in terms of classification accuracy, our method is relatively simple and produces compact and understandable classifiers by exhaustively searching the entire example space.

Jamolbek Mattiev, Branko Kavšek

Analyses of Multi-collection Corpora via Compound Topic Modeling

Popular probabilistic topic models have typically centered on a single text collection, which is deficient for comparative text analyses. We consider a setting with partitionable corpora: each subcollection shares a single set of topics, but there is relative variation in topic proportions among collections. We propose the compound latent Dirichlet allocation (cLDA) model, which encourages generalizability, depends less on user-input parameters, and incorporates any prior knowledge of the corpus organization structure. For parameter estimation, we study Markov chain Monte Carlo (MCMC) and variational inference approaches extensively and suggest an efficient MCMC method. We evaluate cLDA using both synthetic and real-world corpora, and cLDA shows superior performance over the state-of-the-art models.

Clint P. George, Wei Xia, George Michailidis

Text Mining with Constrained Tensor Decomposition

Text mining, as a special case of data mining, refers to the estimation of knowledge or parameters necessary for certain purposes, such as unsupervised clustering, by observing various documents. In this context, the topic of a document can be seen as a hidden variable, and words are multi-view variables related to each other by a topic. The main goal in this paper is to estimate the probability of topics and the conditional probability of words given topics. To this end, we use a nonnegative Canonical Polyadic (CP) decomposition of a third-order moment tensor of observed words. Our computer simulations show that the proposed algorithm has better performance than a previously proposed algorithm, which utilizes the robust tensor power method after whitening by the second-order moment. Moreover, as our cost function includes a nonnegativity constraint on the estimated probabilities, we never obtain negative values in our estimated probabilities, whereas this is often the case with the power method combined with deflation. In addition, our algorithm is capable of handling over-complete cases, where the number of hidden variables is larger than that of multi-view variables, contrary to deflation-based techniques. Further, the proposed method supports a larger over-completeness compared to modified versions of the tensor power method customized to handle the over-complete case.

Elaheh Sobhani, Pierre Comon, Christian Jutten, Massoud Babaie-Zadeh

The Induction Problem: A Machine Learning Vindication Argument

The problem of induction is a central problem in the philosophy of science and concerns whether it is sound or not to extract laws from observational data. Nowadays, this issue is more relevant than ever, given the pervasive and growing role of the data discovery process in all sciences. If, on one hand, induction is routinely employed by automatic machine learning techniques, on the other, most of the philosophical work criticises induction as if an alternative could exist. But is there indeed a reliable alternative to induction? Is it possible to discover or predict something in a non-inductive manner? This paper formalises the question on the basis of statistical notions (bias, variance, mean squared error) borrowed from estimation theory and statistical machine learning. The result is a justification of induction as rational behaviour. In a decision-making process, a behaviour is rational if it is based on making choices that result in the optimal level of benefit or utility. If we measure utility in a prediction context in terms of expected accuracy, it follows that induction is the rational way of conduct.

Gianluca Bontempi
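The statistical notions the argument builds on can be checked numerically: the mean squared error of an estimator decomposes exactly as bias squared plus variance. A minimal sketch (the shrinkage estimator is an arbitrary illustration, not taken from the paper):

```python
import numpy as np

# MSE = bias^2 + variance, checked by Monte Carlo for a deliberately
# biased estimator: the shrunken sample mean 0.9 * mean(x), x ~ N(mu, 1).
rng = np.random.default_rng(0)
mu, n, reps = 2.0, 10, 100_000
samples = rng.normal(mu, 1.0, size=(reps, n))
est = 0.9 * samples.mean(axis=1)

mse = np.mean((est - mu) ** 2)      # expected squared error
bias2 = (est.mean() - mu) ** 2      # (E[est] - mu)^2, about (0.2)^2 = 0.04
var = est.var()                     # about 0.81 * (1/10) = 0.081
```

The identity mse = bias2 + var holds exactly (up to floating-point error) because both sides expand to the same sum; trading a little bias for less variance is what makes many inductive procedures win on expected accuracy.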

Geospatial Dimension in Association Rule Mining: The Case Study of the Amazon Charcoal Tree

Despite the potential of association rules to extract knowledge from datasets, the selection of valuable rules has been a big challenge for specialists because of the great number of rules generated by association rule algorithms. Unfortunately, in Species Distribution Modeling, researchers and analysts alike have been discouraged about the efficacy and relative merits of such studies due to conflicting results. In this case study, we integrate geoprocessing techniques with association rules to analyze the potential distribution of the charcoal tree species of the Amazon, Tachigali Subvelutina. This integration allowed the exploration of a new method to select valuable association rules based not only on frequent-itemset measures such as confidence, support and lift, but also on the geospatial distribution of the species occurrences. Adding a geospatial dimension to the frequent-itemset measures provides a more effective understanding and an adequate amount of relevance to represent significant association rules. Further, this new approach can support experts to: identify effective rules, i.e., rules that can be effectively used with regression techniques to predict species in geospatial environments; identify variables that play a decisive role in discovering the occurrence of species; and evaluate the effectiveness of the datasets used for the study to answer the specified scientific questions.

David N. Prata, Marcelo L. Rocha, Leandro O. Ferreira, Rogério Nogueira

On Probabilistic k-Richness of the k-Means Algorithms

With Kleinberg’s axiomatic system for clustering, a discussion was initiated about what kind of properties clustering algorithms should and should not have. As Ackerman et al. pointed out, the static properties studied by Kleinberg and others are not appropriate for clustering algorithms with elements of randomness. Therefore they introduced the property of probabilistic k-richness and claimed, without proof, that the versions of k-means with random initialisation and with k-means++ initialisation both have this property. We prove that k-means++ has the property of probabilistic k-richness, while k-means with random initialisation for well-separated clusters does not. To characterize the latter, we introduce the notion of weak probabilistic k-richness and prove it for this algorithm. For completeness, we provide a constructive proof that the theoretical k-means has the (deterministic) k-richness property.

Robert A. Kłopotek, Mieczysław A. Kłopotek

Using Clustering for Supervised Feature Selection to Detect Relevant Features

In many applications in machine learning, large quantities of features and information are available, but these can be of low quality. A novel filter method for feature selection for classification, termed COLD, is presented that uses class-wise clustering to reduce the dimensionality of the data. The idea behind this approach is that if a relevant feature were removed from the set of features, the separation of clusters belonging to different classes would deteriorate. Four artificial examples and two real-world data sets are presented on which COLD is compared with several popular filter methods. For the artificial examples, only COLD is capable of consistently ranking the features according to their contribution to the separation of the classes. For the real-world Dermatology and Arrhythmia datasets, COLD demonstrates the ability to remove a large number of features and improve the classification accuracy or, at a minimum, not degrade the performance considerably.

Christoph Lohrmann, Pasi Luukka
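The leave-one-feature-out idea can be sketched in a strongly simplified form: use one centroid per class in place of class-wise clustering, and rank each feature by how much its removal shrinks the mean distance between class centroids. A sketch, not the COLD method itself:

```python
import numpy as np

def cold_like_ranking(X, y):
    """Rank features (most relevant first) by the drop in class-centroid
    separation when each feature is removed. One-centroid-per-class
    simplification of class-wise clustering."""
    classes = np.unique(y)
    cents = np.stack([X[y == c].mean(axis=0) for c in classes])

    def sep(C):
        d = [np.linalg.norm(C[i] - C[j])
             for i in range(len(C)) for j in range(i + 1, len(C))]
        return float(np.mean(d))

    full = sep(cents)
    drops = []
    for f in range(X.shape[1]):
        keep = [g for g in range(X.shape[1]) if g != f]
        drops.append(full - sep(cents[:, keep]))
    return np.argsort(drops)[::-1]
```

A feature whose removal barely changes the separation is a candidate for elimination; the full method replaces the single centroid with per-class clusters to handle multimodal classes.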

A Structural Theorem for Center-Based Clustering in High-Dimensional Euclidean Space

We prove that, for any finite set X in Euclidean space of any dimension and for any fixed $$\varepsilon >0$$, there exists a polynomial-cardinality set of points in this space which can be constructed in polynomial time and which contains $$(1+\varepsilon )$$-approximations of all the points of the space in terms of the distances to every element of X. The proved statement makes it possible to approximate a number of clustering problems which can be reduced to finding optimal cluster centers in high-dimensional space. In fact, we construct a polynomial-time approximation-preserving reduction of such problems to their discrete versions, in which the desired centers must be selected from a given finite set.

Vladimir Shenmaier

Modification of the k-MXT Algorithm and Its Application to the Geotagged Data Clustering

The paper considers the problem of detecting the most attractive city sights using datasets of geotagged photographs. We form a graph on the basis of the geotagged spot coordinates and recast the problem as one of graph clustering. In this paper, we propose a modification of the k-MXT algorithm, which we call the k-MXT-W algorithm and which uses window functions. We compare the proposed algorithm with the k-Means and k-MXT algorithms on simulated data using ARI, one of the most common metrics for assessing clustering quality. We also use the k-MXT-W algorithm to find the most popular places in St. Petersburg (Russia) and compare the performance of the proposed algorithm with the k-MXT algorithm on real-world data using the modularity metric, which does not require knowledge of the true clustering.

Anastasia Stepanova, Sergei V. Mironov, Sergei Sidorov, Alexey Faizliev

CoPASample: A Heuristics Based Covariance Preserving Data Augmentation

An efficient data augmentation algorithm generates samples that improve the accuracy and robustness of training models. Augmentation with informative samples imparts meaning to the augmented data set. In this paper, we propose CoPASample (Covariance Preserving Algorithm for generating Samples), a data augmentation algorithm that generates samples reflecting the first- and second-order statistical information of the data set, thereby augmenting the data set in a manner that preserves its total covariance. To address the issue of exponential computations in the generation of points for augmentation, we formulate an optimisation problem, motivated by the approach used in $$\nu $$-SVR, to iteratively compute a heuristics-based optimal set of points for augmentation in polynomial time. Experimental results for several data sets and comparisons with other data augmentation algorithms validate the potential of our proposed algorithm.

Rishabh Agrawal, Paridhi Kothari
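That augmentation can preserve first- and second-order statistics exactly is easy to see with a much simpler scheme than CoPASample's ν-SVR-based selection: reflecting every point through the sample mean leaves both the mean and the (population) covariance of the augmented set unchanged. A minimal sketch of this baseline:

```python
import numpy as np

def reflect_augment(X):
    """Double the data set by reflecting each point through the mean.
    The augmented set has the same mean and the same population covariance
    as the original, since (2*mu - x) - mu = -(x - mu)."""
    mu = X.mean(axis=0)
    return np.vstack([X, 2 * mu - X])
```

Reflected points are not informative in the sense the paper targets (they add no new directions of variation), which is why CoPASample instead optimises which points to generate while keeping the same covariance-preservation property.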

Active Matrix Completion for Algorithm Selection

The present work accommodates active matrix completion to generate cheap and informative incomplete algorithm selection datasets. Algorithm selection is used to detect the best possible algorithm(s) for a given problem ($$\sim $$ instance). Although its success has been shown in varying problem domains, the performance of an algorithm selection technique heavily depends on the quality of the existing dataset. One critical, and likely the most expensive, part of an algorithm selection dataset is its performance data. Performance data involves the performance of a group of algorithms on a set of instances of a particular problem. Thus, matrix completion [1] has been studied to be able to perform algorithm selection when the performance data is incomplete. The focus of this study is to come up with a strategy to generate/sample low-cost, incomplete performance data that can lead to effective completion results. For this purpose, a number of matrix completion methods are utilized in the form of active matrix completion. The empirical analysis carried out on a set of algorithm selection datasets revealed significant gains in the computation time required to produce the relevant performance data.

Mustafa Mısır
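A minimal illustration of the matrix completion step on which such approaches build — plain iterative SVD imputation is assumed here as a stand-in for the completion methods the paper evaluates:

```python
import numpy as np

def complete_matrix(M, mask, rank=1, iters=100):
    """Iterative SVD imputation: fill missing entries with the mean of
    observed ones, project onto a rank-`rank` approximation, restore
    the observed entries, and repeat. `mask` is True where observed."""
    X = np.where(mask, M, M[mask].mean())
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X = np.where(mask, M, low_rank)
    return X
```

On performance data that is approximately low-rank, the recovered entries estimate the unmeasured algorithm/instance performances.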

A Framework for Multi-fidelity Modeling in Global Optimization Approaches

Optimization of complex systems often involves running a detailed simulation model that requires large computational time per function evaluation. Many methods have been researched to use a few detailed, high-fidelity function evaluations to construct a low-fidelity model, or surrogate, including Kriging, Gaussian processes, response surface approximation, and meta-modeling. We present a framework for global optimization of a high-fidelity model that takes advantage of low-fidelity models by iteratively evaluating the low-fidelity model and providing a mechanism to decide when and where to evaluate the high-fidelity model. This is achieved by sequentially refining the prediction of the computationally expensive high-fidelity model based on observed values in both high and low fidelity. The proposed multi-fidelity algorithm combines Probabilistic Branch and Bound, which uses a partitioning scheme to estimate subregions with near-optimal performance, with Gaussian processes, which provide predictive capability for the high-fidelity function. The output of the multi-fidelity algorithm is a set of subregions that approximates a target level set of best solutions in the feasible region. We present the algorithm for the first time, together with an analysis that characterizes its finite-time performance in terms of incorrect elimination of subregions of the solution space.

Zelda B. Zabinsky, Giulia Pedrielli, Hao Huang
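The predictive capability that Gaussian processes contribute in such frameworks can be sketched with textbook GP regression; an RBF kernel, zero mean, and near-noiseless observations are assumptions of this illustration, not details from the paper:

```python
import numpy as np

def rbf(A, B, ls=1.0):
    """Squared-exponential kernel matrix between row vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_predict(Xtr, ytr, Xte, ls=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP with an RBF
    kernel, conditioned on the training data (Xtr, ytr)."""
    K = rbf(Xtr, Xtr, ls) + noise * np.eye(len(Xtr))
    Ks = rbf(Xte, Xtr, ls)
    mean = Ks @ np.linalg.solve(K, ytr)
    # prior variance is 1 for the RBF kernel; subtract explained part
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, var
```

The posterior variance is what lets an algorithm decide where a high-fidelity evaluation is most informative.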

Performance Evaluation of Local Surrogate Models in Bilevel Optimization

Bilevel problems (BLPs) involve solving two nested levels of optimization, namely the upper (leader) and the lower (follower) level, usually motivated by real-world situations involving a hierarchical structure. BLPs are known to be hard and computationally expensive. When the computation of the objective functions and/or constraints require an expensive computer simulation, as in “black-box” optimization, evolutionary algorithms (EAs) are often used. As EAs may become very expensive by requiring a large number of function evaluations, surrogate models can help overcome this drawback, either by replacing expensive evaluations or allowing for increased exploration of the search space. Here we apply different types of local surrogate models at the upper level optimization, in order to enhance the overall performance of the proposed method, which is studied by means of computational experiments conducted on the well-known SMD benchmark problems.

Jaqueline S. Angelo, Eduardo Krempser, Helio J. C. Barbosa

BowTie - A Deep Learning Feedforward Neural Network for Sentiment Analysis

How to model and encode the semantics of human-written text and select the type of neural network to process it are not settled issues in sentiment analysis. Accuracy and transferability are critical issues in machine learning in general. These properties are closely related to the loss estimates for the trained model. I present a computationally efficient and accurate feedforward neural network for sentiment prediction capable of maintaining low losses. When coupled with an effective semantics model of the text, it provides highly accurate models with low losses. Experimental results on representative benchmark datasets and comparisons to other methods (DISCLAIMER: This paper is not subject to copyright in the United States. Commercial products are identified in order to adequately specify certain procedures. In no case does such identification imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the identified products are necessarily the best available for the purpose.) show the advantages of the new approach.

Apostol Vassilev

To What Extent Can Text Classification Help with Making Inferences About Students’ Understanding

In this paper we apply supervised machine learning algorithms to automatically classify the text of students’ reflective learning journals from an introductory Java programming module, with the aim of identifying students who need help with their understanding of the topic they are reflecting on. Such a system could alert teaching staff to students who may need an intervention to support their learning. Several different classifier algorithms have been validated on the training data set to find the best model in two situations: with equal cost for a positive or negative classification, and with cost-sensitive classification. Methods were used to identify the individual parameters which maximise the performance of each algorithm. Precision, recall and F1-score, as well as confusion matrices, were used to understand the behaviour of each classifier and choose the one with the best performance. The classifiers that obtained the best results in validation were then evaluated on a testing data set containing different data to that used for training. We believe that although the results could be improved with further work, our initial results show that machine learning could be applied to students’ reflective writing to assist staff in identifying those students who are struggling to understand the topic.

A. J. Beaumont, T. Al-Shaghdari
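The evaluation metrics used above follow directly from the confusion-matrix counts; a minimal sketch for the binary case:

```python
def prf1(y_true, y_pred):
    """Precision, recall and F1-score for binary labels, computed
    from true-positive, false-positive and false-negative counts."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

With imbalanced classes such as "students needing help", these metrics are far more informative than plain accuracy.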

Combinatorial Learning in Traffic Management

We describe an exact combinatorial learning approach to solve dynamic job-shop scheduling problems arising in traffic management. When a set of vehicles has to be controlled in real-time, a new schedule must be computed whenever a deviation from the current plan is detected, or periodically after a short amount of time. This suggests that every two (or more) consecutive instances will be very similar. We exploit a recently introduced MILP formulation for job-shop scheduling (called path&cycle) to develop an effective solution algorithm based on delayed row generation. In our re-optimization framework, the algorithm maintains a pool of combinatorial cuts separated during the solution of previous instances, and adapts them to warm start each new instance. In our experiments, this adaptive approach led to a 4-fold average speedup over the static approach (where each instance is solved independently) for a critical application in air traffic management.

Giorgio Sartor, Carlo Mannino, Lukas Bach

Cartesian Genetic Programming with Guided and Single Active Mutations for Designing Combinational Logic Circuits

The design of digital circuits using Cartesian Genetic Programming (CGP) has been widely investigated, but the evolution of complex combinational logic circuits is a hard task for CGP. We introduce here a new mutation operator for CGP that aims to reduce the number of evaluations needed to find a feasible solution by modifying the subgraph of the worst output of the candidate circuits. Also, we propose a variant of the standard evolutionary strategy commonly adopted in CGP, where (i) the Single Active Mutation (SAM) and (ii) the proposed mutation operator are used in order to improve the capacity of CGP to generate feasible circuits. The proposals are applied to a benchmark of combinational logic circuits with multiple outputs, and the results obtained are compared to those found by a CGP with SAM. The main advantages observed when both mutation operators are combined are the reduction of the number of objective function evaluations required to find a feasible solution and the improvement in the success rate.

José Eduardo H. da Silva, Lucas A. M. de Souza, Heder S. Bernardino

Designing an Optimal and Resilient iBGP Overlay with Extended ORRTD

The Internet results from interconnecting several thousands of Autonomous Systems (ASes), which are networks under a single administrative domain, such as corporations, service providers, universities, and content providers, among others. To ensure global communication between end users, routers of every AS must learn all IP addresses in this immense and extremely decentralized network. The Border Gateway Protocol (BGP) is responsible for learning and distributing that reachability information among ASes in the form of groups of addresses (a.k.a. prefixes). Unlike other routing protocols, BGP routers communicate through administratively set point-to-point BGP sessions over TCP. BGP sessions are either external (eBGP, between routers of different ASes, a.k.a. Border Routers or ASBRs) or internal (iBGP, between routers within the same AS). While eBGP is needed to exchange reachability information among ASes, iBGP makes it possible for internal routers to learn the prefixes necessary to forward IP packets to the appropriate ASBRs. To make sure that the whole information is learnt and no traffic deflection occurs, a full mesh of iBGP sessions among routers within each AS can be used, which causes scalability issues. Route Reflection (RR) is a mechanism to improve performance, but designing a correct, reliable and consistent iBGP overlay of sessions with RRs is a delicate, far from easy task for AS engineers, even though several popular heuristics are common practice. In previous works we proposed combinatorial optimization models to design consistent and resilient iBGP overlays, when only non-Border Routers are eligible as RRs. The present work extends previous models to allow any router (including Border Routers) to be a Route Reflector, which better matches some application contexts.

Cristina Mayr, Claudio Risso, Eduardo Grampín

GRASP Heuristics for the Stochastic Weighted Graph Fragmentation Problem

Vulnerability metrics play a key role nowadays in coping with natural disasters and deliberate attacks. The scientific literature is vast, sometimes focused on connectivity aspects in networks or cascading failures, among others. The Graph Fragmentation Problem (GFP) has its origin in epidemiology, and it represents the worst-case analysis of a random attack. The contributions of this paper are three-fold. First, a valuable interplay between the GFP and a connectivity-based metric, the Critical Node Detection (CND) problem, is explored. Second, a novel combinatorial optimization problem is introduced, called the Stochastic Weighted Graph Fragmentation Problem (SWGFP), which generalizes the GFP. Node importance is included in the model, as well as a level of network awareness from the attacker’s viewpoint. Finally, a novel GRASP (Greedy Randomized Adaptive Search Procedure) heuristic is proposed for the SWGFP. A faithful comparison with a previous heuristic shows its competitiveness.

Nicole Rosenstock, Juan Piccini, Guillermo Rela, Franco Robledo, Pablo Romero
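A generic GRASP skeleton (greedy randomized construction via a restricted candidate list, followed by local search) is illustrated below on a toy 0/1 knapsack; the SWGFP-specific construction and neighbourhoods of the paper are not reproduced here:

```python
import random

def grasp_knapsack(values, weights, capacity, iters=50, alpha=0.3, seed=1):
    """GRASP skeleton on a toy 0/1 knapsack: at each iteration, build a
    solution greedily but randomized over an RCL of high-density items,
    then improve it with simple 1-1 swaps; keep the best overall."""
    rng = random.Random(seed)
    n = len(values)
    best_set, best_val = set(), 0
    for _ in range(iters):
        # --- construction: restricted candidate list on value density ---
        sol, load = set(), 0
        cand = [i for i in range(n) if weights[i] <= capacity]
        while cand:
            dens = [values[i] / weights[i] for i in cand]
            lo, hi = min(dens), max(dens)
            rcl = [i for i in cand
                   if values[i] / weights[i] >= hi - alpha * (hi - lo)]
            i = rng.choice(rcl)
            sol.add(i)
            load += weights[i]
            cand = [j for j in cand if j not in sol
                    and load + weights[j] <= capacity]
        # --- local search: replace one item by a better feasible one ---
        improved = True
        while improved:
            improved = False
            for i in list(sol):
                for j in range(n):
                    if (j not in sol
                            and load - weights[i] + weights[j] <= capacity
                            and values[j] > values[i]):
                        sol.remove(i)
                        sol.add(j)
                        load += weights[j] - weights[i]
                        improved = True
                        break
                if improved:
                    break
        val = sum(values[i] for i in sol)
        if val > best_val:
            best_set, best_val = set(sol), val
    return best_set, best_val
```

The `alpha` parameter controls the greediness/randomness trade-off of the construction phase (0 is purely greedy, 1 is purely random).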

Uniformly Most-Reliable Graphs and Antiholes

In network design, all-terminal reliability maximization is of paramount importance. In this classical setting, we assume a simple graph with perfect nodes but independent link failures with identical probability $$\rho $$. The goal is to communicate p terminals using q links in such a way that the connectedness probability of the resulting random graph is maximum. A graph with p nodes and q links that meets the maximum reliability property is called a uniformly most-reliable (p, q)-graph (UMRG). The discovery of these graphs is a challenging problem that involves an interplay between extremal graph theory and computational optimization. Recent works confirm the existence of special cubic UMRGs, such as the Wagner, Petersen and Yutsis graphs. To the best of our knowledge, there is no prior work in the literature finding a 4-regular UMRG. In this paper, we revisit the breakthroughs in the theory of UMRGs. Finally, we mathematically prove that the complement of a cycle with seven nodes, $$\overline{C_7}$$, is a 4-regular UMRG. This graph is also identified as an odd antihole in the terminology of perfect graph theory.

Guillermo Rela, Franco Robledo, Pablo Romero
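All-terminal reliability, the quantity a UMRG maximizes, can be computed exactly for small graphs by enumerating the subsets of working links (here `p` denotes the operational probability of a link, i.e. $$1-\rho $$):

```python
from itertools import combinations

def all_terminal_reliability(n, edges, p):
    """Probability that the random subgraph on nodes 0..n-1 is
    connected, when each edge works independently with probability p.
    Brute force over all 2^|edges| subsets, so only for small graphs."""
    def connected(edge_subset):
        adj = {v: set() for v in range(n)}
        for u, v in edge_subset:
            adj[u].add(v)
            adj[v].add(u)
        seen, stack = {0}, [0]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return len(seen) == n

    m = len(edges)
    rel = 0.0
    for k in range(m + 1):
        for sub in combinations(edges, k):
            if connected(sub):
                rel += p ** k * (1 - p) ** (m - k)
    return rel
```

For the triangle with p = 0.5, any two (or all three) working edges keep the graph connected, giving a reliability of exactly 0.5.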

Merging Quality Estimation for Binary Decision Diagrams with Binary Classifiers

Relaxed binary decision diagrams (BDDs) are used in combinatorial optimization as a compact representation of a relaxed solution space. They are directed acyclic multigraphs derived from the state space of a recursive dynamic programming formulation of the considered optimization problem. The compactness of a relaxed BDD is achieved by superimposing states, which corresponds to merging BDD nodes in the classical layer-wise top-down BDD construction. Selecting which nodes to merge crucially determines the quality of the resulting BDD and is the task of a merging heuristic, for which the minimum longest path value (minLP) heuristic has turned out to be highly effective for a number of problems. This heuristic sorts the nodes in a layer by decreasing current longest path value and merges the necessary number of worst-ranked nodes into one. Other merging heuristics are, however, also available, and it is usually not easy to decide which one is more promising in which situation. In this work we propose a prediction mechanism to evaluate a set of different merging mechanisms at each layer during the construction of a relaxed BDD, in order to always select and apply the most promising heuristic. This prediction is implemented by either a perfect or a k-layers lookahead construction of the BDD, gathering feature vectors for two competing merging heuristics which are then fed into a binary classifier. Models based on statistical tests and a feed-forward neural network are considered for the classifier. We study this approach for the maximum weighted independent set problem, in conjunction with a parameterized merging heuristic that also takes the similarity between states into account. We train and validate the binary classifiers on random graphs and finally test on weighted DIMACS instances. Results indicate that relaxed BDDs can be obtained whose upper bounds are on average up to $$\approx $$16% better than those of BDDs constructed with the sole use of minLP.

Nikolaus Frohner, Günther R. Raidl
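The minLP heuristic described above can be sketched as follows; the state-merging rule (e.g. set union for independent-set relaxations) is passed in as a parameter, which is an assumption of this illustration:

```python
def minlp_merge(layer, max_width, merge_states):
    """minLP merging for relaxed BDD construction: keep the nodes with
    the largest current longest-path values and merge the surplus
    worst-ranked nodes into one relaxed node. `layer` is a list of
    (lp_value, state) pairs; `merge_states` combines two states."""
    if len(layer) <= max_width:
        return layer
    ranked = sorted(layer, key=lambda node: node[0], reverse=True)
    keep, surplus = ranked[:max_width - 1], ranked[max_width - 1:]
    merged_lp = max(lp for lp, _ in surplus)   # keep a valid upper bound
    merged_state = surplus[0][1]
    for _, s in surplus[1:]:
        merged_state = merge_states(merged_state, s)
    return keep + [(merged_lp, merged_state)]
```

Merging relaxes the diagram: the merged node admits all paths of its constituents, which is why the resulting BDD yields an upper bound for a maximization problem.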

Directed Acyclic Graph Reconstruction Leveraging Prior Partial Ordering Information

Reconstructing directed acyclic graphs (DAGs) from observed data constitutes an important machine learning task, with applications in systems biology and functional genomics. However, it is a challenging learning problem, due to the need to consider all possible orderings of the network nodes and score the resulting network structures based on the available data; the computational complexity of enumerating all possible orderings is exponential in the size of the network. A large number of methods based on various modeling formalisms have been developed to address this problem, primarily focusing on fast algorithms that reduce computation time. In many instances, partial topological information may be available for subsets of the nodes; for example, in biology one has information about transcription factors that regulate (precede) other genes, or such information can be obtained from perturbation/silencing experiments for subsets of DAG nodes (genes). We develop a framework for estimating DAGs from observational data under the assumption that the nodes are partitioned into sets and a complete topological ordering exists amongst them. The proposed approach combines (i) (penalized) regression to estimate edges between nodes across different sets, with (ii) the popular PC-algorithm that identifies the skeleton of the graph within each set. In the final step, we combine the results from the previous two steps to screen out redundant edges. We illustrate the performance of the proposed approach on topologies extracted from the DREAM3 competition. The numerical results showcase the usefulness of the additional partial topological ordering information for this challenging learning problem.

Pei-Li Wang, George Michailidis

Learning Scale and Shift-Invariant Dictionary for Sparse Representation

Sparse representation is a signal model to represent signals with a linear combination of a small number of prototype signals called atoms, and a set of atoms is called a dictionary. The design of the dictionary is a fundamental problem for sparse representation. However, when there are scaled or translated features in the signals, unstructured dictionary models cannot extract such features. In this paper, we propose a structured dictionary model which is scale and shift-invariant to extract features which commonly appear in several scales and locations. To achieve both scale and shift invariance, we assume that atoms of a dictionary are generated from vectors called ancestral atoms by scaling and shift operations, and an algorithm to learn these ancestral atoms is proposed.

Toshimitsu Aritake, Noboru Murata

Robust Kernelized Bayesian Matrix Factorization for Video Background/Foreground Separation

Development of effective and efficient techniques for video analysis is an important research area in machine learning and computer vision. Matrix factorization (MF) is a powerful tool to perform such tasks. In this contribution, we present a hierarchical robust kernelized Bayesian matrix factorization (RKBMF) model to decompose a data set into low rank and sparse components. The RKBMF model automatically infers the parameters and latent variables including the reduced rank using variational Bayesian inference. Moreover, the model integrates the side information of similarity between frames to improve information extraction from the video. We employ RKBMF to extract background and foreground information from a traffic video. Experimental results demonstrate that RKBMF outperforms state-of-the-art approaches for background/foreground separation, particularly where the video is contaminated.

Hong-Bo Xie, Caoyuan Li, Richard Yi Da Xu, Kerrie Mengersen

Parameter Optimization of Polynomial Kernel SVM from miniCV

The polynomial kernel support vector machine (SVM) is one of the most computationally efficient kernel-based SVMs. Implementing it with an iterative optimization method, sequential minimal optimization (SMO), makes it more hardware-independent. However, the test accuracy is sensitive to the values of the hyperparameters. Moreover, the polynomial kernel SVM has four hyperparameters, which complicates cross-validation in parameter optimization. In this research, we transform polynomial kernels to have bounded values and analyze the relations between the hyperparameters and the test error rate. Based on our findings, we propose mini core validation (miniCV) to quickly screen out an optimized hyperparameter combination, especially for large datasets. The proposed miniCV is a parameter optimization approach built entirely on the distribution of the data generated via the iterative SMO training process. Since miniCV depends directly on the kernel matrix, it avoids cross-validation when optimizing hyperparameters in kernel-based SVMs.

Li-Chia Yeh, Chung-Chin Lu
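One standard way to transform a polynomial kernel so that its values are bounded is cosine-style normalization, $$\tilde{k}(x,y)=k(x,y)/\sqrt{k(x,x)\,k(y,y)}$$, which keeps values in [-1, 1] for any positive semidefinite kernel. This is shown for illustration; the paper's exact transformation may differ:

```python
import numpy as np

def poly_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    """Standard polynomial kernel (gamma * <x, y> + coef0) ** degree."""
    return (gamma * np.dot(x, y) + coef0) ** degree

def normalized_poly_kernel(x, y, **kw):
    """Cosine-style normalization k(x,y)/sqrt(k(x,x) k(y,y)); by
    Cauchy-Schwarz in feature space the result lies in [-1, 1]."""
    return poly_kernel(x, y, **kw) / np.sqrt(
        poly_kernel(x, x, **kw) * poly_kernel(y, y, **kw))
```

Bounding the kernel values removes the scale interaction between `gamma`, `coef0` and `degree`, which simplifies reasoning about the remaining hyperparameters.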

Analysing the Overfit of the Auto-sklearn Automated Machine Learning Tool

With the ever-increasing number of pre-processing and classification algorithms, manually selecting the best algorithm and their best hyper-parameter settings (i.e. the best classification workflow) is a daunting task. Automated Machine Learning (Auto-ML) methods have been recently proposed to tackle this issue. Auto-ML tools aim to automatically choose the best classification workflow for a given dataset. In this work we analyse the predictive accuracy and overfit of the state-of-the-art auto-sklearn tool, which iteratively builds a classification ensemble optimised for the user’s dataset. This work has 3 contributions. First, we measure 3 types of auto-sklearn’s overfit, involving the differences of predictive accuracies measured on different data subsets: two parts of the training set (for learning and internal validation of the model) and the hold-out test set used for final evaluation. Second, we analyse the distribution of types of classification models selected by auto-sklearn across all 17 datasets. Third, we measure correlations between predictive accuracies on different data subsets and different types of overfitting. Overall, substantial degrees of overfitting were found in several datasets, and decision tree ensembles were the most frequently selected types of models.

Fabio Fabris, Alex A. Freitas

A New Baseline for Automated Hyper-Parameter Optimization

Finding the optimal hyper-parameters values for a given problem is essential for most machine learning algorithms. In this paper, we propose a novel hyper-parameter optimization algorithm that is very simple to implement and still competitive with the state-of-the-art L-SHADE variant of Differential Evolution. While the most common method for hyper-parameter optimization is a combination of grid and manual search, random search has recently shown itself to be more effective and has been proposed as a baseline against which to measure other methods. In this paper, we compare three optimization algorithms, namely, the state-of-the-art L-SHADE algorithm, the random search algorithm, and our novel and simple adaptive random search algorithm. We find that our simple adaptive random search strategy is capable of finding parameters that achieve results comparable to the state-of-the-art L-SHADE algorithm, both of which achieve significantly better performance than random search when optimizing the hyper-parameters of the state-of-the-art XGBoost algorithm for 11 datasets. Because of the significant performance increase of our simple algorithm when compared to random search, we propose this as the new go-to method for tuning the hyper-parameters of machine learning algorithms when desiring a simple-to-implement algorithm, and also propose to use this algorithm as a new baseline against which other strategies should be measured.

Marius Geitle, Roland Olsson
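One possible reading of a simple adaptive random search — this sketch is an assumption for illustration, not the authors' exact algorithm — alternates uniform global samples with shrinking Gaussian steps around the incumbent best point:

```python
import random

def adaptive_random_search(f, bounds, iters=2000, seed=0):
    """Minimize f over a box. Half the samples are uniform (global
    exploration); the other half are Gaussian perturbations of the
    best point found so far, with a step size that shrinks over time."""
    rng = random.Random(seed)

    def uniform():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    best_x = uniform()
    best_y = f(best_x)
    for t in range(iters):
        if rng.random() < 0.5:
            x = uniform()                      # global exploration
        else:
            scale = 1.0 / (1 + t)              # shrinking local step
            x = [min(hi, max(lo, xi + rng.gauss(0.0, scale * (hi - lo))))
                 for xi, (lo, hi) in zip(best_x, bounds)]
        y = f(x)
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y
```

Pure random search corresponds to always taking the `uniform()` branch; the adaptive variant adds cheap local refinement around the incumbent.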

Optimal Trade-Off Between Sample Size and Precision of Supervision for the Fixed Effects Panel Data Model

We investigate a modification of the classical fixed effects panel data model (a linear regression model able to represent unobserved heterogeneity in the data), in which one has the additional possibility of controlling the conditional variance of the output given the input, by varying the cost associated with the supervision of each training example. Assuming an upper bound on the total supervision cost, we analyze and optimize the trade-off between the sample size and the precision of supervision (the reciprocal of the conditional variance of the output), by formulating and solving a suitable optimization problem, based on a large-sample approximation of the output of the classical algorithm used to estimate the parameters of the fixed effects panel data model. Considering a specific functional form for that precision, we prove that, depending on the “returns to scale” of the precision with respect to the supervision cost per example, in some cases “many but bad” examples provide a smaller generalization error than “few but good” ones, whereas in other cases the opposite occurs. The results extend to the fixed effects panel data model the ones we obtained in recent works for a simpler linear regression model. We conclude discussing possible applications of our results, and extensions of the proposed optimization framework to other panel data models.

Giorgio Gnecco, Federico Nutarelli

Restaurant Health Inspections and Crime Statistics Predict the Real Estate Market in New York City

Predictions of apartment prices in New York City (NYC) have always been of interest to new homeowners, investors, Wall Street fund managers, and inhabitants of the city. In recent years, average prices have risen to the highest ever recorded, rebounding after the 2008 economic recession. Although prices are trending up, not all apartments are. Different regions of the city have appreciated differently over time; knowing where to buy or sell is essential for all stakeholders. In this project, we propose a predictive analytics framework that analyzes new alternative data sources to extract predictive features of the NYC real estate market. Our experiments indicated that restaurant health inspection data and crime statistics can help predict apartment prices in NYC. The framework we introduce in this work uses an artificial recurrent neural network with Long Short-Term Memory (LSTM) units and incorporates the two latter predictive features to predict future prices of apartments. Empirical results show that feeding predictive features from (1) restaurant inspection data and (2) crime statistics to a neural network with LSTM units results in smaller errors than the traditional Autoregressive Integrated Moving Average (ARIMA) model, which is normally used for this type of regression. Predictive analytics based on non-linear models with features from alternative data sources can capture hidden relationships that linear models are not able to discover. The framework presented in this study has the potential to serve as a supplement to the traditional forecasting tools of real estate markets.

Rafael M. Moraes, Anasse Bari, Jiachen Zhu

Load Forecasting in District Heating Networks: Model Comparison on a Real-World Case Study

District Heating networks (DHNs) are promising technologies for heat distribution in residential and commercial buildings since they enable high efficiency and low emissions. Within the recently proposed paradigm of smart grids, DHNs have acquired intelligent tools able to enhance their efficiency. Among these tools are demand forecasting technologies that enable improved planning of heat production and power station maintenance. In this work we propose a comparative study of heat load forecasting methods on a real case study based on a dataset provided by an Italian utility company. We trained and tested three kinds of models, namely non-autoregressive, autoregressive and hybrid models, on the available dataset of heat load and meteorological variables. The optimal model, in terms of root mean squared error of prediction, was selected. It considers the day of the week, the hour of the day, some meteorological variables, past heat loads and social components, such as holidays. Results show that the selected model is able to achieve accurate 48-hour predictions of the heat load in several conditions (e.g., different days of the week, different times, holidays and workdays). Moreover, an analysis of the parameters of the selected models made it possible to identify a few informative variables.

Federico Bianchi, Alberto Castellini, Pietro Tarocco, Alessandro Farinelli

A Chained Neural Network Model for Photovoltaic Power Forecast

Photovoltaic (PV) power forecasting is an important task preceding the scheduling of dispatchable power plants for the day-ahead market. Commercially available methods rely on conventional meteorological data and parameters to produce reliable predictions, but their costs increase linearly with the number of plants. Recently, publicly available sources of free meteorological data have emerged which allow for forecasting models based on machine learning, albeit with heterogeneous data quality. We investigate a chained neural network model for PV power forecasting that takes varying data quality into account and follows the business requirement of frequently introducing new plants. This two-step model allows for easier integration of new plants in terms of manual effort and achieves high-quality forecasts comparable to those of raw forecasting models from meteorological data.

Carola Gajek, Alexander Schiendorfer, Wolfgang Reif

Trading-off Data Fit and Complexity in Training Gaussian Processes with Multiple Kernels

Gaussian processes (GPs) belong to a class of probabilistic techniques that have been successfully used in different domains of machine learning and optimization. They are popular because they provide uncertainties in predictions, which sets them apart from other modelling methods providing only point predictions. The uncertainty is particularly useful for decision making, as we can gauge how reliable a prediction is. One of the fundamental challenges in using GPs is that the efficacy of a model is conferred by selecting an appropriate kernel and the associated hyperparameter values for a given problem. Furthermore, the training of GPs, that is, optimizing the hyperparameters using a data set, is traditionally performed using a cost function that is a weighted sum of data fit and model complexity, and the underlying trade-off is completely ignored. Addressing these challenges and shortcomings, in this article we propose the following automated training scheme. Firstly, we use a weighted product of multiple kernels, with a view to relieving users from choosing an appropriate kernel for the problem at hand without any domain-specific knowledge. Secondly, for the first time, we modify GP training by using a multi-objective optimizer to tune the hyperparameters and weights of multiple kernels and extract an approximation of the complete trade-off front between data fit and model complexity. We then propose a novel solution selection strategy based on mean standardized log loss (MSLL) to select a solution from the estimated trade-off front and finalise training of a GP model. The results on three data sets, and the comparison with the standard approach, clearly show the potential benefit of the proposed approach of using multi-objective optimization with multiple kernels.

Tinkle Chugh, Alma Rahat, Pramudita Satria Palar
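The weighted product of kernels takes the form $$k(x,x')=\prod_i k_i(x,x')^{w_i}$$. For the stationary kernels sketched below, raising to a positive power only rescales the lengthscale, so the product remains a valid covariance; for general kernels this requires more care. The specific kernels and weights here are illustrative assumptions:

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Squared-exponential kernel between two vectors."""
    return np.exp(-0.5 * np.sum((a - b) ** 2) / ls ** 2)

def matern12(a, b, ls=1.0):
    """Matern-1/2 (exponential) kernel between two vectors."""
    return np.exp(-np.linalg.norm(a - b) / ls)

def product_kernel(a, b, kernels, weights):
    """Weighted product of kernels: prod_i k_i(a, b) ** w_i."""
    out = 1.0
    for k, w in zip(kernels, weights):
        out *= k(a, b) ** w
    return out
```

Since each factor equals 1 at a = b, the product kernel keeps a unit prior variance regardless of the weights.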

Designing Combinational Circuits Using a Multi-objective Cartesian Genetic Programming with Adaptive Population Size

This paper proposes a multiobjective Cartesian Genetic Programming with an adaptive population size to design approximate digital circuits via evolutionary algorithms, analyzing the trade-off between the most often used objectives: error, area, power dissipation, and delay. Combinational digital circuits such as adders, multipliers, and arithmetic logic units (ALUs) with up to 16 inputs and 370 logic gates are considered in the computational experiments. The proposed method was able to produce approximate circuits with good operational characteristics when compared with other methods from the literature.

Leandro S. Lima, Heder S. Bernardino, Helio J. C. Barbosa

Multi-task Learning by Pareto Optimality

Deep Neural Networks (DNNs) are often criticized because they lack the ability to learn more than one task at a time: Multitask Learning is an emerging research area whose aim is to overcome this issue. In this work, we introduce the Pareto Multitask Learning framework as a tool that can show how effectively a DNN is learning a shared representation common to a set of tasks. We also experimentally show that it is possible to extend the optimization process so that a single DNN simultaneously learns how to master two or more Atari games: using a single weight parameter vector, our network is able to obtain sub-optimal results for up to four games.

Deyan Dyankov, Salvatore Danilo Riccio, Giuseppe Di Fatta, Giuseppe Nicosia

Vital Prognosis of Patients in Intensive Care Units Using an Ensemble of Bayesian Classifiers

An Ensemble of Bayesian Classifiers (EBC) is constructed to perform vital prognosis of patients in Intensive Care Units (ICUs). The data are scarce and unbalanced: the size of the minority class (critically ill patients who die) is very small, and this fact prevents the use of accuracy as a measure of classification performance; instead we use the Area Under the Precision-Recall curve (AUPR). To address classification in this setting, we propose an ensemble constructed from five base Bayesian classifiers with the weighted majority vote rule, where the weights are defined from the AUPR. We compare this EBC model with the base Bayesian classifiers used to build it, as well as with the ensemble obtained using the simple majority vote criterion, and with some state-of-the-art supervised machine learning classifiers. Our results show that the EBC model outperforms most of the competing classifiers, being only slightly surpassed by Random Forest.

Rosario Delgado, J. David Núñez-González, J. Carlos Yébenes, Ángel Lavado
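The AUPR-weighted majority vote described in the abstract can be illustrated with a minimal sketch. The function, classifier count, and all numeric values below are hypothetical stand-ins, not taken from the paper: each base Bayesian classifier casts a class vote, and votes are weighted by that classifier's validation AUPR.

```python
import numpy as np

def weighted_majority_vote(votes, aupr_weights):
    """votes: predicted class per base classifier for one patient.
    aupr_weights: validation AUPR of each base classifier."""
    votes = np.asarray(votes)
    weights = np.asarray(aupr_weights, dtype=float)
    # Sum the AUPR weights of the classifiers voting for each class.
    scores = {c: weights[votes == c].sum() for c in np.unique(votes)}
    return max(scores, key=scores.get)

# Five base classifiers: three predict "survive" (0), two predict "die" (1),
# but the two minority-class votes come from the higher-AUPR classifiers.
pred = weighted_majority_vote([0, 0, 0, 1, 1], [0.30, 0.28, 0.25, 0.55, 0.50])
```

Here the two votes for class 1 carry total weight 1.05 versus 0.83 for class 0, so the weighted rule overrides the plain majority.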

On the Role of Hub and Orphan Genes in the Diagnosis of Breast Invasive Carcinoma

Network information is gaining importance in the generation of predictive models in cancer genomics, with the premise that prior biological knowledge offers the models interpretability and reproducibility, an invaluable contribution in precision medicine. This work evaluates the usefulness of accounting for gene network information provided by the data correlation structure and external STRING information in the classification of Breast Invasive Carcinoma (BRCA) RNA-Seq data from The Cancer Genome Atlas (TCGA) into tumor and normal tissue, by sparse logistic regression (SLR). Within the correlation-based approaches, two directions were investigated: first, imposing smaller penalties on hub genes, i.e., highly connected genes in the network (hubSLR); second, favouring the selection of orphan or weakly correlated genes (orphanSLR). Without loss of predictive ability, a considerable overlap between the genes selected by the methods was achieved, with fewer genes exclusively selected by each method. Besides a consensus list of genes, the complementarity offered by sets of genes exclusively selected by each model based on different network information shall be regarded as a means to enhance biological interpretability, drawing attention to genes with a known role in the network, either hubs, orphans or highly connected genes in protein-protein interaction networks. This represents a major advantage over non network-based methods, enabling the disclosure of the relevance of known gene subnetworks in the disease under study, while boosting biomarker discovery and precision medicine.

Marta B. Lopes, André Veríssimo, Eunice Carrasquinha, Susana Vinga
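The hub-penalty idea behind hubSLR can be sketched as a weighted-L1 logistic regression, implemented via the standard feature-rescaling trick (penalising w_j·|β_j| with an ordinary lasso on X/w is equivalent to a weighted lasso on X). The data, which features count as "hubs", and the penalty factors below are all synthetic and illustrative, not the paper's setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
# Synthetic tumor/normal labels driven by the first two genes.
y = (X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=n) > 0).astype(int)

# Hypothetical per-gene penalty factors: "hub" genes (here, the first
# five features) get a smaller penalty, favouring their selection.
w = np.ones(p)
w[:5] = 0.3

# Weighted L1 via rescaling: fit a plain lasso on X / w, map back by / w.
clf = LogisticRegression(penalty="l1", C=0.5, solver="liblinear").fit(X / w, y)
beta = clf.coef_.ravel() / w
selected = np.flatnonzero(beta != 0)
```

The smaller the factor w_j, the cheaper it is for the model to keep gene j in the support, which is the mechanism by which network information biases selection toward hubs.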

Approximating Probabilistic Constraints for Surgery Scheduling Using Neural Networks

The problem of generating surgery schedules is formulated as a mathematical model with probabilistic constraints. The presented approach uses modern machine learning to approximate the model's probabilistic constraints. The technique is inspired by models that use slacks in capacity planning. Essentially, a neural network is used to learn linear constraints that replace the probabilistic constraint. The data used to learn these constraints are verified and labeled using Monte Carlo simulations. The solutions iteratively discovered during the optimization procedure also produce new training data, and the neural network continues training on these data until the discovered solution is verified to be feasible. The stochastic surgery model studied is inspired by real challenges faced by many hospitals today and is tested on real-life data.

Thomas Philip Runarsson

Determining Principal Component Cardinality Through the Principle of Minimum Description Length

PCA (Principal Component Analysis) and its variants are ubiquitous techniques for matrix dimension reduction and reduced-dimension latent-factor extraction. One significant challenge in using PCA is the choice of the number of principal components. The information-theoretic MDL (Minimum Description Length) principle gives objective compression-based criteria for model selection, but it is difficult to analytically apply its modern definition, NML (Normalized Maximum Likelihood), to the problem of PCA. This work shows a general reduction of NML problems to lower-dimension problems. Applying this reduction, it bounds the NML of PCA in terms of the NML of linear regression, which is known.

Ami Tavory

Modelling Chaotic Time Series Using Recursive Deep Self-organising Neural Networks

This paper proposes a Machine Learning (ML) algorithm that exhibits markedly different behaviour when trained with chaotic input data, compared to non-chaotic data. It will be demonstrated that the output of the ML system is itself chaotic when it is trained on a chaotic input. The proposed algorithm is a deep network with both direct and recurrent connections between layers, as well as recurrent connections within each layer.

Erik Berglund

On Tree-Based Methods for Similarity Learning

In many situations, the choice of an adequate similarity measure or metric on the feature space dramatically determines the performance of machine learning methods. Building such measures automatically is the specific purpose of metric/similarity learning. In [21], similarity learning is formulated as a pairwise bipartite ranking problem: ideally, the larger the probability that two observations in the feature space belong to the same class (or share the same label), the higher the similarity measure between them. From this perspective, the ROC curve is an appropriate performance criterion, and it is the goal of this article to extend recursive tree-based ROC optimization techniques in order to propose efficient similarity learning algorithms. The validity of such iterative partitioning procedures in the pairwise setting is established by means of results pertaining to the theory of U-processes, and from a practical angle, it is discussed at length how to implement them by means of splitting rules specifically tailored to the similarity learning task. Beyond these theoretical/methodological contributions, numerical experiments are presented and provide strong empirical evidence of the performance of the algorithmic approaches we propose.

Stephan Clémençon, Robin Vogel

Active Learning Approach for Safe Process Parameter Tuning

The amount of sensor data in industry is increasing dramatically due to the digitization of plants and the Industrial Internet of Things (IIoT) evolution. This rapid technological revolution creates new opportunities in intelligent automation, but also challenges. As the complexity of manufacturing processes increases accordingly, it becomes increasingly difficult to understand how a process is affected by physical conditions, let alone to optimize process parameters with traditional methods. In order to deal with the latter problem, an active learning approach is presented that can be used during production processes and that guarantees a user-defined minimum performance during learning. Based on Gaussian processes, it also makes it possible to contribute already existing domain expertise to the learning process. We demonstrate that additional solid expertise prevents an increase in the risk of failure throughout the automated optimization process. With the proposed method, manufacturers are able to evaluate their expertise-based policy or even gain further knowledge about correlations from the performance improvements made. The approach is generic and suitable for applications in mass production or frequently recurring tasks. Since the test results are promising, the next step is to create initial industrial prototypes.

Stefano De Blasi
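The safe-tuning principle sketched in the abstract can be illustrated with a toy Gaussian process: a candidate parameter is only proposed for evaluation on the real process if the GP's lower confidence bound on performance stays above a user-defined minimum. This is a generic sketch, not the paper's algorithm; the kernel, length scale, confidence multiplier, threshold, and all observations are assumed for illustration.

```python
import numpy as np

def rbf(a, b, ls=0.2):
    # Squared-exponential kernel on 1-D inputs.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-4):
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    mu = Ks.T @ np.linalg.solve(K, y_train)
    var = np.clip(1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0), 0, None)
    return mu, np.sqrt(var)

# Observed (parameter, performance) pairs from earlier production runs.
x_obs = np.array([0.1, 0.3, 0.5])
y_obs = np.array([0.60, 0.72, 0.70])

candidates = np.linspace(0.0, 1.0, 101)
mu, sd = gp_posterior(x_obs, y_obs, candidates)
min_performance = 0.55                      # user-defined safety threshold
mask = mu - 2.0 * sd >= min_performance     # lower confidence bound test
safe = candidates[mask]
# Among safe candidates, pick the most promising upper confidence bound.
best = safe[np.argmax((mu + 2.0 * sd)[mask])]
```

Parameters far from any observation have high posterior uncertainty, so their lower bound falls below the threshold and they are excluded, which is what keeps exploration from violating the minimum-performance guarantee.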

Federated Learning of Deep Neural Decision Forests

Modern technical products have access to a huge amount of data, and by utilizing machine learning algorithms this data can be used to improve the usability and performance of the products. However, the data is likely to be large in quantity and privacy sensitive, which excludes the possibility of sending and storing all the data centrally and in turn makes it difficult to train global machine learning models on the combined data of different devices. A decentralized approach known as federated learning solves this problem by letting devices, or clients, update a global model using their own data and send only the changes to the global model, which means that they do not need to communicate privacy-sensitive data. Deep neural decision forests (DNDF), inspired by the versatile random forests algorithm, combine the divide-and-conquer principle with representation learning. In this paper we further develop the concept of DNDF to be better suited to the framework of federated learning. By parameterizing the probability distributions in the prediction nodes of the forest and including all trees of the forest in the loss function, a gradient of the whole forest can be computed, which several federated learning algorithms utilize. We demonstrate the inclusion of DNDF in federated learning in an empirical experiment with both homogeneous and heterogeneous data and baseline it against a convolutional neural network with the same architecture as the DNDF. Experimental results show that the modified DNDF, consisting of three to five decision trees, outperforms the baseline convolutional neural network.

Anders Sjöberg, Emil Gustavsson, Ashok Chaitanya Koppisetty, Mats Jirstrand
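The aggregation step that such a federated setup relies on can be sketched as federated averaging: each client sends back a locally updated parameter vector (for a DNDF, the decision-node weights and the parameterised leaf distributions), and the server averages them weighted by client data sizes. The client updates and sizes below are mocked placeholders, not results from the paper:

```python
import numpy as np

def fedavg(client_updates, client_sizes):
    """Server-side aggregation: data-size-weighted mean of client models."""
    sizes = np.asarray(client_sizes, dtype=float)
    weights = sizes / sizes.sum()
    return (weights[:, None] * np.stack(client_updates)).sum(axis=0)

# Two clients with locally updated parameter vectors (hypothetical values);
# client 1 holds three times as much data, so its update counts three times
# as much in the aggregate.
updates = [np.array([1.0, 0.0, 2.0, 0.0]), np.array([0.0, 1.0, 0.0, 2.0])]
global_params = fedavg(updates, client_sizes=[300, 100])
```

Because only parameter updates cross the network, the raw (privacy-sensitive) training data never leaves the clients.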

Conditional Anomaly Detection for Quality and Productivity Improvement of Electronics Manufacturing Systems

Today the integration of Artificial Intelligence (AI) solutions is part of the strategy in the industrial environment. We focus on anomaly detection in the framework of electronic card manufacturing under mass production conditions (24/7). Early anomaly detection is critical to avoid defects. Research on and applications of anomaly detection techniques in industry have been published, but when they face production constraints success is not guaranteed. Today's manufacturing systems are complex and involve different behaviors. We propose and evaluate a new, realistic methodology for detecting conditional anomalies that could be successfully implemented in production. The proposed solution is based on Variational Autoencoders (VAEs), which provide useful scores under the near real-time constraints of the production environment. The results have been thoroughly evaluated and validated with the support of expert process engineers.

Eva Jabbar, Philippe Besse, Jean-Michel Loubes, Christophe Merle
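The scoring side of VAE-based anomaly detection can be sketched independently of the model itself: a trained VAE yields a reconstruction error per sample, and an alarm threshold is calibrated as a high quantile of the scores on normal production data. The VAE is mocked below by a trivial reconstruction function, and all data are synthetic; none of this reflects the paper's actual model or measurements.

```python
import numpy as np

rng = np.random.default_rng(1)

def reconstruction_error(x):
    # Stand-in for ||x - VAE(x)||^2; a real VAE would be trained first.
    return np.sum((x - x.mean(axis=1, keepdims=True)) ** 2, axis=1)

normal = rng.normal(0.0, 0.1, size=(500, 8))     # in-spec measurements
scores_normal = reconstruction_error(normal)
threshold = np.quantile(scores_normal, 0.99)     # ~1% false-alarm budget

anomaly = rng.normal(0.0, 1.0, size=(5, 8))      # out-of-spec samples
flags = reconstruction_error(anomaly) > threshold
```

Calibrating the threshold on normal data only is what makes the detector usable under 24/7 production, where labeled anomalies are rare.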

Data Anonymization for Privacy Aware Machine Learning

The increase in data leaks, attacks, and ransomware in the last few years has raised concerns about data security and privacy. All this has negatively affected the sharing and publication of data. To address these limitations, innovative techniques are needed for protecting data, especially when it is used in machine-learning-based data models. In this context, differential privacy is one of the most effective approaches to preserving privacy. However, the scope of differential privacy applications is very limited (e.g., numerical and structured data). Therefore, in this study, we investigate the behavior of differential privacy applied to textual data and time series. The proposed approach was evaluated by comparing two differentially private algorithms based on Principal Component Analysis. Their effectiveness was demonstrated through the application of three machine learning models to both anonymized and primary data, and their performance was thoroughly evaluated in terms of confidentiality, utility, scalability, and computational efficiency. The PPCA method provides high anonymization quality at the expense of high computation time, while the DPCA method preserves more utility and computes faster. We show the possibility of combining a neural network text representation approach with differential privacy methods. We also highlight that it is well within reach to anonymize real-world measurement data from satellite sensors for an anomaly detection task. We believe that our study will significantly motivate the use of differential privacy techniques, which can lead to more data sharing while preserving privacy.

David Nizar Jaidan, Maxime Carrere, Zakaria Chemli, Rémi Poisvert

Exploiting Similar Behavior of Users in a Cooperative Optimization Approach for Distributing Service Points in Mobility Applications

In this contribution we address scaling issues of our previously proposed cooperative optimization approach (COA) for distributing service points for mobility applications in a geographical area. COA is an iterative algorithm that solves the problem by combining an optimization component with user interaction on a large scale and a machine learning component that provides the objective function for the optimization. In each iteration, candidate solutions are generated and suggested to the future potential users for evaluation, the machine learning component is trained on the basis of the collected feedback, and the optimization is used to find a new solution fitting the needs of the users as well as possible. While the former concept study showed promising results for small instances, the number of users that could be considered was quite limited, and each user had to evaluate a relatively large number of candidate solutions. Here we deviate from this previous approach by using matrix factorization as the central machine learning component in order to identify and exploit the similar needs of many users. Furthermore, instead of black-box optimization we are now able to apply mixed integer linear programming to obtain a best solution in each iteration. While this is still a conceptual study, experimental simulation results clearly indicate that the approach works in the intended way and scales better to more users.

Thomas Jatschka, Tobias Rodemann, Günther R. Raidl

Long Short-Term Memory Networks for Earthquake Detection in Venezuelan Regions

Reliable earthquake detection and location algorithms are necessary to properly catalog and analyze the continuously growing seismic records. This paper reports the results of applying Long Short-Term Memory (LSTM) networks to single-station three-channel waveforms for P-wave earthquake detection in western and north central regions of Venezuela. Precisely, we apply our technique to study the seismicity along the dextral strike-slip Boconó and La Victoria - San Sebastián faults, with complex tectonics driven by the interactions between the South American and Caribbean plates.

Sergi Mus, Norma Gutiérrez, Ruben Tous, Beatriz Otero, Leonel Cruz, David Llácer, Leonardo Alvarado, Otilio Rojas
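To make concrete how an LSTM consumes a single-station, three-channel waveform, here is a hand-rolled LSTM cell stepped over a synthetic window. The architecture, random weights, window length, and the logistic readout are purely illustrative assumptions; the paper's actual network is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 16          # 3 channels (E, N, Z) in, 16 hidden units

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One weight matrix per gate: input (i), forget (f), output (o), cell (g).
W = {g: rng.normal(scale=0.1, size=(n_hidden, n_in + n_hidden)) for g in "ifog"}
b = {g: np.zeros(n_hidden) for g in "ifog"}

def lstm_step(x_t, h, c):
    z = np.concatenate([x_t, h])
    i = sigmoid(W["i"] @ z + b["i"])
    f = sigmoid(W["f"] @ z + b["f"])
    o = sigmoid(W["o"] @ z + b["o"])
    g = np.tanh(W["g"] @ z + b["g"])
    c = f * c + i * g           # cell state carries long-term context
    h = o * np.tanh(c)          # hidden state summarises the window so far
    return h, c

# A 100-sample, three-channel waveform window (synthetic noise here).
waveform = rng.normal(size=(100, 3))
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in waveform:
    h, c = lstm_step(x_t, h, c)

# A final logistic readout (random weights, illustrative only) would map
# the last hidden state to a P-wave detection probability.
p_detect = sigmoid(rng.normal(scale=0.1, size=n_hidden) @ h)
```

The recurrence is what lets the detector integrate evidence across the whole window rather than classifying each sample in isolation.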

Zero-Shot Fashion Products Clustering on Social Image Streams

Computer Vision methods have been proposed to solve the problem of matching photographs from social media users that contain products to products in retail catalogues. This is challenging due to the quality of the photographs and the difficulties in dealing with garments and their category taxonomy. An N-Shot Learning approach is required, as retail catalogues may contain hundreds of different products for which, in many cases, only one image is provided. This setting can be addressed by means of Deep Metric Learning (DML) techniques, in which a metric that discriminates similar from dissimilar samples is learnt. The performance of existing approaches to this problem varies widely, but even those that perform reasonably well need to return a large set of candidates in order to include the exact product. Since a person curates the results after the query, it is important to return the smallest possible set of elements, ideally just one: the related product. This paper proposes to solve the image-to-product matching problem through a product retrieval system using DML and Zero-Shot Learning, focusing on garments and applying some of the latest advances in clustering techniques.

Jonatan Poveda, Rubén Tous

Treating Artificial Neural Net Training as a Nonsmooth Global Optimization Problem

We attack the classical neural network training problem by successive piecewise linearization, applying three different methods for the global optimization of the local piecewise linear models. The methods are compared to each other and steepest descent as well as stochastic gradient on the regression problem for the Griewank function.

Andreas Griewank, Ángel Rojas

Backmatter
