2022 | Book

Machine Learning, Optimization, and Data Science

7th International Conference, LOD 2021, Grasmere, UK, October 4–8, 2021, Revised Selected Papers, Part II

Edited by: Prof. Giuseppe Nicosia, Varun Ojha, Emanuele La Malfa, Gabriele La Malfa, Giorgio Jansen, Panos M. Pardalos, Prof. Giovanni Giuffrida, Renato Umeton

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science

About this book

This two-volume set, LNCS 13163-13164, constitutes the refereed proceedings of the 7th International Conference on Machine Learning, Optimization, and Data Science, LOD 2021, together with the first edition of the Symposium on Artificial Intelligence and Neuroscience, ACAIN 2021.

A total of 86 full papers presented in this two-volume post-conference proceedings set were carefully reviewed and selected from 215 submissions. These research articles were written by leading scientists in the fields of machine learning, artificial intelligence, reinforcement learning, computational optimization, neuroscience, and data science, and present a substantial array of ideas, technologies, algorithms, methods, and applications.

Table of Contents

Frontmatter
Boosted Embeddings for Time-Series Forecasting

Time-series forecasting is a fundamental task emerging from diverse data-driven applications. Many advanced autoregressive methods such as ARIMA have been used to develop forecasting models. Recently, deep learning-based methods such as DeepAR, NeuralProphet, and Seq2Seq have been explored for the time-series forecasting problem. In this paper, we propose a novel time-series forecast model, DeepGB. We formulate and implement a variant of gradient boosting wherein the weak learners are deep neural networks whose weights are incrementally found in a greedy manner over iterations. In particular, we develop a new embedding architecture that improves the performance of many deep learning models on time-series data using a gradient boosting variant. We demonstrate that our model outperforms existing comparable state-of-the-art methods using real-world sensor data and public data sets.

Sankeerth Rao Karingula, Nandini Ramanan, Rasool Tahmasbi, Mehrnaz Amjadi, Deokwoo Jung, Ricky Si, Charanraj Thimmisetty, Luisa F. Polania, Marjorie Sayer, Jake Taylor, Claudionor Nunes Coelho Jr
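
A minimal sketch of the boosting idea described in this abstract, assuming squared error so the residuals are the negative gradients, with small scikit-learn MLPs standing in for the paper's deep weak learners (the DeepGB embedding architecture is not reproduced; `boosted_nn_forecast` and its parameters are illustrative):

```python
# Hypothetical sketch of gradient boosting with neural-network weak
# learners: each stage fits a small MLP to the current residuals
# (the negative gradient of the squared loss).
import numpy as np
from sklearn.neural_network import MLPRegressor

def boosted_nn_forecast(X, y, n_stages=5, lr=0.1):
    prediction = np.full(len(y), y.mean())  # stage-0 constant model
    learners = []
    for _ in range(n_stages):
        residuals = y - prediction          # pseudo-residuals
        weak = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500)
        weak.fit(X, residuals)
        prediction = prediction + lr * weak.predict(X)
        learners.append(weak)
    return learners, prediction
```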
Deep Reinforcement Learning for Optimal Energy Management of Multi-energy Smart Grids

This paper proposes a Deep Reinforcement Learning approach for optimally managing multi-energy systems in smart grids. The optimal control problem of the production and storage units within the smart grid is formulated as a Partially Observable Markov Decision Process (POMDP) and is solved using an actor-critic Deep Reinforcement Learning algorithm. The framework is tested on a novel multi-energy residential microgrid model that encompasses electrical, heating and cooling storage as well as thermal production systems and renewable energy generation. One of the main challenges in real-time optimal control of such multi-energy systems is the need to take multiple continuous actions simultaneously. The proposed Deep Deterministic Policy Gradient (DDPG) agent is shown to handle the continuous state and action spaces well, and it learns to simultaneously take multiple actions on the production and storage systems that jointly optimize the electrical, heating and cooling usages within the smart grid. This allows the approach to be applied to the real-time optimal energy management of larger-scale multi-energy smart grids such as eco-districts and smart cities, where multiple continuous actions need to be taken simultaneously.

Dhekra Bousnina, Gilles Guerassimoff
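
The paper's exact network is not given here; the sketch below shows, as an assumption, what a DDPG-style actor emitting several continuous actions at once (one set-point per production or storage unit) can look like in PyTorch. `MultiEnergyActor` and its layer sizes are invented for illustration:

```python
# Hypothetical DDPG-style actor network: maps the observed grid state to
# several simultaneous continuous actions (one set-point per unit).
import torch
import torch.nn as nn

class MultiEnergyActor(nn.Module):
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions), nn.Tanh(),  # bounded actions in [-1, 1]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

actor = MultiEnergyActor(state_dim=24, n_actions=3)  # e.g. 3 units acting jointly
action = actor(torch.randn(1, 24))                   # one action per unit
```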
A k-mer Based Sequence Similarity for Pangenomic Analyses

In this work we propose an approach to improve the performance of a current methodology for pangenomic analyses that computes k-mer based sequence similarity via the Jaccard index. Recent studies have shown a good performance of such a measure for retrieving homology among genetic sequences belonging to a group of genomes. Our improvement is obtained by exploiting a suitable k-mer representation, which enables a fast and memory-cheap computation of sequence similarity. Experimental results on genomes of living organisms of different species give evidence that a state-of-the-art methodology is improved here, in terms of running time and memory requirements.

Vincenzo Bonnici, Andrea Cracco, Giuditta Franco
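
For reference, the baseline computation the abstract builds on, k-mer sets compared via the Jaccard index, can be sketched in a few lines (the paper's compressed k-mer representation, the actual source of the speedup, is not shown):

```python
# Baseline k-mer Jaccard similarity between two DNA sequences; the
# paper's contribution is a compact k-mer representation that makes
# this computation faster and memory-cheaper.
def kmers(seq: str, k: int) -> set:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a: str, b: str, k: int = 15) -> float:
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka | kb) if (ka or kb) else 0.0
```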
A Machine Learning Approach to Daily Capacity Planning in E-Commerce Logistics

Due to the accelerated activity in e-commerce, especially since the COVID-19 outbreak, congestion in transportation systems is continually increasing, which affects the on-time delivery of regular parcels and groceries. An important constraint is that a given number of delivery drivers have a limited amount of time and daily capacity, leading to the need for effective capacity planning. In this paper, we employ a Gaussian Process Regression (GPR) approach to predict the daily delivery capacity of a fleet starting their routes from a cross-dock depot and for a specific time slot. Each prediction specifies how many deliveries in total the drivers in a given cross-dock can make for a certain time slot of the day. Our results show that the GPR model outperforms other state-of-the-art regression methods. We also improve our model by updating it daily using shipments delivered within the day, in response to unexpected events during the day, as well as accounting for special occasions like Black Friday or Christmas.

Barış Bayram, Büşra Ülkü, Gözde Aydın, Raha Akhavan-Tabatabaei, Burcin Bozkaya
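
A hedged sketch of the modelling setup: Gaussian Process Regression over depot/time-slot features using scikit-learn. The feature columns and values are invented for illustration; the paper's actual features and kernel choice may differ:

```python
# Illustrative GPR capacity model: features and values are invented.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X = np.array([[1, 9, 120],   # [weekday, time slot, drivers on duty]
              [2, 9, 135],
              [3, 14, 150]])
y = np.array([950.0, 1010.0, 1180.0])  # deliveries completed that day

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(X, y)
mean, std = gpr.predict(np.array([[4, 9, 140]]), return_std=True)  # with uncertainty
```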
Explainable AI for Financial Forecasting

One of the most important steps when employing machine learning approaches is the feature engineering process. It plays a key role in the identification of features that can effectively help model the given classification or regression task. This process is usually not trivial and may lead to the development of handcrafted features. Within the financial domain, this step is even more complex given the generally low correlation between features extracted from financial data and their associated labels. This is indeed a challenging task, one that can now be explored through the explainable artificial intelligence approaches that have recently appeared in the literature. This paper examines the potential of an automatic machine learning feature selection process to support decisions in financial forecasting. Using explainable artificial intelligence methods, we develop different feature selection strategies in an applied financial setting where we want to predict the next-day returns for a set of input stocks. We propose to identify the relevant features for each stock individually; in this way, we take into account the heterogeneous behavior of stocks. We demonstrate that our approach can separate important features from unimportant ones and bring prediction performance improvements, as shown by comparisons between our proposed strategies and several state-of-the-art baselines on real-world financial time series.

Salvatore Carta, Alessandro Sebastian Podda, Diego Reforgiato Recupero, Maria Madalina Stanciu
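
As an illustrative stand-in for the per-stock feature selection described above (the paper relies on explainable-AI attribution methods), one can rank features by permutation importance under each stock's own model and keep the top ones:

```python
# Stand-in for XAI-driven per-stock feature selection: rank features by
# permutation importance under each stock's own model, keep the top ones.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

def select_features_for_stock(X, y, keep=10):
    model = RandomForestRegressor(n_estimators=200).fit(X, y)
    imp = permutation_importance(model, X, y, n_repeats=10).importances_mean
    return np.argsort(imp)[::-1][:keep]  # indices of the most relevant features
```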
Online Semi-supervised Learning from Evolving Data Streams with Meta-features and Deep Reinforcement Learning

Online semi-supervised learning (SSL) from data streams is an emerging area of research with many applications due to the fact that it is often expensive, time-consuming, and sometimes even unfeasible to collect labelled data from streaming domains. State-of-the-art online SSL algorithms use clustering techniques to maintain micro-clusters, or, alternatively, employ wrapper methods that utilize pseudo-labeling based on confidence scores. Current approaches may introduce false behaviour or make limited use of labelled instances, thus potentially leading to important information being overlooked. In this paper, we introduce the novel Online Reinforce SSL algorithm that uses various K Nearest Neighbour (KNN) classifiers to learn meta-features across diverse domains. Our Online Reinforce SSL algorithm features a meta-reinforcement learning agent trained on multiple-source streams obtained by extracting meta-features and subsequently transferring this meta-knowledge to our target domain. That is, the predictions of the KNN learners are used to select pseudo-labels for the target domain as instances arrive via an incremental learning paradigm. Extensive experiments on benchmark datasets demonstrate the value of our approach and confirm that Online Reinforce SSL outperforms both the state-of-the-art and a self-training baseline.

Parsa Vafaie, Herna Viktor, Eric Paquet, Wojtek Michalowski
Dissecting FLOPs Along Input Dimensions for GreenAI Cost Estimations

The term GreenAI refers to a novel approach to Deep Learning that is more aware of the ecological impact and the computational efficiency of its methods. The promoters of GreenAI suggested the use of Floating Point Operations (FLOPs) as a measure of the computational cost of Neural Networks; however, that measure does not correlate well with the energy consumption of hardware equipped with massively parallel processing units like GPUs or TPUs. In this article, we propose a simple refinement of the formula used to count floating point operations for convolutional layers, called α-FLOPs, which explains and corrects the traditional discrepancy across different layers and is closer to reality. The notion of α-FLOPs relies on the crucial insight that, in the case of inputs with multiple dimensions, there is no reason to believe that the speedup offered by parallelism will be uniform along all the different axes.

Andrea Asperti, Davide Evangelista, Moreno Marzolla
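
For context, the standard FLOPs count for a 2D convolutional layer, the formula that α-FLOPs refines, is shown below; the α-FLOPs correction itself is defined in the paper and not reproduced here:

```latex
% Standard FLOPs count for a 2D convolutional layer with C_in input
% channels, C_out filters, kernel size K x K, and an output feature map
% of size H_out x W_out (multiplications and additions counted separately):
\mathrm{FLOPs} = 2 \cdot H_{\mathrm{out}} \cdot W_{\mathrm{out}} \cdot C_{\mathrm{in}} \cdot K^{2} \cdot C_{\mathrm{out}}
```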
Development of a Hybrid Modeling Methodology for Oscillating Systems with Friction

Modeling of dynamical systems is essential in many areas of engineering, such as product development and condition monitoring. Currently, the two main approaches to modeling dynamical systems are the physical and the data-driven one. Both approaches are sufficient for a wide range of applications but suffer from various disadvantages, e.g., reduced accuracy due to the limitations of the physical model or due to missing data. In this work, a methodology for modeling dynamical systems is introduced that expands the area of application by combining the advantages of both approaches while weakening their respective disadvantages. The objective is to obtain increased accuracy with reduced complexity. Two models are used: a physical model predicts the system behavior in a simplified manner, while a data-driven model accounts for the discrepancy between reality and the simplified model. This hybrid approach is validated experimentally on a double pendulum.

Meike Wohlleben, Amelie Bender, Sebastian Peitz, Walter Sextro
Numerical Issues in Maximum Likelihood Parameter Estimation for Gaussian Process Interpolation

This article investigates the origin of numerical issues in maximum likelihood parameter estimation for Gaussian process (GP) interpolation and explores simple but effective strategies for improving commonly used open-source software implementations. This work targets a basic problem, but a host of studies, particularly in the Bayesian optimization literature, rely on off-the-shelf GP implementations. For the conclusions of these studies to be reliable and reproducible, robust GP implementations are critical.

Subhasish Basak, Sébastien Petit, Julien Bect, Emmanuel Vazquez
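
One of the simple stabilization strategies common in this line of work is adding a small jitter (nugget) to the covariance matrix before the Cholesky factorization used to evaluate the log marginal likelihood; the sketch below illustrates the idea and is not the paper's exact recipe:

```python
# Sketch: evaluate the GP log marginal likelihood with a small jitter
# added to the covariance K so the Cholesky factorization stays stable.
import numpy as np

def gp_log_marginal_likelihood(K, y, jitter=1e-8):
    n = len(y)
    L = np.linalg.cholesky(K + jitter * np.eye(n))       # stabilized factorization
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y via triangular solves
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()                   # -0.5 * log det K
            - 0.5 * n * np.log(2 * np.pi))
```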
KAFE: Knowledge and Frequency Adapted Embeddings

Word embeddings are widely used in several Natural Language Processing (NLP) applications. The training process typically involves iterative gradient updates of each word vector. This makes word frequency a major factor in the quality of an embedding, and in general the embeddings of words with few training occurrences end up being of poor quality. This is problematic since rare and frequent words, albeit semantically similar, might end up far from each other in the embedding space. In this study, we develop KAFE (Knowledge And Frequency adapted Embeddings), which combines adversarial principles and a knowledge graph to efficiently represent both frequent and rare words. The goal of adversarial training in KAFE is to minimize the spatial distinguishability (separability) of frequent and rare words in the embedding space. The knowledge graph encourages the embedding to follow the structure of the domain-specific hierarchy, providing an informative prior that is particularly important for words with little training data. We demonstrate the performance of KAFE in representing clinical diagnoses using real-world Electronic Health Records (EHR) data coupled with a knowledge graph. EHRs are notorious for including ever-increasing numbers of rare concepts that are important to consider when defining the state of the patient for various downstream applications. Our experiments demonstrate better intelligibility through visualisation, as well as higher prediction and stability scores of KAFE over the state of the art.

Awais Ashfaq, Markus Lingman, Slawomir Nowaczyk
Improved Update Rule and Sampling of Stochastic Gradient Descent with Extreme Early Stopping for Support Vector Machines

We propose three techniques for improving the accuracy and speed of margin stochastic gradient descent support vector machines (MSGDSVM). The first technique is to use sampling with full replacement. The second technique is to use a new update rule derived from the squared hinge loss function. The third technique is to limit the number of values for tuning the margin hyperparameter M. We also provide a theoretical analysis of a novel optimization problem for the proposed update rule. The first two techniques improve the accuracy of MSGDSVM and the last one the speed of tuning. Experiments show that the proposed method achieves superior accuracy compared to MSGDSVM for binary and multiclass classification, with generalization performance similar to sequential minimal optimization (SMO), and is faster than MSGDSVM.

Marcin Orchel, Johan A. K. Suykens
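
For orientation, a textbook SGD step for the squared hinge loss with margin M and L2 regularization λ is sketched below; the paper's actual update rule and sampling scheme are its contributions and are not reproduced:

```python
# Textbook SGD step for the squared hinge loss
#   L(w) = max(0, M - y * w.x)^2 + (lam / 2) * ||w||^2,  y in {-1, +1}.
import numpy as np

def sgd_step(w, x, y, M=1.0, lam=1e-4, eta=0.01):
    margin = M - y * (w @ x)
    grad = lam * w - (2.0 * margin * y * x if margin > 0 else 0.0)
    return w - eta * grad
```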
A Hybrid Surrogate-Assisted Accelerated Random Search and Trust Region Approach for Constrained Black-Box Optimization

This paper presents a hybrid surrogate-based approach for constrained expensive black-box optimization that combines RBF-assisted Constrained Accelerated Random Search (CARS-RBF) with the CONORBIT trust region method. Extensive numerical experiments have shown the effectiveness of the CARS-RBF and CONORBIT algorithms on many test problems, and the hybrid algorithm combines the strengths of these methods. The proposed CARS-RBF-CONORBIT hybrid alternates between running CARS-RBF for global search and a series of local searches using the CONORBIT trust region algorithm. In particular, after each CARS-RBF run, a fraction of the best feasible sample points is clustered to identify potential basins of attraction. Then, CONORBIT is run several times using each cluster of sample points as initial points, together with infeasible sample points within a certain radius of the centroid of each cluster. One advantage of this approach is that the CONORBIT runs reuse some of the feasible and infeasible sample points previously generated by CARS-RBF and other CONORBIT runs. Numerical experiments on the CEC 2010 benchmark problems showed promising results for the proposed hybrid in comparison with CARS-RBF or CONORBIT alone, given a relatively limited computational budget.

Rommel G. Regis
Health Change Detection Using Temporal Transductive Learning

Industrial equipment, devices and patients typically undergo change from a healthy state to an unhealthy state. We develop a novel approach to detect unhealthy entities and also discover the time of change, to enable deeper investigation into the cause for change. In the absence of an engineering or medical intervention, health degradation only happens in one direction: healthy to unhealthy. Our transductive learning framework, known as max-margin temporal transduction (MMTT), leverages this chronology of observations for learning a superior model with minimal supervision. Temporal transduction is achieved by incorporating chronological constraints in the conventional max-margin classifier, Support Vector Machines (SVM). We utilize stochastic gradient descent to solve the resulting optimization problem. We prove that, with high probability, an ε-accurate solution for the proposed model can be achieved in O(1/(λε)) iterations. The runtime is O(1/(λε)) for the linear kernel and O(n/(λε)) for a non-linear Mercer kernel, where n is the number of observations from all entities, labeled and unlabeled. Our experiments on publicly available benchmark datasets demonstrate the effectiveness of our approach in accurately detecting unhealthy entities with less supervision as compared to other strong baselines: conventional and transductive SVM.

Abhay Harpale
A Large Visual Question Answering Dataset for Cultural Heritage

Visual Question Answering (VQA) is gaining momentum for its ability to bridge Computer Vision and Natural Language Processing. VQA approaches mainly rely on Machine Learning algorithms that need to be trained on large annotated datasets. Once trained, a machine learning model is barely portable to a different domain. This calls for agile methodologies for building large annotated datasets from existing resources. The cultural heritage domain represents both a natural application of this task and an extensive source of data for training and validating VQA models. To this end, by using data and models from ArCo, the knowledge graph of the Italian cultural heritage, we generated a large dataset for VQA in Italian and English. We describe the results and the lessons learned from our semi-automatic process for the dataset generation and discuss the tools employed for data extraction and transformation.

Luigi Asprino, Luana Bulla, Ludovica Marinucci, Misael Mongiovì, Valentina Presutti
Expressive Graph Informer Networks

Applying machine learning to molecules is challenging because of their natural representation as graphs rather than vectors. Several architectures have recently been proposed for deep learning from molecular graphs, but they suffer from information bottlenecks because they only pass information from a graph node to its direct neighbors. Here, we introduce a more expressive route-based multi-attention mechanism that incorporates features from routes between node pairs. We call the resulting method Graph Informer. A single network layer can therefore attend to nodes several steps away. We show empirically that the proposed method compares favorably against existing approaches in two prediction tasks: (1) ¹³C Nuclear Magnetic Resonance (NMR) spectra, improving the state of the art with an MAE of 1.35 ppm, and (2) predicting drug bioactivity and toxicity. Additionally, we develop a variant called injective Graph Informer that is provably more powerful than the Weisfeiler-Lehman test for graph isomorphism. We demonstrate that the route information allows the method to be informed about the non-local topology of the graph and, thus, to go beyond the capabilities of the Weisfeiler-Lehman test. Our code is available at github.com/jaak-s/graphinformer.

Jaak Simm, Adam Arany, Edward De Brouwer, Yves Moreau
Zero-Shot Learning-Based Detection of Electric Insulators in the Wild

An electric insulator is an essential device in an electric power system; therefore, maintenance of insulators on electric poles is of vital importance. Unmanned Aerial Vehicles (UAVs) are used to inspect the condition of electric insulators placed in remote and hostile terrain where human inspection is not possible. Insulators vary in physical appearance, and hence the insulator detection technology on board the UAV should, in principle, be able to identify an insulator device in the wild even if it has never seen that particular type of insulator before. To address this problem, a Zero-Shot Learning-based technique is proposed that can detect an insulator device that has never been seen during the training phase. Different convolutional neural network models are used for feature extraction and are coupled with various signature attributes to detect an unseen insulator type. Experimental results show that InceptionV3 performs best on the electric insulator dataset with basic signature attributes, that the "color and number of plates" of the insulator is the best way to classify the insulator dataset, and that the number of training classes does not have much effect on performance. Encouraging results were obtained.

Ibraheem Azeem, Moayid Ali Zaidi
Randomized Iterative Methods for Matrix Approximation

Standard tools to update approximations to a matrix A (for example, Quasi-Newton Hessian approximations in optimization) incorporate computationally expensive one-sided samples AV. This article develops randomized algorithms to efficiently approximate A by iteratively incorporating cheaper two-sided samples UᵀAV. Theoretical convergence rates are proved and realized in numerical experiments. A heuristic accelerated variant is developed and shown to be competitive with existing methods based on one-sided samples.

Joy Azzam, Benjamin W. Ong, Allan A. Struthers
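
One natural form of such an iteration (an assumption on our part; the paper derives and analyzes its own variants) corrects the current approximation B on a random subspace pair using only the two-sided sample UᵀAV:

```python
# Hypothetical sketch-and-correct iteration: refine an approximation B
# of A using only cheap two-sided samples U.T @ A @ V on random subspaces.
import numpy as np

def two_sided_refine(A, B, k=10, steps=50):
    n, m = A.shape
    for _ in range(steps):
        U, _ = np.linalg.qr(np.random.randn(n, k))  # random orthonormal bases
        V, _ = np.linalg.qr(np.random.randn(m, k))
        S = U.T @ A @ V                             # the two-sided sample
        B = B + U @ (S - U.T @ B @ V) @ V.T         # match A on the sampled subspace
    return B
```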
Improved Migrating Birds Optimization Algorithm to Solve Hybrid Flowshop Scheduling Problem with Lot-Streaming of Random Breakdown

An improved migrating birds optimization (IMBO) algorithm is proposed to solve the hybrid flowshop scheduling problem with lot-streaming of random breakdown (RBHLFS), with the aim of minimizing the total flow time. To ensure the diversity of the initial population, a Nawaz-Enscore-Ham (NEH) heuristic algorithm is used. A greedy algorithm is used to construct a combined neighborhood search structure. An effective local search procedure is utilized to explore potentially promising neighborhoods. In addition, a reset mechanism is added to avoid falling into a local optimum. Extensive experiments and comparisons demonstrate the feasibility and effectiveness of the proposed algorithm.

Ping Wang, Renato De Leone, Hongyan Sang
Building Knowledge Base for the Domain of Economic Mobility of Older Workers

This paper presents work on building a knowledge base for the domain of economic mobility of older workers. To extract high-quality entities and relations that are important to the specific domain, domain-specificity scores for entities and relations are designed and applied. To assist human-in-the-loop ontology construction, a novel topic modeling method, named "description guided topic modeling", is developed. It clusters domain entities based on their embeddings and organizes those clusters according to descriptions of potential topics important to the domain. To demonstrate feasibility, these methods are applied to a collection of knowledge sources related to economic mobility for older workers. They are further tested through a case study on one specific barrier to economic mobility, namely limited broadband access for older workers, to show the potential of these methods.

Ying Li, Vitalii Zakhozhyi, Yu Fu, Joy He-Yueya, Vishwa Pardeshi, Luis J. Salazar
Optimisation of a Workpiece Clamping Position with Reinforcement Learning for Complex Milling Applications

Fine-tuning and optimisation of production processes in manufacturing are often conducted with the help of algorithms from the field of Operations Research (OR) or directly by human experts. Machine Learning (ML) methods demonstrate outstanding results in tackling optimisation tasks within the research field referred to as Neural Combinatorial Optimisation (NCO). This opens multiple opportunities in manufacturing for learning-based optimisation solutions. In this work, we show a successful application of Reinforcement Learning (RL) to the task of workpiece (WP) clamping position and orientation optimisation for milling processes. A carefully selected clamping position and orientation of a WP are essential for minimising machine tool wear and energy consumption. With the example of 3- and 5-axis milling, we demonstrate that a trained RL agent can successfully find a near-optimal orientation and positioning for new, previously unseen WPs. The achieved solution quality is comparable to alternative optimisation solutions relying on Simulated Annealing (SA) and Genetic Algorithms (GA) while requiring orders of magnitude fewer optimisation iterations.

Chrismarie Enslin, Vladimir Samsonov, Hans-Georg Köpken, Schirin Bär, Daniel Lütticke
Thresholding Procedure via Barzilai-Borwein Rules for the Steplength Selection in Stochastic Gradient Methods

A crucial aspect in designing a learning algorithm is the selection of the hyperparameters (parameters that are not trained during the learning process). In particular, the effectiveness of stochastic gradient methods strongly depends on the steplength selection. In recent papers [9, 10], Franchini et al. propose to adopt an adaptive selection rule borrowed from the full-gradient scheme known as the Limited Memory Steepest Descent method [8] and appropriately tailored to the stochastic framework. This strategy is based on the computation of the eigenvalues (Ritz-like values) of a suitable matrix obtained from the gradients of the most recent iterations, and it enables an estimation of the local Lipschitz constant of the current gradient of the objective function without introducing line-search techniques. The possible increase of the size of the sub-sample used to compute the stochastic gradient is driven by means of an augmented inner product test approach [3]. The whole procedure makes the tuning of the parameters less expensive than the selection of a fixed steplength, although it remains dependent on the choice of threshold values bounding the variability of the steplength sequences. The contribution of this paper is to exploit a stochastic version of the Barzilai-Borwein formulas [1] to adaptively select the endpoints of the range for the Ritz-like values. Numerical experiments on some convex loss functions highlight that the proposed procedure remains stable and that the tuning of the hyperparameters becomes less expensive.

Giorgia Franchini, Valeria Ruggiero, Ilaria Trombini
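
The two deterministic Barzilai-Borwein formulas referenced above, which the paper adapts to the stochastic setting to bound the Ritz-like values, are easy to state in code:

```python
# The deterministic Barzilai-Borwein steplength rules.
import numpy as np

def bb_steplengths(x_prev, x_curr, g_prev, g_curr):
    s = x_curr - x_prev      # iterate difference
    y = g_curr - g_prev      # gradient difference
    bb1 = (s @ s) / (s @ y)  # BB1 ("long") steplength
    bb2 = (s @ y) / (y @ y)  # BB2 ("short") steplength
    return bb1, bb2
```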
Learning Beam Search: Utilizing Machine Learning to Guide Beam Search for Solving Combinatorial Optimization Problems

Beam search (BS) is a well-known incomplete breadth-first-search variant frequently used to find heuristic solutions to hard combinatorial optimization problems. Its key ingredient is a guidance heuristic that estimates the expected length (cost) to complete a partial solution. While this function is usually developed manually for a specific problem, we propose a more general Learning Beam Search (LBS) that uses a machine learning model for guidance. Learning is performed by utilizing principles of reinforcement learning: LBS generates training data on its own by performing nested BS calls on many representative, randomly created problem instances. The general approach is tested on two specific problems, the longest common subsequence problem and the constrained variant thereof. Results on established sets of benchmark instances indicate that BS with models trained via LBS is highly competitive. On many instances, new best-known solutions were obtained, making the approach a new state-of-the-art method for these problems and documenting the high potential of this general framework.

Marc Huber, Günther R. Raidl
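
A skeleton of beam search with a learned guidance function h, as described above; the node interface (a `cost` attribute) and the callables `expand` and `is_complete` are assumptions, and the LBS training loop that fits h via nested BS calls is omitted:

```python
# Skeleton of beam search guided by a learned cost-to-complete estimate h.
# Assumed node interface: `node.cost` (cost so far); `expand(node)` yields
# child nodes; `is_complete(node)` tests for a full solution; h is assumed
# to return 0 for complete nodes.
def beam_search(root, expand, is_complete, h, beam_width=50):
    beam = [root]
    while not all(is_complete(n) for n in beam):
        candidates = [n for n in beam if is_complete(n)]
        for n in beam:
            if not is_complete(n):
                candidates.extend(expand(n))
        # keep the beam_width nodes with the best estimated total cost
        beam = sorted(candidates, key=lambda n: n.cost + h(n))[:beam_width]
    return min(beam, key=lambda n: n.cost)
```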
Modular Networks Prevent Catastrophic Interference in Model-Based Multi-task Reinforcement Learning

In a multi-task reinforcement learning setting, the learner commonly benefits from training on multiple related tasks by exploiting similarities among them. At the same time, the trained agent is able to solve a wider range of different problems. While this effect is well documented for model-free multi-task methods, we demonstrate a detrimental effect when using a single learned dynamics model for multiple tasks. Thus, we address the fundamental question of whether model-based multi-task reinforcement learning benefits from shared dynamics models in a way similar to how model-free methods benefit from shared policy networks. Using a single dynamics model, we see clear evidence of task confusion and reduced performance. As a remedy, enforcing an internal structure for the learned dynamics model by training isolated sub-networks for each task notably improves performance while using the same number of parameters. We illustrate our findings by comparing both methods on a simple gridworld and a more complex ViZDoom multi-task experiment.

Robin Schiewer, Laurenz Wiskott
A New Nash-Probit Model for Binary Classification

The Nash equilibrium is used to estimate the parameters of a Probit binary classification model transformed into a multiplayer game. Each training data instance is a player of the game aiming to maximize its own log likelihood function. The Nash equilibrium of this game is approximated by modifying the Covariance Matrix Adaptation Evolution Strategy to search for the Nash equilibrium by using tournament selection with a Nash ascendancy relation based fitness assignment. The Nash ascendancy relation allows the comparison of two strategy profiles of the game. The purpose of the approach is to explore the Nash equilibrium as an alternate solution concept to the maximization of the log likelihood function. Numerical experiments illustrate the behavior of this approach, showing that for some instances the Nash equilibrium based solution can be better than the one offered by the baseline Probit model.

Mihai-Alexandru Suciu, Rodica Ioana Lung
An Optimization Method for Accurate Nonparametric Regressions on Stiefel Manifolds

We consider the problem of regularized nonlinear regression on Riemannian Stiefel manifolds when only a few observations are available. In this paper, we introduce a novel geometric method to estimate missing data using continuous and smooth temporal trajectories to overcome the discrete nature of observations. This approach is important in many applications and representation spaces where nonparametric regression for data that satisfy orthogonality constraints is crucial. To illustrate the behavior of the proposed approach, we give all numerical details and provide geometric tools for computational efficiency.

Ines Adouani, Chafik Samir
Using Statistical and Artificial Neural Networks Meta-learning Approaches for Uncertainty Isolation in Face Recognition by the Established Convolutional Models

We investigate ways of increasing trust in the verdicts of established Convolutional Neural Network models for the face recognition task. In mission-critical application settings, additional metrics of the models' uncertainty in their verdicts can be used to isolate low-trust verdicts in an additional 'uncertain' class, thus increasing the trusted accuracy of the model at the expense of the sheer number of 'certain' verdicts. In this study, six established Convolutional Neural Network models are tested on a makeup and occlusions data set partitioned to emulate and exaggerate the training and test set disparity that is usual in real-life conditions. Simple A/B and meta-learning supervisor Artificial Neural Network solutions are tested to learn the error patterns of the underlying Convolutional Neural Networks.

Stanislav Selitskiy, Nikolaos Christou, Natalya Selitskaya
Multi-Asset Market Making via Multi-Task Deep Reinforcement Learning

Market making (MM) is a trading activity by an individual market participant or a member firm of an exchange that buys and sells the same securities with the primary goal of profiting on the bid-ask spread, which contributes to market liquidity. Reinforcement learning (RL) is emerging as a popular method for automated market making, in addition to many other financial problems. The current state of the art in MM based on RL includes two recent benchmarks which use temporal-difference learning with Tile-Codings and Deep Q Networks (DQN). These two benchmark approaches focus on single-asset modelling, limiting their applicability in realistic scenarios, where MM agents are required to trade on a collection of assets. Moreover, multi-asset trading reduces the risk associated with the returns. Therefore, we design a Multi-Asset Market Making (MAMM) model, known as MTDRLMM, based on Multi-Task Deep RL. From a Multi-Task Learning perspective, multiple assets are considered as multiple tasks of the same nature, sharing common characteristics along with their individual traits. The experimental results show that MAMM is, in general, more profitable than Single-Asset MM. Moreover, the MTDRLMM model achieves the state of the art in terms of investment return on a collection of assets.

Abbas Haider, Glenn I. Hawe, Hui Wang, Bryan Scotney
Evaluating Hebbian Learning in a Semi-supervised Setting

We propose a semi-supervised learning strategy for deep Convolutional Neural Networks (CNNs) in which an unsupervised pre-training stage, performed using biologically inspired Hebbian learning algorithms, is followed by supervised end-to-end backprop fine-tuning. We explored two Hebbian learning rules for the unsupervised pre-training stage: soft-Winner-Takes-All (soft-WTA) and nonlinear Hebbian Principal Component Analysis (HPCA). Our approach was applied in sample-efficiency scenarios, where the number of available labeled training samples is very limited and unsupervised pre-training is therefore beneficial. We performed experiments on the CIFAR10, CIFAR100, and Tiny ImageNet datasets. Our results show that Hebbian pre-training outperforms Variational Auto-Encoder (VAE) pre-training in almost all cases, with HPCA generally performing better than soft-WTA.

Gabriele Lagani, Fabrizio Falchi, Claudio Gennaro, Giuseppe Amato
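
The building block of the HPCA rule mentioned above is an Oja-style Hebbian update; a single-unit sketch is shown below (the soft-WTA variant and the convolutional wiring are not shown):

```python
# Single-unit Oja-style Hebbian PCA update: the weight vector converges
# towards the principal component of the input distribution.
import numpy as np

def hpca_step(w, x, eta=0.01):
    y = w @ x                         # unit activation
    return w + eta * y * (x - y * w)  # Hebbian term with implicit normalization
```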
Experiments on Properties of Hidden Structures of Sparse Neural Networks

Sparsity in the structure of Neural Networks can lead to less energy consumption, less memory usage, faster computation times on suitable hardware, and automated machine learning. If sparsity gives rise to certain kinds of structure, it can explain automatically obtained features during learning. We provide insights into experiments in which we show how sparsity can be achieved through prior initialization, pruning, and during learning, and answer questions on the relationship between the structure of Neural Networks and their performance. This includes the first work on inducing priors from network theory into Recurrent Neural Networks and an architectural performance prediction during a Neural Architecture Search. Within our experiments, we show how magnitude class blinded pruning achieves 97.5% on MNIST with 80% compression and re-training, which is 0.5 points more than without compression, that magnitude class uniform pruning is significantly inferior to it, and how a genetic search enhanced with performance prediction achieves 82.4% on CIFAR10. Further, performance prediction for Recurrent Networks learning the Reber grammar shows an R² of up to 0.81 given only structural information.

Julian Stier, Harshil Darji, Michael Granitzer
Active Learning for Capturing Human Decision Policies in a Data Frugal Context

Modeling human expert decision patterns can potentially help create training and decision support systems when no ground truth data is available. A cognitive modeling approach presented herein uses a combination of supervised learning methods to mimic expert strategies. Yet without historical data logs on human expert judgments in a given domain, training machine learning algorithms with new examples to be labelled one by one by human experts can be time-consuming and costly. This paper investigates the use of active learning methods for example selection in policy capturing sessions with an oracle in order to optimize frugal learning efficiency. It also introduces a new hybrid method aimed at improving predictive accuracy based on a better management of the exploration/exploitation tradeoff. Analyses on three datasets evaluated data exploration, data exploitation and finally hybrid methods. Results highlight different tradeoffs of those methods and show the benefits of using a hybrid approach.

Loïc Grossetête, Alexandre Marois, Bénédicte Chatelais, Christian Gagné, Daniel Lafond
Adversarial Perturbations for Evolutionary Optimization

Sampling methods are a critical step for model-based evolutionary algorithms, their goal being the generation of new and promising individuals based on the information provided by the model. Adversarial perturbations have been proposed as a way to create samples that deceive neural networks. In this paper we introduce the idea of creating adversarial perturbations that correspond to promising solutions of the search space. A surrogate neural network is "fooled" by an adversarial perturbation algorithm until it produces solutions that are likely to be of higher fitness than the present ones. Using a benchmark of functions with varying levels of difficulty, we investigate the performance of a number of adversarial perturbation techniques as sampling methods. The paper also proposes a technique to enhance the effect that adversarial perturbations produce in the network. While adversarial perturbations on their own are not able to produce evolutionary algorithms that compete with state-of-the-art methods, they provide a novel and promising way to combine local optimizers with evolutionary algorithms.

Unai Garciarena, Jon Vadillo, Alexander Mendiburu, Roberto Santana
Cascaded Classifier for Pareto-Optimal Accuracy-Cost Trade-Off Using Off-the-Shelf ANNs

Machine-learning classifiers provide a high quality of service in classification tasks. Research now targets cost reduction, measured in terms of average processing time or energy per solution. Revisiting the concept of cascaded classifiers, we present a first-of-its-kind analysis of optimal pass-on criteria between the classifier stages. Based on this analysis, we derive a methodology to maximize the accuracy and efficiency of cascaded classifiers. On the one hand, our methodology allows a cost reduction of 1.32× while preserving the reference classifier's accuracy. On the other hand, it allows cost to be scaled over two orders of magnitude while gracefully degrading accuracy. Thereby, the final classifier stage sets the top accuracy. Hence, the multi-stage realization can be employed to optimize any state-of-the-art classifier.

Cecilia Latotzke, Johnson Loh, Tobias Gemmeke
Conditional Generative Adversarial Networks for Speed Control in Trajectory Simulation

Motion behaviour is driven by several factors: goals, neighbouring agents, social relations, physical and social norms, the environment with its variable characteristics, and more. Most factors are not directly observable and must be modelled from context. Trajectory prediction is thus a hard problem and has seen increasing attention from researchers in recent years. Prediction of motion, in application, must be realistic, diverse and controllable. In spite of the increasing focus on multimodal trajectory generation, most methods still lack means for explicitly controlling different modes of the data generation. Further, most endeavours invest heavily in designing special mechanisms to learn the interactions in latent space. We present Conditional Speed GAN (CSG), which allows controlled generation of diverse and socially acceptable trajectories based on user-controlled speed. During prediction, CSG forecasts future speed from latent space and conditions its generation on it. CSG is comparable to recent GAN methods in terms of the benchmark distance metrics, with the additional advantage of controlled simulation and data augmentation for different contexts. Furthermore, we compare the effect of different aggregation mechanisms and demonstrate that a naive approach of concatenation works comparably to its attention and pooling alternatives. (Open-source code available at: https://github.com/ConditionalSpeedGAN/CSG.)

Sahib Julka, Vishal Sowrirajan, Joerg Schloetterer, Michael Granitzer
An Integrated Approach to Produce Robust Deep Neural Network Models with High Efficiency

Deep Neural Networks (DNNs) need to be both efficient and robust for practical uses. Quantization and structure simplification are promising ways to adapt DNNs to mobile devices, and adversarial training is one of the most successful methods for training robust DNNs. In this work, we aim to realize both advantages by applying a convergent relaxation quantization algorithm, Binary-Relax (BR), to an adversarially trained robust model, the ResNets Ensemble via Feynman-Kac Formalism (EnResNet). We discover that high-precision quantization, such as ternary (tnn) or 4-bit, produces sparse DNNs. However, this sparsity is unstructured under adversarial training. To address the problems that adversarial training jeopardizes DNNs' accuracy on clean images and breaks the structure of sparsity, we design a trade-off loss function that helps DNNs preserve natural accuracy and improve channel sparsity. With our newly designed trade-off loss function, we achieve both goals with no reduction of resistance under weak attacks and only a very minor reduction of resistance under strong adversarial attacks. Together with our model and algorithm selections and loss function design, we provide an integrated approach to produce robust DNNs with high efficiency and accuracy. Furthermore, we provide a missing benchmark on the robustness of quantized models.

Zhijian Li, Bao Wang, Jack Xin
Leverage Score Sampling for Complete Mode Coverage in Generative Adversarial Networks

Commonly, machine learning models minimize an empirical expectation. As a result, the trained models typically perform well for the majority of the data but the performance may deteriorate in less dense regions of the dataset. This issue also arises in generative modeling. A generative model may overlook underrepresented modes that are less frequent in the empirical data distribution. This problem is known as complete mode coverage. We propose a sampling procedure based on ridge leverage scores which significantly improves mode coverage when compared to standard methods and can easily be combined with any GAN. Ridge leverage scores are computed by using an explicit feature map, associated with the next-to-last layer of a GAN discriminator or of a pre-trained network, or by using an implicit feature map corresponding to a Gaussian kernel. Multiple evaluations against recent approaches of complete mode coverage show a clear improvement when using the proposed sampling strategy.

Joachim Schreurs, Hannes De Meulemeester, Michaël Fanuel, Bart De Moor, Johan A. K. Suykens
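
A hedged sketch of the sampling idea: compute ridge leverage scores from an explicit feature map Φ and draw samples with probability proportional to them. The helper names are illustrative, not the paper's code:

```python
# Ridge leverage scores l_i = phi_i^T (Phi^T Phi + lam*I)^{-1} phi_i for
# feature vectors stacked as rows of Phi; sample proportionally to them.
import numpy as np

def ridge_leverage_scores(Phi, lam=1.0):
    d = Phi.shape[1]
    G = np.linalg.inv(Phi.T @ Phi + lam * np.eye(d))
    return np.einsum('ij,jk,ik->i', Phi, G, Phi)  # diag(Phi G Phi^T)

def leverage_sample(Phi, n_samples, lam=1.0, seed=0):
    p = ridge_leverage_scores(Phi, lam)
    rng = np.random.default_rng(seed)
    return rng.choice(len(Phi), size=n_samples, replace=False, p=p / p.sum())
```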
Public Transport Arrival Time Prediction Based on GTFS Data

Public transport (PT) systems are essential to human mobility, and PT investments continue to grow in order to improve PT services. Accurate PT arrival time prediction (PT-ATP) is vital for PT systems to deliver an attractive service, since the waiting experience of urban residents is an urgent problem to be solved. However, accurate PT-ATP is a challenging task because urban traffic conditions are complex and changeable. Nowadays, thousands of PT agencies publish their public transportation route and timetable information using the General Transit Feed Specification (GTFS) as the standard open format. Such data provide new opportunities for data-driven approaches to building effective bus information systems. This paper proposes a new framework to address the PT-ATP problem by using GTFS data. An overview of various ML models for PT-ATP purposes is also presented, along with insightful findings from a comparison based on real GTFS datasets. The results show that the neural network-based method outperforms its rivals in terms of prediction accuracy.

Eva Chondrodima, Harris Georgiou, Nikos Pelekis, Yannis Theodoridis
The Optimized Social Distance Lab
A Methodology for Automated Building Layout Redesign for Social Distancing

This research considers buildings as a test case for the development and implementation of multi-objective optimized social-distance layout redesign. It aims to develop and test a unique methodology using the Wallacei software and the NSGA-II algorithm to automate the redesign of an interior layout so that it provides compliant social distancing, using fitness functions of social distance, net useable space and total number of users. The process is evaluated in a live lab scenario, with results demonstrating that the methodology provides an agile, accurate, efficient and visually clear outcome for automating a compliant layout for social distancing.

Des Fagan, Ruth Conroy Dalton
Distilling Financial Models by Symbolic Regression

Symbolic Regression has been widely used during the last decades for inferring complex models. Its success is founded on the ability to recognize data correlations and to define non-trivial, interpretable models. In this paper, we apply Symbolic Regression to explore possible uses and obstacles in describing stochastic financial processes. Symbolic Regression (SR) with Genetic Programming (GP) is used to extract financial formulas, inspired by the theory of financial stochastic processes and Itô's Lemma. For this purpose, we introduce two operators into the model: the derivative and the integral. The experiments are conducted on five market indices that are reliable at defining the evolution of the processes in time: the Tokyo Stock Price Index (TOPIX), the Standard & Poor's 500 Index (SPX), the Dow Jones (DJI), the FTSE 100 (FTSE) and the Nasdaq Composite (NAS). To avoid both trivial and uninterpretable results, an error-complexity optimization is carried out. We perform computational experiments to obtain and investigate simple and accurate financial models. The Pareto front is used to select between multiple candidates, removing the over-specified ones. We also test Eureqa as a benchmark for extracting invariant equations. The results we obtain highlight the limitations and some pursuable paths in the study of financial processes with SR and GP techniques.

Gabriele La Malfa, Emanuele La Malfa, Roman Belavkin, Panos M. Pardalos, Giuseppe Nicosia
Analyzing Communication Broadcasting in the Digital Space

This paper aims to understand complex social events that arise when communicating general concepts in the digital space. Today, we get informed through many different channels, at different times of the day, in different contexts, and on many different devices. In addition, more complexity is added by the bidirectional nature of the communication itself. People today react very quickly to specific topics through various means such as rating, sharing, commenting, tagging, icons, tweeting, etc. Such activities generate additional metadata which becomes part of the original message. When planning proper communication, we should consider all of this. In such a complicated environment, the likelihood of a message's real meaning being received in a distorted or confused way is very high. However, as we have seen recently during the Covid-19 pandemic, there is at times the need to communicate something somewhat complicated in nature, while making sure citizens fully understand the actual terms and meaning of the communication. This was the case faced by many governments worldwide when informing their populations of the rules of conduct during the various lockdown periods. We analyzed trends and structure of social network data generated as a reaction to those official communications in Italy. Our goal is to derive a model to estimate whether the communication intended by the government was properly understood by the broad population. We discovered some regularities in social-media-generated data related to "poorly" communicated issues. We believe it is possible to derive a model to measure how well recipients grasp a specific topic, and this can be used to trigger real-time alerts when the need for clarification arises.

Giovanni Giuffrida, Francesco Mazzeo Rinaldi, Andrea Russo
Multivariate LSTM for Stock Market Volatility Prediction

Volatility is a measure of fluctuation in financial asset returns, a practical measure of risk, and a key variable for calculating option prices. Accurate prediction of volatility is crucial to maintaining profitable investments and trading strategies. Statistical models such as GARCH are used today to predict volatility and time series, though new methods are actively being researched to improve prediction accuracy and cope with rapidly increasing trading volumes and stock-market influencing factors. The aim of this paper is to investigate a new method to improve market volatility forecasting accuracy by introducing a new setup of the Recurrent Neural Network (RNN) algorithm. In particular, the proposed model is a stacked Long Short-Term Memory (LSTM) with multivariate input composed of the daily prices of multiple assets at different lag time-steps. The proposed model is used to predict volatility under different market conditions and is compared to the predictions obtained with GARCH as well as to the actual volatility of the same forecasting period. The results show that the prediction of future realized volatility using a single-feature LSTM has accuracy comparable to GARCH. They also indicate that a stacked LSTM can significantly improve the volatility prediction accuracy when configured with multivariate input of more than one asset and a lagging period of more than a day. A stacked multivariate LSTM setup enables the prediction model to capture complex patterns in the time series of asset prices and provides a superior alternative to statistical models for volatility modelling and prediction. The proposed multivariate LSTM architecture clearly shows faster and more accurate modelling of daily volatility and can therefore be used for intra-day modelling, specifically in high-frequency trading environments.

Osama Assaf, Giuseppe Di Fatta, Giuseppe Nicosia
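
A minimal Keras sketch of a stacked multivariate LSTM of the kind described: input windows of lagged daily prices for several assets, two stacked LSTM layers, and a single volatility output. Layer sizes, window length and asset count are illustrative, not the paper's configuration:

```python
# Illustrative stacked multivariate LSTM: windows of lagged daily prices
# for several assets in, next-day volatility out.
import tensorflow as tf

n_lags, n_assets = 30, 5  # invented window length and asset count
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_lags, n_assets)),
    tf.keras.layers.LSTM(64, return_sequences=True),  # first stacked layer
    tf.keras.layers.LSTM(32),                         # second stacked layer
    tf.keras.layers.Dense(1),                         # predicted volatility
])
model.compile(optimizer="adam", loss="mse")
# model.fit(X_windows, y_volatility, epochs=...) on suitably shaped data
```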
Backmatter
Metadata
Title
Machine Learning, Optimization, and Data Science
Edited by
Prof. Giuseppe Nicosia
Varun Ojha
Emanuele La Malfa
Gabriele La Malfa
Giorgio Jansen
Panos M. Pardalos
Prof. Giovanni Giuffrida
Renato Umeton
Copyright year
2022
Electronic ISBN
978-3-030-95470-3
Print ISBN
978-3-030-95469-7
DOI
https://doi.org/10.1007/978-3-030-95470-3