2024 | Book

Machine Learning, Optimization, and Data Science

9th International Conference, LOD 2023, Grasmere, UK, September 22–26, 2023, Revised Selected Papers, Part II

Edited by: Giuseppe Nicosia, Varun Ojha, Emanuele La Malfa, Gabriele La Malfa, Panos M. Pardalos, Renato Umeton

Publisher: Springer Nature Switzerland

Book series: Lecture Notes in Computer Science

About this book

This book constitutes the refereed proceedings of the 9th International Conference on Machine Learning, Optimization, and Data Science, LOD 2023, which took place in Grasmere, UK, in September 2023.

The 72 full papers included in this book were carefully reviewed and selected from 119 submissions. The proceedings also contain 9 papers from the Third Symposium on Artificial Intelligence and Neuroscience, ACAIN 2023. The contributions focus on the state of the art and the latest advances in the integration of machine learning, deep learning, nonlinear optimization and data science to provide and support the scientific and technological foundations for interpretable, explainable and trustworthy AI.

Table of Contents

Frontmatter

Machine Learning, Optimization, and Data Science (LOD 2023)

Frontmatter
Integrated Human-AI Forecasting for Preventive Maintenance Task Duration Estimation

Maintenance task duration estimations help manage shipyard resource usage and allow planners to decide on maintenance priorities within a limited time frame. Better estimated task durations help produce more robust resource schedules, perform more tasks in facilities such as shipyards, reduce resource idling time and increase ship operational availability. However, task duration estimates have until now been produced by human experts, with essentially no artificial intelligence-based forecasting for shipyard operations. The analysis of historical data is also not a common practice to complement any expert-driven forecasting. To explore opportunities for using AI in this work domain, and to improve on human estimations for task durations, we propose a novel hybrid Human-AI approach that involves integrating human forecasts with data-driven models. Our empirical data comes from two fleet maintenance facilities in Canada, containing more than 13,000 anonymized historical ship work orders (WO) ranging from 2017 to 2022. We used supervised learning algorithms to forecast the preventive maintenance task duration on this data, with and without expert task duration estimates, and the results demonstrate that hybrid models perform better than either the human expert model or models based on historical data alone. On average, the hybrid human-AI model improves on the human expert model by 8.6% as measured by the R² evaluation metric. Results suggest that human forecasts, which tend to rely on a broader contextual knowledge than the inputs captured in a historical database, remain key for effective task duration estimation, yet can be fine-tuned by the pattern recognition capabilities of machine learning algorithms.

Jiye Li, Yun Yin, Daniel Lafond, Alireza Ghasemi, Claver Diallo, Eric Bertrand
Exploring Image Transformations with Diffusion Models: A Survey of Applications and Implementation Code

Diffusion Models have become increasingly popular in recent years and their applications span a wide range of fields. This survey focuses on the use of diffusion models in computer vision, especially in the branch of image transformations. The objective of this survey is to provide an overview of state-of-the-art applications of diffusion models in image transformations, including image inpainting, super-resolution, restoration, translation, and editing. This survey presents a selection of notable papers and repositories including practical applications of diffusion models for image transformations. The applications are presented in a practical and concise manner, facilitating the understanding of concepts behind diffusion models and how they function. Additionally, it includes a curated collection of GitHub repositories featuring popular examples of these subjects.

Silvia Arellano, Beatriz Otero, Ruben Tous
Geolocation Risk Scores for Credit Scoring Models

Customer location is considered one of the most informative types of demographic data for predictive modeling. It has been widely used in various sectors including finance. Commercial banks use this information in the evaluation of their credit scoring systems. Generally, customer city and district are used as demographic features. Even if these features are quite informative, they are not fully capable of capturing the socio-economic heterogeneity of customers within cities or districts. In this study, we introduced a micro-region approach as an alternative to this district or city approach. We created features based on characteristics of micro-regions and developed predictive credit risk models. Since the models only used micro-region specific data, we were able to apply them to all possible locations and calculate risk scores for each micro-region. We showed their positive contribution to our regular credit risk models.

Erdem Ünal, Uğur Aydın, Murat Koraş, Barış Akgün, Mehmet Gönen
Social Media Analysis: The Relationship Between Private Investors and Stock Price

Understanding people’s opinions on social networks, such as Twitter or Reddit, has become easier with the assistance of analysis of users’ sentiments on these networks. In our model, social media reveals public opinions and expectations that potentially correlate with stock market price movements. Using natural language processing (NLP), this paper examines reasons for the correlation between public sentiments and stock price fluctuations in the United States. Further, we demonstrate these correlations and provide promising directions for future research.

Zijun Liu, Xinxin Wu, Wei Yao
Deep Learning Model of Two-Phase Fluid Transport Through Fractured Media: A Real-World Case Study

Modelling of fluid flow in the vicinity of a well in naturally fractured reservoirs is a commonly employed technique for enhancing well productivity, for example through acid stimulation. Unfortunately, building a detailed model that reflects the complex geophysical structure of the porous media is a time-consuming and computationally demanding task. In this paper, a deep learning model is proposed for solving the Darcy equation coupled with the transport equation, based on physics-informed neural network (PINN) deep learning technology. Datasets obtained from the 3D numerical simulator are used to train and test our method. We test the sensitivity of our method to the type of optimizer and learning rate, time step size and the number of timesteps, DNN architecture, and spatial resolution. The results of computational experiments on a real-world problem demonstrate good numerical stability of the solution, its computational efficiency and the high precision of the PINN model.

Leonid Sheremetov, Luis A. Lopez-Peña, Gabriela B. Díaz-Cortes, Dennys A. Lopez-Falcon, Erick E. Luna-Rojero
A Proximal Algorithm for Network Slimming

As a popular channel pruning method for convolutional neural networks (CNNs), network slimming (NS) has a three-stage process: (1) it trains a CNN with $$\ell_1$$ regularization applied to the scaling factors of the batch normalization layers; (2) it removes channels whose scaling factors are below a chosen threshold; and (3) it retrains the pruned model to recover the original accuracy. This time-consuming, three-step process is a result of using subgradient descent to train CNNs. Because subgradient descent does not exactly train CNNs towards sparse, accurate structures, the latter two steps are necessary. Moreover, subgradient descent does not have any convergence guarantee. Therefore, we develop an alternative algorithm called proximal NS. Our proposed algorithm trains CNNs towards sparse, accurate structures, so identifying a scaling factor threshold is unnecessary and fine tuning the pruned CNNs is optional. Using Kurdyka-Łojasiewicz assumptions, we establish global convergence of proximal NS. Lastly, we validate the efficacy of the proposed algorithm on VGGNet, DenseNet and ResNet on CIFAR 10/100. Our experiments demonstrate that after one round of training, proximal NS yields a CNN with competitive accuracy and compression.
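
To illustrate why a proximal method can produce exact zeros where subgradient descent generally does not, the following is a minimal sketch of a generic proximal gradient step for an $$\ell_1$$ penalty, using the standard soft-thresholding operator. It is not the authors' proximal NS algorithm; the function names, the learning rate `lr`, the penalty weight `lam` and the toy values are all assumptions chosen for illustration.

```python
def soft_threshold(x, tau):
    """Proximal operator of tau * |x|: shrink x towards zero by tau."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0  # values within [-tau, tau] become exactly zero


def proximal_step(scales, grads, lr, lam):
    """One proximal gradient step on a list of scaling factors.

    First take a plain gradient step on the smooth loss, then apply
    soft-thresholding with threshold lr * lam for the l1 penalty.
    """
    return [soft_threshold(s - lr * g, lr * lam) for s, g in zip(scales, grads)]


# Hypothetical batch-norm scaling factors and (zero) loss gradients.
scales = [0.50, -0.02, 0.01, -0.80]
grads = [0.0, 0.0, 0.0, 0.0]
updated = proximal_step(scales, grads, lr=0.1, lam=0.5)
print(updated)
```

In this toy run the two small factors are set exactly to zero while the large ones are only shrunk, which is the behaviour that makes a separate thresholding stage unnecessary in principle.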

Kevin Bui, Fanghui Xue, Fredrick Park, Yingyong Qi, Jack Xin
Diversity in Deep Generative Models and Generative AI

The decoder-based machine learning generative algorithms such as Generative Adversarial Networks (GAN), Variational Auto-Encoders (VAE), and Transformers show impressive results when constructing objects similar to those in a training ensemble. However, the generation of new objects builds mainly on the understanding of the hidden structure of the training dataset followed by a sampling from a multi-dimensional normal variable. In particular, each sample is independent of the others and can repeatedly propose the same type of objects. To cure this drawback we introduce a kernel-based measure quantization method that can produce new objects from a given target measure by approximating it as a whole and even staying away from elements already drawn from that distribution. This ensures a better diversity of the produced objects. The method is tested on classic machine learning benchmarks.

Gabriel Turinici
Improving Portfolio Performance Using a Novel Method for Predicting Financial Regimes

This work extends a previous work in regime detection, which allowed trading positions to be profitably adjusted when a new regime was detected, to ex ante prediction of regimes, leading to substantial performance improvements over the earlier model, over all three asset classes considered (equities, commodities, and foreign exchange), over a test period of four years. The proposed new model is also benchmarked over this same period against a hidden Markov model, the most popular current model for financial regime prediction, and against an appropriate index benchmark for each asset class, in the case of the commodities model having a test period cost-adjusted cumulative return over four times higher than that expected from the index. Notably, the proposed model makes use of a contrarian trading strategy, not uncommon in the financial industry but relatively unexplored in machine learning models. The model also makes use of frequent short positions, something not always desirable to investors due to issues of both financial risk and ethics; however, it is discussed how further work could remove this reliance on shorting and allow the construction of a long-only version of the model.

Piotr Pomorski, Denise Gorse
Ökolopoly: Case Study on Large Action Spaces in Reinforcement Learning

Ökolopoly is a serious game developed by biochemist Frederic Vester with the goal of enhancing understanding of interactions in complex systems. Due to its vast observation and action spaces, it presents a challenge for Deep Reinforcement Learning (DRL). In this paper, we make the board game available as a reinforcement learning environment and compare different methods of making the large spaces manageable. Our aim is to determine the conditions under which DRL agents are able to learn this game from self-play. To this end, we implement various wrappers to reduce the observation and action spaces, and to change the reward structure. We train PPO, SAC, and TD3 agents on combinations of these wrappers and compare their performance. We analyze the contribution of different representations of observation and action spaces to successful learning and the possibility of steering the DRL agents’ gameplay by shaping reward functions.

Raphael C. Engelhardt, Ralitsa Raycheva, Moritz Lange, Laurenz Wiskott, Wolfgang Konen
Alternating Mixed-Integer Programming and Neural Network Training for Approximating Stochastic Two-Stage Problems

The presented work addresses two-stage stochastic programs (2SPs), a broadly applicable model to capture optimization problems subject to uncertain parameters with adjustable decision variables. In case the adjustable or second-stage variables contain discrete decisions, the corresponding 2SPs are known to be $$\textrm{NP}$$-complete. The standard approach of forming a single-stage deterministic equivalent problem can be computationally challenging even for small instances, as the number of variables and constraints scales with the number of scenarios. To avoid forming a potentially huge MILP problem, we build upon an approach of approximating the expected value of the second-stage problem by a neural network (NN) and encoding the resulting NN into the first-stage problem. The proposed algorithm alternates between optimizing the first-stage variables and retraining the NN. We demonstrate the value of our approach with the example of computing operating points in power systems by showing that the alternating approach provides improved first-stage decisions and a tighter approximation between the expected objective and its neural network approximation.

Jan Kronqvist, Boda Li, Jan Rolfes, Shudian Zhao
Heaviest and Densest Subgraph Computation for Binary Classification. A Case Study

This article presents a novel network-based data classification method. The classification problem is discussed as a graph theoretical problem. Real-valued data are first transformed into an undirected graph, and then the heaviest and densest subgraphs are detected using an ant colony optimization approach. Numerical experiments conducted on a real-valued dataset show the potential of the proposed approach.

Zoltán Tasnádi, Noémi Gaskó
SMBOX: A Scalable and Efficient Method for Sequential Model-Based Parameter Optimization

The application of Machine Learning (ML) algorithms continues to grow and shows no signs of slowing down. Each ML algorithm has an associated set of hyperparameter values that need to be set to achieve the best performance for each problem. The task of selecting the best parameter values for the problem at hand is known as Hyperparameter Optimisation (HPO). Traditionally this has been carried out manually or by unguided programmatic approaches such as grid or random search. These approaches can be extremely time-consuming and inefficient, especially when dealing with more than a handful of parameters. More advanced methods involving Evolutionary Heuristics [23] or Bayesian Optimisation [17, 28] use a guided search approach and are widely considered the gold standard for hyperparameter optimisation. In this paper, we introduce SMBOX ( https://github.com/smbox/smbox ), a novel HPO search strategy developed to rival the state-of-the-art, SMAC [15]. Our benchmarking on public classification datasets, against both SMAC and a Random search baseline, shows that SMBOX not only challenges SMAC in tuning hyperparameters for two prevalent ML algorithms, but it also excels in finding good hyperparameter values quicker than SMAC. This rapid optimisation capability is extremely powerful, particularly in situations where time or computational resources are constrained or costly.

Tarek Salhi, John Woodward
Accelerated Graph Integration with Approximation of Combining Parameters

Graph-based models offer the advantage of handling data that resides on irregular and complex structures. Among the various models for graph-structured data, graph-based semi-supervised learning (SSL) with label propagation has shown promising results in numerous applications. Meanwhile, with the rapid growth in the availability of data, there often exist multiple relations for the same set of data points. Each relation contains information complementary to the others, and it would be beneficial to integrate all the available information. Such integration can be translated to finding an optimal combination of the graphs, and several studies have been conducted. Previous works, however, incur high computation time with a complex design of the learning process. This limits their applicability in many cases. To circumvent the difficulty, we propose an SSL-based fast graph integration method that employs approximation in the maximum likelihood estimation process of finding the combination. The proposed approximation utilizes the connection between the covariance and its Neumann series, which allows us to avoid explicit matrix inversion. Empirically, the proposed method achieves competitive performance with significant improvements in computational time when compared to other integration methods.
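
The general idea of replacing a matrix inverse by a Neumann series can be sketched as follows. For a matrix A with spectral radius below 1, (I - A)^{-1} y equals the sum of A^k y over k >= 0, so the solution can be approximated with matrix-vector products only. This is a generic illustration; how the authors apply the series inside their maximum likelihood estimation is not stated in the abstract, and the helper names and the 2x2 example are assumptions.

```python
def mat_vec(A, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def neumann_solve(A, y, terms=50):
    """Approximate (I - A)^{-1} y by the truncated series sum_{k < terms} A^k y.

    Only valid when the spectral radius of A is below 1.
    """
    x = list(y)     # k = 0 term: A^0 y = y
    term = list(y)
    for _ in range(1, terms):
        term = mat_vec(A, term)                   # A^k y from A^(k-1) y
        x = [xi + ti for xi, ti in zip(x, term)]  # accumulate the partial sum
    return x


# Toy example: a scaled two-node similarity matrix (spectral radius 0.4).
A = [[0.0, 0.4], [0.4, 0.0]]
y = [1.0, 0.0]
x = neumann_solve(A, y)

# Residual of the linear system (I - A) x = y, which should be close to zero.
residual = [xi - ai - yi for xi, ai, yi in zip(x, mat_vec(A, x), y)]
print(x, residual)
```

Each additional term costs one matrix-vector product, so truncating the series early trades accuracy for speed, which is the kind of trade-off an approximation-based method can exploit.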

Taehwan Yun, Myung Jun Kim, Hyunjung Shin
Improving Reinforcement Learning Efficiency with Auxiliary Tasks in Non-visual Environments: A Comparison

Real-world reinforcement learning (RL) environments, whether in robotics or industrial settings, often involve non-visual observations and require not only efficient but also reliable and thus interpretable and flexible RL approaches. To improve efficiency, agents that perform state representation learning with auxiliary tasks have been widely studied in visual observation contexts. However, for real-world problems, dedicated representation learning modules that are decoupled from RL agents are more suited to meet requirements. This study compares common auxiliary tasks based on, to the best of our knowledge, the only decoupled representation learning method for low-dimensional non-visual observations. We evaluate potential improvements in sample efficiency and returns for environments ranging from a simple pendulum to a complex simulated robotics task. Our findings show that representation learning with auxiliary tasks only provides performance gains in sufficiently complex environments and that learning environment dynamics is preferable to predicting rewards. These insights can inform future development of interpretable representation learning approaches for non-visual observations and advance the use of RL solutions in real-world scenarios.

Moritz Lange, Noah Krystiniak, Raphael C. Engelhardt, Wolfgang Konen, Laurenz Wiskott
A Hybrid Steady-State Genetic Algorithm for the Minimum Conflict Spanning Tree Problem

This paper studies a hybrid approach for the minimum conflict spanning tree (MCST) problem, which deals with finding a spanning tree (T) with the minimum number of conflicting edge-pairs. The problem has several important real-world applications. In this hybrid approach (hSSGA), a steady-state genetic algorithm generates a child solution with the help of a crossover operator and a mutation operator, which are applied in a mutually exclusive way, and the generated child solution is further improved through a local search based on the reduction of conflicting edge-pairs. The proposed crossover operator is a problem-specific operator that attempts to create a fitter child solution. All components of SSGA and local search coordinate effectively in finding a conflict-free solution or a solution with a minimal number of conflicting edge-pairs. Experimental results, particularly on the 12 available type 1 benchmark instances for which conflict-free solutions are not known, show that the proposed hybrid approach hSSGA finds solutions of better quality than state-of-the-art approaches. Also, hSSGA discovers new values on 8 of the 12 type 1 instances.

Punit Kumar Chaubey, Shyam Sundar
Reinforcement Learning for Multi-Neighborhood Local Search in Combinatorial Optimization

This study investigates the application of reinforcement learning for the adaptive tuning of neighborhood probabilities in stochastic multi-neighborhood search. The aim is to provide a more flexible and robust tuning method for heterogeneous scenarios than traditional offline tuning. We propose a novel mix of learning components for multi-neighborhood Simulated Annealing, which considers both cost- and time-effectiveness of moves. To assess the performance of our approach we employ two real-world case studies in timetabling, namely examination timetabling and sports timetabling, for which multi-neighborhood Simulated Annealing has already obtained remarkable results using offline tuning techniques. Experimental data show that our approach obtains better results than the analogous algorithm that uses state-of-the-art offline tuning on benchmarking datasets while requiring less tuning effort.

Sara Ceschia, Luca Di Gaspero, Roberto Maria Rosati, Andrea Schaerf
Evaluation of Selected Autoencoders in the Context of End-User Experience Management

Empirical research shows that a significant portion of employees regularly faces IT-related challenges in their workplace, resulting in lost productivity, customer dissatisfaction, and increased employee turnover [1]. Despite the significant impact of these problems, keeping the IT administration informed about ongoing issues is a major challenge. End-User Experience Management (EUEM) aims to help IT administrators address this problem. For example, in the context of EUEM, telemetry data collected from employees’ devices can help IT administrators to identify potential issues [2]. Machine learning algorithms can automatically detect anomalies in the collected telemetry data, providing IT administration with essential insights to optimize the end-user experience [2]. This paper examines the advantages and disadvantages of three autoencoder-based algorithms identified in the literature as well-suited for detecting anomalies, applied in this paper to hardware telemetry: Autoencoder (AE), Variational Autoencoder (VAE), and Deep Autoencoding Gaussian Mixture Model (DAEGMM). The results show that all three models provide anomaly detection in hardware telemetry data, though with significant differences. While the AE is the fastest algorithm, the VAE offers the most stable results. The DAEGMM provides the best separation of endpoints into outliers and normal data points but has the longest runtime. For all models, data aggregation has a significant potential for data reduction by aggregating the measurements over a longer time interval.

Sven Beckmann, Bernhard Bauer
Application of Multi-agent Reinforcement Learning to the Dynamic Scheduling Problem in Manufacturing Systems

Recent research in reinforcement learning (RL) has demonstrated remarkable results on complex strategic planning problems. Approaches that incorporate multiple agents to complete complex tasks cooperatively have become especially popular. However, the application of multi-agent reinforcement learning (MARL) to manufacturing problems, such as the production scheduling problem, has been less frequently addressed and remains a challenge for current research. A major reason is that applications in the manufacturing domain are typically characterized by specific requirements and pose major implementation difficulties for the research community. MARL has the capability to solve complex problems with enhanced performance in comparison with traditional methods. The main objective of this paper is to implement feasible MARL algorithms to solve the problem of dynamic scheduling in manufacturing systems using a model factory as an example. We focus on optimizing the performance of the scheduling task, which is mainly reflected in the makespan. We obtained more stable and enhanced performance in our experiments with algorithms based on the on-policy policy gradient methods. Therefore, this study also investigates the promising and state-of-the-art single-agent reinforcement learning algorithms based on the on-policy method, including Asynchronous Advantage Actor-Critic, Proximal Policy Optimization, and Recurrent Proximal Policy Optimization, and compares the results with those of MARL. The findings illustrate that RL was indeed successful in converging to optimal solutions that are ahead of the traditional heuristic methods for dealing with the complex problem of scheduling under uncertain conditions.

David Heik, Fouad Bahrpeyma, Dirk Reichelt
Solving Mixed Influence Diagrams by Reinforcement Learning

While efficient optimisation methods exist for problems with special properties (linear, continuous, differentiable, unconstrained), real-world problems often involve inconvenient complications (constrained, discrete, multi-stage, multi-level, multi-objective). Each of these complications has spawned research areas in Artificial Intelligence and Operations Research, but few methods are available for hybrid problems. We describe a reinforcement learning-based solver for a broad class of discrete problems that we call Mixed Influence Diagrams, which may have multiple stages, multiple agents, multiple non-linear objectives, correlated chance variables, exogenous and endogenous uncertainty, constraints (hard, soft and chance) and partially observed variables. We apply the solver to problems taken from stochastic programming, chance-constrained programming, limited-memory influence diagrams, multi-level and multi-objective optimisation. We expect the approach to be useful on new hybrid problems for which no specialised solution methods exist.

S. D. Prestwich
Multi-scale Heat Kernel Graph Network for Graph Classification

Graph neural networks (GNNs) have been shown to be useful in a variety of graph classification tasks, from bioinformatics to social networks. However, most GNNs represent the graph using local neighbourhood aggregation. This mechanism makes it inherently difficult to learn the global structure of a graph and does not have enough expressive power to distinguish simple non-isomorphic graphs. To overcome this limitation, we propose multi-head heat kernel convolution for graph representation. Unlike the conventional approach of aggregating local information from neighbours using an adjacency matrix, the proposed method uses multiple heat kernels to learn the local information and the global structure simultaneously. The proposed algorithm outperforms the competing methods on most benchmark datasets or at least shows comparable performance.
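
As a rough illustration of how heat kernels at different scales capture local and global structure, the sketch below computes H_t = exp(-tL) for the combinatorial Laplacian L = D - A of a three-node path graph, using a truncated Taylor series. This is the textbook heat kernel, not the authors' multi-head convolution layer; the function names, the two scales and the toy graph are assumptions.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def heat_kernel(adj, t, terms=40):
    """Approximate exp(-t L) by the Taylor series sum_k (-t L)^k / k!."""
    n = len(adj)
    L = laplacian(adj)
    M = [[-t * L[i][j] for j in range(n)] for i in range(n)]
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in H]                        # k = 0 term: identity
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, M)]  # M^k / k!
        H = [[H[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return H


# Path graph 0 - 1 - 2: nodes 0 and 2 are not directly connected.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
local_kernel = heat_kernel(adj, t=0.1)   # small scale: mostly local diffusion
global_kernel = heat_kernel(adj, t=1.0)  # larger scale: heat reaches node 2
print(local_kernel[0][2], global_kernel[0][2])
```

At the small scale almost no heat flows between the two end nodes, while at the larger scale they are noticeably coupled even though they share no edge, which is the kind of multi-scale information an adjacency matrix alone does not provide.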

Jong Ho Jhee, Jeongheun Yeon, Yoonshin Kwak, Hyunjung Shin
PROS-C: Accelerating Random Orthogonal Search for Global Optimization Using Crossover

Pure Random Orthogonal Search (PROS) is a parameterless evolutionary algorithm (EA) that has shown superior performance when compared to many existing EAs on well-known benchmark functions with limited search budgets. Its implementation simplicity, computational efficiency, and lack of hyperparameters make it attractive to both researchers and practitioners. However, PROS can be inefficient when the error requirement becomes stringent. In this paper, we propose an extension to PROS, called Pure Random Orthogonal Search with Crossover (PROS-C), which aims to improve the convergence rate of PROS while maintaining its simplicity. We analyze the performance of PROS-C on a class of functions that are monotonically increasing in each single dimension. Our numerical experiments demonstrate that, with the addition of a simple crossover operation, PROS-C consistently and significantly reduces the errors of the obtained solutions on a wide range of benchmark functions. Moreover, PROS-C converges faster than Genetic Algorithms (GA) on benchmark functions when the search budget is tight. The results suggest that PROS-C is a promising algorithm for optimization problems that require high computational efficiency and have a limited search budget.

Bruce Kwong-Bun Tong, Wing Cheong Lau, Chi Wan Sung, Wing Shing Wong
A Multiclass Robust Twin Parametric Margin Support Vector Machine with an Application to Vehicles Emissions

This paper considers the problem of predicting vehicle smog ratings by applying a novel Support Vector Machine (SVM) technique. Classical SVM-type models perform a binary classification of the training observations. However, in many real-world applications two classification categories may not be enough. For this reason, a new multiclass Twin Parametric Margin Support Vector Machine (TPMSVM) is designed. On the basis of different characteristics, such as engine size and fuel consumption, the model aims to assign each vehicle to a specific smog rating class. To protect the model against uncertainty arising in the measurement procedure, a robust optimization extension of the multiclass TPMSVM model is formulated. Spherical uncertainty sets are considered and a tractable robust counterpart of the model is derived. Experimental results on a real-world dataset show the good performance of the robust formulation.

Renato De Leone, Francesca Maggioni, Andrea Spinelli
LSTM Noise Robustness: A Case Study for Heavy Vehicles

Artificial intelligence (AI) techniques are becoming more and more widespread. This is directly related to technological progress and to aspects such as the flexibility and adaptability of the algorithms considered, key characteristics that allow their use in a wide variety of fields. The increasing diffusion of these techniques makes it necessary to evaluate their robustness and reliability. This field is still largely unexplored, especially in the automotive sector, where algorithms need to cope with noise in data acquisition. For this reason, a methodology directly linked to previous works in the heavy vehicles field is presented. In particular, it focuses on the estimation of rollover indexes, one of the main issues in road safety scenarios. The purpose is to extend the cited works by addressing the performance of LSTM networks in the case of strongly disturbed signals.

Maria Elena Bruni, Guido Perboli, Filippo Velardocchia
Ensemble Clustering for Boundary Detection in High-Dimensional Data

The emergence of novel data collection methods has led to the accumulation of vast amounts of unlabelled data. Discovering well separated groups of data samples through clustering is a critical but challenging task. In recent years various techniques to detect isolated and boundary points have been developed. In this work, we propose a clustering methodology that enables us to discover boundary data effectively, discriminating them from outliers. The proposed methodology utilizes a well established density based clustering method designed for high dimensional data, to develop a new ensemble scheme. The experimental results demonstrate very good performance, indicating that the approach has the potential to be used in diverse domains.

Panagiotis Anagnostou, Nicos G. Pavlidis, Sotiris Tasoulis
Learning Graph Configuration Spaces with Graph Embedding in Engineering Domains

In various domains, engineers face the challenge of optimising system configurations while considering numerous constraints. A common goal is not to identify the best configuration as fast as possible, but rather to find a useful set of very good configurations in a given time for further elaboration by human engineers. Existing techniques for exploring large configuration spaces work well on Euclidean configuration spaces (e.g., with Boolean and numerical configuration decisions). However, it is unclear to what extent they are applicable to configuration problems where solutions are represented as graphs – a common representation in many engineering disciplines. To investigate this problem, we propose an adaptation of existing techniques for Euclidean configurations to graph configuration spaces by applying graph embedding. We demonstrate the feasibility of this adapted pipeline and conduct a controlled experiment to estimate its efficiency. We apply our approach to a sample case of HVAC (Heating, Ventilation, and Air-Conditioning) systems in 40,000 simulated houses. By first learning the configuration space from a small number of simulations, we can identify 75% of the best configurations within 7,508 simulations compared to 29,725 simulations without our approach. That is a speed-up of 4.0$$\times$$ and saves more than 15 days if one simulation takes about one minute, as in our experimental set-up.
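
The speed-up and time saving quoted above can be checked with a few lines of arithmetic. The simulation counts are taken from the abstract; treating each simulation as exactly one minute is an assumption, since the abstract says "about one minute".

```python
# Simulation counts reported in the abstract.
simulations_without = 29_725
simulations_with = 7_508

# Assumed cost per simulation in minutes (the abstract says "about one minute").
minutes_per_simulation = 1.0

speedup = simulations_without / simulations_with
saved_minutes = (simulations_without - simulations_with) * minutes_per_simulation
saved_days = saved_minutes / (60 * 24)

print(f"speed-up: {speedup:.2f}x, time saved: {saved_days:.1f} days")
```

The ratio is roughly 3.96, which rounds to the reported 4.0, and the 22,217 avoided simulations correspond to a little over 15 days of simulation time under the one-minute assumption.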

Michael Mittermaier, Takfarinas Saber, Goetz Botterweck
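
The speed-up and time saving quoted in the abstract above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the two simulation counts come from the abstract, while the function names are made up here and the "about one minute" per simulation is treated as exactly one minute.

```python
# Figures quoted in the abstract; the one-minute duration is an assumption
# ("about one minute" in the text, treated as exact for this check).
SIMS_WITH_APPROACH = 7_508
SIMS_WITHOUT_APPROACH = 29_725
MINUTES_PER_SIMULATION = 1.0


def speed_up(baseline_sims: int, improved_sims: int) -> float:
    """Ratio of simulations needed without vs. with the learned model."""
    return baseline_sims / improved_sims


def days_saved(baseline_sims: int, improved_sims: int, minutes_per_sim: float) -> float:
    """Wall-clock days saved if simulations run one after another."""
    saved_minutes = (baseline_sims - improved_sims) * minutes_per_sim
    return saved_minutes / (60 * 24)


ratio = speed_up(SIMS_WITHOUT_APPROACH, SIMS_WITH_APPROACH)
saved = days_saved(SIMS_WITHOUT_APPROACH, SIMS_WITH_APPROACH, MINUTES_PER_SIMULATION)
print(f"speed-up: {ratio:.2f}x")
print(f"days saved: {saved:.2f}")
```

Under these assumptions the ratio rounds to the 4.0× stated in the abstract, and the saving is a little over 15 days of sequential simulation time.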

Artificial Intelligence and Neuroscience (ACAIN 2023)

Frontmatter
Towards an Interpretable Functional Image-Based Classifier: Dimensionality Reduction of High-Density Diffuse Optical Tomography Data

High-density diffuse optical tomography (HD-DOT) is a wearable neuroimaging method that demonstrates high temporal and spatial resolution. While this data contains far richer information as a result, the high dimensionality and the presence of complicated interconnections between data points require the use of dimensionality reduction techniques to simplify the predictive modelling task without eliminating meaningful data features. To interrogate the possibility of designing a physiologically relevant HD-DOT feature set, cortical parcellations were applied to reconstructed images of brain activity to reduce the data dimensionality. A preliminary assessment of the predictive power of these parcel features was conducted on two binary tasks, with reasonable accuracies being achieved using standard classification models. Our results also demonstrated high spatial signal reproducibility across participants, which is promising for the application of image-based classification models that rely on spatial similarities to define separable class boundaries. These results provide insight into how the increased spatial resolution of HD-DOT can be leveraged to perform more accurate classification of neural data.

Sruthi Srinivasan, Emilia Butters, Flavia Mancini, Gemma Bale
On Ensemble Learning for Mental Workload Classification

The ability to determine a subject’s Mental Work Load (MWL) has a wide range of significant applications within modern working environments. In recent years, techniques such as Electroencephalography (EEG) have come to the forefront of MWL monitoring by extracting signals from the brain that correlate strongly to the workload of a subject. To effectively classify the MWL of a subject via their EEG data, prior works have employed machine and deep learning models. These studies have primarily utilised single-learner models to perform MWL classification. However, given the significance of accurately detecting a subject’s MWL for use in practical applications, steps should be taken to assess how we can increase the accuracy of these systems so that they are robust enough for use in real-world scenarios. Therefore, in this study, we investigate if the use of state-of-the-art ensemble learning strategies can improve performance over individual models. As such, we apply Bagging and Stacking ensemble techniques to the STEW dataset to classify “low”, “medium”, and “high” workload levels using EEG data. We also explore how different model compositions impact performance by modifying the type and quantity of models within each ensemble. The results from this study highlight that ensemble networks are capable of improving upon the accuracy of all their individual learner counterparts whilst reducing the variance of predictions, with our highest scoring model being a stacking BLSTM consisting of 8 learners, which achieved a classification accuracy of 97%.

Niall McGuire, Yashar Moshfeghi
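
The bagging idea mentioned in the abstract above (train several learners on bootstrap resamples, then combine their votes) can be sketched in a few lines. This is a toy illustration only: the paper's learners are deep models such as BLSTMs trained on the STEW EEG data, whereas the sketch below uses a hypothetical one-dimensional nearest-centroid learner and made-up feature values, and it does not cover stacking.

```python
import random
from collections import Counter


def fit_nearest_centroid(xs, ys):
    """Toy base learner (assumption, not the paper's model): one centroid per label."""
    centroids = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = sum(values) / len(values)
    return centroids


def predict_one(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def fit_bagging(xs, ys, n_learners, rng):
    """Train each learner on a bootstrap resample (sampling with replacement)."""
    n = len(xs)
    models = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_nearest_centroid([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagging(models, x):
    """Combine the learners by majority vote."""
    votes = Counter(predict_one(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Made-up 1-D "workload" feature values, four per class.
features = [0.8, 1.0, 1.2, 1.1, 4.9, 5.0, 5.2, 5.1, 8.8, 9.0, 9.1, 9.3]
labels = ["low"] * 4 + ["medium"] * 4 + ["high"] * 4

models = fit_bagging(features, labels, n_learners=8, rng=random.Random(0))
print(predict_bagging(models, 0.9), predict_bagging(models, 5.1), predict_bagging(models, 9.2))
```

Averaging votes over resampled learners is what reduces the variance of predictions; the choice of eight learners here simply mirrors the ensemble size named in the abstract and carries no other significance.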
Decision-Making over Compact Preference Structures

We consider a scenario where a user must make a set of correlated decisions and we propose a computational cognitive model of the deliberation process. We assume the user compactly expresses her preferences via soft constraints and we study how a psychology-based model of human decision-making, namely Multi-Alternative Decision Field Theory (MDFT), can be applied in this context. We design and study sequential and synchronous procedures which combine local decision-making on each variable with constraint propagation, as well as a one-shot approach. Our experimental results, which focus on tree-shaped Fuzzy Constraint Satisfaction Problems, suggest that decomposing the decision process along the preference structure makes it possible to find solutions of high quality in terms of preferences, maintains MDFT’s ability to replicate behavioral effects and is more efficient in terms of computational cost.

Andrea Martin, Kristen Brent Venable
User-Like Bots for Cognitive Automation: A Survey

Software bots have attracted increasing interest and popularity in both research and society. Their contributions span automation, digital twins, game characters with conscious-like behavior, and social media. However, there is still a lack of intelligent bots that can adapt to the variability and dynamic nature of digital web environments. Unlike human users, they have difficulty understanding and exploiting the affordances across multiple virtual environments. Despite the hype, bots with human user-like cognition do not currently exist. Chatbots, for instance, lack situational awareness on the digital platforms where they operate, preventing them from enacting meaningful and autonomous intelligent behavior similar to human users. In this survey, we aim to explore the role of cognitive architectures in supporting efforts towards engineering software bots with advanced general intelligence. We discuss how cognitive architectures can contribute to creating intelligent software bots. Furthermore, we highlight key architectural recommendations for the future development of autonomous, user-like cognitive bots.

Habtom Kahsay Gidey, Peter Hillmann, Andreas Karcher, Alois Knoll
On Channel Selection for EEG-Based Mental Workload Classification

Electroencephalogram (EEG) is a non-invasive technology with high temporal resolution, widely used in Brain-Computer Interfaces (BCIs) for mental workload (MWL) classification. However, numerous EEG channels in current devices can make them bulky, uncomfortable, and time-consuming to operate in real-life scenarios. A Riemannian geometry approach has gained attention for channel selection to address this issue. In particular, Riemannian geometry employs covariance matrices of EEG signals to identify the optimal set of EEG channels, given a specific covariance estimator and desired channel number. However, previous studies have not thoroughly assessed the limitations of various covariance estimators, which may influence the analysis results. In this study, we aim to investigate the impact of different covariance estimators, namely Empirical Covariance (EC), Shrunk Covariance (SC), Ledoit-Wolf (LW), and Oracle Approximating Shrinkage (OAS), along with the influence of channel numbers on the process of EEG channel selection. We also examine the performance of selected channels using diverse deep learning models, namely Stacked Gated Recurrent Unit (GRU), Bidirectional Gated Recurrent Unit (BGRU), and BGRU-GRU models, using a publicly available MWL EEG dataset. Our findings show that although no universally optimal channel number exists, employing as few as four channels can achieve an accuracy of 0.940 (±0.036), enhancing practicality for real-world applications. In addition, we discover that the BGRU model, when combined with OAS covariance estimators and a 32-channel configuration, demonstrates superior performance in MWL classification tasks compared to other estimator combinations. Indeed, this study provides insights into the effectiveness of various covariance estimators and the optimal channel subsets for highly accurate MWL classification. These findings can potentially advance the development of EEG-based BCI applications.

Kunjira Kingphai, Yashar Moshfeghi
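
To make the difference between the empirical and the shrunk covariance estimators named in the abstract above more concrete, here is a minimal pure-Python sketch. It assumes the common convex-combination form of shrinkage, (1 - alpha) * cov + alpha * (trace(cov) / p) * I, with a hand-picked alpha; Ledoit-Wolf and OAS instead derive the shrinkage amount from the data and are not reproduced here. The toy two-channel signal and all function names are illustrative, not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def empirical_covariance(samples: Matrix) -> Matrix:
    """Maximum-likelihood covariance (divides by n); rows are time samples, columns are channels."""
    n = len(samples)
    p = len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for row in samples:
        centered = [row[j] - means[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += centered[i] * centered[j] / n
    return cov


def shrunk_covariance(cov: Matrix, alpha: float) -> Matrix:
    """Shrink towards a scaled identity: (1 - alpha) * cov + alpha * mu * I, mu = trace(cov) / p."""
    p = len(cov)
    mu = sum(cov[i][i] for i in range(p)) / p
    return [
        [(1 - alpha) * cov[i][j] + (alpha * mu if i == j else 0.0) for j in range(p)]
        for i in range(p)
    ]


# Made-up signal: 4 time samples from 2 strongly correlated channels.
toy_eeg = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]

ec = empirical_covariance(toy_eeg)
sc = shrunk_covariance(ec, alpha=0.1)  # alpha chosen by hand for illustration
print("empirical:", ec)
print("shrunk:   ", sc)
```

Shrinkage of this form keeps the trace of the matrix unchanged while pulling off-diagonal entries towards zero, which is one reason the choice of estimator can change which channels a covariance-based selection procedure favours.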
What Song Am I Thinking Of?

Information Need (IN) is a complex phenomenon due to the difficulty experienced when realising and formulating it into a query format. This leads to a semantic gap between the IN and its representation (e.g., the query). Studies have investigated techniques to bridge this gap by using neurophysiological features. Music Information Retrieval (MIR) is a sub-field of IR that could greatly benefit from bridging the gap between IN and query, as songs present an acute challenge for IR systems. A searcher may be able to recall/imagine a piece of music they wish to search for but still need to remember key pieces of information (title, artist, lyrics) used to formulate a query that an IR system can process. However, if a MIR system could understand the imagined song, it may allow the searcher to satisfy their IN better. As such, in this study, we aim to investigate the possibility of detecting pieces from Electroencephalogram (EEG) signals captured while participants “listen” to or “imagine” songs. We employ six machine learning models on the publicly available data set, OpenMIIR. In the model training phase, we devised several experiment scenarios to explore the capabilities of the models to determine the potential effectiveness of Perceived and Imagined EEG song data in a MIR system. Our results show that, firstly, we can detect perceived songs using the recorded brain signals, with an accuracy of 62.0% (SD 5.4%). Furthermore, we classified imagined songs with an accuracy of 60.8% (SD 13.2%). Insightful results were also gained from several experiment scenarios presented within this paper. Overall, the encouraging results produced by this study are a crucial step towards information retrieval systems capable of interpreting INs from the brain, which can help alleviate the semantic gap’s negative impact on information retrieval.

Niall McGuire, Yashar Moshfeghi
Path-Weights and Layer-Wise Relevance Propagation for Explainability of ANNs with fMRI Data

The application of artificial neural networks (ANNs) to functional magnetic resonance imaging (fMRI) data has recently gained renewed attention for signal analysis, modeling the underlying processes, and knowledge extraction. Although adequately trained ANNs are characterized by high predictive performance, the intrinsic models tend to be inscrutable due to their complex architectures. Still, explainable artificial intelligence (xAI) looks to find methods that can help to delve into ANNs’ structures and reveal which inputs most contribute to correct predictions and how the networks unroll calculations until the final decision. Several methods have been proposed to explain the black-box ANNs’ decisions, with layer-wise relevance propagation (LRP) being the current state-of-the-art. This study aims to investigate the consistency between LRP-based and path-weight-based analysis and how the network’s pruning and retraining processes affect each method in the context of fMRI data analysis. The procedure is tested with fMRI data obtained in a motor paradigm. Both methods were applied to a fully connected ANN, and to pruned and retrained versions. The results show that both methods agree on the most relevant inputs for each stimulus. The pruning process did not lead to major disagreements. Retraining affected both methods similarly, exacerbating the changes initially observed in the pruning process. Notably, the inputs retained for the ultimate ANN are in accordance with the established neuroscientific literature concerning motor action in the brain, validating the procedure and explaining methods. Therefore, both methods can yield valuable insights for understanding the original fMRI data and extracting knowledge.

José Diogo Marques dos Santos, José Paulo Marques dos Santos
Sensitivity Analysis for Feature Importance in Predicting Alzheimer’s Disease

Artificial Intelligence (AI) classifier models based on Deep Neural Networks (DNN) have demonstrated superior performance in medical diagnostics. However, DNN models are regarded as “black boxes” as they are not intrinsically interpretable and, thus, are reluctantly considered for deployment in healthcare and other safety-critical domains. In such domains explainability is considered a fundamental requisite to foster trust and acceptability of automatic decision-making processes based on data-driven machine learning models. To overcome this limitation, DNN models require additional and careful post-processing analysis and evaluation to generate suitable explainability of their predictions. This paper analyses a DNN model developed for predicting Alzheimer’s Disease to generate and assess explainability analysis of the predictions based on feature importance scores computed using sensitivity analysis techniques. In this study, a high dimensional dataset was obtained from Magnetic Resonance Imaging of the brain for healthy subjects and for Alzheimer’s Disease patients. The dataset was annotated with two labels, Alzheimer’s Disease (AD) and Cognitively Normal (CN), which were used to build and test a DNN model for binary classification. Three Global Sensitivity Analysis (G-SA) methodologies (Sobol, Morris, and FAST) as well as the SHapley Additive exPlanations (SHAP) were used to compute feature importance scores. The results from these methods were evaluated for their usefulness to explain the classification behaviour of the DNN model. The feature importance scores from sensitivity analysis methods were assessed and combined based on similarity for robustness. The results indicated that features related to specific brain regions (e.g., the hippocampal sub-regions, the temporal horn of the lateral ventricle) can be considered very important in predicting Alzheimer’s Disease. The findings are consistent with earlier results from the relevant specialised literature on Alzheimer’s Disease. The proposed explainability approach can facilitate the adoption of black-box classifiers, such as DNN, in medical and other application domains.

Akhila Atmakuru, Giuseppe Di Fatta, Giuseppe Nicosia, Ali Varzandian, Atta Badii
A Radically New Theory of How the Brain Represents and Computes with Probabilities

It is widely believed that the brain implements probabilistic reasoning and that it represents information via some form of population (distributed) code. Most prior probabilistic population coding (PPC) theories share basic properties: 1) continuous-valued units; 2) fully/densely distributed codes; 3) graded synapses; 4) rate coding; 5) units have innate low-complexity, usually unimodal, tuning functions (TFs); and 6) units are intrinsically noisy and noise is generally considered harmful. I describe a radically different theory that assumes: 1) binary units; 2) sparse distributed codes (SDC); 3) functionally binary synapses; 4) a novel, atemporal, combinatorial spike code; 5) units initially have flat TFs (all weights zero); and 6) noise is a controlled resource used to cause similar inputs to be mapped to similar codes. The theory, Sparsey, was introduced 25+ years ago as: a) an explanation of the physical/computational relationship of episodic and semantic memory for the spatiotemporal (sequential) pattern domain; and b) a canonical, mesoscale cortical probabilistic circuit/algorithm possessing fixed-time, single-trial, non-optimization-based, unsupervised learning and fixed-time best-match (approximate) retrieval; but was not described in terms of probabilistic computation. Here, we show that: a) the active SDC in a Sparsey coding field (CF) simultaneously represents not only the likelihood of the single most likely input but the likelihoods of all hypotheses stored in the CF; and b) that entire explicit distribution can be transmitted, e.g., to a downstream CF, via a set of simultaneous single spikes from the neurons comprising the active SDC.

Gerard Rinkus
Backmatter
Metadata
Title
Machine Learning, Optimization, and Data Science
Edited by
Giuseppe Nicosia
Varun Ojha
Emanuele La Malfa
Gabriele La Malfa
Panos M. Pardalos
Renato Umeton
Copyright Year
2024
Electronic ISBN
978-3-031-53966-4
Print ISBN
978-3-031-53965-7
DOI
https://doi.org/10.1007/978-3-031-53966-4
