
About this Book

This book constitutes revised selected papers from the First International Workshop on Machine Learning, Optimization, and Big Data, MOD 2015, held in Taormina, Sicily, Italy, in July 2015.
The 32 papers presented in this volume were carefully reviewed and selected from 73 submissions. They deal with the algorithms, methods and theories relevant to data science, optimization and machine learning.

Table of Contents

Frontmatter

Learning with Discrete Least Squares on Multivariate Polynomial Spaces Using Evaluations at Random or Low-Discrepancy Point Sets

We review the results achieved in previous works [1, 2, 6, 8, 10–12] concerning the analysis of stability and accuracy of discrete least-squares approximation on multivariate polynomial spaces with noiseless evaluations at random points, and the results from [9] concerning the case of noiseless evaluations at low-discrepancy point sets. Afterwards, we present some numerical examples that confirm our theoretical findings and give some insights on their potential applications. The purpose of the numerical section is twofold: on the one hand we compare the performance of discrete least squares using random points versus low-discrepancy points; on the other hand we point out further directions of research, by showing what happens when we choose fewer evaluation points than those prescribed by our theoretical analysis.

Giovanni Migliorati
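
As a toy illustration of the setting in the abstract above (a univariate sketch; the target function, polynomial degree, basis and sample size are our illustrative choices, not the authors'), discrete least squares with noiseless evaluations at random points can be set up as follows:

```python
# Minimal sketch: discrete least squares on a polynomial space using
# noiseless evaluations at uniform random points (illustrative choices).
import numpy as np

rng = np.random.default_rng(0)

def discrete_least_squares(f, degree, n_points):
    # Draw random evaluation points in [-1, 1] and evaluate f without noise.
    x = rng.uniform(-1.0, 1.0, size=n_points)
    y = f(x)
    # Design matrix in the Legendre basis, a natural choice for uniform
    # sampling on [-1, 1].
    V = np.polynomial.legendre.legvander(x, degree)
    # Solve the least-squares problem min_c ||V c - y||_2.
    coeffs, *_ = np.linalg.lstsq(V, y, rcond=None)
    return coeffs

# Stability analyses of this kind prescribe oversampling relative to the
# dimension of the polynomial space; here we simply take many more points.
c = discrete_least_squares(np.exp, degree=8, n_points=200)
print(np.polynomial.legendre.legval(0.3, c), np.exp(0.3))
```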

Automatic Tuning of Algorithms Through Sensitivity Minimization

Parameter tuning is a crucial step in global optimization. In this work, we present a novel method, Sensitive Algorithmic Tuning, which finds near-optimal parameter configurations through sensitivity minimization. The experimental results highlight the effectiveness and robustness of this novel approach.

Piero Conca, Giovanni Stracquadanio, Giuseppe Nicosia

Step Down and Step Up Statistical Procedures for Stock Selection with Sharpe Ratio

Stock selection by the Sharpe ratio is considered in the framework of multiple statistical hypothesis testing. The main attention is paid to comparing the Holm step-down and Hochberg step-up procedures for different loss functions. The comparison is made on the basis of conditional risk as a function of the selection threshold. This approach makes it possible to discover that the properties of the procedures depend not only on the relationship between test statistics, but also on the dispersion of Sharpe ratios. The difference in error rate between the two procedures increases as the concentration of Sharpe ratios increases. When the Sharpe ratios do not have concentration points, there is no significant difference in the quality of the two procedures.

A. P. Koldanov, V. A. Kalyagin, P. M. Pardalos
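
For background, a minimal sketch of the Holm step-down procedure compared in the abstract above (p-values and significance level are illustrative; Hochberg's step-up procedure uses the same thresholds but scans from the largest p-value downwards):

```python
# Minimal sketch of the Holm step-down multiple-testing procedure.
import numpy as np

def holm_step_down(pvalues, alpha=0.05):
    """Return a boolean array, True where the hypothesis is rejected."""
    p = np.asarray(pvalues)
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    # Visit hypotheses from the smallest p-value upwards.
    for rank, idx in enumerate(np.argsort(p)):
        # Step-down threshold alpha / (m - rank); stop at the first failure.
        if p[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject

print(holm_step_down([0.001, 0.02, 0.04, 0.3]))  # [ True False False False]
```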

Differentiating the Multipoint Expected Improvement for Optimal Batch Design

This work deals with parallel optimization of expensive objective functions which are modelled as sample realizations of Gaussian processes. The study is formalized as a Bayesian optimization problem, or continuous multi-armed bandit problem, where a batch of $q > 0$ arms is pulled in parallel at each iteration. Several algorithms have been developed for choosing batches by trading off exploitation and exploration. As of today, the maximum Expected Improvement (EI) and Upper Confidence Bound (UCB) selection rules appear to be the most prominent approaches for batch selection. Here, we build upon recent work on the multipoint Expected Improvement criterion, for which an analytic expansion relying on Tallis' formula was recently established. Since the computational burden of this selection rule is still an issue in applications, we derive a closed-form expression for the gradient of the multipoint Expected Improvement, which aims at facilitating its maximization using gradient-based ascent algorithms. Substantial computational savings are shown in applications. In addition, our algorithms are tested numerically and compared to state-of-the-art UCB-based batch-sequential algorithms. Combining starting designs relying on UCB with gradient-based EI local optimization finally appears to be a sound option for batch design in distributed Gaussian process optimization.

Sébastien Marmin, Clément Chevalier, David Ginsbourger
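
For reference, the one-point Expected Improvement admits a well-known closed form (minimization convention), with GP posterior mean $\mu(x)$, standard deviation $\sigma(x)$ and current best observation $f_{\min}$; the paper's object is its $q$-point generalization, whose expansion relies on Tallis' formula:

$$
\mathrm{EI}(x) = \big(f_{\min}-\mu(x)\big)\,\Phi(z) + \sigma(x)\,\varphi(z),
\qquad z = \frac{f_{\min}-\mu(x)}{\sigma(x)},
$$

where $\Phi$ and $\varphi$ denote the standard normal CDF and PDF.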

Dynamic Detection of Transportation Modes Using Keypoint Prediction

This paper proposes an approach that makes logical knowledge-based decisions to determine, in real time, the transportation mode a person is using. The focus is on detecting different public transportation modes, and we analyze how additional contextual information can be used to improve the decision-making process. The implemented methodology is capable of differentiating between modes of transportation including walking, driving by car, and taking the bus, tram or (suburban) train. The implemented knowledge-based system is based on the idea of Keypoints, which provide contextual information about the environment. The proposed algorithm reached an accuracy of about 95 %, which outclasses other methodologies in detecting the public transportation mode a person is currently using.

Olga Birth, Aaron Frueh, Johann Schlichter

Effect of the Dynamic Topology on the Performance of PSO-2S Algorithm for Continuous Optimization

PSO-2S is a multi-swarm PSO algorithm using charged particles in a partitioned search space for continuous optimization problems. In order to improve the performance of PSO-2S, this paper proposes a novel variant of this algorithm, called DPSO-2S, which uses Dcluster neighborhood topologies to organize the communication networks between the particles. Experiments were conducted on a set of classical benchmark functions. The obtained results demonstrate the effectiveness of the proposed algorithm.

Abbas El Dor, Patrick Siarry

Heuristic for Site-Dependent Truck and Trailer Routing Problem with Soft and Hard Time Windows and Split Deliveries

In this paper we develop an iterative insertion heuristic for a site-dependent truck and trailer routing problem with soft and hard time windows and split deliveries. In the considered problem a truck can leave its trailer for unloading or parking, make a truck-subtour to serve truck-customers, and return to pick up the trailer; this can be done several times in one route. In our heuristic every route is constructed by sequentially inserting customers into it, in a way similar to Solomon's (1987) approach developed for simple vehicle routes. Our contributions include: a heuristic insertion procedure for complex truck and trailer routes with transshipment locations; efficient randomized mechanisms for choosing the first customer for insertion, for introducing time window violations, and for making split deliveries; an improvement procedure shifting deliveries in a route to earlier times; and an efficient approach to the site-dependency feature, based on the transportation problem in the case of arbitrarily intersecting vehicle sets and on a fast vehicle assignment procedure in the case of nested vehicle sets.

Mikhail Batsyn, Alexander Ponomarenko

Cross-Domain Matrix Factorization for Multiple Implicit-Feedback Domains

Cross-domain recommender systems represent an emerging research topic as users generally have interactions with items from multiple domains. One goal of a cross-domain recommender system is to improve the quality of recommendations in a target domain by using user preference information from other source domains. We observe that, in many applications, users interact with items of different types (e.g., artists and tags). Each recommendation problem, for example, recommending artists or recommending tags, can be seen as a different task, or, in general, a different domain. Furthermore, for such applications, explicit feedback may not be available, while implicit feedback is readily available. To handle such applications, in this paper, we propose a novel cross-domain collaborative filtering approach, based on a regularized latent factor model, to transfer knowledge between source and target domains with implicit feedback. More specifically, we identify latent user and item factors in the source domains, and transfer the user factors to the target, while controlling the amount of knowledge transferred through regularization parameters. Experimental results on six target recommendation tasks (or domains) from two real-world applications show the effectiveness of our approach in improving target recommendation accuracy as compared to state-of-the-art single-domain collaborative filtering approaches. Furthermore, preliminary results also suggest that our approach can handle varying percentages of user overlap between source and target domains.

Rohit Parimi, Doina Caragea
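
As a rough sketch of the kind of objective the abstract describes (not the authors' exact formulation), one can regularize the target-domain user factors $U$ towards user factors $U_s$ learned on a source domain:

$$
\min_{U,V}\;\sum_{(u,i)\in\Omega}\big(r_{ui}-U_u^{\top}V_i\big)^2
+\lambda\big(\|U\|_F^2+\|V\|_F^2\big)
+\gamma\,\|U-U_s\|_F^2,
$$

where $\Omega$ is the set of observed implicit-feedback interactions and $\gamma$ controls the amount of knowledge transferred; $\lambda$, $\gamma$ and the factor dimensions are assumptions of this sketch.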

Advanced Metamodeling Techniques Applied to Multidimensional Applications with Piecewise Responses

Due to discrete changes in the solution properties of many engineering applications, the model response is described by a piecewise continuous function. Generating continuous metamodels for such responses can provide very poor fits due to the discontinuity in the response. In this paper, a new smart sampling approach is proposed to generate high-quality metamodels for such piecewise responses. The proposed approach extends the Sequential Approximate Optimization (SAO) procedure, which uses the Radial Basis Function Network (RBFN), and iteratively generates accurate metamodels by adding new sampling points to approximate responses with discrete changes. The new sampling points are added both in the sparse regions of the feasible (continuous) domain, to achieve a high-quality metamodel, and next to the discontinuity, to refine the uncertainty area between the feasible and non-feasible domains. The performance of the approach is investigated through two numerical examples: a two-dimensional analytical function and a laser epoxy-cutting simulation model.

Toufik Al Khawli, Urs Eppelt, Wolfgang Schulz

Alternating Direction Method of Multipliers for Regularized Multiclass Support Vector Machines

The support vector machine (SVM) was originally designed for binary classification. Much effort has been devoted to generalizing the binary SVM to the multiclass SVM (MSVM), which is a more complex problem. Initially, MSVMs were solved by considering their dual formulations, which are quadratic programs and can be solved by standard second-order methods. However, the duals of MSVMs with regularizers are usually more difficult to formulate and computationally very expensive to solve. This paper focuses on several regularized MSVMs and extends the alternating direction method of multipliers (ADMM) to them. Using a splitting technique, all considered MSVMs are written as two-block convex programs, for which the ADMM has global convergence guarantees. Numerical experiments on synthetic and real data demonstrate the high efficiency and accuracy of our algorithms.

Yangyang Xu, Ioannis Akrotirianakis, Amit Chakraborty
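
For background, the two-block ADMM iterations for $\min_{x,z}\, f(x)+g(z)$ subject to $Ax+Bz=c$ read, in scaled dual form,

$$
\begin{aligned}
x^{k+1} &= \arg\min_{x}\; f(x)+\tfrac{\rho}{2}\,\|Ax+Bz^{k}-c+u^{k}\|_2^2,\\
z^{k+1} &= \arg\min_{z}\; g(z)+\tfrac{\rho}{2}\,\|Ax^{k+1}+Bz-c+u^{k}\|_2^2,\\
u^{k+1} &= u^{k}+Ax^{k+1}+Bz^{k+1}-c;
\end{aligned}
$$

the abstract's splitting writes each regularized MSVM in exactly this two-block form (the specific $f$, $g$, $A$, $B$ depend on the regularizer and are not reproduced here).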

Tree-Based Response Surface Analysis

Computer-simulated experiments have become a cost-effective way for engineers to replace real experiments in the area of product development. However, a single computer-simulated experiment can still take a significant amount of time. Hence, in order to minimize the number of simulations needed to investigate a certain design space, different approaches within the design-of-experiments area are used. One such approach is to minimize the time consumption and the number of simulations for design space exploration through response surface modeling. The traditional methods used for this purpose are linear regression, quadratic curve fitting and support vector machines. This paper analyses and compares the performance of four machine learning methods on the regression problem of response surface modeling: linear regression, support vector machines, M5P and random forests. Experiments are conducted to compare the performance of tree models (M5P and random forests) with the performance of non-tree models (support vector machines and linear regression) on data that is typical for concept evaluation within the aerospace industry. The main finding is that comprehensible models (the tree models) perform at least as well as, or better than, traditional black-box models (the non-tree models). The first observation of this study is that, by using comprehensible models, engineers can understand the functional behavior and the input-output relationships of concept selection tasks. The second observation is that engineers can also increase their knowledge about design concepts and reduce the time for planning and conducting future experiments.

Siva Krishna Dasari, Niklas Lavesson, Petter Andersson, Marie Persson

A Single-Facility Manifold Location Routing Problem with an Application to Supply Chain Management and Robotics

The location routing problem (LRP), formulated for determining the locations of facilities and the vehicle routes operating between these facilities, is the combination of the vehicle routing problem (VRP) and the facility location problem (FLP) in Euclidean space. The manifold location routing problem (MLRP) is an LRP in a Riemannian manifold setting, as introduced in [14]. In seeking further advancements in the solution of the LRP, the MLRP improves the accuracy of the distance calculations by using geodesic distances. The shortest-path distances on the Earth's surface can be determined by calculating geodesic distances in local neighborhoods using Riemannian geometry. In this work, we advance the theoretical results obtained for the MLRP in [14] by incorporating support vector machines (SVM), dynamic programming, parallel programming, data mining, and Geographic Information Systems (GIS). The theory is illustrated on a supply chain problem with a robotics paradigm.

Emre Tokgöz, Iddrisu Awudu, Theodore B. Trafalis

An Efficient Many-Core Implementation for Semi-Supervised Support Vector Machines

The concept of semi-supervised support vector machines extends classical support vector machines to learning scenarios where both labeled and unlabeled patterns are given. In recent years, such semi-supervised extensions have gained considerable attention due to their huge potential for real-world applications with only small amounts of labeled data. While appealing from a practical point of view, semi-supervised support vector machines lead to a combinatorial optimization problem that is difficult to address. Many optimization approaches have been proposed that aim at tackling this task. However, the computational requirements can still be very high, especially when large data sets are considered and many model parameters need to be tuned. A recent trend in the field of big data analytics is to make use of graphics processing units to speed up computationally intensive tasks. In this work, such a massively-parallel implementation is developed for semi-supervised support vector machines. The experimental evaluation, conducted on commodity hardware, shows that valuable speed-ups of up to two orders of magnitude can be achieved over a standard single-core CPU execution.

Fabian Gieseke

Intent Recognition in a Simulated Maritime Multi-agent Domain

Intent recognition is the process of determining the action an agent is about to take, given a sequence of past actions. In this paper, we propose a method for recognizing intentions in highly populated multi-agent environments. Low-level intentions, representing basic activities, are detected through a novel formulation of Hidden Markov Models with perspective-taking capabilities. Higher-level intentions, involving multiple agents, are detected with a distributed architecture that uses activation spreading between nodes to detect the most likely intention of the agents. The solution we propose brings the following main contributions: (i) it enables early recognition of intentions, before they are realized; (ii) it has real-time performance capabilities; and (iii) it can detect both single-agent intentions and joint intentions of a group of agents. We validate our framework in an open-source naval ship simulator, in the context of recognizing threatening intentions against naval ships. Our results show that our system is able to detect intentions early and with high accuracy.

Mohammad Taghi Saffar, Mircea Nicolescu, Monica Nicolescu, Daniel Bigelow, Christopher Ballinger, Sushil Louis

An Adaptive Classification Framework for Unsupervised Model Updating in Nonstationary Environments

This paper introduces an adaptive framework that makes use of ensemble classification and self-training to maintain high classification performance in datasets affected by concept drift, without the aid of external supervision to update the classifier's model. The updating of the framework's model is triggered by a mechanism that infers the presence of concept drift from the analysis of the differences between the outputs of the different classifiers. In order to evaluate the performance of the proposed algorithm, comparisons were made with a set of unsupervised classification techniques and drift detection techniques. The results show that the framework reacts more promptly to performance degradation than the existing methods, leading to increased classification accuracy. In addition, the framework stores a smaller number of instances than a single-classifier approach.

Piero Conca, Jon Timmis, Rogério de Lemos, Simon Forrest, Heather McCracken

Global Optimization with Sparse and Local Gaussian Process Models

We present a novel surrogate-model-based global optimization framework allowing a large number of function evaluations. The method, called SpLEGO, is based on a multi-scale expected improvement (EI) framework relying on both sparse and local Gaussian process (GP) models. First, a bi-objective approach relying on a global sparse GP model is used to determine potential next sampling regions. Local GP models are then constructed within each selected region. The method subsequently employs the standard expected improvement criterion to deal with the exploration-exploitation trade-off within the selected local models, leading to a decision on where to perform the next function evaluation(s). The potential of our approach is demonstrated using the so-called Sparse Pseudo-input GP as a global model. The algorithm is tested on four benchmark problems, whose number of starting points ranges from $10^{2}$ to $10^{4}$. Our results show that SpLEGO is effective and capable of solving problems with a large number of starting points, and that it even provides significant advantages when compared with state-of-the-art EI algorithms.

Tipaluck Krityakierne, David Ginsbourger

Condense Mixed Convexity and Optimization with an Application in Data Service Optimization

Elements of matrix theory are useful in exploring solutions for optimization, data mining, and big data problems. In particular, mixed integer programming is widely used in data based optimization research that uses matrix theory (see for example [13]). Important elements of matrix theory, such as Hessian matrices, are well studied for continuous (see for example [11]) and discrete [9] functions, however matrix theory for functions with mixed (i.e. continuous and discrete) variables has not been extensively developed from a theoretical perspective. There are many mixed variable functions to be optimized that can make use of a Hessian matrix in various fields of research such as queueing theory, inventory systems, and telecommunication systems. In this work we introduce a mixed Hessian matrix, named condense mixed Hessian matrix, for mixed variable closed form functions $g: \mathbb{Z}^{n}\times \mathbb{R}^{m}\rightarrow \mathbb{R}$, and the use of this matrix for determining convexity and optimization results for mixed variable functions. These tasks are accomplished by building on the definition of a multivariable condense discrete convex function and the corresponding Hessian matrix that are introduced in [14]. In addition, theoretical condense mixed convexity and optimization results are obtained. The theoretical results are implemented on an M/M/s queueing function that is widely used in optimization, data mining, and big data problems.

Emre Tokgöz, Hillel Kumin

SoC-Based Pattern Recognition Systems for Non Destructive Testing

Non Destructive Testing (NDT) is one of the most important aspects of modern manufacturing companies. Automation of this task improves the productivity and reliability of distribution chains. We present an optimized implementation of common pattern recognition algorithms that performs NDT on factory products. With the aim of enhancing industrial integration, our implementation is highly optimized for SoC-based hardware (System on Chip: an integrated circuit that integrates all components of a computer into a single chip), and we worked from the outset towards an overall design for these devices. While it works perfectly on general-purpose SoCs, the best performance is achieved on GPU-accelerated ones. We matched the performance of a PC-based workstation by exploiting technologies such as CUDA and BLAS for embedded SoCs. The test case is a collection of toy scenarios commonly found in manufacturing companies.

Omar Schiaratura, Pietro Ansaloni, Giovanni Lughi, Mattia Neri, Matteo Roffilli, Fabrizio Serpi, Andrea Simonetto

Node-Immunization Strategies in a Stochastic Epidemic Model

The object under study is the epidemic spread of a disease through a population of individuals. A stochastic process is first introduced, inspired by the classical Susceptible-Infected-Removed (SIR) model. In order to counteract the epidemic spread, two different immunization strategies are proposed, and a combinatorial optimization problem is then formalized. The goal is to minimize the effect of the disease spread by choosing a correct immunization strategy, subject to a budget constraint. We observe a counter-intuitive result: in non-virulent scenarios, it is better to immunize common individuals rather than highly communicative ones. A discussion is provided, together with open problems and directions for future work.

Juan Piccini, Franco Robledo, Pablo Romero
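
A minimal Gillespie-style simulation of a stochastic SIR process, sketching the kind of model the abstract above builds on (the rates, population size and absence of an immunization step are our illustrative simplifications):

```python
# Minimal stochastic SIR simulation via Gillespie's algorithm.
import numpy as np

rng = np.random.default_rng(1)

def sir_gillespie(S, I, R, beta=0.3, gamma=0.1, t_max=100.0):
    t, N = 0.0, S + I + R
    history = [(t, S, I, R)]
    while I > 0 and t < t_max:
        rate_infect = beta * S * I / N   # S + I -> 2I
        rate_recover = gamma * I         # I -> R
        total = rate_infect + rate_recover
        t += rng.exponential(1.0 / total)        # time to the next event
        if rng.random() < rate_infect / total:   # pick the event type
            S, I = S - 1, I + 1
        else:
            I, R = I - 1, R + 1
        history.append((t, S, I, R))
    return history

print(sir_gillespie(S=990, I=10, R=0)[-1])  # final (t, S, I, R)
```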

An Efficient Numerical Approximation for the Monge-Kantorovich Mass Transfer Problem

The approximation scheme for the Monge-Kantorovich mass transfer problem on compact spaces proposed in [7] is improved. The upgrade presented here is inspired by a meta-heuristic algorithm called Scatter Search and aims to reduce the dimensionality of the problem. The new approximation scheme solves finite linear programs similar to the transport problem but of lower dimension. A numerical example is presented and compared with the scheme studied in [7].

M. L. Avendaño-Garrido, J. R. Gabriel-Argüelles, L. Quintana-Torres, E. Mezura-Montes

Adaptive Targeting for Online Advertisement

We consider the problem of adaptive targeting for real-time bidding in internet advertisement. This problem involves making fast decisions on whether to show a given ad to a particular user. For intelligent platforms, these decisions are based on information extracted from big data sets containing records of previous impressions, clicks and subsequent purchases. We discuss several strategies for maximizing the click-through rate, which is often the main criterion for measuring the success of an advertisement campaign. In the second part of the paper, we provide some results of a statistical analysis of real data.

Andrey Pepelyshev, Yuri Staroselskiy, Anatoly Zhigljavsky
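
One standard bandit-style strategy for the exploration/exploitation trade-off behind such decisions is Thompson sampling with Beta posteriors; the sketch below is an illustration under simulated click probabilities, not one of the strategies analyzed in the paper:

```python
# Minimal Thompson-sampling sketch for choosing among ad variants
# to maximize click-through rate (all numbers are illustrative).
import numpy as np

rng = np.random.default_rng(2)
true_ctr = np.array([0.02, 0.035, 0.01])  # unknown in practice
alpha = np.ones(3)                        # Beta posterior: clicks + 1
beta = np.ones(3)                         # Beta posterior: non-clicks + 1

for _ in range(100_000):
    theta = rng.beta(alpha, beta)         # sample a plausible CTR per ad
    ad = int(np.argmax(theta))            # show the most promising ad
    click = rng.random() < true_ctr[ad]   # simulated user response
    alpha[ad] += click
    beta[ad] += 1 - click

print(alpha / (alpha + beta))             # posterior mean CTR estimates
```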

Outlier Detection in Cox Proportional Hazards Models Based on the Concordance c-Index

Outliers can have extreme influence on data analysis and so their presence must be taken into account. We propose a method to perform outlier detection on multivariate survival datasets, named Dual Bootstrap Hypothesis Testing (DBHT). Experimental results show that DBHT is a competitive alternative to state-of-the-art methods and can be applied to clinical data.

João Diogo Pinto, Alexandra M. Carvalho, Susana Vinga

Characterization of the $\#k$-SAT Problem in Terms of Connected Components

We study the $\#k$-satisfiability problem, that is, the problem of counting the number of different truth assignments that satisfy random Boolean expressions having k variables per clause. We design and implement an exact algorithm, which we call Star, that solves $\#k$-SAT problem instances. We characterize the solution space using the connected components of a graph G, a subgraph of the n-dimensional hypercube representing the search space.

Giuseppe Nicosia, Piero Conca
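
For concreteness, a naive exact $\#$SAT counter that enumerates the whole n-dimensional hypercube; the paper's Star algorithm instead exploits the connected-component structure of the solution space (the clause encoding below is our choice):

```python
# Brute-force #SAT: count satisfying assignments by full enumeration.
from itertools import product

def count_models(clauses, n):
    """clauses: list of clauses, each a list of nonzero ints;
    literal v means variable v-1 is True, -v means it is False."""
    count = 0
    for assignment in product([False, True], repeat=n):
        if all(any(assignment[abs(l) - 1] == (l > 0) for l in clause)
               for clause in clauses):
            count += 1
    return count

# (x1 or x2) and (not x1 or x3) over 3 variables has 4 models.
print(count_models([[1, 2], [-1, 3]], n=3))  # 4
```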

A Bayesian Network Model for Fire Assessment and Prediction

Smartphones and other wearable computers with modern sensor technologies are becoming more advanced and widespread. This paper proposes exploiting those devices to support the firefighting operation. It introduces a Bayesian network model that infers the state of a fire and predicts its future development based on smartphone sensor data gathered within the fire area. The model provides a prediction accuracy of 84.79 % and an area under the curve of 0.83. The solution has also been tested in the context of a fire drill and proved to help firefighters assess the fire situation and speed up their work.

Mehdi Ben Lazreg, Jaziar Radianti, Ole-Christoffer Granmo

Data Clustering by Particle Swarm Optimization with the Focal Particles

Clustering is an important technique in data mining. In unsupervised clustering, data is divided into several subsets (clusters) without any prior knowledge. Heuristic-optimization-based clustering algorithms try to minimize an objective function, generally a clustering validity index, in the search space defined by the dimensions of the data vectors. If the number of attributes of the data is large, this decreases clustering performance. This study presents a new clustering algorithm, particle swarm optimization with focal particles (PSOFP). Contrary to the standard particle swarm optimization (PSO) approach, this new clustering technique ensures high-quality clustering results without increasing the dimensions of the search space. It handles communication among the particles in a swarm by using multiple focal particles, whose number equals the number of clusters. This approach simplifies the candidate-solution representation by a particle and therefore reduces the effect of the 'curse of dimensionality'. The performance of the proposed method is benchmarked against the K-means, K-means++, hybrid PSO and CLARANS algorithms on five datasets. Experimental results show that the proposed algorithm has acceptable efficiency and robustness and is superior to the benchmark algorithms.

Tarık Küçükdeniz, Şakir Esnaf
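
For context, the standard PSO velocity and position updates are

$$
v_i^{t+1} = \omega\,v_i^{t} + c_1 r_1\,(p_i - x_i^{t}) + c_2 r_2\,(g - x_i^{t}),
\qquad
x_i^{t+1} = x_i^{t} + v_i^{t+1},
$$

where $p_i$ is particle $i$'s best known position, $g$ the swarm's best, $\omega$ the inertia weight and $r_1, r_2$ uniform random numbers. Our reading of the abstract is that PSOFP replaces the single social attractor $g$ with the focal particle of the corresponding cluster; the exact update rule is given in the paper.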

Fast and Accurate Steepest-Descent Consistency-Constrained Algorithms for Feature Selection

Striking a good balance in the fundamental trade-off between accuracy and efficiency has been an important problem in feature selection. The Interact algorithm was an important breakthrough, and the Sdcc and Lcc algorithms stemmed from it. Lcc fixed a certain theoretical drawback of Interact regarding accuracy, while Sdcc improved the accuracy of Interact by expanding the search space. However, when comparing Sdcc and Lcc, we find that Sdcc can output smaller feature sets with smaller Bayesian risks than Lcc (advantages of Sdcc) but shows worse classification accuracy when used with classifiers (a disadvantage). Furthermore, because Sdcc searches for answers in much wider spaces than Lcc, it is a few tens of times slower in practice. In this paper, we present two methods to improve Sdcc in both accuracy and efficiency, and propose two corresponding algorithms, Fast Sdcc and Accurate Sdcc. We show through experiments that these algorithms output even smaller feature sets with better classification accuracy than Sdcc. Their classification accuracy appears better than that of Lcc. In terms of time complexity, Fast Sdcc and Accurate Sdcc improve on Sdcc significantly and are only a few times slower than Lcc.

Adrian Pino Angulo, Kilho Shin

Conceptual Analysis of Big Data Using Ontologies and EER

Large amounts of "big data" are generated every day, much of it in a "raw" format that is difficult to analyze and mine. This data contains potentially meaningful hidden concepts, but much of it is superfluous and not of interest to domain experts. Thus, dealing with big raw data solely by applying a set of distributed computing technologies (e.g., MapReduce, BSP [Bulk Synchronous Parallel], and Spark) and/or distributed storage systems, namely NoSQL, is generally not sufficient. Extracting the full knowledge hidden in the raw data is necessary to enable efficient analysis and mining. The data needs to be processed to remove the superfluous parts and generate the meaningful domain-specific concepts. In this paper, we propose a framework that incorporates conceptual modeling and EER principles to effectively extract conceptual knowledge from raw data so that mining and analysis can be applied to the extracted conceptual data.

Kulsawasd Jitkajornwanich, Ramez Elmasri

A Parallel Consensus Clustering Algorithm

Consensus clustering is a stability-based algorithm with prediction power far better than other internal measures. Unfortunately, this method is reported to be slow and hard to scale. We present here a consensus clustering algorithm optimized for multi-core processors. We show that it is possible to obtain scalable performance of this compute-intensive algorithm for high-dimensional data such as gene expression microarrays.

Olgierd Unold, Tadeusz Tagowski

Bandits and Recommender Systems

This paper addresses the on-line recommendation problem facing new users and new items; we assume that no information is available about either the users or the items. The only source of information is a set of ratings given by users to some items. By on-line, we mean that the set of users, the set of items, and the set of ratings evolve over time, and that at any moment the recommendation system has to select items to recommend based on the currently available information, that is, the sequence of past events. We also mean that each user comes with her preferences, which may evolve over short and longer time scales, so we have to continuously update them. When the set of ratings is the only available source of information, the traditional approach is matrix factorization. In a decision-making-under-uncertainty setting, actions should be selected to balance exploration with exploitation; this is best modeled as a bandit problem. Matrix factors provide a latent representation of users and items. These representations may then be used as contextual information by the bandit algorithm to select items. This last point is exactly the originality of this paper: the combination of matrix factorization and bandit algorithms to solve the on-line recommendation problem. Our work is driven by viewing the recommendation problem as a feedback-controlled loop. This leads to interactions between representation learning and the recommendation policy.

Jérémie Mary, Romaric Gaudel, Philippe Preux
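
A minimal sketch of one way to combine the two ingredients named in the abstract: item latent factors (as would come from matrix factorization) used as contexts by a LinUCB-style bandit. All dimensions, priors and the simulated feedback are our illustrative assumptions, not the paper's algorithm:

```python
# LinUCB over item latent factors for a single user (toy dimensions).
import numpy as np

rng = np.random.default_rng(3)
d, n_items = 5, 20
item_factors = rng.normal(size=(n_items, d))  # stand-in for MF item factors

A = np.eye(d)       # ridge-regularized design matrix for the user
b = np.zeros(d)     # accumulated reward-weighted features
ucb_width = 1.0     # width of the confidence bonus

def recommend():
    theta = np.linalg.solve(A, b)             # current user estimate
    A_inv = np.linalg.inv(A)
    # Per-item bonus: sqrt(x^T A^{-1} x), the usual LinUCB term.
    bonus = np.sqrt(np.einsum('id,dk,ik->i',
                              item_factors, A_inv, item_factors))
    return int(np.argmax(item_factors @ theta + ucb_width * bonus))

def update(item, reward):
    global b
    x = item_factors[item]
    A[:] = A + np.outer(x, x)
    b = b + reward * x

for t in range(200):
    i = recommend()
    update(i, reward=float(rng.random() < 0.1))  # simulated click feedback
```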

Semi-Naive Mixture Model for Consensus Clustering

Consensus clustering is a powerful method to combine multiple partitions obtained through different runs of clustering algorithms. The goal is to achieve a robust and stable partition of the space through a consensus procedure which exploits the diversity of multiple clustering outputs. Several methods have been proposed to tackle the consensus clustering problem. Among them, the algorithm which models the problem as a mixture of multivariate multinomial distributions in the space of cluster labels has gained considerable attention in the literature. However, to make the problem tractable, the theoretical formulation relies on a Naive Bayesian conditional independence assumption over the components of the vector space in which the consensus function acts (i.e., the conditional probability of a $d$-dimensional vector is represented as the product of conditional probabilities in one-dimensional feature spaces). In this paper we propose to relax the aforementioned assumption, leading to a Semi-Naive approach that models some of the dependencies among the components of the vector space for the generation of the final consensus partition. The Semi-Naive approach consists in randomly grouping the components of the label space and modeling the conditional density term in the maximum-likelihood estimation as the product of the conditional densities of the resulting groups. Experiments are performed to assess the proposed approach.

Marco Moltisanti, Giovanni Maria Farinella, Sebastiano Battiato
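
In symbols, for a label vector $\mathbf{y}=(y_1,\dots,y_d)$ and mixture component $m$, the relaxation described above replaces the fully factorized Naive density by a product over random groups $G_1,\dots,G_K$ of components (the notation here is ours, not the paper's):

$$
p(\mathbf{y}\mid\theta_m)=\prod_{j=1}^{d} p(y_j\mid\theta_{jm})
\quad\longrightarrow\quad
p(\mathbf{y}\mid\theta_m)=\prod_{k=1}^{K} p\big(\mathbf{y}_{G_k}\mid\theta_{km}\big),
$$

so that dependencies within each group are retained.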

Consensus Decision Making in Random Forests

The applications of Random Forests, an ensemble learner, are investigated in different domains, including malware classification. Random Forests uses the majority rule for its outcome; however, a decision from the majority rule faces several challenges, e.g., the decision may not be representative of, or supported by, all trees in the forest. To address such problems and increase decision accuracy, consensus decision making (CDM) is suggested, and the decision mechanism of Random Forests is replaced with CDM. The updated Random Forests algorithm is evaluated mainly on malware data sets, and the results are compared with unmodified Random Forests. The empirical results suggest that the proposed Random Forests, i.e., with CDM, performs better than the original Random Forests.

Raja Khurram Shahzad, Mehwish Fatima, Niklas Lavesson, Martin Boldt

Multi-objective Modeling of Ground Deformation and Gravity Changes of Volcanic Eruptions

Inverse modeling of geophysical observations is becoming an important topic in volcanology. The advantage of exploiting innovative inverse methods in volcanology is twofold: they provide a robust tool for the interpretation of the observations and a quantitative model-based assessment of volcanic hazard. This paper re-interprets the data collected during the 1981 eruption of Mt Etna, which offers a good case study to explore and validate new inversion algorithms. Single-objective and multi-objective optimization are applied here in order to improve the fitting of the geophysical observations and better constrain the model parameters. We explore the genetic algorithm NSGA2 and the differential evolution (DE) method. The inverse results provide a better fit of the model to the geophysical observations with respect to previously published results. In particular, NSGA2 shows low fitting errors in electro-optical distance measurements (EDM), leveling and micro-gravity measurements, while the DE algorithm provides a set of solutions that combine low leveling error with low EDM error but that show a poor capability of minimizing all measures at the same time. The sensitivity of the model to variations of its parameters is investigated by means of the Morris technique and the Sobol' indices, with the aim of identifying the parameters that have the highest impact on the model. In particular, the model parameters that define the sources' positions, their dip and the porosity of the infiltration zones are found to be the most sensitive. In addition, since robustness is a good indicator of the quality of a solution, a subset of solutions with good characteristics is selected and their robustness evaluated, in order to identify the most suitable model.

Piero Conca, Gilda Currenti, Giovanni Carapezza, Ciro del Negro, Jole Costanza, Giuseppe Nicosia

Backmatter
