
2016 | Book

Machine Learning, Optimization, and Big Data

Second International Workshop, MOD 2016, Volterra, Italy, August 26-29, 2016, Revised Selected Papers

Edited by: Panos M. Pardalos, Piero Conca, Giovanni Giuffrida, Giuseppe Nicosia

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science

About this book

This book constitutes revised selected papers from the Second International Workshop on Machine Learning, Optimization, and Big Data, MOD 2016, held in Volterra, Italy, in August 2016.
The 40 papers presented in this volume were carefully reviewed and selected from 97 submissions. These proceedings contain papers in the fields of Machine Learning, Computational Optimization and Data Science, presenting a substantial array of ideas, technologies, algorithms, methods and applications.

Table of Contents

Frontmatter
Machine Learning: Multi-site Evidence-Based Best Practice Discovery

This study establishes interoperability among electronic medical records from 737 healthcare sites and performs machine learning for best practice discovery. A mapping algorithm is designed to disambiguate free text entries and to provide a unique and unified way to link content to structured medical concepts despite the extreme variations that can occur during clinical diagnosis documentation. Redundancy is reduced through concept mapping. A SNOMED-CT graph database is created to allow for rapid data access and queries. These integrated data can be accessed through a secured web-based portal. A classification model (DAMIP) is then designed to uncover discriminatory characteristics that can predict the quality of treatment outcome. We demonstrate system usability by analyzing Type II diabetic patients. DAMIP establishes a classification rule on a training set which results in greater than 80% blind predictive accuracy on an independent set of patients. By including features obtained from structured concept mapping, the predictive accuracy is improved to over 88%. The results facilitate evidence-based treatment and optimization of site performance through best practice dissemination and knowledge transfer.

Eva K. Lee, Yuanbo Wang, Matthew S. Hagen, Xin Wei, Robert A. Davis, Brent M. Egan
Data-Based Forest Management with Uncertainties and Multiple Objectives

In this paper, we present an approach of employing multiobjective optimization to support decision making in forest management planning. The planning is based on data representing so-called stands, each consisting of homogeneous parts of the forest, and simulations of how the trees grow in the stands under different treatment options. Forest planning concerns future decisions to be made that include uncertainty. We employ as objective functions both the expected values of incomes and biodiversity as well as the value at risk for both of these objectives. In addition, we minimize the risk level for both the income value and the biodiversity value. There is a tradeoff between the expected value and the value at risk, as well as between the value at risk of the two objectives of interest and, thus, decision support is needed to find the best balance between the conflicting objectives. We employ an interactive method where a decision maker iteratively provides preference information to find the most preferred management plan and at the same time learns about the interdependencies of the objectives.

Markus Hartikainen, Kyle Eyvindson, Kaisa Miettinen, Annika Kangas
Metabolic Circuit Design Automation by Multi-objective BioCAD

We present a thorough in silico analysis and optimization of the genome-scale metabolic model of the mycolic acid pathway in M. tuberculosis. We apply and further extend meGDMO to account for finer sensitivity analysis and post-processing analysis, thanks to the combination of statistical evaluation of strain robustness and clustering analysis to map the phenotype-genotype relationship among Pareto-optimal strains. In the first analysis scenario, we find 12 Pareto-optimal single gene-set knockouts, which completely shut down the pathway, hence critically reducing the pathogenicity of M. tuberculosis, as well as 34 genotypically different strains in which the production of mycolic acid is severely reduced.

Andrea Patané, Piero Conca, Giovanni Carapezza, Andrea Santoro, Jole Costanza, Giuseppe Nicosia
A Nash Equilibrium Approach to Metabolic Network Analysis

A novel approach to metabolic network analysis using a Nash Equilibrium formulation is proposed. Enzymes are considered to be players in a multi-player game in which each player attempts to minimize the dimensionless Gibbs free energy associated with the biochemical reaction(s) it catalyzes subject to elemental mass balances. Mathematical formulation of the metabolic network as a set of nonlinear programming (NLP) sub-problems and appropriate solution methodologies are described. A small example representing part of the production cycle for acetyl-CoA is used to demonstrate the efficacy of the proposed Nash Equilibrium framework and show that it represents a paradigm shift in metabolic network analysis.

Angelo Lucia, Peter A. DiMaggio
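
As a rough illustration of the game described in this abstract, each enzyme k can be viewed as solving a small NLP of the following shape, with the other players' variables held fixed. The notation below is ours, not the paper's:

```latex
% Player (enzyme) k chooses the mole-number vector n_k of the species in the
% reaction(s) it catalyzes, taking the other players' choices n_{-k} as fixed.
\begin{align*}
\min_{n_k \ge 0} \quad & \frac{G_k(n_k;\, n_{-k})}{RT}
  && \text{(dimensionless Gibbs free energy)} \\
\text{s.t.} \quad & A_k\, n_k = b_k
  && \text{(elemental mass balances)}
\end{align*}
% A Nash equilibrium is a point (n_1^*, ..., n_K^*) at which no player can
% lower its own objective by unilaterally changing its variables.
```
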
A Blocking Strategy for Ranking Features According to Probabilistic Relevance

The paper presents an algorithm to rank features in “small number of samples, large dimensionality” problems according to probabilistic feature relevance, a novel definition of feature relevance. Probabilistic feature relevance, intended as expected weak relevance, is introduced in order to address the problem of estimating conventional feature relevance in data settings where the number of samples is much smaller than the number of features. The resulting ranking algorithm relies on a blocking approach for estimation and consists of creating a large number of identical configurations to measure the conditional information of each feature in a paired manner. Its implementation can be made embarrassingly parallel in the case of very large n. A number of experiments on simulated and real data confirm the value of the approach.

Gianluca Bontempi
A Scalable Biclustering Method for Heterogeneous Medical Data

We define the problem of biclustering on heterogeneous data, that is, data of various types (binary, numeric, etc.). This problem has not yet been investigated in the biclustering literature. We propose a new method, HBC (Heterogeneous BiClustering), designed to extract biclusters from heterogeneous, large-scale, sparse data matrices. The goal of this method is to handle medical data gathered by hospitals (on patients, stays, acts, diagnoses, prescriptions, etc.) and to provide valuable insight on such data. HBC takes advantage of the data sparsity and uses a constructive greedy heuristic to build a large number of possibly overlapping biclusters. The proposed method is successfully compared with a standard biclustering algorithm on small-size numeric data. Experiments on real-life data sets further assert its scalability and efficiency.

Maxence Vandromme, Julie Jacques, Julien Taillard, Laetitia Jourdan, Clarisse Dhaenens
Neural Learning of Heuristic Functions for General Game Playing

The proposed model represents an original approach to general game playing, and aims at creating a player able to develop a strategy using as few requirements as possible, in order to achieve maximum generality. The main idea is to modify the well-known minimax search algorithm by removing its task-specific component, namely the heuristic function: this is replaced by a neural network trained to evaluate game states using results from previous simulated matches. A method for simulating matches and extracting training examples from them is also proposed, completing the automatic procedure for the setup and improvement of the model. Part of the algorithm for extracting training examples is the Backward Iterative Deepening Search, an original search algorithm which aims at finding, in a limited time, a large number of leaves along with their common ancestors.

Leo Ghignone, Rossella Cancelliere
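
To make the core idea concrete, here is a minimal Python sketch of depth-limited minimax in which the task-specific heuristic is replaced by a learned evaluator. The toy Nim game and the evaluate stub are our own stand-ins, not the authors' code:

```python
class Nim:
    """Toy game used only to make the sketch runnable: players alternately
    remove 1-3 stones; whoever takes the last stone wins."""
    def __init__(self, stones, to_move=1):
        self.stones, self.to_move = stones, to_move
    def is_terminal(self):
        return self.stones == 0
    def legal_moves(self):
        return [k for k in (1, 2, 3) if k <= self.stones]
    def apply(self, k):
        return Nim(self.stones - k, -self.to_move)

def evaluate(state):
    # Stand-in for a neural network trained on previous simulated matches.
    return -state.to_move if state.is_terminal() else 0.0

def minimax(state, depth, evaluate):
    """Depth-limited minimax with a learned (here: stubbed) evaluator."""
    if depth == 0 or state.is_terminal():
        return evaluate(state)
    values = [minimax(state.apply(m), depth - 1, evaluate)
              for m in state.legal_moves()]
    return max(values) if state.to_move == 1 else min(values)

print(minimax(Nim(5), depth=6, evaluate=evaluate))  # 1: first player wins
```
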
Comparing Hidden Markov Models and Long Short Term Memory Neural Networks for Learning Action Representations

In this paper we are concerned with learning models of actions and compare a purely generative model based on Hidden Markov Models to a discriminatively trained recurrent LSTM network in terms of their properties and their suitability to learn and represent models of actions. Specifically, we compare the performance of the two models regarding the overall classification accuracy, the amount of training sequences required, and how early in the progression of a sequence they are able to correctly classify the corresponding sequence. We show that, despite the current trend towards (deep) neural networks, traditional graphical model approaches are still beneficial under conditions where only a few data points or limited computing power are available.

Maximilian Panzner, Philipp Cimiano
Dynamic Multi-Objective Optimization with jMetal and Spark: A Case Study

Technologies for Big Data and Data Science are receiving increasing research interest nowadays. This paper introduces the prototyping architecture of a tool aimed at solving Big Data optimization problems. Our tool combines the jMetal framework for multi-objective optimization with Apache Spark, a technology that is gaining momentum. In particular, we make use of the streaming facilities of Spark to feed an optimization problem with data from different sources. We demonstrate the use of our tool by solving a dynamic bi-objective instance of the Traveling Salesman Problem (TSP) based on near real-time traffic data from New York City, which is updated several times per minute. Our experiment shows that jMetal and Spark can be integrated, providing a software platform for dealing with dynamic multi-objective optimization problems.

José A. Cordero, Antonio J. Nebro, Cristóbal Barba-González, Juan J. Durillo, José García-Nieto, Ismael Navas-Delgado, José F. Aldana-Montes
Feature Selection via Co-regularized Sparse-Group Lasso

We propose the co-regularized sparse-group lasso algorithm: a technique that allows the incorporation of auxiliary information into the learning task in terms of “groups” and “distances” among the predictors. The proposed algorithm is particularly suitable for a wide range of biological applications where good predictive performance is required and, in addition to that, it is also important to retrieve all relevant predictors so as to deepen the understanding of the underlying biological process. Our cost function requires related groups of predictors to provide similar contributions to the final response, and thus guides the feature selection process using auxiliary information. We evaluate the proposed method on a synthetic dataset and examine various settings where its application is beneficial in comparison to the standard lasso, elastic net, group lasso and sparse-group lasso techniques. Last but not least, we make a Python implementation of our algorithm available for download and free to use (available at www.learning-machines.com).

Paula L. Amaral Santos, Sultan Imangaliyev, Klamer Schutte, Evgeni Levin
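
For orientation, the standard sparse-group lasso objective is shown below together with a schematic coupling term of the kind the abstract describes; the paper's exact co-regularization penalty is not given here, so the last term should be read as an assumption:

```latex
% Sparse-group lasso plus a schematic co-regularization term (our notation).
\begin{equation*}
\min_{\beta}\;
\frac{1}{2n}\lVert y - X\beta\rVert_2^2
+ \lambda_1 \lVert \beta \rVert_1
+ \lambda_2 \sum_{g=1}^{G} \sqrt{p_g}\,\lVert \beta_g \rVert_2
+ \lambda_3 \sum_{(g,g')\ \text{related}} \omega_{gg'}
  \Bigl( \tfrac{\lVert \beta_g \rVert_2}{\sqrt{p_g}}
       - \tfrac{\lVert \beta_{g'} \rVert_2}{\sqrt{p_{g'}}} \Bigr)^{2}
\end{equation*}
% beta_g: the p_g coefficients of group g; omega_{gg'}: a weight derived from
% the given "distance" between groups, pushing related groups toward similar
% contributions.
```
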
Economic Lot-Sizing Problem with Remanufacturing Option: Complexity and Algorithms

In a single item dynamic lot-sizing problem, we are given a time horizon and demand for a single item in every time period. The problem seeks a solution that determines how much to produce and carry at each time period, so that we incur the least total production and inventory cost. When the remanufacturing option is included, the input also comprises the number of returned products at each time period that can potentially be remanufactured to satisfy the demands, with remanufacturing and inventory costs applicable. For this problem, we first show that it cannot have a fully polynomial time approximation scheme (FPTAS). We then provide a pseudo-polynomial algorithm to solve the problem and show how this algorithm can be adapted to solve it in polynomial time, when we make certain realistic assumptions on the cost structure. We finally give a computational study for the capacitated version of the problem and provide some valid inequalities and computational results that indicate that they significantly improve the lower bound for a certain class of instances.

Kerem Akartunalı, Ashwin Arulselvan
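
As a hedged illustration of what a pseudo-polynomial algorithm looks like for this family of problems, the following Python sketch solves the classical single-item core by dynamic programming over (period, ending inventory) states; the remanufacturing option is omitted and the cost data are made up, so this is not the paper's algorithm:

```python
def lot_sizing(d, setup, unit, hold):
    """DP over ending-inventory states: O(T * D^2) where D = sum of demands,
    i.e., polynomial in the demand values -- pseudo-polynomial."""
    best = {0: 0.0}                    # ending inventory -> cheapest cost so far
    for t in range(len(d)):
        nxt, max_inv = {}, sum(d[t + 1:])   # never carry more than future demand
        for inv, cost in best.items():
            for x in range(max(0, d[t] - inv), max_inv + d[t] - inv + 1):
                end = inv + x - d[t]
                c = cost + (setup + unit * x if x > 0 else 0.0) + hold * end
                if c < nxt.get(end, float("inf")):
                    nxt[end] = c
        best = nxt
    return best[0]                     # finish the horizon with empty stock

print(lot_sizing([3, 1, 4, 2], setup=5.0, unit=1.0, hold=0.5))
```
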
A Branch-and-Cut Algorithm for a Multi-item Inventory Distribution Problem

This paper considers a multi-item inventory distribution problem motivated by a practical case occurring in the logistic operations of a hospital. There, a single warehouse supplies several nursing wards. The goal is to define a weekly distribution plan of medical products that minimizes the visits to wards, while respecting inventory capacities and safety stock levels. A mathematical formulation is introduced and several improvements such as tightening constraints, valid inequalities and an extended reformulation are discussed. In order to deal with real-size instances, a hybrid heuristic based on mathematical models is introduced and the improvements are discussed. A branch-and-cut algorithm using all the discussed improvements is proposed. Finally, computational experiments are reported to show the relevance of the model improvements and the quality of the heuristic scheme.

Agostinho Agra, Adelaide Cerveira, Cristina Requejo
Adaptive Targeting in Online Advertisement: Models Based on Relative Influence of Factors

We consider the problem of adaptive targeting for real-time bidding for internet advertisement. This problem involves making fast decisions on whether to show a given ad to a particular user. For demand partners, these decisions are based on information extracted from big data sets containing records of previous impressions, clicks and subsequent purchases. We discuss several criteria which allow us to assess the significance of different factors on probabilities of clicks and conversions. We then devise simple strategies that are based on the use of the most influential factors and compare their performance with strategies that are much more computationally demanding. To make the numerical comparison, we use real data collected by Crimtan in the process of running several recent ad campaigns.

Andrey Pepelyshev, Yuri Staroselskiy, Anatoly Zhigljavsky, Roman Guchenko
Design of Acoustic Metamaterials Through Nonlinear Programming

The dispersive wave propagation in a periodic metamaterial with tetrachiral topology and inertial local resonators is investigated. The Floquet-Bloch spectrum of the metamaterial is compared with that of the tetrachiral beam lattice material without resonators. The resonators can be designed to open and shift frequency band gaps, that is, spectrum intervals in which harmonic waves do not propagate. Therefore, an optimal passive control of the frequency band structure can be pursued in the metamaterial. To this aim, suitable constrained nonlinear optimization problems on compact sets of admissible geometrical and mechanical parameters are stated. According to functional requirements, sets of parameters which determine the largest low-frequency band gap between selected pairs of consecutive branches of the Floquet-Bloch spectrum are sought numerically. The various optimization problems are successfully solved by means of a version of the method of moving asymptotes, combined with a quasi-Monte Carlo multi-start technique.

Andrea Bacigalupo, Giorgio Gnecco, Marco Lepidi, Luigi Gambarotta
Driver Maneuvers Inference Through Machine Learning

Inferring driver maneuvers is a fundamental issue in Advanced Driver Assistance Systems (ADAS), which can significantly increase security and reduce the risk of road accidents. This is not an easy task due to a number of factors such as driver distraction, unpredictable events on the road, and irregularity of the maneuvers. In this complex setting, Machine Learning techniques can play a fundamental and leading role to improve driving security. In this paper, we present preliminary results obtained within the Development Platform for Safe and Efficient Drive (DESERVE) European project. We trained a number of classifiers over a preliminary dataset to infer driver maneuvers of Lane Keeping and Lane Change. These preliminary results are very satisfactory and motivate us to proceed with the application of Machine Learning techniques over the whole dataset.

Mauro Maria Baldi, Guido Perboli, Roberto Tadei
A Systems Biology Approach for Unsupervised Clustering of High-Dimensional Data

One main challenge in modern medicine is the discovery of molecular disease subtypes characterized by relevant clinical differences, such as survival. However, clustering high-dimensional expression data is challenging due to noise and the curse of dimensionality. This article describes a disease subtyping pipeline that is able to exploit the important information available in pathway databases and clinical variables. The pipeline consists of a new feature selection procedure and existing clustering methods. Our procedure partitions a set of patients using the set of genes in each pathway as clustering features. To select the best features, this procedure estimates the relevance of each pathway and fuses relevant pathways. We show that our pipeline finds subtypes of patients with more distinctive survival profiles than traditional subtyping methods by analyzing a TCGA colon cancer gene expression dataset. Here we demonstrate that our pipeline improves three different clustering methods: k-means, SNF, and hierarchical clustering.

Diana Diaz, Tin Nguyen, Sorin Draghici
Large-Scale Bandit Recommender System

The main target of Recommender Systems (RS) is to propose to users one or several items in which they might be interested. However, as users provide more feedback, the recommendation process has to take these new data into consideration. The necessity of this update phase makes recommendation an intrinsically sequential task. A few approaches were recently proposed to address this issue, but they do not meet the need to scale up to real-life applications. In this paper, we present a Collaborative Filtering RS method based on Matrix Factorization and Multi-Armed Bandits. This approach aims at good recommendations within a short computation time. Several experiments on large datasets show that the proposed approach performs personalized recommendations in less than a millisecond per recommendation.

Frédéric Guillou, Romaric Gaudel, Philippe Preux
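
A minimal sketch of the general recipe follows, assuming pre-trained user and item factors and an epsilon-greedy exploration layer; the paper's actual bandit strategy and update rule are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 100, 500, 10
U = rng.normal(scale=0.1, size=(n_users, k))    # user factors (pre-trained)
V = rng.normal(scale=0.1, size=(n_items, k))    # item factors (pre-trained)

def recommend(user, eps=0.1):
    """Exploit the matrix-factorization score, explore with probability eps."""
    if rng.random() < eps:
        return int(rng.integers(n_items))
    return int(np.argmax(V @ U[user]))

def update(user, item, rating, lr=0.05, reg=0.01):
    """Fold one piece of sequential feedback back into the factors (SGD step)."""
    u, v = U[user].copy(), V[item].copy()
    err = rating - u @ v
    U[user] += lr * (err * v - reg * u)
    V[item] += lr * (err * u - reg * v)

item = recommend(user=42)
update(user=42, item=item, rating=4.0)
```
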
Automatic Generation of Sitemaps Based on Navigation Systems

In this paper we present a method to automatically discover sitemaps from websites. Given a website, existing automatic solutions extract only a flat list of URLs that does not show the hierarchical structure of its content. Manual approaches, performed by webmasters, extract deeper sitemaps (with respect to automatic methods). However, in many cases, also because of the natural evolution of the websites’ content, generated sitemaps do not reflect the actual content, soon becoming unhelpful and confusing for users. We propose a different approach that is both automatic and effective. Our solution combines an algorithm to extract frequent patterns from navigation systems (e.g. menus, nav-bars, content lists, etc.) contained in a website with a hierarchy extraction algorithm able to discover rich hierarchies that unveil relationships among web pages (e.g. relationships of super/sub topic). Experimental results show how our approach discovers high-quality sitemaps that have a deep hierarchy and are complete in the extracted URLs.

Pasqua Fabiana Lanotte, Fabio Fumarola, Donato Malerba, Michelangelo Ceci
A Customer Relationship Management Case Study Based on Banking Data

This work aims to show a product recommender construction approach within the banking industry. Such a model construction should respect several methodological and business constraints. In particular, the outcome of the analysis should be a model which is easily interpretable when shown to business people. We start from a Customer Relationship Management data set collected in the banking industry. First, the data are prepared by managing missing values and keeping only the most relevant variables. Then, we apply several algorithms and evaluate them using diagnostic tools.

Ivan Luciano Danesi, Cristina Rea
Lagrangian Relaxation Bounds for a Production-Inventory-Routing Problem

We consider a single item Production-Inventory-Routing problem with a single producer/supplier and multiple retailers. Inventory management constraints are considered both at the producer and at the retailers, following a vendor managed inventory approach, where the supplier monitors the inventory at retailers and decides on the replenishment policy for each retailer. We assume a constant production capacity. Based on the mathematical formulation, we discuss a classical Lagrangian relaxation, which allows us to decompose the problem into four subproblems, and a new Lagrangian decomposition, which decomposes the problem into just a production-inventory subproblem and a routing subproblem. The new decomposition is enhanced with valid inequalities. A computational study is reported to compare the bounds from the two approaches.

Agostinho Agra, Adelaide Cerveira, Cristina Requejo
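
In generic terms (our notation, not the paper's), a Lagrangian decomposition of this kind duplicates the variables linking the two blocks, dualizes the copy constraint x = z, and obtains a lower bound from two independent subproblems:

```latex
% Original problem: min c'x + d'z  s.t.  x in X (production-inventory),
% z in Z (routing), linked by the copy constraint x = z.
\begin{equation*}
L(\lambda) \;=\; \min_{x \in X}\bigl(c^\top x + \lambda^\top x\bigr)
\;+\; \min_{z \in Z}\bigl(d^\top z - \lambda^\top z\bigr)
\;\le\; \mathrm{OPT} \qquad \text{for every } \lambda,
\end{equation*}
% so each subproblem is solved separately, and maximizing over lambda gives
% the best such bound.
```
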
Convergence Rate Evaluation of Derivative-Free Optimization Techniques

This paper presents a convergence rate comparison of five different derivative-free numerical optimization techniques across a set of 50 benchmark objective functions. Results suggest that Adaptive Memory Programming for constrained Global Optimization and a variant of Simulated Annealing are two of the fastest-converging numerical optimization techniques in this set. Lastly, a mechanism for expanding the set of optimization algorithms is provided.

Thomas Lux
The Learnability of Business Rules

Among programming languages, a popular one in corporate environments is Business Rules. These are conditional statements which can be seen as a sort of “programming for non-programmers”, since they remove loops and function calls, which are typically the most difficult programming constructs to master by laypeople. A Business Rules program consists of a sequence of “IF condition THEN actions” statements. Conditions are verified over a set of variables, and actions assign new values to the variables. Medium-sized to large corporations often enforce, document and define their business processes by means of Business Rules programs. Such programs are executed in a special purpose virtual machine which verifies conditions and executes actions in an implicit loop. A problem of extreme interest in business environments is enforcing high-level strategic decisions by configuring the parameters of Business Rules programs so that they behave in a certain prescribed way on average. In this paper we show that Business Rules are Turing-complete. As a consequence, we argue that there can exist no algorithm for configuring the average behavior of all possible Business Rules programs.

Olivier Wang, Changhai Ke, Leo Liberti, Christian de Sainte Marie
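
The execution model described above is easy to picture with a small sketch; the following is our illustration of the implicit loop, not an actual Business Rules engine:

```python
def run_rules(rules, store):
    """Each rule is a (condition, action) pair of functions over the
    variable store; the virtual machine re-scans rules until none fires."""
    fired = True
    while fired:                       # the implicit loop
        fired = False
        for cond, action in rules:
            if cond(store):
                action(store)
                fired = True
    return store

rules = [
    (lambda v: v["score"] > 700 and not v["approved"],
     lambda v: v.update(approved=True)),
    (lambda v: v["approved"] and v["limit"] < 5000,
     lambda v: v.update(limit=5000)),
]
print(run_rules(rules, {"score": 720, "approved": False, "limit": 1000}))
# {'score': 720, 'approved': True, 'limit': 5000}
```
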
Dynamic Programming with Approximation Function for Nurse Scheduling

Although dynamic programming could ideally solve any combinatorial optimization problem, the curse of dimensionality of the search space seriously limits its application to large optimization problems. For example, only a few papers in the literature have reported the application of dynamic programming to workforce scheduling problems. This paper investigates approximate dynamic programming to tackle nurse scheduling problems of a size that exact dynamic programming cannot tackle in practice. Nurse scheduling is one of the problems within workforce scheduling that has been tackled with a considerable number of algorithms, particularly meta-heuristics. Experimental results indicate that approximate dynamic programming is a suitable method to solve this problem effectively.

Peng Shi, Dario Landa-Silva
Breast Cancer’s Microarray Data: Pattern Discovery Using Nonnegative Matrix Factorizations

One challenge in microarray analysis is to discover and capture valuable knowledge to understand biological processes and human disease mechanisms. Nonnegative Matrix Factorization (NMF) – a constrained optimization mechanism which decomposes a data matrix in terms of an additive combination of non-negative factors – has been demonstrated to be a useful tool to reduce the dimension of gene expression data and to identify potentially interesting genes which explain latent structure hidden in microarray data. In this paper, we detail how to use Nonnegative Matrix Factorization based on the generalized Kullback-Leibler divergence to analyze gene expression profile data related to the MCF-7 mammary cancer cell line and to pharmaceutical compounds connected to the metabolism of arachidonic acid. The NMF technique is able to reduce the dimension of the considered genes-compounds matrix from thousands of genes to a few metagenes and to extract information about the drugs that most affect these genes. We provide an experimental framework illustrating the technical steps one has to perform to use NMF to discover useful patterns from microarray data. In fact, the results obtained by the NMF method could be used to select and characterize therapies that can be effective on biological functions involved in the neoplastic transformation process and to perform further biological investigations.

Nicoletta Del Buono, Flavia Esposito, Fabio Fumarola, Angelina Boccarelli, Mauro Coluccia
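
For readers who want to try the decomposition step, here is a minimal sketch using scikit-learn's NMF with the generalized Kullback-Leibler loss; the matrix below is a random stand-in for a genes-by-compounds expression matrix, not the MCF-7 data:

```python
import numpy as np
from sklearn.decomposition import NMF

X = np.abs(np.random.default_rng(0).normal(size=(1000, 12)))  # genes x compounds
model = NMF(n_components=3, solver="mu", beta_loss="kullback-leibler",
            init="random", max_iter=500, random_state=0)
W = model.fit_transform(X)   # genes x metagenes: gene loadings per metagene
H = model.components_        # metagenes x compounds: which drugs drive each metagene
top_genes = np.argsort(W[:, 0])[::-1][:10]   # genes most tied to metagene 0
```
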
Optimizing the Location of Helicopter Emergency Medical Service Operating Sites

The European Commission Regulation (EU) No 965/2012, now fully operative in all European countries, allows helicopter night landings for emergency medical service in dedicated spaces, provided with a minimum amount of facilities, called HEMS Operating Sites. This possibility opens new scenarios for the mixed ambulance/helicopter rescue procedure, which is not fully exploited today. The paper studies the problem of optimal positioning of HEMS sites, where the transfer of the patient from ambulance to helicopter takes place, through the use of Geographic Information System (GIS) and optimization algorithms integrated in the software ArcGIS for Desktop. The optimum is defined in terms of the minimum intervention time. The solution approach has been applied to the area of competence of “SOREU dei Laghi”, in the Lombardia region, with a catchment area of almost two million people.

Maurizio Bruglieri, Cesare Cardani, Matteo Putzu
An Enhanced Infra-Chromatic Bound for the Maximum Clique Problem

There has been a rising interest in experimental exact algorithms for the maximum clique problem because the gap between the expected theoretical performance and the reported results in practice is becoming surprisingly large. One reason for this is the family of bounding functions denoted as infra-chromatic because they produce bounds which can be lower than the chromatic number of the bounded subgraph. In this paper we describe a way to enhance exact solvers with an additional infra-chromatic bounding function and report performance over a number of graphs from well-known data sets. Moreover, the reported results show that the new enhanced procedure significantly outperforms the state of the art.

Pablo San Segundo, Jorge Artieda, Rafael Leon, Cristobal Tapia
Cultural Ant Colony Optimization on GPUs for Travelling Salesman Problem

Ant Colony Optimization (ACO) is a well-established metaheuristic successfully applied to solve hard combinatorial optimization problems, including the Travelling Salesman Problem (TSP). However, the ACO algorithm, like many population-based approaches, has some disadvantages, such as lower solution quality and longer computational time. To overcome these issues, parallel Cultural Ant Colony Optimization (pCACO) is introduced in this paper. The proposed approach hybridises a Cultural Algorithm with the ACO-based MAX-MIN Ant System. pCACO has been implemented on Graphics Processing Units (GPUs) using the CUDA programming model. Through testing on nine benchmark asymmetric TSP problems, the experimental results show that the new method enhances the solution quality when compared to sequential and parallel ACO, yielding computational time comparable to parallel ACO.

Olgierd Unold, Radosław Tarnawski
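
For reference, a compact CPU-side sketch of one MAX-MIN Ant System loop on a symmetric toy instance is given below; it is our simplification and leaves out both the cultural-algorithm layer and the GPU/CUDA parallelization that pCACO adds:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20                                              # cities (toy instance)
D = rng.uniform(1, 10, (n, n)); D = (D + D.T) / 2; np.fill_diagonal(D, np.inf)
tau = np.full((n, n), 1.0)                          # pheromone trails
eta = 1.0 / D                                       # heuristic visibility
alpha, beta, rho, tau_min, tau_max = 1.0, 2.0, 0.1, 0.01, 1.0

def build_tour():
    tour = [0]
    while len(tour) < n:
        w = (tau[tour[-1]] ** alpha) * (eta[tour[-1]] ** beta)
        w[tour] = 0.0                               # mask visited cities
        tour.append(int(rng.choice(n, p=w / w.sum())))
    return tour

def tour_length(t):
    return sum(D[t[i], t[(i + 1) % n]] for i in range(n))

for _ in range(50):
    best = min((build_tour() for _ in range(10)), key=tour_length)
    tau *= 1 - rho                                  # evaporation
    dep = 1.0 / tour_length(best)
    for i in range(n):                              # only the best ant deposits
        a, b = best[i], best[(i + 1) % n]
        tau[a, b] = tau[b, a] = tau[a, b] + dep
    tau = np.clip(tau, tau_min, tau_max)            # the MAX-MIN trail limits
print(tour_length(best))
```
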
Combining Genetic Algorithm with the Multilevel Paradigm for the Maximum Constraint Satisfaction Problem

Genetic algorithms (GAs), which belong to the class of evolutionary algorithms, are regarded as highly successful when applied to a broad range of discrete as well as continuous optimization problems. This paper introduces a hybrid approach combining a genetic algorithm with the multilevel paradigm for solving the maximum constraint satisfaction problem (Max-CSP). The multilevel paradigm refers to the process of dividing large and complex problems into smaller ones, which are hopefully much easier to solve, and then working backward towards the solution of the original problem, using the solution reached at a child level as a starting solution for the parent level. The promising performance achieved by the proposed approach is demonstrated by comparisons made on conventional random benchmark problems.

Noureddine Bouhmala
Implicit Location Sharing Detection in Social Media Turkish Text Messaging

Social media have become a significant venue for the sharing of live updates. Users of social media produce and share large amounts of personal data as part of these live updates. A significant share of this data contains location information that can be used by other people for many purposes. Some social media users deliberately share their own location information with other users. However, a large number of users blindly or implicitly share their own location without noticing it and its possible consequences. Implicit location sharing is investigated in the current paper. We perform a large-scale study on implicit location sharing detection for one of the most popular social media platforms, namely Twitter. After a careful study, we prepared a training data set of Turkish tweets and manually labelled them. Using machine learning techniques, we induced classifiers that are able to classify whether a given tweet contains implicit location sharing or not. The classifiers are shown to be very accurate and efficient as well. Moreover, the best classifier is employed in a browser add-on tool which warns the user whenever implicit location sharing is predicted from a tweet that is about to be released. The paper provides the followed methodology and the technical analysis as well. Furthermore, it discusses how these techniques can be extended to different social network services and also to different languages.

Davut Deniz Yavuz, Osman Abul
Fuzzy Decision-Making of a Process for Quality Management

Problem solving and decision-making are important skills for business and life. Good decision-making requires a mixture of skills: creative development and identification of options, clarity of judgement, firmness of decision, and effective implementation. SWOT analysis is a powerful tool that can help decision makers achieve their goals and objectives. In this study, data obtained through SWOT analysis from the quality department of a textile company were integrated by means of fuzzy multi-criteria decision making. The aim of the study was to choose the policy most appropriate for the beneficial development of the quality department.

Feyza Gürbüz, Panos M. Pardalos
A Bayesian Network Profiler for Wildfire Arsonists

Arson-caused wildfires have a clarification rate that is extremely low compared to other criminal activities. This fact makes evident the importance of developing methodologies to assist investigators in criminal profiling. To that end we introduce Bayesian Networks (BNs), which have only recently been applied to criminal profiling and never to arsonists. We learn a BN from data and expert knowledge and, after validation, we use it to predict the profile (characteristics) of the offender from the information about a particular arson-caused wildfire, including confidence levels that represent expected probabilities.

Rosario Delgado, José Luis González, Andrés Sotoca, Xavier-Andoni Tibau
Learning Optimal Decision Lists as a Metaheuristic Search for Diagnosis of Parkinson’s Disease

Decision Lists are a very general model representation. In learning decision structures from medical datasets one needs a simple understandable model. Such a model should correctly classify unknown cases. One must search for the most general decision structure using the training dataset as input, taking into account both complexity and goodness-of-fit of the underlying model. In this paper, we propose to search the Decision List state space as an optimization problem using a metaheuristic. We implemented the method and carried out experimentation over a well-known Parkinson’s Disease training set. Our results are encouraging when compared to other machine learning references on the same dataset.

Fernando de Carvalho Gomes, José Gilvan Rodrigues Maia
Hermes: A Distributed-Messaging Tool for NLP

In this paper we present Hermes, a novel tool for natural language processing. By employing an efficient and extendable distributed-message architecture, Hermes is able to fulfil the requirements of large-scale processing, completeness, and versatility that are currently missed by existing NLP tools.

Ilaria Bordino, Andrea Ferretti, Marco Firrincieli, Francesco Gullo, Marcello Paris, Gianluca Sabena
Deep Learning for Classification of Dental Plaque Images

Dental diseases such as caries or gum disease are caused by prolonged exposure to pathogenic plaque. Assessment of such plaque accumulation can be used to identify individuals at risk. In this work we present an automated dental red autofluorescence plaque image classification model based on the application of Convolutional Neural Networks (CNNs) to Quantitative Light-induced Fluorescence (QLF) images. The CNN model outperforms other state-of-the-art classification models, providing a 0.75 ± 0.05 F1-score on the test dataset. The model directly benefits from the multi-channel representation of the images, resulting in improved performance when all three colour channels are used.

Sultan Imangaliyev, Monique H. van der Veen, Catherine M. C. Volgenant, Bart J. F. Keijser, Wim Crielaard, Evgeni Levin
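
A minimal Keras sketch of a CNN classifier of this kind is shown below; the layer sizes, the 64x64 input, and the binary output are our assumptions, since the paper's exact architecture is not described in the abstract:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),        # all three colour channels
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # plaque vs. no plaque
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])
```
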
Multiscale Integration for Pattern Recognition in Neuroimaging

Multiscale, multilevel integration is a valuable method for the recognition and analysis of combined spatial-temporal characteristics of specific brain regions. Using this method, primary experimental data are decomposed both into sets of spatially independent images and into sets of time series. The results of this decomposition are then integrated into a single space using a coordinate system that contains metadata regarding the data sources. The following modules can be used as tools to optimize data processing: (a) the selection of reference points; (b) the identification of regions of interest; and (c) classification and generalization. Multiscale integration methods are applicable for achieving pattern recognition in computed tomography and magnetic resonance imaging, thereby allowing for comparative analyses of data processing results.

Margarita Zaleshina, Alexander Zaleshin
Game Theoretical Tools for Wing Design

In the general setting of modeling for system design it is assumed that all decision-makers cooperate in order to choose the optimal set of design variables. Sometimes there are conflicts between the different tasks, so that the design process is studied as a multi-player game. In this work we deal with a preliminary application to the design of a wing, studying its optimal configuration by considering some of the standard parameters of the plant design. The choice of the parameter values is done by optimizing two tasks: the weight and the drag. This two-objective optimization problem is approached by a cooperative model, simply minimizing the sum of the weight and the drag, as well as by a non-cooperative model by means of the Nash equilibrium concept. Both situations are studied and some test cases are presented.

Lina Mallozzi, Giovanni Paolo Reina, Serena Russo, Carlo de Nicola
Fastfood Elastic Net: Combining Variable Selection with Kernel Expansion Approximations

As the complexity of a prediction problem grows, simple linear approaches tend to fail, which has led to the development of algorithms that make complicated, nonlinear problems solvable both quickly and inexpensively. Fastfood, one such algorithm, has been shown to generate reliable models, but in its current state it does not offer the feature selection that is useful in solving a wide array of complex real-world problems, spanning from cancer prediction to financial analysis. The aim of this research is to extend Fastfood with variable importance by integrating it with the Elastic net. The Elastic net offers feature selection, but is only capable of producing linear models. We show that in combining the two, it is possible to retain the feature selection offered by the Elastic net and the nonlinearity produced by Fastfood. Models constructed with the Fastfood-enhanced Elastic net are relatively quick and inexpensive to compute and are also quite powerful in their ability to make accurate predictions.

Sonia Kopel, Kellan Fluette, Geena Glen, Paul E. Anderson
Big Data Analytics in a Public General Hospital

Obtaining information and knowledge from big data has become a common practice today, especially in health care. However, a number of challenges make the use of analytics on health care data difficult. The aim of this paper is to present the big data analytics framework defined and implemented at an important Brazilian public hospital, which decided to use this technology to provide insights that will help improve clinical practices. The framework was validated by a use case in which the goal was to discover the behavior patterns of nosocomial infections in the institution. The architecture was defined, evaluated, and implemented. The overall result was very positive, with a relatively simple process that was able to produce interesting analytical results.

Ricardo S. Santos, Tiago A. Vaz, Rodrigo P. Santos, José M. Parente de Oliveira
Inference of Gene Regulatory Network Based on Radial Basis Function Neural Network

Inference of gene regulatory networks (GRNs) from gene expression data is still a challenging task. Supervised approaches perform better than unsupervised approaches. In this paper, we develop a new supervised approach based on a radial basis function (RBF) neural network for the inference of gene regulatory networks. A new hybrid evolutionary method based on dissipative particle swarm optimization (DPSO) and the firefly algorithm (FA) is proposed to optimize the parameters of the RBF network. Data from the E. coli network are used to test our method, and the results reveal that it performs better than classical approaches.

Sanrong Liu, Bin Yang, Haifeng Wang
Establishment of Optimal Control Strategy of Building-Integrated Photovoltaic Blind Slat Angle by Considering Interior Illuminance and Electricity Generation

A building-integrated photovoltaic blind (BIPB) combines a blind with a PV system to generate energy on the building exterior and to reduce the building's heating and cooling load through its shading function. This study aimed to establish the optimal control strategy for the BIPB slat angle by considering interior illuminance and electricity generation. First, in terms of interior illuminance considering overall light (i.e., daylight and artificial illumination) and electricity generation from the BIPB, it was determined that the optimal blind slat angle is 80° at all times. Second, in terms of interior illuminance considering daylight and electricity generation from the BIPB, it was determined that the optimal blind slat angle is 10° (9:00), 20° (10:00–11:00, 14:00–15:00) and 30° (12:00–13:00). Based on the results of this study, effective use of the BIPB can be encouraged by providing information on the optimal blind slat angle to users considering BIPB implementation.

Taehoon Hong, Jeongyoon Oh, Kwangbok Jeong, Jimin Kim, Minhyun Lee
Backmatter
Metadata
Title
Machine Learning, Optimization, and Big Data
Edited by
Panos M. Pardalos
Piero Conca
Giovanni Giuffrida
Giuseppe Nicosia
Copyright Year
2016
Electronic ISBN
978-3-319-51469-7
Print ISBN
978-3-319-51468-0
DOI
https://doi.org/10.1007/978-3-319-51469-7
