
2019 | Book

Computational Science – ICCS 2019

19th International Conference, Faro, Portugal, June 12–14, 2019, Proceedings, Part III

Edited by: Dr. João M. F. Rodrigues, Dr. Pedro J. S. Cardoso, Dr. Jânio Monteiro, Prof. Roberto Lam, Dr. Valeria V. Krzhizhanovskaya, Michael H. Lees, Prof. Jack J. Dongarra, Prof. Dr. Peter M.A. Sloot

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science


About this book

The five-volume set LNCS 11536, 11537, 11538, 11539 and 11540 constitutes the proceedings of the 19th International Conference on Computational Science, ICCS 2019, held in Faro, Portugal, in June 2019.

The total of 65 full papers and 168 workshop papers presented in this book set were carefully reviewed and selected from 573 submissions (228 submissions to the main track and 345 submissions to the workshops). The papers were organized in topical sections named:

Part I: ICCS Main Track

Part II: ICCS Main Track; Track of Advances in High-Performance Computational Earth Sciences: Applications and Frameworks; Track of Agent-Based Simulations, Adaptive Algorithms and Solvers; Track of Applications of Matrix Methods in Artificial Intelligence and Machine Learning; Track of Architecture, Languages, Compilation and Hardware Support for Emerging and Heterogeneous Systems

Part III: Track of Biomedical and Bioinformatics Challenges for Computer Science; Track of Classifier Learning from Difficult Data; Track of Computational Finance and Business Intelligence; Track of Computational Optimization, Modelling and Simulation; Track of Computational Science in IoT and Smart Systems

Part IV: Track of Data-Driven Computational Sciences; Track of Machine Learning and Data Assimilation for Dynamical Systems; Track of Marine Computing in the Interconnected World for the Benefit of the Society; Track of Multiscale Modelling and Simulation; Track of Simulations of Flow and Transport: Modeling, Algorithms and Computation

Part V: Track of Smart Systems: Computer Vision, Sensor Networks and Machine Learning; Track of Solving Problems with Uncertainties; Track of Teaching Computational Science; Poster Track ICCS 2019

Chapter “Comparing Domain-decomposition Methods for the Parallelization of Distributed Land Surface Models” is available open access under a Creative Commons Attribution 4.0 International License via link.springer.com.

Table of Contents

Frontmatter

Track of Biomedical and Bioinformatics Challenges for Computer Science

Frontmatter
Parallelization of an Algorithm for Automatic Classification of Medical Data

In this paper, we present the optimization and parallelization of a state-of-the-art algorithm for automatic classification, with the goal of classifying clinical data in real time. The parallelization has been carried out so that the algorithm can run in real time on standard computers or on high-performance computing servers. The fastest versions perform most of the computations on Graphics Processing Units (GPUs). The resulting algorithms have been tested on the automatic classification of electroencephalographic signals from patients.

Victor M. Garcia-Molla, Addisson Salazar, Gonzalo Safont, Antonio M. Vidal, Luis Vergara
The Chain Alignment Problem

This paper introduces two new combinatorial optimization problems involving strings: the Chain Alignment Problem and a multiple version of it, the Multiple Chain Alignment Problem. For the first problem, a polynomial-time dynamic-programming algorithm is presented; for the second, a proof of its $\mathcal{NP}$-hardness is provided and some heuristics are proposed. The applicability of both problems is attested by the good results they obtain in the modeling application considered.

Leandro Lima, Said Sadique Adi
Comparing Deep and Machine Learning Approaches in Bioinformatics: A miRNA-Target Prediction Case Study

MicroRNAs (miRNAs) are small non-coding RNAs with a key role in post-transcriptional gene expression regulation, thanks to their ability to bind the target mRNA through the complementary base-pairing mechanism. Given their role, it is important to identify their targets, and different tools have been proposed for this problem. However, their results can be very different, so the community is now moving toward integration tools, which should perform better than the individual ones. As Machine and Deep Learning algorithms are now widely popular, we developed different classifiers from both areas to verify their ability to recognize possible miRNA-mRNA interactions and evaluated their performance, showing the potential and the limits of those algorithms in this field. Here, we apply two deep learning classifiers and three different machine learning models to two different miRNA-mRNA datasets of predictions from three tools: TargetScan, miRanda, and RNAhybrid. Although experimental validation is needed to better confirm the predictions, the deep learning techniques achieved the best performance when the evaluation scores are taken into account.

Valentina Giansanti, Mauro Castelli, Stefano Beretta, Ivan Merelli
Automated Epileptic Seizure Detection Method Based on the Multi-attribute EEG Feature Pool and mRMR Feature Selection Method

Electroencephalogram (EEG) signals reveal many crucial hidden attributes of the human brain. Classification based on EEG-related features can be used to detect brain-related diseases, especially epilepsy. The quality of EEG-related features is directly related to the performance of automated epileptic seizure detection, so finding prominent features is important in this field. In this paper, a novel method is proposed to automatically detect epileptic seizures. The work introduces a novel time-frequency-domain feature named the global volatility index (GVIX) to measure holistic signal fluctuation in wavelet coefficients and original time-series signals. A multi-attribute EEG feature pool is then constructed by combining time-frequency-domain, time-domain, nonlinear, and entropy-based features, and minimum redundancy maximum relevance (mRMR) is introduced to select the most prominent features. Results indicate that this method outperforms others for epileptic seizure detection on an identical dataset, and that the proposed GVIX is a prominent feature for this task.

Bo Miao, Junling Guan, Liangliang Zhang, Qingfang Meng, Yulin Zhang
An Approach for Semantic Data Integration in Cancer Studies

Contemporary developments in personalized medicine, based both on extended clinical records and on the implementation of different high-throughput "omics" technologies, have generated large amounts of data. To make use of these data, new approaches are needed for their search, storage, analysis, integration, and processing. In this paper we suggest an approach for integrating data from diverse domains and various information sources, enabling the extraction of novel knowledge in cancer studies. Its application can contribute to the early detection and diagnosis of cancer as well as to its proper personalized treatment. The data used in our research consist of clinical records from two cancer studies with different factors and different origins, together with gene expression datasets from two high-throughput technologies: microarray and next-generation sequencing. A specially developed workflow, able to deal effectively with the heterogeneity of the data and the enormous number of relations between patients and proteins, is used to automate the data integration process. During this process, our software tool performs an advanced search for additional expressed-protein relationships in a set of available knowledge sources and generates semantic links to them. As a result, a set of hidden common expressed-protein mutations and their relations with patients is generated as new knowledge about the studied cancer cases.

Iliyan Mihaylov, Maria Nisheva-Pavlova, Dimitar Vassilev
A Study of the Electrical Propagation in Purkinje Fibers

Purkinje fibers are fundamental structures in the electrical stimulation of the heart. To allow the contraction of the ventricular muscle, these fibers need to stimulate the myocardium in a synchronized manner. However, certain changes in the properties of these fibers may desynchronize the heart rhythm. This can occur through failures in the propagation of the electrical stimulus due to conduction blocks at the junctions that attach the Purkinje fibers to the ventricular muscle, a condition considered a risk state for cardiac arrhythmias. The aim of this work is to investigate which properties affect the propagation velocity and activation time of the Purkinje fibers, such as cell geometry, conductivity, coupling of the fibers with ventricular tissue, and the number of bifurcations in the network. To reach this goal, several Purkinje networks were generated by varying these parameters in a sensitivity analysis. For the computational model, the monodomain equation was used to describe the phenomenon mathematically, and the numerical solution was calculated using the Finite Volume Method. The results are in accordance with the literature: the model reproduces characteristic behaviors of the propagation velocity and activation time of Purkinje fibers, including the characteristic propagation delay at the Purkinje-muscle junctions.

Lucas Arantes Berg, Rodrigo Weber dos Santos, Elizabeth M. Cherry
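A minimal illustration of the kind of model described above is the 1-D monodomain (cable) equation with excitable-cell kinetics, solved with an explicit finite-volume update. The FitzHugh-Nagumo kinetics and all parameter values below are illustrative assumptions, not the paper's cell model; the sketch only shows how a stimulus at one end activates distal cells later.

```python
import numpy as np

# 1-D monodomain cable with FitzHugh-Nagumo kinetics, explicit finite volumes.
# Kinetics and parameters are illustrative assumptions, not the paper's model.
N, dx, dt, D = 100, 0.5, 0.05, 1.0
v = np.full(N, -1.2)           # transmembrane potential (resting state)
w = np.full(N, -0.625)         # recovery variable
activation = np.full(N, -1.0)  # first time v crosses 0 at each cell

for step in range(4000):
    t = step * dt
    vp = np.pad(v, 1, mode="edge")            # no-flux boundaries via ghost cells
    lap = (vp[2:] - 2 * v + vp[:-2]) / dx**2  # discrete diffusion term
    stim = np.zeros(N)
    if t < 2.5:
        stim[:5] = 1.0                        # brief stimulus at the left end
    v_new = v + dt * (D * lap + v - v**3 / 3 - w + stim)
    w += dt * 0.08 * (v + 0.7 - 0.8 * w)      # slow recovery dynamics
    newly = (activation < 0) & (v_new >= 0)
    activation[newly] = t                     # record activation times
    v = v_new

print(activation[10], activation[80])  # the distal cell activates later
```

Running the sketch shows a travelling activation wave: cells farther from the stimulated end cross threshold at later times, which is the activation-time quantity the paper studies.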
A Knowledge Based Self-Adaptive Differential Evolution Algorithm for Protein Structure Prediction

Tertiary protein structure prediction is one of the most challenging problems in Structural Bioinformatics, and it is NP-complete in computational complexity theory. The complexity is related to the enormous number of possible conformations a single protein can assume. Metaheuristics have become useful for finding feasible solutions in viable computational time, since exact algorithms are not capable of doing so. However, these stochastic methods are highly dependent on parameter tuning to balance exploitation (local search refinement) and exploration (global exploratory search). Self-adaptive techniques were therefore created to handle the time-consuming task of parameter definition. In this paper, we enhance Self-Adaptive Differential Evolution with problem-domain knowledge provided by the angle probability list approach, comparing it with every single mutation used to compose our set of mutation operators. Moreover, a population diversity metric is used to analyze the behavior of each of them. The proposed method was tested on ten protein sequences with different folding patterns. The results show that the self-adaptive mechanism achieves a better balance between the search capabilities, providing better results regarding root-mean-square deviation and potential energy than the non-adaptive single-mutation methods.

Pedro H. Narloch, Márcio Dorn
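Self-adaptive differential evolution of the kind described above can be sketched with the jDE rule (Brest et al.), in which each individual carries its own mutation factor F and crossover rate CR that are occasionally resampled. This is a generic sketch on a toy sphere function; the paper's energy model and angle-probability-list operator are not included.

```python
import numpy as np

# jDE-style self-adaptive differential evolution on a toy sphere function.
rng = np.random.default_rng(1)
dim, pop_size, gens = 5, 30, 200
pop = rng.uniform(-5, 5, (pop_size, dim))
F = np.full(pop_size, 0.5)    # per-individual mutation factor
CR = np.full(pop_size, 0.9)   # per-individual crossover rate
fit = (pop**2).sum(axis=1)

for _ in range(gens):
    for i in range(pop_size):
        # self-adaptation: occasionally resample the control parameters
        Fi = rng.uniform(0.1, 1.0) if rng.random() < 0.1 else F[i]
        CRi = rng.random() if rng.random() < 0.1 else CR[i]
        a, b, c = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
        mutant = pop[a] + Fi * (pop[b] - pop[c])   # rand/1 mutation
        cross = rng.random(dim) < CRi
        cross[rng.integers(dim)] = True            # binomial crossover
        trial = np.where(cross, mutant, pop[i])
        tf = (trial**2).sum()
        if tf <= fit[i]:                           # greedy selection keeps the
            pop[i], fit[i], F[i], CR[i] = trial, tf, Fi, CRi  # parameters that worked

print(fit.min())
```

Successful parameter values survive with their individuals, so the population tunes F and CR while it searches, which is the mechanism the abstract contrasts with non-adaptive single-mutation variants.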
A Multi-objective Swarm-Based Algorithm for the Prediction of Protein Structures

Protein structure prediction is one of the most challenging problems in Structural Bioinformatics. In this paper, we present variations of the artificial bee colony algorithm that deal with the problem's multimodality and high dimensionality by introducing multi-objective optimization and knowledge from experimental proteins through protein contact maps. Results on measures of structural similarity indicate that our approaches surpass the previous ones, showing the real need to adapt the method to the problem's complexities.

Leonardo de Lima Corrêa, Márcio Dorn
Combining Polynomial Chaos Expansions and Genetic Algorithm for the Coupling of Electrophysiological Models

The number of computational models in cardiac research has grown over the last decades. Every year, new models with different assumptions appear in the literature, dealing with differences in interspecies cardiac properties. Generally, these new models update the physiological knowledge with new equations which better reflect the molecular basis of the underlying processes. New equations require fitting parameters to previously known experimental data or, in some cases, simulated data. This work proposes a new method of parameter adjustment based on Polynomial Chaos and a Genetic Algorithm to find the best parameter values upon changes in the formulation of ionic channels; combined with a Sensitivity Analysis, it reduces the search space and the computational cost. Analyzing different models of L-type calcium channels, we show that reducing the number of parameters dramatically improves the quality of the Genetic Algorithm. In addition, we test whether Polynomial Chaos Expansions improve the Genetic Algorithm search, and find that they reduce its execution time by a factor of about $10^3$ in the case studied here while maintaining the quality of the results. We conclude that polynomial chaos expansions can improve and reduce the cost of parameter adjustment in the development of new models.

Gustavo Montes Novaes, Joventino Oliveira Campos, Enrique Alvarez-Lacalle, Sergio Alonso Muñoz, Bernardo Martins Rocha, Rodrigo Weber dos Santos
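The core idea of using a polynomial chaos surrogate inside a search loop can be sketched in one dimension: fit a Legendre expansion to a few expensive model evaluations, then query the cheap surrogate instead of the model. The mock `expensive_model` and all parameters are assumptions for illustration; the paper fits real ionic-channel formulations.

```python
import numpy as np
from numpy.polynomial import legendre

# Polynomial chaos surrogate for a scalar parameter on [-1, 1]: fit Legendre
# coefficients by least squares, then search the surrogate cheaply.
def expensive_model(theta):
    return np.exp(-2 * (theta - 0.3)**2)  # stand-in for a costly simulation

nodes = np.linspace(-1, 1, 15)            # a few expensive evaluations
samples = expensive_model(nodes)
coeffs = legendre.legfit(nodes, samples, deg=10)  # expansion coefficients

surrogate = lambda t: legendre.legval(t, coeffs)  # cheap to evaluate

# cheap exhaustive search over the surrogate instead of the expensive model
grid = np.linspace(-1, 1, 2001)
theta_best = grid[np.argmax(surrogate(grid))]
print(theta_best)  # close to the true optimum at 0.3
```

In the paper's setting the surrogate replaces costly ionic-model simulations inside the Genetic Algorithm's fitness evaluation, which is where the reported speed-up comes from.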
A Cloud Architecture for the Execution of Medical Imaging Biomarkers

Digital medical imaging is increasingly used in clinical routine and research; as a consequence, the workload in hospital medical imaging departments has increased more than twentyfold in the last decade. Medical image processing requires intensive computing resources not available at hospitals, but which could be provided by public clouds. This article analyses the requirements of processing digital medical images and introduces a cloud-based architecture centred on a DevOps approach for deploying resources on demand, adjusting them to the requested workload and the expected execution time. The results show low overhead and high flexibility when executing a lung-disease biomarker on a public cloud.

Sergio López-Huguet, Fabio García-Castro, Angel Alberich-Bayarri, Ignacio Blanquer
A Self-adaptive Local Search Coordination in Multimeme Memetic Algorithm for Molecular Docking

Molecular docking deals with the problem of predicting the non-covalent binding of a receptor and a ligand at the atomic level to form a stable complex. Since the search space of possible conformations is vast, molecular docking is classified as an NP-hard problem in computational complexity theory. Given this complexity, exact methods are inefficient, and several metaheuristics have been proposed. However, these methods depend strongly on parameter settings and search-mechanism definitions, which calls for approaches able to self-adapt these configurations along the optimization process. We propose a novel self-adaptive coordination of local search operators in a multimeme memetic algorithm. The approach is based on the Biased Random-Key Genetic Algorithm enhanced with four local search algorithms. The self-adaptation of the methods and of the perturbation radius in local improvements is governed by a proposed probability function, which measures their performance to best guide the search process. The method has been tested on a benchmark based on HIV-protease and compared to existing tools. Statistical tests performed on the results show that this approach reaches better results than a non-adaptive algorithm and is competitive with traditional methods.

Pablo Felipe Leonhart, Pedro Henrique Narloch, Márcio Dorn
Parallel CT Reconstruction for Multiple Slices Studies with SuiteSparseQR Factorization Package

Algebraic factorization methods applied to Computerized Tomography (CT) medical image reconstruction involve a high computational cost. Since these techniques are significantly slower than the traditional analytical ones and time is critical in this field, parallel implementations are needed to exploit the machine resources and obtain efficient reconstructions. In this paper, we analyze the performance of the sparse QR decomposition implemented in the SuiteSparseQR factorization package applied to the CT reconstruction problem. We explore both the parallelism provided by BLAS threads and the use of Householder reflections to reconstruct multiple slices at once efficiently. Combining both strategies, we boost the performance of the reconstructions and implement a reliable, competitive method that produces high-quality CT images.

Mónica Chillarón, Vicente Vidal, Gumersindo Verdú
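The multiple-slice idea above can be sketched with a dense toy system: algebraic reconstruction solves A x = b in the least-squares sense, and factorizing A once lets every slice (every right-hand side) reuse the same factors. The dimensions and dense `numpy` QR are assumptions for illustration; a real system matrix is large and sparse, which is why the paper uses SuiteSparseQR.

```python
import numpy as np

# Algebraic CT reconstruction as a least-squares problem A x = b, where A is
# the projection matrix and b a measured sinogram. QR is computed once and
# reused for all slices, mirroring the paper's multiple-slice strategy.
rng = np.random.default_rng(2)
A = rng.normal(size=(60, 20))        # toy projection matrix (rays x pixels)
X_true = rng.normal(size=(20, 3))    # three "slices", one per column
B = A @ X_true                       # simulated measurements

Q, R = np.linalg.qr(A)               # factor once...
X_rec = np.linalg.solve(R, Q.T @ B)  # ...then solve all slices together

print(np.allclose(X_rec, X_true))
```

Because the measurements are consistent by construction, the least-squares solution recovers all three slices exactly; with one factorization amortized over many right-hand sides, the per-slice cost is just a triangular solve.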

Track of Classifier Learning from Difficult Data

Frontmatter
ARFF Data Source Library for Distributed Single/Multiple Instance, Single/Multiple Output Learning on Apache Spark

Apache Spark has become a popular framework for distributed machine learning and data mining. However, it lacks support for operating with Attribute-Relation File Format (ARFF) files in a native, convenient, transparent, efficient, and distributed way. Moreover, Spark does not support the advanced learning paradigms represented in the ARFF definition, including learning from data comprising single/multiple instances and/or single/multiple outputs. This paper presents an ARFF data source library that provides native support for ARFF files and single/multiple instance, single/multiple output learning on Apache Spark. The data source extends the Apache Spark machine learning library seamlessly, allowing all ARFF file varieties, attribute types, and learning paradigms to be loaded. It allows researchers to incorporate a large number of diverse datasets and develop scalable solutions for learning problems of increased complexity. The data source is implemented in Scala, like the Apache Spark source code, but can be used from Java, Scala, and Python. It is free and open source, available on GitHub under the Apache License 2.0.

Jorge Gonzalez-Lopez, Sebastian Ventura, Alberto Cano
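To make the file format concrete, here is a minimal sketch of an ARFF reader for the basic single-instance, single-output flavour. It is a Python illustration of the format only, not the Scala data source the paper presents, and it ignores quoting, sparse rows, and the relational attributes used for multi-instance data.

```python
def parse_arff(text):
    """Minimal ARFF reader: returns (relation, attributes, rows).
    Handles only the basic single-instance, single-output flavour."""
    relation, attributes, rows = None, [], []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('%'):   # skip blanks and comments
            continue
        low = line.lower()
        if low.startswith('@relation'):
            relation = line.split(None, 1)[1]
        elif low.startswith('@attribute'):
            _, name, typ = line.split(None, 2)  # "@attribute name type"
            attributes.append((name, typ))
        elif low.startswith('@data'):
            in_data = True
        elif in_data:
            rows.append(line.split(','))        # comma-separated instance
    return relation, attributes, rows

demo = """% toy file
@relation weather
@attribute temperature numeric
@attribute play {yes,no}
@data
21.5,yes
17.0,no"""
print(parse_arff(demo))
```

The header section declares the schema (including nominal domains such as `{yes,no}`), and everything after `@data` is instances; the paper's data source maps exactly this structure onto Spark DataFrames in a distributed way.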
On the Role of Cost-Sensitive Learning in Imbalanced Data Oversampling

Learning from imbalanced data is still considered one of the most challenging areas of machine learning. Among the plethora of methods dedicated to alleviating the challenge of skewed distributions, the two most distinct are data-level sampling and cost-sensitive learning. The former modifies the training set by either removing majority instances or generating additional minority ones. The latter associates a penalty cost with the minority class in order to mitigate the classifiers' bias towards the better represented class. While these two approaches have been extensively studied on their own, no works so far have tried to combine their properties. Such a direction seems highly promising, as in many real-life imbalanced problems the actual misclassification cost is available and should therefore be embedded in the classification framework, regardless of the selected algorithm. This work aims to open a new direction for learning from imbalanced data by investigating the interplay between oversampling and cost-sensitive approaches. We show that there is a direct relationship between the misclassification cost imposed on the minority class and the oversampling ratios that aim to balance both classes. This becomes evident when popular skew-insensitive metrics are modified to incorporate the cost-sensitive element. Our experimental study clearly shows a strong relationship between sampling and cost, indicating that this direction should be pursued in order to develop new and effective algorithms for imbalanced data.

Bartosz Krawczyk, Michal Wozniak
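The cost-oversampling relationship discussed above can be checked numerically in its simplest form: for an integer cost c, weighting each minority instance by c in the training loss equals duplicating each minority instance c times. The fixed linear model and log-loss below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# Numerical check: minority cost c in a weighted loss == oversampling ratio c.
rng = np.random.default_rng(3)
w = rng.normal(size=3)                       # some fixed linear model
X_maj = rng.normal(size=(50, 3)); y_maj = np.zeros(50)   # majority class
X_min = rng.normal(size=(10, 3)); y_min = np.ones(10)    # minority class

def logloss(X, y, weights):
    p = 1 / (1 + np.exp(-X @ w))             # sigmoid scores of the model
    return np.sum(weights * -(y * np.log(p) + (1 - y) * np.log(1 - p)))

c = 5  # cost on the minority class == oversampling ratio
cost_sensitive = (logloss(X_maj, y_maj, np.ones(50))
                  + logloss(X_min, y_min, np.full(10, c)))
oversampled = logloss(np.vstack([X_maj] + [X_min] * c),
                      np.concatenate([y_maj] + [y_min] * c),
                      np.ones(50 + 10 * c))
print(np.isclose(cost_sensitive, oversampled))
```

The two totals agree for any model, which is the basic identity behind the abstract's claim; the paper's contribution is studying how this interplay behaves for synthetic (SMOTE-style) oversampling and skew-insensitive metrics rather than plain duplication.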
Characterization of Handwritten Signature Images in Dissimilarity Representation Space

The offline Handwritten Signature Verification (HSV) problem can be considered one with difficult data, since it presents imbalanced class distributions, a high number of classes, a high-dimensional feature space, and a small number of learning samples. One way to deal with this problem is the writer-independent (WI) approach, which is based on the dichotomy transformation (DT). In this work, the difficulty of the data in the space induced by this transformation is analyzed using the instance hardness (IH) measure. The paper also reports on how this better understanding can lead to better use of the data through a prototype selection technique.

Victor L. F. Souza, Adriano L. I. Oliveira, Rafael M. O. Cruz, Robert Sabourin
Missing Features Reconstruction and Its Impact on Classification Accuracy

In real-world applications we can encounter situations where a well-trained model has to predict from a damaged dataset. The damage caused by missing or corrupted values can occur at the level of individual instances or of entire features; both have a negative impact on the usability of the model. This paper focuses on the scenario where entire features are missing, which can be understood as a specific case of transfer learning. Our aim is to experimentally study the influence of various imputation methods on the performance of several classification models. The imputation impact is investigated for traditional methods such as k-NN, linear regression, and MICE, compared to modern imputation methods such as the multi-layer perceptron (MLP) and gradient boosted trees (XGBT). For linear regression, MLP, and XGBT we also propose two approaches to multiple-feature imputation. The experiments were performed on both real-world and artificial datasets with continuous features, with the number of missing features varying from one feature to 50%. The results show that MICE and linear regression are generally good imputers regardless of the conditions. On the other hand, the performance of MLP and XGBT is strongly dataset dependent: their performance is the best in some cases, but more often they perform worse than MICE or linear regression.

Magda Friedjungová, Marcel Jiřina, Daniel Vašata
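One of the baseline imputers above, linear regression of the missing feature on the observed ones, can be sketched directly: fit the regression on the intact training set, then fill the entirely-missing column of the damaged dataset. The synthetic exact linear dependence is an assumption that makes the recovery verifiable.

```python
import numpy as np

# Whole-feature imputation: learn "feature 2 ~ features 0, 1" on the training
# set, then reconstruct the column that is entirely missing at test time.
rng = np.random.default_rng(4)
train = rng.normal(size=(200, 3))
train[:, 2] = 2 * train[:, 0] - train[:, 1]   # feature 2 depends on 0 and 1

test_damaged = rng.normal(size=(20, 2))       # feature 2 entirely missing

# least-squares fit with an intercept column
A = np.column_stack([np.ones(200), train[:, :2]])
coef, *_ = np.linalg.lstsq(A, train[:, 2], rcond=None)

imputed = np.column_stack([np.ones(20), test_damaged]) @ coef
test_full = np.column_stack([test_damaged, imputed])
print(test_full.shape)   # damaged dataset restored to full width
```

The restored dataset can now be passed to the already-trained classifier; the paper's experiments compare exactly this kind of imputer against k-NN, MICE, MLP, and XGBT under varying fractions of missing features.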
A Deep Malware Detection Method Based on General-Purpose Register Features

Existing detection methods based on low-level features at the micro-architectural level usually need a long sample length to detect malicious behaviour and can hardly identify non-signature malware, which inevitably affects detection efficiency and effectiveness. To solve these problems, we propose to use the General-Purpose Registers (GPRs) as features and design a novel deep learning model for malware detection. Each register has specific functions, and changes in its content contain action information which can be used to detect illegal behaviour. We design a deep detection model which jointly fuses spatial and temporal correlations of GPRs and requires only a short sample length. The model learns discriminative characteristics of GPRs between normal and abnormal processes, and thus can also identify non-signature malware. Comprehensive experimental results show that our method performs better than state-of-the-art methods for malicious behaviour detection relying on low-level features.

Fang Li, Chao Yan, Ziyuan Zhu, Dan Meng
A Novel Distribution Analysis for SMOTE Oversampling Method in Handling Class Imbalance

Class imbalance problems are often encountered in applications. They occur whenever a class is under-represented, having few data points compared to the other classes, while this minority class is usually a significant one. One approach to handling imbalance is to generate new minority class instances to balance the data distribution. The Synthetic Minority Oversampling TEchnique (SMOTE) is one of the dominant oversampling methods in the literature. SMOTE generates data using linear interpolation between a minority class data point and one of its K nearest neighbors. In this paper, we present a theoretical and experimental analysis of the SMOTE method, exploring how faithfully SMOTE emulates the underlying density. To our knowledge, this is the first mathematical analysis of the SMOTE method. Moreover, we study the impact of different factors on generation accuracy, such as the dimension of the data, the number of examples, and the considered number of neighbors K, on both artificial and real datasets.

Dina Elreedy, Amir F. Atiya
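The SMOTE generation step analyzed above is short enough to sketch exactly as the abstract describes it: a new minority sample is drawn uniformly at random on the segment between a minority point and one of its K nearest minority neighbours. The toy data and K value are illustrative assumptions.

```python
import numpy as np

# SMOTE's generation step: linear interpolation between a minority point and
# one of its K nearest minority-class neighbours.
def smote_sample(X_min, k, rng):
    i = rng.integers(len(X_min))                  # random minority point
    d = np.linalg.norm(X_min - X_min[i], axis=1)
    neighbours = np.argsort(d)[1:k + 1]           # K nearest, skipping itself
    j = rng.choice(neighbours)
    t = rng.random()                              # uniform position on segment
    return X_min[i] + t * (X_min[j] - X_min[i])

rng = np.random.default_rng(5)
X_min = rng.normal(size=(30, 2))
new = np.array([smote_sample(X_min, k=5, rng=rng) for _ in range(100)])
print(new.shape)
```

Every synthetic point lies on a segment between two existing minority points, so SMOTE can only populate the convex hull of the minority class; this geometric restriction is one of the properties the paper's density analysis makes precise.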
Forecasting Purchase Categories by Transactional Data: A Comparative Study of Classification Methods

Forecasting the purchase behavior of bank clients allows the development of new recommendation and personalization strategies and results in better quality of service and customer experience. In this study, we consider the problem of predicting a client's purchase categories for the next time period from historical transactional data. We study the predictability of expenses for different Merchant Category Codes (MCCs) and compare the efficiency of different classes of machine learning models, including boosting algorithms, long short-term memory networks, and convolutional networks. The experimental study is performed on a massive dataset of debit card transactions covering 5 years and about 1.2 M clients, provided by our partner bank. The results show that (i) there is a set of MCC categories which are highly predictable (the exact number of categories varies with the thresholds for minimal precision and recall), and (ii) for most of the considered cases convolutional neural networks perform best, and thus may be recommended as the basic choice for tackling similar problems.

Egor Shikov, Klavdiya Bochenina
Recognizing Faults in Software Related Difficult Data

In this paper we investigate the use of numerous machine learning algorithms, with emphasis on multilayer artificial neural networks, in the domain of software source-code fault prediction. The main contribution lies in enhancing the data pre-processing step as a partial solution for handling software-related difficult data. Before the data are fed into an artificial neural network, we apply PCA (Principal Component Analysis) and k-means clustering. The clustering step improves the quality of the whole dataset. Using the presented approach we obtained a 10% increase in fault-detection accuracy. To ensure reliable results, we apply 10-fold cross-validation in all experiments. We also evaluated a wide range of hyperparameter setups for the network and compared the results to state-of-the-art, cost-sensitive approaches: Random Forest, AdaBoost, REPTree, and GBT.

Michał Choraś, Marek Pawlicki, Rafał Kozik

Track of Computational Finance and Business Intelligence

Frontmatter
Research on Knowledge Discovery in Database of Traffic Flow State Based on Attribute Reduction

Recognizing and diagnosing the state of traffic flow is an important research area and the basis for improving the level of traffic management and the quality of traffic information services. However, due to the increasing amount of traffic data collected, traffic management systems face a problem of "information surplus". After several processing steps, including data preprocessing, attribute reduction, and rule acquisition, the knowledge rules of the traffic flow state are finally obtained. Knowledge discovery methods can reveal hidden, previously unknown, and valuable information from the huge amount of traffic flow data, providing rules and a decision-making basis for traffic management departments.

Jia-lin Wang, Xiao-lu Li, Li Wang, Xi Zhang, Peng Zhang, Guang-yu Zhu
Factor Integration Based on Neural Networks for Factor Investing

Factor investing is a quantitative investment methodology that constructs portfolios based on factors. Factors of different styles are extracted from multiple sources such as market data, fundamental information from financial statements, and sentiment information from the Internet. Numerous style factors are defined by the Barra model proposed by Morgan Stanley Capital International (MSCI) to explain the return of a portfolio. Multiple factors are usually integrated linearly, which ensures the stability of the integration process and enhances the effectiveness of the integrated factors. In this work, we integrate factors using machine learning and deep learning methods to explore deeper information among the style factors defined by the MSCI Barra model. Multi-factor indexes are compiled using the Smart Beta Index methodology proposed by MSCI. The results show that non-linear integration by a deep neural network can enhance the profitability and stability of the index compiled from the integrated factor.

Zhichen Lu, Wen Long, Jiashuai Zhang, Yingjie Tian
A Brief Survey of Relation Extraction Based on Distant Supervision

As a core task and important part of Information Extraction, Entity Relation Extraction identifies the semantic relation between entity pairs. It plays an important role in the semantic understanding of sentences and in the construction of entity knowledge bases. With the creation of large datasets, it has become possible to employ distant supervision, end-to-end models, and other deep learning models. In this review, we compare the contributions and defects of the various models that have been used for the task, to help guide the path ahead.

Yong Shi, Yang Xiao, Lingfeng Niu
Short-Term Traffic Congestion Forecasting Using Attention-Based Long Short-Term Memory Recurrent Neural Network

Traffic congestion seriously affects citizens' quality of life, and many researchers have paid attention to the task of short-term traffic congestion forecasting. However, the performance of traditional forecasting approaches is not satisfactory, and most neural network models cannot effectively capture the features at different moments. In this paper, we propose an attention-based long short-term memory (LSTM) recurrent neural network. We evaluate the prediction architecture on real-time traffic data from the Gray-Chicago-Milwaukee (GCM) Transportation Corridor in Chicagoland. The experimental results demonstrate that our method outperforms the baselines on the task of congestion prediction.

Tianlin Zhang, Ying Liu, Zhenyu Cui, Jiaxu Leng, Weihong Xie, Liang Zhang
Portfolio Selection Based on Hierarchical Clustering and Inverse-Variance Weighting

This paper presents a model for portfolio selection using inverse-variance weighting and machine learning techniques such as hierarchical clustering algorithms. The method builds diversified portfolios with a good balance of sector and style exposure with respect to momentum, size, value, short-term reversal, and volatility. Furthermore, we compare the performance of seven hierarchical algorithms: Single, Complete, Average, Weighted, Centroid, Median, and Ward linkages. Results show that the Average linkage algorithm has the best cophenetic correlation coefficient. The proposed method with the best linkage criterion is tested on real data over a two-year dataset of one-minute American stock returns. The portfolio selection model achieves a good financial return and an outstanding annual volatility of 3.2%. The results suggest good behavior in performance indicators, with a Sharpe ratio of 0.89, an Omega ratio of 1.16, a Sortino ratio of 1.29, and a beta to the S&P of 0.26.

Andrés Arévalo, Diego León, German Hernandez
A Computational Technique for Asian Option Pricing Model

In the present work, the European-style fixed-strike Asian call option with arithmetic and continuous averaging is numerically evaluated, where the volatility, the risk-free interest rate and the dividend yield are functions of time. A finite difference scheme consisting of a second-order HODIE scheme for spatial discretization and the two-step backward differentiation formula for temporal discretization is applied. The scheme is proved to be second-order accurate in both space and time. The numerical results are in accordance with the analytical results.
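The temporal discretization named above, the two-step backward differentiation formula (BDF2), can be illustrated on the scalar test problem y′ = λy rather than the option-pricing PDE; the backward-Euler bootstrap for the second starting value is our assumption.

```python
# Hedged illustration of BDF2 on the scalar test problem y' = lam * y,
# not the authors' PDE solver. One backward-Euler step bootstraps the
# two-step formula (an assumption of this sketch).
import math

def bdf2_linear(lam, y0, t_end, n_steps):
    h = t_end / n_steps
    y_prev = y0
    # Bootstrap the second starting value with one backward-Euler step.
    y_curr = y_prev / (1.0 - h * lam)
    for _ in range(n_steps - 1):
        # BDF2: (3*y_next - 4*y_curr + y_prev) / (2h) = lam * y_next
        y_next = (4.0 * y_curr - y_prev) / (3.0 - 2.0 * h * lam)
        y_prev, y_curr = y_curr, y_next
    return y_curr
```

Halving the step size should reduce the error by roughly a factor of four, consistent with the second-order accuracy claimed for the full scheme.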

Manisha, S. Chandra Sekhara Rao
Improving Portfolio Optimization Using Weighted Link Prediction in Dynamic Stock Networks

Portfolio optimization in stock markets has been investigated by many researchers. It looks for a subset of assets able to maintain a good trade-off between risk and return. Several algorithms have been proposed for portfolio management. These algorithms use known return and correlation data to build subsets of recommended assets. Dynamic stock correlation networks, whose vertices represent stocks and edges represent the correlation between them, can also be used as input by these algorithms. This study proposes defining the constants of the classical mean-variance analysis using machine learning and weighted link prediction in stock networks (a method named MLink). To assess the performance of MLink, experiments were performed using real data from the Brazilian Stock Exchange. In these experiments, MLink was compared with mean-variance analysis (MVA), a popular method for portfolio optimization. According to the experimental results, using weighted link prediction in stock networks as input considerably increases performance in the portfolio optimization task, resulting in a gross capital increase of 41% in 84 days.
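The classical mean-variance building block that MLink is compared against can be illustrated with the closed-form minimum-variance split between two correlated assets (our sketch, not the MLink method itself).

```python
# Minimal sketch of a mean-variance building block: the minimum-variance
# split between two assets given volatilities and correlation. This is a
# textbook illustration, not the MLink algorithm from the paper.
def min_variance_weights(sigma1, sigma2, rho):
    cov = rho * sigma1 * sigma2
    denom = sigma1 ** 2 + sigma2 ** 2 - 2.0 * cov
    w1 = (sigma2 ** 2 - cov) / denom
    return w1, 1.0 - w1
```

MLink replaces the historically estimated correlations in this kind of formula with values predicted by weighted link prediction on the dynamic stock network.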

Douglas Castilho, João Gama, Leandro R. Mundim, André C. P. L. F. de Carvalho

Track of Computational Optimization, Modelling and Simulation

Frontmatter
Comparison of Constraint-Handling Techniques for Metaheuristic Optimization

Many design problems in engineering have highly nonlinear constraints, and the proper handling of such constraints can be important to ensure solution quality. There are many different ways of handling constraints and different algorithms for optimization problems, which makes the choice difficult for users. This paper compares six different constraint-handling techniques, including penalty methods, barrier functions, the ε-constrained method, feasibility criteria and stochastic ranking. The pressure vessel design problem is solved by the flower pollination algorithm, and results show that stochastic ranking and the ε-constrained method are the most effective for this type of design optimization.
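One of the compared families, the penalty methods, can be sketched as follows: inequality constraints g_i(x) ≤ 0 are folded into the objective as a quadratic penalty on their violation. The penalty factor and the toy constraint are our illustrative assumptions, not the paper’s pressure vessel setup.

```python
# Hedged sketch of a static penalty method: constraints g_i(x) <= 0 become
# an additive quadratic penalty on their violation. The factor mu and the
# toy problem are illustrative assumptions.
def penalized_objective(f, constraints, mu):
    def phi(x):
        violation = sum(max(0.0, g(x)) ** 2 for g in constraints)
        return f(x) + mu * violation
    return phi
```

For example, minimizing x² subject to x ≥ 1 (i.e., g(x) = 1 − x ≤ 0) leaves feasible points unpenalized while infeasible points pay a cost that grows with mu.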

Xing-Shi He, Qin-Wei Fan, Mehmet Karamanoglu, Xin-She Yang
Dynamic Partitioning of Evolving Graph Streams Using Nature-Inspired Heuristics

Detecting communities of interconnected nodes is a frequently addressed problem in situations that can be modeled as a graph, a common practical example being social networks. However, detecting an optimal partition in a network is an extremely complex and highly time-consuming task, so the development and application of meta-heuristic solvers emerges as a promising alternative for dealing with these problems. The research presented in this paper deals with the optimal partitioning of graph instances in the special case in which connections among nodes change dynamically along the time horizon. This class of networks is less addressed in the literature than its counterparts. To efficiently solve this problem, we have modeled and implemented a set of meta-heuristic solvers, all of them inspired by different processes and phenomena observed in Nature: the Water Cycle Algorithm, Bat Algorithm, Firefly Algorithm and Particle Swarm Optimization. All these methods have been adapted to properly deal with this discrete and dynamic problem, using a reformulated expression of the well-known modularity formula as fitness function. A thorough experimentation has been carried out over a set of 12 synthetically generated dynamic graph instances, with the main goal of concluding which of the aforementioned solvers is the most appropriate for this challenging problem. Statistical tests conducted on the obtained results rigorously conclude that the Bat Algorithm and Firefly Algorithm outperform the other methods in terms of Normalized Mutual Information with respect to the true partition of the graph.
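The fitness function mentioned above is based on Newman’s modularity; a minimal static version (the paper’s reformulated dynamic variant is not reproduced here) can be computed from an edge list as follows.

```python
# Illustration of the fitness concept named in the abstract: Newman's
# modularity Q = sum_c [ e_c/m - (d_c / 2m)^2 ] for an undirected graph,
# where e_c is the number of intra-community edges and d_c the total degree
# of community c. The paper's dynamic reformulation is not shown.
def modularity(edges, community):
    m = len(edges)
    intra = {}
    degree_sum = {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2.0 * m)) ** 2
               for c, d in degree_sum.items())
```

A partition that isolates two disconnected triangles scores Q = 0.5, while lumping every node into one community scores 0, which is what lets a metaheuristic rank candidate partitions.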

Eneko Osaba, Miren Nekane Bilbao, Andres Iglesias, Javier Del Ser, Akemi Galvez, Iztok Fister Jr., Iztok Fister
Bat Algorithm for Kernel Computation in Fractal Image Reconstruction

Computer reconstruction of digital images is an important problem in many areas such as image processing, computer vision, medical imaging, sensor systems, robotics, and many others. A very popular approach in that regard is the use of different kernels for various morphological image processing operations such as dilation, erosion, blurring, sharpening, and so on. In this paper, we extend this idea to the reconstruction of digital fractal images. Our proposal is based on a new affine kernel particularly tailored for fractal images. The kernel computes the difference between the source and the reconstructed fractal images, leading to a difficult nonlinear constrained continuous optimization problem, solved by using a powerful nature-inspired metaheuristic for global optimization called the bat algorithm. An illustrative example is used to analyze the performance of this approach. Our experiments show that the method performs quite well, but there is also room for further improvement. We conclude that this approach is promising and could be a very useful technique for efficient fractal image reconstruction.

Akemi Gálvez, Eneko Osaba, Javier Del Ser, Andrés Iglesias
Heuristic Rules for Coordinated Resources Allocation and Optimization in Distributed Computing

In this paper, we consider heuristic rules for optimizing resource utilization in distributed computing environments. Existing job-flow execution mechanics impose many restrictions on resource allocation procedures. Grid, cloud and hybrid computing services operate in heterogeneous and usually geographically distributed computing environments. Emerging virtual organizations and incorporated economic models allow users and resource owners to compete for suitable allocations based on market principles and fair scheduling policies. Subject to these features, a set of heuristic rules for coordinated compact scheduling is proposed to select resources depending on how well they fit a particular job’s execution requirements. A dedicated simulation experiment studies the optimization of integral job-flow characteristics when these rules are applied to a conservative backfilling scheduling procedure.

Victor Toporkov, Dmitry Yemelyanov
Nonsmooth Newton’s Method: Some Structure Exploitation

We investigate real asymmetric linear systems arising in search direction generation in a nonsmooth Newton’s method. This applies to constrained optimisation problems via reformulation of the necessary conditions into an equivalent nonlinear and nonsmooth system of equations. We propose a strategy to exploit the problem structure. First, based on the sub-blocks of the original matrix, some variables are selected and ruled out for a posteriori recovery; then, a smaller and symmetric linear system is generated; eventually, from the solution of the latter, the remaining variables are obtained. We prove the method is applicable if the original linear system is well-posed. We propose and discuss different selection strategies. Finally, numerical examples are presented to compare this method with the direct approach without exploitation, for full and sparse matrices, over a wide range of problem sizes.

Alberto De Marchi, Matthias Gerdts
Fully-Asynchronous Cache-Efficient Simulation of Detailed Neural Networks

Modern asynchronous runtime systems allow the re-thinking of large-scale scientific applications. With the example of a simulator of morphologically detailed neural networks, we show how detaching from the commonly used bulk-synchronous parallel (BSP) execution allows for increased prefetching capabilities, better cache locality, and an overlap of computation and communication, consequently leading to a lower time to solution. Our strategy removes the collective synchronization of the ODEs’ coupling information and takes advantage of the pairwise time dependency between equations, leading to a fully-asynchronous, exhaustive yet not speculative stepping model. Combined with fully linear data structures, communication reduction at the compute-node level, and an earliest-equation-steps-first scheduler, we perform an acceleration at the cache level that reduces communication and time to solution by maximizing the number of timesteps taken per neuron at each iteration. Our methods were implemented on the core kernel of the NEURON scientific application. Asynchronicity and distributed memory space are provided by the HPX runtime system for the ParalleX execution model. Benchmark results demonstrate a superlinear speed-up that leads to a reduced runtime compared to the bulk-synchronous execution, yielding a speed-up between 25% and 65% across different compute architectures, and in the order of 15% to 40% for distributed executions.

Bruno R. C. Magalhães, Thomas Sterling, Michael Hines, Felix Schürmann
Application of the Model with a Non-Gaussian Linear Scalar Filters to Determine Life Expectancy, Taking into Account the Cause of Death

It is well known that civilization diseases shorten life expectancy. The most common causes of death in Poland, both for women and men, are cancer and cardiovascular disease. The aim of the article is to use the non-Gaussian scalar filter model to determine life expectancy based on death rates after eliminating one of the above causes of death. Based on the obtained results, it can be stated that, depending on sex and the cause of death, life expectancy may be extended by several years.

Piotr Sliwka
Improving ODE Integration on Graphics Processing Units by Reducing Thread Divergence

Ordinary differential equations are widely used for the mathematical modeling of complex systems in biology and statistics. Since the analysis of such models needs to be performed using numerical integration, many applications can be gravely limited by the computational cost. This paper presents a general-purpose integrator that runs massively parallel on graphics processing units. By minimizing thread divergence and bundling similar tasks using linear regression, execution time can be reduced by 40–80% when compared to a naive GPU implementation. Compared to a 36-core CPU implementation, a 150-fold runtime improvement is measured.

Thomas Kovac, Tom Haber, Frank Van Reeth, Niel Hens
Data Compression for Optimization of a Molecular Dynamics System: Preserving Basins of Attraction

Understanding the evolution of atomistic systems is essential in various fields such as materials science, biology, and chemistry. The gold standard for these calculations is molecular dynamics, which simulates the dynamical interaction between pairs of molecules. The main challenge of such simulation is the numerical complexity, given a vast number of atoms over a long time scale. Furthermore, such systems often contain exponentially many optimal states, and the simulation tends to get trapped in local configurations. Recent developments leverage the existing temporal evolution of the system to improve the stability and scalability of the method; however, they suffer from large data storage requirements. To efficiently compress the data while retaining the basins of attraction, we have developed a framework to determine the acceptable level of compression for an optimization method by application of a Kantorovich-type theorem, using binary digit rounding as our compression technique. Choosing the Lennard-Jones potential function as a model problem, we present a method for determining the local Lipschitz constant of the Hessian with low computational cost, thus allowing the use of our technique in real-time computation.
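The compression technique named above, binary digit rounding, can be sketched with Python’s `math.frexp`/`math.ldexp`: the float’s mantissa is rounded to a fixed number of bits, which bounds the relative error by 2^-bits. The bit budget below is for illustration only.

```python
# Sketch of binary digit rounding: keep only `bits` fractional bits of a
# float's binary mantissa. The bit budget is an illustrative assumption;
# the paper derives the acceptable level via a Kantorovich-type theorem.
import math

def round_mantissa(x, bits):
    """Round the mantissa of x to `bits` bits; relative error <= 2**-bits."""
    if x == 0.0:
        return 0.0
    mantissa, exponent = math.frexp(x)  # x = mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    scaled = round(mantissa * (1 << bits))
    return math.ldexp(scaled / float(1 << bits), exponent)
```

Because the rounding error is a bounded perturbation of each stored coordinate, one can check (as the paper does with a Lipschitz estimate of the Hessian) that the perturbed configuration still lies in the same basin of attraction.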

Michael Retzlaff, Todd Munson, Zichao (Wendy) Di
An Algorithm for Hydraulic Tomography Based on a Mixture Model

Hydraulic Tomography (HT) has become one of the most robust methods to characterize the heterogeneity of hydraulic parameters such as hydraulic conductivity and specific storage. However, in order to obtain high-resolution hydraulic parameter estimates, several pumping/injection tests with sufficient monitoring data are necessary. In highly heterogeneous media, even with large numbers of measurements, the resolution may not be enough for predicting contaminant transport behavior. In addition, during inverse modeling, the groundwater flow equation is solved numerous times, thus the computational burden can be large, especially for a large, three-dimensional, transient model. In this work we present a new approach to model aquifer heterogeneity, based on a Gaussian Mixture Model (GMM) to parameterize the K field, which significantly reduces the number of parameters to be estimated during the inversion process. In addition, a new objective function based on the spatial derivatives of hydraulic heads is introduced. The developed approach is tested with synthetic data and data from a previously conducted sandbox experiment. Results indicate that the new approach improves the accuracy of the K heterogeneity map produced through HT and reduces the computational effort. For two-dimensional synthetic experiments, this approach achieved a significant reduction in both the error of the K field estimate and the computational time compared to a geostatistical inversion approach. Similar results were also achieved when the approach was tested using pumping test data from a synthetic aquifer constructed in the laboratory.

Carlos Minutti, Walter A. Illman, Susana Gomez
Rapid Multi-band Patch Antenna Yield Estimation Using Polynomial Chaos-Kriging

Yield estimation of antenna systems is important to check their robustness with respect to uncertainty sources. Since Monte Carlo sampling-based evaluations of the real physics simulation model are computationally intensive, this work proposes the polynomial chaos-Kriging (PC-Kriging) metamodeling technique for fast yield estimation. PC-Kriging integrates the polynomial chaos expansion (PCE) as the trend function of the Kriging metamodel, since the PCE is good at capturing the function tendency and Kriging is good at matching the observations at training points. PC-Kriging is demonstrated on an analytical case and a multi-band patch antenna case, and compared with direct PCE and Kriging metamodels. In the analytical case, PC-Kriging reduces the computational cost by around 42% compared with PCE and over 94% compared with Kriging. In the antenna case, PC-Kriging reduces the computational cost by over 60% compared with Kriging and over 90% compared with PCE. In both cases, the savings are obtained without compromising accuracy.

Xiaosong Du, Leifur Leifsson, Slawomir Koziel
Accelerating Limited-Memory Quasi-Newton Convergence for Large-Scale Optimization

Quasi-Newton methods are popular gradient-based optimization methods that can achieve rapid convergence using only first-order derivatives. However, the choice of the initial Hessian matrix upon which quasi-Newton updates are applied is an important factor that can significantly affect the performance of the method. This fact is especially true for limited-memory variants, which are widely used for large-scale problems where only a small number of updates are applied in order to minimize the memory footprint. In this paper, we introduce both a scalar and a sparse diagonal Hessian initialization framework, and we investigate its effect on the restricted Broyden-class of quasi-Newton methods. Our implementation in PETSc/TAO allows us to switch between different Broyden class methods and Hessian initializations at runtime, enabling us to quickly perform parameter studies and identify the best choices. The results indicate that a sparse Hessian initialization based on the diagonalization of the BFGS formula significantly improves the base BFGS methods and that other parameter combinations in the Broyden class may offer competitive performance.
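The scalar Hessian initialization discussed above is commonly taken as H0 = γI with γ = (sᵀy)/(yᵀy) from the latest correction pair; a minimal sketch of applying it to the current gradient follows (the paper’s sparse diagonal variant and the PETSc/TAO implementation are not reproduced).

```python
# Hedged sketch of the common scalar Hessian initialization for limited-
# memory quasi-Newton methods: H0 = gamma * I with gamma = (s.y)/(y.y),
# applied to the current gradient to get a tentative step direction.
def scalar_h0_direction(s, y, grad):
    """Return -H0*grad, where s and y are the latest correction pair."""
    sy = sum(si * yi for si, yi in zip(s, y))
    yy = sum(yi * yi for yi in y)
    gamma = sy / yy
    return [-gamma * g for g in grad]
```

In a full limited-memory method this initialization is composed with the stored update pairs (e.g., the L-BFGS two-loop recursion); the paper’s point is that replacing the scalar γ with a well-chosen diagonal can markedly improve convergence.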

Alp Dener, Todd Munson
Reduced-Cost Design Optimization of High-Frequency Structures Using Adaptive Jacobian Updates

Electromagnetic (EM) analysis is the primary tool utilized in the design of high-frequency structures. In the vast majority of cases, simpler models (e.g., equivalent networks or analytical ones) are either not available or lack accuracy: they can only be used to yield initial designs that need to be further tuned. Consequently, EM-driven adjustment of the geometry and/or material parameters of microwave and antenna components is a necessary design stage. This, however, is a computationally expensive process, not only because of the considerable computational cost of high-fidelity EM analysis but also due to the typically large number of parameters that need to be adjusted. In particular, conventional numerical optimization routines (both local and global) may be prohibitively expensive. In this paper, a reduced-cost trust-region-based gradient search algorithm is proposed for the optimization of high-frequency components. Our methodology is based on smart management of the system Jacobian enhancement, which combines: (i) omission of (finite-differentiation-based) sensitivity updates for variables that exhibit small (relative) relocation in the directions of the corresponding coordinate system axes and (ii) selective utilization of a rank-one Broyden updating formula. Parameter selection for Broyden-based updating depends on the alignment between the direction of the latest design relocation and the respective search space basis vectors. The proposed technique is demonstrated using a miniaturized coupler and an ultra-wideband antenna. In both cases, a significant reduction of the number of EM simulations involved in the optimization process is achieved as compared to the benchmark algorithm (a computational speedup of 60% on average). At the same time, degradation of the design quality is minor.
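The rank-one Broyden updating formula used selectively in the proposed scheme can be sketched directly: J_new = J + ((Δf − J Δs) Δsᵀ)/(Δsᵀ Δs), which enforces the secant condition J_new Δs = Δf. Dense Python lists stand in here for the EM-simulation Jacobian; this is a generic illustration, not the authors’ code.

```python
# Illustration of the rank-one Broyden update:
#   J_new = J + ((df - J*ds) * ds^T) / (ds^T ds)
# It is the cheapest Jacobian refresh satisfying the secant condition
# J_new * ds = df, avoiding a full finite-difference re-evaluation.
def broyden_update(J, ds, df):
    n = len(J)
    Jds = [sum(J[i][k] * ds[k] for k in range(len(ds))) for i in range(n)]
    ds2 = sum(d * d for d in ds)
    r = [(df[i] - Jds[i]) / ds2 for i in range(n)]
    return [[J[i][k] + r[i] * ds[k] for k in range(len(ds))] for i in range(n)]
```

The appeal in EM-driven design is that ds and df are already available from the latest trust-region step, so the update costs no extra simulations.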

Slawomir Koziel, Anna Pietrenko-Dabrowska, Leifur Leifsson
An Algorithm for Selecting Measurements with High Information Content Regarding Parameter Identification

Reducing the measurement effort made for the identification of parameters is an important task in some fields of technology. This work focuses on the calibration of functions running on the electronic control unit (ECU), where measurements are the main expense factor. An algorithm for information content analysis of recorded measurement data is introduced that enables the calibration engineer to shorten future test runs. The analysis is based upon parameter sensitivities and utilizes the Fisher information matrix to determine the value of certain measurement portions with respect to parameter identification. The algorithm’s working principle is illustrated by means of a simple DC motor model. The first use on a real ECU function achieves a measurement time reduction of 67%, while a second use case opens up new features for the calibration of connected cars.
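The central quantity above, the Fisher information matrix, can be built from a sensitivity matrix S (rows = measurement samples, columns = parameters) as F = SᵀS; its determinant is one common scalar score of information content. This is a generic sketch, not the paper’s algorithm.

```python
# Generic sketch: Fisher information F = S^T S from parameter sensitivities
# S (rows = samples, columns = parameters). det(F) serves as a simple
# D-optimality-style score of how informative a measurement portion is.
def fisher_information(S):
    p = len(S[0])
    return [[sum(row[i] * row[j] for row in S) for j in range(p)] for i in range(p)]

def det2(F):
    """Determinant of a 2x2 information matrix."""
    return F[0][0] * F[1][1] - F[0][1] * F[1][0]
```

Measurement portions whose sensitivities add little to det(F) contribute little to parameter identifiability and are candidates for removal from future test runs.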

Christian Potthast
Optimizing Parallel Performance of the Cell Based Blood Flow Simulation Software HemoCell

Large scale cell based blood flow simulations are expensive, both in time and resource requirements. HemoCell can perform such simulations on high performance computing resources by dividing the simulation domain into multiple blocks. This division has a performance impact caused by the necessary communication between the blocks. In this paper we implement an efficient algorithm for computing the mechanical model for HemoCell together with an improved communication structure. The result is an up to 4 times performance increase for blood flow simulations performed with HemoCell.

Victor Azizi Tarksalooyeh, Gábor Závodszky, Alfons G. Hoekstra
Surrogate-Based Optimization of Tidal Turbine Arrays: A Case Study for the Faro-Olhão Inlet

This paper presents a study for estimating the size of a tidal turbine array for the Faro-Olhão Inlet (Portugal) using a surrogate optimization approach. The method comprises problem formulation, hydro-morphodynamic modelling, surrogate construction and validation, and constrained optimization. A total of 26 surrogates were built using linear RBFs as a function of two design variables: the number of rows in the array and the number of Tidal Energy Converters (TECs) per row. The surrogates describe array performance and environmental effects associated with hydrodynamic and morphological aspects of the multi-inlet lagoon. After validation, the surrogate models were used to formulate a constrained optimization model. Results show that the largest array size that satisfies performance and environmental constraints consists of 3 rows and 10 TECs per row.

Eduardo González-Gorbeña, André Pacheco, Theocharis A. Plomaritis, Óscar Ferreira, Cláudia Sequeira, Theo Moura
Time-Dependent Link Travel Time Approximation for Large-Scale Dynamic Traffic Simulations

Large-scale dynamic traffic simulations generate a sizeable amount of raw data that needs to be managed for analysis. Typically, big data reduction techniques are used to decrease redundant, inconsistent and noisy data, as these are perceived to be more useful than the raw data itself. However, these methods are normally performed independently so that they do not compete with the simulation’s computational and memory resources. In this paper, we propose a data reduction technique that is integrated into the simulation process and executed numerous times. Our interest is in reducing the size of each link’s time-dependent travel time data in a large-scale dynamic traffic simulation. The objective is to approximate the time-dependent link travel times along the y-axis to reduce memory consumption while insignificantly affecting the simulation results. An important aspect of the algorithm is its capability to restrict the maximum absolute error bound, which avoids theoretically inconsistent results that may not have been accounted for by the dynamic traffic simulation model. One major advantage of the algorithm is that its efficiency is independent of the input data complexity, such as the number of sampled data points, the sampled data’s shape and the irregularity of sampling intervals. Using a 10 × 10 grid network with variable time-dependent link travel time data complexities and absolute error bounds, the dynamic traffic simulation results show that the algorithm achieves around 80%–90% link travel time data reduction using a small amount of computational resources.
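A much-simplified, hedged illustration of bounded-error reduction along the value axis: keep a sample only when it deviates from the last kept value by more than a bound ε, and reconstruct by repeating the last kept value, so the absolute error never exceeds ε. The paper’s algorithm is more refined than this piecewise-constant sketch.

```python
# Simplified illustration (not the paper's algorithm): piecewise-constant
# reduction of a travel-time series with a guaranteed absolute error bound.
def reduce_series(values, eps):
    """Keep (index, value) pairs; drop samples within eps of the last kept value."""
    kept = [(0, values[0])]
    for i, v in enumerate(values[1:], start=1):
        if abs(v - kept[-1][1]) > eps:
            kept.append((i, v))
    return kept

def reconstruct(kept, n):
    """Rebuild the full series by repeating the last kept value."""
    out, j = [], 0
    for i in range(n):
        if j + 1 < len(kept) and kept[j + 1][0] <= i:
            j += 1
        out.append(kept[j][1])
    return out
```

Flat stretches of the travel-time curve collapse to a single kept sample, while every reconstructed value stays within ε of the original, mirroring the bounded-error guarantee the abstract emphasizes.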

Genaro Peque Jr., Hiro Harada, Takamasa Iryo
Evaluation of the Suitability of Intel Xeon Phi Clusters for the Simulation of Ultrasound Wave Propagation Using Pseudospectral Methods

The ability to perform large-scale ultrasound simulations using Fourier pseudospectral methods has generated significant interest in medical ultrasonics, including for treatment planning in therapeutic ultrasound and image reconstruction in photoacoustic tomography. However, the routine execution of such simulations is computationally very challenging. Nowadays, the trend in parallel computing is towards the use of accelerated clusters where computationally intensive parts are offloaded from processors to accelerators. During the last five years, Intel has released two generations of Xeon Phi accelerators. The goal of this paper is to investigate the performance of both architectures with respect to current processors, and to evaluate the suitability of accelerated clusters for the distributed simulation of ultrasound propagation using Fourier-based methods. The paper reveals that the first generation of Xeon Phis, the Knights Corner architecture, suffers from several flaws that reduce its performance far below that of Haswell processors. On the other hand, the second generation, called Knights Landing, shows very promising performance comparable with current processors.

Filip Vaverka, Bradley E. Treeby, Jiri Jaros

Track of Computational Science in IoT and Smart Systems

Frontmatter
Fog Computing Architecture Based Blockchain for Industrial IoT

Industry 4.0, also referred to as the fourth industrial revolution, is the vision of a smart factory built with cyber-physical systems (CPS). The ecosystem of the manufacturing industry is expected to be activated through autonomous and intelligent capabilities such as self-organization, self-monitoring and self-healing. The fourth industrial revolution is beginning with an attempt to combine the myriad elements of the industrial system with Internet communication technology to form a future smart factory, and the technologies derived from these attempts are creating new value. However, the existing Internet has no effective way to solve the problems of cyber security and data protection posed by the new technologies of future industry. In a future industrial environment where a large number of IoT devices will be supplied and used, a true industrial revolution will be hard to achieve if the security problem is not resolved. Therefore, in this paper, we propose a new blockchain-based fog system architecture for Industrial IoT. To guarantee fast performance, a suitable permissioned blockchain is applied to the fog system, and its performance is evaluated and analyzed.

Su-Hwan Jang, Jo Guejong, Jongpil Jeong, Bae Sangmin
Exploration of Data from Smart Bands in the Cloud and on the Edge – The Impact on the Data Storage Space

Wearable devices used for tracking people’s health state usually transmit their data to a remote monitoring data center that can be located in the Cloud due to its large storage capacities. However, the growing number of smart bands, fitness trackers, and other IoT devices used for health state monitoring puts pressure on the data centers, may raise the Big Data challenge and can cause network congestion. This paper focuses on the consumption of storage space while monitoring people’s health state and detecting possibly dangerous situations in the Cloud and on the Edge. We investigate the storage space consumption in three scenarios, including (1) transmission of all data regardless of the health state and any danger, (2) data transmission after a change in the person’s activity, and (3) data transmission on the detection of a health-threatening situation. Results of our experiments show that the last two scenarios can bring significant savings in the consumed storage space.
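Scenario (2) above, transmitting only on a change in activity, can be sketched as a simple threshold filter (our illustration; the readings and threshold are hypothetical, not the paper’s data).

```python
# Hedged sketch of change-triggered transmission on the Edge: a reading is
# sent to the Cloud only when it differs from the last transmitted value by
# at least `threshold`. Readings and threshold are hypothetical.
def filter_transmissions(readings, threshold):
    sent = [readings[0]]
    for r in readings[1:]:
        if abs(r - sent[-1]) >= threshold:
            sent.append(r)
    return sent
```

Stable periods then generate no traffic at all, which is the source of the storage savings the experiments report.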

Mateusz Gołosz, Dariusz Mrozek
Security of Low Level IoT Protocols

The application of formal methods in security is demonstrated. A formalism for describing the security properties of low-level IoT protocols is proposed, based on a security property called infinite-step opacity. We prove some of its basic properties and show its relation to other security notions. Finally, complexity issues of verification and security enforcement are discussed. Timed process algebra is used as the working formalism.

Damas P. Gruska, M. Carmen Ruiz
FogFlow - Computation Organization for Heterogeneous Fog Computing Environments

With the growing number of devices and the amounts of data that Internet of Things systems process nowadays, solutions for computational applications are in high demand. Many concepts targeting more efficient data processing are arising, and among them edge and fog computing are gaining significant interest since they reduce the cloud load. As a consequence, Internet of Things systems are becoming more and more diverse in terms of architecture. In this paper we present FogFlow, a model and execution environment that allows data-flow applications to be organized and run on heterogeneous environments. We propose a unified interface for data-flow creation and a graph model, and we evaluate our concept on the use case of a production line model that mimics a real-world factory scenario.

Joanna Sendorek, Tomasz Szydlo, Mateusz Windak, Robert Brzoza-Woch
Research and Implementation of an Aquaculture Monitoring System Based on Flink, MongoDB and Kafka

With the rapid advancement of intelligent agriculture technology, the application of IoT and sensors in the aquaculture domain is becoming more and more widespread. Traditional relational database management systems cannot store large-scale and diversified sensor data flexibly and scalably. Moreover, sensor stream data usually requires processing with high throughput and low latency. Based on Flink, MongoDB and Kafka, we propose and implement an aquaculture monitoring system. Flink provides a high-throughput, low-latency processing platform for the sensor data. Kafka, as a distributed publish-subscribe messaging system, acquires different sensor data and builds reliable pipelines for transmitting real-time data between applications. MongoDB is suitable for storing diversified sensor data. As a highly reliable and high-performance column-oriented database, HBase is often used in sensor data storage schemes; therefore, using a real aquaculture dataset, the execution efficiency of some common operations is tested and compared between HBase and our solution. The experimental results show that the efficiency of our solution is much higher than that of HBase, providing a feasible solution for sensor data storage and processing in aquaculture.

Yuansheng Lou, Lin Chen, Feng Ye, Yong Chen, Zihao Liu
Enhanced Hydroponic Agriculture Environmental Monitoring: An Internet of Things Approach

Hydroponic cultivation is an agricultural method where nutrients are efficiently provided as mineral nutrient solutions. This modern agriculture sector provides numerous advantages such as efficient location and space requirements, adequate climate control, water saving and controlled nutrient usage. The Internet of Things (IoT) concept assumes that various “things,” which include not only communication devices but also every other physical object on the planet, are going to be connected and controlled across the Internet. Mobile computing technologies in general, and mobile applications in particular, can be assumed as significant methodologies to handle data analytics and data visualisation. Using IoT and mobile computing, it is possible to develop automatic systems for enhanced hydroponic agriculture environmental monitoring. Therefore, this paper presents an IoT monitoring system for hydroponics named iHydroIoT. The solution is composed of a prototype for data collection and an iOS mobile application for data consulting and real-time analytics. The collected data is stored using Plotly, a data analytics and visualisation library. The proposed system monitors temporal changes not only in light, temperature, humidity, CO2, pH and electroconductivity but also in water level, for enhanced hydroponic supervision. iHydroIoT offers real-time notifications to alert the hydroponic farm manager when the conditions are not favourable. Therefore, the system is a valuable tool for hydroponics condition analytics and for supporting decision making on possible interventions to increase productivity. The results reveal that the system can generate a viable hydroponics appraisal, making it possible to anticipate technical interventions that improve agricultural productivity.

Gonçalo Marques, Diogo Aleixo, Rui Pitarma
Noise Mapping Through Mobile Crowdsourcing for Enhanced Living Environments

Environmental noise pollution has a significant impact on health. Noise effects related to annoyance, sleep and cognitive performance in both adults and children are reported in the literature. The smart city concept can be seen as a strategy to mitigate the problems generated by urban population growth and rapid urbanisation. Noise mapping is an important step towards noise pollution reduction. However, noise maps are particularly time-consuming and costly to create, as they are produced with standard methodologies based on specific sources such as road traffic, railway traffic, aircraft and industry. Consequently, current noise maps are significantly imperfect because the underlying noise emission models and sources are extremely limited. Smartphones have considerable processing capabilities as well as several powerful sensors, such as the microphone and GPS. Using the resources of a smartphone along with participatory sensing, a crowdsourcing mobile application can provide environmental noise supervision for enhanced living environments. Crowdsourcing techniques applied to environmental noise monitoring allow reliable noise maps to be created at low cost. This paper presents a mobile crowdsourcing solution for environmental noise monitoring named iNoiseMapping. The environmental noise data is collected through participatory sensing and stored for further analysis. The results show that mobile crowdsourcing offers several enhanced features for environmental noise supervision and analytics. Consequently, this mobile application is a significant decision-making tool for planning interventions for noise pollution reduction.
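Aggregating crowdsourced microphone readings into a noise map requires energy-averaging decibel values rather than taking their arithmetic mean, since dB is a logarithmic scale. A sketch of the standard equivalent continuous sound level (Leq) computation that such aggregation typically uses (the sample values are hypothetical; this is illustrative, not the paper's stated implementation):

```python
import math

def equivalent_level(samples_db):
    """Equivalent continuous sound level (Leq) of a series of dB readings.

    Levels are energy-averaged: Leq = 10 * log10(mean(10^(L/10))),
    so louder samples dominate the result.
    """
    mean_energy = sum(10 ** (level / 10) for level in samples_db) / len(samples_db)
    return 10 * math.log10(mean_energy)

# Hypothetical crowdsourced smartphone readings, in dB.
print(round(equivalent_level([60.0, 60.0, 60.0]), 1))  # → 60.0
print(round(equivalent_level([50.0, 70.0]), 1))  # well above the 60 dB midpoint
```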

Gonçalo Marques, Rui Pitarma
Environmental Quality Supervision for Enhanced Living Environments and Laboratory Activity Support Using IBM Watson Internet of Things Platform

Temperature and humidity are extremely important not only for occupational health and well-being but also for supervising laboratory activities. Laboratories are characterised by several contamination sources which lead to significantly poor indoor quality conditions, so laboratory activities require real-time supervision. Around 40% of the energy consumed worldwide and around 30% of the carbon dioxide released are related to indoor living environments. Further, a substantial amount of this energy is used to provide a satisfactory human perception of thermal conditions. This document presents a temperature and humidity real-time supervision system based on an Internet of Things architecture, named iTemp+. The system incorporates a physical prototype for data acquisition and uses the IBM Watson IoT Platform for data storage and consultation. The IBM Watson IoT Platform provides data integration, security methods, data collection, visualisation, analytics and device management functionalities, allows data to be sent securely to the cloud using the MQTT messaging protocol, and further offers artificial intelligence and blockchain functionalities which are not implemented in concurrent IoT platforms. The results reveal that the IBM Watson IoT Platform offers several enhanced features for device management and analytics and can be used as a powerful approach to provide IEQ supervision.

Gonçalo Marques, Rui Pitarma
Combining Data from Fitness Trackers with Meteorological Sensor Measurements for Enhanced Monitoring of Sports Performance

Systematic analysis of training data has become an inherent element of building athletic condition and preparing for sports competitions. Today’s progress in the development of various fitness trackers, smart wearables and IoT devices allows monitoring the development of athletic abilities not only for professional athletes but also for sports enthusiasts and amateurs. The meteorological conditions prevailing on a given day can significantly affect the effectiveness of training and a person’s abilities during sports competitions. However, in order to properly analyze particular sports achievements and the effectiveness of sports efforts, the training data from body sensors should be appropriately combined with weather sensor data. In this paper, we show that, due to its approximate nature, this process can be implemented using the fuzzy join technique.
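Combining body-sensor records with weather readings is approximate because the two streams are never sampled at exactly the same instants. A minimal sketch of a fuzzy temporal join in that spirit (the record fields, tolerance and triangular membership function are illustrative assumptions, not the paper's exact formulation):

```python
def fuzzy_join(training, weather, tolerance_s=600):
    """Join each fitness-tracker record to the nearest weather reading in time,
    weighting the match by a triangular membership function of the time gap.
    Matches farther apart than `tolerance_s` get membership 0 and are dropped."""
    joined = []
    for rec in training:
        best = min(weather, key=lambda w: abs(w["t"] - rec["t"]))
        gap = abs(best["t"] - rec["t"])
        membership = max(0.0, 1.0 - gap / tolerance_s)
        if membership > 0:
            combined = {**rec, **{k: v for k, v in best.items() if k != "t"}}
            combined["match_degree"] = round(membership, 2)
            joined.append(combined)
    return joined

# Hypothetical records: timestamps in seconds, heart rate vs air temperature.
training = [{"t": 1000, "hr": 142}, {"t": 5000, "hr": 150}]
weather = [{"t": 1120, "temp_c": 17.5}, {"t": 9000, "temp_c": 21.0}]
print(fuzzy_join(training, weather))
```

The `match_degree` lets downstream analysis discount training records whose weather context is only loosely known.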

Anna Wachowicz, Bożena Małysiak-Mrozek, Dariusz Mrozek
Collaborative Learning Agents (CLA) for Swarm Intelligence and Applications to Health Monitoring of System of Systems

A system of systems is the perspective of multiple systems as parts of a larger, more complex system. A system of systems usually comprises highly interacting, interrelated and interdependent sub-systems that form a complex and unified whole. Maintaining the health of such a system of systems requires constant collection and analysis of big data from sensors installed in the sub-systems. The statistical significance of machine learning (ML) and artificial intelligence (AI) applications improves purely due to the increasing size of big data. This positive impact can be a great advantage. However, other challenges arise in processing and learning from big data. Traditional data science, ML and AI methods used in small- or moderate-sized analyses typically require tight coupling of the computations, where an algorithm often executes in a single machine or job and reads all the data at once. Making a generic case for parallel and distributed computing for an ML/AI algorithm on big data proves a difficult task. In this paper, we describe a novel infrastructure, namely collaborative learning agents (CLA), and its application in an operational environment, namely swarm intelligence, where a swarm agent is implemented using a CLA. This infrastructure enables a collection of swarms to work together to fuse heterogeneous big data sources in a parallel and distributed fashion as if they were a single agent. The infrastructure is especially suitable for analyzing data from the Internet of Things (IoT) or a broadly defined system of systems to maintain its well-being or health. As a use case, we describe a data set from the Hack the Machine event, where data science and ML/AI work together to better understand the Navy’s engines, ships and systems of systems. The sensors installed in a distributed environment collect heterogeneous big data. We show how CLA and swarm intelligence can be used to analyze data from a system of systems and quickly examine health and maintenance issues across multiple sensors. The methodology can be applied to a wide range of systems of systems that leverage collaborative, distributed learning agents and AI for automation.

Ying Zhao, Charles C. Zhou
Computationally Efficient Classification of Audio Events Using Binary Masked Cochleagrams

In this work, a computationally efficient technique for acoustic event classification is presented. The approach is based on the cochleagram structure and the identification of dominant time-frequency units. The input signal is split into frames, then the cochleagram is calculated and masked by a set of class masks to determine the most probable audio class. The mask for a given class is calculated from a training set of time-aligned events by selecting the dominant energy parts in the time-frequency plane. The binary mask estimation process thresholds consecutive cochleagrams, computes their sum, and applies a final thresholding to the result, giving the representation of a particular class. All available masks for all classes are checked in sequence to determine the audio event with the highest probability. The proposed technique was verified on a small database of acoustic events specific to surveillance systems. The results show that such an approach can be used in systems with limited computational resources while giving satisfying classification results.
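The masking-and-scoring step described above can be sketched with toy time-frequency matrices: binarize the input cochleagram, then score each class mask by its overlap with the binary map. This is a simplified illustration under stated assumptions (tiny 2x4 planes, made-up class names, a plain overlap ratio as the score), not the paper's exact scoring rule:

```python
def binarize(tf_plane, threshold):
    """Threshold a time-frequency plane into a binary map of dominant units."""
    return [[1 if v >= threshold else 0 for v in row] for row in tf_plane]

def classify(tf_plane, masks, threshold=0.5):
    """Binarize the input cochleagram, score each class mask by the fraction
    of its active units that the input also activates, and return the
    best-matching class."""
    binary = binarize(tf_plane, threshold)

    def overlap(mask):
        hits = sum(b & m for brow, mrow in zip(binary, mask)
                   for b, m in zip(brow, mrow))
        area = sum(sum(row) for row in mask)
        return hits / area if area else 0.0

    return max(masks, key=lambda cls: overlap(masks[cls]))

# Toy 2x4 cochleagrams and class masks (hypothetical; real ones would come
# from gammatone filterbank energies of time-aligned training events).
masks = {"glass_break": [[1, 1, 0, 0], [0, 0, 0, 0]],
         "gunshot":     [[0, 0, 0, 0], [0, 0, 1, 1]]}
event = [[0.9, 0.8, 0.1, 0.0], [0.1, 0.0, 0.2, 0.1]]
print(classify(event, masks))  # → glass_break
```

Because each classification is just thresholding and counting, the per-event cost stays low, which matches the paper's emphasis on limited computational resources.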

Tomasz Maka
Backmatter
Metadata
Title
Computational Science – ICCS 2019
Edited by
Dr. João M. F. Rodrigues
Dr. Pedro J. S. Cardoso
Dr. Jânio Monteiro
Prof. Roberto Lam
Dr. Valeria V. Krzhizhanovskaya
Michael H. Lees
Prof. Jack J. Dongarra
Prof. Dr. Peter M.A. Sloot
Copyright year
2019
Electronic ISBN
978-3-030-22744-9
Print ISBN
978-3-030-22743-2
DOI
https://doi.org/10.1007/978-3-030-22744-9