main-content

## Über dieses Buch

The three-volume set LNCS 10860, 10861 + 10862 constitutes the proceedings of the 18th International Conference on Computational Science, ICCS 2018, held in Wuxi, China, in June 2018.

The total of 155 full and 66 short papers presented in this book set was carefully reviewed and selected from 404 submissions. The papers were organized in topical sections named:

Part I: ICCS Main Track

Part II: Track of Advances in High-Performance Computational Earth Sciences: Applications and Frameworks; Track of Agent-Based Simulations, Adaptive Algorithms and Solvers; Track of Applications of Matrix Methods in Artificial Intelligence and Machine Learning; Track of Architecture, Languages, Compilation and Hardware Support for Emerging ManYcore Systems; Track of Biomedical and Bioinformatics Challenges for Computer Science; Track of Computational Finance and Business Intelligence; Track of Computational Optimization, Modelling and Simulation; Track of Data, Modeling, and Computation in IoT and Smart Systems; Track of Data-Driven Computational Sciences; Track of Mathematical-Methods-and-Algorithms for Extreme Scale; Track of Multiscale Modelling and Simulation

Part III: Track of Simulations of Flow and Transport: Modeling, Algorithms and Computation; Track of Solving Problems with Uncertainties; Track of Teaching Computational Science; Poster Papers

## Inhaltsverzeichnis

### Erratum to: The t-Modified Self-Shrinking Generator

Sara D. Cardell, Amparo Fúster-Sabater

### Optimizing the Efficiency, Vulnerability and Robustness of Road-Based Para-Transit Networks Using Genetic Algorithm

In the developing world, majority of people usually take para-transit services for their everyday commutes. However, their informal and demand-driven operation, like making arbitrary stops to pick up and drop off passengers, has been inefficient and poses challenges to efforts in integrating such services to more organized train and bus networks. In this study, we devised a methodology to design and optimize a road-based para-transit network using a genetic algorithm to optimize efficiency, robustness, and invulnerability. We first generated stops following certain geospatial distributions and connected them to build networks of routes. From them, we selected an initial population to be optimized and applied the genetic algorithm. Overall, our modified genetic algorithm with 20 evolutions optimized the 20% worst performing networks by 84% on average. For one network, we were able to significantly increase its fitness score by 223%. The highest fitness score the algorithm was able to produce through optimization was 0.532 from a score of 0.303.

Briane Paul V. Samson, Gio Anton T. Velez, Joseph Ryan Nobleza, David Sanchez, Jan Tristan Milan

### On the Configuration of Robust Static Parallel Portfolios for Efficient Plan Generation

Automated Planning has achieved a significant step forward in the last decade, and many advanced planning engines have been introduced. Nowadays, increases in computational power are mostly achieved through hardware parallelisation. In view of the increasing availability of multicore machines and of the intrinsic complexity of designing parallel algorithms, a natural exploitation of parallelism is to combine existing sequential planning engines into parallel portfolios.In this work, we introduce three techniques for an automatic configuration of static parallel portfolios of planning engines. The aim of generated portfolios is to provide a good tradeoff performance between coverage and runtime, on previously unseen problems. Our empirical results demonstrate that our techniques for configuring parallel portfolios combine strengths of planning engines, and fully exploit multicore machines.

Mauro Vallati, Lukáš Chrpa, Diane Kitchin

### SDF-Net: Real-Time Rigid Object Tracking Using a Deep Signed Distance Network

In this paper, a deep neural network is used to model the signed distance function (SDF) of a rigid object for real-time tracking using a single depth camera. By leveraging the generalization capability of the neural network, we could better represent the model of the object implicitly. With the training stage done off-line, our proposed methods are capable of real-time performance and running as fast as 1.29 ms per frame on one CPU core, which is suitable for applications with limited hardware capabilities. Furthermore, the memory footprint of our trained SDF-Net for an object is less than 10 kilobytes. A quantitative comparison using public dataset is being carried out and our approach is comparable with the state-of-the-arts. The methods are also tested on actual depth records to evaluate their performance in real-life scenarios.

Prayook Jatesiktat, Ming Jeat Foo, Guan Ming Lim, Wei Tech Ang

### Insider Threat Detection with Deep Neural Network

Insider threat detection has attracted a considerable attention from the researchers and industries. Existing work mainly focused on applying machine-learning techniques to detecting insider threat. However, this work requires “feature engineering” which is difficult and time-consuming. As we know, the deep learning technique can automatically learn powerful features. In this paper, we present a novel insider threat detection method with Deep Neural Network (DNN) based on user behavior. Specifically, we use the LSTM-CNN framework to find user’s anomalous behavior. First, similar to natural language modeling, we use the Long Short Term Memory (LSTM) to learn the language of user behavior through user actions and extract abstracted temporal features. Second, the extracted features are converted to the fixed-size feature matrices and the Convolutional Neural Network (CNN) use these fixed-size feature matrices to detect insider threat. We conduct experiments on a public dataset of insider threats. Experimental results show that our method can successfully detect insider threat and we obtained AUC = 0.9449 in best case.

Fangfang Yuan, Yanan Cao, Yanmin Shang, Yanbing Liu, Jianlong Tan, Binxing Fang

### Pheromone Model Based Visualization of Malware Distribution Networks

We present a novel computational pheromone model for describing dynamic network behaviors in terms of transition, persistency, and hosting. The model consists of a three-dimensional force-directed graph with bi-directional pheromone deposit and decay paths. A data compression algorithm is developed to optimize computational performance. We applied the model for visual analysis of a Malware Distribution Network (MDN), a connected set of maliciously compromised domains used to disseminate malicious software to victimize computers and users. The MDN graphs are extracted from datasets from Google Safe Browsing (GSB) reports with malware attributions from VirusTotal. Our research shows that this novel approach reveals patterns of topological changes of the network over time, including the existence of persistent sub-networks and individual top-level domains critical to the successful operation of MDNs, as well as the dynamics of the topological changes on a daily basis. From the visualization, we observed notable clustering effects, and also noticed life span patterns for high-edge-count malware distribution clusters.

Yang Cai, Jose Andre Morales, Sihan Wang, Pedro Pimentel, William Casey, Aaron Volkmann

### Detecting Wildlife in Unmanned Aerial Systems Imagery Using Convolutional Neural Networks Trained with an Automated Feedback Loop

Using automated processes to detect wildlife in uncontrolled outdoor imagery in the field of wildlife ecology is a challenging task. This is especially true in imagery provided by an Unmanned Aerial System (UAS), where the relative size of wildlife is small and visually similar to its background. This work presents an automated feedback loop which can be used to train convolutional neural networks with extremely unbalanced class sizes, which alleviates some of these challenges. This work utilizes UAS imagery collected by the Wildlife@Home project, which has employed citizen scientists and trained experts to go through collected UAS imagery and classify it. Classified data is used as inputs to convolutional neural networks (CNNs) which seek to automatically mark which areas of the imagery contain wildlife. The output of the CNN is then passed to a blob counter which returns a population estimate for the image. The feedback loop was developed to help train the CNNs to better differentiate between the wildlife and the visually similar background and deal with the disparate amount of wildlife training images versus background training images. Utilizing the feedback loop dramatically reduced population count error rates from previously published work, from +150% to $$-3.93$$% on citizen scientist data and +88% to +5.24% on expert data.

Connor Bowley, Marshall Mattingly, Andrew Barnas, Susan Ellis-Felege, Travis Desell

### Incentive Mechanism for Cooperative Intrusion Detection: An Evolutionary Game Approach

In Mobile Ad-Hoc Networks, cooperative intrusion detection is efficient and scalable to massively parallel attacks. However, due to concerns of privacy leak-age and resource costs, if without enough incentives, most mobile nodes are often selfish and disinterested in helping others to detect an intrusion event, thus an ef-ficient incentive mechanism is required. In this paper, we formulate the incentive mechanism for cooperative intrusion detection as an evolutionary game and achieve an optimal solution to help nodes decide whether to participate in detec-tion or not. Our proposed mechanism can deal with the problems that cooperative nodes do not own complete knowledge about other nodes. We develop a game algorithm to maximize nodes utility. Simulations demonstrate that our strategy can efficiently incentivize potential nodes to cooperate.

Yunchuan Guo, Han Zhang, Lingcui Zhang, Liang Fang, Fenghua Li

### Hybrid Genetic Algorithm for an On-Demand First Mile Transit System Using Electric Vehicles

First/Last mile gaps are a significant hurdle in large scale adoption of public transit systems. Recently, demand responsive transit systems have emerged as a preferable solution to first/last mile problem. However, existing work requires significant computation time or advance bookings. Hence, we propose a public transit system linking the neighborhoods to a rapid transit node using a fleet of demand responsive electric vehicles, which reacts to passenger demand in real-time. Initially, the system is modeled using an optimal mathematical formulation. Owing to the complexity of the model, we then propose a hybrid genetic algorithm that computes results in real-time with an average accuracy of 98%. Further, results show that the proposed system saves travel time up to 19% compared to the existing transit services.

Thilina Perera, Alok Prakash, Chathura Nagoda Gamage, Thambipillai Srikanthan

### Comprehensive Learning Gene Expression Programming for Automatic Implicit Equation Discovery

Implicit equation is loose in form, which makes it more powerful than explicit equation for data regression. The mainstream method for automatic implicit equation discovery is based on calculating derivatives. However, this derivative-based mechanism requires high time consumption and it is difficult to solve problems with sparse data. To solve these deficiencies, this paper proposes a new mechanism named Comprehensive Learning Fitness Evaluation Mechanism (CL-FEM). The mechanism learns knowledge from disturbed information collected from several previously generated stochastic datasets, to check the validity of the equation model. Only the valid equations can be candidates of selection, which is a process to pick out the equation with the smallest output. We integrate the proposed mechanism with the simplified Self-Learning Gene Expression Programming (SL-GEP) and propose the Comprehensive Learning Gene Expression Programming (CL-GEP). The experiment results have demonstrated that CL-GEP can offer very promising performance.

Yongliang Chen, Jinghui Zhong, Mingkui Tan

### Multi-population Genetic Algorithm for Cardinality Constrained Portfolio Selection Problems

Portfolio Selection (PS) is recognized as one of the most important and challenging problems in financial engineering. The aim of PS is to distribute a given amount of investment fund across a set of assets in such a way that the return is maximised and the risk is minimised. To solve PS more effectively and more efficiently, this paper introduces a Multi-population Genetic Algorithm (MPGA) methodology. The proposed MPGA decomposes a large population into multiple populations to explore and exploit the search space simultaneously. These populations evolve independently during the evolutionary learning process. Yet different populations periodically exchange their individuals so promising genetic materials could be shared between different populations. The proposed MPGA method was evaluated on the standard PS benchmark instances. The experimental results show that MPGA can find better investment strategies in comparison with state-of-the-art portfolio selection methods. In addition, the search process of MPGA is more efficient than these existing methods requiring significantly less amount of computation.

Nasser R. Sabar, Ayad Turky, Mark Leenders, Andy Song

### Recognition and Classification of Rotorcraft by Micro-Doppler Signatures Using Deep Learning

Detection and classification of rotorcraft targets are of great significance not only in civil fields but also in defense. However, up to now, it is still difficult for the traditional radar signal processing methods to detect and distinguish rotorcraft targets from various types of moving objects. Moreover, it is even more challenging to classify different types of helicopters. As the development of high-precision radar, classification of moving targets by micro-Doppler features has become a promising research topic in the modern signal processing field. In this paper, we propose to use the deep convolutional neural networks (DCNNs) in rotorcraft detection and helicopter classification based on Doppler radar signals. We apply DCNN directly to raw micro-Doppler spectrograms for rotorcraft detection and classification. The proposed DCNNs can learn the features automatically from the micro-Doppler signals without introducing any domain background knowledge. Simulated data are used in the experiments. The experimental results show that the proposed DCNNs achieve superior accuracy in rotorcraft detection and superior accuracy in helicopter classification, outperforming the traditional radar signal processing methods.

Ying Liu, Jinyi Liu

### Data Allocation Based on Evolutionary Data Popularity Clustering

This study is motivated by the high-energy physics experiment ATLAS, one of the four major experiments at the Large Hadron Collider at CERN. ATLAS comprises 130 data centers worldwide with datasets in the Petabyte range. In the processing of data across the grid, transfer delays and subsequent performance loss emerged as an issue. The two major costs are the waiting time until input data is ready and the job computation time. In the ATLAS workflows, the input to computational jobs is based on grouped datasets. The waiting time stems mainly from WAN transfers between data centers when job properties require execution at one data center but the dataset is distributed among multiple data centers. The proposed novel data allocation algorithm redistributes the constituent files of datasets such that the job efficiency is increased in terms of a cost metric. An evolutionary algorithm is proposed that addresses the data allocation problem in a network based on data popularity and clustering. The number of expected job’s file transfers is used as the target metric and it is shown that job waiting times can be decreased by faster input data readiness.

Ralf Vamosi, Mario Lassnig, Erich Schikuta

### Hyper-heuristic Online Learning for Self-assembling Swarm Robots

A robot swarm is a solution for difficult and large scale tasks. However, controlling and coordinating a swarm of robots is challenging, because of the complexity and uncertainty of the environment where manual programming of robot behaviours is often impractical. In this study we propose a hyper-heuristic methodology for swarm robots. It allows robots to create suitable actions based on a set of low-level heuristics, where each heuristic is a behavioural element. With online learning, the robot behaviours can be improved during execution by autonomous heuristic adjustment. The proposed hyper-heuristic framework is applied to surface cleaning tasks on buildings where multiple separate surfaces exist and complete surface information is difficult to obtain. Under this scenario, the robot swarm not only needs to clean the surfaces efficiently by distributing the robots, but also to move across surfaces by self-assembling into a bridge structure. Experimental results showed the effectiveness of the hyper-heuristic framework; the same group of robots was able to autonomously deal with multiple surfaces of different layouts. Their behaviours can improve over time because of the online learning mechanism.

Shuang Yu, Aldeida Aleti, Jan Carlo Barca, Andy Song

### An Innovative Heuristic for Planning-Based Urban Traffic Control

The global growth in urbanisation increases the demand for services including road transport infrastructure, presenting challenges in terms of mobility. In this scenario, optimising the exploitation of urban road network is a pivotal challenge, particularly in the case of unexpected situations. In order to tackle this challenge, approaches based on mixed discrete-continuous planning have been recently proposed and although their feasibility has been demonstrated, there is a lack of informative heuristics for this class of applications. Therefore, existing approaches tend to provide low-quality solutions, leading to a limited impact of generated plans on the actual urban infrastructure.In this work, we introduce the Time-Based heuristic: a highly informative heuristic for PDDL+ planning-based urban traffic control. The heuristic, which has an admissible and an inadmissible variant, has been evaluated considering scenarios that use real-world data.

Santiago Franco, Alan Lindsay, Mauro Vallati, Thomas Lee McCluskey

### Automatic Web News Extraction Based on DS Theory Considering Content Topics

In addition to the news content, most news web pages also contain various noises, such as advertisements, recommendations, and navigation panels. These noises may hamper the studies and applications which require pre-processing to extract the news content accurately. Existing methods of news content extraction mostly rely on non-content features, such as tag path, text layout, and DOM structure. However, without considering topics of the news content, these methods are difficult to recognize noises whose external characteristics are similar to those of the news content. In this paper, we propose a method that combines non-content features and a topic feature based on Dempster-Shafer (DS) theory to increase the recognition accuracy. We use maximal compatibility blocks to generate topics from text nodes and then obtain feature values of topics. Each feature is converted into evidence for the DS theory which can be utilized in the uncertain information fusion. Experimental results on English and Chinese web pages show that combining the topic feature by DS theory can improve the extraction performance obviously.

Kaihang Zhang, Chuang Zhang, Xiaojun Chen, Jianlong Tan

### DomainObserver: A Lightweight Solution for Detecting Malicious Domains Based on Dynamic Time Warping

People use the Internet to shop, access information and enjoy entertainment by browsing web sites. At the same time, cyber-criminals operate malicious domains to spread illegal content, which poses a great risk to the security of cyberspace. Therefore, it is of great importance to detect malicious domains in the field of cyberspace security. Typically, there are broad research focusing on detecting malicious domains either by blacklist or learning the features. However, the former is infeasible due to its unpredictability of unknown malicious domains, and the later requires complex feature engineering. Different from most of previous methods, in this paper, we propose a novel lightweight solution named DomainObserver to detect malicious domains. Our technique of DomainObserver is based on dynamic time warping that is used to better align the time series. To the best of our knowledge, it is a new trial to apply passive traffic measurements and time series data mining to malicious domain detection. Extensive experiments on real datasets are performed to demonstrate the effectiveness of our proposed method.

Guolin Tan, Peng Zhang, Qingyun Liu, Xinran Liu, Chunge Zhu

### You Have More Abbreviations Than You Know: A Study of AbbrevSquatting Abuse

Domain squatting is a speculative behavior involving the registration of domain names that are trademarks belonging to popular companies, important organizations or other individuals, before the latters have a chance to register. This paper presents a specific and unconcerned type of domain squatting called “AbbrevSquatting”, the phenomena that mainly happens on institutional websites. As institutional domain names are usually named with abbreviations (i.e., short forms) of the full names or official titles of institutes, attackers can mine abbreviation patterns from existed pairs of abbreviations and full names, and register forged domain names with unofficial but meaningful abbreviations for a given institute. To measure the abuse of AbbrevSquatting, we first mine the common abbreviation patterns used in institutional domain names, and generate potential AbbrevSquatting domain names with a data set of authoritative domains. Then, we check the maliciousness of generated domains with a public API and seven different blacklists, and group the domains into several categories with crawled data. Through a series of manual and automated experiments, we discover that attackers have already been aware of the principles of AbbrevSquatting and are monetizing them in various unethical and illegal ways. Our results suggest that AbbrevSquatting is a real problem that requires more attentions from security communities and institutions’ registrars.

Pin Lv, Jing Ya, Tingwen Liu, Jinqiao Shi, Binxing Fang, Zhaojun Gu

### Large Scale Retrieval of Social Network Pages by Interests of Their Followers

Social networks provide an opportunity to form communities of people that share their interests on a regular basis (circles of fans of different music, books, kinds of sports, etc.). Every community manifests these interests creating lots of linguistic data to attract new followers to certain pages and support existing clusters of users. In the present article, we suggest a model of retrieving such pages that attract users with similar interests, from a large collection of pages. We test our model on three types of pages manually retrieved from the social network Vkontakte and classified as interesting for a. football fans, b. vegetarians, c. historical reenactors. We use such machine learning classifiers as Naive Bayes, SVM, Logistic Regression, Decision Trees to compare their performance with the performance of our system. It appears that the mentioned classifiers can hardly retrieve (i.e. single out) pages with a particular interest that form a small collection of 30 samples from a collection as large as 4,090 samples. In particular, our system exceeds their best result (F1-score = 0.65) and achieves F1-score of 0.72.

Elena Mikhalkova, Yuri Karyakin, Igor Glukhikh

### Parallel Data-Driven Modeling of Information Spread in Social Networks

Models of information spread in social networks are widely used to explore the drivers of content contagion and to predict the effect of new information messages. Most of the existing models (aggregated as SIR-like or network-based as independent cascades) use the assumption of homogeneity of an audience. However, to make a model plausible for a description of real-world processes and to measure the accumulated impact of information on individuals, one needs to personalize the characteristics of users as well as sources of information. In this paper, we propose an approach to data-driven simulation of information spread in social networks which combines a set of different models in a unified framework. It includes a model of a user (including sub-models of reaction and daily activity), a model of message generation by information source and a model of message transfer within a user network. The parameters of models (e.g. for different types of agents) are identified by data from the largest Russian social network vk.com. For this study, we collected the network of users associated with charity community (~33.7 million nodes). To tackle with huge size of networks, we implemented parallel version of modeling framework and tested it on the Lomonosov supercomputer. We identify key parameters of models that may be tuned to reproduce observable behavior and show that our approach allows to simulate aggregated dynamics of reactions to a series of posts as a combination of individual responses.

Oksana Severiukhina, Klavdiya Bochenina, Sergey Kesarev, Alexander Boukhanovsky

### Topology of Thematic Communities in Online Social Networks: A Comparative Study

The network structure of communities in social media significantly affects diffusion processes which implement positive or negative information influence on social media users. Some of the thematic communities in online social networks may provide illegal services or information in them may cause undesired psychological effects; moreover, the topology of such communities and behavior of their members are influenced by a thematic. Nevertheless, recent research does not contain enough detail about the particularities of thematic communities formation, or about the topological properties of underlying friendship networks. To address this gap, in this study we analyze structure of communities of different types, namely, carders, commercial sex workers, substance sellers and users, people with radical political views, and compare them to the ‘normal’ communities (without a single narrow focus). We discovered that in contrast to ordinary communities which have positive assortativity (as expected for social networks), specific thematical communities are significantly disassortative. Types of anomalous communities also differ not only in content but in structure. The most specific are the communities of radicalized individuals: it was shown that they have the highest connectivity and the larger part of nodes within a friendship graph.

Valentina Guleva, Danila Vaganov, Daniil Voloshin, Klavdia Bochenina

### Topological Street-Network Characterization Through Feature-Vector and Cluster Analysis

Complex networks provide a means to describe cities through their street mesh, expressing characteristics that refer to the structure and organization of an urban zone. Although other studies have used complex networks to model street meshes, we observed a lack of methods to characterize the relationship between cities by using their topological features. Accordingly, this paper aims to describe interactions between cities by using vectors of topological features extracted from their street meshes represented as complex networks. The methodology of this study is based on the use of digital maps. Over the computational representation of such maps, we extract global complex-network features that embody the characteristics of the cities. These vectors allow for the use of multidimensional projection and clustering techniques, enabling a similarity-based comparison of the street meshes. We experiment with 645 cities from the Brazilian state of Sao Paulo. Our results show how the joint of global features describes urban indicators that are deep-rooted in the network’s topology and how they reveal characteristics and similarities among sets of cities that are separated from each other.

Gabriel Spadon, Gabriel Gimenes, Jose F. Rodrigues

### A Distance-Based Tool-Set to Track Inconsistent Urban Structures Through Complex-Networks

Complex networks can be used for modeling street meshes and urban agglomerates. With such a model, many aspects of a city can be investigated to promote a better quality of life to its citizens. Along these lines, this paper proposes a set of distance-based pattern-discovery algorithmic instruments to improve urban structures modeled as complex networks, detecting nodes that lack access from/to points of interest in a given city. Furthermore, we introduce a greedy algorithm that is able to recommend improvements to the structure of a city by suggesting where points of interest are to be placed. We contribute to a thorough process to deal with complex networks, including mathematical modeling and algorithmic innovation. The set of our contributions introduces a systematic manner to treat a recurrent problem of broad interest in cities.

### A Conceptual Framework for Social Movements Analytics for National Security

Social media tools have changed our world due to the way they convey information between individuals; this has led to many social movements either starting on social media or being organised and managed through this medium. At times however, certain human-induced events can trigger Human Security Threats such as Personal Security, Health Security, Economic Security or Political Security. The aim of this paper is to propose a holistic Data Analysis Framework for examining Social Movements and detecting pernicious threats to National Security interests. As a result of this, the proposed framework focuses on three main stages of an event (Detonating Event, Warning Period and Crisis Interpretation) to provide timely additional insights, enabling policy makers, first responders, and authorities to determine the best course of action. The paper also outlines the possible computational techniques utilised to achieve in depth analysis at each stage. The robustness and effectiveness of the framework are demonstrated by dissecting Warning Period scenarios, from real-world events, where the increase of Human Security aspects were key to identifying likely threats to National Security.

Pedro Cárdenas, Georgios Theodoropoulos, Boguslaw Obara, Ibad Kureshi

### Retweet Prediction Using Social-Aware Probabilistic Matrix Factorization

Retweet prediction is a fundamental and crucial task in social networking websites as it may influence the process of information diffusion. Existing prediction approaches simply ignore social contextual information or don’t take full advantage of these potential factors, damaging the performance of prediction. Besides, the sparsity of retweet data also severely disturb the performance of these models. In this paper, we propose a novel retweet prediction model based on probabilistic matrix factorization method by integrating the observed retweet data, social influence and message semantic to improve the accuracy of prediction. Finally, we incorporate these social contextual regularization terms into the objective function. Comprehensive experiments on the real-world dataset clearly validate both the effectiveness and efficiency of our model compared with several state-of the-art baselines.

Bo Jiang, Zhigang Lu, Ning Li, Jianjun Wu, Zhengwei Jiang

### Cascading Failure Based on Load Redistribution of a Smart Grid with Different Coupling Modes

As one of the most important properties of the power grid, the voltage load plays an important role in the cascading failure of the smart grid and load redistribution can accelerate the speed of the failure by triggering more nodes to overload and fail. The subnet structure and different coupling modes also affect the robustness of the smart grid. However, the research on the effect of load, subnet structure and coupling mode on the cascading failure of the smart grid is still rare. In this paper, the smart grid with two-way coupling link consists of a power grid with small world topology and a communication network with scale-free topology. An improved load-capacity model is applied to overload-induced failure in the power grid and node importance (NI) is used as an evaluation index to assess the effect of nodes on the power grid and communication network. We propose three kinds of coupling modes based on NI of nodes between the cyber and physical subnets, i.e., Random Coupling in Subnets (RCIS), Assortative Coupling in Subnets (ACIS) and Disassortative Coupling in Subnets (DCIS). In order to improve the robustness of the smart grid, a cascading failure model based on load redistribution is proposed to analyze the influence of different coupling modes on the cascading failure of the smart grid under both a targeted attack and random attack. Some findings are summarized as: (I) The robustness of the smart grid is improved by increasing the tolerance $$\alpha$$. (II) ACIS applied to the bottom-up coupling link is more beneficial in enhancing the robustness of the smart grid than DCIS and RCIS, regardless of a targeted attack or random attack.

WenJie Kang, PeiDong Zhu, Gang Hu

### Measuring Social Responsiveness for Improved Handling of Extreme Situations

Volunteering and community reaction is known to be an essential part of response to critical events. Rapid evolution and emergence of the new means of communication allowed even further expansion of these practices via the medium of the social networks. A new category of volunteers emerged – those that are not in the proximity to the area of emergency but willing to help the affected. Widely known as digital volunteers, they help aggregate, disseminate and distribute information to increase and maintain the awareness of stakeholders and resourceful individuals about the situation. There has been an upsurge of investigations of roles, timelines and aggregate characteristics of emergent communication. Compared to that, characteristics of crisis-related social media posts that predict wider social response to date have been studied modestly. In this research we are studying the process of reaction of potential digital volunteers to different extreme situations in a social media platform.

Nikolay Butakov, Timur Fatkulin, Daniil Voloshin

### A Computational Model-Based Framework to Plan Clinical Experiments – An Application to Vascular Adaptation Biology

Several computational models have been developed in order to improve the outcome of Vein Graft Bypasses in response to arterial occlusions and they all share a common property: their accuracy relies on a winning choice of the coefficients’ value related to biological functions that drive them. Our goal is to optimize the retrieval of these unknown coefficients on the base of experimental data and accordingly, as biological experiments are noisy in terms of statistical analysis and the models are typically stochastic and complex, this work wants first to elucidate which experimental measurements might be sufficient to retrieve the targeted coefficients and second how many specimens would constitute a good dataset to guarantee a sufficient level of accuracy. Since experiments are often costly and time consuming, the planning stage is critical to the success of the operation and, on the base of this consideration, the present work shows how, thanks to an ad hoc use of a computational model of vascular adaptation, it is possible to estimate in advance the entity and the quantity of resources needed in order to efficiently reproduce the experimental reality.

Stefano Casarin, Scott A. Berceli, Marc Garbey

### Accelerating Data Analysis in Simulation Neuroscience with Big Data Technologies

Important progress in computational sciences has been made possible recently thanks to the increasing computing power of high performance systems. Following this trend, larger scientific studies, like brain tissue simulations, will continue to grow in the future. In addition to the challenges of conducting these experiments, we foresee an explosion of the amount of data generated and the consequent unfeasibility of analyzing and understanding the results with the current techniques.This paper proposes Neurolytics, a new data analysis framework, together with a new data layout. The implementation of Neurolytics is mainly focused on simulation neuroscience, although we believe that our design can be applied to other science domains. The framework relies on big data technologies, like Apache Spark, to enable fast, reliable and distributed analyses of brain simulation data in this case. Our experimental evaluation on a cluster of 100 nodes shows that Neurolytics gets up to 374x speed-up compared to a thread-parallel Python implementation and is on par with a highly optimized Spark code. This demonstrates the suitability of our proposal to help scientists structure and understand the results of their experiments in a fast and efficient way.

Judit Planas, Fabien Delalondre, Felix Schürmann

### Spiral Wave Drift Induced by High-Frequency Forcing. Parallel Simulation in the Luo–Rudy Anisotropic Model of Cardiac Tissue

Non-linear waves occur in various physical, chemical and biological media. One of the most important examples is electrical excitation waves in the myocardium, which initiate contraction of the heart. Abnormal wave propagation in the heart, such as the formation of spiral waves, causes dangerous arrhythmias, and thus methods of elimination of such waves are of great interest. One of the most promising methods is so-called low-voltage cardioversion and defibrillation, which is believed to be achieved by inducing the drift and disappearance of spiral waves using external high-frequency electrical stimulation of the heart. In this paper, we perform a computational analysis of the interaction of spiral waves and trains of high-frequency plane waves in 2D models of cardiac tissue. We investigate the effectiveness and safety of the treatment. We also identify the dependency of drift velocity on the period of plane waves. The simulations were carried out using a parallel computing system with OpenMP technology.

Timofei Epanchintsev, Sergei Pravdin, Alexander Panfilov

### Understanding Malaria Induced Red Blood Cell Deformation Using Data-Driven Lattice Boltzmann Simulations

Malaria remains a deadly disease that affected millions of people in 2016. Among the five Plasmodium (P.) parasites which contribute to malaria diseases in humans. P. falciparum is a lethal one which is responsible for the majority of the world-wide-malaria-related deaths. Since the banana-shaped stage V gametocytes play a crucial role in disease transmission, understanding the deformation of single stage V gametocytes may offer deeper insights into the development of the disease and provide possible targets for new treatment methods. In this study we used lattice Boltzmann-based simulations to investigate the effects of the stretching forces acting on infected red blood cells inside a slit-flow cytometer. The parameters that represent the cellular deformability of healthy and malaria infected red blood cells are chosen such that they mimic the deformability of these cells in a slit-flow cytometer. The simulation results show good agreement with experimental data and allow for studying the transportation of malaria infected red blood cell in blood circulation.

Joey Sing Yee Tan, Gábor Závodszky, Peter M. A. Sloot

### Towards Model-Based Policy Elaboration on City Scale Using Game Theory: Application to Ambulance Dispatching

The paper presents early results on the development of a generalized approach for modeling and analysis of the interaction of multiple stakeholders in city environment while providing services to citizens under the regulation of city authorities. The approach considers the interaction between main stakeholders (organizations of various kind, citizens, and city authorities) including information and finances exchange, activities taken and services or goods provided. The developed approach is based on a combination of game-theoretic modeling and simulation of service providers interaction. Such combination enables consideration of confronting stakeholders as well as determined (e.g., scheduled) and stochastic variation in characteristics of system’s elements. The goal of this approach development is supporting of analysis and optimization of city-level regulation through legislative, financial, and informational interaction with organizations and environment of a city. An example of ambulance dispatching during providing emergent care for acute coronary syndrome (ACS) patients is considered. The example is analyzed in a simplified linear case and in practical application to dispatching ambulances providing service for ACS patients in Saint Petersburg.

Sergey V. Kovalchuk, Mariia A. Moskalenko, Alexey N. Yakovlev

### Elucidation of Mechanism for Reducing Porosity in Electric Arc Spraying Through CFD

We elucidated the mechanism for reducing the porosity (a means for achieving smaller globules) through Computational Fluid Dynamics while focusing on the flow of compressed air. A simulation study revealed that a spray gun nozzle comprising a flow splitting plate located upstream of the arc point in the nozzle produces compression waves whereby the flow field made in the nozzle differs substantially from that made in a conventional, plate-less nozzle. Observation using a high-speed camera showed that smaller particles of the molten metal (globules) were made due to the plate, which means that the compression waves generated upstream of the arc point affect the formation of globules at the arc point.

Ryoji Tamaki, Masashi Yamakawa

### nSharma: Numerical Simulation Heterogeneity Aware Runtime Manager for OpenFOAM

Roberto Ribeiro, Luís Paulo Santos, João Miguel Nóbrega

### High Performance Computational Hydrodynamic Simulations: UPC Parallel Architecture as a Future Alternative

Developments in high performance computing (HPC) has today transformed the manner of how computational hydrodynamic (CHD) simulations are performed. Till now, the message passing interface (MPI) remains the common parallelism architecture and has been adopted widely in CHD simulations. However, its bottleneck problem remains for some large-scale simulation cases due to delays during message passing whereby the total communication time may exceed the total simulation runtime with an increasing number of computer processers. In this study, we utilise an alternative parallelism architecture, known as PGAS-UPC, to develop our own UPC-CHD model with a 2-step explicit scheme from the Lax-Wendroff family of predictors-correctors. The model is evaluated on three incompressible, adiabatic viscous 2D flow cases having moderate flow velocities. Model validation is achieved by the reasonably good agreement between the predicted and respective analytical values. We then compare the computational performance between UPC-CHD and that of MPI in its base design in a SGI UV-2000 server till 100 processers maximum in this study. The former achieves a near 1:1 speedup which demonstrates its efficiency potential for very large-scale CHD simulations, while the later experiences slowdown at some point. Extension of UPC-CHD remains our main objective which can be achieved by the following additions: (a) inclusions of other numerical schemes to accommodate for other types of fluid simulations, and (b) coupling UPC-CHD with Amazon Web Service (AWS) to further exploit its parallelism efficiency as a viable alternative.

Alvin Wei Ze Chew, Tung Thanh Vu, Adrian Wing-Keung Law

### Classifying Aircraft Approach Type in the National General Aviation Flight Information Database

This work details the development of the “Go-Around Detection Tool”, a tool for classification of aircraft approach types for the National General Aviation Flight Information Database (NGAFID). The NGAFID currently houses over 700,000 h of per-second time series flight data recorder readings generated by over 400,000 flights from 8 fleet of aircraft and over 190 participating private individuals. The approach phase of flight is one of the most dangerous, and classifying types of approaches as stable or unstable, and if they were a go-around, touch-and-go, or stop-and-go is an especially important issue for flight safety monitoring programs. As General Aviation typically lacks the Weight on Wheels (WoW) technology and many others that exist within Commercial Aviation, there is difficulty in detecting landings and go-arounds as these need to be inferred somehow from the raw flight data. The developed application uses several airplane parameters reported by a flight data recorder and successfully detects go-arounds, touch-and-go landings, and stop-and-go landings as either stable or unstable with an accuracy of 98.14%. The application was tested using 100 student flights from the NGAFID, which generated 377 total approaches. Out of those approaches, 25.73% were classified as unstable. It was found that only 20.62% of all unstable approaches resulted with a go-around, which is far from the ideal 100% goal. Lastly, the application was parallelized and found to have a 9.75x speedup in doing so. The Go-Around Detection Tool can be used to provide post-flight statistics and user-friendly graphs on both an organizational- and individual-level for educational purposes. It is capable of assisting both new and experienced pilots for the safety of themselves, their organization, and General Aviation as a whole.

Kelton Karboviak, Sophine Clachar, Travis Desell, Mark Dusenbury, Wyatt Hedrick, James Higgins, John Walberg, Brandon Wild

### On Parametric Excitation for Exploration of Lava Tubes and Caves

Huge lava tubes with an approximate diameter of 65–225 m were found on the surfaces of Moon and Mars in the late 2000’s. It has been argued that the interiors of the caves are spacious, and are suitable to build artificial bases with habitable features such as constant temperature, as well as protection from both meteorites and harmful radiation. In line of the above, a number of studies which regard the soft landing mechanisms on the bottom of the lava tubes have been proposed. In this paper, aiming to extend the ability to explore arbitrary surface caves, we propose a mechanism which is able to reach the ceiling of lava tubes. The basic concept of our proposed mechanism consists of a rover connected to an oscillating sample-gatherer, wherein the rover is able to adjust the length of the rope parametrically to increase the deflection angle by considering periodic changes in the pivot, and thus to ease the collection of samples by hitting against the ceiling of the cave. Relevant simulations confirmed our theoretical observations which predict the increase of deflection angle by periodically winding and rewinding the rope according to pivotal variations. We believe the our proposed approach brings the building blocks to enable finer control of exploration mechanisms of lava tubes and narrow environments.

Victor Parque, Masato Kumai, Satoshi Miura, Tomoyuki Miyashita

### Global Simulation of Planetary Rings on Sunway TaihuLight

In this paper, we report the implementation and measured performance of a global simulation of planetary rings on Sunway TaihuLight. The basic algorithm is the Barnes-Hut tree, but we have made a number of changes to achieve good performance for extremely large simulations on machines with an extremely large number of cores. The measured performance is around 35% of the theoretical peak. The main limitation comes from the performance of the interaction calculation kernel itself, which is currently around 50%.

Masaki Iwasawa, Long Wang, Keigo Nitadori, Daisuke Namekata, Takayuki Muranushi, Miyuki Tsubouchi, Junichiro Makino, Zhao Liu, Haohuan Fu, Guangwen Yang

### Parallel Performance Analysis of Bacterial Biofilm Simulation Models

Modelling and simulation of bacterial biofilms is a computationally expensive process necessitating use of parallel computing. Fluid dynamics and advection-consumption models can be decoupled and solved to handle the fluid-solute-bacterial interactions. Data exchange between the two processes add up to the communication overheads. The heterogenous distribution of bacteria within the simulation domain further leads to non-uniform load distribution in the parallel system. We study the effect of load imbalance and communication overheads on the overall performance of simulation at different stages of biofilm growth. We develop a model to optimize the parallelization procedure for computing the growth dynamics of bacterial biofilms.

M. V. Sheraton, Peter M. A. Sloot

### Parallel Solutions to the k-difference Primer Problem

This paper presents parallel solutions to the k-difference primer problem, targeting multicore processors and GPUs. This problem consists of finding the shortest substrings of one sequence with at least k differences from another sequence. The sequences found in the solution are candidate regions to contain primers used by biologists to amplify a DNA sequence in laboratory. To the authors’ knowledge, these are the first parallel solutions proposed for the k-difference primer problem. We identified two forms, coarse- and fine-grained, of exploiting parallelism while solving the problem. Several optimizations were applied to the solutions, such as synchronization overhead reduction, tiling, and speculative prefetch, allowing the analysis of very long sequences in a reduced execution time. In an experimental performance evaluation using real DNA sequences, the best OpenMP (in a quad-core processor) and CUDA solutions produced speedups up to 5.6 and 72.8, respectively, when compared to the best sequential solution. Even when the sequences length and the number of differences k increase, the performance is not affected. The best sequential, OpenMP, and CUDA solutions achieved the throughput of 0.16, 0.94, and 11.85 billions symbol comparisons per second, respectively, emphasizing the performance gain of the CUDA solution, which reached 100% of GPU occupancy.

Leandro Feuser, Nahri Moreano

### RT-DBSCAN: Real-Time Parallel Clustering of Spatio-Temporal Data Using Spark-Streaming

Clustering algorithms are essential for many big data applications involving point-based data, e.g. user generated social media data from platforms such as Twitter. One of the most common approaches for clustering is DBSCAN. However, DBSCAN has numerous limitations. The algorithm itself is based on traversing the whole dataset and identifying the neighbours around each point. This approach is not suitable when data is created and streamed in real-time however. Instead a more dynamic approach is required. This paper presents a new approach, RT-DBSCAN, that supports real-time clustering of data based on continuous cluster checkpointing. This approach overcomes many of the issues of existing clustering algorithms such as DBSCAN. The platform is realised using Apache Spark running over large-scale Cloud resources and container based technologies to support scaling. We benchmark the work using streamed social media content (Twitter) and show the advantages in performance and flexibility of RT-DBSCAN over other clustering approaches.

Yikai Gong, Richard O. Sinnott, Paul Rimba

### GPU-Based Implementation of Ptycho-ADMM for High Performance X-Ray Imaging

X-ray imaging allows biologists to retrieve the atomic arrangement of proteins and doctors the capability to view broken bones in full detail. In this context, ptychography has risen as a reference imaging technique. It provides resolutions of one billionth of a meter, macroscopic field of view, or the capability to retrieve chemical or magnetic contrast, among other features. The goal is to reconstruct a 2D visualization of a sample from a collection of diffraction patterns generated from the interaction of a light source with the sample. The data collected is typically two orders of magnitude bigger than the final image reconstructed, so high performance solutions are normally desired. One of the latest advances in ptychography imaging is the development of Ptycho-ADMM, a new ptychography reconstruction algorithm based on the Alternating Direction Method of Multipliers (ADMM). Ptycho-ADMM provides faster convergence speed and better quality reconstructions, all while being more resilient to noise in comparison with state-of-the-art methods. The downside of Ptycho-ADMM is that it requires additional computation and a larger memory footprint compared to simpler solutions. In this paper we tackle the computational requirements of Ptycho-ADMM, and design the first high performance multi-GPU solution of the method. We analyze and exploit the parallelism of Ptycho-ADMM to make use of multiple GPU devices. The proposed implementation achieves reconstruction times comparable to other GPU-accelerated high performance solutions, while providing the enhanced reconstruction quality of the Ptycho-ADMM method.

Pablo Enfedaque, Huibin Chang, Hari Krishnan, Stefano Marchesini

### Elastic CPU Cap Mechanism for Timely Dataflow Applications

Sudden surges in the incoming workload can cause adverse consequences on the run-time performance of data-flow applications. Our work addresses the problem of limiting CPU associated with the elastic scaling of timely data-flow (TDF) applications running in a shared computing environment while each application can possess a different quality of service (QoS) requirement. The key argument here is that an unwise consolidation decision to dynamically scale up/out the computing resources for responding to unexpected workload changes can degrade the performance of some (if not all) collocated applications due to their fierce competition getting the shared resources (such as the last level cache). The proposed solution uses a queue-based model to predict the performance degradation of running data-flow applications together. The problem of CPU cap adjustment is addressed as an optimization problem, where the aim is to reduce the quality of service violation incidents among applications while raising the CPU utilization level of server nodes as well as preventing the formation of bottlenecks due to the fierce competition among collocated applications. The controller uses and efficient dynamic method to find a solution at each round of the controlling epoch. The performance evaluation is carried out by comparing the proposed controller against an enhanced QoS-aware version of round robin strategy which is deployed in many commercial packages. Experimental results confirmed that the proposed solution improves QoS satisfaction by near to 148% on average while it can reduce the latency of processing data records for applications in the highest QoS classes by near to 19% during workload surges.

M. Reza Hoseinyfarahabady, Nazanin Farhangsadr, Albert Y. Zomaya, Zahir Tari, Samee U. Khan

### Blockchain-Based Transaction Integrity in Distributed Big Data Marketplace

Today Big Data occupies a crucial part of scientific research areas as well as in the business analysis of large companies. Each company tries to find the best way to make generated Big Data sets valuable and profitable. However, in most cases, companies have not enough opportunities and budget to solve this complex problem. On the other hand, there are companies (i.e., in insurance or banking) that can significantly improve their business organization by applying hidden knowledge extracted from such massive data. This situation leads to the necessity of building a platform for exchange, processing, and sale of collected Big Data sets. In this paper, we propose a distributed big data platform that implements digital data marketplace based on the blockchain mechanism for data transaction integrity.

Denis Nasonov, Alexander A. Visheratin, Alexander Boukhanovsky

### Workload Characterization and Evolutionary Analyses of Tianhe-1A Supercomputer

Currently, supercomputer systems face a variety of application challenges, including high-throughput, data-intensive, and stream-processing applications. At the same time, there is more challenge to improve user satisfaction at the supercomputers such as Tianhe-1A, Tianhe-2 and TaihuLight, because of the commercial service model. It is important to understand HPC workloads and their evolution to facilitate informed future research and improve user satisfaction.In this paper, we present a methodology to characterize workloads on the commercial supercomputer (users need to pay), at a particular period and its evolution over time. We apply this method to the workloads of Tianhe-1A at the National Supercomputer Center in Tianjin. This paper presents the concept of quota-constrained waiting time for the first time, which has significance for optimizing scheduling and enhancing user satisfaction on the commercial supercomputer.

Jinghua Feng, Guangming Liu, Jian Zhang, Zhiwei Zhang, Jie Yu, Zhaoning Zhang

### The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques

As parallel computers approach exascale, power efficiency in high-performance computing (HPC) systems is of increasing concern. Exploiting both the hardware features and algorithms is an effective solution to achieve power efficiency, and to address the energy constraints in modern and future HPC systems. In this work, we present a novel design and implementation of an energy-efficient solution for dense linear systems of equations, which are at the heart of large-scale HPC applications. The proposed energy-efficient linear system solvers are based on two main components: (1) iterative refinement techniques, and (2) reduced-precision computing features in modern accelerators and coprocessors. While most of the energy efficiency approaches aim to reduce the consumption with a minimal performance penalty, our method improves both the performance and the energy efficiency. Compared to highly-optimized linear system solvers, our kernels deliver the same accuracy solution up to $$2\times$$ faster and reduce the energy consumption up to half on Intel Knights Landing (KNL) architectures. By efficiently using the Tensor Cores available in the NVIDIA V100 PCIe GPUs, the speedups can be up to $$4\times$$, with more than 80% reduction in the energy consumption.

Azzam Haidar, Ahmad Abdelfattah, Mawussi Zounon, Panruo Wu, Srikara Pranesh, Stanimire Tomov, Jack Dongarra

### Design of Parallel BEM Analyses Framework for SIMD Processors

Parallel Boundary Element Method (BEM) analyses are typically conducted using a purpose-built software framework called BEM-BB. This framework requires a user-defined function program that calculates the i-th row and the j-th column of the coefficient matrix arising from the convolution integral term in the fundamental BEM equation. Owing to this feature, the framework can encapsulate MPI and OpenMP hybrid parallelization with $$\mathcal {H}$$-matrix approximation. Therefore, users can focus on implementing a fundamental solution or a Green’s function, which is the most important element in BEM and depends on the targeted physical phenomenon, as a user-defined function. However, the framework does not consider single instruction multiple data (SIMD) vectorization, which is important for high-performance computing and is supported by the majority of existing processors. Performing SIMD vectorization of a user-defined function is difficult because SIMD exploits instruction-level parallelization and is closely associated with the user-defined function. In this paper, a conceptual framework for enhancing SIMD vectorization is proposed. The proposed framework is evaluated using two BEM problems, namely, static electric field analysis with a perfect conductor and static electric field analysis with a dielectric, on Intel Broadwell (BDW) processor and Intel Xeon Phi Knights Landing (KNL) processor. It offers good vectorization performance with limited SIMD knowledge, as can be verified from the numerical results obtained herein. Specifically, in perfect conductor analyses conducted using the $$\mathcal {H}$$-matrix, the framework achieved performance improvements of 2.22x and 4.34x compared to the original BEM-BB framework for the BDW processor and KNL, respectively.

Tetsuya Hoshino, Akihiro Ida, Toshihiro Hanawa, Kengo Nakajima

### An Experimental Assessment of Three Point-Insertion Sequences for 3-D Incremental Delaunay Tessellations

Currently, state-of-the-art algorithms for building 3-D Delaunay tessellations are incremental. Thus, their execution costs depend on the order of point insertion. This work evaluates three point-insertion sequences in incremental algorithms for building 3-D Delaunay tessellations. An incremental algorithm with point-insertion sequence provided by the cut-longest-edge kd–tree is evaluated against the BRIO–Hilbert order in conjunction with spatial middle and median policies employed in the 4.11 version of the Computational Geometry Algorithms Library. The results of computational costs (time and space) of these three algorithms are evaluated experimentally. Extensive results show that the incremental algorithm with a point-insertion sequence provided by the BRIO–Hilbert order with spatial middle policy employed in the latest version of the Computational Geometry Algorithms Library shows lower execution and storage costs than the two other algorithms evaluated.

Sanderson L. Gonzaga de Oliveira, Diogo T. Robaina, Diego N. Brandão, Mauricio Kischinhevsky, Gabriel Oliveira

### Learning Knowledge Graph Embeddings via Generalized Hyperplanes

For knowledge graph completion, translation-based methods such as Trans(E and H) are promising, which embed knowledge graphs into continuous vector spaces and construct translation operation between head and tail entities. However, TransE and TransH still have limitations in preserving mapping properties of complex relation facts for knowledge graphs. In this paper, we propose a novel translation-based method called translation on generalized hyperplanes (TransGH), which extends TransH by defining a generalized hyperplane for entities projection. TransGH projects head and tail embeddings from a triplet into a generalized relation-specific hyperplane determined by a set of basis vectors, and then fulfills translation operation on the hyperplane. Compared with TransH, TransGH can capture more fertile interactions between entities and relations, and simultaneously has strong expression in mapping properties for knowledge graphs. Experimental results on two tasks, link prediction and triplet classification, show that TransGH can significantly outperform the state-of-the-art embedding methods.

Qiannan Zhu, Xiaofei Zhou, JianLong Tan, Ping Liu, Li Guo

### Fast Higher-Order Functions for Tensor Calculus with Tensors and Subtensors

Tensors analysis has become a popular tool for solving problems in computational neuroscience, pattern recognition and signal processing. Similar to the two-dimensional case, algorithms for multidimensional data consist of basic operations accessing only a subset of tensor data. With multiple offsets and step sizes, basic operations for subtensors require sophisticated implementations even for entrywise operations.In this work, we discuss the design and implementation of optimized higher-order functions that operate entrywise on tensors and subtensors with any non-hierarchical storage format and arbitrary number of dimensions. We propose recursive multi-index algorithms with reduced index computations and additional optimization techniques such as function inlining with partial template specialization. We show that single-index implementations of higher-order functions with subtensors introduce a runtime penalty of an order of magnitude than the recursive and iterative multi-index versions. Including data- and thread-level parallelization, our optimized implementations reach 68% of the maximum throughput of an Intel Core i9-7900X. In comparison with other libraries, the average speedup of our optimized implementations is up to 5x for map-like and more than 9x for reduce-like operations. For symmetric tensors we measured an average speedup of up to 4x.

Cem Bassoy, Volker Schatz

### The Modified Self-Shrinking Generator?

Pseudo-random sequences exhibit interesting properties with applications in many and distinct areas ranging from reliable communications to number generation or cryptography. Inside the family of decimation-based sequence generators, the modified self-shrinking generator (an improved version of the self-shrinking generator) is one of its best-known elements. In fact, such a generator divides the PN-sequence produced by a maximum-length LFSR into groups of three bits. When the sum of the first two bits in a group is one, then the generator returns the third bit, otherwise the bit is discarded. In this work, we introduce a generalization of this generator, where the PN-sequence is divided into groups of t bits, $$t\ge 2$$. It is possible to check that the properties of the output sequences produced by this family of generators have the same or better properties than those of the classic modified self-shrunken sequences. Moreover, the number of sequences generated by this new family with application in stream cipher cryptography increases dramatically.

Sara D. Cardell, Amparo Fúster-Sabater

### Simulating Negotiation-Based Cloud Markets

Today, the so called supermarket approach is used for trading Cloud services on Cloud markets. Thereby, consumers purchase Cloud services at fixed prices without negotiation. More dynamic Cloud markets are emerging as e.g. the recent development of the Amazon EC2 spot market shows - with spot blocks and spot fleet management. Hence, autonomous Bazaar-based negotiations are a promising approach for trading Cloud services on future Cloud markets. Thereby, market participants negotiate the characteristics of Cloud services which are described in Service Level Agreements (SLAs). Specifications such as the WS-Agreement Negotiation standard foster the development of such Bazaar-based Cloud markets.In this paper we present a scientific simulation environment for the simulation of Bazaar-based Cloud markets which conforms to the WS-Agreement Negotiation standard. A two-stepped process is required for using the simulation environment: first consumers, intermediaries and providers have to be created, then strategies have to be assigned to them before the result of the simulation can be analyzed. The aim of the simulation environment is to support market participants during the evaluation of their negotiation strategies.

Benedikt Pittl, Werner Mach, Erich Schikuta

### Structural Learning of Probabilistic Graphical Models of Cumulative Phenomena

One of the critical issues when adopting Bayesian networks (BNs) to model dependencies among random variables is to “learn” their structure. This is a well-known NP-hard problem in its most general and classical formulation, which is furthermore complicated by known pitfalls such as the issue of I-equivalence among different structures. In this work we restrict the investigation to a specific class of networks, i.e., those representing the dynamics of phenomena characterized by the monotonic accumulation of events. Such phenomena allow to set specific structural constraints based on Suppes’ theory of probabilistic causation and, accordingly, to define constrained BNs, named Suppes-Bayes Causal Networks (SBCNs). Within this framework, we study the structure learning of SBCNs via extensive simulations with various state-of-the-art search strategies, such as canonical local search techniques and Genetic Algorithms. This investigation is intended to be an extension and an in-depth clarification of our previous works on SBCN structure learning. Among the main results, we show that Suppes’ constraints do simplify the learning task, by reducing the solution search space and providing a temporal ordering on the variables, which simplifies the complications derived by I-equivalent structures. Finally, we report on tradeoffs among different optimization techniques that can be used to learn SBCNs.

Daniele Ramazzotti, Marco S. Nobile, Marco Antoniotti, Alex Graudenzi

### Sparse Surface Speed Evaluation on a Dynamic Three-Dimensional Surface Using an Iterative Partitioning Scheme

We focus on a surface evolution problem where the local surface speed depends on a computationally expensive scalar function with non-local properties. The local surface speed must be re-evaluated in each time step, even for non-moving parts of the surface, due to possibly changed properties in remote regions of the simulation domain. We present a method to evaluate the surface speed only on a sparse set of points to reduce the computational effort. This sparse set of points is generated according to application-specific requirements using an iterative partitioning scheme. We diffuse the result of a constant extrapolation in the neighborhood of the sparse points to obtain an approximation to a linear interpolation between the sparse points.We demonstrate the method for a surface evolving with a local surface speed depending on the incident flux from a source plane above the surface. The obtained speedups range from 2 to 8 and the surface deviation is less than 3 grid-cells for all evaluated test cases.

Paul Manstetten, Lukas Gnam, Andreas Hössinger, Siegfried Selberherr, Josef Weinbub

### Accurate, Automatic and Compressed Visualization of Radiated Helmholtz Fields from Boundary Element Solutions

We propose a methodology to generate an accurate and efficient reconstruction of radiated fields based on high order interpolation. As the solution is obtained with the convolution by a smooth but potentially high frequency oscillatory kernel, our basis functions therefore incorporate plane waves. Directional interpolation is shown to be efficient for smart directions. An adaptive subdivision of the domain is established to limit the oscillations of the kernel in each element. The new basis functions, combining high order polynomials and plane waves, provide much better accuracy than low order ones. Finally, as standard visualization softwares are generally unable to represent such fields, a method to have a well-suited visualization of high order functions is used. Several numerical results confirm the potential of the method.

Matthieu Maunoury, Christophe Besse, Vincent Mouysset, Sébastien Pernet

### Backmatter

Weitere Informationen