
2018 | Book

Modeling and Simulation in HPC and Cloud Systems


About this Book

This book consists of eight chapters, five of which summarise the tutorials and workshops organised as part of the Summer School on “New Trends in Modelling and Simulation in HPC Systems,” held within the cHiPSet COST Action (High-Performance Modelling and Simulation for Big Data Applications) in Bucharest, Romania, on September 21–23, 2016. As such, it offers a solid foundation for the development of new-generation data-intensive intelligent systems.

Modelling and simulation (MS) in the big data era is widely considered the essential tool in science and engineering to substantiate the prediction and analysis of complex systems and natural phenomena. MS offers suitable abstractions to manage the complexity of analysing big data in various scientific and engineering domains. Unfortunately, big data problems are not always easily amenable to efficient MS over HPC (high performance computing). Further, MS communities may lack the detailed expertise required to exploit the full potential of HPC solutions, and HPC architects may not be fully aware of specific MS requirements.

The main goal of the Summer School was to improve the participants’ practical skills and knowledge of the novel HPC-driven models and technologies for big data applications. The trainers, who are also the authors of this book, explained how to design, construct, and utilise the complex MS tools that capture many of the HPC modelling needs, from scalability to fault tolerance and beyond. In the final three chapters, the book presents the first outcomes of the school: new ideas and novel results of research on security aspects in clouds, first prototypes of complex virtual models of data in big data streams, and a data-intensive computing framework for opportunistic networks. It is a valuable reference resource for those wanting to start working in HPC and big data systems, as well as for advanced researchers and practitioners.

Table of Contents

Frontmatter
Evaluating Distributed Systems and Applications Through Accurate Models and Simulations
Abstract
Evaluating the performance of distributed applications can be done by in situ deployment on real-life platforms. However, this technique requires effort in terms of the time allocated to configuring both the application and the platform, the execution time of the tests, and the analysis of the results. Alternatively, users can evaluate their applications by running them on simulators over multiple scenarios. This provides a fast and reliable method for testing the application and the platform on which it is executed. However, the accuracy of the results depends on the cross-layer models used by the simulators. In this chapter we investigate some of the existing models for representing both applications and the underlying distributed platform and infrastructure. We focus our presentation on the popular SimGrid simulator. We emphasize some best practices and conclude with a few control questions and problems.
Marc Frincu, Bogdan Irimie, Teodora Selea, Adrian Spataru, Anca Vulpe
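The cross-layer idea the chapter describes can be illustrated independently of any particular simulator. The sketch below (all task and host parameters are hypothetical, and the model is far simpler than SimGrid's validated network models) estimates task completion times from a compute term (flops/speed) plus a transfer term (bytes/bandwidth):

```python
import heapq

def simulate(tasks, hosts):
    """Greedy list simulation: each task runs on the host that frees up
    earliest; time = compute + transfer (a toy cross-layer model)."""
    # (finish_time, host) min-heap: all hosts start idle at t = 0
    ready = [(0.0, name) for name in sorted(hosts)]
    heapq.heapify(ready)
    finish = {}
    for name, flops, bytes_in in tasks:
        t, host = heapq.heappop(ready)
        speed, bandwidth = hosts[host]
        done = t + flops / speed + bytes_in / bandwidth
        finish[name] = done
        heapq.heappush(ready, (done, host))
    return finish

hosts = {"h1": (1e9, 1e8), "h2": (2e9, 1e8)}  # (flops/s, bytes/s)
tasks = [("t1", 1e9, 1e8), ("t2", 2e9, 0.0)]  # (name, flops, input bytes)
times = simulate(tasks, hosts)                # t1 -> 2.0 s, t2 -> 1.0 s
```

The accuracy caveat from the abstract applies directly: refining only this cost function (e.g., adding latency or contention terms) changes the predicted completion times without touching the application description.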
Scheduling Data-Intensive Workloads in Large-Scale Distributed Systems: Trends and Challenges
Abstract
With the explosive growth of big data, workloads tend to get more complex and computationally demanding. Such applications are processed on distributed interconnected resources that are becoming larger in scale and computational capacity. Data-intensive applications may have different degrees of parallelism and must effectively exploit data locality. Furthermore, they may impose several Quality of Service requirements, such as time constraints and resilience against failures, as well as other objectives, like energy efficiency. These features of the workloads, as well as the inherent characteristics of the computing resources required to process them, present major challenges that require the employment of effective scheduling techniques. In this chapter, a classification of data-intensive workloads is proposed and an overview of the most commonly used approaches for their scheduling in large-scale distributed systems is given. We present novel strategies that have been proposed in the literature and shed light on open challenges and future directions.
Georgios L. Stavrinides, Helen D. Karatza
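One of the workload features the abstract highlights, data locality, can be folded into even a very simple heuristic. The following is an illustrative greedy scheduler (function and parameter names are hypothetical, not a strategy from the chapter) that assigns the largest tasks first to the least-loaded machine, preferring machines that already hold the task's input data:

```python
def schedule(tasks, machines, data_location):
    """Greedy locality-aware scheduling sketch: largest task first,
    placed on the least-loaded machine among those holding its data."""
    load = {m: 0.0 for m in machines}
    placement = {}
    for task, cost in sorted(tasks.items(), key=lambda kv: -kv[1]):
        local = data_location.get(task, set())
        # prefer machines with the task's data; fall back to all machines
        pool = [m for m in machines if m in local] or list(machines)
        target = min(pool, key=lambda m: load[m])
        placement[task] = target
        load[target] += cost
    return placement, max(load.values())  # placement and makespan

tasks = {"a": 4.0, "b": 2.0, "c": 2.0}    # task -> processing cost
placement, makespan = schedule(tasks, ["m1", "m2"], {"a": {"m2"}})
```

Here task "a" follows its data to m2 even though both machines are idle; the QoS objectives mentioned in the abstract (deadlines, resilience, energy) would enter as further terms in the machine-selection criterion.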
Design Patterns and Algorithmic Skeletons: A Brief Concordance
Abstract
Having been designed as abstractions of common themes in object-oriented programming, patterns have been incorporated into parallel programming to allow an application programmer the freedom to generate parallel codes by parameterising a framework and adding the sequential parts. On the one hand, parallel programming patterns and their derived languages have maintained, arguably, the best adoption rate; however, they have become conglomerates of generic attributes for specific purposes, oriented towards code generation rather than the abstraction of structural attributes. On the other hand, algorithmic skeletons systematically abstract commonly-used structures of parallel computation, communication, and interaction. Although there are significant examples of relevant applications—mostly in academia—where they have been successfully deployed in an elegant manner, algorithmic skeletons have not been as widely adopted as patterns have. However, the ICT industry expects graduates to be able to easily adapt to its best practices. Arguably, this entails the use of pattern-based programming, as has been the case in sequential programming, where the use of design patterns is widely considered the norm, as demonstrated by a myriad of citations to the seminal work of Gamma et al. [6], widely known as the Gang-of-Four. We contend that an algorithmic skeleton can be treated as a structural design pattern where the degree of parallelism and computational infrastructure are only defined at runtime. The purpose of this chapter is to explain how design patterns can be mapped into algorithmic skeletons. We illustrate our approach with a simple example based on the visitor design pattern and the task farm algorithmic skeleton.
Adriana E. Chis, Horacio González–Vélez
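The central contention, that a skeleton is a structural pattern whose degree of parallelism is fixed only at runtime, can be sketched briefly. Below, a minimal task farm keeps the parallel structure fixed while the per-element operation is plugged in externally, echoing how the visitor pattern externalises behaviour (a sketch with illustrative names, not the chapter's actual example):

```python
from concurrent.futures import ThreadPoolExecutor

def task_farm(worker, tasks, degree=4):
    """Algorithmic-skeleton view of a farm: the farmer/worker structure
    is fixed, the degree of parallelism is chosen only at runtime."""
    with ThreadPoolExecutor(max_workers=degree) as pool:
        return list(pool.map(worker, tasks))

# The sequential "visit" operation supplied to the skeleton, in the
# spirit of the visitor pattern: behaviour lives outside the structure.
def visit(node):
    return node * node

results = task_farm(visit, range(5), degree=2)  # [0, 1, 4, 9, 16]
```

Note that changing `degree` alters only performance, never the result, which is exactly the separation between structural pattern and runtime infrastructure that the chapter argues for.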
Evaluation of Cloud Systems
Abstract
Modelling and simulation are suitable instruments for the evaluation of distributed systems. These essential scientific tools are used in the design and performance evaluation of Cloud systems. The chapter covers the fundamental skills a practitioner working in the field of Cloud systems should have in order to develop a sound methodology for the simulation-based evaluation of Cloud services and components. We concentrate on subjects related to task scheduling and resource allocation, with a focus on scalability and elasticity, the constraints imposed by SLAs, and the use of CloudSim for the performance evaluation of Cloud systems. Several metrics used in modelling and simulation are presented in this chapter.
Mihaela-Andreea Vasile, George-Valentin Iordache, Alexandru Tudorica, Florin Pop
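Rather than guessing at CloudSim's Java API, two of the metrics such an evaluation typically reports can be computed generically. The sketch below (input values are hypothetical, standing in for the output of a CloudSim-style run) derives makespan and mean utilisation from per-VM busy times:

```python
def metrics(finish_times, busy_time, capacity):
    """Two common simulation metrics: makespan (latest completion)
    and mean resource utilisation over the whole run."""
    makespan = max(finish_times)
    utilisation = sum(busy_time.values()) / (capacity * makespan)
    return makespan, utilisation

# hypothetical run: two VMs, busy for 8 s and 6 s, last task ends at 10 s
m, u = metrics([7.0, 10.0], {"vm1": 8.0, "vm2": 6.0}, capacity=2)
```

A utilisation of 0.7 here would flag idle capacity; SLA-oriented metrics (deadline misses, response-time percentiles) follow the same pattern of post-processing simulator output.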
Science Gateways in HPC: Usability Meets Efficiency and Effectiveness
Abstract
The amount of available data, as well as the number of computational methods for research, has grown faster in the last decade than ever before. Methods that address compute-intensive and data-intensive challenges often rely on sophisticated computing and data infrastructures such as High-Performance Computing. Effectiveness and efficiency are the basis for achieving meaningful results in reasonable time on such infrastructures, and science gateways additionally consider usability to support users in their research. The main goals of science gateways are to lower the barrier to using complex methods and infrastructures; to connect computing and data resources seamlessly behind an easy-to-use interface; and to share methods and data within a community. We go into detail on the concept of science gateways and their design requirements. Furthermore, we discuss the challenges of sharing research in science gateways, thus aiming at the reusability of scientific methods and the reproducibility of science.
Sandra Gesing
MobEmu: A Framework to Support Decentralized Ad-Hoc Networking
Abstract
Opportunistic networks (ONs) are an extension of mobile ad hoc networks where nodes are generally human-carried mobile devices like smartphones and tablets, which do not have a global view of the network. They only possess knowledge from the nodes they encounter, so well-defined paths between a source and a destination do not necessarily exist. There are plenty of real-life uses for ONs, including, but not limited to, disaster management, smart cities, floating content, advertising, crowd management, context-aware platforms, distributed social networks, or data offloading and mobile cloud computing. In order to implement and test a routing or dissemination solution for opportunistic networks, simulators are employed. They have the benefit of allowing developers to analyze and tweak their solutions with reduced costs, before deploying them in a working environment. For this reason, in this chapter we present MobEmu, an opportunistic network simulator which can be used to evaluate a user-created routing or dissemination algorithm on a desired mobility trace or synthetic model.
Radu-Ioan Ciobanu, Radu-Corneliu Marin, Ciprian Dobre
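The kind of experiment MobEmu supports, replaying a routing algorithm over a mobility trace, can be illustrated with the simplest opportunistic strategy, epidemic dissemination (this is a generic sketch with a hypothetical trace, not MobEmu's API):

```python
def epidemic(contacts, source):
    """Epidemic dissemination over a contact trace: whenever two nodes
    meet, a carried message is copied to the node that lacks it.
    Returns the time each node first received the message."""
    delivered = {source: 0.0}
    for t, a, b in sorted(contacts):          # (time, nodeA, nodeB)
        if a in delivered and b not in delivered:
            delivered[b] = t
        elif b in delivered and a not in delivered:
            delivered[a] = t
    return delivered

# hypothetical encounter trace; note the (y, z) contact happens *before*
# y has the message, so z is never reached -- paths need not exist
trace = [(1.0, "s", "x"), (2.0, "x", "y"), (0.5, "y", "z")]
arrival = epidemic(trace, source="s")
```

Swapping `epidemic` for a smarter forwarding decision (e.g., one using encounter history) while keeping the trace fixed is precisely the evaluation loop the abstract describes.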
Virtualization Model for Processing of the Sensitive Mobile Data
Abstract
In this chapter, the k-anonymity algorithm is used for the anonymization of sensitive data sent over the network and analyzed by experts. Anonymization is a technique used to generalize sensitive data in order to block the possibility of assigning them to specific individuals or entities. In our proposed model, we have developed a layer that enables the virtualization of sensitive data, ensuring that they are transmitted safely over the network and analyzed in a way that respects the protection of personal data. The solution has been verified in a real use case involving the transmission of sports data to experts, who send a diagnosis in response.
Andrzej Wilczyński, Joanna Kołodziej
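The generalization step at the heart of k-anonymity can be shown in miniature. The toy below (an illustrative sketch, not the chapter's algorithm) coarsens exact ages into 10-year bands and suppresses records whose band is still shared by fewer than k individuals:

```python
from collections import Counter

def generalise_ages(records, k):
    """Toy k-anonymity step: generalise exact ages to 10-year bands,
    then suppress records whose band occurs fewer than k times."""
    banded = [(f"{(age // 10) * 10}-{(age // 10) * 10 + 9}", rest)
              for age, rest in records]
    counts = Counter(band for band, _ in banded)
    return [(band, rest) for band, rest in banded if counts[band] >= k]

data = [(23, "A"), (27, "B"), (41, "C")]   # (age, payload)
anon = generalise_ages(data, k=2)          # the age-41 record is suppressed
```

After this step, no released record's quasi-identifier (the age band) points to fewer than k individuals, which is what lets the sports data in the use case reach the experts without being attributable to one athlete.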
Analysis of Selected Cryptographic Services for Processing Batch Tasks in Cloud Computing Systems
Abstract
This chapter evaluates the features and computational load of two proposed cryptographic procedures which aim to protect confidentiality and data integrity in Cloud Computing (CC) systems. It should be kept in mind that improper use of some cryptographic tools may negatively impact the overall CC operation. Accordingly, meeting the Quality of Service (QoS) requirements is only possible when the security layer applied does not interrupt the computing process. The security layer applied to tasks should also fulfill the advanced security conditions present in CC systems. Thus, solutions aiming to protect both the user data and the whole system have to deliver the scalability, multi-tenancy, and complexity that these systems demand. We present a cryptographic service based on the blind RSA algorithm and Shamir secret sharing that supports batch task processing. Hence, this service is suitable for CC systems equipped with a monolithic central scheduler and many Virtual Machines (VMs) as working nodes. The blind RSA cryptosystem is used to encrypt the data without knowing any details about the task content. The Shamir secret sharing procedure is proposed in order to verify whether all VMs in the system returned their shares after the batch of tasks was deployed on them.
Agnieszka Jakóbik, Jacek Tchórzewski
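The Shamir building block used for the share-verification role is standard and small enough to sketch. Below is a textbook (t, n) scheme over a prime field (parameters and the field modulus are illustrative; the chapter's service wraps this in the batch-processing protocol, which is not reproduced here):

```python
import random

P = 2_147_483_647  # a Mersenne prime serving as the field modulus

def split(secret, n, t):
    """Shamir (t, n) sharing: a random degree-(t-1) polynomial with
    constant term = secret, evaluated at x = 1..n gives the n shares."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def combine(shares):
    """Lagrange interpolation at x = 0 recovers the secret from any
    t shares; fewer than t reveal nothing."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(12345, n=5, t=3)    # one share per VM
recovered = combine(shares[:3])    # any 3 of the 5 suffice
```

In the service's setting, each VM returning its share lets the scheduler reconstruct the secret; a failed reconstruction signals that some VMs did not report back after the batch was deployed.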
Metadata
Title
Modeling and Simulation in HPC and Cloud Systems
Edited by
Joanna Kołodziej
Dr. Florin Pop
Prof. Ciprian Dobre
Copyright Year
2018
Electronic ISBN
978-3-319-73767-6
Print ISBN
978-3-319-73766-9
DOI
https://doi.org/10.1007/978-3-319-73767-6