
About this book

This book addresses topics related to cloud and Big Data technologies, architecture and applications including distributed computing and data centers, cloud infrastructure and security, and end-user services. The majority of the book is devoted to the security aspects of cloud computing and Big Data.

Cloud computing, which can be seen as any subscription-based or pay-per-use service that extends the Internet’s existing capabilities, has gained considerable attention from both academia and the IT industry as a new infrastructure requiring smaller investments in hardware platforms, staff training, or licensing software tools. It is a new paradigm that has ushered in a revolution in both data storage and computation.

In parallel to this progress, Big Data technologies, which rely heavily on cloud computing platforms for both data storage and processing, have been developed and deployed at breathtaking speed. They are among the most frequently used technologies for developing applications and services in many fields, such as the web, health, and energy.

Accordingly, cloud computing and Big Data technologies are two of the most central current and future research mainstreams. They involve and impact a host of fields, including business, scientific research, and public and private administration.

Gathering extended versions of the best papers presented at the Third International Conference on Cloud Computing Technologies and Applications (CloudTech’17), this book offers a valuable resource for all Information System managers, researchers, students, developers, and policymakers involved in the technological and application aspects of cloud computing and Big Data.

Table of Contents


Elliptic Curve Qu-Vanstone Based Signcryption Schemes with Proxy Re-encryption for Secure Cloud Data Storage

Data storage in cloud computing raises several security issues, such as data privacy, integrity, and authentication. Efficient and secure uploading and downloading of data plays an important role, as users nowadays perform these actions on all types of devices, including smartphones. Signing and encrypting sensitive data before hosting can prevent potential security breaches. In this chapter, we propose two highly efficient identity-based signcryption schemes. One of them is used as a building block for a proxy re-encryption scheme. This scheme allows users to store signed and encrypted data in the cloud, where the cloud service provider is able to check the authentication but not to derive the content of the message. When another user requests data access, the originator of the message first checks the authorization and then provides the cloud server with an encryption key to re-encrypt the stored data, enabling the requesting party to decrypt the resulting ciphertext and to validate the signature. Unlike previous proposals, the proposed scheme is based on elliptic curve operations and does not use computationally intensive pairing operations.
Placide Shabisha, An Braeken, Abdellah Touhafi, Kris Steenhaut
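
The elliptic curve operations such schemes build on can be illustrated with a minimal sketch over a toy curve on a small prime field. The curve, base point, and scalars below are purely illustrative (a real deployment would use a standardized curve such as P-256), and this is not the authors' signcryption scheme itself:

```python
# Toy Weierstrass curve y^2 = x^3 + a*x + b over F_p (illustration only).
p, a, b = 97, 2, 3
G = (3, 6)  # base point: 6^2 = 36 and 3^3 + 2*3 + 3 = 36 (mod 97)

def inv(x):
    return pow(x, p - 2, p)  # modular inverse via Fermat's little theorem

def add(P, Q):
    # Elliptic curve point addition (None denotes the point at infinity).
    if P is None: return Q
    if Q is None: return P
    if P[0] == Q[0] and (P[1] + Q[1]) % p == 0:
        return None
    if P == Q:
        lam = (3 * P[0] ** 2 + a) * inv(2 * P[1]) % p
    else:
        lam = (Q[1] - P[1]) * inv(Q[0] - P[0]) % p
    x = (lam ** 2 - P[0] - Q[0]) % p
    return (x, (lam * (P[0] - x) - P[1]) % p)

def mul(k, P):
    # Scalar multiplication by double-and-add.
    R = None
    while k:
        if k & 1:
            R = add(R, P)
        P = add(P, P)
        k >>= 1
    return R

# Diffie-Hellman-style agreement: both parties derive the same point.
sk_a, sk_b = 7, 11
assert mul(sk_a, mul(sk_b, G)) == mul(sk_b, mul(sk_a, G))
```

The final assertion is the pairing-free key agreement that elliptic-curve signcryption schemes of this family exploit to derive a shared encryption key.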

Cloud Computing: Overview and Risk Identification Based on Classification by Type

Cloud computing is experiencing powerful and very fast development in the IT field. Based on the principle of virtualization, it allows the consumer to use computing resources on demand, via the Internet, regardless of location and time. This technology also ensures broadband network access with the fast provisioning required by the user. Finally, invoicing is determined according to usage. However, the pooling of resources increases the number of risks affecting the properties of confidentiality, availability, and integrity. These risks are related to several factors, such as data location and loss of governance.
Unlike other works, in which risk analysis in cloud computing is done passively, this work aims to make a thorough study that identifies the set of security risks in a cloud environment in a structured way, by classifying them by type of service as well as by deployment and hosting model. This classification is fundamental since, apart from the common risks, there are others which depend on the type of cloud used and which must be determined.
Chaimaa Belbergui, Najib Elkamoun, Rachid Hilal

Authentication Model for Mobile Cloud Computing Database Service

Mobile cloud computing (MCC) is a model in which mobile applications are built, powered, and hosted using cloud computing technology. It refers to the availability of cloud computing services in a mobile environment and combines heterogeneous fields such as mobile devices, cloud computing, and wireless networks. Of all that has been written about cloud computing, precious little attention has been paid to authentication in the cloud. In this paper, we design a new effective security model for mobile cloud Database as a Service (DBaaS), in which a user can change his/her password on demand. Furthermore, a security analysis demonstrates the feasibility and efficiency of the proposed model for DBaaS. We also propose an efficient authentication scheme to solve the authentication problem in MCC. The proposed solution is based mainly on an improved Needham-Schroeder protocol that proves a user's identity in order to determine whether this user is authorized or not. The results show that this scheme is very strong and difficult to break.
Kashif Munir

FADETPM: Novel Approach of File Assured Deletion Based on Trusted Platform Module

Cloud computing is emerging as a dominant approach for delivering services and encompasses a range of business and technical opportunities. However, users' concerns are beginning to grow about the security and privacy of their data. Assured deletion of data hosted on cloud providers' platforms is at the top of these concerns, since all implemented solutions are proposed and totally controlled by the cloud service provider companies.
Cryptography-based techniques, foremost among them File Assured Deletion (FADE), are a promising solution for addressing this issue. FADE achieves assured deletion of files by making them unrecoverable to anybody, including those who manage the cloud storage, upon revocation of file access policies: all data files are encrypted before outsourcing, and a trusted third party is then used to outsource the cryptographic keys. Unfortunately, this system remains weak, since its security relies entirely on the security of the key manager.
In this chapter, we propose a new scheme that aims to improve the security of FADE by using the TPM (Trusted Platform Module). Implemented carefully in hardware, the TPM is resistant to software attacks and hence allows our scheme to safely store keys, passwords, and digital certificates on behalf of the cloud user. A prototype implementation of the proposed scheme shows that it provides a value-added security layer compared to FADE, with less computational overhead.
Zakaria Igarramen, Mustapha Hedabou

Issues and Threats of Cloud Data Storage

Cloud storage is a subset of cloud computing, a new technology that is progressing rapidly in the IT world. Indeed, many providers offer, as a service, plenty of cloud storage space, especially for mobile use. The massive use of this means of storage gives rise to increasing issues and threats. Thus, many efforts are dedicated by scientists and researchers all over the world to avoiding these risks and solving the various security problems that confront cloud storage. This chapter is a modest contribution to these ongoing efforts and focuses on cloud storage security. First, we describe the data life cycle and give the security measures and methods for each phase. Then, we present some new approaches that protect data in the cloud. Finally, we propose and discuss a new secure architecture based on three layers.
Maryem Berrezzouq, Abdellatif El Ghazi, Zineelabidine Abdelali

Challenges of Crowd Sensing for Cost-Effective Data Management in the Cloud

Cloud computing has attracted researchers and organizations in the last decade due to the powerful and elastic computation capabilities provided on-demand to users. Mobile cloud computing is a way of enriching users of mobile devices with the computational resources and services of clouds. The recent developments of mobile devices and their sensors introduced the crowd sensing paradigm that uses powerful cloud computing to analyze, manage and store data produced by mobile sensors. However, crowd sensing in the context of using the cloud is posing new challenges that increase the importance of adopting new approaches to overcome them. This chapter introduces a middleware solution that provides a set of services for cost-effective management of crowd sensing data.
Aseel Alkhelaiwi, Dan Grigoras

On the Security of Medical Image Processing in Cloud Environment

The implementation of cloud computing in the healthcare domain offers easy and ubiquitous access to off-site solutions for managing and processing electronic medical records. In this concept, distributed computational resources can be shared by multiple clients in order to achieve significant cost savings. However, despite all these great advantages, security and privacy remain the major concerns hindering the wide adoption of cloud technology. In this respect, there have been many attempts to strengthen the trust between clients and service providers by building various security countermeasures and mechanisms. Amongst these measures, homomorphic algorithms, Service-Oriented Architecture (SOA) and Secret Share Scheme (SSS) are frequently used to mitigate the security risks associated with the cloud. Unfortunately, these existing approaches are not able to provide a practical trade-off between performance and security. In light of this fact, we use a simple method based on fragmentation to support a distributed image processing architecture as well as data privacy. The proposed methods combine a clustering method, the Fuzzy C-Means (FCM) algorithm, with a Genetic Algorithm (GA) to satisfy Quality of Service (QoS) requirements. Consequently, the proposal is an efficient architecture that reduces execution time and mitigates security problems. This is accomplished by using a multi-cloud system and a parallel image processing approach.
Mbarek Marwan, Ali Kartit, Hassan Ouahmane

Implementations of Intrusion Detection Architectures in Cloud Computing

Cloud computing is a paradigm that provides access to compute infrastructure on demand by allowing a customer to use virtual machines (VMs) to solve a given computational problem. Before implementing new applications running on the cloud, it is often useful to estimate the performance/cost of various implementations. In this paper, we compare different scenarios of the collaborative intrusion detection systems that we proposed in a previous paper. This study is carried out using CloudAnalyst, which was developed to simulate large-scale cloud applications in order to study the behavior of such applications under various deployment configurations [11]. The simulation takes into consideration several parameters, such as data processing time, response time, hourly average of users, request servicing time, total data transfer, and virtual machine costs. The obtained results are analyzed and compared in order to choose the most efficient implementation in terms of response time and the other parameters. We then go into the details of the IDS (intrusion detection system) database by performing a statistical analysis of the KDD dataset using the Weka tool to extract the most relevant attributes. To that end, we briefly survey recent research and proposals regarding the study and analysis of the KDD dataset, give an overview of the KDD dataset, which is widely used in anomaly detection, and proceed to the analysis of KDD using Weka by executing a set of algorithms, such as CfsSubsetEval and J48, in order to deduce the combinations of attributes that are relevant to the detection of attacks.
Mostapha Derfouf, Mohsine Eleuldj

Privacy in Big Data Through Variable t-Closeness for MSN Attributes

With the increased and extensive use of online data, the notion of big data has recently been widely studied in the literature. In fact, a large quantity of sensitive personal information can be contained in high-dimensional databases. Such data needs to be sanitized before publishing. In this context, many approaches have been proposed to ensure privacy in big data, including pseudonymization, cryptographic, and anonymization techniques. T-closeness has been studied and treated with great interest as an anonymization technique ensuring privacy in big data when dealing with sensitive attributes. Although t-closeness can be applied when treating quasi-identifier attributes, it is more suitable for sensitive attributes. Many algorithms for t-closeness have been proposed, but most of them assume that the threshold t of t-closeness is set to a fixed value. In this chapter, a method using t-closeness for multiple sensitive numerical (MSN) attributes is presented. The method can be applied to both single and multiple sensitive numerical attributes. In the case where the data set contains attributes with high correlation, our method is applied to only one numerical attribute. In addition, a new algorithm called variable t-closeness for multiple sensitive numerical attributes was implemented. Our algorithm gives good results in terms of data anonymization and was experimentally evaluated on a test table. Furthermore, we highlight all the steps of our proposed algorithm with detailed comments.
Zakariae El Ouazzani, Hanan El Bakkali
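
For a numerical (ordered) sensitive attribute, t-closeness is typically checked with the ordered-distance Earth Mover's Distance between the value distribution inside an equivalence class and the global distribution. A minimal sketch of that check, with hypothetical distributions (this is the standard textbook formula, not the chapter's variable-threshold algorithm):

```python
def emd_ordered(class_dist, global_dist):
    # Earth Mover's Distance for an ordered attribute: the mean of the
    # absolute cumulative differences, normalized by m - 1 value levels.
    m = len(global_dist)
    cum, total = 0.0, 0.0
    for pc, qc in zip(class_dist, global_dist):
        cum += pc - qc
        total += abs(cum)
    return total / (m - 1)

# Hypothetical example: salary bucketed into 3 ordered levels.
global_dist = [0.5, 0.3, 0.2]   # distribution over the whole table
class_dist = [0.8, 0.1, 0.1]    # distribution inside one equivalence class

t = 0.15
d = emd_ordered(class_dist, global_dist)   # 0.2 here
satisfies_t_closeness = d <= t             # False: the class leaks information
```

An equivalence class satisfies t-closeness exactly when this distance does not exceed the threshold t; a variable-t scheme would compare each class against its own threshold instead of a single fixed value.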

The Big Data-RTAP: Toward a Secured Video Surveillance System in Smart Environment

Big data is an emerging architecture and technology paradigm used by many organizations to extract valuable information and to take decisions. It comprises techniques and methods used to retrieve, collect, process, and analyze very large volumes of unstructured and structured data. The challenge is processing and analyzing the huge volume of data coming in from network sensors. In practice, it is too late to stop abnormal behavior if we collect the incoming streams and wait many days before processing and analyzing the stored streams. Big data in video surveillance systems poses ETL (Extract, Transform and Load) challenges related to the von Neumann bottleneck and data locality. In this chapter, we propose a conceptual model with architectural elements and proposed tools for monitoring smart areas in RTAP (Real Time Analytical Processing) mode.
Our model is based on the lambda architecture, in order to resolve the problem of latency imposed by transactional requests (GAB network). We consider the real example in which data comes from different sources (automatic monitoring centers, GABs, Facebook, Twitter, Instagram, LinkedIn, medical centers, commercial centers, and any other data collected by satellites) and is n-dimensional, and we wish to reduce the number of components with the PCA algorithm in order to reduce processing time and increase execution speed.
Abderrahmane Ezzahout, Jawad Oubaha

Optimizations in Fully Homomorphic Encryption

Optimizations in fully homomorphic encryption have guided much cryptographic research since Gentry's breakthrough in 2009. In this chapter, we sketch different techniques to optimize and simplify fully homomorphic encryption schemes. Among these techniques, we focus on trans-ciphering, for which we describe a homomorphic evaluation of the different AES circuits (AES-128, AES-192 and AES-256) using a noise-free fully homomorphic encryption scheme. Finally, we present a new noise-free fully homomorphic encryption scheme based on quaternions. Trans-ciphering is considered an efficient solution for optimizing data storage when outsourcing computations to a remote cloud, as it is a powerful tool for minimizing runtime on the client side. In this implementation, we use our noise-free fully homomorphic encryption scheme with different key sizes, on a small laptop with a dual-core Intel Core i5 CPU running at 2.40 GHz, with 512 KB of L2 cache and 4 GB of RAM. Our implementation takes about 18 min to evaluate an entire AES circuit using a 1024-bit key for the fully homomorphic encryption scheme.
Ahmed El-Yahyaoui, Mohamed Dafir Ech-cherif El Kettani
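
The homomorphic property that such schemes compute under can be illustrated with tiny unpadded RSA, which is multiplicatively homomorphic. This is purely a didactic stand-in with toy-sized parameters, not the chapter's noise-free quaternion-based scheme:

```python
# Toy unpadded RSA (insecure parameters, illustration only):
# Enc(m1) * Enc(m2) mod n decrypts to m1 * m2, i.e. multiplication
# on ciphertexts maps to multiplication on plaintexts.
p, q, e = 61, 53, 17
n = p * q                          # 3233
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (Python 3.8+)

enc = lambda m: pow(m, e, n)
dec = lambda c: pow(c, d, n)

m1, m2 = 7, 9
c = enc(m1) * enc(m2) % n          # operate on ciphertexts only
assert dec(c) == m1 * m2           # the server never saw m1 or m2
```

A fully homomorphic scheme supports both addition and multiplication on ciphertexts, which is what makes evaluating an entire AES circuit under encryption possible.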

Support Cloud SLA Establishment Using MDE

Over the last decade, Service Level Agreements (SLAs) have played a pivotal role in cloud computing, especially for guaranteeing quality, availability, and responsibility. An SLA involves different actors, including customers and service providers. The problem that arises is how to establish an SLA contract between those actors and, especially, how to help the customer choose the provider that offers the adequate services. Another important point is the measures that guarantee that the provider respects its contract with the consumer. Our approach embraces model-driven engineering principles to automate the generation of the SLA contract and its real-time monitoring. For this purpose, we propose three languages dedicated respectively to the customer, the supplier, and the contract specification. Since we cannot predict QoS values in advance, we propose to use machine learning to learn QoS behavior at run-time.
Mahmoud El Hamlaoui, Tarik Fissaa, Youness Laghouaouta, Mahmoud Nassar

A New Parallel and Distributed Approach for Large Scale Images Retrieval

The process of image retrieval is of great interest in domains such as computer vision and video surveillance. Visual characteristics of images, such as color, texture, and shape, are used to identify their content. However, the retrieval process becomes very challenging due to the hard management of large databases in terms of storage, computational complexity, performance, and similarity representation.
In this paper, we propose a new approach for indexing images by content. The proposed method provides parallel and distributed computation using the HIPI framework (Hadoop Image Processing Interface), with HDFS (Hadoop Distributed File System) as the storage system, and exploits the high power of GPUs (Graphics Processing Units). As a result, our approach allows large image databases to be managed and processed quickly, thanks to the distributed storage (HDFS) and the parallel GPU computations.
Mohammed Amin Belarbi, Sidi Ahmed Mahmoudi, Saïd Mahmoudi, Ghalem Belalem

Classification of Social Network Data Using a Dictionary-Based Approach

The classification of social network data has become an active area of scientific research in recent years; it tries to classify the data of social networks into classes or to extract feelings, attitudes, and opinions. This type of classification is called sentiment analysis or opinion mining: the process of studying people's opinions and emotions and classifying a sentence or a document into classes such as positive, negative, or neutral. In this article, we propose a new method to classify tweets into three classes (positive, negative, or neutral) in a semantic way, using the WordNet and AFINN dictionaries (AFINN is a dictionary that assigns words weights between \(-5\) and 5 expressing their sentimental degree), and in a parallel way, using the Hadoop framework with the Hadoop Distributed File System (HDFS) and the MapReduce programming model. The main objective of this work is to propose a new sentiment analysis approach by combining several domains, such as information retrieval, semantic similarity, opinion mining or sentiment analysis, and big data.
Youness Madani, Mohammed Erritali, Jamaa Bengourram
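
The dictionary-based core of such an approach can be sketched in a few lines. The word weights below are a tiny hypothetical subset in the AFINN style, not the real AFINN list, and the sketch omits the WordNet and MapReduce parts of the chapter's method:

```python
# Hypothetical AFINN-style lexicon: word -> sentiment weight in [-5, 5].
afinn = {"good": 3, "great": 4, "happy": 3, "bad": -3, "terrible": -4, "sad": -2}

def classify(tweet):
    # Sum the weights of known words; unknown words contribute 0.
    score = sum(afinn.get(word, 0) for word in tweet.lower().split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(classify("what a great and happy day"))   # positive
print(classify("this service is terrible"))     # negative
print(classify("the meeting starts at noon"))   # neutral
```

In a MapReduce deployment, each mapper would score a partition of the tweet stream with this function and the reducers would aggregate class counts.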

Parallel and Distributed Map-Reduce Models for External Clustering Validation Indexes

Procedures that evaluate the results of clustering algorithms are known as cluster validation (CV) indexes. Several CV indexes exist; they are usually classified into two broad classes, namely external and internal clustering validation indexes, depending on whether or not ground truth or optimal clustering solutions are known in advance. Traditional cluster validation indexes can even be impossible to compute when the size of the data set is very large. To solve this issue, we propose parallel and distributed external clustering validation models based on MapReduce for three indexes, namely F-measure, Normalized Mutual Information, and Variation of Information. The experimental results reveal that these models scale very well with increasing dataset size and provide accurate results.
Soumeya Zerabi, Souham Meshoul
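
As a sketch of one of the three indexes, Normalized Mutual Information between a ground-truth clustering and a predicted one can be computed as follows. This is a plain sequential version; the chapter's contribution is distributing such computations with MapReduce:

```python
from math import log
from collections import Counter

def nmi(labels_true, labels_pred):
    # Normalized Mutual Information: MI(T, P) / sqrt(H(T) * H(P)).
    n = len(labels_true)
    ct, cp = Counter(labels_true), Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mi = sum(c / n * log(n * c / (ct[t] * cp[p]))
             for (t, p), c in joint.items())
    ht = -sum(c / n * log(c / n) for c in ct.values())
    hp = -sum(c / n * log(c / n) for c in cp.values())
    return mi / ((ht * hp) ** 0.5) if ht and hp else 1.0

# Identical clusterings (up to label renaming) give NMI = 1.
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))   # 1.0
```

The contingency counts (`ct`, `cp`, `joint`) are exactly the quantities a map phase can accumulate per data split and a reduce phase can merge, which is what makes the index amenable to MapReduce.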

Workflow Scheduling Issues and Techniques in Cloud Computing: A Systematic Literature Review

One of the most challenging issues in cloud computing is workflow scheduling. Workflow applications have a complex structure and many discrete tasks; each task may involve entering data, processing, accessing software, or storage functions. For these reasons, workflow scheduling is considered to be an NP-hard problem, and efficient scheduling algorithms are required to select the most suitable resources for workflow execution. In this paper, we conduct an SLR (systematic literature review) of workflow scheduling strategies that have been proposed for cloud computing platforms, to help researchers systematically and objectively gather and aggregate research evidence about this topic. We then present a comparative analysis of the studied strategies. Finally, we highlight workflow scheduling issues for further research. The findings of this review provide a roadmap for developing workflow scheduling models, which will motivate researchers to propose better workflow scheduling algorithms for service consumers and/or utility providers in cloud computing.
Samadi Yassir, Zbakh Mostapha, Tadonki Claude

A Review of Green Cloud Computing Techniques

Information and communication technology has become important in everyday use, but its impact on the environment has become just as important, due to the large amount of CO2 emissions and energy consumption it entails. Cloud computing is one of the information and communication technologies that has managed to achieve efficient usage of resources and energy. However, data centers still represent a huge percentage of companies' energy costs, since usage is continuously growing. Ever since this issue came to notice, the amount of research on energy efficiency and the green field has been growing. Green cloud computing represents a solution that allows companies and users to use the cloud and all its perks while reducing the negative environmental impact and overall costs, through energy efficiency, carbon footprint reduction, and e-waste reduction. Applications and practices to make companies more eco-friendly are being developed or deployed day by day, and different aspects are addressed in companies to achieve green cloud computing. This chapter presents different techniques to achieve green computing, focusing on cloud computing.
Hala Zineb Naji, Mostapha Zbakh, Kashif Munir

Towards a Smart Exploitation of GPUs for Low Energy Motion Estimation Using Full HD and 4K Videos

Video processing, and more particularly motion tracking algorithms, are a necessary tool for various domains related to computer vision, such as motion recognition, depth estimation, and event detection. However, the use of high-definition videos (HD, Full HD, 4K, etc.) means that current implementations, even running on modern hardware, no longer meet the requirements of real-time processing. In this context, several solutions have been proposed to overcome this constraint by exploiting graphics processing units (GPUs). Although they benefit from the high power of the GPU, none of them is able to provide efficient dense and sparse motion tracking within high-definition videos. In this work, we propose a GPU- and multi-GPU-based method for both sparse and dense optical flow motion tracking using the Lucas-Kanade algorithm. Our method provides efficient exploitation and management of single and/or multiple GPU memories, according to the type of implementation applied: sparse or dense. The sparse implementation tracks meaningful pixels, which are detected with the Harris corner detector. The dense implementation requires more computation, since it is applied to each pixel of the video. Within our approach, high-definition videos are processed on GPUs while low-resolution videos are treated on CPUs. As a result, our method allows real-time sparse and dense optical flow computation on videos in Full HD or even 4K format. The exploitation of multiple GPUs yields performance that scales up very well. In addition to these performances, the parallel implementations offer lower power consumption as a result of the fast processing.
Sidi Ahmed Mahmoudi, Mohammed Amine Belarbi, Pierre Manneback

Machine Learning Applications in Supply Chains: Long Short-Term Memory for Demand Forecasting

Due to rapid technological advances, machine learning, or the ability of a machine to learn automatically, has found applications in various fields. It has proven to be a valuable tool for aiding decision makers and improving the productivity of enterprise processes, thanks to its ability to learn and find interesting patterns in data. It is thereby possible to improve supply chain processes by using machine learning, which in general generates better forecasts than traditional approaches.
As such, this chapter examines multiple Machine Learning algorithms, explores their applications in the various supply chain processes, and presents a long short-term memory model for predicting the daily demand in a Moroccan supermarket.
Halima Bousqaoui, Said Achchab, Kawtar Tikito

Performance Analysis of Preconditioned Conjugate Gradient Solver on Heterogeneous (Multi-CPUs/Multi-GPUs) Architecture

The solution of systems of linear equations is one of the most CPU-intensive steps in engineering and simulation applications, and can greatly benefit from the multitude of processing cores and the vectorisation on today's parallel computers. Our objective is to evaluate the performance of one such solver, the conjugate gradient method, on a hybrid computing platform (multi-GPU/multi-CPU). We consider the preconditioned conjugate gradient (PCG) solver, since it exhibits the main features of such problems. Indeed, the relative performance of CPU and GPU highly depends on the sub-routine: GPUs are, for instance, much more efficient at processing regular kernels, such as matrix-vector multiplications, than more irregular kernels, such as matrix factorization. In this context, one solution consists in relying on dynamic scheduling and resource allocation mechanisms such as those provided by StarPU. In this chapter, we evaluate the performance of the dynamic schedulers proposed by StarPU and analyse the scalability of the PCG algorithm. We show how we can effectively choose the best combination of resources in order to improve performance.
Najlae Kasmi, Mostapha Zbakh, Amine Haouari
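
A minimal sequential sketch of PCG with a Jacobi (diagonal) preconditioner on a small dense system, for reference; the chapter's implementation instead dispatches the matrix-vector and vector kernels across CPUs and GPUs via StarPU:

```python
def pcg(A, b, tol=1e-10, max_iter=100):
    # Preconditioned conjugate gradient on dense Python lists,
    # with M^-1 = diag(A)^-1 (Jacobi preconditioner).
    n = len(b)
    matvec = lambda M, v: [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    x = [0.0] * n
    r = b[:]                                 # residual r = b - A*x0 with x0 = 0
    z = [r[i] / A[i][i] for i in range(n)]   # apply the preconditioner
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iter):
        Ap = matvec(A, p)                    # the GPU-friendly regular kernel
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [r[i] / A[i][i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]   # symmetric positive definite test matrix
b = [1.0, 2.0]
x = pcg(A, b)                  # exact solution is [1/11, 7/11]
```

In exact arithmetic CG converges in at most n iterations; on a 2x2 system the loop terminates almost immediately, which makes this a convenient correctness check before scaling up.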

Runtime Prediction of Optimizers Using Improved Support Vector Machine

The aim of this paper is to propose a machine learning approach to build a model for predicting the runtime of optimization algorithms as a function of problem-specific instance features. That is, our method consists of building a support vector machine (SVM) model incorporating feature selection to predict the runtime of each configuration on each instance, in order to select the setting adapted to the instance. Such an approach is useful for both algorithm configuration and algorithm selection. These problems are attracting much attention, and they make it possible to benefit from the increasing volume of data for better decision making. The experiment consists of predicting algorithm performance for a well-known optimization problem using the regression form of SVM.
Abdellatif El Afia, Malek Sarhani

AND/OR Directed Graph for Dynamic Web Service Composition

Nowadays, web services have become more popular and are the preferred technology for distributed system development. However, several issues related to the dynamic nature of the web still need to be addressed, such as scalability, high complexity, high computing costs, and failure issues. It thus becomes very important to find efficient solutions for the composition of web services, capable of handling different problems, such as large quantities of services, semantics, or user constraints. In this chapter, we formalize the web service composition problem as a search problem in an AND/OR service dependency graph, where nodes represent available services and arcs represent the semantic input/output dependencies among these services. A set of dynamic optimization techniques based on redundancy analysis and service dominance is included to reduce the size of this graph, thus improving the scalability and performance of our approach. We pre-calculate all the shortest paths between each pair of the graph's nodes using a graph search algorithm; these paths are used upon receipt of a client request. The construction of the graph and the calculation of the shortest paths are done offline, removing this time-consuming task from the composition search process and thereby optimizing the composition process by reducing the computational effort when running the query. Furthermore, in addition to the sequence and fork relations, our model supports the parallel relation.
Hajar Elmaghraoui, Laila Benhlima, Dalila Chiadmi
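
The offline shortest-path precomputation can be sketched with Dijkstra's algorithm over a small hypothetical service dependency graph. The service names and edge weights below are invented for illustration, and the sketch ignores the AND/OR semantics the chapter's graph additionally carries:

```python
import heapq

def dijkstra(graph, src):
    # graph: {service: [(dependent_service, cost), ...]}
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical dependency graph: an arc S -> S' means the outputs of S
# semantically match the inputs of S'.
services = {"S1": [("S2", 1), ("S3", 4)], "S2": [("S3", 1)], "S3": []}

# Offline phase: all-pairs shortest paths by running Dijkstra from each node.
all_pairs = {s: dijkstra(services, s) for s in services}
print(all_pairs["S1"]["S3"])   # 2, via S2 rather than the direct arc of cost 4
```

At query time, resolving a request then reduces to lookups in `all_pairs`, which is what keeps the expensive graph search out of the composition path.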

An NLP Based Text-to-Speech Synthesizer for Moroccan Arabic

The ultimate goal of text-to-speech synthesis is to produce natural sounding speech from any text sequence regardless of its complexity and ambiguity. For this purpose, many approaches combine different models to text-to-speech methods in order to enhance results. Applying text-to-speech to Arabic dialects presents additional challenges, such as ambiguity of undiacritized words, and lack of linguistics resources. In this respect, the purpose of this chapter is to present a text-to-speech synthesizer for Moroccan Arabic based on NLP rule-based and probabilistic models. The chapter contains a presentation of Moroccan Arabic linguistics, an analysis of NLP techniques in general, and Arabic NLP techniques in particular.
Rajae Moumen, Raddouane Chiheb

Context-Aware Routing Protocol for Mobile WSN: Fire Forest Detection

Wireless sensor networks (WSNs) are extensively used in several fields, especially for monitoring and gathering physical information. The sensor nodes can be static or mobile, depending on the application requirements. Mobility raises major challenges, especially in routing: it radically changes the routing path, while the peculiarities of WSNs make it difficult to reuse protocols designed for other types of mobile networks. In this paper, we present a context-aware routing protocol based on particle swarm optimization (PSO) in random waypoint (RWP) based dynamic WSNs. Finally, a case study of forest fire detection is presented as a validation of the proposed approach.
Asmae El Ghazi, Zineb Aarab, Belaïd Ahiod
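
The PSO core of such a protocol can be sketched on a toy objective. In the chapter's setting the fitness function would score candidate routes (e.g. by energy or hop count); here it is the standard sphere function, and all parameter values are generic textbook choices, not the authors' configuration:

```python
import random

def pso(f, dim, n_particles=20, iters=100, bounds=(-5.0, 5.0)):
    # Minimal particle swarm optimization (minimization).
    lo, hi = bounds
    w, c1, c2 = 0.7, 1.5, 1.5   # inertia, cognitive, and social weights
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = pbest[min(range(n_particles), key=lambda i: pbest_val[i])][:]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * random.random() * (pbest[i][d] - pos[i][d])
                             + c2 * random.random() * (g[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:          # update personal best
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < f(g):              # update global best
                    g = pos[i][:]
    return g

random.seed(0)
best = pso(lambda x: sum(xi * xi for xi in x), dim=2)  # converges near [0, 0]
```

Swapping the sphere function for a route-quality metric over neighbor tables is what turns this generic optimizer into a routing decision component.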

