
2018 | Book

Big Data Analytics

Proceedings of CSI 2015

Editors: Prof. Dr. V. B. Aggarwal, Prof. Dr. Vasudha Bhatnagar, Dr. Durgesh Kumar Mishra

Publisher: Springer Singapore

Book Series: Advances in Intelligent Systems and Computing


About this book

This volume comprises the select proceedings of the annual convention of the Computer Society of India. Divided into 10 topical volumes, the proceedings present papers on state-of-the-art research, surveys, and succinct reviews. The volumes cover diverse topics ranging from communications networks to big data analytics, and from system architecture to cyber security. This volume focuses on Big Data Analytics. The contents of this book will be useful to researchers and students alike.

Table of Contents

Frontmatter
Need for Developing Intelligent Interfaces for Big Data Analytics in the Microfinance Industry

The main objective of this paper is to provide a multidimensional perspective of the microfinance industry, in which several components such as sustainable rural employment, data analysis for the microfinance industry, and Maslow’s need hierarchy theory interrelate and work hand in hand. There is a strong correlation between Maslow’s need hierarchy theory of motivation and assessing changes in the demand for financial services in the microfinance industry. The focus of this research paper is how ICT and data analytics could help in efficiently tracking changes in demand, and thus help microfinance institutions in better demand forecasting as well as in the acquisition and management of resources shared among various stakeholders. The paper begins with an introduction to the microfinance industry, followed by a literature review that explains the concepts underlying the work. Subsequent sections cover the discussion and policy implications, followed by the conclusion and future research, which focuses on IT interventions and the need for advanced, integrated systems design for efficient delivery of financial services, better policy planning, and optimized use of real-time information for analytical decision-making at the MFI level, so that the microfinance industry can achieve its goal of financial inclusion.

Purav Parikh, Pragya Singh
Unified Resource Descriptor over KAAS Framework
Refining Cloud Dynamics

With the advent of information digitization, virtual social networking, and other information-sharing protocols, billions of data items are available today on the World Wide Web from heterogeneous sources. All these data contribute to the emerging Big Data gamut. When these data are processed further, we get a glimpse of information which gives some level of understanding of the subject or matter (person, place, enterprise, etc.). Knowledge is cohesively and logically processed related information, combined with the intellect to give us a multidimensional information spectrum for real-time decision-making. In today’s global environment, data play a crucial role in understanding the social, cultural, behavioral, and demographic attributes of a subject. Knowledge-as-a-Service (KAAS) is a pioneering cloud framework inheriting “Internet of Things” principles; it extracts data from various sources in a seamless manner and can further decouple and couple logically processed information based on the “matching chromosome” algorithm. The Unified Resource Descriptor (URD) is an innovative information modeling technique that operates over the KAAS framework to publish knowledge about a subject on a need basis. Under this concept, every resource or subject is assigned a unique identifier that can drive a multilayered search in the KAAS database to extract relevant knowledge frames. In India’s context, the second most populated country in the world, URD can play an indispensable role in tightening information dynamics holistically and accumulating a broader spectrum of knowledge about a resource to address adverse situations (natural calamities, medication, insurance, etc.), business process solutions (banking, BPOs, KPOs, etc.), and research practices.

Subhajit Bhattacharya
An Adaptable and Secure Intelligent Smart Card Framework for Internet of Things and Cloud Computing

The Internet of Things (IoT) and cloud computing paradigm is the next wave in the era of digital life and in the field of Information and Communication Technology. It is understood from the literature that the integration of IoT and cloud is in its infancy and has not been extended to all application domains due to its inadequate security architecture. Hence, in this paper, a novel, adaptable, and secure intelligent smart card framework for integrating IoT and cloud computing is proposed. Elliptic Curve Cryptography is used to ensure complete protection against security risks. This model ensures security and realizes the vision of “one intelligent smart card for any application and transaction”, anywhere, anytime, with one unique ID. The performance of the proposed framework is tested in a simulated environment and the results are presented.

T. Daisy Premila Bai, A. Vimal Jerald, S. Albert Rabara
A Framework for Ontology Learning from Taxonomic Data

Taxonomy is employed in myriad areas of biological research, and though structured, taxonomic data still poses information retrieval problems. Ontology is a very powerful tool for knowledge representation, and the literature also cites the conversion of taxonomies into ontologies. Automated ontology learning was developed to ward off the knowledge acquisition bottleneck, but its limitations include text understanding, knowledge extraction, structured labelling, and filtering. Systems such as ASIUM, TEXT TO ONTO, DODDLE II, SYNDIKATE, and HASTI have their own inadequacies and do not deal exclusively with taxonomic texts. The proposed system deals with the taxonomic text available in the agricultural domain and also enhances the algorithms available there. We also propose a framework for learning from taxonomic text which overcomes the loopholes of ontologies developed from generalized texts. Finally, a framework for comparing the manually developed ontology with the automatically developed ontology is provided.

Chandan Kumar Deb, Sudeep Marwaha, Alka Arora, Madhurima Das
Leveraging MapReduce with Column-Oriented Stores: Study of Solutions and Benefits

The MapReduce framework is a powerful tool for processing large volumes of data. It is becoming ubiquitous and is generally used with column-oriented stores. It offers high scalability and fault tolerance in large-scale data processing, but certain issues still arise when accessing data from columnar stores. In this paper, we first compare the features of column stores with row stores in terms of storing and accessing data. The paper focuses on the main challenges that arise when column stores are used with MapReduce, such as data co-location, distribution, serialization, and data compression. Effective solutions to overcome these challenges are also discussed.

Narinder K. Seera, S. Taruna
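As a minimal illustration of the idea discussed in this chapter (not code from the paper), the sketch below runs a MapReduce-style aggregation over a toy column-oriented store: because each column is kept separately, a job that needs only two columns never touches the rest. The column names and data are invented for the example.

```python
# Illustrative sketch: a MapReduce pass over a toy column-oriented store.
# A columnar layout keeps each column as its own array, so a job reading
# only "country" and "sales" never deserializes any other column.
from collections import defaultdict

# Hypothetical column store: one list per column.
columns = {
    "country": ["IN", "US", "IN", "DE", "IN"],
    "sales":   [120,  300,  50,   80,   70],
}

def map_phase(store, key_col, val_col):
    """Map: emit (key, value) pairs, reading only the two needed columns."""
    for k, v in zip(store[key_col], store[val_col]):
        yield k, v

def reduce_phase(pairs):
    """Reduce: sum the values per key."""
    acc = defaultdict(int)
    for k, v in pairs:
        acc[k] += v
    return dict(acc)

totals = reduce_phase(map_phase(columns, "country", "sales"))
print(totals)  # {'IN': 240, 'US': 300, 'DE': 80}
```

In a real Hadoop deployment the map and reduce phases would run on different nodes, which is exactly where the co-location and serialization challenges named in the abstract appear.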
Hadoop: Solution to Unstructured Data Handling

Data is information about anything, and we know it will continue to grow more and more. Data in an unspecified format is unstructured data, a major component of big data. Only about 25% of existing data is in a specified, i.e., structured, format; the other 75% is unstructured. Unstructured data can be found everywhere: most people and organizations spend their working lives dealing with it. In this paper, we examine how unstructured data can be stored.

Aman Madaan, Vishal Sharma, Prince Pahwa, Prasenjit Das, Chetan Sharma
Task-Based Load Balancing Algorithm by Efficient Utilization of VMs in Cloud Computing

Although a lot of fundamental research is being carried out in the field of cloud computing, it is still in its infancy. It is the trending Internet technology of today’s era of computing, yet it suffers from various issues and challenges. This research addresses load balancing as one of the major challenges in cloud computing. The paper proposes a dynamic load balancing algorithm for the cloud computing environment and compares it with an existing algorithm. The results show that the proposed algorithm outperforms the existing one in terms of average response time, turnaround time, and total cost.

Ramandeep Kaur, Navtej Singh Ghumman
A Load Balancing Algorithm Based on Processing Capacities of VMs in Cloud Computing

Cloud Computing is a computing paradigm which has made high-performance computing accessible even to SMEs (small and medium enterprises). It provides various types of services to users in the form of hardware, software, and application platforms. The cloud computing environment is elastic and heterogeneous in nature: any number of users may join or leave the system at any point in time, which means the workload of the system increases or decreases randomly. Therefore, a load balancing system is required to ensure that the load is fairly distributed among the nodes of the system, aiming for minimum completion time and maximum resource utilization. The paper presents a load balancing algorithm based on the processing capacities of virtual machines (VMs) in cloud computing, analyses the algorithm, and identifies the research gap. It also proposes future work to overcome this gap. The simulation of the algorithm is carried out in the CloudSim simulation toolkit.

Ramandeep Kaur, Navtej Singh Ghumman
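The general idea of capacity-aware balancing can be sketched as follows; this is a simple greedy heuristic written for illustration, not the algorithm from the paper, and the task lengths and VM capacities are made-up numbers. Each task goes to the VM whose load-to-capacity ratio is currently lowest, so faster VMs absorb proportionally more work.

```python
# Sketch of capacity-proportional load balancing (illustrative only):
# every incoming task is placed on the VM with the smallest current
# load/capacity ratio.

def assign(tasks, capacities):
    """Greedily place task lengths on VMs; return per-VM loads and placements."""
    loads = [0.0] * len(capacities)
    placement = [[] for _ in capacities]
    for length in tasks:
        # VM with the lowest relative load gets the task.
        i = min(range(len(capacities)), key=lambda j: loads[j] / capacities[j])
        loads[i] += length
        placement[i].append(length)
    return loads, placement

# A VM twice as fast (capacity 2.0) ends up with most of the work.
loads, placement = assign([4, 2, 6, 3, 5], capacities=[2.0, 1.0])
print(loads)  # [15.0, 5.0]
```

A real scheduler would also account for task arrival times and VM memory, but the ratio test above captures the core of processing-capacity-based placement.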
Package-Based Approach for Load Balancing in Cloud Computing

Cloud computing is a developing technology in today’s Internet world which offers users on-demand access to resources through different service models. In spite of its many advantages over traditional computing, there are some critical issues in cloud computing. Load balancing is a crucial one: user requests must be distributed to the nodes in such a manner that the load on the nodes is balanced. A proper load balancing algorithm is required to execute processes and manage resources, with the common objectives of minimum execution time and proper utilization of resources. In this paper, we propose a new load balancing technique called the package-based load balancing algorithm. The idea is to perform load balancing using groupings of packages, and to perform virtual machine replication if a requested package is not available. Tasks are thus completed with minimum execution time and execution cost, which is profitable for both the service provider and the user.

Amanpreet Chawla, Navtej Singh Ghumman
Workload Prediction of E-business Websites on Cloud Using Different Methods of ANN

Workload forecasting for a cloud-based application depends on the type of application and on user behavior. Workload can be measured in terms of load, data storage, service rate, processing time, etc. In this paper, the authors predict the workload of an e-business website in a cloud-based environment. An ANN-based approach is used to calculate the number of cloud instances required to manage the workload efficiently, and different ANN training methods are applied for a comparative study. The MATLAB neural network toolbox is used for the simulation work, and an Amazon cloud service is used to collect data for the different parameters.

Supreet Kaur Sahi, V. S. Dhaka
Data Security in Cloud-Based Analytics

Cloud computing platforms have grown in prominence in the last few years, as they have made business applications and information accessible on the move without the need to purchase, set up, and maintain the necessary hardware and software. Organizations are churning enormous gains due to the scalability, agility, and efficiency achieved through the use of clouds. Data analytics involves voluminous data crunching to determine trends and patterns for business intelligence, scientific studies, and data mining. The incessant outburst of data from multiple sources such as web applications, social media, and other Internet-based sources motivates leveraging cloud technology for data analytics, and different strategies are being studied and incorporated to use subscription-based clouds for serving analytics systems. The paper focuses on understanding the security threats associated with cloud-based analytics and on approaches to cloud security assurance in data analytics systems.

Charru Hasti, Ashema Hasti
Ontology-Based Ranking in Search Engine

Today’s web is human readable, and its information cannot be easily processed by machines. The existing keyword-based search engines provide an efficient way to browse web content, but they do not consider the context of the user query or the web page, and they return a large result set of which very few entries are relevant to the user. Users are therefore often confronted with the daunting task of sifting through multiple pages to find the exact match. In addition, the ranking factors employed by these search engines do not take into account the context or the domain of the web page. In this paper, to rank a context-sensitive web page, a ranking factor is developed which uses the underlying ontology of the particular domain in which the page lies. The value of this factor is computed by counting the number of the ontology’s data properties present in the web page.

Rahul Bansal, Jyoti, Komal Kumar Bhatia
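A rough sketch of such a ranking factor, assuming (hypothetically) a book-domain ontology whose data properties are plain terms, could count how many of those properties occur in a page's text. The property set and page below are invented for illustration.

```python
# Illustrative ranking factor: count how many data properties of a
# domain ontology appear in a page's text. The ontology terms and the
# sample page are hypothetical.
import re

BOOK_ONTOLOGY_PROPERTIES = {"title", "author", "isbn", "publisher", "price"}

def rank_factor(page_text, properties):
    """Number of ontology data properties mentioned in the page."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    return len(words & properties)

page = "Title: Big Data Analytics. Author: V. B. Aggarwal. Price: $120."
print(rank_factor(page, BOOK_ONTOLOGY_PROPERTIES))  # 3
```

A production system would match properties against parsed HTML and use RDF/OWL property IRIs rather than bare words, but the counting principle is the same.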
Hidden Data Extraction Using URL Templates Processing

A lot of work has been carried out on the deep web, which is like a golden apple in the eyes of researchers. Most deep web search engines extract data from the deep web, store it in a database, and index it. Such techniques have the disadvantages of reduced freshness, a large repository requirement, and the need for frequent updating of the deep web database to give accurate and correct results. In order to overcome these drawbacks, we propose a new technique, “Hidden Data Extraction using URL Template Processing”, in which fresh results are fetched dynamically from the website’s server database and served to the users.

Babita Ahuja, Anuradha, Dimple Juneja
Automatic Generation of Ontology for Extracting Hidden Web Pages

The WWW consists of thousands of web pages which are hidden behind search interfaces. To retrieve those hidden web pages, a user manually fills in the details in the various fields of form pages. To extract all this hidden information automatically, the (hidden web) crawler must be intelligent enough to understand the interface and fill in the required information accurately. This process of understanding and filling forms automatically can be done easily and efficiently with the help of ontology: a database that stores semantic information about objects and their relations may serve this purpose. In this work, a novel technique for the creation of ontology from form pages is proposed and implemented.

Manvi, Komal Kumar Bhatia, Ashutosh Dixit
Importance of SLA in Cloud Computing

Cloud computing can be thought of as a service model that involves delivering hosted services over the Internet; it does not mean handling applications using local or personal resources, nor using a dedicated network, such as an office or home network, to provide the service. Both consumers and providers face challenges in spite of the many utilities cloud computing offers. A service level agreement (SLA) is a common legal document in which both parties agree to the terms and conditions for provisioning and consuming the service. Hence, the SLA plays a major role in cloud computing in accessing services as expected, within a few realistic limitations. The objective of this paper is to explain briefly the importance of the SLA in cloud computing, along with the phases of its lifecycle, its template, and its parameters. The paper also proposes a sample SLA template on whose basis service provisioning and monitoring can be carried out successfully.

Angira Ghosh Chowdhury, Ajanta Das
A Survey on Cloud Computing

Cloud computing is a technology for providing everything to clients as services over an Internet connection; using it, clients can rent the required services via web browsers. This study gives a proper definition of cloud computing and highlights the related technologies, the essential characteristics, and the cloud architecture and its components. A comparison among the three service models (SaaS, PaaS, and IaaS) as well as the deployment models (private, public, and community cloud) is given. Furthermore, the chapter covers the information security requirements of public and private clouds under the different service models. The aim of this chapter is to give researchers a clear vision of this technology, the information security requirements for private and public clouds, and the main security issues for future research.

Mohammad Ubaidullah Bokhari, Qahtan Makki, Yahya Kord Tamandani
Adapting and Reducing Cost in Cloud Paradigm (ARCCP)

The cloud computing paradigm has been gaining popularity day by day because of the enormous benefits it offers to users and providers. The upfront cost of setting up a business is greatly reduced by adopting the required delivery models, such as platform, software, or application as a service. In terms of storage, it provides formidable redundancy and guarantees at much lower prices compared to setting up one’s own data centers. These benefits come at a cost, however, and ways and means have to be adopted to reduce the recurring and non-recurring expenses involved in embracing this technology. Numerous researchers have proposed strategies and techniques for reducing cloud application development and deployment costs. A few authors have proposed efficient service discovery techniques; others have studied the impact of data transfer between virtual machines (VMs), whether both reside on one physical machine or on two different machines; yet others have worked on multitenant databases for cost reduction. For situations where returns on investment fall into the high-risk category, the Ad Hoc Cloud paradigm has been proposed.

Khushboo Tripathi, Dharmender Singh Kushwaha
Power Aware-Based Workflow Model of Grid Computing Using Ant-Based Heuristic Approach

Grid computing is treated as one of the emerging fields in distributed computing; it exploits services like the sharing of resources and the scheduling of workflows. One of the major issues in grid computing is resource scheduling. It can be handled using the ant colony optimization algorithm, implemented here in the PERMA-G framework as an extended version of our previous work. Ant colony optimization is used to reduce the energy consumption and execution time of the tasks: it follows the mechanism of natural ant colonies to compute the total execution time and power consumption of dynamically scheduled tasks. The experimental results show the performance of the proposed model.

T. Sunil Kumar Reddy, Dasari Naga Raju, P. Ravi Kumar, S. R. Raj Kumar
Image Categorization Using Improved Data Mining Technique

Image categorization is one of the important branches of artificial intelligence: it groups images according to their similarity, using various image features such as texture, color components, shape, and edges. The categorization process has several steps: image preprocessing, object detection, object segmentation, feature extraction, and object classification. Over the past few years, researchers have contributed different algorithms in the two most common machine learning categories to either cluster or classify images. The goal of this paper is to discuss two of the most popular machine learning algorithms, k-Nearest Neighbor (k-NN) for image classification and the K-means clustering algorithm, and then to propose a hybrid model of the two. These algorithms are implemented in MATLAB, and the experimental results of each algorithm are presented and discussed.

Pinki Solanki, Girdhar Gopal
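The classification stage described above can be sketched with a minimal k-NN over hand-made feature vectors (standing in for the color/texture features the chapter extracts); the labels and vectors below are invented, and the paper's own implementation is in MATLAB.

```python
# Minimal k-NN classifier over toy 3-component "color feature" vectors,
# a stand-in for the feature-extraction + classification stages.
from collections import Counter
from math import dist

# Hypothetical training set: (feature vector, category).
train = [
    ((0.9, 0.1, 0.2), "sunset"),
    ((0.8, 0.2, 0.1), "sunset"),
    ((0.1, 0.7, 0.9), "ocean"),
    ((0.2, 0.8, 0.8), "ocean"),
]

def knn(query, train, k=3):
    """Majority vote among the k training samples nearest to the query."""
    nearest = sorted(train, key=lambda s: dist(query, s[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn((0.15, 0.75, 0.85), train))  # ocean
```

The hybrid model in the chapter would first run K-means to group images, then classify within or across those groups; the distance-and-vote core shown here is shared by both stages.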
An Effective Hybrid Encryption Algorithm for Ensuring Cloud Data Security

Cloud computing is one of the hottest research topics in the IT industry nowadays. A lot of startup organizations are adopting the cloud eagerly due to the massive facilities available with minimal investment; but, as every coin has two sides, so it is with the cloud. In the cloud, user data is stored at an off-site location, so cloud data security is one of the main concerns of any organization before shifting to the cloud. Data owners can ensure data security on their own premises using firewalls, VPNs (virtual private networks), and similar widely used security options. But data owners store their sensitive data on remote cloud servers, which are not under their control, and users access the data from those servers; storing data outside the client’s premises thus raises the issue of data security, making cloud data protection one of the primary research areas in cloud computing. The strategy followed in this paper is to categorize data on the basis of sensitivity and importance, and then apply cryptography techniques: AES (a symmetric technique), SHA-1 (a hashing technique), and ECC (Elliptic Curve Cryptography, an asymmetric technique). To date, most authors have used a single key for both encryption and decryption, which is a weak target for various identified malicious attacks. Hence, in the designed hybrid algorithm, two separate keys are used for encryption and decryption. A cloud user who wants to access cloud data needs to first register with the CSP and the cloud owner. After registration, the user’s login id, password, and an OTP (one-time password) sent to the user’s registered mobile number are required to access the encrypted cloud data.

Vikas Goyal, Chander Kant
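The control flow of such a scheme (categorize by sensitivity, route to a cipher, attach an integrity digest) can be sketched with the standard library alone. The "ciphers" below are deliberately placeholder XOR keystreams, NOT secure and NOT the paper's AES/ECC; in practice one would call a vetted crypto library. Only the SHA-1 integrity step uses the real primitive.

```python
# Flow-only sketch of a sensitivity-routed hybrid scheme. The keystream
# cipher is a toy placeholder (NOT secure); a real system would use AES
# and ECC from a proper library. SHA-1 here is the genuine hashlib digest.
import hashlib

def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Placeholder cipher: XOR with a SHA-256-derived keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def protect(record: bytes, sensitivity: str, sym_key: bytes, asym_key: bytes):
    """Route high-sensitivity records to the 'asymmetric' path, others to
    the 'symmetric' one; ship a SHA-1 digest alongside for integrity."""
    key = asym_key if sensitivity == "high" else sym_key
    ct = keystream_xor(record, key)
    return ct, hashlib.sha1(record).hexdigest(), key

ct, digest, key = protect(b"account=1234", "high", b"sym-key", b"asym-key")
pt = keystream_xor(ct, key)  # XOR keystream is its own inverse
assert hashlib.sha1(pt).hexdigest() == digest
```

The point of the sketch is the routing and the integrity check, which are independent of which concrete ciphers fill the two slots.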
Big Data Analytics: Recent and Emerging Application in Services Industry

The term ‘Big Data’, first used by Roger Magoulas [1] of O’Reilly Media in 2005, covers all modern-day large data sets, and different stakeholders understand the phenomenon in multiple ways. Knowledge workers view Big Data as huge correlated data sets requiring very large computing power to analyze, process, and derive meaningful conclusions from. Big Data today is multidimensional, ranging from info-bytes from newspapers to online journals, from tweets to YouTube videos, social networking updates, and blog discussions, which any business or organization accumulates through various information channels. The huge explosion of data and the increase in Internet devices have led to the rapid rise of Big Data; the majority of the data that contributes to it comes from Internet sources or the Internet of Things. The scale involved is often captured by defining Big Data as data that legacy DBMS tools cannot load, or whose heterogeneous relationships they cannot understand; with ongoing technological advances, the cutoff size for data sets qualifying as Big Data is bound to increase. We look at some of the underlying concepts of Big Data and propose solutions in the service industry, where quality of service is a key differentiator, as compared to the traditional manufacturing industry, where the quality of finished goods or products is what matters. The primary advantage of Big Data comes from aggregating large amounts of data integrated from various sources such as CRM, social media, email, web, mobile and tablet, and other data acquisition technologies. In this paper, we review the potential applications of Big Data in the service industry and examine the Internet as the source of its decision-making data. Businesses are not interested in single data points but want to look at the trends and patterns in the data, which might boost business tremendously, and the Internet of Things will create the streams of data needed for this purpose.

Rajesh Math
An Analysis of Resource-Aware Adaptive Scheduling for HPC Clusters with Hadoop

High-Performance Computing (HPC) is one of the upcoming technologies for data-intensive and compute-intensive applications. HPC-on-Cloud is an added advantage for enhancing the efficiency of massively parallel applications, and Hadoop-MapReduce is a programming paradigm designed to process parallel data on the cloud. The key to improving the performance of Hadoop-MapReduce lies in efficient resource allocation and scheduling. In this paper, we analyze the behaviour of resource-aware adaptive scheduling, which aims to improve resource utilization in MapReduce clusters.

S. Rashmi, Anirban Basu
Analytical and Perspective Approach of Big Data in Cloud Computing

Cloud computing is a term which involves delivering services over the Internet at a lower cost. Big data refers to massive collections of data sets with huge and complicated structures which are difficult to store, analyze, and visualize; cloud computing is the technology commonly used to manage and store them. Research in this domain has increased over the past few years. In order to investigate the usage, issues, and challenges of combining big data with cloud computing, a systematic literature review was conducted, covering publications from 2004 to 2014 as the primary studies. Using the chosen search techniques, 96 research papers were found, of which 23 were identified as relevant. The paper presents the various research progresses related to big data in cloud computing and will help researchers figure out the current and future scenarios of research on big data using cloud computing technology.

Rekha Pal, Tanvi Anand, Sanjay Kumar Dubey
Implementation of CouchDB Views

A flexible data model and horizontal scalability are needs of the contemporary era for handling huge heterogeneous data, and they have led to the popularity of NoSQL databases. CouchDB is an admired and easy-to-use choice among NoSQL document-oriented databases. CouchDB is developed in the Erlang language. CouchDB’s RESTful (Representational State Transfer) APIs (Application Programming Interfaces) make it special because they allow database access through HTTP (Hypertext Transfer Protocol) requests; this access is achieved with the help of the command-line utility curl. Futon, the web-based utility of CouchDB, is also used to manage documents, databases, and replication. For querying data, CouchDB uses a system different from that of a traditional RDBMS (Relational Database Management System): views. This paper explains the various unique features of CouchDB which distinguish it from an RDBMS, and includes the implementation of temporary and permanent views using MapReduce.

Subita Kumari, Pankaj Gupta
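The map/reduce mechanics behind a CouchDB view can be simulated in a few lines: a map function emits (key, value) pairs per document and a reduce function folds the grouped values. This is a plain-Python illustration, not the chapter's code; against a real server the functions would be written in JavaScript inside a design document and queried over HTTP (e.g. with curl against a `_view` URL).

```python
# Pure-Python simulation of a CouchDB MapReduce view: map emits
# (key, value) pairs per document; reduce folds the values per key.
# The sample documents are invented for the example.

docs = [
    {"_id": "1", "genre": "nosql", "pages": 200},
    {"_id": "2", "genre": "nosql", "pages": 150},
    {"_id": "3", "genre": "sql",   "pages": 300},
]

def map_fn(doc):
    # Analogous to: function(doc) { emit(doc.genre, doc.pages); }
    yield doc["genre"], doc["pages"]

def reduce_fn(values):
    # Analogous to the built-in _sum reduce.
    return sum(values)

def run_view(docs, map_fn, reduce_fn):
    rows = {}
    for doc in docs:
        for key, value in map_fn(doc):
            rows.setdefault(key, []).append(value)
    return {key: reduce_fn(vals) for key, vals in sorted(rows.items())}

print(run_view(docs, map_fn, reduce_fn))  # {'nosql': 350, 'sql': 300}
```

CouchDB additionally persists a permanent view's B-tree index so that only changed documents are re-mapped, which is what makes views cheap to re-query.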
Evolution of FOAF and SIOC in Semantic Web: A Survey

The era of the social web has been growing tremendously. Users are allured by the new paradigms, tools, and services of the social web, whose information is produced by the sharing of beliefs, reviews, and knowledge by various online communities. Interoperability and portability of social data are among the major bottlenecks of social network applications like Facebook, Twitter, Flickr, and many more. In order to represent and integrate social information explicitly and efficiently, it is necessary to enrich it with the power of semantics. The paper is organized as follows: Section 2 describes various studies conducted in the context of the social semantic web; Section 3 introduces the concept of the social web and the issues associated with it; Section 4 describes the use of ontologies in achieving interoperability between the social and semantic web; Section 5 concludes the paper.

Gagandeep Singh Narula, Usha Yadav, Neelam Duhan, Vishal Jain
Classification of E-commerce Products Using RepTree and K-means Hybrid Approach

The paper discusses an algorithm that groups items on the basis of their attributes and then classifies the clusters. The proposed algorithm first clusters the items on the attributes available in the dataset, using K-means clustering; the clustered data is then classified using RepTree. In other words, the proposed algorithm is a hybrid of K-means clustering and RepTree classification. It is compared with the plain RepTree algorithm using the WEKA tool, over a clothing dataset downloaded from the Internet. The proposed algorithm decreases both the mean absolute error and the root-mean-square error, and the decrease in error results in more accurate classification: the proposed algorithm clusters the items and classifies them on the basis of their attributes more accurately.

Neha Midha, Vikram Singh
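The cluster-then-classify pipeline can be sketched in miniature: a tiny 1-D K-means groups items by a single attribute (price), and each cluster's majority label then acts as the classifier. The majority-label step is a deliberately simplified stand-in for the WEKA RepTree stage, and the data is invented.

```python
# Toy cluster-then-classify pipeline: 1-D K-means over prices, then a
# majority-label rule per cluster as a crude stand-in for RepTree.

def kmeans_1d(values, k=2, iters=10):
    """Lloyd's algorithm on scalars, seeded with the min and max values."""
    centroids = [min(values), max(values)][:k]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for v in values:
            i = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[i].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

prices = [5, 7, 6, 40, 45, 50]
labels = ["budget", "budget", "budget", "premium", "premium", "premium"]
centroids = kmeans_1d(prices)  # converges near [6.0, 45.0]

def classify(price):
    """Assign to nearest centroid, then vote among that cluster's labels."""
    i = min(range(len(centroids)), key=lambda j: abs(price - centroids[j]))
    members = [l for p, l in zip(prices, labels)
               if min(range(len(centroids)),
                      key=lambda j: abs(p - centroids[j])) == i]
    return max(set(members), key=members.count)

print(classify(8), classify(42))  # budget premium
```

The paper's version clusters on many attributes at once and trains a pruned regression tree per run; the error reductions it reports come from the tree seeing more homogeneous data after clustering.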
A Study of Factors Affecting MapReduce Scheduling

MapReduce is a programming model for the parallel distributed processing of large-scale data, and the Hadoop framework is an implementation of it. Since MapReduce processes data in parallel on clusters of nodes, a good scheduling technique is needed to optimize performance. The performance of MapReduce scheduling depends on various factors such as execution time, resource utilization across the cluster, data locality, compute capacity, energy efficiency, heterogeneity, and scaling. Researchers have developed various algorithms to resolve one problem or another and reach a near-optimal solution. This paper summarizes most of the research work done in this regard.

Manisha Gaur, Bhawna Minocha, Sunil Kumar Muttoo
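One of the factors listed above, data locality, can be illustrated with a toy scheduler that prefers a node already holding a task's input block and charges a transfer penalty otherwise. The node names, block placements, and cost numbers are all hypothetical; real Hadoop schedulers (e.g. delay scheduling) are far more nuanced.

```python
# Toy locality-aware scheduler: each task runs cheapest on the node that
# stores its input block; running remotely adds a transfer penalty.
# All placements and costs are made-up illustration values.

block_location = {"b1": "node1", "b2": "node2", "b3": "node1"}

def schedule(tasks, nodes, local_cost=1.0, remote_cost=3.0):
    """Greedily pick, per task, the node with the earliest finish time."""
    busy_until = {n: 0.0 for n in nodes}
    finish = {}
    for task, block in tasks:
        home = block_location[block]

        def eta(n):
            cost = local_cost if n == home else remote_cost
            return busy_until[n] + cost

        node = min(nodes, key=eta)
        busy_until[node] = eta(node)
        finish[task] = (node, busy_until[node])
    return finish

plan = schedule([("t1", "b1"), ("t2", "b2"), ("t3", "b3")],
                ["node1", "node2"])
print(plan)  # every task lands on its block's home node here
```

Note how t2 goes to node2 even though node1 is equally free: the remote penalty makes the non-local run slower, which is exactly the trade-off locality-aware schedulers optimize.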
Outlier Detection in Agriculture Domain: Application and Techniques

Outliers are values that do not comply with the general behavior of the existing data; they differ quantitatively from the rest of the data according to a given outlier-selection algorithm. Normal data values or objects follow a common generating mechanism, whereas abnormal objects deviate from it and appear to have been generated by some different mechanism. These abnormal data objects are referred to as “outliers”. In this paper, the authors explore various applications and techniques of outlier detection. Further, an algorithm for detecting outliers in the agriculture domain is proposed, and its implementation in the hand-coded ETL tool AGRETL is discussed. The results show significant improvement when the algorithm is validated on the real-time dataset.

Sonal Sharma, Rajni Jain
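As a simple example of the kind of statistical check an ETL validation step might apply, the interquartile-range (IQR) rule below flags values lying more than 1.5×IQR outside the quartiles. This is a generic textbook technique, not the paper's AGRETL algorithm, and the crop-yield figures are invented.

```python
# Classic 1.5*IQR outlier rule over a toy crop-yield column.
import statistics

def iqr_outliers(values):
    """Return the values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lo or v > hi]

yields = [2.1, 2.3, 2.2, 2.4, 2.0, 9.8, 2.2]  # tonnes/hectare (made up)
print(iqr_outliers(yields))  # [9.8]
```

In an agriculture ETL pipeline such flagged records would typically be routed for manual review rather than dropped, since an extreme yield can be a data-entry error or a genuinely exceptional plot.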
A Framework for Twitter Data Analysis

As the number of users on the Internet increases, data is growing at an exponential rate. Numerous websites generate terabytes of data daily. As e-commerce sites gain popularity, data from customers, reviews, transaction logs, web search logs, etc. is generated on a daily basis. Besides e-commerce sites, other very important sources of data on the Internet are social networking sites and web search engines. Social networking sites like Facebook and Twitter have billions of users generating petabytes of data at a high rate, heralding a new era in the field of data science: “Big Data”. In this paper, we propose a framework for analyzing Twitter data, one of the major sources of big data. Our study focuses on the political domain: the framework extracts tweets from any location and classifies them as political or not. After classification, our system also extracts the sentiments from those tweets in order to understand the emotions regarding various political issues at a particular place.

Imran Khan, S. K. Naqvi, Mansaf Alam, S. N. A. Rizvi
Web Structure Mining Algorithms: A Survey

The World Wide Web (WWW) is a massive collection of information, and due to its rapidly growing size, information retrieval becomes a more challenging task for the user. Web mining techniques such as web content mining, web usage mining, and web structure mining are used to make information retrieval more efficient. In this paper, the study focuses on web structure mining and different link analysis algorithms. Further, a comparative review of these algorithms is given.
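The best-known link analysis algorithm such surveys compare is PageRank; a compact power-iteration sketch (with a toy three-page graph for illustration) follows:

```python
# Power-iteration PageRank, the canonical link-analysis algorithm.
# 'links' maps each page to its outgoing links; damping d = 0.85.

def pagerank(links, d=0.85, iters=50):
    """Return {page: rank} for a link graph {page: [outgoing pages]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        nxt = {p: (1 - d) / n for p in pages}  # teleportation share
        for p, outs in links.items():
            if outs:
                share = d * rank[p] / len(outs)
                for q in outs:
                    nxt[q] += share
            else:  # dangling page: spread its rank evenly
                for q in pages:
                    nxt[q] += d * rank[p] / n
        rank = nxt
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # c — it receives the most link mass
```

HITS, the other algorithm usually reviewed alongside it, iterates hub and authority scores instead of a single rank vector, but the same fixed-point style applies.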

Neha Tyagi, Santosh Kumar Gupta
Big Data Analytics via IoT with Cloud Service

This paper conceptualizes the origin of data accumulated with the help of the Internet of Things, along with its analysis, storage, and security, and the measures taken to meet market demands with the help of cloud services. Big data helps organizations market effectively by reaching out on communally noteworthy issues. Big data is centered on developing scalable, highly available computing, analytics, and governance in a model through which organizations in a variety of fields, such as health care, can benefit.

Saritha Dittakavi, Goutham Bhamidipati, V. Siva Krishna Neelam
A Proposed Contextual Model for Big Data Analysis Using Advanced Analytics

Big Data has numerous issues related to its three primary defining characteristics, the three V's: Variety, Volume, and Velocity. A greater segment of Big Data is attributed to semi-structured or unstructured text that emanates from social interactions on the web, emails, tweets, blogs, etc. Conventional approaches are overwhelmed by the data deluge and fall short. These challenges consequently create scope for research into models that analyze data and extract actionable insights to realize the fourth V, i.e., Value. The purpose of this paper is to propose a contextual model for Resume Analytics that utilizes Semantic technologies and analytic (descriptive, predictive, and prescriptive) procedures to find a befitting match between a job and candidate(s). The related work, issues and challenges, and design requirements are presented along with a discussion of the analytical framework for the opted use case.

Manjula Ramannavar, Nandini S. Sidnal
Ranked Search Over Encrypted Cloud Data in Azure Using Secure K-NN

Commercial cloud service providers are approached to maintain huge data stores and run applications on their platforms. But when sensitive data is outsourced, data owners expect their data to remain private even from the cloud management. To incorporate privacy and security in such cases, encrypted cloud services are of supreme importance. Unfortunately, traditional encryption methods are not suitable for this purpose. So here we define a method by which cloud data can be encrypted while still allowing retrieval of the most relevant documents. Based on secure kNN computation, this method evaluates a similarity measure for the user's search keyword over encrypted data. The method includes encryption of documents with a random matrix before outsourcing them to the cloud environment. For enriched privacy, keywords are also encrypted, and finally, the top-k documents are retrieved. The approach is compared across multiple similarity measures, such as Euclidean, Manhattan, and Cosine distances, to find the most relevant document for the user's query.
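The core trick behind secure kNN schemes of this family is that an invertible key matrix M can encrypt document vectors as M^T·p and queries as M^(-1)·q while preserving inner products, since (M^T p)·(M^(-1) q) = p·q, so similarity ranking still works on ciphertexts. A tiny 2x2 sketch (real schemes use large random matrices and additional splitting, which this omits):

```python
# Sketch of the inner-product-preserving encryption behind secure kNN:
# documents encrypted with M^T, queries with M^{-1}. The 2x2 matrix is
# hand-picked for illustration; real schemes use large random matrices.

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

M = [[2.0, 1.0], [1.0, 1.0]]          # invertible key matrix
M_inv = [[1.0, -1.0], [-1.0, 2.0]]    # its inverse

doc = [3.0, 4.0]      # plaintext document feature vector
query = [1.0, 2.0]    # plaintext query vector

enc_doc = mat_vec(transpose(M), doc)    # what the cloud stores
enc_query = mat_vec(M_inv, query)       # what the search sends

# (M^T p) . (M^{-1} q) = p^T (M M^{-1}) q = p . q
print(dot(enc_doc, enc_query), dot(doc, query))  # 11.0 11.0
```

Because the preserved quantity is the inner product, cosine-style relevance ranking of encrypted documents against an encrypted query needs no decryption at the server.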

Himaja Cheruku, P. Subhashini
DCI3 Model for Privacy Preserving in Big Data

Big Data is like a hot cake in the market: it is an important topic of discussion in areas such as marketing management, scientific research, and security. Nowadays, everyone is curious about the privacy risks of big data and the legal issues that emerge from breaches of privacy. In this paper, we have tried to cover the major privacy and legal issues specific to Big Data. We summarize different methods of tackling breaches of privacy. We also propose the DCI3 legal model to reduce the legal problems surrounding the security of data and information. Some legal cases from different areas specific to Big Data are presented at the end.

Hemlata, Preeti Gulia
Study of Sentiment Analysis Using Hadoop

In the current world of the Internet, people express themselves and present their views and feelings about specific topics or entities using various social media applications. These posts present a huge opportunity for organizations to increase their market value by analyzing them and using the information in decision making. The posts can be studied using various machine learning and lexicon-based approaches to extract their sentiments. With more and more people moving to the Internet, huge amounts of data are produced every second, and the challenge is to store this data and process it efficiently in real time to infer knowledge from it. This paper presents different approaches for real-time, scalable, and time-efficient sentiment analysis using Hadoop. Hadoop and its component tools, such as MapReduce, Mahout, and Hive, are surveyed across scholarly articles for this paper.

Dipty Sharma
OPTIMA (OPinionated Tweet Implied Mining and Analysis)
An Innovative Tool to Automate Sentiment Analysis

The ramifications of prevalent social media usage have recently directed research into "Sentiment Analysis," producing a potpourri of interesting results to analyze and apply in diverse domains. Among the several forms of social media, "Twitter" is the most popular micro-blogging platform. In this context, "sentiment analysis" refers to sorting users' opinions expressed on Twitter into positive and negative classes. To exemplify and establish the essence, significance, and incidence of opinion mining, a corpus of tweets on "Windows 10" has been collected and built, with the objective of aiding the customer feedback loop during its beta release. This paper focuses on the significance of sentiment analysis in a reasoned manner by epitomizing the working methodology of opinion mining techniques via an automated implementation. To this end, an innovative tool named OPTIMA (OPinionated Tweet Implied Mining and Analysis) ver. 1.0.0 has been developed to automate the process of sentiment analysis, and the results are presented graphically in an analytical and comprehensive manner.

Ram Chatterjee, Monika Goyal
Mobile Agent Based MapReduce Framework for Big Data Processing

This chapter gives information regarding Big Data, the MapReduce framework, and stragglers in a MapReduce network: their current situation, impact, and scope today. The paper proceeds with information about the MapReduce strategy for Big Data handling and the presence of stragglers in a MapReduce network. Further, the significance of mitigating stragglers is discussed, along with their effects. The paper also introduces mobile agent technology for processing Big Data using the MapReduce system and presents its implementation results.
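The MapReduce flow the chapter builds on (map, shuffle, reduce) can be sketched in-process with the classic word count; the mobile-agent layer and straggler mitigation the paper proposes sit on top of this basic model and are not shown:

```python
# Minimal in-process MapReduce sketch (word count) illustrating the
# map -> shuffle -> reduce flow. A real cluster distributes each phase.

def map_phase(records):
    """Map: emit (word, 1) for every word in every input line."""
    for line in records:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group emitted values by key."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the grouped counts per word."""
    return {key: sum(values) for key, values in groups.items()}

data = ["big data big compute", "big data"]
print(reduce_phase(shuffle(map_phase(data))))
# {'big': 3, 'data': 2, 'compute': 1}
```

A straggler, in these terms, is a map or reduce worker that finishes its partition far later than its peers, delaying the whole job; mitigation schemes re-execute or re-place such slow partitions.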

Umesh Kumar, Sapna Gambhir
Review of Parallel Apriori Algorithm on MapReduce Framework for Performance Enhancement

Finding frequent itemsets in large transactional databases is considered one of the most significant issues in data mining. Apriori is one of the popular algorithms widely used to address it. However, it lacks the computing power to deal with large data sets. Various modified Apriori-like algorithms that work on distributed platforms have been proposed to enhance the performance of the traditional Apriori algorithm. Developing efficient, fast algorithms to handle large data sets is a challenging task due to load balancing, synchronization, and fault-tolerance issues. To overcome these problems, the MapReduce model, originally introduced by Google, came into existence. MapReduce-based parallel Apriori algorithms find the frequent itemsets in large data sets using a large number of computers in a distributed computational environment. In this paper, we focus mainly on the parallel Apriori algorithm and its different versions, organized by the approaches used to implement them. We also explore the current major open issues and extensions of the MapReduce framework, along with future research directions.
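One round of the parallel Apriori idea can be sketched as follows: each mapper counts candidate k-itemsets in its own partition of the transactions, and the reducer sums the partial counts and prunes itemsets below the minimum support. This single-process sketch is illustrative; the surveyed algorithms differ mainly in how candidates are generated and partitions balanced:

```python
# One MapReduce-style round of parallel Apriori: per-partition candidate
# counting (map) and global support filtering (reduce). Illustrative,
# single-process; a real job runs mappers on separate nodes.

from itertools import combinations

def map_count(partition, k):
    """Count every k-itemset occurring in this partition's transactions."""
    counts = {}
    for transaction in partition:
        for itemset in combinations(sorted(transaction), k):
            counts[itemset] = counts.get(itemset, 0) + 1
    return counts

def reduce_counts(mapper_outputs, min_support):
    """Sum partial counts and keep itemsets meeting min_support."""
    total = {}
    for counts in mapper_outputs:
        for itemset, c in counts.items():
            total[itemset] = total.get(itemset, 0) + c
    return {i: c for i, c in total.items() if c >= min_support}

part1 = [{"milk", "bread"}, {"milk", "eggs"}]
part2 = [{"milk", "bread"}, {"bread"}]
frequent_pairs = reduce_counts([map_count(part1, 2), map_count(part2, 2)], 2)
print(frequent_pairs)  # {('bread', 'milk'): 2}
```

The Apriori property (every subset of a frequent itemset is frequent) is what lets later rounds generate only candidates built from the previous round's survivors.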

Ruchi Agarwal, Sunny Singh, Satvik Vats
A Novel Approach to Realize Internet of Intelligent Things

In this era of emerging technologies, the Internet of Things is poised to change how the world works. Today, a new generation of human beings has evolved that loves to be always "connected" to the other parts of the world, publishing their opinions on social networks and tweeting their sentiments on Twitter. Human beings are surrounded by physical objects. Is there a way for these surrounding objects to become part of the human community? The answer is YES: the Internet of Things, where everything is connected to everything else. But merely connecting the "Things" is not the whole answer. In this paper, we propose a framework in which things are not mere abstractions but are "intelligent" like human beings: they have the intelligence to take decisions on their own, and they can post their sentiments on social networks. We implement a system that can automatically contact the manufacturer, block a calendar slot for maintenance, and drop an e-mail to the owner to make a credit card payment, so that maintenance is done before the device breaks.

Vishal Mehta
An Innovative Approach of Web Page Ranking Using Hadoop- and Map Reduce-Based Cloud Framework

In this era of Big Data, searching and ranking Web pages efficiently on the WWW to satisfy the needs of the modern-day user is undoubtedly a major challenge for search engines. In this paper, we propose an innovative algorithm based on a Hadoop MapReduce-supported cloud computing framework that can be implemented as a Meta Search and Page Ranking Tool to efficiently search and rank Big Data available on the WWW, which is increasing at the scale of megabytes to terabytes per day. An extensive experimental evaluation shows that the average ranking precision of the proposed algorithm and Meta tool is better than that of other popular search engines.

Dheeraj Malhotra, Monica Malhotra, O. P. Rishi
SAASQUAL: A Quality Model for Evaluating SaaS on the Cloud Computing Environment

Cloud computing is a technology that emerged in the last decade and is hugely transforming the IT industry. It plays a vital role as a backbone component of the Internet of Things (IoT). In a cloud computing scenario, cloud services are accessible via the Internet. Cloud computing provides on-demand resources like infrastructure, platform, and software; customers do not pay to own the software itself but rather to use it. The pay-per-use concept is very attractive, and hence many organizations are rapidly adopting the SaaS model. However, each customer is unique, which leads to unique variations in software requirements. SaaS is widely pressed into service, and it yields advantages to both service providers and service customers. As more and more SaaS services emerge, how to select a qualified service becomes a key problem for customers. Present quality models are insufficient for evaluating SaaS selection on the cloud, given its tremendous increase in use. A quality model can be used to represent, evaluate, and differentiate the quality of SaaS providers. In this paper, a new quality model for cloud software services, named SAASQUAL, is proposed. The model is based on different attributes of software quality and service quality, with metrics that measure both, in order to evaluate potential software as a service on the cloud.

Dhanamma Jagli, Seema Purohit, N. Subhash Chandra
Scalable Aspect-Based Summarization in the Hadoop Environment

In the present-day scenario, selecting a good product is a cumbersome process. The reviews on shopping sites may confuse the user while purchasing a product. It is hard for customers to go through all the reviews, and even when they do, they may end up baffled. Some consumers may like to buy the best product based on its features and extra comforts. Meanwhile, the datasets to analyze are so large that traditional systems cannot handle them. In order to handle these large datasets, we propose a parallel approach using a Hadoop cluster for extracting features and opinions. Then, using an online sentiment dictionary and an interaction information method, we predict the sentiments, followed by summarization using clustering. After classifying the opinion words, our summarization system generates an easily readable summary for a particular product based on its aspects.

Kalyanasundaram Krishnakumari, Elango Sivasankar
Parallel Mining of Frequent Itemsets from Memory-Mapped Files

Due to the digitization of data in different fields, data is increasing by leaps and bounds. Mining these large amounts of data raises two major issues. The first is the capacity to deal with huge data, which can be addressed with parallel algorithms, as serial algorithms may take a very long time or fail to process the data at all. The second is the I/O overhead, which can be addressed by memory-mapping files. This chapter brings both concepts, parallelization and memory mapping of files, together in mining frequent itemsets. Our experiments showed almost 20% more speedup when parallelizing our frequent itemset mining algorithm with memory mapping compared to conventional I/O without memory mapping.
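The memory-mapping half of the idea can be sketched with Python's `mmap`: the transaction file is mapped into the address space and scanned without explicit buffered reads. The file name and comma-separated format here are illustrative, not the chapter's actual dataset:

```python
# Sketch of scanning a transaction file through a memory map instead of
# buffered reads — the I/O technique the chapter pairs with parallel
# itemset counting. File name and format are illustrative.

import mmap
import os

path = "transactions.txt"
with open(path, "w") as f:
    f.write("milk,bread\nmilk,eggs\nbread\n")

counts = {}
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for line in iter(mm.readline, b""):      # walk the mapped bytes
            for item in line.decode().strip().split(","):
                if item:
                    counts[item] = counts.get(item, 0) + 1

os.remove(path)
print(counts)  # {'milk': 2, 'bread': 2, 'eggs': 1}
```

With the file mapped, several worker processes can each scan a disjoint byte range of the same mapping, which is where the parallel speedup reported in the chapter comes from.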

T. Anuradha
Handling Smurfing Through Big Data

Money laundering is a worrying term for every country's economy these days. Leading economists of all major developed and developing economies are concerned with devising methods to prevent it. A country's economy is weakened by the impact of money laundering. Networks created between banks in different countries facilitate online money transfer, turning money laundering into digital money laundering and letting launderers perform wired transactions from anywhere. People involved in money laundering are efficiently using online banking as their weapon; having online bank accounts makes it easier for them to evade anti-money-laundering agencies. Such people misuse technology and thereby restrict their own country's economic progress. But with the help of recently developed technologies, we are able to prevent such illegal activities. Scrutinizing all transactions and investigating them manually at financial intelligence units is a cumbersome task, because petabytes of transactions take place each day. Advanced technologies like Big Data enable us to detect suspicious customers possibly involved in money laundering. In this paper, we propose a methodology using big data to detect smurfing, based on which suspicious people involved in money laundering may be identified and appropriate action taken against them.
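The basic smurfing (structuring) heuristic is that many deposits, each kept just under the reporting threshold, together exceed it. A minimal sketch of that rule follows; the threshold, window, and data are illustrative, not the paper's methodology:

```python
# Sketch of a structuring ("smurfing") heuristic: flag accounts whose
# sub-threshold deposits within one time window sum past the reporting
# threshold. Threshold value and transactions are illustrative.

REPORT_THRESHOLD = 10_000

def suspicious_accounts(transactions, min_count=3):
    """transactions: list of (account, amount) within one time window."""
    sums, counts = {}, {}
    for account, amount in transactions:
        if amount < REPORT_THRESHOLD:  # only just-under-threshold deposits
            sums[account] = sums.get(account, 0) + amount
            counts[account] = counts.get(account, 0) + 1
    return [a for a in sums
            if sums[a] >= REPORT_THRESHOLD and counts[a] >= min_count]

txns = [("acc1", 9500), ("acc1", 9800), ("acc1", 9700),
        ("acc2", 400), ("acc2", 120), ("acc3", 15000)]
print(suspicious_accounts(txns))  # ['acc1']
```

Note that acc3's single large deposit is not smurfing (it would be reported directly); the pattern of interest is specifically the many-small-deposits shape that acc1 shows.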

Akshay Chadha, Preeti Kaur
A Novel Approach for Semantic Prefetching Using Semantic Information and Semantic Association

The exponential growth of web accesses on the Internet causes substantial delays in providing services to the user. Web prefetching is an effective solution that can improve the performance of the web by reducing the latency perceived by the user. Content on a web page also provides meaningful data for predicting future requests. This paper presents a content-based semantic prefetching approach. The proposed approach works on the semantic preferences of the tokens present in the anchor text associated with URLs. To make more accurate predictions, it also uses the semantic information explicitly embedded with each link. It then computes the semantic association between tokens and links and assigns weights in order to improve prediction accuracy. This prefetching scheme is most effective for long browsing sessions and achieves a good hit rate.
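The anchor-text scoring step could look roughly like this sketch, where the preference weights stand in for whatever the paper learns from the browsing session; links, tokens, and weights are all hypothetical:

```python
# Sketch of anchor-text-driven prefetch scoring: each candidate link is
# scored by how well its anchor tokens match the user's semantic
# preferences, and the top-scoring link is prefetched. The preference
# weights are illustrative stand-ins for learned values.

preferences = {"cricket": 0.9, "score": 0.7, "politics": 0.1}

def prefetch_score(anchor_text):
    """Average preference weight over the anchor's tokens."""
    tokens = anchor_text.lower().split()
    return sum(preferences.get(t, 0.0) for t in tokens) / max(len(tokens), 1)

links = {
    "/live-score": "cricket live score",
    "/news": "politics news today",
}
best = max(links, key=lambda url: prefetch_score(links[url]))
print(best)  # /live-score
```

Prefetching the highest-scoring link during idle time is what converts the prediction into the reduced perceived latency the paper targets.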

Sonia Setia, Jyoti, Neelam Duhan
Optimized Cost Model with Optimal Disk Usage for Cloud

The cloud is a bag full of resources. Using cloud services at an optimal level is essential now that the cloud is the primary technology for deployment over the Internet; using resources efficiently is what makes the cloud a better place. The cloud provides all the computing resources one may need to complete tasks, but using those resources efficiently increases its capacity to accommodate more consumers and also lets consumers save on the cost of the services they subscribe to. This paper provides a mechanism to increase or decrease the subscription as per use.

Mayank Aggrawal, Nishant Kumar, Raj Kumar
Understanding Live Migration Techniques Intended for Resource Interference Minimization in Virtualized Cloud Environment

Cloud computing has consolidated into an environment that allows concurrent execution of various cloud applications from different organizations on a shared pool of resources. Each cloud user is provided with a virtual machine for further interaction with the cloud architecture components. Effective management of these virtual machines, with a satisfactory level of SLA, is a major challenge. Due to resource overbooking on the physical host, running virtual machines need to be migrated from a source to a destination host. The migrated machine may disrupt other virtual machines running on the destination host, which can degrade application performance. This paper provides insight into existing interference-aware live virtual machine migration techniques. A taxonomy of resource interference is also introduced. The paper further contains a comparative study of the performance assessment matrices, issues resolved, and mathematical models used by available live migration techniques, which can be key inputs when making live migration decisions. This paper is useful to cloud architects and to researchers working on automated live VM migration decision support systems that aim to achieve higher SLA satisfaction by providing maximum quality-of-service parameters.

Tarannum Bloch, R. Sridaran, CSR Prashanth
Cloud Security Issues and Challenges

Cloud computing is open to all, from any location on the globe, to utilize services and resources on demand. Nowadays, any organization can easily migrate its entire system to the cloud, since the cloud offers pay-as-you-go service. The cloud has benefits like multi-tenancy, data storage, resource pooling, and, very prominently, virtualization. Despite these advantages, cloud computing also has security flaws, such as loss of sensitive data, data leakage, cloning, and other security challenges related to virtualization. Because the security challenges of the cloud are so significant, a large area of study is required to identify risks in the service and deployment models of the cloud. This study presents the cloud security problems in numerous cloud-related fields and the threats related to the cloud model and cloud network. The paper also addresses several issues related to virtualization and their side effects.

Dhaivat Dave, Nayana Meruliya, Tirth D. Gajjar, Grishma T. Ghoda, Disha H. Parekh, R. Sridaran
A Novel Approach to Protect Cloud Environments Against DDOS Attacks

Virtualization, regarded as the backbone of cloud computing, provides cost-effective resource sharing. Owing to the existence of multiple virtual machines, it poses several challenges. Among the numerous virtualization attacks, the Distributed Denial of Service attack, widely known as the DDoS attack, is considered the most momentous. These attacks consume large amounts of server resources and thereafter deny access to genuine users. The impact of DDoS attacks is greater in cloud computing, since sharing of resources is the innate character of a cloud. There is some literature dealing with the mitigation of DDoS attacks using a single-server tactic; nonetheless, those approaches suffer from poor response time. The model proposed in this paper uses multiple servers to alleviate DDoS. Initial tests of the proposed model have shown better scalability and protection against further attacks.

Nagaraju Kilari, R. Sridaran
An Approach for Workflow Scheduling in Cloud Using ACO

Clouds have emerged as a new model for service provisioning in heterogeneous distributed systems. In this model, users obtain their required Quality of Service from service providers through service level agreements. In addition, cloud resources are heterogeneous in nature. A workflow is made up of heterogeneous tasks in terms of their length, runtime, and input and output data. Hence, the cloud is an excellent computing environment for scientific workflows. The workflow scheduling problem, considering the parameters of makespan, cost, and resource utilization, is one of the interesting problems in the cloud. In this paper, we propose steps to map workflow tasks to cloud resources using ACO (Ant Colony Optimization), attempting to minimize the makespan and resource cost and to maximize resource utilization.
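A toy version of the ACO mapping step might look like this: ants sample task-to-VM assignments in proportion to pheromone, the makespan of each sample is evaluated, and pheromone on the best assignment is reinforced. Task lengths, VM speeds, and ACO parameters are all illustrative, not the paper's configuration:

```python
# Toy Ant Colony Optimization for mapping workflow tasks to VMs so the
# makespan (finish time of the most loaded VM) is minimized. All
# parameters are illustrative; the paper's ACO also weighs cost.

import random

tasks = [4.0, 8.0, 2.0, 6.0]   # task lengths (arbitrary units)
speeds = [1.0, 2.0]            # VM processing speeds

def makespan(assign):
    load = [0.0] * len(speeds)
    for t, vm in zip(tasks, assign):
        load[vm] += t / speeds[vm]
    return max(load)

def aco(ants=10, iters=30, rho=0.5, seed=1):
    rng = random.Random(seed)
    # pher[t][vm]: desirability of running task t on vm
    pher = [[1.0] * len(speeds) for _ in tasks]
    best, best_cost = None, float("inf")
    for _ in range(iters):
        for _ in range(ants):
            assign = []
            for t in range(len(tasks)):        # roulette-wheel selection
                weights = pher[t]
                r = rng.uniform(0, sum(weights))
                vm, acc = 0, weights[0]
                while acc < r:
                    vm += 1
                    acc += weights[vm]
                assign.append(vm)
            cost = makespan(assign)
            if cost < best_cost:
                best, best_cost = assign, cost
        for t, vm in enumerate(best):          # evaporate, then reinforce
            for v in range(len(speeds)):
                pher[t][v] *= (1 - rho)
            pher[t][vm] += 1.0 / best_cost
    return best, best_cost

assignment, cost = aco()
print(assignment, cost)
```

The pheromone update is what distinguishes ACO from pure random search: good assignments bias later ants toward similar task placements.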

V Vinothina, R Sridaran
Data Type Identification and Extension Validator Framework Model for Public Cloud Storage

Cloud online storage is one of the most sensational and sensitive topics among cloud users and cloud service providers. Cloud users can store and retrieve their data in different file formats, such as document, audio, video, image, and compressed files. Most cloud service providers do not provide even a basic level of encryption for data stored online, nor do they take care of file-format issues. Some users may store files without a file extension, and that type of data file may raise privacy issues for users. This paper discusses and proposes a Data Type Identification and Extension Validator (DTI&EV) framework model for cloud storage to avoid privacy issues and other regulatory issues related to data files.

D. Boopathy, M. Sundaresan
Robust Fuzzy Neuro system for Big Data Analytics

Big Data is the name given to the relationship between data size and its processing speed. These days, it is a great challenge to construct an architecture that extracts information economically from huge, diverse volumes of data at a significant rate, so there is a need for cost-effective and time-efficient solutions to the major challenges of fast-growing volume and uncertainty. This paper reviews big data analytics, its tools, and its application areas. It also presents uncertainty issues related to Big Data, for which we provide a solution by combining fuzzy and neural network concepts to assemble an intelligent system, ANFIS, whose accumulated characteristics relate knowledge representation, uncertainty, and modeling of the key features of big data to provide an optimal solution. The combined intelligent system is proposed to solve complex problems in the big data domain, giving superior modeling and computation to tackle uncertainty issues.

Ritu Taneja, Deepti Gaur
Deployment of Cloud Using Open-Source Virtualization: Study of VM Migration Methods and Benefits

Cloud computing has become a buzzword in the field of Information Technology today. It increases machine potential in terms of computing, using virtualization as a core technology. Virtualization is the core of cloud computing: the creation of virtual machines provides scalability and portability by hosting the components of different applications. Requirements in the cloud environment are dynamic; therefore, there is always a need to move virtual machines within the same cloud or between different clouds. The goal of this paper is to conduct an experiment on the deployment of a cloud using open source software and to show virtual machine (VM) migration between different hosts within a cloud in different scenarios. For secure migration of VMs, we used the secure shell method and compared open-source virtualization with other technologies available in the market. The experiment for this study was conducted at the computer service center of IIT Delhi on their private cloud "Baadal."

Garima Rastogi, Satya Narayan, Gopal Krishan, Rama Sushil
Implementation of Category-Wise Focused Web Crawler

The size of the World Wide Web is increasing rapidly and has reached a point where it is difficult to handle and manage such an amount of information. Search engines are used to gather, index, and make available the information across the web. A web crawler is an important part of a search engine that finds all this information. As the size of the web is beyond our imagination, a user needs and focuses on only the relevant information available on the web. A focused crawler is a crawler that gives only relevant information to users and discards information that is not relevant. The objective of this paper is to implement a category-wise focused web crawler so that the user can get focused, relevant information.

Jyoti Pruthi, Monika
MAYA: An Approach for Energy and Cost Optimization for Mobile Cloud Computing Environments

Mobile Cloud Computing (MCC) is the latest paradigm shift to cope with the inherent limitations of smart mobile devices (SMDs). Various strategies have been proposed by the research community to counter these limitations, but current solutions focus mainly on energy and resource optimization, while price benefits remain unexplored. This paper presents a three-tier Cloud model, MAYA (mobile agility augmentation), to optimize energy and cost savings for MCC users while at the same time reducing the monetary cost for service providers. It categorizes users into different price-paying categories, viz., maximum, medium, and low price-paying users, based on remaining battery, and focuses on augmenting the execution of compute-intensive mobile workflow applications using Cloud resources, more commonly known as offloading. The paper also presents two scheduling techniques for the realization of such a system and shows the effectiveness of MAYA in minimizing the SaaS (Software as a Service) provider's monetary cost as well as the service user's cost.

Jitender Kumar, Amita Malik
Load Balancing in Cloud—A Systematic Review

Cloud computing is an upcoming technology, recently introduced in the field of IT for delivering services hosted over the Internet. It is an amalgamation of Grid computing, Utility computing, and Autonomic computing, and utilizes the concept of virtualization. It provides on-demand service to users for accessing resources, information, and software as per their needs. With increased popularity, there has been a tremendous increase in users' demands for services, which can be fulfilled by effective load balancing techniques. Load balancing allows an even distribution of workload across the nodes in the cloud and aims to provide efficient utilization of resources, improving system performance and minimizing resource consumption, resulting in low energy usage. In this paper, load balancing techniques proposed by researchers are discussed and studied, and a comparative analysis is provided based on certain parameters.
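One of the simplest policies such surveys compare is least-connections, which routes each new request to the node with the lightest current load; node names and loads in this sketch are illustrative:

```python
# Minimal sketch of the least-connections load-balancing policy: send
# each new request to the node with the fewest active requests.
# Node names and load figures are illustrative.

def least_loaded(loads):
    """loads: {node: active requests}; return the node to receive work."""
    return min(loads, key=loads.get)

loads = {"node1": 5, "node2": 2, "node3": 7}
target = least_loaded(loads)
loads[target] += 1      # the chosen node takes the new request
print(target, loads)    # node2 {'node1': 5, 'node2': 3, 'node3': 7}
```

Round-robin, weighted variants, and dynamic schemes that also account for CPU or energy differ only in the selection function; the dispatch loop is the same.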

Veenita Kunwar, Neha Agarwal, Ajay Rana, J. P. Pandey
Cloud-Based Big Data Analytics—A Survey of Current Research and Future Directions

The advent of the digital age has led to a rise in different types of data with every passing day. In fact, it is expected that half of the total data will be on the cloud by 2016. This data is complex and needs to be stored, processed, and analyzed for information that can be used by organizations. Cloud computing provides an apt platform for big data analytics in view of the storage and computing requirements of the latter. This makes cloud-based analytics a viable research field. However, several issues need to be addressed and risks need to be mitigated before practical applications of this synergistic model can be popularly used. This paper explores the existing research, challenges, open issues, and future research direction for this field of study.

Samiya Khan, Kashish Ara Shakil, Mansaf Alam
Fully Homomorphic Encryption Scheme with Probabilistic Encryption Based on Euler’s Theorem and Application in Cloud Computing

Homomorphic encryption is an encryption scheme that allows operations on encrypted data and produces the same result as performing those operations on the plaintext. Homomorphic encryption can be used to enhance the security of untrusted systems that manipulate and store sensitive data. Therefore, homomorphic encryption can be used in a cloud computing environment to ensure the confidentiality of processed data. In this paper, we propose a fully homomorphic encryption scheme with probabilistic encryption for better security in cloud computing.
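The homomorphic property that Euler's theorem underpins can be illustrated with textbook RSA, which is multiplicatively homomorphic: multiplying two ciphertexts and decrypting yields the product of the plaintexts. This is only the basic property, with tiny textbook parameters; the paper's scheme is probabilistic and fully homomorphic, which this sketch does not show:

```python
# Multiplicative homomorphism of textbook RSA, a consequence of
# Euler's theorem: E(a) * E(b) mod n decrypts to a * b mod n.
# Tiny demo parameters only; never use these key sizes in practice.

p, q = 61, 53
n = p * q                      # 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17
d = pow(e, -1, phi)            # modular inverse of e (Python 3.8+)

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

a, b = 7, 9
product_cipher = (enc(a) * enc(b)) % n   # multiply ciphertexts only
print(dec(product_cipher))               # 63, i.e. a * b
```

A fully homomorphic scheme additionally supports addition on ciphertexts (and arbitrary compositions of both), which is what makes general computation over encrypted cloud data possible.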

Vinod Kumar, Rajendra Kumar, Santosh Kumar Pandey, Mansaf Alam
Big Data: Issues, Challenges, and Techniques in Business Intelligence

During the last decade, the most challenging problem the world has envisaged is the big data problem: data is growing at a much faster rate than computational speeds. It is the result of the fact that storage is getting cheaper day by day, so individuals as well as almost all business and scientific organizations are storing more and more data. Social activities, scientific experiments, and biological explorations, along with sensor devices, are great contributors of big data. Big data is beneficial to society and business, but at the same time it brings challenges to the scientific community. Existing traditional tools, machine learning algorithms, and techniques are not capable of handling, managing, and analyzing big data, although various scalable machine learning algorithms, techniques, and tools (e.g., the Hadoop and Apache Spark open source platforms) are prevalent. In this paper, we identify the most pertinent issues and challenges related to big data and present a comprehensive comparison of various techniques for handling the big data problem.

Mudasir Ahmad Wani, Suraiya Jabin
Cloud Computing in Bioinformatics and Big Data Analytics: Current Status and Future Research

Bioinformatics research involves huge amounts of data that are complex in nature, and it requires the analysis of huge data sets. Conventional techniques used in bioinformatics take a lot of time to produce results, and the complex nature of the data makes it difficult to analyze. Therefore, machines with huge processing capabilities are required, escalating the amount of money needed to do research in the bioinformatics field. The problems bioinformatics researchers face in carrying out their research economically and quickly can be solved with the help of cloud computing concepts; thus, cloud computing is a boon for bioinformatics research. In this paper, we discuss how cloud computing can help bioinformatics researchers, ultimately acting as a stepping stone towards big data analytics. The paper also explains the current state of the art in bioinformatics and big data analytics and the potential future research issues that need to be addressed.

Kashish Ara Shakil, Mansaf Alam
Generalized Query Processing Mechanism in Cloud Database Management System

This is an epoch of Big Data, cloud computing, and cloud database management techniques. Traditional database approaches are not suitable for such colossal amounts of data. To overcome the limitations of RDBMSs, MapReduce code can be considered a probable solution for processing such huge amounts of data, providing both scalability and reliability. To date, users work comfortably with traditional database systems such as SQL, MySQL, Oracle, and DB2, and are not aware of MapReduce. In this paper, we propose a model that can convert any RDBMS query into MapReduce code. We also apply an optimization technique that can improve the performance of this hybrid approach.

Shweta Malhotra, Mohammad Najmud Doja, Bashir Alam, Mansaf Alam
Deliberative Study of Security Issues in Cloud Computing

Cloud computing is an entirely new paradigm that offers a non-conventional computing model for organizations adopting information technology as a valued utility. It provides a platform for access to numerous, unbounded resources, from flexible computation to on-demand provisioning to dynamic storage and computing requirements. It is observed that the potential gains achieved through cloud computing are still uncertain where freely reachable and open-ended resources are concerned, which hampers cloud implementation. Design, level of dependency, flexibility, and multi-tenancy are factors which add new dimensions to the cloud model. This article presents a study of cloud problems, covering the many challenges faced in implementing a varying, dynamic, and secure cloud model. The survey also covers proposed security measures for the cloud and notes derivative aspects of cloud security.

Chandani Kathad, Tosal Bhalodia
An Overview of Optimized Computing Approach: Green Cloud Computing

Cloud computing is a highly scalable and cost-effective architecture for running high-performance computing (HPC), enterprise, and Web applications. As the use of enormous data centers (DCs) and huge clusters grows day by day, energy consumption by these DCs is rising ever faster. This high energy consumption not only drives up operational cost but also results in high carbon emissions. Optimal energy solutions are required to curb the impact of cloud computing on the environment. Increased processor utilization releases more heat; this unwanted heating requires additional cooling, and cooling in turn generates heat. We therefore reach a stage where the architecture must change to deliver the same computing speed at reduced power consumption. Cloud computing combined with green computation can enable more energy-efficient use of computing power.

Archana Gondalia, Rahul N. Vaza, Amit B. Parmar
A Literature Review of QoS with Load Balancing in Cloud Computing Environment

Cloud computing is a type of computing technology which can be considered a new model of computing, and a rapidly emerging technique for providing computing as a service. In cloud computing, many cloud users demand various services as per their changing needs, so the function of cloud computing is to provide all the desired services to the cloud users. But due to limited resources, it is very difficult for cloud providers to provide all the services users desire. From the cloud provider's perspective, cloud resources must be allotted in a rational manner, so meeting cloud users' satisfaction and QoS requirements is a major issue. The aim of this paper is to present a study of previous work on load balancing and QoS methods used in the cloud computing environment. The paper mainly addresses key performance challenges and different modeling approaches, with their applications for QoS management, as well as simulation toolkits in cloud computing.

Geeta, Shiva Prakash
WAMLB: Weighted Active Monitoring Load Balancing in Cloud Computing

Nowadays, cloud computing is a fast-growing area in research and industry. It provides IT-related services through the Internet. Load balancing is an essential feature in cloud computing environments, and without proper load balancing we cannot expect good response times. Traditional active monitoring load balancing techniques generally check for the least loaded virtual machine and select it for task execution; some authors also select the virtual machine randomly. In our proposed strategy for assigning virtual machines, we calculate a weight factor on the basis of physical memory, bandwidth, number of processors, and processor speed. After calculating the weight of each virtual machine, we select the available virtual machine with the highest weight for execution of the task. We verified our results against existing work using CloudAnalyst, a CloudSim-based simulator; our results show an improvement over the existing approach.
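The abstract names the four inputs to the weight factor but not the formula, so the selection step can only be sketched under an assumption. The fragment below assumes, purely for illustration, that the weight is the product of the four capacities; the paper's actual formula may differ.

```python
# Hypothetical WAMLB-style selection: the weight formula is assumed
# (a simple product of the four capacities), not taken from the paper.

def vm_weight(vm):
    """Weight from physical memory, bandwidth, processor count, and speed."""
    return vm["memory_mb"] * vm["bandwidth_mbps"] * vm["cpus"] * vm["mips"]

def pick_vm(vms):
    """Select the available virtual machine with the highest weight."""
    candidates = [vm for vm in vms if vm["available"]]
    return max(candidates, key=vm_weight)

vms = [
    {"id": 0, "memory_mb": 2048, "bandwidth_mbps": 100, "cpus": 2, "mips": 1000, "available": True},
    {"id": 1, "memory_mb": 4096, "bandwidth_mbps": 100, "cpus": 4, "mips": 1000, "available": True},
    {"id": 2, "memory_mb": 8192, "bandwidth_mbps": 100, "cpus": 4, "mips": 1000, "available": False},
]
chosen = pick_vm(vms)   # id 1: highest weight among available machines
```

The contrast with the traditional scheme is that selection is driven by a composite capacity score rather than by current load alone or by random choice.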

Aditya Narayan Singh, Shiva Prakash
Applications of Attribute-Based Encryption in Cloud Computing Environment

Cloud computing is becoming very popular and has a promising future, but it has various security issues that need to be addressed. Storing data at a remote location raises serious problems of privacy and data misuse. Attribute-based encryption addresses these issues. In this paper, we discuss cloud computing, its stack, and its growth, survey the present work in this area, and then conclude.
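Real attribute-based encryption is built on bilinear pairings and is far beyond a few lines of code, but the access-policy idea that makes it suit cloud storage can be illustrated with a toy check: a ciphertext carries a policy over attributes, and only a user whose attribute set satisfies the policy can decrypt. The policy language below is deliberately tiny and entirely invented for illustration.

```python
# Conceptual sketch only: no actual cryptography. This toy predicate
# illustrates the ABE idea that decryption succeeds only when the
# user's attributes satisfy the policy attached to the ciphertext.

def satisfies(user_attrs, policy):
    """policy is ("AND"|"OR", [attribute names]) -- a minimal policy form."""
    op, attrs = policy
    if op == "AND":
        return all(a in user_attrs for a in attrs)
    return any(a in user_attrs for a in attrs)

policy = ("AND", ["doctor", "cardiology"])
allowed = satisfies({"doctor", "cardiology", "staff"}, policy)   # True
denied = satisfies({"nurse", "cardiology"}, policy)              # False
```

In a genuine ABE scheme this predicate is enforced mathematically by the key and ciphertext structure, so the cloud provider storing the data never needs to be trusted with the plaintext.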

Vishnu Shankar, Karan Singh
Query Optimization: Issues and Challenges in Mining of Distributed Data

The technique of finding the optimal processing method to answer a query is called query optimization, whereas a collection of sites distributed over a computer network is called a distributed database; in a distributed database, the sites communicate with each other through the network. Various issues arise during the evaluation of query cost, among which the processing cost and the transmission cost are important. Several algorithms have been developed to find the best possible solution for a particular query, but they all have certain limitations. The optimizer is mainly concerned with three factors: the search space, the search strategy, and the cost model. The mining cost of a query depends on the order of evaluation of the operators; the same query can have different costs if the order is changed. Hence, finding the optimal cost for a particular query is emerging as an open challenge for many researchers, and cost-based query optimization has become an important concept for dealing with this problem. This paper explores the issues and challenges of query optimization in the mining of distributed data.
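The claim that operator order changes the cost can be made concrete with a textbook-style cost model, which is assumed here for illustration (the paper's own cost model is not given in the abstract): the cost of joining two relations is taken as the product of their cardinalities, and intermediate result sizes are estimated with a flat join selectivity.

```python
# Illustrative cost model (assumed, not from the paper): cost of a join
# is the product of input cardinalities; intermediate sizes shrink by a
# fixed selectivity factor. Same query, two orders, different costs.

def left_deep_cost(cards, selectivity=0.001):
    """Total join work for a left-deep evaluation over the given relation sizes."""
    cost = 0
    size = cards[0]
    for c in cards[1:]:
        cost += size * c                   # work to join current result with next relation
        size = size * c * selectivity      # estimated intermediate result size
    return cost

# Joining R(10000), S(100), T(10) in two different orders:
cost_rst = left_deep_cost([10000, 100, 10])   # R first: 1,010,000
cost_tsr = left_deep_cost([10, 100, 10000])   # T first:    11,000
```

Even in this tiny example the cheaper order starts with the smallest relations, which is exactly the kind of choice a cost-based optimizer searches for.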

Pramod Kumar Yadav, Sam Rizvi
Comprehensive Study of Cloud Computing and Related Security Issues

Cloud computing is a global technology change that shifts traditional IT services to modern IT services by providing computing services through a simple Internet connection. The "pay per use" and on-demand characteristics of the cloud model further attract consumers toward cloud computing. These characteristics give access to a shared pool of resources such as storage, networks, and servers without actually acquiring them. Many big IT organizations such as Microsoft, Google, Salesforce, and Amazon build cloud computing infrastructures and provide related services to customers. Even though the advantages of cloud computing are clear to the global computing community, there is a need for a security model in cloud computing. This paper presents a comprehensive study of cloud computing and related security issues.

Manju Khari, Manoj kumar, Vaishali
Healthcare Data Analysis Using R and MongoDB

Big data analysis in the healthcare domain is an upcoming and nascent topic. The data that can be analyzed from the healthcare domain is typically of huge volume and quite varied in nature. The Electronic Health Records (EHRs) of just one year for a major hospital can typically run into terabytes, and since these records are both structured and unstructured, they are a good fit for analysis using big data tools like Hadoop, R, and Python. Twitter, with its brevity of messages, is one of the sources of fast-moving information. Real-time information sharing has led healthcare organizations, hospitals, medical institutes, and research companies to create their own Twitter handles, through which each organization can share its official information and can be reached for information, clarifications, and even grievance redress. All such information is unstructured, with links, video, and text being shared, so proper data analysis tools are required to gather meaningful data from the tweets. In this paper, we focus on Twitter (social) data analysis using R and MongoDB. We discuss the data analysis packages available in R that can be used to analyze tweet and EHR data, how MongoDB maps tweets to documents and provides basic operations like aggregation, and the analysis features and packages R provides for analyzing medical domain data.
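The mapping of tweets to documents and the aggregation step can be illustrated without a running MongoDB instance. The sketch below mirrors, in pure Python, what a MongoDB `$group` pipeline stage does over tweet documents; the handles and tweet texts are made up, and in practice the equivalent pipeline would be sent to the server (e.g., via a driver such as pymongo).

```python
from collections import defaultdict

# Pure-Python illustration of a MongoDB-style aggregation: each tweet is
# a document, and a $group-like stage counts tweets per handle.
# (Handles and texts are invented for the example.)

tweets = [
    {"handle": "@CityHospital", "text": "Flu clinic open today"},
    {"handle": "@CityHospital", "text": "New cardiology wing inaugurated"},
    {"handle": "@MedResearchCo", "text": "Trial results published"},
]

def group_count(docs, key):
    """Equivalent of {"$group": {"_id": "$handle", "count": {"$sum": 1}}}."""
    counts = defaultdict(int)
    for doc in docs:
        counts[doc[key]] += 1
    return dict(counts)

per_handle = group_count(tweets, "handle")
# {"@CityHospital": 2, "@MedResearchCo": 1}
```

Because tweets are schemaless JSON-like objects, this document-per-tweet mapping is what makes MongoDB a natural store for them before handing the grouped results to R for analysis.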

Sonia Saini, Shruti Kohli
Data Mining Tools and Techniques for Mining Software Repositories: A Systematic Review

A software repository contains a historical and valuable wealth of information about the overall development of a software system (the project's status, progress, and evolution). Mining software repositories (MSR) is one of the most interesting and fastest-growing fields within software engineering. It focuses on extracting and analyzing the heterogeneous data available in software repositories to uncover interesting, useful, and actionable information about software systems and projects. Using well-established data mining tools and techniques, professionals, practitioners, and researchers can explore the potential of this valuable data in order to better understand and manage their complicated projects and to produce highly reliable software systems delivered on time and within the estimated budget. This paper is an effort to identify the problems encountered during the development of software projects and the role of mining software repositories in resolving them. A comparative study of data mining tools and techniques for mining software repositories is presented.

Tamanna Siddiqui, Ausaf Ahmad
SWOT Analysis of Cloud Computing Environment

Cloud computing is a technology which deals with a large number of computers connected together over communication networks, for example the Internet. Cloud computing dynamically increases computing capacity or adds capabilities with minimum human intervention and without much investment in new infrastructure. At this moment cloud computing is at an infancy stage, with many groups of providers delivering cloud-based services, from full-scale applications to storage services. In the present era, the IT sector has moved toward cloud-based services independently, while cloud computing integrators and aggregators are now emerging. The virtual servers in cloud computing exist virtually and so can be scaled on the fly without affecting the client; despite being physically realized to some extent, the cloud can be expanded or compressed. In this paper, an attempt is made to perform a SWOT analysis of the cloud computing environment, as many users are in a dilemma about whether to use it or not: what are the benefits and disadvantages of using the cloud? A critical and detailed analysis is done by mapping its Strengths (S), Weaknesses (W), Opportunities (O), and Threats (T) in different ways.

Sonal Dubey, Kritika Verma, M. A. Rizvi, Khaleel Ahmad
A Review on Quality of Service in Cloud Computing

Cloud computing is a computing technology that uses remote servers and the Internet to maintain applications and data. It is an emerging technology, and today it is a wide area of research and industry. It is a term which involves networking, virtualization, software, web services, and distributed computing. In the cloud computing environment there are various challenges, such as efficient load balancing, real benefits and business outcomes, resource scheduling, and data center energy consumption. Quality of Service (QoS) plays an important role in distributed computing for multimedia and other essential applications. The aim of this paper is to provide a survey of QoS modeling approaches and other frameworks suitable for cloud systems and to describe their implementation details, merits, and demerits. The paper helps new researchers understand the main techniques used, and their limitations, in providing QoS in the cloud computing environment.

Geeta, Shiva Prakash
Association Rule Mining for Finding Admission Tendency of Engineering Student with Pattern Growth Approach

Association rule mining is one of the important techniques in data mining. Generation of rules involves two phases: the first phase finds the frequent itemsets and the second phase generates the rules. Many algorithms have been specified to find frequent itemsets from sequential patterns, and there are mainly two approaches: the first uses candidate sequence generation (the Apriori approach), and the second is the pattern growth method. When sequence lengths are small, the pattern growth method performs better than the Apriori approach. In this paper, we analyze the pattern growth approach on a database of engineering students. By finding associations among the attributes, we can find the tendency to take admission and to prioritize an engineering branch. To find strong and valid association rules, different measures such as minInterest, lift, leverage, and conviction are considered while generating the rules.
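The measures named above are all simple functions of itemset supports, so they can be sketched independently of the pattern growth machinery. The transactions below are invented placeholders (not the paper's student database); the formulas for confidence, lift, leverage, and conviction are the standard ones.

```python
# Standard interestingness measures for a rule A -> C, computed from
# itemset supports. Transactions are illustrative, not real student data.

transactions = [
    {"maths", "physics", "cse"},
    {"maths", "cse"},
    {"physics", "mech"},
    {"maths", "physics", "cse"},
    {"maths", "mech"},
]

def support(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def rule_measures(antecedent, consequent):
    s_a = support(antecedent)
    s_c = support(consequent)
    s_ac = support(antecedent | consequent)
    confidence = s_ac / s_a
    lift = confidence / s_c                 # >1 means positive correlation
    leverage = s_ac - s_a * s_c             # difference from independence
    conviction = (float("inf") if confidence == 1
                  else (1 - s_c) / (1 - confidence))
    return confidence, lift, leverage, conviction

# Rule {maths} -> {cse}: confidence 0.75, lift 1.25, leverage 0.12, conviction 1.6
conf_m, lift_m, lev_m, conv_m = rule_measures({"maths"}, {"cse"})
```

Filtering generated rules by thresholds on these measures is what separates strong, valid rules from coincidental ones.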

Rashmi V. Mane, V. R. Ghorpade
Integrated Effect of Nearest Neighbors and Distance Measures in k-NN Algorithm

Supervised learning, or classification, is a cornerstone of data mining. A well-known, simple, and effective algorithm for supervised classification is k-Nearest Neighbor (k-NN). A distance measure provides significant support in the process of classification, and the correct choice of distance measure is the most influential decision in the classification technique. The choice of k in the k-Nearest Neighbor algorithm also plays an important role in the accuracy of the classifier. The aim of this paper is to analyze the integrated effect of various distance measures and different values of k in the k-Nearest Neighbor algorithm on data sets taken from the UCI machine learning repository.
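A minimal k-NN sketch makes the two parameters under study explicit: both the distance measure and k are arguments of the classifier, so their integrated effect can be compared by varying them over the same data. The points below are made up for illustration; the paper's experiments use UCI data sets.

```python
from collections import Counter

# Minimal k-NN classifier where the distance measure and k are both
# explicit parameters (training points are illustrative).

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train, query, k, dist):
    """Majority label among the k training points nearest to the query."""
    neighbors = sorted(train, key=lambda pt: dist(pt[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
    ((4.0, 4.0), "B"), ((4.2, 3.9), "B"), ((3.8, 4.1), "B"),
]
label_e = knn_predict(train, (1.1, 1.0), k=3, dist=euclidean)  # "A"
label_m = knn_predict(train, (1.1, 1.0), k=3, dist=manhattan)  # "A"
```

On well-separated data the two measures agree, as here; the interesting cases studied in the paper are the data sets and (k, distance) combinations where they do not.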

Rashmi Agrawal
Metadata
Title
Big Data Analytics
Editors
Prof. Dr. V. B. Aggarwal
Prof. Dr. Vasudha Bhatnagar
Dr. Durgesh Kumar Mishra
Copyright Year
2018
Publisher
Springer Singapore
Electronic ISBN
978-981-10-6620-7
Print ISBN
978-981-10-6619-1
DOI
https://doi.org/10.1007/978-981-10-6620-7
