
About this Book

The work presented in this book combines theoretical advances in big data analysis and cloud computing with their potential applications in scientific computing. The theoretical advances are supported by illustrative examples and by applications, mostly drawn from real-life situations. The book discusses major issues in big data analysis using computational intelligence techniques, along with selected issues in cloud computing. An elaborate bibliography is provided at the end of each chapter. The material includes concepts, figures, graphs, and tables to guide researchers in the area of big data analysis and cloud computing.



Theoretical Foundation of Big Data Analysis


“Atrain Distributed System” (ADS): An Infinitely Scalable Architecture for Processing Big Data of Any 4Vs

The present-day world, dealing with big data that is expanding very fast in 4Vs (Volume, Variety, Velocity, and Veracity), needs new advanced logical and physical storage structures, new advanced heterogeneous data structures, new mathematical theories, and new models: together we call these four the 4Ns. As of today, the 4N-set is lagging behind in its race with the 4V-set. If the 4V-set continues its dominance over the 4N-set, it will be difficult for the world to realize the vision of "Big Data: A Revolution That Will Transform How We Live, Work, and Think". The main objective of this chapter is to report the latest development of an easy and efficient method for processing big data with the 4Ns. For processing giant big data in 4Vs, neither any existing data structure alone nor any existing type of distributed system alone is sufficient. Even the existing network topologies, such as tree, bus, ring, star, mesh, and hybrid topologies, seem too weak for big data processing. To succeed, there is no alternative but to develop 'new data structures' for big data; 'new types of distributed systems' for big data, with a very fast and extensive mutual compatibility and mutual understanding with the new data structures; 'new network topologies' for big data to support the new distributed systems; and, of course, 'new mathematical/logical theory' models for big data science. The next important issue is how to integrate all of these to present, ultimately, a single and simple scalable system to lay users through a new, simple 'big data language'. With these views in mind, a special type of distributed system called the 'Atrain Distributed System' (ADS) is designed, which is very suitable for processing big data using the heterogeneous data structure r-atrain for big data (and/or the homogeneous data structure r-train for big data).
Consequently, growth in breadth (horizontally) and in depth (vertically) can be achieved in ADS by a simple process, and this is the unique and extremely rich merit of ADS in challenging the 4Vs of big data. Where r-atrain and r-train are the fundamental data structures for processing big data, the 'heterogeneous data structure MA' and the 'homogeneous data structure MT' are higher-order data structures for processing big data, including temporal big data. Both MA and MT can be implemented well in a multi-tier ADS. In fact, 'Data Structures for Big Data' is to be regarded as a new subject, not just a new topic, in big data science. Classical matrix theory has been studied worldwide, with an unlimited volume of applications in all branches of science, engineering, statistics, optimization, numerical analysis, computer science, medical science, economics, etc. Classical matrices are mainly two-dimensional, with elements drawn in most cases from the set of real numbers. An infinitely scalable notion of 'solid matrix' (SM), i.e. an n-dimensional hyper-matrix, is introduced as a generalization of the classical matrix, along with the notion of 'solid latrix' (SL). In the 'Theory of Solid Matrices', the corresponding matrix algebra is developed, explaining various operations, properties, and propositions for the case in which the elements are objects of the real region ℝ. As a generalization of the n-SM, another new notion called the 'solid hematrix' (SH) (solid heterogeneous matrix, i.e. an n-dimensional hyper-matrix with heterogeneous data) is introduced. A method is proposed for implementing solid matrices, n-dimensional arrays, n-dimensional larrays, etc. in computer memory using the data structures MT and MA for big data.
The combination of r-atrain (r-train) with ADS, and of MA (MT) with multi-tier ADS, can play a major role in giving a new direction to today's data-dependent giant galaxies of organizations, institutions, and individuals, enabling them to process any big data irrespective of the influence of the 4Vs.
Ranjit Biswas
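The coach-and-link idea behind the r-train can be sketched in code. The sketch below is an illustrative assumption, not the authors' exact specification: the `Train`/`Coach` names, the fixed coach capacity, and the in-memory Python representation stand in for the chapter's formal definitions, and the distributed (pilot/ADS) aspects are omitted.

```python
# Hypothetical sketch of a "train"-style storage scheme: data live in
# fixed-capacity "coaches" linked in sequence, so the structure grows by
# appending coaches rather than by reallocating a contiguous block.
class Coach:
    def __init__(self, capacity):
        self.cells = []          # payload elements held by this coach
        self.capacity = capacity # fixed coach size r
        self.next = None         # link to the next coach, if any

class Train:
    """A linked sequence of fixed-size coaches; append never reallocates."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.head = self.tail = Coach(capacity)
        self.length = 0

    def append(self, item):
        if len(self.tail.cells) == self.capacity:  # tail coach is full:
            new = Coach(self.capacity)             # attach a fresh coach
            self.tail.next = new
            self.tail = new
        self.tail.cells.append(item)
        self.length += 1

    def __iter__(self):
        coach = self.head
        while coach is not None:
            yield from coach.cells
            coach = coach.next

t = Train(capacity=3)
for x in range(7):
    t.append(x)
print(list(t))      # all elements, in insertion order
print(t.length)
```

Because growth only ever adds a coach at the tail, the same scheme scales out naturally when coaches are placed on different machines, which is the intuition behind pairing the data structure with a distributed system.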

Big Data Time Series Forecasting Model: A Fuzzy-Neuro Hybridize Approach

Big data has evolved as a new research domain in the 21st century. This domain concerns the study of voluminous data sets with multiple factors, whose sizes grow rapidly with time. Such data sets are generated by various autonomous sources, such as scientific experiments, engineering applications, government records, and financial activities. With the rise of the big data concept, demand for new time series prediction models has emerged. For this purpose, a novel big data time series forecasting model is introduced in this chapter, based on the hybridization of two soft computing (SC) techniques, viz., fuzzy sets and artificial neural networks. The proposed model is explained with the stock index price data set of the State Bank of India (SBI). The performance of the model is verified with different factors, viz., two factors, three factors, and M factors. Various statistical analyses signify that the proposed model makes far better decisions with the M-factors data set.
Pritpal Singh
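The fuzzy half of such a hybridization can be illustrated with a minimal fuzzy time-series forecast: partition the range of the series into intervals, record fuzzy logical relationships between consecutive intervals, and defuzzify via interval midpoints. This is a generic textbook-style sketch, not the chapter's model; the neural-network stage is omitted, and the interval count and sample prices are illustrative (not the SBI data).

```python
# Minimal fuzzy time-series forecast: fuzzify values into intervals,
# learn interval-to-interval transition rules, defuzzify by midpoints.
def fuzzy_forecast(series, n_intervals=5):
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_intervals or 1.0
    def fuzzify(x):                      # index of the interval containing x
        return min(int((x - lo) / width), n_intervals - 1)
    mids = [lo + (i + 0.5) * width for i in range(n_intervals)]
    # fuzzy logical relationships A_i -> {A_j, ...} from consecutive values
    rules = {}
    states = [fuzzify(x) for x in series]
    for a, b in zip(states, states[1:]):
        rules.setdefault(a, set()).add(b)
    last = states[-1]
    targets = rules.get(last, {last})
    # defuzzify: average the midpoints of the consequent intervals
    return sum(mids[j] for j in targets) / len(targets)

prices = [100, 102, 101, 105, 107, 106, 110, 108]
print(fuzzy_forecast(prices))   # -> 109.0
```

In a fuzzy-neuro hybrid, the learned relationships would be fed to (or replaced by) a neural network rather than the simple rule table used here.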

Learning Using Hybrid Intelligence Techniques

This chapter focuses on a few key applications of hybrid intelligence techniques in the field of feature selection and classification. Hybrid intelligent techniques are used to develop an effective and generalized learning model for these applications. The first application employs a new evolutionary hybrid feature selection technique for microarray datasets, implemented in two stages by integrating correlation-based binary particle swarm optimization (BPSO) with a rough set algorithm to identify non-redundant genes capable of discerning between all objects. The other applications evaluate the relative performance of different supervised classification procedures using hybrid feature reduction techniques. A correlation-based partial least squares hybrid feature selection method is used for feature extraction, and the experimental results show that partial least squares (PLS) regression is an appropriate feature selection method and that the combined use of different classification and feature selection approaches makes it possible to construct high-performance classification models for microarray data. Another hybrid algorithm, the correlation-based reduct algorithm (CFS-RST), is used as a filter to eliminate redundant attributes, and a minimal reduct set is produced by rough sets. This method improves the efficiency and decreases the complexity of the classical algorithm. Extensive experiments conducted on two public multi-class gene expression datasets show that hybrid intelligent methods are highly effective at selecting discriminative genes and improving classification accuracy. Across all the applications, the hybrid intelligent techniques discussed here show significant improvements on most of the binary and multi-class microarray datasets.
Sujata Dash
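The correlation-based filter stage that such hybrids build on can be sketched simply: score each feature by the absolute Pearson correlation with the class label and keep the top k. This is a generic filter sketch, not the chapter's CFS-RST or BPSO procedure; the toy data and the choice of k are illustrative.

```python
# Correlation-based feature filtering: rank features by |Pearson r| with
# the label and keep the k highest-scoring feature indices.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(X, y, k):
    """X: list of samples (rows); returns indices of the k best features."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: -scores[j])[:k]

# toy data: feature 0 tracks the label, feature 1 is noise,
# feature 2 anti-tracks the label (equally informative in magnitude)
X = [[1, 5, 9], [2, 3, 8], [3, 6, 7], [4, 2, 6]]
y = [1, 2, 3, 4]
print(select_features(X, y, 2))   # -> [0, 2]
```

In the two-stage hybrids described above, a wrapper (e.g. BPSO) or rough-set reduct computation would then refine this filtered subset.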

Neutrosophic Sets and Its Applications to Decision Making

This chapter introduces a newly emerging tool for uncertain data processing known as neutrosophic sets. A neutrosophic set has the potential to serve as a general framework for uncertainty analysis in data sets, including big data sets. Useful techniques such as distance and similarity between two neutrosophic sets are discussed; these notions are very important in determining interacting segments of a data set. The notion of entropy is also introduced, to measure the amount of uncertainty expressed by a neutrosophic set. Further, the notion of neutrosophic sets is generalized and combined with soft sets to form a new hybrid set called interval-valued neutrosophic sets. Some important properties of these sets under various algebraic operations are also shown.
Pinaki Majumdar
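A distance between two single-valued neutrosophic sets over the same universe can be computed directly from the (truth, indeterminacy, falsity) triples attached to each element. The normalized Hamming distance below is one standard choice in the neutrosophic literature, given here as a hedged sketch; it is not necessarily the exact formulation used in the chapter.

```python
# Normalized Hamming distance and a derived similarity measure for
# single-valued neutrosophic sets: each element is a (T, I, F) triple
# with all three memberships in [0, 1].
def neutrosophic_distance(A, B):
    """A, B: lists of (T, I, F) triples of equal length."""
    n = len(A)
    return sum(abs(a[k] - b[k]) for a, b in zip(A, B) for k in range(3)) / (3 * n)

def neutrosophic_similarity(A, B):
    # distance and similarity sum to 1 under this definition
    return 1.0 - neutrosophic_distance(A, B)

A = [(0.8, 0.1, 0.1), (0.6, 0.2, 0.3)]
B = [(0.7, 0.2, 0.2), (0.6, 0.2, 0.3)]
print(round(neutrosophic_distance(A, B), 4))    # -> 0.05
print(round(neutrosophic_similarity(A, B), 4))  # -> 0.95
```

Note how the indeterminacy component I is treated independently of T and F, which is what distinguishes neutrosophic sets from intuitionistic fuzzy sets.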

Architecture for Big Data Analysis


An Efficient Grouping Genetic Algorithm for Data Clustering and Big Data Analysis

Clustering, as a formal, systematic subject, can be considered the most influential unsupervised learning problem; like every other problem of this kind, it deals with finding structure in a collection of unlabeled data. One of the central issues of this subject is undoubtedly determining the number of clusters. In this chapter, an efficient grouping genetic algorithm is proposed for the situation where the number of clusters is unknown. Concurrent clustering with different numbers of clusters is performed on the same data in each chromosome of the grouping genetic algorithm, in order to discern the correct number of clusters. In subsequent iterations of the algorithm, new solutions with different cluster numbers, or with different clustering accuracy, are produced by efficient crossover and mutation operators that lead to significant improvement of the clustering. Furthermore, a local search is applied with a specified probability to each chromosome of each new population in order to increase the accuracy of clustering. These special operators lead to the successful application of the proposed method in big data analysis. To establish the accuracy and efficiency of the algorithm, it is tested on various artificial and real data sets in a comparative manner. Most of the data sets consist of overlapping clusters, but the algorithm detects the proper number of clusters for all data sets with high clustering accuracy. The results provide strong evidence of the algorithm's success in finding an appropriate number of clusters and achieving the best clustering quality in comparison with other methods.
Sayede Houri Razavi, E. Omid Mahdi Ebadati, Shahrokh Asadi, Harleen Kaur
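The core idea of evolving clusterings with an unknown number of clusters can be condensed into a few lines: encode each candidate solution as a label vector, score it by within-cluster compactness penalized by the number of clusters used, and evolve the population. This is a deliberately simplified sketch with mutation-only search; the chapter's grouping encoding, crossover, and local-search operators are not reproduced, and the penalty weight and toy data are illustrative assumptions.

```python
# Toy evolutionary clustering with a variable effective number of clusters:
# a chromosome is a cluster-label vector; fitness trades off within-cluster
# distance against a penalty per cluster, so the cluster count is learned.
import random

random.seed(0)

def centroids(points, labels):
    groups = {}
    for p, c in zip(points, labels):
        groups.setdefault(c, []).append(p)
    return {c: (sum(x for x, _ in ps) / len(ps), sum(y for _, y in ps) / len(ps))
            for c, ps in groups.items()}

def fitness(points, labels, penalty=1.5):
    cent = centroids(points, labels)
    within = sum(((px - cent[c][0]) ** 2 + (py - cent[c][1]) ** 2) ** 0.5
                 for (px, py), c in zip(points, labels))
    return -(within + penalty * len(cent))   # penalize using extra clusters

def mutate(labels, max_k):
    child = labels[:]
    i = random.randrange(len(child))
    child[i] = random.randrange(max_k)       # may create or empty a cluster
    return child

def evolve(points, max_k=4, pop_size=20, generations=200):
    pop = [[random.randrange(max_k) for _ in points] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(points, ind), reverse=True)
        survivors = pop[:pop_size // 2]      # elitist selection
        pop = survivors + [mutate(random.choice(survivors), max_k)
                           for _ in range(pop_size - len(survivors))]
    return max(pop, key=lambda ind: fitness(points, ind))

# two well-separated blobs; the search should settle on few effective clusters
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
best = evolve(points)
print(len(set(best)))   # number of distinct cluster labels actually used
```

The cluster-count penalty plays the role that validity indices or grouping-aware operators play in the full algorithm: without it, minimizing within-cluster distance alone would always favor the maximum number of clusters.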

Self Organizing Migrating Algorithm with Nelder Mead Crossover and Log-Logistic Mutation for Large Scale Optimization

This chapter presents a hybrid variant of the self-organizing migrating algorithm (NMSOMA-M) for large-scale function optimization, which combines the features of a Nelder-Mead (NM) crossover operator and a log-logistic mutation operator. The self-organizing migrating algorithm (SOMA) is a population-based stochastic search algorithm based on the social behavior of a group of individuals. Its main characteristics are that it works with a small population size and that no new solutions are generated during the search; only the positions of the solutions change. Although it has good exploration and exploitation qualities, as the dimension of the problem increases it tends to get trapped in local optima and may suffer from premature convergence due to the lack of a diversity mechanism. This chapter combines the NM crossover operator and the log-logistic mutation operator with SOMA in order to maintain population diversity and avoid premature convergence. The proposed algorithm has been tested on a set of 15 large-scale unconstrained test problems with problem sizes of up to 1000. To assess its efficiency against other population-based algorithms, the results are compared with those of SOMA and particle swarm optimization (PSO). The comparative analysis shows the efficiency of the proposed algorithm in solving large-scale function optimization with fewer function evaluations.
Dipti Singh, Seema Agrawal
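The basic SOMA migration loop that NMSOMA-M builds on can be sketched on a small test function. This shows only plain SOMA in the all-to-one strategy; the Nelder-Mead crossover and log-logistic mutation of the proposed variant are omitted, and all control parameters (path length, step, PRT) are illustrative choices, not the chapter's settings.

```python
# Plain SOMA, all-to-one strategy: each individual migrates in discrete
# steps toward the current leader, perturbing only a random subset of
# dimensions (the PRT vector), and keeps the best point found on its path.
import random

random.seed(1)

def sphere(x):
    return sum(v * v for v in x)

def soma(dim=5, pop_size=10, path_length=3.0, step=0.3, prt=0.4, migrations=50):
    pop = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(migrations):
        leader = min(pop, key=sphere)        # best individual leads
        new_pop = []
        for ind in pop:
            if ind is leader:
                new_pop.append(ind)
                continue
            best_pos = ind
            t = step
            while t <= path_length:          # sample points along the path
                prt_vec = [1 if random.random() < prt else 0
                           for _ in range(dim)]
                cand = [ind[d] + (leader[d] - ind[d]) * t * prt_vec[d]
                        for d in range(dim)]
                if sphere(cand) < sphere(best_pos):
                    best_pos = cand          # keep the best path point only
                t += step
            new_pop.append(best_pos)
        pop = new_pop
    return min(pop, key=sphere)

best = soma()
print(round(sphere(best), 6))   # best objective value found
```

Note that no brand-new solutions are generated: individuals only move along paths toward the leader, which is the SOMA characteristic the abstract highlights and the reason the hybrid adds crossover and mutation to restore diversity.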

A Spectrum of Big Data Applications for Data Analytics

As technology advances, vast amounts of data are being collected from various sources. The complex nature of these data poses a challenging task for researchers: to store, process, and analyze big data. At present, big data analytics is an emerging domain with potentially limitless opportunities for future outcomes. Big data mining provides the capability to extract hidden information from large volumes of data for the knowledge discovery process; indeed, it presents varied challenges and vast opportunities for researchers and scientists over the coming decade. This chapter provides a broad view of big data in the medical application domain. In addition, a framework that can handle big data, using several preprocessing and data mining techniques to discover hidden knowledge from large-scale databases, is designed and implemented. The chapter also discusses the challenges in big data for gaining insightful knowledge for future outcomes.
Ritu Chauhan, Harleen Kaur

Fundamentals of Brain Signals and Its Medical Application Using Data Analysis Techniques

In this chapter, the various data analysis techniques devoted to the development of brain-signal-controlled interface devices for rehabilitation, a multi-disciplinary engineering effort, are presented. Knowledge of the electroencephalogram (EEG) is essential for newcomers to the development of algorithms using EEG. Most of the literature demonstrates applications of EEG signals, but few definitive studies describe the various components that are critical for the development of interface devices using prevalent algorithms in real-time data analysis. Therefore, this chapter covers EEG generation, the various components of EEG used in the development of interface devices, and the algorithms used for extracting information from EEG.
P. Geethanjali

Big Data Analysis and Cloud Computing


BigData: Processing of Data Intensive Applications on Cloud

Cloud computing, rapidly emerging as a new computation paradigm, provides agile and scalable resource access in a utility-like fashion, which is especially suited to the processing of big data. The need to store, process, and analyze large amounts of data drives enterprise customers to adopt cloud computing at scale. Understanding the processing of data-intensive applications on the cloud is key to designing next-generation cloud services. Here we aim to provide a close-up view of cloud computing, big data, and the processing of big data on the cloud, as well as the state-of-the-art techniques and technologies currently adopted to deal with big data problems on the cloud.
D. H. Manjaiah, B. Santhosh, Jeevan L. J. Pinto

Framework for Supporting Heterogenous Clouds Using Model Driven Approach

Cloud computing has gained popularity in today's IT sector because of the low cost of setup and the ease of resource configuration and maintenance. The increase in the number of cloud providers in the market has led to the availability of a wide range of cloud solutions offered to consumers. These solutions are based on different cloud architectures and are usually incompatible with each other. It is very hard to find a single provider that offers all the services end users need. Cloud providers offer proprietary solutions that force cloud customers to decide on the design and deployment models, as well as the technology, at the early stages of software development. One of the major issues of this paradigm is that applications and services hosted with a specific cloud provider are locked to that provider's implementation techniques and operational methods; moving these applications and services to another provider is a tedious task. This situation is often termed vendor lock-in. Hence, a way to provide portability of applications across multiple clouds is a major concern. According to the literature, very few efforts have been made to propose a unified standard for cloud computing. DSkyL provides a way to reduce cloud migration effort. This chapter aims to sketch the architecture of DSkyL and the major steps involved in the migration process.
Aparna Vijaya, V. Neelanarayanan, V. Vijayakumar

Cloud Based Big Data Analytics: WAN Optimization Techniques and Solutions

Running more advanced applications to conduct business and remain competitive involves many factors: more distributed branch offices and users; greater reliance on the web and the wide area network; more remote users insisting on high-speed networks; unpredictable response times; and so on. In addition, escalating malware and malicious content put great pressure on business expansion. Ever-increasing data volumes, off-site data replication, and greater-than-ever use of content-rich applications are forcing IT organizations to optimize their network resources. Trends such as virtualization and cloud computing further emphasize this requirement in the current era of big data. To assist this process, companies are increasingly relying on a new generation of wide area network (WAN) optimization techniques, appliances, controllers, and platforms. These displace standalone physical appliances by offering more scalability, flexibility, and manageability, achieved by including software to handle big data and deliver valuable insights through big data analytics. An optimized WAN environment can also increase network reliability, accessibility, and availability, and improve the performance and consistency of data backup, replication, and recovery processes. This chapter studies WAN optimization tools, techniques, controllers, appliances, and the solutions available for cloud-based big data analytics, and sheds light on future trends and research potential in this area.
M. Baby Nirmala

Cloud Based E-Governance Solution: A Case Study

Development authorities are facing an exponential increase in information, and managing the storage and flow of this information is becoming more difficult day by day, resulting in a definite need for information technology tools to maintain it. Many development authorities have implemented, or are in the process of implementing, IT for complete or partial functioning. Since the development authorities in the state are classified into A/B/C categories depending on size and functionality, the priority of functional requirements also varies to match the budget for IT implementation. To cater to this prioritized implementation, the complete application needs to be designed in modular form, consisting of multiple systems, each a complete system in itself but able to be integrated with the others to form an efficient information flow. Ghaziabad Development Authority (GDA) had three options: both the IASP and the citizen portal as on-premise solutions; both the IASP and the citizen portal on the cloud; or the IASP hosted on premise and the citizen portal hosted on the cloud. The primary aim of the case is to expose students to an infrastructure planning situation based on the requirements and resource constraints within an organization.
Monika Mital, Ashis K. Pani, Suma Damodaran

