
2017 | Book

Big Data and Visual Analytics


About this book

This book provides users with cutting-edge methods and technologies in the area of big data and visual analytics, as well as insight into the big data and data analytics research conducted by world-renowned researchers in this field. The authors present comprehensive educational resources on big data and visual analytics, covering state-of-the-art techniques in data analytics, data and information visualization, and visual analytics.

Each chapter covers specific topics related to big data and data analytics, such as virtual data machines, security of big data, big data applications, high performance computing clusters, and big data implementation techniques. Every chapter includes a description of a unique contribution to the area of big data and visual analytics.

This book is a valuable resource for researchers and professionals working in the area of big data, data analytics, and information visualization. Advanced-level students studying computer science will also find this book helpful as a secondary textbook or reference.

Table of Contents

Frontmatter
Automated Detection of Central Retinal Vein Occlusion Using Convolutional Neural Network
Abstract
Central Retinal Vein Occlusion (CRVO) is the second leading cause of vision loss among elderly people, after diabetic retinopathy. CRVO causes abrupt, painless vision loss in the eye that can lead to visual impairment over time, so early diagnosis is very important to prevent its complications. However, the early symptoms of CRVO are so subtle that manually observing those signs in a retina image is a difficult and time-consuming process for ophthalmologists. Automatic detection systems for diagnosing ocular disease exist, but their performance depends on various factors. Haemorrhages, the early sign of CRVO, vary in size, color and texture, from dot haemorrhages to flame-shaped ones; reliably detecting haemorrhages of all types requires multifaceted pattern recognition techniques. Analysing the tortuosity and dilation of the veins requires complex mathematical analysis to extract those features. Moreover, the performance of such feature extraction methods and automatic detection systems depends on the quality of the acquired image. In this chapter, we propose a prototype for automated detection of CRVO using a deep learning approach. We design a Convolutional Neural Network (CNN) to recognize retinas with CRVO. The advantage of using a CNN is that no separate feature extraction step is required. We train the CNN to learn features from retina images with CRVO and to distinguish them from normal retina images, obtaining an accuracy of 97.56% for the recognition of CRVO.
Bismita Choudhury, Patrick H. H. Then, Valliappan Raman
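As a rough illustration of the approach this abstract describes (not the authors' actual architecture or training setup), the following is a minimal Keras sketch of a small CNN classifying retina images as CRVO or normal. The input resolution, layer sizes, and images/{crvo,normal}/ folder layout are assumptions.

    # Minimal sketch of a binary CNN classifier for retina images (assumed
    # 128x128 RGB inputs and an images/{crvo,normal}/ folder layout).
    import tensorflow as tf
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(128, 128, 3)),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # CRVO vs. normal
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    # Labeled images are assumed to sit in class subfolders under images/.
    train_ds = tf.keras.utils.image_dataset_from_directory(
        "images/", image_size=(128, 128), batch_size=32, label_mode="binary")
    model.fit(train_ds, epochs=10)

Because the network learns its own features from the pixels, no hand-crafted haemorrhage or vessel descriptors are needed, which is the advantage the abstract points out.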
Swarm Intelligence Applied to Big Data Analytics for Rescue Operations with RASEN Sensor Networks
Abstract
Various search methods combined with frontier technology have been used to save lives in rescue situations throughout history. Today, networked technology, cyber-physical system platforms, and algorithms exist that can coordinate rescue operations using swarm intelligence with the Rapid Alert Sensor for Enhanced Night Vision (RASEN). We introduce biologically inspired algorithms, combined with a proposed fusion night vision technology, that can rapidly converge on a near-optimal path between survivors and identify signs of life trapped in rubble. Wireless networking and automated suggested-path data analysis are provided to rescue teams that use drones as first responders, based on the results of swarm intelligence algorithms coordinating drone formations and triage after regional disasters requiring real-time Big Data analytic visualization. This automated multiple-drone scout approach with dynamic programming ability enables appropriate relief supplies to be deployed intelligently by networked convoys to survivors continuously throughout the night, within critical constraints calculated in advance, such as projected time, cost, and reliability per mission. Rescue operations can scale according to the complexity of Big Data characterization based on data volume, velocity, variety, variability, veracity, visualization, and value.
U. John Tanik, Yuehua Wang, Serkan Güldal
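To make the "biologically inspired algorithms converge on a near-optimal path" idea concrete, here is a minimal particle swarm optimization (PSO) sketch in which a swarm converges on the point of minimum distance to an assumed survivor location. The fitness function, search area, and constants are placeholders, not the chapter's RASEN algorithms.

    # Minimal PSO sketch: a swarm converges on a "closest to survivor" point
    # in a 2-D search area. All values are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    survivor = np.array([7.5, 2.0])          # assumed target location

    def fitness(p):
        # distance of each candidate position to the survivor
        return np.linalg.norm(p - survivor, axis=1)

    n, iters, w, c1, c2 = 30, 100, 0.7, 1.5, 1.5
    pos = rng.uniform(0, 10, size=(n, 2))    # particle positions
    vel = np.zeros((n, 2))
    pbest = pos.copy()
    gbest = pos[np.argmin(fitness(pos))]

    for _ in range(iters):
        r1, r2 = rng.random((n, 2)), rng.random((n, 2))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        better = fitness(pos) < fitness(pbest)
        pbest[better] = pos[better]
        gbest = pbest[np.argmin(fitness(pbest))]

    print("best position found:", gbest)     # should approach the survivor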
Gender Classification Based on Deep Learning
Abstract
Dhiraj Gharana, Sang C. Suh, Mingon Kang
Social and Organizational Culture in Korea and Women’s Career Development
Abstract
The biggest challenge faced by the Korean labor market today is a labor shortage due to the rapid decrease in its productive population. Korea's productive population has been dwindling since 2017, having reached its peak in 2016, and it is decreasing faster than that of any other country. Korea's compressed economic growth during its industrialization was made possible by an abundance of high-quality labor, but a low birthrate and a rapidly aging society have since caused a continued decrease that has ultimately led to the current shortage.
Choonhee Yang, Yongman Kwon
Big Data Framework for Agile Business (BDFAB) As a Basis for Developing Holistic Strategies in Big Data Adoption
Abstract
The Big Data Framework for Agile Business (BDFAB) is the result of exploring the value of Big Data technologies and analytics to business. BDFAB is based on literature review, modeling, experimentation and practical application. It incorporates multiple disciplines: information technology, business innovation, sociology and psychology (people and behavior, social and mobile media), finance (ROI), processes (Agile), user experience, analytics (descriptive, predictive and prescriptive) and staff up-skilling (HR). This chapter presents the key elements of the framework, comprising agile values, roles, building blocks, artifacts, conditions, agile practices and a compendium (repository). The building blocks themselves are made up of five modules: business decisions; data, technology and analytics; user experience and operational excellence; quality dimensions; and people and capabilities. As such, BDFAB exhibits an interdisciplinary approach to Big Data adoption in practice.
Bhuvan Unhelkar
Scalable Gene Sequence Analysis on Spark
Abstract
Scientific advances in technology have helped digitize genetic information, resulting in the generation of enormous volumes of genetic sequences, and the analysis of such large-scale sequencing data is a primary concern. This chapter introduces a scalable genome sequence analysis system that makes use of the parallel computing features of Apache Spark and its relational processing module, Spark Structured Query Language (Spark SQL). The Spark framework provides an efficient data reuse feature by holding data in memory, increasing performance substantially. The system also provides a web-based interface by which users can specify search criteria, and Spark SQL performs the search operations on the data stored in memory. The experiments detailed in this chapter use publicly available 1000 Genomes Variant Call Format (VCF) data (1.2 TB in size) as input. The input data are analyzed using Spark and the results are evaluated to measure the scalability and performance of the system.
Muthahar Syed, Taehyun Hwang, Jinoh Kim
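The in-memory search pattern the abstract describes can be sketched in a few lines of PySpark: load VCF-style tab-separated records, cache them in memory, and issue Spark SQL queries against them. The file path, selected columns, and query below are illustrative assumptions, not the chapter's actual pipeline.

    # Minimal PySpark sketch of cached, SQL-driven search over VCF-style data.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("genome-search").getOrCreate()

    raw = (spark.read
           .option("sep", "\t")
           .option("comment", "#")          # skip VCF header lines
           .csv("/data/1000genomes/chr22.vcf"))

    # Keep only a few fixed VCF columns (chromosome, position, ref, alt).
    variants = raw.select(
        raw["_c0"].alias("chrom"),
        raw["_c1"].cast("int").alias("pos"),
        raw["_c3"].alias("ref"),
        raw["_c4"].alias("alt"))

    variants.cache()                        # hold the data in memory for reuse
    variants.createOrReplaceTempView("variants")

    # A search a web front end might issue: variants in a position range.
    hits = spark.sql("""
        SELECT chrom, pos, ref, alt
        FROM variants
        WHERE chrom = '22' AND pos BETWEEN 16050000 AND 16060000
    """)
    hits.show()

Caching the table is what provides the data reuse the abstract mentions: repeated user queries hit memory rather than re-reading the 1.2 TB input.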
Big Sensor Data Acquisition and Archiving with Compression
Abstract
Machine-generated data such as sensor data now comprise a major portion of available information, which raises two important problems: efficient acquisition of sensor data and storage of massive sensor data collections. These data sources generate so much data so quickly that data compression is essential to reduce the storage requirements or transmission capacity of devices. This work first discusses a low-complexity sensing framework that reduces the computation and communication overheads of devices without significantly compromising the accuracy of sensor readings. Then a new class of compression algorithms based on statistical similarity is presented, which can be used effectively in many applications where the ordering of a data sequence can be relaxed. Next, quality-adjustable sensor data archiving is discussed, which compresses an entire collection of sensor data efficiently without compromising key features. To account for data aging, this archiving scheme is capable of decreasing data fidelity gradually to secure more storage space.
Dongeun Lee
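The chapter's statistical-similarity and quality-adjustable schemes are not reproduced here; the sketch below only illustrates the general idea of quality-adjustable archiving, using coarser quantization before a standard compressor so that fidelity can be traded for storage as data age. The synthetic trace, bit depths, and use of zlib are assumptions, not the chapter's algorithm.

    # Illustrative quality-adjustable archiving: quantize readings to fewer
    # bits, then compress; lower bit depth means smaller archives.
    import zlib
    import numpy as np

    def archive(readings, bits):
        """Quantize to `bits` of precision, then compress with zlib."""
        lo, hi = readings.min(), readings.max()
        levels = (1 << bits) - 1
        q = np.round((readings - lo) / (hi - lo) * levels).astype(np.uint16)
        return zlib.compress(q.tobytes()), (lo, hi, levels)

    rng = np.random.default_rng(1)
    temps = 20 + np.cumsum(rng.normal(0, 0.05, 100_000))   # synthetic sensor trace

    for bits in (12, 8, 4):                                 # gradually lower fidelity
        blob, _ = archive(temps, bits)
        print(f"{bits:2d}-bit archive: {len(blob)/temps.nbytes:.1%} of raw size")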
Advanced High Performance Computing for Big Data Local Visual Meaning
Abstract
Being able to scale interactive analysis to big data clusters is becoming more important with each passing day. For example, according to a Behaviors Questionnaire conducted in 2015 by KDnuggets, around one fourth of the 459 participants tried to interpret data collections exceeding 1 terabyte, and in some cases 100 petabytes. One subject of previous studies is the Canonic Method, used to derive the meaning of big data quickly and efficiently, because approximate answers based on sampling usually bring nearly as much benefit as exact answers, and sampling may also lessen the cognitive burden of interpretation. Previous studies conducted in database environments have yielded extremely valuable results on sampling and local visual inference; the present study, however, focuses first on new methods and on the system problems involved in accessing inference data [1–4].
Today, data production is growing at an astonishing speed. Exponential technical developments, analogue sensor data, adaptive digital systems, high-sensitivity scientific sensors, smart devices and integral-theoretical models cause data to be produced at an extremely high rate. The global data volume is expected to grow approximately 40-fold, reaching 44 zettabytes by 2020 [5]. The term "big data" was coined to cope with the volume, speed and variety of the data produced and to make sense of this trend, which develops day by day. Big data is becoming the new focal point of technology in many fields. A series of additional tools and mechanisms may be integrated into big data systems in order to acquire, store and process different kinds of data. These systems take advantage of tremendous parallel processing power to perform complex transformations and analyses. On the other hand, designing and using a big data system intended for a particular application is not practical [6–7], because data come from multiple heterogeneous and autonomous sources that stand in complex and changing relations to each other and grow in an adaptive manner. In addition, the rise of big data applications, in which data collection is increasing at an amazing speed, is beyond the capacity of today's hardware and software platforms to manage, store and process the data within a reasonable time [6].
Ozgur Aksu
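To make the sampling idea in this abstract concrete, the following sketch shows an approximate answer computed from a small uniform sample together with an error bound, instead of scanning the full dataset. The synthetic column and sample size are illustrative assumptions, not the chapter's Canonic Method.

    # Sampling-based approximate answering: estimate an aggregate from a
    # small sample and report a confidence interval.
    import numpy as np

    rng = np.random.default_rng(2)
    full = rng.exponential(scale=50.0, size=1_000_000)   # stand-in for a huge column

    sample = rng.choice(full, size=10_000, replace=False)
    est = sample.mean()
    stderr = sample.std(ddof=1) / np.sqrt(sample.size)

    print(f"approximate mean: {est:.2f} +/- {1.96 * stderr:.2f} (95% CI)")
    print(f"exact mean:       {full.mean():.2f}")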
Transdisciplinary Benefits of Convergence in Big Data Analytics
Abstract
Big Data applications can benefit from transdisciplinary convergence spanning multiple domains and opportunity areas. Big data applications in areas as diverse as healthcare, energy, and business must now process a tremendous volume of data from a variety of data sources. New ways to process and store information are needed beyond the standard three V's of information characterization defined by volume, variety, and velocity, to also include data variability and veracity. As such, the volume and variety of data types alone pose significant challenges to timely and accurate analysis by human as well as machine operators. Although big data applications can instantiate human logic in executable code, process the volume of data quickly, and make correlations across the variety of data, leading to better analysis and predictive capabilities, the resulting conclusions are delimited by other key factors: data velocity, which limits immediate inferencing; data variability, which limits data consistency; and data veracity, which limits data quality. As these challenges are resolved in other disciplinary domains, the spillover benefits can be profound when transdisciplinary advances in one particular domain impact other domain application areas by way of convergence.
U. John Tanik, Darrell Fielder
A Big Data Analytics Approach in Medical Imaging Segmentation Using Deep Convolutional Neural Networks
Abstract
Big data analytics uncovers hidden patterns, correlations and other insights by examining large amounts of data. Deep Learning can play a role in developing solutions from large datasets and is an important tool in the big data analytics toolbox. Deep Learning has recently been employed to solve various problems in computer vision and has demonstrated state-of-the-art performance on visual recognition tasks. In medical imaging, especially in brain tumor diagnosis and treatment plan development, accurate and reliable brain tumor segmentation plays a critical role. In this chapter, we describe brain tumor segmentation using Deep Learning. We constructed a 6-layer Dense Convolutional Network, which connects each layer to every subsequent layer in a feed-forward fashion. This connectivity architecture ensures maximum information flow between layers in the network and strengthens feature propagation from layer to layer. We show how this arrangement increases efficiency during training and the accuracy of the results. We trained and evaluated our method on the imaging data provided by the Multimodal Brain Tumor Image Segmentation Challenge (BRATS) 2017. The described method is able to segment the whole tumor (WT) region of high-grade brain tumor gliomas using T1 Magnetic Resonance Images (MRI), with excellent segmentation results.
Zheng Zhang, David Odaibo, Frank M. Skidmore, Murat M. Tanik
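The dense connectivity the abstract describes, where every layer receives the concatenation of all preceding feature maps, can be sketched in a few lines of Keras. The growth rate, filter counts, input size, and output head below are assumptions, not the authors' exact 6-layer configuration.

    # Sketch of dense connectivity: each layer's input is the concatenation
    # of all earlier feature maps. Sizes are illustrative assumptions.
    from tensorflow.keras import layers, Model, Input

    def dense_block(x, num_layers=6, growth_rate=12):
        for _ in range(num_layers):
            y = layers.BatchNormalization()(x)
            y = layers.ReLU()(y)
            y = layers.Conv2D(growth_rate, 3, padding="same")(y)
            x = layers.Concatenate()([x, y])      # feed all earlier features forward
        return x

    inputs = Input(shape=(240, 240, 1))           # e.g. one T1 MRI slice
    features = dense_block(layers.Conv2D(16, 3, padding="same")(inputs))
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(features)  # per-pixel tumor mask
    model = Model(inputs, outputs)
    model.summary()

Because every layer can reuse all earlier feature maps directly, gradients and features propagate through short paths, which is the training efficiency benefit the abstract highlights.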
Big Data in Libraries
Abstract
The term Big Data is somewhat loose. Roughly defined, it refers to any data that exceed the user's ability to analyze them in one of three dimensions (the three Vs): Volume, Velocity and Variety (Laney [1, 2]). Each of these presents different challenges. Huge volumes of data require the ability to store and retrieve the data efficiently. High-velocity data require the ability to ingest the data as they are created, essentially very fast internet connections. Highly variable data can be difficult to organize and process due to their unpredictability and unstructured nature (Bieraugel [3]). Also, multiple data streams can be combined to answer a variety of questions. All forms of big data can require high performance computing and specialized software to analyze. Given the fuzziness of defining big data,
Robert Olendorf, Yan Wang
A Framework for Social Network Sentiment Analysis Using Big Data Analytics
Abstract
Traditionally, surveys were one of the major methods for finding out the opinion of a group of people about a particular topic. Over the last two decades, however, with the proliferation of the Web and of social media sites such as Twitter, Facebook, and Tumblr, social media have increasingly become the platform of choice for people to express their views or opinions. With over two billion user accounts, social media provide a major source for gathering people's moods and opinions. Several public and private organizations, such as governments and companies, are attempting to exploit the expressed preferences, opinions, and attitudes regarding politics, commercial products and other matters of personal importance for a competitive edge. One of the most efficient ways to get this information is by performing sentiment analysis on these electronic repositories. With the data being ubiquitous, the bottlenecks are the processing speed, storage, and time involved with traditional storage systems. To deal with processing these massive amounts of data, Big Data frameworks offer special tools and techniques.
Bharat Sri Harsha Karpurapu, Leon Jololian
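As a rough illustration of distributing sentiment analysis over a large tweet collection with a Big Data framework, the sketch below scores tweets with a tiny word lexicon inside a Spark UDF. The lexicon, file path, expected 'text' field, and scoring rule are placeholders, not the framework proposed in the chapter.

    # Minimal PySpark sketch: lexicon-based sentiment scoring over tweets.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType

    spark = SparkSession.builder.appName("tweet-sentiment").getOrCreate()

    LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "terrible": -1, "hate": -1}

    @udf(returnType=IntegerType())
    def score(text):
        # sum of word polarities; unknown words contribute 0
        return sum(LEXICON.get(w, 0) for w in (text or "").lower().split())

    tweets = spark.read.json("/data/tweets.json")          # assumes a 'text' field
    scored = tweets.withColumn("sentiment", score(tweets["text"]))
    scored.groupBy("sentiment").count().show()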
Big Data Analytics and Visualization: Finance
Abstract
Big Data analytics and data science help finance combine business research expertise, scientific processes, quantitative analytics and system infrastructure to distill knowledge and insights from internal and external, structured and unstructured data.
Shyam Prabhakar, Larry Maves
Study of Hardware Trojans in a Closed Loop Control System for an Internet-of-Things Application
Abstract
Closed-loop systems are a primary technology used to automate our critical infrastructure and major industries to improve their efficiency. Their dependability is challenged by probable vulnerabilities in the core computing system. These vulnerabilities can appear at both the front (software) and back (hardware) ends of the computing system. While software vulnerabilities are well researched and documented, hardware vulnerabilities are normally overlooked. However, with hardware-inclusive technological evolutions such as Cyber-Physical Systems and the Internet-of-Things, hardware vulnerabilities should be addressed appropriately. In this work, we present a study of one such vulnerability, the Hardware Trojan (HT), in a closed-loop control system. Since a typical hardware Trojan is a small and stealthy digital circuit, we present a test platform built using FPGA-in-the-loop, where the computing system is represented as digital hardware. Through this platform, a comprehensive runtime analysis of hardware Trojans is accomplished, and we develop four threat models that can destabilize the closed-loop system and cause hazardous conditions. Since the primary objective is to study the effects of hardware Trojans, they are designed so that they are accessible and controllable.
Ranveer Kumar, Karthikeyan Lingasubramanian
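The chapter studies Trojans in hardware via FPGA-in-the-loop; as a software analogy only, the sketch below simulates a simple closed loop in which a stealthy trigger inverts the controller output after a rare condition persists, destabilizing the plant. The plant model, gains, and trigger condition are illustrative assumptions, not one of the chapter's four threat models.

    # Software analogy of a triggered Trojan payload in a closed-loop system.
    a, b, kp, setpoint = 0.95, 0.5, 1.6, 1.0   # stable plant + proportional gain
    x, trigger_count = 0.0, 0

    for k in range(40):
        error = setpoint - x
        u = kp * error
        if error < 0.1:                         # stealthy trigger: fires only near steady state
            trigger_count += 1
        if trigger_count > 10:                  # payload: silently invert the control signal
            u = -u
        x = a * x + b * u                       # discrete-time plant update
        if k % 5 == 0 or k == 39:
            print(f"step {k:2d}: output {x: .3f}")

Before the payload activates the output settles near the setpoint; afterwards the loop gain changes sign and the output diverges, which is the kind of hazardous destabilization the abstract describes.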
High Performance/Throughput Computing Workflow for a Neuro-Imaging Application: Provenance and Approaches
Abstract
We describe a high performance/throughput computing approach for a full-brain bootstrapped analysis of Diffusion Tensor Imaging (DTI), with the targeted goal of robustly differentiating individuals with Parkinson's Disease (PD) from healthy adults without PD. Individual brains vary substantially in size and shape, and may even vary structurally (particularly in the case of brain disease). This variability poses significant challenges in extracting diagnostically relevant information from Magnetic Resonance (MR) imaging, as brain structures in raw images are typically very poorly aligned. Moreover, these misalignments are poorly captured by simple alignment procedures (such as whole-image 12-parameter affine procedures); computationally expensive non-linear warping procedures are often required. Subsequent to warping, intensive statistical bootstrapping procedures (also computationally expensive) may further be required for some purposes, such as generating classifiers. We show that distributing the preprocessing of the images across a compute cluster and running multiple preprocessing jobs in parallel can substantially reduce the time required for the images to be ready for quality control and the final bootstrapped analysis. The proposed pipeline was very effective in developing classifiers for individual prediction that are resilient in the face of inter-subject variability, while reducing the time required for the analysis from a few months or years to a few weeks.
T. Anthony, J. P. Robinson, J. R. Marstrander, G. R. Brook, M. Horton, F. M. Skidmore
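The distribution pattern the abstract describes (one subject's expensive preprocessing per worker, many workers in parallel) can be sketched as follows. The preprocess_subject function and subject list are hypothetical stand-ins; the actual pipeline dispatches jobs on an HPC cluster scheduler rather than a single node.

    # Sketch of running independent per-subject preprocessing jobs in parallel.
    from concurrent.futures import ProcessPoolExecutor

    SUBJECTS = [f"sub-{i:03d}" for i in range(1, 101)]   # hypothetical subject IDs

    def preprocess_subject(subject_id):
        # Placeholder for one subject's non-linear warping + DTI preprocessing,
        # e.g. a call out to the site's imaging pipeline scripts.
        return f"{subject_id}: preprocessed"

    if __name__ == "__main__":
        with ProcessPoolExecutor(max_workers=16) as pool:
            for result in pool.map(preprocess_subject, SUBJECTS):
                print(result)

Because the per-subject jobs are independent, total wall-clock time falls roughly in proportion to the number of workers, which is how the pipeline cuts months of serial processing down to weeks.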
Backmatter
Metadata
Title
Big Data and Visual Analytics
Editors
Sang C. Suh
Thomas Anthony
Copyright Year
2017
Electronic ISBN
978-3-319-63917-8
Print ISBN
978-3-319-63915-4
DOI
https://doi.org/10.1007/978-3-319-63917-8
