
2019 | Book

Large Scale Data Analytics

Authors: Chung Yik Cho, Rong Kun Jason Tan, John A. Leong, Amandeep S. Sidhu

Publisher: Springer International Publishing

Book series: Studies in Computational Intelligence


About this book

This book presents a language integrated query framework for big data. The continuous, rapid growth of data to volumes of terabytes (1,024 gigabytes) or petabytes (1,048,576 gigabytes) makes the need for a system to manage and query information from large scale data sources increasingly urgent. Currently available frameworks and methodologies are limited in efficiency and in querying compatibility between data sources because information storage structures differ. For this research, the authors designed and programmed a framework based on the fundamentals of language integrated query that queries existing data sources without restructuring the data. A web portal for the framework was also built to enable users to query protein data from the Protein Data Bank (PDB). The portal was implemented on Microsoft Azure, a cloud computing environment known for its reliability, vast computing resources and cost-effectiveness.

Table of contents

Frontmatter
Chapter 1. Introduction
Abstract
In this modern technological age, data is growing larger and faster than in previous decades. The existing methods used to process and analyse this overflowing amount of data are no longer sufficient. The term large scale data first surfaced in the article “Visually Exploring Gigabyte Datasets in Real Time” [1], published by the Association for Computing Machinery (ACM) in 1999. The article noted that having large scale data without a proper methodology to analyse it is both a huge challenge and a regrettable situation.
Chung Yik Cho, Rong Kun Jason Tan, John A. Leong, Amandeep S. Sidhu
Chapter 2. Background
Abstract
Reductionist molecular biology is a hypothesis-based approach used by scientists in the second half of the 20th century to determine and characterize the molecules, cells and major structures of living systems. Biologists recognised that, as a single community, they must continue to use reductionist strategies to elucidate the complete structure of these components and each of their functions.
Chung Yik Cho, Rong Kun Jason Tan, John A. Leong, Amandeep S. Sidhu
Chapter 3. Large Scale Data Analytics
Abstract
Protein data is complex and is constantly updated by researchers around the globe. To query multiple data sources, a query framework built in Python on the concept of Language Integrated Query is proposed to overcome the limitations discussed in previous chapters. A cloud computing platform hosts the query framework for this research, so that the framework can use the vast resources available to perform queries with minimal latency while avoiding a shortage of computing resources. In this chapter, Language Integrated Query, cloud computing and algebraic operators are explained in detail.
Chung Yik Cho, Rong Kun Jason Tan, John A. Leong, Amandeep S. Sidhu
Chapter 4. Query Framework
Abstract
The Protein Data Bank (PDB) holds a vast amount of resources related to protein 3D models, complex assemblies and nucleic acids that both students and researchers can use to learn the characteristics of biomedicine. A framework is therefore needed to retrieve information from its database effectively. The functions that enable users to query RCSB PDB are explained in this chapter.
Chung Yik Cho, Rong Kun Jason Tan, John A. Leong, Amandeep S. Sidhu
Chapter 5. Results and Discussion
Abstract
For this research, the query framework explained in Chap. 4 is implemented on Microsoft Azure. The query framework can be accessed as a web portal through any web browser, for example Internet Explorer, Microsoft Edge or Google Chrome. The web portal is built to be user friendly and easy to navigate when retrieving data from RCSB PDB. The results of the query web portal are shown in this chapter.
Chung Yik Cho, Rong Kun Jason Tan, John A. Leong, Amandeep S. Sidhu
Chapter 6. Conclusion and Future Works
Abstract
This research shows the difficulties currently faced in database querying. Recent methodologies such as semantic integration focus on data integration, data mapping and data translation. These approaches work for small to medium data sources. However, for databases that are huge and constantly updated by users around the world, they are neither suitable nor cost effective.
Chung Yik Cho, Rong Kun Jason Tan, John A. Leong, Amandeep S. Sidhu
Backmatter
Metadata
Title
Large Scale Data Analytics
Authors
Chung Yik Cho
Rong Kun Jason Tan
John A. Leong
Amandeep S. Sidhu
Copyright year
2019
Electronic ISBN
978-3-030-03892-2
Print ISBN
978-3-030-03891-5
DOI
https://doi.org/10.1007/978-3-030-03892-2
