2020 | OriginalPaper | Chapter

Performance Analysis of Queries with Hive Optimized Data Models

Authors : Meghna Sharma, Jagdeep Kaur

Published in: Proceedings of ICRIC 2019

Publisher: Springer International Publishing

Abstract

Hive, a data warehouse tool, handles the processing of structured data in Hadoop. It sits on top of Hadoop and helps to analyze, query, and review Big Data. Using Hadoop MapReduce has drastically reduced the execution time of queries. This paper presents a detailed comparison of data model optimization techniques, such as partitioning and bucketing, aimed at improving the processing time of Hive queries. The implementation uses data from the New York Police Portal, with AWS services for storage and the Hive tool in the Hadoop ecosystem for querying. Partitioning showed a remarkable improvement in execution time.
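As a rough illustration of why partitioning can cut execution time, the following Python sketch mimics the two data-model techniques named in the abstract. It is not the authors' implementation and does not call Hive: the records, the field names (`borough`, `complaint_id`, `offense`), and all function names are hypothetical, and a simple modulo on an integer key stands in for Hive's bucket hashing. The sketch only shows that a filter on the partition column needs to read one partition instead of the whole table.

```python
from collections import defaultdict

# Hypothetical sample rows; the schema is illustrative, not the paper's dataset.
RECORDS = [
    {"borough": "BRONX", "complaint_id": 101, "offense": "THEFT"},
    {"borough": "BRONX", "complaint_id": 102, "offense": "ASSAULT"},
    {"borough": "QUEENS", "complaint_id": 203, "offense": "THEFT"},
    {"borough": "BROOKLYN", "complaint_id": 304, "offense": "FRAUD"},
    {"borough": "BROOKLYN", "complaint_id": 305, "offense": "THEFT"},
    {"borough": "QUEENS", "complaint_id": 206, "offense": "FRAUD"},
]


def partition_by(records, column):
    """Group rows by a column value, similar to one directory per partition value."""
    partitions = defaultdict(list)
    for row in records:
        partitions[row[column]].append(row)
    return dict(partitions)


def bucket_by(records, column, num_buckets):
    """Spread rows over a fixed number of buckets using the integer key modulo the bucket count."""
    buckets = [[] for _ in range(num_buckets)]
    for row in records:
        buckets[row[column] % num_buckets].append(row)
    return buckets


def count_full_scan(records, borough):
    """Count matching rows by reading every row; returns (matches, rows_read)."""
    matches = sum(1 for row in records if row["borough"] == borough)
    return matches, len(records)


def count_with_pruning(partitions, borough):
    """Count matching rows by reading only the relevant partition; returns (matches, rows_read)."""
    rows = partitions.get(borough, [])
    return len(rows), len(rows)


partitions = partition_by(RECORDS, "borough")
buckets = bucket_by(RECORDS, "complaint_id", 2)

print(count_full_scan(RECORDS, "QUEENS"))        # (2, 6): all six rows read
print(count_with_pruning(partitions, "QUEENS"))  # (2, 2): only the QUEENS partition read
```

Both counts agree, but the pruned version reads two rows rather than six. Bucketing does not skip rows for this filter; it fixes the number of files per table or partition, which can help with sampling and joins on the bucketed column.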

Title: Performance Analysis of Queries with Hive Optimized Data Models
Authors: Meghna Sharma, Jagdeep Kaur
Publisher: Springer International Publishing
Book: Proceedings of ICRIC 2019
Print ISBN: 978-3-030-29406-9

Electronic ISBN: 978-3-030-29407-6

Copyright Year: 2020
DOI: https://doi.org/10.1007/978-3-030-29407-6_49
