2020 | OriginalPaper | Chapter

Performance Analysis of Queries with Hive Optimized Data Models

Authors: Meghna Sharma, Jagdeep Kaur

Published in: Proceedings of ICRIC 2019

Publisher: Springer International Publishing

Abstract

Hive, a data warehouse tool, processes structured data in Hadoop. It sits on top of Hadoop and helps to analyze, query, and review Big Data. Using Hadoop MapReduce has drastically reduced the execution time of queries. This paper presents a detailed comparison of data model optimization techniques, namely partitioning and bucketing, for improving the processing time of Hive queries. The implementation uses data from the New York Police Portal, with AWS services for storage and the Hive tool in the Hadoop ecosystem for querying. Partitioning yielded a remarkable improvement in execution time.
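
To make the two techniques named in the abstract concrete, the following is a minimal Python sketch of the general idea, not the authors' Hive implementation. It assumes a hypothetical table with `borough` and `complaint_id` columns (the abstract does not name any columns), models a partition as a group of rows sharing one key value, and uses `zlib.crc32` as a stand-in for Hive's own hash function when assigning rows to buckets.

```python
import zlib
from collections import defaultdict

# Hypothetical rows (borough, complaint_id); the column names and values
# are illustrative assumptions, not taken from the paper's dataset.
ROWS = [
    ("BRONX", 101), ("BROOKLYN", 102), ("QUEENS", 103),
    ("BRONX", 104), ("BROOKLYN", 105), ("BRONX", 106),
]

def partition_rows(rows):
    """Group rows by the partition key, like one directory per key value."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[0]].append(row)
    return dict(parts)

def bucket_of(key, num_buckets):
    """Map a key to a bucket index: hash(key) modulo the bucket count.
    crc32 is a deterministic stand-in, not Hive's actual hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets

def bucket_rows(rows, num_buckets):
    """Distribute rows into a fixed number of buckets by complaint_id."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_of(row[1], num_buckets)].append(row)
    return buckets

def scan_full(rows, borough):
    """Unpartitioned table: every row is read to evaluate the filter."""
    matches = [r for r in rows if r[0] == borough]
    return matches, len(rows)

def scan_partitioned(parts, borough):
    """Partitioned table: only the matching partition is read."""
    selected = parts.get(borough, [])
    return list(selected), len(selected)

parts = partition_rows(ROWS)
full_matches, full_read = scan_full(ROWS, "BRONX")
part_matches, part_read = scan_partitioned(parts, "BRONX")
buckets = bucket_rows(ROWS, 4)
print(f"rows read without partitioning: {full_read}")
print(f"rows read with partitioning:    {part_read}")
print(f"bucket sizes: {[len(b) for b in buckets]}")
```

Both scans return the same matching rows, but the partitioned scan touches only the rows stored under the requested key. That is the intuition behind a filter on a partition column reducing the work a query has to do. Bucketing, by contrast, spreads rows over a fixed number of groups regardless of how many distinct key values exist.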

Metadata
Title: Performance Analysis of Queries with Hive Optimized Data Models
Authors: Meghna Sharma, Jagdeep Kaur
Copyright Year: 2020
DOI: https://doi.org/10.1007/978-3-030-29407-6_49
