Published in: Cluster Computing 3/2017

10 July 2017

Making a case for the on-demand multiple distributed message queue system in a Hadoop cluster

Authors: Cao Ngoc Nguyen, Soonwook Hwang, Jik-Soo Kim


Abstract

In this paper, we present a framework that provides users with a simple, convenient, and powerful way to deploy multiple message queue systems on demand in a Hadoop cluster. Specifically, we leverage Apache Kafka, one of the state-of-the-art distributed message queue systems, which can achieve high throughput, low latency, and good load balancing. Our framework automates setting up and starting Kafka brokers on the fly, so users can quickly adopt Kafka without spending much effort on installation and configuration. In addition, the framework allows users to run their Kafka-based applications without detailed knowledge of the Hadoop YARN APIs and the underlying mechanisms. We present a use case in which the framework is used to evaluate Kafka’s performance across various test cases and working scenarios. The experimental results help prospective Kafka users understand how different settings influence queuing performance.
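The abstract does not show how brokers are configured when they are started on the fly. The following Python sketch is a hypothetical illustration of that step, not the authors' implementation: given the hosts of containers that YARN has allocated (modelled here as a plain list of host names), it assigns each broker a unique `broker.id`, a listener port that does not collide with another broker on the same host, and its own log directory, all pointing at one ZooKeeper ensemble. The keys `broker.id`, `listeners`, `log.dirs` and `zookeeper.connect` are standard settings for ZooKeeper-based Kafka brokers; the helper names, host names, ports, paths and the `/kafka-ondemand` chroot are assumptions made for illustration.

```python
"""Hypothetical sketch (not the authors' code): derive per-broker Kafka
settings for brokers launched on demand in YARN-allocated containers."""

from collections import defaultdict
from typing import Dict, List


def broker_properties(broker_id: int, host: str, port: int,
                      log_dir: str, zk_connect: str) -> Dict[str, str]:
    """Return the minimal settings a ZooKeeper-based Kafka broker needs."""
    return {
        "broker.id": str(broker_id),              # unique within one Kafka cluster
        "listeners": f"PLAINTEXT://{host}:{port}",
        "log.dirs": log_dir,                      # where this broker stores partitions
        "zookeeper.connect": zk_connect,
    }


def plan_brokers(hosts: List[str], zk_connect: str,
                 base_port: int = 9092,
                 log_root: str = "/tmp/kafka-logs") -> List[Dict[str, str]]:
    """Plan one broker per allocated container host.

    YARN may place two containers on the same node, so each additional
    broker on a host gets the next port instead of base_port.
    """
    next_port: Dict[str, int] = defaultdict(lambda: base_port)
    configs = []
    for broker_id, host in enumerate(hosts):
        port = next_port[host]
        next_port[host] += 1
        configs.append(broker_properties(
            broker_id, host, port, f"{log_root}/broker-{broker_id}", zk_connect))
    return configs


def to_properties_text(cfg: Dict[str, str]) -> str:
    """Render settings in Java .properties form, one key=value per line."""
    return "".join(f"{key}={value}\n" for key, value in cfg.items())


if __name__ == "__main__":
    # Hypothetical hosts of containers granted by YARN (node1 appears twice).
    hosts = ["node1", "node2", "node1"]
    # Hypothetical ZooKeeper ensemble; the chroot isolates this Kafka instance.
    for cfg in plan_brokers(hosts, "zk1:2181/kafka-ondemand"):
        print(to_properties_text(cfg))
```

Each rendered string could be written to a `server.properties` file inside its container before the broker is launched. Giving every on-demand Kafka instance its own ZooKeeper chroot is one way to let several message queue systems coexist in the same Hadoop cluster without sharing metadata.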

Metadata
Title
Making a case for the on-demand multiple distributed message queue system in a Hadoop cluster
Authors
Cao Ngoc Nguyen
Soonwook Hwang
Jik-Soo Kim
Publication date
10 July 2017
Publisher
Springer US
Published in
Cluster Computing / Issue 3/2017
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-1031-0
