Improving the performance of Apache Hadoop on pervasive environments through context-aware scheduling

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

This article proposes a context-aware approach to improve Apache Hadoop scheduling. Apache Hadoop is the most popular implementation of the MapReduce paradigm for distributed computing, but its design does not adapt automatically to the context and capabilities of the computing nodes. By introducing context-awareness into Hadoop, we intend to dynamically adapt its scheduling to the execution environment. This is a necessary feature in pervasive grids, which are heterogeneous, dynamic and shared environments. The solution has been incorporated into Hadoop and assessed through controlled experiments. The experiments show that context-awareness yields performance gains, especially when some of the resources disappear during execution.

Notes

  1. http://aws.amazon.com/elasticmapreduce/

Abbreviations

API: Application programming interface

DHT: Distributed hash table

FIFO: First in, first out

HDFS: Hadoop distributed file system

P2P: Peer-to-Peer

PER-MARE: Pervasive map-reduce project

SLA: Service-level agreement

VM: Virtual machine

YARN: Yet another resource negotiator

References

  • Apache, Apache Hadoop, 2014. http://hadoop.apache.org/docs/r2.6.0/index.html. Last access: November 2014

  • Assuncao MD, Netto MAS, Koch F, Bianchi S (2012) Context-aware job scheduling for cloud computing environments. In: IEEE Fifth International Conference on Utility and Cloud Computing (UCC). 2012. pp 255–262. doi:10.1109/UCC.2012.33

  • Baldauf M, Dustdar S, Rosenberg F (2007) A survey on context-aware systems. Int J Ad Hoc Ubiquitous Comput 2(4):263–277

  • Cassales GW, Charao AS, Pinheiro MK, Souveyet C, Steffenel LA (2014) Bringing Context to Apache Hadoop. In: 8th International Conference on Mobile Ubiquitous Computing, Rome, Italy

  • Cassales GW, Charao AS, Kirsch Pinheiro M, Souveyet C, Steffenel LA (2015) Context-aware scheduling for apache hadoop over pervasive environments. Procedia Comp Sci 52:202–209. The 6th International Conference on Ambient Systems, Networks and Technologies (ANT-2015), the 5th International Conference on Sustainable Energy Information Technology (SEIT-2015). doi:10.1016/j.procs.2015.05.058. http://www.sciencedirect.com/science/article/pii/S1877050915008583

  • Cavallo M, Cusma L, Modica GD, Polito C, Tomarchio O (2015) A scheduling strategy to run Hadoop jobs on geodistributed data. In: 3rd Workshop on CLoud for IoT (CLIoT 2015), in conjunction with the European Conference on Service-Oriented and Cloud Computing (ESOCC 2015)

  • Chen Q, Zhang D, Guo M, Deng Q, Guo S (2010) SAMR: a self-adaptive MapReduce scheduling algorithm in heterogeneous environment, In: Proceedings of the 2010 10th IEEE International Conference on Computer and Information Technology. CIT ’10 (IEEE Computer Society, Washington, DC, USA, 2010), pp 2736–2743 (978-0-7695-4108-2)

  • Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113

  • Engel T, Charão A, Kirsch-Pinheiro M, Steffenel LA (2015) Performance improvement of data mining in Weka through multi-core and GPU acceleration: opportunities and pitfalls. J Ambient Intell Human Comput 6(4):377–390. doi:10.1007/s12652-015-0292-9

  • Grid’5000, Grid 5000, 2013. https://www.grid5000.fr/, Last access: July 2014

  • Hamilton, J.: Hadoop Wins TeraSort, 2008. http://perspectives.mvdirona.com/2008/07/hadoop-wins-terasort/. Last access: September 2015

  • Hofmann P, Woods D (2010) Cloud computing: the limits of public clouds for business applications. IEEE Internet Comput 14(6):90–93. doi:10.1109/MIC.2010.136

  • Huang S, Huang J, Dai J, Xie T, Huang B: The HiBench benchmark suite: Characterization of the MapReduce-based data analysis. In: 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW), 2010, pp 41–51. doi:10.1109/ICDEW.2010.5452747

  • Hunt P, Konar M, Junqueira FP, Reed B, ZooKeeper: wait-free Coordination for Internet-scale Systems. In: Proceedings of the USENIX Annual Technical Conference (USENIX Association, Boston, MA, USA, 2010), pp 11. http://dl.acm.org/citation.cfm?id=1855840.1855851

  • Isard M, Prabhakaran V, Currey J, Wieder U, Talwar K, Goldberg A (2009) Quincy: fair scheduling for distributed computing clusters, in Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles. SOSP ’09 (ACM, New York, NY, USA, 2009), pp 261–276 (978-1-60558-752-3)

  • Kumar KA, Konishetty VK, Voruganti K, Rao GVP (2012) CASH: context aware scheduler for Hadoop. In: Proceedings of the International Conference on Advances in Computing, Communications and Informatics. ICACCI ’12, New York, NY, USA, 2012, pp 52–61 (978-1-4503-1196-0)

  • Li J, Wang Q, Jayasinghe D, Park J, Zhu T, Pu C (2013) Performance overhead among three hypervisors: an experimental study using Hadoop benchmarks. In: 2013 IEEE International Congress on Big Data (BigData Congress) 2013, pp 9–16. 2013, doi:10.1109/BigData.Congress..11

  • Maamar Z, Benslimane D, Narendra NC (2006) What can context do for web services? Commun ACM 49(12):98–103

  • Marozzo F, Talia D, Trunfio P (2012) P2P-MapReduce: parallel data processing in dynamic cloud environments. J Comput Syst Sci 78(5):1382–1402

  • Maurer M, Brandic I, Sakellariou R (2012) Self-adaptive and resource-efficient SLA enactment for cloud computing infrastructures. In: 2012 IEEE 5th International Conference on cloud computing (CLOUD), 2012, pp 368–375. doi:10.1109/CLOUD.2012.55

  • Najar S, Kirsch Pinheiro M, Souveyet C (2015) Service discovery and prediction on pervasive information system. J Ambient Intell Human Comput 6(4):407–423. doi:10.1007/s12652-015-0288-5

  • Nascimento AP, Boeres C, Rebello VEF (2008) Dynamic self-scheduling for parallel applications with task dependencies. In: Proceedings of the 6th International Workshop on MGC. MGC ’08, New York, NY, USA, 2008, pp 1–116 (978-1-60558-365-5)

  • Oracle, Overview of Java SE Monitoring and Management, 2014. http://docs.oracle.com/javase/7/docs/technotes/guides/management/overview.html, Last access: July 2014

  • Parashar M, Pierson JM (2010) Pervasive grids: challenges and opportunities. In: Li K, Hsu C, Yang L, Dongarra J, Zima H (eds) Handbook of Research on Scalable Computing Technologies. (IGI Global, 2010), pp 14–30. doi:10.4018/978-1-60566-661-7.ch002 ( 978–160566661-7)

  • Ramakrishnan A, Preuveneers D, Berbers Y (2014) Enabling self-learning in dynamic and open IoT environments. In: Shakshuki E, Yasar A (eds) The 5th International Conference on Ambient Systems, Networks and Technologies (ANT-2014), the 4th International Conference on Sustainable Energy Information Technology (SEIT-2014), vol. 32, 2014, pp 207–214. doi:10.1016/j.procs.2014.05.416

  • Rasooli A, Down DG (2012) COSHH: a classification and optimization based scheduler for heterogeneous Hadoop systems. In: Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis. SCC ’12 (IEEE Computer Society, Washington, DC, USA, 2012), pp 1284–1291 (978-0-7695-4956-9)

  • Sandholm T, Lai K (2010) Dynamic Proportional Share Scheduling in Hadoop. In: Proceedings of the 15th International Conference on Job Scheduling Strategies for Parallel Processing. JSSPP’10, Berlin, Heidelberg, 2010, pp 110–131. (3–642-16504-4, 978-3-642-16504-7)

  • Schadt EE, Linderman MD, Sorenson J, Lee L, Nolan GP (2010) Computational solutions to large-scale data management and analysis. Nat Rev Genet 11(9):647–657. doi:10.1038/nrg2857

  • Steffenel LA, Kirsch Pinheiro M (2015) Leveraging data intensive applications on a pervasive computing platform: The case of mapreduce. Procedia Comp Sci 52:1034–1039 (2015). The 6th International Conference on Ambient Systems, Networks and Technologies (ANT-2015), the 5th International Conference on Sustainable Energy Information Technology (SEIT-2015). doi:10.1016/j.procs.2015.05.102. http://www.sciencedirect.com/science/article/pii/S1877050915009023

  • Steffenel LA, Flauzac O, Charão AS, Barcelos PP, Stein B, Nesmachnow S, Kirsch Pinheiro M, Diaz D (2013) PER-MARE: adaptive deployment of MapReduce over pervasive grids. In: Proceedings of the 2013 Eighth International Conference on P2P, Parallel, Grid, Cloud and Internet Computing. 3PGCIC ’13 (IEEE Computer Society, Washington, DC, USA, 2013), pp 17–24 (978-0-7695-5094-7)

  • STIC-AmSud, PER-MARE project, 2014. http://cosy.univ-reims.fr/PER-MARE, Last access: July 2014

  • Tian C, Zhou H, He Y, Zha L (2009) A dynamic MapReduce scheduler for heterogeneous workloads. In: Proceedings of the 2009 Eighth International Conference on Grid and Cooperative Computing. GCC ’09 (IEEE Computer Society, Washington, DC, USA, 2009), pp 218–224 (978-0-7695-3766-5)

  • Xie J, Ruan X, Ding Z, Tian Y, Majors J, Manzanares A, Yin S, Qin X (2010) Improving MapReduce performance through data placement in heterogeneous Hadoop clusters, in Parallel and Distributed Processing, Workshops and Phd Forum (IPDPSW)

  • Zaharia M, Konwinski A, Joseph AD, Katz R, Stoica I (2008) Improving MapReduce performance in heterogeneous environments, in Proceedings of the 8th USENIX conference on Operating systems design and implementation. OSDI’08 (USENIX Association, Berkeley, CA, USA, 2008), pp 29–42

Acknowledgments

The authors would like to thank their partners in the PER-MARE project STIC-AmSud (2014) and acknowledge the financial support given to this research by the CAPES/MAEE/ANII STIC-AmSud collaboration program (project number 13STIC07).

Author information

Corresponding author

Correspondence to Luiz-Angelo Steffenel.

About this article

Cite this article

Cassales, G.W., Schwertner Charão, A., Kirsch-Pinheiro, M. et al. Improving the performance of Apache Hadoop on pervasive environments through context-aware scheduling. J Ambient Intell Human Comput 7, 333–345 (2016). https://doi.org/10.1007/s12652-016-0361-8

  • DOI: https://doi.org/10.1007/s12652-016-0361-8
