The impact of heterogeneous multi-core clusters on graph partitioning: an empirical study

Authors: Siew Yin Chan, Teck Chaw Ling, Eric Aubanel

Published in: Cluster Computing | Issue 3/2012

Published: 1 September 2012

Abstract

The advent of multi-core architectures provides an opportunity for accelerating parallelism in mesh-based applications. This multi-core environment, however, imposes challenges not addressed by conventional graph-partitioning techniques, which were originally designed for distributed-memory uniprocessors. As a first step towards exploiting the multi-core platform, this paper presents an experimental evaluation of partitioning performance on small-scale heterogeneous multi-core clusters. Based on the results and analyses gathered, we propose a hierarchical framework for resource-aware graph partitioning on heterogeneous multi-core clusters. Preliminary evaluation demonstrates the potential of the framework and motivates directions for incorporating application requirements into graph partitioning.

While the OS and other software versions were the same across the testbed, we compiled numerical libraries such as ATLAS 3.9.14 and CLAPACK 3.1.1.1 on the different microprocessors to obtain maximum optimization.

Certain libraries on which the HPCC benchmarks rely were compiled separately on each machine architecture, and the resulting binaries were placed in different folders on the frontend.

Poisson and SynApp are available on request to aubanel@unb.ca.

156,061 mesh vertices and 467,315 edges.

For brevity, the partitions for Top_v1 and Top_v2 will be named henceforth after the underlying topology.

We found that JOSTLE consistently created disconnected partitions for the four-level processor graph with P=20. We therefore opted for three levels to minimize the likelihood of producing disconnected partitions.
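A disconnected partition of the kind mentioned in this note can be detected mechanically. The following is a minimal sketch, assuming the graph is available as an adjacency dictionary and the partition as a vertex-to-part mapping. Both layouts and the function name `disconnected_parts` are hypothetical choices for illustration, not JOSTLE's output format or the authors' tooling. It reports the parts whose vertices do not form a connected subgraph.

```python
from collections import deque


def disconnected_parts(adjacency, part_of):
    """Return the sorted ids of parts whose induced subgraph is not connected.

    adjacency: dict mapping each vertex to an iterable of neighbouring vertices
    part_of:   dict mapping each vertex to its part id
    (Both layouts are assumptions made for this illustration.)
    """
    # Group the vertices by the part they were assigned to.
    members = {}
    for v, p in part_of.items():
        members.setdefault(p, set()).add(v)

    bad = []
    for p, verts in members.items():
        # Breadth-first search that never leaves the current part.
        start = next(iter(verts))
        seen = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adjacency.get(u, ()):
                if w in verts and w not in seen:
                    seen.add(w)
                    queue.append(w)
        # Any vertex of the part left unreached means the part is split.
        if seen != verts:
            bad.append(p)
    return sorted(bad)


# Path graph 0-1-2-3: part 0 = {0, 3} is separated by part 1 = {1, 2}.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(disconnected_parts(path, {0: 0, 1: 1, 2: 1, 3: 0}))  # prints [0]
```

The search runs once per part and visits each edge at most twice, so the check is linear in the size of the graph.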

We have submitted the respective graphs to DIMACS10 [38] for public use.

1. Alam, S.R., Agarwal, P.K., Hampton, S.S., Ong, H.: Experimental evaluation of molecular dynamics simulations on multi-core systems. In: Sadayappan, P., Parashar, M., Badrinath, R., Prasanna, V. (eds.) High Performance Computing - HiPC 2008. Lecture Notes in Computer Science, vol. 5374, pp. 131–141. Springer, Berlin (2008). doi:10.1007/978-3-540-89894-8_15

2. Asanovic, K., Bodik, R., Catanzaro, B.C., Gebis, J.J., Husbands, P., Keutzer, K., Patterson, D.A., Plishker, W.L., Shalf, J., Williams, S.W., Yelick, K.A.: The landscape of parallel computing research: A view from Berkeley. Tech. Rep., EECS Department, University of California, Berkeley (2006)

3. Asanovic, K., Bodik, R., Demmel, J., Keaveny, T., Keutzer, K., Kubiatowicz, J., Morgan, N., Patterson, D., Sen, K., Wawrzynek, J., Wessel, D., Yelick, K.: A view of the parallel computing landscape. Commun. ACM 52(10), 56–67 (2009). doi:10.1145/1562764.1562783

4. Aubanel, E.: Resource-aware load balancing of parallel applications. In: Udoh, E., Wang, F.Z. (eds.) Handbook of Research on Grid Technologies and Utility Computing: Concepts for Managing Large-Scale Applications, pp. 12–21. IGI Global, Hershey (2009)

5. Barnard, S.T., Simon, H.D.: Fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems. Concurrency 6(2), 101–117 (1994). doi:10.1002/cpe.433006020

6. Bhatelé, A., Kalé, L.V.: Quantifying network contention on large parallel machines. Parallel Process. Lett. 19(4), 553–572 (2009). doi:10.1142/S0129626409000419 (Special Issue on Large-Scale Parallel Processing)

7. Borkar, S.Y., Dubey, P., Kahn, K.C., Kuck, D.J., Mulder, H., Pawlowski, S.S., Rattner, J.R.: Platform 2015: Intel processor and platform evolution for the next decade. Tech. Rep. White Paper, Intel Corporation (2005). ftp://download.intel.com/technology/computing/archinnov/platform2015/download/Platform_2015.pdf

8. Broquedis, F., Clet-Ortega, J., Moreaud, S., Furmento, N., Goglin, B., Mercier, G., Thibault, S., Namyst, R.: hwloc: a generic framework for managing hardware affinities in hpc applications. In: Proceedings of the 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2010), Pisa, Italy, pp. 180–186 (2010)

9. Bui, T., Jones, C.: A heuristic for reducing fill-in in sparse matrix factorization. In: Proceedings of the 6th SIAM Conf. Parallel Processing for Scientific Computing, pp. 445–452. SIAM, Philadelphia (1993)

10. Canon, L.C., Dubuisson, O., Gustedt, J., Jeannot, E.: Defining and controlling the heterogeneity of a cluster: the Wrekavoc tool. J. Syst. Softw. 83(5), 786–802 (2010). doi:10.1016/j.jss.2009.11.734

11. Chai, L., Gao, Q., Panda, D.: Understanding the impact of multi-core architecture in cluster computing: A case study with Intel dual-core system. In: Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGRID 2007), Rio De Janeiro, pp. 471–478. IEEE, New York (2007). doi:10.1109/CCGRID.2007.119

12. Chan, S.Y., Ling, T.C., Aubanel, E.: Benchmarking and profiling heterogeneous multi-core clusters using graph-partitioning workload. Tech. Rep. CS-2011-01, Faculty of Computer Science and Information Technology, University of Malaya (2011)

13. Chartrand, G., Zhang, P.: Introduction to Graph Theory. Walter Rudin Series in Advanced Mathematics. McGraw-Hill Higher Education, Singapore (2005)

14. Chen, J., Taylor, V.E.: Mesh partitioning for efficient use of distributed systems. IEEE Trans. Parallel Distrib. Syst. 13(1), 67–79 (2002). doi:10.1109/71.980027

15. Chen, H., Chen, W., Huang, J., Robert, B., Kuhn, H.: Mpipp: an automatic profile-guided parallel process placement toolset for smp clusters and multiclusters. In: Proceedings of the 20th Annual International Conference on Supercomputing, Cairns, Queensland, Australia, pp. 353–360 (2006). doi:10.1145/1183401.1183451

16. Clout, B., Aubanel, E.: Ehgrid: an emulator of heterogeneous computational grids. In: IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2009), Rome, pp. 1–8 (2009). doi:10.1109/IPDPS.2009.5161167

17. Cybenko, G.: Dynamic load balancing for distributed memory multiprocessors. J. Parallel Distrib. Comput. 7(2), 279–301 (1989). doi:10.1016/0743-7315(89)90021-X

18. Devine, K., Boman, E., Heaphy, R., Hendrickson, B., Vaughan, C.: Zoltan data management services for parallel dynamic applications. Comput. Sci. Eng. 7(2), 90–97 (2002)

19. Dümmler, J., Rauber, T., Rünger, G.: Mapping algorithms for multiprocessor tasks on multi-core clusters. In: Proc. of the 37th International Conference on Parallel Processing (ICPP 2008), Portland, Oregon, USA, pp. 141–148. IEEE Computer Society, Los Alamitos (2008). doi:10.1109/ICPP.2008.42

20. Faik, J., Flaherty, J.E., Gervasio, L.G., Teresco, J.D., Devine, K.D.: A model for resource aware load balancing on heterogeneous clusters. Tech. Rep. CS-05-01, Williams College Department of Computer Science (2005)

21. Gropp, W., Gunter, D., Taylor, V.: Fpmpi-2: Fast profiling library for mpi (2001). http://www.mcs.anl.gov/research/projects/fpmpi/WWW/

22. Hendrickson, B.: Load balancing fictions, falsehoods and fallacies. Appl. Math. Model. 25, 99–108 (2000)

23. Hendrickson, B., Kolda, T.G.: Graph partitioning models for parallel computing. Parallel Comput. 26(12), 1519–1534 (2000). doi:10.1016/S0167-8191(00)00042-9

24. Hendrickson, B., Leland, R.: A multilevel algorithm for partitioning graphs. In: Karin, S. (ed.) Proceedings of the ACM/IEEE conference on Supercomputing, p. 28. ACM, New York (1995). doi:10.1145/224170.224228

25. Hood, R., Jin, H., Mehrotra, P., Chang, J., Djomehri, J., Gavali, S., Jespersen, D., Taylor, K., Biswas, R.: Performance impact of resource contention in multicore systems. In: IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2010), Atlanta, GA, pp. 1–12. IEEE, New York (2010). doi:10.1109/IPDPS.2010.5470399

26. Huang, S., Aubanel, E., Bhavsar, V.: Pagrid: A mesh partitioner for computational grids. J. Grid Comput. 4(1), 71–88 (2006)

27. Jeannot, E., Mercier, G.: Near-optimal placement of mpi processes on hierarchical numa architectures. In: D’Ambra, P., Guarracino, M., Talia, D. (eds.) Euro-Par 2010 - Parallel Processing. Lecture Notes in Computer Science, vol. 6272, pp. 199–210. Springer, Berlin (2010). doi:10.1007/978-3-642-15291-7_20

28. Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20(1), 359–392 (1998). doi:10.1137/S1064827595287997

29. Karypis, G., Kumar, V.: Metis: A software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices, version 4.0 (1998). http://glaros.dtc.umn.edu/gkhome/views/metis

30. Kayi, A., El-Ghazawi, T., Newby, G.B.: Performance issues in emerging homogeneous multi-core architectures. Simul. Model. Pract. Theory 17(9), 1485–1499 (2009). doi:10.1016/j.simpat.2009.06.014

31. Kernighan, B.W., Lin, S.: An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 49(1), 291–307 (1970)

32. Koenig, G.A., Kalé, L.V.: Optimizing distributed application performance using dynamic grid topology-aware load balancing. In: International Parallel and Distributed Processing Symposium, Long Beach, CA, pp. 1–10 (2007)

33. Korkhov, V.V., Krzhizhanovskaya, V.V., Sloot, P.: A grid-based virtual reactor: parallel performance and adaptive load balancing. J. Parallel Distrib. Comput. 68(5), 596–608 (2008). doi:10.1016/j.jpdc.2007.08.010

34. Kurc, O., Will, K.: An iterative parallel workload balancing framework for direct condensation of substructures. Comput. Methods Appl. Mech. Eng. 196, 2084–2096 (2007). doi:10.1016/j.cma.2006.07.015

35. Leng, T., Ali, R., Hsieh, J., Mashayekhi, V., Rooholamini, R.: Performance impact of process mapping on small-scale smp clusters - a case study using high performance linpack. In: Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS’02), pp. 236–243. IEEE, New York (2002). doi:10.1109/IPDPS.2002.1016657

36. Levon, J.: Oprofile - a system profiler for Linux (2004). http://oprofile.sourceforge.net/

37. Luszczek, P., Dongarra, J.J., Koester, D., Rabenseifner, R., Lucas, B., Kepner, J., McCalpin, J., Bailey, D., Takahashi, D.: Introduction to the hpc challenge benchmark suite. Tech. Rep. Paper LBNL-57493, Lawrence Berkeley National Laboratory (2005)

38. Meyerhenke, H.: 10th dimacs implementation challenge - graph partitioning and graph clustering (2011). http://www.cc.gatech.edu/dimacs10/

39. Meyerhenke, H., Monien, B., Schamberger, S.: Graph partitioning and disturbed diffusion. Parallel Comput. 35(10–11), 544–569 (2009). doi:10.1016/j.parco.2009.09.006

40. Moulitsas, I., Karypis, G.: Architecture aware partitioning algorithms. In: Algorithms and Architectures for Parallel Processing. Lecture Notes in Computer Science, vol. 5022, pp. 42–53. Springer, Berlin (2008). doi:10.1007/978-3-540-69501-1

41. Peng, L., Peir, J.K., Prakash, T.K., Staelin, C., Chen, Y.K., Koppelman, D.: Memory hierarchy performance measurement of commercial dual-core desktop processors. J. Syst. Archit. 54(8), 816–828 (2008). doi:10.1016/j.sysarc.2008.02.004

42. Schloegel, K., Karypis, G., Kumar, V.: Graph partitioning for high-performance scientific simulations. In: Sourcebook of Parallel Computing, pp. 491–541. Morgan Kaufmann, San Francisco (2003)

43. Shewchuk, J.R.: Triangle: A two-dimensional quality mesh generator and Delaunay triangulator (2007). http://www.cs.cmu.edu/~quake/triangle.html

44. Sinha, S., Parashar, M.: Adaptive system sensitive partitioning of amr applications on heterogeneous clusters. Clust. Comput. 5(4), 343–352 (2002)

45. Teresco, J.D., Faik, J., Flaherty, J.E.: Hierarchical partitioning and dynamic load balancing for scientific computation. In: Dongarra, J., Madsen, K., Wasniewski, J. (eds.) Applied Parallel Computing. Lecture Notes in Computer Science, vol. 3732, pp. 911–920. Springer, Berlin (2006). doi:10.1007/11558958

46. University of Paderborn: Graph partitioning - graph collection (2011). http://www2.cs.uni-paderborn.de/cs/ag-monien/RESEARCH/PART/GRAPHS/FEM2.tar

47. Walshaw, C.: The graph partitioning archive (2008). http://staffweb.cms.gre.ac.uk/~c.walshaw/partition/

48. Walshaw, C., Cross, M.: Multilevel mesh partitioning for heterogeneous communication networks. Future Gener. Comput. Syst. 17(5), 601–623 (2001). doi:10.1016/S0167-739X(00)00107-2

49. Walshaw, C., Cross, M.: Jostle: parallel multilevel graph-partitioning software - an overview. In: Magoules, F. (ed.) Mesh Partitioning Techniques and Domain Decomposition Techniques, pp. 27–58. Civil-Comp, Stirling (2007)

50. Walshaw, C., Cross, M., Diekmann, R., Preis, R.: Multilevel mesh partitioning for optimizing domain shape. Int. J. High Perform. Comput. Appl. 13(4), 334–353 (1999). doi:10.1177/109434209901300404

51. Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009). doi:10.1145/1498765.1498785

Title: The impact of heterogeneous multi-core clusters on graph partitioning: an empirical study
Authors: Siew Yin Chan, Teck Chaw Ling, Eric Aubanel
Publication date: 1 September 2012
Publisher: Springer US
Published in: Cluster Computing | Issue 3/2012
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-012-0229-4
