Skip to main content
Erschienen in: The Journal of Supercomputing 3/2016

01.03.2016

Parallel Partition and Merge QuickSort (PPMQSort) on Multicore CPUs

verfasst von: Ratthaslip Ranokphanuwat, Surin Kittitornkun

Erschienen in: The Journal of Supercomputing | Ausgabe 3/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

An explosive amount of data has tremendous impacts on sorting, searching, indexing, and so on. Sorting is one of the basic Computer Science problems needed to be fast and efficient to serve Big Data. This paper presents an efficient and scalable algorithm called Parallel Partition and Merge QuickSort (PPMQSort) running on any shared memory/multicore/multi-socket systems. Together with OpenMP 3.0 library, the PPMQSort is developed to be compatible and benchmarked with the fastest C/C++ Stdlib qsort(). The PPMQSort recursively divides an unsorted input array into partially sorted partitions up to Cutoff length using nested multithreading. Finally, those independent partitions are qsort() (conquered) such that no synchronizations are needed. The resulting Speedup of 12.29\(\times \) on a dual-socket 8-core Xeon E5520 can be achieved for sorting random 200 M 32-bit integer data at 16 threads. With the same configuration, a 4-core AMD A6-3600 CPU (non-HyperThread) can reach up to 4.67\(\times \), a superlinear Speedup. It has been proved that the proposed PPMQSort can exploit all available cache levels and HyperThread CPU cores well thus utilizing up to 83 % and 96 % of CPU on E5520 and A6-3600, respectively.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
2.
3.
Zurück zum Zitat Mishra AD (2009) Selection of best sorting algorithm for a particular problem. Master’s thesis, Thapar University, Computer Science and Engineering Department Mishra AD (2009) Selection of best sorting algorithm for a particular problem. Master’s thesis, Thapar University, Computer Science and Engineering Department
4.
Zurück zum Zitat Bhandarkar SM, Arabnia HR (1995) The hough transform on a reconfigurable multi-ring network. J Parallel Distrib Comput 24(1):107–114CrossRef Bhandarkar SM, Arabnia HR (1995) The hough transform on a reconfigurable multi-ring network. J Parallel Distrib Comput 24(1):107–114CrossRef
5.
Zurück zum Zitat Arabnia HR, Bhandarkar SM (1996) Parallel stereocorrelation on a reconfigurable multi-ring network. J Supercomput 10(3):243–269CrossRefMATH Arabnia HR, Bhandarkar SM (1996) Parallel stereocorrelation on a reconfigurable multi-ring network. J Supercomput 10(3):243–269CrossRefMATH
6.
Zurück zum Zitat Bhandarkar SM, Arabnia HR (1997) Parallel computer vision on a reconfigurable multiprocessor network. IEEE Trans Parallel Distrib Syst 8(3):292–309CrossRef Bhandarkar SM, Arabnia HR (1997) Parallel computer vision on a reconfigurable multiprocessor network. IEEE Trans Parallel Distrib Syst 8(3):292–309CrossRef
7.
Zurück zum Zitat Koch D, Torresen J (2011) Fpgasort: a high performance sorting architecture exploiting run-time reconfiguration on fpgas for large problem sorting. In: Proceedings of the 19th ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA ’11. ACM, New York, pp 45–54 Koch D, Torresen J (2011) Fpgasort: a high performance sorting architecture exploiting run-time reconfiguration on fpgas for large problem sorting. In: Proceedings of the 19th ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA ’11. ACM, New York, pp 45–54
8.
Zurück zum Zitat Mueller R, Teubner J, Alonso G (2012) Sorting networks on fpgas. VLDB J 21(1):1–23CrossRef Mueller R, Teubner J, Alonso G (2012) Sorting networks on fpgas. VLDB J 21(1):1–23CrossRef
9.
Zurück zum Zitat Casper J, Olukotun K (2014) Hardware acceleration of database operations. In: Proceedings of the 2014 ACM/SIGDA International Symposium on Field-programmable Gate Arrays, FPGA ’14. ACM, New York, pp 151–160 Casper J, Olukotun K (2014) Hardware acceleration of database operations. In: Proceedings of the 2014 ACM/SIGDA International Symposium on Field-programmable Gate Arrays, FPGA ’14. ACM, New York, pp 151–160
10.
Zurück zum Zitat Capannini G, Silvestri F, Baraglia R (2012) Sorting on gpus for large scale datasets: a thorough comparison. Inf Process Manag 48(5):903–917CrossRef Capannini G, Silvestri F, Baraglia R (2012) Sorting on gpus for large scale datasets: a thorough comparison. Inf Process Manag 48(5):903–917CrossRef
11.
Zurück zum Zitat Xiaochen T, Rocki K, Suda R (2013) Register level sort algorithm on multi-core simd processors. In: Proceedings of the 3rd Workshop on Irregular Applications: Architectures and Algorithms, p 9. ACM Xiaochen T, Rocki K, Suda R (2013) Register level sort algorithm on multi-core simd processors. In: Proceedings of the 3rd Workshop on Irregular Applications: Architectures and Algorithms, p 9. ACM
12.
Zurück zum Zitat Heidelberger P, Norton A, Robinson JT (1990) Parallel quicksort using fetch-and-add. IEEE Trans Comput 39(1):847–857CrossRef Heidelberger P, Norton A, Robinson JT (1990) Parallel quicksort using fetch-and-add. IEEE Trans Comput 39(1):847–857CrossRef
13.
Zurück zum Zitat Tsigas P, Zhang Y (2003) A simple, fast parallel implementation of quicksort and its performance evaluation on sun enterprise 10000. In: 11th Euromicro Conference on Parallel Distributed and Network based Processing (PDP 2003). Genoa, pp 372–381 Tsigas P, Zhang Y (2003) A simple, fast parallel implementation of quicksort and its performance evaluation on sun enterprise 10000. In: 11th Euromicro Conference on Parallel Distributed and Network based Processing (PDP 2003). Genoa, pp 372–381
14.
Zurück zum Zitat Sub M, Leopold C (2004) A user’s experience with parallel sorting and openmp. In: Proc. of the 6th European Workshop on OpenMP (EWOMP 2004). Stockholm Sub M, Leopold C (2004) A user’s experience with parallel sorting and openmp. In: Proc. of the 6th European Workshop on OpenMP (EWOMP 2004). Stockholm
15.
Zurück zum Zitat Man D, Ito Y, Nakano K (2009) An efficient parallel sorting compatible with the standard qsort. In: International Conference on Parallel and Distributed Computing, Applications and Technologies. Hiroshima, pp 512–517 Man D, Ito Y, Nakano K (2009) An efficient parallel sorting compatible with the standard qsort. In: International Conference on Parallel and Distributed Computing, Applications and Technologies. Hiroshima, pp 512–517
16.
Zurück zum Zitat Man D, Ito Y, Nakano K (2011) An efficient parallel sorting compatible with the standard qsort. Int J Found Comput Sci 22(5):1057–1071CrossRefMATH Man D, Ito Y, Nakano K (2011) An efficient parallel sorting compatible with the standard qsort. Int J Found Comput Sci 22(5):1057–1071CrossRefMATH
17.
Zurück zum Zitat Kim KJ, Cho SJ, Jeon JW (2011) Parallel quick sort algorithms analysis using openmp 3.0 in embedded system. In: 11th International Conference on Control, Automation and Systems. KINTEX, Gyeonggi-do, pp 757–761 Kim KJ, Cho SJ, Jeon JW (2011) Parallel quick sort algorithms analysis using openmp 3.0 in embedded system. In: 11th International Conference on Control, Automation and Systems. KINTEX, Gyeonggi-do, pp 757–761
18.
Zurück zum Zitat Mahafzah BA (2013) Performance assessment of multithreaded quicksort algorithm on simultaneous multithreaded architecture. J Supercomput 66:339–363CrossRef Mahafzah BA (2013) Performance assessment of multithreaded quicksort algorithm on simultaneous multithreaded architecture. J Supercomput 66:339–363CrossRef
19.
Zurück zum Zitat Bingmann T (2015) Andreas Eberle, and Peter Sanders. Engineering parallel string sorting. Algorithmica, pp 1–52 Bingmann T (2015) Andreas Eberle, and Peter Sanders. Engineering parallel string sorting. Algorithmica, pp 1–52
20.
Zurück zum Zitat Rashid L, Hassanein WM, Hammad MA (2010) Analyzing and enhancing the parallel sort operation on multithreaded architectures. J Supercomput 53:293–312CrossRef Rashid L, Hassanein WM, Hammad MA (2010) Analyzing and enhancing the parallel sort operation on multithreaded architectures. J Supercomput 53:293–312CrossRef
21.
Zurück zum Zitat Saleem S, Lali MIU, Nawaz MS, Nauman AB (2014) Multi-core program optimization: parallel sorting algorithms in intel cilk plus. Int J Hybrid Inf Technol 7(2):151–164CrossRef Saleem S, Lali MIU, Nawaz MS, Nauman AB (2014) Multi-core program optimization: parallel sorting algorithms in intel cilk plus. Int J Hybrid Inf Technol 7(2):151–164CrossRef
23.
Zurück zum Zitat Gustafson JL (1990) Fixed time, tiered memory, and superlinear speedup. In: Proceedings of the Fifth Distributed Memory Computing Conference (DMCC5) Gustafson JL (1990) Fixed time, tiered memory, and superlinear speedup. In: Proceedings of the Fifth Distributed Memory Computing Conference (DMCC5)
24.
Zurück zum Zitat Helmbold DP, Mcdowell CE (1990) Modeling speedup (n) greater than n. IEEE Trans Parallel Distrib Syst 1(2):250–256MathSciNetCrossRef Helmbold DP, Mcdowell CE (1990) Modeling speedup (n) greater than n. IEEE Trans Parallel Distrib Syst 1(2):250–256MathSciNetCrossRef
25.
Zurück zum Zitat Weaver VM (2013) Linux perf event features and overhead. In: Second International Workshop on Performance Analysis of Workload Optimized Systems (FastPath 2013). Austin Weaver VM (2013) Linux perf event features and overhead. In: Second International Workshop on Performance Analysis of Workload Optimized Systems (FastPath 2013). Austin
26.
Zurück zum Zitat Zhang Y, Li ZP, Cao HF (2015) System-enforced deterministic streaming for efficient pipeline parallelism. J Comput Sci Technol 30(1):57–73MathSciNetCrossRef Zhang Y, Li ZP, Cao HF (2015) System-enforced deterministic streaming for efficient pipeline parallelism. J Comput Sci Technol 30(1):57–73MathSciNetCrossRef
27.
Zurück zum Zitat Grama A, Gupta A, Karypis G, Kumar V (2003) Introduction to parallel computing. 2nd ed. Pearson Education Limited Grama A, Gupta A, Karypis G, Kumar V (2003) Introduction to parallel computing. 2nd ed. Pearson Education Limited
28.
Zurück zum Zitat Akhter S, Roberts J (2006) Multi-core programming increasing performance through software multi-threading. Intel Press, Hillsboro Akhter S, Roberts J (2006) Multi-core programming increasing performance through software multi-threading. Intel Press, Hillsboro
29.
Zurück zum Zitat Barker KJ, Davis K, Hoisie A, Kerbyson DJ, Lang Mike, Pakin Scott, Sancho Jose Carlos (2008) A performance evaluation of the nehalem quad-core processor for scientific computing. Parallel Process Lett 18(4):453–469MathSciNetCrossRef Barker KJ, Davis K, Hoisie A, Kerbyson DJ, Lang Mike, Pakin Scott, Sancho Jose Carlos (2008) A performance evaluation of the nehalem quad-core processor for scientific computing. Parallel Process Lett 18(4):453–469MathSciNetCrossRef
30.
Zurück zum Zitat Wulf WA, McKee SA (1995) Hitting the memory wall: implications of the obvious. SIGARCH Comput Archit News 23(1):20–24CrossRef Wulf WA, McKee SA (1995) Hitting the memory wall: implications of the obvious. SIGARCH Comput Archit News 23(1):20–24CrossRef
31.
Zurück zum Zitat Eyerman S, Smith JE, Eeckhout L (2006) Characterizing the branch misprediction penalty. In: IEEE International Symposium on Performance Analysis of Systems Software (ISPASS 2006). Austin, pp 48–58 Eyerman S, Smith JE, Eeckhout L (2006) Characterizing the branch misprediction penalty. In: IEEE International Symposium on Performance Analysis of Systems Software (ISPASS 2006). Austin, pp 48–58
32.
Zurück zum Zitat Qureshi K, Majeed B, Kazmi JH, Madani SA (2012) Task partitioning, scheduling and load balancing strategy for mixed nature of tasks. J Supercomput 59(3):1348–1359CrossRef Qureshi K, Majeed B, Kazmi JH, Madani SA (2012) Task partitioning, scheduling and load balancing strategy for mixed nature of tasks. J Supercomput 59(3):1348–1359CrossRef
Metadaten
Titel
Parallel Partition and Merge QuickSort (PPMQSort) on Multicore CPUs
verfasst von
Ratthaslip Ranokphanuwat
Surin Kittitornkun
Publikationsdatum
01.03.2016
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 3/2016
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-016-1641-y

Weitere Artikel der Ausgabe 3/2016

The Journal of Supercomputing 3/2016 Zur Ausgabe

Premium Partner