nach oben

The Journal of Supercomputing

Erschienen in:

01.03.2016

Parallel Partition and Merge QuickSort (PPMQSort) on Multicore CPUs

verfasst von: Ratthaslip Ranokphanuwat, Surin Kittitornkun

Erschienen in: The Journal of Supercomputing | Ausgabe 3/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

An explosive amount of data has tremendous impacts on sorting, searching, indexing, and so on. Sorting is one of the basic Computer Science problems needed to be fast and efficient to serve Big Data. This paper presents an efficient and scalable algorithm called Parallel Partition and Merge QuickSort (PPMQSort) running on any shared memory/multicore/multi-socket systems. Together with OpenMP 3.0 library, the PPMQSort is developed to be compatible and benchmarked with the fastest C/C++ Stdlib qsort(). The PPMQSort recursively divides an unsorted input array into partially sorted partitions up to Cutoff length using nested multithreading. Finally, those independent partitions are qsort() (conquered) such that no synchronizations are needed. The resulting Speedup of 12.29\(\times \) on a dual-socket 8-core Xeon E5520 can be achieved for sorting random 200 M 32-bit integer data at 16 threads. With the same configuration, a 4-core AMD A6-3600 CPU (non-HyperThread) can reach up to 4.67\(\times \), a superlinear Speedup. It has been proved that the proposed PPMQSort can exploit all available cache levels and HyperThread CPU cores well thus utilizing up to 83 % and 96 % of CPU on E5520 and A6-3600, respectively.

Vorheriger Artikel The k-ary n-direct s-indirect family of topologies for large-scale interconnection networks

Nächster Artikel Design procedures and NML cost analysis of reversible barrel shifters optimizing garbage and ancilla lines

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Hoare CAR (1962) Quicksort ACM 4:321CrossRef

Sedgewick R (1978) Implementing quicksort program. Commun ACM 21(10):847–857CrossRefMATH

Mishra AD (2009) Selection of best sorting algorithm for a particular problem. Master’s thesis, Thapar University, Computer Science and Engineering Department

Bhandarkar SM, Arabnia HR (1995) The hough transform on a reconfigurable multi-ring network. J Parallel Distrib Comput 24(1):107–114CrossRef

Arabnia HR, Bhandarkar SM (1996) Parallel stereocorrelation on a reconfigurable multi-ring network. J Supercomput 10(3):243–269CrossRefMATH

Bhandarkar SM, Arabnia HR (1997) Parallel computer vision on a reconfigurable multiprocessor network. IEEE Trans Parallel Distrib Syst 8(3):292–309CrossRef

Koch D, Torresen J (2011) Fpgasort: a high performance sorting architecture exploiting run-time reconfiguration on fpgas for large problem sorting. In: Proceedings of the 19th ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA ’11. ACM, New York, pp 45–54

Mueller R, Teubner J, Alonso G (2012) Sorting networks on fpgas. VLDB J 21(1):1–23CrossRef

Casper J, Olukotun K (2014) Hardware acceleration of database operations. In: Proceedings of the 2014 ACM/SIGDA International Symposium on Field-programmable Gate Arrays, FPGA ’14. ACM, New York, pp 151–160

10.

Capannini G, Silvestri F, Baraglia R (2012) Sorting on gpus for large scale datasets: a thorough comparison. Inf Process Manag 48(5):903–917CrossRef

11.

Xiaochen T, Rocki K, Suda R (2013) Register level sort algorithm on multi-core simd processors. In: Proceedings of the 3rd Workshop on Irregular Applications: Architectures and Algorithms, p 9. ACM

12.

Heidelberger P, Norton A, Robinson JT (1990) Parallel quicksort using fetch-and-add. IEEE Trans Comput 39(1):847–857CrossRef

13.

Tsigas P, Zhang Y (2003) A simple, fast parallel implementation of quicksort and its performance evaluation on sun enterprise 10000. In: 11th Euromicro Conference on Parallel Distributed and Network based Processing (PDP 2003). Genoa, pp 372–381

14.

Sub M, Leopold C (2004) A user’s experience with parallel sorting and openmp. In: Proc. of the 6th European Workshop on OpenMP (EWOMP 2004). Stockholm

15.

Man D, Ito Y, Nakano K (2009) An efficient parallel sorting compatible with the standard qsort. In: International Conference on Parallel and Distributed Computing, Applications and Technologies. Hiroshima, pp 512–517

16.

Man D, Ito Y, Nakano K (2011) An efficient parallel sorting compatible with the standard qsort. Int J Found Comput Sci 22(5):1057–1071CrossRefMATH

17.

Kim KJ, Cho SJ, Jeon JW (2011) Parallel quick sort algorithms analysis using openmp 3.0 in embedded system. In: 11th International Conference on Control, Automation and Systems. KINTEX, Gyeonggi-do, pp 757–761

18.

Mahafzah BA (2013) Performance assessment of multithreaded quicksort algorithm on simultaneous multithreaded architecture. J Supercomput 66:339–363CrossRef

19.

Bingmann T (2015) Andreas Eberle, and Peter Sanders. Engineering parallel string sorting. Algorithmica, pp 1–52

20.

Rashid L, Hassanein WM, Hammad MA (2010) Analyzing and enhancing the parallel sort operation on multithreaded architectures. J Supercomput 53:293–312CrossRef

21.

Saleem S, Lali MIU, Nawaz MS, Nauman AB (2014) Multi-core program optimization: parallel sorting algorithms in intel cilk plus. Int J Hybrid Inf Technol 7(2):151–164CrossRef

22.

Architecture Review Board (2014) The openmp api specification for parallel programming. http://www.openmp.org

23.

Gustafson JL (1990) Fixed time, tiered memory, and superlinear speedup. In: Proceedings of the Fifth Distributed Memory Computing Conference (DMCC5)

24.

Helmbold DP, Mcdowell CE (1990) Modeling speedup (n) greater than n. IEEE Trans Parallel Distrib Syst 1(2):250–256MathSciNetCrossRef

25.

Weaver VM (2013) Linux perf event features and overhead. In: Second International Workshop on Performance Analysis of Workload Optimized Systems (FastPath 2013). Austin

26.

Zhang Y, Li ZP, Cao HF (2015) System-enforced deterministic streaming for efficient pipeline parallelism. J Comput Sci Technol 30(1):57–73MathSciNetCrossRef

27.

Grama A, Gupta A, Karypis G, Kumar V (2003) Introduction to parallel computing. 2nd ed. Pearson Education Limited

28.

Akhter S, Roberts J (2006) Multi-core programming increasing performance through software multi-threading. Intel Press, Hillsboro

29.

Barker KJ, Davis K, Hoisie A, Kerbyson DJ, Lang Mike, Pakin Scott, Sancho Jose Carlos (2008) A performance evaluation of the nehalem quad-core processor for scientific computing. Parallel Process Lett 18(4):453–469MathSciNetCrossRef

30.

Wulf WA, McKee SA (1995) Hitting the memory wall: implications of the obvious. SIGARCH Comput Archit News 23(1):20–24CrossRef

31.

Eyerman S, Smith JE, Eeckhout L (2006) Characterizing the branch misprediction penalty. In: IEEE International Symposium on Performance Analysis of Systems Software (ISPASS 2006). Austin, pp 48–58

32.

Qureshi K, Majeed B, Kazmi JH, Madani SA (2012) Task partitioning, scheduling and load balancing strategy for mixed nature of tasks. J Supercomput 59(3):1348–1359CrossRef

Titel: Parallel Partition and Merge QuickSort (PPMQSort) on Multicore CPUs
verfasst von: Ratthaslip Ranokphanuwat
Surin Kittitornkun
Publikationsdatum: 01.03.2016
Verlag: Springer US
Erschienen in: The Journal of Supercomputing / Ausgabe 3/2016
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-016-1641-y

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Springer Professional "Wirtschaft+Technik"

Weitere Artikel der Ausgabe 3/2016

HHS: an efficient network topology for large-scale data centers

A study on effective flooding over MANET based on exchange of neighbor information

Erratum to: Design procedures and NML cost analysis of reversible barrel shifters optimizing garbage and ancilla lines

A proposal on location data supplementing information transfer method over MANET

Resource provisioning and scheduling in clouds: QoS perspective

Improved cuckoo optimization algorithm for solving systems of nonlinear equations

Premium Partner