nach oben

The Journal of Supercomputing

Erschienen in:

05.11.2020

An efficient parallel strategy for high-cost prefix operation

verfasst von: Hazem M. Bahig, Khaled A. Fathy

Erschienen in: The Journal of Supercomputing | Ausgabe 6/2021

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

The prefix computation strategy is a fundamental technique used to solve many problems in computer science such as sorting, clustering, and computer vision. A large number of parallel algorithms have been introduced that are based on a variety of high-performance systems. However, these algorithms do not consider the cost of the prefix computation operation. In this paper, we design a novel strategy for prefix computation to reduce the running time for high-cost operations such as multiplication. The proposed algorithm is based on (1) reducing the size of the partition and (2) keeping a fixed-size partition during all the steps of the computation. Experiments on a multicore system for different array sizes and number sizes demonstrate that the proposed parallel algorithm reduces the running time of the best-known optimal parallel algorithm in the average range of 62.7–79.6%. Moreover, the proposed algorithm has high speedup and is more scalable than those in previous works.

Nächster Artikel An architecture for scheduling with the capability of minimum share to heterogeneous Hadoop systems

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Akl S (1997) Parallel computation: models and methods. Prentice Hall, Upper Saddle River, New Jersey

Billeter M, Olsson O, Assarsson U (2009) Efficient stream compaction on wide SIMD many-core architectures. In: Proceedings of the Conference on High Performance Graphics August 01–03, pp 159–166

Bahig H, Bahig H, Fathy K Fast and scalable algorithm for product large data on multicore system. In: Concurrency and computation: practice and expreience. https://doi.org/10.1002/cpe.5259

Blelloch GE (1990) Prefix sums and their applications. Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University. http://www.cs.cmu.edu/~guyb/papers/Ble93.pdf

Breitbart J (2010) Static GPU threads and an improved scan algorithm. LNCS 6586:373–380

Capannini G (2011) Designing efficient parallel prefix sum algorithms for GPUs. In: 11th International Conference on Computer and Information Technology, pp 189–196

Cole R, Vishkin U (1989) Faster optimal parallel prefix sum and list ranking. J Inf Control 4:334–352MathSciNetMATH

Datta A (2004) Multiple addition and prefix sum on a linear array with a reconfigurable pipelined bus system. J Supercomput 29(3):303–317MATHCrossRef

Dehne F, Zaboli H (2017) Parallel Sorting for GPUs. Book Series titled Emergence, Complexity and Computation: Emergent Computation 24:293–302MathSciNetMATHCrossRef

10.

Egecioglu O, Srinivasan A (1993) Optimal parallel prefix on mesh architecture. Parallel Algorithms Appl 1:191–209MATHCrossRef

11.

Fuentes-Seplveda J, Ferres L, He M, Zeh N (2017) Parallel construction of succinct trees. Theor Comput Sci 700:1–22MathSciNetMATHCrossRef

12.

Ha S, Han T (2013) A scalable work-efficient and depth-optimal parallel scan for the GPGPU environment. IEEE Trans Parallel Distrib Syst 24(12):2324–2333CrossRef

13.

Han T, Carlson DD (1987) Fast area-efficient VLSI adders. Proceedings of the Ninth Annual Symposium on Computer Arithmetic, pp 49–56

14.

Kohlhoff K, Pande V, Altman R (2013) K-means for parallel architectures using all-prefix-sum sorting and updating steps. IEEE Trans Parallel Distrib Syst 24(8):1602–1612CrossRef

15.

Jana PK, Naidu BD, Kumar S, Arora M, Sinha BP (2002) Parallel prefix computation on extended multi-mesh network. Inf Process Lett 84(6):295–303MathSciNetMATHCrossRef

16.

Jana PK, Sinha BP (2006) An improved parallel prefix algorithm on OTIS-mesh. Parallel Process Lett (World Sci) 16(4):429–440MathSciNetCrossRef

17.

Li Y, Peng S, Chu W (2009) A new presentation of Metacubes for algorithmic design and case studies: Parallel prefix computation and parallel sorting. J Chin Inst Eng 32:939–949CrossRef

18.

Li Y, Peng S, Chu W (2008) Prefix computation and sorting in dual-cube. In: Proceedings of the 37th International Conference on Parallel Processing, pp 389–396

19.

Lin YC, Lin CM (1996) Efficient parallel prefix algorithms on fully connected message passing computers. In: Proceedings of 3rd International Conference on High Performance Computing (HiPC), Trivandrum, India, pp 19–22

20.

Lucas KT (2009) Parallel algorithm for prefix computation on OTIS \(k\)-Ary 3-cube parallel computers. Int J Recent Trends Eng 1(1):560–562

21.

Mallick DK, Jana PK (2008) Parallel prefix on mesh of trees and OTIS mesh of trees. In: Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’08), pp 359–364

22.

Maleki S, Burtscher M (2018) Automatic hierarchical parallelization of linear recurrences. In: The 23rd ACM International Conference on Architectural Support for Programming Languages and Operating Systems, March 24–28, USA, pp 128–138

23.

Nakano K (2013) Optimal parallel algorithms for computing the sum, the prefix-sums, and the summed area table on the memory machine models. IEICE Trans. Inf. Syst. E96D(12):2626–2634CrossRef

24.

Nakano k, Ito Y (2015) Optimality of fundamental parallel algorithms on the hierarchical memory machine, with GPU implementation. In: 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp 626–634

25.

Park JH, Dai HK (2008) Reconfigurable hardware solution to parallel prefix computation. J Supercomput 43(1):43–58CrossRef

26.

Puchala D, Stokfiszewski KK (2018) Numerical accuracy of integral images computation algorithms. IET Image Process 12(1):31–41CrossRef

27.

Ranka S, Sahni S (1990) Hypercube algorithms with applications to image processing and pattern recognition. Springer, New York, p 237MATH

28.

Santos E (2002) Optimal and efficient algorithms for summing and prefix summing on parallel machines. J Parallel Distrib Comput 62:517–543MATHCrossRef

29.

Sinha BP, Bandyopadhyay S (2004) An optical interconnection system for parallel computing. In: Proceedings of International Conference on Computers and Devices for Communications

30.

Stine J, Babb C, Dave V (2005) Constant addition utilizing flagged prefix structures. In: IEEE International Symposium on Circuits and Systems, pp 668–671

31.

Sudhanshu K (2013) An improved parallel prefix computation on 2D-mesh network. In: International conference on computational intelligence: modeling techniques and applications (CIMTA), pp 919–926

32.

Tokura H, Fujita T, Nakano K, Ito Y, Bordim J (2018) Almost optimal column-wise prefix-sum computation on the GPU. J Supercomput 74(4):1510–1521CrossRef

33.

Wang CF, Sahni S (1998) Basic operations on the OTIS-Mesh optoelectronic computer. IEEE Trans Parallel Distrib Syst 9(12):1226–1998CrossRef

34.

Yang Y, Wu Y, Pan J (2017) Parallel dynamics computation using prefix sum operations. IEEE Robot Automat Lett 2(3):1296–1303CrossRef

35.

Zhang N (2012) A novel parallel scan for multicore processors and its application in sparse matrix-vector multiplication. IEEE Trans Parallel Distrib Syst 23(3):397–404CrossRef

Titel: An efficient parallel strategy for high-cost prefix operation
verfasst von: Hazem M. Bahig
Khaled A. Fathy
Publikationsdatum: 05.11.2020
Verlag: Springer US
Erschienen in: The Journal of Supercomputing / Ausgabe 6/2021
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-020-03473-x

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Springer Professional "Wirtschaft+Technik"

Weitere Artikel der Ausgabe 6/2021

K-DBSCAN: An improved DBSCAN algorithm for big data

Performance comparison of multi-container deployment schemes for HPC workloads: an empirical study

An evolutionary multi-hidden Markov model for intelligent threat sensing in industrial internet of things

Distributed composition of complex event services in IoT network

Improving Clairvoyant: reduction algorithm resilient to imbalanced process arrival patterns

Parallelization of the self-organized maps algorithm for federated learning on distributed sources