Skip to main content
Erschienen in: The Journal of Supercomputing 6/2021

05.11.2020

An efficient parallel strategy for high-cost prefix operation

verfasst von: Hazem M. Bahig, Khaled A. Fathy

Erschienen in: The Journal of Supercomputing | Ausgabe 6/2021

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The prefix computation strategy is a fundamental technique used to solve many problems in computer science such as sorting, clustering, and computer vision. A large number of parallel algorithms have been introduced that are based on a variety of high-performance systems. However, these algorithms do not consider the cost of the prefix computation operation. In this paper, we design a novel strategy for prefix computation to reduce the running time for high-cost operations such as multiplication. The proposed algorithm is based on (1) reducing the size of the partition and (2) keeping a fixed-size partition during all the steps of the computation. Experiments on a multicore system for different array sizes and number sizes demonstrate that the proposed parallel algorithm reduces the running time of the best-known optimal parallel algorithm in the average range of 62.7–79.6%. Moreover, the proposed algorithm has high speedup and is more scalable than those in previous works.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Akl S (1997) Parallel computation: models and methods. Prentice Hall, Upper Saddle River, New Jersey Akl S (1997) Parallel computation: models and methods. Prentice Hall, Upper Saddle River, New Jersey
2.
Zurück zum Zitat Billeter M, Olsson O, Assarsson U (2009) Efficient stream compaction on wide SIMD many-core architectures. In: Proceedings of the Conference on High Performance Graphics August 01–03, pp 159–166 Billeter M, Olsson O, Assarsson U (2009) Efficient stream compaction on wide SIMD many-core architectures. In: Proceedings of the Conference on High Performance Graphics August 01–03, pp 159–166
5.
Zurück zum Zitat Breitbart J (2010) Static GPU threads and an improved scan algorithm. LNCS 6586:373–380 Breitbart J (2010) Static GPU threads and an improved scan algorithm. LNCS 6586:373–380
6.
Zurück zum Zitat Capannini G (2011) Designing efficient parallel prefix sum algorithms for GPUs. In: 11th International Conference on Computer and Information Technology, pp 189–196 Capannini G (2011) Designing efficient parallel prefix sum algorithms for GPUs. In: 11th International Conference on Computer and Information Technology, pp 189–196
7.
Zurück zum Zitat Cole R, Vishkin U (1989) Faster optimal parallel prefix sum and list ranking. J Inf Control 4:334–352MathSciNetMATH Cole R, Vishkin U (1989) Faster optimal parallel prefix sum and list ranking. J Inf Control 4:334–352MathSciNetMATH
8.
Zurück zum Zitat Datta A (2004) Multiple addition and prefix sum on a linear array with a reconfigurable pipelined bus system. J Supercomput 29(3):303–317MATHCrossRef Datta A (2004) Multiple addition and prefix sum on a linear array with a reconfigurable pipelined bus system. J Supercomput 29(3):303–317MATHCrossRef
9.
Zurück zum Zitat Dehne F, Zaboli H (2017) Parallel Sorting for GPUs. Book Series titled Emergence, Complexity and Computation: Emergent Computation 24:293–302MathSciNetMATHCrossRef Dehne F, Zaboli H (2017) Parallel Sorting for GPUs. Book Series titled Emergence, Complexity and Computation: Emergent Computation 24:293–302MathSciNetMATHCrossRef
10.
Zurück zum Zitat Egecioglu O, Srinivasan A (1993) Optimal parallel prefix on mesh architecture. Parallel Algorithms Appl 1:191–209MATHCrossRef Egecioglu O, Srinivasan A (1993) Optimal parallel prefix on mesh architecture. Parallel Algorithms Appl 1:191–209MATHCrossRef
11.
12.
Zurück zum Zitat Ha S, Han T (2013) A scalable work-efficient and depth-optimal parallel scan for the GPGPU environment. IEEE Trans Parallel Distrib Syst 24(12):2324–2333CrossRef Ha S, Han T (2013) A scalable work-efficient and depth-optimal parallel scan for the GPGPU environment. IEEE Trans Parallel Distrib Syst 24(12):2324–2333CrossRef
13.
Zurück zum Zitat Han T, Carlson DD (1987) Fast area-efficient VLSI adders. Proceedings of the Ninth Annual Symposium on Computer Arithmetic, pp 49–56 Han T, Carlson DD (1987) Fast area-efficient VLSI adders. Proceedings of the Ninth Annual Symposium on Computer Arithmetic, pp 49–56
14.
Zurück zum Zitat Kohlhoff K, Pande V, Altman R (2013) K-means for parallel architectures using all-prefix-sum sorting and updating steps. IEEE Trans Parallel Distrib Syst 24(8):1602–1612CrossRef Kohlhoff K, Pande V, Altman R (2013) K-means for parallel architectures using all-prefix-sum sorting and updating steps. IEEE Trans Parallel Distrib Syst 24(8):1602–1612CrossRef
15.
Zurück zum Zitat Jana PK, Naidu BD, Kumar S, Arora M, Sinha BP (2002) Parallel prefix computation on extended multi-mesh network. Inf Process Lett 84(6):295–303MathSciNetMATHCrossRef Jana PK, Naidu BD, Kumar S, Arora M, Sinha BP (2002) Parallel prefix computation on extended multi-mesh network. Inf Process Lett 84(6):295–303MathSciNetMATHCrossRef
16.
Zurück zum Zitat Jana PK, Sinha BP (2006) An improved parallel prefix algorithm on OTIS-mesh. Parallel Process Lett (World Sci) 16(4):429–440MathSciNetCrossRef Jana PK, Sinha BP (2006) An improved parallel prefix algorithm on OTIS-mesh. Parallel Process Lett (World Sci) 16(4):429–440MathSciNetCrossRef
17.
Zurück zum Zitat Li Y, Peng S, Chu W (2009) A new presentation of Metacubes for algorithmic design and case studies: Parallel prefix computation and parallel sorting. J Chin Inst Eng 32:939–949CrossRef Li Y, Peng S, Chu W (2009) A new presentation of Metacubes for algorithmic design and case studies: Parallel prefix computation and parallel sorting. J Chin Inst Eng 32:939–949CrossRef
18.
Zurück zum Zitat Li Y, Peng S, Chu W (2008) Prefix computation and sorting in dual-cube. In: Proceedings of the 37th International Conference on Parallel Processing, pp 389–396 Li Y, Peng S, Chu W (2008) Prefix computation and sorting in dual-cube. In: Proceedings of the 37th International Conference on Parallel Processing, pp 389–396
19.
Zurück zum Zitat Lin YC, Lin CM (1996) Efficient parallel prefix algorithms on fully connected message passing computers. In: Proceedings of 3rd International Conference on High Performance Computing (HiPC), Trivandrum, India, pp 19–22 Lin YC, Lin CM (1996) Efficient parallel prefix algorithms on fully connected message passing computers. In: Proceedings of 3rd International Conference on High Performance Computing (HiPC), Trivandrum, India, pp 19–22
20.
Zurück zum Zitat Lucas KT (2009) Parallel algorithm for prefix computation on OTIS \(k\)-Ary 3-cube parallel computers. Int J Recent Trends Eng 1(1):560–562 Lucas KT (2009) Parallel algorithm for prefix computation on OTIS \(k\)-Ary 3-cube parallel computers. Int J Recent Trends Eng 1(1):560–562
21.
Zurück zum Zitat Mallick DK, Jana PK (2008) Parallel prefix on mesh of trees and OTIS mesh of trees. In: Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’08), pp 359–364 Mallick DK, Jana PK (2008) Parallel prefix on mesh of trees and OTIS mesh of trees. In: Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’08), pp 359–364
22.
Zurück zum Zitat Maleki S, Burtscher M (2018) Automatic hierarchical parallelization of linear recurrences. In: The 23rd ACM International Conference on Architectural Support for Programming Languages and Operating Systems, March 24–28, USA, pp 128–138 Maleki S, Burtscher M (2018) Automatic hierarchical parallelization of linear recurrences. In: The 23rd ACM International Conference on Architectural Support for Programming Languages and Operating Systems, March 24–28, USA, pp 128–138
23.
Zurück zum Zitat Nakano K (2013) Optimal parallel algorithms for computing the sum, the prefix-sums, and the summed area table on the memory machine models. IEICE Trans. Inf. Syst. E96D(12):2626–2634CrossRef Nakano K (2013) Optimal parallel algorithms for computing the sum, the prefix-sums, and the summed area table on the memory machine models. IEICE Trans. Inf. Syst. E96D(12):2626–2634CrossRef
24.
Zurück zum Zitat Nakano k, Ito Y (2015) Optimality of fundamental parallel algorithms on the hierarchical memory machine, with GPU implementation. In: 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp 626–634 Nakano k, Ito Y (2015) Optimality of fundamental parallel algorithms on the hierarchical memory machine, with GPU implementation. In: 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp 626–634
25.
Zurück zum Zitat Park JH, Dai HK (2008) Reconfigurable hardware solution to parallel prefix computation. J Supercomput 43(1):43–58CrossRef Park JH, Dai HK (2008) Reconfigurable hardware solution to parallel prefix computation. J Supercomput 43(1):43–58CrossRef
26.
Zurück zum Zitat Puchala D, Stokfiszewski KK (2018) Numerical accuracy of integral images computation algorithms. IET Image Process 12(1):31–41CrossRef Puchala D, Stokfiszewski KK (2018) Numerical accuracy of integral images computation algorithms. IET Image Process 12(1):31–41CrossRef
27.
Zurück zum Zitat Ranka S, Sahni S (1990) Hypercube algorithms with applications to image processing and pattern recognition. Springer, New York, p 237MATH Ranka S, Sahni S (1990) Hypercube algorithms with applications to image processing and pattern recognition. Springer, New York, p 237MATH
28.
Zurück zum Zitat Santos E (2002) Optimal and efficient algorithms for summing and prefix summing on parallel machines. J Parallel Distrib Comput 62:517–543MATHCrossRef Santos E (2002) Optimal and efficient algorithms for summing and prefix summing on parallel machines. J Parallel Distrib Comput 62:517–543MATHCrossRef
29.
Zurück zum Zitat Sinha BP, Bandyopadhyay S (2004) An optical interconnection system for parallel computing. In: Proceedings of International Conference on Computers and Devices for Communications Sinha BP, Bandyopadhyay S (2004) An optical interconnection system for parallel computing. In: Proceedings of International Conference on Computers and Devices for Communications
30.
Zurück zum Zitat Stine J, Babb C, Dave V (2005) Constant addition utilizing flagged prefix structures. In: IEEE International Symposium on Circuits and Systems, pp 668–671 Stine J, Babb C, Dave V (2005) Constant addition utilizing flagged prefix structures. In: IEEE International Symposium on Circuits and Systems, pp 668–671
31.
Zurück zum Zitat Sudhanshu K (2013) An improved parallel prefix computation on 2D-mesh network. In: International conference on computational intelligence: modeling techniques and applications (CIMTA), pp 919–926 Sudhanshu K (2013) An improved parallel prefix computation on 2D-mesh network. In: International conference on computational intelligence: modeling techniques and applications (CIMTA), pp 919–926
32.
Zurück zum Zitat Tokura H, Fujita T, Nakano K, Ito Y, Bordim J (2018) Almost optimal column-wise prefix-sum computation on the GPU. J Supercomput 74(4):1510–1521CrossRef Tokura H, Fujita T, Nakano K, Ito Y, Bordim J (2018) Almost optimal column-wise prefix-sum computation on the GPU. J Supercomput 74(4):1510–1521CrossRef
33.
Zurück zum Zitat Wang CF, Sahni S (1998) Basic operations on the OTIS-Mesh optoelectronic computer. IEEE Trans Parallel Distrib Syst 9(12):1226–1998CrossRef Wang CF, Sahni S (1998) Basic operations on the OTIS-Mesh optoelectronic computer. IEEE Trans Parallel Distrib Syst 9(12):1226–1998CrossRef
34.
Zurück zum Zitat Yang Y, Wu Y, Pan J (2017) Parallel dynamics computation using prefix sum operations. IEEE Robot Automat Lett 2(3):1296–1303CrossRef Yang Y, Wu Y, Pan J (2017) Parallel dynamics computation using prefix sum operations. IEEE Robot Automat Lett 2(3):1296–1303CrossRef
35.
Zurück zum Zitat Zhang N (2012) A novel parallel scan for multicore processors and its application in sparse matrix-vector multiplication. IEEE Trans Parallel Distrib Syst 23(3):397–404CrossRef Zhang N (2012) A novel parallel scan for multicore processors and its application in sparse matrix-vector multiplication. IEEE Trans Parallel Distrib Syst 23(3):397–404CrossRef
Metadaten
Titel
An efficient parallel strategy for high-cost prefix operation
verfasst von
Hazem M. Bahig
Khaled A. Fathy
Publikationsdatum
05.11.2020
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 6/2021
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03473-x

Weitere Artikel der Ausgabe 6/2021

The Journal of Supercomputing 6/2021 Zur Ausgabe