Skip to main content
Erschienen in: Knowledge and Information Systems 3/2017

05.10.2016 | Regular Paper

Parallel construction of wavelet trees on multicore architectures

verfasst von: José Fuentes-Sepúlveda, Erick Elejalde, Leo Ferres, Diego Seco

Erschienen in: Knowledge and Information Systems | Ausgabe 3/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The wavelet tree has become a very useful data structure to efficiently represent and query large volumes of data in many different domains, from bioinformatics to geographic information systems. One problem with wavelet trees is their construction time. In this paper, we introduce two algorithms that reduce the time complexity of a wavelet tree’s construction by taking advantage of nowadays ubiquitous multicore machines. Our first algorithm constructs all the levels of the wavelet in parallel with O(n) time and \(O(n\lg \sigma + \sigma \lg n)\) bits of working space, where n is the size of the input sequence and \(\sigma \) is the size of the alphabet. Our second algorithm constructs the wavelet tree in a domain decomposition fashion, using our first algorithm in each segment, reaching \(O(\lg n)\) time and \(O(n\lg \sigma + p\sigma \lg n/\lg \sigma )\) bits of extra space, where p is the number of available cores. Both algorithms are practical and report good speedup for large real datasets.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Fußnoten
1
We use \(\lg x = \log _2 x\).
 
2
Notice that the RAM model is a subset of the DYM model where the outdegree of every vertex \(v \in V\) is \({\le }1\).
 
3
We also tested a new version of Libcds called Libcds2; however, the former had better running times for the construction of wtrees.
 
8
In order to be less sensitive to outliers, we use the median time instead of other statistics. In our experiments, the pwt algorithm showed a larger deviation with respect to the number of threads than the other algorithms. However, the differences were not statistically significant.
 
9
A complete report of running times and everything needed to replicate these results is available at www.​inf.​udec.​cl/​~josefuentes/​wavelettree.
 
10
The Unicode Consortium: http://​www.​unicode.​org/​.
 
11
The construction times of shun with the src.2GB dataset exceeds 1 h. To make the algorithms in the figures comparable, we report the running times for the dataset src.1GB.
 
12
The computer tested is a dual-processor \(\hbox {Intel}^{\circledR }\) \(\hbox {Xeon}^{\circledR }\) CPU (E5645) with six cores per processor, for a total of 12 physical cores running at 2.50GHz. Hyperthreading was disabled. The computer runs Linux 3.5.0-17-generic, in 64-bit mode. This machine has per-core L1 and L2 caches of sizes 32KB and 256KB, respectively, and 1 per-processor shared L3 cache of 12MB, with a 5,958MB (\(\sim \hbox {6GB}\)) DDR3 RAM.
 
13
To ensure the constant access cost, we use the numactl command with “interleave \(=\) all” option. The command allocates the memory using round robin on the NUMA nodes.
 
Literatur
5.
Zurück zum Zitat Burrows M, Wheeler DJ (1994) A block-sorting lossless data compression algorithm. Tech. rep., Digital Equipment Corporation Burrows M, Wheeler DJ (1994) A block-sorting lossless data compression algorithm. Tech. rep., Digital Equipment Corporation
9.
Zurück zum Zitat Claude F, Nicholson PK, Seco D (2011) Space efficient wavelet tree construction. In: SPIRE, vol 7024. Springer, Berlin, pp 185–196 Claude F, Nicholson PK, Seco D (2011) Space efficient wavelet tree construction. In: SPIRE, vol 7024. Springer, Berlin, pp 185–196
10.
Zurück zum Zitat Cormen TH, Leiserson CE, Rivest RL, Stein C (2009) Introduction to algorithms, 3rd edn., chap. Multithreaded algorithms. The MIT Press, Cambridge, pp 772–812 Cormen TH, Leiserson CE, Rivest RL, Stein C (2009) Introduction to algorithms, 3rd edn., chap. Multithreaded algorithms. The MIT Press, Cambridge, pp 772–812
13.
Zurück zum Zitat Ferragina P, Manzini G, Mäkinen V, Navarro G (2004) String processing and information retrieval: 11th international conference, SPIRE 2004, Padova, Italy, 5–8 October 2004. Proceedings, chap. An Alphabet-Friendly FM-Index. Springer, Berlin, pp 150–160. doi:10.1007/978-3-540-30213-1_23 Ferragina P, Manzini G, Mäkinen V, Navarro G (2004) String processing and information retrieval: 11th international conference, SPIRE 2004, Padova, Italy, 5–8 October 2004. Proceedings, chap. An Alphabet-Friendly FM-Index. Springer, Berlin, pp 150–160. doi:10.​1007/​978-3-540-30213-1_​23
15.
Zurück zum Zitat Fuentes-Sepúlveda J, Elejalde E, Ferres L, Seco D (2014) Efficient wavelet tree construction and querying for multicore architectures. In: Gudmundsson J, Katajainen J (eds) Experimental algorithms, Lecture Notes in Computer Science, vol 8504. Springer, Berlin, pp 150–161. doi:10.1007/978-3-319-07959-2_13 Fuentes-Sepúlveda J, Elejalde E, Ferres L, Seco D (2014) Efficient wavelet tree construction and querying for multicore architectures. In: Gudmundsson J, Katajainen J (eds) Experimental algorithms, Lecture Notes in Computer Science, vol 8504. Springer, Berlin, pp 150–161. doi:10.​1007/​978-3-319-07959-2_​13
17.
Zurück zum Zitat González R, Grabowski S, Mäkinen V, Navarro G (2005) Practical implementation of rank and select queries. In: WEA. CTI Press, Greece, pp 27–38. Poster González R, Grabowski S, Mäkinen V, Navarro G (2005) Practical implementation of rank and select queries. In: WEA. CTI Press, Greece, pp 27–38. Poster
18.
Zurück zum Zitat Grossi R, Gupta A, Vitter JS (2003) High-order entropy-compressed text indexes. In: SODA. Soc. Ind. Appl. Math., Philadelphia, pp 841–850 Grossi R, Gupta A, Vitter JS (2003) High-order entropy-compressed text indexes. In: SODA. Soc. Ind. Appl. Math., Philadelphia, pp 841–850
21.
Zurück zum Zitat Ladra S, Pedreira O, Duato J, Brisaboa NR (2012) Exploiting SIMD instructions in current processors to improve classical string algorithms. In: ADBIS. Springer, Berlin, pp 254–267. doi:10.1007/978-3-642-33074-2_19 Ladra S, Pedreira O, Duato J, Brisaboa NR (2012) Exploiting SIMD instructions in current processors to improve classical string algorithms. In: ADBIS. Springer, Berlin, pp 254–267. doi:10.​1007/​978-3-642-33074-2_​19
22.
Zurück zum Zitat Makris C (2012) Wavelet trees: a survey. Comput Sci Inf Syst 9(2):585–625CrossRef Makris C (2012) Wavelet trees: a survey. Comput Sci Inf Syst 9(2):585–625CrossRef
29.
Zurück zum Zitat Shun J (2015) Parallel wavelet tree construction. In: Proceedings of the IEEE data compression conference, Utah, USA, pp 63–72. doi:10.1109/DCC.2015.7 Shun J (2015) Parallel wavelet tree construction. In: Proceedings of the IEEE data compression conference, Utah, USA, pp 63–72. doi:10.​1109/​DCC.​2015.​7
31.
Zurück zum Zitat Tischler G (2011) On wavelet tree construction. In: CPM. Springer, Berlin, pp 208–218 Tischler G (2011) On wavelet tree construction. In: CPM. Springer, Berlin, pp 208–218
Metadaten
Titel
Parallel construction of wavelet trees on multicore architectures
verfasst von
José Fuentes-Sepúlveda
Erick Elejalde
Leo Ferres
Diego Seco
Publikationsdatum
05.10.2016
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 3/2017
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-016-1000-6

Weitere Artikel der Ausgabe 3/2017

Knowledge and Information Systems 3/2017 Zur Ausgabe