skip to main content
research-article

A simple parallel cartesian tree algorithm and its application to parallel suffix tree construction

Published:03 October 2014Publication History
Skip Abstract Section

Abstract

We present a simple linear work and space, and polylogarithmic time parallel algorithm for generating multiway Cartesian trees. We show that bottom-up traversals of the multiway Cartesian tree on the interleaved suffix array and longest common prefix array of a string can be used to answer certain string queries. By adding downward pointers in the tree (e.g. using a hash table), we can also generate suffix trees from suffix arrays on arbitrary alphabets in the same bounds. In conjunction with parallel suffix array algorithms, such as the skew algorithm, this gives a rather simple linear work parallel, O(n ε) time (0<ε<1), algorithm for generating suffix trees over an integer alphabet Σ{1, ..., n}, where n is the length of the input string. It also gives a linear work parallel algorithm requiring O(log2 n) time with high probability for constant-sized alphabets. More generally, given a sorted sequence of strings and the longest common prefix lengths between adjacent elements, the algorithm will generate a patricia tree (compacted trie) over the strings. Of independent interest, we describe a work-efficient parallel algorithm for solving the all nearest smaller values problem using Cartesian trees, which is much simpler than the work-efficient parallel algorithm described in previous work.

We also present experimental results comparing the performance of the algorithm to existing sequential implementations and a second parallel algorithm that we implement. We present comparisons for the Cartesian tree algorithm on its own and for constructing a suffix tree. The results show that on a variety of strings our algorithm is competitive with the sequential version on a single processor and achieves good speedup on multiple processors. We present experiments for three applications that require only the Cartesian tree, and also for searching using the suffix tree.

References

  1. M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch. 2004. Replacing Suffix Trees with enhanced suffix arrays. J. Discrete Alg. 2, 1, 53--86. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. A. Apostolico, C. Iliopoulos, G. M. Landau, B. Schieber, and U. Vishkin. 1988. Parallel construction of a suffix tree with applications. Algorithmica 3, 1--4, 347--365.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. M. Barsky, U. Stege, and A. Thomo. 2010. A survey of practical algorithms for suffix tree construction in external memory. Softw. Pract. Exper. 40, 11, 965--988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. O. Berkman, B. Schieber, and U. Vishkin. 1993. Optimal doubly logarithmic parallel algorithms based on finding all nearest smaller values. J. Algor. 14, 3, 344--370. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. G. E. Blelloch and J. Shun. 2011. A simple parallel cartesian tree algorithm and its application to suffix tree construction. In Proceedings of the Workshop on Algorithm Engineering and Experiments. 48--58.Google ScholarGoogle Scholar
  6. M. Burrows and D. J. Wheeler. 1994. A Block-Sorting Lossless Data Compression Algorithm. Tech. rep. HP Labs.Google ScholarGoogle Scholar
  7. M. Comin and M. Farreras. 2013. Efficient parallel construction of suffix trees for genomes larger than main memory. In Proceedings of the 20th European MPI Users’ Group Meeting. 211--216. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. Delcher, A. Phillippy, J. Carlton, and S. Salzberg. 2002. Fast Algorithms for large-scale genome alignment and comparision. Nucl. Acids Res. 30, 11, 2478--2483.Google ScholarGoogle ScholarCross RefCross Ref
  9. M. Farach and S. Muthukrishnan. 1996. Optimal logarithmic time randomized suffix tree construction. In Proceedings of the International Colloquium on Automata Languages and Programming. 550--561. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. J. Fischer and V. Heun. 2006. Theoretical and practical improvements on the RMQ-problem, with applications to LCA and LCE. In Proceedings of the 17th Annual Conference on Combinatorial Pattern Matching. 36--48. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. H. Gabow, J. Bentley, and R. Tarjan. 1984. Scaling and related techniques for geometry problems. In Proceedings of the 16th Annual ACM Symposium on Theory of Computing. 135--143. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. A. Ghoting and K. Makarychev. 2009. Indexing genomic sequences on the IBM Blue Gene. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis 1--11. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. R. Giegerich, S. Kurtz, and J. Stoye. 2003. Efficient implementation of lazy suffix trees. Software: Pract. Exper. 33, 11, 1035--1049.Google ScholarGoogle ScholarCross RefCross Ref
  14. S. Gog and E. Ohlebusch. 2013. Compressed suffix trees: Efficient computation and storage of LCP-values. J. Exp. Algorithmics 18, Article 2.1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. D. Gusfield. 1997. Algorithms on Strings, Trees and Sequences. Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. T. Hagerup and C. Rüb. 1989. Optimal merging and sorting on the EREWPRAM. Inf. Process. Lett. 33, 4, 181--185. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. R. Hariharan. 1994. Optimal parallel suffix tree construction. In Proceedings of the 26th Annual ACM Symposium on Theory of Computing. 290--299. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. C. Iliopoulos and W. Rytter. 2004. On parallel transformations of suffix arrays into suffix trees. In Proceedings of the 15th Australasian Workshop on Combinatorial Algorithms (AWOCA).Google ScholarGoogle Scholar
  19. J. Jaja. 1992. Introduction to Parallel Algorithms. Addison-Wesley Professional. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. J. Kärkkäinen, G. Manzini, and S. J. Puglisi. 2009. Permuted longest-common-prefix array. In Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching (CPM). 181--192. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. Kärkkäinen, P. Sanders, and S. Burkhardt. 2006. Linear work suffix array construction. J. ACM 53, 6, 918--936. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. T. Kasai, G. Lee, H. Arimura, S. Arikawa, and K. Park. 2001. Linear-time longest-common-prefix computation in suffix arrays and its applications. In Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching (CPM). 181--192. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. F. Kulla and P. Sanders. 2007. Scalable parallel suffix array construction. Parallel Comput. 33, 9, 605--612. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. S. Kurtz. 1999. Reducing the space requirement of suffix trees. Softw. Pract. Exper. 29, 13, 1149--1171. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. S. Kurtz and C. Schleiermacher. 1999. REPuter: Fast computation of maximal repeats in complete genomes. Bioinformatics, 15, 5, 426--427.Google ScholarGoogle ScholarCross RefCross Ref
  26. C. E. Leiserson. 2010. The Cilk++ Concurrency Platform. J. Supercomput. 51, 3, 244--257. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. U. Manber and E. W. Myers. 1993. Suffix Arrays: A new method for on-line string searches. SIAM J. Comput. 22, 5, 935--948. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. E. Mansour, A. Allam, S. Skiadopoulos, and P. Kalnis. 2011. ERA: Efficient serial and parallel suffix tree construction for very long strings. Proc. VLDB Endow. 5, 1, 49--60. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. E. M. McCreight. 1976. A space-economical suffix tree construction algorithm. J. ACM 23, 2, 262--272. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. C. Meek, J. M. Patel, and S. Kasetty. 2003. OASIS: An online and accurate technique for local-alignment searches on biological sequences. In Proceedings of the 29th International Conference on Very Large Data Bases. VLDB Endowment, 910--921. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Y. Mori. 2010a. libdivsufsort: A lightweight suffix-sorting library. http://code.google.com/p/libdivsufsort.Google ScholarGoogle Scholar
  32. Y. Mori. 2010b. SAIS: An implementation of the induced sorting algorithm. http://sites.google.com/site/yuta256/sais. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. D. R. Morrison. 1968. PATRICIA - Practical algorithm to retrieve information coded in alphanumeric. J. ACM 15, 4, 514--534. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. G. Navarro and V. Mäkinen. 2007. Compressed full-text indexes. ACM Comput. Surv. 39, 1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. B. Phoophakdee and M. Zaki. 2007. Genome-scale disk-based suffix tree indexing. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 833--844. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. B. Phoophakdee and M. Zaki. 2008. Trellis+: An effective approach for indexing genome-scale sequences using suffix trees. In Proceedings of the Pacific Symposium on Biocomputing. 90--101.Google ScholarGoogle Scholar
  37. C. K. Poon and H. Yuan. 2013. A faster CREW PRAM algorithm for computing Cartesian trees. In Proceeding of the International Conference on Algorithms and Complexity. 336--344.Google ScholarGoogle Scholar
  38. S. J. Puglisi, W. F. Smyth, and A. H. Turpin. 2007. A taxonomy of suffix array construction algorithms. Comput. Surveys 39, 2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. S. Rajasekaran and J. H. Reif. 1989. Optimal and sublogarithmic time randomized parallel sorting algorithms. SIAM J. Comput. 18, 3, 594--607. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. M. Reid-Miller, G. L. Miller, and F. Modugno. 1993. List ranking and parallel tree contraction. In Synthesis of Parallel Algorithms, Chapter 3, 115--194. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. K. Sadakane. 2007. Compressed suffix trees with full functionality. Theory of Comput. Syst. 41, 4, 589--607. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. S. Sahinalp and U. Vishkin. 1994. Symmetry breaking for suffix tree construction. In Proceedings of the 26th Annual ACM symposium on Theory of Computing. 300--309. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. J. Shun and G. E. Blelloch. 2014. Phase-concurrent hash tables for determinism. In Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures. 96--107. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. J. Shun, G. E. Blelloch, J. T. Fineman, P. B. Gibbons, A. Kyrola, H. Vardhan Simhadri, and K. Tangwongsan. 2012. Brief announcement: The problem based benchmark suite. In Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures. 68--70. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. J. Shun and F. Zhao. 2013. Practical parallel Lempel-Ziv factorization. In Proceedings of the 2013 Data Compression Conference. IEEE Computer Society, 123--132. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. D. Tsadok and S. Yona. 2003. ANSI C Implementation of a Suffix Tree. http://mila.cs.technion.ac.il/~yona/suffix_tree/.Google ScholarGoogle Scholar
  47. D. Tsirogiannis and N. Koudas. 2010. Suffix tree construction algorithms on modern hardware. In Proceedings of the 13th International Conference on Extending Database Technology. 263--274. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. E. Ukkonen. 1995. On-line Construction of Suffix Trees. Algorithmica 14, 3, 249--260.Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. J. Vuillemin. 1980. A unifying look at data structures. Commun. ACM 23, 4, 229--239. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. P. Weiner. 1973. Linear pattern matching algorithm. In Proceedings of the 14th Annual IEEE Symposium on Switching and Automata Theory. 1--11. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. J. Ziv and A. Lempel. 1977. A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory 23, 3, 337--343. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A simple parallel cartesian tree algorithm and its application to parallel suffix tree construction

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Parallel Computing
      ACM Transactions on Parallel Computing  Volume 1, Issue 1
      Inaugural Issue and Special Section on Top Papers from PACT-21, and Regular Papers
      September 2014
      162 pages
      ISSN:2329-4949
      EISSN:2329-4957
      DOI:10.1145/2632163
      Issue’s Table of Contents

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 3 October 2014
      • Accepted: 1 August 2014
      • Revised: 1 July 2014
      • Received: 1 April 2013
      Published in topc Volume 1, Issue 1

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader