Abstract
We present a simple linear work and space, and polylogarithmic time parallel algorithm for generating multiway Cartesian trees. We show that bottom-up traversals of the multiway Cartesian tree on the interleaved suffix array and longest common prefix array of a string can be used to answer certain string queries. By adding downward pointers in the tree (e.g. using a hash table), we can also generate suffix trees from suffix arrays on arbitrary alphabets in the same bounds. In conjunction with parallel suffix array algorithms, such as the skew algorithm, this gives a rather simple linear work parallel, O(n ε) time (0<ε<1), algorithm for generating suffix trees over an integer alphabet Σ{1, ..., n}, where n is the length of the input string. It also gives a linear work parallel algorithm requiring O(log2 n) time with high probability for constant-sized alphabets. More generally, given a sorted sequence of strings and the longest common prefix lengths between adjacent elements, the algorithm will generate a patricia tree (compacted trie) over the strings. Of independent interest, we describe a work-efficient parallel algorithm for solving the all nearest smaller values problem using Cartesian trees, which is much simpler than the work-efficient parallel algorithm described in previous work.
We also present experimental results comparing the performance of the algorithm to existing sequential implementations and a second parallel algorithm that we implement. We present comparisons for the Cartesian tree algorithm on its own and for constructing a suffix tree. The results show that on a variety of strings our algorithm is competitive with the sequential version on a single processor and achieves good speedup on multiple processors. We present experiments for three applications that require only the Cartesian tree, and also for searching using the suffix tree.
- M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch. 2004. Replacing Suffix Trees with enhanced suffix arrays. J. Discrete Alg. 2, 1, 53--86. Google ScholarDigital Library
- A. Apostolico, C. Iliopoulos, G. M. Landau, B. Schieber, and U. Vishkin. 1988. Parallel construction of a suffix tree with applications. Algorithmica 3, 1--4, 347--365.Google ScholarDigital Library
- M. Barsky, U. Stege, and A. Thomo. 2010. A survey of practical algorithms for suffix tree construction in external memory. Softw. Pract. Exper. 40, 11, 965--988. Google ScholarDigital Library
- O. Berkman, B. Schieber, and U. Vishkin. 1993. Optimal doubly logarithmic parallel algorithms based on finding all nearest smaller values. J. Algor. 14, 3, 344--370. Google ScholarDigital Library
- G. E. Blelloch and J. Shun. 2011. A simple parallel cartesian tree algorithm and its application to suffix tree construction. In Proceedings of the Workshop on Algorithm Engineering and Experiments. 48--58.Google Scholar
- M. Burrows and D. J. Wheeler. 1994. A Block-Sorting Lossless Data Compression Algorithm. Tech. rep. HP Labs.Google Scholar
- M. Comin and M. Farreras. 2013. Efficient parallel construction of suffix trees for genomes larger than main memory. In Proceedings of the 20th European MPI Users’ Group Meeting. 211--216. Google ScholarDigital Library
- A. Delcher, A. Phillippy, J. Carlton, and S. Salzberg. 2002. Fast Algorithms for large-scale genome alignment and comparision. Nucl. Acids Res. 30, 11, 2478--2483.Google ScholarCross Ref
- M. Farach and S. Muthukrishnan. 1996. Optimal logarithmic time randomized suffix tree construction. In Proceedings of the International Colloquium on Automata Languages and Programming. 550--561. Google ScholarDigital Library
- J. Fischer and V. Heun. 2006. Theoretical and practical improvements on the RMQ-problem, with applications to LCA and LCE. In Proceedings of the 17th Annual Conference on Combinatorial Pattern Matching. 36--48. Google ScholarDigital Library
- H. Gabow, J. Bentley, and R. Tarjan. 1984. Scaling and related techniques for geometry problems. In Proceedings of the 16th Annual ACM Symposium on Theory of Computing. 135--143. Google ScholarDigital Library
- A. Ghoting and K. Makarychev. 2009. Indexing genomic sequences on the IBM Blue Gene. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis 1--11. Google ScholarDigital Library
- R. Giegerich, S. Kurtz, and J. Stoye. 2003. Efficient implementation of lazy suffix trees. Software: Pract. Exper. 33, 11, 1035--1049.Google ScholarCross Ref
- S. Gog and E. Ohlebusch. 2013. Compressed suffix trees: Efficient computation and storage of LCP-values. J. Exp. Algorithmics 18, Article 2.1. Google ScholarDigital Library
- D. Gusfield. 1997. Algorithms on Strings, Trees and Sequences. Cambridge University Press. Google ScholarDigital Library
- T. Hagerup and C. Rüb. 1989. Optimal merging and sorting on the EREWPRAM. Inf. Process. Lett. 33, 4, 181--185. Google ScholarDigital Library
- R. Hariharan. 1994. Optimal parallel suffix tree construction. In Proceedings of the 26th Annual ACM Symposium on Theory of Computing. 290--299. Google ScholarDigital Library
- C. Iliopoulos and W. Rytter. 2004. On parallel transformations of suffix arrays into suffix trees. In Proceedings of the 15th Australasian Workshop on Combinatorial Algorithms (AWOCA).Google Scholar
- J. Jaja. 1992. Introduction to Parallel Algorithms. Addison-Wesley Professional. Google ScholarDigital Library
- J. Kärkkäinen, G. Manzini, and S. J. Puglisi. 2009. Permuted longest-common-prefix array. In Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching (CPM). 181--192. Google ScholarDigital Library
- J. Kärkkäinen, P. Sanders, and S. Burkhardt. 2006. Linear work suffix array construction. J. ACM 53, 6, 918--936. Google ScholarDigital Library
- T. Kasai, G. Lee, H. Arimura, S. Arikawa, and K. Park. 2001. Linear-time longest-common-prefix computation in suffix arrays and its applications. In Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching (CPM). 181--192. Google ScholarDigital Library
- F. Kulla and P. Sanders. 2007. Scalable parallel suffix array construction. Parallel Comput. 33, 9, 605--612. Google ScholarDigital Library
- S. Kurtz. 1999. Reducing the space requirement of suffix trees. Softw. Pract. Exper. 29, 13, 1149--1171. Google ScholarDigital Library
- S. Kurtz and C. Schleiermacher. 1999. REPuter: Fast computation of maximal repeats in complete genomes. Bioinformatics, 15, 5, 426--427.Google ScholarCross Ref
- C. E. Leiserson. 2010. The Cilk++ Concurrency Platform. J. Supercomput. 51, 3, 244--257. Google ScholarDigital Library
- U. Manber and E. W. Myers. 1993. Suffix Arrays: A new method for on-line string searches. SIAM J. Comput. 22, 5, 935--948. Google ScholarDigital Library
- E. Mansour, A. Allam, S. Skiadopoulos, and P. Kalnis. 2011. ERA: Efficient serial and parallel suffix tree construction for very long strings. Proc. VLDB Endow. 5, 1, 49--60. Google ScholarDigital Library
- E. M. McCreight. 1976. A space-economical suffix tree construction algorithm. J. ACM 23, 2, 262--272. Google ScholarDigital Library
- C. Meek, J. M. Patel, and S. Kasetty. 2003. OASIS: An online and accurate technique for local-alignment searches on biological sequences. In Proceedings of the 29th International Conference on Very Large Data Bases. VLDB Endowment, 910--921. Google ScholarDigital Library
- Y. Mori. 2010a. libdivsufsort: A lightweight suffix-sorting library. http://code.google.com/p/libdivsufsort.Google Scholar
- Y. Mori. 2010b. SAIS: An implementation of the induced sorting algorithm. http://sites.google.com/site/yuta256/sais. Google ScholarDigital Library
- D. R. Morrison. 1968. PATRICIA - Practical algorithm to retrieve information coded in alphanumeric. J. ACM 15, 4, 514--534. Google ScholarDigital Library
- G. Navarro and V. Mäkinen. 2007. Compressed full-text indexes. ACM Comput. Surv. 39, 1. Google ScholarDigital Library
- B. Phoophakdee and M. Zaki. 2007. Genome-scale disk-based suffix tree indexing. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 833--844. Google ScholarDigital Library
- B. Phoophakdee and M. Zaki. 2008. Trellis+: An effective approach for indexing genome-scale sequences using suffix trees. In Proceedings of the Pacific Symposium on Biocomputing. 90--101.Google Scholar
- C. K. Poon and H. Yuan. 2013. A faster CREW PRAM algorithm for computing Cartesian trees. In Proceeding of the International Conference on Algorithms and Complexity. 336--344.Google Scholar
- S. J. Puglisi, W. F. Smyth, and A. H. Turpin. 2007. A taxonomy of suffix array construction algorithms. Comput. Surveys 39, 2. Google ScholarDigital Library
- S. Rajasekaran and J. H. Reif. 1989. Optimal and sublogarithmic time randomized parallel sorting algorithms. SIAM J. Comput. 18, 3, 594--607. Google ScholarDigital Library
- M. Reid-Miller, G. L. Miller, and F. Modugno. 1993. List ranking and parallel tree contraction. In Synthesis of Parallel Algorithms, Chapter 3, 115--194. Google ScholarDigital Library
- K. Sadakane. 2007. Compressed suffix trees with full functionality. Theory of Comput. Syst. 41, 4, 589--607. Google ScholarDigital Library
- S. Sahinalp and U. Vishkin. 1994. Symmetry breaking for suffix tree construction. In Proceedings of the 26th Annual ACM symposium on Theory of Computing. 300--309. Google ScholarDigital Library
- J. Shun and G. E. Blelloch. 2014. Phase-concurrent hash tables for determinism. In Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures. 96--107. Google ScholarDigital Library
- J. Shun, G. E. Blelloch, J. T. Fineman, P. B. Gibbons, A. Kyrola, H. Vardhan Simhadri, and K. Tangwongsan. 2012. Brief announcement: The problem based benchmark suite. In Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures. 68--70. Google ScholarDigital Library
- J. Shun and F. Zhao. 2013. Practical parallel Lempel-Ziv factorization. In Proceedings of the 2013 Data Compression Conference. IEEE Computer Society, 123--132. Google ScholarDigital Library
- D. Tsadok and S. Yona. 2003. ANSI C Implementation of a Suffix Tree. http://mila.cs.technion.ac.il/~yona/suffix_tree/.Google Scholar
- D. Tsirogiannis and N. Koudas. 2010. Suffix tree construction algorithms on modern hardware. In Proceedings of the 13th International Conference on Extending Database Technology. 263--274. Google ScholarDigital Library
- E. Ukkonen. 1995. On-line Construction of Suffix Trees. Algorithmica 14, 3, 249--260.Google ScholarDigital Library
- J. Vuillemin. 1980. A unifying look at data structures. Commun. ACM 23, 4, 229--239. Google ScholarDigital Library
- P. Weiner. 1973. Linear pattern matching algorithm. In Proceedings of the 14th Annual IEEE Symposium on Switching and Automata Theory. 1--11. Google ScholarDigital Library
- J. Ziv and A. Lempel. 1977. A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory 23, 3, 337--343. Google ScholarDigital Library
Index Terms
- A simple parallel cartesian tree algorithm and its application to parallel suffix tree construction
Recommendations
A simple parallel Cartesian tree algorithm and its application to suffix tree construction
ALENEX '11: Proceedings of the Meeting on Algorithm Engineering & ExpermimentsWe present a simple linear work and space, and poly-logarithmic time parallel algorithm for generating multiway Cartesian trees. As a special case, the algorithm can be used to generate suffix trees from suffix arrays on arbitrary alphabets in the same ...
A suffix tree or not a suffix tree?
In this paper we study the structure of suffix trees. Given an unlabeled tree on n nodes and suffix links of its internal nodes, we ask the question "Is a suffix tree__ __", i.e., is there a string S whose suffix tree has the same topological structure ...
Parallel construction of a suffix tree with applications
AbstractMany string manipulations can be performed efficiently on suffix trees. In this paper a CRCW parallel RAM algorithm is presented that constructs the suffix tree associated with a string ofn symbols inO(logn) time withn processors. The algorithm ...
Comments