skip to main content
research-article

Succinct indexes for strings, binary relations and multilabeled trees

Published:28 September 2011Publication History
Skip Abstract Section

Abstract

We define and design succinct indexes for several abstract data types (ADTs). The concept is to design auxiliary data structures that ideally occupy asymptotically less space than the information-theoretic lower bound on the space required to encode the given data, and support an extended set of operations using the basic operators defined in the ADT. The main advantage of succinct indexes as opposed to succinct (integrated data/index) encodings is that we make assumptions only on the ADT through which the main data is accessed, rather than the way in which the data is encoded. This allows more freedom in the encoding of the main data. In this article, we present succinct indexes for various data types, namely strings, binary relations and multilabeled trees. Given the support for the interface of the ADTs of these data types, we can support various useful operations efficiently by constructing succinct indexes for them. When the operators in the ADTs are supported in constant time, our results are comparable to previous results, while allowing more flexibility in the encoding of the given data.

Using our techniques, we design a succinct encoding that represents a string of length n over an alphabet of size σ using nHk(S) + lg σ · o(n) + O(n lg σ/lg lg lg σ) bits to support access/rank/select operations in o((lg lg σ)1+ϵ) time, for any fixed constant ϵ > 0. We also design a succinct text index using n H0(S) + O(n lg σ/lg lg σ) bits that supports finding all the occ occurrences of a given pattern of length m in O(m lg lg σ + occ lg n/lgϵ σ) time, for any fixed constant 0 < ϵ < 1. Previous results on these two problems either have a lg σ factor instead of lg lg σ in the running time, or are not compressed. Finally, we present succinct encodings of binary relations and multi-labeled trees that are more compact than previous structures.

References

  1. Barbay, J. 2006. Adaptive search algorithm for patterns, in succinctly encoded XML. Tech. rep. CS-2006-11, University of Waterloo, Ontario, Canada.Google ScholarGoogle Scholar
  2. Barbay, J., Golynski, A., Munro, J. I., and Rao, S. S. 2007. Adaptive searching in succinctly encoded binary relations and tree-structured documents. Theoret. Comput. Sci. 387, 3, 284--297. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Barbay, J., He, M., Munro, J. I., and Rao, S. S. 2007. Succinct indexes for strings, binary relations and multi-labeled trees. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms. 680--689. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Benoit, D., Demaine, E. D., Munro, J. I., Raman, R., Raman, V., and Rao, S. S. 2005. Representing trees of higher degree. Algorithmica 43, 4, 275--292. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Clark, D. R. and Munro, J. I. 1996. Efficient suffix trees on secondary storage. In Proceedings of the 7th Annual ACM-SIAM Symposium on Discrete Algorithms. 383--391. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Delpratt, O., Rahman, N., and Raman, R. 2006. Engineering the LOUDS succinct tree representation. In Proceedings of the 5th International Workshop on Experimental Algorithms. 134--145. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Demaine, E. D. and López-Ortiz, A. 2003. A linear lower bound on index size for text retrieval. J. Algor. 48, 1, 2--15. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Ferragina, P., Giancarlo, R., Manzini, G., and Sciortino, M. 2005a. Boosting textual compression in optimal linear time. J. ACM 52, 4, 688--713. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Ferragina, P., Luccio, F., Manzini, G., and Muthukrishnan, S. 2005b. Structuring labeled trees for optimal succinctness, and beyond. In Proceedings of the 46th IEEE Symposium on Foundations of Computer Science. 184--196. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Ferragina, P. and Manzini, G. 2005. Indexing compressed text. J. ACM 52, 4, 552--581. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Ferragina, P., Manzini, G., Mäkinen, V., and Navarro, G. 2004. An alphabet-friendly FM-index. In Proceedings of the 11th Symposium on String Processing and Information Retrieval. Lecture Notes in Computer Science, vol. 3246. Springer-Verlag, 150--160.Google ScholarGoogle Scholar
  12. Gál, A. and Miltersen, P. B. 2003. The cell probe complexity of succinct data structures. In Proceedings of the 30th International Colloquium on Automata, Languages and Programming. 332--344. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Geary, R. F., Raman, R., and Raman, V. 2006. Succinct ordinal trees with level-ancestor queries. ACM Trans. Algor. 2, 4, 510--534. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Giancarlo, R. and Sciortino, M. 2003. Optimal partitions of strings: A new class of Burrows-Wheeler compression algorithms. In Proceedings of the 14th Annual Symposium on Combinatorial Pattern Matching. 129--143. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Golynski, A. 2007. Optimal lower bounds for rank and select indexes. Theoret. Comput. Sci. 387, 3, 348--359. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Golynski, A., Munro, J. I., and Rao, S. S. 2006. Rank/select operations on large alphabets: A tool for text indexing. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms. 368--373. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Gonnet, G. H., Baeza-Yates, R. A., and Snider, T. 1992. New indices for text: PAT trees and PAT arrays. In Information Retrieval: Data Structures & Algorithms. Prentice-Hall, Chapter 5, 66--82. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. González, R. and Navarro, G. 2006. Statistical encoding of succinct data structures. In Proceedings of the 17th Annual Symposium on Combinatorial Pattern Matching. 294--305. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Grossi, R., Gupta, A., and Vitter, J. S. 2003. High-order entropy-compressed text indexes. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms. 841--850. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Grossi, R. and Vitter, J. S. 2005. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35, 2, 378--407. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Gusfield, D. 1997. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge, UK. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. He, M. 2007. Succinct indexes. Ph.D. dissertation, University of Waterloo. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. He, M., Munro, J. I., and Rao, S. S. 2005. A categorization theorem on suffix arrays with applications to space efficient text indexes. In Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms. 23--32. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Jacobson, G. 1989. Space-efficient static trees and graphs. In Proceedings of the 30th Annual IEEE Symposium on Foundations of Computer Science. 549--554. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Jansson, J., Sadakane, K., and Sung, W.-K. 2007. Ultra-succinct representation of ordered trees. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Manber, U. and Myers, E. W. 1993. Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22, 5, 935--948. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Miltersen, P. B. 2005. Lower bounds on the size of selection and rank indexes. In Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms. 11--12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Mortensen, C. W. 2003. Fully-dynamic two dimensional orthogonal range and line segment intersection reporting in logarithmic time. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms. 618--627. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Mortensen, C. W. 2006. Fully dynamic orthogonal range reporting on RAM. SIAM J. Comput. 35, 6, 1494--1525. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Munro, J. I., Raman, R., Raman, V., and Rao, S. S. 2003. Succinct representations of permutations. In Proceedings of the 30th International Colloquium on Automata, Languages and Programming. 345--356. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Munro, J. I. and Raman, V. 2001. Succinct representation of balanced parentheses and static trees. SIAM J. Comput. 31, 3, 762--776. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Raman, R., Raman, V., and Satti, S. R. 2007. Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Trans. Algor. 3, 4, 43. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Sadakane, K. and Grossi, R. 2006. Squeezing succinct data structures into entropy bounds. In Proceedings of the 17th annual ACM-SIAM Symposium on Discrete Algorithms. 1230--1239. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. van Emde Boas, P., Kaas, R., and Zijlstra, E. 1977. Design and implementation of an efficient priority queue. Math. Syst. Theory 10, 99--127.Google ScholarGoogle ScholarCross RefCross Ref
  35. Willard, D. E. 1983. Log-logarithmic worst-case range queries are possible in space &Theta; (N). Inf. Proc. Lett. 17, 2, 81--84.Google ScholarGoogle ScholarCross RefCross Ref
  36. Zhang, N., Özsu, M. T., Aboulnaga, A., and Ilyas, I. F. 2006. XSEED: Accurate and fast cardinality estimation for XPath queries. In Proceedings of the 22nd International Conference on Data Engineering. 61--72. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Succinct indexes for strings, binary relations and multilabeled trees

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in

              Full Access

              • Published in

                cover image ACM Transactions on Algorithms
                ACM Transactions on Algorithms  Volume 7, Issue 4
                September 2011
                271 pages
                ISSN:1549-6325
                EISSN:1549-6333
                DOI:10.1145/2000807
                Issue’s Table of Contents

                Copyright © 2011 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 28 September 2011
                • Accepted: 1 February 2011
                • Revised: 1 January 2011
                • Received: 1 April 2008
                Published in talg Volume 7, Issue 4

                Permissions

                Request permissions about this article.

                Request Permissions

                Check for updates

                Qualifiers

                • research-article
                • Research
                • Refereed

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader