Skip to main content
Log in

On the height of digital trees and related problems

  • Published:
Algorithmica Aims and scope Submit manuscript

Abstract

This paper studies in a probabilistic framework some topics concerning the way words (strings) can overlap, and relationship of this to the height of digital trees associated with this set of words. A word is defined as a random sequence of (possibly infinite) symbols over a finite alphabet. A key notion of analignment matrix {C ij } n i,j=1 is introduced whereC ij is the length of the longest string that is a prefix of theith and thejth word. It is proved that the height of an associated digital tree is simply related to the alignment matrix through some order statistics. In particular, using this observation and proving some inequalities for order statistics, we establish that the height of adigital trie under anindependent model (i.e., all words are statistically independent) is asymptotically equal to 2 logα n wheren is the number of words stored in the trie and α is a parameter of the probabilistic model. This result is generalized in three directions, namely we considerb-tries,Markovian model (i.e., dependency among letters in a word), and adependent model (i.e., dependency among words). In particular, when consecutive letters in a word are Markov dependent (Markovian model), then we demonstrate that the height converges in probability to 2 logθ n where θ is a parameter of the underlying Markov chain. On the other hand, for suffix trees which fall into the dependent model, we show that the height does not exceed 2 logκ n, where κ is a parameter of the probabilistic model. These results find plenty of applications in the analysis of data structures built over digital words.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. A. V. Aho, J. E. Hopcroft, and J. D. Ullman,The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, MA (1974).

    MATH  Google Scholar 

  2. D. Knuth,The Art of Computer Programming. Sorting and Searching, vol. III, Addison-Wesley, Reading, MA (1973).

    Google Scholar 

  3. A. Apostolico, The Myriad Virtues of Suffix Trees,Combinatorial Algorithms on Words, pp. 85–96, Springer-Verlag, New York (1985).

    Google Scholar 

  4. R. Fagin, J. Nievergelt, N. Pippenger, and H. Strong, Extendible Hashing: A Fast Access Method for Dynamic Files,ACM TODS,4, 315–344 (1979).

    Article  Google Scholar 

  5. P. Flajolet, On the Performance Evaluation of Extendible Hashing and Trie Searching,Acta Inform.,20, 345–369 (1983).

    Article  MATH  MathSciNet  Google Scholar 

  6. R. Gallager,Information Theory and Reliable Communications, Wiley, New York (1968).

    Google Scholar 

  7. J. Capetanakis, Tree Algorithms for Packet Broadcast Channels,IEEE Trans. Inform. Theory,25, 505–525 (1979).

    Article  MATH  MathSciNet  Google Scholar 

  8. IEEE Transaction Inform. Theory,31, 2 (1985).

  9. Ph. Jacquet and M. Regnier,Trie Partitioning Process: Limiting Distributions, Lecture Notes in Computer Science, vol. 214, pp. 196–210, Springer-Verlag, Berlin, 1986.

    Google Scholar 

  10. L. Devroye, A Probabilistic Analysis of the Height of Tries and of the Complexity of Trie Sort,Acta Inform.,21, 229–232 (1984).

    Article  MATH  MathSciNet  Google Scholar 

  11. B. Pittel, Asymptotic Growth of a Class of Random Trees,Ann. Probab.,13, 414–427 (1985).

    Article  MATH  MathSciNet  Google Scholar 

  12. B. Pittel, Path in a Random Digital Tree: Limiting Distributions,Adv. in Appl. Probab.,18, 139–155 (1986).

    Article  MATH  MathSciNet  Google Scholar 

  13. M. Regnier, On the Average Height of Trees in Digital Searching and Dynamic Hashing,Inform. Process. Lett.,13, 64–66 (1981).

    Article  MATH  MathSciNet  Google Scholar 

  14. A. Yao, A Note on the Analysis of Extendible Hashing,Inform. Process. Lett.,11, 84–86 (1980).

    Article  MATH  MathSciNet  Google Scholar 

  15. W. Szpankowski, On the Analysis of the Average Height of a Digital Trie: Another Approach, CSD TR-646, Purdue University (1986).

  16. A. Apostolico and W. Szpankowski, Self-Alignments in Words and Their Applications, CSD TR-732, Purdue University (1987), submitted to a journal.

  17. W. Szpankowski, Some Results onV-ary Asymmetric Tries,J. Algorithms,9, 224–244 (1988).

    Article  MATH  MathSciNet  Google Scholar 

  18. P. Kirschenhofer, H. Prodinger, and W. Szpankowski, On the Variance of the External Path Length in a Symmetric Digital Trie,Discrete Appl. Math.,25, 129–143 (1989).

    Article  MATH  MathSciNet  Google Scholar 

  19. H. David,Order Statistics, Wiley, New York (1980).

    Google Scholar 

  20. J. Galambos,The Asymptotic Theory of Extreme Order Statistics, Wiley, New York (1978).

    MATH  Google Scholar 

  21. T. Lai and H. Robbins, A Class of Dependent Random Variables and Their Maxima,Z. Wahrsch. Vern. Gebiete,42, 89–111 (1978).

    Article  MATH  MathSciNet  Google Scholar 

  22. P. Billingsley,Probability and Measures, Wiley, New York (1986).

    Google Scholar 

  23. W. Szpankowski, (Probably) Optimal Solutions to Some Problems NOT Only on Graphs, CSD TR-780, Purdue University (1988); revision TR-872 (1989).

  24. D. Knuth,Art of Computer Programming. Fundamental Algorithms, vol. I, Addison-Wesley, Reading, MA (1973).

    Google Scholar 

  25. D. Sankoff and J. Kruskal (eds.),Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparisons, Addison-Wesley, Reading, MA (1983).

    Google Scholar 

  26. Bull. Math. Biol,46, No. 4 (1984) (a special commemorative issue honoring Margonet O. Dayhoff).

  27. D. Aldous,Probability Approximations via the Poisson Clumping Heristic, Springer-Verlag, New York (1989).

    Google Scholar 

  28. N. Kamarkar, R. M. Karp, G. S. Lueker, and A. D. Odlyzko, Probabilistic Analysis of Optimum Partitioning,J. Appl. Probab.,23, 626–645 (1986).

    Article  MathSciNet  Google Scholar 

  29. S. Karlin and F. Ost, Counts of Long Aligned Word Matches Among Random Letter Sequences,Adv. Appl. Probab.,19, 293–351 (1987).

    Article  MATH  MathSciNet  Google Scholar 

  30. J. F. C. Kingman, Subadditive Processes, inEcole d'Eté de Probabilités de Saint-Flour V—1975, Lecture Notes in Mathematics, vol. 539, Springer-Verlag, Berlin, 1976.

    Google Scholar 

  31. L. Devroye, W. Szpankowski, and B. Rais, A Note on the Height of Suffix Trees, CSD TR-905, Purdue University (1989).

Download references

Author information

Authors and Affiliations

Authors

Additional information

Communicated by Philippe Flajolet.

This research was supported by NSF Grants NCR-8702115 and CCR-8900305, in part by Grant R01 LM05118 from the National Library of Medicine, and by AFOSR Grant 90-0107.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Szpankowski, W. On the height of digital trees and related problems. Algorithmica 6, 256–277 (1991). https://doi.org/10.1007/BF01759045

Download citation

  • Received:

  • Revised:

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF01759045

Key words

Navigation