
The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies

Published: 08 February 2010

Abstract

We present the nested Chinese restaurant process (nCRP), a stochastic process that assigns probability distributions to ensembles of infinitely deep, infinitely branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP leads to clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning—the use of Bayesian nonparametric methods to infer distributions on flexible data structures.


Reviews

Grigore Albeanu

Topic modeling and text analysis are important tasks in any methodology for the hierarchical analysis of document collections. Text categorization, text compression, and text summarization are well-known applications, but predicted structures obtained through machine learning can also be used in computational biology, image understanding, and language modeling for speech recognition, to name just a few examples. Over the last decade, researchers in machine learning have proposed both parametric and nonparametric models. In this paper, the authors describe a new algorithm for building a topic hierarchy over a document collection: an unsupervised learning method based on the nested Chinese restaurant process (nCRP) and Bayesian nonparametric (BNP) inference, which they analyze on three document collections from different domains.

The paper consists of seven sections, with more than 70 references. Section 1 presents the state of the art and the scientific context of the problem, and Section 2 reviews the necessary stochastic process theory and BNP statistics. Readers should be familiar with the Dirichlet and beta distributions, the CRP, the Pólya urn distribution, stick-breaking processes, the Griffiths-Engen-McCloskey (GEM) distribution, and their connections to random partitions of the integers and to flexible clustering strategies. Section 3 introduces the nCRP and discusses "how similar ideas can be used to define a probability distribution on infinitely deep, infinitely branching trees." In Section 4, building on the nCRP, the authors extend the latent Dirichlet allocation (LDA) topic model to its hierarchical version (hLDA); this generative process defines a probability distribution over possible corpora of documents. Related work is discussed at the end of the section. Next, in Section 5, the authors develop an inference algorithm based on Markov chain Monte Carlo (MCMC) Gibbs sampling, covering the sampling of level allocations and paths, the treatment of hyperparameters, convergence, and the use of the posterior mode. Section 6 presents examples and empirical results on both simulated and real text data. Section 7 is dedicated to discussion and conclusions: defining prior distributions on trees based on the nCRP and hLDA, and developing a BNP methodology for analyzing document collections.

While the content of the collections varies significantly, the statistical principles behind the proposed model allow meaningful sets of topics to be recovered at multiple levels of abstraction of the tree. Although this paper is a valuable contribution to the field, there is a need for further work on hyperparameters and convergence speed. The paper is long, with some misprints and omissions, and the list of references does not meet the journal's typical standards.


Published in

Journal of the ACM, Volume 57, Issue 2
January 2010, 248 pages
ISSN: 0004-5411
EISSN: 1557-735X
DOI: 10.1145/1667053

Copyright © 2010 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 8 February 2010
• Accepted: 1 August 2009
• Revised: 1 February 2008
• Received: 1 June 2007


Qualifiers

• research-article
• Research
• Refereed
