Abstract
We present the nested Chinese restaurant process (nCRP), a stochastic process that assigns probability distributions to ensembles of infinitely deep, infinitely branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP leads to clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning—the use of Bayesian nonparametric methods to infer distributions on flexible data structures.
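The generative dynamics described above can be illustrated with a short sketch. The following is a minimal, depth-truncated simulation of nested-CRP path sampling, not the paper's implementation: the function names, the fixed truncation depth, and the single concentration parameter `gamma` are assumptions made here for illustration. At each level, a document chooses a child of its current node with probability proportional to how many earlier documents chose it, or opens a new child with probability proportional to `gamma` — the preferential attachment that clusters documents into shared subtrees.

```python
import random

def crp_choice(counts, gamma):
    """Chinese restaurant process draw: pick an existing table with
    probability proportional to its count, or a new table (index
    len(counts)) with probability proportional to gamma."""
    total = sum(counts) + gamma
    r = random.uniform(0, total)
    for i, c in enumerate(counts):
        r -= c
        if r < 0:
            return i
    return len(counts)  # open a new table (a new branch of the tree)

def ncrp_sample_paths(num_docs, depth, gamma, seed=0):
    """Sample document paths from a depth-truncated nested CRP.

    Each document walks from the root; at every level it selects a
    child of its current node by CRP dynamics, so well-travelled
    subtrees attract more documents. Returns one path per document,
    each a tuple of child indices of length `depth`.
    """
    random.seed(seed)
    child_counts = {}  # maps a node (path prefix) -> per-child customer counts
    paths = []
    for _ in range(num_docs):
        node = ()
        for _level in range(depth):
            counts = child_counts.setdefault(node, [])
            k = crp_choice(counts, gamma)
            if k == len(counts):
                counts.append(0)  # a previously unvisited child
            counts[k] += 1
            node = node + (k,)
        paths.append(node)
    return paths
```

With small `gamma` most documents share a few paths (a tight hierarchy); large `gamma` spreads them across many fresh branches. In the full model each node additionally carries a topic, and a document's words are drawn from the topics along its path.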