Abstract
We present the nested Chinese restaurant process (nCRP), a stochastic process that assigns probability distributions to ensembles of infinitely deep, infinitely branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP leads to clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning—the use of Bayesian nonparametric methods to infer distributions on flexible data structures.
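The generative dynamics described above can be illustrated with a short sketch. The following is a minimal, depth-truncated simulation of nested-CRP path sampling, not the paper's implementation: the function names, the fixed truncation depth, and the single concentration parameter `gamma` are assumptions made here for illustration. At each level, a document chooses a child of its current node with probability proportional to how many earlier documents chose it, or opens a new child with probability proportional to `gamma` — the preferential attachment that clusters documents into shared subtrees.

```python
import random

def crp_choice(counts, gamma):
    """Chinese restaurant process draw: pick an existing table with
    probability proportional to its count, or a new table (index
    len(counts)) with probability proportional to gamma."""
    total = sum(counts) + gamma
    r = random.uniform(0, total)
    for i, c in enumerate(counts):
        r -= c
        if r < 0:
            return i
    return len(counts)  # open a new table (a new branch of the tree)

def ncrp_sample_paths(num_docs, depth, gamma, seed=0):
    """Sample document paths from a depth-truncated nested CRP.

    Each document walks from the root; at every level it selects a
    child of its current node by CRP dynamics, so well-travelled
    subtrees attract more documents. Returns one path per document,
    each a tuple of child indices of length `depth`.
    """
    random.seed(seed)
    child_counts = {}  # maps a node (path prefix) -> per-child customer counts
    paths = []
    for _ in range(num_docs):
        node = ()
        for _level in range(depth):
            counts = child_counts.setdefault(node, [])
            k = crp_choice(counts, gamma)
            if k == len(counts):
                counts.append(0)  # a previously unvisited child
            counts[k] += 1
            node = node + (k,)
        paths.append(node)
    return paths
```

With small `gamma` most documents share a few paths (a tight hierarchy); large `gamma` spreads them across many fresh branches. In the full model each node additionally carries a topic, and a document's words are drawn from the topics along its path.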