ABSTRACT
We present a probabilistic generative model of entity relationships and textual attributes that simultaneously discovers groups among the entities and topics among the corresponding text. Block-models of relationship data have been studied in social network analysis for some time. Here we cluster in several modalities at once, incorporating the words associated with certain relationships. Significantly, joint inference allows the discovery of groups to be guided by the emerging topics, and vice versa. We present experimental results on two large data sets: sixteen years of bills put before the U.S. Senate, comprising their corresponding text and voting records, and 43 years of similar data from the United Nations. We show that, in comparison with traditional, separate latent-variable models for words or blockstructures for votes, the Group-Topic model's joint inference improves both the groups and topics discovered.
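The idea of one latent topic driving both an event's words and the grouping behind its votes can be illustrated with a toy sampler. This is a minimal sketch under stated assumptions, not the model specification from the paper: the vocabulary, the fixed agreement probabilities (0.9 within a group, 0.2 across groups), the uniform draws in place of learned priors, and every name below are hypothetical, and no inference is performed.

```python
import random

# Hypothetical per-topic vocabularies (illustrative only, not from the paper).
TOY_VOCAB = [
    ["tax", "budget", "trade"],      # toy topic 0
    ["health", "school", "labor"],   # toy topic 1
]


def sample_toy_group_topic(num_entities=6, num_groups=2, num_events=4,
                           words_per_event=5, seed=0):
    """Sample toy events whose words and pairwise votes share one latent topic."""
    rng = random.Random(seed)
    num_topics = len(TOY_VOCAB)

    # Assumption: each entity has a separate group assignment for each topic,
    # so the grouping of entities may differ from topic to topic.
    groups = [[rng.randrange(num_groups) for _ in range(num_entities)]
              for _ in range(num_topics)]

    events = []
    for _ in range(num_events):
        # One latent topic per event (for example, one bill).
        topic = rng.randrange(num_topics)
        words = [rng.choice(TOY_VOCAB[topic]) for _ in range(words_per_event)]

        # Pairwise relation: do entities i and j vote alike on this event?
        # The probability depends only on their groups under the event's topic.
        agree = {}
        for i in range(num_entities):
            for j in range(i + 1, num_entities):
                same_group = groups[topic][i] == groups[topic][j]
                p_agree = 0.9 if same_group else 0.2  # assumed values
                agree[(i, j)] = rng.random() < p_agree

        events.append({"topic": topic, "words": words, "agree": agree})
    return groups, events


if __name__ == "__main__":
    toy_groups, toy_events = sample_toy_group_topic()
    for event in toy_events:
        print(event["topic"], event["words"])
```

In this sketch the words and the agreement pattern of an event are tied together only through its topic, which is why observing one modality can inform clustering in the other.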