ABSTRACT
The primary purpose of news articles is to convey information about who, what, when and where. But learning and summarizing these relationships for collections of thousands to millions of articles is difficult. While statistical topic models have been highly successful at topically summarizing huge collections of text documents, they do not explicitly address the textual interactions between the who and where, i.e., named entities (persons, organizations, locations), and the what, i.e., the topics. We present new graphical models that directly learn the relationship between the topics discussed in news articles and the entities mentioned in each article. We show that these entity-topic models, by capturing the entity-topic relationships, make better predictions about entities.
Index Terms
- Statistical entity-topic models