DOI: 10.1145/1150402.1150487
Article

Statistical entity-topic models

Published: 20 August 2006

ABSTRACT

The primary purpose of news articles is to convey information about who, what, when and where. But learning and summarizing these relationships for collections of thousands to millions of articles is difficult. While statistical topic models have been highly successful at topically summarizing huge collections of text documents, they do not explicitly address the textual interactions between who/where, i.e. named entities (persons, organizations, locations) and what, i.e. the topics. We present new graphical models that directly learn the relationship between topics discussed in news articles and entities mentioned in each article. We show how these entity-topic models, through a better understanding of the entity-topic relationships, are better at making predictions about entities.

Published in

KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2006
986 pages
ISBN: 1595933395
DOI: 10.1145/1150402

Copyright © 2006 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions, 13%
