Statistical entity-topic models

Authors:
David Newman, Univ. of California, Irvine, CA
Chaitanya Chemudugunta, Univ. of California, Irvine, CA
Padhraic Smyth, Univ. of California, Irvine, CA

KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2006, Pages 680–686
https://doi.org/10.1145/1150402.1150487

Published: 20 August 2006

ABSTRACT

The primary purpose of news articles is to convey information about who, what, when and where. But learning and summarizing these relationships for collections of thousands to millions of articles is difficult. While statistical topic models have been highly successful at topically summarizing huge collections of text documents, they do not explicitly address the textual interactions between who/where, i.e. named entities (persons, organizations, locations) and what, i.e. the topics. We present new graphical models that directly learn the relationship between topics discussed in news articles and entities mentioned in each article. We show how these entity-topic models, through a better understanding of the entity-topic relationships, are better at making predictions about entities.
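
To make the idea of predicting entities from topics concrete, here is a minimal sketch of a generic topic-mixture calculation. It is not one of the graphical models the paper proposes. It assumes an article's topic mixture p(t | d) and a per-topic entity distribution p(e | t) are already available, and it scores each entity by p(e | d) = sum over t of p(e | t) p(t | d). The topic names, entity names, probabilities, and the function `predict_entities` are all hypothetical.

```python
# Hypothetical toy values; none of these numbers come from the paper.
# p(topic | document): topic mixture inferred from the article's words.
doc_topics = {"sports": 0.8, "politics": 0.2}

# p(entity | topic): per-topic distribution over named entities.
topic_entities = {
    "sports": {"Lakers": 0.6, "Senate": 0.0, "Los Angeles": 0.4},
    "politics": {"Lakers": 0.0, "Senate": 0.7, "Los Angeles": 0.3},
}


def predict_entities(doc_topics, topic_entities):
    """Score entities by p(e | d) = sum_t p(e | t) * p(t | d)."""
    scores = {}
    for topic, p_topic in doc_topics.items():
        for entity, p_entity in topic_entities[topic].items():
            scores[entity] = scores.get(entity, 0.0) + p_topic * p_entity
    # Return (entity, score) pairs, most likely entity first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = predict_entities(doc_topics, topic_entities)
for entity, score in ranking:
    print(f"{entity}: {score:.2f}")
```

In this toy setup an article dominated by the "sports" topic ranks sports-related entities first, which is the kind of entity prediction the abstract refers to. How the two distributions are learned is the part the paper's models address and is not shown here.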

Index Terms

Statistical entity-topic models
1. Information systems
  1. Information systems applications
2. Mathematics of computing
  1. Probability and statistics
    1. Probabilistic algorithms
    2. Probabilistic reasoning algorithms
      1. Markov-chain Monte Carlo methods
      2. Sequential Monte Carlo methods

Published in
KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2006
986 pages
ISBN: 1595933395
DOI: 10.1145/1150402
Conference Chair: Tina Eliassi-Rad, LLNL
General Chair: Lyle Ungar, University of Pennsylvania
Program Chairs: Mark Craven, University of Wisconsin; Dimitrios Gunopulos, University of California, Riverside
Copyright © 2006 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
entity recognition
text modeling
topic modeling
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%