Connections between the lines: augmenting social networks with text

Authors:
Jonathan Chang, Princeton University, Princeton, NJ, USA
Jordan Boyd-Graber, Princeton University, Princeton, NJ, USA
David M. Blei, Princeton University, Princeton, NJ, USA

KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 2009, Pages 169–178. https://doi.org/10.1145/1557019.1557044

Published: 28 June 2009

ABSTRACT

Network data is ubiquitous, encoding collections of relationships between entities such as people, places, genes, or corporations. While many resources for networks of interesting entities are emerging, most of these can only annotate connections in a limited fashion. Although relationships between entities are rich, it is impractical to manually devise complete characterizations of these relationships for every pair of entities on large, real-world corpora.

In this paper we present a novel probabilistic topic model to analyze text corpora and infer descriptions of their entities and of the relationships between those entities. We develop variational methods for performing approximate inference on our model and demonstrate that our model can be practically deployed on large corpora such as Wikipedia. We show qualitatively and quantitatively that our model can construct and annotate graphs of relationships and make useful predictions.

Supplemental Material

p169-chang.mp4 (mp4, 103.2 MB)

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining

Published in
KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
June 2009
1426 pages
ISBN: 9781605584959
DOI: 10.1145/1557019
General Chairs: John Elder (Elder Research, Inc., USA), Françoise Soulié Fogelman (KXEN, France)
Program Chairs: Peter Flach (University of Bristol, UK), Mohammed Zaki (RPI, USA)
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
graphical models
social network learning
statistical topic models
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%