DOI: 10.1145/1557019.1557044
research-article

Connections between the lines: augmenting social networks with text

Published: 28 June 2009

ABSTRACT

Network data is ubiquitous, encoding collections of relationships between entities such as people, places, genes, or corporations. While many resources for networks of interesting entities are emerging, most of these can only annotate connections in a limited fashion. Although relationships between entities are rich, it is impractical to manually devise complete characterizations of these relationships for every pair of entities on large, real-world corpora.

In this paper we present a novel probabilistic topic model that analyzes text corpora and infers descriptions of their entities and of the relationships between those entities. We develop variational methods for performing approximate inference on our model and demonstrate that our model can be practically deployed on large corpora such as Wikipedia. We show qualitatively and quantitatively that our model can construct and annotate graphs of relationships and make useful predictions.
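The abstract does not spell out the model itself, so the following is only a hedged illustration of the kind of variational inference it refers to. It runs standard mean-field coordinate-ascent updates for a single document under plain latent Dirichlet allocation (LDA), not the authors' entity-and-relationship model. The topic-word matrix `beta`, the prior `alpha`, and the helper names `digamma` and `lda_e_step` are hypothetical choices for this sketch; topics are assumed fixed, known, and strictly positive.

```python
import math


def digamma(x):
    """Digamma function psi(x) for x > 0, via recurrence plus asymptotic series."""
    result = 0.0
    # Use psi(x) = psi(x + 1) - 1/x to shift x up to where the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    # Asymptotic expansion: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - ...
    result += math.log(x) - 0.5 / x - f * (
        1.0 / 12 - f * (1.0 / 120 - f * (1.0 / 252 - f * (1.0 / 240 - f / 132)))
    )
    return result


def lda_e_step(words, beta, alpha, iters=50):
    """Mean-field variational inference for one document under plain LDA.

    words: list of word ids in the document
    beta:  K x V topic-word probabilities (hypothetical, fixed, strictly positive)
    alpha: symmetric Dirichlet prior on the document's topic proportions
    Returns (gamma, phi): variational Dirichlet parameters over the K topics
    and, for each word, its topic responsibilities.
    """
    K, N = len(beta), len(words)
    gamma = [alpha + N / K] * K
    phi = [[1.0 / K] * K for _ in range(N)]
    for _ in range(iters):
        # exp(E[log theta_k]) up to a factor shared by all k, which cancels below.
        weights = [math.exp(digamma(g)) for g in gamma]
        for n, w in enumerate(words):
            # phi[n][k] is proportional to beta[k][w] * exp(psi(gamma[k])).
            unnorm = [beta[k][w] * weights[k] for k in range(K)]
            total = sum(unnorm)
            phi[n] = [u / total for u in unnorm]
        # gamma[k] = alpha + expected number of words assigned to topic k.
        gamma = [alpha + sum(phi[n][k] for n in range(N)) for k in range(K)]
    return gamma, phi


# Usage: two hypothetical topics over a 4-word vocabulary.
beta = [[0.45, 0.45, 0.05, 0.05],
        [0.05, 0.05, 0.45, 0.45]]
gamma, phi = lda_e_step([0, 1, 0, 1], beta, alpha=0.1)
# Every word favors topic 0, so gamma[0] ends up far larger than gamma[1].
print(gamma)
```

Each pass alternates two closed-form updates: per-word topic responsibilities `phi` given the current topic proportions, then the Dirichlet parameters `gamma` given `phi`. A model of entities and their relationships would need additional latent variables; this sketch does not attempt that.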


Supplemental Material

p169-chang.mp4 (MP4, 103.2 MB)


Published in: KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 2009, 1426 pages
ISBN: 9781605584959
DOI: 10.1145/1557019

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance rate (KDD overall): 1,133 of 8,635 submissions, 13%
