DOI: 10.1145/1835804.1835837
research-article

New perspectives and methods in link prediction

Published: 25 July 2010

ABSTRACT

This paper examines important factors for link prediction in networks and provides a general, high-performance framework for the prediction task. Link prediction in sparse networks presents a significant challenge due to the inherent disproportion of links that can form to links that do form. Previous research has typically approached this as an unsupervised problem. While this is not the first work to explore supervised learning, many factors significant in influencing and guiding classification remain unexplored. In this paper, we consider these factors by first motivating the use of a supervised framework through a careful investigation of issues such as network observational period, generality of existing methods, variance reduction, topological causes and degrees of imbalance, and sampling approaches. We also present an effective flow-based predicting algorithm, offer formal bounds on imbalance in sparse network link prediction, and employ an evaluation method appropriate for the observed imbalance. Our careful consideration of the above issues ultimately leads to a completely general framework that outperforms unsupervised link prediction methods by more than 30% AUC.
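
To make the imbalance and the evaluation measure concrete, here is a minimal Python sketch that is not taken from the paper. For an undirected network without self-loops, it counts how many node pairs lack a link for every pair that has one, and it computes AUC as the probability that a randomly chosen positive pair outscores a randomly chosen negative pair. The function names `imbalance_ratio` and `auc`, the toy path graph, and the example scores are illustrative assumptions; the paper's formal bounds and its flow-based predictor are not reproduced here.

```python
def imbalance_ratio(num_nodes, edges):
    """Unlinked node pairs per linked node pair (undirected, no self-loops)."""
    # Deduplicate (u, v) / (v, u) and ignore self-loops.
    linked = {frozenset((u, v)) for u, v in edges if u != v}
    if not linked:
        raise ValueError("the network has no links")
    possible = num_nodes * (num_nodes - 1) // 2  # all unordered node pairs
    return (possible - len(linked)) / len(linked)


def auc(pos_scores, neg_scores):
    """Probability that a positive outscores a negative; ties count as half."""
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical toy data: a 6-node path graph has 5 links out of 15 possible pairs.
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
ratio = imbalance_ratio(6, path_edges)   # (15 - 5) / 5 = 2.0

# Hypothetical predictor scores for pairs that did (pos) and did not (neg) link.
score = auc([3, 2], [1, 2])              # 3.5 winning pairs out of 4 = 0.875
```

When the average degree stays fixed, the number of links grows linearly with the number of nodes while the number of possible pairs grows quadratically, so the ratio above grows roughly in proportion to network size. Under that much imbalance plain accuracy is uninformative, which is one reason a rank-based measure such as AUC is a natural choice.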

Supplemental Material

kdd2010_lichtenwalter_npml_01.mov (MOV, 72 MB)

    • Published in

      KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
      July 2010
      1240 pages
      ISBN: 9781450300551
      DOI: 10.1145/1835804

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
