skip to main content
10.1145/2020408.2020512acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

It's who you know: graph mining using recursive structural features

Published:21 August 2011Publication History

ABSTRACT

Given a graph, how can we extract good features for the nodes? For example, given two large graphs from the same domain, how can we use information in one to do classification in the other (i.e., perform across-network classification or transfer learning on graphs)? Also, if one of the graphs is anonymized, how can we use information in one to de-anonymize the other? The key step in all such graph mining tasks is to find effective node features. We propose ReFeX (Recursive Feature eXtraction), a novel algorithm, that recursively combines local (node-based) features with neighborhood (egonet-based) features; and outputs regional features -- capturing "behavioral" information. We demonstrate how these powerful regional features can be used in within-network and across-network classification and de-anonymization tasks -- without relying on homophily, or the availability of class labels. The contributions of our work are as follows: (a) ReFeX is scalable and (b) it is effective, capturing regional ("behavioral") information in large graphs. We report experiments on real graphs from various domains with over 1M edges, where ReFeX outperforms its competitors on typical graph mining tasks like network classification and de-anonymization.

References

  1. L. Akoglu, M. McGlohon, and C. Faloutsos. Oddball: Spotting anomalies in weighted graphs. In PAKDD, pages 410--421, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. R. Albert, H. Jeong, and A.-L. Barabasi. Diameter of the world wide web. Nature, (401):130--131, 1999.Google ScholarGoogle Scholar
  3. A.-L. Barabasi, R. Albert, and H. Jeong. Scale-free characteristics of random networks: the topology of the world-wide web. Physica A: Statistical Mechanics and its Applications, 281(1-4):69--77, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  4. J. Blitzer, M. Dredze, and F. Pereira. Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In ACL, 2007.Google ScholarGoogle Scholar
  5. D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-MAT: A recursive model for graph mining. In SDM, Apr. 2004.Google ScholarGoogle ScholarCross RefCross Ref
  6. W. Dai, Q. Yang, G.-R. Xue, and Y. Yu. Boosting for transfer learning. In ICML, pages 193--200, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. SIGCOMM, pages 251--262, Aug-Sept. 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. H. Fei and J. Huan. Structure feature selection for graph classification. In CIKM, pages 991--1000, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. G. Flake, S. Lawrence, C. L. Giles, and F. Coetzee. Self-organization and identification of web communities. IEEE Computer, 35(3), Mar. 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. B. Gallagher and T. Eliassi-Rad. Leveraging label-independent features for classification in sparsely labeled networks: An empirical study. Lecture Notes in Computer Science, 5498:1--19, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. B. Gallagher, H. Tong, T. Eliassi-Rad, and C. Faloutsos. Using ghost edges for classification in sparsely labeled networks. In KDD, pages 256--264, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. Gao, W. Fan, J. Jiang, and J. Han. Knowledge transfer via multiple model local structure mapping. In KDD, pages 283--291, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. N. Ghamrawi and A. McCallum. Collective multi-label classification. In CIKM, pages 195--200, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. Girvan and M. E. J. Newman. Community structure in social and biological networks. PNAS, 99:7821, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  15. J. He, Y. Liu, and R. D. Lawrence. Graph-based transfer learning. In CIKM, pages 937--946, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. K. Henderson, T. Eliassi-Rad, C. Faloutsos, L. Akoglu, L. Li, K. Maruhashi, B. A. Prakash, and H. Tong. Metric forensics: a multi-level approach for mining volatile graphs. In KDD, pages 163--172, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. M. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. S. Tomkins. The Web as a graph: Measurements, models and methods. Lecture Notes in Computer Science, 1627:1--17, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. X. Kong and P. S. Yu. Multi-label feature selection for graph classification. In ICDM, pages 274--283, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. S.-I. Lee, V. Chatalbashev, D. Vickrey, and D. Koller. Learning a meta-level prior for feature relevance from multiple related tasks. In ICML, pages 489--496, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In KDD, pages 177--187, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. R. N. Lichtenwalter, J. T. Lussier, and N. V. Chawla. New perspectives and methods in link prediction. In KDD, pages 243--252, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. C. Liu, X. Yan, H. Yu, J. Han, and P. S. Yu. Mining behavior graphs for "backtrace" of noncrashing bugs. In SDM, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  23. S. A. Macskassy and F. Provost. A simple relational classifier. In Proc. of the 2nd Workshop on Multi-Relational Data Mining (MRDM) at KDD, pages 64--76, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  24. M. E. J. Newman. Power laws, Pareto distributions and Zipf's law. Contemporary Physics, 46:323--351, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  25. X. Ni, J.-T. Sun, J. Hu, and Z. Chen. Cross lingual text classification by mining multilingual topics from wikipedia. In WSDM, pages 375--384, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. C. C. Noble and D. J. Cook. Graph-based anomaly detection. In KDD, pages 631--636, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. H. Tong, B. A. Prakash, C. E. Tsourakakis, T. Eliassi-Rad, C. Faloutsos, and D. H. Chau. On the vulnerability of large graphs. In ICDM, pages 1091--1096, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. L. Yao, S. Riedel, and A. McCallum. Collective cross-document relation extraction without labelled data. In EMNLP, pages 1013--1023, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. J. Zhang, Z. Ghahramani, and Y. Yang. Learning multiple related tasks using latent independent component analysis. In NIPS, 2005.Google ScholarGoogle Scholar

Index Terms

  1. It's who you know: graph mining using recursive structural features

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2011
      1446 pages
      ISBN:9781450308137
      DOI:10.1145/2020408

      Copyright © 2011 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 21 August 2011

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate1,133of8,635submissions,13%

      Upcoming Conference

      KDD '24

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader