ABSTRACT
Given a graph, how can we extract good features for the nodes? For example, given two large graphs from the same domain, how can we use information in one to do classification in the other (i.e., perform across-network classification or transfer learning on graphs)? Also, if one of the graphs is anonymized, how can we use information in one to de-anonymize the other? The key step in all such graph mining tasks is to find effective node features. We propose ReFeX (Recursive Feature eXtraction), a novel algorithm, that recursively combines local (node-based) features with neighborhood (egonet-based) features; and outputs regional features -- capturing "behavioral" information. We demonstrate how these powerful regional features can be used in within-network and across-network classification and de-anonymization tasks -- without relying on homophily, or the availability of class labels. The contributions of our work are as follows: (a) ReFeX is scalable and (b) it is effective, capturing regional ("behavioral") information in large graphs. We report experiments on real graphs from various domains with over 1M edges, where ReFeX outperforms its competitors on typical graph mining tasks like network classification and de-anonymization.
- L. Akoglu, M. McGlohon, and C. Faloutsos. Oddball: Spotting anomalies in weighted graphs. In PAKDD, pages 410--421, 2010. Google ScholarDigital Library
- R. Albert, H. Jeong, and A.-L. Barabasi. Diameter of the world wide web. Nature, (401):130--131, 1999.Google Scholar
- A.-L. Barabasi, R. Albert, and H. Jeong. Scale-free characteristics of random networks: the topology of the world-wide web. Physica A: Statistical Mechanics and its Applications, 281(1-4):69--77, 2000.Google ScholarCross Ref
- J. Blitzer, M. Dredze, and F. Pereira. Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In ACL, 2007.Google Scholar
- D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-MAT: A recursive model for graph mining. In SDM, Apr. 2004.Google ScholarCross Ref
- W. Dai, Q. Yang, G.-R. Xue, and Y. Yu. Boosting for transfer learning. In ICML, pages 193--200, 2007. Google ScholarDigital Library
- M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. SIGCOMM, pages 251--262, Aug-Sept. 1999. Google ScholarDigital Library
- H. Fei and J. Huan. Structure feature selection for graph classification. In CIKM, pages 991--1000, 2008. Google ScholarDigital Library
- G. Flake, S. Lawrence, C. L. Giles, and F. Coetzee. Self-organization and identification of web communities. IEEE Computer, 35(3), Mar. 2002. Google ScholarDigital Library
- B. Gallagher and T. Eliassi-Rad. Leveraging label-independent features for classification in sparsely labeled networks: An empirical study. Lecture Notes in Computer Science, 5498:1--19, 2010. Google ScholarDigital Library
- B. Gallagher, H. Tong, T. Eliassi-Rad, and C. Faloutsos. Using ghost edges for classification in sparsely labeled networks. In KDD, pages 256--264, 2008. Google ScholarDigital Library
- J. Gao, W. Fan, J. Jiang, and J. Han. Knowledge transfer via multiple model local structure mapping. In KDD, pages 283--291, 2008. Google ScholarDigital Library
- N. Ghamrawi and A. McCallum. Collective multi-label classification. In CIKM, pages 195--200, 2005. Google ScholarDigital Library
- M. Girvan and M. E. J. Newman. Community structure in social and biological networks. PNAS, 99:7821, 2002.Google ScholarCross Ref
- J. He, Y. Liu, and R. D. Lawrence. Graph-based transfer learning. In CIKM, pages 937--946, 2009. Google ScholarDigital Library
- K. Henderson, T. Eliassi-Rad, C. Faloutsos, L. Akoglu, L. Li, K. Maruhashi, B. A. Prakash, and H. Tong. Metric forensics: a multi-level approach for mining volatile graphs. In KDD, pages 163--172, 2010. Google ScholarDigital Library
- J. M. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. S. Tomkins. The Web as a graph: Measurements, models and methods. Lecture Notes in Computer Science, 1627:1--17, 1999. Google ScholarDigital Library
- X. Kong and P. S. Yu. Multi-label feature selection for graph classification. In ICDM, pages 274--283, 2010. Google ScholarDigital Library
- S.-I. Lee, V. Chatalbashev, D. Vickrey, and D. Koller. Learning a meta-level prior for feature relevance from multiple related tasks. In ICML, pages 489--496, 2007. Google ScholarDigital Library
- J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In KDD, pages 177--187, 2005. Google ScholarDigital Library
- R. N. Lichtenwalter, J. T. Lussier, and N. V. Chawla. New perspectives and methods in link prediction. In KDD, pages 243--252, 2010. Google ScholarDigital Library
- C. Liu, X. Yan, H. Yu, J. Han, and P. S. Yu. Mining behavior graphs for "backtrace" of noncrashing bugs. In SDM, 2005.Google ScholarCross Ref
- S. A. Macskassy and F. Provost. A simple relational classifier. In Proc. of the 2nd Workshop on Multi-Relational Data Mining (MRDM) at KDD, pages 64--76, 2003.Google ScholarCross Ref
- M. E. J. Newman. Power laws, Pareto distributions and Zipf's law. Contemporary Physics, 46:323--351, 2005.Google ScholarCross Ref
- X. Ni, J.-T. Sun, J. Hu, and Z. Chen. Cross lingual text classification by mining multilingual topics from wikipedia. In WSDM, pages 375--384, 2011. Google ScholarDigital Library
- C. C. Noble and D. J. Cook. Graph-based anomaly detection. In KDD, pages 631--636, 2003. Google ScholarDigital Library
- H. Tong, B. A. Prakash, C. E. Tsourakakis, T. Eliassi-Rad, C. Faloutsos, and D. H. Chau. On the vulnerability of large graphs. In ICDM, pages 1091--1096, 2010. Google ScholarDigital Library
- I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005. Google ScholarDigital Library
- L. Yao, S. Riedel, and A. McCallum. Collective cross-document relation extraction without labelled data. In EMNLP, pages 1013--1023, 2010. Google ScholarDigital Library
- J. Zhang, Z. Ghahramani, and Y. Yang. Learning multiple related tasks using latent independent component analysis. In NIPS, 2005.Google Scholar
Index Terms
- It's who you know: graph mining using recursive structural features
Recommendations
Text classification using graph mining-based feature extraction
A graph-based approach to document classification is described in this paper. The graph representation offers the advantage that it allows for a much more expressive document encoding than the more standard bag of words/phrases approach, and ...
Learning block-preserving graph patterns and its application to data mining
Recently, due to the rapid growth of electronic data having graph structures such as HTML and XML texts and chemical compounds, many researchers have been interested in data mining and machine learning techniques for finding useful patterns from graph-...
The K-clique Densest Subgraph Problem
WWW '15: Proceedings of the 24th International Conference on World Wide WebNumerous graph mining applications rely on detecting subgraphs which are large near-cliques. Since formulations that are geared towards finding large near-cliques are hard and frequently inapproximable due to connections with the Maximum Clique problem, ...
Comments