ABSTRACT
Access to realistic, complex graph datasets is critical to research on social networking systems and applications. Simulations on graph data provide critical evaluation of new systems and applications ranging from community detection to spam filtering and social web search. Due to the high time and resource costs of gathering real graph datasets through direct measurements, researchers are anonymizing and sharing a small number of valuable datasets with the community. However, performing experiments using shared real datasets faces three key disadvantages: concerns that graphs can be de-anonymized to reveal private information, increasing costs of distributing large datasets, and that a small number of available social graphs limits the statistical confidence in the results.
The use of measurement-calibrated graph models is an attractive alternative to sharing datasets. Researchers can "fit" a graph model to a real social graph, extract a set of model parameters, and use them to generate multiple synthetic graphs statistically similar to the original graph. While numerous graph models have been proposed, it is unclear if they can produce synthetic graphs that accurately match the properties of the original graphs. In this paper, we explore the feasibility of measurement-calibrated synthetic graphs using six popular graph models and a variety of real social graphs gathered from the Facebook social network ranging from 30,000 to 3 million edges. We find that two models consistently produce synthetic graphs with common graph metric values similar to those of the original graphs. However, only one produces high fidelity results in our application-level benchmarks. While this shows that graph models can produce realistic synthetic graphs, it also highlights the fact that current graph metrics remain incomplete, and some applications expose graph properties that do not map to existing metrics.
- Monte Carlo Statistical Methods. New York: Springer-Verlag, 2004.Google Scholar
- ADAMIC, L. A., BUYUKKOKTEN, O., AND ADAR, E. A social network caught in the web. First Monday 8, 6 (2003).Google ScholarCross Ref
- AHN, Y.-Y., ET AL. Analysis of topological characteristics of huge online social networking services. In Proc. of WWW (May 2007). Google ScholarDigital Library
- BACKSTROM, L., DWORK, C., AND KLEINBERG, J. Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography. In WWW (May 2007). Google ScholarDigital Library
- BACKSTROM, L., DWORK, C., AND KLEINBERG, J. Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography. In WWW (May 2007). Google ScholarDigital Library
- BARAKAT, C., ET AL. A flow-based model for internet backbone traffic. In Proc. of Internet Measurement Workshop (2002). Google ScholarDigital Library
- BARBARO, M., AND ZELLER, T. A face is exposed for AOL searcher no. 4417749, August 2006. NY Times.Google Scholar
- BLUM, A., CHAN, T.-H. H., AND RWEBANGIRA, M. R. A random-surfer web-graph model. In Proc. of ANALCO (2006).Google ScholarCross Ref
- CLAUSET, A., SHALIZI, C. R., AND NEWMAN,M. E. J. Power-law distributions in empirical data. SIAM Review 51, 4 (2009), 661--703. Google ScholarDigital Library
- DOUCEUR, J. R. The Sybil attack. In Proc. of IPTPS (March 2002). Google ScholarDigital Library
- ERDOS, P., AND RENYI, A. On the evolution of random graphs. Mathematical Institute of the Hungarian Acadamy of Science (1960).Google Scholar
- FLOYD, S., AND KOHLER, E. Internet research needs better models. In Proc. of HotNets (Oct. 2002).Google Scholar
- GARRISS, S., ET AL. Re: Reliable email. In Proc. of NSDI (2006). Google ScholarDigital Library
- IVONA, B., ADAM, K., AND RAHUL, S. Graph model selection using maximum likelihood. In Proc. of ICML (2006). Google ScholarDigital Library
- KIM, M., KOTZ, D., AND KIM, S. Extracting a mobility model from real user traces. In Proc. of INFOCOM (April 2006).Google ScholarCross Ref
- KONRAD, A., ZHAO, B. Y., JOSEPH, A. D., AND LUDWIG, R. A markov-based channel model algorithm for wireless networks. ACM Wireless Networks 9, 3 (May 2003), 189--199. Google ScholarDigital Library
- KUMAR, R., ET AL. Stochastic models for the web graph. In Proc. of FOCS (2000). Google ScholarDigital Library
- KUMAR, R., NOVAK, J., AND TOMKINS, A. Structure and evolution of online social networks. In Proc. of ACM KDD (2006). Google ScholarDigital Library
- LESKOVEC, J., AND FALOUTSOS, C. Scalable modeling of real graphs using kronecker multiplication. In Proc. of ICML (2007). Google ScholarDigital Library
- LESKOVEC, J., KLEINBERG, J., AND FALOUTSOS, C. Graphs over time: Densification laws, shrinking diameters and possible explanations. In Proc. of ACM KDD (2005). Google ScholarDigital Library
- LI, L., ET AL. Towards a theory of scale-free graphs: Definition, properties, and implications. Internet Math 2, 4 (2005), 431--523.Google ScholarCross Ref
- LUDWIG, R., KONRAD, A., AND JOSEPH, A. Optimizing the end-to-end performance of reliable flows over wireless link. In Proc. of MobiCom (Seattle, WA, 1999). Google ScholarDigital Library
- MAHADEVAN, P., ET AL. Systematic topology analysis and generation using degree correlations. In Proc. of SIGCOMM (2006). Google ScholarDigital Library
- MESSMER, B., AND BUNKE, H. A new algorithm for error-tolerant subgraph isomorphism detection. IEEE Trans. Pattern Analysis and Machine Intelligence 20 (1998), 493--504. Google ScholarDigital Library
- MISLOVE, A., ET AL. Measurement and analysis of online social networks. In Proc. of IMC (San Diego, CA, Oct 2007). Google ScholarDigital Library
- MISLOVE, A., ET AL. Growth of the flickr social network. In Proc. of WOSN (Seattle, WA, August 2008). Google ScholarDigital Library
- NARAYANAN, A., AND SHMATIKOV, V. How to break anonymity of the netflix prize dataset. In Proc. of IEEE S&P (May 2008).Google Scholar
- NARAYANAN, A., AND SHMATIKOV, V. De-anonymizing social networks. In Proc. of IEEE S&P (May 2009). Google ScholarDigital Library
- NEWMAN, M. E. J. Mixing patterns in networks. Physical Review E 67-026126 (2003).Google Scholar
- PUTTASWAMY, K. P. N., SALA, A., AND ZHAO, B. Y. Improving anonymity using social links. In Proc. of NPSec (October 2008).Google ScholarCross Ref
- RUSSELL, S., AND NORVIG, P. Artificial Intelligence: A Modern Approach, second ed. Prentice Hall, 2003. Google ScholarDigital Library
- TOIVONEN, R., ET AL. A model for social networks. Physica A: Statistical and Theoretical Physics 371, 2 (Nov. 2006), 851--860.Google ScholarCross Ref
- VAZQUEZ, A. Growing network with local rules: Preferential attachment, clustering hierarchy, and degree correlations. Physical Review E 67-056104 (2003).Google Scholar
- WATTS, D. J., AND STROGATZ, S. Collective dynamics of 'small-world' networks. Nature, 393 (1998), 440--442.Google ScholarCross Ref
- WILSON, C., BOE, B., SALA, A., PUTTASWAMY, K. P. N., AND ZHAO, B. Y. User interactions in social networks and their implications. In Proc. of EuroSys (April 2009). Google ScholarDigital Library
- YU, H., ET AL. Sybilguard: defending against sybil attacks via social networks. In Proc. of SIGCOMM (September 2006). Google ScholarDigital Library
Index Terms
- Measurement-calibrated graph models for social network experiments
Recommendations
Phylogenetic graph models beyond trees
A graph model for a set S of splits of a set X consists of a graph and a map from X to the vertices of the graph such that the inclusion-minimal cuts of the graph represent S. Phylogenetic trees are graph models in which the graph is a tree. We show ...
Sharing graphs using differentially private graph models
IMC '11: Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conferenceContinuing success of research on social and computer networks requires open access to realistic measurement datasets. While these datasets can be shared, generally in the form of social or Internet graphs, doing so often risks exposing sensitive user ...
Image representation and matching with geometric-edge random structure graph
A novel random structure graph model, called G-E graphs, has been proposed.We have introduced some matching techniques for G-E graphs.Experimental results show the better performance of G-E graphs. Structure graphs are often used in image structural ...
Comments