
01-12-2016 | Original Article

A synthetic data generator for online social network graphs

Author: David F. Nettleton

Published in: Social Network Analysis and Mining | Issue 1/2016


Abstract

Data analysts of online social networks face two difficulties: (1) the limited public availability of data and (2) the need to respect the privacy of users. One possible solution to both problems is to use synthetically generated data. However, this raises a series of challenges in generating a dataset that is realistic in terms of topologies, attribute values, communities, data distributions, correlations and so on. In this work, we present and validate an approach for populating a graph topology with synthetic data which approximates an online social network. The empirical tests confirm that our approach generates a dataset which is both diverse and a good fit to the target requirements, with realistic modeling of noise and fitting to communities. A good match is obtained between the generated data and the target profiles and distributions, which is competitive with other state-of-the-art methods. The data generator is also highly configurable, with a sophisticated control parameter set for different “similarity/diversity” levels.
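
To make the idea of populating a topology against target profiles more concrete, the following is a toy sketch only, not the generator described in the article. It assumes that each node already carries a community label, that each community has a target profile (a probability distribution over the values of one categorical attribute), and that a single `noise` parameter sets the chance that a node ignores its community profile. The function names, the attribute values and the `noise` parameter are all hypothetical.

```python
import random
from collections import Counter


def populate_attributes(communities, profiles, noise=0.1, seed=0):
    """Assign one categorical attribute value to every node.

    communities: dict mapping node -> community id (assumed given)
    profiles:    dict mapping community id -> {value: target probability}
    noise:       hypothetical probability that a node ignores its profile
    """
    rng = random.Random(seed)
    # Every value that appears in any profile; noisy nodes draw uniformly from these.
    all_values = sorted({v for p in profiles.values() for v in p})
    assigned = {}
    for node in sorted(communities):
        profile = profiles[communities[node]]
        if rng.random() < noise:
            # Noisy node: value drawn uniformly, regardless of community.
            assigned[node] = rng.choice(all_values)
        else:
            # Regular node: value drawn from the community's target profile.
            values = sorted(profile)
            weights = [profile[v] for v in values]
            assigned[node] = rng.choices(values, weights=weights, k=1)[0]
    return assigned


def empirical_profile(assigned, communities, community):
    """Observed value frequencies inside one community, for comparison with its target."""
    members = [n for n, c in communities.items() if c == community]
    counts = Counter(assigned[n] for n in members)
    return {value: count / len(members) for value, count in counts.items()}


if __name__ == "__main__":
    # Two hypothetical communities of 500 nodes each, with made-up age-band profiles.
    communities = {n: ("A" if n < 500 else "B") for n in range(1000)}
    profiles = {
        "A": {"18-25": 0.7, "26-40": 0.2, "41+": 0.1},
        "B": {"18-25": 0.1, "26-40": 0.3, "41+": 0.6},
    }
    assigned = populate_attributes(communities, profiles, noise=0.1, seed=42)
    print(empirical_profile(assigned, communities, "A"))
```

Comparing the output of `empirical_profile` with the target profile gives a rough sense of fit; raising `noise` pulls each community's observed distribution toward uniform, which is one simple way to trade similarity for diversity in this sketch.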

Metadata
Title: A synthetic data generator for online social network graphs
Author: David F. Nettleton
Publication date: 01-12-2016
Publisher: Springer Vienna
Published in: Social Network Analysis and Mining | Issue 1/2016
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI: https://doi.org/10.1007/s13278-016-0352-y