ABSTRACT
In this paper, we study the quantification, practice, and implications of structural data (e.g., social data, mobility traces) De-Anonymization (DA). First, we address several open problems in structural data DA by quantifying perfect and (1-ε)-perfect structural data DA}, where ε is the error tolerated by a DA scheme. To the best of our knowledge, this is the first work on quantifying structural data DA under a general data model, which closes the gap between structural data DA practice and theory. Second, we conduct the first large-scale study on the de-anonymizability of 26 real world structural datasets, including Social Networks (SNs), Collaborations Networks, Communication Networks, Autonomous Systems, and Peer-to-Peer networks. We also quantitatively show the conditions for perfect and (1-ε)-perfect DA of the 26 datasets. Third, following our quantification, we design a practical and novel single-phase cold start Optimization based DA} (ODA) algorithm. Experimental analysis of ODA shows that about 77.7% - 83.3% of the users in Gowalla (.2M users and 1M edges) and 86.9% - 95.5% of the users in Google+ (4.7M users and 90.8M edges) are de-anonymizable in different scenarios, which implies optimization based DA is implementable and powerful in practice. Finally, we discuss the implications of our DA quantification and ODA and provide some general suggestions for future secure data publishing.
- L. Backstrom, C. Dwork, and J. Kleinberg, Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography, WWW 2007. Google ScholarDigital Library
- A. Narayanan and V. Shmatikov, De-anonymizing Social Networks, S&P 2009. Google ScholarDigital Library
- M. Srivatsa and M. Hicks, Deanonymizing Mobility Traces: Using Social Networks as a Side-Channel, CCS 2012. Google ScholarDigital Library
- G. Wondracek, T. Holz, E. Kirda, and C. Kruegel, A Practical Attack to De-Anonymize Social Network Users, S&P 2010. Google ScholarDigital Library
- P. Pedarsani and M. Grossglauser, On the Privacy of Anonymized Networks, KDD 2011. Google ScholarDigital Library
- M. Hay, G. Miklau, D. Jensen, D. Towsley, and P. Weis, Resisting Structural Re-identification in Anonymized Social Networks, VLDB 2008. Google ScholarDigital Library
- K. Liu and E. Terzi, Towards Identity Anonymization on Graphs, SIGMOD 2008. Google ScholarDigital Library
- N. Li, W. Qardaji, and D. Su, On Sampling, Anonymization, and Differential Privacy Or, K-Anonymization Meets Differential Privacy, ASIACCS 2012. Google ScholarDigital Library
- C. Dwork, Differential Privacy, ICALP 2006. Google ScholarDigital Library
- A. Korolova, R. Motwani, S. U. Nabar, and Y. Xu, Link Privacy in Social Networks, CIKM 2008. Google ScholarDigital Library
- E. Zheleva and L. Getoor, To Join or Not to Join: The Illusion of Privacy in Social Networks with Mixed Public and Private User Profiles, WWW 2009. Google ScholarDigital Library
- J. Pang, B. Greenstein, R. Gummadi, S. Seshan, and D. Wetherall, 802.11 User Fingerprinting, Mobicom 2007. Google ScholarDigital Library
- L. Backstrom, E. Sun, and C. Marlow, Find me If You Can: Improving Geographical Prediction with Social and Spatial Proximity, WWW 2010. Google ScholarDigital Library
- S. Han, V. Liu, Q. Pu, S. Peter, T. Anderson, A. Krishnamurthy, and D. Wetherall, Expressive Privacy Control with Pseudonyms, Sigcomm 2013. Google ScholarDigital Library
- P. Mittal, M. Wright, and N. Borisov, Pisces: Anonymous Communication Using Social Networks, NDSS 2013.Google Scholar
- J. Kannan, G. Altekar, P. Maniatis, and B.-G. Chun Making programs forget: Enforcing Lifetime for Sensitive Data, USENIX 2013. Google ScholarDigital Library
- M. Egele, G. Stringhini, C. Krugel, and G. Vigna, COMPA: Detecting Compromised Accounts on Social Networks, NDSS 2013.Google Scholar
- K. Singh, S. Bhola, and W. Lee, xBook: Redesigning Privacy Control in Social Networking Pl atforms, USENIX 2009. Google ScholarDigital Library
- R. Shokri, G. Theodorakopoulos, J.-Y. L. Boudec, and J.-P. Hubaux, Quantifying Location Privacy, S&P 2011. Google ScholarDigital Library
- R. Shokri, G. Theodorakopoulos, C. Troncoso, J.-P. Hubaux, and J.-Y. L. Boudec, Protecting Location Privacy: Optimal Strategy against Localization Attacks, CCS 2012. Google ScholarDigital Library
- M. E. J. Newman, Networks: An Introduction, Oxford University Press, 2010. Google ScholarDigital Library
- M. E. J. Newman, The Structure and Function of Complex Networks, SIAM Review, No. 45, pp. 167-256, 2003.Google Scholar
- B. Bollobás, Random Graphs (Second Edition), Cambridge University Press, 2001.Google Scholar
- J. Riordan, An Introduction to Combinatorial Analysis, Wiley, 1958.Google Scholar
- N. Z. Gong, W. Xu, L. Huang, P. Mittal, E. Stefanov, V. Sekar and D. Song, Evolution of Social-Attribute Networks: Measurements, Modeling, and Implications using Google+, IMC 2012. Google ScholarDigital Library
- http://snap.stanford.edu/data/Google Scholar
- H. Pham, C. Shahabi, and Yan Liu, EBM - An Entropy-Based Model to Infer Social Strength from Spatiotemporal Data, SIGMOD 2013. Google ScholarDigital Library
- C. Shah, R. Capra, and P. Hansen, Collaborative Information Seeking, Computer, 2014. Google ScholarDigital Library
- Z. Xu, J. Ramanathan, and R. Ramnath, Identifying Knowledge Brokers and Their Role in Enterprise Research through Social Media, Computer, 2014. Google ScholarDigital Library
Index Terms
- Structural Data De-anonymization: Quantification, Practice, and Implications
Recommendations
General Graph Data De-Anonymization: From Mobility Traces to Social Networks
When people utilize social applications and services, their privacy suffers a potential serious threat. In this article, we present a novel, robust, and effective de-anonymization attack to mobility trace data and social data. First, we design a Unified ...
An Automated Social Graph De-anonymization Technique
WPES '14: Proceedings of the 13th Workshop on Privacy in the Electronic SocietyWe present a generic and automated approach to re-identifying nodes in anonymized social networks which enables novel anonymization techniques to be quickly evaluated. It uses machine learning (decision forests) to matching pairs of nodes in disparate ...
Social Network De-anonymization: More Adversarial Knowledge, More Users Re-identified?
Special Section on Advances in Internet-Based Collaborative TechnologiesPrevious works on social network de-anonymization focus on designing accurate and efficient de-anonymization methods. We attempt to investigate the intrinsic relationship between the attacker’s knowledge and the expected de-anonymization gain. A common ...
Comments