ABSTRACT
The proliferation of online social networks, and the concomitant accumulation of user data, give rise to hotly debated issues of privacy, security, and control. One specific challenge is the sharing or public release of anonymized data without accidentally leaking personally identifiable information (PII). Unfortunately, it is often difficult to ascertain that sophisticated statistical techniques, potentially employing additional external data sources, are unable to break anonymity. In this paper, we consider an instance of this problem, where the object of interest is the structure of a social network, i.e., a graph describing users and their links. Recent work demonstrates that anonymizing node identities may not be sufficient to keep the network private: the availability of node and link data from another domain, which is correlated with the anonymized network, has been used to re-identify the anonymized nodes. This paper is about conditions under which such a de-anonymization process is possible.
We attempt to shed light on the following question: can we assume that a sufficiently sparse network is inherently anonymous, in the sense that even with unlimited computational power, de-anonymization is impossible? Our approach is to introduce a random graph model for a version of the de-anonymization problem, which is parameterized by the expected node degree and a similarity parameter that controls the correlation between two graphs over the same vertex set. We find simple conditions on these parameters delineating the boundary of privacy, and show that the mean node degree need only grow slightly faster than log n with network size n for nodes to be identifiable. Our results have policy implications for sharing of anonymized network information.
- AOL Search Data Scandal. http://en.wikipedia.org/wiki/\\AOL_search_data_scandal.Google Scholar
- W. Aiello, F. Chung, and L. Lu. A Random Graph Model for Massive Graphs. In STOC '00, pages 171--180, 2000. Google ScholarDigital Library
- L. Backstrom, C. Dwork, and J. Kleinberg. Wherefore Art Thou R3579X?: Anonymized Social Networks, Hidden Patterns, and Structural Steganography. In WWW '07, pages 181--190, 2007. Google ScholarDigital Library
- S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang. Complex Networks: Structure and Dynamics. Physics Reports, 424(4--5):175--308, 2006.Google Scholar
- B. Bollobas. Random Graphs (2nd edition). Cambridge University Press, 2001.Google Scholar
- A. Chaintreau, P. Hui, J. Crowcroft, C. Diot, R. Gass, and J. Scott. Impact of Human Mobility on Opportunistic Forwarding Algorithms. IEEE Transactions on Mobile Computing, 6:606--620, 2007. Google ScholarDigital Library
- M. Gori, M. Maggini, and L. Sarti. Exact and Approximate Graph Matching Using Random Walks. IEEE Trans. Pattern Anal. Mach. Intell., 27(7):1100--1111, 2005. Google ScholarDigital Library
- R. Gross and A. Acquisti. Information Revelation and Privacy in Online Social Networks. In WPES '05: Proceedings of the 2005 ACM workshop on Privacy in the electronic society, pages 71--80, 2005. Google ScholarDigital Library
- S. Guha, K. Tang, and P. Francis. NOYB: Privacy in Online Social Networks. In WOSN'08: Workshop on Online Social Networks, 2008. Google ScholarDigital Library
- M. Hay, G. Miklau, D. Jensen, D. Towsley, and C. Li. Resisting Structural Re-Identification in Anonymized Networks. VLDB Journal, 19(6), December 2010. Google ScholarDigital Library
- M. R. Henzinger, A. Heydon, M. Mitzenmacher, and M. Najork. On Near-Uniform URL Sampling. In Proceedings of the 9th international World Wide Web conference on Computer networks, pages 295--308, 2000. Google ScholarDigital Library
- S. Hill, F. Provost, and C. Volinsky. Network-based Marketing: Identifying Likely Adopters via Consumer Networks. Statistical Science, 21:256--276, 2006.Google ScholarCross Ref
- S. Janson, T. Łuczak, and A. Rucinski. Random Graphs. Wiley, 2000.Google ScholarCross Ref
- B. Karrer and M. E. J. Newman. Random Graph Models for Directed Acyclic Networks. Physical review. E, 2009.Google Scholar
- A. Korolova, R. Motwani, S. U. Nabar, and Y. Xu. Link Privacy in Social Networks. In CIKM '08: Proceeding of the 17th ACM conference on Information and knowledge management, pages 289--298, 2008. Google ScholarDigital Library
- G. Kossinets, J. Kleinberg, and D. Watts. The Structure of Information Pathways in a Social Communication Network. In KDD '08, pages 435--443, 2008. Google ScholarDigital Library
- B. Krishnamurthy and C. Willis. Characterizing Privacy in Online Social Networks. In WOSN'08: Workshop on Online Social Networks, 2008. Google ScholarDigital Library
- J. Leskovec, L. Backstrom, and J. Kleinberg. Meme-Tracking and the Dynamics of the News Cycle. In KDD '09, pages 497--506, 2009. Google ScholarDigital Library
- J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani. Kronecker Graphs: An Approach to Modeling Networks. J. Mach. Learn. Res., 11:985--1042, 2010. Google ScholarDigital Library
- J. Leskovec, D. Huttenlocher, and J. Kleinberg. Predicting Positive and Negative Links in Online Social Networks. In WWW '10, pages 641--650, 2010. Google ScholarDigital Library
- J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. In KDD '05, pages 177--187, 2005. Google ScholarDigital Library
- B. Luo and E. R. Hancock. Structural Graph Matching Using the EM Algorithm and Singular Value Decomposition. IEEE Trans. Pattern Anal. Mach. Intell., 23(10):1120--1136, 2001. Google ScholarDigital Library
- D. Martin and A. Schulman. Deanonymizing Users of the SafeWeb Anonymizing Service. In Proceedings of the 11th USENIX Security Symposium, pages 123--137, 2002. Google ScholarDigital Library
- A. Mislove, H. S. Koppula, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Growth of the Flickr Social Network. In WOSP '08: Proceedings of the first workshop on Online social networks, pages 25--30, 2008. Google ScholarDigital Library
- A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and Analysis of Online Social Networks. In IMC '07: Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, pages 29--42, 2007. Google ScholarDigital Library
- R. Myers, R. C. Wilson, and E. R. Hancock. Bayesian Graph Edit Distance. IEEE Trans. Pattern Anal. Mach. Intell., 22(6), 2000. Google ScholarDigital Library
- A. Narayanan and V. Shmatikov. Robust De-anonymization of Large Sparse Datasets. In SP '08: Proceedings of the 2008 IEEE Symposium on Security and Privacy, pages 111--125, 2008. Google ScholarDigital Library
- A. Narayanan and V. Shmatikov. De-anonymizing Social Networks. Security and Privacy, IEEE Symposium on, 0:173--187, 2009. Google ScholarDigital Library
- M. E. J. Newman, D. J. Watts, and S. H. Strogatz. Random graph models of social networks. Proceedings of the National Academy of Sciences of the United States of America, 99(Suppl 1):2566--2572, 2002.Google ScholarCross Ref
- L. P. Cordella, P. Foggia, C. Sansone, and M. Vento. A (Sub)Graph Isomorphism Algorithm for Matching Large Graphs. IEEE Trans. Pattern Anal. Mach. Intell., 26(10):1367--1372, 2004. Google ScholarDigital Library
- P. Pedarsani, D. R. Figueiredo, and M. Grossglauser. Densification Arising from Sampling Fixed Graphs. In SIGMETRICS '08, pages 205--216, 2008. Google ScholarDigital Library
- Y. Richter, E. Yom-Tov, and N. Slonim. Predicting Customer Churn in Mobile Networks through Analysis of Social Groups. In Proc. SIAM Int'l Conference on Data Mining (SDM) 2010, 2010.Google ScholarCross Ref
- J. Riordan. An Introduction to Combinatorial Analysis. Wiley, 1958.Google Scholar
- A. Sanfeliu and K. Fu. A Distance Measure between Attributed Relational Graphs for Pattern Recognition. IEEE Transactions On Systems, Man, and Cybernetics, 13(3):353--362, 1983.Google ScholarCross Ref
- D. Stutzbach, R. Rejaie, N. Duffield, S. Sen, and W. Willinger. On Unbiased Sampling for Unstructured Peer-to-Peer Networks. IEEE/ACM Trans. Netw., 17(2):377--390, 2009. Google ScholarDigital Library
- Y. Tian and J. M. Patel. TALE: A Tool for Approximate Large Graph Matching. Data Engineering, International Conference on, 0:963--972, 2008. Google ScholarDigital Library
- J. R. Ullmann. An Algorithm for Subgraph Isomorphism. J. ACM, 23(1):31--42, 1976. Google ScholarDigital Library
- G. Wondracek, T. Holz, E. Kirda, and C. Kruegel. A Practical Attack to De-Anonymize Social Network Users. In IEEE Symposium on Security & Privacy, 2010. Google ScholarDigital Library
- B. Zhou and J. Pei. Preserving Privacy in Social Networks Against Neighborhood Attacks. In ICDE '08: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, pages 506--515, 2008. Google ScholarDigital Library
Index Terms
- On the privacy of anonymized networks
Recommendations
Preservation of Centrality Measures in Anonymized Social Networks
SOCIALCOM '13: Proceedings of the 2013 International Conference on Social ComputingSocial media sites became a pervasive presence in the nowadays society. We can learn a lot of useful information about human behavior and interaction by paying attention to the information and relations of social media users. This information can be ...
A secure K-automorphism privacy preserving approach with high data utility in social networks
The prevalence of social networks has raised the concern for individual privacy leakage. Although preserving user privacy in social networks has been studied extensively, there is still a lot of room for improvement in the state of the art techniques. ...
Random graph models for wireless and social networks: keynote talk abstract
MobiOpp '12: Proceedings of the third ACM international workshop on Mobile Opportunistic NetworksOperating large-scale social applications over opportunistic wireless networks entails many fascinating engineering challenges. We strive for robust and efficient algorithms for specific problems like opportunistic forwarding, routing, or publish-...
Comments