ABSTRACT
We study how potential attackers can identify accounts on different social network sites that all belong to the same user, exploiting only innocuous activity that inherently comes with posted content. We examine three specific features on Yelp, Flickr, and Twitter: the geo-location attached to a user's posts, the timestamps of posts, and the user's writing style as captured by language models. We show that, among these three features, the location of posts is the most powerful for identifying accounts that belong to the same user across different sites. When we combine all three features, the accuracy of identifying Twitter accounts that belong to a set of Flickr users is comparable to that of existing attacks that exploit usernames. When we instead correlate Yelp and Twitter, our attack identifies 37% more accounts than a username-based attack. Our results have significant privacy implications: they present a novel class of attacks that exploit users' tendency to assume that accounts maintained under different personas with different names cannot be linked together, whereas we show that the posts themselves can provide enough information to correlate the accounts.
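The idea of matching accounts by where their posts come from can be illustrated with a minimal sketch. It is not the method used in the paper: the abstract does not specify how location profiles are built or compared, so the coarse location cells (ZIP-code-like strings), the cosine similarity measure, and all function and account names below are assumptions chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def location_profile(post_locations):
    """Count how often an account posts from each coarse location cell.

    Assumption: each post is already mapped to a cell label such as a ZIP code.
    """
    return Counter(post_locations)


def cosine_similarity(p, q):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def best_match(target_locations, candidates):
    """Return the candidate account whose location profile is most similar.

    `candidates` maps a (hypothetical) account name to its list of post cells.
    """
    target = location_profile(target_locations)
    scores = {
        name: cosine_similarity(target, location_profile(cells))
        for name, cells in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical data: one account on site A, two candidate accounts on site B.
site_a_posts = ["94704", "94704", "94110", "10001"]
site_b_candidates = {
    "candidate_1": ["94704", "94110", "94704"],
    "candidate_2": ["60601", "60601", "10001"],
}

match, scores = best_match(site_a_posts, site_b_candidates)
print(match, scores)
```

In this toy data the first candidate shares most of its posting locations with the target account, so it receives the higher score. A fuller treatment would combine such a location score with timestamp and writing-style scores, as the abstract describes, but how those are weighted is not stated here.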
Index Terms
- Exploiting innocuous activity for correlating users across sites