ABSTRACT
Motivated by the need to automatically generate behavior-based security challenges to improve user authentication for web services, we consider the problem of large-scale construction of realistic-looking names to serve as aliases for real individuals. We aim to use these names to construct security challenges, where users are asked to identify their real contacts among a presented pool of names. We seek these look-alike names to preserve name characteristics like gender, ethnicity, and popularity, while being unlinkable back to the source individual, thereby making the real contacts not easily guessable by attackers.
To achieve this, we introduce the technique of distributed name embeddings, representing names in a high-dimensional space such that the distance between name components reflects the degree of cultural similarity between these strings. We present different approaches to construct name embeddings from contact lists observed at a large web-mail provider, and evaluate their cultural coherence. We demonstrate that name embeddings strongly encode gender and ethnicity, as well as name popularity. We applied this technique to generate imitation names for an email contact list challenge. Our controlled user study verified that the proposed technique reduced the attacker's success rate to 26.08%, indistinguishable from random guessing, compared to a success rate of 62.16% from previous name generation algorithms.
Finally, we use these embeddings to produce an open synthetic name resource of 1 million names for security applications, constructed to respect both cultural coherence and U.S. census name frequencies.
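As a rough illustration of the idea that names co-occurring in contact lists end up close together, the sketch below builds simple count-based vectors from a handful of made-up contact lists and ranks neighbours by cosine similarity. This is not the method evaluated in the paper: the toy data, the function names (`cooccurrence_vectors`, `cosine`, `nearest`), and the use of raw co-occurrence counts in place of learned embeddings are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy contact lists (assumption); the paper's data comes from
# contact lists observed at a large web-mail provider.
CONTACT_LISTS = [
    ["maria", "jose", "carlos", "garcia"],
    ["jose", "carlos", "lopez", "garcia"],
    ["maria", "carlos", "lopez"],
    ["wei", "li", "chen", "zhang"],
    ["wei", "chen", "zhang"],
    ["li", "chen", "zhang", "wei"],
]


def cooccurrence_vectors(contact_lists):
    """Map each name part to a sparse vector of co-occurrence counts."""
    vectors = defaultdict(lambda: defaultdict(int))
    for contacts in contact_lists:
        # Count each unordered pair of distinct names once per contact list.
        for a, b in combinations(sorted(set(contacts)), 2):
            vectors[a][b] += 1
            vectors[b][a] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest(name, vectors, k=3):
    """Return the k names most similar to `name`, ties broken alphabetically."""
    scores = [
        (cosine(vectors[name], vec), other)
        for other, vec in vectors.items()
        if other != name
    ]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [other for _, other in scores[:k]]


if __name__ == "__main__":
    vecs = cooccurrence_vectors(CONTACT_LISTS)
    print(nearest("wei", vecs))
    print(nearest("jose", vecs))
```

In this toy data the two groups of names never share a contact list, so every cross-group similarity is zero and each name's nearest neighbours come from its own group. A look-alike generator built on this idea would draw replacement name parts from such neighbourhoods.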
Index Terms
- Generating Look-alike Names For Security Challenges