research-article

Generating Look-alike Names For Security Challenges

Authors:
Shuchu Han (NEC Labs America, Princeton, NJ & Yahoo! Research, New York, NY, USA)
Yifan Hu (Yahoo! Research, New York, NY, USA)
Steven Skiena (Stony Brook University, Stony Brook, NY, USA)
Baris Coskun (Amazon AI & Yahoo! Research, New York, NY, USA)
Meizhu Liu (Yahoo! Research, New York, NY, USA)
Hong Qin (Stony Brook University, Stony Brook, NY, USA)
Jaime Perez (Yahoo! Research, Sunnyvale, CA, USA)

AISec '17: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, November 2017, Pages 57–67. https://doi.org/10.1145/3128572.3140441

Published: 03 November 2017

ABSTRACT

Motivated by the need to automatically generate behavior-based security challenges to improve user authentication for web services, we consider the problem of large-scale construction of realistic-looking names to serve as aliases for real individuals. We aim to use these names to construct security challenges, where users are asked to identify their real contacts among a presented pool of names. We seek these look-alike names to preserve name characteristics like gender, ethnicity, and popularity, while being unlinkable back to the source individual, thereby making the real contacts not easily guessable by attackers.
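The challenge format described above can be illustrated with a minimal sketch. The pool size, the placeholder names, and the helper functions `build_challenge` and `random_guess_rate` are assumptions made for illustration; the abstract does not specify how the deployed challenge is laid out.

```python
import random


def build_challenge(real_contact, decoy_names, rng):
    """Return a shuffled pool holding one real contact and the decoy names."""
    pool = [real_contact] + list(decoy_names)
    rng.shuffle(pool)  # hide the position of the real contact
    return pool


def random_guess_rate(pool_size, num_real=1):
    """Chance that one uniformly random pick lands on a real contact."""
    return num_real / pool_size


# Hypothetical placeholder names, not data from the paper.
rng = random.Random(0)
pool = build_challenge(
    "Real Contact", ["Decoy One", "Decoy Two", "Decoy Three"], rng
)
baseline = random_guess_rate(len(pool))
```

If the decoys are hard to tell apart from the real contact, an attacker should do no better than `baseline`, which is the comparison point for any measured success rate.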

To achieve this, we introduce the technique of distributed name embeddings, representing names in a high-dimensional space such that the distance between name components reflects the degree of cultural similarity between these strings. We present different approaches to constructing name embeddings from contact lists observed at a large web-mail provider, and evaluate their cultural coherence. We demonstrate that name embeddings strongly encode gender and ethnicity, as well as name popularity. We applied this technique to generate imitation names for an email contact list challenge. Our controlled user study verified that the proposed technique reduced the attacker's success rate to 26.08%, indistinguishable from random guessing, compared to a success rate of 62.16% for previous name generation algorithms.
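As a rough illustration of how distance in an embedding space can drive look-alike selection, the sketch below ranks candidate names by cosine similarity to a query name. The three-dimensional vectors, the placeholder keys `name_a` to `name_d`, and the function `look_alikes` are hypothetical; the choice of cosine similarity is also an assumption, since the abstract does not say which distance measure the authors use.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def look_alikes(name, embeddings, k=2):
    """Return the k names whose vectors are most similar to `name`."""
    query = embeddings[name]
    scored = [
        (cosine(query, vec), other)
        for other, vec in embeddings.items()
        if other != name  # never propose the source name itself
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


# Toy vectors chosen by hand for illustration only.
toy_embeddings = {
    "name_a": [1.0, 0.1, 0.0],
    "name_b": [0.9, 0.2, 0.0],
    "name_c": [0.0, 1.0, 0.1],
    "name_d": [0.1, 0.9, 0.0],
}
nearest = look_alikes("name_a", toy_embeddings, k=1)
```

In this toy space `name_a` and `name_b` point in nearly the same direction, so `name_b` is the top candidate for `name_a`.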

Finally, we use these embeddings to produce an open synthetic name resource of 1 million names for security applications, constructed to respect both cultural coherence and U.S. census name frequencies.
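One simple way to make a synthetic name list respect a reference frequency table is weighted sampling, sketched below. The counts and the function `sample_names` are hypothetical stand-ins, not census figures, and this is not presented as the authors' construction procedure.

```python
import random


def sample_names(frequencies, n, rng):
    """Draw n names with probability proportional to their frequency."""
    names = list(frequencies)
    weights = [frequencies[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


# Hypothetical counts; a real table would come from a frequency source.
toy_counts = {"name_x": 3, "name_y": 1, "name_z": 0}
rng = random.Random(0)
draws = sample_names(toy_counts, 1000, rng)
```

Names with a zero count are never drawn, and more frequent names appear proportionally more often in `draws`.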

Index Terms

Generating Look-alike Names For Security Challenges
1. Information systems
  1. World Wide Web
    1. Web applications
      1. Internet communications tools
        Email
    2. Web mining
      1. Data extraction and integration
2. Security and privacy
  1. Security services
    1. Authentication
  2. Software and application security
    1. Web application security

Published in
AISec '17: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security
November 2017
140 pages
ISBN:9781450352024
DOI:10.1145/3128572
General Chair: Bhavani Thuraisingham (University of Texas at Dallas, USA)
Program Chairs: Battista Biggio (Pluribus One and University of Cagliari, Italy); David Mandell Freeman (Facebook Inc., USA); Brad Miller (Google Inc., USA); Arunesh Sinha (University of Michigan, Ann Arbor, USA)
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
name embeddings
security challenges
user authentication
Acceptance Rates
AISec '17 Paper Acceptance Rate: 11 of 36 submissions, 31%. Overall Acceptance Rate: 94 of 231 submissions, 41%.