Skip to main content
Top

2019 | OriginalPaper | Chapter

Name2Vec: Personal Names Embeddings

Authors : Jeremy Foxcroft, Adrian d’Alessandro, Luiza Antonie

Published in: Advances in Artificial Intelligence

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Predicting if two names refer to the same entity is an important task for many domains, such as information retrieval, record linkage and data integration. In this paper, we propose to create name-embeddings by employing a Doc2Vec methodology, where each name is viewed as a document and each letter in the name is considered a word. Our hypothesis is that representing names as documents, with letters as words, will help capture the internal structure of names and relationships among letters. We present and discuss an experimental study where we explore the effect of various parameters, and we assess the stability of the models built for the embedding of names. Our results show that the new proposed method can predict with high accuracy when a pair of names matches.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Footnotes
1
This is due to the fact that some of the vectors could have negative values as well.
 
Literature
2.
go back to reference Antonie, L., Inwood, K., Lizotte, D.J., Ross, J.A.: Tracking people over time in 19th century Canada for longitudinal analysis. Mach. Learn. 95(1), 129–146 (2014)MathSciNetCrossRef Antonie, L., Inwood, K., Lizotte, D.J., Ross, J.A.: Tracking people over time in 19th century Canada for longitudinal analysis. Mach. Learn. 95(1), 129–146 (2014)MathSciNetCrossRef
3.
go back to reference Carvalho, V.R., Kiran, Y., Borthwick, A.: The Intelius nickname collection: quantitative analyses from billions of public records. In: Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 607–610 (2012) Carvalho, V.R., Kiran, Y., Borthwick, A.: The Intelius nickname collection: quantitative analyses from billions of public records. In: Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 607–610 (2012)
4.
go back to reference Christen, P.: A comparison of personal name matching: techniques and practical issues. In: Proceedings of IEEE International Conference on Data Mining - Workshops, pp. 290–294 (2006) Christen, P.: A comparison of personal name matching: techniques and practical issues. In: Proceedings of IEEE International Conference on Data Mining - Workshops, pp. 290–294 (2006)
5.
go back to reference Cohen, W., Ravikumar, P., Fienberg, S.: A comparison of string metrics for matching names and records. In: KDD Workshop on Data Cleaning and Object Consolidation, vol. 3, pp. 73–78 (2003) Cohen, W., Ravikumar, P., Fienberg, S.: A comparison of string metrics for matching names and records. In: KDD Workshop on Data Cleaning and Object Consolidation, vol. 3, pp. 73–78 (2003)
6.
go back to reference Jaro, M.A.: Probabilistic linkage of large public health data files. Stat. Med. 14(5–7), 491–498 (1995)CrossRef Jaro, M.A.: Probabilistic linkage of large public health data files. Stat. Med. 14(5–7), 491–498 (1995)CrossRef
7.
go back to reference Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: International Conference on Machine Learning, pp. 1188–1196 (2014) Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: International Conference on Machine Learning, pp. 1188–1196 (2014)
8.
go back to reference Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013) Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:​1301.​3781 (2013)
9.
go back to reference Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013) Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)
11.
go back to reference Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50, May 2010 Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50, May 2010
12.
go back to reference Sim, A., Borthwick, A.: Record2Vec: unsupervised representation learning for structured records. In: IEEE International Conference on Data Mining, ICDM 2018, Singapore, 17–20 November 2018, pp. 1236–1241 (2018) Sim, A., Borthwick, A.: Record2Vec: unsupervised representation learning for structured records. In: IEEE International Conference on Data Mining, ICDM 2018, Singapore, 17–20 November 2018, pp. 1236–1241 (2018)
13.
go back to reference Sukharev, J., Zhukov, L., Popescul, A.: Parallel corpus approach for name matching in record linkage. In: Proceedings of IEEE International Conference on Data Mining, ICDM, pp. 995–1000 (2014) Sukharev, J., Zhukov, L., Popescul, A.: Parallel corpus approach for name matching in record linkage. In: Proceedings of IEEE International Conference on Data Mining, ICDM, pp. 995–1000 (2014)
Metadata
Title
Name2Vec: Personal Names Embeddings
Authors
Jeremy Foxcroft
Adrian d’Alessandro
Luiza Antonie
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-18305-9_52

Premium Partner