2018 | OriginalPaper | Chapter

Deep Learning Based Approach for Entity Resolution in Databases

Authors: Nihel Kooli, Robin Allesiardo, Erwan Pigneul

Published in: Intelligent Information and Database Systems

Publisher: Springer International Publishing

Abstract

This paper proposes a Deep Neural Network (DNN) based approach for entity resolution in databases. The approach relies on a record linkage process, which aims to detect records that refer to the same entity. First, record pairs are represented by word embeddings computed with an N-gram embedding based method. Then, a DNN model classifies each pair as matching or non-matching. Three DNN architectures are investigated and compared for this purpose: Multi-Layer Perceptron, Long Short-Term Memory networks, and Convolutional Neural Networks. The approach is evaluated on two databases, where the results exceed 97% for recall and 96% for precision. Compared with approaches based on similarity measures and classical classifiers, it shows a significant improvement on both databases.
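The two-step pipeline described in the abstract (embed each record from its character n-grams, then score the record pair with a neural classifier) can be illustrated with a minimal sketch. It is not the authors' implementation. The embedding size, the n-gram range, the hash-seeded vectors standing in for learned n-gram embeddings, the pair representation (absolute difference concatenated with element-wise product), and the untrained one-hidden-layer perceptron are all assumptions made for illustration.

```python
import hashlib
import math
import random

DIM = 16     # embedding size (assumed; the abstract does not state one)
HIDDEN = 8   # hidden layer width (assumed)

def char_ngrams(word, n_min=3, n_max=4):
    """Character n-grams of a lowercased word padded with '<' and '>'."""
    padded = f"<{word.lower()}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def ngram_vector(ngram):
    """Deterministic pseudo-random vector; a stand-in for a learned embedding."""
    seed = int.from_bytes(hashlib.md5(ngram.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

def embed_record(text):
    """Average the n-gram vectors of every whitespace-separated word."""
    grams = [g for word in text.split() for g in char_ngrams(word)]
    if not grams:
        return [0.0] * DIM
    total = [0.0] * DIM
    for g in grams:
        for k, x in enumerate(ngram_vector(g)):
            total[k] += x
    return [t / len(grams) for t in total]

def pair_features(a, b):
    """Represent a record pair as |e_a - e_b| concatenated with e_a * e_b."""
    ea, eb = embed_record(a), embed_record(b)
    return ([abs(x - y) for x, y in zip(ea, eb)]
            + [x * y for x, y in zip(ea, eb)])

def mlp_forward(x, w1, b1, w2, b2):
    """One hidden ReLU layer and a sigmoid output, read as P(match)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))

# Untrained weights: the score below only demonstrates the data flow.
# A real system would fit these weights on labelled matching/non-matching pairs.
rng = random.Random(0)
w1 = [[rng.gauss(0, 0.1) for _ in range(2 * DIM)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
w2 = [rng.gauss(0, 0.1) for _ in range(HIDDEN)]
b2 = 0.0

features = pair_features("Jon Smith, Paris", "John Smith, Paris")
score = mlp_forward(features, w1, b1, w2, b2)
print(f"P(match) from the untrained sketch: {score:.3f}")
```

Because the vectors are built from character n-grams, records that differ by a small spelling variation still share most of their n-grams and therefore receive nearby embeddings, which is what makes this kind of representation attractive for record linkage.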

Metadata

Title: Deep Learning Based Approach for Entity Resolution in Databases
Authors: Nihel Kooli, Robin Allesiardo, Erwan Pigneul
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-75420-8_1
