Published in: Knowledge and Information Systems 2/2019

26.04.2018 | Regular Paper

Self-learning and embedding based entity alignment

Authors: Saiping Guan, Xiaolong Jin, Yuanzhuo Wang, Yantao Jia, Huawei Shen, Zixuan Li, Xueqi Cheng


Abstract

Entity alignment aims to identify semantic matches between entities from different groups. Traditional methods (e.g., attribute comparison-based methods, graph operation-based methods, and active learning methods) usually rely on labeled data as prior knowledge for supervision. Since labeling training data is not trivial, researchers have turned to unsupervised approaches, developing similarity-based, probabilistic, and graphical model-based methods, among others. Structural and class information have also been explored. Entities, as a core part of a knowledge graph, contain rich semantic information that knowledge graph embedding methods can learn well in low-dimensional vector spaces. However, existing entity alignment methods have paid little attention to knowledge graph embedding. In this paper, we propose a self-learning and embedding based method for entity alignment, called SEEA, which iteratively finds semantically aligned entity pairs and makes full use of the semantic information contained in entity attributes. Experiments on three real-world datasets, together with comparisons against several baseline methods, validate the effectiveness and merits of the proposed method.
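The abstract describes SEEA only at a high level. To give some intuition for the general idea of iteratively harvesting aligned pairs from embedding similarity, the following Python sketch aligns entities from two toy knowledge graphs. It averages hypothetical pre-trained vectors of each entity's attribute tokens, then repeatedly accepts mutual nearest neighbors whose cosine similarity clears a threshold. This is an illustrative assumption and not the SEEA algorithm: the token vectors, the mean-pooling step, the mutual-nearest-neighbor rule, the `threshold` parameter, and the function names `mean_vector`, `cosine`, and `iterative_align` are all hypothetical, and the paper's actual embedding model and selection criteria are not reproduced here.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical sketch: NOT the SEEA algorithm, only a toy illustration of
# iterative, embedding-based entity alignment.
Vector = List[float]


def mean_vector(tokens: Set[str], token_vecs: Dict[str, Vector]) -> Vector:
    """Average the vectors of an entity's attribute tokens; unknown tokens are skipped."""
    known = [token_vecs[t] for t in sorted(tokens) if t in token_vecs]
    if not known:
        return []
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; empty or zero vectors get similarity 0."""
    if not a or not b:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def iterative_align(
    kg1: Dict[str, Set[str]],
    kg2: Dict[str, Set[str]],
    token_vecs: Dict[str, Vector],
    threshold: float = 0.8,
) -> List[Tuple[str, str]]:
    """Repeatedly accept mutual nearest-neighbor pairs above `threshold`,
    removing aligned entities so later rounds choose among the rest."""
    emb1 = {e: mean_vector(toks, token_vecs) for e, toks in kg1.items()}
    emb2 = {e: mean_vector(toks, token_vecs) for e, toks in kg2.items()}
    left, right = set(emb1), set(emb2)
    aligned: List[Tuple[str, str]] = []
    while left and right:
        # Best candidate in the other graph for each still-unaligned entity
        # (sorted iteration makes tie-breaking deterministic).
        best12 = {a: max(sorted(right), key=lambda b: cosine(emb1[a], emb2[b]))
                  for a in sorted(left)}
        best21 = {b: max(sorted(left), key=lambda a: cosine(emb1[a], emb2[b]))
                  for b in sorted(right)}
        # Accept only mutual nearest neighbors that clear the threshold.
        new_pairs = [(a, b) for a, b in best12.items()
                     if best21[b] == a and cosine(emb1[a], emb2[b]) >= threshold]
        if not new_pairs:
            break  # no confident pairs left: stop iterating
        for a, b in new_pairs:
            aligned.append((a, b))
            left.discard(a)
            right.discard(b)
    return aligned


# Hypothetical toy data: 2-D token vectors stand in for learned attribute embeddings.
token_vecs = {
    "capital": [1.0, 0.0],
    "city": [0.8, 0.6],
    "rhone": [0.0, 1.0],
}
kg1 = {"Paris@KG1": {"capital"}, "Lyon@KG1": {"city"}}
kg2 = {"Paris@KG2": {"capital"}, "Lyon@KG2": {"rhone"}}
print(iterative_align(kg1, kg2, token_vecs, threshold=0.5))
# [('Paris@KG1', 'Paris@KG2'), ('Lyon@KG1', 'Lyon@KG2')]
```

Removing accepted pairs before the next round is what lets `Lyon@KG1` find its counterpart: in the first round its closest match is `Paris@KG2`, which is already claimed by `Paris@KG1`, so it only becomes a mutual nearest neighbor of `Lyon@KG2` in the second round.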

Metadata
Title: Self-learning and embedding based entity alignment
Authors: Saiping Guan, Xiaolong Jin, Yuanzhuo Wang, Yantao Jia, Huawei Shen, Zixuan Li, Xueqi Cheng
Publication date: 26.04.2018
Publisher: Springer London
Published in: Knowledge and Information Systems, Issue 2/2019
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-018-1191-0
