
2016 | OriginalPaper | Book chapter

Automatic Key Selection for Data Linking

Authors: Manel Achichi, Mohamed Ben Ellefi, Danai Symeonidou, Konstantin Todorov

Published in: Knowledge Engineering and Knowledge Management

Publisher: Springer International Publishing


Abstract

The paper proposes an RDF key ranking approach that attempts to close the gap between automatic key discovery and data linking approaches, and thus to reduce the user effort required to configure a linking task. Configuring a data linking tool is a laborious process in which the user is often required to manually select the properties to compare, which presupposes in-depth expert knowledge of the data. Key discovery techniques attempt to facilitate this task, but in a number of cases they do not fully succeed, because they produce a large number of keys without any indication of confidence. Moreover, because keys are extracted from each dataset independently, they are less effective for a matching task that involves two datasets. The approach proposed in this work aims to unlock the potential of both key discovery techniques and data linking tools by providing the user with a limited number of merged and ranked keys that are well suited to a particular matching task. In addition, the complementarity of a small number of top-ranked keys is explored, showing that their combined use significantly improves recall. We report our experiments on data from the Ontology Alignment Evaluation Initiative, as well as on real-world benchmark data about music.
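The idea of merging keys found in each dataset and ranking the result can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' method: the function names (merge_keys, support, rank_keys), the toy data, the assumption that both datasets already share property names, and in particular the ranking criterion, which here is simply the product of each key's property coverage in the two datasets.

```python
from itertools import product

def merge_keys(keys_a, keys_b):
    """Union every key of A with every key of B and keep only minimal unions.

    Assumption: a superset of a key is still a key, so each union is a key
    in both datasets.
    """
    unions = {frozenset(ka) | frozenset(kb) for ka, kb in product(keys_a, keys_b)}
    return {k for k in unions if not any(other < k for other in unions)}

def support(key, dataset):
    """Fraction of instances that have a value for every property in the key."""
    if not dataset:
        return 0.0
    covered = sum(1 for props in dataset.values() if all(p in props for p in key))
    return covered / len(dataset)

def rank_keys(keys, dataset_a, dataset_b):
    """Sort merged keys by a hypothetical score: support in A times support in B."""
    scored = [(support(k, dataset_a) * support(k, dataset_b), k) for k in keys]
    # Highest score first; ties broken by smaller key, then alphabetically.
    return sorted(scored, key=lambda t: (-t[0], len(t[1]), sorted(t[1])))

# Toy instances: entity id -> {property: value}. Purely illustrative data.
dataset_a = {
    "a1": {"title": "Bolero", "composer": "Ravel", "year": "1928"},
    "a2": {"title": "Gymnopedie", "composer": "Satie"},
}
dataset_b = {
    "b1": {"title": "Bolero", "composer": "Ravel", "year": "1928"},
    "b2": {"title": "La Mer", "composer": "Debussy", "year": "1905"},
}

# Keys assumed to have been discovered independently in each dataset.
keys_a = [{"title"}, {"composer", "year"}]
keys_b = [{"title", "composer"}, {"year"}]

merged = merge_keys(keys_a, keys_b)
ranked = rank_keys(merged, dataset_a, dataset_b)
for score, key in ranked:
    print(f"{score:.2f}  {sorted(key)}")
```

In this toy setting the key {title, composer} ranks first because every instance in both datasets has values for both properties. A user would then configure the linking tool with only the few top-ranked keys rather than inspecting every discovered key.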


Metadata
Title
Automatic Key Selection for Data Linking
Authors
Manel Achichi
Mohamed Ben Ellefi
Danai Symeonidou
Konstantin Todorov
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-49004-5_1